Talent Acquisition Manager at Hiver
Hiver - Site Reliability Engineering Manager (7-9 yrs)
Company Description :
Hiver (http://hiverhq.com) turns Gmail into a simple, powerful team collaboration tool. We're a profitable, rapidly growing SaaS company with a highly rated product and customers all over the world.
We're an agile, driven team deeply motivated by the idea of building a globally respected company from India. Our work culture is built on transparency, ownership, and openness. We are ambitious and focused, yet humble, warm, and empathetic.
Role :
- Site Reliability Engineers (SREs) are responsible for ensuring that our systems are healthy, monitored, automated, and designed to scale.
- As the manager of this team, you'll use your technical expertise to handle our growing infrastructure, make it reliable and scalable, and work closely with our development teams from the early stages of design all the way through to identifying and resolving production issues. You will also build and grow the SRE team that carries out these responsibilities.
Responsibilities :
- Build and lead a team of SREs that keeps production applications stable and reliable.
- Be directly responsible for uptime.
- Manage on-call rotation across the SRE team.
- Own end-to-end availability and performance of key services and build automation to prevent problem recurrence.
- Automate the current manual infrastructure management and alert-handling processes using Kubernetes, Terraform, CI/CD pipelines, etc.
- Assist in the roll-out and deployment of new product features and installations.
- Find scalability bottlenecks and areas for performance improvements.
- Work closely with technical leads to ensure that platforms are designed with scale and operability in mind.
- Help the SREs on your team grow and develop their careers through mentorship and performance management.
Requirements :
- 5+ years of technical experience in Site Reliability Engineering.
- 3+ years of experience as a people manager in an Engineering or Operations capacity.
- Strong Linux administration skills with an emphasis on shell scripting.
- Expertise with the AWS and GCP platforms.
- Expertise with Terraform, Docker, Kubernetes (or other orchestration tools), and Jenkins.
- Experience with infrastructure monitoring platforms (Datadog, Prometheus) and application performance management (APM) systems (New Relic).
- Experience with configuration management tools (Puppet, Chef, Ansible).
- Experience with CI/CD pipeline configuration, deployment, and support.
- Experience making hiring decisions for SRE/DevOps teams.
- Hands-on technical experience supporting multi-tenant applications is a plus.