Description
Our roles are remote-first and can be based anywhere in India (#LI-Remote).
Responsibilities
- Monitor and continually improve the capacity of our production environment
- Design and implement scalable, reliable, and efficient infrastructure using Kubernetes, Terraform, and AWS resources
- Partner with development teams to improve services through rigorous testing and release procedures with CI pipelines (GitHub Actions, Dockerfiles)
- Gain a deeper understanding of RudderStack infrastructure and help debug incidents
- Proactively build software to help operations and support teams
- Identify opportunities for process improvements, automation, and cost savings
Requirements
- A Bachelor's or Master's degree in Computer Science, or equivalent experience, is required
- 5+ years of experience as a Site Reliability Engineer, Internal Platform Developer, or in a similar role
- Strong understanding of cloud computing, containers, and DevOps practices
- Demonstrated Linux experience
- Excellent debugging skills
- Experience with scripting and infrastructure automation
- Familiarity with distributed systems design patterns using tools such as Kubernetes
- Familiarity with AWS, Azure, or Google Cloud
- Excellent verbal and written communication skills
- Familiarity with networking concepts such as VPCs, proxies, and CDNs
Here are examples of things we've worked on:
- Build and maintain a Kubernetes platform to deploy all our applications with high availability
- Build a Kubernetes operator to automate hundreds of deployments
- Manage hundreds of PostgreSQL instances with high availability (HA) for our deployments
- Provision and manage air-gapped on-premises deployments in diverse environments
- Manage a multi-region, multi-cluster environment with hundreds of customer deployments in single-tenant and multi-tenant models
- Define all infrastructure as code, enforced through a GitOps model
- Automate migrations of complex, highly available services
- Work on compliance (e.g. SOC 2 Type 2, HIPAA), security, scalability, and more to deliver top-class, secure software
- Follow FinOps practices and continuously optimize our cloud costs
How we achieve results:
- Empathy for the problems encountered by our customers.
- Collaboration with engineering teams to achieve results.
- Deep care for the quality of your own and the team's code.
- Curiosity and understanding for investigating causes and finding effective solutions.
- An output-driven approach that provides value to our customers in a significant, measurable, and positive way.
- Focus on writing testable, performant, bug-free code to provide the right solutions to the problems.
Please mention the word **SHARPEST** and tag RNTQuMTg2LjgyLjI0Mw== when applying to show you read the job post completely (#RNTQuMTg2LjgyLjI0Mw==). This is a beta feature to avoid spam applicants. Companies can search these words to find applicants that read this and see they're human.