Job Description
ZestMoney is creating the world's most loved financial brand. We are India's largest and fastest-growing Buy Now, Pay Later (BNPL) platform. We built ZestMoney to change the way credit is delivered in the country, making life more convenient and affordable for millions of people, whilst helping retailers drive better-quality growth and margins. We do this via a platform built on best-in-class, AI-driven technology that embeds credit in payments. We are on a mission to make life more affordable for India using technology-led solutions.
We also have the largest network of merchants, with 10,000+ online partners and 75,000+ store partners, making the company a market leader in the space. ZestMoney has built a platform that integrates mobile technology, digital banking and artificial intelligence, enabling people to apply for and get a digital credit line within seconds. The company partners with 25 banks and NBFCs to accelerate the adoption of BNPL in India.
The company's vision is to be the 'one-stop financial' brand of choice for India's 300 mn+ households over the next decade, providing small-ticket loans, insurance, savings and other financial products with the most seamless digital user experience. We are backed by the world's leading fintech investors, such as Goldman Sachs, Quona Capital, Zip Co, Naspers, Ribbit Capital and Omidyar Network, among others. We want to become the world's most loved and trusted financial brand. Come join us on our journey.
About the team
The SRE team is responsible for availability, latency, performance, efficiency, change management, monitoring, emergency response and capacity planning. The team maintains more than a hundred microservices distributed across multiple EKS and ECS clusters, EC2 instances and Lambda functions. It works in a fast-paced environment, handling cloud operations and incidents and supporting more than 20 development teams.
Our stack:
You would be working in an environment with the following technologies: AWS, EKS, ECS, EC2, Docker, Bitbucket, Jenkins, RDS, MySQL, PostgreSQL, DynamoDB, MongoDB, New Relic, Sumo Logic, Java, Go, Node.js, etc.
Key skills required
- At least 3 years of experience in SRE and/or DevOps
- Strong technical skills, with at least 3 years of experience in the following areas: cloud infrastructure (AWS) and CI/CD pipelines (e.g. Jenkins, Bitbucket, GitLab, CircleCI or similar)
- Hands-on experience in monitoring, incident management and emergency response
- Ability to contribute to automating manual work and improving existing tooling
- Willingness to participate in the team's on-call rotation
- At least 2 years of hands-on experience with Kubernetes and its ecosystem
- Experience with infrastructure-as-code tools such as Terraform or CloudFormation
- Track record of debugging and problem-solving in applications running on a microservice architecture
- Good knowledge of cloud network design and security
- Good understanding of configuration and secret management
- Good communication skills, both within the team and with stakeholders
Additional skills:
- Any experience with API gateways (such as Kong) or service meshes (Istio, Linkerd, Consul) would be considered a plus
- Any experience with authentication/authorization providers (using the OAuth 2.0 and OpenID Connect protocols) would be considered a plus
- Experience with any observability tooling, such as Grafana, Prometheus, Datadog, OpenTelemetry, Loki or ELK, would be considered a plus
- Any software development experience with Go, Java, Node.js or Python would be a plus
- An understanding of cloud security best practices would be a plus