Job Description
The SRE team of Huawei Mobile Services at the Bangalore research centre in India is responsible for building a large-scale container management platform that provides resource scheduling, monitoring, alerting, job management, infrastructure scaling, fault detection/analysis/recovery, and chaos engineering in an intelligent and autonomous way. SREs will use the platform to handle reliability and user experience engineering for 120+ services across 170+ countries.
We are looking for an expert to take charge of building this industry-leading platform from ideation to implementation and help develop the Huawei consumer cloud service ecosystem as an industry benchmark.
Why Huawei India, Bangalore Research Center?
You will be joining a company with a very strong brand in the mobile internet space, and you will be doing exciting work: building and running (rather than just using) hybrid cloud infrastructure and cloud-native services. You will be joining a nascent SRE organization that has a lot to learn and is eager to put its SRE practices to the test. This is a fast-paced, demanding environment where dedication is expected and rewarded; it is not a 9-to-5 job.
Here at Huawei BRC, we have a very impressive multicultural, inclusive work environment on our own sprawling campus. We are always open to suggestions on how you can do your best work, so please don’t hesitate to reach out to us.
Roles and Responsibilities
- Develop strategy and market analysis for planning future technologies that enhance our existing platform and move it towards intelligent governance.
- Ideate, research, design, and develop an intelligent and autonomous management platform for containerized microservices with a million+ endpoints and industry-leading performance.
- Plan and set up research cooperation projects with top universities and research institutes across the world.
Desired Candidate Profile
- Have more than 10 years of work experience as an innovation or technology architect in the cloud-native domain.
- Have experience in planning and designing the architecture of a large-scale ops platform/tools, with distributed data capacity in petabytes.
- Have implemented chaos engineering capabilities and Infrastructure as Code solutions.
- Have been actively involved in open source community projects as a major participant and have led the design or development of core modules of those projects.
- Have published patents or articles in well-known industry journals/websites and given presentations/speeches at SRE/DevOps-related conferences.
Have hands-on experience in the product engineering lifecycle as an individual contributor with:
- Platforms: Enterprise clouds like AWS, GCP, Azure
- Service stacks: LAMP, LEMP
- Programming: Golang, Python, Java
- Scripting: Shell, Terraform, Ansible
- Networks: Load balancers, firewalls, DNS, CDN, Varnish
- Monitoring tools: Zabbix, Grafana, ELK, Prometheus, Thanos, Telegraf, Datadog
- CI/CD pipeline: Jenkins, GitLab, Spinnaker
- Container products: Docker, Kubernetes, Rancher, Mesos, Nomad, OpenShift
- Automation platforms: StackStorm, SaltStack
- Middleware systems: Redis, Cassandra, etcd, ZooKeeper, HDFS
- Message queues/logging: Kafka, Flume, Fluentd