Lead DevOps Engineer - SRE

HYLA Mobile

Software Engineering
India · Remote
Posted on Thursday, March 14, 2024

Lead DevOps Engineer - SRE, GCC, India

Our Lead DevOps Engineer - Service Reliability (SRE) will work with key organizational leaders and product owners to identify opportunities and drive the technical vision of this SRE team. They will be responsible for the team's technical roadmap and strategy, with a strong focus on driving operational excellence and improving the performance and efficiency of both our application workloads and our infrastructure. These long-term strategies will have a significant impact across all products and platforms at Assurant. The ideal candidate has broad and deep technical knowledge and experience in improving application performance, benchmarking capacity, improving availability and reliability, designing and evolving cloud and infrastructure architecture, and applying engineering solutions to operational problems.

This is a remote position based at our India location.

What will be my duties and responsibilities in this job?

  • Serve as a thought leader in the field of Service Reliability Engineering, Performance Engineering, and Capacity Management.
  • Collaborate with Engineering leadership to build relationships with other teams and understand their goals, enabling you to develop a strategy and roadmap for your team and for products that span multiple teams.
  • Create operational excellence roadmaps, promote the adoption of best practices from design to delivery, and enhance the performance and efficiency of our products and infrastructure.
  • Propose and develop innovative solutions to complex problems related to application resiliency and availability.
  • Lead the design and development of next-generation reliability and performance debugging tools, inspiring others to do the same.
  • Utilize open-source technologies for capacity benchmarking, application profiling, and tuning.
  • Take a data-driven approach to make informed decisions regarding capacity needs, application reliability, and availability.
  • Mentor and provide leadership to engineers, leveraging your deep technical expertise to solve complex technology issues.
  • Foster a culture of sharing best practices and continuous process improvement within and across teams.
  • Possess strong problem-solving skills.

What are the requirements needed for this position?

  • 3+ years of hands-on experience with Datadog and Dynatrace application monitoring (required).
  • Proficiency in designing and architecting new and existing systems, with a focus on architecture, design patterns, reliability, and scaling.
  • Strong understanding of service level objectives (SLOs) and service level indicators (SLIs).
  • Specialization and expertise in at least one programming language such as C# or SQL, as well as experience with APIs.
  • Specialization and expertise in at least one scripting language such as PowerShell or Python (PowerShell preferred).
  • In-depth knowledge of compute resources and hardware profiles, both in on-prem and cloud environments.
  • Expertise in container orchestration (e.g., Kubernetes), container runtimes, and optimization.
  • Familiarity with DevOps concepts such as continuous delivery and infrastructure as code, as well as cloud architecture.
  • Experience with application monitoring and profiling tools (e.g., CPU and memory profiling).
  • Knowledge of computer science data structures and algorithms.

What other skills/experience would be helpful to have?

  • 10+ years of professional SRE experience.
  • 8+ years of experience in architecture and design.
  • 4+ years of experience with a public cloud platform such as Azure or a comparable provider.
  • 6+ years of experience with open-source frameworks, including capacity modeling/benchmarking, application performance benchmarking, and auto-recovery/auto-remediation tools.