In this role, you’ll get to:

  • Be the champion for developing and managing performance and availability of
    software systems and infrastructure for enterprise cloud solutions and internal
    development operations
  • Provide emergency response, either by being on call or by reacting to symptoms
    surfaced by monitoring and escalation, when needed
  • Propose ideas and solutions to reduce workloads through automation
  • Plan and execute configuration change operations both at the application and the
    infrastructure levels
  • Actively look for opportunities to improve the availability and performance of the
    system by applying lessons learned from monitoring and observation

What we appreciate about your background:

General knowledge of five of the following technical areas, with deep knowledge in two:

  • Chef (basic syntax, recipes, cookbooks) and Ansible (basic syntax, tasks,
    playbooks)
  • Terraform basic syntax and GitHub CI/CD configuration, pipelines, jobs
  • Cloud resource provisioning and configuration through CLI/API
  • Cloud services expertise across AWS, Azure, GCP
  • Kubernetes basic understanding, CLI, service re-provisioning
  • Provisioning and setting up metrics in Prometheus, Thanos, and Grafana, including alerts and silences
  • Provisioning and setting up logs and queries for general questions
  • Operating system (Linux) configuration, package management, startup and
    troubleshooting
  • Block and object storage configuration
  • Networking: VPCs, proxies, and CDNs
  • Experience with scripting (Bash, shell, Python)
