Careers | Tookitaki-Site Reliability Engineer-India

In this role, you’ll get to:

Champion the development and management of performance and availability for
software systems and infrastructure, covering enterprise cloud solutions and
internal development operations
Provide emergency response, either while on call or by reacting to symptoms
surfaced by monitoring, and escalate when needed
Propose ideas and solutions to reduce workloads through automation
Plan and execute configuration change operations at both the application and
infrastructure levels
Actively look for opportunities to improve the availability and performance of the
system by applying lessons learned from monitoring and observation

General knowledge of five of the following technical areas, with deep knowledge in two:

Chef (basic syntax, recipes, cookbooks) and Ansible (basic syntax, tasks,
playbooks)
Terraform (basic syntax) and GitHub CI/CD (configuration, pipelines, jobs)
Cloud resource provisioning and configuration through CLI/API
Cloud services expertise across AWS, Azure, GCP
Kubernetes (basic understanding, CLI, service re-provisioning)
Provisioning and setting up metrics, alerts, and silences in Prometheus, Thanos, and Grafana
Provisioning and setting up logs and log queries to answer general questions
Operating system (Linux) configuration, package management, startup, and
troubleshooting
Block and object storage configuration
Networking: VPCs, proxies, and CDNs
Experience with scripting (Bash, shell, Python)