Description:
Join us as a Site Reliability Engineer
* In this role, you'll support improvements to availability, performance, efficiency, change management, monitoring, security, incident response, and capacity planning for our products and services
* You'll enjoy significant stakeholder interaction, working in collaboration with engineers to ensure a principled approach to delivering change in a safe and secure way
* This is a chance to join an inclusive team with a collaborative ethos and a commitment to innovation and professional development
* You'll need to have the flexibility to support the team by working shifts and weekends on rotation
What you'll do
As our Site Reliability Engineer, you'll contribute to the reliability, monitoring, and operational excellence of cloud-native platforms. You'll work closely with senior engineers to support production systems, implement Site Reliability Engineering (SRE) practices, and ensure services are observable, scalable, and resilient. You'll also participate in the 24/7 support and on-call rotation, gaining experience in incident response and platform operations.
In this role, you'll also help operate AWS-based Kubernetes platforms (EKS) and contribute to monitoring, alerting, and observability implementations using tools like Grafana and Prometheus. You'll also assist in incident management, troubleshooting, and root cause analysis.
In addition, you'll be:
* Participating in on-call rotations and production support activities
* Implementing infrastructure changes using Terraform and GitOps workflows
* Supporting continuous integration and continuous delivery (CI/CD) pipelines and deployment processes using GitLab and Argo CD
* Helping improve system reliability through automation and operational improvements
* Following SRE practices such as runbooks, documentation, and post-incident reviews
* Working with DevOps and engineering teams to improve system performance and stability
* Ensuring solutions align with security, compliance, and operational standards
The skills you'll need
We're looking for an engineer with solid foundational experience in cloud platforms and a keen interest in reliability engineering and production operations. You must have experience working with AWS and Kubernetes (EKS) in a production or pre-production environment, along with familiarity with monitoring and observability tools such as Grafana and Prometheus. To succeed in this role, you should also have a good understanding of CI/CD pipelines and Git-based workflows, with GitLab preferred.
You'll also need:
* Exposure to Terraform or infrastructure-as-code concepts
* A basic understanding of SRE practices and production support models
* Experience troubleshooting application or infrastructure issues
* An awareness of networking and security fundamentals in cloud environments
* A willingness to participate in on-call rotations and incident response
* A strong problem-solving mindset and an eagerness to learn
* Good communication and collaboration skills
Hours: 35
Job Posting Closing Date: 03/06/2026
Ways of Working: Remote First
| Source: | Company website |
| Date: | 28 May 2026 |
| Job postings: | Job |
| Sector: | Banking / Finance |
| Language skills: | English |