(COMPANY NAME) is the leading SaaS core banking engine. If you're a customer of the largest digital bank in the EU, then you've probably interacted with our platform and didn't even know it. We are at the heart of what makes digital banks and lenders work - the system that processes banking transactions and updates accounts and other financial records from deposits to loans and credit balances. But (COMPANY NAME) is different. We are not just cloud-native, lean and flexible - we are helping to revolutionise financial services globally. We are in a growth phase and we've only just begun.
To help us on our mission, we bring together people with the best skills and attitude. It doesn't matter where you are from; what matters is the impact you have and your passion to make a difference.
We are looking for a passionate, skilled and enthusiastic Service Reliability Engineer - Observability to join our team. As a Monitoring Engineer, you will build, operate and improve the monitoring of (COMPANY NAME) core banking services across all product engineering tribes, and enable engineering teams by improving the observability of their services.
* Enable observability and operability at (COMPANY NAME) by:
  * Improving all aspects of monitoring of the (COMPANY NAME) Cloud Platform and all supporting services;
  * Defining best practices for engineering teams and guiding them to get deep insights into their applications in production;
  * Ensuring that dashboards and information radiators provide the right level of information to the right people in the organisation;
  * Making events traceable and introducing improvements to help on-call engineers analyse (COMPANY NAME)'s distributed system.
* Continuously improve metrics of (COMPANY NAME) services by:
  * Operating the infrastructure and tools required to work with metrics of (COMPANY NAME) core banking services;
  * Improving standards for gathering and processing metrics;
  * Ensuring that development teams can produce custom metrics;
  * Providing various reports and aggregations based on engineering or business needs;
  * Monitoring SLA performance of (COMPANY NAME) APIs.
* Ensure log processing and analysis by:
  * Operating the infrastructure and tools required to work with logs produced by (COMPANY NAME) core banking services;
  * Implementing ways to process these logs and providing insights to development teams;
  * Improving log retention and processing strategies.
* Advocate correct alerting for engineering teams by:
  * Providing developers with tooling and guidance to define alerts based on various needs;
  * Monitoring, reporting and alerting on SLOs;
  * Improving anomaly detection based on past performance of applications;
  * Predicting capacity problems and reducing alert fatigue.
* Satisfy compliance requirements by:
  * Ensuring that (COMPANY NAME) monitoring systems don't hold any personally identifiable information;
  * Together with security and compliance, conducting regular reviews of the systems.
You need to have:
* Solid knowledge of public cloud services with a focus on services monitoring;
* Understanding of cloud native applications and distributed systems;
* Software development and testing skills (Go, Java, Python, etc.);
* Experience with monitoring applications on Kubernetes;
* Good understanding of distributed tracing;
* Experience with application performance monitoring tools;
* Experience with on-call rotation and incident handling;
* Experience monitoring applications at worldwide scale;
* Strong communication, organisational and problem-solving skills.
(COMPANY NAME) has over 250 live deployments, helping to revolutionise financial services in more than 46 countries globally, and we're just getting started.
We understand that nothing ensures our customers' success more than a happy team, so (COMPANY NAME) is built on a culture of trust and a sense of ownership in everything we do.