Moniepoint is a financial technology company digitising Africa’s real economy by building a financial ecosystem for businesses, providing them with all the payment, banking, credit and business management tools they need to succeed.
We are seeking an experienced SRE to engineer the reliability of our highly distributed platform. You will combine deep knowledge of distributed systems with strong coding skills to define SLOs, lead incident response, and build automation and self-healing mechanisms into our systems.
You will balance immediate operational stability with long-term strategic engineering to ensure our services scale linearly with our hyper-growth.
Responsibilities
Participate in on-call rotations as the primary technical lead. Act as the Incident Commander during high-severity incidents: initiating war rooms, coordinating cross-functional teams, and providing clear status updates.
Instrument code to expose high-cardinality metrics and distributed traces. Collaboratively define, measure, and defend Service Level Objectives (SLOs) and Error Budgets with product owners.
Write high-quality, production-ready code (in Java, Go, or Python) to build internal tooling, automation platforms, and self-healing mechanisms that eliminate manual operator intervention.
Partner with Product Engineering teams during the design phase to ensure new services are built with reliability, scalability, and observability patterns (circuit breakers, rate limiting, backpressure, fallback strategies) from day one.
Analyze system performance and traffic patterns to model future capacity needs. Conduct load testing and chaos engineering experiments to verify system resilience under failure conditions.
Requirements
A minimum of 4 years of experience in SRE or Backend Engineering, with a strong ability to write clean, performant, and tested code in Java, Go, Rust, or Python.
Deep understanding of distributed systems architecture and design patterns. You possess a strong command of microservices fundamentals, event-driven architectures, and the underlying principles required to build systems that scale.
Extensive experience with Google Cloud Platform (GCP) or similar cloud providers (AWS/Azure). You are proficient in running production workloads on Kubernetes (GKE/EKS) and troubleshooting cluster/infrastructure issues.
Experience designing observability strategies using OpenTelemetry, Prometheus, New Relic, Datadog, or SigNoz to improve system visibility.
Familiarity with operating and tuning production data stores (e.g., PostgreSQL, MySQL) and streaming platforms (e.g., Kafka, RabbitMQ) in a high-throughput environment.