  • Posted: Feb 17, 2026
    Deadline: Not specified
  • Moniepoint is a financial technology company digitising Africa’s real economy by building a financial ecosystem for businesses, providing them with all the payment, banking, credit and business management tools they need to succeed.

    Site Reliability Engineer

    Job Summary

    • We are seeking a Site Reliability Engineer (SRE) responsible for ensuring our systems run smoothly and efficiently while engineering solutions to improve visibility, eliminate repetitive tasks, and increase system resilience.
    • The ideal candidate will balance real-time on-call responsibilities with strategic engineering work to achieve sustainable and scalable service reliability.

    Responsibilities

    • Participate in on-call rotations to detect and triage service and reliability issues across all environments. Act as the Incident Commander during major incidents: initiating war room or bridge calls, coordinating cross-functional teams, and providing timely and clear status updates to all stakeholders.
    • Create and maintain meaningful dashboards and alerts. Work with development teams to instrument their code to ensure visibility.
    • Develop automation to eliminate manual and repetitive operational tasks (toil) related to reliability across both applications and infrastructure.
    • Implement and track Service Level Indicators (SLIs) and Service Level Objectives (SLOs) defined by the engineering leadership.
    • Investigate and resolve customer complaints escalated beyond L1 and L2 support, especially those involving performance, reliability, or complex system behavior.

    Requirements

    • Minimum of 3 years of experience supporting enterprise applications as an SRE or in a similar role, with proficiency in writing code in Java, Go, or Python.
    • Good understanding of distributed systems concepts, microservices architecture and software design patterns.
    • Hands-on experience with Kubernetes. You have managed applications on a major cloud provider (GCP, AWS, or Azure), and can troubleshoot common container issues.
    • Experience setting up dashboards in Grafana and using APM tools like Datadog, New Relic, or SigNoz. You have a solid understanding of metrics, logs, and traces.
    • Proficiency in SQL (e.g., PostgreSQL, MySQL). Ability to write complex queries to debug data issues and a basic understanding of database performance.


    Senior Site Reliability Engineer

    Job Summary

    • We are seeking an experienced SRE to engineer the reliability of our highly distributed platform. You will combine deep knowledge of distributed systems with strong coding skills to define SLOs, lead incident response, and build automation and self-healing mechanisms into our systems.
    • You will balance immediate operational stability with long-term strategic engineering to ensure our services scale linearly with our hyper-growth.

    Responsibilities

    • Participate in on-call rotations as the primary technical lead. Act as the Incident Commander during major-severity incidents: initiating war rooms, coordinating cross-functional teams, and providing clear status updates.
    • Instrument code to expose high-cardinality metrics and distributed traces. Collaboratively define, measure, and defend Service Level Objectives (SLOs) and Error Budgets with product owners.
    • Write high-quality, production-ready code (in Java, Go, or Python) to build internal tooling, automation platforms, and self-healing mechanisms that eliminate manual operator intervention.
    • Partner with Product Engineering teams during the design phase to ensure new services are built with reliability, scalability, and observability patterns (circuit breakers, rate limiting, backpressure, fallback strategies) from day one.
    • Analyze system performance and traffic patterns to model future capacity needs. Conduct load testing and chaos engineering experiments to verify system resilience under failure conditions.

    Requirements

    • Minimum of 4 years of experience in SRE or Backend Engineering with a strong ability to write clean, performant, and tested code in Java, Go, Rust, or Python.
    • Deep understanding of distributed systems architecture and design patterns. You possess a strong command of microservices fundamentals, event-driven architectures, and the underlying principles required to build systems that scale.
    • Extensive experience with Google Cloud Platform (GCP) or similar cloud providers (AWS/Azure). You are proficient in running production workloads on Kubernetes (GKE/EKS) and troubleshooting cluster/infrastructure issues.
    • Experience designing observability strategies using OpenTelemetry, Prometheus, New Relic, Datadog, or SigNoz to improve system visibility.
    • Familiarity with operating and tuning production data stores (e.g., PostgreSQL, MySQL) and streaming platforms (e.g., Kafka, RabbitMQ) in a high-throughput environment.

    Method of Application

    Use the link(s) below to apply on company website.

     


  • Send your application
