Busha is one of Africa’s leading digital asset platforms. We are on a mission to onboard millions of Africans into the crypto economy, and we are building software and services that will enable our users to experience the blockchain-enabled future of finance.
Our customers are at the center of everything we do, and we are obsessed with creating a pleasa...
Act as Incident Commander during high-severity incidents affecting payments, trading, or compliance systems: coordinate the cross-functional response, provide clear status updates, and drive post-mortems.
Design and implement observability strategies using Grafana, Sentry, and CloudWatch. Instrument Go services to expose high-cardinality metrics and distributed traces. Collaboratively define, measure, and defend Service Level Objectives (SLOs) and Error Budgets with product and engineering teams.
Write production-ready code to build internal tooling, automation platforms, and self-healing mechanisms that eliminate manual operator intervention. Contribute reliability patterns (circuit breakers, retries, backpressure) directly to backend services.
Partner with backend engineering teams during the design phase to ensure new services are built with reliability, scalability, and observability patterns from day one.
Analyze system performance and traffic patterns to model future capacity needs. Conduct load testing and chaos engineering experiments to verify system resilience under failure conditions, particularly for financial transactions and compliance workflows.
Must Have
At least 4 years of experience in SRE or backend engineering, with strong proficiency in Go. You can read, write, and review production Go code, not just deploy it.
Deep understanding of distributed systems architecture and design patterns. Strong command of microservices fundamentals, event-driven architectures, and the underlying principles required to build systems that scale.
Hands-on experience with AWS (ECS, RDS, CloudWatch, Lambda) or GCP, and infrastructure as code. Proficiency in running production workloads and troubleshooting infrastructure issues.
Experience designing and implementing observability strategies using Prometheus, Grafana, OpenTelemetry, or similar tools, and the ability to instrument code for effective monitoring and alerting.
Familiarity with operating and tuning production data stores (PostgreSQL, ClickHouse) and streaming platforms (RabbitMQ, Kafka) in high-throughput environments.
Nice to Have
Fintech bonus: Understanding of financial systems' reliability requirements, payment processing resilience patterns, or experience with compliance and regulatory requirements.
Go bonus: Proficiency in Go is a significant advantage. Our backend services are written in Go, and the ability to read, write, and contribute reliability patterns directly to production Go code will enable deeper collaboration with engineering teams and faster impact on system resilience.