Job Summary:
The Data Scientist will be a key member of the data team, responsible for unlocking actionable insights from Centenary Bank’s vast data assets. This role will bridge the gap between our Data Lake infrastructure and the bank's strategic business units. You will collaborate with stakeholders across Marketing, Customer Experience, Risk, and Operations to understand their challenges and apply advanced analytical and machine learning techniques to provide data-driven solutions that enhance efficiency, manage risk, and improve customer satisfaction.
Key Responsibilities:
- Stakeholder Collaboration: Partner with business leaders to understand their objectives and translate them into data-led projects, defining key metrics and analytical requirements.
- Business Intelligence & Reporting: Develop, deploy, and maintain role-based dashboards and reports that provide visibility into technology, business, and customer performance.
- Advanced Analytics: Analyze large, complex datasets to identify trends, patterns, and root causes of business issues, such as transaction failures or process bottlenecks.
- Predictive Modelling: Design, train, and deploy machine learning models to support predictive use cases (a brief illustrative sketch follows this list), including:
  - Forecasting customer behaviour, including churn and product uptake.
  - Predicting operational needs, such as teller workloads and cash requirements.
  - Anticipating technology risks, such as infrastructure capacity exhaustion.
- Customer Experience Insight: Analyze customer journey data from digital channels to identify pain points and provide recommendations for improving customer retention and engagement.
- Risk Analytics: Develop dynamic risk registers by tracking key risk indicators, such as sustained declines in transaction volumes or unresolved technical issues on critical applications.
- Communication: Clearly communicate complex findings and the value of data-driven insights to both technical and non-technical audiences.
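To illustrate the churn-forecasting use case under Predictive Modelling above, here is a minimal Python sketch; the input file, feature columns, and label are hypothetical placeholders rather than actual Centenary Bank data, and a gradient-boosted classifier is just one reasonable choice of model.

# Minimal churn-classification sketch (Pandas + scikit-learn).
# "customers.csv", the feature columns, and the "churned" label are
# hypothetical placeholders used for illustration only.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("customers.csv")
features = ["tenure_months", "monthly_txn_count", "avg_balance", "digital_logins_30d"]
X, y = df[features], df["churned"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = GradientBoostingClassifier()
model.fit(X_train, y_train)

# AUC is a sensible headline metric given the class imbalance typical of churn data.
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"Held-out AUC: {auc:.3f}")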
Required Skills & Qualifications:
- Bachelor’s or Master’s degree in Data Science, Computer Science, Statistics, Economics, or a related quantitative field.
- Proven experience as a Data Scientist or Data Analyst, preferably within the financial services sector.
- Strong proficiency in data analysis and programming languages such as Python (Pandas, Scikit-learn, TensorFlow) and/or R.
- Expert-level knowledge of SQL and experience working with large relational and non-relational databases.
- Hands-on experience with BI and data visualization tools (e.g., Power BI, Tableau).
- Demonstrated experience in developing and deploying machine learning models to solve real-world business problems.
- Exceptional problem-solving skills and business acumen, with an ability to connect data insights to business impact.
- Excellent communication and interpersonal skills, with a proven ability to collaborate effectively with diverse teams.
This profile outlines the skills, expertise, and experience required for a Data Lake Implementation Specialist responsible for guiding the setup or integration of on-premises and cloud data lakes to enable real-time analytics and AI in medium to large digital businesses. Experience in Apache Doris is an added advantage.
Core Skills & Expertise
Data Lake Architecture (Hybrid & Multi-Cloud)
- Designing modern data lakehouses with raw + curated layers, unified batch + streaming ingestion
- Integration with enterprise systems and support for schema-on-read
- Familiarity with lakehouse tools: Delta Lake, Apache Iceberg, Hudi
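By way of illustration of the raw and curated layering listed above, a minimal PySpark sketch follows; the storage paths, columns, and the choice of Delta Lake as the table format are assumptions, and it presumes a cluster with the delta-spark package configured.

# Sketch: promote data from a raw zone to a curated Delta table.
# Paths and columns are hypothetical; assumes delta-spark is installed
# and available on the cluster.
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder.appName("raw-to-curated")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# Raw zone: land source files as-is and rely on schema-on-read.
raw = spark.read.json("s3a://lake/raw/transactions/")

# Curated zone: enforce types, deduplicate, and partition before persisting.
curated = (
    raw.withColumn("txn_ts", F.to_timestamp("txn_ts"))
       .withColumn("txn_date", F.to_date("txn_ts"))
       .dropDuplicates(["txn_id"])
)
(curated.write.format("delta")
        .mode("overwrite")
        .partitionBy("txn_date")
        .save("s3a://lake/curated/transactions/"))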
Real-Time Data Processing
- Expertise with streaming architectures: Apache Kafka, Flink, Spark Streaming
- Experience with event-driven design, CDC, and real-time ETL tools
- Delivered at least one large-scale Doris-based or comparable OLAP system in production.
- Tools: Debezium, StreamSets, Apache NiFi
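As a rough sketch of the streaming ingestion described above, the following uses Spark Structured Streaming to consume CDC-style events from Kafka; the broker address, topic, schema, and sink paths are hypothetical, and the spark-sql-kafka connector is assumed to be on the classpath.

# Sketch: stream CDC-style events from Kafka into the lake's raw zone.
# Broker, topic, schema, and paths are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("cdc-stream").getOrCreate()

event_schema = StructType([
    StructField("txn_id", StringType()),
    StructField("account_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("status", StringType()),
])

stream = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "core-banking.transactions")
    .load()
)

# Kafka delivers the payload as bytes; parse the JSON value into typed columns.
events = (
    stream.select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
          .select("e.*")
)

# Checkpointing lets the job restart and resume from its last committed offsets.
query = (
    events.writeStream.format("parquet")
    .option("path", "s3a://lake/raw/transactions_stream/")
    .option("checkpointLocation", "s3a://lake/_checkpoints/transactions_stream/")
    .start()
)
query.awaitTermination()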
Cloud & On-Prem Data Services
- Cloud: AWS (S3, Glue, EMR, Kinesis), Azure (ADLS Gen2, Synapse), GCP (BigLake, Dataflow)
- On-prem: Hadoop, Cloudera, MapR, private cloud environments
AI/ML Enablement
Data Preparation for AI/ML
- Building pipelines for feature extraction and versioning datasets
- Integration with feature stores and data quality enforcement
ML Ops Readiness
- Integration with ML pipelines (Kubeflow, MLflow, SageMaker)
- Model deployment, tuning, and monitoring at scale
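As one small illustration of the experiment-tracking side of the MLOps work mentioned above, here is a Python sketch using MLflow; the tracking URI and experiment name are placeholders, and the toy dataset stands in for real training data.

# Sketch: log parameters, metrics, and a fitted model to an MLflow tracking server.
# The tracking URI and experiment name are hypothetical placeholders.
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

mlflow.set_tracking_uri("http://mlflow.internal:5000")
mlflow.set_experiment("churn-model")

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

with mlflow.start_run():
    model = LogisticRegression(C=0.5, max_iter=500)
    model.fit(X_train, y_train)

    # Everything logged here becomes reviewable and reproducible from the MLflow UI.
    mlflow.log_param("C", 0.5)
    mlflow.log_metric("accuracy", accuracy_score(y_test, model.predict(X_test)))
    mlflow.sklearn.log_model(model, artifact_path="model")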
Analytics & BI Integration
- Support for BI tools (Power BI, Tableau) and fast querying layers (Presto, Trino)
- Near real-time dashboard enablement
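To give a flavour of the fast querying layer mentioned above, a minimal sketch using the Trino Python client; the host, user, catalog, schema, and table names are hypothetical.

# Sketch: run an ad-hoc aggregation against the lakehouse through Trino.
# Host, user, catalog, schema, and table names are placeholders.
import trino

conn = trino.dbapi.connect(
    host="trino.internal",
    port=8080,
    user="analyst",
    catalog="lakehouse",
    schema="curated",
)
cur = conn.cursor()
cur.execute(
    "SELECT status, count(*) AS txn_count "
    "FROM transactions "
    "GROUP BY status "
    "ORDER BY txn_count DESC"
)
for status, txn_count in cur.fetchall():
    print(status, txn_count)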
Governance, Observability, and Security
Enterprise Data Governance
- Implementing data ownership, lineage, and access policies
- Use of catalogs: Collibra, Apache Atlas, AWS Glue Catalog
Observability & Monitoring
- End-to-end pipeline visibility, logs, and metrics
- Tools: Prometheus, Grafana, OpenTelemetry, Monte Carlo
Security & Compliance
- Encryption, tokenization, and data masking
- Adhering to regulations: GDPR, HIPAA, SOC2
Execution Experience
Large-Scale Implementations
- Hands-on delivery of hybrid data lake architectures
- Experience with syncing on-prem and cloud data systems
Cross-Functional Leadership
- Working with data scientists, product teams, and security teams
- Leading data platform teams or Centers of Excellence
Agility at Scale
- Agile delivery models for data initiatives
- Delivering data products and ML capabilities incrementally
Ideal Profile Summary
A hands-on yet strategic data lake architect/engineer with deep knowledge of hybrid and multi-cloud systems, proven experience with streaming data and ML enablement, and the leadership to orchestrate teams around real-time analytics and decision intelligence at digital-enterprise scale.
Bonus: Certifications & Tools
Certifications
- AWS/GCP/Azure Data Engineer or ML Engineer
- Databricks Lakehouse Accreditation
- CDMP (DAMA) or equivalent data management certification
Tools Stack
- Airflow, dbt, Spark, Flink, Kafka
- Terraform, GitOps, CI/CD
- MLflow, Feature Store, SageMaker, Vertex AI
- Apache Ranger, Atlas, Lake Formation