Data Engineer

World Bank Group

Location:
Chennai, India
Grade:
GF
Category:
Professional Staff
Posted Jun 18, 2026
Apply by Jul 3, 2026

The Data Engineer is responsible for designing, building, and maintaining the data infrastructure that supports the organization's data-driven decision-making processes. This role develops ETL processes, optimizes data retrieval performance, and works with stakeholders to gather and understand data requirements in support of data integration and transformation initiatives.

Responsibilities

  • Design, develop, and maintain data pipelines for ingestion, transformation, and serving across batch and streaming workloads.
  • Build ETL/ELT workflows to integrate data from diverse sources into enterprise data platforms.
  • Develop data transformation logic using Apache Spark, PySpark, SparkSQL, and SQL.
  • Implement change data capture (CDC) patterns for real-time and near-real-time data synchronization.
  • Build streaming data pipelines for real-time analytics and operational use cases.
  • Optimize pipeline performance, resource utilization, and cost efficiency.
  • Support federated data pipeline architecture that enables Line of Business (LOB) teams to own and manage their domain data.
  • Contribute to self-serve data infrastructure that abstracts complexity and allows domain teams to build pipelines independently.
  • Develop standardized pipeline deployment patterns that LOB teams can adopt while maintaining autonomy.
  • Support domain teams in building data products that are discoverable, interoperable, and compliant with enterprise standards.
  • Enable distributed data processing across domains while ensuring consistency through federated governance.
  • Assist in establishing data contracts and interoperability standards that allow seamless data sharing across domains.
  • Help balance domain autonomy with enterprise-wide governance requirements.
  • Develop reusable pipeline templates and Infrastructure as Code (IaC) patterns for common data product types.
  • Create blueprints for data ingestion, transformation, quality validation, and serving that LOB teams can customize.
  • Build standardized patterns for batch pipelines, streaming pipelines, CDC implementations, and API-based integrations.
  • Contribute to a pattern library covering medallion architecture, dimensional modeling, and data product packaging.
  • Document best practices and reference architectures that guide LOB teams in building compliant, high-quality pipelines.
  • Develop starter kits and accelerators that reduce time-to-value for domain teams building new data products.
  • Create cookbooks and implementation guides that translate enterprise standards into actionable steps.
  • Support LOB teams in adopting templates while allowing appropriate customization for domain-specific needs.
  • Integrate data from multiple internal and external sources into unified data assets.
  • Build reusable data integration patterns and connectors for enterprise data sources.
  • Implement data ingestion using Auto Loader, COPY INTO, and other ingestion frameworks.
  • Develop API-based data integrations and file-based data processing workflows.
  • Ensure data consistency and reliability across integrated sources.
  • Support data migration efforts and legacy system integrations.
  • Implement medallion architecture patterns (bronze, silver, gold) for data organization and quality progression.
  • Develop dimensional models, fact tables, and aggregations for analytics use cases.
  • Build data transformation logic that ensures accuracy, consistency, and business alignment.
  • Create reusable transformation components and modular pipeline designs.
  • Optimize data models for query performance and consumption patterns.
  • Support schema evolution and data versioning requirements.
  • Implement data quality checks, validation rules, and automated testing within pipelines.
  • Develop data profiling and anomaly detection to identify quality issues.
  • Build data reconciliation processes to ensure accuracy across systems.
  • Implement unit testing, integration testing, and regression testing for pipelines.
  • Monitor data quality metrics and remediate issues proactively.
  • Document data quality rules and thresholds for pipeline outputs.
  • Implement logging, monitoring, and alerting for pipeline health and performance.
  • Build dashboards to track pipeline execution, data freshness, and quality metrics.
  • Develop automated error handling, retry logic, and failure notifications.
  • Support incident response and troubleshooting for pipeline failures.
  • Implement data lineage tracking to support auditability and impact analysis.
  • Ensure pipelines meet SLAs for data availability and freshness.
  • Build data pipelines that enable analytics, reporting, and business intelligence use cases.
  • Prepare and serve data for machine learning and AI workloads.
  • Develop feature engineering pipelines for ML model development.
  • Create semantic layers and curated datasets that enable self-service analytics.
  • Support integration with analytics tools including Power BI and Tableau.
  • Build data products with clear documentation and consumption guidance.
  • Partner with data architects to align pipeline development with architectural standards.
  • Collaborate with business analysts and data scientists to understand data requirements.
  • Work with platform engineers to leverage platform capabilities effectively.
  • Contribute to technical documentation, runbooks, and knowledge sharing.
  • Support data consumers in understanding and accessing data assets.
  • Participate in code reviews and follow engineering best practices.
  • Support data engineering delivery with contractor and consultant teams under guidance from senior team members.
  • Contribute to knowledge-sharing sessions and workshops to build data engineering capability across LOB teams.
  • Document best practices, lessons learned, and technical standards for data engineering.
  • Stay current with industry trends in data mesh, federated architectures, and cloud data services.
  • Share insights and learnings with the broader team to foster continuous improvement.
  • Assist in evaluating emerging data engineering technologies, frameworks, and tools.
  • Identify opportunities to enhance pipeline performance, reliability, and cost efficiency.
  • Contribute to the evolution of best practices and standards for data engineering.
  • Propose automation opportunities to reduce manual effort and improve consistency.
  • Other duties as assigned.

Requirements

  • Typically requires a master's degree with 5 years of experience, a bachelor's degree with a minimum of 7 years of relevant experience, or an equivalent combination of education and experience.
  • Demonstrated expertise in Data Engineering, including the design, development, and optimization of scalable data pipelines, data platforms, and data processing solutions.
  • Strong knowledge of data modeling, data structures and algorithms, and data integration techniques to support efficient and reliable data management.
  • Advanced experience designing and implementing modern data lake architectures and leveraging Databricks to build and maintain data engineering solutions.
  • Proven experience applying DevOps principles and practices, including automation, deployment, monitoring, and continuous improvement of data products and platforms.
  • Strong understanding of workflow management and orchestration tools to support complex data processing and integration workflows.
  • Experience managing and supporting the Product Development Life Cycle (PDLC), from requirements gathering and solution design through deployment and operational support.
  • Demonstrated ability to leverage business intelligence concepts and tools to deliver actionable insights and support data-driven decision-making.
  • Strong business acumen with the ability to understand organizational priorities and translate business requirements into effective technical solutions.
  • Experience working within Agile environments, including the Scaled Agile Framework (SAFe), and collaborating effectively across cross-functional teams.
  • Excellent stakeholder management, communication, and influencing skills, with the ability to build consensus and drive outcomes across technical and non-technical audiences.
  • Recommended Certifications: SAFe Product Owner/Product Manager (PO/PM) certification or other relevant Agile certifications.
  • Industry-recognized certifications in Data Engineering, Data Analytics, Platform Architecture, Data Integration, Cloud Technologies, or related disciplines.

Skills

  • Data Engineering
  • Data Pipeline Development
  • Data Platform Optimization
  • Data Modeling
  • Data Structures
  • Algorithms
  • Data Integration
  • Data Lake Architecture
  • Databricks
  • DevOps Practices
  • Automation
  • Deployment
  • Monitoring
  • Workflow Management
  • Orchestration Tools
  • Product Development Lifecycle
  • Business Intelligence Reporting
  • Agile Methodologies
  • Scaled Agile Framework
  • Stakeholder Management
  • ETL

Languages

English