Data Engineer
Mergen Partners Private Limited
Are You The One?
As a Technical Lead Engineer - Data, you will architect, implement, and scale an end-to-end data platform built on AWS S3, Glue, Lake Formation, and DMS. You will lead a small team of engineers while collaborating cross-functionally with stakeholders from fraud, finance, product, and engineering to ensure reliable, timely, and secure data access across the business.
You will champion best practices in data design, governance, and observability, leveraging Generative AI tools to enhance engineering productivity and accelerate time to insight.
You Will Contribute To
- Owning the design and scalability of the data lake architecture for streaming and batch workloads, leveraging AWS-native services.
- Leading the development of ingestion, transformation, and storage pipelines using AWS Glue, DMS, Kinesis/Kafka, and PySpark.
- Structuring and evolving data into open table formats (Apache Iceberg, Delta Lake) to support real-time and time-travel queries for downstream services.
- Driving data productization, enabling API-first and self-service access to curated datasets for fraud detection, reconciliation, and reporting use cases.
- Defining and tracking SLAs and SLOs for critical data pipelines, ensuring high availability and data accuracy in a regulated fintech environment.
- Collaborating with InfoSec, SRE, and Data Governance teams to enforce data security, lineage tracking, access control, and compliance (GDPR, MAS TRM).
- Using Generative AI tools to enhance developer productivity, including auto-generating test harnesses, schema documentation, transformation scaffolds, and performance insights.
- Mentoring data engineers, setting technical direction, and ensuring delivery of high-quality, observable data pipelines.
Responsibilities
- Architect scalable, cost-optimized pipelines across real-time and batch paradigms using tools like AWS Glue, Step Functions, Airflow, or EMR.
- Manage ingestion from transactional sources using AWS DMS, focusing on schema drift handling and low-latency replication.
- Design efficient partitioning, compression, and metadata strategies for Iceberg or Hudi tables stored in S3, cataloged with Glue and Lake Formation.
- Build data marts, audit views, and analytics layers to support machine-driven processes (e.g., fraud engines) and human-readable interfaces (e.g., dashboards).
- Ensure robust data observability with metrics, alerting, data quality checks, and lineage tracking using tools such as Great Expectations and OpenLineage.
- Lead quarterly reviews of data cost, performance, schema evolution, and architecture design with stakeholders and senior leadership.
- Enforce version control, CI/CD, and infrastructure-as-code practices using GitOps and tools like Terraform.
Requirements
- At least 7 years of experience in data engineering.
- Deep hands-on experience with the AWS data stack: Glue (Jobs & Crawlers), S3, Athena, Lake Formation, DMS, and Redshift Spectrum.
- Expertise in designing data pipelines for real-time, streaming, and batch systems, including schema design, format optimization, and SLAs.
- Strong programming skills in Python (PySpark) and advanced SQL for analytical processing and transformation.
- Proven experience managing data architectures using open table formats (Iceberg, Delta Lake, Hudi) at scale.
- Understanding of stream processing with Kinesis/Kafka and orchestration via Airflow or Step Functions.
- Experience implementing data access controls, encryption policies, and compliance workflows in regulated environments.
- Ability to integrate GenAI tools into data engineering processes to drive measurable productivity and quality gains while maintaining strong engineering hygiene.
- Demonstrated ability to lead teams, drive architectural decisions, and collaborate with cross-functional stakeholders.
Brownie Points
- Experience working in a PCI DSS or central-bank-regulated environment with audit logging and data retention requirements.
- Experience in the payments or banking domain, with use cases around reconciliation, chargeback analysis, or fraud detection.
- Familiarity with data contracts, data mesh patterns, and data-as-a-product principles.
- Experience using GenAI to automate data documentation, generate data tests, or support reconciliation use cases.
- Exposure to performance tuning and cost optimization strategies in AWS Glue, Athena, and S3.