Databricks Engineer (PySpark and Data Lake)

Not Disclosed

Job Description & Details

The data engineering landscape is rapidly shifting toward cloud‑native, scalable pipelines, making expertise in Databricks and PySpark highly sought after. Companies are eager to replace legacy ETL tools with modern, modular solutions to boost performance and reduce costs. This role offers a chance to lead that transformation on a long‑term, high‑impact project.

Job Summary

We are looking for a seasoned Data Engineer to modernize legacy Ab Initio ETL workflows by building robust PySpark pipelines on Databricks. You will design, develop, and optimize end‑to‑end data flows, integrate with Snowflake, ensure data lineage, and support migration, testing, and production operations.

Top 3 Critical Skills

Skill | Why It's Critical | Mastery Level
PySpark | Core engine for building scalable data pipelines on Databricks | Senior
Databricks Platform | Provides the unified analytics environment and orchestration needed for migration | Senior
ETL Migration (Ab Initio → PySpark) | Ensures legacy workloads are accurately refactored and decommissioned | Senior

Interview Preparation

  1. How would you approach migrating an Ab Initio workflow to a PySpark pipeline on Databricks?
    What the interviewer is looking for: Understanding of legacy ETL concepts, mapping to Spark transformations, handling data types, and migration strategy (dual‑run, validation). A dual‑run validation sketch appears after this list.
  2. Explain how you would design a modular, reusable component in PySpark for common data transformations.
    What the interviewer is looking for: Knowledge of functions, UDFs, parameterization, and best practices for code reuse and testing. See the modular‑transformation sketch below.
  3. What techniques would you use to ensure data lineage and traceability in a Databricks environment?
    What the interviewer is looking for: Familiarity with Unity Catalog, Delta Lake metadata, logging, and documentation practices. A Delta history sketch follows the list.
  4. Describe how you would integrate Snowflake as a source and sink within a Databricks pipeline.
    What the interviewer is looking for: Experience with the Snowflake connector, handling authentication, push‑down optimization, and schema management. See the Snowflake round‑trip sketch below.
  5. How do you monitor and troubleshoot performance issues in a near‑real‑time Spark job?
    What the interviewer is looking for: Use of the Spark UI, Ganglia/Datadog metrics, caching strategies, and job profiling. A quick triage sketch closes the examples below.
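
The dual‑run strategy from question 1 can be checked with a short comparison job before the legacy workflow is decommissioned. This is a minimal sketch, assuming both outputs have been landed as tables with matching schemas; the table names are placeholders.

```python
# Dual-run validation sketch: compare legacy (Ab Initio) output with the
# new PySpark output. Table names below are illustrative assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("dual-run-check").getOrCreate()

legacy = spark.table("staging.legacy_output")    # landed from the legacy run
modern = spark.table("staging.pyspark_output")   # produced by the new pipeline

# Row-count parity is the cheapest first check.
if legacy.count() != modern.count():
    raise ValueError("row counts diverge between legacy and modern outputs")

# exceptAll surfaces rows present on one side but not the other,
# respecting duplicate multiplicity (schemas must match).
only_in_legacy = legacy.exceptAll(modern)
only_in_modern = modern.exceptAll(legacy)

if only_in_legacy.count() == 0 and only_in_modern.count() == 0:
    print("outputs match; safe to proceed with cutover")
else:
    only_in_legacy.show(20, truncate=False)
    only_in_modern.show(20, truncate=False)
```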
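
For question 2, a common pattern is to express each transformation as a small parameterized function and chain them with DataFrame.transform. The sketch below is illustrative; the column names and cleaning rules are assumptions, not part of the role's spec.

```python
# Reusable, parameterized PySpark transformations chained via DataFrame.transform.
from pyspark.sql import DataFrame, SparkSession
from pyspark.sql import functions as F


def standardize_strings(columns):
    """Return a transformation that trims and upper-cases the given columns."""
    def _apply(df: DataFrame) -> DataFrame:
        for col in columns:
            df = df.withColumn(col, F.upper(F.trim(F.col(col))))
        return df
    return _apply


def add_load_metadata(source):
    """Return a transformation that stamps rows with audit columns."""
    def _apply(df: DataFrame) -> DataFrame:
        return (df.withColumn("load_source", F.lit(source))
                  .withColumn("load_ts", F.current_timestamp()))
    return _apply


if __name__ == "__main__":
    spark = SparkSession.builder.appName("modular-demo").getOrCreate()
    df = spark.createDataFrame([(" alice ",), ("bob",)], ["name"])
    # Chaining keeps each piece independently unit-testable.
    result = (df.transform(standardize_strings(["name"]))
                .transform(add_load_metadata("legacy_feed")))
    result.show()
```

Because each factory returns a plain function over DataFrames, the pieces can be unit‑tested in isolation with small in‑memory frames.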
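
For question 3, Delta Lake's per‑table commit log provides baseline traceability, while Unity Catalog records cross‑table lineage automatically at the platform level rather than through user code. The table name below is a hypothetical placeholder.

```python
# Inspect a Delta table's commit history for audit and traceability.
# "main.sales.orders" is a hypothetical Unity Catalog table name.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("lineage-demo").getOrCreate()

history = DeltaTable.forName(spark, "main.sales.orders").history()
(history
 .select("version", "timestamp", "operation", "operationParameters")
 .show(truncate=False))
```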
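
For question 4, a read/write round trip through the Spark Snowflake connector looks roughly like the sketch below. It assumes the connector is installed on the cluster; every connection value and table name is a placeholder, and in practice credentials would come from a Databricks secret scope rather than literals.

```python
# Snowflake as source and sink via the Spark Snowflake connector.
# Every option value here is a placeholder assumption.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("snowflake-demo").getOrCreate()

sf_options = {
    "sfURL": "<account>.snowflakecomputing.com",
    "sfUser": "<user>",          # e.g. dbutils.secrets.get("scope", "sf-user")
    "sfPassword": "<password>",
    "sfDatabase": "<database>",
    "sfSchema": "<schema>",
    "sfWarehouse": "<warehouse>",
}

# Read with a pushed-down query so Snowflake does the filtering.
orders = (spark.read
          .format("snowflake")   # some setups use "net.snowflake.spark.snowflake"
          .options(**sf_options)
          .option("query", "SELECT * FROM orders WHERE order_date >= '2024-01-01'")
          .load())

# Write the curated result back, appending to a target table.
(orders.write
 .format("snowflake")
 .options(**sf_options)
 .option("dbtable", "orders_curated")
 .mode("append")
 .save())
```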
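
For question 5, a few lightweight hooks are worth mentioning before reaching for full profiling: a job description that is easy to spot in the Spark UI, explain() for the physical plan, and caching for reused intermediates. The data below is synthetic, purely for illustration.

```python
# Quick performance-triage hooks for a Spark job.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("perf-triage").getOrCreate()
# Label the work so its stages are easy to find in the Spark UI.
spark.sparkContext.setJobDescription("nightly-orders-aggregation")

df = spark.range(1_000_000).withColumn("bucket", F.col("id") % 10)
agg = df.groupBy("bucket").count()

agg.explain(mode="formatted")  # inspect the physical plan for shuffles and skew
agg.cache()                    # keep a reused intermediate in memory
print(agg.count())             # first action materializes the cache
```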

Resume Optimization

  • Databricks
  • PySpark
  • ETL Migration
  • Ab Initio
  • Snowflake Integration
  • Data Lineage
  • Delta Lake
  • Batch & Near Real‑Time Processing
  • Control‑M Scheduling
  • Unit & Integration Testing

Application Strategy

When reaching out to the recruiter, send a concise email with a brief greeting, attach your resume, and explicitly highlight the skills that match the role. Mention related strengths such as PySpark pipeline development, Databricks orchestration, and legacy ETL migration; reference specific projects where you modernized data workflows, and emphasize your experience with Snowflake and data governance.

Career Roadmap

Current Role | Typical Experience | Core Focus | Next Position
Databricks Engineer | 5+ years in PySpark & cloud ETL | Migration, pipeline architecture, performance tuning | Senior Data Engineer
Senior Data Engineer | 7‑9 years, end‑to‑end data solutions | Strategy, cross‑team leadership, advanced analytics | Lead Data Engineer
Lead Data Engineer | 10+ years, large‑scale data platforms | Architecture, governance, stakeholder alignment | Data Architecture Manager