Databricks Engineer (PySpark and Data Lake)

Not Disclosed

Job Description & Details

The data engineering landscape is rapidly shifting toward cloud‑native, scalable pipelines, making expertise in Databricks and PySpark highly sought after. Companies are eager to replace legacy ETL tools with modern, modular solutions to boost performance and reduce costs. This role offers a chance to lead that transformation on a long‑term, high‑impact project.

Job Summary

We are looking for a seasoned Data Engineer to modernize legacy Ab Initio ETL workflows by building robust PySpark pipelines on Databricks. You will design, develop, and optimize end‑to‑end data flows, integrate with Snowflake, ensure data lineage, and support migration, testing, and production operations.

Top 3 Critical Skills

Skill | Why It's Critical | Mastery Level
PySpark | Core engine for building scalable data pipelines on Databricks | Senior
Databricks Platform | Provides the unified analytics environment and orchestration needed for migration | Senior
ETL Migration (Ab Initio → PySpark) | Ensures legacy workloads are accurately refactored and decommissioned | Senior

Interview Preparation

  1. How would you approach migrating an Ab Initio workflow to a PySpark pipeline on Databricks?
    What the interviewer is looking for: Understanding of legacy ETL concepts, mapping to Spark transformations, handling data types, and migration strategy (dual‑run, validation). A dual‑run validation sketch appears after this list.
  2. Explain how you would design a modular, reusable component in PySpark for common data transformations.
    What the interviewer is looking for: Knowledge of functions, UDFs, parameterization, and best practices for code reuse and testing. See the modular‑transformation sketch below.
  3. What techniques would you use to ensure data lineage and traceability in a Databricks environment?
    What the interviewer is looking for: Familiarity with Unity Catalog, Delta Lake metadata, logging, and documentation practices. A Delta history sketch follows the list.
  4. Describe how you would integrate Snowflake as a source and sink within a Databricks pipeline.
    What the interviewer is looking for: Experience with the Snowflake connector, handling authentication, push‑down optimization, and schema management. See the Snowflake round‑trip sketch below.
  5. How do you monitor and troubleshoot performance issues in a near‑real‑time Spark job?
    What the interviewer is looking for: Use of the Spark UI, Ganglia/Datadog metrics, caching strategies, and job profiling. A quick triage sketch closes the examples below.
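
The dual‑run strategy from question 1 can be checked with a short comparison job before the legacy workflow is decommissioned. This is a minimal sketch, assuming both outputs have been landed as tables with matching schemas; the table names are placeholders.

```python
# Dual-run validation sketch: compare legacy (Ab Initio) output with the
# new PySpark output. Table names below are illustrative assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("dual-run-check").getOrCreate()

legacy = spark.table("staging.legacy_output")    # landed from the legacy run
modern = spark.table("staging.pyspark_output")   # produced by the new pipeline

# Row-count parity is the cheapest first check.
if legacy.count() != modern.count():
    raise ValueError("row counts diverge between legacy and modern outputs")

# exceptAll surfaces rows present on one side but not the other,
# respecting duplicate multiplicity (schemas must match).
only_in_legacy = legacy.exceptAll(modern)
only_in_modern = modern.exceptAll(legacy)

if only_in_legacy.count() == 0 and only_in_modern.count() == 0:
    print("outputs match; safe to proceed with cutover")
else:
    only_in_legacy.show(20, truncate=False)
    only_in_modern.show(20, truncate=False)
```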
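
For question 2, a common pattern is to express each transformation as a small parameterized function and chain them with DataFrame.transform. The sketch below is illustrative; the column names and cleaning rules are assumptions, not part of the role's spec.

```python
# Reusable, parameterized PySpark transformations chained via DataFrame.transform.
from pyspark.sql import DataFrame, SparkSession
from pyspark.sql import functions as F


def standardize_strings(columns):
    """Return a transformation that trims and upper-cases the given columns."""
    def _apply(df: DataFrame) -> DataFrame:
        for col in columns:
            df = df.withColumn(col, F.upper(F.trim(F.col(col))))
        return df
    return _apply


def add_load_metadata(source):
    """Return a transformation that stamps rows with audit columns."""
    def _apply(df: DataFrame) -> DataFrame:
        return (df.withColumn("load_source", F.lit(source))
                  .withColumn("load_ts", F.current_timestamp()))
    return _apply


if __name__ == "__main__":
    spark = SparkSession.builder.appName("modular-demo").getOrCreate()
    df = spark.createDataFrame([(" alice ",), ("bob",)], ["name"])
    # Chaining keeps each piece independently unit-testable.
    result = (df.transform(standardize_strings(["name"]))
                .transform(add_load_metadata("legacy_feed")))
    result.show()
```

Because each factory returns a plain function over DataFrames, the pieces can be unit‑tested in isolation with small in‑memory frames.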
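
For question 3, Delta Lake's per‑table commit log provides baseline traceability, while Unity Catalog records cross‑table lineage automatically at the platform level rather than through user code. The table name below is a hypothetical placeholder.

```python
# Inspect a Delta table's commit history for audit and traceability.
# "main.sales.orders" is a hypothetical Unity Catalog table name.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("lineage-demo").getOrCreate()

history = DeltaTable.forName(spark, "main.sales.orders").history()
(history
 .select("version", "timestamp", "operation", "operationParameters")
 .show(truncate=False))
```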
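
For question 4, a read/write round trip through the Spark Snowflake connector looks roughly like the sketch below. It assumes the connector is installed on the cluster; every connection value and table name is a placeholder, and in practice credentials would come from a Databricks secret scope rather than literals.

```python
# Snowflake as source and sink via the Spark Snowflake connector.
# Every option value here is a placeholder assumption.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("snowflake-demo").getOrCreate()

sf_options = {
    "sfURL": "<account>.snowflakecomputing.com",
    "sfUser": "<user>",          # e.g. dbutils.secrets.get("scope", "sf-user")
    "sfPassword": "<password>",
    "sfDatabase": "<database>",
    "sfSchema": "<schema>",
    "sfWarehouse": "<warehouse>",
}

# Read with a pushed-down query so Snowflake does the filtering.
orders = (spark.read
          .format("snowflake")   # some setups use "net.snowflake.spark.snowflake"
          .options(**sf_options)
          .option("query", "SELECT * FROM orders WHERE order_date >= '2024-01-01'")
          .load())

# Write the curated result back, appending to a target table.
(orders.write
 .format("snowflake")
 .options(**sf_options)
 .option("dbtable", "orders_curated")
 .mode("append")
 .save())
```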
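
For question 5, a few lightweight hooks are worth mentioning before reaching for full profiling: a job description that is easy to spot in the Spark UI, explain() for the physical plan, and caching for reused intermediates. The data below is synthetic, purely for illustration.

```python
# Quick performance-triage hooks for a Spark job.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("perf-triage").getOrCreate()
# Label the work so its stages are easy to find in the Spark UI.
spark.sparkContext.setJobDescription("nightly-orders-aggregation")

df = spark.range(1_000_000).withColumn("bucket", F.col("id") % 10)
agg = df.groupBy("bucket").count()

agg.explain(mode="formatted")  # inspect the physical plan for shuffles and skew
agg.cache()                    # keep a reused intermediate in memory
print(agg.count())             # first action materializes the cache
```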

Resume Optimization

  • Databricks
  • PySpark
  • ETL Migration
  • Ab Initio
  • Snowflake Integration
  • Data Lineage
  • Delta Lake
  • Batch & Near Real‑Time Processing
  • Control‑M Scheduling
  • Unit & Integration Testing

Application Strategy

When reaching out to the recruiter, send a concise email with a brief greeting, attach your resume, and explicitly highlight the skills that match the role. Mention related strengths such as PySpark pipeline development, Databricks orchestration, and legacy ETL migration; reference specific projects where you modernized data workflows, and emphasize your experience with Snowflake and data governance.

Career Roadmap

Current Role | Typical Experience | Core Focus | Next Position
Databricks Engineer | 5+ years in PySpark & cloud ETL | Migration, pipeline architecture, performance tuning | Senior Data Engineer
Senior Data Engineer | 7‑9 years, end‑to‑end data solutions | Strategy, cross‑team leadership, advanced analytics | Lead Data Engineer
Lead Data Engineer | 10+ years, large‑scale data platforms | Architecture, governance, stakeholder alignment | Data Architecture Manager