About this role
Job Title: Principal Databricks Data Engineer
Location: Toronto, ON, Canada
Work Model: Onsite (4 Days/Week)
Job Type: Full-Time, Permanent (FTE)
Experience Required: 12-18 Years
Job Summary
We are seeking an experienced Principal Databricks Data Engineer to lead enterprise-scale data modernization initiatives, with a strong focus on migrating legacy Cloudera-based platforms to the Databricks Lakehouse architecture.
The ideal candidate will be a seasoned data engineering leader with deep expertise in enterprise Data Warehousing, Data Lakes, Databricks, Spark, Delta Lake, and cloud-native data platforms. This individual will play a key role in defining enterprise data architecture, modernizing finance and risk data ecosystems, and establishing engineering best practices across large-scale transformation programs.
This is a highly visible technical leadership role requiring expertise in data architecture, cloud migration, governance, performance optimization, and stakeholder collaboration across Finance, Risk, Analytics, and Data Governance teams.
Key Responsibilities
Enterprise Data Architecture
• Design and govern enterprise-scale Data Warehouse, Data Lake, and Lakehouse architectures.
• Define scalable, secure, and high-performance data platforms aligned with business objectives.
• Implement modern Lakehouse architecture using Databricks and Delta Lake.
• Design layered data architecture including:
  • Raw/Landing Layer
  • Curated/Conformed Layer
  • Semantic/Consumption Layer
• Modernize traditional Enterprise Data Warehouse (EDW) platforms into cloud-native Lakehouse environments.
Cloudera to Databricks Modernization
• Lead migration initiatives from legacy Cloudera platforms (CDH/CDP) to Databricks Lakehouse.
• Redesign ingestion, transformation, and consumption patterns from HDFS-based environments to cloud object storage.
• Migrate and refactor Hive, Impala, Spark, and HBase workloads into Databricks.
• Convert legacy Hive and Impala SQL logic into scalable PySpark and Spark SQL ELT pipelines.
• Ensure data reconciliation, audit integrity, and consistency throughout migration activities.
• Develop migration strategies with minimal business disruption.
Data Engineering & Pipeline Development
• Design, develop, and optimize large-scale distributed data pipelines using:
  • PySpark
  • Spark SQL
  • Delta Lake
  • Databricks SQL
• Implement scalable ETL/ELT frameworks supporting enterprise analytics.
• Build resilient, reusable, and maintainable data engineering solutions.
• Optimize processing performance for high-volume financial and risk datasets.
Databricks Platform Engineering
• Implement Medallion Architecture:
  • Bronze
  • Silver
  • Gold
• Optimize Databricks workloads using:
  • Z-ORDER
  • OPTIMIZE
  • Caching
  • Partitioning
  • Cluster Optimization
• Develop reusable engineering standards and best practices.
• Implement scalable data engineering frameworks for enterprise adoption.
Finance & Risk Data Modeling
• Design enterprise finance and risk data models supporting:
  • General Ledger (GL)
  • Sub-ledger
  • Financial Hierarchies
  • Risk Exposure Models
  • Credit Risk
  • Liquidity Risk
  • Market Risk
• Support regulatory, management, and executive reporting requirements.
• Build analytical data models optimized for reporting and business intelligence.
Semantic Layer & Business Analytics
• Design and manage enterprise semantic and consumption layers.
• Define:
  • Business Metrics
  • KPIs
  • Dimensions
  • Hierarchies
• Enable reporting capabilities including:
  • Aggregation
  • Drill-down
  • Drill-back
  • Self-service analytics
• Support enterprise BI and reporting platforms.
Data Governance & Quality
• Implement enterprise data governance frameworks.
• Design reconciliation controls and audit mechanisms.
• Develop:
  • Data Quality Frameworks
  • Exception Handling Processes
  • Validation Rules
  • Data Lineage Solutions
  • Metadata Management
• Ensure compliance with enterprise security, privacy, and regulatory standards.
Cloud & DevOps
• Build cloud-native data platforms on AWS and/or Azure.
• Develop CI/CD pipelines using:
  • Git
  • Terraform
  • Jenkins
  • Azure DevOps
• Implement deployment automation and Infrastructure as Code (IaC).
• Utilize orchestration tools including:
  • Apache Airflow
  • Databricks Workflows
Technical Leadership
• Serve as the technical authority for enterprise data engineering initiatives.
• Define enterprise engineering standards and architectural best practices.
• Mentor senior engineers, architects, and technical teams.
• Conduct architecture reviews and technical design sessions.
• Drive innovation across data engineering and platform modernization initiatives.
Stakeholder Management
• Collaborate closely with Finance, Risk, Analytics, Data Governance, and Business teams.
• Translate complex business requirements into scalable technical solutions.
• Present architectural strategies and technical recommendations to senior leadership.
• Partner with enterprise architects and platform engineering teams to define long-term roadmaps.
Required Skills & Qualifications
Experience
• 12-18 years of overall Data Engineering experience.
• 8+ years of experience building Enterprise Data Warehouse and Data Lake platforms.
• 5+ years of hands-on experience with Databricks and Apache Spark.
• Extensive experience leading enterprise data modernization programs.
• Proven experience delivering large-scale cloud migration initiatives.
Core Technical Skills
• Databricks
• Apache Spark
• PySpark
• Spark SQL
• Delta Lake
• Databricks SQL
• Data Lakehouse Architecture
• Enterprise Data Warehouse (EDW)
• Data Lake Design
• Medallion Architecture
• ELT/ETL Frameworks
• Cloud Object Storage
Legacy Platform Modernization
• Cloudera CDH/CDP
• Hive
• Impala
• HDFS
• HBase
• Spark Migration
Data Modeling
• Enterprise Data Modeling
• Financial Data Models
• Risk Data Models
• Dimensional Modeling
• Semantic Layer Design
Data Governance
• Data Quality
• Data Lineage
• Metadata Management
• Audit Controls
• Data Reconciliation
• Security & Compliance
Cloud Platforms
• Microsoft Azure
• Amazon Web Services (AWS)
DevOps & Automation
• Git
• Terraform
• Jenkins
• Azure DevOps
• CI/CD Pipelines
• Apache Airflow
• Databricks Workflows
Preferred Tools
• dbt
• Delta Live Tables
• Unity Catalog
Nice to Have
• Experience in Banking, Financial Services, Insurance (BFSI), or Capital Markets.
• Strong knowledge of Regulatory Reporting and Finance Data Platforms.
• Experience with:
  • SAP Finance
  • Oracle Financials
  • SAP S/4HANA
• Exposure to AI/ML data engineering workloads.
• Databricks Certified Professional certifications.
• AWS or Azure Cloud Certifications.
Leadership Competencies
• Enterprise Architecture Leadership
• Technical Mentorship
• Strategic Thinking
• Executive Communication
• Stakeholder Management
• Solution Architecture
• Cross-functional Team Leadership
• Decision Making
• Problem Solving
• Innovation & Continuous Improvement
Business Impact
In this role, you will:
• Lead enterprise-wide Cloudera to Databricks transformation initiatives.
• Shape the organization's next-generation finance and risk data platform.
• Enable scalable analytics, regulatory reporting, and executive decision-making.
• Establish engineering standards for enterprise Lakehouse architecture.
• Drive cloud modernization and enterprise data transformation programs.
Work Location: Toronto, ON (Onsite, 4 Days per Week)
Experience: 12-18 Years
Key Skills: Databricks, Apache Spark, PySpark, Spark SQL, Delta Lake, Databricks SQL, Data Lakehouse, Enterprise Data Warehouse, Cloudera, CDH, CDP, Hive, Impala, HDFS, Data Modeling, Finance Data, Risk Data, AWS, Azure, Airflow, dbt, Git, Terraform, Jenkins, Azure DevOps, Medallion Architecture, Data Governance, CI/CD