Course Outline
Introduction, Objectives, and Migration Strategy
- Course goals, participant profile alignment, and success criteria
- High-level migration approaches and risk considerations
- Setting up workspaces, repositories, and lab datasets
Day 1 — Migration Fundamentals and Architecture
- Lakehouse concepts, Delta Lake overview, and Databricks architecture
- SMP vs MPP differences and implications for migration
- Medallion (Bronze→Silver→Gold) design and Unity Catalog overview
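A minimal Delta SQL sketch of the Bronze→Silver flow named above; schema, table, and column names here are illustrative, not part of the course materials:

```sql
-- Bronze: raw landing table (illustrative names)
CREATE TABLE IF NOT EXISTS bronze.orders_raw (
  order_id  BIGINT,
  payload   STRING,
  ingest_ts TIMESTAMP
) USING DELTA;

-- Silver: cleaned, typed view of the same data
CREATE TABLE IF NOT EXISTS silver.orders
USING DELTA
AS SELECT
  order_id,
  CAST(get_json_object(payload, '$.amount') AS DECIMAL(10,2)) AS amount,
  ingest_ts
FROM bronze.orders_raw
WHERE order_id IS NOT NULL;
```

The Gold layer would follow the same pattern, aggregating Silver tables into consumption-ready marts.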
Day 1 Lab — Translating a Stored Procedure
- Hands-on migration of a sample stored procedure to a notebook
- Mapping temp tables and cursors to DataFrame transformations
- Validation and comparison with original output
Day 2 — Advanced Delta Lake & Incremental Loading
- ACID transactions, commit logs, versioning, and time travel
- Auto Loader, MERGE INTO patterns, upserts, and schema evolution
- OPTIMIZE, VACUUM, Z-ORDER, partitioning, and storage tuning
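The upsert and maintenance commands listed above can be sketched in Delta SQL as follows (table and column names are illustrative assumptions):

```sql
-- Upsert an incoming change batch into a Silver table
MERGE INTO silver.customers AS t
USING updates_batch AS s
  ON t.customer_id = s.customer_id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;

-- Compact small files and co-locate rows on a frequently filtered column
OPTIMIZE silver.customers ZORDER BY (customer_id);

-- Remove data files no longer referenced by the table's transaction log
-- (default retention window is 7 days)
VACUUM silver.customers;
```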
Day 2 Lab — Incremental Ingestion & Optimization
- Implementing Auto Loader ingestion and MERGE workflows
- Applying OPTIMIZE, Z-ORDER, and VACUUM; validating results
- Measuring read/write performance improvements
Day 3 — SQL in Databricks, Performance & Debugging
- Analytical SQL features: window functions, higher-order functions, JSON/array handling
- Reading the Spark UI, DAGs, shuffles, stages, tasks, and bottleneck diagnosis
- Query tuning patterns: broadcast joins, hints, caching, and spill reduction
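Two of the patterns above, a window function and a broadcast join hint, sketched in Spark SQL against an assumed orders schema:

```sql
-- Latest order per customer via ROW_NUMBER (illustrative schema)
SELECT *
FROM (
  SELECT o.*,
         ROW_NUMBER() OVER (PARTITION BY customer_id
                            ORDER BY order_ts DESC) AS rn
  FROM silver.orders o
)
WHERE rn = 1;

-- Broadcast a small dimension table to avoid a shuffle join
SELECT /*+ BROADCAST(d) */ f.*, d.region
FROM silver.orders f
JOIN dim_region d ON f.region_id = d.region_id;
```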
Day 3 Lab — SQL Refactoring & Performance Tuning
- Refactor a heavy SQL process into optimized Spark SQL
- Use Spark UI traces to identify and fix skew and shuffle issues
- Benchmark before/after and document tuning steps
Day 4 — Tactical PySpark: Replacing Procedural Logic
- Spark execution model: driver, executors, lazy evaluation, and partitioning strategies
- Transforming loops and cursors into vectorized DataFrame operations
- Modularization, UDFs/pandas UDFs, widgets, and reusable libraries
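The loop-to-set-based shift above can be illustrated in plain Python (sample data is invented for the sketch); the comment shows the PySpark form the course targets:

```python
# Cursor-style, row-by-row aggregation: the procedural pattern being migrated.
rows = [
    {"customer": "a", "amount": 10.0},
    {"customer": "b", "amount": 5.0},
    {"customer": "a", "amount": 2.5},
]

totals_loop = {}
for row in rows:  # one row at a time, like a T-SQL cursor
    key = row["customer"]
    totals_loop[key] = totals_loop.get(key, 0.0) + row["amount"]

# Set-based equivalent: one declarative aggregation over the whole dataset.
totals_set = {
    c: sum(r["amount"] for r in rows if r["customer"] == c)
    for c in {r["customer"] for r in rows}
}

# In PySpark the same aggregation is a single parallel statement:
# df.groupBy("customer").agg(F.sum("amount").alias("total"))
```

The set-based form states *what* to compute rather than *how* to iterate, which is what lets Spark parallelize it across executors.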
Day 4 Lab — Refactoring Procedural Scripts
- Refactor a procedural ETL script into modular PySpark notebooks
- Introduce parametrization, unit-style tests, and reusable functions
- Code review and best-practice checklist application
Day 5 — Orchestration, End-to-End Pipeline & Best Practices
- Databricks Workflows: job design, task dependencies, triggers, and error handling
- Designing incremental Medallion pipelines with quality rules and schema validation
- Integration with Git (GitHub/Azure DevOps), CI, and testing strategies for PySpark logic
Day 5 Lab — Build a Complete End-to-End Pipeline
- Assemble Bronze→Silver→Gold pipeline orchestrated with Workflows
- Implement logging, auditing, retries, and automated validations
- Run full pipeline, validate outputs, and prepare deployment notes
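A task graph like the one assembled in this lab might be expressed, in Databricks Jobs API 2.1 style, roughly as below; job name and notebook paths are placeholders:

```json
{
  "name": "medallion_pipeline",
  "tasks": [
    {"task_key": "bronze_ingest",
     "notebook_task": {"notebook_path": "/Repos/demo/bronze_ingest"}},
    {"task_key": "silver_transform",
     "depends_on": [{"task_key": "bronze_ingest"}],
     "notebook_task": {"notebook_path": "/Repos/demo/silver_transform"}},
    {"task_key": "gold_publish",
     "depends_on": [{"task_key": "silver_transform"}],
     "notebook_task": {"notebook_path": "/Repos/demo/gold_publish"}}
  ],
  "max_concurrent_runs": 1
}
```

Each `depends_on` edge is what Workflows uses to sequence tasks and to scope retries and failure handling.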
Operationalization, Governance, and Production Readiness
- Unity Catalog governance, lineage, and access controls best practices
- Cost, cluster sizing, autoscaling, and job concurrency patterns
- Deployment checklists, rollback strategies, and runbook creation
Final Review, Knowledge Transfer, and Next Steps
- Participant presentations of migration work and lessons learned
- Gap analysis, recommended follow-up activities, and training materials handoff
- References, further learning paths, and support options
Requirements
- An understanding of data engineering concepts
- Experience with SQL and stored procedures (Synapse / SQL Server)
- Familiarity with ETL orchestration concepts (ADF or similar)
Audience
- Technology managers with a data engineering background
- Data engineers transitioning procedural OLAP logic to Lakehouse patterns
- Platform engineers responsible for Databricks adoption
Custom Corporate Training
Training solutions designed exclusively for businesses.
- Customized Content: We adapt the syllabus and practical exercises to the real goals and needs of your project.
- Flexible Schedule: Dates and times adapted to your team's agenda.
- Format: Online (live), In-company (at your offices), or Hybrid.
Price per private group, online live training, starting from 8000 € + VAT*
Contact us for an exact quote and to hear our latest promotions
Testimonials
All the topics covered, even though many were moved through quickly, give us an idea of what we will need to explore in more depth. I also liked that we got some hands-on practice, although I still think the course deserves more of it.
Sandra Mariela Lopez Bernal - Kueski
Course - Databricks
Machine Translated