Course Outline

Introduction:

  • Apache Spark in Hadoop Ecosystem
  • Short intro for python, scala

Basics (theory):

  • Architecture
  • RDD
  • Transformation and Actions
  • Stage, Task, Dependencies

Using Databricks environment understand the basics (hands-on workshop):

  • Exercises using RDD API
  • Basic action and transformation functions
  • PairRDD
  • Join
  • Caching strategies
  • Exercises using DataFrame API
  • SparkSQL
  • DataFrame: select, filter, group, sort
  • UDF (User Defined Function)
  • Looking into DataSet API
  • Streaming

Using AWS environment understand the deployment (hands-on workshop):

  • Basics of AWS Glue
  • Understand differencies between AWS EMR and AWS Glue
  • Example jobs on both environment
  • Understand pros and cons

Extra:

  • Introduction to Apache Airflow orchestration

Requirements

Programing skills (preferably python, scala)

SQL basics

 21 Hours

Custom Corporate Training

Training solutions designed exclusively for businesses.

  • Customized Content: We adapt the syllabus and practical exercises to the real goals and needs of your project.
  • Flexible Schedule: Dates and times adapted to your team's agenda.
  • Format: Online (live), In-company (at your offices), or Hybrid.
Investment

Price per private group, online live training, starting from 4800 € + VAT*

Contact us for an exact quote and to hear our latest promotions

Testimonials (3)

Upcoming Courses

Related Categories