
Databricks Certified Apache Spark Developer Exam Preparation with 10 Practice Exams

Processing massive datasets is a real engineering challenge. To do it efficiently, you need to master the industry standard for distributed computing: Apache Spark.

Skill level: All levels
Sections: 9
Lectures: 37
Instructor: Team Mammoth

What's inside

This course includes.

Sections: 9
Quizzes: 14
Certificate of completion: Included
Mobile and desktop access: Included
AI learning assistance: Included
Course content

Curriculum & lectures.

9 sections · 37 lectures
Section 0: Welcome! (3 lectures)
Lecture 0.01: Welcome + What you will learn
Lecture 0.02: Prerequisites
Lecture 0.03: Introduction
Section 1: Apache Spark Architecture and Components (7 lectures)
Lecture 1.01: Advantages and Challenges of Implementing Spark
Lecture 1.02: Core Components of Spark Architecture (Cluster, Driver, Executors, CPU & Memory)
Lecture 1.03: Spark Architecture Details – DataFrames, Datasets, SparkSession Lifecycle, Caching & S
Lecture 1.04: Spark Execution Hierarchy – Jobs, Stages, Tasks
Lecture 1.05: Partitioning in Spark – Partitions, Shuffles, and Optimizing Data Distribution
Lecture 1.06: Execution Patterns – Transformations, Actions, and Lazy Evaluation
Lecture 1.07: Apache Spark Modules – Core, Spark SQL, DataFrames, Pandas API, Structured Streaming,
Section 2: Using Spark SQL (4 lectures)
Lecture 2.01: Reading and Writing Data with Spark SQL (Data Sources, JDBC, Partitioning, Overwrite)
Lecture 2.02: Querying Files Directly with Spark SQL (ORC, JSON, CSV, Text, Delta) and Save Modes
Lecture 2.03: Persistent Tables, Sorting and Partitioning for Optimized Data Retrieval
Lecture 2.04: Temporary Views and SQL Queries on DataFrames
Section 3: Developing DataFrame/Dataset API Applications (10 lectures)
Lecture 3.01: Column and Row Manipulation – Adding, Dropping, Renaming, Splitting, and Filtering
Lecture 3.02: Data Deduplication and Validation
Lecture 3.03: Aggregations – Count, Approximate Count Distinct, Mean, and Summary Stats
Lecture 3.04: Working with Dates and Timestamps
Lecture 3.05: Combining DataFrames – Joins (Inner, Left, Broadcast, etc.), Unions, and Set Operation
Lecture 3.06: Input/Output Operations – Reading, Writing, and Schemas
Lecture 3.07: Misc DataFrame Operations – Sorting, Iteration, Schema Inspection, Conversion
Lecture 3.08: User-Defined Functions (UDFs) and Stateful Operations (incl. StateStore)
Lecture 3.09: Shared Variables – Broadcast Variables and Accumulators
Lecture 3.10: Broadcast Joins – Purpose and Implementation
Section 4: Troubleshooting and Tuning DataFrame Applications (3 lectures)
Lecture 4.01: Performance Tuning Strategies – Partitioning, Repartitioning, Coalescing, Data Skew
Lecture 4.02: Adaptive Query Execution (AQE) and Its Benefits
Lecture 4.03: Logging and Monitoring – Driver & Executor Logs, Diagnosing Errors and Utilization
Section 5: Structured Streaming (4 lectures)
Lecture 5.01: Structured Streaming Engine – Model, Micro-Batch Processing, Exactly-Once Semantics
Lecture 5.02: Creating and Writing Streaming DataFrames – Output Modes and Sinks
Lecture 5.03: Operating on Streaming DataFrames – Selection, Projection, Windows, Aggregation
Lecture 5.04: Streaming Deduplication – With and Without Watermark
Section 6: Using Spark Connect to Deploy Applications (2 lectures)
Lecture 6.01: Spark Connect – Features and Architecture
Lecture 6.02: Deployment Modes – Client vs Cluster vs Local
Section 7: Using Pandas API on Spark (2 lectures)
Lecture 7.01: Advantages of Pandas API on Spark
Lecture 7.02: Creating and Using Pandas UDFs
Section 8: Test Your Knowledge! (2 lectures)
Lecture 8.01: Summary
Lecture 8.02: Test Your Knowledge!
Description

About this course.

This program teaches you to build reliable and performant data processing applications by mastering the fundamentals of the Spark architecture and its core programming APIs, including the DataFrame API and Spark SQL.


✅ Understand the fundamentals of the Spark architecture, including how applications are executed on a cluster.

✅ Learn to perform core data manipulation tasks using the DataFrame API, including selecting, filtering, and aggregating data.

✅ Work with complex data types, handle missing data, and join multiple DataFrames to answer sophisticated questions.

✅ Use Spark SQL and a wide range of built-in functions to query and transform data effectively.


Whether you are a data engineer building ETL pipelines or a data scientist performing large-scale data analysis, this course provides the foundational Spark knowledge required to work with big data effectively.


🎁 Includes 10 full-length practice exams. Solidify your understanding of the Spark APIs. Code with confidence.


If you're ready to prove that you can handle large-scale data processing and to build a core skill for any data role, this course is your guide.
