PySpark – Big Data Analysis in Databricks (PYSPARK1)

Databases, NoSQL and Big Data

Do you work with Excel, Power Query, SQL or Pandas but need to process gigabytes to terabytes of data? PySpark is the Python interface to Apache Spark, a big data engine for scalable, distributed processing that can handle datasets larger than one machine's memory and speed up analysis.

The workshop runs entirely in Databricks Community Edition in your browser - no local setup required. You'll learn the DataFrame API and Spark SQL, work with notebooks, clusters and data uploads, and apply familiar SQL skills to scale analyses to large datasets.

Location, current course term

Contact us

Customized training (date, location, content, duration)

The course:

  • Getting started with Databricks
    1. What PySpark is and when to use it
    2. Creating an account in Databricks Community Edition
    3. Navigating the environment – workspace, notebooks, cluster
    4. Uploading data to Databricks
  • DataFrame – basic operations
    1. Creating DataFrames
    2. Schema and data types
    3. Selecting columns (select)
    4. Filtering rows (filter, where)
    5. Adding and transforming columns (withColumn)
  • Spark SQL
    1. Registering a DataFrame as a table (createTempView)
    2. Running SQL queries on data (spark.sql)
    3. Combining DataFrame API and SQL
    4. Using SQL functions in the DataFrame API
  • Data sources
    1. CSV files
    2. Parquet – optimal format for Spark
    3. JSON files
    4. Delta Lake (basics)
  • Data processing
    1. Column and type transformations
    2. Handling missing values (null)
    3. Joining tables (join)
    4. Combining datasets (union)
  • Data aggregation
    1. Grouping (groupBy)
    2. Aggregate functions (count, sum, avg, min, max)
    3. Multiple aggregations at once (agg)
    4. Pivot tables
  • Troubleshooting
    1. Reading PySpark error messages
    2. Common errors: data types, missing columns
    3. Data checks and debugging
  • Outputs and exporting data
    1. Saving to files (CSV, Parquet)
    2. Visualizations in Databricks
    3. Downloading results
Assumed knowledge:
Basic Python (variables, loops, functions); experience with SQL, Excel, Power Query or Pandas is advantageous.
Schedule:
2 days (9:00 AM - 5:00 PM)
Course price:
432.00 € (522.72 € incl. 21% VAT)
Language: