50.000,00 EGP
-
Level: All Levels
-
Total Enrolled: 3
-
Duration: 50 hours 28 minutes
-
Last Updated: June 22, 2026
Course content:
Introduction to Big Data and Data Engineering – part1
Big Data needs Data Engineering because raw data is too large, fast, and messy to process or use directly.
Data Engineering solves this by building scalable systems (scale-out) to collect, store, and process data efficiently.
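As a rough illustration of the scale-out idea, the sketch below spreads records across several nodes by hashing a key, lets each node compute a partial result, and then combines the partials. The node count, the record layout, and the helper names `assign_node` and `partition` are assumptions made for this example; they are not taken from the course material.

```python
import zlib
from collections import defaultdict


def assign_node(key: str, num_nodes: int) -> int:
    """Map a key to a node index with a stable hash (hypothetical helper)."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def partition(records, num_nodes):
    """Group (key, value) records by the node that would own them."""
    parts = defaultdict(list)
    for key, value in records:
        parts[assign_node(key, num_nodes)].append((key, value))
    return parts


# Illustrative data: (user id, amount) pairs.
records = [("user1", 5), ("user2", 7), ("user3", 1), ("user1", 2)]

parts = partition(records, num_nodes=3)

# Each node sums only its own partition; the partial sums are then combined.
partial_sums = {node: sum(v for _, v in rows) for node, rows in parts.items()}
total = sum(partial_sums.values())
print(total)
```

Adding nodes in this model means raising `num_nodes`, so each machine holds a smaller share of the data; real systems also have to move existing data when the node count changes, which this sketch does not cover.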
Introduction to Big Data and Data Engineering – part2
Data grows continuously because of social media, IoT, and digital systems. Big Data is large, fast, and diverse data characterized by the 5 Vs (Volume, Velocity, Variety, Veracity, Value).
We handle it using Batch Processing for large historical data and Real-Time Analytics for instant insights and fast decision-making.
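The difference between the two processing styles can be sketched in plain Python. This is a simplified illustration, not how a production engine works: `batch_total` and `running_totals` are hypothetical names, and a list stands in for both the historical data set and the live event stream.

```python
from typing import Iterable, Iterator, List


def batch_total(events: List[float]) -> float:
    """Batch: all historical data is available, so it is processed in one pass."""
    return sum(events)


def running_totals(stream: Iterable[float]) -> Iterator[float]:
    """Real-time: emit an updated result as each event arrives."""
    total = 0.0
    for value in stream:
        total += value
        yield total


history = [10.0, 20.0, 5.0]

batch_result = batch_total(history)             # one answer after reading everything
stream_results = list(running_totals(history))  # one answer per incoming event

print(batch_result)
print(stream_results)
```

The batch function gives a single result once the whole data set has been read, while the streaming function has an up-to-date answer after every event, which is what makes fast decisions possible.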
Introduction to Big Data and Data Engineering – part3
Big Data faces challenges like huge volume, variety, and processing complexity, so systems like OLTP, Data Warehouses, Data Lakes, and Lakehouses are used to manage it.
ETL transforms data before loading it into the target system, while ELT loads the raw data first and then transforms it inside the target system, an approach that suits modern big data processing.
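A minimal sketch of the ETL and ELT orderings, using Python's built-in `sqlite3` module as a stand-in for the target system. The table names (`sales_etl`, `sales_raw`, `sales_elt`) and the sample rows are invented for illustration; a real pipeline would target a warehouse or lakehouse rather than an in-memory database.

```python
import sqlite3

# Illustrative raw input: untrimmed names and amounts stored as text.
RAW_ROWS = [(" alice ", "100"), ("BOB", "250"), ("carol", "75")]


def etl(conn: sqlite3.Connection) -> None:
    """ETL: clean the rows in application code, then load the cleaned result."""
    cleaned = [(name.strip().lower(), int(amount)) for name, amount in RAW_ROWS]
    conn.execute("CREATE TABLE sales_etl (name TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales_etl VALUES (?, ?)", cleaned)


def elt(conn: sqlite3.Connection) -> None:
    """ELT: load the raw rows as-is, then transform them with SQL in the store."""
    conn.execute("CREATE TABLE sales_raw (name TEXT, amount TEXT)")
    conn.executemany("INSERT INTO sales_raw VALUES (?, ?)", RAW_ROWS)
    conn.execute(
        "CREATE TABLE sales_elt AS "
        "SELECT lower(trim(name)) AS name, CAST(amount AS INTEGER) AS amount "
        "FROM sales_raw"
    )


conn = sqlite3.connect(":memory:")
etl(conn)
elt(conn)

etl_rows = conn.execute("SELECT name, amount FROM sales_etl ORDER BY name").fetchall()
elt_rows = conn.execute("SELECT name, amount FROM sales_elt ORDER BY name").fetchall()
print(etl_rows == elt_rows)
```

Both paths end with the same cleaned table. The difference is where the transformation runs, and that the ELT path keeps the raw data (`sales_raw`) available for later reprocessing.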
-
Big data challenges
00:54 -
OLTP
02:15 -
Data warehouse
03:38 -
Data Lake
01:38 -
Data Lakehouse
01:38 -
Schema-on-write
01:20 -
Schema-on-read
01:38 -
ETL
01:05 -
ELT
01:20 -
Understand the role of network infrastructure in big data
01:20
Introduction to Big Data and Data Engineering – part4
-
Introduction to Distributed Systems in Big Data
02:09 -
Why Big Data Needs Distributed Systems
01:19 -
Basics of Distributed Computing
01:39 -
How Distributed Systems Work
01:07 -
Key Concepts in Distributed Systems-part1
01:37 -
Key Concepts in Distributed Systems-part2
00:48 -
Role in Big Data Architecture
00:51 -
Single Machine vs Distributed Systems
00:43 -
Scalability
00:57 -
Fault Tolerance
01:04 -
Parallel Processing
00:48 -
Data Distribution
00:46
Data Engineering with SQL & Python
-
Introduction
03:34 -
IDE
02:38 -
Anaconda
12:38 -
Install anaconda
16:31 -
Jupyter notebook
15:40 -
Print
10:40 -
Numbers
03:07 -
Variables
05:54 -
Strings
01:13 -
String methods
19:52 -
Data Structures
05:48 -
lists
01:00 -
Dictionaries
02:44 -
Tuples
02:21 -
sets
03:03 -
Booleans
04:05 -
Comparison Operators
03:20 -
Conditional statements: if, elif, else
03:27 -
Loops: for loop and while loop
08:30 -
Functions
05:06 -
introduction to pandas
00:59 -
Data Frame
01:36 -
Create DataFrame
04:27 -
Working with columns
07:16 -
Working with Rows
08:57 -
Subsets
03:32 -
working with files
05:48 -
Method1
04:23 -
Method2
22:25 -
summarize data
00:51 -
View the first few rows of a DataFrame
01:47 -
View the last few rows of a DataFrame
00:46 -
Get summary of the DataFrame
00:57 -
Generate summary statistics for numerical columns
00:50 -
Get the number of rows and columns in a DataFrame
00:46 -
View the column names of the DataFrame
00:17 -
Access the index of the DataFrame
00:29 -
Check the data types of each column
00:24 -
Check for missing values in the DataFrame
01:12 -
Remove rows with missing values
01:21 -
Rename columns or rows in the DataFrame
00:56 -
Sample method
01:09 -
Sort the DataFrame
03:10 -
Group data by one or more columns to perform aggregation
05:57 -
Merge two DataFrames
03:35 -
Create a pivot table for summarizing data
02:02 -
file format
03:55 -
CSV file
03:23 -
Excel file
01:48 -
JSON file
02:25 -
Parquet file
01:21 -
XML file
00:45 -
Data Quality
00:39 -
handling missing value
08:07 -
duplicated
02:12 -
Data Consistency
01:34 -
Data Transformation and Normalization
03:51 -
Validating Data Types
02:07 -
Fixing Data Entry Errors
02:32 -
Consistency in Categorical Data
01:13 -
Standardizing Data Formats
02:20 -
Data Types conversion
01:00 -
Data Validation and Verification
01:09 -
Handling Categorical Variables
02:00 -
Project 1 : Bank Transactions
22:48 -
Project 2 : Customer Purchases
23:13 -
Project 3 : Payroll analysis
16:01 -
Introduction to Database
02:19 -
Table
01:09 -
schema
00:41 -
Relational Database
01:05 -
Primary key and foreign key
02:47 -
Relationships between Tables
01:04 -
One-to-One
00:27 -
One-to-Many
00:28 -
Many-to-Many
00:22 -
RDBMS
00:55 -
SQL
00:52 -
Types of SQL commands
01:56 -
DDL – Data Definition Language
00:50 -
DQL – Data Query Language
00:47 -
DML – Data Manipulation Language
00:42 -
DCL – Data Control Language
00:38 -
TCL – Transaction Control Language
00:40 -
SQL Server Management Studio (SSMS)
01:06 -
Install SQL Server Developer and SQL Server Management Studio (SSMS)
10:38 -
SSMS Overview
03:38 -
Transportation database
06:40 -
SQL Server data types
06:40 -
SELECT
01:49 -
AND – OR conditions
04:01 -
ORDER BY
02:07 -
DISTINCT
00:42 -
Between
00:36 -
IN
01:41 -
LIKE
02:00 -
Introduction to Joins
02:17 -
Inner JOIN
04:05 -
Left JOIN
02:57 -
Right JOIN
01:21 -
Full JOIN
01:51 -
Aggregation functions and GROUP BY
03:15 -
COUNT
00:47 -
SUM
00:59 -
AVG
00:28 -
MAX
00:25 -
MIN
00:17 -
GROUP BY
01:18 -
HAVING
01:08 -
String Functions
03:04 -
Date Functions
05:32 -
Conversion Functions
01:49 -
Subquery
00:40 -
Noncorrelated Subqueries
01:44 -
Correlated Subqueries
01:13 -
Window functions
00:50 -
ROW_NUMBER()
02:02 -
RANK()
01:56 -
DENSE_RANK()
01:39 -
CASE statement
02:09 -
A Common Table Expression
01:22 -
Project : Transportation Analysis project
23:53
Hadoop Production Deployment & Cluster Setup
-
Why do we develop something
03:34 -
Challenges in Data Processing Before Hadoop
02:38 -
Hadoop History
12:38 -
Introduction to Hadoop
16:31 -
Hadoop Ecosystem Overview
15:40 -
Hadoop Architecture-part1
10:40 -
Hadoop Architecture-part2
03:07 -
Hadoop Architecture-part3
05:54 -
Hadoop Architecture-part4
01:13 -
Hadoop Architecture-part5
19:52 -
HDFS Architecture – NameNode
05:48 -
HDFS Architecture – DataNode
01:00 -
Read Operation and Write Operation
02:44 -
Block and Replication
02:21 -
Secondary NameNode Role
03:03 -
HDFS Block Storage
04:05 -
Fault Tolerance in HDFS
03:20 -
Data Locality Concept
03:27 -
Hadoop Installation (virtualbox) + Services + HDFS Commands Practice
08:30 -
Start Hadoop Services
05:06 -
jps command
00:59 -
Stop Hadoop Services
01:36 -
NameNode UI (HDFS Web Interface)
04:27 -
Basic Navigation Commands
07:16 -
File Upload & Download Commands
08:57 -
File Viewing Commands
03:32 -
File & Directory Management
05:48 -
System & Advanced Commands
02:41 -
HDFS High Availability & Rack Awareness Architecture
22:25 -
Apache ZooKeeper: Architecture, Coordination & Distributed Leadership in Hadoop
04:04 -
Project 1 : HDFS Small Files Optimization
06:18 -
Project 2 : HDFS Block Size & Replication
05:57 -
Project 3 : HDFS File Format & Compression Optimization
08:27
Enterprise Data Engineering with Apache Spark
-
Challenges in Big Data Before Apache Spark and Understanding Apache Spark
20:02 -
Differences Between Spark 2.x and Spark 1.x
05:17 -
Apache Spark 2.x Architecture
12:41 -
Fault Tolerance & Scalability: Hadoop vs Spark
06:05 -
Spark setup
03:42 -
PySpark Setup
01:41 -
Spark UI
01:59 -
Spark UI – Test Spark Job
05:39 -
Jobs details
11:32 -
Stage details – 0
11:23 -
Stage details – 1
09:38 -
Stage details – 2
03:32 -
Stage details – 3
07:36 -
Stage details – 4
06:34 -
Executors Tab
00:58 -
Environment Tab
00:54 -
Storage Tab
01:08 -
SQL Tab and SQL metrics
00:59 -
Structured Streaming Tab
02:02 -
JDBC/ODBC Server Tab
01:17 -
Shuffle read and Shuffle write in Spark
12:12 -
PySpark RDD API and RDD (Resilient Distributed Dataset)
05:10 -
RDD Transformations
08:30 -
RDD Actions
02:34 -
Run (Spark Job Execution Flow)
00:54 -
DAG (Directed Acyclic Graph)
00:47 -
Stages and Tasks
01:09 -
Partitioning Strategy
01:35 -
Caching & Persisting
01:18 -
Performance Tuning Basics
02:20 -
PySpark DataFrame API
02:34 -
DataFrame API
01:08 -
Dataset API
01:03 -
Schema Inference
11:51 -
Schema Enforcement
02:25 -
Column Operations
30:15 -
Filtering & Aggregation
11:08 -
Normal join – Broadcast join – production problem
17:36 -
Handling Null Values
06:46 -
UDF (User Defined Functions)
12:14 -
Window Functions in pyspark
26:33 -
PySpark SQL and Spark SQL
11:12 -
Spark Streaming and Structured Streaming
01:27 -
Streaming Sources (Kafka, Files)
01:55 -
Window Operations
04:59 -
Late data handling (watermark)
02:03 -
Stateful Streaming + Final Streaming Pipeline Architecture
09:00 -
Memory Management
03:53 -
Executor Memory Structure
02:48 -
JVM Heap Memory in Spark
02:15 -
Memory for Execution vs Storage
00:48 -
Unified Memory Management
01:25 -
Off-Heap Memory
01:37 -
Garbage Collection (GC) Impact
02:16 -
Partition Tuning
02:29 -
Reduce Shuffle Operations
01:20 -
Broadcast Join Optimization
00:58 -
Avoid Wide Transformations
00:48 -
Data Skew Handling
01:57 -
File Format Optimization (Parquet, ORC)
01:15 -
Predicate Pushdown
01:04 -
Column Pruning
00:43 -
Project 1: DataFrame API Performance Optimization Project
12:40 -
Project 2: Spark SQL Query Optimization Project
07:27 -
Project 3: Spark Memory Tuning & Resource Optimization
08:13 -
Project 4: Shuffle Handling & Optimization in Spark
04:30 -
Project 5: Data Skew Detection & Mitigation
03:11 -
Project 6: Broadcast Join Optimization
17:36
Introduction to Hive and Sqoop
-
Introduction to Hive
01:53 -
Hive Architecture
06:11 -
Hive Data Model
01:06 -
Hive Query Language (HQL)
00:32 -
Data Types in Hive
01:03 -
DAG
04:02 -
install virtualbox on windows
01:59 -
install putty software on windows
01:09 -
install winscp on windows
01:21 -
Install Cloudera and Setting Up Hadoop
09:20 -
Hive Installation
01:50 -
Create Database
02:30 -
Creating and Managing Tables in Hive
07:23 -
Loading Data into Hive Tables
12:55 -
Managed Table in hive
02:07 -
External Table in hive
07:28 -
hive query -HQL
09:49 -
The partitioning in Hive
00:50 -
Static Partitioning
04:08 -
Dynamic Partitioning
05:45 -
Hive Bucketing
09:52 -
HIVE JOINS
07:29 -
Hive Optimization Techniques
01:55 -
Apache Sqoop
02:21 -
Sqoop Architecture
00:53 -
Key Features of Sqoop
02:20 -
Sqoop Connectors
01:22 -
Sqoop Commands Overview
00:56 -
Sqoop Installation
00:36 -
MySQL Database
02:32 -
Importing Data from RDBMS to HDFS
06:22 -
Exporting Data from HDFS to RDBMS
09:23 -
Adding more mappers to a Sqoop job
04:21 -
handling portions of data with Sqoop
06:29 -
Incremental Data Import in Sqoop
15:59 -
Data Compression with Sqoop
04:20 -
Avro format
03:53 -
SequenceFile format
03:16 -
Parquet format
02:46 -
Create sqoop job
04:14 -
Sqoop Performance Optimization
01:00
Kafka: From Zero to Production
-
Starting the Kafka Journey – part1
24:05 -
Starting the Kafka Journey – part2
12:26 -
Starting the Kafka Journey – part3
27:39 -
Starting the Kafka Journey – part4
18:30 -
Starting the Kafka Journey – part5
07:08 -
Starting the Kafka Journey – part6
47:43 -
What is Kafka
01:36 -
Event Streaming Concept
01:11 -
Kafka vs traditional messaging systems
02:44 -
Kafka ecosystem overview
01:57 -
Real-time Data Pipelines
00:39 -
Log Aggregation
00:39 -
Streaming Analytics
00:41 -
Event-driven Microservices
01:56 -
CDC (Change Data Capture)
01:01 -
Distributed event streaming platform
00:52 -
Publish/subscribe messaging
01:31 -
Durable commit log
00:42 -
Replayable events
00:58 -
High throughput architecture
00:44 -
Horizontal scalability
00:50 -
Zookeeper mode and KRaft mode
04:34 -
Lab 1 : Install Kafka
09:47 -
Lab 1 : Start broker
01:22 -
Lab 1 : Create topic
06:18 -
Lab 1 : Produce & consume messages
03:22 -
Lab 1 : Inspect logs
02:46 -
Kafka Architecture Overview
01:47 -
Producers
01:00 -
Consumers
01:24 -
Topics
00:45 -
Partitions
00:40 -
Brokers
00:51 -
Clusters
00:47 -
Kafka Log Structure
00:43 -
Event flow (routing)
01:08 -
Append-only log design
00:43 -
Partition-based scalability
01:02 -
Offset indexing
00:45 -
Distributed storage model
01:01 -
Event routing
00:56 -
Lab 2 : Create multi-partition topics
02:52 -
Lab 2 : Observe partition distribution
01:30 -
Lab 2 : Test message ordering
03:36 -
Lab 2 : Send keyed messages
03:49 -
Lab 2 : Explore broker storage directories
02:29 -
Broker Internals
01:49 -
Partitions and Replication Factor
01:19 -
Leader/Follower Model
01:15 -
ISR (In-Sync Replicas)
01:19 -
Leader Election
01:48 -
Fault Tolerance
01:19 -
Offset Concept
01:03 -
Ordering Guarantees
00:58 -
Durability Model
01:24 -
Data Consistency Model
01:01 -
Throughput vs Latency
01:07 -
Retention Policies
01:25 -
Log Segments
01:07 -
Replayability
01:17 -
Replicated partitions
01:02 -
Automatic failover
00:46 -
High durability storage
00:47 -
Segment-based logs
00:31 -
Time-based retention
00:42 -
Size-based retention
00:50 -
Lab 3 : Kafka Replication, ISR & Broker Failure
09:45 -
Producer API
02:14 -
Consumer API and Consumer Groups
01:22 -
Offset Commit Strategies
01:15 -
Rebalancing
01:40 -
Delivery Semantics
01:24 -
Idempotent Producer
01:53 -
Message Keys and Partitioning Logic
01:02 -
Batching and Compression
01:10 -
Retry Mechanisms – Error Handling – Dead Letter Queue (DLQ)
01:20 -
Parallel consumption
00:27 -
Offset management
00:47 -
Duplicate prevention
00:56 -
Exactly-once semantics
00:52 -
Lab 4 : Kafka Consumer Groups, Consumer lag – Rebalancing
11:27 -
Cluster Setup
01:14 -
Broker Configuration
01:21 -
Adding Brokers
01:12 -
Removing Brokers
00:59 -
Broker Replacement
00:59 -
Rebalancing Partitions
01:07 -
Rolling Restart
01:04 -
Kafka Upgrades
00:49 -
High Availability
00:57 -
ZooKeeper vs KRaft
00:51 -
Cluster Scaling
01:00 -
Network Tuning
00:40 -
Hardware Planning
00:47 -
Storage Planning
00:57 -
Failure Recovery
00:51 -
Multi-broker architecture
00:52 -
Partition reassignment
00:45 -
Zero-downtime upgrades
00:42 -
Cluster balancing
00:45 -
Lab 5 : Kafka Add/Remove Brokers, Scaling & Recovery
09:58 -
Retention Policies
02:42 -
Time-based Retention
01:04 -
Size-based Retention
01:12 -
Log Compaction
01:47 -
Segment Management
00:55 -
Offset Retention
00:47 -
Replication Tuning
01:24 -
Disaster Recovery
01:10 -
Data Durability Guarantees
01:24 -
Lab 6 : Configure retention rules
10:13 -
Kafka Security
09:52 -
Monitoring & Observability
07:09 -
Performance Tuning
07:09 -
Stream Processing
04:57 -
Kafka Integration Ecosystem
04:53 -
Project 1 : End-to-End Big Data Streaming Platform : Kafka – Spark – HDFS
46:09 -
Project 2 : Real-Time Data Pipeline using Kafka, Spark & Snowflake
47:00 -
Project 3 : End-to-End Big Data Streaming Platform with Apache Kafka, Apache Spark, PostgreSQL & Grafana
35:19 -
Project 4 : End-to-End CDC Pipeline with Apache Kafka, Debezium & PostgreSQL
58:07
Snowflake and dbt: Zero to Production Data Engineering
-
Cloud Data Warehousing Essentials
04:21 -
Getting Started with Snowflake Cloud
19:12 -
Snowflake as a SaaS Platform
03:08 -
Snowflake Account & Core Building Blocks
03:47 -
Snowflake Architecture & Execution Model
11:14 -
Databases & Table Structures in Snowflake
15:35 -
Time Travel & Data Recovery System
09:31 -
Schemas & Session Context Management
17:49 -
Data Integrity & Data Types
08:01 -
Zero-Copy Cloning & Data Replication
07:33 -
Stored Procedures & Automation Logic
05:10 -
Security, Roles & Access Control
13:35 -
Transactions & Data Consistency
06:55 -
Streams & Data Change Tracking
08:15 -
Task Automation & Workflow Scheduling
08:27 -
Automated data partitioning & incremental loading using snowflake tasks
07:23 -
Incremental load using snowflake tasks
08:50 -
SnowSQL & Command Line Operations
02:44 -
Snowflake COPY INTO Command for Data Loading and Unloading
06:11 -
External Storage
01:05 -
BI Integration with Power BI
03:53 -
Introduction to Modern Data Transformation
17:22 -
Data Modeling with DBT
07:37 -
Dynamic SQL with Jinja
05:21 -
Testing & Data Documentation
04:34 -
Seeds & Data Sources
07:52 -
Deployment & CI/CD Pipelines
05:44 -
DBT Best Practices & Optimization
05:58 -
Hooks & Workflow Extensions
05:48 -
Snapshots & Historical Tracking
06:40 -
DBT Packages & Ecosystem Extensions
01:32 -
Environment Setup & Prerequisites
05:40 -
Building the Snowflake Data Warehouse
06:33 -
Initializing the dbt Project
13:05 -
Configuring Snowflake Connection – Connecting dbt to Snowflake
02:03 -
Building Source Definitions in dbt
02:55 -
Creating Staging Models for Data Cleaning
05:04 -
Implementing Business Logic with Intermediate Models
00:45 -
Building Analytics-Ready Data Marts
01:14 -
Configuring dbt Project Settings
10:41 -
Running dbt Pipelines
04:49 -
Check Data Warehouse
05:28 -
Generate dbt Documentation
01:59 -
Building Final Retail KPI Reports
01:20 -
Additional Session 1 : DBT
01:19:35 -
Additional Session 2 : DBT
01:09:22 -
Project : Real-Time Analytics Pipeline with Kafka, Spark & Snowflake
47:00
Apache Airflow: From Basics to Production
-
What is Apache Airflow and Airflow architecture overview
15:13 -
Airflow Setup & Installation + lab
09:54 -
A DAG (Directed Acyclic Graph) + Lab
09:01 -
DAG structure in Airflow + lab
09:19 -
DAG parameters (start_date, schedule) + lab
01:56 -
Task definition basics + lab
01:58 -
Dependencies concept + lab
09:52 -
Scheduling
00:57 -
Cron expressions
01:58 -
Airflow scheduling system
01:12 -
Catchup vs no catchup
01:43 -
Backfill & Catchup
01:43 -
Timezones in scheduling
01:07 -
Manual vs automatic triggers
01:04 -
Scheduling + lab
08:45 -
Task Dependencies + Lab
05:15 -
Parallel tasks + Lab
02:51 -
Diamond Dependency + Lab
03:53 -
Branching workflows + Lab
04:38 -
Trigger rules + Lab
06:46 -
Data passing, XCom, and the push & pull mechanism
07:10 -
Variables in Airflow
02:46 -
Connections concept
05:31 -
Sensors (advanced use)
01:56 -
HTTP Operator
01:01 -
SQL Operator
01:27 -
Email alerts setup
34:29 -
Project 1 : End-to-End Real-Time Data Pipeline using Kafka, Spark, Hadoop & Airflow
09:42 -
Project 2 : End-to-End Data Pipeline with Kafka, Spark, HDFS, PostgreSQL & Airflow
01:02:43 -
Project 3 : End-to-End Data Engineering Platform with Hadoop, Spark, Airflow & dbt
41:05
Modern Data Lakehouse with Apache Iceberg
-
Introduction to Iceberg
11:27 -
Why Iceberg Was Created
07:14 -
Iceberg Architecture
12:44 -
Install Spark + enable Iceberg catalog and create first table
04:20 -
Building Your First Apache Iceberg Table with Spark and HDFS Catalog + Lab
10:19 -
Understanding Table Metadata
05:46 -
Iceberg Table Internals : Metadata JSON, Snapshots, and Version History + Lab
16:05 -
Exploring Iceberg Manifests, Snapshots, and File-Level Metadata + Lab
11:57 -
Hive Table Architecture and Partition Storage Analysis + Lab
10:36 -
Schema evolution + column IDs + backward/forward compatibility + snapshots + hidden partitioning + metadata inspection + Lab
13:11 -
Project : Event-Driven Lakehouse Pipeline: Kafka Ingestion, Spark Processing, Iceberg Storage
10:19
Building Real-Time Data Pipelines with Apache Flink
-
Introduction to Apache Flink
06:44 -
Real time vs Batch Processing
03:00 -
Dataflow Model
03:54 -
Sources (Data Input)
03:39 -
Transformations
01:35 -
Sinks (Data Output)
01:47 -
Parallelism (Distributed Processing)
02:53 -
Stateful Processing
08:15 -
Checkpointing (Fault Tolerance)
06:41 -
Event Time + Watermarks
09:30 -
Fault Tolerance + Recovery
02:06 -
Project : Building Real-Time Data Pipelines Using Kafka, Apache Flink & Flink SQL
05:17
End-to-End Data Flow Engineering with Apache NiFi
-
Introduction to Apache NiFi
03:53 -
How Apache NiFi handles big data challenges
01:33 -
The change from ETL to data streaming and how Apache NiFi fits in
01:32 -
Apache NiFi’s key features
02:30 -
How Apache NiFi Addresses Big Data Integration Challenges
00:50 -
When NiFi Might Not Be Ideal
01:10 -
NiFi and Big data tools
02:04 -
Flow-based programming (FBP)
02:36 -
Apache NiFi’s Main Components
02:42 -
NiFi GUI
01:25 -
Configuring and Tuning Apache NiFi
04:17 -
Scaling NiFi
00:54 -
Cluster Terminologies
01:18 -
Load Balancing Strategies
01:13 -
Benefits of Clustering
00:37 -
Case 1: Real-Time Operational Intelligence System
00:49 -
Case 2: Enterprise CDC Data Integration with Apache NiFi
03:52 -
Case 3: Data Governance & GDPR-Compliant Data Platform
02:13
Data Warehouse Design & Implementation
-
Evolution: OLTP → DW → Data Lake → Lakehouse
02:03 -
OLTP (Online Transaction Processing)
06:57 -
Data Warehouse (DW)
01:34 -
Why Do We Need a Data Warehouse
03:10 -
From OLTP to DW – ETL/ELT Pipelines
01:41 -
Data Modelling – Star Schema
08:13 -
NORMALIZED (OLTP style – 3NF)
00:54 -
DENORMALIZED (Star Schema – Data Warehouse)
00:49 -
Normalize to write fast. Denormalize to read fast.
01:07 -
Historical Data and Time-Based Analysis
04:55 -
Why OLTP is BAD for Historical Analysis
00:57 -
Slowly Changing Dimensions (SCD)
02:22 -
Enables historical accuracy in reports.
02:49 -
Typical Data Warehouse Workloads
00:51 -
Enterprise Data Warehouse Technologies
01:06 -
OLTP vs Data Warehouse: Limitations of Data Warehouses
00:51 -
Data Lake
02:33 -
Schema-on-Read
01:02 -
Data Lake Architecture
00:48 -
Data Zones in Data Lake
01:25 -
Use Cases for Data Lakes
00:37 -
Strengths and Weaknesses of Data Lakes
01:33 -
OLTP → DW → Data Lake
01:09 -
Lakehouse architecture
08:54 -
Enterprise Data Warehouse
04:28 -
Inmon vs Kimball vs Data Vault
05:14 -
Enterprise Analytical Requirements
03:43 -
Subject Areas & Conformed Dimensions
03:39 -
Data Domains & Ownership
03:33 -
Enterprise Bus Matrix
01:55 -
Advanced topics
14:26 -
Project 0 : Data warehouse for an airline sales analysis system
01:15 -
Project Step 1 – Business Understanding & Project Objectives
01:14 -
Project Step 2 – Airline Data Collection & CSV Preparation
08:51 -
Project Step 3 – HDFS Storage Layer Setup
01:10 -
Project Step 4 – Snowflake Data Warehouse Configuration
08:17 -
Project Step 5 – Staging Tables & Data Ingestion
06:53 -
Project Step 6 – Fact & Dimension Tables
04:40 -
Project Step 7 – ETL Pipeline Development
16:22 -
Project Step 8 – KPI Analytics, SQL Views & Stored Procedures
02:11 -
Project Step 9 – Managing DIM_CUSTOMER (SCD Type 2) History
05:50 -
Project Step 10 – Power BI Dashboard & Business Reporting
01:43
Distributed NoSQL with Apache Cassandra
-
Apache Cassandra
01:58 -
Distributed and Decentralized (Peer-to-Peer)
03:41 -
Cassandra architecture
03:07 -
Partitioning & Data Distribution
05:52 -
Replication
02:31 -
Tunable Consistency
04:24 -
Project : Distributed Real-Time Analytics System with Kafka, Cassandra & Airflow
45:40
What you will learn:
- Design and build scalable End-to-End Data Pipelines
- Develop batch and real-time data processing systems
- Implement modern Data Engineering architectures in production environments
- Build and manage Data Warehouses and ETL workflows
- Work with tools such as Python, SQL, Spark, Kafka, Airflow, DBT, and Snowflake
- Process large-scale data using distributed systems like Hadoop and Spark
- Apply best practices in data quality, governance, and monitoring
- Troubleshoot real-world issues such as pipeline failures and data inconsistencies
- Optimize data workflows for performance and scalability
- Build a strong, job-ready Data Engineering portfolio through real projects
Course requirements:
- A laptop with a stable internet connection
- Commitment to practice and complete projects
This course includes:
- Access to recorded sessions
- Live coaching and mentoring sessions
- Hands-on, production-level projects
- Pre-configured technical environment
- Real-world datasets and case studies
- Data pipeline and architecture templates
- Interview preparation resources
- Ongoing technical support
Instructor:
Eng Mohammed
Big Data Engineer and Data Consultant @ ISD Company