Dataeng
Repository for the Data Engineering Course
Data Engineering:
Repository for the Data Engineering Course (LTAT.02.007)
<img src="https://upload.wikimedia.org/wikipedia/en/3/39/Tartu_%C3%9Clikool_logo.svg" width="250"><img src="./attachments/logo_dsg_vettoriale.png" width="250">
Graph View

Lecturer: Riccardo Tommasini, PhD
Teaching Assistants:
Home Page
Forum
Moodle
Acknowledgments
Special thanks to Emanuele Della Valle and Marco Brambilla from Politecnico di Milano for letting me "steal" some of their great slides.
Overview
- Data Engineer
- [[Data Lifecycle]]
- Data Collection (Taught not tested)
- Data Processing
- [[Data Analysis]] (Extra Points - Suggestion for Data Science Project in Spring)
- Data Processing (from ETL to Data Pipelines, Introduce Big Data)
- Data Ingestion (Files, HDFS, MongoDB)
- [[Cleansing | Data Pre-Processing ]] (Python)
- Data Transformation (Airflow)
- [[Data Modeling]]
- [[ systems/Apache Hadoop | Parallel Processing ]]
- [[Data Serving]]
- [[Data Visualisation]]
- [[Querying]]
- Streaming Data Pipelines
- [[Apache Kafka | Streaming Data Ingestion ]]
- [[Streaming Data Pre-Processing | Cleansing]] (Java)
- Streaming Data Transformation (Java/SQL)
- [[Event Sourcing | Data Modelling for Data Streams]]
Course Goal: Build a data pipeline about Internet memes
Lectures
| Date | Title | Material | Mandatory Reads | Extras |
|-------|--------------------|----------|-----------------|--------|
| 01/09 | Course Intro | Slides - pdf slide 45-109 | | |
| 03/09 | Data Modeling | Slides - pdf slide 1-44 | Chp 4 p111-127, Chp 5 p151-156, Chp 6 p199-205 of [3] | |
| 10/09 | DM for Relational Databases | Slides - pdf slide 45-109 | Chp 2, 6, and 7 (Normal Forms) of [1] | Relational Model |
| 10/09 | DM for Data Warehouse | Slides - pdf slide 109-118 | pdf video | Chp 2 of [2] |
| 17/09 | DM for Big Data | Slides - pdf| Chp 2 of [3], video|paper|
| 17/09 | Key Value Stores | Slides 1 Slides 2 pdf | | nosql |
| 24/09 | Column Oriented Databases | Slides 1 Slides 2 pdf | | nosql |
| 24/09 | Document Databases | Slides 1 Slides 2 pdf | | nosql |
| 01/10| Graph Databases |Slides 1 Slides 2 pdf1 pdf2|Chp 3 and 5 of [5]|book|
| 08/10| Data Ingestion |Slides 1 Slide 2 Slide 3 Slide 4|||
| 15/10| Part 1 Recap |Slides 1 pdf|||
| 22/10 | Midterm | | | |
| 29/10| Data Engineering Pipelines (Part1) |Slides 1 slide 2 pdf|||
| 05/11 | Data Engineering Pipelines (Part2) | Slides 1 Slides 2 Slides 3 | Chp 10 of [3], R. Chang Pt 2, R. Chang Pt 3 | |
| 12/11 | Streaming Data (Part 1) | Slide 1 Slide 2 | Chp 11 of [3], Streaming 101, Streaming 102 | |
| 19/11| Data Journey|Slides|||
| 26/11| Streaming Data (Part 2) |Slide 1 Slide 2|||
| 03/12| Data Wrangling (Part 1) |pdf|||
| 10/12| Data Wrangling (Part 2) |pdf|||
Practices (Videos Will be Available after Group 2 issue)
| Date | Title | Material | Reads | Videos | Branch | Notes |
|----------|-------------|----------|-------|-------|-------|----|
| 07-8/09 | Docker | Slides - | | Video GP1 Video GP2 | Lab Branch | QA GP2 only |
| 14-15/09 | Modeling and Querying Relational Data with Postgres | Slides | Chp 32 of [1]§ | Video | Homework 1 | |
| 21-22/09 | Modeling and Querying Key Value Data with Redis | Slides | | Video | Homework 2 | |
| 28-29/09 | Modeling and Querying Document Data with MongoDB | Slides | | Video | Homework 3 | |
| 5-6/10 | Modeling and Querying Graph Data with Neo4J | Slides | CypherManual | Video | Homework 4 | |
| 19-20-26-27/10 | Data Ingestion with Apache Kafka | Slides | | Video 1 Video 2 [Video 3](https://panopto.ut.ee/Panopto/Pages/Viewer.aspx?id=b7073df1-19cf
