👨‍💻 Here is our third week of Learning in Public with dlt, an open-source Python library for building real-world data pipelines!
This time, we focused on a common and practical use case: loading local CSV files into DuckDB using dlt.
Step by step, we explored how to build a robust pipeline from modular, production-ready components.
💡 This week's highlights
✅ Load multiple CSVs easily with dlt's filesystem() source
✅ Use @dlt.transformer for chunked processing of large files
✅ Apply smart incremental loading based on modification_date
✅ Merge datasets into DuckDB with schema and primary-key awareness
✨ Why it matters:
Even though this example uses local files, the exact same pattern works for cloud sources like AWS S3, GCS, or remote file systems: you only tweak the config. We're learning tools that scale!
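For example, pointing the same pipeline at S3 instead of a local folder is a matter of configuration. A hedged sketch of what that could look like in dlt's `config.toml` / `secrets.toml` (bucket name and credentials are placeholders, not real values):

```toml
# .dlt/config.toml — swap the local path for a cloud bucket URL
[sources.filesystem]
bucket_url = "s3://my-bucket/data"  # hypothetical bucket

# .dlt/secrets.toml — credentials for the cloud provider
[sources.filesystem.credentials]
aws_access_key_id = "..."      # placeholder
aws_secret_access_key = "..."  # placeholder
```

The Python code itself stays unchanged; dlt resolves the bucket URL and credentials from config at run time.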
📂 GitHub Repo (Week 3):
https://github.com/1997mahadi/Data-Engineering-with-dlt/tree/main/week-03_filesystem-to-duckdb
🔗 LinkedIn Post:
https://www.linkedin.com/posts/mahadi-nagassou-850a87254_dataengineering-python-dlt-activity-7364642175470493696-NmZN?utm_medium=ios_app&rcm=ACoAAD7Os_wBFAAF1g7fKTnQFhj7G6uWNYukvWw&utm_source=social_share_send&utm_campaign=copy_link
If you've been meaning to build a practical understanding of data engineering, this series is for you.
Let's keep learning and shipping together! 🚀