👨‍💻 Here is our third week of Learning in Public with dlt, an open-source Python library for building real-world data pipelines!
This time, we focused on a common and practical use case: loading local CSV files into DuckDB using dlt.
Step by step, we explored how to build a robust pipeline from modular, production-ready components.
💡 This week's highlights
✅ Load multiple CSVs easily with dlt's filesystem() source
✅ Use @dlt.transformer for chunked processing of large files
✅ Apply smart incremental loading based on modification_date
✅ Merge datasets into DuckDB with schema and primary-key awareness
✨ Why it matters:
Even though this example uses local files, the exact same pattern works for cloud sources like AWS S3, GCS, or remote file systems: you only tweak the config. We're learning tools that scale!
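For example, pointing the same pipeline at S3 instead of a local folder is a matter of configuration. A hedged sketch of what that could look like in dlt's `config.toml` / `secrets.toml` (bucket name and credentials are placeholders, not real values):

```toml
# .dlt/config.toml — swap the local path for a cloud bucket URL
[sources.filesystem]
bucket_url = "s3://my-bucket/data"  # hypothetical bucket

# .dlt/secrets.toml — credentials for the cloud provider
[sources.filesystem.credentials]
aws_access_key_id = "..."      # placeholder
aws_secret_access_key = "..."  # placeholder
```

The Python code itself stays unchanged; dlt resolves the bucket URL and credentials from config at run time.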
📂 GitHub Repo (Week 3):
https://github.com/1997mahadi/Data-Engineering-with-dlt/tree/main/week-03_filesystem-to-duckdb
🔗 LinkedIn Post:
https://www.linkedin.com/posts/mahadi-nagassou-850a87254_dataengineering-python-dlt-activity-7364642175470493696-NmZN?utm_medium=ios_app&rcm=ACoAAD7Os_wBFAAF1g7fKTnQFhj7G6uWNYukvWw&utm_source=social_share_send&utm_campaign=copy_link
If you've been meaning to build a practical understanding of data engineering, this series is for you.
Let's keep learning and shipping together! 🚀