Data Engineering with DuckDB Tutorial | PySpark | SQL | Postgres | Python | ETL Data processing

Databracket
528 views - 2 months ago - #dataengineering
#dataengineering #etl #pyspark #python
Learn DuckDB: a superfast Python library that often outperforms Pandas and offers PySpark-style capabilities.

In this demo, we will see how to connect to a PostgreSQL database and query data.
We will also read CSV data for data analytics and data engineering.
We cover the PySpark transformations and actions and how DuckDB integrates Spark functionality, how to transform data and write it to a Postgres database, and how to install database and connectivity extensions from DuckDB's extension collection.
The result is an end-to-end ETL pipeline that connects to PostgreSQL, extracts and transforms data, and loads it back, all with a blazingly fast Python library written in C++.
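The CSV and Spark-style steps described above can be sketched roughly as follows. This is not the code from the video: the sample data, the `orders` view name, the column names, and both helper functions are made up for illustration. It assumes `duckdb` and `pandas` are installed, and because the Spark API in DuckDB is experimental, the methods available may differ between versions.

```python
import csv


def write_sample_csv(path: str) -> None:
    """Write a tiny, made-up orders file so the sketch has something to read."""
    rows = [
        ("name", "city", "amount"),
        ("Ana", "Lisbon", 120),
        ("Ben", "Oslo", 40),
        ("Chen", "Taipei", 300),
    ]
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)


def transform_csv(path: str):
    """Read a CSV through DuckDB, then transform it with the Spark-style API."""
    # Third-party imports are kept inside the function so the helper above
    # stays usable even where duckdb is not installed.
    import duckdb
    from duckdb.experimental.spark.sql import SparkSession
    from duckdb.experimental.spark.sql.functions import col, lit

    con = duckdb.connect()  # in-memory database

    # Expose the CSV file as a SQL view (hypothetical view name: orders).
    con.execute(f"CREATE VIEW orders AS SELECT * FROM read_csv_auto('{path}')")

    # Filter with plain SQL and convert the result to a Pandas DataFrame.
    pandas_df = con.sql("SELECT * FROM orders WHERE amount > 50").df()

    # Create a Spark-style session backed by DuckDB and load the Pandas data.
    spark = SparkSession.builder.getOrCreate()
    spark_df = spark.createDataFrame(pandas_df)

    # Spark-style transformations: filter rows, add a constant column, project.
    result = (
        spark_df.filter(col("amount") > 100)
        .withColumn("source", lit("csv"))
        .select("name", "amount", "source")
    )
    return result.toPandas()
```

Calling `write_sample_csv("orders.csv")` followed by `transform_csv("orders.csv")` should return a small Pandas DataFrame with the filtered rows and the added `source` column.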

Code is available here: https://gist.github.com/Databracket9/...

00:00 - Introduction
01:45 - How to securely read and use environment variables and secrets in Python using the ConfigParser library.
05:00 - How to install the Postgres extension and load it into DuckDB for connectivity and data analysis.
05:40 - Establishing connectivity with the PostgreSQL database using the connection string.
06:30 - Query SQL tables from Postgres
09:25 - How to read CSV files with DuckDB and load them as SQL views for data filtering.
12:20 - Import experimental PySpark functions to perform ETL data transformation.
14:00 - How to convert a DuckDB class object into a Pandas DataFrame.
14:18 - Create and instantiate a PySpark session.
14:35 - Convert a Pandas DataFrame into a PySpark DataFrame.
15:00 - PySpark transformations to filter and transform data.
18:35 - Write transformed data into PostgreSQL using the DuckDB connection.
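
The connection chapters in the list above (reading secrets with ConfigParser, installing the Postgres extension, connecting with a connection string, querying, and writing back) might look roughly like the sketch below. It is not the code from the video: the INI layout, the `postgres` section and its key names, the `pg` alias, the `public` schema, and the table names are all assumptions made for illustration.

```python
import configparser

# Hypothetical config contents; in practice this would live in a file that is
# kept out of version control and loaded with parser.read("config.ini").
SAMPLE_CONFIG = """
[postgres]
dbname = demo
user = demo_user
password = change_me
host = localhost
port = 5432
"""


def build_pg_dsn(config_text: str, section: str = "postgres") -> str:
    """Build a libpq-style connection string from INI-formatted text."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    cfg = parser[section]
    # Note: values containing spaces or quotes would need escaping.
    return (
        f"dbname={cfg['dbname']} user={cfg['user']} "
        f"password={cfg['password']} host={cfg['host']} port={cfg['port']}"
    )


def copy_table(dsn: str, source_table: str, target_table: str) -> None:
    """Query a Postgres table through DuckDB and write a copy back."""
    import duckdb  # third-party; imported here so the helper above has no extra dependency

    con = duckdb.connect()
    # Install and load the Postgres extension, then attach the database.
    con.execute("INSTALL postgres")
    con.execute("LOAD postgres")
    con.execute(f"ATTACH '{dsn}' AS pg (TYPE postgres)")

    # Preview a few rows from the source table.
    con.sql(f"SELECT * FROM pg.public.{source_table} LIMIT 5").show()

    # Write the data back to Postgres as a new table.
    con.execute(
        f"CREATE TABLE pg.public.{target_table} AS "
        f"SELECT * FROM pg.public.{source_table}"
    )
    con.close()
```

With a reachable Postgres instance, `copy_table(build_pg_dsn(SAMPLE_CONFIG), "orders", "orders_copy")` would exercise the full round trip; `build_pg_dsn` on its own needs only the standard library.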

LET'S CONNECT!
🐦 Gumroad➔ https://databracket.gumroad.com/
📖Medium ➔ Medium: jay-reddy
📲 Substack➔ https://databracket.substack.com
📰 LinkedIn ➔ LinkedIn: jayachandra-sekhar-reddy
💁Fiverr ➔ https://www.fiverr.com/jayreddy9

#pythonprogramming #postgresql #sql #database #cplusplusprogramming #bigdata #data #dataanalytics #dataanalysis
Published 2 months ago, on 1403/02/21 (Solar Hijri calendar).