Near Real Time Data Warehousing with Apache Spark and Delta Lake - Jasper Groot (Eventbrite)

Databricks
Keeping the data in a data warehouse timely is a challenge many of us face, often with no straightforward solution. By combining batch and streaming data pipelines, you can use the Delta Lake format to provide an enterprise data warehouse that refreshes at near real-time frequency. Delta Lake eases the ETL workload by enabling ACID transactions in a warehousing environment; coupled with Structured Streaming, it lets you achieve a low-latency data warehouse. In this talk, we'll cover how to use Delta Lake to reduce the latency of ingestion and storage for your data warehouse tables, and how to use Spark streaming to build the aggregations and tables that drive your data warehouse.
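
The pattern described above can be sketched roughly as follows. This is not the speaker's implementation, only a minimal illustration assuming PySpark with the delta-spark package installed, an append-only raw Delta table, and an aggregate Delta table that already exists. The paths, the column names (event_id, amount), and the helpers merge_condition, make_upsert and start_aggregation are all hypothetical.

```python
# Hypothetical locations; none of these come from the talk.
RAW_PATH = "/tmp/warehouse/raw_orders"        # append-only Delta table fed by ingestion
AGG_PATH = "/tmp/warehouse/orders_by_event"   # existing Delta table queried by analysts
CHECKPOINT = "/tmp/warehouse/_checkpoints/orders_by_event"


def merge_condition(keys):
    """Build the ON clause of a MERGE from a list of key columns."""
    return " AND ".join(f"t.{k} = s.{k}" for k in keys)


def make_upsert(spark, target_path, keys):
    """Return a foreachBatch handler that upserts each micro-batch into a Delta table."""
    condition = merge_condition(keys)

    def upsert(batch_df, batch_id):
        # Imported lazily so this module loads even where delta-spark is absent.
        from delta.tables import DeltaTable

        target = DeltaTable.forPath(spark, target_path)  # table must already exist
        (
            target.alias("t")
            .merge(batch_df.alias("s"), condition)
            .whenMatchedUpdateAll()
            .whenNotMatchedInsertAll()
            .execute()  # each MERGE commits as one ACID transaction
        )

    return upsert


def start_aggregation(spark):
    """Stream the raw table, aggregate per event, and upsert the changed rows."""
    from pyspark.sql import functions as F

    orders = spark.readStream.format("delta").load(RAW_PATH)
    totals = orders.groupBy("event_id").agg(
        F.count("*").alias("order_count"),
        F.sum("amount").alias("revenue"),
    )
    return (
        totals.writeStream
        .foreachBatch(make_upsert(spark, AGG_PATH, ["event_id"]))
        .outputMode("update")  # emit only the groups that changed in this batch
        .option("checkpointLocation", CHECKPOINT)
        .trigger(processingTime="1 minute")
        .start()
    )
```

Calling start_aggregation(spark) with a Delta-enabled SparkSession would start the query. The one-minute trigger is an arbitrary choice; shortening it trades more frequent commits for lower latency.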

About: Databricks provides a unified data analytics platform, powered by Apache Spark™, that accelerates innovation by unifying data science, engineering and business.
Read more here: https://databricks.com/product/unifie...

Connect with us:
Website: https://databricks.com
Facebook: databricksinc
Twitter: databricks
LinkedIn: databricks
Instagram: databricksinc

Databricks is proud to announce that Gartner has named us a Leader in both the 2021 Magic Quadrant for Cloud Database Management Systems and the 2021 Magic Quadrant for Data Science and Machine Learning Platforms. Download the reports here: https://databricks.com/databricks-nam...
Published 5 years ago, on 1398/07/30 (Solar Hijri calendar).
Viewed 12,193 times.