Real-Time Forecasting at Scale using Delta Lake and Delta Caching

Databricks
GumGum receives around 30 billion programmatic inventory impressions each day, amounting to 25 TB of data. An inventory impression is the real estate available for showing potential ads on a publisher page. By generating near-real-time inventory forecasts based on campaign-specific targeting rules, GumGum enables account managers to set up successful future campaigns. This talk highlights the data pipelines and architecture that help the company achieve a forecast response time of less than 30 seconds at this scale. Spark jobs efficiently sample the inventory impressions using AMIND sampling and write the samples to Delta Lake. We will discuss best practices and techniques for making efficient use of Delta Lake. GumGum caches the data on the cluster using Databricks Delta caching, which accelerates reads and reduces IO time as much as possible; the talk details the advantages of Delta caching over conventional Spark caching. We will also cover how GumGum enables time series forecasting with zero downtime for end users, using auto ARIMA and sinusoids that capture the trends in the inventory data. In detail, the talk covers AMIND sampling, Delta Lake for storing the sampled data, Databricks Delta caching for efficient reads and cluster use, and time series forecasting.
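
As a rough illustration of the "sinusoids" part of the forecasting step, the sketch below fits a linear trend plus a single sinusoid of known period to a daily series by ordinary least squares, then extrapolates it. This is not GumGum's model: the weekly period, the function names (`fit_sinusoid`, `forecast`), and the synthetic data are all assumptions made for the example, and the auto ARIMA component mentioned in the talk is not shown.

```python
import math

def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def features(t, period):
    """Regressors at time step t: intercept, trend, sine, cosine."""
    w = 2.0 * math.pi * t / period
    return [1.0, float(t), math.sin(w), math.cos(w)]

def fit_sinusoid(series, period):
    """Least-squares fit of y = a + b*t + c*sin(wt) + d*cos(wt)."""
    rows = [features(t, period) for t in range(len(series))]
    k = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, series)) for i in range(k)]
    return solve_linear(xtx, xty)  # normal equations

def forecast(coeffs, start, horizon, period):
    """Evaluate the fitted curve at steps start .. start + horizon - 1."""
    return [
        sum(c * f for c, f in zip(coeffs, features(t, period)))
        for t in range(start, start + horizon)
    ]

# Hypothetical data: four weeks of daily impression counts (in millions)
# with a mild upward trend and a weekly cycle.
history = [100 + 0.5 * t + 10 * math.sin(2 * math.pi * t / 7) for t in range(28)]
coeffs = fit_sinusoid(history, period=7)
next_week = forecast(coeffs, start=len(history), horizon=7, period=7)
```

Because the period is fixed in advance, the model is linear in its coefficients and needs no iterative optimizer, which keeps refits cheap. A real series would usually need several periods (daily, weekly, yearly) and a separate model for the residuals.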

About:
Databricks provides a unified data analytics platform, powered by Apache Spark™, that accelerates innovation by unifying data science, engineering and business.
Read more here: https://databricks.com/product/unifie...

Connect with us:
Website: https://databricks.com
Facebook: databricksinc
Twitter: databricks
LinkedIn: databricks
Instagram: databricksinc

Databricks is proud to announce that Gartner has named us a Leader in both the 2021 Magic Quadrant for Cloud Database Management Systems and the 2021 Magic Quadrant for Data Science and Machine Learning Platforms. Download the reports here: https://databricks.com/databricks-nam...
Published 4 years ago, on 1399/05/14 (Solar Hijri calendar).
3,138 views