Spark Basics | Shuffling

Palantir Developers
Spark is a distributed computing system used within Foundry to run data transformations at scale. This series covers the core Spark concepts you need for working with data in Foundry. This video builds on an understanding of data partitions (covered in the earlier video, Spark Basics | Partitions) to introduce shuffling, the process of rearranging data across partitions, and demonstrates how minimizing shuffling in a job can reduce compute costs.
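To make the idea of shuffling concrete, here is a minimal sketch in plain Python rather than Spark itself. It models partitions as lists of (key, value) rows and redistributes them so that all rows with the same key end up in the same partition. The function names (target_partition, shuffle_by_key) and the CRC32-based placement rule are assumptions chosen for illustration; they are not Spark's or Foundry's actual implementation.

```python
import zlib


def target_partition(key: str, num_partitions: int) -> int:
    """Pick a destination partition for a key (illustrative rule, not Spark's)."""
    # CRC32 is used because it is deterministic across runs,
    # unlike Python's built-in hash() for strings.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def shuffle_by_key(partitions, num_partitions):
    """Redistribute rows so each key lives in exactly one partition.

    Returns the new partitions and the number of rows that had to move
    to a partition index different from the one they started in.
    """
    new_partitions = [[] for _ in range(num_partitions)]
    rows_moved = 0
    for source_index, rows in enumerate(partitions):
        for key, value in rows:
            dest = target_partition(key, num_partitions)
            if dest != source_index:
                rows_moved += 1  # in a real cluster this row would cross the network
            new_partitions[dest].append((key, value))
    return new_partitions, rows_moved


# Hypothetical input: two partitions with keys scattered across both.
partitions = [
    [("apples", 3), ("pears", 1), ("plums", 7)],
    [("pears", 4), ("apples", 2), ("plums", 5)],
]

shuffled, moved_first = shuffle_by_key(partitions, num_partitions=2)
print("rows moved by first shuffle:", moved_first)

# Data that is already laid out by key needs no further movement.
_, moved_second = shuffle_by_key(shuffled, num_partitions=2)
print("rows moved by second shuffle:", moved_second)
```

In this toy model, repeating the shuffle on data that is already arranged by key moves no rows. That is the intuition behind reducing cost by minimizing shuffling: rows that are already where an operation needs them do not have to be moved again.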
Published 2 years ago, on 1401/07/26.
Viewed 13,554 times