Scaling Big Data Mining Infrastructure: The Twitter Experience

DataWorks Summit
The analytics platform at Twitter has experienced tremendous growth over the past few years in terms of size, complexity, number of users, and variety of use cases. In this talk, we'll discuss the evolution of our infrastructure and the development of capabilities for data mining on "big data". One important lesson is that successful big data mining in practice is about much more than what most academics would consider data mining: life "in the trenches" is occupied by much preparatory work that precedes the application of data mining algorithms, and that application is followed by substantial effort to turn preliminary models into robust solutions. In this context, we'll discuss two topics. First, schemas play an important role in helping data scientists understand petabyte-scale data stores, but they're insufficient to provide an overall "big picture" of the data available to generate insights. Second, we observe that a major challenge in building data analytics platforms stems from the heterogeneity of the various components that must be integrated into production workflows; we refer to this as "plumbing". We'll share our experiences as a case study, make recommendations for best practices, and point out opportunities for future work.
Published 11 years ago, on 1392/01/10 (Solar Hijri calendar).
Viewed 3,152 times.