Big Data Engineer Mock Interview | Questions on Data Skewness | Salting | Out of Memory Error

Published by Sumit Mittal on 1403/03/08 (Solar Hijri calendar; 28 May 2024)

7.7K views · 3 months ago

To enhance your career as a Cloud Data Engineer, check trendytech.in/?src=youtube&sub=mockdec for curated courses developed by me.

Want to master SQL? Learn SQL the right way through the most sought-after course - the SQL Champions Program! "An 8-week program designed to help you crack the interviews of top product-based companies by developing a thought process and an approach to solving an unseen problem."

Here is how you can register for the program:
Registration link (course access from India): rzp.io/l/SQLINR
Registration link (course access from outside India): rzp.io/l/SQLUSD

I have trained over 20,000 professionals in the field of Data Engineering in the last 5 years.

BIG DATA INTERVIEW SERIES
This mock interview series is launched as a community initiative under Data Engineers Club, aimed at aiding the community's growth and development.

Our highly experienced guest interviewer, Chandrali Sarkar (www.linkedin.com/in/chandrali-sarkar-4570a1102/), shares invaluable insights and practical guidance drawn from her extensive expertise in the Big Data domain.

Our expert guest interviewee, Soumya Ranjan Parida (www.linkedin.com/in/soumya-parida/), has an interesting approach to answering the interview questions on Apache Spark, SQL and Azure Cloud Services.

Links to the free SQL & Python series developed by me are given below:
SQL Playlist - SQL tutorial for everyone by Sumit Si...
Python Playlist - Complete Python By Sumit Mittal Sir

Don't miss out - subscribe to the channel for more such informative interviews and unlock the secrets to success in this thriving field!

Social Media Links:
LinkedIn - www.linkedin.com/in/bigdatabysumit/
Twitter - twitter.com/bigdatasumit
Instagram - www.instagram.com/bigdatabysumit/
Student Testimonials - trendytech.in/#testimonials

TIMESTAMPS: Questions Discussed
00:35 Introduction
01:40 Explain your project's end-to-end pipeline and overview.
03:17 What is the data source for your project?
03:36 Where does the data get ingested?
04:36 What types of data are being processed?
05:04 How do you capture incremental data in an OLTP environment?
07:52 What is the frequency and volume of the incoming data?
08:28 Which file formats have you worked with?
09:00 What is predicate pushdown?
10:14 What optimizations have you applied in Spark?
10:45 Define broadcast join.
11:10 List some transformations you've used in Spark.
11:27 Explain narrow and wide transformations.
12:03 What is the difference between reduceByKey and groupByKey?
12:56 Have you encountered "out of memory" errors in Spark? How did you resolve them?
14:22 How does salting help resolve an out-of-memory error?
14:46 What is data skewness?
15:22 Explain cache and persist in Spark.
16:57 What happens if both memory and disk are full?
17:40 When would you use coalesce and when repartition?
18:00 Provide a scenario where coalesce and repartition can be used.
18:38 Does repartition happen at the driver or the executor level?
19:30 What is the difference between the rank, dense rank, and row number functions?
22:06 Describe the internal process of submitting a Spark job.

Music track: Retro by Chill Pulse
Source: freetouse.com/music
Background Music for Video (Free)

Tags
#mockinterview #bigdata #career #dataengineering #data #datascience #dataanalysis #productbasedcompanies #interviewquestions #apachespark #google #interview #faang #companies #amazon #walmart #flipkart #microsoft #azure #databricks #jobs

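The reduceByKey vs groupByKey question (12:03) comes down to where values are combined. The sketch below is a plain-Python simulation, not Spark code: each inner list stands in for one partition, and the hypothetical helpers `group_by_key` and `reduce_by_key` count how many records would cross the shuffle. With `reduceByKey`, Spark combines values inside each partition before shuffling (a map-side combine); with `groupByKey`, every record is shuffled and all values for a key are gathered together, which is heavier on network and memory.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Each inner list stands in for one partition of (key, value) records.
Partitions = List[List[Tuple[str, int]]]


def group_by_key(partitions: Partitions) -> Tuple[Dict[str, List[int]], int]:
    """Shuffle every record as-is, then collect all values per key."""
    shuffled = [pair for part in partitions for pair in part]
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in shuffled:
        grouped[key].append(value)
    return dict(grouped), len(shuffled)


def reduce_by_key(
    partitions: Partitions, func: Callable[[int, int], int]
) -> Tuple[Dict[str, int], int]:
    """Combine values inside each partition first, then shuffle only the partial results."""
    shuffled: List[Tuple[str, int]] = []
    for part in partitions:
        local: Dict[str, int] = {}
        for key, value in part:  # map-side combine within one partition
            local[key] = func(local[key], value) if key in local else value
        shuffled.extend(local.items())
    result: Dict[str, int] = {}
    for key, value in shuffled:  # reduce side merges the per-partition partial results
        result[key] = func(result[key], value) if key in result else value
    return result, len(shuffled)


if __name__ == "__main__":
    data = [[("a", 1), ("b", 1), ("a", 1)], [("a", 1), ("a", 1), ("b", 1)]]
    print(group_by_key(data))                       # ({'a': [1, 1, 1, 1], 'b': [1, 1]}, 6)
    print(reduce_by_key(data, lambda x, y: x + y))  # ({'a': 4, 'b': 2}, 4)
```

Both paths give the same totals, but the reduce path shuffles 4 records instead of 6; on real data with many repeated keys the gap is far larger. In PySpark the corresponding RDD methods are `rdd.reduceByKey(func)` and `rdd.groupByKey()`.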

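Salting (14:22) and data skewness (14:46) are linked: when one key holds most of the rows, the single task that receives that key after a shuffle can run out of memory. The following is a minimal plain-Python sketch of salted two-stage aggregation, assuming a sum aggregate; the names `salt`, `unsalt`, `NUM_SALTS` and the `#` separator are hypothetical choices for illustration. A random suffix spreads the hot key over several sub-keys, partial sums are computed per sub-key, and a second pass strips the suffix and combines the partials.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

NUM_SALTS = 8  # hypothetical value; in practice tuned to how skewed the hot key is


def salt(key: str, rng: random.Random, num_salts: int = NUM_SALTS) -> str:
    """Append a random suffix so one hot key becomes up to num_salts distinct keys."""
    return f"{key}#{rng.randrange(num_salts)}"


def unsalt(salted_key: str) -> str:
    """Remove the suffix added by salt() to recover the original key."""
    return salted_key.rsplit("#", 1)[0]


def largest_group(keys: Iterable[str]) -> int:
    """Size of the biggest key group: a proxy for the heaviest task after a shuffle."""
    return max(Counter(keys).values())


def salted_sum(records: List[Tuple[str, int]], rng: random.Random) -> Dict[str, int]:
    """Two-stage aggregation: partial sums per salted key, then a final sum per real key."""
    partial: Dict[str, int] = defaultdict(int)
    for key, value in records:  # stage 1: the hot key is split across many salted keys
        partial[salt(key, rng)] += value
    final: Dict[str, int] = defaultdict(int)
    for salted_key, subtotal in partial.items():  # stage 2: at most NUM_SALTS inputs per key
        final[unsalt(salted_key)] += subtotal
    return dict(final)


if __name__ == "__main__":
    rng = random.Random(42)
    # Skewed input: 1,000 records for "hot", 5 records each for ten small keys.
    records = [("hot", 1)] * 1000 + [(f"k{i}", 1) for i in range(10) for _ in range(5)]
    print(largest_group(k for k, _ in records))             # 1000
    print(largest_group(salt(k, rng) for k, _ in records))  # much smaller than 1000
    print(salted_sum(records, rng)["hot"])                  # 1000
```

The final totals are unchanged; only the size of the largest group shrinks. In Spark the same idea is usually expressed by adding a random integer column (for example with `rand()`), aggregating on the key plus salt, then aggregating again on the original key. For a skewed join, the smaller side must also be replicated once per salt value so every salted key still finds its match.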

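The rank vs dense rank vs row number question (19:30) is easiest to see on data with ties. This plain-Python sketch (the function name `rank_rows` is hypothetical) mimics `ROW_NUMBER()`, `RANK()` and `DENSE_RANK()` with `OVER (ORDER BY value DESC)`: row number is always unique, rank gives ties the same value and then skips ahead, and dense rank gives ties the same value without leaving a gap.

```python
from typing import List, Tuple


def rank_rows(values: List[int]) -> List[Tuple[int, int, int, int]]:
    """Return (value, row_number, rank, dense_rank) for values ordered descending."""
    ordered = sorted(values, reverse=True)
    rows: List[Tuple[int, int, int, int]] = []
    rank = dense_rank = 0
    previous: object = object()  # sentinel that never equals a real value
    for position, value in enumerate(ordered, start=1):
        if value != previous:
            rank = position  # RANK: ties share a rank, and a gap follows them
            dense_rank += 1  # DENSE_RANK: ties share a rank, no gap follows
            previous = value
        # ROW_NUMBER is simply the position, unique even for tied values
        rows.append((value, position, rank, dense_rank))
    return rows


if __name__ == "__main__":
    for row in rank_rows([90, 100, 80, 90]):
        print(row)
    # (100, 1, 1, 1)
    # (90, 2, 2, 2)
    # (90, 3, 2, 2)
    # (80, 4, 4, 3)
```

Note that in SQL, which of two tied rows gets row number 2 and which gets 3 is not guaranteed unless the `ORDER BY` includes a tie-breaking column. In PySpark the same functions are available as `row_number()`, `rank()` and `dense_rank()` in `pyspark.sql.functions`, used over a `Window` specification.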