Cory Maklin, "The Star Schema Is Dead Long Live Wide Tables" (May 29, 2023). If you do a quick search for books data engineer must read, you’ll find two books: 1. Designing Data-Intensive Applications 2. The Data…
Iliana Iankoulova, in Picnic Engineering, "Scaling Business Intelligence with Python and Snowflake" (May 18, 2021). Three big migration projects brought Picnic’s self-service Business Intelligence to new heights — speeding up the development cycle with…
Rui Li, "How Bilibili Builds OLAP Data Lakehouse with Apache Iceberg" (Jun 14, 2023).
Lackshu Balasubramaniam, "Design Patterns for Data Lakes" (Apr 5, 2020). Data Lake is the heart of big data architecture, as a result there needs to be careful planning in designing and implementing a Data Lake.
Rahul Kapoor, in Geek Culture, "System Design — Scaling from Zero to Millions Of Users" (Jan 2, 2023). Note: I have read this great book System Design Interview — An insider’s guide by Alex Xu in depth. So most of my definitions and images…
Priyal Walpita, in Geek Culture, "Managing Data in Microservices Architecture" (Jun 3, 2021). Everyone knows that microservices architecture has lots of advantages like scalability, easy maintenance, and frequent deployments. But…
Coupang Engineering, in Coupang Engineering Blog, "Data platform 2022: Transforming bytes to business insights" (Jun 24, 2022). Part II of a two-part series about Coupang’s data platform, focusing on the analytics platform.
Kaya Kupferschmidt, in Towards Data Science, "Big Data Engineering — Declarative Data Flows" (Sep 30, 2020). This is part 3 of a series on data engineering in a big data environment. It will reflect my personal journey of lessons learnt and…
Ankush Singh, "Feather vs Pickle: A Comparative Analysis of Data Storage" (Jun 11, 2023).
Ankush Singh, "Comparing Data Storage: Parquet vs. Arrow" (Jun 11, 2023). Data storage formats have significant implications on how quickly and efficiently we can extract and process data. In today’s blog, we’re…
Khairina, in Data Engineer Things, "Fundamentals of Data Engineering Book — Key Learning Points" (May 14, 2023). Here are my key learning points from the “Fundamentals of Data Engineering” book by Joe Reis and Matt Housley.
Chenglong Wu, "The Spark 5S Optimization Series, Part 4: Excelling at Serialization Optimization for Stellar Perf" (Apr 23, 2023). Optimizing Serialization in Apache Spark: Strategies to Improve Performance Using the 5S Optimization Framework.
Arslan Ahmad, in Level Up Coding, "16 System Design Concepts I Wish I Knew Before the Interview." (Apr 3, 2023). Mastering System Design Interview: Essential Concepts for Every Software Engineer.
Philip S Bell, "Run Presto at Meta scale with these lessons learned from a Meta Production Engineer" (Apr 15, 2023). Presto is a free, open source SQL query engine. We’ve been using it at Meta for the past ten years, and learned a lot while doing so…
Quentin Fernandez, in Teads Engineering, "How we made our reporting engine 17x faster" (Apr 13, 2023). Data relocation, the key to our new reporting solution.
Pinterest Engineering, in Pinterest Engineering Blog, "Large Scale Hadoop Upgrade At Pinterest" (Mar 30, 2022). Yongjun Zhang | Software Engineer; William Tom | Software Engineer; Shaowen Wang | Software Engineer; Bhavin Pathak | Software Engineer…
Pinterest Engineering, in Pinterest Engineering Blog, "Scalable and reliable data ingestion at Pinterest" (May 19, 2017). Yu Yang | Pinterest engineer, Data
Troy Harvey, in Building Carta, "Carta Spotlight: Data Engineering" (Jan 25, 2023). In this post I will share some stories highlighting data engineering and the role we played in building Carta’s data platform.
Christianlauer, in Towards Data Science, "Five Best Practices for stable Data Processing" (Jan 19, 2021). What you should keep in Mind when setting up Processes like ETL or ELT.