Google SRE Principles for Data Pipeline

Data Saint Consulting Inc
13 min read · Oct 25, 2022

book link: Google — Site Reliability Engineering (sre.google)

Challenges with the Periodic Pipeline Pattern

Periodic pipelines are generally stable when there are sufficient workers for the volume of data and execution demand is within computational capacity. In addition, instabilities such as processing bottlenecks are avoided when the number of chained jobs and the relative throughput between jobs remain uniform.
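The two stability conditions above (enough workers for the data volume, and roughly uniform throughput across chained jobs) can be checked with back-of-the-envelope arithmetic. The following is a minimal sketch, not something taken from the SRE book: the `Stage` class, the stage names, and the throughput figures are all hypothetical, and it assumes chained jobs run one after another and that each job's throughput scales linearly with its worker count.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    """One job in a chained periodic pipeline (hypothetical model)."""
    name: str
    workers: int
    records_per_worker_per_sec: float  # assumed, e.g. taken from past runs

    @property
    def throughput(self) -> float:
        # Assumes throughput scales linearly with the number of workers.
        return self.workers * self.records_per_worker_per_sec


def run_seconds(stages, records):
    """Estimated wall-clock time when the stages run sequentially."""
    return sum(records / stage.throughput for stage in stages)


def bottleneck(stages):
    """The stage with the lowest aggregate throughput."""
    return min(stages, key=lambda stage: stage.throughput)


def fits_period(stages, records, period_sec):
    """True if one run is expected to finish before the next one starts."""
    return run_seconds(stages, records) <= period_sec


if __name__ == "__main__":
    stages = [
        Stage("extract", workers=10, records_per_worker_per_sec=500.0),
        Stage("transform", workers=4, records_per_worker_per_sec=250.0),
        Stage("load", workers=10, records_per_worker_per_sec=500.0),
    ]
    print(bottleneck(stages).name)
    print(run_seconds(stages, 1_800_000), fits_period(stages, 1_800_000, 3600))
```

In this made-up configuration the transform job has one fifth of the throughput of its neighbours, so it dominates the run time; doubling the input volume is enough to push an hourly run past its period.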

Periodic pipelines are useful and practical, and we run them on a regular basis at Google. They are written with frameworks like MapReduce [Dea04] and Flume [Cha10], among others.

However, the collective SRE experience has been that the periodic pipeline model is fragile. We discovered that when a periodic pipeline is first installed with worker sizing, periodicity, chunking technique, and other parameters carefully tuned, performance is initially reliable. However, organic growth and change inevitably begin to stress the system, and problems arise. Examples of such problems include jobs that exceed their run deadline, resource exhaustion, and hanging processing chunks that entail corresponding operational load.
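To illustrate how organic growth erodes an initially well-tuned configuration, the sketch below estimates how many more runs a pipeline can complete before it exceeds its run deadline. It rests on simplifying assumptions that are not from the source: run time is taken to scale linearly with data volume, the volume is taken to grow by a fixed fraction per run, and the function name and parameters are hypothetical.

```python
def runs_until_deadline_miss(run_sec, deadline_sec, growth_per_run,
                             max_runs=10_000):
    """Count runs until the projected run time first exceeds the deadline.

    Assumes run time grows by the same fraction as the data volume on
    every run. Returns 0 if the current run already misses the deadline,
    or None if no miss is projected within max_runs.
    """
    if growth_per_run <= 0:
        raise ValueError("growth_per_run must be positive")
    runs = 0
    while run_sec <= deadline_sec:
        if runs >= max_runs:
            return None
        run_sec *= 1 + growth_per_run  # next run handles more data
        runs += 1
    return runs


if __name__ == "__main__":
    # A run that takes 1000 s today, a 2000 s deadline, 10% growth per run.
    print(runs_until_deadline_miss(1000.0, 2000.0, 0.10))
```

Even with half of the deadline left as headroom, compounding growth of 10% per run uses it up within a handful of runs under these assumptions, which is why a configuration tuned once at installation time tends not to stay reliable.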

Trouble Caused By Uneven Work Distribution


For consultation services regarding Data Engineering and Analytics: datasaintconsulting@gmail.com
