Free eBook: Site Reliability Engineering

Overview

This post introduces Site Reliability Engineering (SRE) through a free eBook written by members of Google’s SRE team. The book explains how Google builds, deploys, monitors, and maintains some of the largest software systems in the world. It covers not only the ‘what’ but also the ‘why’ behind SRE, which makes it worth reading for anyone who wants to understand how Google runs software in production.

Site Reliability Engineering: Edited by Betsy Beyer, Chris Jones, Jennifer Petoff and Niall Richard Murphy
Members of the SRE team explain how their engagement with the entire software lifecycle has enabled Google to build, deploy, monitor, and maintain some of the largest software systems in the world.

You can read this book online for free, add it to your library, or buy it from Google Books.

Here's an excerpt from the introduction of this book:

Software engineering has this in common with having children: the labor before the birth is painful and difficult, but the labor after the birth is where you actually spend most of your effort. Yet software engineering as a discipline spends much more time talking about the first period as opposed to the second, despite estimates that 40–90% of the total costs of a system are incurred after birth.1 The popular industry model that conceives of deployed, operational software as being “stabilized” in production, and therefore needing much less attention from software engineers, is wrong. Through this lens, then, we see that if software engineering tends to focus on designing and building software systems, there must be another discipline that focuses on the whole lifecycle of software objects, from inception, through deployment and operation, refinement, and eventual peaceful decommissioning. This discipline uses—and needs to use—a wide range of skills, but has separate concerns from other kinds of engineers. Today, our answer is the discipline Google calls Site Reliability Engineering.

So what exactly is Site Reliability Engineering (SRE)? We admit that it's not a particularly clear name for what we do—pretty much every site reliability engineer at Google gets asked what exactly that is, and what they actually do, on a regular basis.