Contents

Jeff Dean, Google Fellow

Lessons from Building Large Scale Systems

Computing Environment

  • Large clusters of commodity PCs
    • not reliable, but cheap
    • best performance / $
  • Unreliable means that LOTS of stuff can go awry
    • Anticipate failure and design around it

Real Hardware

  • Typical failures for a new cluster in its first year
    • ~0.5 overheating (power down most machines in <5 mins, ~1-2 days to recover)
    • ~1 PDU failure (~500-1000 machines suddenly disappear, ~6 hours to come back)
    • ~1 rack-move (plenty of warning, ~500-1000 machines powered down, ~6 hours)
    • ~1 network rewiring (rolling ~5% of machines down over 2-day span)
    • ~20 rack failures (40-80 machines instantly disappear, 1-6 hours to get back)
    • ~5 racks go wonky (40-80 machines see 50% packetloss)
    • ~8 network maintenances (4 might cause ~30-minute random connectivity losses)
    • ~12 router reloads (takes out DNS and external vips for a couple minutes)
    • ~3 router failures (have to immediately pull traffic for an hour)
    • ~dozens of minor 30-second blips for dns
    • ~1000 individual machine failures
    • ~thousands of hard drive failures
  • Hardest things to detect are partial failures:
    • Bad memory
    • Bad disks with slow reads
    • Routers with 50% packet loss

Distributed Systems Problems

  • Data/request volume too large for a single machine or even a single data center
  • Multiple data centers around the world
  • Mutable state issues

Designing Services

Services

  • Most products are services, not shrink-wrapped
    • Staged delpyment
    • Easy to define APIs bewteen internal services
    • Easy to roll out new versions
    • Incrementally
    • In isolation
    • Easy load testing
    • Development cycles largely decoupled!
    • Leads to simpler designs for services
    • Fewer, clearly specified dependencies
    • Easy to test and run experiments on
    • Communication between services done via custom RPC mechanism

Design Tips

  • Always aim for
    • simplicity
    • scalability
    • performance
    • reliability
    • generality
    • features
  • Get advice early
    • before writing code
    • before writing detailed design documents
  • Brainstorm with others on a whiteboard
  • Prioritize features to get a first-cut implemented
    • Early feedback -> better design iterations

Designing Good Interfaces

  • Imagine hypothetical clients using the interface
  • Document the interface precisely
  • Avoid a constraining implementation that will prevent scalability, growth
  • Always get feedback before implementing
  • Learn by examining well-designed interfaces

Flexible Protocols

  • Google employs a custom protocol description language
  • Self-describing
    • XML-like message descriptions, but not inefficient like XML
    • Allows for graceful client upgrades. Clients skip what they don't understand.
  • Efficient message encoding and decoding
  • Language independent design using auto-generated language wrappers

Knowledge for Good Design

  • A key skill is the ability to judge the performance of a design on paper.
  • Numbers everyone should know:
    • L1 cache reference 0.5 ns
    • Branch mispredict 5 ns
    • L2 cache reference 7 ns
    • Mutex lock/unlock 100 ns
    • Main memory reference 100 ns
    • Compress 1K bytes with Zippy 10,000 ns
    • Send 2K bytes over 1 Gbps network 20,000 ns
    • Read 1 MB sequentially from memory 250,000 ns
    • Round trip within same datacenter 500,000 ns
    • Disk seek 10,000,000 ns
    • Read 1 MB sequentially from network 10,000,000 ns
    • Read 1 MB sequentially from disk 30,000,000 ns
    • Send packet CA->Netherlands->CA 150,000,000 ns
  • By doing back of the envelope calculations, good designs can be chosen before implementation and even before defining interfaces

Design with Benchmarking In Mind

  • Micro-benchmarks are incredibly useful
    • Example: how long does it take to insert a random element into a hash table?
  • Great for understanding performance
  • Reduces development cycle time by pinpointing performance regression

Know Your Building Blocks

  • Core language libraries, basic data structures, SSTables, protocol buffers, file system, indexers, RDBMS, MapReduce
  • Not just the interfaces, but the implementation at a high level
  • Why? If you don't understand whats going on, its hard to ballpark performance calculations.

System Implementation

Building Infrastructure

  • Identify common problems and address them in a general way
  • Prune requirements to avoid complexity
  • Design for growth. Anticipate how the requirements will evolve, keeping likely features in mind.
  • Ensure the system performs when scaling 10-20x, but don't feel obligated to design for scaling by significant orders of magnitude.
  • Design for low latency, not high throughput

Data Consistency in Distributed Systems

  • Avoid a dependence on /strong/ consistency
  • Disconnected/partitioned operation is a relatively common design
  • Insisting on strong consistency can lead to unreliability
  • Most products aim to gravitate towards "eventual consistency" rather than immediate consistency

Local Performance

  • Parallelism works! Use threads!
  • Understand the data access patterns for your problem
  • If you have more CPU resources than network resources, encode your data for transport
  • Choose compression/speed trade-offs wisely
    • Better compression for long term storage
    • Faster compression for data transfer

Handling Failure

  • Canary requests to prevent losing an entire cluster
  • Failover to other replicas
  • If a bad backend is detected, stop using it for live requests until behavior improves
  • Load balancing
    • Passive during normal traffic conditions
    • Agreeing only when traffic is severe
  • If data retrieval fails or is too slow, continue without it

Monitoring and Debugging

  • Export HTML status pages for quick diagnosing
  • Export key/value pairs for service environment
  • Log usage/load statistics

Source Code Philosophy

  • One large, shared source base
    • Each directory has an owner
    • Strict code reviews before check-ins
  • Shared core libraries

Practice Good Software Engineering Hygience

  • Perform design reviews, code reviews
  • Perform lots of testing
    • Unit tests for modules
    • Large tests for systems

Writeup Information

  • Author
    • Chris Brigham - brigham at cs.stanford.edu
  • Talk Date
    • Monday, May 21, 2007
Last modified May 29, 2007 6:18 am / Skin by Kevin Hughes
MediaWiki