|
Jeff Dean, Google Fellow
Lessons from Building Large Scale Systems
Computing Environment
- Large clusters of commodity PCs
- not reliable, but cheap
- best performance / $
- Unreliable means that LOTS of stuff can go awry
- Anticipate failure and design around it
Real Hardware
- Typical failures for a new cluster in its first year
- ~0.5 overheating (power down most machines in <5 mins, ~1-2 days to recover)
- ~1 PDU failure (~500-1000 machines suddenly disappear, ~6 hours to come back)
- ~1 rack-move (plenty of warning, ~500-1000 machines powered down, ~6 hours)
- ~1 network rewiring (rolling ~5% of machines down over 2-day span)
- ~20 rack failures (40-80 machines instantly disappear, 1-6 hours to get back)
- ~5 racks go wonky (40-80 machines see 50% packetloss)
- ~8 network maintenances (4 might cause ~30-minute random connectivity losses)
- ~12 router reloads (takes out DNS and external vips for a couple minutes)
- ~3 router failures (have to immediately pull traffic for an hour)
- ~dozens of minor 30-second blips for dns
- ~1000 individual machine failures
- ~thousands of hard drive failures
- Hardest things to detect are partial failures:
- Bad memory
- Bad disks with slow reads
- Routers with 50% packet loss
Distributed Systems Problems
- Data/request volume too large for a single machine or even a single data center
- Multiple data centers around the world
- Mutable state issues
Designing Services
Services
- Most products are services, not shrink-wrapped
- Staged delpyment
- Easy to define APIs bewteen internal services
- Easy to roll out new versions
- Incrementally
- In isolation
- Easy load testing
- Development cycles largely decoupled!
- Leads to simpler designs for services
- Fewer, clearly specified dependencies
- Easy to test and run experiments on
- Communication between services done via custom RPC mechanism
Design Tips
- Always aim for
- simplicity
- scalability
- performance
- reliability
- generality
- features
- Get advice early
- before writing code
- before writing detailed design documents
- Brainstorm with others on a whiteboard
- Prioritize features to get a first-cut implemented
- Early feedback -> better design iterations
Designing Good Interfaces
- Imagine hypothetical clients using the interface
- Document the interface precisely
- Avoid a constraining implementation that will prevent scalability, growth
- Always get feedback before implementing
- Learn by examining well-designed interfaces
Flexible Protocols
- Google employs a custom protocol description language
- Self-describing
- XML-like message descriptions, but not inefficient like XML
- Allows for graceful client upgrades. Clients skip what they don't understand.
- Efficient message encoding and decoding
- Language independent design using auto-generated language wrappers
Knowledge for Good Design
- A key skill is the ability to judge the performance of a design on paper.
- Numbers everyone should know:
- L1 cache reference 0.5 ns
- Branch mispredict 5 ns
- L2 cache reference 7 ns
- Mutex lock/unlock 100 ns
- Main memory reference 100 ns
- Compress 1K bytes with Zippy 10,000 ns
- Send 2K bytes over 1 Gbps network 20,000 ns
- Read 1 MB sequentially from memory 250,000 ns
- Round trip within same datacenter 500,000 ns
- Disk seek 10,000,000 ns
- Read 1 MB sequentially from network 10,000,000 ns
- Read 1 MB sequentially from disk 30,000,000 ns
- Send packet CA->Netherlands->CA 150,000,000 ns
- By doing back of the envelope calculations, good designs can be chosen before implementation and even before defining interfaces
Design with Benchmarking In Mind
- Micro-benchmarks are incredibly useful
- Example: how long does it take to insert a random element into a hash table?
- Great for understanding performance
- Reduces development cycle time by pinpointing performance regression
Know Your Building Blocks
- Core language libraries, basic data structures, SSTables, protocol buffers, file system, indexers, RDBMS, MapReduce
- Not just the interfaces, but the implementation at a high level
- Why? If you don't understand whats going on, its hard to ballpark performance calculations.
System Implementation
Building Infrastructure
- Identify common problems and address them in a general way
- Prune requirements to avoid complexity
- Design for growth. Anticipate how the requirements will evolve, keeping likely features in mind.
- Ensure the system performs when scaling 10-20x, but don't feel obligated to design for scaling by significant orders of magnitude.
- Design for low latency, not high throughput
Data Consistency in Distributed Systems
- Avoid a dependence on /strong/ consistency
- Disconnected/partitioned operation is a relatively common design
- Insisting on strong consistency can lead to unreliability
- Most products aim to gravitate towards "eventual consistency" rather than immediate consistency
Local Performance
- Parallelism works! Use threads!
- Understand the data access patterns for your problem
- If you have more CPU resources than network resources, encode your data for transport
- Choose compression/speed trade-offs wisely
- Better compression for long term storage
- Faster compression for data transfer
Handling Failure
- Canary requests to prevent losing an entire cluster
- Failover to other replicas
- If a bad backend is detected, stop using it for live requests until behavior improves
- Load balancing
- Passive during normal traffic conditions
- Agreeing only when traffic is severe
- If data retrieval fails or is too slow, continue without it
Monitoring and Debugging
- Export HTML status pages for quick diagnosing
- Export key/value pairs for service environment
- Log usage/load statistics
Source Code Philosophy
- One large, shared source base
- Each directory has an owner
- Strict code reviews before check-ins
- Shared core libraries
Practice Good Software Engineering Hygience
- Perform design reviews, code reviews
- Perform lots of testing
- Unit tests for modules
- Large tests for systems
Writeup Information
- Author
- Chris Brigham - brigham at cs.stanford.edu
- Talk Date
|
|