Speaker Bio

  • Speaker: Amitabh Srivastava
  • 1999-2003: Programming Productivity Research Center
    • Part of Microsoft Research that created internal development tools
    • Freedom to try new things, but no control over the tools or processes used by anyone else
  • 2003-Present: Center for Software Excellence
    • When Amitabh took the VP of Development position for Windows Vista, the PPRC moved with him and took a new name
    • Had control over the development process for the core OS, but stopped interacting with the other teams the research side had been working with
    • Owned the kernel, virtualization and engineers, so they could make the rules for everyone else

Large Scale Production

  • All about scale
    • Lines of code
    • Complexity of product
    • Feature set
    • Number of customers
    • Growing => more successful => more money => more complicated project
  • Need to automate the process and have good tools
    • Bug finding
    • Determining which tests to run
    • Detecting architecture violations
  • Vista is huge
    • 50+ million lines of code
    • 5000+ engineers
    • 90+ languages
    • 10+ SKUs
    • 10+ OEMs
    • 1000+ hardware devices
    • Multiple hardware platforms
    • Hundreds of millions of customers
  • New Versions
    • Want people to write applications for your platform
    • Need to keep backward compatibility (Win95 broke compatibility, and there was a cost to be paid)
    • Some things (e.g., security) are worth risking compatibility to implement

Tool-making Challenges

  • Must solve an important problem
  • Soundness/Completeness does not scale
    • Using good heuristics is good enough
    • The developer prefers to find fewer bugs faster
  • Ease of use
    • Easier to use => Faster adoption
    • Can it run on the developer's desktop?
    • Does it run quickly?
    • Does it require effort? (e.g., source annotation)
  • Actionable feedback
  • "Is it going to help me?"
  • Tool should "just work"
    • Developer doesn't have to figure out what to run when, just click "Go"
    • Tool should return with "This is a problem, this is not"
  • The onus is not on the developer to take up the practice, but on the tool-builders to show value

Software Integration

  • How do 5000 people integrate changes?
  • Organization
    • No interns on the core (kernel, etc.)
    • Each team has Development, Program and Test managers
      • Very senior architects that understand the design of the whole OS
      • Keep the technical integrity of the project
    • Each team is 10-15 people (some are larger, e.g., shell or IE)
  • How do you find bad code before you break the build?
  • How do you keep bad code out of the build?
  • How do you find the cause of the bug?

Process

  • Pre-integration
    • Specification
    • Design/Security review
    • Write code
    • Code verification
    • Architecture layer verification
    • Performance testing
    • Feature testing
      • First in isolation
      • Several levels of integration testing before reaching final build
      • Testing hierarchy appears as a tree
    • Build verification test (not exhaustive)
    • Main build lab
      • Nothing should break in the main build lab
  • Post-integration
    • Global code verification
    • Integration testing
    • Application compatibility testing
    • Stress testing
    • Localization testing
    • Deployment validation
    • Security penetration testing
    • Vista builds
    • If a serious bug is found that prevents Vista from working, the fix goes through director approval and is checked directly into the post-integration build
  • Use
    • Used to work on the product for two years, then suddenly release a beta to everyone
    • Found that there are plenty of people (including engineers) who are happy to use a barely functional version before any actual release
    • Every day at 7 PM, no more checkins are allowed
    • Overnight, do everything as if deploying the final product
    • Build takes a few hours, then sign/write CDs, etc.
    • Disks ready to install at 6 AM
    • Everyone installs, so there are immediately 5,000 testers
  • Daily Builds
    • People don't want to set up Outlook, etc. every day
    • Create an image with 90% of what people need
    • Have the option every day of upgrading without wiping data or doing a fresh install
    • Can always keep an external drive with your data on it
    • Metrics are kept of how long it takes to install, etc. and compared at every build

Bug Finding

  • Instrumented entire OS
  • Doesn't only report on crashes
    • What happens during installation?
    • If installation fails, tells Microsoft so that they can fix it, rather than letting you give up and return to your old system
    • Why is the OS running slowly?
    • Reliability
    • UI
  • "Agent"
    • Size of memory prevents taking the entire dump
    • Selectively asks about parts of the machine state
    • Do some processing on the client side
    • Privacy prohibits taking current doc, etc.
    • Sometimes can only return the system configuration and stack
  • When a dump is received
    • Analyze the stack trace, find who to blame, and assign to someone
    • Build histogram to see most common problems
      • Clever heuristics needed to match stacks
  • Customer needs to see that sending the report is worthwhile
    • Find the bug, issue a patch
    • Next time someone hits the bug, offer the patch instead of reporting
  • Some customers will not send anything back, but will let you analyze their machine personally
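
The dump-handling steps above (match stacks, build a histogram, assign an owner) can be sketched roughly as follows. This is only an illustration, not the system described in the talk: the frame format `module!function+offset`, the `NOISE_FRAMES` set, the module names, and the `owners` table are all assumptions made for the example.

```python
from collections import Counter

# Assumed noise list: generic helpers that show up on top of many unrelated crashes.
NOISE_FRAMES = {"RaiseException", "memcpy", "abort"}

def bucket_key(stack, depth=3):
    """Signature for matching stacks: the first `depth` non-noise frames,
    with offsets such as '+0x1f' stripped."""
    frames = [frame.split("+")[0] for frame in stack]
    meaningful = [f for f in frames if f.split("!")[-1] not in NOISE_FRAMES]
    return tuple(meaningful[:depth])

def histogram(reports, depth=3):
    """Count crash reports per signature, most common first."""
    return Counter(bucket_key(r, depth) for r in reports).most_common()

def blame(key, owners):
    """Assign a bucket to whoever owns the module of its top frame."""
    if not key:
        return "triage"
    module = key[0].split("!")[0]
    return owners.get(module, "triage")

# Hypothetical crash reports, innermost frame first.
reports = [
    ["ntdll!RaiseException+0x10", "shell32!OpenFolder+0x2a", "explorer!Main+0x5"],
    ["shell32!OpenFolder+0x44", "explorer!Main+0x5"],
    ["gfx!DrawGlyph+0x8", "gfx!Paint+0x30", "explorer!Main+0x5"],
]
owners = {"shell32": "shell team", "gfx": "graphics team"}

for key, count in histogram(reports, depth=2):
    print(count, key, "->", blame(key, owners))
```

The first two reports differ in offsets and in a leading helper frame, yet they land in the same bucket. That is the kind of matching heuristic the notes allude to.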

Automation

  • Automation ensures quality
  • Advanced tools drive the process
    • Developers hate processes that make them run each part themselves
  • Central services to verify code
    • Don't have to run individual tools
    • Lots of expensive machines to make it fast
    • Developer just submits the binaries, and gets back "Fix this" or "OK"

Static Analysis

  • Currently working on a number of tools
  • Lightweight: PREfast, ESPX
    • Symbolic path analysis
    • User-supplied bug finding plugins traverse the AST
    • Runs on desktop, easily customizable
    • Monitors code being checked in
    • Originally took 22 days to run over Windows, and there was lots of noise
      • Worked on it for 5 years, got it down to 2-3 days and 70-80% accuracy
  • Inter-procedural simulation: PREfix
    • Symbolic evaluation on a fixed number of paths through every function
    • Builds incomplete symbolic function models
    • Reports a defect when a bad state arises
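
To give a feel for what a bug-finding plugin that traverses the AST might look like, here is a minimal sketch using Python's standard `ast` module. It is not PREfast's plugin interface, which the talk did not describe. The `BannedCallChecker` class, the banned-function list, and the `run_plugins` driver are hypothetical names chosen for illustration.

```python
import ast

class BannedCallChecker(ast.NodeVisitor):
    """Toy plugin: walks the AST and reports calls to banned functions."""
    BANNED = {"eval", "exec"}  # assumed rule set for this example

    def __init__(self):
        self.findings = []

    def visit_Call(self, node):
        # Only direct calls by name are checked; attribute calls are ignored.
        if isinstance(node.func, ast.Name) and node.func.id in self.BANNED:
            self.findings.append(
                (node.lineno, f"call to banned function {node.func.id}()")
            )
        self.generic_visit(node)  # keep walking into nested expressions

def run_plugins(source, plugins):
    """Parse the source once and let every plugin traverse the same tree."""
    tree = ast.parse(source)
    findings = []
    for plugin_cls in plugins:
        plugin = plugin_cls()
        plugin.visit(tree)
        findings.extend(plugin.findings)
    return sorted(findings)

SOURCE = """\
def load(text):
    return eval(text)

def safe(text):
    return int(text)
"""

for line, message in run_plugins(SOURCE, [BannedCallChecker]):
    print(f"line {line}: {message}")
```

Each finding carries a line number and a concrete message, in the spirit of the "actionable feedback" requirement from earlier in the talk.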

Standard Annotation Language (SAL)

  • Annotations on each function that describe a contract
  • All Windows code is annotated for mutable buffers
    • Improves buffer overrun detection
    • Reduces false positives from PREfix/PREfast
  • Annotations were only mandated for buffers, but developers annotated much more because they saw value
  • PREfast enforces annotation on the developer's desktop before allowing any integration

Architectural Layering

  • MaX dependency analysis
    • Constructs dependency graph from binaries
    • Cannot have dependencies from lower layers to higher ones
  • Effective, scalable
  • Isolates new dependencies for review
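
The layering rule can be illustrated with a small check over a dependency graph. This sketch assumes the graph has already been extracted (the real tool builds it from binaries) and uses made-up component names and layer numbers, so it shows the idea rather than how MaX works internally.

```python
# Assumed layer assignment: a lower number means a lower layer.
LAYERS = {"kernel": 0, "drivers": 1, "shell": 2, "apps": 3}

def find_violations(dependencies, layers):
    """Return edges (src, dst) where src sits in a lower layer than dst,
    i.e. a lower layer depending on a higher one."""
    return sorted(
        (src, dst) for src, dst in dependencies if layers[src] < layers[dst]
    )

def new_dependencies(old_edges, new_edges):
    """Edges present in the new build but not the old one, for review."""
    return sorted(set(new_edges) - set(old_edges))

# Hypothetical dependency edges: (component, component it depends on).
old = {("shell", "kernel"), ("apps", "shell")}
new = old | {("drivers", "kernel"), ("kernel", "shell")}

print("new edges:", new_dependencies(old, new))
print("violations:", find_violations(new, LAYERS))
```

Here the added `kernel -> shell` edge is flagged because the kernel layer would depend on a higher one, while `drivers -> kernel` is merely listed as new for review.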

Run-time

  • Time travel tracing
    • Generates a compressed instruction level recording of execution
  • Time travel debugging
    • Allows developer to move forwards/backwards in trace
  • TruScan
    • Analyzes traces for bugs
    • Finds memory leaks, uninitialized memory, etc.
    • Low false positive rate because it runs on an actual execution
  • TruScan + Time Travel made diagnosis easy for Vista
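
As a toy illustration of recording an execution and then stepping forwards and backwards through it, consider the sketch below. It stores a full snapshot per step, whereas the tool described above produces a compressed instruction-level recording. The `Trace` class and the miniature "program" format are assumptions made for this example.

```python
class Trace:
    """Toy recording: one snapshot of variable state per executed step."""

    def __init__(self):
        self.snapshots = []
        self.position = 0

    def record(self, step, state):
        # Copy the state so later steps cannot mutate earlier snapshots.
        self.snapshots.append((step, dict(state)))

    def current(self):
        return self.snapshots[self.position]

    def forward(self):
        self.position = min(self.position + 1, len(self.snapshots) - 1)
        return self.current()

    def backward(self):
        self.position = max(self.position - 1, 0)
        return self.current()

def run_and_record(program):
    """Execute (variable, function) steps, recording state after each one."""
    state, trace = {}, Trace()
    for step, (name, fn) in enumerate(program):
        state[name] = fn(state)
        trace.record(step, state)
    return trace

# Hypothetical three-step program: each step assigns one variable.
program = [
    ("a", lambda s: 2),
    ("b", lambda s: s["a"] * 10),
    ("a", lambda s: s["a"] - 2),
]

trace = run_and_record(program)
trace.forward()
trace.forward()
print(trace.current())   # state after the final step
print(trace.backward())  # step back to see the state before `a` was overwritten
```

Stepping backward recovers the earlier value of `a`, the kind of question ("what was this before it got clobbered?") that a backward-capable debugger makes easy to answer.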

Test Prioritization

  • Full tests take weeks to run
    • Tens of millions of tests
    • Tens of millions of lines of source
  • Must prioritize tests to find bugs early
  • Some critical fixes must be released within days
    • The fix might break something else
    • What subset of tests should be run?
  • Developers need to run tests before checkin
    • What should be run to exercise the changed code?
  • What the system does
    • Analyzes old and new image for differences
    • Determines which blocks have changed
    • Finds the blocks covered by existing tests
    • Selects minimal set of tests to cover impacted blocks
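
The four steps above can be sketched as a diff followed by a greedy set cover. This is an approximation under stated assumptions: images are modeled as a mapping from block id to a content fingerprint, the coverage table is invented, and the greedy heuristic gives a small test set rather than a provably minimal one. The talk did not say which selection algorithm the real system uses.

```python
def changed_blocks(old_image, new_image):
    """Blocks whose contents differ between builds, plus blocks that are new."""
    return {b for b, code in new_image.items() if old_image.get(b) != code}

def select_tests(impacted, coverage):
    """Greedy set cover: repeatedly pick the test that covers the most
    still-uncovered impacted blocks. Returns (tests, uncovered blocks)."""
    remaining = set(impacted)
    selected = []
    while remaining and coverage:
        # sorted() makes tie-breaking deterministic (alphabetical).
        best = max(sorted(coverage), key=lambda t: len(coverage[t] & remaining))
        gained = coverage[best] & remaining
        if not gained:
            break  # no existing test reaches the remaining blocks
        selected.append(best)
        remaining -= gained
    return selected, remaining

# Hypothetical images: block id -> content fingerprint.
old_image = {"b1": "x", "b2": "y", "b3": "z", "b4": "w"}
new_image = {"b1": "x", "b2": "y2", "b3": "z2", "b4": "w", "b5": "n"}

# Hypothetical coverage data: test name -> blocks it exercises.
coverage = {
    "t_boot": {"b1", "b2"},
    "t_net": {"b2", "b3"},
    "t_ui": {"b4"},
    "t_fs": {"b3"},
}

impacted = changed_blocks(old_image, new_image)
tests, uncovered = select_tests(impacted, coverage)
print("impacted:", sorted(impacted))
print("run:", tests, "uncovered:", sorted(uncovered))
```

Reporting the uncovered blocks separately is a design choice in this sketch: changed code that no existing test reaches is arguably as interesting to the developer as the list of tests to run.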

Lessons

  • Achieve agility through tools
    • Manual processes do not scale
  • Quality is built from the start
    • Use what you build daily and use the feedback you get
  • Focus should be on finding customer facing bugs
    • Heuristics suffice; soundness and completeness are not necessary
    • A strong theoretical foundation is important and necessary
  • Developers are rational
    • Usage of tools is based on cost/benefit
    • Convince them that it's worth it, and they'll do it themselves
  • Testing is ignored in research areas
    • Test prioritization is not new; it's been around since the '90s
    • None of the previous solutions scaled

Writeup Authors

  • Kristjan Petursson - kristjan [at] cs [dot] stanford [dot] edu
  • Monica Lam - lam [at] cs [dot] stanford [dot] edu
Last modified May 15, 2007 8:18 am