Speaker Bio
- Speaker: Amitabh Srivastava
- 1999-2003: Programming Productivity Research Center
- Part of Microsoft Research that created internal development tools
- Freedom to try new things, but no control over the tools or processes used by anyone else
- 2003-Present: Center for Software Excellence
- Amitabh took the VP of Development position for Windows Vista; the PPRC moved with him and took on a new name
- Had control over the development process for the core OS, but stopped interacting with the other teams the research side had been working with
- Owned the kernel, virtualization and engineers, so they could make the rules for everyone else
Large Scale Production
- All about scale
- Lines of code
- Complexity of product
- Feature set
- Number of customers
- Growing => more successful => more money => more complicated project
- Need to automate the process and have good tools
- Bug finding
- Determining which tests to run
- Detecting architecture violations
- Vista is huge
- 50+ million lines of code
- 5000+ engineers
- 90+ languages
- 10+ SKUs
- 10+ OEMs
- 1000+ hardware devices
- Multiple hardware platforms
- Hundreds of millions of customers
- New Versions
- Want people to write applications for your platform
- Need to keep backward compatibility (Win95 broke compatibility, and there was a cost to be paid)
- Some things (e.g., security) are worth risking compatibility to implement
Tool-making Challenges
- Must solve an important problem
- Soundness and completeness do not scale
- Using good heuristics is good enough
- The developer prefers to find fewer bugs faster
- Ease of use
- Easier to use => Faster adoption
- Can it run on the developer's desktop?
- Does it run quickly?
- Does it require effort? (e.g., source annotation)
- Actionable feedback
- "Is it going to help me?"
- Tool should "just work"
- Developer doesn't have to figure out what to run when, just click "Go"
- Tool should return with "This is a problem, this is not"
- The onus is not on the developer to take up the practice, but on the tool-builders to show value
Software Integration
- How do 5000 people integrate changes?
- Organization
- No interns on the core (kernel, etc.)
- Each team has Development, Program and Test managers
- Very senior architects that understand the design of the whole OS
- Keep the technical integrity of the project
- Each team is 10-15 people (some are larger, e.g., the shell or IE)
- How do you find bad code before you break the build?
- How do you keep bad code out of the build?
- How do you find the cause of the bug?
Process
- Pre-integration
- Specification
- Design/Security review
- Write code
- Code verification
- Architecture layer verification
- Performance testing
- Feature testing
- First in isolation
- Several levels of integration testing before reaching final build
- Testing hierarchy appears as a tree
- Build verification test (not exhaustive)
- Main build lab
- Nothing should break in the main build lab
- Post-integration
- Global code verification
- Integration testing
- Application compatibility testing
- Stress testing
- Localization testing
- Deployment validation
- Security penetration testing
- Vista builds
- If a serious bug is found that prevents Vista from working, the fix goes through director approval and is checked directly into the post-integration build
- Use
- Used to work on the product for two years, then suddenly released a beta to everyone
- Found that there are plenty of people (including engineers) who are happy to use a barely functional version before any actual release
- Every day at 7 PM, no more checkins are allowed
- Overnight, do everything as if deploying the final product
- Build takes a few hours, then sign/write CDs, etc.
- Disks ready to install at 6 AM
- Everyone installs, immediately there are 5k testers
- Daily Builds
- People don't want to set up Outlook, etc. every day
- Create an image with 90% of what people need
- Have the option every day of upgrading without wiping data or doing a fresh install
- Can always keep an external drive with your data on it
- Metrics are kept of how long it takes to install, etc. and compared at every build
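The notes do not say how per-build metrics are compared. A minimal sketch, assuming each build records a few numeric metrics where lower is better, and that a regression means a value more than 10% above the median of recent builds. The metric names, the threshold, and the use of the median are all hypothetical choices for illustration.

```python
from statistics import median

def find_regressions(history, current, tolerance=0.10):
    """Flag metrics (lower is better) that exceed the median of recent builds
    by more than `tolerance`. Returns {metric: (baseline, current_value)}."""
    regressions = {}
    for name, value in current.items():
        past = [build[name] for build in history if name in build]
        if not past:
            continue  # new metric: nothing to compare against yet
        baseline = median(past)
        if value > baseline * (1 + tolerance):
            regressions[name] = (baseline, value)
    return regressions

# Hypothetical metrics from the last three daily builds.
history = [
    {"install_minutes": 31.0, "boot_seconds": 42.0},
    {"install_minutes": 30.0, "boot_seconds": 41.0},
    {"install_minutes": 32.0, "boot_seconds": 43.0},
]
today = {"install_minutes": 38.0, "boot_seconds": 42.5}
print(find_regressions(history, today))  # {'install_minutes': (31.0, 38.0)}
```

Using the median of several builds rather than only the previous one keeps a single noisy build from masking or faking a regression.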
Bug Finding
- Instrumented entire OS
- Doesn't only report on crashes
- What happens during installation?
- If installation fails, tells Microsoft so that they can fix it, rather than letting you give up and return to your old system
- Why is the OS running slowly?
- Reliability
- UI
- "Agent"
- Size of memory prevents taking the entire dump
- Selectively asks about parts of the machine state
- Do some processing on the client side
- Privacy prohibits collecting the current document, etc.
- Sometimes can only return the system configuration and stack
- When a dump is received
- Analyze the stack trace, find who to blame, and assign to someone
- Build histogram to see most common problems
- Clever heuristics needed to match stacks
- Customer needs to see that sending the report is worthwhile
- Find the bug, issue a patch
- Next time someone hits the bug, offer the patch instead of reporting
- Some customers will not send anything back, but will let you analyze their machine personally
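The talk only says that "clever heuristics" are needed to match stacks. A minimal sketch of one simple heuristic, not Microsoft's actual algorithm: skip generic frames that rarely identify the culprit, key each crash on its top two remaining frames, count the buckets, and assign each bucket to the team owning the top module. The noise-module list, the module-to-owner map, and the frame format `module!function` are assumptions for illustration.

```python
from collections import Counter

# Hypothetical: modules treated as generic plumbing, skipped when bucketing.
NOISE_MODULES = {"ntdll", "kernelbase"}
# Hypothetical module -> owning team map, used to decide "who to blame".
OWNERS = {"shell32": "Shell team", "fooaudio": "Audio team"}

def bucket_key(stack, depth=2):
    """Key a crash by its top `depth` frames (top of stack first), skipping noise frames."""
    meaningful = [f for f in stack if f.split("!")[0].lower() not in NOISE_MODULES]
    return tuple(meaningful[:depth])

def triage(reports):
    """Build a histogram of buckets, most common first, with an owner for each."""
    histogram = Counter(bucket_key(stack) for stack in reports)
    assignments = []
    for key, count in histogram.most_common():
        module = key[0].split("!")[0].lower() if key else "unknown"
        assignments.append((key, count, OWNERS.get(module, "Unassigned")))
    return assignments

reports = [
    ["ntdll!RtlRaiseException", "fooaudio!Mix", "fooaudio!Render"],
    ["kernelbase!RaiseException", "fooaudio!Mix", "fooaudio!Render"],
    ["shell32!ParseName", "shell32!Open", "explorer!Main"],
]
for key, count, owner in triage(reports):
    print(count, owner, " > ".join(key))
# 2 Audio team fooaudio!Mix > fooaudio!Render
# 1 Shell team shell32!ParseName > shell32!Open
```

The first two reports raise through different system DLLs but land in the same bucket, which is the point of skipping noise frames.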
Automation
- Automation ensures quality
- Advanced tools drive the process
- Developers hate processes that require running each step themselves
- Central services to verify code
- Don't have to run individual tools
- Lots of expensive machines to make it fast
- Developer just submits the binaries, and gets back "Fix this" or "OK"
Static Analysis
- Currently working on a number of tools
- Lightweight: PREfast, ESPX
- Symbolic path analysis
- User-supplied bug finding plugins traverse the AST
- Runs on desktop, easily customizable
- Monitors code being checked in
- Originally took 22 days to run over Windows, and there was lots of noise
- Worked on it for 5 years, got it down to 2-3 days and 70-80% accuracy
- Inter-procedural simulation: PREfix
- Symbolic evaluation on a fixed number of paths through every function
- Builds incomplete symbolic function models
- Reports a defect when a bad state arises
- Annotations on each function that describe a contract
- All Windows code is annotated for mutable buffers
- Improves buffer overrun detection
- Reduces false positives from PREfix/PREfast
- Annotations were only mandated for buffers, but developers annotated much more because they saw value
- PREfast enforces annotation on the developer's desktop before allowing any integration
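PREfast's own plugin interface targets C/C++ and is not described in the notes. To illustrate the idea of user-supplied bug-finding plugins that traverse an AST, here is an analogous sketch over Python source using the standard `ast` module. The `Plugin` base class, the two rules, and `run_plugins` are hypothetical.

```python
import ast

class Plugin(ast.NodeVisitor):
    """Hypothetical plugin base: subclasses override visit_* methods and call report()."""
    name = "base"

    def __init__(self):
        self.findings = []

    def report(self, node, message):
        self.findings.append((node.lineno, f"{self.name}: {message}"))

class BannedCallPlugin(Plugin):
    """Flags calls to banned functions (analogous to banning strcpy in C)."""
    name = "banned-call"
    BANNED = {"eval", "exec"}

    def visit_Call(self, node):
        if isinstance(node.func, ast.Name) and node.func.id in self.BANNED:
            self.report(node, f"call to banned function '{node.func.id}'")
        self.generic_visit(node)

class NoneComparisonPlugin(Plugin):
    """Flags `x == None` / `x != None`, which should use `is` / `is not`."""
    name = "none-compare"

    def visit_Compare(self, node):
        for op, right in zip(node.ops, node.comparators):
            if isinstance(op, (ast.Eq, ast.NotEq)) and isinstance(right, ast.Constant) and right.value is None:
                self.report(node, "compare to None with 'is' / 'is not'")
        self.generic_visit(node)

def run_plugins(source, plugins):
    """Parse once, run every plugin over the same tree, return findings sorted by line."""
    tree = ast.parse(source)
    findings = []
    for plugin_cls in plugins:
        plugin = plugin_cls()
        plugin.visit(tree)
        findings.extend(plugin.findings)
    return sorted(findings)

code = """
def load(cfg):
    if cfg == None:
        return eval("{}")
    return cfg
"""
for line, msg in run_plugins(code, [BannedCallPlugin, NoneComparisonPlugin]):
    print(line, msg)
# 3 none-compare: compare to None with 'is' / 'is not'
# 4 banned-call: call to banned function 'eval'
```

Parsing once and sharing the tree across plugins keeps each new rule cheap to add, which matters for a tool meant to run on the developer's desktop.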
Architectural Layering
- MaX dependency analysis
- Constructs dependency graph from binaries
- Cannot have dependencies from lower layers to higher ones
- Effective, scalable
- Isolates new dependencies for review
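A minimal sketch of the two checks described for MaX, assuming the dependency edges have already been extracted from the binaries (that extraction step is not shown). The layer assignments and binary names are hypothetical; an edge `(a, b)` means binary `a` depends on binary `b`, and every binary must have a layer assigned.

```python
# Hypothetical layer numbers: lower number = lower layer (closer to the kernel).
LAYERS = {"kernel": 0, "ntdll": 1, "user32": 2, "shell32": 3}

def layering_violations(deps, layers):
    """An edge (a, b) is a violation if a sits in a lower layer than b,
    i.e. lower-level code depends on higher-level code."""
    return {(a, b) for a, b in deps if layers[a] < layers[b]}

def new_dependencies(baseline, current):
    """Edges present in this build but not the previous one, queued for human review."""
    return current - baseline

baseline = {("user32", "ntdll"), ("shell32", "user32"), ("ntdll", "kernel")}
current = baseline | {("shell32", "ntdll"), ("ntdll", "user32")}
print(sorted(layering_violations(current, LAYERS)))  # [('ntdll', 'user32')]
print(sorted(new_dependencies(baseline, current)))   # [('ntdll', 'user32'), ('shell32', 'ntdll')]
```

Both new edges go to review, but only the upward one is an outright violation; diffing against the previous build keeps the review queue small.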
Run-time
- Time travel tracing
- Generates a compressed instruction level recording of execution
- Time travel debugging
- Allows developer to move forwards/backwards in trace
- TruScan
- Analyzes traces for bugs
- Finds memory leaks, uninitialized memory, etc.
- Low false positive rate because it analyzes an actual execution
- TruScan + Time Travel made diagnosis easy for Vista
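The real traces are compressed, instruction-level recordings, and TruScan's format is not described. As a toy illustration of scanning a recorded execution for uninitialized reads and leaks, here is a sketch over a hypothetical event list (`alloc`, `write`, `read`, `free`). Each finding carries the step index where it occurred, which is the point a time-travel debugger would jump to.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    op: str        # "alloc", "write", "read", or "free" (hypothetical event kinds)
    addr: int      # block base for alloc/free, byte address for read/write
    size: int = 1  # only meaningful for "alloc"

def scan(trace):
    """Replay a recorded trace; report uninitialized heap reads, invalid frees, and leaks."""
    live = {}            # block base -> size, for allocations not yet freed
    owner = {}           # byte address -> base of the live block containing it
    initialized = set()  # bytes written since their block was allocated
    findings = []
    for step, ev in enumerate(trace):
        if ev.op == "alloc":
            live[ev.addr] = ev.size
            for a in range(ev.addr, ev.addr + ev.size):
                owner[a] = ev.addr
                initialized.discard(a)
        elif ev.op == "write":
            initialized.add(ev.addr)
        elif ev.op == "read":
            if ev.addr in owner and ev.addr not in initialized:
                findings.append((step, f"uninitialized read at {ev.addr:#x}"))
        elif ev.op == "free":
            if ev.addr not in live:
                findings.append((step, f"invalid free at {ev.addr:#x}"))
                continue
            size = live.pop(ev.addr)
            for a in range(ev.addr, ev.addr + size):
                owner.pop(a, None)
                initialized.discard(a)
    for base, size in live.items():  # anything still live at the end leaked
        findings.append((len(trace), f"leak: {size} bytes at {base:#x}"))
    return findings

trace = [
    Event("alloc", 0x1000, 4),
    Event("write", 0x1000),
    Event("read", 0x1000),
    Event("read", 0x1001),  # never written
    Event("alloc", 0x2000, 8),
    Event("free", 0x1000),
]
for step, msg in scan(trace):
    print(step, msg)
# 3 uninitialized read at 0x1001
# 6 leak: 8 bytes at 0x2000
```

Because every finding comes from something that actually happened in the run, there is no path-feasibility guessing, which is why trace-based analysis has few false positives.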
Test Prioritization
- Full tests take weeks to run
- Tens of millions of tests
- Tens of millions of lines of source
- Must prioritize tests to find bugs early
- Some critical fixes must be released within days
- The fix might break something else
- What subset of tests should be run?
- Developers need to run tests before checkin
- What should be run to exercise the changed code?
- What the system does
- Analyzes old and new image for differences
- Determines which blocks have changed
- Finds the blocks covered by existing tests
- Selects minimal set of tests to cover impacted blocks
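The notes do not say how the minimal set is chosen. Finding a truly minimal covering set is the set cover problem, which is NP-hard, so a greedy approximation is a common choice. A minimal sketch under that assumption: blocks are identified by stable ids with a content hash per build, deleted blocks are ignored, and the coverage map from tests to blocks is hypothetical data.

```python
def changed_blocks(old, new):
    """A block is impacted if it is new or its content hash differs from the old image."""
    return {block for block, digest in new.items() if old.get(block) != digest}

def select_tests(impacted, coverage):
    """Greedy set cover: repeatedly pick the test that covers the most
    still-uncovered impacted blocks (ties broken by test name).
    Returns (chosen tests, impacted blocks no existing test covers)."""
    uncovered = set(impacted)
    chosen = []
    while uncovered and coverage:
        best = min(coverage, key=lambda t: (-len(coverage[t] & uncovered), t))
        gained = coverage[best] & uncovered
        if not gained:
            break  # remaining blocks are not exercised by any existing test
        chosen.append(best)
        uncovered -= gained
    return chosen, uncovered

old = {"b1": "aa", "b2": "bb", "b3": "cc"}
new = {"b1": "aa", "b2": "bX", "b3": "cY", "b4": "dd"}
coverage = {
    "test_boot": {"b1", "b2"},
    "test_io": {"b2", "b3"},
    "test_net": {"b3"},
}
impacted = changed_blocks(old, new)          # {"b2", "b3", "b4"}
tests, untested = select_tests(impacted, coverage)
print(tests, untested)  # ['test_io'] {'b4'}
```

Returning the uncovered blocks is useful on its own: a changed block that no test reaches is a signal that a new test is needed before checkin.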
Lessons
- Achieve agility through tools
- Manual processes do not scale
- Quality is built from the start
- Use what you build daily and use the feedback you get
- Focus should be on finding customer facing bugs
- Heuristics suffice, soundness and completeness are not necessary
- A strong theoretical foundation is important and necessary
- Developers are rational
- Usage of tools is based on cost/benefit
- Convince them that it's worth it, and they'll do it themselves
- Testing is ignored in research areas
- Test prioritization is not new, it's been around since the '90s
- None of the previous solutions scaled
Writeup Authors
- Kristjan Petursson - kristjan [at] cs [dot] stanford [dot] edu
- Monica Lam - lam [at] cs [dot] stanford [dot] edu