Velocity 2011 - Part 3: Thursday (2nd day)

My notes on the second conference day at the Velocity Conference.

The keynotes were again a highlight, topped only by the talk Automating for Success: Production Begins in Development, which happened to confirm all my theories about web operations and package-based deployment :-)

Videos are available on the Velocity 2011 Videos page, slides can be found on the Velocity 2011 Speakers Slides and Video page.

Also read about the Workshops and the first day.

Keynotes Thursday

World IPv6 Day: Lessons Learned

Ian Flint, Yahoo

http://www.youtube.com/watch?v=T04o6bQN8Ls
Last /8 network blocks assigned by IANA in 2011
NAT is bad for geolocating clients
- bad for business
- bad for targeting
What is the catch with IPv6?
- 0.2% of users have IPv6 so far
- dual-stack setups often have broken IPv6 connectivity, and browsers prefer IPv6
- OS timeout for falling back from IPv6 to IPv4 is long (Linux/Windows 21 sec, OS X 75 sec, phones: no fallback)
Chicken/egg problem: Which website will go dual stack first?
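The long fallback timeouts are the heart of the problem. As a minimal sketch (my own illustration, not something shown in the talk), a client can prefer IPv6 but give up on each address after a short timeout instead of waiting for the OS default. This is a simplified, sequential take on the idea later standardized as Happy Eyeballs (RFC 6555); the function names and the 0.3 s default are assumptions.

```python
import socket


def prefer_ipv6(addrinfos):
    """Return getaddrinfo() results with IPv6 entries first.

    sorted() is stable, so the resolver's order is otherwise kept.
    """
    return sorted(addrinfos, key=lambda info: info[0] != socket.AF_INET6)


def connect_prefer_ipv6(host, port, attempt_timeout=0.3):
    """Connect to a dual-stack host, trying IPv6 addresses first.

    Each address gets at most `attempt_timeout` seconds, so a broken IPv6
    path costs a fraction of a second instead of the OS default timeout.
    """
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    last_error = None
    for family, socktype, proto, _canonname, sockaddr in prefer_ipv6(infos):
        sock = socket.socket(family, socktype, proto)
        sock.settimeout(attempt_timeout)
        try:
            sock.connect(sockaddr)
        except OSError as exc:  # refused, unreachable or timed out
            sock.close()
            last_error = exc
            continue
        sock.settimeout(None)  # back to blocking mode for normal use
        return sock
    raise last_error or OSError(f"no usable address for {host!r}")
```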
- All of them: 434 participants signed up for World IPv6 Day
- June 8, 2011
Yahoo implementation details for yahoo.com
- 37 markets
- served from 10 datacenters
- set up IPv6 proxy servers in 7 locations to reduce the risk of turning on IPv6
- Install 6to4 relays at all peering points
- Certify all network gear at scale
- Retrofit custom global DNS
- Retrofit DOS protection layer
- Retrofit Audience Data Pipeline
IPv6 Test in 38 languages, user help pages
Two 15-minute tests before IPv6 Day
- the first test showed that problematic health checks in the DNS infrastructure routed all traffic from India to Santa Clara
Planning for Decision Points
- When would things be bad enough to force Yahoo to roll back?
- Never do big changes at times of traffic changes
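Agreeing on the rollback criteria before the change makes the call mechanical once the event is running. A hypothetical sketch: the metric names and limits are made up for illustration and are not Yahoo's actual decision points.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RollbackCriterion:
    """A decision point agreed on before the change goes live (hypothetical)."""
    metric: str
    limit: float


# Hypothetical metric names and limits, for illustration only.
CRITERIA = [
    RollbackCriterion("error_rate_pct", 1.0),
    RollbackCriterion("latency_p95_ms", 800.0),
    RollbackCriterion("traffic_drop_pct", 5.0),
]


def breached(observed, criteria=CRITERIA):
    """Names of all criteria whose observed value exceeds the limit.

    Metrics missing from `observed` count as 0, i.e. not breached.
    """
    return [c.metric for c in criteria
            if observed.get(c.metric, 0.0) > c.limit]


def should_roll_back(observed, criteria=CRITERIA):
    """Roll back as soon as any pre-agreed criterion is breached."""
    return bool(breached(observed, criteria))
```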
Always make sure you can look at things from more than one point of view
Practice makes perfect. For a major change, always run some tests beforehand

Facebook Open Compute & Other Infrastructure

Jonathan Heiliger, Facebook

http://www.youtube.com/watch?v=urG0dQ7kc3w
Very good !!!
Growth of users was also matched by growth of innovation and speed of change
- This is very unusual; usually innovation slows down as companies grow
Rundown of Facebook's growth story
- HPHP (HipHop for PHP) greatly improved site performance
- power consumption became a big issue, which led to the decision to look at all parts involved
Facebook started building their own datacenter and servers
Conclusions:
- Make audacious bets and iterate quickly
- Smart and hungry beats large and capable every time
- Make it work
- Manage risk with hedges

Velocity Culture

Jon Jenkins, Amazon

http://www.youtube.com/watch?v=dxk8b9rSKOo
http://assets.en.oreilly.com/1/event/60/Velocity%20Culture%20Presentation.pdf
Web performance drives real value for the business
- Case studies from Bing, Google, Shopzilla and MSN show this
- Steve Souders did a lot for that
What about operations? How does ops provide value to the business?
What if the size of your server fleet could be totally flexible?
Case study 1: Downscaling
- weekly traffic patterns high and low
- at Amazon up to 39% of server capacity goes to waste
- in high-traffic months this can be up to 75%
- Since November 10, 2010 all amazon.com traffic has been served by EC2
  - Reduced spending on server capacity
  - Fleet scales dynamically in increments as small as a single host
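Where the wasted capacity comes from can be shown with a little arithmetic: a static fleet has to be sized for the peak hour, while a fleet that scales in single-host increments only pays for what each hour needs. A minimal sketch; the function names are mine and any traffic numbers fed into it are invented, so the 39% and 75% figures above are Amazon's, not output of this code.

```python
import math


def hosts_needed(load, capacity_per_host):
    """Smallest number of hosts that can serve `load` (single-host increments)."""
    return math.ceil(load / capacity_per_host)


def wasted_fraction(hourly_load, capacity_per_host):
    """Share of host-hours a peak-sized static fleet leaves idle,
    compared with a fleet that is resized every hour to what is needed.

    `hourly_load` must contain at least one hour.
    """
    per_hour = [hosts_needed(x, capacity_per_host) for x in hourly_load]
    static_host_hours = max(per_hour) * len(per_hour)
    dynamic_host_hours = sum(per_hour)
    return 1 - dynamic_host_hours / static_host_hours
```

For example, `wasted_fraction([50, 100], capacity_per_host=10)` needs 5 and 10 hosts, so a peak-sized fleet uses 20 host-hours where 15 would do: 25% waste.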
Case study 2: Continuous Deployment
- Mean time between deployments: 11.2sec
- 1079 deployments per hour maximum rate in May 2011
- Deployments roll through server groups
  - Problems: Complex workflow, slow, error scenarios very complex to handle
- Solution: If capacity is unlimited then one could simply spawn a new set of server groups
- More and more deployments use this method
  - 75% reduction in outages triggered by software deployments since 2006
  - 90% reduction in outage minutes triggered by software deployments
  - instantaneous automated rollback (switch LB back to old server group)
  - Reduction in complexity, no in-place upgrades on servers, just make new servers
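The rollback described above boils down to keeping the old server group around and moving the load balancer pointer back. A toy model with hypothetical class and group names; a real implementation would launch and health-check the new group before switching any traffic to it.

```python
class LoadBalancer:
    """Toy model of deploying by spawning a new server group (hypothetical).

    Each release gets a fresh server group; deploying means pointing the
    load balancer at the new group, rolling back means pointing it back.
    Old groups are left untouched, so nothing is upgraded in place.
    """

    def __init__(self, initial_group):
        self.history = [initial_group]  # groups that have been live, oldest first

    @property
    def live(self):
        """The server group currently receiving traffic."""
        return self.history[-1]

    def deploy(self, new_group):
        # Assumes `new_group` is already running and healthy.
        self.history.append(new_group)

    def rollback(self):
        """Switch traffic back to the previous server group."""
        if len(self.history) < 2:
            raise RuntimeError("no previous server group to roll back to")
        self.history.pop()
        return self.live
```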
The Challenge for Velocity 2012
- save millions of dollars by optimizing server utilization
- become faster and more available by using flexible server capacity
- Please come back in 2012 and tell your story of how ops managed to contribute business value

Artur on SSD

Artur Bergman, fastly.com

http://www.youtube.com/watch?v=H7PJ1oeEyGg
Mac Laptop boot time: 13 seconds
If you don’t use SSDs, you waste your life
fastly uses only (or mostly) SSDs in their data center
Show this to the boss to get an SSD :-)

Cisco and OpenStack

Lew Tucker, Cisco

http://www.youtube.com/watch?v=kWP9VE4K8cU
http://assets.en.oreilly.com/1/event/60/Cisco%20and%20Open%20Stack%20Presentation.ppt
Modern web pioneers are DIY "builders"; you need to build your own because you can’t buy it
Enterprise: scale-up architecture, HA failover model, commercial
Web: scale-out architecture, designed for failure, open source
OpenStack

State of the Infrastructure

Rachel Chalmers, The 451 Group

http://www.youtube.com/watch?v=EbmuxSeVnpY
http://assets.en.oreilly.com/1/event/60/State%20of%20the%20Infrastructure%20%20Presentation.zip
Tells many anecdotes and details about IT innovators and what they learned
VMware is probably the last company to go from $0 to $1 billion through a purely proprietary licensing model
Modern infrastructure is open source
We are already at the brink of a post-Windows world
!!! nice

Holistic Performance

John Resig

http://www.youtube.com/watch?v=WuMEQN7aph0
About jQuery
Client-side JavaScript performance issues
- Analyzing performance is not trivial, e.g. is wall-clock time relevant? Or CPU consumption?
- Memory consumption, what about memory leaks?
- Parse time: the more you download, the more there is to parse
- Battery consumption (Mobile!)
Example: Dictionary Lookups in JavaScript
- Most solutions optimize for file (download) size
- which results in bad parse time
- A succinct trie is best in file size, memory consumption and lookup performance
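Why a trie suits dictionary lookups: words that share a prefix share the path from the root, and a lookup touches only as many nodes as the word has characters. The sketch below is a plain pointer-based trie in Python, not the succinct trie from the talk, which stores the same tree as a compact bit string that can be queried directly instead of first being parsed into objects. The names are mine.

```python
class TrieNode:
    __slots__ = ("children", "is_word")

    def __init__(self):
        self.children = {}    # character -> TrieNode
        self.is_word = False  # True if a word ends at this node


def build_trie(words):
    """Insert every word; shared prefixes reuse existing nodes."""
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            child = node.children.get(ch)
            if child is None:
                child = node.children[ch] = TrieNode()
            node = child
        node.is_word = True
    return root


def contains(root, word):
    """Walk one node per character; a missing edge means 'not a word'."""
    node = root
    for ch in word:
        node = node.children.get(ch)
        if node is None:
            return False
    return node.is_word
```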
dynaTrace - useful tool to dig into the details
jQuery project
- Bug reports need a reproducible test case
- Performance enhancements need to be proven through http://jsperf.com

Lightning Demos

http://www.youtube.com/watch?v=3PPd1kXWb-Q

Page Speed

Michael Schneider, Google

New work on Page Speed
Page Speed Firefox add-on
Now also for Chrome
Page Speed is a tab in the Web Inspector
Page Speed is a tool to analyze page load times and suggest optimizations
http://pagespeed.googlelabs.com online version of Page Speed
- get mobile report to analyze page load timings for mobile devices
Experimental hints about avoiding unnecessary reflows

dynaTrace

Andreas Grabner, Dynatrace

http://ajax.dynatrace.com
What is new in dynaTrace AJAX Edition 3
Compare IE and FF performance side-by-side
speed of the Web: new service, compare your own website against Alexa 1000
slides and other useful info at http://blog.dynatrace.com

Chrome Developer Tools

Paul Irish, Google Chrome relations team

New things
Task manager: right-clicking a task shows many internals and details, e.g. Number of Goats Teleported
JavaScript Performance APIs:
- performance.timing
- performance.memory (need --enable-memory-info command line option)
- window.onerror
- console.profile() and console.profiles[] - CPU profiling also as a JS object. Can be sent back to the web server for analysis
- console.markTimeline() - set markers that show up in the Timeline to help group JS actions
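performance.timing exposes the Navigation Timing timestamps (milliseconds since the epoch), so a page load can be split into phases. A minimal server-side sketch, assuming the page posts its performance.timing values back as a JSON object; the beacon format and the phase names are my own.

```python
def timing_breakdown(t):
    """Split one Navigation Timing record into phases (milliseconds).

    `t` is assumed to be the client's performance.timing values,
    posted back to the server as a dict (hypothetical beacon format).
    """
    return {
        "dns": t["domainLookupEnd"] - t["domainLookupStart"],
        "connect": t["connectEnd"] - t["connectStart"],
        # Time from sending the request until the first response byte.
        "ttfb": t["responseStart"] - t["requestStart"],
        "download": t["responseEnd"] - t["responseStart"],
        "dom_ready": t["domContentLoadedEventStart"] - t["navigationStart"],
        "page_load": t["loadEventEnd"] - t["navigationStart"],
    }
```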
Heap Profiler
- dig into memory consumption
- snapshot diffs between different states
- find memory leaks
Remote Debugging
- --remote-debugging-port # command line option
- Developer Tools run a little web server
- allows remote analysis
- This is part of WebKit and should soon be available for all WebKit browsers

showslow.com

Sergey Chernyshev, showslow

collects performance data from various services and shows it
dashboard-like overview and drill-down into details
helps create a business case for performance optimizations

Cast - The Open Deployment Platform

Paul Querna, Rackspace

Deployment as a RESTful API
Service Management
- Start, Stop, Restart
Version Management
- Distribution of release
- Upgrade
- Rollback
Service Monitoring
- Logfiles
- Network Ports
- Processes
Service Coordination
- ?
Open Source
http://cast-project.org

Making the Web instant

Arvind Jain & Sreeram Ramachandran, Google

Still, most pages take 5 seconds to load
How to make it instant?
- We humans are not as fast as computers
- It takes about 300ms between onMouseOver and onClick
- This time can be used to optimize loading by prefetching the content
Google search with Google Instant Pages
- Predict & preload
- Guess what the user will click and load it while the user is still deciding what to click next
- Works only on Chrome so far
- Chrome loads the target in a hidden frame and swaps it in
Instant everywhere
- Chrome supports preloading pages when typing into the address bar
Everybody can use it; web page authors usually know best which page is likely to come next
- Instruct the browser that this is the likely next page
Beware:
- This creates more load on the client and on the server!
- Accounting (ads, analytics) gets more difficult
  - don’t want to count hidden pages that the user never saw
  - Google submitted a proposal to the W3C for a Page Visibility API to determine whether a page is actually visible to the user or still hidden
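The accounting problem can be handled by counting a page only once it has actually been visible. A hypothetical sketch: it assumes an analytics beacon that reports (page_id, visibility_state) pairs on load and on every visibility change, with state values modeled on the Page Visibility proposal ("visible", "hidden", "prerender").

```python
def count_page_views(events):
    """Count page views, ignoring prerendered pages the user never saw.

    `events` is a sequence of (page_id, visibility_state) tuples from a
    hypothetical analytics beacon. A page counts once, and only if it
    was reported as "visible" at least once.
    """
    seen = set()
    for page_id, state in events:
        if state == "visible":
            seen.add(page_id)
    return len(seen)
```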
Benefit: Better and faster internet browsing experience

Wikia: Going Active/Active

Jason Cook, Wikia

Active/Active means read everywhere, write in the master data center
Wikia built on top of MediaWiki
Story of Wikia with typical startup problems
What about earthquakes? Time To Recover?
FULL DR Site
- In a nuclear bunker
- In the middle of nowhere in Iowa
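The read-everywhere, write-to-master split described above can be expressed as a simple routing rule. A minimal sketch that assumes the HTTP method tells reads from writes and uses made-up data center names; the talk did not say how Wikia actually routes requests.

```python
MASTER_DC = "dc-master"  # hypothetical name of the master data center

READ_METHODS = ("GET", "HEAD", "OPTIONS")  # assumed to never write


def route(request_method, local_dc, master_dc=MASTER_DC):
    """Pick the data center that should handle a request.

    Reads are served by the data center closest to the user; anything
    that may write goes to the single master data center.
    """
    if request_method.upper() in READ_METHODS:
        return local_dc
    return master_dc
```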

Automation for Success: Production begins in Development

Lee Thompson, CTO Travel/Transportation, HP
Damon Edwards, Co-founder DTO Solutions, DevOps Days organizer, DevOps Cafe

http://www.slideshare.net/dev2ops/velocity-2011-production-begins-in-development
Very good, especially if you believe that Chef and Puppet are not the end of innovation !!!
Webtone
- Clouds
- DevOps
- Continuous Deployment/Delivery
- Lean Startup
How to measure DevOps success
- Alignment - how well do different parts of the organization work together
- Quality - of processes and deliveries
- Cycle Time
Risk tolerance
- How much change do you want, how much risk can you tolerate?
- "Move fast and break things. Unless you are breaking stuff, you are not moving fast enough." - Mark Zuckerberg
Webtone utilities
- Reliable
- Repeatable
- Scalable
It all starts in Development!
- But what do we tell them to do?
- and how do we get them to do it?
Share ownership of availability
- Developers must wear pagers (on-call)
- Incident command training so everyone knows their roles
- Notification mechanism?
- Access provisioning (emergency access for people who usually don’t have it)?
Non-functional requirements are first class citizens
Strive for parity between dev & prod
- should really be the same
- test data fixtures for all environments
- implement mock services for major infrastructure pieces for developers (usually Ops needs to help with this), typically authentication systems.
- Continuous integration means integrate early
- Use all the deployment, config and packaging tools in dev
Push config management discipline back to Dev
- Dev is about creating variation, Ops is about eliminating variation
- Augment deployment toolchain to support the variation
- Do developers use the tools?
- Accept config contributions and patches from dev
Packaging … it’s not just for the OS
- high-performing web operations organizations need to take change management seriously
- Strict versioning
- It’s about being idempotent
- Transfer packaging responsibility to dev
- Define the packaging constructs you will support
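Strict versioning and idempotency go together: an install step that is told an exact version can check the current state and do nothing if it is already there. A minimal sketch with an in-memory dict standing in for the package database; the X.Y.Z version scheme and the function names are assumptions.

```python
import re

VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)")  # assumed strict X.Y.Z scheme


def parse_version(text):
    """Turn 'X.Y.Z' into an integer tuple; reject anything else."""
    match = VERSION_RE.fullmatch(text)
    if not match:
        raise ValueError(f"not a strict X.Y.Z version: {text!r}")
    return tuple(int(part) for part in match.groups())


def install(installed, name, version):
    """Idempotently bring package `name` to exactly `version`.

    `installed` maps package names to version tuples (a stand-in for the
    real package database). Returns the action taken; running it twice
    with the same arguments changes nothing the second time.
    """
    wanted = parse_version(version)
    current = installed.get(name)
    if current == wanted:
        return "unchanged"
    installed[name] = wanted
    if current is None:
        return "installed"
    return "upgraded" if wanted > current else "downgraded"
```

Parsing versions into integer tuples is what makes 1.10.0 compare as newer than 1.2.0; a plain string comparison would get that wrong.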
Config is code
- if it’s code it needs to be managed like code
- The SDLC should be transparent and identical in both dev and ops
- Avoid or eliminate asymmetric release processes (config = software)
Tailor release artifacts to roles
- "Small teams make better software"
- One team stuck should not prevent other teams from releasing (org coupling)
- Large codebases suffer software entropy effects
- Build an infrastructure that can reliably manage lots of smaller artifacts
- Org conflict is a good time to suggest breaking up a codebase into separate concerns
Standard management vocabulary
- Consistent and expected management behaviour
- Across components and releases
- "start, stop, status, update, install …"
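A standard vocabulary can be enforced as a common interface that every component implements. A hypothetical Python sketch using the verbs listed above; the class names and the default update (stop, install, start) are my own assumptions.

```python
from abc import ABC, abstractmethod


class ManagedComponent(ABC):
    """One management vocabulary for every component (hypothetical interface).

    Ops can drive any component through the same verbs without knowing
    how it is implemented.
    """

    @abstractmethod
    def install(self, version): ...

    @abstractmethod
    def start(self): ...

    @abstractmethod
    def stop(self): ...

    @abstractmethod
    def status(self): ...

    def update(self, version):
        # Default update: stop, install the new version, start again.
        self.stop()
        self.install(version)
        self.start()


class DemoService(ManagedComponent):
    """In-memory stand-in used to show the verbs in action."""

    def __init__(self):
        self.version = None
        self.running = False

    def install(self, version):
        self.version = version

    def start(self):
        self.running = True

    def stop(self):
        self.running = False

    def status(self):
        return {"version": self.version, "running": self.running}
```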
Rollback
- Rollback that works
- Tested and proven
- Test rollback for each release
Standard metrics abstractions
- Dev surface metrics to Ops
- Use a standard framework
- https://github.com/codahale/metrics
- Use standard types (gauge, counter, timer …)
- Ops knows what to expect and how to visualize
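The standard types are small. A minimal Python sketch of a counter, a gauge and a timer, loosely following the concepts of Coda Hale's Java Metrics library; it is not that library's API, and the class and method names are assumptions.

```python
import time
from contextlib import contextmanager


class Counter:
    """A number that is explicitly incremented or decremented."""

    def __init__(self):
        self.count = 0

    def inc(self, n=1):
        self.count += n

    def dec(self, n=1):
        self.count -= n


class Gauge:
    """Reports an instantaneous value by calling a function on demand."""

    def __init__(self, read_value):
        self.read_value = read_value

    @property
    def value(self):
        return self.read_value()


class Timer:
    """Records durations (in seconds) of a repeated operation."""

    def __init__(self):
        self.samples = []

    @contextmanager
    def measure(self):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.samples.append(time.perf_counter() - start)

    @property
    def count(self):
        return len(self.samples)

    @property
    def mean(self):
        return sum(self.samples) / len(self.samples) if self.samples else 0.0
```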
Push test ownership to the edges
- QA = Quality Assurance
- QA writing tests = bottleneck and avoiding responsibility
- Test Driven Development
- Test Driven Operations (yes, you too!)
- Bottom line: Everyone owns quality
Test outside of the box
- Crowd test, A/B test
- Simulation
Continuous Delivery
- Delivery Pipelines
- Continuous Deployment
- Don’t be too dogmatic, a hybrid model is also good

DevOps Metrics: Measuring the devops gap

Patrick Debois, Andrew Shafer

http://assets.en.oreilly.com/1/event/60/Measuring%20the%20devops%20gap%20Presentation.pdf
Very good presentation on how to get going with DevOps


Schlomo Schapiro