Stealthy shipping with atomic deploys

Every person sees each page of Pinterest differently, based on factors such as their interests and who they follow. Each pageload is computed anew by our back-end servers, with JavaScript and XHR powering client-side rendering for interactive content and infinite scrolling. This process provides Pinners with unique experiences, but at the scale of our operations, it can create friction.

Here I’ll discuss the atomic deploy system, a solution to some of the challenges that occur during deployment, and a path to successful and ongoing deployments.

Introducing the atomic deploy system

When a Pinner first visits the site, the backend server instructs the web browser to load a particular JavaScript bundle, and ensures the bundle matches the version of the backend software running on that server. So far, everything’s in lock-step, but when we deploy an update, we create a problem for ourselves.

We deploy software at Pinterest in waves, taking 10% of our servers offline in a batch, replacing the software, and putting them back into service. This allows us to have continuous service during our deployments, but it also means we’re running a mixed fleet of old- and new-version servers.
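To make the mixed fleet concrete, here is a minimal sketch of a wave-based rollout. The helper `rolling_waves`, the host names, and the ten-host fleet are hypothetical and chosen only for illustration; this is not the real deploy tooling.

```python
import math


def rolling_waves(hosts, batch_percent=10):
    """Yield (batch, upgraded_so_far) for each wave of a rolling deploy.

    Hypothetical helper: each wave takes `batch_percent` of the fleet out
    of service, replaces the software, and puts the hosts back in service.
    """
    batch_size = max(1, math.ceil(len(hosts) * batch_percent / 100))
    for start in range(0, len(hosts), batch_size):
        batch = hosts[start:start + batch_size]
        yield batch, start + len(batch)


fleet = [f"web-{i:02d}" for i in range(10)]  # made-up host names
for wave, (batch, done) in enumerate(rolling_waves(fleet), start=1):
    share_new = done / len(fleet)
    # Until the last wave finishes, both versions are answering requests.
    print(f"wave {wave}: upgraded {batch}, {share_new:.0%} of fleet on new version")
```

Every wave except the last leaves part of the fleet on the old version, which is the mixed state described above.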

The atomic deploy system is our answer, which grew out of our desire to balance rapid innovation with a consistent, seamless user experience. We aim to develop our technology as quickly as possible, so this system was designed to avoid the intricate dance of backward-compatible updates. At the same time, we wanted to avoid the jarring experience of a page reloading by itself (potentially even losing the user’s context), and we didn’t want to make the user click on something to get the site to reload.

We hadn’t heard of other cases of doing deploys this way, which meant it could be a terrible idea, or a great one. We set out to find out for ourselves.

Managing “flip flops”

Say, for example, you visit the website on a Monday morning. You’re likely to view a page generated by one version of our front-end software, version “A”. If you hit reload, you’ll again get pages generated by version “A”. Furthermore, we use XHR, so when you interact with the web app, dozens of requests are served in the background. All of these requests are powered by version “A.”

Later, we might wish to deploy version “B”. Our standard model for deployments is to roll through our fleet 10-15% at a time, slowly converting web servers from serving “A” to serving “B”.

Now a request has a chance of being served by “A” or “B” with no guarantees. With standard page-reloads this is not a problem, but much of Pinterest is XHR-based, meaning only part of a page will reload when a link is clicked. Our web framework can detect when it’s expecting a certain version and it gets something unexpected, in which case it’ll often force a reload.

For example, if you go to www.pinterest.com and the page is served by “A”, then you click a Pin and the XHR is served by “B”, you’ll get a page reload. At that point you might click on another Pin, which might be served by a mismatched version, causing another reload. In fact, you can’t escape the chance of reloads until the deploy is complete; we call this the “flip-flopping effect”. Rather than browsing Pinterest smoothly with nice clean interactions, you get a number of full-page reloads.
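The flip-flopping effect can be illustrated with a small simulation. This is a deliberately simplified model, not a measurement: it assumes each XHR lands on a new-version server with a fixed probability, and that after a forced reload the page adopts the version of the server that caused the mismatch. The function name `count_reloads` and its parameters are hypothetical.

```python
import random


def count_reloads(num_requests, fraction_on_new, seed=0):
    """Count forced reloads for one browsing session during a rolling deploy.

    Simplified model (an assumption): every XHR is served by version "B"
    with probability `fraction_on_new`, otherwise by version "A". The
    client reloads whenever the response version differs from the version
    its page was built with.
    """
    rng = random.Random(seed)
    page_version = "A"  # the session starts on the old version
    reloads = 0
    for _ in range(num_requests):
        served_by = "B" if rng.random() < fraction_on_new else "A"
        if served_by != page_version:
            # Version mismatch: the framework forces a full-page reload.
            # In this model the reloaded page adopts the mismatched version.
            page_version = served_by
            reloads += 1
    return reloads


for fraction in (0.0, 0.1, 0.5, 0.9, 1.0):
    print(f"{fraction:.0%} of fleet on B: {count_reloads(50, fraction)} reloads")
```

In this model a session reloads at most once when the whole fleet is on a single version, and keeps reloading for as long as the fleet is mixed.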

Our architecture changes

When you visit the site, you talk to a load balancer, which chooses a Varnish front-end, which in turn talks to our web front-ends. Each web front-end used to run nine Python processes, and every process on a given web front-end served the exact same version.

What we really wanted was a system that would force a user to reload at most once during a deploy. The best way to guarantee this was to ensure atomicity, meaning if we’re running version “A” and we’re deploying “B”, we flip a switch to version “B” and all users are on “B.”

We decided the best way to achieve this was to support serving two different versions of Pinterest and have Varnish intelligently decide which version to use. We moved to beefier web front-end boxes (from c1.xlarges to c3.8xlarges), which could not only handle more load but also easily run 64 Python processes, half serving the current version of Pinterest and the other half serving the previous one. The new and old versions sat behind nginx, with a unique port for each version of the site being served. For example, on one host port 8000 might serve version “A” and port 8001 might serve version “B”.

Varnish will happily serve either the current version of the site or the previous version if you specify which you want (presumably you wouldn’t, but our JavaScript framework would). If you make a request without specifying a version you’ll get the current version of the site. Varnish will route to the right host/port which happens to serve the desired version.
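The routing decision can be sketched as follows. This is plain Python for illustration rather than Varnish configuration, and the class name `VersionRouter`, its fields, and the fallback for an unknown version are assumptions; only the example ports 8000 and 8001 come from the text above.

```python
from dataclasses import dataclass


@dataclass
class VersionRouter:
    """Toy stand-in for the routing decision described above.

    `ports` maps each live version to the nginx port serving it, and
    `current` is the version used when a request names no version.
    """
    ports: dict
    current: str

    def route(self, requested=None):
        """Return (version, port) for a request."""
        if requested in self.ports:
            return requested, self.ports[requested]
        # No version given, or a version that is no longer served
        # (assumed behavior): fall back to the current version.
        return self.current, self.ports[self.current]


router = VersionRouter(ports={"A": 8000, "B": 8001}, current="B")
print(router.route())     # request without a version
print(router.route("A"))  # page that was built by the previous version
print(router.route("Z"))  # version that is not being served
```

Because the version travels with every request, a page built by the previous version keeps talking to the previous version until that version is retired.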

Coordination and deploys

In order to tell Varnish what it should do, we developed a series of barriers, which tell Varnish which version to serve and when. Additionally, we created “server sets” in ZooKeeper that let Varnish know which upstream nginx instances are serving.

Let’s imagine a steady state where “A” is our previous version and “B” is our current version. Users can reach either version “A” or “B”, and within a page load, they will always stay on either “A” or “B” and not switch unless they reload their browser. If they reload their browser, they will get version “B”.

If we decide to roll out version “C”, we do the following:

1. Through ZooKeeper, we tell Varnish to no longer serve version “A”.
2. Varnish responds when it’s no longer serving version “A”.
3. We roll through our web fleet, uninstalling “A” and installing “C” in its place.
4. When every web server has “C” available, we let Varnish know that it’s OK to serve it.
5. Varnish responds when all the Varnish nodes can serve “C”.
6. We switch the default version from “B” to “C”.

With these barriers, people who were on “A” are not forced onto “B” until the second step. At step 6 we allow new users to be on “C” by default, and users who were on “B” stay on “B” until the next deploy.
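The sequence above can be sketched as a toy state machine. Everything here is hypothetical: the class `AtomicDeploy`, its method names, and the host names are illustrative, and the barrier is a placeholder for what would really be a wait on ZooKeeper state.

```python
class AtomicDeploy:
    """Toy model of the six-step rollout; not the real coordination code.

    `serving` is the set of versions Varnish may route to, and `current`
    is the default version for requests that do not name one.
    """

    def __init__(self, previous, current):
        self.previous = previous
        self.current = current
        self.serving = {previous, current}
        self.log = []

    def _barrier(self, message, varnish_nodes):
        # Placeholder: a real system would block here until every
        # Varnish node has acknowledged the change.
        self.log.append(f"{len(varnish_nodes)} Varnish nodes confirm: {message}")

    def roll_out(self, new, web_hosts, varnish_nodes):
        retired = self.previous
        # Steps 1-2: stop serving the previous version, then wait for Varnish.
        self.serving.discard(retired)
        self._barrier(f"no longer serving {retired}", varnish_nodes)
        # Step 3: replace the retired version on every web host.
        for host in web_hosts:
            self.log.append(f"{host}: replaced {retired} with {new}")
        # Steps 4-5: the new version becomes servable, then wait for Varnish.
        self.serving.add(new)
        self._barrier(f"able to serve {new}", varnish_nodes)
        # Step 6: flip the default in a single step.
        self.previous, self.current = self.current, new
        self.log.append(f"default switched to {new}")


deploy = AtomicDeploy(previous="A", current="B")
deploy.roll_out("C", web_hosts=["web-1", "web-2"], varnish_nodes=["varnish-1"])
print(sorted(deploy.serving), deploy.current)
```

At no point in this sequence is more than two versions servable, and the default changes only in the final step, which is what makes the switch atomic from the user's point of view.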

A look at the findings

The absolute values are redacted, but you can see the relative effect. Note the dips correspond with weekends, which is when we tend not to deploy our web app. In mid-April, we switched completely to the new atomic deploy system.

We found that the new atomic deployments reduced annoyances for Pinners and contributed to an overall improved user experience. This ultimately means that deploys are stealthier, and we can reasonably do more deploys throughout the day or as the business might require.

Nick Taylor is a software engineer at Pinterest.

Acknowledgements: Jeremy Stanley and Dave Dash, whose contributions helped make this technology a reality.
