
A look behind search guides


We launched Guided Search last year to give Pinners an exploratory search where they can discover the best ideas by clicking different guides to filter results. We’re continuing to make updates, such as recent improvements to show more personalized results based on who’s searching and building a smarter platform to understand queries. Today, searches derived from guide clicks are one of the major sources of our search traffic. In fact, the number of guides clicked per day has tripled in the last six months, and we’re seeing that momentum continue. Guides change based on engagement, so the more people search and Pin, the better the experience gets. In this post, you’ll learn how we create and rank guides, as well as gain insights into trends around how Guided Search is being used for discovery every day.

Who’s clicking guides?

On average, a Pinner clicks 3.6 guides daily when they use Guided Search. As for geography, Pinners outside the U.S. click guides more often than American Pinners. For example, Pinners in Argentina, Australia, Brazil, Canada, France, Germany, Italy, Japan, Mexico, Netherlands, Philippines and the U.K. are more likely to click a search guide than those in the U.S. Among these countries, Mexico has the highest guide click rate, with Pinners there 46 percent more likely to click a guide than those in the U.S.

The topic of the search also plays a role in whether or not a guide is clicked. For instance, Pinners who search for topics related to Celebrities, Fitness, Health, Home Decor, Humor, Men’s Fashion, Photography or Women’s Fashion are more likely to click a guide than those who search Gardening or History. Fitness-related searches are 56 percent more likely to see guide clicking than Gardening-related searches.

In general, men are more likely to click guides than women. We found men click guides most often when searching the topics of Art, Cars, Fitness, Health, Men’s Fashion, Outdoors and Shopping. Women Pinners tend to click guides when searching for topics like Food and Drink, Home Decor and Technology.

When and where are guides being clicked?

Guided Search launched on mobile first and was designed with a small screen in mind, optimized for tapping instead of typing, so it’s no surprise guides are clicked more often on mobile than web. iPhone has the highest guide click rate, followed by Android phones and iPad. In fact, iPhone users are 50 percent more likely to click a guide than those on desktop.

Additionally, Pinners are more likely to click guides during weekends than weekdays.

How are search guides created?

When we first launched Guided Search, we started from organic search logs. Since Pinners typically refine their search queries by adding or changing words, we wanted to find a way to extract these refinements from search queries.

We built a model to collect search queries into a corpus. Queries are processed with a TF/IDF algorithm to surface the most distinctive ones. Then, an entity extraction model is applied to the query corpus, and each query is partitioned into an entity and guides. For example, the query “Red Nike Shoes” is extracted as the entity “Shoes” and the guides “Red” and “Nike.” We built a system to group synonyms, detect typos and remove porn or spammy terms. Both guides and entities are processed in the system to avoid showing inappropriate words.
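To make the partitioning step concrete, here’s a minimal sketch in Python. The KNOWN_ENTITIES set stands in for the output of the entity extraction model, and the names are illustrative rather than the production code:

# Illustrative sketch: split a query into an entity and candidate guides.
KNOWN_ENTITIES = {"shoes", "recipe", "dress"}  # stand-in for the entity extraction model

def partition_query(query):
    """Return (entity, guides) for a query like 'Red Nike Shoes'."""
    entity = None
    guides = []
    for token in query.lower().split():
        if entity is None and token in KNOWN_ENTITIES:
            entity = token
        else:
            guides.append(token)
    return entity, guides

print(partition_query("Red Nike Shoes"))  # ('shoes', ['red', 'nike'])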

Search guides generated in this way cover 49 percent of Pinterest search traffic, and most of the search queries that show guides are popular queries. That’s good coverage, but we wanted to push the boundary further, so in September we started looking for a better way to generate guides for long-tail queries. If you draw the search pipeline vertically, our first approach is top-down: guides are generated from user queries at the top. Instead, we challenged ourselves to solve the problem from the bottom up.

Our second approach generates guides from search result Pins. On Pinterest, each Pin is hand-picked by a Pinner and saved to a board, so the meta information associated with Pins is of high quality. We tag Pins with annotations generated from this meta information, including the containing boards’ data, Pin descriptions, interest categories, linked third-party web page text and the meta information of similar-looking Pins. Figure 3 shows one Pin from the results of the search query “Outdoor Living” and the annotations generated for it. We aggregate the annotations of search result Pins into an annotation corpus for each search query we see, and we built a system to partition the Pin annotations of each query so that each partition represents a meaningful subset of the query’s search results. For example, the query “Outdoor Living” has guides generated from the annotations “DIY,” “Decks,” “Patios,” “Ideas,” etc. Guides generated this way guarantee the composed query (e.g. “Outdoor Living DIY”) has enough high quality search results. Figure 4 depicts the architecture of guide generation.
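As a rough illustration of the bottom-up aggregation (a sketch under assumed data shapes, not the actual pipeline), you can think of it as counting how often each annotation appears across a query’s result Pins and keeping the frequent ones as guide candidates:

from collections import Counter

def guide_candidates(result_pin_annotations, min_support=0.05, top_n=10):
    """result_pin_annotations: one list of annotations per result Pin."""
    counts = Counter(a for annotations in result_pin_annotations for a in annotations)
    num_pins = max(len(result_pin_annotations), 1)
    # Keep annotations that cover a meaningful share of the query's results,
    # so the composed query has enough high quality results behind it.
    return [a for a, c in counts.most_common(top_n) if c / num_pins >= min_support]

# For "Outdoor Living" this might yield ["diy", "decks", "patios", "ideas"].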

After its launch, search guide coverage improved to 73 percent (as shown in Figure 5). Each search guide is associated with a cover image based on a Pin from the composed query’s search results. For example, the cover image of the guide “Chicken” for the query “Recipe” is a Pin in the search results of “Chicken Recipe.” We use “the most interesting” Pin as the cover image, selected based on several factors including the number of times a Pin is searched for, pinned and clicked on, and its color tone. Each guide is associated with a list of cover image candidates which we deduplicate for each query dynamically, so Pinners are less likely to see the same image used for different guides of a query.

We store the metadata of queries and guides in QueryJoin to serve other features such as search query expansion and rewriting.

How are search guides ranked?

We’ve come a long way in the ranking of search guides. Today we have a sophisticated scoring and ranking system to order guides, and the ranking of guides for each query is calculated based on the following scores (a simplified combination is sketched after the list):

  • Interest in guides. How often do Pinners click each guide of a query? The more interest Pinners show in a guide, the higher the guide’s rank.
  • Quality of the results of composed queries. How confident are we in the search results after Pinners click a guide? The confidence score is calculated based on how Pinners click the result Pins and how often they add them to their boards. The score also considers the quality of the third-party web pages that the result Pins link to. In other words, the more Pinners like the search results, the higher the guide’s rank.
  • Location. We started ranking guides by localization last December, and have since seen a 5-10 percent increase in guide clicks in many treatment countries, which shows us the search guides are more relevant and useful to users across the globe. As part of the localization effort, we also calculate guide location scores based on how much interest Pinners of various countries show in each guide. Figure 6 shows guides ranked differently for users in the U.S. and U.K. For example, Pinners in the U.K. are more interested in “London Street Styles” than “Parisian Street Styles.”
  • Gender. In general, male Pinners have different interests in guides than females, and so we rank guides differently based on what’s trending for each group. Gender scores are orthogonal to location scores in ranking. For example, male users in Mexico see guides ranked specifically for their demographic.
  • Current trend. We built a time-sensitive scoring function to detect current trends in Pinners’ interest in guides. This function applies a recency boost to guides that have momentum in ranking. If a large number of Pinners are interested in a guide in a short amount of time, the guide becomes popular and can be boosted to a higher rank for days. Once it loses momentum, meaning fewer people are engaging with it, the function quickly demotes the guide.
  • Spam detection. We detect spammy Pins, search queries and users, which are removed from guides ranking.
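For illustration only, the per-guide scores above might be combined with something as simple as a weighted sum; the weights and field names below are hypothetical, not our production ranking function:

# Hypothetical weighted combination of the per-guide scores described above.
WEIGHTS = {
    "interest": 0.35,        # how often Pinners click this guide for the query
    "result_quality": 0.30,  # confidence in the composed query's results
    "location": 0.15,
    "gender": 0.10,
    "trend": 0.10,           # recency boost while the guide has momentum
}

def rank_guides(guides):
    """guides: list of dicts with per-score values and an 'is_spam' flag."""
    clean = [g for g in guides if not g.get("is_spam")]  # spam detection first
    return sorted(
        clean,
        key=lambda g: sum(WEIGHTS[k] * g.get(k, 0.0) for k in WEIGHTS),
        reverse=True,
    )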

Since its launch, Guided Search has become an important driver of Pinterest search traffic and Pinner engagement. The number of daily searches on Pinterest has also greatly increased, with a 25 percent uptick in searches per person. We’ll continue to make updates to Guided Search to make guides and results more personal and localized, with increasingly higher quality. Check back for more on these updates throughout the year.

If you’re interested in building search and discovery products like Guided Search, join us!

Kevin Ma is a software engineer at Pinterest on the Discovery team.

Acknowledgements: This technology was built in collaboration with Rui Jiang, Yuliang Yin, Pei Yin, Alex Bao, Yuan Wei and Ningning Hu. People across the whole company helped launch the features with their insights and feedbacks.

For Pinterest engineering news and updates, follow our engineering Pinterest, Facebook and Twitter. Interested in joining the team? Check out our Careers site.


How we made JavaScript testing 15x faster


Testing is an important pillar in our engineering infrastructure. We have hundreds of A/B experiments running at any given time. To keep these experiments running smoothly, it’s critical to have numerous tests running as part of the build.

Unfortunately, our JavaScript test framework was beginning to creak under the strain of hundreds of test files and dozens of simultaneous experiments. It was slow, taking 15 minutes to run the full suite of tests, and often broke due to experiments changing behavior and network/browser issues. As a result, trust in the system degraded, and the tests were removed from our automated build processes until the system could be repaired.

This was an excellent opportunity to revisit our web testing framework, keep what was working, throw away what wasn’t and build a testing framework that would scale for us. The result is our new web testing framework we call Affogato (yes, named after the espresso and ice cream beverage, because automated test coverage is sweet!).


Making tests faster

We tried optimizing our JavaScript testing framework by running tests in multiple parallel headless browsers. But the full suite still took too long, and this setup led to unpredictable machine resource contention issues. Browsers are hungry beasts from a resource perspective.

This is where jsdom saved the day. It’s a Node.js implementation of the WHATWG DOM and HTML standards and isn’t concerned with rendering, painting and other tasks that make a browser CPU and memory hungry. Internal benchmarks showed remarkable 5-20x increases in speed for most of our tests, with DOM-heavy tests seeing the biggest improvements.

To take advantage of our build system, which leverages an arbitrary number of processor cores, we broke our suite of tests into small chunks for our test runner to consume. As the number of tests scales into the thousands, we can scale the number of cores the tests run on. While we anticipate eventually having to run the test suite on multiple machines, the speed improvements ensure that one beefy computer is fast enough for the foreseeable future.

Making tests reliable

Ensuring our tests were reliable and trustworthy required a multi-faceted approach. First, we avoided costly data lookups and network transfers by using fixtures, which are files containing JSON data that describe an object to be tested. The testing framework uses fixtures to create the appropriate mock objects. Fixtures make it easy to test various object states without manually writing the boilerplate code to instantiate mock objects.

Unfortunately, network requests cannot be eliminated completely from web tests without making the tests less powerful or expressive. If a developer wants to test some web code that makes a call to our server-side API, we want to facilitate this impulse, not discourage it. To eliminate test failures caused by network hiccups when making these server-side calls, we wrote an XHR recorder which listens to ajax requests and saves responses to files for playback later. To avoid having to refactor our web code to support this recorder, we patched the JavaScript XHR object directly. The recorder has the bonus side effect of reducing test runtimes by 30 percent on average.
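Our recorder patches the JavaScript XHR object directly, but the record/playback idea is language-agnostic. Here’s a minimal sketch of the same technique in Python, wrapping a hypothetical fetch_json helper: on the first run it hits the real API and saves the response to disk; on later runs it replays the saved file and never touches the network.

import json, os

def recording_fetch(fetch_json, cassette_dir):
    """Wrap a network call with record/playback semantics."""
    def wrapper(url):
        path = os.path.join(cassette_dir, url.replace("/", "_") + ".json")
        if os.path.exists(path):               # playback: no network needed
            with open(path) as f:
                return json.load(f)
        response = fetch_json(url)             # record: hit the real API once
        os.makedirs(cassette_dir, exist_ok=True)
        with open(path, "w") as f:
            json.dump(response, f)
        return response
    return wrapper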

Making tests pleasant to write

With multiple experiments always running, a single experiment can change an arbitrary number of code paths in both the JavaScript and our API. This means experiments need to be a top-level concept in our testing framework and that setting up the “experiment environment” for a given unit test should be as simple as possible. We wrapped the Mocha framework with some rich syntactic sugar that made it easy to target an arbitrary number of experiments as part of a given test. Using Sinon.JS and its sandboxing, the global environment is automatically cleaned up after every test is finished. An ES6 promise polyfill was used to make writing asynchronous tests simpler.

Finally, for a little bit of whimsy, we went full bore on the coffee and cream metaphors, so writing a test begins with typing “cream.sugar(…)”. We figured no one could object to working with something so sweet.

Tying it all together

The overall test suite runs an order of magnitude faster than before, down to one minute instead of 15. In the three months since we started using the framework internally, there haven’t been any reported incidents of a test failure caused by a flaky test. Its ability to test experiments has put the nail in the coffin of several recurring experiment-related bugs in difficult-to-understand code. Additionally, internal feedback on the new framework has been positive.

The performance and reliability improvements allow us to run the tests all the time. Every time an engineer saves a web file, the relevant tests are run immediately and fire an alert if a test fails. When a pull request (PR) is submitted to our web repository, we run the tests using Jenkins and deny the PR if there is a failure. This fast feedback makes debugging a test much easier and faster. Time spent debugging test failures has plummeted, trust in our tests has improved and tests are being regularly written again.

David West is a software engineer on the Web team.

Acknowledgements: The leap forward in testing at Pinterest was made possible with significant contributions from Kelei Xu, Jeremy Stanley and the Web team.


Discover Pinterest: Search and Discovery


As we continue to focus on making search improvements and building a discovery engine, we recently invited members of the local search communities to Pinterest for a Discover Pinterest event. Hugh Williams joined a few Pinterest engineers to keynote the event and share insights he’s learned from over two decades in the field. Hui Xu, Charles Gordon, Rui Jiang and Kevin Jing from our discovery team shared some of the work they’ve been focused on over the past year and how it’s improving the Pinner experience. You can find a recap of the speakers and their presentations below or by checking out the videos (one has a bonus sneak-peek of what’s to come!) on our YouTube channel.

Here’s a quick look at the speakers.

A whirlwind tour of a search engine (Hugh Williams)

Hugh spent nearly 25 years working in search and information retrieval prior to his current role as a consultant, advisor and leadership coach. He led the team that built eBay’s user experiences, search engine, big data technologies and platforms. Before eBay, he managed a search engine R&D team at Microsoft’s Bing and spent five years running his own startup and consultancy. He’s published over 100 works, mostly in the field of information retrieval, and holds 19 U.S. patents. In his talk, Hugh explains how search works in practice, touching on everything from web crawling and spam detection, to search ranking and search infrastructure.

Discovery overview (Hui Xu)

As head of the Pinterest Search and Discovery team, Hui oversees the teams responsible for building a discovery engine based on the world’s largest human-curated indices. Previously, Hui was a core member of Google’s web search indexing team, co-authored Google’s Caffeine indexing system and managed Google Custom Search. Watch Hui explain how human curation builds the interest graph that we use to build our discovery products.

The life of a Pinterest search query (Charles Gordon)

Charles joined Pinterest in 2012 as the first search engineer, and has since architected and built major parts of the serving and indexing infrastructure. Previously, he worked as a principal engineer for AWS CloudSearch at Amazon and IMDb.com. In his presentation, Charles walks through the steps our search engine takes once a query is submitted, including analyzing the images, comments and meta tags of different Pins. Is there a gender associated with the profile? Our system takes that into account to return the most relevant Pins. Is there a location? We look at that, too.

Search guides (Rui Jiang)

Rui joined Pinterest in 2013 as the tech lead for search quality and search features, where he helped launch Guided Search. Before Pinterest, Rui worked at Google on the custom search engine and at Microsoft on a distributed build system and code search. In his tech talk, Rui dives into search guides, the horizontal keyword search prompts that help Pinners narrow down their searches.

Visual search (Kevin Jing)

Kevin joined Pinterest and created our Visual Discovery team in early 2014 after the acquisition of VisualGraph, a visual-search company he co-founded with two ex-Googlers. Before founding VisualGraph, Kevin worked at Google Research for seven years in the space of computer vision and machine learning, helping to develop Google’s first image processing application in 2004. In his presentation, Kevin walks through the history of visual search, starting with its uses for analyzing spam patterns and predicting abusive material, and ending with the techniques that are in place in the industry today.


How holdout groups drive sustainable growth


When it comes to growth, one potential pitfall is over optimizing for short-term wins. Growth teams operate at a pretty fast pace, and our team is no exception. We’re always running dozens of experiments at any given time, and once we find something that works, we ship it and move on to the next experiment. However, sometimes it’s important to take a step back and validate that a new tweak or feature really delivers long-term sustainable growth and isn’t just a short-term win that users will get tired of after prolonged exposure. In this post I’ll cover how we optimize for long-term sustainable growth.

Last year, we started including a badge number with all our push notifications. For many people, when an app has a badge number on it, the impulse to open and clear it is irresistible.

When we launched badges, we ran an A/B experiment, as we do with any change, and the initial results were fantastic. Badging showed a 7 percent lift in daily active users (DAUs) and a significant lift in other key engagement metrics such as Pin close-ups, repins and Pin click-throughs. With such fantastic results, we quickly shipped the experiment. However, we had a nagging question about the long-term effectiveness of badging. Is badging effective long-term, or does user fatigue eventually set in and make users immune to it?

To answer this question, we created a 1 percent holdout group. A holdout group is an A/B experiment where you ship a feature to 99 percent of Pinners (users) and hold back 1 percent from seeing the feature in order to measure its long-term impact. We typically run a holdout group whenever we have questions about the long-term impact or effectiveness of a particular feature.
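A holdout group can be implemented with the same deterministic bucketing used for any A/B experiment. Here’s a minimal sketch, assuming a stable user ID; the hashing scheme and names are illustrative:

import hashlib

def in_holdout(user_id, experiment="badging_holdout", holdout_pct=1):
    """Deterministically place holdout_pct percent of users in the holdout group."""
    digest = hashlib.md5(f"{experiment}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100 < holdout_pct  # True -> never sees the feature

def should_show_badge(user_id):
    return not in_holdout(user_id)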

We ran a holdout experiment for a little over one year. What we found was that the initial 7 percent lift in DAUs settled into a long-term baseline of a 2.5 percent lift after a couple of months (see mobile views in Figure 2). Then last fall we launched a new feature, Pinterest News, a digest of recent activity from the Pinners you follow. As part of News, we also badge Pinners when there are new News items. As a result, News helped increase the long-term lift of badging from 2.5 percent to 4 percent.

We also found that badging was effective at increasing engagement levels. We classify Pinners into core, casual, marginal, etc., and we found badging had a statistically significant impact on attracting those who would have fallen in the marginal or dormant bucket to instead become core or casual users. This finding was compelling since it proved that badging is effective at improving long-term retention.

Holdout groups have been an effective way for us to ensure we’re building for long-term growth. We also have holdout groups for features like ads, user education, etc. In general, holdout groups should be used anytime there is a question about the long-term impact of a feature. In the case of badging, it allowed us to understand how Pinners responded to badging over a prolonged period of time, which will help inform our notification strategy going forward.

John Egan is a tech lead on the Growth team.

If you’re interested in solving Growth challenges, join our team!


Real-time analytics at Pinterest


As thousands of people gather in the Bay Area this week for Strata + Hadoop World, we wanted to share how data-driven decision making is in our company DNA. Most recently, we built a real-time data pipeline to ingest data into MemSQL using Spark Streaming, as well as a highly scalable infrastructure that collects, stores and processes user engagement data in real time, while solving challenges that allow us to achieve:

  • Higher performance event logging
  • Reliable log transport and storage
  • Faster query execution on real-time data

Let’s take a deeper look.

Higher performance event logging

We developed a high performance logging agent called Singer, which is deployed on all of our application servers to collect event logs and ship them to a centralized repository. Applications write their event logs to local disk, from where Singer collects and parses these log files, and then sends them to a central log repository. Singer is built with at-least-once delivery semantics and works well with Kafka, which acts as our log transport system.
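Singer is a purpose-built agent, but its core loop (tail local log files and forward each line to Kafka) can be sketched roughly like this, using the kafka-python client and a made-up topic name. The real agent adds checkpointing and retries to get at-least-once delivery, which this sketch omits:

import time
from kafka import KafkaProducer  # kafka-python client

producer = KafkaProducer(bootstrap_servers=["kafka-broker:9092"])

def ship_log(path, topic="event_logs"):
    """Tail a local log file and forward each new line to Kafka."""
    with open(path) as f:
        f.seek(0, 2)  # start at the end of the file
        while True:
            line = f.readline()
            if not line:
                time.sleep(0.5)  # wait for the application to write more
                continue
            producer.send(topic, line.encode("utf-8"))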

Reliable log transport and storage

Apache Kafka, a high throughput message bus, forms our log transport layer. We chose Kafka because it comes with many desirable features including support for high volume event streams, replicated durability and low latency at-least-once delivery. Once the event logs are delivered to Kafka, a variety of consumers from Storm, Spark and other custom built log readers process these events in real time. One of these consumers is a log persistence service called Secor that reliably writes these events to Amazon S3. Secor was originally built to save logs produced by our monetization pipeline, where zero data loss was extremely critical. It reads the event logs from Kafka and writes them to S3, working around S3’s weak eventual consistency model. After this point, our self-serve big data platform loads the data from S3 into many different Hadoop clusters for batch processing.

Spark + MemSQL Integration

While Kafka allows for consuming events in real time, it’s not a great interface for a human to ask questions of the real-time data. We wanted to enable running SQL queries on the real-time events as they arrive. As MemSQL was built for this exact purpose, we built a real-time data pipeline to ingest data into MemSQL using Spark Streaming. The pipeline is in a prototyping phase as we continue to work with the MemSQL team on productionizing it.

Figure 1 shows the elements of the real-time analytics platform we’ve described so far. We’ve been running the Singer -> Kafka -> Secor -> S3 pipeline in production for a few months. Currently, we’re evaluating the Spark -> MemSQL integration by building a prototype where we feed the Pin engagement data into a Kafka topic. The data in Kafka is consumed by a Spark streaming job.

In this job, each Pin is filtered and then enriched by adding geolocation and Pin category information. The enriched data is then persisted to MemSQL using MemSQL’s spark connector and is made available for query serving. The goal of this prototype was to test if MemSQL could enable our analysts to use familiar SQL to explore the real-time data and derive interesting insights.  
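Here’s a rough sketch of what that streaming job looks like, using the Spark 1.x Python API and writing to MemSQL through its MySQL-compatible interface (the actual prototype uses the MemSQL Spark connector; the topic, table and lookup functions below are placeholders):

import json
import pymysql
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

def lookup_country(ip):
    return "US"  # placeholder for a real geolocation lookup

def lookup_category(pin_id):
    return "home_decor"  # placeholder for a real Pin category lookup

def enrich(event):
    event["country"] = lookup_country(event.get("ip"))
    event["category"] = lookup_category(event.get("pin_id"))
    return event

def write_partition(rows):
    # MemSQL speaks the MySQL wire protocol, so a plain MySQL client works here.
    conn = pymysql.connect(host="memsql-host", user="root", db="analytics")
    with conn.cursor() as cur:
        for r in rows:
            cur.execute(
                "INSERT INTO pin_engagement (pin_id, country, category) VALUES (%s, %s, %s)",
                (r["pin_id"], r["country"], r["category"]),
            )
    conn.commit()
    conn.close()

sc = SparkContext(appName="pin-engagement")
ssc = StreamingContext(sc, 5)  # 5-second micro-batches

# Each Kafka message is a JSON-encoded Pin engagement event.
stream = KafkaUtils.createStream(ssc, "zk-host:2181", "pin-enricher", {"pin_engagement": 2})
stream.map(lambda kv: json.loads(kv[1])) \
      .map(enrich) \
      .foreachRDD(lambda rdd: rdd.foreachPartition(write_partition))

ssc.start()
ssc.awaitTermination()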

While we continue to evaluate MemSQL, we’ll be showcasing a demo of it at Strata + Hadoop World 2015 along with the MemSQL team on Thursday, February 19 at the San Jose Convention Center. Come visit us at the MemSQL Booth 1015 for more details.

Demo built with Spark & MemSQL

Krishna Gade is an engineering manager on the Data team.


Fighting spam at Pinterest


Spammers used to love us, but not anymore.

Pinterest is a great platform to spam because of the large amount of traffic we drive to other sites. Spammers want to divert traffic to their sites so Pinners will fall for scams. To do this, they’ll disguise Pins as promising weight loss products, work-from-home opportunities, cheap designer handbags and more. This is where the Pinterest BlackOps team comes in. Our mission isn’t to fight spam, but to make it so we don’t need to.

To be successful, spammers must make lots of spam and get lots of people to see and click it, all without us knowing. A typical spammer will try to look like a good user by making realistic accounts from computers spread all over the world or by hijacking accounts. There are always several subtle flaws that make these spammers stand out, and once we find one, we’re able to shut them down. They then evolve their tactics and the race begins again. Our job is to be one step ahead of them at all times and make spamming Pinterest unprofitable.

How we defend against the bad guys

So, how do you fight a foe that tries to look like a good user and rapidly changes what it looks like and how it attacks? Military warfare combined with economic modeling.

To successfully execute against this strategy, we need systems that allow us to observe and respond to an attack quickly and effectively while also not harming good users.

Last year, we began building a new system called Stingray that our spam analysts can use to quickly observe attacks, write rules to respond to them, stop the attack, clean up and evolve, all within minutes. Stingray is a distributed stream processor and rule engine that enables us to react to known malicious behavior in milliseconds. We can even pre-empt attacks if they match signatures along hundreds of different dimensions and stop the attack before it starts. Because we architected Stingray with certain fundamental distributed systems guarantees, we’ll soon be able to write a rule and easily apply it retroactively, completely annihilating an attack and the mess it leaves behind.
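Stingray itself is a distributed stream processor, but the rule side can be pictured as small predicates evaluated against every event as it streams by. A purely illustrative sketch (the rule names and event fields are made up):

# Each rule is a name plus a predicate over an event dict; matching events
# trigger an action such as blocking the account or removing the content.
RULES = [
    ("blacklisted_domain", lambda e: e.get("domain") in {"spam-example.com"}),
    ("signup_burst", lambda e: e.get("signups_from_ip_last_hour", 0) > 50),
]

def evaluate(event):
    """Return the names of every rule the event matches."""
    return [name for name, predicate in RULES if predicate(event)]

# evaluate({"domain": "spam-example.com"}) -> ["blacklisted_domain"]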

Over the last six months we’ve added a strong integration test environment and comprehensive monitoring everywhere to help us speedily develop and easily detect problems. We made major gains in our operational strategy faster than ever before, and in just a few months:

  • The amount of spam reported on Pinterest has nose dived to the point where it’s not a useful metric
  • Our system now responds twice as fast to internal spam requests
  • The number of Pinners who click on spam has dropped in half (from few to even fewer)
  • Our system’s ability to successfully respond to bad behavior improved from 95 percent to 99.99 percent

We can dismantle entire attacks in milliseconds, whereas 12 months ago it would have taken us four hours to a day.

We fight spam so Pinners can enjoy their experience, but spammers will keep trying to improve as long as we’re a great platform for them to showcase their content. As you’re reading this, they’re mounting a new and improved attack!

If you’d like to learn more about how we fight spam day-to-day, I made a short documentary. For more information on staying safe on Pinterest as a Pinner, check out our Help Center.

If you’d like to wage war with us, we always need very strong generalists with a passion for building and architecting large complex distributed systems. Join our Black Ops team!

Marty Weiner is a manager on the Black Ops team.


Open-sourcing PINCache


Because the Pinterest iOS app downloads and processes an enormous amount of data, we use a caching system to cache models and images to avoid eating into our Pinners’ (users’) data plans. For quite some time we used TMCache to persist GIFs, JPEGs and models to memory and disk, but after using it in production, Pinners were reporting the app was hanging. After attributing the issue to TMCache, we re-architected a significant portion and forked the project, which resulted in our new open-source caching library, PINCache, a non-deadlocking object cache for iOS and OSX. Here’s how we went from deadlocks to forking.

Making an asynchronous method synchronous

First, we identified the problem. TMCache has native asynchronous methods and uses a common pattern to provide synchronous versions of those methods:

dispatch_semaphore_t semaphore = dispatch_semaphore_create(0);
[self doWorkAsynchronouslyAndCallback:^{
	dispatch_semaphore_signal(semaphore);
}];
dispatch_semaphore_wait(semaphore, DISPATCH_TIME_FOREVER);

Unfortunately, this pattern has a fatal flaw, as it blocks the calling thread and waits for a signal from a dispatched queue. But what happens when you make a bunch of these types of calls? Thread starvation. If every thread is waiting on another operation to complete and there are no more threads to go around, you wind up in a deadlocked state.

Making a synchronous method asynchronous

The obvious solution is to make our ‘native’ methods synchronous and wrap dispatch_async around them for our asynchronous versions. It turns out that’s step one to solving TMCache’s issues. But there’s more. TMCache uses a serial queue to protect ivars and guarantee thread safety, which, according to Apple’s documentation for migrating away from threads to GCD, is a great idea. However, there’s one minor detail hidden in those docs: “…as long as you submit your tasks to a serial queue asynchronously, the queue can never deadlock.” Reading between the lines, if you want to avoid deadlocking, you can’t synchronously access a serial queue used as a resource lock. This is also something we observed empirically by writing a unit test that deadlocks TMCache every time:

- (void)testDeadlocks
{
    NSString *key = @"key";
    NSUInteger objectCount = 1000;
    [self.cache setObject:[self image] forKey:key];
    dispatch_queue_t testQueue = dispatch_queue_create("test queue", DISPATCH_QUEUE_CONCURRENT);
    
    NSLock *enumCountLock = [[NSLock alloc] init];
    __block NSUInteger enumCount = 0;
    dispatch_group_t group = dispatch_group_create();
    for (NSUInteger idx = 0; idx < objectCount; idx++) {
        dispatch_group_async(group, testQueue, ^{
            [self.cache objectForKey:key];
            [enumCountLock lock];
            enumCount++;
            [enumCountLock unlock];
        });
    }
    
    dispatch_group_wait(group, [self timeout]);
    STAssertTrue(objectCount == enumCount, @"was not able to fetch 1000 objects, possibly due to deadlock.");
}

Back to semaphores

If we want synchronous methods available on the cache, we have to protect our ivars and guarantee thread safety with another mechanism. What we need is a lock. Standard locks are going to reduce our performance, but using a dispatch_semaphore has a slight advantage:

Dispatch semaphores call down to the kernel only when the calling thread needs to be blocked. If the calling semaphore does not need to block, no kernel call is made.

So how do you use a dispatch_semaphore as a lock? Easy:

dispatch_semaphore_t lockSemaphore = dispatch_semaphore_create(1);
//lock the lock:
dispatch_semaphore_wait(lockSemaphore, DISPATCH_TIME_FOREVER);
//do work inside lock
 
  ...

//unlock the lock:
dispatch_semaphore_signal(lockSemaphore);

The difference between using a semaphore as a lock in this manner and the common pattern we mentioned in the beginning of this post is we don’t need a separate thread to release the lock.

Introducing PINCache

The decision to fork TMCache was made after a lengthy email conversation with the GitHub maintainers, who decided they weren’t comfortable making a major architectural change in a short timeframe. To allow existing users of TMCache to opt out of the significant changes we made, we decided to fork the project. Here are the main differences between TMCache and PINCache:

  • PINCache is similar to TMCache in that it owns instances of both a memory cache and disk cache. It propagates calls to each, relying first on the fast memory cache and falling back to the disk cache.
  • PINMemoryCache has synchronous native methods, and the asynchronous versions wrap them. It uses a dispatch_semaphore as a lock to guarantee thread safety.
  • PINDiskCache also has synchronous native methods and asynchronous versions simply wrap them, too.
  • Instead of using a shared queue, PINDiskCache provides two methods (one asynchronous and one synchronous) to operate on files safely:

    lockFileAccessWhileExecutingBlock:(PINDiskCacheBlock)block
    synchronouslyLockFileAccessWhileExecutingBlock:(PINDiskCacheBlock)block;

  • One other major difference is that multiple instances of PINDiskCache operate independently. This can increase performance, but it’s no longer safe to have two instances of PINDiskCache with the same name.

Replacing TMCache with PINCache

If you have an app in production that uses TMCache and you want to switch to PINCache, there’s a bit of work. First, since sharedQueue is no longer available on PINDiskCache, you’ll want to use lockFileAccessWhileExecutingBlock:. Second, you’ll need to migrate all your users’ disk caches over to PINCache or clean them up. Just run this snippet somewhere before you initialize any PINDiskCache or PINCache instances:

//migrate TMCache to PINCache
- (void)migrateDiskCachesWithNames:(NSArray *)cacheNames
{
    //migrate TMCache to PINCache
    NSString *rootPath = [NSSearchPathForDirectoriesInDomains(NSCachesDirectory, NSUserDomainMask, YES) firstObject];
    for (NSString *cacheName in cacheNames) {
        NSString *oldPathExtension = [NSString stringWithFormat:@"com.tumblr.TMDiskCache.%@", cacheName];
        NSURL *oldCachePath = [NSURL fileURLWithPathComponents:@[rootPath, oldPathExtension]];
        NSString *newPathExtension = [oldPathExtension stringByReplacingOccurrencesOfString:@"tumblr" withString:@"pinterest"];
        newPathExtension = [newPathExtension stringByReplacingOccurrencesOfString:@"TMDiskCache" withString:@"PINDiskCache"];
        NSURL *newCachePath = [NSURL fileURLWithPathComponents:@[rootPath, newPathExtension]];
        if (oldCachePath && [[NSFileManager defaultManager] fileExistsAtPath:[oldCachePath path]]) {
            NSError *error;
            [[NSFileManager defaultManager] moveItemAtURL:oldCachePath toURL:newCachePath error:&error];
            if (error) {
                [[NSFileManager defaultManager] removeItemAtURL:oldCachePath error:nil];
            }
        }
    }
}

Contributing to PINCache

We use PINCache heavily and want to see it become the best caching library available on the platform. With that in mind, we welcome pull requests and bug reports! We promise to address them as quickly as possible. We can’t wait to see the awesome, performant, non-deadlocking apps you make with this!

Garrett Moon is an iOS engineer on the Mobile team.


4 steps to better goals and metrics


“Strategy without tactics is the slowest route to victory. Tactics without strategy is the noise before defeat.” - Sun Tzu

I’ve found over and over again that many organizations suffer from the same problem: goal setting. It’s not always clear how goals are set or how to set them. This is especially true of startups. I had to learn this process the hard way by making lots of mistakes and banging my head against walls. So you can spare your own head and a few innocent walls, I’m sharing my brain dump on formulating goals and metrics.

Goals

Goals are a combination of what you’re trying to accomplish in a defined amount of time and how you’re measuring progress against those accomplishments.  They can be aspirational (goals your team hopes to achieve) and/or commitments (promises to others outside your team).

Goals are useful for several reasons. Here are two of my favorites:

  1. They’re great for picking your head up out of the trenches of engineering/product warfare to think hard about the direction you’re going and what you can reasonably achieve in a certain amount of time.
  2. Goals are an effective means of gaining alignment on where you’re headed while  instilling a sense of urgency.

Everybody responsible for setting goals must know how goal lines are chosen (step 1) and the importance of hitting goals (step 2). In steps 3 and 4, I’ll discuss how to actually set a good goal. WARNING: Skipping past the first two steps will cause you headaches, loss of appetite, thinning of the hair and occasional nausea.

Step 1. Communicate how the goal line is chosen

Everybody has to know how the goal line is set in the company. The purpose is to ensure alignment about where on the football field we’re trying to run. Here are a few options to seed the discussion:

  1. Set goals that are incredibly big stretches
  2. Set goals you’re 70 percent likely to hit
  3. Set goals you’re likely to hit
  4. Set goals you’re easily going to hit

Each option has interesting ramifications, and you should choose the one that works best for your culture. At Pinterest, we have a culture of setting goals that we can likely hit 70 percent of the time, and we push really hard to hit them. If we miss our goal, we discuss how we could improve on our strategy and/or tactics or our ability to set goals.  

If it’s not clear which style the whole org uses, communication will break down, likely in subtle ways (you should read “subtle” as a curse word in this context). For example, if Alice assumes goal (2) is how the company operates and Bob is setting a goal for (3), Alice could think Bob is a sandbagging *expletive*. If Alice and Bob aren’t aware they’re working on different basic assumptions, their communications will likely break down and they won’t even know it.

Step 2. Communicate the importance of hitting a goal

Your company needs a well communicated philosophy of how goal setting and meeting/not meeting goals is treated. A few options:

  1. You MUST hit your goal!
  2. Strive hard to meet your goal. Big kudos if you do. Discuss what could have been better if you don’t.
  3. Goals are just guidelines. No big deal.

Again, it’s important to choose which one best matches your company culture. Even more importantly, ensure the choice is well communicated. The impact of poor communication could be that Alice assumed (1) and Bob assumed (2), and the goal wasn’t met. The result could be Alice is upset the goal wasn’t met, and Bob is confused because he feels he did nothing wrong. Likely, Bob will be defensive, a sure sign effective communication has ceased.

So which do you choose?

Different companies and departments approach setting their goal lines differently. If you’re not sure which to choose, get the appropriate folks together, gather sentiment and choose one combination to start with.

After you’ve selected one “goal line” choice and one “importance of goals” choice, communicate it. Communicate it over and over again until it’s adopted in the DNA of the company. Communicate it at each goal meeting. Communicate it over beer. Then drink lots of beer so more communication happens.

Once you’ve had enough beer, it’s time to choose a goal.

Step 3. Choose a goal

Strive for a metric-driven goal, but not to the point of losing the human element. A great way to define goals is with OKRs. If you’re already familiar with OKRs, skip the next paragraph. If not, read on.

An OKR is an Objective and a Key Result. The objective is what you’re trying to achieve with this goal. It should be qualitative and inspirational (e.g. grow our user base to the moon). The key result is the metric you’ll be using to monitor your progress (e.g. grow to 10 million active users or version 1.3 shipped). A key result should be quantitative and specify a measurement window. Let’s improve the first example to “grow to 10 million monthly active users,” meaning we’re measuring ourselves against all active users in the last month. Here’s more about OKRs. Eat this stuff up. It’s good for you and low in sodium.

Furthermore, goals should cascade. At the highest level, you might have a goal around growth. Supporting that goal could be sub-goals other teams maintain for improving performance and reliability of site load time. Underlying that goal could be improving new server deployment speed. And so on.

With this in mind, let’s discuss choosing a solid metric. I hate acronyms, but I’m gonna use one anyhow. A metric should be Meaningful, Measurable, Operational, and Motivational, otherwise known as MMOM. (Why does “Operational” have to start with “O”? Ruined a potentially great acronym!)

Meaningful

Your metric needs to measure or contribute to your business objective in some fairly obvious way (or at least in a way everybody can agree on). Growing the number of active users to 10 million is a pretty good way to gauge your progress toward increasing the user base. On the other hand, using the number of episodes of Star Trek you can quote as a way to measure revenue is not so good.

The times you’ll struggle with meaningfulness will usually be when you have a metric that defines your objective pretty well, but not perfectly. For instance, does “number of times content is flagged” meaningfully measure bad experiences? Perhaps, but you’d prefer to know “number of times somebody has a bad experience” (which can be impossible to measure). You’ll have to make some tough trade-offs between metrics while constantly striving to improve.

Measurable

You should be able to measure progress on a regular basis. For instance, if you want to improve growth, you could measure how many people visited your site in the last seven days, which is pretty easy to do with a simple map-reduce job.

But sometimes measurability can be damn near impossible. For instance, how do you measure how much money a spammer is making from your site? No matter how many times I thought a metric would be impossible to measure, we’ve found a way. It may not have been perfect, but starting with something, anything, will help push progress forward. For the spam metric above, we started with measuring how much traffic we sent to spammers’ sites in the last month. This at least gives us some approximation of their revenue, which we can start operating on.

Operational

A highly operational metric is one that your team can affect and see the effect of quickly. You need to be able to move the needle on your metric, and the faster your metric responds to a change in the system, the faster you can iterate.

Measuring “how many people are currently on your site in the last 10 seconds” is very operational. You could change the color of the home screen and immediately see if it has an impact. You could change the color 20+ times in the next hour, if you want.

On the other end of the stick, you might measure how many people return to your site after 14 days. Iterating now becomes much slower because you may have to wait 14 days, but sometimes that’s a necessary tradeoff so that you measure what you actually want to measure (more meaningful to the company goals).

You could also consider having a few goals, one highly operational one for your team, and, separately, one less operational one that’s more meaningful to the rest of the company. They should be closely related.

Motivational

Don’t forget motivation! We’re dealing with people here. If people aren’t motivated, getting up in the morning to conquer this metric kinda sucks. Sometimes metrics themselves are motivational, such as increasing growth or increasing the snack to person ratio. The amount you push a metric can also have a heavy influence on motivation (discussed next in Step 4).

An important but naturally unmotivational metric can usually be remedied by discussing the impact or linking it to something more interesting. For instance, increasing light bulb brightness by 0.00003 percent sounds boring. Instead, how about stating the impact will increase our revenue by $3 million? Wow!

Others are motivated by mastery of the field. Relating this metric to how they’ll be the engineers of the best light bulb in the world can be quite compelling to some.

Some are motivated by the challenge. But if it’s a damn near impossible goal, some will feel it’s worthless or out of touch. If the goal is too easy, you lose others.

And still others are motivated in other ways: parties, money, recognition, beer, donuts, bacon. In reality, people are motivated by several of these at once. You must deeply understand what drives your team, not just for setting goals, but also for being an effective inspirational leader. Consider learning more about human motivation, starting with watching this great talk by Dan Pink.

Step 4. Push the liiiine!

Once we know what we’re measuring and how we’re measuring it, the next step is to figure out how far you can push the metric on what timeline, and why.

One very common timeline to operate on is quarters. Set up monthly check-ins with higher ups to build trust with them and provide relevant updates.

First, what is the time range to which your metric holds you accountable?

Some metrics should cover the whole quarter, such as maintenance metrics (e.g., don’t lose ground on availability). Some metrics, especially those covering areas of fast improvement, could be unmotivational if the window is too long. For instance, measuring the team’s performance before critical infrastructure has had a chance to be built doesn’t make a whole lot of sense. If you’re at 99 percent availability and want to push to 99.9 percent by the end of the quarter, you’re going to need to ship several key optimizations that may not be ready until halfway through. In this instance, perhaps it’s better to only measure against the last two weeks.

As a rule of thumb, I feel a metric should never cover a window shorter than two weeks. There’s generally way too much noise.

Second, how far can you move the metric?

This is where things get tougher. Sometimes a gut feel is a sufficient answer, but propping your answer up with data can give you a far better guess of where you’ll be by the end of the quarter and build trust with everybody else involved. Use whatever data you have available and gather new data to understand what leverage you have.

Progress Over Perfection

When choosing a goal for the first time, it can be very hard to discover one that meets all three M’s and one O. Sometimes there are too many metrics to choose from, sometimes there seem to be none. Just remember,  you’re better off choosing a less than perfect goal to begin with rather than nothing at all. In some cases, you may find there isn’t a perfect metric or even that a metric isn’t appropriate, but give it a good hard try.

Examples

Here are two examples of this model applied to different situations: site performance and amount of spam.

Example 1 — Site performance

Let’s set a goal around site performance. Say you have a young site that’s never measured site performance. You first need a baseline, an awareness of what levers to use to push the metric in the right direction, as well as a goal.

First, choose your strategy for setting goal lines and tell everybody in the goal setting meeting. For example, set a goal line that you can hit with a 70 percent chance, and, if you don’t hit it, study why and get better at setting the goal. Also, let’s assume you’re setting a goal that you’d like to achieve by the end of the quarter that’s about to begin.

After doing some profiling to determine why the site is slow and sometimes not responding, you find that the databases are causing major availability and latency issues and that performing some key optimizations can improve latency and availability:

Objective: Improve customer facing site performance

Key Result 1: Increase availability measured over the last two weeks of the quarter from 98.5 percent to 99 percent

Key Result 2: Decrease 99.9 percentile latency from 200ms to 100ms measured over the last two weeks of the quarter

When discussing performance, it’s always a good idea  to include an availability metric (a measure of how often you return an answer to your client without timing out or erroring) and a performance metric (a measure of how fast the site loads for a reasonable portion of your users). (By the way, if you think you understand latency, think again.)

KR1 and KR2 combine to give a pretty great MMOM story. First, these meaningfully measure your customer’s experience. If you improve the metric, the (hopefully small) leap of faith you’re making is that user satisfaction will go up (you can measure that as another higher OKR).

These are easily measured. You could set up StatsD and get this data now. You could set up alerting to know when you might be risking violating your goal.

These metrics are measured over a two-week window, so operationally the feedback loop is a little long. But you can measure the most recent five minutes and hour of availability and latency and report that to your engineers. That represents the wider goal pretty well, and engineers know within a few minutes whether an optimization has an impact.

Finally, these metrics, in my opinion, are very motivational. Nothing gets my engineering gears fired up like making the site faster and more reliable! Plus, we’re talking about a fairly large jump. Also, you have until the last two weeks to push really hard on some of those projects so the majority of the team can focus.

You should probably designate an (implicit or explicit) goal around maintenance, as well as some resourcing to that end. It’d be bad to go from 98 percent last quarter to 93 percent for most of this quarter and then back up to 99 percent during the measurement window.

Example 2 — Amount of spam

Measuring the success of reducing spam is surprisingly difficult. This is because there’s a paradox afoot.  If I knew something was spam, I’d get rid of it. I’d like to know how much spam is left, but how do I measure the stuff I don’t know about?

At Pinterest we went through several rounds of refining our spam metric, and we continue refining to this day. As mentioned earlier, we assume we’re shooting for a 70 percent likelihood of hitting a goal. One metric we used to use was the following:

Objective: Reduce negative experiences of Pinners due to spam

Key Result: Decrease pin reports by 30 percent this quarter over last

Pin reports are a count of the number of times somebody flagged a Pin on Pinterest as spam. This metric is super easy to measure — just count the number of reports that have come in with any standard stats package. Operationally, looking at a range of three months could be hard to react to quickly, but this metric is simply a count of all reports. We can keep a minute-by-minute graph that allows us to observe and react to attacks quickly and see if our rules and models are effective within minutes. Therefore, this metric is still very operational.

This metric was very motivational. We saw occasional daily spikes that could push us in the wrong direction, and low-level spam attacks that made the daily average higher. We tore into the data, and while we felt a 30 percent reduction would be tough, we had a strong plan of attack. We could also try lots of different approaches and react quickly.

Meaningful is where this metric is interesting. We want to measure negative experiences as a result of spam. Sometimes Pin reports are false positives (e.g., the Pinner flagged something they didn’t like, but it wasn’t necessarily spam). Additionally, Pin reports don’t really tell us how bad of an experience somebody had. Sometimes people don’t report Pins because they don’t know that they can. And, ideally, it’d be nice to know how successful spammers are (though this isn’t explicitly called out in the objective). However, Pin reports did show us when big attacks were hitting us, and they correlated with helpdesk tickets Pinners were sending in during big attacks.

As we’ve gotten better at fighting spam, Pin reports are so low that they’re largely noise. In response, we’ve now swapped to measuring how many MAUs (monthly active users) click on spam each day. This metric is more meaningful and maps directly to our spam fighting strategy (but was a bit harder to measure and operationalize at first).

Follow my adventures on Twitter and Pinterest.

Email: marty@pinterest.com

Thanks to Jimmy Sopko and Chris Walters for sharing this goal setting, head banging adventure with me!

Thanks to Philip Ogden-Fisher and Sriram Sankar for their substantial feedback and insights!

Marty is a manager on the Black Ops team.



Serving configuration data at scale with high availability


We have a lot of important and common data that’s not modified frequently but accessed at a very high rate. One example is our spam domain blacklist. Since we don’t want to show Pinners spammy Pins, our app/API server needs to check a Pin’s domain against this domain blacklist when rendering the Pin. This is just one example, but there are hundreds of thousands of Pin requests every second, which generates enormous demand for access to this list.

Existing Problem

Previously, we stored this kind of list in a Redis sorted set, which made it easy to keep the list in time-sorted order. We also have a local in-memory and file-based cache that’s kept in sync by polling the Redis host for updates. Things went well in the beginning, but as the number of servers and the size of the list grew, we began to see a network saturation problem. In the five minutes after the list was updated, all the servers tried to download the latest copy of the data from a single Redis master, saturating the network on the Redis master and causing a lot of Redis connection errors.

We identified a few potential solutions:

  1. Spread the download of this data over a longer period. But for our use case, we wanted the updates to converge within a few minutes at most.
  2. Shard the data. Unfortunately, since this data is a single list, sharding it would add more complexity.
  3. Replicate this data. Use a single Redis master with multiple Redis slaves to store this data and randomly pick a slave for reads. However, we weren’t confident about Redis replication (we were running v2.6). Moreover, it wouldn’t be cost effective, since most of the time (when the data isn’t being updated) these Redis boxes would sit idle due to client-side caching.

Solution

As each of the above solutions has its own shortcomings, we asked ourselves: how would we design a solution if we were building from the ground up?

Formalizing the requirements of the problem:

  • Frequent read access (>100k/sec) and rare updates (several times a day, at most).
  • Quickly converge updates across all boxes (within a minute, or a few at most). Ideally, a push-based model instead of clients polling for updates.

We engineered a solution by combining the solutions to the smaller problems:

  • Cache the data in-memory so that high read access won’t be a problem.
  • Use Apache ZooKeeper as a notifier when updates are made.

This is conceptually similar to the design of our ZooKeeper resiliency work, but if we stored the entire dataset in a single ZooKeeper node, an update would still cause a huge spike in network traffic to the ZooKeeper cluster. Because ZooKeeper is distributed, that load would at least be spread across multiple nodes, but we didn’t want to burden ZooKeeper unnecessarily since it’s a critical piece of our infrastructure.

We finally arrived at a solution that uses ZooKeeper as the notifier and S3 for storage. Since S3 provides very high availability and throughput, it seemed to be a good fit for absorbing the sudden load spikes in our use case. We call this solution the managed list, aka config v2.

Config v2 at work

Config v2 takes full advantage of the work we had already done, except that the source of truth is in S3. Further, we added logic to avoid concurrent updates and to deal with S3’s eventual consistency. We store a version number (actually a timestamp) in a ZooKeeper node, and that version also serves as a suffix of the S3 path identifying the current data.

If a managed list’s data needs to be modified, a developer has the option to change it via an admin web UI or a console app. The following steps are executed by the Updater app on save:

  • First, grab a ZooKeeper lock to prevent concurrent writes to the same managed list.
  • Then, compare the old data with the data in S3 and only upload the new data if they match (a compare-and-swap update). This prevents dirty writes while a previous update is converging.
  • Finally, write the new version to the ZooKeeper node and release the lock.

As soon as the ZooKeeper node’s value is updated, ZooKeeper notifies all its watchers, which in this case triggers the daemon processes on all servers to download the data from S3.
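
To make the flow concrete, here’s a minimal sketch of the updater side using the kazoo ZooKeeper client and boto3. The bucket, paths and function names are illustrative assumptions rather than our actual implementation.

import time

import boto3
from kazoo.client import KazooClient

BUCKET = "example-config-bucket"  # assumed bucket name

def update_managed_list(zk, s3, name, old_data, new_data):
    lock = zk.Lock("/config/locks/%s" % name)
    with lock:  # 1. prevent concurrent writes to the same managed list
        version_node = "/config/versions/%s" % name
        current_version, _ = zk.get(version_node)
        current_key = "%s/%s" % (name, current_version.decode("utf-8"))
        live_data = s3.get_object(Bucket=BUCKET, Key=current_key)["Body"].read()
        if live_data != old_data:  # 2. compare-and-swap check
            raise RuntimeError("stale update: a previous change is still converging")
        new_version = str(int(time.time()))  # the version is just a timestamp
        s3.put_object(Bucket=BUCKET, Key="%s/%s" % (name, new_version), Body=new_data)
        zk.set(version_node, new_version.encode("utf-8"))  # 3. notify all watchers

# zk = KazooClient(hosts="zk1:2181,zk2:2181"); zk.start()
# s3 = boto3.client("s3")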

How we grappled with S3’s consistency model

Amazon’s S3 gives great availability and durability guarantees even under heavy load, but it’s eventually consistent. What we needed was “read after write” consistency. Fortunately, S3 does give “read after create” consistency in some regions*. So instead of updating the same S3 file, we create a new file for every write. This introduces a new problem of synchronizing the new S3 filename across all the nodes, which we solved by using ZooKeeper to keep the filename in sync.
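
On the read side, the daemon simply watches the version node and fetches the matching S3 object when it changes. Here’s a minimal sketch with kazoo’s DataWatch, again with assumed names:

import boto3
from kazoo.client import KazooClient

BUCKET = "example-config-bucket"  # assumed names, as above
NAME = "spam_domain_blacklist"

zk = KazooClient(hosts="zk1:2181,zk2:2181")
zk.start()
s3 = boto3.client("s3")
local_cache = {}

@zk.DataWatch("/config/versions/%s" % NAME)
def on_version_change(data, stat):
    # Fires once at registration and again whenever the version node changes.
    if data is None:
        return
    version = data.decode("utf-8")
    obj = s3.get_object(Bucket=BUCKET, Key="%s/%s" % (NAME, version))
    local_cache[NAME] = obj["Body"].read()  # refresh the in-memory copy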

Introducing Decider

When a new feature or service is ready for launch, we gradually ramp up traffic to the new code path and make sure everything looks good before going all in. This created the need for a switch that lets a developer decide how much traffic should be sent to the new feature. This traffic ramp-up tool (aka “Decider”) should also be flexible enough that developers can add new experiments and change the values of existing experiments without requiring a re-deploy to the entire fleet. In addition, any changes should converge quickly and reliably across the fleet.

Earlier Solution

Every experiment is a ZooKeeper node and has a value [0-100] that can be controlled from the web UI. When the value is changed from the web UI, it’s updated in the corresponding node, and ZooKeeper takes care of updating all the watchers. While this solution worked, it was plagued with the same scaling issues we previously experienced since the entire fleet was directly connecting to ZooKeeper.

Our Decider framework consisted of two components: a web-based admin UI to control the experiments and a library (both in Python and Java) that can be plugged in where branch control is needed.

Current Solution

Once we realized the gains of the managed list, we built a managed hashmap and migrated the values of all the ZooKeeper nodes containing experiments into it. Essentially, the underlying managed hashmap file is a JSON dump of a hash table with experiment names as keys and an integer [0-100] as the value.

API

import random

def decide_experiment(experiment_name):
    # `experiments` stands in for the in-memory managed hashmap of name -> value [0-100]
    return random.randrange(0, 100, 1) < experiments[experiment_name]

How this is used in code:

if decide_experiment("my_rocking_experiment"):
    pass  # new code path
else:
    pass  # existing code path

Another use case of Decider: dark read and dark write

We use the terminology “dark read” and “dark write” when we duplicate a production read or write request and send it to a new service. We call it dark because the response from the new service doesn’t impact the original code path, whether it’s a success or a failure. If asynchronous behavior is needed, we wrap the new code path in gevent.spawn().

Here’s a code snippet for dark read:

try:
    if decider.decide_experiment("dark_read_for_new_service"):
        new_service.foo()
except Exception as e:
    log.info("new_service.foo exception: %s" % e)
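
For completeness, here’s a minimal sketch of the dark write case wrapped in gevent.spawn(), mirroring the names from the snippet above; new_service.write_foo and data are illustrative placeholders, not actual production code.

import gevent

def _dark_write():
    try:
        new_service.write_foo(data)  # duplicated write sent to the new service
    except Exception as e:
        log.info("new_service.write_foo exception: %s" % e)

if decider.decide_experiment("dark_write_for_new_service"):
    gevent.spawn(_dark_write)  # fire and forget; the result never affects the original path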

*In the rare event that S3 returns “file not found” due to eventual consistency, the daemon is designed to refresh all the content every 30 minutes, so those nodes will eventually catch up. So far, we haven’t seen any instance of the nodes being out of sync for more than a few minutes.

If you’re interested in working on engineering challenges like this, join our team!

Pavan Chitumalla and Jiacheng Hong are software engineers on the Infrastructure team.

For Pinterest engineering news and updates, follow our engineering Pinterest, Facebook and Twitter. Interested in joining the team? Check out our Careers site.

Open-sourcing Pinball


As we continue to build in a fast and dynamic environment, we need a workflow manager that’s flexible and can keep up with our data processing needs. After trying a few options, we decided to build one in-house. Today we’re open-sourcing Pinball, which is designed to accommodate the needs of a wide range of data processing pipelines composed of jobs ranging from simple shell scripts to elaborate Hadoop workloads. Pinball joins our other open-sourced projects like Secor and Bender, available on our Github.

Building highly customizable workflow manager

Development of new product features is often constrained by the availability of the data that powers them. The raw data, typically originating from logs, is sliced and diced, merged and filtered across multiple dimensions before it reaches a shape suitable for ingestion by downstream applications. This data transformation process is often modeled as a workflow which, in abstract terms, is a directed graph with nodes representing processing steps and edges describing run-after dependencies.

Workflows can be arbitrarily complex. In realistic settings it’s not uncommon to encounter a workflow composed of hundreds of nodes. Building, running and maintaining workflows of that complexity requires specialized tools. A Bash script won’t do.

After experimenting with a few open-source workflow managers, we found none of them flexible enough to accommodate the ever-changing landscape of our data processing solutions. In particular, currently available solutions are either scoped to support a specific type of job (e.g., Apache Oozie optimized for Hadoop computations) or abstractly broad and hard to extend (e.g., monolithic Azkaban). With that in mind, we took on the challenge of implementing a highly customizable workflow manager built to survive the evolution of data processing use cases, ranging from the execution of basic shell commands to elaborate ETL-style computations on top of Hadoop, Hive and Spark.

Pinball is used by all of our engineering teams, and handles hundreds of workflows with thousands of jobs that process almost three petabytes of data on a daily basis in our Hadoop clusters. The largest workflow has more than 500 jobs. Workflows generate analytics reports, build search indices, train ML models and perform a myriad of other tasks.

Platform vs. end product

Pinball offers a complete end-to-end solution ready to use right out of the box. At the same time, its component-wise design allows for easy alterations. The workflow management layer of Pinball is built on top of generic abstractions implementing atomic state updates.

Conceptually, Pinball’s architecture follows a master-worker (or master-client, to avoid naming confusion with a special type of client that we introduce below) paradigm, where the stateful central master acts as the source of truth about the current system state for stateless clients. Clients come in different flavors, ranging from workers that execute jobs, to a scheduler that controls when a workflow should run, to a UI that lets users interact with the system, to command line tools. All clients speak the same language (protocol) exposed by the master, and they don’t communicate directly. Consequently, clients are independent and can be easily replaced with alternative implementations. Because of this flexibility, Pinball serves as a platform for building customized workflow management solutions.

While customization is possible, it’s worth emphasizing that Pinball comes with default implementations of all clients, allowing users to define, run and monitor workflows.

Workflow life cycle

A workflow is defined through a configuration file or a UI workflow builder, or even imported from another workflow management system. Pinball offers a pluggable parser concept that lets users express their workflows in whatever format makes the most sense to them. The parser translates an opaque workflow definition into a collection of tokens representing workflow jobs in a format understandable by Pinball. (Read more about Pinball’s features.)

A workflow gets deployed through a command line tool or a UI component. Deployment invokes the parser to extract a schedule token from the workflow configuration and stores it in the master. A schedule token contains metadata such as the location of the workflow config, the time at which the workflow should run, the recurrence of executions and an overrun policy. The policy describes how the system should behave if the previous run of the workflow hasn’t finished by the time a new execution is due. Example policies allow aborting the currently running workflow, starting another workflow instance in parallel to the running one, or delaying the workflow start until the previous run finishes.
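
As a purely illustrative example (the field names are assumptions, not Pinball’s actual token format), a schedule token carries roughly this information:

# Illustrative only; not Pinball's real schema.
schedule_token = {
    "workflow": "daily_metrics",
    "config_location": "s3://example-bucket/workflows/daily_metrics.py",
    "next_run_time": "2015-05-01T06:00:00Z",
    "recurrence": "24h",
    "overrun_policy": "DELAY",  # e.g. abort the running instance, start a new one in parallel, or delay
}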

When the time comes, the scheduler uses the information stored in the schedule token to locate the workflow config, parse it and generate job tokens representing individual jobs in that workflow. Job tokens are posted to the master under a unique workflow instance ID. Workflow instances are controlled independently of one another giving the user the flexibility to run multiple instances of the same workflow in parallel.

Job tokens are claimed and executed by idle workers. A job is described by a command line that the worker runs in a subprocess. The output of the subprocess is captured and exposed in the UI. Pinball interprets specially formatted log lines as values to be exposed in the UI or passed on to downstream jobs. This allows us to directly embed a link to a Hadoop job tracker page in the Pinball UI or propagate parameters from an upstream job to its downstream dependents.

If any post-processing is needed on job failure (e.g., removing partial output), Pinball offers the ability to attach an arbitrary cleanup command to the job configuration. Cleanups are guaranteed to run even if the worker that claimed the job died in the middle of its execution.

Failed jobs may be retried automatically or manually. Users can choose any subset of jobs to retry by interacting with the workflow diagram. Bulk actions significantly improve usability when working with larger job hierarchies.

When the workflow instance finishes (either failing or succeeding), optional email notifications are sent out to workflow owners.

Workflow configuration and job templates

To end users, workflow manager is often a black box that employs faerie magic to schedule and execute their jobs, but the workflow itself needs to be defined in one way or another. While designing Pinball, we made a conscious choice to not make the configuration syntax part of the system core in order to give developers a lot of flexibility to define workflow configurations in a way that makes the most sense in a given setting. At the same time, we wanted to offer a complete package with a low barrier to entry. So, we decided to include a simplified version of the parser and job templates that we use in our open-source release.

Out of the box, we support a Python-based workflow configuration syntax. We also provide a number of job templates for configuring simple shell scripts as well as more elaborate computations on the Hadoop platform. We offer native support for the EMR and Qubole platforms with power features such as embedding job links in the Pinball UI and cleaning up resources after failed jobs. We also propose the notion of a condition that lets users model data dependencies between jobs (think of a job being delayed until the data it needs becomes available).

Give our projects a try and let us know what you think! If you’re interested in working on projects like this from the inside, join our team.

Pawel Garbacki is a software engineer on the Monetization team. Mao Ye, Changshu Liu and Jooseong Kim are software engineers on the Data team.

Acknowledgements: Thanks to Krishna Gade, Mohammad Shahangian, Tongbo Huang, Julia Oh and Roger Wang for their invaluable contributions to the Pinball project.

For Pinterest engineering news and updates, follow our engineering Pinterest, Facebook and Twitter. Interested in joining the team? Check out our Careers site.

Making Pinterest HTTPS


Pinner safety is a top priority for us, so earlier this year we joined the growing list of websites that are fully HTTPS. Serving everything over HTTPS builds trust with Pinners and significantly improves security in one fell swoop. Migrating to HTTPS presented a number of expected - and unexpected - engineering challenges, starting with finding the right CDN partner.

Expected challenges

We identified and mitigated many technical challenges during the discovery phase of the migration. One of the biggest was working with CDN providers that could support HTTPS and our certificates. We also knew that CDN image distribution over HTTPS can cost significantly more. Other technical challenges included:

  • Surfacing hard coded HTTP URLs and functions in source files
  • Performance impact
  • Older browser support
  • Referral header removal from HTTPS to HTTP sites
  • Mixed content warnings (broken lock in browser bar)

Unexpected challenges

Once we felt comfortable enough to start testing, we launched a test in the UK, where we have an active Pinner community. Our tests showed an insignificant impact on SEO and little effect on any one browser. It wasn’t until we cast a larger net to a percentage of our global audience that we saw the following challenges:

  • Missed CDN content that broke the Pin It button for several hours
  • Not all sitemap files were updated to point to HTTPS domains
  • An unknown Safari issue

Tackling challenges

Although we anticipated a number of challenges, we were able to tackle those unexpected ones with a lean and fast-moving team. Here’s how we did it:

  • Broken “Pin It” button. We quickly mitigated this with a swift DNS change to a new CDN provider.
  • Referral header removal issues. We used a meta referrer tag to support tracking from HTTPS to HTTP sites.
  • Unknown Safari issue. Our UK experiment provided data that showed a small percentage of users had problems logging in after the migration. We pinpointed this to Safari users, which allowed us to start investigating the root cause.

In addition, having multiple CDN providers that supported HTTPS gave us options for performance as well as commercial leverage.

In the end, we enhanced the privacy of Pinners by enabling encryption while also hindering exploitation by way of man-in-the-middle attacks, session hijacking, content injection, etc. This also paved the way for future products that may require HTTPS to launch. Finally, the move to HTTPS resulted in a 10 percent (max) increase in signups a day, because we were able to remove the redirect flow from HTTP to the HTTPS signup page.

We will continue our journey towards HTTPS with further enhancements including HTTP Strict Transport Security (HSTS), which will prevent SSL stripping. We also plan to work with Chromium to preload our domain to prevent SSL stripping on a user’s first visit to Pinterest.

Introducing our paid bug bounty program

Prior to the HTTPS migration, we were hesitant to open a paid bug bounty program because of a number of known vulnerabilities associated with being HTTP-only. Now that a number of those gaps have been closed as a result of the migration, we’re happy to announce we’ve upgraded the program with payouts, which has resulted in a 10x increase in reports since launching the paid program. We highly encourage the whitehat hacker community to use our program and report bugs, which helps us keep Pinners safe and improve our security posture.

If you’re interested in working on security engineering challenges like this, join our team!

Paul Moreno is the security engineering lead on the Cloud team.

Migrating to HTTPS wasn’t a smooth process. It took several members of various teams to pull off, and there were a number of moving parts. Special thanks to engineers Amine Kamel, Chris Danford, Danilo Stefanovic and Anna Majkowska for their hard work making Pinterest a safer place.

For Pinterest engineering news and updates, follow our engineering Pinterest, Facebook and Twitter. Interested in joining the team? Check out our Careers site.

Pinnability: Machine learning in the home feed


Pinterest hosts more than 30 billion Pins (and growing) with rich contextual and visual information. Tens of millions of Pinners (users) interact with the site every day by browsing, searching, Pinning, and clicking through to external sites. The home feed, a collection of Pins from the people, boards and interests followed, as well as recommendations including Picked for You, is the most heavily user-engaged part of the service, and contributes a large fraction of total repins. The more people Pin, the better Pinterest can get for each person, which puts us in a unique position to serve up inspiration as a discovery engine on an ongoing basis.

The home feed is a key way to discover new content, which is valuable to the Pinner, but poses a challenging question. Given the ever increasing number of Pins from various sources, how can we surface the most personalized and relevant Pins? Our answer is Pinnability.

Pinnability is the collective name of the machine learning models we developed to help Pinners find the best content in their home feed. It’s part of the technology powered by smart feed, which we introduced last August, and estimates the relevance score of how likely a Pinner will interact with a Pin. With accurate predictions, we prioritize those Pins with high relevance scores and show them at the top of home feed.

The benefits of Pinnability

Before launching Pinnability a few months ago, all home feed content from each source (e.g., following and Picked For You) was arranged chronologically, without taking into account which Pins people may find to be most interesting. In other words, a newer Pin from the same source always appeared before an older Pin. This simple rule is easy to understand and implement, but it lacked the ability to effectively help Pinners discover Pins that really interest them, because a low-relevance Pin could very well appear before a high-relevance one (see Figure 1).

With Pinnability launched, the candidate Pins for home feed are scored using the Pinnability models. The scores represent the personalized relevance between a Pinner and the candidate Pins. Pins in home feed are prioritized by the relevance score as illustrated in Figure 2.

Powering Pinnability with machine learning

In order to accurately predict how likely a Pinner will interact with a Pin, we applied state-of-the-art machine learning models including Logistic Regression (LR), Support Vector Machines (SVM), Gradient Boosted Decision Trees (GBDT) and Convolutional Neural Networks (CNN). We extracted and tested thousands of textual and visual features that are useful for accurate prediction of the relevance score. Before we launch a model for an online A/B experiment, we thoroughly evaluate its offline performance based on historical data.

Figure 3 summarizes the three major components of our Pinnability workflow, namely training instance generation, model generation and home feed serving.

Training instance generation

The basis of the Pinnability training data is historical Pinner interaction with home feed Pins. For example, after viewing a Pin in home feed, a Pinner may choose to like, repin, click for a Pin closeup, clickthrough, comment, hide or do nothing. We record some of these “positive actions” and “negative actions” as training instances. Naturally, the number of Pins viewed is often much larger than the number of Pins on which the Pinner took a positive action, so we sample the positive and negative instances at different rates. With these defined, we test thousands of informative features to improve Pinnability’s prediction accuracy.

Our unique data set contains abundant human-curated content, so that Pin, board and user dynamics provide informative features for accurate Pinnability prediction. These features fall into three general categories: Pin features, Pinner features and interaction features:

  • Pin features capture the intrinsic quality of a Pin, such as historical popularity, Pin freshness and likelihood of spam. Visual features from Convolutional Neural Networks (CNN) are also included.
  • Pinner features are about the particulars of a user, such as how active the Pinner is, gender and board status.
  • Interaction features represent the Pinner’s past interaction with Pins of a similar type.

Some features are subject to transformation and normalization. For instance, a log transformation is applied to many “count features,” such as the number of Pins a Pinner owns, to produce regression-friendly distributions.

The major challenge we faced in developing a robust training data generation pipeline was coping with the large data scale. We built MapReduce pipelines to generate the training instances, each representing a Pinner/Pin interaction. A training instance contains three parts of information (sketched after the list below):

  • Meta data (Pin ID, Pinner ID, source of the interaction, timestamp, etc.) for data grouping when we want to train and analyze a Pinnability model with a subset of training instances, such as following and Picked For You (PFY) models.
  • Target value to indicate whether a Pinner has taken a positive action after viewing the Pin. We can train separate models that optimize different positive actions such as repins and clickthroughs.
  • Feature vector that contains the informative signals for interaction prediction.
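
Concretely, a training instance can be pictured like this. The field and feature names below are illustrative assumptions, not the actual production schema:

training_instance = {
    "meta": {                       # used for grouping into per-source models
        "pin_id": 123456789,
        "pinner_id": 987654321,
        "source": "following",      # or "picked_for_you"
        "timestamp": 1428000000,
    },
    "target": 1,                    # 1 = positive action (e.g. repin), 0 = no action
    "features": {
        "pin_historical_popularity": 0.82,
        "pin_age_hours_log": 2.3,   # log-transformed count-style feature
        "pinner_activity_level": 0.64,
        "category_match": 0.91,     # interaction feature
    },
}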

Pinnability model generation

In training Pinnability models, we use Area Under the ROC Curve (AUC) as our main offline evaluation metric, along with r-squared and root mean squared error. We optimize for AUC not only because it’s a widely used metric in similar prediction systems, but also because we’ve observed a strong positive correlation between AUC gains in offline testing and increases in Pinner engagement in online A/B experiments. Our production Pinnability model achieves an AUC score averaging around 90 percent in home feed.

We experimented with multiple machine learning models, including LR, GBDT, SVM and CNN, and we use AUC score in 10-fold cross-validation and 90/10 train-test split settings with proper model parameters for evaluation. We observed that given a fixed feature set, the winning model always tends to be either LR or GBDT for Pinnability. For online A/B experimentation, we prioritize models based on offline AUC scores.
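
As a minimal sketch of this offline evaluation step using scikit-learn (the feature matrix X, target y and model hyperparameters here are placeholders, not our production setup):

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def compare_models_by_auc(X, y):
    models = {
        "LR": LogisticRegression(max_iter=1000),
        "GBDT": GradientBoostingClassifier(n_estimators=100, max_depth=3),
    }
    for name, model in models.items():
        # 10-fold cross-validated AUC, the metric used to prioritize models
        # for online A/B experiments.
        scores = cross_val_score(model, X, y, cv=10, scoring="roc_auc")
        print("%s: mean AUC = %.3f (+/- %.3f)" % (name, scores.mean(), scores.std()))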

Among the thousands of features we added to the training instances, we select features that significantly increase our offline AUC metric as candidates for online A/B experiments. Given the large amount of candidate features, we often test new features in smaller groups, such as recency, Pin owner quality and category match features. The A/B experiments we conducted compare Pinner engagement between the control group using production features and the treatment group using the new experimental features. If the results are positive, we evaluate the extra data size and latency impact on our servers before adding the new features to our production Pinnability models. We iterate quickly on new features supported by our robust training instance generation, model training and evaluation pipelines. In order to monitor models’ on-going performance, we keep a small holdout user group that is not exposed to the Pinnability models. Comparing the engagement difference between the holdout and enabled groups provides valuable insights about Pinnability’s long-term performance.

Currently we only use offline batch data to train our Pinnability models. This poses a potential issue in that we’re not utilizing the most recent data to dynamically adjust model coefficients in serving. On the other hand, we tested and confirmed that the model coefficients do not change substantially when trained on different batches of data separated by several days, so the benefits of online model adjustment are subject to further evaluation.

We’re also exploring ways to apply online training with real-time instances to augment our offline training pipeline so our models are calibrated immediately after we gather new home feed activity data. Online training poses new challenges both algorithmically to our machine learning pipeline and systematically to our home feed serving framework.

Home feed serving

Home feed is powered by our in-house smart feed infrastructure. When a Pin is repinned, a smart feed worker sends a request to the Pinnability servers for the relevance scores between the repinned Pin and all the people following the repinning Pinner or board. It then inserts the Pin, along with its scores, into the pool that contains all followed Pins. PFY Pins are inserted into the PFY pool with their Pinnability relevance scores in a similar fashion.

When a user logs on or refreshes home feed, the smart feed content generator materializes new content from the various pools while respecting the relevance scores within each pool, and the smart feed service renders the Pinner’s home feed, prioritized by those relevance scores.

Pinnability outcome

We continue to refine Pinnability and have released several improvements to date. With each iteration, we’ve observed significant boosts in Pinner engagement, including an increase in the home feed repinner count of more than 20 percent. We’ve also observed significant gains in other metrics, including total repins and clickthroughs.

Given the importance of home feed and the boost in Pinner engagement, Pinnability continues to be a core project in building our discovery engine. We’ve also begun to expand the use of our Pinnability models to help improve our other products outside home feed.

We’re always looking for bright engineers to join Pinterest to help us solve impactful problems like Pinnability.

Yunsong Guo is a software engineer on the Recommendations team.

Acknowledgements: Pinnability is a long-term strategic project being developed in collaboration with Mukund Narasimhan, Chris Pinchak, Yuchen Liu, Dmitry Chechik and Hui Xu. This team, as well as people across the company, helped make this project a reality with their technical insights and invaluable feedback.

For Pinterest engineering news and updates, follow our engineering Pinterest, Facebook and Twitter. Interested in joining the team? Check out our Careers site.

Why you should be A/B testing your infrastructure


The benefits of using a data-driven approach to product development are widely known. Most companies understand the benefits of running an A/B experiment when adding a new feature or redesigning a page. While engineers and product managers have embraced a data-driven approach to product development, few think to apply it to backend development. We’ve applied A/B testing to major infrastructural changes at Pinterest and have found it extremely helpful in validating those changes have no negative user-facing impact.

Bugs are simply unavoidable when it comes to developing complex software. It’s often hard to prove you’ve covered all possible edge cases, all possible error cases and all possible performance issues. However, when replacing or re-architecting an existing system, you have the unique opportunity to prove that the new system is at least as good as the one it’s replacing. For rapidly growing companies like Pinterest, the necessity to re-architect or replace one component of our infrastructure happens relatively frequently. We rely heavily on logging, monitoring and unit tests to ensure we’re creating quality code. However, we also run A/B experiments whenever possible as a final step of validation to ensure there’s no unintended impact on Pinners. The way we run the experiment is pretty simple: half of Pinners are sent down the old code path and hit the old system and the other half use the new system. We then monitor the results to make sure there’s no impact across all our key metrics for Pinners in the treatment group. Here are the results of three such experiments.
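
Here’s a minimal sketch of the kind of deterministic bucketing such an experiment can use (this isn’t our actual experiment framework; the hashing scheme and helper are illustrative):

import hashlib

def in_treatment_group(user_id, experiment_name, treatment_percent=50):
    # Salt the hash with the experiment name so different experiments
    # get independent splits of the same user population.
    digest = hashlib.md5(("%s:%s" % (experiment_name, user_id)).encode("utf-8")).hexdigest()
    return int(digest, 16) % 100 < treatment_percent

# if in_treatment_group(user_id, "user_service_migration"):
#     ...call the new system...
# else:
#     ...call the old system...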

2013: A new web framework

Our commitment to A/B testing infrastructural changes was forged in early 2013 when we rewrote our web framework. Our legacy code had grown increasingly unwieldy over time, and its functionality was beginning to diverge from that of our mobile apps, because it ran through completely independent code paths. So we built a new web framework (code-named Denzel) that was modular and composable and consumed the same API as our mobile clients. At the same time we redesigned the look and feel of the website.

When it came time to launch, we debated extensively whether we should run an experiment at all, since we were fully organizationally committed to the change and hadn’t yet run many experiments on changes of this magnitude. But when we ran the experiment on a small fraction of our traffic, we discovered not only major bugs in some clients we hadn’t fully tested but also that some minor features we hadn’t ported over to the new framework were in fact driving significant user engagement. We reinstated these features and fixed the bugs before fully rolling out the new website, which gave Pinners a better experience and allowed us to understand our product better at the same time.

This first trial by fire helped us establish a broad culture of experimentation and data-driven decision-making, as well as learn to break down big changes into individually testable components.

2014: Pyapns

We rely on an open-source library called pyapns for sending push notifications via Apple’s servers. The project was written several years ago and wasn’t well maintained. Based on our data and what we’d heard from other companies, we had concerns about its reliability. We decided to test out using a different library called PyAPNs, which seemed better written and better maintained. We set up an A/B experiment, monitored the results and found that there was a 1 percent decrease in our visitors with PyAPNs. We did some digging and couldn’t determine the cause for the drop, so we eventually decided to roll back and stick with pyapns.

2015: UserService

We’ve slowly been moving towards a more service-oriented architecture. Recently we extracted a lot of our code for managing users and encapsulated it into our new UserService. We took an iterative approach to building the service, extracting one piece of functionality at a time. With such a major refactor of how we handle all user-related data, we wanted to ensure nothing broke. We set up an experiment for each major piece of functionality that was extracted, for a total of three experiments. Each experiment completed successfully showing no drop in any metrics. The results have given us strong confidence that this new UserService is at parity with the previous code.

We’ve had a lot of success with A/B testing our infrastructure. It’s helped us identify changes that caused a serious negative impact we probably wouldn’t have noticed otherwise. When experiments go well, they give us confidence that the new system is performing as expected. If you’re not A/B testing your infrastructure changes, you really should be.

John Egan is a growth engineer and Andrea Burbank is a data scientist.

Acknowledgements: Dan Feng, Josh Inkenbrandt, Nadine Harik and Vy Phan for their help in running the experiments covered in this post.

For Pinterest engineering news and updates, follow our engineering Pinterest, Facebook and Twitter. Interested in joining the team? Check out our Careers site.

Learn to stop using shiny new things and love MySQL


A good portion of the startups I meet and advise want to use the newest, hottest technology to build something that’s cool, but not technologically groundbreaking. I have yet to meet a startup building a time machine, teleporter or quantum social network that would actually require some amazing new tech. They have awesome new ideas with down-to-earth technical requirements, so I kept wondering why they choose this shiny (and risky) new stuff when all they need is a good ol’ trustworthy database. I think it’s because many assume that building the latest and greatest needs the latest and greatest!

It turns out that’s only one of three bad reasons (traps) why people go for the shiny and new. Reason two is people mistakenly assume older stuff is slow, not feature rich or won’t scale. “MySQL is sluggish,” they say. “Java is slow,” I’ve heard. “Python won’t scale,” they claim. None of it’s true.

The third reason people go for shiny is that older tech isn’t advertised as aggressively as newer tech. Younger companies need to differentiate from the old guard and be bolder, more passionate and promise to fulfill your wildest dreams. But most new tech sales pitches aren’t generally forthright about their many failure modes.

In our early days, we fell into this third trap. We had a lot of growing pains as we scaled the architecture. The most vocal and excited database companies kept coming to us saying they’d solve all of our scalability problems. But nobody told us of the virtues of MySQL, probably because MySQL just works, and people know about it.

Through the gauntlet, two of the most important lessons I learned building Pinterest were:

  • Don’t be the biggest. If you’re the biggest user of a technology, your challenges will be greatly amplified.
  • Keep it simple. No matter what technology you’re using, it will fail.

After about a year of fast, sleep-defying scaling at Pinterest, we had MySQL, Memcache, MongoDB, Redis, Cassandra, Membase and Elasticsearch. Everything was on fire, each system breaking in its own special way. We wanted to simplify and get rid of all the fancy stuff, but we also wanted something that would scale to the moon with us. It was time to start our grand re-architecture. To help guide our choices, we built a set of questions to apply to each technology:

  • Does the technology meet your needs?
    Solidify your requirements. Do you need simple lookups, ordered lookups and/or lookup by geography? Graph walking? One technology may not be able to support all of your requirements, but it’s nice when it does.
  • How well does it scale?
    Some technology is designed for massive scale and some is not. You may have to layer a little scale magic on top. With scale comes more complexity and less agility. Keep that in mind, and avoid scaling prematurely.
  • Is the cost justified?
    Consider the support contract or licensing and whether or not you can get by without a support contract. If you’re an angel funded startup, ideally you’d like to spend your money on the lights and Ramen, and not on a mainframe.
  • How mature is the technology?
    Maturity of a product is the most important question you can ask next to basic requirements. This is where I’ll spend the lion’s share of this post.

Maturity is natural

The harder and more passionately people push on a technology, the faster they will run across bugs or performance problems, fix them and hopefully contribute fixes back for the whole community to use.

Maturity can come a hell of a lot faster if the technology is simple. Getting a five-line Python program that prints “Hello, World” to be bug free should take 30 person-seconds, whereas a program that handles worldwide distributed banking transactions will take many person-years to mature due to the complexity that must be hammered out.

So, let’s equation-ify this. Maturity increases with blood and sweat, but comes slower with more complexity.

Maturity = (Blood + Sweat) / Complexity

(I really wanted to throw in some e, i, and π into the equation, but just couldn’t justify it. Yet…)

As a technology hardens, collaboration occurs, understanding gets deeper and a wealth of knowledge is built out. As the technology itself gets more stable and beautiful, documentation and simplification occur and the frontiers and boundaries are tested and widened.

You don’t want to be on the wrong end of the maturity equation. There be dragons there:

  • Hiring will be more difficult
    Try searching on Google for MySQL admin and Cassandra admin. Try walking out into a busy San Francisco walkway and yelling out that you need a MySQL admin. You’ll find one. HBase, unlikely.
  • You’ll find minimal community
    It’s 2 a.m. and you’re getting a weird error message “48fe1a is opaque” (an actual error message I got from Membase). What the f!*k does that mean? Crap, there are no answers anywhere on Google. Conversely, I can’t remember the last time a MySQL question I had wasn’t already answered, along with somebody calling the questioner a nOObXor.
  • You’re more likely to fail, possibly catastrophically
    We had an unexpected loss of data on nearly every technology we used at one time or another, except MySQL. Even on the worst days, when the hard drive crashed, we still managed to find a script somebody wrote to do magic voodoo to get our MySQL data out again and live another day. Other technologies left us dead in our tracks because nobody had encountered the same problems, or they hadn’t taken the effort to dive deep to recover their data or they hadn’t contributed the fix back to the community. Incidentally, I’m super thankful we never trusted the golden copy of our data to any other system except MySQL in those early days.

Sometimes you have to be on the bad end of this maturity ratio. For instance, if you HAVE to have a Flux Capacitor for your time machine, recognize that you won’t be able to hire for it easily. There will be minimal community online to help you debug why you only went back in time 100 years and not 1,000 years. Support may not be able to help you understand why Marty McFly is now stuck in a supernova.

If you’re on the frontier, you’ll hit new bugs and issues that the rest of the world has never seen. They’ll be 10x harder to debug and will likely require a depth of knowledge that goes outside the comfort zone of your current engineers. You’ll have to dig in, push hard and learn fast. I send you my virtual hugs and admiration. I’ve been there. It will be tough. Blog what you find, collaborate and communicate.

If you’re starting or growing a company, and your scale is smaller than huge, consider maturity to be your most important factor aside from basic requirements. Ask yourself — does MySQL sufficiently meet my needs? If so, use it. If you’re wondering if MySQL will be fast enough, the answer is YES. Even better than fast, MySQL’s performance will be consistent.

Last Remarks

So I’ve wailed away on a bunch of technologies, but I seem to have a near-romantic thing for MySQL. I’d like to take a moment to mention that MySQL, while mature, does not solve all your problems. Sometimes you’ll have to venture away from the comforting warming glow of maturity.

  • Cartesian Distance
    If you need to search for nearby points in two dimensions, storing coordinates as Geohashes in MySQL would work well (here’s an XKCD comic to help). Three dimensions would probably also work well. But if you need large N-dimensional search spaces, I don’t know of a good way to store and retrieve them in MySQL efficiently. You might find yourself needing to store N-dimensional points if, for instance, you have created a model that produces feature vectors for some input and you want to see if two inputs are similar. Classic examples include determining if two images are similar but not exactly the same. For these sorts of situations, consider building a distributed RP/KD tree (would love to collaborate! Email me!).
  • Speed of writes
    MySQL delivers full write consistency. If you’re willing to trade off “full” for “eventual” consistency, your writes can be much faster. HBase, Cassandra and other similar technologies write to an update log incredibly fast, at the expense of making reads slower (reads must now read the stored info and walk the update log). This is a nice inversion, because it’s easier to cache reads and make them fast.
  • FIFOs, such as feeds
    My biggest complaint about MySQL is that it’s still living in 1994 (with baggy pants and the.. erk.. Macarena). Many uses of databases back then needed relational queries. There were no social networks. MySpace wouldn’t come around for another nine years! And so MySQL is built out of trees and has no good notion of queues. Inserting into a B-tree is an O(lg(N)) operation (assuming happy balance). But today, social networks are a major force on the internet, and they depend heavily on queues. We want uber fast O(1) enqueuing! My suggestion is to not use MySQL for feeds. It’s too much overhead. Instead, consider Redis, especially if you’re still a small team. Redis is super fast and has lists with fast insertion and retrieval (see the sketch after this list). If you’re a larger company and can hire the folks to maintain it, consider HBase. It’s working well for our feeds.
  • Logs
    For the love of everything holy, don’t store logs in MySQL. As mentioned in the previous paragraph, MySQL stores things in trees. Logs should not live in trees (that’s a weird thing to say…). Send your logs to Kafka, then use Secor to read from Kafka and ship up to S3. Then go nuts with EMR, Qubole or your map-reduce platform du jour.
  • Scale beyond one box
    If you’re in this position and you’ve optimized all queries (no joins, foreign keys, distinct, etc.), and you’re now considering read slaves, you need to scale beyond one MySQL server. MySQL won’t do this for you out of the box, but it’s not hard. And we have a solution, which you’ll be able to read about once I finish writing the blog post. In the meantime, bug me at marty@pinterest.com, and I’ll get you up and running now.
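
For reference, here’s a minimal sketch of that kind of Redis-backed feed using the redis-py client; the key names and sizes are illustrative only:

import redis

r = redis.StrictRedis(host="localhost", port=6379)

def add_to_feed(user_id, pin_id, max_len=1000):
    key = "feed:%s" % user_id
    r.lpush(key, pin_id)          # O(1) enqueue at the head of the list
    r.ltrim(key, 0, max_len - 1)  # cap the feed at the most recent items

def read_feed(user_id, count=50):
    return r.lrange("feed:%s" % user_id, 0, count - 1)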

Marty Weiner is an engineering manager at Pinterest. Follow his adventures on Twitter and Pinterest. Interested in joining his team? Check out our careers site.

For Pinterest engineering news and updates, follow our engineering Pinterest, Facebook and Twitter. Interested in joining the team? Check out our Careers site.

Shared web credentials: A simpler way to log in


Our top priority is to create a great user experience in everything we build, and across platforms. As part of that, each day we work to reduce the effort it takes to use Pinterest. So when Apple announced support for shared web credentials, we were excited to make it easier for Pinners to log in.

Shared web credentials allows an iOS app to read and write passwords to the same store Safari uses on iOS and Mac. For example, if a Pinner signed up for Pinterest on their Mac, they can download our iPhone app and immediately log in without needing to remember or tap in their credentials. Magic!

How To

Before you can start using the APIs, there are two requirements to get iOS to trust your app with shared web credentials.

  • Add your domain to “Associated Domains” in the Capabilities section of your Xcode project:
  • Sign a JSON file and host it at https://<yourdomain.com>/apple-app-site-association. Note the https.

    The file should contain a dictionary that lists your approved app identifier and is signed to ensure it has not been tampered with. The listing below shows an example JSON file formatted for reading:

    Listing 1-1 Server-side web credentials:
    {
        "webcredentials": {
            "apps": [ "YWBN8XTPBJ.com.example.myApp" ]
        }
    }
    

    We used our pinterest.com certificate and key to sign the JSON file (more on the certificate requirements here). To sign the JSON file, use the commands below:

    Listing 1-2 Signing the credentials file:

    echo '{"webcredentials":{"apps":["YWBN8XTPBJ.com.example.myApp"]}}' > json.txt

    cat json.txt | openssl smime -sign -inkey pinterest.com.key \
            -signer pinterest.com.pem \
            -certfile intermediate.pem \
            -noattr -nodetach \
            -outform DER > apple-app-site-association

    As a final step, host the produced file on your website under https://example.com/apple-app-site-association.

    This process should be relatively straightforward, however there are a couple of gotchas.

    First, be sure you have credentials in iCloud Keychain that match your domain. Without them the API methods will fail with an error (-25300, No matching items found). If the user clicks “Not Now” there will be no error, but zero credentials returned.

    Also, if you don’t get everything set up correctly the first time, you’ll want to delete the app between tests. Apple verifies your domain app link when your app is installed, so relaunching the app after fixing the issue won’t result in a new check.

  • Now, to request the user’s credentials in the app, just make a simple function call:
    SecRequestSharedWebCredential(NULL, NULL, ^(CFArrayRef credentials, CFErrorRef error) {
        if (error) {
            // Handle error
        } else {
            if ([(__bridge NSArray *)credentials count] > 0) {
                NSDictionary *dictionary = [(__bridge NSArray *)credentials objectAtIndex:0];
                NSString *accountName = dictionary[(__bridge id)(kSecAttrAccount)];
                NSString *password = dictionary[(__bridge id)(kSecSharedPassword)];
            } else {
                // No accounts
            }
        }
    });

Design

Imagine you’ve just signed up for Pinterest on the web and then downloaded our iOS app. Upon loading the app, you’re offered the chance to immediately log in with your existing credentials, which is the ideal experience.

However, you don’t want to annoy users who have no interest in using the feature. So if the user clicks “Not Now,” offer to stop showing the dialog every time they’re on the login screen.

To create the most seamless experience, integrate shared web credentials in other parts of your app.

  • When the user logs in successfully, call SecAddSharedWebCredential to update the user’s credentials so that they can use them when they visit your website later.
  • If you have a password change feature, be sure to also update credentials in the keychain as above.
  • Finally, make sure your login screen has a button to log in with the saved credentials.

Results

After introducing support for shared web credentials, we saw a modest increase in Pinner engagement, which made the small effort well worth the time. Encouraging people to use features like iCloud Keychain helps users create more secure passwords without having to remember them, which is a win for everyone involved.

Garrett Moon is an iOS engineer on the Mobile team.

Acknowledgements: Thanks to Amine Kamel, senior security engineer, for providing the how-to on signing JSON files properly.

For Pinterest engineering news and updates, follow our engineering Pinterest, Facebook and Twitter. Interested in joining the team? Check out our Careers site.


Improving Pinning with a predictive board picker


Every day, tens of millions of people discover and save Pins on Pinterest, making the Pinning flow one of the most important features. It’s so important that we’re constantly and carefully making updates to it to ensure ease-of-use and fast load times.

As an active Pinner, I have more than 20 boards where I’m curating collections of Pins. Before, I often found myself scrolling through the drop down list of my many boards trying to find the right board for each pin. This was especially challenging on mobile where only a few boards appeared on the screen at a time. The data showed I wasn’t the only one having this frustrating experience, because active Pinners have 24 boards on average.

As a former Pinterest intern (class of 2013), I started building a smart board picker for iOS and Android in the Pinning flow during a Make-a-thon (our version of a hack-a-thon). The goal was to provide suggested boards based on the match of the content of the Pin and a Pinner’s boards to make Pinning easier.

The magic

We have a rich collection of metadata for the content on our service. For most Pins, we know exactly what they’re about, such as their category (like travel) and which keywords are most important, based on descriptions written by tens of thousands of Pinners. The same applies to boards. Using this rich data, we concluded we could predict which board a Pinner intends to save to.

When a Pinner repins, we run a scoring algorithm that scores the Pin against all of the Pinner’s boards and returns the best-scoring board(s) based on content match (a minimal sketch follows the list below). The scoring algorithm uses a linear combination of the following features:

  • Category match: whether the Pin’s category is matched with the board’s category. The Pin and board’s category information is represented as a unit vector, where each dimension tells the probability of the Pin or the board falling in this category. The match is represented as the dot product of the two category vectors.
  • Keyword match: we use keywords to describe a Pin or board. These textual terms place each object in a high dimensional space. We measure closeness in that space by inner product of the two sparse vectors of keywords from the Pin and the board.
  • Activeness of the board: how frequently new Pins are added to the board is also considered in the scoring algorithm. A board that has been abandoned for a long time should be penalized in favor of more active boards.
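
Here’s the promised sketch of the scoring idea. The weights, helper functions and decay constant are illustrative assumptions, not the production model:

import math
import time

W_CATEGORY, W_KEYWORD, W_ACTIVE = 0.5, 0.4, 0.1  # assumed weights

def category_match(pin_categories, board_categories):
    # Dot product of the two unit category vectors (dicts of category -> probability).
    return sum(p * board_categories.get(c, 0.0) for c, p in pin_categories.items())

def keyword_match(pin_keywords, board_keywords):
    # Inner product of two sparse keyword vectors (dicts of term -> weight).
    return sum(w * board_keywords.get(t, 0.0) for t, w in pin_keywords.items())

def activeness(last_pinned_at, half_life_days=30.0):
    # Exponential decay that penalizes boards that haven't been pinned to recently.
    days_idle = (time.time() - last_pinned_at) / 86400.0
    return math.exp(-days_idle / half_life_days)

def score_board(pin, board):
    return (W_CATEGORY * category_match(pin["categories"], board["categories"])
            + W_KEYWORD * keyword_match(pin["keywords"], board["keywords"])
            + W_ACTIVE * activeness(board["last_pinned_at"]))

def suggest_boards(pin, boards, k=2):
    return sorted(boards, key=lambda b: score_board(pin, b), reverse=True)[:k]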

Figure 1: Smart board picker behind the scenes

Tuning the user experience

Accuracy isn’t the only factor that makes a predictive feature successful. Getting the user experience right is even more important, especially for a feature embedded in the essential Pinning flow, where a small mistake could ruin the experience for millions. Because fetching board suggestions introduces a small amount of latency into the Pinning flow, we needed to keep its impact as low as possible so the repin experience actually got better.

Unfortunately, we didn’t get it right on the first try. Our first version on iOS showed the original board picker (with the three most recently picked boards in the Quick Picks section) while fetching the board suggestions, then swapped the suggestions into the Quick Picks section once they arrived. Although the numbers showed Pinners were spending less time in the repin flow and quickly picking up the feature, we noticed a strange increase in the number of Pinners who edited or deleted Pins. We learned the swapping of boards in Quick Picks was creating a bad experience (see the animation below): those trying to Pin to a recent board would accidentally save to the wrong board when the swap happened right before they tapped.

Figure 2: First try of smart board picker

To make a better user experience, we added a small spinning circle, blocked the area reserved for suggested boards while fetching them and kept the Pinning flow active. We still show the most recently picked board at the first slot before the two predictions, and Pinners can scroll down a list of all their boards if they don’t want to wait for suggestions. Putting Pinners first is always our priority, and this design successfully reduced the accidental Pinning to wrong boards while still achieving our goal for this feature.

Figure 3: Improved version of smart board picker

Insights

There are several interesting insights from the experiment:

  • Making the Pinning flow easier to use increases repins and repinners: our tests show the smart board picker reduces abandons on the Pinning flow by 4 percent and increases overall repinners by 1 percent.
  • Suggesting correct boards in the Pinning flow reduces the efforts of re-organizing Pins later: our data confirms that the smart board picker reduces the number of Pinners who create boards by 2 percent, who edit Pins by 4 percent and who delete pins by 3 percent. In other words, Pinners are Pinning to the correct boards!
  • The smart board picker is helpful in the core Pinning flow: this feature increased the number of Pinners using boards in Quick Picks by 15 percent and reduced the average time spent on the Pinning flow by 10 percent. 

Come try the smart board picker on the latest iOS or Android apps if you haven’t already!

Yuchen Liu is a software engineer on the Recommendations team at Pinterest.

Acknowledgements: Smart board picker was built in collaboration with Bin Liu on iOS platform and with Nikki Shah on Android platform. 

For Pinterest engineering news and updates, follow our engineering Pinterest, Facebook and Twitter. Interested in joining the team? Check out our Careers site.

Introducing the Pinterest Developers Platform


Every day, millions of people use Pinterest to save creative ideas for recipes, articles, places to travel, products and more. Today we’re announcing the beta version of the Pinterest Developers Platform, a suite of APIs for developers to build apps and integrations that bring Pins to life.

On Pinterest, people have saved over 50 billion Pins. As a developer, you’ll have the chance to build something that makes those Pins a reality, like apps for ordering ingredients from a recipe Pin or booking trips based on travel Pins.

When Pinners sign in to your app with Pinterest, you can:

  • Build a personalized, curated experience based on their boards and Pins
  • Let people easily create multiple Pins or boards to get more of your content distributed across Pinterest

We’d love to hear your app ideas—take a look at our technical overview and apply for beta access. We’re looking forward to working with developers (only in the U.S. to start).

Josh Inkenbrandt is a product manager on the Developer Platform team.

Under the hood: Teletraan deploy system


Among the things a developer worries about most, deploy is near - or at - the top of the list. A deployment is often the first time a new code change runs in the production environment. A dependable and straightforward deploy tool is a crucial part of any developer’s arsenal.

A deploy system should support the following functionalities:

  • Rollback. This is the most important feature of any deploy tool. Having a time machine to go back to a certain previous state is priceless.
  • Hotfix. There are times when rollback is either impractical or hard, so a hotfix needs to be easy to perform and fast to deploy, taking priority over regular deploys.
  • Rolling deploy. Deploy shouldn’t interrupt service, but if absolutely necessary, the impact has to be minimal. It’s important to halt the deployment if a certain number of servers have failed to upgrade or a service SLA is violated.  
  • Staging and testing. Deploying to production directly has higher risks than deploying to a staging environment or canary to verify things work first. Often times engineers don’t follow this best practice because of the overhead of creating a staging environment and integrating it with their tests. A good deploy system minimizes such overhead.
  • Visibility. Make sure it’s easy to find out which code changes are available to deploy and the number of hosts running new and older versions. It’s also important to easily track which code change was introduced when and by whom, as well as the critical metrics and alarms status during a deployment.
  • Usability. A simple user interface is key for the above functionalities.
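
To make the rolling-deploy requirement concrete, here is a minimal sketch of the halting rule, assuming hypothetical callables for upgrading a host, checking its health and checking the service SLA, plus an assumed failure budget; the logic in any production deploy system is considerably more involved.

```python
def rolling_deploy(hosts, upgrade_host, host_healthy, sla_violated, max_failed=3):
    """Upgrade hosts one at a time, halting if too many fail or the SLA is violated.

    upgrade_host, host_healthy and sla_violated are stand-in callables;
    max_failed is an assumed failure budget.
    """
    failed = 0
    for host in hosts:
        upgrade_host(host)                 # push the new build to this host
        if not host_healthy(host):
            failed += 1
        if failed >= max_failed or sla_violated():
            return "HALTED"                # stop before impacting more of the fleet
    return "SUCCEEDED"
```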

Introducing Teletraan

Teletraan is our internal deploy system that supports all of the above functions (it’s named after a character from the Transformers TV series). It was built by a small group of development tools engineers on the Cloud Engineering team, which drives reliability, speed, efficiency and security for the site and infrastructure.

Design overview

Teletraan follows the traditional client-server model with MySQL as the backend data storage.

Deploy agents are daemons that run on all hosts and periodically contact the Agent Service for the latest instructions. During a deployment, an agent downloads and extracts the service build tarball along with its deploy scripts, then executes them. These deploy scripts include PRE/POST-DOWNLOAD scripts, PRE/POST-RESTART scripts and the RESTART script itself, and are responsible for stopping and starting the service.
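
A minimal sketch of that agent loop is below, assuming a hypothetical Agent Service URL, instruction format and on-disk script layout; the real agent also reports state back, retries failures and handles many more edge cases.

```python
import json
import os
import subprocess
import tarfile
import time
import urllib.request

AGENT_SERVICE_URL = "http://agent-service.internal/instructions"  # assumed URL
BUILD_DIR = "/var/teletraan/current"                               # assumed layout


def run_script(name):
    """Run one of the service's deploy scripts if the build ships one with that name."""
    path = os.path.join(BUILD_DIR, "teletraan", name)  # assumed location inside the tarball
    if os.path.exists(path):
        subprocess.run([path], check=True)


def deploy(build_url):
    run_script("PRE-DOWNLOAD")
    tar_path, _ = urllib.request.urlretrieve(build_url)  # fetch the service build tar
    with tarfile.open(tar_path) as tar:
        tar.extractall(BUILD_DIR)                        # unpack build and deploy scripts
    run_script("POST-DOWNLOAD")
    run_script("PRE-RESTART")
    run_script("RESTART")                                # stop and start the service
    run_script("POST-RESTART")


def main():
    while True:
        with urllib.request.urlopen(AGENT_SERVICE_URL) as resp:
            instruction = json.load(resp)                # assumed JSON instruction shape
        if instruction.get("action") == "DEPLOY":
            deploy(instruction["build_url"])
        time.sleep(30)                                   # poll the Agent Service periodically


if __name__ == "__main__":
    main()
```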

Teletraan Workers perform jobs in the background, such as transitioning deploy states based on deploy progress and performing auto deploys based on schedule.

Teletraan Service provides the APIs behind Deploy Board and any other RESTful callers. It’s responsible for most deploy-related actions, including deploy and rollback. It also creates and maintains service deploy configurations, answers deploy and agent status queries, enforces permission control and more.
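
For a sense of what calling that service could look like, here’s a minimal sketch of starting and rolling back a deploy over REST. The service URL, paths and payloads are illustrative assumptions, not Teletraan’s actual API.

```python
import requests

TELETRAAN_API = "http://teletraan.internal/v1"  # assumed service URL


def start_deploy(env, stage, build_id):
    """Kick off a deploy of a build to an environment stage (hypothetical endpoint)."""
    resp = requests.post(f"{TELETRAAN_API}/envs/{env}/{stage}/deploys",
                         json={"buildId": build_id}, timeout=10)
    resp.raise_for_status()
    return resp.json()


def rollback(env, stage):
    """Roll the stage back to its previous good deploy (hypothetical endpoint)."""
    resp = requests.post(f"{TELETRAAN_API}/envs/{env}/{stage}/rollback", timeout=10)
    resp.raise_for_status()
    return resp.json()
```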

Advanced features

In addition to the core functionalities listed above, Teletraan also supports several advanced features:

  • Pause and resume. This comes in handy when a developer wants to double-check something before the code is fully deployed to the cluster.
  • Qualification. Once configured, a successful deploy triggers an acceptance test to qualify it. An accepted deploy can then be promoted or auto-deployed to the next stage.
  • Auto deploy. Automatically promote builds from one stage to another whenever a new build is available, or based on cron job-like schedule settings. An auto deploy can be paused, rolled back or overridden by the system automatically upon failure.

Teletraan has helped us move faster and ship code easier. We want to share these deploy tools with the world, and are planning to open-source Teletraan later this year. Keep an eye on the blog for updates.

Baogang Song is an engineering lead on the Internal Development Tools team, which is part of the Cloud Engineering team at Pinterest.

Acknowledgements: Teletraan was built by Jinru He, Nick DeChant and Baogang Song from the Internal Development Tools team.

For Pinterest engineering news and updates, follow our engineering Pinterest, Facebook and Twitter. Interested in joining the team? Check out our Careers site.

Building a scalable machine vision pipeline

Discovery on Pinterest is all about finding things you love, even if you don’t know at first what you’re looking for. The Visual Discovery engineering team is tasked with building technology that helps people continue to do just that, through features that understand the objects in a Pin’s image and get an idea of what a Pinner is looking for.

Over the last year we’ve been building a large-scale, cost-effective machine vision pipeline and stack with widely available tools and just a few engineers. We faced two main challenges in deploying a commercial visual search system at Pinterest:

  • As a startup, we needed to control the development cost in the form of both human and computational resources. Feature computation can become expensive with a large and continuously growing image collection, and with engineers constantly experimenting with new features to deploy, it’s vital for our system to be both scalable and cost-effective.  
  • The success of a commercial application is measured by the benefit it brings to the user (e.g., improved user engagement) relative to the cost of development and maintenance. As a result, our development progress needed to be frequently validated through A/B experiments with live user traffic.

Today we’re sharing some new technologies we’re experimenting with, as well as a white paper, accepted for publication at KDD 2015, that details our system architecture and insights from these experiments and makes the following contributions:

  • We present a scalable and cost-effective implementation of a commercially deployed visual search engine using mostly open-source tools. The tradeoff between performance and development cost makes our architecture more suitable for small-and-medium-sized businesses. 
  • We conduct a comprehensive set of experiments using a combination of benchmark datasets and A/B testing on two Pinterest applications, Related Pins and an experiment with similar looks, with details below.  

Experiment 1: Related Pin recommendations

It used to be that if a Pin had never before been saved on Pinterest, we weren’t able to provide Related Pins recommendations. This is because Related Pins were primarily generated by traversing the local “curation graph,” the tripartite user-board-image graph that evolves organically through human curation. As a result, “long tail” Pins, or Pins that lie on the outskirts of this curation graph, have so few neighbors that graph-based approaches don’t yield enough relevant recommendations. By augmenting the recommendation system with visual search, we’re now able to recommend Pins for almost all Pins on Pinterest, as shown below.

Figure 1. Before and after adding visual search to Related Pin recommendations.
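
A minimal sketch of that augmentation is below, assuming stand-in callables for graph-based candidates and visual nearest neighbors, plus an assumed cutoff for what counts as a long-tail Pin; the production system blends and ranks candidates far more carefully.

```python
MIN_GRAPH_CANDIDATES = 20  # assumed cutoff for treating a Pin as "long tail"


def related_pins(pin_id, graph_candidates, visual_neighbors, k=25):
    """Prefer curation-graph candidates; top up long-tail Pins with visually similar ones.

    graph_candidates and visual_neighbors are stand-in callables returning lists of Pin ids.
    """
    candidates = graph_candidates(pin_id)
    if len(candidates) < MIN_GRAPH_CANDIDATES:
        # Long-tail Pin: augment with nearest neighbors in visual feature space.
        seen = set(candidates)
        candidates += [p for p in visual_neighbors(pin_id) if p not in seen]
    return candidates[:k]
```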

Experiment 2: Enhanced product recommendations by object recognition

This experiment allowed us to show visually similar Pin recommendations based on specific objects in a Pin’s image. We’re starting off by experimenting with ways to surface object recognition so Pinners can click into the objects (e.g., bags, shoes) in an image, as shown below. We use object recognition to detect products such as bags, shoes and skirts in a Pin’s image, and from these detected objects we extract visual features to generate product recommendations (“similar looks”). In the initial experiment, a Pinner would discover recommendations if there was a red dot on an object in the Pin (see below). Clicking on the red dot loads a feed of Pins featuring visually similar objects. We’ve since evolved the red dot experiment to try other ways of surfacing visually similar recommendations for specific objects, and will have more to share later this year.

Figure 2. We apply object detection to localize products such as bags and shoes. In this prototype, Pinners click on objects of interest to view similar-looking products.
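
Here’s a minimal sketch of that detect-then-retrieve flow, assuming stand-in callables for the object detector, the feature extractor and the nearest-neighbor index, and a PIL-style image object; the real models and serving systems are of course much more elaborate.

```python
def similar_looks(image, detect_objects, extract_feature, nn_index, k=10):
    """For each detected product object, return visually similar product Pins.

    detect_objects yields (box, category) pairs, extract_feature maps an image
    crop to a visual embedding, and nn_index.query returns the nearest Pins --
    all three are assumptions standing in for the production components.
    """
    results = []
    for box, category in detect_objects(image):          # e.g. bags, shoes, skirts
        crop = image.crop(box)                            # assumes a PIL-style image
        embedding = extract_feature(crop)                 # visual feature for this object
        neighbors = nn_index.query(embedding, k=k)        # visually similar product Pins
        results.append({"box": box, "category": category, "pins": neighbors})
    return results
```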

By sharing our implementation details and the experience of launching products, we hope visual search can be more widely incorporated into today’s commercial applications.

With billions of Pins in the system curated by individuals, we have one of the largest and most richly annotated datasets online, and these experiments are a small sample of what’s possible at Pinterest. We’re building a world-class deep learning team and are working closely with members of the Berkeley Vision and Learning Center. We’ve been lucky enough to have some of them join us over the past few months.

If you’re interested in exploring these datasets and helping us build visual discovery and search technology, join our team!

Kevin Jing is an engineering manager on the Visual Discovery team. He previously founded Visual Graph, a company acquired by Pinterest in January 2014.

Acknowledgements: This work is a joint effort by members of the Visual Discovery team, David Liu, Jiajing Xu, Dmitry Kislyuk, Andrew Zhai, Jeff Donahue and our product manager Sarah Tavel. We’d like to thank the engineers from several other teams for their assistance in developing scalable search solutions. We’d also like to thank Jeff Donahue, Trevor Darrell and Eric Tzeng from the Berkeley Caffe team.

For Pinterest engineering news and updates, follow our engineering Pinterest, Facebook and Twitter. Interested in joining the team? Check out our Careers site.

Building security into buyable Pins

Earlier this week, we announced buyable Pins, a simple and secure way to buy your favorite products on Pinterest. When building buyable Pins, we focused on making the technology easy and fun to use from a mobile device, but even more important was building security right into the feature.

How does it work?

We’ll be working with our primary partner Stripe, as well as Braintree, two industry-leading payment providers who’ve been protecting people’s payment information for years. Both providers will vault credit cards on our behalf.

Once a Pinner enters their credit card information in the Pinterest app, we send it via an encrypted channel to the providers, who store it inside a secure vault. The merchant then charges the credit card from the vault through their existing payment processor and lets Pinterest know that the order was successful.
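
A minimal sketch of that flow is below, with hypothetical callables standing in for the provider SDKs and the merchant integration; the key point is that only an opaque vault token, never the raw card number, flows through Pinterest and the merchant.

```python
def checkout(card_details, order, vault_card, merchant_charge, notify_pinterest):
    """Illustrative vaulting flow; all three callables are stand-ins, not real APIs."""
    # 1. Send the card over an encrypted channel to Stripe or Braintree, which
    #    stores it in a secure vault and hands back an opaque token.
    vault_token = vault_card(card_details)

    # 2. The merchant charges the vaulted card through their existing payment
    #    processor, referencing the token rather than the raw card number.
    charge_succeeded = merchant_charge(vault_token, order)

    # 3. The merchant lets Pinterest know whether the order was successful.
    notify_pinterest(order, success=charge_succeeded)
    return charge_succeeded
```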

Security and trust are incredibly important to us. We chose to work with these providers for their experience and the trusted relationships they have with the payment processors who work with the world’s best merchants.

This is just the beginning. With this technology and design, we’ll be able to work with merchants of all sizes and help Pinners everywhere discover and buy the products they love.

Announcing buyable Pins to Pinners and partners at Pinterest HQ.

Stay tuned for more posts from the buyable Pins engineering team. If you’re interested in tackling new engineering challenges as we roll out buyable Pins, join our team!

Wendy Lu is an engineer on the iOS team.

For Pinterest engineering news and updates, follow our engineering Pinterest, Facebook and Twitter. Interested in joining the team? Check out our Careers site.
