Making Pinterest

Open-sourcing PINCache

Because the Pinterest iOS app downloads and processes an enormous amount of data, we use a caching system to cache models and images and avoid eating into our Pinners’ (users’) data plans. For quite some time we used TMCache to persist GIFs, JPEGs and models to memory and disk, but after running it in production, Pinners began reporting that the app was hanging. Once we attributed the hangs to TMCache, we re-architected a significant portion of it and forked the project. The result is our new open-source caching library, PINCache, a non-deadlocking object cache for iOS and OS X. Here’s how we went from deadlocks to forking.

Making an asynchronous method synchronous

First, we identified the problem. TMCache has native asynchronous methods and uses a common pattern to provide synchronous versions of those methods:

dispatch_semaphore_t semaphore = dispatch_semaphore_create(0);
[self doWorkAsynchronouslyAndCallback:^{
    dispatch_semaphore_signal(semaphore);
}];
dispatch_semaphore_wait(semaphore, DISPATCH_TIME_FOREVER);

Unfortunately, this pattern has a fatal flaw: it blocks the calling thread while waiting for a signal from a block dispatched to a queue. What happens when you make a bunch of these calls at once? Thread starvation. If every thread is blocked waiting on another operation to complete and there are no threads left to run those operations, you wind up in a deadlocked state.

Making a synchronous method asynchronous

The obvious solution is to make our native methods synchronous and wrap dispatch_async around them for the asynchronous versions. It turns out that’s step one to solving TMCache’s issues, but there’s more.

TMCache uses a serial queue to protect ivars and guarantee thread safety, which, according to Apple’s documentation on migrating away from threads to GCD, is a great idea. However, there’s one minor detail hidden in those docs: “…as long as you submit your tasks to a serial queue asynchronously, the queue can never deadlock.” Reading between the lines: if you want to avoid deadlocking, you can’t synchronously wait on a serial queue used as a resource lock. We confirmed this empirically by writing a unit test that deadlocks TMCache every time:

- (void)testDeadlocks
{
    NSString *key = @"key";
    NSUInteger objectCount = 1000;
    [self.cache setObject:[self image] forKey:key];
    dispatch_queue_t testQueue = dispatch_queue_create("test queue", DISPATCH_QUEUE_CONCURRENT);
    
    NSLock *enumCountLock = [[NSLock alloc] init];
    __block NSUInteger enumCount = 0;
    dispatch_group_t group = dispatch_group_create();
    for (NSUInteger idx = 0; idx < objectCount; idx++) {
        dispatch_group_async(group, testQueue, ^{
            [self.cache objectForKey:key];
            [enumCountLock lock];
            enumCount++;
            [enumCountLock unlock];
        });
    }
    
    dispatch_group_wait(group, [self timeout]);
    STAssertTrue(objectCount == enumCount, @"was not able to fetch 1000 objects, possibly due to deadlock.");
}
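Inverting the relationship looks roughly like this in Python (a loose sketch with hypothetical names, not Pinterest’s actual code; a one-worker pool plays the role of a serial dispatch queue):

```python
import threading
import concurrent.futures

class Cache:
    """Sketch of the inverted design: synchronous methods are the
    primitives, and asynchronous versions merely wrap them."""

    def __init__(self):
        self._lock = threading.Lock()
        self._store = {}
        # One worker, standing in for a serial dispatch queue.
        self._queue = concurrent.futures.ThreadPoolExecutor(max_workers=1)

    def set_object(self, key, value):
        # Native synchronous method: takes the lock directly and never
        # waits on the queue, so it cannot starve the queue's thread.
        with self._lock:
            self._store[key] = value

    def object_for_key(self, key):
        with self._lock:
            return self._store.get(key)

    def set_object_async(self, key, value, callback=None):
        # The asynchronous version is just "dispatch_async" around the
        # synchronous one.
        def work():
            self.set_object(key, value)
            if callback:
                callback(key, value)
        self._queue.submit(work)

cache = Cache()
done = threading.Event()
cache.set_object_async("key", "value", callback=lambda k, v: done.set())
done.wait(timeout=5)
print(cache.object_for_key("key"))  # value
```

Because the synchronous path never blocks on the queue, callers can mix synchronous and asynchronous access freely without exhausting threads.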

Back to semaphores

If we want synchronous methods available on the cache, we have to protect our ivars and guarantee thread safety with another mechanism: a lock. A standard lock would cost us some performance, but a dispatch_semaphore used as a lock has a slight advantage, as Apple’s documentation notes:

Dispatch semaphores call down to the kernel only when the calling thread needs to be blocked. If the calling semaphore does not need to block, no kernel call is made.

So how do you use a dispatch_semaphore as a lock? Easy:

dispatch_semaphore_t lockSemaphore = dispatch_semaphore_create(1);

//lock the lock:
dispatch_semaphore_wait(lockSemaphore, DISPATCH_TIME_FOREVER);

//do work inside the lock
...

//unlock the lock:
dispatch_semaphore_signal(lockSemaphore);

The difference between using a semaphore as a lock in this manner and the common pattern we mentioned at the beginning of this post is that we don’t need a separate thread to release the lock: the thread that waits is the same thread that signals.
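The same pattern is easy to demonstrate in Python, with threading.Semaphore standing in for dispatch_semaphore (the counter and thread counts here are made up for illustration):

```python
import threading

# A semaphore with an initial count of 1 behaves as a mutual-exclusion
# lock: "wait" (acquire) takes it, "signal" (release) returns it.
lock_semaphore = threading.Semaphore(1)
counter = 0

def increment(iterations):
    global counter
    for _ in range(iterations):
        lock_semaphore.acquire()   # dispatch_semaphore_wait
        counter += 1               # work inside the lock
        lock_semaphore.release()   # dispatch_semaphore_signal

threads = [threading.Thread(target=increment, args=(10000,))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 40000 -- no lost updates
```

Each thread releases the semaphore it acquired, so no other thread or queue has to run for the lock to be released, which is precisely what makes this safe to use from synchronous methods.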

Introducing PINCache

The decision to fork TMCache was made after a lengthy email conversation with the project’s maintainers on GitHub, who weren’t comfortable making a major architectural change in a short timeframe. Forking also allows existing users of TMCache to opt out of the significant changes we made. Here are the main differences between TMCache and PINCache:

  • PINCache is similar to TMCache in that it owns instances of both a memory cache and disk cache. It propagates calls to each, relying first on the fast memory cache and falling back to the disk cache.
  • PINMemoryCache has synchronous native methods, and the asynchronous versions wrap them. It uses a dispatch_semaphore as a lock to guarantee thread safety.
  • PINDiskCache also has synchronous native methods and asynchronous versions simply wrap them, too.
  • Instead of using a shared queue, PINDiskCache provides two methods (one asynchronous and one synchronous) to operate on files safely:

    lockFileAccessWhileExecutingBlock:(PINDiskCacheBlock)block;
    synchronouslyLockFileAccessWhileExecutingBlock:(PINDiskCacheBlock)block;

  • One other major difference is that multiple instances of PINDiskCache operate independently. This can increase performance, but it’s no longer safe to have two instances of PINDiskCache with the same name.
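As a rough illustration of those two file-access entry points (a Python sketch with transliterated names, not the actual Objective-C API), the synchronous method holds a file lock around the caller’s block, and the asynchronous one just dispatches the synchronous one:

```python
import threading
import concurrent.futures

class DiskCache:
    """Sketch of PINDiskCache-style block-based file locking."""

    def __init__(self):
        self._file_lock = threading.Lock()
        # One worker, standing in for the cache's private serial queue.
        self._queue = concurrent.futures.ThreadPoolExecutor(max_workers=1)

    def synchronously_lock_file_access_while_executing_block(self, block):
        # The caller's block runs with exclusive file access.
        with self._file_lock:
            block(self)

    def lock_file_access_while_executing_block(self, block):
        # Asynchronous flavor: wrap the synchronous method.
        self._queue.submit(
            self.synchronously_lock_file_access_while_executing_block, block)

results = []
cache = DiskCache()
cache.synchronously_lock_file_access_while_executing_block(
    lambda c: results.append("exclusive file work"))
cache.lock_file_access_while_executing_block(
    lambda c: results.append("async exclusive file work"))
cache._queue.shutdown(wait=True)  # let the async block finish
print(results)
```

Handing callers an explicit lock-holding block replaces the shared serial queue, which is what removes the synchronous-wait-on-a-queue hazard described earlier.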

Replacing TMCache with PINCache

If you have an app in production that uses TMCache and you want to switch to PINCache, there’s a bit of work involved. First, since sharedQueue is no longer available on PINDiskCache, you’ll want to use lockFileAccessWhileExecutingBlock: instead. Second, you’ll need to migrate all your users’ disk caches over to PINCache or clean them up. Just call a method like this somewhere before you initialize any PINDiskCache or PINCache instances:

//migrate TMCache to PINCache
- (void)migrateDiskCachesWithNames:(NSArray *)cacheNames
{
    NSString *rootPath = [NSSearchPathForDirectoriesInDomains(NSCachesDirectory, NSUserDomainMask, YES) firstObject];
    for (NSString *cacheName in cacheNames) {
        NSString *oldPathExtension = [NSString stringWithFormat:@"com.tumblr.TMDiskCache.%@", cacheName];
        NSURL *oldCachePath = [NSURL fileURLWithPathComponents:@[rootPath, oldPathExtension]];
        NSString *newPathExtension = [oldPathExtension stringByReplacingOccurrencesOfString:@"tumblr" withString:@"pinterest"];
        newPathExtension = [newPathExtension stringByReplacingOccurrencesOfString:@"TMDiskCache" withString:@"PINDiskCache"];
        NSURL *newCachePath = [NSURL fileURLWithPathComponents:@[rootPath, newPathExtension]];
        if (oldCachePath && [[NSFileManager defaultManager] fileExistsAtPath:[oldCachePath path]]) {
            NSError *error = nil;
            BOOL moved = [[NSFileManager defaultManager] moveItemAtURL:oldCachePath toURL:newCachePath error:&error];
            if (!moved) {
                //migration failed; clean up the old cache instead
                [[NSFileManager defaultManager] removeItemAtURL:oldCachePath error:nil];
            }
        }
    }
}

Contributing to PINCache

We use PINCache heavily and want to see it become the best caching library available on the platform. With that in mind, we welcome pull requests and bug reports! We promise to address them as quickly as possible. We can’t wait to see the awesome, performant, non-deadlocking apps you make with this!

Garrett Moon is an iOS engineer on the Mobile team.

For Pinterest engineering news and updates, follow our engineering Pinterest, Facebook and Twitter. Interested in joining the team? Check out our Careers site.

