Fast and non-blocking Core Data back-end programming

October 23rd, 2012

About a month ago we completely rewrote an important part of the Blits app. Though users did not notice many new features, a lot has changed under the hood: overall app performance has improved, and the software is more robust and maintainable. To make this possible we relied heavily on the 'new' iOS 5 Core Data concurrency APIs. We'd like to share what we have learned.

Some facts about the back-end used by Blits:

Blits is an app that can display every single product from the website (at least 5 million items), including order information, images in different resolutions and promotions. It also has to manage the whole tree of product categories (currently 3955 categories) and keep it up to date. On top of that, it offers filters (refinements) that customers can use to filter out products they are not interested in. To speed things up, it keeps a cache that makes every retrieved XML response and image available offline for 24 hours.

When I refer to the back-end, I mean the back-end inside the iOS app (the data controllers, the parsers, the networking implementation, etc.), not the server-side web services. As you can see, a large and sophisticated back-end is necessary to run all these processes correctly and smoothly. But there is one more requirement to meet if you want your app to be successful: responsiveness.

Responsiveness is a recurring theme in all our apps. An app should serve its user and should therefore respond to every event the user triggers, no matter what. And because users do not care about a 'back-end' (and probably do not even know it is there), the back-end should be completely invisible to them and must not affect responsiveness.

Enough non-tech talk, let's dig into the details!

If you have ever built a back-end, you are probably familiar with some basic subjects: Core Data, XML parsing and multithreading. In this blog post I'll dig into how we combined Core Data with multithreading so that writing to a Core Data persistent store can be done in the background. If any of these subjects are unfamiliar to you, I encourage you to watch the WWDC session videos about them.

Current situation

The previous version of Blits had a back-end that was compatible with iOS 4.3 and later. The new back-end requires iOS 5 or later.

So why did we rewrite the whole back-end, and why did we drop support for iOS 4.3?

The back-end grew during the project, and because many features were added and changed along the way, its code became cluttered and was no longer modular. It also lacked some features, and it was designed in such a way that a product could be stored more than once in the cache, which could cause overhead and inconsistencies. In the end it was more complex than it needed to be.

Another thing that bothered us was the way we dealt with multiple NSManagedObjectContext objects. In Blits, XML parsing and Core Data storage are done on a background thread, while most of the fetching from Core Data happens on the main thread. This keeps the app responsive, because the heavy work never occupies the main thread.

To keep the two Core Data contexts in sync, a context must be saved after it has been modified. Such a save is persistent (it is written to disk) and therefore very slow. We also had a lot of trouble merging the two contexts using the mergeChangesFromContextDidSaveNotification: method. Sometimes a deadlock occurred when a context was saved from the wrong thread, and that is something you definitely want to avoid.

To get all these things straight and add the missing functionality, we based our new back-end on new Core Data functionality available since iOS 5, which I will explain in this post.

Interlude: Multithreading

I talk a lot about multithreading and background threads. These terms are closely related to responsiveness.

Every program starts on the 'main thread'. If you do not deal with multithreading, every part of your program runs on the main thread. This includes handling user events, updating the UI and all of the calculations and functions you perform. This is not something to worry about, as long as the amount of work you do is not too large.

If a task on the main thread takes too much time (for instance, more than 100 ms) while the user tries to drag something on the screen, he or she may notice a 'lag' in your app. This is because the app is busy with the task you assigned to it (a large calculation, for example) and has no time to perform the actual drag movement (usually the translation of an image or view).

That is the essence of multithreading: keep the main thread free for the user and move heavy work to a background thread (which, on a multi-core device, can even run in parallel on another core).
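In code, the usual way to get work off the main thread is Grand Central Dispatch. The sketch below is in modern Swift (the post itself predates Swift) and uses a hypothetical helper, `performInBackground`, which is not part of the Blits code: it runs a slow task on a global background queue and hands the result to a callback queue, the main queue by default, because UI updates must happen there.

```swift
import Dispatch

/// Hypothetical helper (not from the Blits code base): runs `work` on a
/// background queue and delivers its result on `callbackQueue`.
/// The default is the main queue, where UI updates have to happen.
func performInBackground<T>(_ work: @escaping () -> T,
                            callbackQueue: DispatchQueue = .main,
                            completion: @escaping (T) -> Void) {
    DispatchQueue.global(qos: .userInitiated).async {
        // The slow part (e.g. parsing a large XML response) runs here,
        // so the main thread stays free to handle touches and drags.
        let result = work()
        callbackQueue.async {
            // Back on the callback queue: safe to update the UI if it is .main.
            completion(result)
        }
    }
}
```

In the iOS 5-era Objective-C this post is about, the same pattern is `dispatch_async` onto a global queue followed by `dispatch_async(dispatch_get_main_queue(), block)`.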

End of interlude

Basics of multithreaded Core Data

The new Core Data functionality is based on the principle of thread confinement: each NSManagedObjectContext is tied to one and only one thread (or queue). Whenever you perform an operation on an NSManagedObjectContext (reading or writing), you have to make sure it is done on the correct thread. This is exactly the strategy we tried to implement in the previous back-end, but now it is much easier thanks to some newly provided methods.
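Thread confinement itself is not specific to Core Data. As an illustration only (this is not how Core Data is implemented, and `ConfinedStore` is a hypothetical type), here is an ordinary Swift object whose state is touched exclusively on its own serial dispatch queue. Its `perform` and `performAndWait` loosely mirror the asynchronous and synchronous flavours that NSManagedObjectContext offers.

```swift
import Dispatch

/// Hypothetical illustration of thread confinement (not Core Data code).
/// `items` is only ever read or written on `queue`, much like a
/// private-queue context only touches its objects on its own queue.
final class ConfinedStore {
    private let queue = DispatchQueue(label: "confined-store") // serial by default
    private var items: [String] = []

    /// Asynchronous: schedules the block on the queue and returns immediately.
    func perform(_ block: @escaping (inout [String]) -> Void) {
        queue.async { block(&self.items) }
    }

    /// Synchronous: returns only after the block has run on the queue.
    func performAndWait<T>(_ block: (inout [String]) -> T) -> T {
        return queue.sync { block(&items) }
    }
}
```

Because every access goes through the same serial queue, callers on any thread can use the object without locks; the price is that they must never touch `items` directly.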

Apple went even further by supporting a parent context with one or more child contexts. The parent context should be tied to the main thread and is the only context that writes directly to the persistent store on disk. Child contexts are tied only to their parent context, not directly to the persistent store. When a child context is saved, it pushes its changes to the parent context and no further. This happens in memory and is therefore very fast. When the data needs to be stored persistently, the parent context has to be saved (slow). In practice, this is only necessary when the app moves to the background or when an important operation (e.g. adding a product to the basket or favourites) is performed. This way, parsing and storing should no longer block the main thread! In Blits we apply this principle by parsing and storing in the child context and reading from the parent context.
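The two-level save can be observed directly. The following minimal sketch in modern Swift assumes a hypothetical one-entity model (`Product` with a `name` attribute, built in code) and uses an in-memory store instead of the .sqlite file so it is self-contained. A separate "probe" context attached straight to the coordinator shows what has actually reached the store: nothing after the child save, everything after the parent save.

```swift
import CoreData

/// Hypothetical one-entity model: "Product" with a string "name".
func makeModel() -> NSManagedObjectModel {
    let name = NSAttributeDescription()
    name.name = "name"
    name.attributeType = .stringAttributeType
    let product = NSEntityDescription()
    product.name = "Product"
    product.properties = [name]
    let model = NSManagedObjectModel()
    model.entities = [product]
    return model
}

/// Counts the Products that have reached the store, via a context attached to
/// the coordinator directly (so it cannot see unsaved changes in the parent).
func storeCount(_ coordinator: NSPersistentStoreCoordinator) -> Int {
    let probe = NSManagedObjectContext(concurrencyType: .privateQueueConcurrencyType)
    probe.persistentStoreCoordinator = coordinator
    var count = -1
    probe.performAndWait {
        let request = NSFetchRequest<NSFetchRequestResult>(entityName: "Product")
        count = (try? probe.count(for: request)) ?? -1
    }
    return count
}

let coordinator = NSPersistentStoreCoordinator(managedObjectModel: makeModel())
_ = try! coordinator.addPersistentStore(ofType: NSInMemoryStoreType,
                                        configurationName: nil, at: nil, options: nil)

// Parent: main queue, owns the coordinator. Child: private queue, knows only its parent.
let parent = NSManagedObjectContext(concurrencyType: .mainQueueConcurrencyType)
parent.persistentStoreCoordinator = coordinator
let child = NSManagedObjectContext(concurrencyType: .privateQueueConcurrencyType)
child.parent = parent

child.performAndWait {
    for i in 1...3 {
        let product = NSEntityDescription.insertNewObject(forEntityName: "Product", into: child)
        product.setValue("Product \(i)", forKey: "name")
    }
    // Fast: pushes the inserts into `parent` in memory, not into the store.
    do { try child.save() } catch { print("child save failed: \(error)") }
}
let afterChildSave = storeCount(coordinator)

var pendingInParent = 0
parent.performAndWait {
    pendingInParent = parent.insertedObjects.count
    // With a real SQLite store, this is the save that actually hits the disk.
    do { try parent.save() } catch { print("parent save failed: \(error)") }
}
let afterParentSave = storeCount(coordinator)
print(afterChildSave, pendingInParent, afterParentSave)
```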

A recap of the basics to remember when using multiple Core Data contexts in iOS 5:

  • You have one persistent store, and thus one .sqlite file, used by Core Data. (Strictly speaking, it is possible to have multiple persistent stores, but this is discouraged.)
  • You have to create a parent NSManagedObjectContext which is tied to the persistent store coordinator and runs on the main thread.
  • You can create a child NSManagedObjectContext which runs on a separate background thread and is connected to the parent context.
  • Saving a child NSManagedObjectContext pushes its changes to the parent context in memory (and is therefore fast).
  • You can perform 'blocks' on every context, which are then scheduled for processing.

The last point is new: to conform to thread confinement, you should always use the performBlock: and performBlockAndWait: methods of NSManagedObjectContext whenever you cannot guarantee that you are on the context's own thread (or simply want to be sure). You pass in a block that does the (heavy) work with the desired managed object context, and Core Data automatically schedules the block for execution on the correct thread. Depending on which method you call, the operation is synchronous (performBlockAndWait:) or asynchronous (performBlock:). If you want to know more about these methods, I do not recommend the class documentation, which is not very descriptive; check out the Core Data Release Notes for iOS 5.0 instead. Now that you understand the concept, it is time for some short and simple guidelines for setting up Core Data this new way.
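To make the synchronous/asynchronous distinction concrete, here is a minimal modern-Swift sketch, in which `perform(_:)` and `performAndWait(_:)` are the Swift names of `performBlock:` and `performBlockAndWait:`. It assumes a hypothetical `Product` model, an in-memory store, and a private-queue context attached directly to the coordinator. The `DispatchGroup` only exists so the demo can wait for the asynchronous block; blocking the main thread like this would defeat the purpose in a real app.

```swift
import CoreData
import Dispatch

/// Hypothetical one-entity model: "Product" with a string "name".
func makeModel() -> NSManagedObjectModel {
    let name = NSAttributeDescription()
    name.name = "name"
    name.attributeType = .stringAttributeType
    let product = NSEntityDescription()
    product.name = "Product"
    product.properties = [name]
    let model = NSManagedObjectModel()
    model.entities = [product]
    return model
}

let coordinator = NSPersistentStoreCoordinator(managedObjectModel: makeModel())
_ = try! coordinator.addPersistentStore(ofType: NSInMemoryStoreType,
                                        configurationName: nil, at: nil, options: nil)
let context = NSManagedObjectContext(concurrencyType: .privateQueueConcurrencyType)
context.persistentStoreCoordinator = coordinator

// Asynchronous (performBlock:): schedules the block on the context's queue and
// returns immediately; the caller needs some signal to know when it finished.
let group = DispatchGroup()
group.enter()
context.perform {
    let product = NSEntityDescription.insertNewObject(forEntityName: "Product", into: context)
    product.setValue("Imported in the background", forKey: "name")
    do { try context.save() } catch { print("save failed: \(error)") }
    group.leave()
}
// ...the calling thread is free to do other work here...
group.wait() // demo only: never block the main thread like this in an app

// Synchronous (performBlockAndWait:): the block has finished when this returns,
// so its result can be used on the very next line.
var storedCount = 0
context.performAndWait {
    let request = NSFetchRequest<NSFetchRequestResult>(entityName: "Product")
    storedCount = (try? context.count(for: request)) ?? 0
}
print("Products in store: \(storedCount)")
```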

Creating the parent context

  • Just create your NSManagedObjectModel and NSPersistentStoreCoordinator as before
  • Initialize the parent NSManagedObjectContext with initWithConcurrencyType:, passing the concurrency type NSMainQueueConcurrencyType
  • Set the persistentStoreCoordinator of the newly created context
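The three steps for the parent context could look like this in modern Swift, where `NSManagedObjectContext(concurrencyType: .mainQueueConcurrencyType)` corresponds to `initWithConcurrencyType:NSMainQueueConcurrencyType`. This is a sketch under assumptions: the model is a hypothetical one built in code (an app would normally load its compiled model, e.g. with `NSManagedObjectModel.mergedModel(from: nil)`), and `makeParentContext(storeURL:)` is an illustrative name.

```swift
import CoreData
import Foundation

/// Hypothetical one-entity model: "Product" with a string "name".
func makeModel() -> NSManagedObjectModel {
    let name = NSAttributeDescription()
    name.name = "name"
    name.attributeType = .stringAttributeType
    let product = NSEntityDescription()
    product.name = "Product"
    product.properties = [name]
    let model = NSManagedObjectModel()
    model.entities = [product]
    return model
}

/// Model -> coordinator with a single SQLite store -> parent context on the main queue.
func makeParentContext(storeURL: URL) throws -> NSManagedObjectContext {
    // Step 1: model and coordinator "as before", with one .sqlite store.
    let coordinator = NSPersistentStoreCoordinator(managedObjectModel: makeModel())
    _ = try coordinator.addPersistentStore(ofType: NSSQLiteStoreType,
                                           configurationName: nil,
                                           at: storeURL,
                                           options: nil)
    // Step 2: the parent context is confined to the main queue.
    let parent = NSManagedObjectContext(concurrencyType: .mainQueueConcurrencyType)
    // Step 3: only the parent talks to the coordinator (and thus to disk).
    parent.persistentStoreCoordinator = coordinator
    return parent
}

// A throwaway store file in the temporary directory, for this demo only.
let storeURL = FileManager.default.temporaryDirectory
    .appendingPathComponent("parent-context-sketch-\(UUID().uuidString).sqlite")
let parent = try! makeParentContext(storeURL: storeURL)
```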

Creating the child context

  • Create a new NSManagedObjectContext with concurrency type NSPrivateQueueConcurrencyType
  • Set its parent context (setParentContext:) to the NSManagedObjectContext created above.

Now you can start executing blocks on both contexts using performBlock: or performBlockAndWait:. The first returns immediately (asynchronous); the second waits until the block has been performed (synchronous).

Things to remember:

  • After creating or modifying an NSManagedObject, save its managedObjectContext to keep everything in sync!
  • Objects created in one managedObjectContext must not be used in another managedObjectContext; doing so can trigger an exception (for example when you relate objects from different contexts). Always convert them to the managedObjectContext you want to use them in, for example by looking them up there via their objectID.

Check out the github link below for some sample code.
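The child-context steps and the rule about converting objects between contexts can be sketched in modern Swift as follows (`child.parent = parent` is the Swift spelling of `setParentContext:`). The model is hypothetical and the store is in memory, so the sketch is self-contained. Instead of handing the managed object itself to the parent, it passes the object's `NSManagedObjectID` and looks the object up again with `existingObject(with:)`. Obtaining a permanent ID before saving is a common precaution, because freshly inserted objects only have temporary IDs.

```swift
import CoreData

/// Hypothetical one-entity model: "Product" with a string "name".
func makeModel() -> NSManagedObjectModel {
    let name = NSAttributeDescription()
    name.name = "name"
    name.attributeType = .stringAttributeType
    let product = NSEntityDescription()
    product.name = "Product"
    product.properties = [name]
    let model = NSManagedObjectModel()
    model.entities = [product]
    return model
}

/// Creates a background child context whose saves go to `parent` (in memory).
func makeChildContext(parent: NSManagedObjectContext) -> NSManagedObjectContext {
    let child = NSManagedObjectContext(concurrencyType: .privateQueueConcurrencyType)
    child.parent = parent // setParentContext: in Objective-C
    return child
}

// Parent stack with an in-memory store so the sketch leaves no files behind.
let coordinator = NSPersistentStoreCoordinator(managedObjectModel: makeModel())
_ = try! coordinator.addPersistentStore(ofType: NSInMemoryStoreType,
                                        configurationName: nil, at: nil, options: nil)
let parent = NSManagedObjectContext(concurrencyType: .mainQueueConcurrencyType)
parent.persistentStoreCoordinator = coordinator
let child = makeChildContext(parent: parent)

// Insert on the child's private queue, then save to push the change up.
var productID: NSManagedObjectID?
child.performAndWait {
    let product = NSEntityDescription.insertNewObject(forEntityName: "Product", into: child)
    product.setValue("Example product", forKey: "name")
    do {
        try child.obtainPermanentIDs(for: [product]) // replace the temporary ID
        try child.save()
        productID = product.objectID // an ID is safe to pass between contexts
    } catch {
        print("child save failed: \(error)")
    }
}

// Never use the child's object here: look it up in the parent by its ID.
var nameSeenByParent: String?
parent.performAndWait {
    if let id = productID, let product = try? parent.existingObject(with: id) {
        nameSeenByParent = product.value(forKey: "name") as? String
    }
}
```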

I can imagine this sounds a bit complicated. In fact, if you try it yourself, it should feel comfortable within a few minutes. Apple has a good Core Data video from the last WWDC (2012) sessions that describes the whole parent/child structure, so if you are still confused I recommend watching it.

Check out this example project, which contains the described setup and a sample test. It also provides some helper functions.

If you still have questions, have found a bug or have suggestions, please contact me @rvandijke :)