Switching GPUImage to use cached framebuffers
I recently pushed a significant set of changes to the GPUImage repository, and wanted to explain them in a little more detail, especially since they change one part of the manual photo filtering process. These changes should dramatically reduce the overall memory usage of the framework, help prevent memory-related crashes, and fix a number of subtle bugs that have plagued the framework since I started it.
The problem with per-filter framebuffers
From the beginning, GPUImage was built around the concept of discrete filter operations, each with its own self-contained shaders and framebuffers to which those shaders rendered their output. This was fast, and it made sense from a design standpoint.
Unfortunately, this had a significant drawback in that every filter had to be backed by an individual image (sometimes more than one, in the case of multi-pass filters). These images (textures) were all uncompressed bitmaps at four bytes per pixel, which for a single 1080p video frame works out to 1920 × 1080 × 4 bytes, or roughly 8.3 MB, by itself. If you started chaining filters, or using more complex multi-pass ones, you could quickly see your memory usage balloon.
This was a particular problem with photos captured by the cameras on newer devices, which grew so large that passing a single photo down the capture pipeline was often enough to cause a memory spike and outright termination of an application (or worse, a hard device reboot). I tried to combat this with aggressive deallocation of textures as images propagated through the filter pipeline, but that didn't work as well as I had hoped.
I finally decided to switch to an idea I've had for a while: a framework-wide cache of framebuffers, where filters pull these down only as needed, then recycle them back into the cache when done. It took a while to implement and resulted in a complete restructuring of the framework's underlying memory model. It probably has some problems, but it was stable enough that I decided to push it to the mainline codebase for GPUImage.
The way this works is that when a filter wants to draw a frame, it asks the cache for a framebuffer with the size and texture characteristics it wants. If one doesn't exist in the cache, a new one is created and returned. If one does exist, it is pulled out of the cache and handed to the filter.
The filter runs its processing operation and then hands its framebuffer to the next step in the filter chain. That subsequent filter does the same thing, requesting a framebuffer for its own output, and the instant it no longer needs the framebuffer from the previous step, it releases it. I use a crude reference counting system to track when framebuffers are no longer needed, and once a framebuffer's count drops to zero, it is pulled back into the framebuffer cache for reuse.
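The bookkeeping above can be modeled in a few lines. This is a toy sketch (written in Python for brevity, not the framework's actual Objective-C), and all the names in it are illustrative rather than GPUImage's real identifiers:

```python
from collections import defaultdict

class Framebuffer:
    def __init__(self, cache, size):
        self.cache = cache
        self.size = size          # e.g. (width, height)
        self.refcount = 0

    def lock(self):
        # A filter takes a reference for as long as it reads this buffer.
        self.refcount += 1

    def unlock(self):
        # The instant the last reference is dropped, the buffer is
        # recycled back into the cache instead of being deallocated.
        self.refcount -= 1
        if self.refcount == 0:
            self.cache.recycle(self)

class FramebufferCache:
    def __init__(self):
        self.idle = defaultdict(list)   # size -> unused framebuffers

    def fetch(self, size):
        # Reuse an idle framebuffer with matching characteristics if
        # one exists; otherwise allocate a new one.
        if self.idle[size]:
            fb = self.idle[size].pop()
        else:
            fb = Framebuffer(self, size)
        fb.lock()
        return fb

    def recycle(self, fb):
        self.idle[fb.size].append(fb)

    def purge(self):
        # What a memory warning triggers: drop every idle buffer.
        self.idle.clear()
```

In this model, a two-filter chain fetches a buffer, renders into it, and hands it downstream; the second filter fetches its own output buffer, renders, and unlocks the first one, which drops straight back into the cache for the next frame.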
What this means is that a 10-step filter chain now takes no more memory at any given moment than a 1-step filter chain did before. In fact, I was also able to make the YUV conversion from the camera a recyclable framebuffer as well, further reducing memory usage. This is most noticeable for photo capture, where the SimplePhotoFilter example goes from peaking at 72 MB of memory usage on an iPhone 4S (as recorded by the Memory Monitor instrument) to 41 MB, a 43% reduction. It no longer triggers memory warnings on that device, and avoids many of the bizarre resource exhaustion behaviors people had observed before.
The cache itself will have more intelligent memory management in the future, but for now cached framebuffers build up inside the cache until the application receives a memory warning, at which point the cache is purged. If you want to empty the cache manually at any point, you can call the -purgeAllUnassignedFramebuffers method on [GPUImageContext sharedFramebufferCache].
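In code, the manual purge is a single call, for example in response to your own memory pressure heuristics:

```objectivec
// Drop all idle framebuffers from the shared framework-wide cache.
[[GPUImageContext sharedFramebufferCache] purgeAllUnassignedFramebuffers];
```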
Changes to image capture
This does add one slight wrinkle to the interface, though, and I've changed some method names to make this clear to anyone updating their code. Because framebuffers are now transient, if you want to capture an image from one of them, you have to tag it before processing. You do this by using the -useNextFrameForImageCapture method on the filter to indicate that the next time an image is passed down the filter chain, you're going to want to hold on to that framebuffer for a little longer to grab an image out of it. -imageByFilteringImage: automatically does this for you now, and I've added another convenience method in -processImageUpToFilter:withCompletionHandler: to do this in an asynchronous manner.
With these changes, all framebuffers are now backed by texture caches where possible, so there is no need for -prepareForImageCapture anymore. This method is gone. I've also renamed -imageFromCurrentlyProcessedOutput to -imageFromCurrentFramebuffer, again to make people pay attention to this change in image capture.
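As an illustration, capturing a filtered UIImage from a still photo now looks something like the following. The GPUImagePicture setup and the sepia filter here are just one example configuration; the important part is tagging the filter before processing:

```objectivec
GPUImagePicture *stillImageSource = [[GPUImagePicture alloc] initWithImage:inputImage];
GPUImageSepiaFilter *filter = [[GPUImageSepiaFilter alloc] init];
[stillImageSource addTarget:filter];

// Tag the filter BEFORE processing, so its framebuffer is held long
// enough for image capture instead of being recycled into the cache.
[filter useNextFrameForImageCapture];
[stillImageSource processImage];

UIImage *filteredImage = [filter imageFromCurrentFramebuffer];
```

Forgetting the -useNextFrameForImageCapture call is the most likely mistake when updating existing code, since the framebuffer you want to read from may already have been returned to the cache by the time you ask for an image.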
I'm still trying to see if I can more intelligently handle some of this, but for now this is what I've set up.
I'm hoping that this will provide a long-term solution to the memory issues that have bothered me since I created the framework, and that it will fix a number of bizarre bugs people have run into, such as failures in framebuffer creation and random crashes. It may break a few things in the near term, so watch out for that. I'll be fixing what I find as I stress-test this in situations beyond the framework's sample applications.