Generating Object Proposals

22 Dec

This is a follow-up post to A Seismic Shift in Object Detection. In the earlier post I discussed the resurgence of segmentation for object detection; in this post I go into more technical detail about the algorithms for generating the segments and object proposals. If you haven’t yet, you should read my previous post first. 🙂

First, a brief historical overview. In classic segmentation the goal was to assign every pixel in an image to one of K labels such that distinct objects receive unique labels. So ideally an algorithm would generate separate segments for your cat and your couch and pass these on to the next stage of processing. Unfortunately, this is a notoriously difficult problem, and arguably, it’s simply the wrong problem formulation. Classic segmentation is also unforgiving: mistakes are irreversible and incorrect segments harm subsequent processing. If your kitty had its head chopped off or was permanently merged with the couch, well, tough luck. Thus classic segmentation rarely serves as a pre-processing step for detection and workarounds have been developed (e.g. sliding windows).


A major innovation in how we think about segmentation occurred around 2005. In their work on estimating geometric context, Derek Hoiem et al. proposed using multiple overlapping segmentations; this was explored further by Bryan Russell and Tomasz Malisiewicz and their collaborators (see here and here). The idea is as simple as it sounds: generate multiple candidate segmentations, and while your kitty may be disfigured in many, hopefully at least one of the segmentations will contain her whole and unharmed. This was a leap in thinking because it shifts the focus to generating a diversity of segmentations as opposed to a single and perfect (but unachievable) segmentation.

The latest important leap, and the focus of this post, was first made in 2010 concurrently by three groups (Objectness, CPMC, Object Proposals). The key observation was this: since the unit of interest for subsequent processing in object detection and related tasks is a single object segment, why exhaustively (and uniquely) label every pixel in an image? Instead why not directly generate object proposals (either segments or bounding boxes) without attempting to provide complete image segmentations? Doing so is both easier (no need to label every pixel) and more forgiving (multiple chances to generate good proposals). Sometimes a slight problem reformulation makes all the difference!

Below I go over five of the arguably most important papers on generating object proposals (a list of additional papers can be found at the end of this post). Keep reading for details or skip to the discussion below.

Objectness: One of the earliest papers on generating object proposals. The authors sample and rank 100,000 windows per image according to their likelihood of containing an object. This ‘objectness’ score is based on multiple cues derived from saliency, edges, superpixels, color and location. The proposals tend to fit objects fairly loosely, but the first few hundred are of high quality (see Fig. 6 in this paper). The algorithm is fast (a few seconds per image) and outputs boxes.
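As a rough illustration of the sample-then-rank pattern described above, here is a minimal Python sketch. It is not the Objectness implementation: the uniform window sampling, the helper names `sample_windows` and `rank_windows`, and the `score_fn` argument are all assumptions made for illustration, and the real objectness score combines the cues listed above rather than the toy area score used in the usage note.

```python
import random
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) with x1 < x2 and y1 < y2


def sample_windows(width: int, height: int, n: int, rng: random.Random) -> List[Box]:
    """Draw n random windows inside a width x height image.

    Uniform sampling is an assumption of this sketch, not the paper's scheme.
    """
    windows = []
    for _ in range(n):
        # Sampling two distinct coordinates guarantees a non-empty window.
        x1, x2 = sorted(rng.sample(range(width + 1), 2))
        y1, y2 = sorted(rng.sample(range(height + 1), 2))
        windows.append((x1, y1, x2, y2))
    return windows


def rank_windows(windows: List[Box], score_fn: Callable[[Box], float], top_k: int) -> List[Box]:
    """Return the top_k windows, highest score first.

    score_fn is a hypothetical stand-in for a learned objectness score.
    """
    return sorted(windows, key=score_fn, reverse=True)[:top_k]
```

For example, `rank_windows(sample_windows(640, 480, 100000, random.Random(0)), score_fn, 300)` would keep the 300 best windows under whatever `score_fn` is supplied.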

CPMC: Published concurrently with objectness, the idea is to use graph cuts with different random seeds and parameters to obtain multiple binary foreground / background segmentations. Each generated foreground mask serves as an object proposal, and the proposals are ranked according to a learned scoring function. The algorithm is slow (~8 min / image) as it relies on the gPb edge detector, but it generates high quality segmentation masks (see also this companion paper).

Object Proposals: Similarly to CPMC (and published just a few months after), the authors generate multiple foreground / background segmentations and use these as object proposals. Same strengths and weaknesses as CPMC: high quality segmentation masks but long computation time due in part to reliance on gPb edges.

Selective Search: As discussed in my previous post, arguably the top three methods for object detection as of ICCV13 all used selective search in their detection pipelines. The key to the success of selective search is its speed (~8 seconds / image) and high recall (97% of objects detected given 10,000 candidates per image). Selective search is based on computing multiple hierarchical segmentations using superpixels from Felzenszwalb and Huttenlocher (F&H) computed on different color spaces. Object proposals are the various segments in the hierarchies or bounding boxes surrounding them.
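To make the "segments in the hierarchy become proposals" idea concrete, here is a minimal Python sketch of greedy hierarchical grouping. It is a simplification under stated assumptions, not the Selective Search implementation: each input region stands in for an F&H superpixel and is described only by a bounding box, a scalar mean color and a pixel count, and the similarity used here (absolute difference of mean color) is a hypothetical stand-in for the method's actual similarity measures.

```python
from typing import Dict, List, Set, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2)


def union_box(a: Box, b: Box) -> Box:
    """Smallest box enclosing both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def hierarchy_proposals(
    boxes: Dict[int, Box],
    colors: Dict[int, float],
    sizes: Dict[int, int],
    adjacency: Set[Tuple[int, int]],
) -> List[Box]:
    """Greedily merge the most similar adjacent regions.

    Every region that ever exists in the hierarchy contributes its box.
    """
    boxes, colors, sizes = dict(boxes), dict(colors), dict(sizes)
    neighbors: Dict[int, Set[int]] = {r: set() for r in boxes}
    for a, b in adjacency:
        neighbors[a].add(b)
        neighbors[b].add(a)

    proposals = list(boxes.values())  # the initial regions are proposals too
    next_id = max(boxes) + 1
    while True:
        # Toy dissimilarity (an assumption): absolute difference of mean color.
        pairs = [(abs(colors[a] - colors[b]), a, b)
                 for a in neighbors for b in neighbors[a] if a < b]
        if not pairs:
            break
        _, a, b = min(pairs)
        new, next_id = next_id, next_id + 1
        boxes[new] = union_box(boxes[a], boxes[b])
        sizes[new] = sizes[a] + sizes[b]
        colors[new] = (colors[a] * sizes[a] + colors[b] * sizes[b]) / sizes[new]
        # The merged region inherits the neighbors of both parts.
        neighbors[new] = (neighbors[a] | neighbors[b]) - {a, b}
        for n in neighbors[new]:
            neighbors[n] -= {a, b}
            neighbors[n].add(new)
        for old in (a, b):
            del neighbors[old], boxes[old], colors[old], sizes[old]
        proposals.append(boxes[new])
    return proposals
```

Running such a grouping several times with different inputs (for example, different color spaces, as described above) and pooling the resulting boxes is what yields a diverse proposal set.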

Randomized Prim’s (RP): A simple and fast approach to generating high quality proposal boxes (again based on F&H superpixels). The authors propose a randomized greedy algorithm for computing sets of superpixels that are likely to occur together. The quality of proposals is high and object coverage is good given a large number of proposals (1,000 – 10,000). RP is the fastest of the batch (<1 second per image) and is a promising pre-processing step for detection.
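The general flavor of a randomized greedy growth over a superpixel graph can be sketched as follows. This is a rough illustration, not the authors' algorithm: the edge `weights` (meant to express how likely two neighboring superpixels are to belong together), the fixed `stop_prob` stopping rule, and the function name `sample_proposal` are all assumptions of this sketch.

```python
import random
from typing import Dict, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2)


def sample_proposal(
    boxes: Dict[int, Box],
    weights: Dict[Tuple[int, int], float],
    rng: random.Random,
    stop_prob: float = 0.3,
) -> Box:
    """Grow a random tree of superpixels and return its bounding box.

    boxes maps superpixel id to its box; weights maps an edge (a, b) between
    neighboring superpixels to a positive weight (hypothetical values).
    """
    tree = {rng.choice(sorted(boxes))}  # random seed superpixel
    while True:
        # Edges with exactly one endpoint inside the current tree.
        frontier = [(e, w) for e, w in sorted(weights.items())
                    if (e[0] in tree) != (e[1] in tree)]
        # Stop when nothing can be added, or at random (assumed rule).
        if not frontier or rng.random() < stop_prob:
            break
        edges, ws = zip(*frontier)
        # Pick a frontier edge with probability proportional to its weight.
        a, b = rng.choices(edges, weights=ws, k=1)[0]
        tree.update((a, b))
    members = [boxes[n] for n in tree]
    return (min(m[0] for m in members), min(m[1] for m in members),
            max(m[2] for m in members), max(m[3] for m in members))
```

Calling `sample_proposal` repeatedly with the same graph produces a set of different boxes, since both the seed superpixel and the growth path are random.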

So what’s the best approach? It depends on the constraints and application. If speed is critical, the only candidates are Objectness, Selective Search or RP. For high quality segments, CPMC or Object Proposals are best; the bounding boxes returned by RP also appear promising. For object detection, recall is critical and thus generating thousands of candidates with Selective Search or RP is likely the best bet. For domains where more aggressive pruning of windows is necessary, for example weakly supervised or unsupervised learning, Objectness or CPMC are the most promising candidates. Overall, which object proposal method is best suited depends on the target application and computational constraints.
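Since recall drives much of this comparison, here is a minimal Python sketch of how proposal recall is commonly measured for boxes. The matching criterion is an assumption: an object counts as detected if some proposal overlaps it with intersection over union (IoU) of at least 0.5, which is a widely used threshold but not necessarily the one behind the numbers quoted above.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recall(ground_truth: List[Box], proposals: List[Box], thresh: float = 0.5) -> float:
    """Fraction of ground-truth boxes covered by at least one proposal."""
    if not ground_truth:
        return 0.0
    hits = sum(1 for g in ground_truth
               if any(iou(g, p) >= thresh for p in proposals))
    return hits / len(ground_truth)
```

Evaluating `recall` on the first N proposals for increasing N gives the recall-versus-number-of-candidates trade-off that separates the methods above.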

One interesting observation is that all five algorithms described above utilize either gPb or F&H for input edges or superpixels. The quality and speed of the input edge detectors help determine the speed of the resulting proposals and their adherence to object boundaries. gPb is accurate but slow (multiple minutes per image) while F&H is fairly fast but of lower quality. So, this seems to be a perfect spot to drop a shameless plug for our own work: our recent edge detector presented at ICCV runs in real time (30 fps) and achieves edges of similar quality to gPb (and even somewhat higher). Read more here. 🙂


I expect that after its recent successes object proposal generation will continue to receive strong interest from the community. I hope to see the development of approaches that are faster, have higher recall with fewer proposals, and better adhere to object boundaries. Downstream it will be interesting to see more algorithms take advantage of the segmentation masks associated with the proposals (currently most but not all detection approaches discard the segmentation masks). And of course I have to wonder, will we experience yet another paradigm shift in segmentation? Let’s see where we end up…

Below is a list of additional papers on generating object proposals. If I missed anything relevant please email me or leave a comment and I’ll add a link!


4 Responses to “Generating Object Proposals”

  1. Tomasz Malisiewicz December 22, 2013 at 10:18 pm #

    Hi Piotr,
    Thanks again for the insightful blog post. A part of me still earnestly believes in segmentation-driven object detection, but I have been very disappointed with the times it takes to run typical image segmentation algorithms. The main reason I am excited about your ICCV 2013, “Structured Forests for Fast Edge Detection” paper is the claimed 30fps. If you could do a little bit of extra work to get a pool of candidate segments, I would love to see a fast object proposal algorithm based on your edge detection approach. (Graduate students reading this blog post, you have just been given a paper idea)

    Generating an edge map at 30fps might not be enough. If the proposal generation algorithm is expensive or if the per-segment feature computation is slow, the entire object detection pipeline will still be slow. Do you have any ideas on how the underlying computations in your ICCV 2013 paper could be used as features for an object detector? In other words, maybe the work you are doing to get the edges could also benefit the learning algorithm used in detection.

    Keep the posts a comin’.

    –Tomasz Malisiewicz

  2. bbabenko (@bbabenko) December 26, 2013 at 12:41 am #

    The practical side of me is pretty surprised that segmentation is turning out to be useful, but the geekier side of me is excited to see all these different things come together 🙂

    @Tomasz: the features Piotr et al use for the edge detector are pretty much the same as ones they use for pedestrian detection, with one important difference: for object detection one needs to compute features at many scales, whereas for edge detection it’s usually one or maybe a couple scales. Even with approximation tricks, you’ll end up needing additional feature computation on top of what you need for edge detection alone. Still though, Piotr has gotten feature computation to be blazing fast, and he didn’t even tap into GPUs… I don’t have a good sense of how big of a bottleneck the proposal generation algorithms are, but if they can be made fast, I think this whole pipeline would be pretty efficient.

  3. Jimei Yang March 6, 2014 at 9:17 pm #

    Hi Piotr,

    Thanks for this nice intro to object proposals. My paper is also related to this topic, but for generating class-specific segmentation proposals using exemplars. J. Yang, Y.-H. Tsai and M.-H. Yang, “Exemplar Cut”, ICCV13. You may find this one fit into your list 🙂



  1. Evaluating Object Proposals | pdollar - November 18, 2014

    […] popular pre-processing step for object detection. Some time ago I surveyed the state-of-the-art in generating object proposals, and the field has seen a lot of activity since that time! In fact, Larry Zitnick and I proposed […]
