An innovative approach for image-completion software
By Jonathan Damery, ECE ILLINOIS
June 23, 2014
- ECE graduate student Jia-Bin Huang and professor Narendra Ahuja collaborated with Microsoft Research to demonstrate an improved image-completion algorithm.
- To complete an image gap, their approach identifies planes and patterns within the image and then automatically scales content to match.
- Before the research is presented at the SIGGRAPH conference this August, the researchers plan to release the source code, allowing others to test the algorithm.
With image-editing software like Adobe Photoshop, adding a missing face into a group photo is rather simple. It requires a source image of the person in the correct pose, but otherwise, with a few clicks of the selection tool, the person can be copied, pasted, repositioned, and presto, the combination is done. No problem. Removals, however—perhaps omitting a photobomber from the same image—are much less straightforward. Content must be created; a gap must be filled.
Now researchers from ECE ILLINOIS and Microsoft have demonstrated an approach that streamlines this task—automatically filling the erased space with content based on patterns and planes within the image. While other photo-mending methods exist, this new technique demonstrates striking improvements.
Clockwise from upper left: Original showing the region to be removed in red, the Illinois and Microsoft results, two other methods below. (Original image by Flickr user micromegas.)
On his tablet, Huang displayed one of the example images included in the paper: a courtyard-like space with a prominent sculpture in the center. He explained that their approach first identifies vanishing points in the image—up to three—and based on those points, the algorithm determines the visual planes: in this case, the two walls that converge in the courtyard.
“Given each plane, you detect how things are repeated,” Huang said. “It’s very simple.”
The courtyard walls are five stories tall (at least that much is visible) with repeating patterns of windows and stones on the facade. When the researchers selected and deleted the sculpture in the foreground, along with some peripheral tree boughs, the algorithm kicked in, automatically continuing the pattern and rendering the windows for each floor at the correct scale and orientation.
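The repetition idea Huang describes can be illustrated with a toy sketch. This is not the team's actual code, which works on perspective-rectified planes derived from up to three vanishing points; the `fill_by_repetition` helper and the synthetic striped "facade" below are hypothetical simplifications. Once a repeating period is known along a plane, a missing pixel can be copied from one period away:

```python
import numpy as np

def fill_by_repetition(img, mask, period):
    """Fill masked pixels by copying from one pattern period away,
    preferring whichever direction lands on known content."""
    out = img.copy()
    h, w = img.shape[:2]
    for y, x in zip(*np.nonzero(mask)):
        for dx in (-period, period):
            sx = x + dx
            if 0 <= sx < w and not mask[y, sx]:
                out[y, x] = img[y, sx]
                break
    return out

# A synthetic "facade": vertical stripes repeating every 8 pixels.
img = np.tile(np.arange(8), (16, 4)).astype(float)   # shape (16, 32)
mask = np.zeros(img.shape, dtype=bool)
mask[4:12, 10:20] = True        # erase a block, like the sculpture
img_holed = img.copy()
img_holed[mask] = -1            # unknown content
filled = fill_by_repetition(img_holed, mask, period=8)
print(np.allclose(filled, img))  # True: the stripes are restored exactly
```

The real method must additionally detect the period itself and warp the copied content to match the plane's perspective, which is where the vanishing-point analysis comes in.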
On Huang’s website, viewers can click through more than 80 images and compare the originals to the team’s edited versions, as well as to the results from five other commercial or academic image-editing approaches. The other approaches, which have been considered state-of-the-art, use low-level techniques, meaning that the algorithms cannot account for content. Instead they extract samples from the surrounding image and meld them into the deleted space, without regard for accurate perspective.
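To give a rough sense of what "low-level" means here, patch-based methods fill each hole pixel by searching the known part of the image for the best-matching patch and copying from it, with no notion of planes or perspective. The greedy `patch_fill` sketch below is a hypothetical simplification of that family of techniques (real exemplar-based methods use careful fill ordering and blending), not any specific product's algorithm:

```python
import numpy as np

def patch_fill(img, mask, r=1):
    """Greedy, content-blind fill: for each hole pixel, copy the center of
    the best-matching fully known patch found elsewhere in the image."""
    out = img.astype(float).copy()
    known = ~mask
    h, w = img.shape
    # Candidate source patches that contain no hole pixels.
    sources = [(y, x) for y in range(r, h - r) for x in range(r, w - r)
               if known[y - r:y + r + 1, x - r:x + r + 1].all()]
    for y, x in zip(*np.nonzero(mask)):
        if not (r <= y < h - r and r <= x < w - r):
            continue  # skip border holes in this toy version
        target = out[y - r:y + r + 1, x - r:x + r + 1]
        valid = known[y - r:y + r + 1, x - r:x + r + 1]
        # Compare only the known pixels of the target patch to each source.
        costs = [((((out[sy - r:sy + r + 1, sx - r:sx + r + 1]
                     - target)[valid]) ** 2).sum(), sy, sx)
                 for sy, sx in sources]
        _, sy, sx = min(costs)
        out[y, x] = out[sy, sx]
    return out

# A striped test image: every column x has the constant value x.
img = np.tile(np.arange(10.0), (10, 1))
mask = np.zeros(img.shape, dtype=bool)
mask[5, 5] = True
img[5, 5] = 0.0                  # punch the hole
filled = patch_fill(img, mask)
print(filled[5, 5])              # 5.0 — recovered from a matching patch
```

Because the search considers only local pixel similarity, such methods have no way to rescale a window pattern as it recedes toward a vanishing point, which is why they smear converging structures like the courtyard corner.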
“I think the significant thing about this work, why this work was accepted is that ... it’s a step forward using the mid-level structure,” Huang said. “We take a higher perspective.”
For the courtyard image, the other approaches result in cockeyed windows, and the corner where the two walls converge is blurred in each case, as though the image had been printed in wet ink and then smudged. A casual glance is enough to realize something is amiss, but the image from Huang’s team requires much more scrutiny.
Their approach works particularly well for architectural images, where there are defined planes, like the walls of the courtyard. If the planes are less clear, as with landscape photos, the technique reverts to an existing image-completion algorithm.
Clockwise from upper left: Original showing the region to be removed in red, the Illinois and Microsoft results, two other methods. (Original image by Flickr user addictive_picasso.)
With any method, however, replacing large regions is difficult. One example shows a 14-story building, mid-demolition, with the exterior of the lower floors removed. Huang and his collaborators then selected and deleted those floors, causing the algorithm to render the windows downward, as though recompleting the building. This almost worked except that the corner of the building skewed to one side. This was included as a failed example in the paper, yet, compared to the other methods, the result is quite close.
“I like this because it’s very different from the input,” Huang said of the image. “Although you can see that there is some error, this error is much easier to fix.”
Before the SIGGRAPH conference in August, Huang and his collaborators plan to release the source code for their approach, allowing others to run images through the procedure and make potential modifications to the code. At that point, the team’s work could be incorporated into existing photo-editing software, both free and commercial.
One of the last images Huang displayed was of a lab partner, standing in front of the Taj Mahal with his wife. It was their honeymoon trip. In the background, sidewalks flanked the long reflecting pool leading to the iconic mausoleum. Those sidewalks were streaming with tourists. It looked busy—too congested.
“I told him to send me this picture. I can help you,” Huang said, and almost like magic—Huang clicked to the edited image—the other tourists were gone.
From top: Original showing the region to be removed in red, a GIF animation showing the Illinois and Microsoft image-completion process. (Original image by Flickr user agile_dore.)
Editor's note: media inquiries should be directed to Brad Petersen, Director of
Communications, at firstname.lastname@example.org or (217) 244-6376.