Results 1 - 10 of 136
Video Snapcut: Robust Video Object Cutout Using Localized Classifiers
- ACM Trans. Graphics
"... Figure 1: We propose a video object cutout system that performs consistently well on a variety of examples which are difficult for previous approaches. See Section 6 and the accompanying video for more comparisons and results. Original videos in (d) and (f) courtesy of Artbeats. Although tremendous ..."
Cited by 126 (6 self)
Abstract:
Figure 1 (caption): We propose a video object cutout system that performs consistently well on a variety of examples which are difficult for previous approaches. See Section 6 and the accompanying video for more comparisons and results. Original videos in (d) and (f) courtesy of Artbeats.
Although tremendous success has been achieved for interactive object cutout in still images, accurately extracting dynamic objects in video remains a very challenging problem. Previous video cutout systems have two major limitations: (1) they rely on global statistics and thus cannot deal with complex and diverse scenes; and (2) they treat segmentation as a global optimization and thus lack a practical workflow that can guarantee convergence to the desired results. We present Video SnapCut, a robust video object cutout system that significantly advances the state of the art. In our system, segmentation is achieved by the collaboration of a set of local classifiers, each adaptively integrating multiple local image features. We show how this segmentation paradigm naturally supports local user edits and propagates them across time. The cutout system is completed with a novel coherent video matting technique. A comprehensive evaluation and comparison demonstrates the effectiveness of the proposed system at achieving high-quality results, as well as its robustness against various types of inputs.
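The abstract's central idea, a band of overlapping local classifiers whose votes are blended into one foreground probability map, can be illustrated with a minimal sketch. The code below is not the authors' implementation: it uses only a per-window Gaussian color model (SnapCut additionally integrates shape and motion features with adaptive weights), and all function names are hypothetical.

```python
# Minimal sketch of localized classification for video cutout.
# Assumption: color-only local models; names are hypothetical.
import numpy as np

def local_color_prob(pixels, fg_mask):
    """Fit per-channel Gaussians to fg/bg samples inside one window
    and return each pixel's foreground probability."""
    def logpdf(x, samples):
        mu, var = samples.mean(0), samples.var(0) + 1e-6
        return (-0.5 * (x - mu) ** 2 / var - 0.5 * np.log(2 * np.pi * var)).sum(-1)
    lf = logpdf(pixels, pixels[fg_mask])
    lb = logpdf(pixels, pixels[~fg_mask])
    return 1.0 / (1.0 + np.exp(np.clip(lb - lf, -50, 50)))

def localized_cutout(image, prev_mask, centers, radius=15):
    """Blend overlapping window classifiers into one probability map.
    centers: window centers sampled along the object boundary."""
    H, W, _ = image.shape
    prob = np.zeros((H, W))
    weight = np.zeros((H, W))
    for cy, cx in centers:
        y0, y1 = max(cy - radius, 0), min(cy + radius + 1, H)
        x0, x1 = max(cx - radius, 0), min(cx + radius + 1, W)
        win = image[y0:y1, x0:x1].reshape(-1, 3)
        msk = prev_mask[y0:y1, x0:x1].reshape(-1)
        if msk.all() or not msk.any():        # window sees one class only
            p = msk.astype(float)
        else:
            p = local_color_prob(win, msk)
        prob[y0:y1, x0:x1] += p.reshape(y1 - y0, x1 - x0)
        weight[y0:y1, x0:x1] += 1.0
    return np.where(weight > 0, prob / np.maximum(weight, 1), prev_mask.astype(float))
```

Thresholding the returned map gives the next frame's mask; because each classifier is local, a user correction inside one window retrains only that window, which is what makes the edit-and-propagate workflow cheap.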
Evolving feature selection
- Tuv E., Peng H., Ding C., Long F., Berens M., Parsons L., Zhao Z., Yu L., Forman G.
"... two-hybrid junk sequences contain ..."
Sketch2Photo: internet image montage
- ACM SIGGRAPH Asia, 2009
"... Figure 1: A simple freehand sketch is automatically converted into a photo-realistic picture by seamlessly composing multiple images discovered online. The input sketch plus overlaid text labels is shown in (a). A composed picture is shown in (b); (c) shows two further compositions. Discovered onlin ..."
Cited by 76 (23 self)
Abstract:
Figure 1 (caption): A simple freehand sketch is automatically converted into a photo-realistic picture by seamlessly composing multiple images discovered online. The input sketch plus overlaid text labels is shown in (a). A composed picture is shown in (b); (c) shows two further compositions. Discovered online images used during composition are shown in (d).
We present a system that composes a realistic picture from a simple freehand sketch annotated with text labels. The composed picture is generated by seamlessly stitching several photographs, found by searching the Internet, in agreement with the sketch and text labels. Although online image search returns many inappropriate results, our system automatically selects suitable photographs to generate a high-quality composition, using a filtering scheme to exclude undesirable images. We also provide a novel image blending algorithm for seamless image composition. Each blending result is given a numeric score, allowing us to find an optimal combination of the discovered images. Experimental results show the method is very successful; we also evaluate our system with results from two user studies.
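The "numeric score per blending result" step amounts to searching, over the candidate images retrieved for each labeled sketch item, for the lowest-cost combination. A toy sketch, with an entirely made-up cost function standing in for the paper's image-based blending score:

```python
# Toy sketch of scored composition selection (hypothetical cost;
# the paper scores actual blending results, not candidate ids).
import itertools

def best_combination(candidates_per_label, blend_cost):
    """candidates_per_label: {label: [candidate ids]};
    blend_cost(assignment) -> float. Exhaustive search over products."""
    labels = sorted(candidates_per_label)
    best, best_score = None, float("inf")
    for combo in itertools.product(*(candidates_per_label[l] for l in labels)):
        assignment = dict(zip(labels, combo))
        score = blend_cost(assignment)
        if score < best_score:
            best, best_score = assignment, score
    return best, best_score

# Example with a made-up cost: prefer lower-numbered candidates.
cands = {"boat": [0, 1, 2], "beach": [0, 1]}
print(best_combination(cands, lambda a: sum(a.values())))
```

Exhaustive search over the product is only feasible because the filtering stage has already pruned each label's candidate list to a handful of images.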
TVSeg -- interactive total variation based image segmentation
- British Machine Vision Conference (BMVC), 2008
"... Interactive object extraction is an important part in any image editing software. We present a two step segmentation algorithm that first obtains a binary segmentation and then applies matting on the border regions to obtain a smooth alpha channel. The proposed segmentation algorithm is based on the ..."
Cited by 55 (17 self)
Abstract:
Interactive object extraction is an important part of any image editing software. We present a two-step segmentation algorithm that first obtains a binary segmentation and then applies matting to the border regions to obtain a smooth alpha channel. The proposed segmentation algorithm is based on minimization of the Geodesic Active Contour energy. A fast Total Variation minimization algorithm is used to find the globally optimal solution. We show how user interaction can be incorporated and outline an efficient way to exploit color information. A novel matting approach, based on energy minimization, is presented. Experimental evaluations are discussed, and the algorithm is compared to state-of-the-art object extraction algorithms. The GPU-based binaries are available online.
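The energy being minimized is the standard weighted total variation relaxation of the Geodesic Active Contour model; the edge weight below and the way scribbles enter are simplifications of the paper's exact setup:

```latex
% Weighted-TV relaxation of the Geodesic Active Contour energy
% (standard formulation; the paper's constraint handling is richer).
\[
  \min_{u \,\in\, [0,1]} \int_{\Omega} g(x)\,\lvert \nabla u(x) \rvert \,\mathrm{d}x ,
  \qquad
  g(x) = \exp\!\bigl( -\alpha \,\lvert \nabla I(x) \rvert^{\beta} \bigr) ,
\]
```

with u fixed to 1 on foreground scribbles and 0 on background scribbles. Because this relaxed problem is convex, thresholding the global minimizer at any level in (0, 1) recovers a globally optimal binary segmentation, which is why the method can promise the globally optimal solution rather than a local one.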
User-assisted intrinsic images
- ACM Transactions on Graphics (SIGGRAPH Asia), 2009
"... For many computational photography applications, the lighting and materials in the scene are critical pieces of information. We seek to obtain intrinsic images, which decompose a photo into the product of an illumination component that represents lighting effects and a reflectance component that i ..."
Cited by 40 (5 self)
Abstract:
For many computational photography applications, the lighting and materials in the scene are critical pieces of information. We seek to obtain intrinsic images, which decompose a photo into the product of an illumination component that represents lighting effects and a reflectance component that is the color of the observed material. This is an under-constrained problem, and automatic methods are challenged by complex natural images. We describe a new approach that lets users guide an optimization with simple indications, such as regions of constant reflectance or illumination. Based on a simple assumption about local reflectance distributions, we derive a new propagation energy that admits a closed-form solution using linear least squares. We achieve fast performance by introducing a novel downsampling scheme that preserves local color distributions. We demonstrate intrinsic image decomposition on a variety of images and show applications.
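The decomposition and the shape of the least-squares problem can be written compactly. The notation below is schematic: L and the constraint terms are generic stand-ins for the paper's exact propagation energy, not its derivation.

```latex
% Intrinsic image model (log domain) and a schematic propagation energy.
\[
  I_p = R_p \, S_p
  \;\Longrightarrow\;
  \log I_p = \log R_p + \log S_p ,
\]
\[
  E(s) \;=\; s^{\top} L \, s \;+\; \lambda \sum_{p \in C} \bigl( s_p - \bar{s}_p \bigr)^{2} ,
  \qquad
  \nabla E = 0 \;\Longleftrightarrow\; (L + \lambda D_C)\, s = \lambda D_C \, \bar{s} .
\]
```

Here s stacks the log-shading values, L is a sparse Laplacian built from the local reflectance assumption, C indexes user-constrained pixels with target values \bar{s}_p, and D_C is the diagonal indicator of C. Setting the gradient to zero yields a single sparse linear system, which is the closed-form linear least-squares solution the abstract mentions.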
RepFinder: Finding Approximately Repeated Scene Elements for Image Editing
"... Figure 1: Repeated element detection and manipulation. (Left-to-right) Original image with user scribbles to indicate an object template (red) and background (green); repeated instances detected, completed, dense correspondence established, and ordered in layers; fish in the original image replaced ..."
Cited by 35 (20 self)
Abstract:
Figure 1 (caption): Repeated element detection and manipulation. (Left to right) Original image with user scribbles indicating an object template (red) and background (green); repeated instances detected, completed, dense correspondence established, and ordered in layers; fish in the original image replaced by a different kind of fish from a reference image (top-right inset); rearranged fish.
Repeated elements are ubiquitous and abundant in both man-made and natural scenes. Editing such images while preserving the repetitions and their relations is nontrivial due to overlap, missing parts, deformation across instances, illumination variation, etc. Manually enforcing such relations is laborious and error-prone. We propose a novel framework in which user scribbles guide the detection and extraction of such repeated elements. Our detection process, based on a novel boundary band method, robustly extracts the repetitions along with their deformations. The algorithm considers only the shape of the elements, ignoring similarity in color, texture, etc. We then use topological sorting to establish a partial depth ordering of overlapping repeated instances. Missing parts of occluded instances are completed using information from other instances. The extracted repeated instances can then be seamlessly edited and manipulated for a variety of high-level tasks that are otherwise difficult to perform. We demonstrate the versatility of our framework on a large set of inputs of varying complexity, showing applications to image rearrangement, edit transfer, deformation propagation, and instance replacement.
Keywords: image editing, shape-aware manipulation, edit propagation
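The depth-ordering step reduces to a topological sort of the pairwise occlusion graph. A minimal sketch using Kahn's algorithm (the instance ids and occlusion pairs are placeholders, not the paper's data structures):

```python
# Topological depth ordering: given pairwise "a occludes b"
# observations, produce a front-to-back layer order.
from collections import defaultdict, deque

def layer_order(instances, occludes):
    """instances: iterable of ids; occludes: set of (a, b) meaning
    instance a overlaps and hides part of instance b."""
    above = defaultdict(set)
    indeg = {i: 0 for i in instances}
    for a, b in occludes:
        if b not in above[a]:
            above[a].add(b)
            indeg[b] += 1
    frontier = deque(i for i in instances if indeg[i] == 0)  # unoccluded first
    order = []
    while frontier:
        i = frontier.popleft()
        order.append(i)
        for j in above[i]:
            indeg[j] -= 1
            if indeg[j] == 0:
                frontier.append(j)
    if len(order) != len(indeg):
        raise ValueError("cyclic occlusions: only a partial order exists")
    return order  # front (fully visible) to back

print(layer_order(["f1", "f2", "f3"], {("f1", "f2"), ("f2", "f3")}))
```

Since only overlapping pairs contribute edges, the result is a partial order, exactly as the abstract states; non-overlapping instances can be placed on any layer.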
Strong supervision from weak annotation: Interactive training of deformable part models
- ICCV
"... We propose a framework for large scale learning and annotation of structured models. The system interleaves interactive labeling (where the current model is used to semiautomate the labeling of a new example) and online learning (where a newly labeled example is used to update the current model para ..."
Cited by 31 (1 self)
Abstract:
We propose a framework for large-scale learning and annotation of structured models. The system interleaves interactive labeling (where the current model is used to semi-automate the labeling of a new example) and online learning (where a newly labeled example is used to update the current model parameters). This framework is scalable to large datasets and complex image models, and is shown to have excellent theoretical and practical properties in terms of training time, optimality guarantees, and bounds on the amount of annotation effort per image. We apply this framework to part-based detection and introduce a novel algorithm for interactive labeling of deformable part models. The labeling tool updates and displays, in real time, the maximum likelihood location of all parts as the user clicks and drags the location of one or more parts. We demonstrate that the system can be used to efficiently and robustly train part and pose detectors on CUB Birds-200, a challenging dataset of birds in unconstrained pose and environment.
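The interleaved labeling-and-learning loop has a simple schematic form. The sketch below substitutes a structured-perceptron update for the paper's actual online learner, and every function name is hypothetical:

```python
# Schematic annotate-and-learn loop (perceptron-style update standing
# in for the paper's online learner; all names are hypothetical).
import numpy as np

def interactive_training(examples, feats, user_correct, dim, lr=1.0):
    """examples: list of (img, candidate labelings);
    feats(img, y) -> feature vector; user_correct(img, pred) -> labeling."""
    w = np.zeros(dim)
    for img, candidates in examples:
        pred = max(candidates, key=lambda y: w @ feats(img, y))  # semi-automated guess
        truth = user_correct(img, pred)                          # user fixes errors
        if truth != pred:                                        # online update
            w += lr * (feats(img, truth) - feats(img, pred))
    return w

# Tiny usage with 1-D "images": candidates are labels y in {0, 1};
# the "user" knows the right answer and corrects the model.
data = [((x,), [0, 1]) for x in (-2.0, -1.0, 1.0, 2.0)]
phi = lambda img, y: np.array([img[0] * (1 if y == 1 else -1)])
oracle = lambda img, pred: 1 if img[0] > 0 else 0
print(interactive_training(data, phi, oracle, dim=1))
```

The point of the interleaving is that the prediction is usually already correct, so the user's per-example effort shrinks as the model improves.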
Invertible motion blur in video
- ACM Trans. Graphics, 2009
"... Figure 1: By simply varying the exposure time for video frames, multi-image deblurring can be made invertible. (Left) Varying exposure photos of a moving car. Notice the change in illumination and the blur size in the captured photos. (Right) The foreground object is automatically rectified, segment ..."
Cited by 29 (2 self)
Abstract:
Figure 1 (caption): By simply varying the exposure time for video frames, multi-image deblurring can be made invertible. (Left) Varying-exposure photos of a moving car. Notice the change in illumination and blur size in the captured photos. (Right) The foreground object is automatically rectified, segmented, deblurred, and composed onto the background using the varying-exposure video. Novel renderings, such as motion streaks, can be generated by linear combination of the deblurred image and the captured photos.
We show that motion blur in successive video frames is invertible even if the point-spread function (PSF) due to motion smear in a single photo is non-invertible. Blurred photos exhibit nulls in the frequency transform of the PSF, leading to an ill-posed deconvolution. Hardware solutions to avoid this require specialized devices such as coded exposure cameras or accelerating sensor motion. We employ ordinary video cameras and introduce the notion of null-filling along with joint invertibility of multiple blur functions. The key idea is to record the same object with varying PSFs, so that the nulls in the frequency components of one frame can be filled by other frames. The combined frequency transform becomes null-free, making deblurring well-posed. We achieve jointly invertible blur simply by changing the exposure time of successive frames. We address the problem of automatically deblurring objects moving with constant velocity by solving its critical components: preservation of all spatial frequencies, segmentation and motion estimation of the moving parts, and non-degradation of the static parts of the scene. We demonstrate several challenging cases of object motion blur, including textured backgrounds and partial occluders.
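The joint-invertibility argument is easy to verify numerically in one dimension: a box PSF has spectral nulls, but two boxes of different lengths do not share them, so a least-squares combination of both frames inverts exactly. A sketch under simplified assumptions (circular blur, no noise):

```python
# Joint frequency-domain deblurring: nulls of one box PSF are filled
# by another, so the combined inverse is well posed.
import numpy as np

def box_psf(n, length):
    k = np.zeros(n)
    k[:length] = 1.0 / length
    return k

def joint_deblur(blurred, psfs, eps=1e-8):
    """blurred, psfs: lists of equal-length 1-D arrays (circular blur)."""
    Ks = [np.fft.fft(k) for k in psfs]
    Bs = [np.fft.fft(b) for b in blurred]
    num = sum(np.conj(K) * B for K, B in zip(Ks, Bs))
    den = sum(np.abs(K) ** 2 for K in Ks) + eps
    return np.real(np.fft.ifft(num / den))

n = 256
signal = np.random.rand(n)
psfs = [box_psf(n, 8), box_psf(n, 13)]   # different lengths: nulls don't coincide
blurred = [np.real(np.fft.ifft(np.fft.fft(signal) * np.fft.fft(k))) for k in psfs]
print(np.allclose(joint_deblur(blurred, psfs), signal, atol=1e-5))
```

With noise, the same formula with eps acting as a regularizer becomes a multi-frame Wiener-style estimate; the abstract's point is that the denominator never vanishes when the exposures (and hence blur lengths) differ.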
Extracting Depth and Matte using a Color-Filtered Aperture
"... This paper presents a method for automatically extracting a scene depth map and the alpha matte of a foreground object by capturing a scene through RGB color filters placed in the camera lens aperture. By dividing the aperture into three regions through which only light in one of the RGB color bands ..."
Cited by 24 (0 self)
Abstract:
This paper presents a method for automatically extracting a scene depth map and the alpha matte of a foreground object by capturing the scene through RGB color filters placed in the camera lens aperture. By dividing the aperture into three regions, each passing light in only one of the RGB color bands, we can acquire three shifted views of a scene in the RGB planes of a single exposure. In other words, a captured image has depth-dependent color misalignment. We develop a color alignment measure to estimate disparities between the RGB planes for depth reconstruction. We also exploit color misalignment cues in our matting algorithm to disambiguate the foreground from the background even where their colors are similar. Based on the extracted depth and matte, the color misalignment in the captured image can be canceled, and various image editing operations can be applied to the reconstructed image, including novel view synthesis, post-exposure refocusing, and composition over different backgrounds.
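The disparity search can be sketched as winner-take-all over hypothesized R/B shifts against the fixed G plane. The alignment cost below (per-pixel channel variance) is a crude stand-in for the paper's color alignment measure, which aggregates over local windows:

```python
# Winner-take-all sketch of depth from color-plane shift
# (hypothetical cost; the paper's measure is more sophisticated).
import numpy as np

def misalignment(r, g, b):
    """Toy alignment cost: variance across the three channels."""
    return np.stack([r, g, b]).var(axis=0)

def depth_from_color_shift(img, max_disp=8):
    """img: HxWx3 float image with depth-dependent R/B shift along x."""
    H, W, _ = img.shape
    best_cost = np.full((H, W), np.inf)
    depth = np.zeros((H, W), dtype=int)
    g = img[..., 1]
    for d in range(-max_disp, max_disp + 1):
        r = np.roll(img[..., 0], -d, axis=1)   # undo the hypothesized shift
        b = np.roll(img[..., 2], d, axis=1)
        cost = misalignment(r, g, b)
        better = cost < best_cost
        best_cost[better] = cost[better]
        depth[better] = d
    return depth
```

The winning disparity per pixel is the depth estimate; where the channels refuse to align at any shift, that residual misalignment is exactly the cue the matting stage exploits.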
Improving depth perception with motion parallax and its application in teleconferencing
- 2009
"... Abstract—Depth perception, or 3D perception, can add a lot to the feeling of immersiveness in many applications such as 3D TV, 3D teleconferencing, etc. Stereopsis and motion parallax are two of the most important cues for depth perception. Most of the 3D displays today rely on stereopsis to create ..."
Cited by 23 (6 self)
Abstract:
Depth perception, or 3D perception, can greatly enhance the feeling of immersiveness in many applications, such as 3D TV and 3D teleconferencing. Stereopsis and motion parallax are two of the most important cues for depth perception. Most 3D displays today rely on stereopsis to create 3D perception. In this paper, we propose to improve users' depth perception by tracking their motions and creating motion parallax in the rendered image, which can be done even with legacy displays. Two enabling technologies, face tracking and foreground/background segmentation, are discussed in detail. In particular, we propose an efficient and robust feature-based face tracking algorithm that accurately estimates the face's location and scale. We also propose a novel foreground/background segmentation and matting algorithm using a time-of-flight camera, which is robust to moving backgrounds, lighting variations, moving cameras, etc. We demonstrate the application of these technologies to teleconferencing on legacy displays to create pseudo-3D effects.
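The rendering side of the idea fits in a few lines: the tracked head offset drives image-space shifts that fall off with depth, so near content slides more than far content. A toy sketch (the gain and the flat-layer scene model are made up; the paper renders the segmented foreground over the background):

```python
# Toy motion parallax: per-layer shifts from the tracked head offset
# (gain and layer model are hypothetical).
def parallax_offsets(head_x, head_y, layer_depths, gain=0.05):
    """head_x/head_y: normalized head offset from screen center in [-1, 1];
    layer_depths: depth per scene layer (larger = farther). Returns the
    per-layer pixel shift that fakes motion parallax on a 2-D display."""
    return [(-gain * head_x / z, -gain * head_y / z) for z in layer_depths]

# Head moves right -> near layers slide left more than far layers.
print(parallax_offsets(0.5, 0.0, layer_depths=[1.0, 4.0]))
```

This is why the two enabling technologies are face tracking (to get head_x, head_y) and foreground/background segmentation (to get at least two layers to shift against each other).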