COMPUTER VISION READING GROUP
If you want to join the reading group, please click here.
May 23, 2012 [Wednesday]:
Title: Efficient Model-based 3D Tracking of Hand Articulations using Kinect Presenter: Marijn Koetzier Time: 13.00-14.00 (BBL-445) Info: pdf | youtube 1 | youtube 2 | website Abstract: We present a novel solution to the problem of recovering and tracking the 3D position, orientation and full articulation of a human hand from markerless visual observations obtained by a Kinect sensor. We treat this as an optimization problem, seeking for the hand model parameters that minimize the discrepancy between the appearance and 3D structure of hypothesized instances of a hand model and actual hand observations. This optimization problem is effectively solved using a variant of Particle Swarm Optimization (PSO). The proposed method does not require special markers and/or a complex image acquisition setup. Being model based, it provides continuous solutions to the problem of tracking hand articulations. Extensive experiments with a prototype GPU-based implementation of the proposed method demonstrate that accurate and robust 3D tracking of hand articulations can be achieved in near real-time.
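Illustration (not from the paper): the abstract casts tracking as minimizing a discrepancy over hand model parameters with a variant of Particle Swarm Optimization. The sketch below shows only the basic, textbook PSO update on a toy objective. The function `toy_discrepancy`, the three-parameter target, and all constants (inertia, acceleration coefficients, swarm size) are assumptions chosen for the example; the paper's actual variant, objective, and GPU implementation are not reproduced here.

```python
import random


def pso_minimize(objective, dim, bounds, n_particles=30, n_iters=200,
                 inertia=0.7, c_cognitive=1.5, c_social=1.5, seed=0):
    """Basic global-best PSO; returns (best position, best objective value)."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Random initial positions inside the box, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]              # per-particle best positions
    best_val = [objective(p) for p in pos]      # per-particle best values
    g = min(range(n_particles), key=lambda i: best_val[i])
    gbest_pos, gbest_val = best_pos[g][:], best_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity: inertia + pull to own best + pull to swarm best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_cognitive * r1 * (best_pos[i][d] - pos[i][d])
                             + c_social * r2 * (gbest_pos[d] - pos[i][d]))
                # Move and clamp to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < gbest_val:
                    gbest_val, gbest_pos = val, pos[i][:]
    return gbest_pos, gbest_val


def toy_discrepancy(params):
    """Hypothetical stand-in for a model-vs-observation discrepancy."""
    target = [0.5, -1.0, 2.0]
    return sum((p - t) ** 2 for p, t in zip(params, target))


best_params, best_score = pso_minimize(toy_discrepancy, dim=3, bounds=(-5.0, 5.0))
```

In a real tracker the objective would compare a rendered hand hypothesis against the observed depth and appearance data; here it is a simple quadratic so the behaviour of the swarm is easy to inspect.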
May 16, 2012 [Wednesday]:
Title: On Missing Data Treatment for Degraded Video and Film Archives Presenter: Remco Gubbels Time: 13.00-14.00 (BBL-445) Info: pdf Abstract: Image sequence restoration has been steadily gaining in importance with the increasing prevalence of visual digital media. The demand for content increases the pressure on archives to automate their restoration activities for preservation of the cultural heritage that they hold. There are many defects that affect archived visual material and one central issue is that of Dirt and Sparkle, or “Blotches.” Research in archive restoration has been conducted for more than a decade and this paper places that material in context to highlight the advances made during that time. The paper also presents a new and simpler Bayesian framework that achieves joint processing of noise, missing data, and occlusion.
May 9, 2012 [Wednesday]:
Title: Material Recognition: Exploring Features in a Bayesian Framework Presenter: Pascal Mettes Time: 13.00-14.00 (BBL-445) Info: pdf | website Abstract: We are interested in identifying the material category, e.g. glass, metal, fabric, plastic or wood, from a single image of a surface. Unlike other visual recognition tasks in computer vision, it is difficult to find good, reliable features that can tell material categories apart. Our strategy is to use a rich set of low and mid-level features that capture various aspects of material appearance. We propose an augmented Latent Dirichlet Allocation (aLDA) model to combine these features under a Bayesian generative framework and learn an optimal combination of features. Experimental results show that our system performs material recognition reasonably well on a challenging material database, outperforming state-of-the-art material/texture recognition systems.
May 2, 2012 [Wednesday]:
Title: Image Inpainting Presenter: Sander van der Ven Time: 13.00-14.00 (BBL-445) Info: pdf 1 | pdf 2 Abstract: A new algorithm is proposed for removing large objects from digital images. The challenge is to fill in the hole that is left behind in a visually plausible way. In the past, this problem has been addressed by two classes of algorithms: (i) “texture synthesis” algorithms for generating large image regions from sample textures, and (ii) “inpainting” techniques for filling in small image gaps. The former has been demonstrated for “textures” – repeating two-dimensional patterns with some stochasticity; the latter focus on linear “structures” which can be thought of as one-dimensional patterns, such as lines and object contours. This paper presents a novel and efficient algorithm that combines the advantages of these two approaches. We first note that exemplar-based texture synthesis contains the essential process required to replicate both texture and structure; the success of structure propagation, however, is highly dependent on the order in which the filling proceeds. We propose a best-first algorithm in which the confidence in the synthesized pixel values is propagated in a manner similar to the propagation of information in inpainting. The actual colour values are computed using exemplar-based synthesis. In this paper the simultaneous propagation of texture and structure information is achieved by a single, efficient algorithm. Computational efficiency is achieved by a block-based sampling process. A number of examples on real and synthetic images demonstrate the effectiveness of our algorithm in removing large occluding objects as well as thin scratches. Robustness with respect to the shape of the manually selected target region is also demonstrated. Our results compare favorably to those obtained by existing techniques.
April 25, 2012 [Wednesday]:
Title: Fluid Simulation: a Literature Review Presenter: Charis Kontaxis Time: 13.00-14.00 (BBL-445) Info: report | a book summary Abstract: The goal of this literature review is to understand how general fluid simulation works and how more advanced concepts can be added to it to obtain more realistic representations of real fluids. First, the grid needed to represent a fluid's motion in a program and the two simulation viewpoints, Eulerian and Lagrangian, were studied; together, they explain one of the most controversial parts of the simulation loop: velocity advection. Moreover, the pressure projection and its linear system solver were studied so that the most time-consuming part of the simulation can be programmed efficiently. The concept of level sets can be integrated into the simulation, resulting in a well-defined and highly detailed interface between two neighboring fluids. In addition, many different hybrid particle methods were studied, from particles that label a cell as fluid or not, to particles that, together with level sets, produce more realistic fluid surfaces and effects (e.g. spray and foam in the case of water).
April 18, 2012 [Wednesday]:
Title: Optic Flow: (1) Local and Global Approaches (2) Modeling Temporal Coherence Presenter: Robby T. Tan Time: 13.00-14.00 (BBL-445) Info: PDF 1 | PDF 2 | PDF 3 Abstract: - Lucas/Kanade Meets Horn/Schunck: Combining Local and Global Optic Flow Methods: Differential methods belong to the most widely used techniques for optic flow computation in image sequences. They can be classified into local methods such as the Lucas–Kanade technique, and into global methods such as the Horn-Schunck approach and its extensions. Often local methods are more robust under noise, while global techniques yield dense flow fields.
- Modeling Temporal Coherence for Optical Flow: Despite the fact that temporal coherence is undeniably one of the key aspects when processing video data, this concept has hardly been exploited in recent optical flow methods. In this paper, we will present a novel parametrization for multi-frame optical flow computation that naturally enables us to embed the assumption of a temporally coherent spatial flow structure, as well as the assumption that the optical flow is smooth along motion trajectories.
- Optic Flow in Harmony: Most variational optic flow approaches just consist of three constituents: a data term, a smoothness term and a smoothness weight. In this paper, we present an approach that harmonises these three components.
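Illustration (not from the papers): the first abstract above contrasts local methods such as Lucas-Kanade with global ones. The sketch below shows only the local least-squares step for a single window, solving the 2x2 normal equations of the brightness constancy constraint Ix*u + Iy*v + It = 0. The function name `lucas_kanade_window`, the synthetic quadratic frames, and the choice to average spatial gradients over both frames are assumptions made for this example; none of the combined local-global, temporal, or harmonised models are implemented.

```python
def lucas_kanade_window(frame1, frame2, x0, y0, half=2):
    """Estimate one flow vector (u, v) for the window centred at (x0, y0)."""
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(y0 - half, y0 + half + 1):
        for x in range(x0 - half, x0 + half + 1):
            # Central-difference spatial gradients, averaged over both frames.
            ix = 0.25 * (frame1[y][x + 1] - frame1[y][x - 1]
                         + frame2[y][x + 1] - frame2[y][x - 1])
            iy = 0.25 * (frame1[y + 1][x] - frame1[y - 1][x]
                         + frame2[y + 1][x] - frame2[y - 1][x])
            # Temporal derivative as a simple frame difference.
            it = frame2[y][x] - frame1[y][x]
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("gradients too uniform in this window (aperture problem)")
    # Solve [[sxx, sxy], [sxy, syy]] [u, v]^T = -[sxt, syt]^T.
    u = (-syy * sxt + sxy * syt) / det
    v = (sxy * sxt - sxx * syt) / det
    return u, v


def toy_image(x, y):
    """Hypothetical smooth intensity pattern used to build test frames."""
    return 0.01 * (x - 8.0) ** 2 + 0.02 * (y - 8.0) ** 2


SIZE = 16
TRUE_U, TRUE_V = 0.4, -0.3
frame_a = [[toy_image(x, y) for x in range(SIZE)] for y in range(SIZE)]
# The second frame is the same pattern translated by (TRUE_U, TRUE_V).
frame_b = [[toy_image(x - TRUE_U, y - TRUE_V) for x in range(SIZE)]
           for y in range(SIZE)]

u_est, v_est = lucas_kanade_window(frame_a, frame_b, x0=8, y0=8)
```

The estimate is a single vector per window, which is why such local methods do not by themselves yield the dense flow fields that the global formulations provide.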
April 11, 2012 [Wednesday]:
Title: Research Topics at Noldus (Internship 2012) Presenter: Nico van der Aa Time: 13.00-14.00 (BBL-445) Info: pdf 1 | pdf 2 | pdf 3 | pdf 4 Abstract: - Heart rate frequency and heart rate variability in larval and adult zebrafish are important parameters for cardiac functioning and are associated with cardio-toxicity in human beings. Therefore, these animals are popular for revealing cardio-toxic and neuro-toxic effects of pharmacological compounds in drug discovery. To facilitate such studies, Noldus InnovationWorks aims to develop a system to automatically measure heart rate frequency and heart rate variability in zebrafish larvae. This project involves the detection of heart rate frequency and variability of zebrafish larvae in multiple steps of complexity, starting with a single fixed larva, multiple fixed larvae, single free swimming larva and multiple free swimming larvae.
- Video observation of people is a non-intrusive way to measure people’s behavior. To identify human activity automatically, detection and tracking of people in camera views is not sufficient. It is hard to classify a set of video frames as a person drinking a glass of milk and even more challenging to distinguish drinking from eating behavior. Therefore, Noldus InnovationWorks (www.noldus.com) aims to develop a general system to automatically classify some user defined actions from video streams. This project involves the selection of feasible features to represent actions the user is interested in. A pattern recognition algorithm must be developed to classify these features to find the actions.
- Video forms a non-intrusive way to study how people behave. If multiple cameras are available, the views can be combined to capture depth information about the scene in addition to the multiple viewing angles. To translate the 3D world to the camera views, the cameras need to be calibrated; in other words, one needs to know how the image is formed and where each camera is in the 3D world. Currently, the cameras are manually calibrated once using a checkerboard. To follow a certain (body part of a) person with multiple moving cameras, Noldus InnovationWorks (www.noldus.com) aims to develop a (near) real-time way of calibrating the cameras without using any calibration object. This project involves feature point detection in each camera view and matching those feature points among images. Once the points are matched correctly, one can obtain the camera calibration parameters. Preferably, this processing runs in (near) real time.
- Non-intrusive measurement of human or animal behavior is important to many fields of research in the life sciences. Noldus InnovationWorks (www.noldus.com) aims to create state-of-the-art advances in computer vision and behavior recognition on the cutting edge between science and application. The Kinect camera, introduced by Microsoft for the Xbox game console, offers a new and cheap alternative for non-intrusive analysis of both humans and animals. Compared to a standard camera, the Kinect camera offers 3D depth information of the scene. The accompanying SDK already provides several features such as simple pose estimation of a human being.
April 4, 2012 [Wednesday]:
Title: Video-based Fog Removal Presenter: Yuri Parijs Time: 13.00-14.00 (BBL-445) Info: Contrast-based Fog Removal Algorithm | Tarel et al.'s Algorithm Abstract: Visibility enhancement in bad weather is important in many applications, including the reduction of road accidents. Current single-image visibility enhancement methods, and fog removal methods in particular, are capable of increasing the visibility of a fog-plagued image. In this master thesis project we attempt to improve a single-image method by using tracking information obtained from a video using SIFT Flow. Before enhancing visibility in video, we analyse and compare two often-cited papers in the field of single-image visibility enhancement in bad weather. From this comparison we conclude that Tan’s method works best for foggy images and Tarel et al.’s method is better on images containing haze. Our method of choice for visibility enhancement in video is Tarel et al.’s method; this choice is based on the analysis of the single-image methods. The method is fast and should benefit more from the additional data obtained from video, as it has difficulties with correctly estimating the atmospheric veil for white objects. The atmospheric veil is based on the whiteness of the image: the whiter an object is, the further away it is estimated to be. Using the tracking data from SIFT Flow, we try to detect wrongly estimated objects and correct the atmospheric veil for them, focusing on white objects that are close to the observer.
March 28, 2012 [Wednesday]:
Title: Layered Surfaces due to Subsurface Scattering Presenter: Jeffrey Lemein Time: 13.00-14.00 (BBL-445) Info: PDF | Website Abstract: The reflection of light from most materials consists of two major terms: the specular and the diffuse. Specular reflection may be modeled from first principles by considering a rough surface consisting of perfect reflectors, or micro-facets. Diffuse reflection is generally considered to result from multiple scattering either from a rough surface or from within a layer near the surface. Accounting for diffuse reflection by Lambert's Cosine Law, as is universally done in computer graphics, is not a physical theory based on first principles. This paper presents a model for subsurface scattering in layered surfaces in terms of one-dimensional linear transport theory. We derive explicit formulas for backscattering and transmission that can be directly incorporated in most rendering systems, and a general Monte Carlo method that is easily added to a ray tracer. This model is particularly appropriate for common layered materials appearing in nature, such as biological tissues (e.g. skin, leaves, etc.) or inorganic materials (e.g. snow, sand, paint, varnished or dusty surfaces). As an application of the model, we simulate the appearance of a face and a cluster of leaves from experimental data describing their layer properties.
March 14, 2012 [Wednesday]:
Title: Gradient Response Maps for Real-Time Detection of Texture-Less Objects Presenter: Tim de Jager Time: 13.00-14.00 (BBL-445) Info: website | PDF | source code Abstract: We present a method for real-time 3D object instance detection that does not require a time consuming training stage, and can handle untextured objects. At its core, our approach is a novel image representation for template matching designed to be robust to small image transformations. This robustness is based on spread image gradient orientations and allows us to test only a small subset of all possible pixel locations when parsing the image, and to represent a 3D object with a limited set of templates. In addition, we demonstrate that if a dense depth sensor is available we can extend our approach for an even better performance taking also 3D surface normal orientations into account. We show how to take advantage of the architecture of modern computers to build an efficient but very discriminant representation of the input images that can be used to consider thousands of templates in real-time. We demonstrate in many experiments on real data that our method is much faster and more robust with respect to background clutter than current state-of-the-art methods.
March 7, 2012 [Wednesday]:
Title: Motion Capture Using Joint Skeleton Tracking and Surface Estimation Presenter: Jeffrey Resodikromo Time: 13.00-14.00 (BBL-445) Info: website | PDF Abstract: This paper proposes a method for capturing the performance of a human or an animal from a multi-view video sequence. Given an articulated template model and silhouettes from a multi-view image sequence, our approach recovers not only the movement of the skeleton, but also the possibly non-rigid temporal deformation of the 3D surface. While large scale deformations or fast movements are captured by the skeleton pose and approximate surface skinning, true small scale deformations or non-rigid garment motion are captured by fitting the surface to the silhouette. We further propose a novel optimization scheme for skeleton-based pose estimation that exploits the skeleton’s tree structure to split the optimization problem into a local one and a lower dimensional global one. We show on various sequences that our approach can capture the 3D motion of animals and humans accurately even in the case of rapid movements and wide apparel like skirts.
February 22, 2012 [Wednesday]:
No reading group this week.
February 15, 2012 [Wednesday]:
Title: Research and Project Overview Presenter: Robby Tan Time: 13.00-14.00 Location: BBL-445
November 02, 2011 [Wednesday]:
Title: Decomposing Layered Surfaces Presenter: Robby Tan Time: 11.00-12.00 (BBL-445) Slide: Abstract: Many object surfaces are composed of layers of different physical substances, known as layered surfaces. These surfaces, such as patinas, water colors, and wall paintings, have more complex optical properties than diffuse surfaces. Although the characteristics of layered surfaces, like layer opacity, mixture of colors, and color gradations, are significant, they are usually ignored in the analysis of many methods in computer vision, causing inaccurate or even erroneous results. Therefore, the main goals of this paper are twofold: to solve problems of layered surfaces by focusing mainly on surfaces with two layers (i.e., top and bottom layers), and to introduce a decomposition method based on a novel representation of a nonlinear correlation in the color space that we call the “spider” model. When we plot a mixture of colors of one bottom layer and n different top layers into the RGB color space, then we will have n different curves intersecting at one point, resembling the shape of a spider. Hence, given a single input image containing one bottom layer and at least one top layer, we can fit their color distributions by using the spider model and then decompose those layered surfaces. The last step is equivalent to extracting the approximated optical properties of the two layers: the top layer’s opacity, and the top and bottom layers’ reflections. Experiments with real images, which include the photographs of ancient wall paintings, show the effectiveness of our method.
October 26, 2011 [Wednesday]:
Title: Loose-limbed People: Pose Estimation and Tracking Presenter: Robby Tan Time: 11.00-12.00 (BBL-445) Slide: pdf Abstract: We pose the problem of 3D human tracking as one of inference in a graphical model. Unlike traditional kinematic tree representations, our model of the body is a collection of loosely-connected limbs. Conditional probabilities relating the 3D pose of connected limbs are learned from motion captured training data. Similarly, we learn probabilistic models for the temporal evolution of each limb (forward and backward in time). Human pose and motion estimation is then solved with non-parametric belief propagation using a variation of particle filtering that can be applied over a general loopy graph. The loose-limbed model and decentralized graph structure facilitate the use of low-level visual cues. We adopt simple limb and head detectors to provide “bottom-up” information that is incorporated into the inference process at every time-step; these detectors permit automatic initialization and aid recovery from transient tracking failures. We illustrate the method by automatically tracking a walking person in video imagery using four calibrated cameras. Our experimental apparatus includes a marker-based motion capture system aligned with the coordinate frame of the calibrated cameras with which we quantitatively evaluate the accuracy of our 3D person tracker.
October 12, 2011 [Wednesday]:
Title: Automatic Rigging and Animation of 3D Characters Presenter: Tim de Jager Time: 11.00-12.00 (BBL-445) Slide: PPT Paper: webpage Abstract: Animating an articulated 3D character currently requires manual rigging to specify its internal skeletal structure and to define how the input motion deforms its surface. We present a method for animating characters automatically. Given a static character mesh and a generic skeleton, our method adapts the skeleton to the character and attaches it to the surface, allowing skeletal motion data to animate the character. Because a single skeleton can be used with a wide range of characters, our method, in conjunction with a library of motions for a few skeletons, enables a user-friendly animation system for novices and children. Our prototype implementation, called Pinocchio, typically takes under a minute to rig a character on a modern midrange PC.
October 05, 2011 [Wednesday]:
Title: Pictorial Structures Revisited: People Detection and Articulated Pose Estimation Presenter: Frank Evers Time: 11.00-12.00 (BBL-445) Slide: Link: webpage Abstract: Non-rigid object detection and articulated pose estimation are two related and challenging problems in computer vision. Numerous models have been proposed over the years and often address different special cases, such as pedestrian detection or upper body pose estimation in TV footage. This paper shows that such specialization may not be necessary, and proposes a generic approach based on the pictorial structures framework. We show that the right selection of components for both appearance and spatial modeling is crucial for general applicability and overall performance of the model. The appearance of body parts is modeled using densely sampled shape context descriptors and discriminatively trained AdaBoost classifiers. Furthermore, we interpret the normalized margin of each classifier as likelihood in a generative model. Non-Gaussian relationships between parts are represented as Gaussians in the coordinate system of the joint between parts. The marginal posterior of each part is inferred using belief propagation. We demonstrate that such a model is equally suitable for both detection and pose estimation tasks, outperforming the state of the art on three recently proposed datasets.
Questions:
- Is it wise to train part-specific classifiers that may ignore important features of the body?
- How should this method cope with symmetry in legs and arms? (This is currently captured in the kinematic model.)
September 28, 2011 [Wednesday]:
Title: Cost-Sensitive Active Visual Category Learning Presenter: Diederik Roijers Time: 11.00-12.00 Slide: PDF Abstract: We present an active learning framework that predicts the tradeoff between the effort and information gain associated with a candidate image annotation, thereby ranking unlabeled and partially labeled images according to their expected “net worth” to an object recognition system. We develop a multi-label multiple-instance approach that accommodates realistic images containing multiple objects and allows the category-learner to strategically choose what annotations it receives from a mixture of strong and weak labels. Since the annotation cost can vary depending on an image’s complexity, we show how to improve the active selection by directly predicting the time required to segment an unlabeled image. Our approach accounts for the fact that the optimal use of manual effort may call for a combination of labels at multiple levels of granularity, as well as accurate prediction of manual effort. As a result, it is possible to learn more accurate category models with a lower total expenditure of annotation effort. Given a small initial pool of labeled data, the proposed method actively improves the category models with minimal manual intervention.
Questions:
- Do you think the authors succeeded in their objectives?
- Are there still elements of human vision in the field of object classification that are not incorporated in this computer vision approach?
- Do you see any possible improvements to this algorithm/type of learning?
- Do you see other interesting possible applications of active learning in computer vision?
September 14, 2011 [Wednesday]:
Title: Views on Image-Based Material Editing Presenter: Thijs Zumbrink Time: 11.00-12.00 Slide: Paper: Report Abstract: This report describes the theory, implementation and evaluation of Image-Based Material Editing, a technique that alters the appearance of objects in images by substituting the material for another, synthetic material. Following Khan et al.’s method, we estimate the 3D shape of the object of interest in an input image, based on shading information. We proceed by inpainting the image, practically erasing the object. This leads to an illumination map that is used to render the object with a new material and approximate environmental reflections. The theory and implementation of three techniques are explained: Shape from Shading, Image Inpainting and Rendering. Each technique has been experimented with, and the evaluations are discussed. The results of the overall process are unsatisfactory, caused by a poor result in one of the subsystems. However, the results of the other subsystems are as expected.
June 29, 2011 [Wednesday]:
Title: Blind Separation of Superimposed Moving Images Using Image Statistics (PAMI 2011) Presenter: Andreea Barack Time: 13.00-14.00 Slide: Link: Video and Code Abstract: We address the problem of blind separation of multiple source layers from linear mixtures thereof, involving unknown linear mixing coefficients and unknown motions of layers in each mixture. Such mixtures can be caused in photography by the presence of a transparent medium, like a window glass, when the camera or the medium moves between snapshots. To understand how to achieve correct separation, we study the statistics of natural images in the Labelme data set. We not only confirm the well-known sparsity of image gradients, but also discover new joint behavior patterns of image gradients. Based on these statistical properties, we develop a sparse blind separation algorithm to estimate both layer motions and linear mixing coefficients and then recover all layers. This method can handle general parameterized motions, including translations, scalings, rotations and other transformations. In addition, the number of layers is automatically identified, and all layers can be recovered even in the under-determined case where mixtures are fewer than layers. The effectiveness of this technology is shown in both simulated and real superimposed images.
June 22, 2011 [Wednesday]:
Title: Robust Treatment of Collisions, Contact and Friction for Cloth Animation (Siggraph 2005) Presenter: Zhenting Wu Time: 13.00-14.00 Slide: Abstract: We present an algorithm to efficiently and robustly process collisions, contact and friction in cloth simulation. It works with any technique for simulating the internal dynamics of the cloth, and allows true modeling of cloth thickness. We also show how our simulation data can be post-processed with a collision-aware subdivision scheme to produce smooth and interference free data for rendering.
June 15, 2011 [Wednesday]:
Title: Object Detection with Discriminatively Trained Part Based Models (PAMI 2010) Presenter: Coert van Gemeren Time: 13.00-14.00 Slide: Link: website Abstract: We describe an object detection system based on mixtures of multiscale deformable part models. Our system is able to represent highly variable object classes and achieves state-of-the-art results in the PASCAL object detection challenges. While deformable part models have become quite popular, their value had not been demonstrated on difficult benchmarks such as the PASCAL datasets. Our system relies on new methods for discriminative training with partially labeled data. We combine a margin-sensitive approach for data-mining hard negative examples with a formalism we call latent SVM. A latent SVM is a reformulation of MI-SVM in terms of latent variables. A latent SVM is semi-convex and the training problem becomes convex once latent information is specified for the positive examples. This leads to an iterative training algorithm that alternates between fixing latent values for positive examples and optimizing the latent SVM objective function.
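Illustration (not from the paper's code): a latent SVM scores an example by maximizing a linear score over latent values, f(x) = max_z w . phi(x, z). The sketch below shows that scoring step on a toy 1D signal, where the latent value is the position of a small "part" window. The names `latent_score` and `window_features`, the signal, and the weights are all hypothetical and chosen only for this example; training, hard-negative mining, and the deformable part model itself are not shown.

```python
def dot(w, phi):
    """Plain inner product of two equal-length sequences."""
    return sum(wi * pi for wi, pi in zip(w, phi))


def latent_score(w, x, latent_values, feature_fn):
    """Return (score, z*) for f(x) = max over z of w . phi(x, z)."""
    best_z = max(latent_values, key=lambda z: dot(w, feature_fn(x, z)))
    return dot(w, feature_fn(x, best_z)), best_z


def window_features(x, z):
    """Hypothetical feature map: the two samples covered by a part placed at z."""
    return [x[z], x[z + 1]]


signal = [0.0, 0.2, 1.0, 0.9, 0.1]
weights = [1.0, 1.0]
placements = range(len(signal) - 1)   # every valid window position

score, z_star = latent_score(weights, signal, placements, window_features)
```

Fixing `z_star` for each positive example is what makes the training problem convex in `weights`, which is the alternation the abstract describes.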
June 8, 2011 [Wednesday]:
Title: Simultaneous surveillance camera calibration and foot-head homology estimation from human detections Presenter: Frank Evers Time: 13.00-14.00 Slide: Slides Abstract: We propose a novel method for automatic camera calibration and foot-head homology estimation by observing persons standing at several positions in the camera field of view. We demonstrate that the human body can be considered as a calibration target, thus avoiding special calibration objects or manually established fiducial points. First, by assuming roughly parallel human poses we derive a new constraint which allows us to formulate the calibration of internal and external camera parameters as a Quadratic Eigenvalue Problem. Secondly, we couple the calibration with an improved effective integral contour based human detector and use 3D projected models to capture a large variety of person and camera mutual positions. The resulting camera autocalibration method is very robust and efficient, and thus well suited for surveillance applications where the camera calibration process cannot use special calibration targets and must be simple.
May 25, 2011 [Wednesday]:
Title: A Line Segment Detector Presenter: Dennis Grootendorst Time: 13.00-14.00 Slide: Abstract: We propose a parameterless linear-time line segment detector that gives accurate results and a controlled number of false detections. This algorithm is tested and compared to state-of-the-art algorithms on a wide set of natural images.
May 18, 2011 [Wednesday]:
Title: Markerless Motion Capture of Interacting Characters Using Multi-view Image Segmentation (CVPR 2011) Presenter: Nico van der Aa Time: 13.00-14.00 Slide: Link: website | video Abstract: We present a markerless motion capture approach that reconstructs the skeletal motion and detailed time-varying surface geometry of two closely interacting people from multi-view video. Due to ambiguities in feature-to-person assignments and frequent occlusions, it is not feasible to directly apply single-person capture approaches to the multi-person case. We therefore propose a combined image segmentation and tracking approach to overcome these difficulties. A new probabilistic shape and appearance model is employed to segment the input images and to assign each pixel uniquely to one person. Thereafter, a single-person markerless motion and surface capture approach can be applied to each individual, either one-by-one or in parallel, even under strong occlusions. We demonstrate the performance of our approach on several challenging multi-person motions, including dance and martial arts, and also provide a reference dataset for multi-person motion capture with ground truth.
May 11, 2011 [Wednesday]:
Title: Piecewise Planar and Non-Planar Stereo for Urban Scene Reconstruction (CVPR 2010) Presenter: Rob Schuddeboom Time: 13.00-14.00 Slide: Paper: online lecture Abstract: Piecewise planar models for stereo have recently become popular for modeling indoor and urban outdoor scenes. The strong planarity assumption overcomes the challenges presented by poorly textured surfaces, and results in low complexity 3D models for rendering, storage, and transmission. However, such a model performs poorly in the presence of non-planar objects, for example, bushes, trees, and other clutter present in many scenes. We present a stereo method capable of handling more general scenes containing both planar and non-planar regions. Our proposed technique segments an image into piecewise planar regions as well as regions labeled as non-planar. The non-planar regions are modeled by the results of a standard multi-view stereo algorithm. The segmentation is driven by multi-view photoconsistency as well as the result of a color- and texture-based classifier, learned from hand-labeled planar and non-planar image regions. Additionally our method links and fuses plane hypotheses across multiple overlapping views, ensuring a consistent 3D reconstruction over an arbitrary number of images. Using our system, we have reconstructed thousands of frames of street-level video. Results show our method successfully recovers piecewise planar surfaces alongside general 3D surfaces in challenging scenes containing large buildings as well as residential houses.
May 4, 2011 [Wednesday]: CANCELED
Title: Stereo Matching with Linear Superposition of Layers Presenter: Jonas Koperdraat Time: 13.00-14.00 Slide: Abstract: We address stereo matching in the presence of a class of non-Lambertian effects, where image formation can be modeled as the additive superposition of layers at different depths. The presence of such effects makes it impossible for traditional stereo vision algorithms to recover depths using direct color matching-based methods. We develop several techniques to estimate both depths and colors of the component layers. Depth hypotheses are enumerated in pairs, one from each layer, in a nested plane sweep. For each pair of depth hypotheses, matching is accomplished using spatial-temporal differencing. We then use graph cut optimization to solve for the depths of both layers. This is followed by an iterative color update algorithm which we proved to be convergent. Our algorithm recovers depth and color estimates for both synthetic and real image sequences.
April 14, 2011 [Thursday]
Title: Secrets of Optical Flow Estimation and Their Principles Presenter: Tomas Hodan Time: 11.00-12.00 Slide: Link: author's related talk | matlab code Abstract: The accuracy of optical flow estimation algorithms has been improving steadily as evidenced by results on the Middlebury optical flow benchmark. The typical formulation, however, has changed little since the work of Horn and Schunck. We attempt to uncover what has made recent advances possible through a thorough analysis of how the objective function, the optimization method, and modern implementation practices influence accuracy. We discover that “classical” flow formulations perform surprisingly well when combined with modern optimization and implementation techniques. Moreover, we find that while median filtering of intermediate flow fields during optimization is a key to recent performance gains, it leads to higher energy solutions. To understand the principles behind this phenomenon, we derive a new objective that formalizes the median filtering heuristic. This objective includes a non-local term that robustly integrates flow estimates over large spatial neighborhoods. By modifying this new term to include information about flow and image boundaries we develop a method that ranks at the top of the Middlebury benchmark.
April 7, 2011 [Thursday]
Title: 3D People Tracking with Gaussian Process Dynamical Models Presenter: Reinier Noorda Time: 11.00-12.00 Slide: Link: video Abstract: We advocate the use of Gaussian Process Dynamical Models (GPDMs) for learning human pose and motion priors for 3D people tracking. A GPDM provides a low-dimensional embedding of human motion data, with a density function that gives higher probability to poses and motions close to the training data. With Bayesian model averaging a GPDM can be learned from relatively small amounts of data, and it generalizes gracefully to motions outside the training set. Here we modify the GPDM to permit learning from motions with significant stylistic variation. The resulting priors are effective for tracking a range of human walking styles, despite weak and noisy image measurements and significant occlusions.
March 31, 2011 [Thursday]
Title: Gradient Space Manipulation and Shadow Removal Presenter: Bart Liefers [project page] Time: 11.00-12.00 Slide: Abstract: This paper is concerned with the derivation of a progression of shadow-free image representations. First we show that adopting certain assumptions about lights and cameras leads to a 1-d, grey-scale image representation which is illuminant invariant at each image pixel. We show that as a consequence, images represented in this form are shadow-free. We then extend this 1-d representation to an equivalent 2-d, chromaticity representation. We show that in this 2-d representation, it is possible to re-light all the image pixels in the same way, effectively deriving a 2-d image representation which is additionally shadow-free. Finally, we show how to recover a 3-d, full colour shadow-free image representation by first (with the help of the 2-d representation) identifying shadow edges. We then remove shadow edges from the edge-map of the original image by edge in-painting, and we propose a method to re-integrate this thresholded edge map, thus deriving the sought-after 3-d shadow-free image.
March 24, 2011 [Thursday]
Title: TextonBoost for Image Understanding: Multi-Class Object Recognition and Segmentation by Jointly Modeling Texture, Layout, and Context Presenter: Rick van Rooij Time: 11.00-12.00 Slide: author's PPT | Rick's PPT Link: project page | video 1 | video 2 Abstract: Object detection and pixel-wise scene labeling have both been active research areas in recent years and impressive results have been reported for both tasks separately. The integration of these different types of approaches should boost performance for both tasks as object detection can profit from powerful scene labeling and also pixel-wise scene labeling can profit from powerful object detection. Consequently, first approaches have been proposed that aim to integrate both object detection and scene labeling in one framework. This paper proposes a novel approach based on conditional random field (CRF) models that extends existing work by 1) formulating the integration as a joint labeling problem of object and scene classes and 2) by systematically integrating dynamic information for the object detection task as well as for the scene labeling task. As a result, the approach is applicable to highly dynamic scenes including both fast camera and object movements. Experiments show the applicability of the novel approach to challenging real-world video sequences and systematically analyze the contribution of different system components to the overall performance.
-
March 17, 2011 [Thursday]
Title: Shape from Shading: a well-posed problem? Presenter: Thijs Zumbrink [project page] Time: 11.00-12.00 Slide: Link: project page | author's website Abstract: Shape From Shading is known to be an ill-posed problem. Contrary to the previous work, we show here that if we model the problem in a more realistic way than it is usually done (we take into account the 1/r^2 attenuation term of the lighting), Shape From Shading can be completely well-posed. Thus the shading allows one to recover (almost) any surface from only one image (of this surface), without any additional data (in particular, without regularity assumptions and without the knowledge of the heights of the solution at the local "minima"). More precisely, in this report we formulate the problem as that of solving a new PDE, we develop a complete mathematical study of this equation (existence and uniqueness of the solution) and we design a new provably convergent numerical method. Finally, we test our new SFS method on various synthetic images and on our database of real images of faces, with success.
-
March 10, 2011 [Thursday]
Title: Real-time Body Tracking Using a Gaussian Process Latent Variable Model Presenter: Xinghan Luo Time: 11.00-12.00 Slide: Link: website Abstract: In this paper, we present a tracking framework for capturing articulated human motions in real-time, without the need for attaching markers onto the subject’s body. This is achieved by first obtaining a low dimensional representation of the training motion data, using a nonlinear dimensionality reduction technique called back-constrained GPLVM. A prior dynamics model is then learnt from this low dimensional representation by partitioning the motion sequences into elementary movements using an unsupervised EM clustering algorithm. The temporal dependencies between these elementary movements are efficiently captured by a Variable Length Markov Model. The learnt dynamics model is used to bias the propagation of candidate pose feature vectors in the low dimensional space. By combining this with an efficient volumetric reconstruction algorithm, our framework can quickly evaluate each candidate pose against image evidence captured from multiple views. We present results that show our system can accurately track complex structured activities such as ballet dancing in real-time.
-
March 3, 2011 [Thursday]
Title: Image-based Fog Removal Presenter: Yuri Parijs [project page] Time: 11.00-12.00 Slide: Link: Tarel et al's results | Robby Tan's results | He et al's results Abstract: One source of difficulties when processing outdoor images is the presence of haze, fog or smoke which fades the colors and reduces the contrast of the observed objects. We introduce a novel algorithm and variants for visibility restoration from a single image. The main advantage of the proposed algorithm compared with others is its speed: its complexity is a linear function of the number of image pixels only. This speed allows visibility restoration to be applied for the first time within real-time processing applications such as sign, lane-marking and obstacle detection from an in-vehicle camera. Another advantage is the possibility to handle both color images and gray level images, since the ambiguity between the presence of fog and the objects with low color saturation is solved by assuming only small objects can have colors with low saturation. The algorithm is controlled only by a few parameters and consists of: atmospheric veil inference, image restoration and smoothing, and tone mapping.
-
February 24, 2011 [Thursday]
Title: Part and Appearance Sharing: Recursive Compositional Models for Multi-View Multi-Object Detection Presenter: Coert van Gemeren [project page] Time: 11.00-12.00 Slide: PDF Link: author's website Abstract: We propose Recursive Compositional Models (RCMs) for simultaneous multi-view multi-object detection and parsing (e.g. view estimation and determining the positions of the object subparts). We represent the set of objects by a family of RCMs where each RCM is a probability distribution defined over a hierarchical graph which corresponds to a specific object and viewpoint. An RCM is constructed from a hierarchy of subparts/subgraphs which are learnt from training data. Part-sharing is used so that different RCMs are encouraged to share subparts/subgraphs which yields a compact representation for the set of objects and which enables efficient inference and learning from a limited number of training samples. In addition, we use appearance-sharing so that RCMs for the same object, but different viewpoints, share similar appearance cues which also helps efficient learning. RCMs lead to a multi-view multi-object detection system. We illustrate RCMs on four public datasets and achieve state-of-the-art performance.
-
February 17, 2011 [Thursday]
Title: Active Self-calibration of Multi-camera Systems Presenter: Nico van der Aa Time: 11.00-12.00 Slide: PDF Abstract: The paper presents a method for actively calibrating a multi-camera system consisting of pan-tilt zoom cameras. After a coarse initial calibration, the proposed method determines the probability of each relative pose using a probability distribution based on the camera images. The relative poses are optimized by rotating and zooming each camera pair in a way that significantly simplifies the problem of extracting correct point correspondences. In a final step we use active camera control, the optimized relative poses, and their probabilities to calibrate the complete multi-camera system with a minimal number of relative poses. During this process it estimates the translation scales in a camera triangle using only two of the three relative poses and no point correspondences. Quantitative experiments on real data outline the robustness and accuracy of our approach.
-
February 10, 2011 [Thursday]
Title: Real-time Wide area Multi-Camera Stereo tracking Presenter: Dennis Grootendorst Time: 11.00-12.00 Slide: PDF Abstract: The paper presents a fully integrated real-time system to track humans with a network of stereo sensors over a wide area. The processing includes single camera tracking and multi-camera fusion. Each single camera detects and tracks humans in its own view and a multi-camera fusion module combines all the local tracks of the same human into a global track. The paper proposes novel stereo segmentation and tracking techniques to handle multiple humans moving in groups in cluttered environments. It has developed a ground-based fusion method for camera handoff using space-time constraint. It shows results and performance evaluation on very challenging data from a 12-camera system.
-
January 19, 2011 [Wednesday]
Title: Hand-tracking based interactions for mobile devices Presenter: Willem-Jan Spoel Time: 13.00-14.00 Slide:
Additional Materials: PDF | Video Abstract: Mobile devices now offer more and more functionality thanks to better hardware and software, and as a result their user interfaces have become more complex. At the moment multiple interfaces are often used to interact with a mobile device (haptics, voice commands). A problem that haptic interfaces have in common is that the sensitive region is very small, limited to at most the size of the phone. Also, when using the touch screen, you limit at the same time its most important function: visibility. A different method of interacting with a mobile device, which doesn’t block visibility and isn't limited by the size of the screen, is to use the camera of the device. Using that camera, hand gesture information can be captured and used for interaction with the mobile device. The main purpose of the project is tracking a hand using a single camera of a mobile device and extracting pose and gesture information. Because a mobile device is used, efficient and robust tracking and recognition of the hand is important.
-
December 22, 2010 [Wednesday]
Title: (1) Sparse 3D Reconstruction from Videos
(2) Wide Baseline Stereo and Matching Presenter: Frank Evers and Dennis Grootendorst Time: 13.00-14.00 Slides: Wide-baseline stereo and matching [PDF]
Abstract: (1) Sparse 3D Reconstruction from Videos:
This is the report on the experimentation project of master student (Game and Media Technology) Frank Evers, carried out under supervision of dr. Robby Tan. The purpose of this project is to acquaint the student with carrying out experiments for research and/or development purposes. The goal of this experimentation is to do a sparse 3D reconstruction from videos without any prior knowledge of the camera calibration. Because of the large (and increasing) availability of online videos on websites such as YouTube, it is interesting again to look at 3D reconstruction from video. This problem, known as Structure from Motion, has been researched extensively in the past, but now that there is so much more data available it has become an interesting topic once again. With the use of this large set of online videos we would then be able to automatically create 3D reconstructions of scenery all around the world. When combined with their meta-data such as title, description and GPS coordinates, these reconstructions could be linked to actual places in the world, thus allowing for a searchable 3D earth.
(2) Wide Baseline Stereo and Matching
Stereo matching is an important but difficult topic. The DAISY descriptor facilitates efficient computation of a local descriptor for dense matching. First, a pipeline for dense stereo matching is presented; then the performance of the DAISY descriptor is compared with that of SIFT.
-
December 8, 2010 [Wednesday]
Title: (1) Facial Expression Classification
(2) Stereo Depth Estimation From Video Sequence Presenter: Jan de Wit and Bob Boggemann Time: 13.00-14.00 Slides: Abstract: (1) Facial Expression Classification:
As an experimentation project supervised by Dr. Robby T. Tan, Jan (second year master student) has attempted to combine existing techniques to create a full pipeline for emotion classification from frontal face images. This includes face detection, feature detection and extraction and finally classification between seven different emotions. After detecting the face using Viola-Jones, Shape Models (snakes) are used to get a face model of which various measurements are derived. Finally, an attempt was made to extract texture data using Canny edge detection. Classification is currently based mostly on mouth data and is implemented using AdaBoost. The final challenge is to combine the classifications from various sources (texture, shape model) into a final decision that is more robust than the individual classifications. The extended Cohn-Kanade (CK+) database is used to train the pipeline and test its performance.
(2) Stereo Depth Estimation From Video Sequence
This report is about an experimentation project performed by Master student Bob Böggemann under supervision of dr. Robby T. Tan. The goal is to recover consistent dense depth maps from a static video sequence recorded by a freely moving camera. The framework used for depth estimation is mainly derived from (Zhang, Jia, Wong, & Bao, June 2009) and has been modified to fit our purpose. It consists of a structure from motion technique to derive camera parameters, a Markov Random Field to model the disparity likelihood and smoothness constraints, a segmentation step to better handle texture-less regions and a bundle optimization step to obtain temporal coherence. The photo consistency and geometric coherence constraints assure that after the bundle optimization step the resulting depth map sequence will be just as fluent as the original input video. Finally, the framework is evaluated by performing experiments on several video sequences.
-
December 1, 2010 [Wednesday]
Title: Introduction to Gaussian Process Latent Variable Model Authors: N. Lawrence, J. Wang, A. Hertzmann, R. Urtasun, C. Rasmussen, et al. Presenter: Robby Time: 13.00-14.00 Slides: PDF Lecture: Gaussian Process | GPLVM | GPDM | 3D Tracking Website: Gaussian Processes (illustration) Abstract: In this reading group, the Gaussian Process Latent Variable Model (GPLVM) will be introduced. The model is useful in at least two applications: (1) dimensionality reduction, and (2) priors for visual tracking with specific motion. After GPLVM, the discussion will continue with the Gaussian Process Dynamical Model (GPDM), which is an extension of GPLVM for probabilistic dynamical models. If time allows, the applications of GPLVM/GPDM in human pose estimation will also be discussed. Note that the basic concept of Gaussian Processes will be briefly discussed at the beginning of the session.
-
November 24, 2010
No reading group!
-
November 17, 2010 [Wednesday]
Title: Hybrid Multi-view Reconstruction by Jump-Diffusion (CVPR 2010, oral) Authors: F. Lafarge, R. Keriven, M. Bredif, H. Vu Presenter: Arjen Time: 13.00-14.00 Slides: PPT Lecture: online video Abstract: The paper proposes a multi-view stereo reconstruction algorithm which recovers urban scenes as a combination of meshes and geometric primitives. It provides a compact model while preserving details: irregular elements such as statues and ornaments are described by meshes whereas regular structures such as columns and walls are described by primitives (planes, spheres, cylinders, cones and tori). A Jump-Diffusion process is designed to sample these two types of elements simultaneously. The quality of a reconstruction is measured by a multi-object energy model which takes into account both photo-consistency and semantic considerations (i.e. geometry and shape layout). The sampler is embedded into an iterative refinement procedure which provides an increasingly accurate hybrid representation. Experimental results on complex urban structures and large scenes are presented and compared to multi-view based meshing algorithms.
-
November 10, 2010
Title: Monocular 3D Pose Estimation and Tracking by Detection (CVPR 2010, oral) Authors: M. Andriluka, S. Roth, B. Schiele Presenter: Nico Time: 13.00-14.00 Slides: PDF Lecture: online video More Info: project page (videos + data) Abstract: Automatic recovery of 3D human pose from monocular image sequences is a challenging and important research topic with numerous applications. Although current methods are able to recover 3D pose for a single person in controlled environments, they are severely challenged by real-world scenarios, such as crowded street scenes. To address this problem, we propose a three-stage process building on a number of recent advances. The first stage obtains an initial estimate of the 2D articulation and viewpoint of the person from single frames. The second stage allows early data association across frames based on tracking-by-detection. These two stages successfully accumulate the available 2D image evidence into robust estimates of 2D limb positions over short image sequences (= tracklets). The third and final stage uses those tracklet-based estimates as robust image observations to reliably recover 3D pose. We demonstrate state-of-the-art performance on the HumanEva II benchmark, and also show the applicability of our approach to articulated 3D tracking in realistic street conditions.
-
November 3, 2010
Presenter: Willem-Jan Time: 13.00-14.00 Title: On Detection of Multiple Object Instances using Hough Transforms (Oral CVPR 2010) Authors: O. Barinova, V. Lempitsky, P. Kohli Slides: PPT Lecture: online video More Info: video + code Abstract: To detect multiple objects of interest, the methods based on Hough transform use non-maxima suppression or mode seeking in order to locate and to distinguish peaks in Hough images. Such postprocessing requires tuning of extra parameters and is often fragile, especially when objects of interest tend to be closely located. In the paper, we develop a new probabilistic framework that is in many ways related to Hough transform, sharing its simplicity and wide applicability. At the same time, the framework bypasses the problem of multiple peaks identification in Hough images, and permits detection of multiple objects without invoking nonmaximum suppression heuristics. As a result, the experiments demonstrate a significant improvement in detection accuracy both for the classical task of straight line detection and for a more modern category-level (pedestrian) detection problem.
-
October 27, 2010
Presenter: Robby Time: 13.00-14.00 Title: Modeling Temporal Structure of Decomposable Motion Segments for Activity Classification (Oral ECCV 2010) Authors: J. Niebles, C. Chen, L. Fei-Fei Slides: PPT More Info: Website Abstract: Much recent research in human activity recognition has focused on the problem of recognizing simple repetitive (walking, running, waving) and punctual actions (sitting up, opening a door, hugging). However, many interesting human activities are characterized by a complex temporal composition of simple actions. Automatic recognition of such complex actions can benefit from a good understanding of the temporal structures. The paper presents a framework for modeling motion by exploiting the temporal structure of the human activities. It represents activities as temporal compositions of motion segments. It trains a discriminative model that encodes a temporal decomposition of video sequences, and appearance models for each motion segment. In recognition, a query video is matched to the model according to the learned appearances and motion segment decomposition. Classification is made based on the quality of matching between the motion segment classifiers and the temporal segments in the query sequence. To validate, the paper introduces a new dataset of complex Olympic Sports activities. It shows that the algorithm performs better than other state of the art methods.
-
October 20, 2010
Presenter: Nico Time: 13.00 - 14.00 Title: Efficient Extraction of Human Motion Volumes by Tracking (CVPR 2010) Authors: J. Niebles, B. Han, L. Fei-Fei Slides: PDF Abstract: The paper presents an automatic and efficient method to extract spatio-temporal human volumes from video, which combines top-down model-based and bottom-up appearance-based approaches. From the top-down perspective, the algorithm applies shape priors probabilistically to candidate image regions obtained by pedestrian detection, and provides accurate estimates of the human body areas which serve as important constraints for bottom-up processing. Temporal propagation of the identified region is performed with bottom-up cues in an efficient level-set framework, which takes advantage of the sparse top-down information that is available. The formulation also optimizes the extracted human volume across frames through belief propagation and provides temporally coherent human regions. The paper demonstrates the ability of the method to extract human body regions efficiently and automatically from a large, challenging dataset collected from YouTube.
Description:
One of the purposes of the reading group is to closely follow developments in computer vision, particularly those related to the projects we
have in the Multimedia and Geometry Group at Utrecht University. Currently, we have two projects: (1) visual tracking and pose-gesture recognition,
(2) 3D reconstruction (structure from motion, multiple view stereo, the geometry of 3D points, etc.). Aside from computer vision, we also include techniques in machine learning that are useful
or potentially useful for our research. Of course, other interesting topics in computer vision are always welcome.
There are two types of members: (1) active members, and (2) passive members. Passive members do not have any obligation,
while active members are asked to give presentations.
Coordinator: Robby T. Tan
If you are not a UU student and want to join, please send an e-mail to:
tanrobby (at) gmail.com
Meeting Location:
- BBL-445
Buys Ballot Laboratory
Department of Information and Computing Sciences, Universiteit Utrecht
Princetonplein 5, De Uithof, 3584 CC Utrecht
Presentation Schedule:
The following schedule is subject to change. It gives only approximate dates, so that the students/presenters can better prepare their talks.
| Date | Presenter | Topic |
|---|---|---|
| 23/5/2012 | Marijn Koetzier | Kinect-based hand detection and fitting |
| 30/5/2012 | | |
| 6/6/2012 | Antolin Janssen | AAM-based Hand Fitting |
| 13/6/2012 | Lifang Chen | Activity Recognition: 3D SIFT |
| 20/6/2012 | Vikram Doshi | A Morphable Model For The Synthesis Of 3D Faces |
| 27/6/2012 | Michael Hobbel | 3D face reconstruction |
| 4/7/2012 | Maurits Lam | Rendering Layered Surfaces |
| 11/7/2012 | Patrick Jansen | Micro-bleeding Detection in Retina |