Door detection in images integrating appearance and shape cues, A.C. Murillo, J. Košecká, J.J. Guerrero, C. Sagüés

DIIS - I3A
Universidad de Zaragoza
C/ María de Luna num.
E-5008 Zaragoza, Spain

Internal Report: 2007-V09

Door detection in images integrating appearance and shape cues
A.C. Murillo, J. Košecká, J.J. Guerrero, C. Sagüés

If you want to cite this report, please use the following reference instead: Door detection in images integrating appearance and shape cues, A.C. Murillo, J. Košecká, J.J. Guerrero, C. Sagüés, Workshop: From Sensors to Human Spatial Concepts, pages 4-48, IEEE/RSJ International Conference on Intelligent Robots and Systems, San Diego, USA.

This work was supported by projects DPI , IST URUS-STP and NSF Grant No. IIS

Abstract

An important component of human-robot interaction is the capability to associate semantic concepts with encountered locations and objects. This functionality is essential for visually guided navigation as well as for place and object recognition. In this paper we focus on the problem of door detection using visual information only. Doors are frequently encountered in structured man-made environments and function as transitions between different places. We adopt a probabilistic approach to the problem, using model-based Bayesian inference to detect the door. Unlike previous approaches, the proposed model captures both the shape and the appearance of the door. The model is learned from a few training examples, exploiting additional assumptions about the structure of indoor environments. After the learning stage, we describe a hypothesis generation process and several approaches to evaluate the probability of each generated hypothesis. The new proposal is tested on numerous examples of indoor environments, showing good performance as long as enough features are encountered.

I. INTRODUCTION

In this paper we present a new technique for detecting doors in perspective images of indoor environments using only visual information.
Detection of doors is of great importance for various navigation and manipulation tasks. Doors often separate different locations, and can be used as landmarks for navigation and/or relative positioning, or as hints to define exploration and SLAM strategies [1]. They also need to be recognized for door opening and navigation to neighbouring rooms [2], [3].

The problem of door detection has been studied numerous times in the past. The existing approaches differ in the type of sensors they use and in the variability of the environments/images they consider. For example, doors are detected using both visual information and range data (sonar) in [4] and [5]. In [4] the authors exploit the fact that vision is good at providing long-range information (beyond the range of the ultrasound sensor); they detect and group the vertical lines based on the expected dimensions of the door to form initial door hypotheses. In [5] the authors tackle the more general problem of obtaining a model of the environment defined by instantiations of several objects of predefined classes (e.g. doors, walls), given range data and color images from an omnidirectional camera. The doors are then detected as particular instantiations of the door model, given all the sensory data. The door hypotheses are obtained by fitting linear segments to laser range data and the associated color values from the omnidirectional camera. In [6] the authors focus on handling the variations in door appearance due to camera pose, by characterizing properties of the individual segments using linguistic variables of size, direction and height, and combining the evidence using fuzzy logic. Additional work using visual information only was reported in [7], where only geometric information about configurations of line segments is used. In most instances, only the doors that were clearly visible and close to the observer were selected as correct hypotheses.
An additional motivation for revisiting the door detection problem is to explore the suitability of general object detection/recognition techniques for the door detection/recognition problem. A more general goal is to use the obtained insights to tackle recognition and detection of additional objects in the environment that do not have a very discriminative appearance and are to a large extent defined by their shape, but whose shape projection varies dramatically as a function of viewpoint, e.g. tables or shelves.

The object recognition techniques that are being explored extensively in computer vision commonly adopt so-called part-based models, which represent objects in terms of parts [8] and the spatial relationships between them. Learning the object parts for different object classes is often the first stage of existing approaches. The classification methods then vary depending on whether a full generative model is sought, discriminative techniques are used, or a combination of both. In the simplest of the generative model settings, the recognition stage proceeds with the computation of $p(\text{object} \mid X, A)$, where $X$, $A$ are the positions and appearance of the object parts and $p(X, A \mid \theta)$ is learned from training examples, where $\theta$ are the parameters chosen to describe the object. With the discriminative approaches, multi-class classifiers are trained to distinguish between the low-level features/parts characteristic of a particular class [9], and typically proceed in a supervised or weakly supervised setting. In the robotic domain the discriminative approach has been applied to place and door recognition using the AdaBoost learning procedure, with geometric features computed from laser data and Haar-like features computed from images as input features [10]. In this work, we explore the Bayesian approach, where we learn the probability distribution $P(\text{Door} \mid \theta)$ in a parametric form in a supervised setting.
As an alternative to the part-based representation assumed in [8], we pursue a model-based approach, where the geometry of the door is specified by a small number of parameters and the appearance is learned from a few training examples. This type of representation resembles models used in the interpretation of architectural styles and man-made environments, where the analysed scenes are typically well characterized by a small number of geometric/architectural primitives [11]. Instead of proposing a generative model of the whole image, we use the constraints of man-made environments to reduce the search space and to generate multiple hypotheses, and we use the learned probability distribution to evaluate their likelihood.

A. C. Murillo, J.J. Guerrero and C. Sagüés are with the I3A/DIIS at University of Zaragoza, 5000 Spain. J. Košecká is with the Department of Computer Science, George Mason University, Fairfax, VA, 22030, USA.

Outline. Section II describes the probabilistic model we adopt. The hypothesis generation process is explained in Section III, followed in Section IV by the approaches for learning the appearance likelihood of the model. Section V explains how the likelihood evaluation of each hypothesis is performed. Finally, Sections VI and VII present initial door detection experiments and some conclusions of the work.

II. PROBLEM FORMULATION

Fig. 1. Model of a door (width $w$, height $h$, corners $C_1$ to $C_4$) and components of one of its corner features ($C_1$): corner location $(x, y)$ and the lines that give rise to the cross point ($l_H$, $l_V$).

We will assume that the door model, see Fig. 1, is well described by a small set of parameters $\theta$. Ideally, if we were to pursue a fully Bayesian approach, we would first learn or have at our disposal prior distributions of these parameters.
We start with a restricted, simple setting where we seek to compute $p(\text{Object} \mid X, A)$, with $X$, $A$ being the positions and appearance of low-level features detected in the image: $p(\text{Object} \mid X, A) \propto p(X, A \mid \text{Object})\,P(\text{Object})$. Assuming that all objects are equally likely and that our object of interest can be well described by a small set of parameters $\theta = (\theta_S, \theta_A)$, shape and appearance parameters respectively, this posterior probability can be decomposed:

$$P(\theta \mid X, A) \propto P(X, A \mid \theta)\,P(\theta) = P(X, A \mid \theta_S, \theta_A)\,P(\theta_S, \theta_A) = P(X, A \mid \theta_S, \theta_A)\,P(\theta_A)\,P(\theta_S). \qquad (1)$$

We consider the parameters $\theta_S$ and $\theta_A$ to be independent, i.e. the appearance (color/texture) of a primitive is independent of its shape and vice versa. The interpretation of the final terms in (1) is as follows:

$P(\theta_S)$ represents the prior knowledge about the geometric shape parameters of the door, for instance the ratio between the width and height of the door or the position of the $c_3$ corner, which should be touching the floor.

$P(\theta_A)$ is the prior information on the appearance of the object, in our case doors. This information is typically learned from examples. In this work we exploit only color information, but more elaborate appearance models based on texture can be incorporated.

$P(X, A \mid \theta_S, \theta_A)$ is the likelihood term of individual measurements, given a particular instantiation of the model parameters $\theta = (\theta_S, \theta_A)$.

In the presented work, we do not use the priors $P(\theta_A)$ and $P(\theta_S)$ and consider only maximum likelihood values of the parameters $\theta_S$ and $\theta_A$, which for geometry are given by a known model and for appearance are learned in a supervised setting. The likelihood term can be further factored as

$$P(X, A \mid \theta_S, \theta_A) = P(A \mid X, \theta_S, \theta_A)\,P(X \mid \theta_A, \theta_S).$$

The shape likelihood term used in this work is explained in Section V-B, and the possible choices for the appearance likelihoods are described and evaluated in Sections IV and V-A.

III. HYPOTHESES GENERATION

The selection of individual hypotheses consists of a low-level feature extraction process followed by the instantiation of initial hypotheses, which we describe next.

A. Feature extraction

First, line segments are extracted from the image with our implementation of the method in [12], and the vanishing points are estimated with the approach described in [13]. Using the vanishing point information, the line segments are grouped into two sets: lines aligned with the vertical vanishing direction, and lines aligned with either the horizontal direction or the z optical axis. All possible intersections between the vertical lines and the remaining set of lines are computed. The intersection points that have a low corner response (measured by the Harris corner quality function) are rejected. Figure 2 shows an example of the extracted lines grouped using the vanishing information (vertical ones in red, non-vertical ones in blue). In the same figure, all the intersection points that were close to line segments are shown with a cross (+), and those that remained after filtering for a high cornerness response are additionally marked with squares. Finally, the detected intersections are classified into 4 types ($c_1$, $c_2$, $c_3$ and $c_4$), according to the kind of corner that they produce (see Fig. 3).

B. Instantiation of initial hypotheses

The corner features detected in the previous stage are grouped into sets of compatible ones, which are used to define initial hypotheses. In the first stage, pairs of compatible corners ($\{c_1, c_2\}$, $\{c_1, c_3\}$, $\{c_2, c_4\}$ and $\{c_3, c_4\}$) are found. To decide whether a pair of corners is compatible, we take into account their alignment, according to the directions of the lines ($l_V$, $l_H$) that generated those corner features. For example, a corner of type $c_1$ is considered compatible with all corners of type $c_2$ that are to the right of the $c_1$ corner and whose respective line segments $l_H$ are aligned up to a small threshold.
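The pairing rule for $c_1$ and $c_2$ corners can be illustrated with a minimal sketch. This is not the authors' implementation: the `Corner` class, the way alignment is measured (angle difference between the two $l_H$ directions plus perpendicular offset), and both threshold values are assumptions made here for illustration, since the report only states that the lines must be aligned "up to a small threshold".

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Corner:
    """Hypothetical corner feature: type, image location, direction of its l_H line."""
    kind: int       # corner type 1..4 (c1 top-left, c2 top-right, as suggested by Fig. 1)
    x: float
    y: float
    h_angle: float  # direction of the supporting line l_H, in radians


def aligned(a, b, max_angle=math.radians(3.0), max_offset=5.0):
    """True if the l_H lines of corners a and b are roughly the same line.

    Both thresholds are illustrative values, not taken from the report.
    """
    # Angle difference modulo pi, because lines have no orientation.
    d = abs(a.h_angle - b.h_angle) % math.pi
    d = min(d, math.pi - d)
    if d > max_angle:
        return False
    # Perpendicular distance from b to the line through a with direction a.h_angle.
    nx, ny = -math.sin(a.h_angle), math.cos(a.h_angle)
    offset = abs((b.x - a.x) * nx + (b.y - a.y) * ny)
    return offset <= max_offset


def compatible_c1_c2(corners):
    """All (c1, c2) pairs where c2 lies to the right of c1 and their l_H lines align."""
    c1s = [c for c in corners if c.kind == 1]
    c2s = [c for c in corners if c.kind == 2]
    return [(a, b) for a in c1s for b in c2s if b.x > a.x and aligned(a, b)]
```

The other three pair types would follow the same pattern, using $l_V$ and a vertical ordering for $\{c_1, c_3\}$ and $\{c_2, c_4\}$.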
This search for two-corner hypotheses is followed by the intersection between these sets of 2 corners, obtaining sets of 3 compatible corners: $\{c_1, c_2, c_3\}$, $\{c_1, c_3, c_4\}$, $\{c_1, c_2, c_4\}$, $\{c_2, c_3, c_4\}$. Similarly, we look for intersections between the 3-corner hypotheses to obtain hypotheses supported by 4 corners, $\{c_1, c_2, c_3, c_4\}$. After this stage, we have four types of hypotheses: those supported by 4, 3 or 2 corner features, and those composed of singleton corners that did not have compatible corner features.

Example hypotheses generated for an image are shown in Fig. 4. All the extracted corner features are marked by a square, the corners contributing to each hypothesis are marked, and the dotted lines show the area delimited by the hypothesis. Each subplot shows the hypotheses contributed by 1, 2, 3 or 4 corners respectively for the same test image. Only for the 4-corner hypotheses do all supporting corners correspond to real corner features. In the remaining cases the missing corners are generated by completing the rectangle with the information from the available corner features, using their supporting lines as shown in Fig. 5.

IV. LEARNING THE APPEARANCE MODEL

The appearance model of the door is learned in a supervised setting from a set of examples. In this work, the appearance model was learned in two steps: first, a set of training images was segmented based on color and individual segments were labelled as door or non-door segments. Then, we investigated various means of representing the appearance (in our case color) probabilistically.

Fig. 2. Line segments grouped in vanishing directions (vertical in red, non-vertical in blue), corner points detected (green +) and corner points with high corner response (black squares).

Fig. 3. Examples of line intersections and the four types of corner features that can be generated (legend: intersection point, line segment, corner types $C_1$ to $C_4$).

Fig. 4. Initial hypotheses generated for a test image. All extracted corner features are shown with a black square. Top-left: 1-corner hypotheses; top-right: 2-corner hypotheses; bottom-left: 3-corner hypotheses; bottom-right: 4-corner hypotheses.

Fig. 5. Examples of the completion of initial hypotheses supported by fewer than 4 corner features (panels: 3-corner, 2-corner and 1-corner hypothesis completion; legend: corner features supporting the hypothesis, generated missing corners for the hypothesis).

A. Reference images segmentation and labeling

The set of training images was segmented using the color-based segmentation algorithm proposed in [14] at different levels of accuracy. All obtained door segments were labelled. Fig. 6 shows two examples of reference images segmented at the finest level tried (σ = 0.5; k = 500; min size = 20), where k is the maximum number of segments and the minimal segment size is 20 pixels. The value of σ is used in the smoothing stage preceding the algorithm. This level of segmentation generated segments of desirable size, without including too many pixels outside of door regions.

Fig. 6. Two training images and their segmentations. Left: the original images. Right: segmentation obtained (the dotted pattern marks the segments labelled as doors).

B. Representations of the appearance information

Once the door segments are detected and labelled, the appearance properties of the modeled object are learned. Fig. 7 shows the door and non-door pixels from the reference images plotted in the RGB and Lab (CIE 1976 (L* a* b*)) color spaces. These distributions were studied for different levels of segmentation. Not surprisingly, for coarser segmentation levels the clusters belonging to doors were less compact. The Lab color space represents the pixels from the labelled segments in a more compact way.

Fig. 7. Reference door (red) and non-door (blue) pixels plotted in Lab (left; axes L, A, B) and RGB (right; axes Red, Green, Blue) color spaces.
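For readers who want to reproduce the Lab plot, a conversion from 8-bit RGB to CIE 1976 L*a*b* can be sketched as follows. The report does not state which conversion was used; this sketch assumes sRGB-encoded input and the D65 reference white, and the function name `srgb_to_lab` is hypothetical.

```python
def srgb_to_lab(r, g, b):
    """Convert 8-bit sRGB values to CIE 1976 L*a*b* (assumes D65 reference white)."""

    def linearize(c):
        # Undo the sRGB transfer curve to obtain linear-light values in [0, 1].
        c = c / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = linearize(r), linearize(g), linearize(b)

    # Linear sRGB to CIE XYZ (D65).
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl

    def f(t):
        # Cube root with the linear segment near zero used by CIE L*a*b*.
        delta = 6.0 / 29.0
        return t ** (1.0 / 3.0) if t > delta ** 3 else t / (3 * delta ** 2) + 4.0 / 29.0

    # Normalize by the D65 white point before applying f.
    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return 116.0 * fy - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz)
```

Applying this to every labelled pixel gives the (L, a, b) triples that would be plotted or clustered in place of the raw (r, g, b) values.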
In order to compute the appearance likelihood $P(A \mid X, \theta_A)$ that the pixels in a particular region are door pixels, we need to adopt a particular form of probability density function, or an alternative means of computing the probabilities directly. Here we examine three such choices and discuss their advantages and disadvantages:

1 - The first approach examined was motivated by earlier work on using color for face detection [15]. There, instead of assuming a parametric model of the pdf, the authors used the data (pixel counts) acquired in the training stage directly. In this setting, the probability that a given rgb pixel value $x$ belongs to a door is obtained as

$$P(x \mid \theta_A) = \frac{r(rgb)}{t(rgb)}, \qquad (2)$$

where $r(rgb)$ is the count of pixels with color $rgb$ that were labelled as door segments, and $t(rgb)$ is the total count of pixels with color $rgb$ in the reference images. Fig. 8 shows several examples of the probability assigned to each pixel following this approach.

Fig. 8. Examples of the probability assigned to each pixel based on the reference pixel counts.

2 - The color distribution of the pixels is modeled as a mixture of Gaussians. The probability of a particular rgb pixel value $x$ then takes the following form:

$$P(x \mid \theta_A) = \sum_{j=1}^{k} \alpha_j \frac{1}{\sqrt{(2\pi)^d\,|\Sigma_j|}}\; e^{-\frac{1}{2}(x - \mu_j)^T \Sigma_j^{-1} (x - \mu_j)} \qquad (3)$$

where $\alpha_j \geq 0$ and $\sum_{j=1}^{k} \alpha_j = 1$, and $\theta_A = \{\mu_j, \Sigma_j\}_{j=1}^{k}$ are the means and covariances of the learned clusters. Instead of using the entire probability distribution to determine the probability of a pixel having a door appearance, we find the Gaussian with the highest probability and assign that probability to the pixel, with $j$ the index of that Gaussian:

$$P(x \mid \theta_A) = \frac{1}{\sqrt{(2\pi)^d\,|\Sigma_j|}}\; e^{-\frac{1}{2}(x - \mu_j)^T \Sigma_j^{-1} (x - \mu_j)}. \qquad (4)$$

As with the RGB color representation, the same model can be adopted with any other three-band color representation, such as HSV or Lab. Figure 9 shows the clustering results for the RGB and Lab color spaces.
The clustering results in Fig. 9 correspond to the finest level of segmentation and also to the finer clustering (k = 10), although tests were also done with coarser levels of segmentation and clustering. Fig. 10 shows several examples of the appearance likelihoods assigned to individual pixels following this approach.

Fig. 9. K-means clustering of color from labelled door pixels in Lab (left, "LAB clusters (k = 10)") and RGB (right, "RGB clusters (k = 10)") color spaces (the big set of blue + points corresponds to non-door pixels).

Fig. 10. Examples of the probability assigned to each pixel based on the mixture of Gaussians (panels: RGB clustering (k = 10), LAB clustering (k = 10)).

3 - The mixture of Gaussians model has also been applied to color histograms of door regions. In this case, for each labelled segment in the training set, a histogram of the color distribution is built. Two ways of building these histograms have been studied.

First, only marginal histograms were considered and each color band was quantized into 50 bins, so each region is represented by a 150-bin histogram. With this representation we assume that the three color bands are independent, yielding a low-dimensional representation of the histogram. This assumption has been successfully applied before [16] and has been shown to be useful in cases where there are few training examples (as occurs in our case).

In the second approach, full histograms were considered. The color space is quantized from 24 bits (256 possible values for each of the three color bands) to 12 bits ($2^4$ possible values for each band). Each of the three color bands is thus mapped to a range between 0 and 15, giving 4096 possible colors. Each color $rgb$ is now represented as $256\,r + 16\,g + b + 1$, and a 4096-bin histogram can be computed for each region using this representation.

The histograms generated with either of these two approaches are clustered with k-means, and each cluster is represented by a centroid, mass and covariance.
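Two of the steps described above lend themselves to a short sketch: the count-based probability of equation (2) and the 12-bit color index used for the full histograms. The function names are hypothetical, pixels are assumed to be 8-bit `(r, g, b)` tuples, the quantization is done here by dropping the four low bits of each band (the report does not say how the bands are mapped to 0..15), and colors never seen in training are given probability 0, which is also an assumption.

```python
from collections import Counter


def count_ratio_model(door_pixels, all_pixels):
    """Equation (2): P(x | theta_A) = r(rgb) / t(rgb) from labelled pixel counts.

    all_pixels must contain every reference pixel, door pixels included.
    """
    r = Counter(door_pixels)
    t = Counter(all_pixels)
    # Colors absent from the reference images get probability 0 (assumption).
    return lambda rgb: r[rgb] / t[rgb] if t[rgb] else 0.0


def quantize(rgb, bits=4):
    """Reduce each 8-bit band to `bits` bits (values 0..15 for bits=4)."""
    shift = 8 - bits
    return tuple(v >> shift for v in rgb)


def color_index(rgb):
    """1-based bin index 256*r + 16*g + b + 1 over the quantized bands (1..4096)."""
    r, g, b = quantize(rgb)
    return 256 * r + 16 * g + b + 1


def region_histogram(pixels):
    """Normalized 4096-bin color histogram of one region (list of (r, g, b))."""
    counts = Counter(color_index(p) for p in pixels)
    n = len(pixels)
    return [counts.get(i, 0) / n for i in range(1, 4097)]
```

For example, `color_index((0, 0, 0))` falls in the first bin and `color_index((255, 255, 255))` in the last, so every 24-bit color maps to exactly one of the 4096 bins.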
The probability that a certain region determined by features $X$ has the appearance of a door depends on the distance, $d_h$, to the closest cluster:

$$P(A \mid X, \theta_A) = e^{-d_h / \sigma_h}. \qquad (5)$$

The distance $d_h$ between two normalized histograms $h_1$ and $h_2$ is based on the Bhattacharyya distance, $d_h(h_1, h_2) = \sqrt{1 - \sum_{i=1}^{n} \sqrt{h_1(i)\,h_2(i)}}$, where $n$ is the number of histogram bins. $\sigma_h$ is the deviation of this distance computed from the query to the door and non-door reference histograms.

V. LIKELIHOOD COMPUTATION

The previous section describes seve