FREE ELECTRONIC LIBRARY - Dissertations, online materials

Pages:     | 1 |   ...   | 2 | 3 || 5 | 6 |   ...   | 16 |

«MULTI-CAMERA SIMULTANEOUS LOCALIZATION AND MAPPING Brian Sanderson Clipp A dissertation submitted to the faculty of the University of North Carolina ...»

-- [ Page 4 ] --

Each of the processes described to this point can operate with only temporally or spatially local information, including feature extraction, feature tracking, and relative pose estimation. The next section will give background on global mapping operations, which use non-local information to create a map of the camera system’s operating environment.

2.5 Global Mapping In this dissertation, global mapping refers to the processes that a SLAM system must perform to create an internal representation of its operating environment when all of the environment is not visible from a single point of view. At a minimum, the global map must reflect the topology of the operating environment, i.e. the connectivity between the various parts of the environment. This requires that the VSLAM system perform loop detection.

However, for many practical applications, a topological map is not enough and a Euclidean map is required. Some example applications where a Euclidean map is required are efficient path planning and automated targeting. A Euclidean map is the sort of map we are generally accustomed to where each point on the map is known with respect to every other point and the map has a single scale. Additionally, in a Euclidean map distances and angles are preserved. When creating a Euclidean map it is not simply enough to make note of a loop when it is detected. The system must also perform loop correction to make the geometry of the map reflect the Euclidean geometry, not just the topology, of the operating environment. Additionally, the global geometry of a Euclidean map can help to eliminate false loop detections. For example, when mapping a building by moving around it, if two corners of a building look the same and have similar geometry, we can use the shape of the path taken around the building to disambiguate the corners.

2.5.1 Loop Detection Visual loop detection has been studied extensively in the vision and robotics communities. Some authors chose to tackle loop detection based only on appearance (Cummins

–  –  –

ers included geometry in their approaches (Chli and Davison, 2008; Davison, 2003; Eade and Drummond, 2006; Ankur Handa and Davison, 2010; Irschara et al., 2009; Sivic and Zisserman, 2003; Williams et al., 2007).

Sivic and Zisserman (2003) were the first to apply text retrieval techniques to find objects within a video sequence. Their approach is suitable for loop detection since it finds images with common structure in a video sequence. They use both SIFT descriptors (Lowe,

2004) and maximally stable extremal regions (MSER) (Matas et al., 2002). Their approach consists of a preprocessing step followed by a retrieval phase. In the preprocessing step SIFT and MSER features are extracted in each of the images. A subset of these image’s features are then grouped using K-means clustering. Each of these clusters is referred to as a visual word. They use approximately 500 images in the clustering and generate approximately 10000 distinct visual words. They are limited in the number of clusters they can generate by the complexity of K-means clustering with such a large number of clusters. Each image’s features are then quantized into visual words. An inverted file is stored for each visual word, which contains which images a visual word is found within and the visual word’s position in that image.

When querying to find an object in the video the user selects a region of interest containing this object. The system then finds the descriptors in this region and their visual words. It then uses a term-frequency inverse document frequency (TF-IDF) weighting formula to find other images in the video that contain the same visual words. The TF-IDF weighting takes into account both the frequency of a given visual word in the current image and the log inverse frequency of a visual word in the documents. The TF-IDF formula

–  –  –

nd is the total number of visual words in document d, N is the number of documents in the database and ni is the number of documents containing visual word i.

Each document in the database (video) is represented by a vector of TF-IDF values for

each visual word:

–  –  –

The dot products of the vector representing the query region and each document in the database can then be taken to find images that are likely to contain the object of interest.

The method then performs a final geometric verification step. Their method starts with two matched feature regions and looks at the local neighborhood of each feature. The same features in the local neighborhood of the matched features should be found in both frames.

This is a form of loose geometric verification, which does not enforce any sort of global model such as a homography or essential matrix.

Sivic and Zisserman show that their system vastly out-performs this naive approach in terms or run time as well as precision-recall curves. The improvement in precision and recall is primarily due to the TF-IDF weighting of the descriptors and their weak geometric verification.

–  –  –

trieval. Their vocabulary tree approach uses a hierarchical k-means clustering of the feature space to partition features into visual words. The tree-based approach overcomes scaling issues that limited the vocabulary size in Sivic and Zisserman (2003). With a larger number of visual words, the descriptor space can be broken up into more discriminative, smaller clusters. A larger number of smaller clusters improve image retrieval performance. Addi

–  –  –

that they did not need to use geometric verification to achieve good performance. They show results on up to a one-million image database, three orders of magnitude larger than Sivic and Zisserman’s results.

Cummins and Newman (2008) developed a probabilistic approach to loop detection using only image information without geometric verification. Their fast appearance based mapping (FAB-MAP) approach uses a bag-of-words (Sivic and Zisserman, 2003) approach to represent the images where the features are quantized into particular visual words. However, rather than use TF-IDF to determine the likelihood of two images matching, Cummins and Newman learn a generative model for the visual words. This model, learned from training data, reflects the fact that certain visual words are likely to co-occur in an image because they come from the same object. Knowing which features are likely to appear or disappear together because they are part of the same object or objects, the system in effect recognizes unique locations by finding unique groupings of objects in the imagery. The system also recognizes which visual words appear in many locations because of repetitive scene parts.

–  –  –

ing is handled by the inverse document frequency term while in Cummins and Newman’s model a probabilistic approach is taken.

Cummins and Newman make a comparison between a naive Bayesian assumption that the likelihood of a visual word being found in an image is independent of the other visual words in the image and a model that takes into account the correlations between visual word occurrences. They show that using a simplification of the full correlation between all features based on a Chow Liu tree (Chow and Liu, 1968) can provide significant performance improvements over the naive Bayesian approach at little additional computational cost. The Chow Liu tree is a maximal spanning tree over the correlation between features which, while it is a simplification over the full correlation matrix, provides good performance with close to real-time computation.

Loop detection methods that use geometry as well as appearance can be divided into two classes: those that first use a camera pose prior and scene geometry to guide appearance based matching (Chli and Davison, 2008; Davison, 2003; Eade and Drummond, 2006;

Ankur Handa and Davison, 2010), and those that first match based on appearance and then verify the match geometrically (Irschara et al., 2009; Wang et al., 2005; Williams et al., 2007). Geometry-first approaches presume that there is some non-uniform prior on the camera’s pose within the map and use this prior to guide feature matching. The mean and variance of the 3D features are projected into the next expected camera. In combination with the uncertainty on the current camera pose, this gives a search region in the image to look for each feature. Chli et al. showed that when sequentially finding features in this way, the information from one feature match can reduce the pose uncertainty and therefore limits the search region size for subsequent features. In later work, they developed more efficient methods to represent the probabilistic constraints between the feature projections (Ankur Handa and Davison, 2010). These methods make real-time probabilistic feature matching possible.

Geometry-first methods have been successfully demonstrated on small room-sized environments. However, were they to be used in mapping larger environments, the large uncertainty in the camera pose prior after traveling long distances would become an issue.

With a large pose uncertainty, such as one would have after traversing a large loop, the projection of the feature uncertainty regions in the image would be extremely large. Also, the number of features that might project in the image grows as the camera may take on a wide range of poses. A more efficient approach is needed.

Appearance-first approaches can detect loops of any size because they do not rely on a

–  –  –

2006) and inverted files are used to find visually similar parts of a database of images.

These images have been previously processed into a 3D point cloud model of the environment. Features are then matched between the query image and the map based on descriptor similarity (dot product of SIFT features). The perspective three-point method (Haralick et al., 1994) in a RANSAC framework is then used to verify that the geometry of the scene could indeed generate the feature distribution in the query image. However, Irschara et al.

do more than just this. They create virtual views of the 3D point cloud model and use these as the database that query images are compared with. This significantly reduces the number of images in the database, speeding up the histogram generation from the inverted files.

They also make extensive use of the graphics processing unit (GPU) to speed up processing query image features into visual words through the vocabulary tree.

A wholly different approach to feature or key-point identification was developed by Ozuysal et al. (2007,2010) and demonstrated in a real-time monocular SLAM system by Williams et al. (2007). Their approach, which they call ferns, uses a collection of simple Bayesian classifiers to separate key-points into a set of classes. These classifiers (ferns) start with an image patch representing the key-point. Each classifier is then made up of a set of binary decisions. Two pixels are randomly selected in the patch and if one pixel is brighter than the other the value is one, otherwise the binary decision value is zero. Using a large number of these random binary measurements over the image patch a distribution of resulting binary numbers (fern results) can be found over a large training set of examples of the same image patch. These examples can be synthetically generated affine warps of an original image patch. In practice the number of binary decisions (leaves in the fern) must be large (L = 300) to achieve acceptable classification performance. For each keypoint, the method must calculate the probability that it takes on any one of the 2300 possible fern output values. This is intractable and so the authors simplify the computation by breaking up the large fern into M smaller ferns each of which can take on 2L/M values.

The classification results of this group of ferns are then multiplied together to form the final probability of a feature being of a particular class. Grouping the binary decisions into small ferns performs better experimentally than assuming that each of the binary decisions is statistically independent.

Ferns were an outgrowth of and an improvement on Lepetit and Fua’s work on randomized forests (Lepetit and Fua, 2006). Lepetit and Fua were the first to consider the wide-baseline matching problem as a classification problem where a key-point should be mapped to a class or view-set. This is the same approach they later took in Ferns but with the difference that randomized forests make use of randomized trees (Amit et al., 1996) for the classification while ferns makes use of the non-hierarchical fern structure.

Ozuysal et al. ’s ferns perform the same sort of quantization as a combination of affine invariant feature extraction (SIFT for example) followed by hierarchical quantization through a vocabulary tree. However, ferns do not require an affine invariant feature to be extracted with a descriptor vector, which is usually costly to compute. Also, the number of ferns and the number of binary decisions in each fern can be tailored to the computational resources available on a given platform. Fewer ferns might be used in an embedded system while more might be used in a laptop for example. Of course, this comes at a price. With fewer ferns, the features can be broken into fewer classes reducing the classes’ discriminative power.

Williams et al. (2007) showed that ferns could be used to recover from a loss of tracking in monocular VSLAM. While loss of tracking is not exactly the same problem as loop detection, Eade and Drummond (2008) have pointed out that the same location recognition mechanism can be used for both. Willams’s SLAM system is based on an Extended Kalman Filter for map and pose estimation (Davison et al., 2007). Each feature that is added to the map is passed to a background process that warps a patch about the feature in various ways such as rotation, scaling, and perspective warping. These warped patches are then passed into the ferns which learn the distribution of appearances that may occur for a given feature patch. Later when the camera becomes lost, the system can classify the features in a new image using the ferns to find likely matches between the features in the current image and those in the map.

Pages:     | 1 |   ...   | 2 | 3 || 5 | 6 |   ...   | 16 |

Similar works:

«Donald W. Livingston’s Philosophical Melancholy and Delirium Peter S. Fosl Hume Studies Volume XXIV, Number 2 (November, 1998) 355-366. Your use of the HUME STUDIES archive indicates your acceptance of HUME STUDIES’ Terms and Conditions of Use, available at http://www.humesociety.org/hs/about/terms.html. HUME STUDIES’ Terms and Conditions of Use provides, in part, that unless you have obtained prior permission, you may not download an entire issue of a journal or multiple copies of...»

«PLACE OF RESIDENCE: UNDERSTANDING THE IMPACT ON INTERACTIONS AND RELATIONSHIPS WITH PEERS, FACULTY, AND DIVERSE OTHERS AMONG SENIOR STUDENTS by Tara C. Sullivan A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Higher Education) in the University of Michigan Doctoral Committee: Associate Professor Deborah F. Carter, Co-Chair, Claremont Graduate University Professor Lisa R. Lattuca, Co-Chair Professor Tabbye M. Chavous Associate Professor...»

«Influence of Sulfur on Liquid Fuel Reforming by Joseph M. Mayne A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Chemical Engineering) in The University of Michigan Doctoral Committee: Professor Johannes W. Schwank, Chair Professor Philip E. Savage Professor Margaret S. Wooldridge Associate Professor Suljo Linic Adjunct Professor Galen B. Fisher c Joseph M. Mayne 2010 All Rights Reserved For Baby Joey ii ACKNOWLEDGEMENTS This...»

«Development of Analytical Methods for the Determination of Methylarginines in Serum Thomas Linz B.S. Chemistry, Truman State University, 2007 Submitted to the Department of Chemistry and the Graduate School of the University of Kansas in partial fulfillment of the requirements for the degree of Doctor of Philosophy. _ Susan Lunte – Chair _ Robert Dunn _ Craig Lunte _ David Weis _ Brian Ackley Dissertation Defense: April 12, 2012 The Dissertation Committee for Thomas Linz certifies that this...»


«THEISM AND MODAL COLLAPSE Klaas J. Kraay Ryerson University This paper appears in American Philosophical Quarterly 48 (2011): 361-372. The published version can be found online at: http://apq.press.illinois.edu/48/4/kraay.html. God is traditionally taken to be a necessarily existing being who is unsurpassably powerful, knowledgeable, and good. The familiar problem of actual evil claims that the presence of gratuitous suffering in the actual world constitutes evidence against the existence of...»


«S100 Gene Family Members in Oral Squamous Cell Carcinomas (OSCCs): Functional Characterization of S100A14 in Proliferation and Invasion of OSCC Derived Cells Dipak Sapkota Dissertation for the degree of Philosophiae Doctor (PhD) University of Bergen, Norway 2011 UNIVERSITETET I BERGEN S100 gene family members in oral squamous cell carcinomas (OSCCs): Functional characterization of S100A14 in proliferation and invasion of OSCC derived cells Dipak Sapkota Dissertation for the degree of...»

«Ghent University Faculty of Arts and Philosophy “I AM NOT SURE IF THIS IS A HAPPY ENDING” THE QUEST FOR FEMALE EMPOWERMENT IN ANGELA CARTER’S WISE CHILDREN Supervisor: Dissertation submitted in partial fulfilment of the requirements for the degree of “Master in Professor Marysa Demoor de Taalen Letterkunde: Engels” by Aline Lapeire 2009-2010 Lapeire ii Lapeire iii “I AM NOT SURE IF THIS IS A HAPPY ENDING” THE QUEST FOR FEMALE EMPOWERMENT IN ANGELA CARTER’S WISE CHILDREN The...»

«Practical Guidelines for Multiplicity Adjustment in Clinical Trials Michael A. Proschan, PhD, and Myron A. Waclawiw, PhD Office of Biostatistics Research, National Heart, Lung and Blood Institute, Bethesda, Maryland ABSTRACT: Multiplicity in clinical trials may appear under several different guises: multiple endpoints, multiple treatment arm comparisons, and multiple looks at the data during interim monitoring, to name a few. It is well recognized by statisticians and nonstatisticians alike...»

«Creating Green Chemistry: Discursive Strategies of a Scientific Movement Jody A. Roberts Dissertation submitted to the Faculty of Virginia Polytechnic Institute and State University in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Science and Technology Studies Committee: Richard M. Burian (chair) Daniel Breslau Richard F. Hirsh Timothy W. Luke Joseph C. Pitt 13 December 2005 Keywords: Green Chemistry, Scientific Movements, Chemistry Studies, Discursive...»

«Order N ◦ : 4822 Thesis with University of Bordeaux Physics Science and Engineering Doctorate School presented by Hassen KRAIMIA in partial fulfillment of the requirements for the degree of Doctor of Philosophy in: Electronics ————————— Ultra-Low Power RFIC Solutions for Wireless Sensor Networks ————————— Discussing: 10 July 2013 Commission: Examinator Prof. Corinne BERLAND ESIEE Examinator Prof. Christian ENZ Swiss Federal Institute of Technology in...»

<<  HOME   |    CONTACTS
2016 www.dissertation.xlibx.info - Dissertations, online materials

Materials of this site are available for review, all rights belong to their respective owners.
If you do not agree with the fact that your material is placed on this site, please, email us, we will within 1-2 business days delete him.