
MULTI-CAMERA SIMULTANEOUS LOCALIZATION AND MAPPING
Brian Sanderson Clipp
A dissertation submitted to the faculty of the University of North Carolina ...


At this point, the newest key-frame has been completely incorporated into the local map. It is considered until it leaves the bundle adjustment window or the visual odometry fails and a new sub-map is started. Note that as soon as a frame has an initial pose in the visual odometry module, its 3D pose with respect to the global map can be found. This pose is locally accurate and is refined through the windowed bundle adjustment. The pose may change when loops are detected in the Global SLAM module, but this should not affect tasks such as obstacle avoidance. After exiting the bundle adjustment window, key-frames are processed by the Global SLAM module.

5.2.3 Global SLAM Module

The Global SLAM module ensures global consistency in our VSLAM system. It incorporates the information of all currently available key-frame poses, feature measurements, and initial 3D feature estimates from the Visual Odometry module. The final result is a set of globally consistent, Euclidean sub-maps, each of which has its own global coordinate frame. The sub-maps are disjoint, meaning that they cover separate areas of the environment or cannot be linked through common 3D features due to the limitations of wide-baseline feature matching.

The key element in improving the incremental motion estimation provided by the Visual Odometry module is the detection of loop completions. Loop completions provide constraints in addition to the local constraints found in the VO module. Our system uses a vocabulary tree for image retrieval, though an alternative approach like the Fab-Map approach of Cummins and Newman (2008) could be used instead. In our approach, SIFT feature descriptors are quantized into visual words using a pre-computed k-d tree over the descriptor space. The visual words seen in an image are organized into an inverted index so that one can quickly look up which images a given visual word appears in. Finding images similar to a query image is then as simple as computing a vote over the images in which the query image's visual words are found. In the vote, higher weight is given to the more discriminative visual words, i.e., those that occur less frequently.
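The inverted-index voting described above can be sketched as follows. This is a toy version: integer visual words stand in for quantized SIFT descriptors, and the weighting is a simple IDF-style term rather than the system's exact scheme.

```python
import math
from collections import defaultdict

class InvertedIndex:
    """Toy inverted index over visual words. A real system would first
    quantize each SIFT descriptor into a word via the vocabulary tree."""

    def __init__(self):
        self.word_to_images = defaultdict(set)  # visual word -> images containing it
        self.image_words = {}                   # image id -> set of visual words

    def add_image(self, image_id, words):
        self.image_words[image_id] = set(words)
        for w in words:
            self.word_to_images[w].add(image_id)

    def query(self, words):
        """Vote for database images that share visual words with the query.
        Rarer (more discriminative) words receive a higher weight."""
        n_images = len(self.image_words)
        scores = defaultdict(float)
        for w in set(words):
            postings = self.word_to_images.get(w)
            if not postings:
                continue
            idf = math.log(n_images / len(postings))  # rare word -> large weight
            for image_id in postings:
                scores[image_id] += idf
        # most similar first
        return sorted(scores.items(), key=lambda kv: -kv[1])
```

A word seen in every image contributes nothing (log 1 = 0), so the vote is dominated by the distinctive words, matching the weighting rationale above.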

The Global SLAM module can operate in one of two modes. When exploring new areas the system operates in loop seeking mode, while in previously mapped regions it operates in known location mode.

Loop Seeking Mode

Loop seeking mode performs loop detection for each new key-frame, and after a successful loop identification a global refinement is computed through bundle adjustment. Loop detection begins by using the vocabulary tree to find a list of the images most similar to the current key-frame, sorted by similarity. Images from recent key-frames are removed from the list so that loops are only found to older sections of the map. In our system, recency is measured by the number of key-frames between the current key-frame and the candidate, with the threshold a selectable parameter. A more principled approach might use visibility constraints on the previous key-frames in the sequence to determine which ones could see the current key-frame's features and should therefore be considered "recent". Images in the list are tested in order of similarity until a matching image is found or the similarity score of the next best match is too low.
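The candidate filtering in loop seeking mode can be sketched as below. The parameter names and thresholds are illustrative, not the system's actual values.

```python
def select_loop_candidates(ranked, current_id, recency_gap=30, min_score=0.1):
    """Filter vocabulary-tree retrieval results before geometric verification.

    `ranked` is a list of (key_frame_id, similarity) pairs sorted best-first.
    Key-frames within `recency_gap` of the current one are skipped so loops
    are only sought against older sections of the map; once the similarity
    drops below `min_score`, the remaining candidates are not worth testing.
    """
    candidates = []
    for kf_id, score in ranked:
        if score < min_score:
            break                      # list is sorted: the rest score lower
        if abs(current_id - kf_id) <= recency_gap:
            continue                   # too recent: just the robot's own trail
        candidates.append((kf_id, score))
    return candidates
```

The surviving candidates would then be verified in order until one passes the 3D matching and RANSAC check described next.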

Rather than simply matching SIFT features from the query image to those visible in the next most similar image, we use the putative matching image to find a region of local 3D scene structure and match the query image to this structure. This can be seen as a form of query expansion based on 3D geometry. The expansion is done by finding the images near the next most similar image and including all of the 3D features visible in any of these images in the SIFT matching and geometric verification. SIFT descriptor matching is then performed from the descriptors of the features in the current key-frame to the 3D features' descriptors. We only try to match SIFT descriptors with the same associated visual word, which reduces the number of descriptor dot products performed. A RANSAC process using the three-point perspective pose method is then used to find the pose of the current camera, and the pose is non-linearly optimized afterwards.
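The same-visual-word pruning of descriptor matches can be sketched as follows. This is a toy version: the 2-D tuples stand in for 128-D unit-normalized SIFT descriptors, and the acceptance threshold is illustrative.

```python
from collections import defaultdict

def match_by_visual_word(query_feats, map_feats, min_dot=0.9):
    """Match query descriptors to 3D-feature descriptors, comparing only
    pairs that quantize to the same visual word; this prunes most of the
    descriptor dot products. Descriptors are assumed unit-normalized, so
    the dot product is a cosine similarity.

    Each feature is a (visual_word, descriptor) tuple; returns a list of
    (query index, map index) matches."""
    by_word = defaultdict(list)
    for j, (word, desc) in enumerate(map_feats):
        by_word[word].append((j, desc))

    matches = []
    for i, (word, q_desc) in enumerate(query_feats):
        best_j, best_dot = None, min_dot
        for j, m_desc in by_word.get(word, ()):   # same-word pairs only
            dot = sum(a * b for a, b in zip(q_desc, m_desc))
            if dot > best_dot:
                best_j, best_dot = j, dot
        if best_j is not None:
            matches.append((i, best_j))
    return matches
```

The resulting 2D-to-3D matches are exactly what a three-point-pose RANSAC consumes: each hypothesis is solved from three sampled matches and scored by the reprojection error of the rest.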

If the above method finds a solution supported by enough inlier matches, it is considered a loop. The features associated with the inlier measurements of the RANSAC are linked so that they are treated as a single feature in bundle adjustment. Using 3D-feature-to-2D-projection matching with geometric verification makes false positive loop detections much less likely than an appearance-only, image-to-image matching approach, because our approach combines both visual and geometric similarity to detect loops. Still, truly repetitive 3D structures that also look the same can cause incorrect loops to be detected; dealing with repetitive structures remains an open research problem.

If no loop has been detected, the next key-frame is tested for a potential loop closing. If a loop was detected, the system performs a global correction of the current sub-map incorporating the newly detected loop. Since the newly detected loop features have high reprojection errors in the current key-frame, they would be deemed invalid by our bundle adjustment, which uses a robust cost function; hence they would not influence the error mitigation process. To overcome this effect we re-distribute the error before bundle adjustment. This initializes the bundle adjustment much closer to the global minimum of its cost function, increasing its convergence rate and decreasing the chance of converging to a local minimum.

We re-distribute the accumulated error starting from the difference between the current key-frame pose and the current key-frame's pose calculated with respect to the old features. This gives the amount of drift the system has accumulated since it left the last known location in the sub-map. This last known location is either the first frame in the sequence, if no loops have been found so far, or the last place the system was operating in known location mode. The system is operating in known location mode when it has reacquired features it has mapped before and is tracking with respect to that known map. The system linearly distributes the error correction over the cameras back to the point where it was last operating in known location mode. Spherical linear interpolation (Shoemake, 1985) of the rotation error quaternion is used to interpolate the rotation error. Feature points are similarly corrected by moving them along with the camera that first views them. A global bundle adjustment of the map is then performed. After bundle adjustment, outlier measurements are removed, as are features visible in fewer than two key-frames. Such features give little information about the scene structure and are more likely to be incorrect since they do not match the camera's motion. After successfully detecting the loop and correcting the accumulated error, the Global SLAM module enters known location mode.

Known Location Mode

After successfully identifying a loop, this mode continuously verifies that the robot is still moving through the previously mapped environment. Verification is done by linking the current 3D SIFT features to previously seen 3D SIFT features in the environment surrounding the current location. These matches are added to a windowed bundle adjustment in the Global SLAM module, which keeps the camera path consistent with the older, previously computed parts of the map.
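The drift re-distribution described above can be sketched as follows. The (w, x, y, z) quaternion convention and the bare translation tuples are assumptions of this sketch, not the system's actual pose representation.

```python
import math

def slerp(q0, q1, t):
    """Spherical linear interpolation between unit quaternions (w, x, y, z),
    after Shoemake (1985). Used to apply a fraction t of a rotation error."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:                      # take the shorter arc
        q1, dot = tuple(-c for c in q1), -dot
    dot = min(dot, 1.0)
    theta = math.acos(dot)
    if theta < 1e-8:                   # nearly identical: lerp is fine
        return tuple((1 - t) * a + t * b for a, b in zip(q0, q1))
    s = math.sin(theta)
    w0, w1 = math.sin((1 - t) * theta) / s, math.sin(t * theta) / s
    return tuple(w0 * a + w1 * b for a, b in zip(q0, q1))

def redistribute_drift(translations, drift):
    """Linearly spread the accumulated translation error over the key-frame
    chain: frame k of n receives the fraction k/n of the correction, so the
    first frame (last known location) is untouched and the newest frame
    absorbs the full drift. Rotation error is handled analogously by
    slerping each pose's rotation toward its share of the error quaternion."""
    n = len(translations) - 1
    return [tuple(c - (k / n) * d for c, d in zip(t, drift))
            for k, t in enumerate(translations)]
```

Initializing the poses this way puts the subsequent global bundle adjustment close to the corrected configuration, which is exactly the convergence argument made above.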

In known location mode, SIFT feature matching between the current key-frame and the old 3D SIFT features is done using the predictive approach described in Section 5.2.2 on visual odometry. Older features can be linked to the features visible in the current frame by projecting all of the 3D SIFT features seen in the previous key-frame and its neighboring images (two key-frames are neighbors if they see the same 3D feature) and comparing descriptors. If no matching older SIFT features are found, the robot has left the previously observed parts of the environment and the system re-enters loop seeking mode.
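The predictive linking of old 3D features can be sketched as follows. It uses a plain pinhole model; the search radius and the data layout are illustrative, and the descriptor comparison performed by the real system is omitted for brevity.

```python
def project(point, pose, focal, center):
    """Project a world point through a pinhole camera with pose (R, t),
    where R is a 3x3 rotation (row tuples) and t a translation. Returns
    pixel coordinates, or None if the point is behind the camera."""
    R, t = pose
    cam = [sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]
    if cam[2] <= 0:
        return None
    return (focal * cam[0] / cam[2] + center[0],
            focal * cam[1] / cam[2] + center[1])

def predictive_matches(map_points, pose, focal, center, size, keypoints, radius=20.0):
    """For each old 3D feature, project it into the predicted camera pose
    and link it to the nearest current keypoint within `radius` pixels.
    Returns (3D-feature index, keypoint index) pairs. The full system would
    additionally require the SIFT descriptors to agree."""
    w, h = size
    links = []
    for i, pt in enumerate(map_points):
        uv = project(pt, pose, focal, center)
        if uv is None or not (0 <= uv[0] < w and 0 <= uv[1] < h):
            continue                   # feature not visible in this frame
        best, best_d2 = None, radius * radius
        for k, kp in enumerate(keypoints):
            d2 = (kp[0] - uv[0]) ** 2 + (kp[1] - uv[1]) ** 2
            if d2 < best_d2:
                best, best_d2 = k, d2
        if best is not None:
            links.append((i, best))
    return links
```

An empty result from this search is the signal that the robot has left the mapped area and should fall back to loop seeking mode.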

The windowed bundle adjustment in the Global SLAM module is much the same as the one performed in the Visual Odometry module. The only difference is that the older key-frames are also included in the bundle, but held fixed. This ensures that the new camera poses stay consistent with the existing map. Fixing the older cameras is also justified because they have already been globally bundle adjusted and are probably more accurate than the more recent key-frames. After the windowed bundle adjustment, processing begins on the next key-frame.

5.3 Implementation Details

A key to the performance of our system is that the three modules Scene Flow, Visual Odometry, and Global SLAM operate independently and in parallel. To ensure that all captured information is used, only the Scene Flow module has to operate at frame rate. The timing constraints on the visual odometry are dynamic and depend only on the frequency of key-frames; this module can lag behind by a few frames. The Global SLAM module is even less time constrained, since its corrections can be incorporated into the local tracking whenever they become available. The system's modules run in separate threads, each adhering to its module's timing requirements.

5.3.1 Scene Flow Module

The Scene Flow module begins by taking raw, Bayer-pattern images from the stereo cameras. These images must be converted to luminance images and radially undistorted before the sparse scene flow can be measured. We use color cameras so that the recorded video can later be used for dense stereo estimation and 3D modeling. While tracking could be performed on radially distorted images, we remove the radial distortion so that the later SIFT feature extraction in the Visual Odometry module can operate on undistorted images; this helps SIFT matching when the cameras have a large amount of radial distortion.

De-mosaicing, radial undistortion, and sparse scene flow are all calculated on the graphics processing unit (GPU) using CUDA. To increase performance we minimize data transfer between the CPU and GPU: only the raw image is transferred to the GPU each frame, all computations are performed in GPU memory, and only the undistorted images for the key-frames, along with the tracked feature positions, are transferred back to the CPU.
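The radial undistortion step can be illustrated with the standard inverse-mapping approach: for every undistorted output pixel, compute where to sample in the distorted input. This plain-Python sketch assumes a two-coefficient polynomial model, which may differ from the system's actual calibration; on the GPU, one CUDA thread would compute one table entry.

```python
def distort(xn, yn, k1, k2):
    """Apply a two-parameter polynomial radial distortion model to
    normalized image coordinates: x_d = x * (1 + k1*r^2 + k2*r^4)."""
    r2 = xn * xn + yn * yn
    s = 1.0 + k1 * r2 + k2 * r2 * r2
    return xn * s, yn * s

def undistortion_map(width, height, focal, center, k1, k2):
    """Build the per-pixel lookup table used to undistort an image. Each
    entry gives the (distorted) source pixel to sample when filling the
    corresponding undistorted output pixel."""
    cx, cy = center
    table = []
    for v in range(height):
        row = []
        for u in range(width):
            xn, yn = (u - cx) / focal, (v - cy) / focal   # normalized coords
            xd, yd = distort(xn, yn, k1, k2)
            row.append((xd * focal + cx, yd * focal + cy))  # source pixel
        table.append(row)
    return table
```

Because the table depends only on the calibration, it is computed once; the per-frame work is then a single gather (with interpolation) per pixel, which is why the whole step maps well to the GPU.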

After each key-frame, the feature tracks (2D position and feature identifier) and the undistorted images are passed to the Visual Odometry module. While the Visual Odometry module processes the key-frame, the Scene Flow thread can track ahead of it, buffering new key-frames until the Visual Odometry module is able to process them. Hence, the speed of Visual Odometry does not constrain the Scene Flow module's real-time performance. This is just one example of how parallelism adds robustness to our system.

5.3.2 Visual Odometry Module

In this module we perform the incremental motion estimation from the KLT feature tracks and the detection of SIFT features in parallel. For efficiency we use one thread for each of the two stereo images. After the SIFT detection we release the image buffers to save memory.

As described in Section 5.2.2, the Visual Odometry module's outputs are the relative camera motion and the new 3D points. These outputs are stored in a queue and removed from Visual Odometry's local storage. Using a queue decouples processing in the VO and GS module threads. Whenever tracking fails, all of the VO module's internal data (key-frame poses and 3D features) is queued for processing by the Global SLAM module.
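The queue-based decoupling of the VO and GS threads is the standard producer/consumer pattern; a minimal sketch (module names and payloads here are stand-ins, not the system's actual data structures):

```python
import queue
import threading

def visual_odometry(out_q, key_frames):
    """Producer stand-in for the VO module: each processed key-frame's
    outputs (relative motion, new 3D points) are pushed onto the queue and
    dropped from local storage. A None sentinel marks the end of a sub-map."""
    for kf in key_frames:
        out_q.put(kf)                  # hand off to Global SLAM
    out_q.put(None)

def global_slam(in_q, results):
    """Consumer stand-in for the GS module: runs in its own thread at its
    own pace, so a backlog in the queue never stalls the producer."""
    while True:
        item = in_q.get()
        if item is None:
            break
        results.append(item)           # stand-in for loop detection + BA

q = queue.Queue()
results = []
t = threading.Thread(target=global_slam, args=(q, results))
t.start()
visual_odometry(q, ["kf0", "kf1", "kf2"])
t.join()
```

Because `queue.Queue` is thread-safe and (here) unbounded, the VO thread never blocks on a slow Global SLAM step, which mirrors the timing argument in Section 5.3.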

5.4 Experimental Results

To demonstrate the speed, accuracy, and long-term stability of our VSLAM system we present results from two video sequences of indoor environments with different characteristics. The first sequence was taken in an office environment with a large, open floor plan; I will refer to it as the "office" sequence. The second sequence was shot in a building with long but relatively narrow (1.7 m) hallways; it will be called the "hallway" sequence. The closed floor plan does not allow features to be tracked for long periods of time, since they quickly leave the stereo camera's field of view, yet the system maps the halls accurately, with an error of less than 30 cm over the 51.2 m length of the longest hall shown in Figure 5.10. This is an error of less than 0.6%.

Our setup uses a calibrated stereo pair of two Point Grey Grasshopper cameras with 1224×1024 pixel color CCD sensors delivering video at fifteen frames (stereo pairs) per second. The system's 7 cm baseline is comparable to the median human inter-pupil distance. The cameras are mounted on a rolling platform together with the computer. Because the platform rolls on the floor, the planarity of the camera path can be used to evaluate the quality of the reconstruction results; nevertheless, all six degrees of freedom of the camera's motion are estimated. While performing real-time VSLAM the system also records the imagery to disk for debugging or archival purposes.

The office sequence includes transparent glass walls and other reflective surfaces that make tracking more challenging (please see Figure 5.3 for example frames). It also has a hallway with relatively low texture, which our system successfully maps, showing it is robust to areas without a large amount of structure. In one section of the video a person moves in front of the camera, partially occluding some of the tracked features (see Figure 5.3). Even in this case, the system is able to reject the moving person’s feature tracks as outliers and continue tracking correctly.

Figure 5.4 shows the difference between operating only using visual odometry and performing the full VSLAM with loop detection and global map correction.

In the top pane of Figure 5.4, the map obtained using only visual odometry is shown, where the relative motion from frame to frame is accumulated to form the camera path. In visual odometry no loop detection or global map correction is performed; hence, the system drifts over time. In this scene, VO accumulated a drift of approximately 3 m over an approximately 150 m path. In the bottom pane, the results of our Global SLAM module are shown. Clearly, the long-term drift of visual odometry is eliminated by loop detection and the subsequent error mitigation through bundle adjustment.

