FREE ELECTRONIC LIBRARY - Dissertations, online materials


«Ear Recognition Biometric Identification using 2- and 3-Dimensional Images of Human Ears Anika Pflug Thesis submitted to Gjøvik University College ...»


Figure 2.10: Workflow diagram of our biometric service provider.

The workflow illustrates the interfaces between different programming languages.

GES-3D training set and negative images that contain randomly cropped parts of the background from the GES-3D scenario. The segmentation step is finalized by a post-processing step, where the most likely region of interest (ROI) is selected from each image. This selection relies on the fact that the process of capturing the image series is highly constrained.

These constraints allow us to make assumptions about the ROI size, aspect ratio and its position in the image. Using these assumptions, we check the returned ROIs for plausibility.

A ROI is only selected if it complies with the assumptions on size, aspect ratio and position.

Otherwise we consider the ROI as a false positive and discard it.
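The plausibility check described above can be sketched as follows. This is a minimal sketch, not the implementation used in the system: the names `ROI`, `Constraints`, `is_plausible` and `filter_rois` are hypothetical, and all numeric limits are placeholders, because the text does not state the actual thresholds. The sketch only filters implausible candidates; choosing the single most likely ROI among the survivors is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ROI:
    x: int       # left edge in pixels
    y: int       # top edge in pixels
    width: int
    height: int


@dataclass(frozen=True)
class Constraints:
    # Hypothetical limits; the source gives no numeric values.
    min_area: int
    max_area: int
    min_aspect: float  # aspect ratio is width / height
    max_aspect: float
    region: ROI        # the ROI centre must lie inside this image region


def is_plausible(roi, c):
    """Return True if the ROI complies with the size, aspect ratio and position assumptions."""
    if roi.width <= 0 or roi.height <= 0:
        return False
    area = roi.width * roi.height
    aspect = roi.width / roi.height
    cx = roi.x + roi.width / 2
    cy = roi.y + roi.height / 2
    inside = (c.region.x <= cx <= c.region.x + c.region.width
              and c.region.y <= cy <= c.region.y + c.region.height)
    return (c.min_area <= area <= c.max_area
            and c.min_aspect <= aspect <= c.max_aspect
            and inside)


def filter_rois(rois, c):
    """Discard every ROI that violates the constraints (treated as a false positive)."""
    return [r for r in rois if is_plausible(r, c)]
```

Checking the centre of the ROI rather than all four corners is a design choice of this sketch; a stricter variant could require the whole rectangle to lie inside the expected region.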

Video streams: For video streams, we first detect a profile face view, as shown in the example image in the upper row of Figure 2.13. This is done in two steps: (1) A face profile detector selects frames that contain profile views within a given sub-region of the video frame. This constraint is based on the assumption that the CCTV camera does not move and that each subject who wants to use the ATM stands at the same distance from the camera. (2) For each successfully detected profile face, we run an ear detector within the extended profile face region. Although this means that we discard some frames that could have been used for ear recognition, a sufficiently large number of true positives is left. If the outer ear is visible, we select the ear region from the profile face ROI. If no ear can be detected, the entire frame is discarded.

3D head models: If the input medium is a 3D model, we render pairs of depth and texture images for discrete camera viewpoints. Examples of the left and the right side view of the model are shown in Figure 2.11. The renderer assumes that the null pose of the model is normalized in such a way that the normal of the nose tip points towards the camera (see Figure 2.11: left column).

For each rendered pose, we segment the ear region by fusing information from the texture and from the depth channel (see Chapter 5).


Figure 2.11: Rendered image with null-pose (left), right pose with a 90 degree rotation (middle) and left pose with a -90 degree rotation (right) from a 3D head model - the left part of the figure shows texture channel and the right part shows the corresponding depth data.

Figure 2.12: Example of a segmented ROI and the result after normalization with CPR.

2.5.3 Normalization

After segmenting the ear region, the ear needs to be normalized with respect to rotation and scale. As shown in [148], correcting these rotations is important for the recognition accuracy, because a misaligned ear is likely to cause wrong decisions in the comparison process. For the same reason, hair or skin around the ear should be removed before extracting features.

In the normalization step, we use Cascaded Pose Regression (CPR) to remove the margin and to compensate for small rotations [148, 62]. CPR fits an ellipse to the ear such that the major axis connects the two farthest points of the lobule and the outer helix. We then rotate the ROI so that the major axis is vertical and cut off all pixels outside the enclosing rectangle of the ellipse. This leaves us with images that have a minimal margin around the ear and in which all ears are oriented in the same way. Finally, all images are resized to 100 × 100 pixels. This facilitates the subsequent feature extraction step and has no impact on the system accuracy. For video images, we also try to estimate the head pose using a series of tree-structured models with a shared pool of landmarks [222]. For pose estimation, the algorithm tries to find the best tree for the given image and returns the corresponding pose. As a side effect, the tree-structured model can be used for assuring the quality of the ROI for the ear region, because it allows us to check whether the position of the ear is plausible.
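The geometric part of this normalization (rotate until the major axis is vertical, then crop to the enclosing rectangle of the ellipse) can be illustrated with a few lines of plain geometry. This is a sketch under assumptions: the endpoints of the major axis and the axis lengths are taken as given (in the system they would come from the CPR ellipse fit), and the function names `rotation_to_vertical`, `rotate_point` and `enclosing_rectangle` are hypothetical. Resampling the pixels and resizing to 100 × 100 are not shown.

```python
import math


def rotation_to_vertical(p1, p2):
    """Angle in degrees that makes the axis through p1 and p2 vertical.

    The angle is meant to be used with rotate_point below and is normalized
    to (-90, 90], so the smaller of the two possible rotations is chosen.
    """
    dx = p2[0] - p1[0]
    dy = p2[1] - p1[1]
    if dx == 0 and dy == 0:
        raise ValueError("the two axis endpoints must differ")
    phi = math.degrees(math.atan2(dx, dy))  # deviation from the vertical direction
    if phi > 90:
        phi -= 180
    elif phi <= -90:
        phi += 180
    return phi


def rotate_point(p, centre, angle_deg):
    """Rotate p around centre using the matrix [[cos, -sin], [sin, cos]]."""
    t = math.radians(angle_deg)
    x = p[0] - centre[0]
    y = p[1] - centre[1]
    return (centre[0] + x * math.cos(t) - y * math.sin(t),
            centre[1] + x * math.sin(t) + y * math.cos(t))


def enclosing_rectangle(centre, major_len, minor_len):
    """Axis-aligned box (left, top, right, bottom) of an ellipse whose major axis is vertical."""
    cx, cy = centre
    return (cx - minor_len / 2, cy - major_len / 2,
            cx + minor_len / 2, cy + major_len / 2)
```

After rotating both endpoints with the returned angle, they share the same x coordinate, which is the condition that the text describes as a vertical major axis.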

Figure 2.13: Illustration of the single processing steps for computing the features of an image (frame) from a 2D video stream.

Processing of other input media only differs in the segmentation step.

2.5.4 Feature Extraction

After normalizing the input images, we apply Local Phase Quantization (LPQ) [10] to obtain a fixed-length histogram of local texture features. The concept behind LPQ is to transform the image into the Fourier domain and to use only the phase information in the subsequent steps. For each pixel in the image, we compute the phase within a predefined local radius and quantize it by observing the sign of both the real and the imaginary part of the local phase. Similarly to LBP, the quantized neighbourhood of each pixel is encoded as an 8-bit binary string and stored in a code image. The code image hence contains values between 0 and 255.
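The sign-based quantization that produces the 8-bit code can be sketched for a single pixel. The sketch assumes that four complex local Fourier coefficients have already been computed for the pixel, which is the usual LPQ setting; computing them is not shown. The function name `lpq_code` and the bit order (real parts in bits 0 to 3, imaginary parts in bits 4 to 7) are assumptions of this example, not taken from the text.

```python
def lpq_code(coeffs):
    """Quantize four complex local Fourier coefficients into one 8-bit code.

    Each real and each imaginary part contributes one bit: 1 if the value
    is non-negative, 0 otherwise. The bit order is a convention of this sketch.
    """
    if len(coeffs) != 4:
        raise ValueError("expected exactly four complex coefficients")
    code = 0
    for i, c in enumerate(coeffs):
        if c.real >= 0:
            code |= 1 << i          # bits 0..3: signs of the real parts
        if c.imag >= 0:
            code |= 1 << (i + 4)    # bits 4..7: signs of the imaginary parts
    return code
```

Because eight sign bits are packed into one integer, every pixel of the code image lies in the range 0 to 255, which matches the range stated in the text.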

We divide the image into equally sized sub-windows and compute the LPQ code image for each window. Every code image is then represented as a histogram with 256 bins. The histograms for all local windows are finally concatenated to obtain the overall feature vector. For a 20×20 window size and an overlap of 10 pixels, this results in a fixed-length feature vector with 16384 dimensions.
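The windowed histogram descriptor can be sketched as follows. This is a simplified illustration: it slices a precomputed code image into overlapping windows instead of computing LPQ separately per window, and the names `window_origins` and `concatenated_histograms` are hypothetical. The code image is assumed to be a list of rows holding integers between 0 and 255.

```python
def window_origins(length, window, step):
    """Start indices of all windows that fit completely into the given length."""
    return list(range(0, length - window + 1, step))


def concatenated_histograms(code_image, window=20, overlap=10, bins=256):
    """Concatenate one histogram with `bins` entries per overlapping window."""
    step = window - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than the window size")
    rows = len(code_image)
    cols = len(code_image[0])
    feature = []
    for top in window_origins(rows, window, step):
        for left in window_origins(cols, window, step):
            hist = [0] * bins
            for r in range(top, top + window):
                row = code_image[r]
                for c in range(left, left + window):
                    hist[row[c]] += 1
            feature.extend(hist)
    return feature
```

The reported length of 16384 equals 64 × 256, that is, 64 windows with 256 bins each. In this sketch a 90 × 90 code image yields 8 × 8 = 64 windows and therefore reproduces that length; the 90 × 90 size is chosen for illustration only, since the text does not state how the number of windows arises.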

For removing redundant data from this large descriptor, we project the histogram into a lower-dimensional space using Linear Discriminant Analysis (LDA). The projected histogram in LDA space is the final feature vector. This feature vector is stored in a database along with some meta data, such as the pose of the enrolled image, the identifier of the subject, the identifier for the capture process and the input media type. The lower row of Figure 2.13 illustrates the feature extraction process and the feature subspace projection.

2.5.5 Search

The search task takes either a single 2D image or a video sequence and returns a ranked list of identities, where a position further down the list indicates a lower likelihood that the probe and the reference are from the same source.

If the input medium is a video, we obtain as many feature vectors as we find video frames with an ear. Each feature vector is compared with each of the reference images in the database. For comparison, we use a nearest neighbour (NN) classifier with cosine distance. For each feature vector from the input media, we obtain a list of distances between the feature vector and all the templates in the database. Each of these lists of distances is sorted in ascending order.

We only retain the lowest n distances, where n is specified by the user when initiating the search.
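The comparison of one probe feature vector against the database can be sketched as below. The sketch assumes that templates are stored as `(identity, vector)` pairs and defines cosine distance as one minus the cosine similarity; the function names `cosine_distance` and `nearest_templates` are hypothetical.

```python
import math


def cosine_distance(a, b):
    """Cosine distance 1 - cos(a, b) between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def nearest_templates(probe, templates, n):
    """Return the n closest templates as (distance, identity) pairs in ascending order."""
    scored = [(cosine_distance(probe, vec), identity) for identity, vec in templates]
    scored.sort(key=lambda pair: pair[0])
    return scored[:n]
```

Several entries of the returned list may carry the same identity, because a subject can be enrolled with more than one image.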

Let there be m sorted lists of length n. We fuse these lists by counting the number of occurrences at given ranks for each identity. Note that we distinguish between identities and images. We may have several images with the same identity (i.e. showing the same subject), and each identity is represented by at least one image. For each valid ROI in the input media, we obtain a sorted list of possible candidates. We iterate through each sorted list and assign a score to each identity according to its position in the list. Identities with higher ranks obtain a higher score than identities with lower ranks. After assigning a score to each identity in each list, we create a fused list by summing up the scores for each identity. Finally, we sort the identities in the combined list by score in descending order and return the n identities with the largest score.
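The fusion of the m ranked lists can be sketched as follows. Two points are assumptions of this sketch, because the text does not fix them: the score for a position is taken to be n minus the zero-based position (a Borda-style count), and if several images of the same identity occur in one list, only the best-ranked one is scored. The function name `fuse_ranked_lists` is hypothetical.

```python
from collections import defaultdict


def fuse_ranked_lists(ranked_lists, n):
    """Fuse per-frame candidate lists into one list of (identity, score) pairs.

    ranked_lists: one list of identities per valid ROI, best match first.
    Returns the n identities with the largest summed score, best first.
    """
    scores = defaultdict(int)
    for ranking in ranked_lists:
        seen = set()
        for position, identity in enumerate(ranking[:n]):
            if identity in seen:
                continue  # assumption: score only the best-ranked image per identity
            seen.add(identity)
            scores[identity] += n - position  # assumption: linear score by position
    # Sort by descending score; ties are broken by identity to keep the result stable.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ordered[:n]
```

For the three lists `["A", "B", "C"]`, `["B", "A", "C"]` and `["A", "A", "B"]` with n = 3, identity A collects 3 + 2 + 3 = 8 points, B collects 2 + 3 + 1 = 6 and C collects 1 + 1 = 2, so A is returned first.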


Figure 2.14: Example images for quality levels L1 (left) to L6 (right), which serve as the basis for the evaluation of the detection accuracy. All of these examples are taken from the video stream.

2.6 System Performance Evaluation

The proposed ear recognition system is evaluated with the dataset described in Section 2.3.

In our experiments, we focus on camera pose three, which gives us a profile view of the subjects while they are using the ATM. Figure 2.2 illustrates the workflow of the system and shows an example for each media type.

We evaluate each step of the system separately, starting with the segmentation step for each media type. In a second experiment, we evaluate the accuracy of the pose estimation module, and in the third experiment we provide results for the recognition step. Finally, our last experiment provides results on the recognition performance of the complete system, including errors introduced by the segmentation, pose estimation and recognition steps.

2.6.1 Ear Detection

The data that we obtain from our project partner is not labelled, so we do not have a ground truth for evaluating the detection performance as we do in Chapters 4, 5, 6 and 7. We would also like to know more about the typical types of errors and see whether there is a trend toward a certain type of error for a particular media type. We distinguish between six different quality levels for the region of interest (ROI), denoted L1, L2, L3, L4, L5 and L6. An example for each of the quality levels can be found in Figure 2.14. Quality levels L3, L4, L5 and L6 represent a failure to capture (FTC) and a failure to extract (FTE). The quality levels are defined as follows:

• L1: The ear is located in the center of the ROI and the pose is a full profile view.

• L2: The ear is fully covered by the ROI, but it is either not centered or the image is off-pose.

• L3: Parts of the ear are cut off or occluded, but more than half of the ear is still visible.

• L4: Major parts of the ear are cut off. The data is not sufficient for being used in the ear recognition system and should be dropped.

• L5: Major parts of the ear are occluded. The data is not sufficient for being used in the ear recognition system and should be dropped.

• L6: Something else has been detected (false positive).

The test set for the videos contains 5566 images (single video frames) from 200 subjects. The test sets for the mugshots and for the rendered 3D images contain 400 left and right ears from the same 200 subjects.
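Given manually assigned quality labels, the share of each quality level and the combined failure share can be computed as in the following sketch. It treats L1 and L2 as usable and L3 to L6 as failures, following the definition of the quality levels above; the function names and the example labels in the usage note are hypothetical and do not reproduce any result from the evaluation.

```python
from collections import Counter

LEVELS = ("L1", "L2", "L3", "L4", "L5", "L6")
USABLE = frozenset({"L1", "L2"})  # L3 to L6 count as failures


def quality_distribution(labels):
    """Relative frequency of every quality level in a list of ROI labels."""
    if not labels:
        raise ValueError("at least one label is required")
    counts = Counter(labels)
    unknown = set(counts) - set(LEVELS)
    if unknown:
        raise ValueError(f"unknown quality levels: {sorted(unknown)}")
    total = len(labels)
    return {level: counts[level] / total for level in LEVELS}


def failure_share(labels):
    """Fraction of ROIs that fall into a failure level (L3 to L6)."""
    dist = quality_distribution(labels)
    return sum(p for level, p in dist.items() if level not in USABLE)
```

As a made-up example, ten labels consisting of four L1, two L2, two L3 and two L6 give a failure share of 0.4.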

For the 3D images and the 2D mugshots, the quality level probabilities are similar, because the capture settings follow the same constraints. For the video sequences, the probabilities for a given quality level differ significantly.

The portion of correctly segmented ears ranges from 40% to 60% for the mugshots and rendered 3D images, which is still unacceptably low, especially for the enrolment set. The large portion of Failure To Enrol (FTE, also see Appendix B) in the video stream can be compensated for by the high number of frames: although we must discard a large number of images, a sufficient number of frames remains. We obtain between 10 and 30 frames per video that contain a correctly segmented ear.

Figure 2.17: Cumulative match characteristic (CMC) of manually cropped ROIs and ROIs that have been labelled with L1 in the segmentation and the normalization step. The rank-1 identification rate for L1 ROIs is 10.35% and the rank-10 identification rate is 55.17%.
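The rank-k identification rates of a cumulative match characteristic, such as the rank-1 and rank-10 rates reported for Figure 2.17, can be computed from the rank at which the correct identity appears for each probe. This is a generic sketch of that calculation; the function name `cmc` and the input format (1-based ranks, `None` when the correct identity is not in the list) are assumptions, and the example ranks in the usage note are made up.

```python
def cmc(true_ranks, max_rank):
    """Cumulative match characteristic up to max_rank.

    true_ranks: for each probe, the 1-based rank of the correct identity,
                or None if the correct identity is missing from the list.
    Returns a list whose entry k-1 is the rank-k identification rate.
    """
    if not true_ranks:
        raise ValueError("at least one probe is required")
    total = len(true_ranks)
    curve = []
    for k in range(1, max_rank + 1):
        hits = sum(1 for r in true_ranks if r is not None and r <= k)
        curve.append(hits / total)
    return curve
```

For the made-up ranks `[1, 3, 2, None, 1, 10, 4, 2, 7, 1]`, three of ten probes are correct at rank 1 and nine of ten within rank 10, so the curve starts at 0.3 and reaches 0.9.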

The main reason for the poor performance is the small number of training images that we have at hand for training the detector for the mugshots and the video streams. During our experiments, we compared different detectors. One of them was the pre-trained Haar cascade from OpenCV. We also trained a detector using the ear data from different publicly available ear datasets. A third detector was trained using positive and negative sample images from the test dataset. As expected, the latter detector outperformed the other two. The detection performance could be improved if a sufficient number of training subjects for both scenarios were available.

Further, the detection performance may suffer from the low resolution and high noise in the video streams. The resolution of the video frames could, for instance, be improved by applying a super resolution technique [76].

In the case of the 3D segmentation, we observe a large number of cases where parts of the ear are cut off in the ROI. This is a typical limitation of the segmentation algorithm, which we already observed in previous experiments (see Chapter 5). We may be able to reduce the number of false positive detections and improve the detection accuracy by adapting the parameters of the 3D detection algorithm to the scale and resolution of the test dataset and to the particular capture setting.

2.6.2 Pose Estimation

For evaluating the accuracy of the pose estimation step, we only use ROIs of quality level L1 from the previous segmentation step. Hence, we only have images containing full profile views of a subject (-90 or +90 degrees yaw). This knowledge serves as our ground truth in this experiment.

Again, we distinguish between different classes of errors that are characterized by the difference between the estimated viewing angle and the actual viewing angle. In this experiment, we use a pre-trained model that is publicly available for download. This model is optimized for the Multi-PIE dataset and is supposed to work for face regions larger than 150 × 150 pixels. We also evaluated models with fewer parts, but we found that the best performance could be achieved with the previously mentioned model. This model is able to locate 41% of the ROIs, which means that we could not estimate a pose for the remaining 59% of the ear images. The accuracy of the pose estimation for the attempts in which an ear is detected is summarized in Figure 2.6.1.
