«Ear Recognition Biometric Identification using 2- and 3-Dimensional Images of Human Ears Anika Pflug Thesis submitted to Gjøvik University College ...»
The conclusion of a forensic expert report is an estimate of the likelihood that the suspect and the subject in the video are one and the same person. The expert testimony can be supported by an automated identification system, but an automated decision may never be the only source of evidence. Automatic identification is usually used for selecting the most likely candidates (usually between 50 and 100 subjects), who are then further examined by a forensic expert. According to the Daubert Standard, the way in which an expert arrived at a given conclusion must first of all be clear and comprehensible to the judge, and the conclusion should be based upon sufficient facts. Secondly, any sources of error must be transparent, and the principle that was used for the conclusion must be an established scientific standard. Finally, the principle must be applicable to the particular case.
The probe presentation of the outer ear can either be analyzed directly from any imagery where it is clearly visible, or ear prints can be taken from surfaces. Ear prints can be left on windows or doors when burglars try to make sure that the house is empty.
Meijerman has shown that ear prints are individual, but that they depend on factors such as pressure and temperature, which can result in a high intra-class variation. In Germany, there are several example cases where ear prints have helped the investigators to identify the suspects, such as in Osnabrueck (footnote 1) and Hamburg (footnote 2).
The evidential value of ear prints is heavily debated for several reasons. Firstly, ear prints are usually left before a crime is committed and are hence not necessarily proof that the subject who left the ear print is the same subject who committed the crime. Secondly, there is no indication of the time when the ear print was left. Finally, it is argued that comparing actual ears and an image of the ear is different from comparing two ear prints. The tissue of the outer ear is soft, and additional factors such as pressure, moisture, the lifting process and the surface structure influence the appearance of the ear print. In research on ear recognition, images of known subjects are compared with each other, showing that the appearance of the ear is unique enough to distinguish between the subjects in the database. Whether ear prints are equally unique and permanent is controversial.
1 http://www.ksta.de/html/artikel/1218660630642.shtml
2 http://www.n-tv.de/panorama/Ohrabdruck-ueberfuehrt-Einbrecher-article6144406.

1. BIOMETRIC EAR RECOGNITION

For the purpose of identifying subjects from CCTV images, a systematic way to describe the outer ear is essential. Standard descriptions exist for all kinds of characteristics that can be observed in imagery, including the outer ear. Such description methods consist of a list of relevant landmarks (e.g. concha, outer helix, tragus). During the analysis, the forensic expert describes the appearance of each part and then estimates how similar the representations in the probe CCTV image and the reference image are (the reference is usually the mugshot from the database, but sometimes other images are used as well). For the ear lobe, such descriptions could be, for instance: hanging, attached, partly attached. These descriptions, together with the estimated similarity between the suspect and the subject in the reference image, are summarized in an expert report that can be used in court. For good reasons, an expert report must be prepared by a human expert and not by an automated system. Consequently, current ear analysis concepts are highly optimized towards manual analysis.
In forensic identification, biometric identification systems are used for retrieving the most likely candidates from a large database. Current systems mainly rely on facial images for reducing the number of possible suspects. The reliability of these systems is reduced by pose variations and occlusions, but also by low image contrast and compression artefacts.
Automated identification systems with full 3D images as references can provide a pose-independent view of the subject. Because the pose of the reference image can be adjusted to the pose of the probe image, this can potentially result in more accurate estimates of the similarity. Such an estimate could also include the shape of the outer ear, especially in cases where only half profile views of the suspect are available. In such a scenario, ear recognition is a valuable complement to facial descriptors that enables forensic experts to use all the evidence available in the probe image.
As soon as a list of the most likely suspects is available, evidence may be manually collected with photogrammetry or superimposition of tracings. In photogrammetry, we measure the precise distances between landmarks in a pair of images. The analysis must ensure that only identical points are compared and, in case of pose variation, two images with the same pose are needed. If a 3D model of the subject is available, we could also use a re-rendered view of the model. In some cases it may also be possible to compensate for slight pose variations by applying affine transformations to the reference image. For superimposition of tracings, the outlines of the ear are extracted and then fitted onto another image (presumably from the database or another crime scene). Subsequently, the analyst checks how well the outlines of the two images match. When analyzing face images with this method, the analyst can also investigate the symmetry of two half faces from two different images. The techniques described above are currently applied mostly to facial images, but may, in principle, be used for any type of imagery, including ear images.
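The combination of landmark measurement and affine pose compensation can be illustrated with a minimal sketch. It assumes that corresponding landmarks have already been marked by hand in the reference and the probe image, fits a least-squares affine map from reference to probe, and reports the remaining distance per landmark. All function names and the coordinates in the usage note are hypothetical; this is not the procedure used by forensic experts or in this thesis.

```python
from math import hypot


def solve3(matrix, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            factor = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= factor * a[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        tail = sum(a[r][c] * x[c] for c in range(r + 1, 3))
        x[r] = (a[r][3] - tail) / a[r][r]
    return x


def fit_affine(src, dst):
    """Least-squares affine map: (u, v) = (a*x + b*y + c, d*x + e*y + f)."""
    rows = [(x, y, 1.0) for x, y in src]
    # Normal equations (A^T A) p = A^T t, solved once per output coordinate.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atu = [sum(r[i] * u for r, (u, _) in zip(rows, dst)) for i in range(3)]
    atv = [sum(r[i] * v for r, (_, v) in zip(rows, dst)) for i in range(3)]
    return solve3(ata, atu), solve3(ata, atv)


def apply_affine(params, points):
    """Map every (x, y) point through the fitted affine transformation."""
    (a, b, c), (d, e, f) = params
    return [(a * x + b * y + c, d * x + e * y + f) for x, y in points]


def landmark_residuals(reference, probe):
    """Distance between each probe landmark and its aligned reference landmark."""
    mapped = apply_affine(fit_affine(reference, probe), reference)
    return [hypot(mx - px, my - py) for (mx, my), (px, py) in zip(mapped, probe)]
```

For example, `landmark_residuals([(0, 0), (10, 0), (0, 20), (10, 20), (5, 8)], probe)` returns one distance per landmark. An affine map has six free parameters, so three landmarks can always be matched exactly; the residuals only become informative with more than three non-collinear landmarks, and collinear landmarks make the system unsolvable.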
1.3 Goals of This Work

This work aims at exploring new techniques for 2D and 3D ear recognition. We focus on, but are not limited to, forensic identification from CCTV footage. Instead of 2D mugshots, we assume that police stations have full 3D head models stored in their forensic databases.
With this background we investigate possibilities to combine 2D and 3D information with the goal of increasing the performance of ear recognition systems with respect to segmentation accuracy, normalization accuracy and recognition rates. We combine 2D and 3D information (rendered depth images) by exploiting the fact that depth and texture information are co-registered in rendered views of the 3D model, and we propose different ways in which these information channels can be combined. In order to measure the benefit of combining depth and texture information, we compare the performance of our algorithm with the performance achieved with 2D data or 3D data only. We further analyze the statistical properties of fixed length histogram features and propose a generic method for creating binary representations for a more efficient search technique. We apply these binary feature vectors in a sequential search approach, where the binary feature vectors are used for creating a short list of the most likely candidates and the real-valued features are used for refining the search within the short list. An additional focus is set on the impact of image quality (i.e. blur and noise) on segmentation and recognition performance. Finally, we explore the suitability of unsupervised clustering for classification of fixed length histogram features.
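The sequential search idea can be sketched as follows. The binarization rule used here (one bit per histogram bin, set when the bin exceeds the mean of its histogram) and the chi-square distance for the refinement step are assumptions made for illustration only; the binarization method proposed in this thesis is not described in this section. All names are hypothetical.

```python
def binarize(hist):
    """Pack one bit per bin into an int; a bit is set if the bin exceeds the mean.

    This thresholding rule is an assumed stand-in, not the thesis method.
    """
    mean = sum(hist) / len(hist)
    bits = 0
    for value in hist:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two packed binary templates."""
    return bin(a ^ b).count("1")


def chi_square(p, q):
    """Chi-square distance between two real-valued histograms."""
    return sum((x - y) ** 2 / (x + y) for x, y in zip(p, q) if x + y > 0)


def sequential_search(probe, gallery, shortlist_size):
    """Two-stage search: coarse Hamming ranking, then real-valued re-ranking.

    gallery maps a subject id to its real-valued histogram feature.
    """
    probe_bits = binarize(probe)
    # Stage 1: rank the whole gallery with the cheap binary comparison.
    coarse = sorted(gallery, key=lambda sid: hamming(probe_bits, binarize(gallery[sid])))
    shortlist = coarse[:shortlist_size]
    # Stage 2: refine only the shortlist with the real-valued features.
    return sorted(shortlist, key=lambda sid: chi_square(probe, gallery[sid]))
```

In a real system the binary templates of the gallery would be computed once at enrolment rather than on every query; they are recomputed here only to keep the sketch short.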
The goals of the thesis can be summarized with the following research questions:
Q1: How can the outer ear be automatically detected from 2D and 3D images?
Q2: How can cropped ear images be normalized with respect to rotation and scale?
Q3: Is it possible to combine 2D and 3D data in order to obtain a better descriptor that yields a better performance than 2D or 3D alone?
Q4: How can ear templates be represented in order to enable fast search operations in large datasets?
Q5: What impact does signal degradation have on the performance of ear recognition systems?
Q6: Is it possible to automatically find categories of ear images?
As an extension to our research results, we develop a demonstrator ear recognition module that is part of a multi-modal face and ear recognition system. This system is evaluated and tested using a challenging dataset that was collected and provided by forensic experts from the German criminal police. The dataset comprises 3D models as reference data, with mugshots and CCTV videos as probe data. It represents a typical scenario in forensic identification, where an unknown subject is to be identified from a video sequence. We explore the benefits and limitations of ear recognition in this scenario and point out future directions for forensic ear recognition systems.
1.4 Structure

This thesis is divided into three parts. In the remainder of this first part of the document, we will give an overview of the publications and contributions in the context of this work.

Subsequently, we give an overview of the GES-3D project, which was conducted in the context of this work, including an explanation of system requirements, the image capture system, the workflow of our biometric service provider and some concluding remarks on the overall performance of the system.
The structure of the second part of the document roughly follows the general workflow of a biometric pipeline as proposed in the ISO/IEC SC37 SD11 standard document (a brief summary can be found in Appendix B). The structure of this thesis is also summarized in Figure 1.2. The figure appears in each chapter of part II and is intended to guide the reader through the structure of this thesis and to maintain a link between the individual publications and the research questions (see Section 1.3).
We start with an elaborate overview of the state of the art. A brief update of this survey is given later in chapter C. We then turn to the initial segmentation step. For segmentation, we propose a novel ear detection method, where depth and texture information is combined and expressed as a number of shape descriptors. We select the shape that is in the largest cluster of the best-rated shapes in the image. We also propose a sliding window technique using a circular detection window and evaluate it with respect to its robustness against rotations.
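Why a circular detection window helps against rotation can be shown with a small sketch. The window score used here, the mean pixel value inside the disc, is only a placeholder for a real detector response and is an assumption of this example, as are all function names. The point of the sketch is that a disc covers the same set of pixels when the image is rotated about the window centre, which a square window does not.

```python
def disc_offsets(radius):
    """Pixel offsets (dy, dx) that lie inside a disc of the given radius."""
    return [(dy, dx)
            for dy in range(-radius, radius + 1)
            for dx in range(-radius, radius + 1)
            if dy * dy + dx * dx <= radius * radius]


def window_score(image, cy, cx, offsets):
    """Placeholder score: mean pixel value inside the circular window."""
    return sum(image[cy + dy][cx + dx] for dy, dx in offsets) / len(offsets)


def detect(image, radius):
    """Slide the circular window over the image; return (score, row, column)."""
    offsets = disc_offsets(radius)
    height, width = len(image), len(image[0])
    best = None
    for cy in range(radius, height - radius):
        for cx in range(radius, width - radius):
            score = window_score(image, cy, cx, offsets)
            if best is None or score > best[0]:
                best = (score, cy, cx)
    return best


def rotate90(image):
    """Rotate an image (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]
```

Running `detect` on an image and on `rotate90(image)` yields the same best score at the correspondingly rotated position. For rotations that are not multiples of 90 degrees, interpolation changes the pixel values slightly, so the invariance only holds approximately.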
The segmentation step is concluded with a geometric normalization approach that does not rely on any symmetry constraints. We show that the outer ear can be normalized with this approach by measuring the recognition rates of a simple texture descriptor. We then move forward to the feature extraction step and present an evaluation of different texture descriptors in combination with selected subspace projection techniques. We benchmark the parameter sets for selected texture descriptors with three different datasets. Moreover, we propose a new descriptor that creates a fixed length histogram descriptor from surface and texture data.

Figure 1.2: Illustration of the structure of this thesis. At the beginning of each chapter, we highlight one or several processing steps and the topics that are discussed.
Chapters 9, 10, 11 and 12 of the thesis concentrate on applications and further investigations on the basis of the aforementioned ear recognition system. We first propose a binarization method for histogram features and then focus on the impact of signal degradation, in the form of noise and blur, on the performance of segmentation and recognition.
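To make the notion of signal degradation concrete, the following sketch degrades a grey-scale image with blur and noise. It assumes Gaussian blur and additive Gaussian noise on pixel values in the range 0 to 255; this section does not specify which blur and noise models are used in the experiments, so both choices, the parameter names and the function names are assumptions.

```python
import math
import random


def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel with a radius of about three sigma."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(image, kernel):
    """Convolve every row with the kernel, clamping indices at the border."""
    radius = len(kernel) // 2
    width = len(image[0])
    return [[sum(k * row[min(max(x + i - radius, 0), width - 1)]
                 for i, k in enumerate(kernel))
             for x in range(width)]
            for row in image]


def transpose(image):
    """Swap rows and columns of an image stored as a list of rows."""
    return [list(col) for col in zip(*image)]


def gaussian_blur(image, sigma):
    """Separable Gaussian blur: filter the rows, then the columns."""
    kernel = gaussian_kernel(sigma)
    return transpose(blur_rows(transpose(blur_rows(image, kernel)), kernel))


def add_noise(image, sigma, seed=0):
    """Add zero-mean Gaussian noise and clip the result to [0, 255]."""
    rng = random.Random(seed)
    return [[min(255.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row]
            for row in image]
```

A degraded probe image could then be produced with `add_noise(gaussian_blur(image, 2.0), 10.0)` and passed to the segmentation or recognition step in place of the original. The fixed seed keeps such an experiment repeatable.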
Finally, we examine different texture feature spaces for clustering tendencies with the goal of providing an unsupervised classification scheme for ear biometrics.
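As a rough illustration of unsupervised classification on fixed length histogram features, the sketch below groups feature vectors with k-means. K-means is used only because it is a familiar clustering algorithm; this section does not state which algorithm or which test for clustering tendency is applied, so the choice, the deterministic farthest-point initialization and all names are assumptions.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(points):
    """Component-wise mean of a non-empty list of feature vectors."""
    return [sum(column) / len(points) for column in zip(*points)]


def kmeans(points, k, iterations=20):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        # Next centre: the point farthest from all centres chosen so far.
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centre.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        # Update step: each centre moves to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centers[j] = centroid(members)
    return labels, centers
```

Calling `kmeans(features, 2)` returns one cluster label per feature vector together with the cluster centres. Whether such clusters correspond to meaningful categories of ears depends on whether the feature space shows any clustering tendency in the first place, which is the question examined in the thesis.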
The thesis is concluded with part III. This part summarizes the findings of part II and gives an outlook on future work and remaining challenges for 2D and 3D ear recognition.
1.4.1 List of Publications

1.4.1.1 Attached Research Articles

• ANIKA PFLUG, CHRISTOPH BUSCH, Ear Biometrics - A Survey of Detection, Feature Extraction and Recognition Methods, IET Biometrics, Volume 1, Number 2, pp. 114-129
• ANIKA PFLUG, ADRIAN WINTERSTEIN, CHRISTOPH BUSCH, Ear Detection in 3D Profile Images Based on Surface Curvature, International Conference on Intelligent Information Hiding and Multimedia Signal Processing (IIH-MSP), 2012
• ANIKA PFLUG, PHILIP MICHAEL BACK, CHRISTOPH BUSCH, Towards making HCS Ear detection robust against rotation, International Carnahan Conference in Security Technology (ICCST), 2012
• ANIKA PFLUG, CHRISTOPH BUSCH, Segmentation and Normalization of Human Ears using Cascaded Pose Regression, Nordic Conference on Secure IT Systems (NordSec), 2014
• ANIKA PFLUG, PASCAL N. PAUL AND CHRISTOPH BUSCH, A comparative Study on Texture and Surface Descriptors for Ear Biometrics, International Carnahan Conference in Security Technology (ICCST), 2014
• JOHANNES WAGNER, ANIKA PFLUG, CHRISTIAN RATHGEB, CHRISTOPH BUSCH, Effects of Severe Signal Degradation on Ear Detection, 2nd International Workshop on Biometrics and Forensics (IWBF), 2014
• ANIKA PFLUG, JOHANNES WAGNER, CHRISTIAN RATHGEB AND CHRISTOPH BUSCH, Impact of Severe Signal Degradation on Ear Recognition Performance, Biometrics, Forensics, De-identification and Privacy Protection (BiForD), 2014
• ANIKA PFLUG, ARUN ROSS, CHRISTOPH BUSCH, 2D Ear Classification Based on Unsupervised Clustering, In Proceedings of International Joint Conference on Biometrics (IJCB), 2014
• ANIKA PFLUG, CHRISTIAN RATHGEB, ULRICH SCHERHAG, CHRISTOPH BUSCH, Binarization of Histogram Models: An Application to Efficient Biometric Identification, Conference on Cybernetics (CYBCONF), 2015

1.4.1.2 Additional Research Articles

• CHRISTOPH BUSCH, ANIKA PFLUG, XUEBING ZHOU, MICHAEL DOSE, MICHAEL BRAUCKMANN, JÖRG HELBIG, ALEXANDER OPEL, PETER NEUGEBAUER, KATJA LEOWSKI, HARALD SIEBER, OLIVER LOTZ, Multi-Biometrische Gesichtserkennung, 13.