Ear Recognition
Biometric Identification using 2- and 3-Dimensional Images of Human Ears
Anika Pflug
Thesis submitted to Gjøvik University College
for the degree of
Doctor of Philosophy in Information Security
Faculty of Computer Science and Media Technology
Gjøvik University College
Ear Recognition - Biometric Identification using 2- and 3-Dimensional Images of Human Ears / Anika Pflug
Doctoral Dissertations at Gjøvik University College 2-2015
ISBN: 978-82-8340-007-6 ISSN: 1893-1227

We choose to go to the moon in this decade and do the other things, not because they are easy, but because they are hard.
(J. F. Kennedy)

Declaration of Authorship
I, Anika Pflug, hereby declare that this thesis and the work presented in it is entirely my own. Where I have consulted the work of others, this is always clearly stated.
Summary
The outer ear is an emerging biometric trait that has drawn the attention of the research community for more than a decade. The unique structure of the auricle has long been known among forensic scientists and has been used for the identification of suspects in many cases.
The next logical step towards a broader application of ear biometrics is to create automatic ear recognition systems.
This work focuses on the usage of texture (2D) and depth (3D) data for improving the performance of ear recognition. We compare ear recognition systems using either texture or depth data with respect to segmentation and recognition accuracy, but also in the context of robustness to pose variations, signal degradation and throughput. We propose a novel segmentation method for ears where texture and surface information are fused in the feature space. We also provide a reliable method for geometric normalization of ear images and present a comparative study of different texture description methods and the impact of their parametrization and the capture settings of a dataset. In this context, we propose a fusion scheme, where fixed-length spectral histograms are created from texture and surface information.
The proposed ear recognition system is integrated into a demonstrator system as a part of a novel identification system for forensics. The system is benchmarked against a challenging dataset that comprises 3D head models, mugshots and CCTV videos from four different perspectives. As a result of this work, we outline limitations of current ear recognition systems and provide possible directions for future applied research.
Having a complete ear recognition system with optimized parameters, we measure the impact of image quality on the accuracy during ear segmentation and ear recognition. These experiments focus on noise, blur and compression artefacts and are hence only conducted on 2D data. We show that blur has a smaller impact on the system performance than noise.
In scenarios where we work with compressed images, we show that the performance can be improved by optimizing the size of local image patches for feature extraction and the size of the compression artefacts.
This thesis is concluded by work on automatic classification of ears for the purpose of narrowing the search space in large datasets. We show that classification of ears using texture descriptors is possible. Furthermore, we show that the class label is influenced by the skin tone, but also by the capture settings of the dataset. In further work, we propose a method for the extraction of binary feature vectors of texture descriptors and their application in a 2-stage search system. We show that our 2-stage system improves the recognition performance, because it removes images from the search space that would otherwise have caused recognition errors in the second stage.
Acknowledgements
I would like to express my gratitude towards CASED and Hochschule Darmstadt, who hosted me for four years and provided me with everything I needed for conducting my research. I am grateful for the opportunities I had at Gjøvik University College, where my research in biometrics started back in 2010 with my master's thesis on vein recognition. This would not have been possible without the grateful and unprejudiced support of Christoph Busch and Daniel Hartung. During the entire time of my studies, I received invaluable feedback and guidance from Christoph.
Further, I would like to say thank you to all the project partners in GES-3D and my coworkers and lab mates Chris Stein and Xuebing Zhou. It was a pleasure working with you and I really do not know how the project would ever have been completed without your commitment.
During my work at Hochschule Darmstadt, I was happy to supervise Adrian, Johannes, Philip and Ulrich, who were excellent students. All of them have done great work and provided valuable support for my research. Along with these great students, I am grateful to be part of the da/sec research group, where I worked together with great lab mates, such as Andreas, Björn, Christan D., Christan R., Jessica and Martin.
I would also like to thank Prof. Arun Ross for giving me the opportunity to visit his lab at Michigan State University. Thanks to my lab mates Ajita, Antitza, Asem and Thomas, I had a wonderful time in Michigan.
Special thanks go to my family for their support during my entire life. Finally, I would like to thank Kim, who accompanied me through all the years and gave meaning to so many things.
11.1 Data acquisition scenario
11.2 Maximum intensities of blur and/or noise
11.3 EER and IR for different intensities of blur, noise and combination of these
9.1 Properties of the Poly-U palmprint database and the UND-J2 ear database and the number of resulting identification attempts
9.2 Identification rates and hit rates for various values of L (in %) for PolyU-MS (top) and UND-J2 (bottom) and feature extraction algorithms using k most reliable bits during comparison
1.1 Introduction
Originally used for forensic identification, biometric systems have evolved from a tool for criminal investigation into a range of commercial applications. Traditional means of automatic recognition such as passwords or ID cards can be stolen, faked or forgotten. Contrary to this, a biometric characteristic should be universal, unique, permanent, measurable, high-performing and acceptable for users, and it should be as hard as possible to circumvent the biometric authentication system. Today we find fingerprint recognition systems in cellphones, laptops and front doors. With the increased dissemination of biometric systems, the recognition performance in the field of forensics has also improved in recent years.
Biometric systems have helped to identify the two suspects of the Boston Bombings in 2013 and a suspect in a series of gas station robberies in the Netherlands. In the latter case, ear recognition played an important role in the identification of the suspect.
The outer ear as a biometric characteristic has long been recognized as a valuable means for personal identification by criminal investigators. The French criminologist Alphonse Bertillon was the first to investigate the potential of ears for human identification, more than one century ago in 1890. In 1893 Conan Doyle published an article in which he describes particularities of the ears of selected famous people and argues that the outer ear, just like the face, reflects properties of a person's character. In his studies about personal recognition using the outer ear in 1906, Richard Imhofer only needed four different characteristics to distinguish 500 different ears. Later on, in 1964, the American police officer Alfred Iannarelli collected more than 10 000 ear images and determined 12 characteristics to identify a subject. Iannarelli also conducted studies on twins and triplets, where he discovered that ears are unique even among genetically identical subjects. Scientific studies confirm Iannarelli's assumption that the shape of the outer ear is unique, and it is also assumed to be stable throughout a human lifetime.
Numerous research papers from the last decade have shown that the biometric recognition performance achievable with automated ear recognition systems is competitive with that of face recognition systems. Lei et al. have also shown that the outer ear can be used for gender classification.
In many criminal cases there is no further evidence than a CCTV video where we see the perpetrator committing a crime. Such footage can only be used as evidence in court, if the perpetrator’s identity can be determined from the CCTV footage without a doubt.
Criminal investigators usually try to combine information coming from witnesses or the victim with cues from the CCTV footage. Biometric identification is a helpful tool for this task; however, the uncontrolled conditions in CCTV videos remain a difficult setting for every automated identification system. For this purpose, national police offices maintain a database of criminals with mugshots that serve as the biometric reference during an automatic search. Typical difficulties for automatic identification in this particular use case are, for example, pose variations, sparsely lit scenes, image compression, blur and noise.
If there is no further information available but the CCTV footage, the German criminal police (Bundeskriminalamt, or BKA for short) tries to identify suspects with the help of an automatic face recognition system called GES (an abbreviation of "Gesichtserkennungssystem", which is German for face recognition system). The reference images for GES are collected by criminal police officers during crime investigation processes and are stored in a central database that is maintained on the premises of the Bundeskriminalamt (BKA) in Wiesbaden. A set of reference images consists of a series of mugshots that contains at least one frontal portrait image, a half profile and a full profile image (from the right side). Newly collected datasets follow a new standard, where left and right half profile, as well as left and right full profile images are acquired.

Figure 1.1: CCTV footage of a man who stole money from several ATMs in the Rhein-Main area. The right image is a close-up of the video frame on the left. This image was taken in a bank in Frankfurt. Note that face identification is impossible, because the subject is wearing a baseball cap. (image source: BKA. The case was closed in August 2014, but the video is still available at )

In order to protect CCTV equipment from vandalism, cameras are often installed in corners or underneath the ceiling. Figure 1.1 shows an example of such video footage from a real case, where we see a man leaving a bank after he robbed an ATM. CCTV cameras are usually arranged to deliver face images. Perpetrators who are aware of the presence of a CCTV camera will avoid looking directly into the camera and sometimes wear hats in order to cover their faces. This means that investigators frequently have to work with half profile or profile views where the face can be partly or fully occluded. In such scenarios, ear recognition can be a valuable complement to existing face recognition systems for identifying the subject.
As long as the video contains a single frame where the subject's face is clearly visible from one of the angles that match one of the reference mugshots, automatic identification has a chance to succeed. In practice, however, clear images from a defined pose are rarely available. Typical countermeasures against being identified are baseball caps, hats or scarves. Additionally, the resolution of surveillance cameras in relation to the monitored region can be small and we may encounter additional signal degradations, such as interlacing and blur.
In a small study, which was conducted at Darmstadt central station between September 2013 and June 2013, we tried to estimate the average probability that the outer ear is visible in public. The observation took place in the entrance hall of the train station where people walk through a door. At different times during the day, we counted the number of visible and occluded ears. We also made notes about the type of occlusion and the weather outside.
We observed that the ear was fully visible in 46% of the cases (5431 observations in total).
At the same time, the probability of occlusion is highly dependent on gender. Whereas the ears of women were only fully visible in 26.03% of the cases, the ears of men were fully visible in 69.68% of the cases. More details about this study can be found in Appendix A. With forensic identification of suspects in mind, this result is still encouraging because 74.3% of the suspects in Germany in 2013 were male.
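The reasoning behind this observation can be made explicit with a short calculation. The following sketch is only an illustration and not part of the original study: it assumes that the gender-specific visibility rates observed at the train station would carry over unchanged to a population of suspects, which the study itself does not claim. The function names are hypothetical.

```python
# Visibility rates reported in the text, as fractions of observations.
P_VISIBLE_MALE = 0.6968
P_VISIBLE_FEMALE = 0.2603
P_VISIBLE_OVERALL = 0.46


def mixed_visibility(male_share: float) -> float:
    """Expected share of fully visible ears for a given share of men.

    Assumption: the per-gender visibility rates stay the same in the
    population under consideration.
    """
    return male_share * P_VISIBLE_MALE + (1.0 - male_share) * P_VISIBLE_FEMALE


def implied_male_share(overall: float) -> float:
    """Share of men that would reproduce a given overall visibility rate."""
    return (overall - P_VISIBLE_FEMALE) / (P_VISIBLE_MALE - P_VISIBLE_FEMALE)


# Share of men among the observations that is consistent with the 46% figure.
station_male_share = implied_male_share(P_VISIBLE_OVERALL)

# Expected visibility if 74.3% of the observed persons were male.
suspect_visibility = mixed_visibility(0.743)

print(f"implied share of men at the station: {station_male_share:.3f}")
print(f"expected ear visibility among suspects: {suspect_visibility:.3f}")
```

Under these assumptions, the expected share of fully visible ears rises from 46% in the station sample to roughly 58% for a population in which 74.3% of the persons are male.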
Table 1.1: Level of support for imagery as evidence in court as agreed upon by the FIAG.
The table is taken from 
1.2 Forensic Identification Using Ear Images
In order to be valid in court, any imagery should provide objective evidence that is independently measurable and verifiable. Many courts require an independent expert to estimate the strength of the evidence within an image. Without such an expert opinion, CCTV footage is not accepted as valid evidence. Table 1.1 shows an example of different levels of certainty, which comply with the standard of the Forensic Imagery Analysis Group (FIAG). In an expert opinion, a certainty level is assigned to each trait in the image.