«Ear Recognition Biometric Identiﬁcation using 2- and 3-Dimensional Images of Human Ears Anika Pﬂug Thesis submitted to Gjøvik University College ...»
Similarly to the segmentation experiment, we must conclude that the accuracy of the pose estimation is not satisfactory. Even though we expect to find only full profile images, the pose estimator returns a different (and thus wrong) result in 55% of the cases. Based on these results, we decided not to use the pose estimator at all in the final system. We believe that the pose estimation could be improved if we trained a detector using the video frames from our scenario.
2. THE GES-3D PROJECT
2.6.3 Algorithm Performance of Ear Recognition
We evaluate the algorithm performance in two evaluation settings that differ in the selection of the input data. In the first setting, we use manually cropped ROIs, which means that there is no segmentation error. The second setting uses input data that has been labelled with quality level L1; these images therefore contain a normalized image of the ear. We use 3D models as references and video data as probes. The CMC curves of the two evaluation settings are shown in Figure 2.17.
1 http://mozart.dis.ulpgc.es/Gias/MODESTO/DetectDemousingViolaJonesxmlfiles.zip
2 http://www.ics.uci.edu/~xzhu/face/
3 http://www.multipie.org/
We observe that the performance on the two data sets is similar. The difference in the slope of the curves is due only to the smaller number of samples in the L1 set. The recognition performance is lowered by differences in pose and by the low image quality of the ear ROIs. As already mentioned in the section on detection performance, super resolution could be applied in order to obtain ROIs with a higher quality (i.e. sharper edges, less blur through block artefacts).
When looking at the recognition performance, we must conclude that the high recognition rates from the baseline experiments (see Chapter 8) could not be achieved with our test data and the GES-3D system. The analysis of the system performance in the following section discusses a number of reasons for this.
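For readers unfamiliar with CMC curves, the following is a minimal sketch of how a rank-based identification curve can be derived from a matrix of comparison scores. It is not the evaluation code used in GES-3D. The function name `cmc_curve`, the score layout, and the toy scores are assumptions made for illustration, and the sketch assumes a closed-set scenario in which every probe identity is present among the references.

```python
def cmc_curve(scores, probe_ids, gallery_ids, higher_is_better=True):
    """Return the cumulative match characteristic as a list of rates.

    scores[i][j] is the comparison score between probe i and reference j.
    Entry k-1 of the result is the rank-k identification rate.
    Assumption: closed set, i.e. every probe identity occurs in gallery_ids.
    """
    n_gallery = len(gallery_ids)
    hits = [0] * n_gallery
    for row, true_id in zip(scores, probe_ids):
        # Sort reference indices from best to worst match for this probe.
        order = sorted(range(n_gallery), key=lambda j: row[j],
                       reverse=higher_is_better)
        ranked_ids = [gallery_ids[j] for j in order]
        # 0-based rank at which the true identity first appears.
        hits[ranked_ids.index(true_id)] += 1
    # Accumulate: the rank-k rate counts every probe matched at rank <= k.
    curve, total = [], 0
    for count in hits:
        total += count
        curve.append(total / len(probe_ids))
    return curve


# Hypothetical similarity scores for three probes against three references.
toy_scores = [
    [0.9, 0.2, 0.1],
    [0.3, 0.4, 0.8],
    [0.5, 0.1, 0.6],
]
curve = cmc_curve(toy_scores, ["a", "b", "c"], ["a", "b", "c"])
print(curve)
```

A curve that rises slowly with the rank, as observed for the GES-3D data, means that the correct identity is often found only far down the candidate list.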
2.6.4 System Performance
The system performance experiment treats the entire ear recognition system as a black box. We evaluate the system using 3D images as references and videos or mugshots as probes. We also provide a baseline performance for the existing system, where mugshots are used as references and video streams are used as probes. The error rates reported in this experiment are composed of the failure to enrol (FTE) and the false match/false non-match rate (see Appendix B).
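As an illustration of how an enrolment failure rate and algorithm-level error rates can be combined into system-level rates, the sketch below follows a convention that is common in biometric performance reporting. The exact definitions used in this work are given in Appendix B, so the formula, the function name `system_error_rates`, and the example values are assumptions.

```python
def system_error_rates(fte, fmr, fnmr):
    """Combine enrolment failures with algorithm-level error rates.

    Assumed convention: a sample that fails to enrol can never be matched,
    so it counts as a false non-match and cannot cause a false match.
    All arguments are rates in the interval [0, 1].
    """
    for name, value in (("fte", fte), ("fmr", fmr), ("fnmr", fnmr)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    system_fnmr = fte + (1.0 - fte) * fnmr
    system_fmr = (1.0 - fte) * fmr
    return system_fmr, system_fnmr


# Hypothetical rates, not results from the GES-3D evaluation.
sys_fmr, sys_fnmr = system_error_rates(fte=0.1, fmr=0.02, fnmr=0.2)
print(sys_fmr, sys_fnmr)
```

Under this convention, a high FTE raises the system-level false non-match rate even if the comparison algorithm itself performs well, which is why both components are reported together.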
We deﬁne the following three test cases. The performance rates for the test cases are summarized in Figure 2.18. These results could be reproduced in the black-box test on unseen data (the remaining 100 subjects), which was conducted by our project partner Fraunhofer IGD.
• 3D references and probes from video: 200 rendered right ear images from 200 subjects as references and 200 video files with camera viewpoint 3 (one video per subject with 4 frames per second) as probe images. The remaining three viewpoints were excluded from the experiment because the ear regions were too small (viewpoint 1) or the quality was reduced through interlacing (viewpoint 4) and motion blur (viewpoint 2).
• 3D references and probes from mugshots: 400 rendered ear images from left and right ears of 200 subjects as reference and the corresponding 400 mugshot images from 5-part sets (also showing left and right ears of 200 subjects).
• Mugshots as references and probes from video: 200 right profile mugshots from 200 subjects as reference images and 200 video files with camera viewpoint 3 (one video per subject with 4 frames per second) as probe images. This test case represents the typical search operation as it is done in current forensic identification systems.
In general, the system performance evaluation is consistent with the algorithm performance. The performance rates in all test cases are poor (close or equal to chance level), especially for test cases where the rendered 3D images are used as references. We conclude that segmentation errors play a minor role in the overall system performance, though they might have been more influential if the overall system performance were higher. The GES-3D system is the first of its kind and may be regarded as a reference system for future research projects on the practical application of ear biometrics.
Figure 2.18: Cumulative match characteristic (CMC) of the system performance for different combinations of media types.
We also observe a noticeable difference between the CMC curve of the test case using 3D images as references (black curve) and the test case using the mugshots as references (dotted curve). This may have several reasons:
• Normalization errors: The normalization algorithm may have introduced a small alignment error. In conjunction with the local histogram model that is used for comparison, these normalization errors have a considerable impact on the comparison scores and hence on the recognition performance.
• Resolution and quality: A full frame from the surveillance camera is of high quality; however, the distance to the subject is so large that the ear region is small. As a result, the resolution of the ear images from the video is low, which leads to blurry and noisy images. Some frames also contain strong motion blur and compression artefacts. These degradations of the texture information significantly lower the recognition performance. Rendered 3D models may also contain deficiencies from the 3D representations, such as missing surface patches.
• Cross-media / cross-sensor comparisons: Although using 3D data as references has the potential to improve the robustness to pose variations, different imaging technologies are used for probe and reference images. This means that we have to deal with cross-sensor artefacts (including different image compression techniques). As a consequence, feature vectors from rendered 3D texture have different properties than feature vectors from the video stream. We must assume that the feature vectors contain a bias that is introduced by the properties of the imaging technique.
• Pose variations: Pose variations remain a problem in the current system. The poses in the rendered 3D images and the video frames are slightly different, which is an important factor that lowers the performance of the system. The impact of pose variations can be clearly quantified by comparing the performance of comparisons between mugshots and rendered 3D images on the one hand with that of comparisons between 3D images and video frames on the other hand.
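The interaction between small alignment errors and a local histogram model, named in the first item of the list above, can be illustrated with a toy example. The sketch below is not the descriptor used in GES-3D: it uses block-wise histograms of raw intensities as a stand-in for a texture descriptor, a chi-square distance as the comparison function, and a synthetic striped image. All function names, parameters, and data are assumptions chosen for illustration.

```python
def block_histograms(image, block=4, bins=4, levels=256):
    """Concatenate per-block intensity histograms of a 2D list of ints."""
    height, width = len(image), len(image[0])
    descriptor = []
    for top in range(0, height - block + 1, block):
        for left in range(0, width - block + 1, block):
            hist = [0] * bins
            for y in range(top, top + block):
                for x in range(left, left + block):
                    # Quantize the intensity into one of `bins` bins.
                    hist[image[y][x] * bins // levels] += 1
            descriptor.extend(hist)
    return descriptor


def chi_square(a, b):
    """Chi-square distance between two histograms of equal length."""
    return sum((p - q) ** 2 / (p + q) for p, q in zip(a, b) if p + q > 0)


def shift_right(image, pixels):
    """Shift an image horizontally, repeating the left border column."""
    return [[row[max(x - pixels, 0)] for x in range(len(row))] for row in image]


def shift_sensitivity(shifts=(0, 1, 2)):
    """Distance between a reference and horizontally shifted copies of it."""
    # Synthetic 16x16 image with vertical stripes that are 4 pixels wide.
    image = [[255 if (x // 4) % 2 == 0 else 0 for x in range(16)]
             for _ in range(16)]
    reference = block_histograms(image)
    return [chi_square(reference, block_histograms(shift_right(image, s)))
            for s in shifts]


distances = shift_sensitivity()
for shift, distance in zip((0, 1, 2), distances):
    print(f"shift={shift}px distance={distance:.3f}")
```

Because each histogram only summarizes the pixels inside its own block, a shift of one or two pixels moves image structure across block borders and changes the descriptor, even though the image content is identical. The effect is exaggerated here because the stripes coincide with the block grid.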
2.7 Conclusion
The performance rates obtained with the GES-3D datasets are far behind the biometric performance we obtained using laboratory data (see Chapter 8). Although the term “real-life data” is frequently used in academic literature, many of the described algorithms and systems are tuned towards the dataset and its underlying constraints. Our results clearly indicate that the performance rates of ear recognition for a more realistic dataset are far behind the performance that is reported for academic data. Keeping in mind that ear recognition is a valuable complement to the forensic analysis of facial images, this should be a motivation to continue working on ear recognition in surveillance scenarios. We are aware that our data was also collected in a specified laboratory setting and that true “real-life data” is likely to be even more challenging. The ear recognition module also lags behind what the face recognition modules from our project partners already achieve. Within the limitations in resolution and pose variations (self-occlusion), the ear recognition module in GES-3D could certainly be improved.
The outer ear is a promising biometric characteristic, in particular for forensic identification.
The results from GES-3D stress the strong connection between the capture scenario and the performance that can be expected from the recognition system. Even though we achieved high recognition performance in our experiments using academic datasets, we could not reproduce these results with the GES-3D system. Ear recognition from CCTV footage needs further research efforts, which should particularly focus on the question of how pose variations affect the appearance of the outer ear. Moreover, the availability of a suitable dataset would be a valuable contribution to this goal. Ear recognition systems could also benefit from existing technology for face recognition, especially in unconstrained scenarios, where the face recognition community is several steps ahead.
Ear Biometrics: A Survey of Detection, Feature Extraction and Recognition Methods
This paper provides an elaborate overview of the state of the art in ear recognition in 2012, when this project was launched. It is intended to answer research question Q0: What is the current state of the art in ear recognition?
This work gives an overview of available databases and compares a large selection of previous work on segmentation and recognition with respect to the approaches and their performance indicators. It concludes with a section that outlines future challenges in the field. Please refer to Appendix C for an additional survey of the progress in the field since the publication of this paper.
The paper was published as: ANIKA PFLUG, CHRISTOPH BUSCH, Ear Biometrics - A Survey of Detection, Feature Extraction and Recognition Methods, IET Biometrics, Volume 1, Number 2, pp. 114-129.
The possibility of identifying people by the shape of their outer ear was ﬁrst discovered by the French criminologist Bertillon, and reﬁned by the American police ofﬁcer Iannarelli, who proposed a ﬁrst ear recognition system based on only seven features.
The detailed structure of the ear is not only unique, but also permanent, as the appearance of the ear does not change over the course of a human life. Additionally, the acquisition of ear images does not necessarily require a person’s cooperation but is nevertheless considered to be non-intrusive by most people.
Because of these qualities, the interest in ear recognition systems has grown significantly in recent years. In this survey, we categorize and summarize approaches to ear detection and recognition in 2D and 3D images. Then, we provide an outlook on possible future research in the field of ear recognition in the context of smart surveillance and forensic image analysis, which we consider to be the most important applications of ear recognition in the near future.
3.1 Introduction
As there is an ever-growing need to automatically authenticate individuals, biometrics has been an active field of research over the course of the last decade. Traditional means of automatic recognition, such as passwords or ID cards, can be stolen, faked, or forgotten.
Biometric characteristics, on the other hand, are universal, unique, permanent, and measurable.
The characteristic appearance of the human outer ear (or pinna) is formed by the outer helix, the antihelix, the lobe, the tragus, the antitragus, and the concha (see Figure 3.1).
The numerous ridges and valleys on the outer ear’s surface serve as acoustic resonators.
For low frequencies the pinna reﬂects the acoustic signal towards the ear canal. For high frequencies it reﬂects the sound waves and causes neighboring frequencies to be dropped.
Furthermore, the outer ear enables humans to perceive the origin of a sound.
The shape of the outer ear evolves during the embryonic stage from six growth nodules.
Its structure, therefore, is not completely random, but still subject to cell segmentation. The influence of random factors on the ear’s appearance can best be observed by comparing the left and the right ear of the same person. Even though the left and the right ear show some similarities, they are not symmetric.
The shape of the outer ear has long been recognized as a valuable means for personal identification by criminal investigators. The French criminologist Alphonse Bertillon was the first to recognize the potential of the ear for human identification, more than a century ago. In his studies regarding personal recognition using the outer ear in 1906, Richard Imhofer needed only four different characteristics to distinguish between 500 different ears. Starting in 1949, the American police officer Alfred Iannarelli conducted the first large-scale study on the discriminative potential of the outer ear. He collected more than 10 000 ear images and determined 12 characteristics needed to unambiguously identify a person. Iannarelli also conducted studies on twins and triplets, discovering that ears are unique even among genetically identical persons. Even though Iannarelli’s work lacks a complex theoretical basis, it is commonly believed that the shape of the outer ear is unique. The studies in  and  show that all ears of the investigated databases possess individual characteristics, which can be used for distinguishing between them.
Because of the lack of a sufﬁciently large ear database, these studies can only be seen as hints, not evidence, for the outer ear’s uniqueness.
Research on time-related changes in the appearance of the outer ear has shown that the ear changes slightly in size when a person ages. This is explained by the fact that, with ageing, the microscopic structure of the ear cartilage changes, which reduces the skin elasticity. A first study on the effect of short periods of time on ear recognition shows that the recognition rate is not affected by ageing. It must, however, be mentioned