Ear Recognition: Biometric Identification using 2- and 3-Dimensional Images of Human Ears. Anika Pflug. Thesis submitted to Gjøvik University College.
Another example of ear detection using contour lines of the ear is described by Attarchi et al. They locate the outer contour of the ear by searching for the longest connected edge in the edge image. By selecting the top, bottom, and left points of the detected boundary, they form a triangle. The barycenter of this triangle is then computed and used as the reference point for image alignment. Ansari et al. also use an edge detector in the first step of their ear localization approach. The edges are separated into two categories, namely convex and concave. Convex edges are chosen as candidates for representing the outer contour. Finally, the algorithm connects the curve segments and selects the figure enclosing the largest area as the outer ear contour.
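The alignment step described above can be sketched in a few lines. The contour points and function names below are illustrative and not taken from the original implementation; the sketch only assumes that the longest connected edge has already been extracted as a list of points.

```python
import numpy as np

def alignment_reference(contour):
    """Reference point for alignment: barycenter of the triangle formed by
    the top, bottom and left extreme points of the ear contour.
    contour: (N, 2) array of (x, y) edge points."""
    contour = np.asarray(contour, dtype=float)
    top = contour[np.argmin(contour[:, 1])]     # smallest y -> topmost point
    bottom = contour[np.argmax(contour[:, 1])]  # largest y -> bottommost point
    left = contour[np.argmin(contour[:, 0])]    # smallest x -> leftmost point
    # barycenter (centroid) of the triangle formed by the three extremes
    return (top + bottom + left) / 3.0

# toy contour roughly tracing an ear-shaped closed curve
pts = [(5, 0), (8, 3), (9, 6), (7, 9), (4, 10), (2, 7), (1, 4), (3, 1)]
ref = alignment_reference(pts)
```

Because the barycenter averages three widely separated extreme points, it is fairly stable against small errors in the detected boundary.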
It should be noted that the IITK and USTB II databases already contain cut-out ear images. Hence it is questionable whether the detection rates of 93.34% and 98.05% can be reproduced under realistic conditions.
A recent approach to 2D ear detection using edges is described by Prakash and Gupta in . They combine skin segmentation with the categorization of edges into convex and concave ones. Afterwards the edges in the skin region are decomposed into edge segments. These segments are composed to form an edge connectivity graph. Based on this graph, the convex hull of all edges which are believed to belong to the ear is computed. The enclosed region is then labeled as the ear region. In contrast to , Prakash and Gupta prove the feasibility of edge-based ear detection on full profile images, where they achieve a detection rate of 96.63% on a subset of the UND-J2 collection. In , the same edge connectivity graph is proposed for ear detection on 3D images. Instead of intensity edges, discontinuities in the depth map are used for extracting the initial edge image, from which the connectivity graph is then derived. In their experiments, the authors use the 3D representations of the same subset as in  and report a detection rate of 99.38%. Moreover, they show that the detection rate of their graph-based approach is not influenced by rotation and scale.
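The convex-hull stage can be illustrated with Andrew's monotone chain algorithm. The point set below is synthetic, and the function is only a sketch of this single stage, not of the full connectivity-graph pipeline of Prakash and Gupta:

```python
def convex_hull(points):
    """Andrew's monotone chain; points: iterable of (x, y) tuples.
    Returns the hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); <= 0 means a clockwise turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:                       # build lower hull left to right
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):             # build upper hull right to left
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]      # endpoints appear in both halves

# edge points assigned to the ear; interior points vanish from the hull
hull = convex_hull([(0, 0), (4, 0), (4, 4), (0, 4), (2, 2), (1, 3)])
```

The region enclosed by the returned polygon would then be labeled as the ear region.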
Jeges and Máté propose another edge-based ear detection approach, which is likely inspired by fingerprint recognition techniques. They train a classifier with orientation patterns that were previously computed from ear images. Like other naive classifiers, their method is not robust against rotation and scale. Additionally, the classifier is likely to fail under large pose variations, because these affect the appearance of the orientation patterns.
Abaza et al.  and Islam et al.  use weak classifiers based on Haar wavelets in combination with AdaBoost for ear localization. According to Islam et al., training the classifier takes several days; however, once the classifier is set up, ear detection is fast and effective. Abaza et al. use a modified version of AdaBoost and report a significantly shorter training phase. The effectiveness of their approach is demonstrated in evaluations on five different databases. They also include some examples of successful detections on images from the internet. As long as the subject's pose does not change, weak classifiers are suitable for images which contain more than one subject. Depending on the test set, Abaza et al. achieved detection rates between 84% and 98.7% on the Sheffield Face database. On average, their approach successfully detected 95% of all ears.
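As an illustration of the boosting principle behind such detectors, the following toy sketch trains AdaBoost over decision stumps on scalar feature responses. Real Haar-feature training operates on integral images over many thousands of windows; none of the names, data or parameter values below come from Abaza et al. or Islam et al.

```python
import numpy as np

def train_adaboost(X, y, rounds=10):
    """X: (n, d) feature responses (stand-ins for Haar features), y in {-1, +1}."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)              # uniform sample weights to start
    stumps = []
    for _ in range(rounds):
        best = None
        for j in range(d):               # exhaustively pick the best stump
            for thr in np.unique(X[:, j]):
                for sign in (1, -1):
                    pred = sign * np.where(X[:, j] >= thr, 1, -1)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, thr, sign)
        err, j, thr, sign = best
        err = max(err, 1e-10)            # avoid log(0) for perfect stumps
        alpha = 0.5 * np.log((1 - err) / err)
        pred = sign * np.where(X[:, j] >= thr, 1, -1)
        w *= np.exp(-alpha * y * pred)   # boost the misclassified samples
        w /= w.sum()
        stumps.append((alpha, j, thr, sign))
    return stumps

def predict(stumps, X):
    score = sum(a * s * np.where(X[:, j] >= t, 1, -1) for a, j, t, s in stumps)
    return np.sign(score)

# toy data: feature 0 separates "ear" (+1) from "non-ear" (-1) windows
X = np.array([[0.9, 0.2], [0.8, 0.7], [0.1, 0.9], [0.2, 0.1]])
y = np.array([1, 1, -1, -1])
clf = train_adaboost(X, y, rounds=5)
```

Each weak learner is trivial on its own; the weighted vote of many of them yields the strong classifier used in cascade detectors.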
Yan and Bowyer developed an ear detection method which fuses range images and corresponding 2D color images . Their algorithm starts by locating the concha and then uses active contours for determining the ear's outer boundary. The concha serves as the reference point for placing the starting shape of the active contour model. Even though the concha is easy to localize in profile images, it may be occluded if the head pose changes or if a subject is wearing a hearing aid or ear phones. In their experiments, Yan and Bowyer only use ear images with minor occlusions where the concha is visible; hence it could neither be proved nor disproved whether their approach is capable of reliably detecting ears if the concha is occluded.
Yuan and Mu developed a method for real-time ear tracking by applying Continuously Adaptive Mean Shift (CAMSHIFT) to video sequences . The CAMSHIFT algorithm is frequently used in face tracking applications and is based on region matching and a skin color model. For precise ear segmentation, a contour fitting method based on the modified active shape models proposed by Alvarez et al. is applied . Yuan and Mu report a detection rate of 100%; however, the test database only consisted of two subjects. Nevertheless, their approach appears to be very promising for surveillance applications, but it needs to be further evaluated in more realistic test scenarios.
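The core of CAMSHIFT is a mean-shift iteration over a skin-color back-projection map: a search window is moved repeatedly to the centroid of the probability mass it contains. The following is a minimal sketch of that inner loop on a synthetic map; window geometry and values are illustrative and not from Yuan and Mu's implementation (CAMSHIFT additionally adapts the window size and orientation each frame).

```python
import numpy as np

def mean_shift(prob, win, iters=20):
    """prob: 2D back-projection (per-pixel skin probability);
    win: (row, col, height, width) of the search window."""
    r, c, h, w = win
    for _ in range(iters):
        patch = prob[r:r + h, c:c + w]
        m = patch.sum()
        if m == 0:
            break                         # no probability mass under the window
        ys, xs = np.mgrid[0:patch.shape[0], 0:patch.shape[1]]
        dr = int(round((ys * patch).sum() / m)) - h // 2
        dc = int(round((xs * patch).sum() / m)) - w // 2
        if dr == 0 and dc == 0:
            break                         # converged on the local mode
        r = min(max(r + dr, 0), prob.shape[0] - h)
        c = min(max(c + dc, 0), prob.shape[1] - w)
    return r, c

# synthetic back-projection with a bright blob (the tracked region)
prob = np.zeros((24, 24))
prob[10:15, 13:18] = 1.0
pos = mean_shift(prob, (8, 10, 8, 8))
```

In a tracker, this loop runs once per frame, with each frame's result seeding the next frame's starting window.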
Shih et al. determine ear candidates by localizing arc-shaped edges in an edge image. Subsequently, the arc-shaped ear candidates are verified using an AdaBoost classifier. They report a detection rate of 100% on a dataset consisting of 376 images from 94 subjects.
Zhou et al. train a 3D shape model in order to recognize the histogram of shape indexes of a typical ear . Similarly to the approaches of Abaza et al. and Islam et al., a sliding window of varying size is moved over the image. The ear descriptor proposed by Zhou et al. is built from concatenated shape index histograms, which are extracted from subblocks inside the detection window. For the actual detection, an SVM classifier is trained to decide whether an image region is the ear region or not. To our knowledge, this is the first ear detection approach which does not require corresponding texture images in addition to the range image. Zhou et al. evaluated their approach on images from the UND collections and report a detection rate of 100%. It should be noted that this approach was not tested under rotation, pose variations or major occlusions; given its good performance, however, we consider such an evaluation an interesting task for future research.
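A minimal sketch of a descriptor of this kind is given below. It assumes the principal curvatures of the range image have already been estimated; the sign convention (domes map to 0, cups to 1), block layout and bin counts are illustrative and differ from Zhou et al.'s exact design.

```python
import numpy as np

def shape_index(k1, k2):
    """Koenderink shape index mapped to [0, 1]; expects k1 >= k2 pointwise.
    With this convention domes give 0, saddles 0.5 and cups 1; arctan2
    also handles umbilic (k1 == k2) and planar points."""
    k1, k2 = np.asarray(k1, float), np.asarray(k2, float)
    return 0.5 - np.arctan2(k1 + k2, k1 - k2) / np.pi

def descriptor(sidx, blocks=2, bins=8):
    """Concatenated per-subblock shape-index histograms inside a window."""
    h, w = sidx.shape
    feats = []
    for i in range(blocks):
        for j in range(blocks):
            sub = sidx[i * h // blocks:(i + 1) * h // blocks,
                       j * w // blocks:(j + 1) * w // blocks]
            hist, _ = np.histogram(sub, bins=bins, range=(0.0, 1.0))
            feats.append(hist / max(hist.sum(), 1))  # normalize each block
    return np.concatenate(feats)

# dome-like, saddle-like and cup-like surfaces map to distinct index values
dome, saddle, cup = shape_index(1, 1), shape_index(1, -1), shape_index(-1, -1)
d = descriptor(np.full((8, 8), 0.3))
```

The concatenated vector `d` is what a classifier such as an SVM would receive for each position of the sliding window.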
Ear detection methods based on image transformations have the advantage of being robust against out-of-plane rotations. They are designed to highlight specific properties of the outer ear, which occur in each image where the ear is visible, regardless of the pose in which the ear was photographed. In , the Hough transform is used for enhancing regions with a high density of edges. In head profile images, a high density of edges occurs especially in the ear region (see Figure 3.3). In , it is reported that Hough transform based ear detection fails when people wear glasses, since the frame introduces additional edges to the image, especially in the eye and nose region. The ear detection approach based on the Hough transform was evaluated on the images in the XM2VTS database (see  for a detailed database description), where a detection rate of 91% was achieved.
The ray transform approach proposed in  is designed to detect the ear in different poses. The ray transform uses a light ray analogy to scan the image for tubular and curved structures like the outer helix. The simulated ray is reflected in bright tubular regions, and hence these regions are highlighted in the transformed image. However, the ray transform also highlights straight edges and edges from other objects, such as hair and glasses (see Figure 3.3). Using this method, Cummings et al. achieved an impressive detection rate of 98.4% on the XM2VTS database. Hence, the ray transform approach by Cummings et al. outperforms the Hough transform, most likely because it is more robust against disruptive factors such as glasses or hair.
A recent approach for 2D ear detection is described in . Kumar et al. propose to extract ears from 2D images using edge images and active contours. They evaluate their approach on a database consisting of 100 subjects with 7 images per subject. A special imaging device was used for collecting the data; it ensures that the distance to the camera is constant and that the lighting conditions are the same for all images. Within this setting, a detection rate of 94.29% is reported.
When putting ear detection into practice, robustness against pose variation and occlusion is of great importance. Nevertheless, most of the ear detection methods described above were not tested with realistic occlusion scenarios, such as occlusion by hair, jewellery or headdresses. A possible reason for this may be the lack of databases containing appropriate images, but this gap has recently been filled by different working groups, who contributed appropriate datasets (see Section 3.2). Furthermore, to the best of our knowledge there are no investigations on the effect of occlusion in 3D ear images.
3.4 2D Ear Recognition

Each ear recognition system consists of a feature extraction and a feature vector comparison step. In this survey we divide ear recognition approaches into four different subclasses, namely holistic approaches, local approaches, hybrid approaches and statistical approaches.
In Tables 3.2 and 3.3 all 2D ear recognition approaches mentioned in this paper are summarized in chronological order.
3.4.1 Holistic Descriptors

Another approach which has gained some popularity is the Force Field Transform by Hurley . The force field transform assumes that pixels exert a mutual attraction proportional to their intensities and inversely proportional to the square of the distance between them, rather like Newton's universal law of gravitation. The associated energy field takes the form of a smooth surface with a number of peaks joined by ridges (see Figure 3.4.1).
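The attraction idea can be written down directly. The following is a deliberately brute-force sketch on a synthetic image; practical implementations (including Hurley's) compute the field far more efficiently via convolution, and the function and image here are purely illustrative.

```python
import numpy as np

def force_field(img):
    """Per-pixel force vector: every pixel attracts every other pixel with a
    force proportional to its intensity and to 1/distance^2 (gravity-like)."""
    img = np.asarray(img, float)
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    fy, fx = np.zeros_like(img), np.zeros_like(img)
    for r in range(h):
        for c in range(w):
            dy, dx = ys - r, xs - c
            d2 = dy * dy + dx * dx
            d2[r, c] = np.inf            # a pixel does not attract itself
            d = np.sqrt(d2)
            # magnitude I/d^2 times unit vector (dy, dx)/d -> I*dy/d^3 etc.
            fy[r, c] = np.sum(img * dy / (d2 * d))
            fx[r, c] = np.sum(img * dx / (d2 * d))
    return fy, fx

# a single bright pixel pulls its neighbours towards it from all sides
img = np.zeros((5, 5))
img[2, 2] = 1.0
fy, fx = force_field(img)
```

Following these force vectors, test pixels converge into channels and wells; the resulting energy peaks and ridges are what the recognition methods below exploit.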
Table 3.2: Summary of approaches for 2D ear recognition, part 1. Unless stated differently, performance always refers to rank-1 performance.
Table 3.3: Summary of approaches for 2D ear recognition, part 2. Unless stated differently, performance always refers to rank-1 performance.
Using this method, Hurley et al. achieved a rank-1 performance of more than 99% on the XM2VTS database (252 images). Building on these results, Abdel-Mottaleb and Zhou use a 3D representation of the force field for extracting points lying on the peaks of the 3D force field . Because the force field converges at the outline of the ear, the peaks in the 3D representation essentially represent the ear contour. Nonetheless, the force field method is more robust against noise than other edge detectors, such as Sobel or Canny. Using this approach, Abdel-Mottaleb and Zhou achieved a rank-1 performance of 87.93% on a dataset which consists of 103 ear images from 29 subjects.
Dong and Mu  add pose invariance to the edges extracted with the force field method. This is achieved with null space kernel Fisher discriminant analysis (NKFDA), which has the property of representing non-linear relations between two datasets. Dong and Mu conducted experiments on the USTB IV dataset. Before feature extraction, the ear region was cropped out of the images manually and the pose was normalized. For pose variations of 30 degrees they report a rank-1 recognition rate of 72.2%. For pose variations of 45 degrees the rank-1 performance dropped to 48.1%.
In a recent publication, Kumar and Wu  present an ear recognition approach which uses the phase information of Log-Gabor filters for encoding the local structure of the ear. The encoded phase information is stored in normalized grey level images. In their experiments, the Log-Gabor approach outperformed force field features and a landmark-based feature extraction approach. Moreover, different combinations of Log-Gabor filters were compared with each other. The rank-1 performance of the Log-Gabor approaches ranges between 92.06% and 95.93% on a database which contains 753 images from 221 subjects.
The rich structure of the outer ear results in specific texture information, which can be measured using Gabor filters. Wang and Yuan  extract local frequency features using a battery of Gabor filters and then select the most distinctive features using general discriminant analysis. In their experiments on the USTB II database, they compared the performance impact of different settings for the Gabor filters. Different combinations of orientations and scales in the filter sets were compared with each other, and it was found that neither the number of scales nor the number of orientations has a major impact on the rank-1 performance. The total rank-1 performance of Wang and Yuan's approach is 99.1%. In a similar approach, Arbab-Zavar and Nixon  measured the performance of Gabor filters on the XM2VTS database, where they report a rank-1 performance of 91.5%. A closer look at the Gabor filter responses showed that the feature vectors are corrupted by occlusion and other disruptive factors. In order to overcome this, a more robust comparison method is proposed, which resulted in an improved recognition rate of 97.4%.
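A filter bank of this kind can be sketched as follows. The kernel size, wavelengths and sigmas below are illustrative defaults, not the settings evaluated by Wang and Yuan or Arbab-Zavar and Nixon, and for brevity the sketch evaluates each filter only at the center of a kernel-sized patch.

```python
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma):
    """Real (even) Gabor kernel: Gaussian envelope times a cosine carrier
    oriented at angle theta."""
    half = size // 2
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1]
    xr = xs * np.cos(theta) + ys * np.sin(theta)   # rotated coordinates
    yr = -xs * np.sin(theta) + ys * np.cos(theta)
    gauss = np.exp(-(xr ** 2 + yr ** 2) / (2 * sigma ** 2))
    return gauss * np.cos(2 * np.pi * xr / wavelength)

def gabor_features(patch):
    """Responses of a small bank (2 scales x 4 orientations) at the patch
    center; patch must be square and match the kernel size."""
    feats = []
    for s in (1, 2):                               # two scales
        for o in range(4):                         # four orientations
            k = gabor_kernel(patch.shape[0], 4.0 * s,
                             np.pi * o / 4, 2.0 * s)
            feats.append(float(np.sum(patch * k)))  # correlation response
    return np.array(feats)

# a horizontal-frequency pattern responds strongly to the matching filter
xs = np.mgrid[-4:5, -4:5][1]
patch = np.cos(2 * np.pi * xs / 4.0)
feats = gabor_features(patch)
```

In a full system, such responses are computed over the whole ear region and then reduced with a discriminant analysis step, as described above.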