By Georgios Diamantopoulos. A thesis submitted to the University of Birmingham for the degree of Doctor of Philosophy. Department of Electronic, ...
The evaluation of the corner detection algorithm has been a significant challenge for the simple reason that the corners change appearance with the subject’s eye morphology over time (both momentarily, e.g. during blinks or eyelid movement caused by eye movement, and more lastingly, e.g. when squinting). In other words, the corners may appear to have moved if the subject squints or simply looks in a different direction such that the eyelids move significantly. For the evaluation data set, frames in which the subject is looking straight ahead were included (as already mentioned in Chapter 4, this is a requirement for the iris boundary and corner detection algorithms) and frames in which the corners could not be clearly marked were excluded, giving a total of 1,856 frames.
In some cases, because of the particular eye morphology of individual subjects, there were two candidates for the inner eye corner, as shown in Figure 30. As the right-most candidate changes appearance more often than the left-most candidate, and in order to be consistent across subjects, the left-most candidate (the point nearest to the tear gland) was always marked as the inner eye corner. Figure 31 illustrates the marking software in operation.
The tracker was set to process all frames in the above data set and the error was quantified as the Euclidean distance between the calculated corners and the manually marked corners. The errors were then classified as negligible, acceptable or unacceptable to aid the interpretation of the results: errors less than or equal to 4 pixels were considered negligible, errors between 4 and 10 pixels acceptable, and errors over 10 pixels unacceptable.
Slightly higher thresholds (4 pixels versus 2 pixels for negligible and 10 pixels versus 8 pixels for the acceptable/unacceptable boundary) were used for the evaluation of the eye corner detection algorithm for the following reasons:
a) Just like the pupil and iris evaluation, manually marking these positions over several hundreds of frames is a somewhat error-prone process; there is no objective means to mark the exact location of the points in question and, as careful as one may be, errors will occur in such a repetitive and tedious task. This problem is even more evident with the eye corners.
In practice, it is hypothesised that the manual marking errors are higher than assumed; however, since this cannot be proven objectively, the above low threshold values were used for the classification in order to produce conservative error measurements.
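As a concrete illustration, the per-frame error metric and the three-way classification described above can be sketched as follows. The threshold values are those given in the text; the function names are illustrative and not taken from the thesis software.

```python
import math

# Threshold values from the text: errors of up to 4 px are negligible,
# errors in (4, 10] px acceptable, and errors over 10 px unacceptable.
NEGLIGIBLE_PX = 4.0
ACCEPTABLE_PX = 10.0

def corner_error(detected, marked):
    """Euclidean distance (px) between a detected and a manually marked corner."""
    return math.hypot(detected[0] - marked[0], detected[1] - marked[1])

def classify_error(error_px):
    """Map a per-frame error to the three classes used in the evaluation."""
    if error_px <= NEGLIGIBLE_PX:
        return "negligible"
    if error_px <= ACCEPTABLE_PX:
        return "acceptable"
    return "unacceptable"
```

Aggregating `classify_error` over all frames of a sample yields the per-class frame counts reported in the classification tables.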
FIGURE 30: "TRUE" INNER EYE CORNER MAY HAVE TWO CANDIDATES, SHOWN IN RED.
Table 16, Table 18 and Table 17, Table 19 below display the error and error classification statistics for the inner and outer corner respectively. As before, in the error statistics table, the first column designates the test sample, the second column displays the number of frames processed for the particular sample, the third and fourth columns display the average error and standard deviation for the sample respectively, and the fifth column displays the maximum error across the frames of the particular sample. Finally, the last row shows the average error, standard deviation and maximum error for the whole data set taken together. In the error classification statistics table, the sample is shown in the first column and the total number of frames in the second, followed by the number of frames in each class and the number of frames that failed to process.
The final row shows the total number of frames for each class, both as a cardinal number and a percentage.
Detecting the location of the eye corners is the hardest problem to tackle and this is indeed reflected in the error measurements from the corner detection, as shown in the tables above.
Both the inner and outer corner detection algorithms show very similar error measurements: 8.32 ± 5.78 pixels and 8.41 ± 5.40 pixels respectively. Considering that the algorithm subsamples the original image to a quarter of its size, this corresponds to 2-3 pixels of error in the down-sampled image, which is an acceptable result; the error is amplified when the result is up-sampled.
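As a quick sanity check of these figures, and assuming that "a quarter of its size" means a factor of four in each image dimension (which is consistent with the quoted 2-3 pixel figure), the full-resolution errors map back to the down-sampled image as follows:

```python
# Assumption: subsampling reduces each image dimension by a factor of four.
SUBSAMPLE_FACTOR = 4

def error_in_subsampled_image(full_res_error_px, factor=SUBSAMPLE_FACTOR):
    """Error as seen by the detector before coordinates are scaled back up."""
    return full_res_error_px / factor

# Mean full-resolution errors reported for the inner and outer corner:
print(error_in_subsampled_image(8.32))  # ~2.1 px
print(error_in_subsampled_image(8.41))  # ~2.1 px
```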
The maximum error across the whole data set is 50.24 pixels and 44.38 pixels for the inner and outer corner detection respectively. This shows that the algorithm will in some cases output false positives and this is the reason that the clustering algorithm was integrated into the corner detection as explained in Chapter 4.
The evaluation of the corner detection after clustering has been applied was performed in the same manner as for the corner detection algorithm alone, and the results are presented in Tables 18 through 21. As can be seen, there is a significant reduction in error, with the average error being 7.41 ± 3.78 pixels for the inner corner and 6.49 ± 3.21 pixels for the outer corner. Additionally, by filtering outliers (false positives), the maximum error was reduced to 16.76 pixels and 25.83 pixels respectively.
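The thesis's own clustering method is detailed in Chapter 4 and is not reproduced here; the sketch below only illustrates the general idea behind the two approaches compared in this section: a weighted average over all candidate corner locations versus keeping the centroid of the largest spatial cluster, which discards isolated false positives. The function names, distance threshold and greedy grouping strategy are illustrative assumptions.

```python
import math

def weighted_average(points, weights):
    """Weighted-sum clustering: a single estimate from ALL candidates, so an
    outlying false positive still pulls the result towards itself."""
    total = sum(weights)
    return (sum(p[0] * w for p, w in zip(points, weights)) / total,
            sum(p[1] * w for p, w in zip(points, weights)) / total)

def largest_cluster_centroid(points, radius_px=10.0):
    """Greedily group candidates that lie within radius_px of a cluster's
    running centroid; return the centroid of the largest group, so that
    isolated candidates (false positives) are discarded."""
    clusters = []
    for p in points:
        for cluster in clusters:
            cx = sum(q[0] for q in cluster) / len(cluster)
            cy = sum(q[1] for q in cluster) / len(cluster)
            if math.hypot(p[0] - cx, p[1] - cy) <= radius_px:
                cluster.append(p)
                break
        else:
            clusters.append([p])
    largest = max(clusters, key=len)
    return (sum(q[0] for q in largest) / len(largest),
            sum(q[1] for q in largest) / len(largest))
```

With candidates (100, 50), (102, 51), (101, 49) and a false positive at (160, 90), the largest-cluster centroid stays at (101, 50), while an equally weighted average is pulled to roughly (116, 60); this mirrors the behaviour seen at the high-error end of the comparison below.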
Figure 32 and Figure 33 graphically illustrate how different clustering methods and settings affect the average error measurements for the inner and outer corner detection respectively.
As can be seen in this comparison:
Both clustering algorithms offer a significant improvement versus no clustering, especially when the error is high.
In cases of small error, the two clustering methods perform similarly.
In cases of larger error, the clustering method used in the eye-tracker performs better than the weighted sum clustering.
In most cases, the different settings of each clustering method offer no significant additional improvement.
TABLE 21: CORNER DETECTION EVALUATION AFTER CLUSTERING (INNER CORNER) - ERROR CLASSIFICATION STATISTICS, NUMBER OF FRAMES.
FIGURE 32: COMPARISON OF CLUSTERING MODES AND SETTINGS (INNER CORNER).
FIGURE 33: COMPARISON OF CLUSTERING MODES AND SETTINGS (OUTER CORNER).
The evaluation of the 2D gaze vector calculation algorithm is essentially an evaluation of the eye-tracker as a whole. In the previous subsections, the individual components of the algorithm were isolated and assessed individually, without requiring input from any other components. For example, the iris boundary detection algorithm requires as input the pupil centre location and pupil contour; during its evaluation, this input was derived from the manually marked data, such that the error assessed reflects the iris boundary detection algorithm alone and not the pupil detection and iris boundary detection algorithms operating in combination.
In addition, the actual calculation of the 2D gaze vector from its inputs (initial pupil centre, initial pupil contour and eye corner locations) is performed in exactly the same manner for both the tracker output and the manually marked data set. Thus, the accuracy of the 2D gaze vector calculation depends solely on the accuracy of its individual inputs. With this in mind, the error measured in this evaluation is the difference between the angle calculated from the set of eye features extracted by the tracker and the angle calculated from the set of eye features that were manually marked.
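Under this description, and using the reference point described later in the chapter (the midpoint of the line connecting the two eye corners), the per-frame angular error can be sketched as follows. This is an illustrative reconstruction, not the thesis code.

```python
import math

def gaze_vector(pupil_centre, inner_corner, outer_corner):
    """2D gaze vector: from the midpoint of the line connecting the eye
    corners (the reference point) to the pupil centre."""
    ref_x = (inner_corner[0] + outer_corner[0]) / 2.0
    ref_y = (inner_corner[1] + outer_corner[1]) / 2.0
    return (pupil_centre[0] - ref_x, pupil_centre[1] - ref_y)

def vector_angle_deg(v):
    """Orientation of a 2D vector in degrees."""
    return math.degrees(math.atan2(v[1], v[0]))

def angular_error_deg(tracked, marked):
    """Evaluation metric: difference between the angle computed from
    tracker-extracted features and from manually marked features,
    wrapped into [0, 180] degrees."""
    diff = abs(vector_angle_deg(gaze_vector(*tracked)) -
               vector_angle_deg(gaze_vector(*marked))) % 360.0
    return min(diff, 360.0 - diff)
```

For example, if the tracker places the pupil at (130, 60) with corners at (100, 60) and (140, 60), while the marked pupil sits at (130, 62), the angular error is about 11.3°.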
The table below illustrates the average, standard deviation and maximum error values (degrees) for the complete test data set; for this test, the twelve (12) different positions recorded for each subject were used.
On average, the gaze angle error is 2.78° with a standard deviation of 1.99°, a range that renders the eye-tracker practical for the target applications. In the worst case, the error reached a maximum of 9.51°. It is important to consider here that: a) the 2D gaze angle is affected both by errors in the pupil detection and by errors in the eye corner detection; b) errors in the manual marking of the pupil and eye corners, as explained earlier, will also have affected these measurements; and c) like any eye-tracker that requires subject calibration, the accuracy of the output depends directly on the accuracy of the calibration.
A better benchmark may have been to measure the error after the 2D gaze angle has been classified into several distinct classes; this benchmark is offered as part of the case study in Chapter 6. Further, a comparison of the trackable range and accuracy between the REACT eye-tracker and the SR Research EyeLink-II is presented in the next section of this chapter. While the two systems were designed for dissimilar applications, this evaluation is offered as a comparative benchmark for the case where the REACT eye-tracker is extended to track gaze on a screen. The REACT eye-tracker is the first eye-tracker that a) is specifically designed for tracking extreme, usually non-visual, eye-movements and b) is head-mounted but uses the eye corners as reference points instead of the glint.
TABLE 26: 2D GAZE VECTOR CALCULATION EVALUATION - ERROR STATISTICS, IN DEGREES.
COMPARISON OF THE REACT EYE-TRACKER WITH SR RESEARCH EYELINK-II

For the evaluation of the eye-tracker to be complete, it is necessary to know how it performs compared to other systems and whether it satisfies the requirements identified earlier in this thesis. To this end, the REACT eye-tracker was compared to a commercial eye-tracker commonly used in psychology studies (e.g. Altmann and Kamide, 2007), the EyeLink-II by SR Research. The comparison was geared towards comparing the tracking ability and accuracy of the two eye-trackers over the full range of eye-movements possible.
The EyeLink-II has been primarily designed to track the subject’s gaze on a screen, and an add-on is available to enable gaze tracking with a scene camera. On the other hand, as reviewed in this thesis, the REACT eye-tracker has been designed to track the subject’s direction of gaze independently of the world, as it is concerned with non-visual eye-movements occurring when the subject is not focused on a visual target within the environment. Because of the different target applications, for this comparison to take place, the REACT eye-tracker had to be extended and a gaze-mapping scheme needed to be implemented that could map vectors from the image to locations on a screen.
The homographic mapping and calibration scheme presented by Li et al. (2005) was implemented in order to provide this mapping. Similarly to the EyeLink, at the beginning of each experiment, nine (9) points are displayed on the screen (centre, corners and mid-points of each side). When the subject has fixated on each point (synchronized with a key press), the pupil position in the image is detected and the vector from the reference point to the pupil position is calculated and recorded. In contrast to the eye-tracker of Li et al. (2005), where the reference point is the corneal reflection, the REACT eye-tracker uses the midpoint of the line that connects the inner and outer eye corners as the reference. In brief, the mapping H is a 3x3 matrix with eight degrees of freedom; it is calculated by generating a constraint matrix from measured point correspondences and determining the null space of that matrix through singular value decomposition. For further details the interested reader is referred to Li et al. (2005).
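A minimal sketch of this estimation step, using the standard direct-linear-transform constraint matrix and its SVD null space, might look as follows (NumPy is used for illustration; the thesis implementation may differ in detail):

```python
import numpy as np

def fit_homography(image_pts, screen_pts):
    """Estimate the 3x3 homography H (eight degrees of freedom) mapping
    image-plane points to screen points: build the standard DLT constraint
    matrix from point correspondences and take its null space via singular
    value decomposition."""
    rows = []
    for (x, y), (u, v) in zip(image_pts, screen_pts):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    h = vt[-1]                     # right singular vector for the
    return h.reshape(3, 3) / h[8]  # smallest singular value, normalised

def map_to_screen(H, p):
    """Apply H to a point (homogeneous coordinates, then dehomogenise)."""
    u, v, w = H @ np.array([p[0], p[1], 1.0])
    return (u / w, v / w)
```

In the calibration described above, `image_pts` would hold the nine measured pupil-to-reference vectors and `screen_pts` the nine known on-screen target positions; with nine correspondences the system is overdetermined and the SVD provides a least-squares solution.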
The EyeLink is able to operate in two modes:
a) Pupil-only mode. In this mode, only the pupil is tracked and its location in the image is mapped to the screen coordinates. According to the EyeLink specification (SR Research, 2009), the tracker is able to operate within a range of ±30° horizontally and ±20° vertically, with 0.5° of accuracy.
b) Pupil and corneal-reflection (PCR) mode. In this mode, both the pupil and the corneal reflection are tracked and the vector between the two features is used to map gaze onto the screen. By using the corneal reflection as the reference point, the EyeLink can prevent the introduction of tracking errors from the slippage of its heavy headband (approx. 420 g; SR Research, 2009). However, using the corneal reflection or glint also makes it vulnerable to loss of tracking when the reflection falls onto the sclera and becomes particularly hard to detect, as explained in Chapter 3. In PCR mode and with head-tracking enabled, the EyeLink specification indicates a conservative tracking range of ±20° horizontally and ±18° vertically.
If the subject is placed 50cm away from a 24” screen, the tracking range of the EyeLink is approximately exhausted. A typical 24” screen measures approximately 52cm wide and 32cm tall, which, at 50cm away, corresponds to a range of ±27° horizontally and ±18° vertically.
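These figures follow from simple trigonometry: the half-angle subtended by a screen dimension at a given viewing distance.

```python
import math

def half_angle_deg(extent_cm, distance_cm):
    """Half-angle subtended by a screen dimension at a viewing distance."""
    return math.degrees(math.atan((extent_cm / 2.0) / distance_cm))

# 24-inch screen of ~52 cm x 32 cm, viewed from 50 cm (figures from the text):
horizontal = half_angle_deg(52, 50)  # ~27.5 degrees, i.e. roughly ±27°
vertical = half_angle_deg(32, 50)    # ~17.7 degrees, i.e. roughly ±18°
```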
This presented a significant problem for this experiment; the EyeLink needs to be calibrated within this range, but the SDK that ships with the EyeLink attempts to make full use of the screen resolution. Thus, if a large screen is used, the calibration points will be placed well beyond the tracking capabilities of the EyeLink, and if a small screen is used, only a limited range will be tested.
The 84” screen previously used in this chapter provided a potential range of approximately ±60° horizontally and ±52° vertically (with the subject placed 50cm away from the screen). To circumvent the above limitation of the EyeLink SDK, the controlling software developed for the experiment reported a false, virtual screen resolution (384x288 pixels) to the EyeLink, while the maximum possible resolution (1600x1200 pixels) was chosen for the actual screen. In this configuration, the virtual screen spanned approximately ±22.2° horizontally and ±16.95° vertically – a range within the EyeLink’s tracking capabilities. Then, the calibration points dictated by the EyeLink SDK were translated such that they were correctly placed within the virtual screen, which was in turn placed in the centre of the actual screen, as shown in Figure 34.
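The virtual-screen span can be reproduced with the same trigonometry. The physical dimensions below are an assumption: an 84-inch 4:3 display of approximately 170.7 cm x 128 cm, a size consistent with the ±60°/±52° range quoted above for a 50 cm viewing distance (the small difference from the quoted ±16.95° vertical figure suggests the thesis assumed slightly different screen dimensions).

```python
import math

# Assumed physical size of the 84-inch 4:3 display (see lead-in above):
SCREEN_W_CM, SCREEN_H_CM = 170.7, 128.0
SCREEN_W_PX, SCREEN_H_PX = 1600, 1200   # resolution chosen for the screen
VIRTUAL_W_PX, VIRTUAL_H_PX = 384, 288   # resolution reported to the EyeLink
DISTANCE_CM = 50.0

def virtual_span_deg(virtual_px, screen_px, screen_cm):
    """Half-angle subtended by the centred virtual screen at the subject."""
    extent_cm = virtual_px / screen_px * screen_cm
    return math.degrees(math.atan(extent_cm / 2.0 / DISTANCE_CM))

horizontal = virtual_span_deg(VIRTUAL_W_PX, SCREEN_W_PX, SCREEN_W_CM)  # ~22.3°
vertical = virtual_span_deg(VIRTUAL_H_PX, SCREEN_H_PX, SCREEN_H_CM)    # ~17.1°
```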