BY GEORGIOS DIAMANTOPOULOS. A THESIS SUBMITTED TO THE UNIVERSITY OF BIRMINGHAM FOR THE DEGREE OF DOCTOR OF PHILOSOPHY, DEPARTMENT OF ELECTRONIC, ...
It is desirable to assess the performance of each component separately. Thus, for the iris radius and corner extraction algorithms, which depend on previous outputs (the pupil location, and the pupil location plus iris radius, respectively), those inputs were taken from the validation data set so that errors from other components could not interfere. The validation data set comprised nine (9) different subjects: two female and seven male; one Black, one Asian and the remaining seven Caucasian. Finally, two subjects were in the 35+ age group and the other seven subjects were in the 25-35 age group.
Beyond assessing each individual component separately, it is also desirable to assess the performance of the eye-tracker as a whole. Thus, the 2D gaze angle calculation algorithm was evaluated by comparing the 2D gaze angle computed from inputs (pupil position, iris radius, corner locations) as calculated by the eye-tracker against the angle computed from inputs taken from the validated data set.
Finally, in order to assess the usability of the eye-tracker hardware, a simple questionnaire was designed and distributed to the subjects who had used the eye-tracker.
The evaluation of the REACT eye-tracker can thus be split into the following parts:
1. Evaluation of the pupil detection algorithm
2. Evaluation of the iris boundary detection algorithm
3. Evaluation of the corner detection algorithm
4. Evaluation of the 2D gaze vector calculation algorithm
5. Evaluation of the eye-tracker hardware usability
TESTBED AND SAMPLE VIDEO COLLECTION

Collecting sample videos was an important process in developing, testing and evaluating the REACT eye-tracker. To this end, a software application was designed that projects points with known locations on a screen. By placing the subject at a fixed distance from the screen and aligning the centre of the projected points with the subject's direct point of gaze, it is possible to consistently generate extreme eye-movements that cover regular intervals of the complete 360° view.
The aforementioned test bed was implemented with a NEC MT1065 projector, configured with a special mirror to project images from the back of an 84” screen.
The latter configuration was essential to collecting video samples successfully from this test bed:

1. The subject needs to be located fairly close to the screen so that it is possible to generate “extreme” eye-movements outside of the normal field of view, as a simulation of the eye-movements that occur during thinking. For the same reason, running the test on a regular 17” desktop screen would not suffice, as the generated eye-movements would be restricted to a narrow field of view; a large screen is required.

2. With a regular ceiling-mounted projector configuration, it is impossible to display the screen correctly while recording the subject's eye-movements, as part of the projector's beam would be occluded by the subject's head and shoulders. Thus, projecting onto the screen from the back was absolutely vital.
FIGURE 24: SUBJECT STANDING IN FRONT OF THE PROJECTOR BEFORE RUNNING A RECORDING SESSION.
The test bed software was designed to display a set of twelve points from an imaginary circle that fills the screen, as shown in Figure 24. If the centre of the circle coincides with the centre of a Cartesian plane, then the points start at 0° and continue at 30° increments to complete a full circle. The display order of the points is randomized to simulate a more realistic testing environment for the eye-tracker and to prevent subjects from moving their eyes in anticipation of the next eye-movement.
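The point layout described above can be sketched as follows. This is a minimal illustration, not the thesis software; the function name, screen centre and circle radius are assumptions chosen for the example.

```python
import math
import random

def circle_points(cx, cy, radius, n=12):
    """Return n points evenly spaced around a circle, starting at 0 degrees
    and advancing in 360/n-degree increments (30 degrees for n = 12)."""
    points = []
    for i in range(n):
        angle = math.radians(i * 360 / n)
        # Screen y-coordinates grow downwards, hence the minus sign.
        points.append((cx + radius * math.cos(angle),
                       cy - radius * math.sin(angle)))
    return points

# Hypothetical 1024x768 screen: centre (512, 384), radius 300 pixels.
points = circle_points(512, 384, 300)
random.shuffle(points)  # randomized display order prevents anticipation
```

Shuffling the list rather than sampling angles on the fly guarantees that every point is shown exactly once per session.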
At the same time as displaying the points on screen, the video input from the eye-tracker is recorded to disk. The synchronization is as follows:
Subject calibration phase. All twelve points and the centre of the circle are displayed at once. The researcher asks subjects to indicate where the centre of the circle ought to be such that it is the focus of their direct gaze. Then, subjects are asked to make sure they can look at each and every point without moving their head or straining their eyes. If necessary, the circle radius is adjusted to allow subjects to keep their head still and comfortably view all the points throughout the recording. Video recording is off during this phase. When subjects are ready, they are asked to press a key to move on to the next phase.
Tracker calibration phase. Once subject calibration is completed, the centre of the circle is displayed once again and subjects are asked to press a key when their eyes are focused on the centre point. This is done such that the first recorded eye-movement is that of the person looking straight ahead and the tracker may be calibrated. At the beginning of this phase, video recording is initiated. On key press, the next phase is initiated.
Circle phase. The twelve points are displayed one by one; subjects are asked to focus on each point and press a key when ready. When the key press occurs, the next point is displayed. When all points have been displayed, the key press initiates the last phase.
Final phase. This phase is identical to the Tracker calibration phase and was added in case consistency checks between the two pupil positions (original and repeated) were later required. At the end of this phase, video recording is turned off and the software exits.
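The four phases above can be summarised as a simple session state machine. The sketch below is illustrative only: `recorder`, `display` and `wait_for_key` are hypothetical stand-ins for the actual video-capture, screen and input APIs used by the thesis software.

```python
import random

def run_session(points, recorder, display, wait_for_key):
    """Drive one recording session through its four phases."""
    # 1. Subject calibration: all points plus centre shown, recording off.
    display.show_all(points, centre=True)
    wait_for_key()

    # 2. Tracker calibration: centre only; video recording starts here so
    #    the first recorded eye-movement is the straight-ahead gaze.
    recorder.start()
    display.show_centre()
    wait_for_key()

    # 3. Circle phase: points one by one, in randomized order; the
    #    coordinates of each displayed point are logged with the video.
    order = list(points)
    random.shuffle(order)
    for p in order:
        display.show_point(p)
        recorder.log_point(p)
        wait_for_key()

    # 4. Final phase: centre again, for later consistency checks.
    display.show_centre()
    wait_for_key()
    recorder.stop()
```

Because every phase transition is keyed to a key press, the logged point coordinates can later be aligned with the video frames, which is how keystrokes are recovered from the recorded data.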
At the same time as recording the video from the eye-tracker, the coordinates of the point currently displayed on screen are also recorded. Keystrokes can be detected from the change of displayed point. Figure 25 illustrates the recording software in operation.
FIGURE 25: THE RECORDING SOFTWARE IN OPERATION; CALIBRATION (LEFT) AND RECORDING (RIGHT).
EVALUATION OF THE PUPIL DETECTION ALGORITHM

In order to evaluate the robustness and accuracy of the pupil detection algorithm, the pupil was manually marked in a set of nine (9) sample videos from nine (9) different subjects, giving a total of 12,334 frames. The pupil was marked on each frame by manually drawing an adjustable transparent ellipse over the pupil and then automatically calculating the centre of the drawn ellipse as the pupil centre, as illustrated in Figure 26.
The tracker was set to process all the frames in the above data set and the error was quantified as the Euclidean distance between the calculated pupil position and the manually marked position. The errors were then classified as negligible, acceptable or unacceptable to aid the interpretation of the results. Errors less than or equal to 2 pixels were considered negligible, because it is estimated that errors of up to 2 pixels may be a result of the manual marking: despite the use of software that eases the marking task, it is impossible to mark the data set 100% accurately, as a) in some cases the exact pupil boundaries cannot be distinguished and b) the boundaries are often not confined to a single pixel. Errors between 2 and 8 pixels were considered acceptable, and errors over 8 pixels unacceptable.
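The error metric and three-way classification can be expressed directly in code. This is a sketch of the scheme described above; the function names are illustrative, while the 2- and 8-pixel thresholds are those stated in the text.

```python
import math

NEGLIGIBLE_MAX = 2.0   # at or below this: within manual-marking uncertainty
ACCEPTABLE_MAX = 8.0   # above this: unacceptable

def pupil_error(tracked, marked):
    """Euclidean distance between the tracked and manually marked centres."""
    return math.hypot(tracked[0] - marked[0], tracked[1] - marked[1])

def classify(error):
    """Map a pixel error onto the three classes used in the evaluation."""
    if error <= NEGLIGIBLE_MAX:
        return "negligible"
    if error <= ACCEPTABLE_MAX:
        return "acceptable"
    return "unacceptable"
```

Applying `classify` to the per-frame errors of a sequence yields exactly the class counts reported in the classification tables.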
FIGURE 26: MANUAL MARKING OF THE PUPIL CONTOUR AND LOCATION IN THE VALIDATION VIDEOS.
Table 10 and Table 11 below display the error and error classification statistics. Specifically, in Table 10, the first column designates the test sample, the second column displays the number of frames processed for the particular sample, the third and fourth columns display the average error and standard deviation for the sample respectively, and the fifth column displays the maximum error across the frames of the particular sample. Finally, the last row of Table 10 shows the average error, standard deviation and maximum error for the whole data set taken together. Table 11 uses a similar format: the sample is shown in the first column and the total number of frames in the second, followed by the number of frames in each class and the number of frames that failed to process. The final row of Table 11 shows the total number of frames in each class, both as a cardinal number and as a percentage.
As can be seen from the tables above, the pupil detection algorithm performs quite well despite using a global threshold; on average, the error is 2.04 ± 3.32 pixels. The good performance of the algorithm can be attributed to the use of infrared lighting and the accompanying dark-pupil effect.
Poor performance is only observed in sample Subj09, where the error for 326 of the 1,361 total frames was classified as unacceptable. This was due to poor placement of the camera; specifically, the camera was erroneously placed almost directly in front of the eye. Not only does this obscure the subject's vision more than pointing the camera upwards, but it also creates a reflection similar to the bright-pupil effect introduced in Chapter 3 (and to the red-eye effect seen in photographs), which causes the thresholding and the snake to perform poorly, as shown in Figure 27.
The maximum error is quite large (over 20 pixels) for all test sequences except Subj02 and Subj08; this is simply due to a failure of the pupil detection algorithm. In rare cases, the fit error calculated for each ellipse favours an image blob that is not the pupil, as shown in Figure 28.
However, as shown in Table 12, which illustrates the errors for each test sequence as bar graphs, this is extremely rare (of the order of 2-3 frames per sequence for most sequences). Additionally, such high errors rarely occur in two consecutive frames. Thus, such errors could easily be filtered when tracking over time, either by defining a maximum pupil movement between frames and discarding frames that exceed this value, or by using a polynomial model to predict the current pupil position from the movement over the last few frames and discarding frames for which the tracked position does not fit the model. As explained in Chapter 4, however, this is not as important for the pupil as it is for the eye corners.
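The first of the two filtering strategies mentioned above can be sketched as follows. This is an illustrative implementation, not the thesis code; the 15-pixel movement threshold is an assumed value chosen for the example.

```python
import math

def filter_jumps(positions, max_move=15.0):
    """Discard tracked pupil positions that jump more than max_move pixels
    from the last accepted position, treating them as detector failures.
    Each entry is an (x, y) tuple, or None where detection already failed."""
    accepted = []
    last = None
    for p in positions:
        if p is None:
            accepted.append(None)
            continue
        if last is None or math.hypot(p[0] - last[0], p[1] - last[1]) <= max_move:
            accepted.append(p)
            last = p
        else:
            accepted.append(None)  # implausible jump: reject this frame
    return accepted
```

Because spurious detections rarely persist for two consecutive frames, rejecting a single outlier leaves the subsequent (correct) position within range of the last accepted one, so tracking recovers immediately.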
Finally, where the eye-tracker failed to detect the pupil (11 of 12,334 frames, or 0.09%), this means that after filtering by size, no connected components were found in the image. If this failure occurred with any significant frequency, it could be remedied by lowering the minimum blob-size threshold whenever no components are found with the default setting.
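The suggested fallback amounts to a single retry with a relaxed size filter. The sketch below operates on a list of connected-component areas for clarity; both threshold values are hypothetical, chosen only to illustrate the idea.

```python
def select_blobs(areas, min_size=50, fallback_min=10):
    """Keep connected components of at least min_size pixels; if none
    survive, retry once with the relaxed fallback threshold."""
    kept = [a for a in areas if a >= min_size]
    if not kept:
        kept = [a for a in areas if a >= fallback_min]
    return kept
```

The fallback only engages when the default threshold eliminates every candidate, so the behaviour on the 99.91% of frames that already succeed is unchanged.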
FIGURE 27: ILLUSTRATION OF POOR CAMERA PLACEMENT CAUSING PUPIL DETECTION TO GIVE HIGH ERRORS. SUCCESSFUL INITIALISATION IS SHOWN ON THE LEFT WHILE PARTIAL BRIGHT PUPIL EFFECT IS SHOWN ON THE RIGHT (SAMPLE: SUBJ09).
FIGURE 28: EXAMPLE OF A FAILED SELECTION OF THE PUPIL DETECTION ALGORITHM.
EVALUATION OF THE IRIS BOUNDARY DETECTION ALGORITHM

The iris boundary detection algorithm was evaluated in a similar way to the pupil detection algorithm. The iris boundaries were manually marked in the same sample set as used for the pupil detection algorithm (only frames where the subject is looking straight ahead were used; as already mentioned in Chapter 4, this is a requirement for the iris boundary detection algorithm — a total of 1,856 frames) and the iris radius was automatically extracted from the marked boundaries as half the difference between the two x-coordinates. Figure 29 illustrates the software used to mark the iris test data set.
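The radius extraction described above is a one-line computation. The function name is illustrative; the inputs are the x-coordinates of the manually marked left and right iris boundary points.

```python
def iris_radius(left_x, right_x):
    """Iris radius as half the horizontal distance between the manually
    marked left and right iris boundary points (straight-ahead gaze)."""
    return abs(right_x - left_x) / 2.0
```

Using only the x-coordinates is valid here because the frames are restricted to straight-ahead gaze, where the horizontal iris diameter is unforeshortened.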
The tracker was set to process all the frames in the above data set and the error was quantified as the difference between the calculated iris radius and the iris radius derived from the manually marked boundaries. The errors were then classified as negligible, acceptable or unacceptable to aid the interpretation of the results. Errors less than or equal to 2 pixels were considered negligible, because it is estimated that errors of up to 2 pixels may be a result of the manual marking: despite the use of software that eases the marking task, it is impossible to mark the data set 100% accurately, as a) in some cases the exact iris boundaries cannot be distinguished and b) the boundaries are often not confined to a single pixel. Errors between 2 and 8 pixels were considered acceptable, and errors over 8 pixels unacceptable.
FIGURE 29: MANUAL MARKING OF THE IRIS BOUNDARIES.
Table 13 and Table 14 below display the error and error classification statistics. Specifically, in Table 13, the first column designates the test sample, the second column displays the number of frames processed for the particular sample, the third and fourth columns display the average error and standard deviation for the sample respectively, and the fifth column displays the maximum error across the frames of the particular sample. Finally, the last row of Table 13 shows the average error, standard deviation and maximum error for the whole data set taken together. Table 14 uses a similar format: the sample is shown in the first column and the total number of frames in the second, followed by the number of frames in each class and the number of frames that failed to process. The final row of Table 14 shows the total number of frames in each class, both as a cardinal number and as a percentage.
As can be seen from the tables above, the iris radius detection algorithm performs very well and consistently; in none of the test sequences is the error ever high enough to be classified as unacceptable. On average, the iris radius error is 2.11 ± 1.42 pixels, and the maximum error across all sequences is less than 8 pixels (7.92 pixels, to be exact).
Table 15 shows the error distribution for each test sequence using bar graphs.