BY GEORGIOS DIAMANTOPOULOS A THESIS SUBMITTED TO THE UNIVERSITY OF BIRMINGHAM FOR THE DEGREE OF DOCTOR OF PHILOSOPHY DEPARTMENT OF ELECTRONIC, ...
Geometric calibration, which refers to determining the relative positions and orientations of the eye-tracker components (camera and light sources) and target surface (screen). As long as the geometry does not change, this needs to be calculated only once.
Personal or subject calibration, which refers to determining parameters specific to the individual, such as the cornea curvature and the angular offset between visual and optical axes. Such parameters need to be calculated once for each subject.
Gaze mapping calibration, which refers to determining the eye-to-surface mapping functions. As mentioned earlier, this is usually done by having the subject look at points on the target surface whose geometry is known.
A fully calibrated system is one whose camera intrinsic parameters and geometry are both known; a partially calibrated system is one for which only one of the two is known.
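To illustrate the gaze-mapping calibration described above, a common implementation fits a second-order polynomial from pupil-glint vectors to screen coordinates by least squares. The sketch below uses entirely hypothetical calibration values (a 3x3 grid of targets on a 1280x1024 screen); it is an illustration of the general technique, not any specific system from the literature:

```python
import numpy as np

# Hypothetical calibration data: for each of nine calibration targets we
# record the pupil-glint vector (vx, vy) in image coordinates and the known
# target position (sx, sy) on the screen.  All numbers are made up.
pg_vectors = np.array([
    [-10, -8], [0, -8], [10, -8],
    [-10,  0], [0,  0], [10,  0],
    [-10,  8], [0,  8], [10,  8],
], dtype=float)
screen_pts = np.array([
    [0, 0], [640, 0], [1280, 0],
    [0, 512], [640, 512], [1280, 512],
    [0, 1024], [640, 1024], [1280, 1024],
], dtype=float)

def design_matrix(v):
    """Second-order polynomial terms of the pupil-glint vectors."""
    x, y = v[:, 0], v[:, 1]
    return np.column_stack([np.ones_like(x), x, y, x * y, x ** 2, y ** 2])

# Least-squares fit of one polynomial per screen axis.
A = design_matrix(pg_vectors)
coeffs, *_ = np.linalg.lstsq(A, screen_pts, rcond=None)

def map_gaze(vx, vy):
    """Map a pupil-glint vector to estimated screen coordinates."""
    return design_matrix(np.array([[vx, vy]]))[0] @ coeffs

# A zero pupil-glint vector maps to the centre calibration target.
sx, sy = map_gaze(0.0, 0.0)
```

Once fitted, the two polynomials are evaluated on every frame; the quality of the fit at the calibration points gives a first indication of calibration accuracy.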
HEAD-MOUNTED EYE-TRACKERS
Most of the head-mounted eye-trackers found in the literature have followed very similar designs and methodologies.
Ebisawa et al. (2002; Ebisawa, 1998) present a head-mounted tracker that uses active illumination with two light sources to alternately produce the dark- and bright-pupil effects; simple image processing algorithms then detect the positions of the pupil and the glint, the two feature points used to determine gaze. This work is a good example of the common misconception that the glint does not move when the eyeball moves, when in fact it does; in such cases it is implicitly (and erroneously) assumed that the corneal surface is a perfect mirror, so that if the head is kept fixed the glint remains stationary even when the cornea rotates. In some cases, however, the simplifying assumption of a stationary glint may give satisfactory results.
Takegami et al. (2002) use a rigid setup in which the subject rests their chin on a metal frame while holding onto it with their hands, similar to Ramdane-Cherif and Nait-ali (2008). The camera is calibrated and active illumination is used; the algorithm extracts the pupil contour, from which the pupil flatness (the ratio of the ellipse's minor and major axes) is calculated. In this paradigm, the eye is modelled as a sphere and the pupil as a circle. The pupil appears as a circle in the image only when its plane is parallel to the image plane; otherwise it appears as an ellipse, and by determining the flatness of this ellipse the subject's gaze can be estimated. It is reported that in this setup no subject calibration is necessary.
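The circular-pupil model above can be made concrete with a short sketch: under a simplified orthographic projection, a circular pupil viewed at an angle foreshortens along one axis by the cosine of that angle, so the flatness of the observed ellipse directly yields the magnitude of the gaze angle. The function names and the projection model are our own simplification, not Takegami et al.'s exact formulation:

```python
import math

def pupil_flatness(major_axis, minor_axis):
    """Flatness of the observed pupil ellipse: 1.0 means the pupil faces
    the camera head-on, smaller values mean it is rotated away."""
    return minor_axis / major_axis

def gaze_angle_deg(major_axis, minor_axis):
    """Angle (degrees) between the pupil plane and the image plane,
    assuming a circular pupil under orthographic projection, where the
    foreshortening of the circle along one axis equals cos(theta)."""
    return math.degrees(math.acos(pupil_flatness(major_axis, minor_axis)))

# A pupil imaged with a 12 px major axis and a 6 px minor axis is,
# under this model, rotated 60 degrees away from the camera.
angle = gaze_angle_deg(12.0, 6.0)
```

Note that the flatness alone gives only the magnitude of the rotation; the orientation of the ellipse's major axis is needed to recover the direction of gaze.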
Li et al. (2005) present an active illumination head-mounted eye-tracker and a novel method to estimate the glint and pupil feature points. Once the glint is located, radial lines are extended from it to locate candidate edge points for the pupil contour, to which an ellipse is then fitted using the Random Sample Consensus (RANSAC) algorithm. The distance between the glint and the pupil centre is then used to calculate 2D gaze on a screen.
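The radial-line search can be sketched as follows: rays are cast outwards from the glint, and the first strong dark-to-light intensity transition along each ray is kept as a candidate pupil-edge point. The synthetic image, thresholds and parameter names below are our own, and the subsequent RANSAC ellipse-fitting stage is omitted:

```python
import numpy as np

# Synthetic eye image: bright background (200) with a dark pupil disk (30)
# of radius 20 px centred at (50, 50); the glint is assumed at the centre.
h = w = 100
yy, xx = np.mgrid[0:h, 0:w]
img = np.where((xx - 50) ** 2 + (yy - 50) ** 2 <= 20 ** 2, 30, 200).astype(float)

def radial_edge_points(img, cx, cy, n_rays=16, max_r=45, grad_thresh=50):
    """Cast n_rays from (cx, cy); return the first point on each ray where
    the intensity rises sharply (dark pupil -> bright iris/sclera)."""
    points = []
    for angle in np.linspace(0, 2 * np.pi, n_rays, endpoint=False):
        prev = img[int(round(cy)), int(round(cx))]
        for r in range(1, max_r):
            x = int(round(cx + r * np.cos(angle)))
            y = int(round(cy + r * np.sin(angle)))
            if not (0 <= x < img.shape[1] and 0 <= y < img.shape[0]):
                break
            cur = img[y, x]
            if cur - prev > grad_thresh:  # dark-to-light transition found
                points.append((x, y))
                break
            prev = cur
    return points

candidates = radial_edge_points(img, 50, 50)
# Each candidate should lie close to the true pupil radius of 20 px.
radii = [np.hypot(x - 50, y - 50) for x, y in candidates]
```

In a real image the candidate set is noisy (eyelashes, eyelid edges, the glint itself), which is precisely why a robust estimator such as RANSAC is used for the subsequent ellipse fit rather than a plain least-squares fit.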
Clarke et al. (2002) present a high-frequency (400Hz) head-mounted system that makes use of two expensive CMOS cameras (one per eye) placed at the sides of the headset, with the eye images captured through mirrors. The system presented by Hansen and Pece (2005) is notable in that it is reported to track in any lighting conditions and can switch between infrared and non-infrared configurations without changing its parameters. However, it uses particle filtering for tracking, which is generally computationally complex and not easy to implement in real-time (Kwok et al., 2004).
Remote eye-trackers can be readily categorised as follows:
Trackers which use a single camera and at most one infrared source (Collet et al., 1997; Heinzmann and Zelinsky, 1998; Kim and Ramakrishna, 1999; Matsumoto, 2000; Ohno et al., 2002; Sirohey et al., 2002; Benoit et al., 2005; Sun et al., 2006; Wallhoff et al., 2006; Liang and Houi, 2007; Chen and Ji, 2008; Valenti et al., 2008; and Yamazoe et al., 2008).
Trackers which use a single camera but multiple infrared sources (White et al., 1993; Morimoto et al., 2000; Morimoto et al., 2002; Park et al., 2005; Coutinho and Morimoto, 2006; Hennessey et al., 2006; Meyer et al., 2006; Li et al., 2007; and Ramdane-Cherif and Nait-ali, 2008).
Trackers which use multiple cameras and one or more infrared sources (Newman et al., 2000; Shih et al., 2000; Andiel et al., 2002; Clarke et al., 2002; Ji and Yang, 2002; Beymer and Flickner, 2003; Ishima and Ebisawa, 2003; Noureddin et al., 2004; Ohno and Mukawa, 2004; Shih and Liu, 2004; Yu and Eizenmann, 2004; Park and Kim, 2005; Yoo and Chung, 2005; Merad et al., 2006; Tsuji and Aoyagi, 2006; Park, 2007; Zhu and Ji, 2007; Chen et al., 2008; Guestrin and Eizenman, 2008; Kohlbecher and Poitschke, 2008; Hennessey and Lawrence, 2009; and Nagamatsu, 2009).
One of the major problems with remote eye-trackers is movement of the subject’s head. Not only is the resolution of the eyes reduced because of the distance between the subject and the camera, but the subject’s head is able to move unrestrictedly and the eye-tracker must be able to cope with that if it is going to track the subject’s gaze successfully.
Single-camera remote systems use a variety of methods to compensate for head movement and calculate the gaze.
Collet et al. (1997) detect the location of the eyes and nose and use these feature points to calculate face orientation and gaze. Several similar schemes appear in the literature; Heinzmann and Zelinsky (1998) use the mouth and eye corners, Wallhoff et al. (2006) use the eyes and mouth, Chen and Ji (2008) use the nose and eye corners, Valenti et al. (2008) use the eye corners only and Yamazoe et al. (2008) use the mouth, nose and eye corners. Head-pose estimation is done similarly with stereo camera systems; for example, Newman et al. (2000) use the eye corners and mouth corners.
In a screen setup where the distance of the subject from the screen is known, Kim and Ramakrishna (1999) use the point between the eyes to compensate for small head movements and the iris length to calculate the distance between the camera and the eyeball. Finally, 3D gaze is calculated using the iris centre. A similar setup is employed by Matsumoto (2000), who uses the eye corners, anthropometric data and the iris radius to initialise a 3D eye model that is then used to estimate 3D gaze; the mouth and eye corners are used to estimate face orientation.
Wang et al. (2005) also calculate the iris radius from the image to facilitate a 3D model of the eyeball and use the eye corners to disambiguate between two possible solutions for the gaze vector. The eye corners and iris centre are also used by a few other systems (Tian et al., 2000; Benoit et al., 2005). Sirohey et al. (2002) present a system where the iris and eyelids are detected and tracked.
The system described by Ohno et al. (2002) is “traditional” in that it uses the glint and pupil feature points, but also includes an eyeball model to calculate 3D gaze. The system by Sun et al. (2006) uses a similar model which also includes the eye corners. Neural networks have been used to determine gaze in some systems (e.g. Stiefelhagen et al., 1997).
Liang and Houi (2007) propose a single-camera remote system that can classify eye-movements into the classes defined by the NLP EAC model (up left, up, up right, left, centre, right, bottom left, bottom and bottom right); it classifies gaze into the eight directional classes (plus centre) by calculating the difference between the pupil position when looking straight ahead and the current pupil location.
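This classification step can be sketched as a simple sector test on the pupil-displacement vector. The threshold value and class names below are our own, not Liang and Houi's; note that image coordinates have the y-axis pointing downwards, so “up” corresponds to negative dy:

```python
import math

# Directional classes, ordered anticlockwise starting from "right".
CLASSES = ["right", "up right", "up", "up left",
           "left", "bottom left", "bottom", "bottom right"]

def classify_gaze(rest_pupil, cur_pupil, centre_thresh=2.0):
    """Classify gaze from the displacement between the pupil position when
    looking straight ahead (rest_pupil) and the current pupil position,
    both in image coordinates (y grows downwards)."""
    dx = cur_pupil[0] - rest_pupil[0]
    dy = cur_pupil[1] - rest_pupil[1]
    if math.hypot(dx, dy) < centre_thresh:
        return "centre"
    # Flip dy so a positive angle means "up", then quantise the direction
    # into one of eight 45-degree sectors centred on the compass axes.
    angle = math.atan2(-dy, dx) % (2 * math.pi)
    sector = int((angle + math.pi / 8) / (math.pi / 4)) % 8
    return CLASSES[sector]
```

The `centre_thresh` parameter absorbs small fixational movements and measurement noise; in practice it would be tuned to the resolution of the eye image.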
With additional cameras and/or light sources, it is possible to use 3D models that offer greater theoretical accuracy. A detailed description of these models and how they operate in a multi-glint or multi-camera setup is beyond the scope of this review, and the interested reader is referred to the excellent reviews already available (Guestrin and Eizenman, 2006; Villanueva et al., 2007; Villanueva and Cabeza, 2007). In single-camera, multiple-glint cases, calibration to the subject is still required; Villanueva and Cabeza (2008) mathematically prove that a system with one camera and two glints requires a minimum of one calibration point to give geometrically correct results. Two systems that do not abide by this rule are those of Morimoto and Flickner (2002), which is reported to have lower accuracy, and Kohlbecher and Poitschke (2008), which uses the pupil ellipse to extract the 3D orientation. Systems with more than one camera and light source are able to operate calibration-free (Shih et al., 2000; Nagamatsu, 2009).
Arrays of more than two infrared sources have been used on limited occasions; for example, Coutinho and Morimoto (2006) use five sources (four on the screen and one on the camera), Meyer et al. (2006) use four infrared sources, Li et al. (2007) use a 3x3 array of infrared sources and Guestrin and Eizenman (2008) use four infrared sources. These arrays are used either to enable the calculation of 3D parameters or to overcome the problem mentioned earlier, where the glint falls on the sclera.
Appearance-based methods (Stiefelhagen et al., 1997; Xu et al., 1998; Tan et al., 2002) are an alternative to the feature-based approaches reviewed so far. These methods attempt to detect and track the eyes by directly using their photometric appearance (either through image intensity or through its response to a filter), instead of extracting features from them. From the large list of appearance-based eye-trackers in the comprehensive review by Hansen and Ji (2010), only one was designed to work with a head-mounted eye-tracker (Hansen and Pece, 2005). The latter uses particle filtering to track gaze and, while it is very robust, it is also very complex.
There are several reasons why appearance-based methods are less favoured for this application:
a) They usually require a large amount of training data.
b) They are used in remote eye-tracking systems which means that they would most likely require significant modification to be used with the close-up pictures of a head-mounted tracker. As will be discussed in more detail in Chapter 4, the task of modelling or detecting eye features becomes harder as the camera gets closer to the eye. First, the appearance of the eye-corners is significantly different when viewed close up than when viewed from a remotely placed camera and second, the change in the camera view angle may significantly change the appearance of the eye. Both changes would probably decrease the accuracy of an appearance-based approach or require ever larger training data sets.
c) They are much more difficult to evaluate as exact landmarks are not easily defined because they are based on contours.
The review of remote eye-trackers and the methods involved has been intentionally brief for two reasons. First, as will be argued below, remote eye-trackers are unsuitable for this application, and thus delving into the complexities of such systems would only deviate from the scope of this thesis. Second, detailed reviews of such systems already exist (Guestrin and Eizenman, 2006; Hansen and Ji, 2010).
EYE-TRACKER INVASIVENESS
At the top of the requirement list are the minimisation of invasiveness, the ability to track even the most extreme eye-movements and ease of use. While a formal definition of invasiveness has not been established, it can be assessed by the following factors:
a) whether it requires contact to the eyeball or other parts of the body
b) whether it restricts any type of movement (e.g. head) and
c) if it is mounted on the head or body, how much it weighs and how long it takes before this becomes uncomfortable for the user.

With invasiveness defined by the aforementioned factors, a remote eye-tracker is the least invasive type of eye-tracker that can be developed, as it is not mounted on the subject and thus does not impose any additional weight. Also, as mentioned earlier, because most remote eye-trackers incorporate some form of head-pose estimation, some head movement is acceptable. Of course, how much movement is acceptable is defined solely by the performance of the head-pose estimation.
Another important factor that determines the invasiveness of an eye-tracker, and one that is rarely, if ever, explicitly mentioned in the literature, is the extent to which subjects are aware that their eyes are being tracked. The feeling of being “watched” often makes people self-conscious and aware of every movement they make. Depending on the task of the experiment, it may also trigger performance anxiety. In any case, in experiments where rapport between the subject and the experimenter is important, it does not help if the subject is conscious of the eye-tracker in addition to the experiment itself. Similarly, an elaborate subject calibration procedure can remind the subject of the eye-tracker's presence and thus reduce their comfort during the experiment.
Other than an outdated comparison of five commercial eye-trackers with respect to comfort (Williams and Hoekstra, 1994), there is no formal study of the invasiveness of different eye-trackers or of the subjective experience of subjects during an experiment.
SUITABILITY OF REMOTE EYE-TRACKERS
Whether or not remote eye-trackers are less invasive, they are certainly much more expensive to build, as they usually require more than one camera and, because of the distance between the camera and the subject, higher-quality optics.
For this particular application, remote eye-trackers may prove impractical for several additional reasons.
In applications where the subject is required to look at a screen (such as tracking how people browse a website), the camera can be hidden in the screen, minimising invasiveness in this way. However, in an interview between subject and experimenter, this is significantly harder to achieve.