BY GEORGIOS DIAMANTOPOULOS. A THESIS SUBMITTED TO THE UNIVERSITY OF BIRMINGHAM FOR THE DEGREE OF DOCTOR OF PHILOSOPHY. DEPARTMENT OF ELECTRONIC, ...
What is especially interesting is that a theory similar to the EAC model has emerged fairly recently from an academic field unrelated to NLP (Ehrlichman et al., 2007); it is based on a buffer model used to hold representations: eye-movements are associated with retrieval, and fixations with maintenance of the information in the buffer.
It is proposed that the EAC model is worthy of further research with an experimental methodology that incorporates the assumptions and requirements presented here. A significant part of this future direction is the use of eye-tracking technology to detect, track and classify eye-movements. As current state-of-the-art eye-trackers are still limited in the range of eye-movements they can track, because they are designed with the working assumption of the subject looking at a screen or an object in the environment, a novel eye-tracker that is able to successfully track non-visual eye-movements is required. The design and implementation of such an eye-tracker has been the main objective of this thesis. Thus, Chapter 3 reviews current eye-tracking systems with respect to the requirements of such research and sets the background for the implementation of the Robust Eye-Accessing Cues Tracker (REACT), introduced in Chapter 4.
CHAPTER 3: REVIEW OF EYE-TRACKING SYSTEMS
In Chapter 2, the benefit of using an eye-tracking system for research relevant to the EAC model was established. In fact, the lack of a computerised eye-movement rating system has been a major flaw in previous studies. Direct viewing methods, even when performed by trained individuals, are error-prone and make objectivity seem impossible to achieve.
While there are several eye-tracking systems with good performance (both commercial and academic), they are fundamentally designed for applications where the subject is looking at an object or a screen in the external world. As briefly mentioned in earlier chapters and reiterated here, such visual eye-movements are restricted to a relatively narrow field of view; Hansen and Ji (2010) report that fixations normally occur within two to five (2-5) degrees of central vision.
Further, as has been found empirically, people prefer to turn their heads in the general direction of the object or screen and then shift their eyes a small amount to put the visual target in focus. A trivial example of this behaviour is television: if you were sitting in front of your computer reading this thesis and there was a television screen several degrees up and to one side, you would most likely turn your head towards the screen every time it attracted your interest and then shift your eyes to focus on any objects displayed on it.
In contrast, non-visual eye-movements are usually characterized by a shift to an extreme location within the eye socket, and the fundamental assumptions that current eye-tracking systems have been designed with render them incapable of tracking these eye-movements. For example, the vast majority of eye-trackers use infrared light and a glint-centred coordinate system (e.g. Ohno et al., 2002; Li et al., 2005); the glint is the reflection of the light source on the cornea and appears as a very bright spot. When the glint falls onto the sclera, which is also very bright, it can be very difficult to find its position. Thus, such eye-trackers operate on the implicit or explicit assumption that the glint will always fall within the iris or pupil, which does not hold true for extreme eye-movements.
Before going any further, it is important to define what “extreme” means in the context of eye-movements and to explicitly list the requirements of a non-visual eye-tracker compared to a visual one.
Hansen and Ji (2010) cite the work of Tweed and Vilis (1990) when stating that “eye positions are restricted to a subset of anatomically possible positions described by Listing’s and Donders’ laws”. Indeed, in their study, which is not specific to the oculomotor range but concerns the geometric relations of eye position and velocity vectors during saccades with respect to Listing’s law, Tweed and Vilis (1990) briefly mention that the oculomotor range of their measurements is ±40° both horizontally and vertically. Other researchers (Guitton and Volle, 1987), however, report that the human oculomotor range is ±55°, a considerably larger range. It is not uncommon to come across eye-movement studies performed over a range as large as ±70° (Collewijn et al., 1988) or ±80° (Guitton and Volle, 1987), though such research also aims to test objectives that require eye-movements beyond the maximum human oculomotor range. In this research work, ±55° of angular range will be considered the maximum possible range. Further, the label of “extreme eye-movements” is given to those eye-movements that extend beyond the operational tracking range of existing eye-trackers, which is equal to or less than ±30° (e.g. SR Research, 2009).
Thus, the requirement that separates visual from non-visual eye-movement trackers is the ability to maintain similar accuracy across the complete range of eye-movements, as defined by the aforementioned oculomotor range of ±55°.
This chapter will walk through the eye-tracking literature from the past decade or so in order to give an overview of currently available systems and their suitability for this application. In doing so, it will also form the basis for several fundamental decisions that have informed the design of the Robust Eye-Accessing Cues Tracker (REACT).
A recent review by Hansen and Ji (2010) is a comprehensive and fairly detailed source of technical information on video-based eye-tracking systems. It categorizes research work based on the particular area of focus:
a) Eye localization in the image, which is concerned with:
i. detecting the existence of eyes
ii. interpreting eye positions in the image
iii. tracking the detected eyes from frame to frame
b) Gaze estimation, which is concerned with estimating where the person is looking in 2D or 3D, or determining the 3D line of sight.
While this categorization is appropriate for that review, the present review will categorise eye-trackers in a different fashion, one that facilitates the discussion of several design choices made with the application's requirements in mind:
Remote versus head-mounted. Remote eye-trackers place one or more cameras at a location away from the subject, whereas head-mounted eye-trackers are mounted directly on the subject’s head, usually through a glasses-like frame or a helmet.
Light source(s). The light source used in each eye-tracker dictates the image properties and to a large degree defines the computer vision problem that must be solved. As such, eye-trackers will be classed based on whether they use natural illumination (passive) or (near-)infrared illumination (active). In the case of active illumination, several light sources may be used, each of which results in a corresponding glint, which is the reflection of the light source on the cornea. “Glint” is a nickname for the first Purkinje image, as shown in Figure 4.
Number of cameras. As this review focuses on video-based eye-trackers only, it is important to include the number of cameras each system uses, as more than one camera is often employed.
Gaze estimation method. There are two main methods of gaze estimation, the primary objective of eye-trackers, namely 2D and 3D gaze estimation. 2D gaze estimation is concerned with estimating where exactly the subject is looking on a surface such as a screen. On the other hand, 3D gaze estimation may estimate the gaze direction or point of regard in 3D space.
Calibration requirements. As will be briefly explained below, most eye-trackers require a calibration to be executed either once for each system, once for each subject, or both. This serves as another useful element for categorisation.
Further, eye localization schemes will not be discussed as they are only relevant to full-face images such as those taken by remote eye-trackers, unless relevant in terms of another technical aspect.
Before further discussing the requirements of an eye-tracker suitable for non-visual eye-movement applications, it is useful to give a brief overview of existing designs based on the categories laid out above.
It seems that remote eye-trackers are by far the most common design, perhaps because they are considered less invasive than head-mounted trackers, though this perception will be revisited later in this chapter. Thus, while there is a limited number of head-mounted designs (Ebisawa et al., 2002; Takegami et al., 2002; Li et al., 2005; Clarke et al., 2002; Hansen and Pece, 2005), there are seven to eight times more remote trackers found in the literature (Collet et al., 1997; Heinzmann and Zelinsky, 1998; Kim and Ramakrishna, 1999; Matsumoto and Zelinsky, 2000; Ohno et al., 2002; Sirohey et al., 2002; Benoit et al., 2005; Sun et al., 2006; Wallhoff et al., 2006; Liang and Houi, 2007; Chen and Ji, 2008; Valenti et al., 2008; Yamazoe et al., 2008; White et al., 1993; Morimoto et al., 2000; Morimoto et al., 2002; Coutinho and Morimoto, 2006; Hennessey et al., 2006; Meyer et al., 2006; Li et al., 2007; Ramdane-Cherif and Nait-ali, 2008; Newman et al., 2000; Shih et al., 2000; Andiel et al., 2002; Ji and Yang, 2002; Beymer and Flickner, 2003; Ishima and Ebisawa, 2003; Noureddin et al., 2004; Ohno and Mukawa, 2004; Shih and Liu, 2004; Park and Kim, 2005; Yoo and Chung, 2005; Merad et al., 2006; Tsuji and Aoyagi, 2006; Park, 2007; Chen et al., 2008; Guestrin and Eizenman, 2008; Kohlbecher and Poitschke, 2008; Hennessey and Lawrence, 2009; Nagamatsu, 2009; Wang et al., 2005).
FIGURE 4: ILLUSTRATION OF PURKINJE IMAGES. ADAPTED FROM HANSEN AND JI (2010).
Equally limited is the number of eye-trackers (regardless of whether they are head-mounted or remote) that use passive illumination (Hansen and Pece, 2005; Li et al., 2005; Colombo et al., 2007; Newman et al., 2000; Wang et al., 2005; Heinzmann and Zelinsky, 1998; Yamazoe et al., 2008; Matsumoto et al., 2000). Eye-trackers that use infrared illumination offer several advantages over passive illumination:
Depending on the exact configuration, the pupil appears as very dark or very bright (called dark- and bright-pupil effect respectively), which allows the pupil to be detected easily and with accuracy. Because of the different reflection properties of the iris and the pupil, this is possible even if the iris appears as dark as the pupil in natural light. The bright-pupil effect is produced when the light source is coaxial to the camera and the dark-pupil effect otherwise (Ebisawa, 1998).
If a filter is put on top of the camera lens to block non-infrared light, brightness and contrast are kept constant and barely, if at all, affected by other light sources from the external environment. Similarly, shadows may only be formed by the diffusion properties of the infrared source which is within the control of the eye-tracker designer. Last but not least, reflections from objects in the environment do not appear in images captured with active illumination.
Infrared light is invisible to the human eye and therefore does not distract the subject or cause the pupil to contract.
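As a toy illustration of the first advantage above, the sketch below thresholds a synthetic dark-pupil image and locates the pupil as the centroid of the dark region. All image values, dimensions and the threshold are invented for the example; real trackers add filtering, morphological clean-up and ellipse fitting on top of this basic idea.

```python
import numpy as np

# Synthetic 120x160 IR image: bright iris/sclera (value 180) with a dark
# pupil (value 20) of radius 12 px centred at (row=60, col=90).
img = np.full((120, 160), 180, dtype=np.uint8)
rr, cc = np.mgrid[0:120, 0:160]
img[(rr - 60) ** 2 + (cc - 90) ** 2 <= 12 ** 2] = 20

# Dark-pupil detection: threshold, then take the centroid of the dark pixels.
mask = img < 100
rows, cols = np.nonzero(mask)
centre = rows.mean(), cols.mean()
print(centre)  # ≈ (60.0, 90.0)
```

The high contrast between pupil and surroundings is what makes such a simple threshold viable under active illumination; under natural light the iris may be as dark as the pupil and this approach fails.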
As can be seen from the list above, active illumination provides several important advantages over passive illumination. In addition, it costs very little to add infrared illumination to any hardware design, so there has to be a very compelling reason to use passive illumination. There are only two disadvantages to active illumination:
Active illumination does not perform quite as well outdoors, and eye-trackers designed to work outdoors (Hansen and Pece, 2005) avoid the use of infrared illumination.
There is additional light emitted to the eye which will soon be regulated by international safety standards (Hansen and Ji, 2010). However, in all probability, this will only be a
GAZE ESTIMATION

In order to determine the gaze, which is the primary objective of eye-trackers, a feature-based approach is by far the most commonly used (e.g. Ebisawa et al., 2002; Li et al., 2005; Ohno et al., 2002; Benoit et al., 2005). A feature-based approach means that the eye-tracker identifies a set of feature points in the image, such as the pupil and the glint (for active-illumination eye-trackers), in order to determine the gaze. In this category, two schemes are possible: (a) 2D regression-based gaze estimation and (b) 3D model-based gaze estimation (Hansen and Ji, 2010). One of the main challenges faced by eye-trackers which use the glint as their reference point is that, with movement of the eyes and depending on where the light source is placed, the glint may fall onto the sclera, which makes it very hard to detect as they are both very bright regions. The eye corners are also used as reference points in some natural-light trackers where the glint is not available (Zhu and Yang, 2002; Valenti et al., 2008).
2D regression-based gaze estimation is performed simply by asking the subject to look at several different calibration points whose geometry on a surface (usually the screen) is known. At the same time, the pupil or iris locations are recorded, and the calibration points and feature point locations are used to fit a mapping function f such that s = f(p), where p is the recorded feature location and s the corresponding point on the surface. Using this function, subsequent pupil positions in the image can be mapped to screen coordinates. In schemes that use the corneal reflection, the vector between the glint and the pupil or iris location is used as the feature.
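The fitting step above can be sketched with a second-order polynomial regression, a common choice for this mapping. All numbers below (pupil-glint vectors, screen points, image geometry) are invented calibration data for illustration only:

```python
import numpy as np

# Hypothetical calibration data: pupil-glint vectors (px) recorded while the
# subject fixates nine known on-screen calibration points (screen px).
pupil = np.array([[-20, -15], [0, -15], [20, -15],
                  [-20,   0], [0,   0], [20,   0],
                  [-20,  15], [0,  15], [20,  15]], dtype=float)
screen = np.array([[100, 100], [640, 100], [1180, 100],
                   [100, 400], [640, 400], [1180, 400],
                   [100, 700], [640, 700], [1180, 700]], dtype=float)

def design_matrix(v):
    """Second-order polynomial terms [1, x, y, xy, x^2, y^2] per sample."""
    x, y = v[:, 0], v[:, 1]
    return np.column_stack([np.ones_like(x), x, y, x * y, x ** 2, y ** 2])

# Fit one set of coefficients per screen axis by least squares.
A = design_matrix(pupil)
coeffs, *_ = np.linalg.lstsq(A, screen, rcond=None)

def gaze(v):
    """Map new pupil-glint vectors to estimated screen coordinates."""
    return design_matrix(np.atleast_2d(v)) @ coeffs

print(gaze(np.array([[0.0, 0.0]])))  # a central fixation
```

Nine points is a typical 3x3 calibration grid; the six polynomial coefficients per axis are then over-determined, which makes the fit robust to small measurement noise in the recorded feature locations.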
On the other hand, 3D model-based gaze estimation, as the name suggests, uses the set of detected feature points in combination with a model of the eye, and in some cases the scene, to estimate the gaze direction or point of regard in 3D space.

CALIBRATION
All gaze estimation methods require that a set of parameters is determined through a calibration process; such processes can be divided into the following categories (Hansen and Ji, 2010):

Camera calibration. This refers to determining the intrinsic camera parameters (focal length, image sensor size and principal point). As long as the parameters do not change value (e.g. by changing the camera focus setting), they only need to be calculated once.
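To make the role of these intrinsic parameters concrete, the sketch below builds the intrinsic matrix K from an assumed focal length, sensor size and principal point (all values hypothetical), and uses it to project a 3D point in the camera frame onto the image:

```python
import numpy as np

# Hypothetical intrinsics: 4 mm lens, 6.4 mm-wide sensor imaged at 640x480 px,
# principal point assumed at the image centre.
f_mm, sensor_w_mm, img_w_px, img_h_px = 4.0, 6.4, 640, 480
fx = fy = f_mm * img_w_px / sensor_w_mm   # focal length expressed in pixels
cx, cy = img_w_px / 2, img_h_px / 2       # principal point (px)

# Intrinsic matrix K maps camera-frame 3D points to pixel coordinates.
K = np.array([[fx, 0.0, cx],
              [0.0, fy, cy],
              [0.0, 0.0, 1.0]])

def project(p_cam):
    """Pinhole projection of a 3D point (camera frame, metres) to pixels."""
    uvw = K @ p_cam
    return uvw[:2] / uvw[2]

print(project(np.array([0.05, 0.0, 0.5])))  # point 5 cm right, 50 cm ahead
```

In practice these parameters are recovered by imaging a known target (e.g. a checkerboard) rather than computed from datasheet values, but once found they remain valid until the optics change, as noted above.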