# By Georgios Diamantopoulos. A thesis submitted to the University of Birmingham for the degree of Doctor of Philosophy. Department of Electronic, …

a) For the inner corner: typically, the lower eyelid edge does not meet the upper eyelid edge, so a distance criterion between the two edges has to be applied to find the corner. Furthermore, the lower eyelid edge is often joined to a face line. The pattern detection offered in the main algorithm alleviates this problem in some cases, but the algorithm is robust only in combination with the distance constraint.

b) For the outer corner: typically, the camera is rotated around the vertical axis towards the outer corner, making the eyelid-half on the inner-corner side appear longer and the eyelid-half on the outer-corner side appear shorter. As a result, the upper eyelid on the outer-corner side is sloped several degrees more than on the inner-corner side, and grouping local maxima points on the top eyelid yields several disjointed groups. To find the outer corner robustly, pattern matching is combined with a refinement based on the partial x-derivative of the image. Since the upper eyelid edge and the lower eyelid edge always meet on the outer-corner side, a strong maximum is created in the partial x-derivative image. This maximum, found by searching a 10×5 window, is used to refine the corner in the last step.
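The window-based refinement step can be illustrated with a minimal numpy sketch. This is not the thesis implementation: the function name, window-clamping behaviour, and the assumption that `deriv_x` is a precomputed partial x-derivative image are all illustrative choices.

```python
import numpy as np

def refine_corner(deriv_x, corner, win_w=10, win_h=5):
    """Refine an eye-corner estimate by locating the strongest value
    of the partial x-derivative inside a win_w x win_h window centred
    on the initial estimate (corner = (x, y))."""
    x, y = corner
    h, w = deriv_x.shape
    # Clamp the search window to the image bounds.
    x0, x1 = max(0, x - win_w // 2), min(w, x + win_w // 2)
    y0, y1 = max(0, y - win_h // 2), min(h, y + win_h // 2)
    window = deriv_x[y0:y1, x0:x1]
    dy, dx = np.unravel_index(np.argmax(window), window.shape)
    return (int(x0 + dx), int(y0 + dy))
```

A coarse corner estimate is thus snapped to the nearest strong x-derivative response, which is where the two eyelid edges meet.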

Figure 22 illustrates the intermediate steps of the algorithm visually.

## CLUSTERING OF EYE-CORNER FEATURES

First, although the pupil detection algorithm also outputs false positives, the pupil often changes position even within the time-space of one field (1/2 frame, or 1/59.94 sec), so the only suitable strategy for filtering its false positives is the distance constraint (optionally combined with a motion predictor) proposed in the pupil detection section. The iris radius detection algorithm, in turn, is accurate enough not to require such filtering. In other words, only the corner detection algorithm is suited to this kind of filtering.

Second, and most importantly, the pupil feature-point is used differently from the corners. A false positive of the pupil position at a random point in time affects only one sample; a false positive of the eye corners during the calibration of the eye-tracker, on the other hand, affects several samples, until the eye-tracker re-initializes itself. With pupil detection, false positives occur rarely and only in extreme positions of the pupil. In contrast, false positives in eye corner detection can occur at any point in time, including the calibration frames, making it of paramount importance to filter them out.

```
Function PreProcess
    pyrDownImg = PyrDown(PyrDown(source image))
    derivX = partial x-derivative of pyrDownImg
    derivY = suppressNonMaxima(partial y-derivative of pyrDownImg)

Function DetectInnerCorner
    const SearchOffsetX
    const SearchWindowHeight
    const PairingDistance
    const MinimumGroupSize

    searchInitX = InnerIrisBoundary.X - SearchOffsetX
    searchTopY = leftIrisBoundary.Y - 0.25 * SearchWindowHeight
    searchBottomY = leftIrisBoundary.Y + 0.75 * SearchWindowHeight
    allMaxima = list()
    x = searchInitX
    while (x >= 0)
        sample = derivY[x, searchTopY : searchBottomY]
        maxima = FindLargestValues(sample, N = 4, distance >= 1)
        allMaxima.Add(maxima)
        x--
    end

    initialGroups = MaximaToGroups(allMaxima)
    groups = initialGroups where n(group) >= averageSize(initialGroups)
    for each (group in groups)
        patternIndex = find(pattern1 or pattern2 in group)
        if found
            remove points in group with index >= patternIndex
        end
    end

    pairs = select pairs of groups where
        (group1 contains point p1 and group2 contains point p2
         such that dist(p1, p2) <= PairingDistance)
        and group1.Size >= MinimumGroupSize
        and group2.Size >= MinimumGroupSize

    candidates = select from pairs where selectFunction:
        pairLeftMostPoint = (LeftMostPoint(group1).X,
                             average(LeftMostPoint(group1).Y, LeftMostPoint(group2).Y))
        if pairLeftMostPoint.Y < leftIrisBoundary.Y
            discard
        end
        if pairLeftMostPoint.X >= leftIrisBoundary.X - 0.75 * SearchWindowHeight
            discard
        end
    end

    groupMatch = select from candidates where energy = max:
        energy = sum(derivY along group1) * sum(derivY along group2)
    end

Function DetectOuterCorner
    const SearchOffsetX
    const SearchWindowHeight
    const SearchWindowLength
    const MinimumGroupSize

    searchInitX = OuterIrisBoundary.X + SearchOffsetX
    searchTopY = rightIrisBoundary.Y
    searchBottomY = rightIrisBoundary.Y + SearchWindowHeight
    allMaxima = list()
    x = searchInitX
    while (x <= searchInitX + SearchWindowLength)
        sample = derivY[x, searchTopY : searchBottomY]
        maxima = FindLargestValues(sample, N = 2, distance >= 3)
        allMaxima.Add(maxima)
        x++
    end

    initialGroups = MaximaToGroups(allMaxima)
    groups = initialGroups where n(group) >= MinimumGroupSize
             and group.points.Y >= searchTopY

    groupMatch = select from groups where energy = max:
        energy = sum(derivY along group)
    end
    patternIndex = find(pattern1 or pattern2 in groupMatch)
    if found
        cornerCandidate = groupMatch[patternIndex]
    else
        cornerCandidate = groupMatch.last()
    end

Function MaximaToGroups
    groups = list()
    while (maxima has elements)
        select currentPoint
        search groups where (p in group) is connected to currentPoint
        if found
            add currentPoint to matching group
        else
            create new group with currentPoint
        end
    end
```
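The MaximaToGroups routine can be sketched in Python as follows. The connectivity rule used here (points in neighbouring columns whose rows differ by at most `max_dy`) is an assumption for illustration; the thesis leaves the exact definition of "connected" implicit.

```python
def maxima_to_groups(points, max_dx=1, max_dy=1):
    """Cluster maxima points into groups of mutually connected points.
    Two points are taken to be 'connected' when their x-coordinates
    differ by at most max_dx and their y-coordinates by at most max_dy
    (an assumed connectivity rule).  points: list of (x, y) tuples."""
    groups = []
    for p in sorted(points):
        for group in groups:
            # Join the first existing group containing a neighbour of p.
            if any(abs(p[0] - q[0]) <= max_dx and abs(p[1] - q[1]) <= max_dy
                   for q in group):
                group.append(p)
                break
        else:
            # No neighbouring group found: start a new one.
            groups.append([p])
    return groups
```

Processing the points in x-order mirrors the column-by-column scan that produces the maxima in the first place.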

## TABLE 6: PSEUDO-CODE ILLUSTRATION OF THE CALCULATION OF THE EYE CORNERS USING A CLUSTER OF RESULTS.

```
Function FilterOutliers2(matches, Z)
    mean = mean(matches)
    error = dist(matches - mean)
    sigma = stdev(error)
    if (Z * sigma < 1.0)
        return matches
    filteredMatches = matches where error < Z * sigma
```
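The FilterOutliers2 pseudo-code above translates directly into a short numpy function. This is a sketch under the assumption that matches are 2D points and that the `Z * sigma < 1.0` guard exists to skip filtering when the cluster is already tight.

```python
import numpy as np

def filter_outliers(matches, z):
    """Discard matches lying more than z standard deviations of the
    error distribution away from the cluster mean."""
    matches = np.asarray(matches, dtype=float)      # shape (n, 2)
    mean = matches.mean(axis=0)
    error = np.linalg.norm(matches - mean, axis=1)  # distance to mean
    sigma = error.std()
    if z * sigma < 1.0:        # cluster already tight: keep everything
        return matches
    return matches[error < z * sigma]
```

With a reasonable Z (e.g. 2), a single gross false positive far from the cluster is removed while the consistent detections survive.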

In this research work, the 2D gaze vector is defined as the vector between the pupil position in the source image when the subject is looking straight ahead and the current pupil position. This is not to be confused with the 3D gaze vector defined as the vector between the subject’s eye and the point the subject is looking at in 3D space or a surface such as a screen (e.g. Morimoto and Mimica, 2005).

Calculating a 3D gaze vector would have required a fully calibrated camera (Wang et al., 2005).

Even though there are modern means of camera calibration that greatly simplify the process (see Bouguet, 2008), it is still too involved to be performed by the user of a system like the REACT eye-tracker. For this reason, and given that when investigating non-visual eye-movements 3D gaze offers little or no additional information over 2D gaze, it was decided that 2D gaze calculation would suffice.

As mentioned in an earlier chapter, simplicity was a key requirement that influenced the design and development of the REACT eye-tracker. Thus, a complex calibration procedure was intentionally avoided and the eye-tracker requires only one calibration point – that of the subject looking approximately straight ahead.

This calibration point can be provided on-line (in real time, while recording and tracking at the same time) or off-line (after the video has been recorded) and initializes the tracker by calculating the initial pupil position and contour, the iris radius, and the eye corner locations.

**At each point in time, the 2D gaze vector is calculated as illustrated in Table 7.**

The single subject calibration point required by the eye-tracker offers the significant advantage that it can be acquired even without the subject's explicit knowledge. For example, the interviewer may incite the subject to look forward and provide an audio signal that allows the frames in question to be marked for calibration off-line. On the other hand, it can be a source of error; if necessary, more complex calibration schemes can be incorporated easily, at the cost of additional complexity and perhaps increased invasiveness (by making the subject more self-conscious of his/her eyes being tracked). For example, two calibration points may be acquired by asking the subject to look left and right (on the baseline) and classifying each eye-movement accordingly.

It is important to mention here that eye corners have previously been used as reference points to calculate gaze only in remote systems (Zhu and Yang, 2002; Valenti et al., 2008).

As explained earlier, the problem of locating the corners is significantly different in a head-mounted setup, as the image is taken close-up and is much more detailed, which is in fact a disadvantage. For calculating gaze, a model-based approach such as the system by Matsumoto and Zelinsky (2000), which calculates the 3D centre of the eyeball as the middle point between the two corners and then calculates the 3D vector between the eyeball centre and the pupil centre, would not be valid for the system presented in this thesis. As explained in the corner detection section, the appearance of the eye corners is significantly changed compared to a remote system, and the eye corners no longer appear symmetrically placed. In other words, the distance between the inner corner and the eyeball centre would not be equal to the distance between the outer corner and the eyeball centre. Thus, a significantly more elaborate 3D eye model would need to be used, in coordination with a more elaborate subject calibration that determines these distances.

## TABLE 7: PSEUDO-CODE ILLUSTRATION OF THE 2D GAZE VECTOR CALCULATION.

```
Function CalcGaze(InitPupilPos, InitContour, Corners, PupilPos)
    if (InitContour.Contains(PupilPos))
        return SubjectIsLookingStraightAhead
    cornerLine = FitLine(Corners.Inner, Corners.Outer)
    axisAngle = cornerLine.Theta
    gazeVector = PupilPos - InitPupilPos
    return theta(gazeVector) + axisAngle
```
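The CalcGaze pseudo-code of Table 7 can be sketched in Python as below. The `contains` predicate on the calibration contour is an assumed interface, and returning `None` for "looking straight ahead" is an illustrative choice.

```python
import math

def calc_gaze(init_pupil, init_contour, corners, pupil):
    """2D gaze angle: if the pupil is still inside its calibration
    contour, the subject is looking straight ahead; otherwise the
    angle of the pupil-displacement vector is corrected by the tilt
    of the line through the two eye corners, so that camera roll
    does not bias the measured gaze direction."""
    if init_contour.contains(pupil):
        return None                      # subject is looking straight ahead
    (ix, iy), (ox, oy) = corners         # inner and outer corner points
    axis_angle = math.atan2(oy - iy, ox - ix)   # tilt of the corner line
    gx, gy = pupil[0] - init_pupil[0], pupil[1] - init_pupil[1]
    return math.atan2(gy, gx) + axis_angle
```

Fitting a line through the two corners and adding its angle plays the role of FitLine/cornerLine.Theta in the pseudo-code.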

The computational complexity of the algorithms presented in this thesis has been mentioned several times; this section specifically discusses the basic time complexity of each component and of their combination.

Pupil detection involves estimating the pupil contour (thresholding, connected components labelling and blob ellipse fitting) and refining it through the use of a snake. For N pixels, thresholding requires one comparison and one assignment per pixel and is thus of complexity O(N). During connected components labelling, the image is scanned twice, requiring approximately 2N operations, so it is also of complexity O(N). The time taken by the ellipse fitting after labelling depends not on the number of pixels but on the number of labels (blobs), which is small compared to the number of pixels, so its complexity is approximately O(1). Refining the contour using a snake likewise depends on the number of points in the contour rather than the number of pixels; since this number is always very small compared to the number of pixels and the snake is iterated a constant number of times, it is also of complexity O(1). Overall, detecting the pupil is of complexity O(N).

When calibrating the eye-tracker, other than the pupil, both the iris and eye corners are detected.

Calculating the iris radius also requires a very small number of operations. The edge strength is calculated over a fixed number of lines L and across a fixed width W for each line; its calculation is equivalent to two multiplications and three calculations of standard deviation (approximately m additions and m multiplications for a sample of size m) over a number of pixels that is small compared to the total number of pixels N, so the total complexity can be approximated by O(1). Filtering the outliers takes approximately constant time and is therefore also of complexity O(1).

When detecting the eye corners, the most significant operation is the Gaussian pyramid decomposition (convolution with a 5×5 Gaussian kernel followed by subsampling), whose complexity is O(N). Relative to this, the maxima selection and grouping that follows is of constant complexity, so the total complexity of the corner detection is O(N).


## TABLE 8: THE CALIBRATION FRAMES FOR SUBJECTS 1-9 WITH THE FEATURE POINTS ILLUSTRATED.

## TABLE 9: RANDOMLY SELECTED FRAMES FOR SUBJECTS 1-9 WITH THE PUPIL MARKED BY THE EYE-TRACKER (ONE SUBJECT PER ROW).


## CHAPTER 5: FEATURE EXTRACTION EVALUATION

Given that the REACT eye-tracker is feature-based, it makes sense to evaluate its performance in extracting these features by calculating the Euclidean distance between each feature point as extracted by the eye-tracker and the same point as manually marked by the author. Thus, for each intermediate step of the 2D gaze calculation (detecting the pupil, calculating the iris radius and locating the corners), the appropriate set of frames was selected from the test video database and the errors were measured. To make this possible, a software application that allows relatively easy manual marking of feature points on frames was written and used.
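The error measure just described amounts to a per-frame Euclidean distance between detected and manually marked points; a minimal sketch (the function name and the list-of-tuples representation are illustrative assumptions):

```python
import math

def feature_errors(detected, manual):
    """Per-frame Euclidean distance between a feature point as
    extracted by the eye-tracker and the same point as manually
    marked.  detected, manual: equal-length lists of (x, y) tuples."""
    return [math.hypot(dx - mx, dy - my)
            for (dx, dy), (mx, my) in zip(detected, manual)]
```

Summary statistics (mean, standard deviation) over these per-frame errors then characterize each feature extractor against the validation data set.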

The set of manually marked frames will also be referred to as the validation data set.