«BY GEORGIOS DIAMANTOPOULOS A THESIS SUBMITTED TO THE UNIVERSITY OF BIRMINGHAM FOR THE DEGREE OF DOCTOR OF PHILOSOPHY DEPARTMENT OF ELECTRONIC, ...»
Curvature is computed at each point using a formula that has been shown (Williams and Shah,
1992) to be computationally efficient as well as favour evenly spaced points:
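[equation not recoverable from the extracted text]
The image energy term is simply calculated from the image gradient (Williams and Shah, 1992).
[equation not recoverable from the extracted text]
Note that all min and max functions mentioned above are calculated within the candidate neighbourhood.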
Locating the iris is a problem that has been tackled several times before (e.g. Wang et al., 2000; Sirohey et al., 2002; Chapter 3). However, a novel solution was required for the REACT eye-tracker because the image formation differs from that of other setups. In particular:
(a) In many grey-scale images illuminated with infrared light, the iris-sclera edge that is prominent in full-colour images is diminished (Figure 16). This is readily explained by the fact that the iris is coloured and its colour is reduced to a grey level.
(b) While the iris-sclera edge is not preserved, infrared illumination often accentuates the texture of the iris, which can create strong edges that complicate the use of edge detectors.
Even though it may have been possible to design the eye-tracker without locating the iris
boundaries, doing so provides two significant advantages:
(a) It provides a robust starting point for the challenging task of locating the eye corners.
(b) Most importantly, it allows the eye-tracker to be extended to calculate the 3D gaze, provided the camera is fully calibrated. One such approach is found in the remote eye-tracker system developed by Wang et al. (2005):
i. The iris radius and averages taken from anthropological data are used to estimate the radius of the eyeball.
ii. Assuming a simple eye model in which the eyeball is a sphere, an ellipse is fitted to the iris contour, from which two solutions for the corresponding 3D circle are estimated using techniques outlined by Safaee-Rad et al. (1992, cited by Wang et al., 2005). The iris is more suitable than the pupil for a remote system because it is much bigger and because the pupil can hardly be distinguished from the iris without infrared illumination. An ellipse is fitted because the pupil and iris appear as circles only if the person is looking straight ahead and the camera lens is parallel to the eye lens.
iii. The correct solution is chosen by using a distance constraint based on the position of the eye-corners.
[Panel labels: EXAMPLE IMAGE EXTRACTED FROM THE EYE-TRACKER; SOBEL OUTPUT OF THE EYE-TRACKER IMAGE]
FIGURE 16: ILLUSTRATION OF THE EDGE-LOSS IN THE INFRARED IMAGES VERSUS A FULL-COLOUR IMAGE SHOT WITH A CAMERA FROM A DISTANCE. THE SCLERA IS THE WHITE PART OF THE EYEBALL.
A similar task to the detection of the iris boundaries is cell segmentation (e.g. Zhou and Pycock, 1997).
Cells, much like the iris, are fairly uniform on the inside in terms of pixel intensity levels. Similarly, the background of a cell image is uniform, like the sclera. Thus, the edge is not necessarily defined by the change in grey-level intensity, which is the basis of most edge detectors, but rather by a change in uniformity over a range of pixels.
In the original paper, dark regions in images of cells are identified to locate the cell interiors. From the centre of each cell, several candidate boundary points are generated along radials at regular intervals through 2π. For each radial, a set of feature measures is calculated, one of which is the edge strength; these features are then combined and used to select the final boundary points from the candidates. The calculation of the edge strength is done in the same way here but its
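As defined by Zhou and Pycock (1997), for a set of pixels M that is divided into two subsets m and M−m, the edge strength or maximum likelihood ratio mlr is
$$\mathrm{mlr} = |M| \ln \hat{\sigma}_M - |m| \ln \hat{\sigma}_m - |M-m| \ln \hat{\sigma}_{M-m}$$
where $\hat{\sigma}_M$, $\hat{\sigma}_m$ and $\hat{\sigma}_{M-m}$ are the standard deviations of the grey-level pixel sets M, m and M−m respectively, and |·| denotes the number of pixels in a set. The edge strength is calculated for several different divisions of M and peaks are observed where an edge is prominent.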
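As a rough illustration of how such an edge strength behaves, the following Python sketch assumes the usual Gaussian log-likelihood-ratio form, |M| ln σ_M − |m| ln σ_m − |M−m| ln σ_{M−m}. The function names, the use of the population standard deviation and the small floor `EPS` are assumptions made for this example and are not taken from Zhou and Pycock (1997).

```python
import math
from statistics import pstdev

EPS = 1e-6  # assumed floor to avoid log(0) on perfectly uniform pixel sets


def log_sigma(pixels):
    """Natural log of the population standard deviation, floored at EPS."""
    return math.log(max(pstdev(pixels), EPS))


def mlr(pixels, split):
    """Edge strength for dividing M = pixels into m = pixels[:split] and M-m = pixels[split:]."""
    m, rest = pixels[:split], pixels[split:]
    return (len(pixels) * log_sigma(pixels)
            - len(m) * log_sigma(m)
            - len(rest) * log_sigma(rest))


def mlr_profile(pixels, min_size=2):
    """Edge strength for every division that leaves at least min_size pixels per subset."""
    return {s: mlr(pixels, s) for s in range(min_size, len(pixels) - min_size + 1)}


# A dark, nearly uniform run followed by a bright, nearly uniform run.
row = [20, 22, 21, 19, 20, 21, 200, 198, 201, 199, 200, 202]
profile = mlr_profile(row)
best_split = max(profile, key=profile.get)  # division with the strongest edge
```

In this toy row the profile peaks at the division between the dark and the bright run, which is the behaviour the text describes: both subsets are internally uniform while their union is not.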
In the original edge strength calculation of Zhou and Pycock (1997) shown above, the whole population is considered. However, it was found empirically that, with eye image data, edges attributed to eyelashes and eyelids can severely alter the standard deviation of each population and thus make the algorithm fail or return results that are erroneous in this context. Thus, the original formula was modified to work within a constrained window such that, for a line of pixels L and a constant window size W, the edge strength at an index i within L is equal to:
$$\mathrm{mlr}(i) = |M_i| \ln \hat{\sigma}_{M_i} - |m_i| \ln \hat{\sigma}_{m_i} - |M_i - m_i| \ln \hat{\sigma}_{M_i - m_i}$$
where $m_i$ and $M_i - m_i$ are the fixed-size populations of pixels of L taken on either side of i, and $M_i$ is their union.
W can be chosen from averages taken from training data or set adaptively to half the pupil radius.
Thus, in this case, $|m| = |M-m| = W$ and $|M| = 2W$. The edge strength is calculated for a number of lines to the left and right of the pupil centre. The complete iris boundary detection algorithm is illustrated in Table 3 using pseudo-code. The selection of populations in the equations above is visually illustrated in Figure 17.
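The windowed calculation can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the thesis implementation: the two populations at index i are taken to be the `w` pixels before i and the `w` pixels from i onwards, indices where a full window does not fit are given a strength of zero, and the names `edge_strength`, `_log_sigma` and `EPS` are hypothetical.

```python
import math
from statistics import pstdev

EPS = 1e-6  # assumed floor so that uniform windows do not produce log(0)


def _log_sigma(pixels):
    """Natural log of the population standard deviation, floored at EPS."""
    return math.log(max(pstdev(pixels), EPS))


def edge_strength(line, w):
    """Windowed edge strength along one line of grey levels.

    At index i the populations are assumed to be line[i-w:i] and
    line[i:i+w]; indices too close to either end keep a strength of 0.0.
    """
    strength = [0.0] * len(line)
    for i in range(w, len(line) - w + 1):
        left, right = line[i - w:i], line[i:i + w]
        strength[i] = (2 * w * _log_sigma(left + right)
                       - w * _log_sigma(left)
                       - w * _log_sigma(right))
    return strength
```

Because only the `2 * w` pixels around each index enter the calculation, a strong eyelash or eyelid edge elsewhere on the line cannot change the standard deviations used at the iris boundary, which is the motivation given above for constraining the window.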
The iris boundary detection algorithm is based on two fundamental assumptions:
a) The subject is looking approximately straight ahead and therefore the vertical position of the pupil approximately coincides with the semi-major axis of the iris ellipse (the iris is closer to a circle but appears as an ellipse because of the camera angle and the deinterlacing, like the pupil).
The first assumption simplifies the detection of the iris boundary while the second assumption increases the algorithm’s robustness by generating several matches and discarding outliers.
As illustrated in Table 3, the algorithm is fundamentally simple and computationally efficient. A set of horizontal search lines is defined and the edge strength is calculated for each line using the equation given above for a window of size W. The local maxima are extracted as candidates for the iris boundary.
FIGURE 17: EDGE STRENGTH POPULATIONS USED TO DETECT IRIS BOUNDARIES. THE EDGE STRENGTH IS CALCULATED ALONG SEVERAL HORIZONTAL LINES OF FIXED WIDTH ON SEVERAL POINTS WITHIN EACH LINE. AT EACH POINT, FIXED SIZE POPULATIONS (m AND M−m) ARE USED TO CALCULATE THE EDGE STRENGTH.
Outliers, or maxima that lie an abnormal distance from other candidates, are discarded using the algorithm illustrated in Table 4. In essence, the filtering algorithm discards any candidate points whose x-coordinate falls outside a specified confidence interval. If the resulting set is empty after the filtering, the interval is progressively reduced using the values Z = 2.576 (99%, initial interval), Z = 2 (95.5% interval), Z = 1.645 (90% interval) and Z = 1 (68.27% interval). Most often, the matches are clustered around the mean value; however, even when the matches found are more spread out, reducing the confidence interval allows the algorithm to complete successfully, at the cost of a slightly less accurate result.
Each step of the iris detection algorithm is visually illustrated in Figure 18.
Function FindLeftIrisBoundary
    const N
    const SearchLength
    const Iterations
    searchOffset = pupilContour.Width
    leftSearchBoundary = PupilCenter.X - SearchLength
    rightSearchBoundary = PupilRectangle.Left - searchOffset
    for (y = [PupilCenter.Y - Iterations, PupilCenter.Y + Iterations])
        line = image[leftSearchBoundary ... rightSearchBoundary, y]
        mlr = EdgeStrength(line, mlrWindowSize = N)
        maxima = LocalMaxima(mlr)
        matches.Add(LeftMost(maxima))
    end
    filteredMatches = FilterOutliers(matches, 2.576 sigma)
    if (filteredMatches.Count == 0)
        filteredMatches = FilterOutliers(matches, 2 sigma)
        if (filteredMatches.Count == 0)
            filteredMatches = FilterOutliers(matches, 1.645 sigma)
            if (filteredMatches.Count == 0)
                filteredMatches = FilterOutliers(matches, 1 sigma)
                if (filteredMatches.Count == 0)
                    filteredMatches = matches
    leftBoundary = Point(average(X in filteredMatches), PupilCenter.Y)
Function FilterOutliers(matches, Z)
    meanX = mean(X in matches)
    errorX = Abs(meanX - X in matches)
    sigma = stdev(errorX)
    if (Z * sigma < 1.0)
        return matches
    filteredMatches = matches where errorX <= Z * sigma
    return filteredMatches
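The outlier filtering and the progressive reduction of the confidence interval can be sketched in Python as below. The sketch follows the pseudo-code of Tables 3 and 4 but makes a few assumptions that should be read as such: candidates are represented by their x-coordinates only, the population standard deviation is used, and a candidate is kept when its error is less than or equal to `z * sigma`. The names `filter_outliers`, `robust_boundary_x` and `Z_CASCADE` are hypothetical.

```python
from statistics import mean, pstdev

# Z values used in Table 3: 99%, 95.5%, 90% and 68.27% confidence intervals.
Z_CASCADE = (2.576, 2.0, 1.645, 1.0)


def filter_outliers(xs, z):
    """Keep the x-coordinates whose distance from the mean is within z * sigma."""
    mean_x = mean(xs)
    errors = [abs(mean_x - x) for x in xs]
    sigma = pstdev(errors)  # spread of the errors, as in Table 4
    if z * sigma < 1.0:     # interval narrower than a pixel: keep everything
        return list(xs)
    return [x for x, e in zip(xs, errors) if e <= z * sigma]


def robust_boundary_x(xs):
    """Average the candidates that survive the first non-empty filtering pass."""
    for z in Z_CASCADE:
        kept = filter_outliers(xs, z)
        if kept:
            return mean(kept)
    return mean(xs)  # every pass came back empty: fall back to all candidates
```

For example, with candidates at x = 40, 41, 39, 40, 42, 41, 40 and a single stray match at x = 75, the first pass already discards the stray match and the boundary is placed at the mean of the remaining seven.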
EYE CORNER DETECTION
Locating the eye corners is probably the most significant challenge for the set of input images taken with the REACT eye-tracker. The problem of locating the eye corners has been tackled before (Lam and Yan, 1996; Zhang, 1996; Feng and Yuen, 1998; Tian et al., 2000; Sirohey and Rosenfeld, 2001; Sirohey et al., 2002; Wang et al., 2005; Xu et al., 2008) but the systems in question operated, without exception, on a full-face, sometimes colour, image.
In a close-up image, surprising as it may be, the additional level of detail creates several problems that make it more difficult to locate the eye corners. With the higher resolution of an otherwise low-cost camera, more noise is preserved (for the difference of two consecutive frames before de-interlacing, mean squared error values of ≅ 30 are typical) and thus corner detectors output many false positives. This includes random salt-and-pepper noise as well as structured noise such as shadows caused by the diffusion pattern of the illuminator and eyelashes.
Furthermore, at this level of detail, the inner eye corner does not always appear as a corner; as illustrated in Figure 19, the inner corner morphology can vary greatly between people. In the upper left-hand image of Figure 19, the inner corner resembles a corner as defined in computer vision: it is approximately symmetrical to the outer corner and the tear gland is hidden. By contrast, in the upper right-hand image of Figure 19, the inner corner is asymmetrical to the outer corner and the upper eyelid continues to extend all the way to the tear gland. In this case, because of the size of the eyelids, a corner detector would fail to detect the point as a corner and would generate several false positives, as shown in the lower row of images in Figure 19.
The eye corners are located for two reasons:
(a) To determine the principal axis against which the 2D gaze angle is calculated (next section). Especially in cases when the camera is rotated around the Y-axis (with the Z-axis pointing upwards), this offers a correction of several degrees, which significantly increases the accuracy of the eye-tracker. This rotation of the camera can result from camera misplacement by the experimenter or from slippage of the frame, for example due to the weight of the cables.
(b) In long sequences where the absolute position of the eyeball centre is bound to change over time (e.g. through frame slippage), the eye corners can be used to detect whether a reinitialization of the eye-tracker is required.
Additionally, if the eye-tracker were to be configured to detect the 3D gaze (using a fully calibrated camera), the eye corners would be essential for disambiguating the 3D vector solution. For more details, the interested reader is referred to Wang et al. (2000).
In order to ease the task of finding the eye corners, it is therefore necessary to remove some detail as well as noise before tackling the problem. A computationally efficient way that reduces image resolution as well as removes noise is Gaussian Pyramid Decomposition (Gonzalez and Woods, 2002). After a Gaussian filter with a 5x5 kernel is convolved over the original image, even-numbered rows and columns are discarded and the output image is a quarter of the size of the original.
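One level of this decomposition can be sketched in plain Python as follows. The text does not state the kernel coefficients, so the sketch assumes the common separable binomial weights (1, 4, 6, 4, 1)/16 and replicates border pixels; both choices, and the names `pyramid_down` and `_blur_1d`, are assumptions made for illustration.

```python
# Separable 5-tap binomial weights; their outer product is an assumed 5x5
# Gaussian kernel (the coefficients are not given in the text).
WEIGHTS = (1, 4, 6, 4, 1)


def _blur_1d(values):
    """Convolve one row or column with WEIGHTS, replicating the border pixels."""
    n = len(values)
    out = []
    for i in range(n):
        acc = 0
        for k, wgt in enumerate(WEIGHTS):
            j = min(max(i + k - 2, 0), n - 1)  # clamp the index at the borders
            acc += wgt * values[j]
        out.append(acc / 16)
    return out


def pyramid_down(image):
    """Blur with the 5x5 kernel, then keep every other row and column."""
    rows = [_blur_1d(row) for row in image]             # horizontal pass
    cols = [_blur_1d(list(col)) for col in zip(*rows)]  # vertical pass
    blurred = [list(row) for row in zip(*cols)]         # back to row-major
    # Counting from 1, the even-numbered rows and columns are the ones dropped.
    return [row[::2] for row in blurred[::2]]
```

Applying the 5x5 kernel as two one-dimensional passes gives the same result as a full two-dimensional convolution with their outer product, at a lower cost per pixel; the output has half the width and half the height, that is, a quarter of the pixels.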
Because of the aforementioned differences in morphological structure between the two corners, the corner detection algorithm is specialized for the inner and outer corner separately. Table 5 illustrates the pseudo-code for both versions of the algorithm.
First and foremost, the input image is scaled down to ¼ of its original size using the Gaussian Pyramid Decomposition mentioned above. Then, the partial x- and y-derivatives of the scaled image I are calculated:
$$I_x = \left| \frac{\partial I}{\partial x} \right|, \qquad I_y = \left| \frac{\partial I}{\partial y} \right|$$
For the y-derivative, non-maxima are suppressed locally using a 1x3 window. Whilst this is an irregular window (square windows are usually used for computer vision operations), it has been found empirically that it preserves the vertical edges better than a 3x3 window. This is most likely because the derivative is a one-column operation too.
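The y-derivative and its local non-maximum suppression might be sketched as follows. Several details are assumptions, since the text does not fix them: the derivative is taken as an absolute central difference down each column, the 1x3 window is read as one column wide and three rows tall, and ties are resolved in favour of the lower row. The function names are hypothetical.

```python
def abs_y_derivative(image):
    """|dI/dy| by central differences down each column (border rows stay 0)."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(w):
            out[y][x] = abs(image[y + 1][x] - image[y - 1][x]) / 2
    return out


def suppress_non_maxima_1x3(deriv):
    """Zero every value that is not the maximum of its 1-column, 3-row window."""
    h, w = len(deriv), len(deriv[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            v = deriv[y][x]
            above = deriv[y - 1][x] if y > 0 else 0.0
            below = deriv[y + 1][x] if y < h - 1 else 0.0
            # ">=" above and ">" below keeps exactly one pixel of a flat ridge.
            if v > 0 and v >= above and v > below:
                out[y][x] = v
    return out
```

Because each pixel is compared only with its neighbours in the same column, a response is never suppressed by a stronger response in an adjacent column, which a 3x3 window would do.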
As mentioned earlier, slightly different algorithms are used to detect the inner and outer corners because of the different morphological structure of the eye evident at this image resolution. Both algorithms are, however, based on the same principle:
1. It is assumed that the edges formed between the eyelids and the sclera are within the top local maxima for a restricted window ( ). This assumption was empirically tested.
2. A grouping process begins near the iris boundary previously found and continues outwards, grouping all local maxima that are connected, using an 8-connectivity criterion.
3. The groups are searched for a set of two predefined patterns (shown in Figure 21) and if found, the grouping is terminated at that point. These patterns have been empirically found to occur when the lower eyelid edge is joined with another face line edge and thus the purpose of this step is to separate the two edges.
4. The final corner is selected from the group (outer corner) or pair of groups (inner corner) that demonstrate the maximum derivative energy. The energy of a group of
points is calculated as:
For a pair of groups :
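The grouping in step 2 above amounts to collecting a connected component of local maxima under 8-connectivity, starting from a seed near the iris boundary. The Python sketch below illustrates only that step; it does not implement the pattern-based termination of step 3 or the energy calculation of step 4, and the representation of maxima as a set of (x, y) coordinates and the name `group_from_seed` are assumptions made for the example.

```python
from collections import deque


def group_from_seed(maxima, seed):
    """Collect all local maxima 8-connected to `seed` (breadth-first search).

    `maxima` is a set of (x, y) pixel coordinates marked as local maxima and
    `seed` is one of them, e.g. a maximum next to the iris boundary.
    """
    if seed not in maxima:
        return []
    group, seen, queue = [], {seed}, deque([seed])
    while queue:
        x, y = queue.popleft()
        group.append((x, y))
        # Visit the eight neighbours; (0, 0) is the pixel itself, already seen.
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                neighbour = (x + dx, y + dy)
                if neighbour in maxima and neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
    return group
```

Diagonal neighbours count as connected here, so a thin eyelid edge that steps one pixel up or down between columns stays in a single group.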
The added steps and differences between the two algorithms are summarised here, and the complete algorithms are given as pseudo-code in Table 5. In Step 1 above, the search window includes both the upper and lower eyelid edges for the inner corner but only the lower eyelid edges for the outer corner. This is done for several reasons: