«BY GEORGIOS DIAMANTOPOULOS A THESIS SUBMITTED TO THE UNIVERSITY OF BIRMINGHAM FOR THE DEGREE OF DOCTOR OF PHILOSOPHY DEPARTMENT OF ELECTRONIC, ...»
Of course, because of the necessity to isolate and verify the subject’s cognition discussed in Chapter 2, it would be appropriate to design an academically-sound questioning methodology based on the field of psycho-phenomenology (see Mathison and Tosey, 2008a; Mathison and Tosey, 2008b). However, such a task is beyond the scope of this project and this experiment only serves to illustrate the value of an eye-tracker in its actualisation.
Additionally, as far as the analysis and technology involved are concerned, the structure of both experiments is essentially the same. Irrespective of the questioning methodology, the task at hand is to select relevant eye-movements from the input video and accurately synchronize them with the transcribed text, which has been the focus of this case study. Normally, verbal predicates (e.g. “see” is visual, “hear” is auditory) would be assigned to the transcribed text and this would allow the correlation with the eye-movement classes to be examined. This has not been done because a) without an academically-based questioning methodology the results would not have much meaning and b) it is not a technically difficult task beyond using the algorithms described in this thesis.
In this case study, a simple interview took place. The interviewer asked some basic questions about the subject’s educational background and performance, his plans for the future as well as his hobbies, musical taste and travelling. The interviewer was deliberately seated on a chair with a white screen as the background; this was done to eliminate any interference caused by eye-movements to objects in the scene that may have attracted the subject’s attention. Further, the chairs of both the subject and experimenter were adjusted such that their heads were on the same level and, when the subject was looking at the experimenter, he would appear to be looking straight ahead in the captured images. This was done to eliminate any interference that would have been caused by the subject resting his/her eyes on the experimenter and the corresponding eye-movement. However, theoretically speaking, these eye-movements would not have been large enough to interfere in any way.
The discussion section below will briefly comment on the output of the case study and highlight the advantages of using the REACT eye-tracker in such a study. To avoid delving into complex matters that are beyond the scope of this thesis, the eye-movement-speech latency (Griffin and Bock, 2000) is not discussed or taken into account in this chapter.
DISCUSSION
The case study is a rich data set of speech from the subject and interviewer and the subject’s eye-movements. In order to quantitatively examine such data, an appropriate visualisation method is
a) The full conversation was manually transcribed in a text file; speech uttered by the subject was transcribed separately from that uttered by the interviewer and they are displayed on the top (blue) and middle (green) ribbons respectively.
b) Each word from the transcript was manually synchronized to the speech audio, precise to 0.1 seconds. To signify the start and finish of the utterance of each word, the ribbon is marked with the respective darker colour. Thus, when a word has been uttered by the interviewer, the ribbon is marked with dark blue and the word is drawn; an appropriately sized font was used to make sure the word fits in its respective bounding box¹³. Each line (set of three ribbons, Figure 40) represents a maximum time length of 5 seconds and the beginning timestamp is displayed in the top left-hand corner of each line. Both ribbons (subject and interviewer) use the same timeline; a tick is marked at the top of the interviewer’s ribbon every 1 second.
c) Eye-movements that were selected by the eye-tracker were classified and a cropped thumbnail of the source image was drawn on the bottom ribbon, with the respective classification drawn below it, abbreviated (e.g. L is left, while DL is down and to the left).
The thumbnails were placed at the exact time they occurred, except for some thumbnails which were adjusted for presentation purposes (e.g. when part of the thumbnail would not appear because the eye-movement occurred at the end of the individual timeline).
At any point in time where no eye-movement is shown below the speech, the subject was either fixating at the same position as the previous eye-movement displayed, looking straight ahead, or moving their eyes.
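The timeline layout described in b) can be sketched as follows. This is only an illustrative sketch, not the code used to produce Figure 40: the `Word` type, the `place_word` function and the 1000-pixel line width are assumptions made for the example; only the 5-second line length and the 0.1-second timing precision come from the text.

```python
from dataclasses import dataclass

LINE_SECONDS = 5.0     # each line of ribbons covers at most 5 seconds (from the text)
LINE_WIDTH_PX = 1000   # hypothetical drawing width of one line, in pixels


@dataclass
class Word:
    text: str
    start: float  # utterance start in seconds (synchronized to 0.1 s)
    end: float    # utterance end in seconds


def place_word(word, line_seconds=LINE_SECONDS, line_width_px=LINE_WIDTH_PX):
    """Return (line_index, x_start_px, x_end_px) of the word's bounding box."""
    line_index = int(word.start // line_seconds)   # which 5-second line the word starts on
    line_start = line_index * line_seconds         # timestamp shown at the start of that line
    scale = line_width_px / line_seconds           # pixels per second
    x_start = (word.start - line_start) * scale
    # Clip the box at the end of the line if the word runs past it.
    x_end = min(word.end - line_start, line_seconds) * scale
    return line_index, x_start, x_end


line, x0, x1 = place_word(Word("well", 12.3, 12.7))
```

With these assumed dimensions, a word uttered over 0.1-0.2 seconds receives a box only 20-40 pixels wide, which is why a minimum font size can cause words to overlap.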
¹³ However, in some cases when the words were uttered in a short space of time (0.1-0.2 seconds), a minimum font size was enforced to ensure visibility, which has also created problems in some cases and words have been drawn on top of each other.
As can be seen, the eye-tracker has successfully selected and classified relevant eye-movements which accurately exhibit the behaviour of the subject’s eyes during the interview. But, before further commenting on the transcript, it is very important to make clear that this case study does not aim to prove, disprove or even provide evidence towards either case regarding the NLP EAC model. Since an academically-sound questioning methodology has not been developed, such an attempt would be subject to the same criticism levelled at past research in Chapter 2. Without a solid questioning methodology, any study like this would constitute anecdotal evidence, at best.
Instead, the point of this case study is to highlight, where possible, some of the eye-movement patterns that have previously been referred to in Chapter 2 and to demonstrate the benefits of using the REACT eye-tracker in non-visual eye-movement research. The full transcript is available for review in Appendix B.
First of all, intuitively obvious as it may be, it is interesting to note that eye-movements have mostly taken place in time with or before relatively long pauses or while connective words such as “well” and “and” were uttered by the subject, as in the example shown below (Figure 41).
This is probably the first time this has been stated in the literature based on collected data rather than the perceptive powers of the experimenter. In a full-scale study, it would be particularly easy to classify connective words and pauses in the subject’s speech, correlate them with eye-movements in the vicinity and thus quantify this effect and determine whether it is statistically significant. This pattern of eye-movements would not have been acknowledged by model 1 of Figure 3.
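As a rough illustration of how such a quantification might begin, the sketch below computes the fraction of eye-movements that fall within, or shortly before, a pause or connective word. It is not part of the REACT software: the function names, the connective list beyond “well” and “and”, the 0.5-second pause threshold and the 0.5-second lead window are all assumptions chosen for the example, and a significance test against a chance baseline would still be needed on top of it.

```python
# Hypothetical connective list; the text itself names only "well" and "and".
CONNECTIVES = {"well", "and", "so", "but"}


def pauses(words, min_gap=0.5):
    """Return (start, end) gaps of at least min_gap seconds between consecutive words.

    words is a time-ordered list of (text, start, end) tuples in seconds.
    """
    gaps = []
    for prev, nxt in zip(words, words[1:]):
        if nxt[1] - prev[2] >= min_gap:
            gaps.append((prev[2], nxt[1]))
    return gaps


def candidate_intervals(words, min_gap=0.5):
    """Intervals covered by connective words or by pauses."""
    connective_spans = [(s, e) for (text, s, e) in words if text.lower() in CONNECTIVES]
    return sorted(connective_spans + pauses(words, min_gap))


def fraction_near(eye_times, intervals, lead=0.5):
    """Fraction of eye-movements inside an interval or up to `lead` seconds before it."""
    if not eye_times:
        return 0.0
    hits = sum(any(s - lead <= t <= e for s, e in intervals) for t in eye_times)
    return hits / len(eye_times)


words = [("I", 0.0, 0.2), ("studied", 0.2, 0.7), ("well", 1.5, 1.8), ("physics", 1.8, 2.4)]
intervals = candidate_intervals(words)
share = fraction_near([0.1, 1.0, 2.3], intervals)
```

In this made-up example one of the three eye-movements (at 1.0 s) falls in the pause before “well”, so `share` is one third.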
Earlier, in Chapter 2, one of the main criticisms of past EAC model research was that in some studies, eye-movements were rated only after the experimenter had finished uttering the question (models 2-5, Figure 3). Over the course of the interview, several eye-movements followed a pattern contrary to this assumption: often, an eye-movement is made by the subject before the experimenter has finished uttering the question, as shown below (Figure 42).
Another example of the two aforementioned patterns is displayed in Figure 43.
FIGURE 41: EYE-MOVEMENTS TAKING PLACE IN TIME WITH OR BEFORE PAUSES OR WHILE CONNECTIVE WORDS ARE UTTERED.
FIGURE 42: EYE-MOVEMENTS ARE SOMETIMES PERFORMED BEFORE THE END OF THE QUESTION.
FIGURE 43: FURTHER EXAMPLE OF EYE-MOVEMENTS BEFORE THE END OF THE QUESTION AND DURING A PAUSE.
Multiple eye-movements in response to a question (models 4-5 and model 3 to a lesser extent, Figure 3) were recorded in very few past EAC model research studies and when they were, the analysis was somewhat rigid, only analysing the same number of occurrences across questions (see Chapter 2 for details). From the case study, it can be observed that particularly long pauses are often synchronised with a “search” or, simply, multiple eye-movements, as in the two examples offered below (Figure 44). The following interpretations could intuitively be made:
The long pause can be taken to mean that the requested information is not readily available and is searched for during the pauses. In the second example, “I can’t remember” is perhaps then uttered as an unconscious attempt by the subject to keep the speech continuous and conversation interactive.
Normally, a smaller number of eye-movements follows the end of a question and so, if such a pattern of eye-movements can be shown to be statistically significant and interpreted as an information-lookup strategy, a set of multiple eye-movements in response to a question may be taken to mean that the information is not readily available and not readily found during a search. Thus, the position of the eye keeps changing until the information is found or the search aborted.
FIGURE 44: TWO EXAMPLES FROM THE TRANSCRIPT OF THE CASE STUDY WHERE MULTIPLE EYE-MOVEMENTS WERE PERFORMED IN ANSWER TO A QUESTION.
As before, with the use of the eye-tracker, this could easily be verified in a full-scale study by classifying even longer pauses and speech patterns that demonstrate an inability to locate certain information and correlating them with the corresponding set of eye-movements.
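A minimal sketch of such a comparison is given below, assuming pauses have already been extracted as (start, end) intervals and eye-movements as timestamps. The function names and the 2-second cut-off for a “long” pause are hypothetical and would have to be chosen from the data of a full-scale study; the sketch only counts eye-movements per pause and does not test significance.

```python
def count_in_interval(eye_times, start, end):
    """Number of eye-movement timestamps falling within [start, end]."""
    return sum(start <= t <= end for t in eye_times)


def mean_movements_by_pause_length(pause_intervals, eye_times, long_threshold=2.0):
    """Return (mean count in short pauses, mean count in long pauses).

    long_threshold is an assumed cut-off in seconds, not a value from the thesis.
    """
    short_counts, long_counts = [], []
    for start, end in pause_intervals:
        n = count_in_interval(eye_times, start, end)
        if end - start >= long_threshold:
            long_counts.append(n)
        else:
            short_counts.append(n)

    def mean(values):
        return sum(values) / len(values) if values else float("nan")

    return mean(short_counts), mean(long_counts)


# Made-up timings for illustration only.
short_mean, long_mean = mean_movements_by_pause_length(
    [(1.0, 1.4), (3.0, 6.0), (8.0, 8.5)], [1.2, 3.5, 4.1, 5.0, 9.0]
)
```

If long pauses consistently show a higher mean count than short ones across many subjects, that would be the kind of evidence needed before interpreting multiple eye-movements as a search.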
With regard to correlating the question type or verbal predicates spoken by the subject with the corresponding eye-movement class, as suggested by the NLP EAC model, only limited relevance can be shown in the case study. This is because verbal predicates of any particular modality were rarely spoken by either the subject or the interviewer, i.e. predicates of unspecified modality, such as “think”, were spoken instead of predicates that can be classified into one of the visual, auditory or kinaesthetic modalities, such as “see”, “hear” or “feel”.
This is most probably for three reasons:
a) The interview content was to a certain extent informational rather than deeply subjective.
That is, information about the past, present and future of the subject was collected but the interview did not delve deep into the experiences associated with each question.
Having said that, three examples of eye-movements consistent with the EAC model are displayed below¹⁴: a) “I’d say” corresponds to an auditory eye-movement, b) a sequence of eye-movements around “comfortable” begins and ends with a kinaesthetic eye-movement and c) “love”, a verb associated with feeling, is correlated with a kinaesthetic eye-movement.
FIGURE 45: THREE EXAMPLES OF EYE-MOVEMENTS CONSISTENT WITH THE EAC MODEL. THESE RESULTS ARE FOR ILLUSTRATION PURPOSES ONLY AND CANNOT PROVE OR DISPROVE THE MODEL.
¹⁴ Please note that these examples are not sequential in time, as the timestamp in the top left-hand corner shows. Also, the eye-movements are interpreted as if the subject adheres to the EAC generalization for normally-organized right-handed people from Figure 1.
It must also be noted that there are several eye-movements in the transcript that cannot be strictly interpreted as above. In the example shown below, when the subject is speaking of being “stressed” and feeling “pretty calm” (kinaesthetic cues), an eye-movement relevant to visual construction is observed (Figure 46). Of course, this may be interpreted as the subject constructing a visual image of being stressed and/or feeling calm. Once again, this demonstrates the absolutely essential requirement for an elaborate and academically validated questioning methodology that is able to resolve such ambiguities.
FIGURE 46: A VISUAL EYE-MOVEMENT IS OBSERVED NEAR THE UTTERANCE OF KINAESTHETIC CUES. AS PER THE EAC MODEL, THIS CAN BE INTERPRETED AS THE SUBJECT CONSTRUCTING A VISUAL IMAGE OF THEMSELVES BEING THAT WAY.
The only quantitative analysis that was performed on the case study is related to the classification of eye-movements, as shown in Table 30. The class output by the eye-tracker (labelled “Auto class”) was compared to a manual classification performed by the experimenter (labelled “Manual class”). The manual classification proved to be a much harder task than anticipated, as ambiguities were evident in some cases when the eye-movement in question was on the borderline between two classes. In addition, in some cases, it was particularly hard to discriminate between upwards eye-movements and eye-movements on the baseline; this is both because of the reduced vertical resolution that results from de-interlacing and because the camera is pointed upwards, thus reducing the apparent range of upwards eye-movements. Where ambiguous, the classification was marked as such and highlighted bright yellow, whereas erroneous classifications were marked red.
Of the 150 eye-movements in total, 7 received an ambiguous classification by the experimenter and 6 were erroneously classified by the eye-tracker (Table 30; Figure 47).
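Expressed as proportions of all 150 eye-movements, these counts correspond to an error rate of 4% and an ambiguity rate of roughly 4.7%. The short sketch below reproduces this arithmetic; the function name is hypothetical, and both rates are taken over the full total, so no assumption is made about whether the ambiguous and erroneous sets overlap.

```python
def classification_rates(total, ambiguous, errors):
    """Proportions of ambiguous and erroneous classifications over all eye-movements."""
    return {
        "error_rate": errors / total,
        "ambiguous_rate": ambiguous / total,
    }


# Counts reported in the text: 150 eye-movements, 7 ambiguous, 6 erroneous.
rates = classification_rates(total=150, ambiguous=7, errors=6)
print(f"errors: {rates['error_rate']:.1%}, ambiguous: {rates['ambiguous_rate']:.1%}")
```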
It is questionable whether ambiguous classifications can be avoided unless the subject’s eyes are also captured by another camera placed on the same level, so that its video may be consulted to resolve ambiguities. Of course, while this would be feasible in an experimental setup for evaluating the eye-tracker itself, it would probably prove impractical for eye-tracker users conducting experiments.
All 6 classification errors were caused by the eye-movement being too close to the borderline between two classes. The classification algorithm, as described in Chapter 5, determines the class solely from the calculated 2D gaze angle and pre-set thresholds. As with any other statically set threshold, it is bound to fail some of the time, when the thresholded value is very close to the threshold itself. In other words, when the gaze angle is on or close to the borderline between two classes, a human rater may be able to distinguish between the classes (though not always, as shown by the 7 ambiguous ratings) but the algorithm cannot.
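The borderline failure described above can be illustrated with a deliberately simplified sketch. It is not the Chapter 5 algorithm: it thresholds a single angle rather than the 2D gaze angle, and the threshold values (-10 and +10 degrees), the class labels and the function name are all hypothetical. It shows only that two nearly identical angles on either side of a static threshold receive different classes.

```python
import bisect


def classify_angle(angle, thresholds, labels):
    """Classify an angle against sorted static thresholds.

    thresholds: ascending boundary values (hypothetical, in degrees).
    labels: one label per region, so len(labels) == len(thresholds) + 1.
    Returns (label, distance to the nearest threshold).
    """
    region = bisect.bisect_right(thresholds, angle)   # index of the region containing angle
    distance = min(abs(angle - t) for t in thresholds)
    return labels[region], distance


THRESHOLDS = [-10.0, 10.0]   # assumed boundaries, not values from the thesis
LABELS = ["L", "C", "R"]     # assumed labels for the three regions

just_below = classify_angle(9.5, THRESHOLDS, LABELS)    # lands in the middle region
just_above = classify_angle(10.5, THRESHOLDS, LABELS)   # lands in the upper region
```

The two angles differ by only one degree yet are assigned to different classes; the returned distance shows how close each decision was to the boundary.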
However, not only is the number of errors small but the classification algorithm can be improved to a) signal ambiguity in the classification and b) attempt to disambiguate between the two classes. The latter could potentially be done by an algorithm based on the following concept: