Audio-Visual Integration: Generalization Across Talkers

A Senior Honors Thesis

Presented in Partial Fulfillment of the Requirements for graduation with research distinction in Speech and Hearing Science in the undergraduate colleges of The Ohio State University


Courtney Matthews

The Ohio State University

June 2012

Project Advisor: Dr. Janet M. Weisenberger, Department of Speech and Hearing



Abstract

Maximizing a hearing-impaired individual’s speech perception performance involves training in both auditory and visual sensory modalities. In addition, some researchers have advocated training in audio-visual speech integration, arguing that it is an independent process (e.g., Grant and Seitz, 1998). Some recent training studies (James, 2009; Gariety, 2009; DiStefano, 2010; Ranta, 2010) have found that skills trained in auditory-only conditions do not generalize to audio-visual conditions, or vice versa, supporting the idea of an independent integration process, but suggesting limited generalizability of training. However, the question remains whether training can generalize in other ways, for example across different talkers. In the present study, five listeners received ten training sessions in auditory, visual, and audio-visual perception of degraded speech syllables spoken by three talkers, and were tested for improvements with an additional two talkers. A comparison of pre-test and post-test results showed that listeners improved with training across all modalities, with both the training talkers and the testing talkers, indicating that across-talker generalization can indeed be achieved. Results for stimuli designed to elicit McGurk-type audio-visual integration also suggested increases in integration after training, whereas other measures did not. Results are discussed in terms of the value of different measures of integration performance, as well as their implications for the design of improved aural rehabilitation programs for hearing-impaired persons.

Acknowledgments

I would like to thank my project advisor, Dr. Janet Weisenberger, for all of her guidance and support throughout my honors thesis process. Because of her I was able to expand my knowledge more than I could have ever expected. I am extremely grateful for her time, assistance and patience. I would also like to thank my subjects for their flexibility, time and effort.

This project was supported by an ASC Undergraduate Scholarship and an SBS Undergraduate Research grant.

Abstract
Acknowledgments
Table of Contents
Chapter 1: Introduction and Literature Review
Chapter 2: Method
Chapter 3: Results and Discussion
Chapter 4: Summary and Conclusions
Chapter 5: References
List of Figures
Figures 1-13

Effective aural rehabilitation programs provide hearing-impaired patients with training that can be generalized to different situations. It is important that patients can apply this training to everyday circumstances of speech perception. Maximizing an individual’s speech perception performance involves training in both auditory and visual sensory modalities. Although it has long been known that listeners will use both auditory and visual sensory modalities in situations where the auditory signal is compromised in some way (for example, listeners with hearing impairment), research has shown that listeners will use both of these modalities even when the auditory signal is perfect.

McGurk and MacDonald (1976) found that when listeners were presented simultaneously with a visual syllable /ga/ and an auditory syllable /ba/, they perceived the sound /da/, a “fusion” response. Although the auditory /ba/ was in no way distorted, the response occurs because the brain cannot ignore the visual stimulus. The resulting perception integrates, or fuses, the auditory stimulus /ba/, which has a bilabial place of articulation, and the visual stimulus /ga/, which has a velar place of articulation, to form /da/, which has an intermediate alveolar place of articulation. When the stimuli were reversed (an auditory /ga/ presented with a visual /ba/), the most common response was a “combination” response of /bga/. A “combination” response occurs because the visual stimulus is too prominent to be ignored, so rather than fusing the stimuli the brain combines the prominent visual stimulus with the auditory signal to create a new perception. Subsequent studies have explored the limits of this audio-visual integration.

To understand the nature of this integration, it is important to consider the types of auditory and visual cues that are available in the speech signal.

Auditory Cues for Speech Perception

In most situations the auditory cue alone is sufficient for listeners to understand speech sounds. Within the auditory signal there are three main cues for identifying speech: place of articulation, manner of articulation and voicing. Place of articulation refers to the physical location within the oral cavity where the airstream is obstructed. Included in this category are bilabials (/b,m,p/), labiodentals (/f,v/), interdentals (/ð,θ/), alveolars (/s,z/), palatal-alveolars (/ʒ,ʃ/), palatals (/j,r/) and velars (/k,g,ŋ/). The manner of articulation refers to the way in which the articulators move and come in contact with each other during sound production. This includes stops (/p,b,t,d,k,g/), fricatives (/f,v,θ,s,z,h/), affricates (/tʃ,dʒ/), nasals (/m,n,ŋ/), liquids (/l,r/) and glides (/j/). Voicing indicates whether or not the vocal folds vibrate during the production of the sound. If they do vibrate, the sound is referred to as a voiced sound (/b,d,g,v,z,m,n,w,j,l,r,ð,ŋ,ʒ,dʒ/), and if they do not, it is a voiceless sound (/p,t,k,f,s,θ,ʃ,tʃ/) (Ladefoged, 2006). Cues to place, manner and voicing are present in the acoustic signal, in characteristics such as formant transitions, turbulence and resonance, and voice onset time.

Visual Cues for Speech Perception

Although most of the information required for comprehending a speech signal can be obtained from auditory cues, McGurk and MacDonald (1976) showed that visual cues also play an important role in speech perception. Visual cues become especially useful in situations where the auditory signal is compromised, but as their study showed, even when the auditory signal is perfect visual cues are still used by listeners.

The sole characteristic of speech production that can be reliably visually detected is place of articulation, but even the results of this observation are often ambiguous (Jackson, 1988).

A primary reason that it is extremely difficult to identify speech sounds by visual cues alone is the fact that many sounds look alike. Sounds that look alike form viseme groups, sets of phonemes that use the same place of articulation but vary in their voicing characteristics and manner of articulation (Jackson, 1988). Since place of articulation is the primary observable feature of speech sounds, it is extremely difficult to differentiate among phonemes that use the same place. The phonemes /p,b,m/ are an example of a viseme group; they all use a bilabial place of articulation, making them visually indistinguishable. It is also important to note that talkers are not all identical and that the clarity of visual speech cues can vary greatly. Jackson found in her study that it was easier to speechread talkers who created more viseme categories than talkers who created fewer. There are also other talker features that contribute to the ability to speechread, including gestures, head and eye movements and even mouth shape. All of these visual cues can aid a listener in any speaking situation but especially those situations in which the auditory signal is compromised.
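The grouping of phonemes into viseme groups by shared place of articulation can be illustrated with a short sketch. It is only a simplification for illustration: the phoneme subset, the `PLACE` table and the function names are assumptions made here, and real viseme categories vary by talker, as Jackson (1988) notes.

```python
from collections import defaultdict

# Illustrative subset of phonemes and their place of articulation.
# The table is an assumption for this sketch, not data from the study.
PLACE = {
    "p": "bilabial", "b": "bilabial", "m": "bilabial",
    "f": "labiodental", "v": "labiodental",
    "θ": "interdental", "ð": "interdental",
    "s": "alveolar", "z": "alveolar",
    "k": "velar", "g": "velar", "ŋ": "velar",
}


def viseme_groups(place_of):
    """Group phonemes that share a place of articulation."""
    groups = defaultdict(set)
    for phoneme, place in place_of.items():
        groups[place].add(phoneme)
    return dict(groups)


def look_alike(a, b, place_of=PLACE):
    """True if two phonemes fall in the same (simplified) viseme group."""
    return place_of[a] == place_of[b]


groups = viseme_groups(PLACE)
print(sorted(groups["bilabial"]))   # the /p,b,m/ group from the text
print(look_alike("p", "m"), look_alike("b", "g"))
```

Under this simplification, /p/ and /m/ are visually confusable because they share a bilabial place, whereas /b/ and /g/ are not.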

Speech Perception with Reduced Auditory and Visual Signals

Studies have shown that speech can still be intelligible in situations where the auditory cues are compromised. This is due to the fact that speech signals are somewhat “redundant,” meaning that they contain more than the minimum information required for identifying the sounds. Shannon et al. (1995) performed a study with speech signals modified to be similar to those produced by a cochlear implant. This was achieved by removing the fine structure information of the speech signals and replacing it with band-limited noise, while maintaining the temporal envelope of the speech. The study used different numbers of noise bands and found that intelligibility of the sounds increased as the number of frequency bands increased. However, high levels of speech recognition were reached with as few as three bands, indicating that speech signals can still be identified even with a large amount of information removed.
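The noise-band processing described above can be sketched in a few lines. This is a minimal illustration, not the procedure used by Shannon et al. (1995). The filter design (a second-order band-pass from the widely used Audio EQ Cookbook), the 50 Hz envelope cutoff, the band edges in `EDGES` and all function names are assumptions chosen for brevity.

```python
import math
import random


def bandpass(signal, fs, lo, hi):
    """Second-order band-pass (Audio EQ Cookbook biquad, 0 dB peak gain)."""
    f0 = math.sqrt(lo * hi)              # geometric centre frequency
    q = f0 / (hi - lo)                   # quality factor from the bandwidth
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b0, b2 = alpha / a0, -alpha / a0     # b1 is zero for this filter type
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
    out, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for x in signal:
        y = b0 * x + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y)
        x1, x2, y1, y2 = x, x1, y, y1
    return out


def envelope(signal, fs, cutoff=50.0):
    """Temporal envelope: full-wave rectify, then one-pole low-pass."""
    k = 1.0 - math.exp(-2.0 * math.pi * cutoff / fs)
    out, y = [], 0.0
    for x in signal:
        y += k * (abs(x) - y)
        out.append(y)
    return out


def noise_vocode(signal, fs, edges, seed=0):
    """Replace fine structure in each band with envelope-modulated noise."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    out = [0.0] * len(signal)
    for lo, hi in zip(edges, edges[1:]):
        env = envelope(bandpass(signal, fs, lo, hi), fs)
        carrier = bandpass(noise, fs, lo, hi)   # band-limited noise
        for i, (e, c) in enumerate(zip(env, carrier)):
            out[i] += e * c
    return out


FS = 16000
EDGES = [100.0, 800.0, 1500.0, 4000.0]   # three illustrative bands
tone = [math.sin(2.0 * math.pi * 500.0 * i / FS) for i in range(800)]
vocoded = noise_vocode(tone, FS, EDGES)
print(len(vocoded))
```

Adding more entries to `EDGES` increases the number of bands, which corresponds to the manipulation of spectral resolution described in the text.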

The study discussed above was expanded by Shannon et al. in 1998. The later study applied four manipulations: the location of the band division was varied, the spectral distribution of the envelopes was warped, the frequencies of the envelope cues were shifted and the spectrum was smeared. The factors that most negatively influenced intelligibility were found to be the warping of the spectral distribution and shifting the tonotopic organization of the envelope. The exact frequency cutoffs and overlapping of the bands did not affect speech intelligibility as greatly.

Another study that examined the speech intelligibility of degraded auditory signals was performed by Remez et al. (1981), who reduced speech sounds to three sine waves that followed the three formants of the original auditory signal. Although it was reported that the signals were unnatural-sounding, they were highly intelligible to the listeners. This study further suggests that auditory cues are packed with more information than absolutely needed for identification, and that even highly degraded speech signals can still be understood.
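The idea of replacing speech with three sine waves that follow the formants can be sketched as follows. The formant tracks here are made-up values for illustration only, and the frame-based representation and function name are assumptions, not details taken from Remez et al. (1981).

```python
import math


def sine_wave_speech(formant_tracks, fs, frame_len):
    """Sum one sinusoid per formant, holding each frame's values constant.

    formant_tracks: list of frames; each frame is a tuple of
    (frequency_hz, amplitude) pairs, one pair per formant.
    """
    phases = [0.0] * len(formant_tracks[0])
    out = []
    for frame in formant_tracks:
        for _ in range(frame_len):
            sample = 0.0
            for k, (freq, amp) in enumerate(frame):
                # Accumulate phase so frequency changes stay continuous.
                phases[k] += 2.0 * math.pi * freq / fs
                sample += amp * math.sin(phases[k])
            out.append(sample)
    return out


# Hypothetical three-formant tracks: two 10 ms frames at 8 kHz.
tracks = [
    ((500.0, 1.0), (1500.0, 0.5), (2500.0, 0.25)),
    ((600.0, 1.0), (1400.0, 0.5), (2600.0, 0.25)),
]
samples = sine_wave_speech(tracks, 8000, 80)
print(len(samples))
```

Each output sample contains only three tonal components, which is why such signals sound unnatural even when they remain intelligible.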

Degraded visual cues can also still be useful signals in understanding speech. Munhall et al. (2004) studied whether or not degraded visual cues affected speech intelligibility. They employed visual images degraded through band-pass and low-pass spatial filtering, which were presented to listeners along with auditory signals in noise. High spatial frequency information was apparently not needed for speech perception, and it was concluded that compromised visual signals can nonetheless be accurately identified (Munhall et al., 2004).

Audio-Visual Integration of Reduced Information Stimuli

Studying audio-visual integration processes with compromised auditory signals is especially important because it simulates the experience of hearing-impaired persons and provides insights into what promotes optimal perception. Information learned from these studies can then be used when designing aural rehabilitation programs for hearing-impaired individuals. For this reason, some researchers have advocated specific training in audio-visual speech integration for aural rehabilitation programs.

Grant and Seitz (1998) offered evidence to support the idea that audio-visual integration is a process separate from auditory-only or visual-only speech perception. In experiments with hearing-impaired persons, they found that audio-visual integration could not be predicted from auditory-only or visual-only performance, leading them to argue for independence of the integration process. Grant and Seitz thus suggested that specific integration training should also be incorporated into aural rehabilitation programs.

Effects of Training in Recent Studies

More recent studies have further explored the relative value of modality-specific speech perception training. Many of these studies have employed normal-hearing listeners who have been presented with some form of degraded auditory stimulus to approximate situations encountered by hearing-impaired individuals. In our laboratory, James (2009) and Gariety (2009) tested syllable perception with syllables that had been degraded to mimic those generated by cochlear implants. To create their auditory stimuli they used a method similar to that employed by Shannon et al. (1995), in which the fine structure details of auditory stimuli were replaced with band-limited noise while preserving the temporal envelope. James (2009) and Gariety (2009) showed that the auditory-only component can be successfully trained. However, this training did not generalize to the audio-visual condition and thus did not improve integration results, leaving a question about whether integration is a skill that can benefit from training.

Ranta (2010) and DiStefano (2010) addressed the question of whether integration ability can be trained. They employed stimuli similar to those used by James (2009), but trained listeners only in the audio-visual condition. Results showed that integration can be trained, but the skills did not generalize to the auditory-only or the visual-only condition. The results of these studies suggest that skills do not generalize across modalities, supporting the argument that integration is a process independent of auditory-only or visual-only processing. However, because the value of aural rehabilitation programs is highly dependent on skills generalization, the question still remains whether this form of training can generalize in other ways, for example across different talkers.
