«Audio-Visual Integration: Generalization Across Talkers A Senior Honors Thesis Presented in Partial Fulfillment of the Requirements for graduation ...»

Some evidence suggests that this type of generalization is possible. For example, Richie and Kewley-Port (2008) trained listeners to identify vowels using audio-visual integration techniques. They found that training audio-visual integration was successful and that the trained listeners showed improvement from pre-test to post-test in both syllable recognition and sentence recognition, whereas the untrained listeners did not. More importantly, a substantial degree of generalization across talkers was observed. They suggested that audio-visual speech perception is a skill that, when trained appropriately, can benefit speech perception for persons with hearing impairment. They argued that incorporating these techniques into aural rehabilitation could form an important and effective part of a successful program for hearing-impaired individuals.

Present Study

The results from Richie and Kewley-Port (2008) offer encouragement for the possibility that across-talker generalization can be obtained. However, the question remains whether similar talker generalization can be observed for the consonant-based degraded stimuli used by Ranta (2010) and DiStefano (2010). The present study addresses this question by providing training in audio-visual speech integration with one set of talkers and testing for integration improvement with a different set of talkers. A group of normal-hearing listeners received ten training sessions in audio-visual perception of speech syllables produced by three talkers. The auditory component of these syllables was degraded in a manner consistent with the signals produced by multichannel cochlear implants (Shannon et al., 1995), similar to the methods used by James (2009), DiStefano (2010), and Ranta (2010). Listeners were periodically tested for improvement in auditory-only, visual-only and audio-visual perception with stimuli produced by both the training talkers and two additional talkers who had not been used in training. Consistent with the results of Richie and Kewley-Port, it was anticipated that integration would improve substantially for the training talkers. A smaller but still noticeable improvement was anticipated for the non-training talkers, reflecting some degree of generalization. Regardless of the results, the findings should provide new insights into the limits of generalizability of audio-visual integration training, and into how to design more effective aural rehabilitation programs for hearing-impaired patients.

–  –  –

Participants

The present study included five listeners, two males and three females, ages 21years. All five had normal hearing as well as normal or corrected vision, by self-report. Participants were compensated $150 for their participation. Materials previously recorded from five adult talkers, two male and three female native Midwestern English speakers, were used as the stimuli.

Stimuli Selection

A limited set of eight syllables was presented, all of which satisfied the following criteria:

1. The pairs of stimuli were minimal pairs; the initial consonant was their only difference.

2. All stimuli contained the vowel /ae/, selected because of the lack of lip rounding or lip extension, which can create speech reading difficulties.

3. Each category of articulation, including place (bilabial, alveolar, velar), manner (stop, fricative, nasal), and voicing (voiced or voiceless), was represented.

4. All syllables were presented without a carrier phrase.


The same set of single-syllable stimuli was used for each of the conditions:

–  –  –

The degraded audio-visual conditions included the following four dual-syllable (dubbed) stimuli. The first item in the pair represents the auditory stimulus while the second indicates the visual stimulus.

–  –  –

Stimuli Recording and Editing

The stimuli used in this study were identical to those used in recent studies (e.g., James, 2009; DiStefano, 2010; Ranta, 2010) in order to yield comparable results.

Speech samples from five talkers were degraded using a MATLAB script designed by Delgutte (2003). The speech signal was filtered into two broad spectral bands. Then, the fine structure of each band was replaced with band-limited noise, while the temporal envelope remained intact. The result was a 2-channel stimulus, similar to those used by Shannon et al. (1998). Using a commercial video editing program, Video Explosion Deluxe, the degraded auditory stimuli were dubbed onto the visual stimuli.
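The degradation described above follows the general noise-vocoding recipe. The Delgutte (2003) MATLAB script is not reproduced here, so the following is only a minimal Python sketch under assumed parameters: the crossover frequency, the envelope smoothing cutoff, and the first-order filters are illustrative choices, and a real implementation would typically use much sharper band filters.

```python
import math
import random


def one_pole_lowpass(signal, cutoff_hz, sample_rate):
    """First-order low-pass filter (a crude stand-in for a real band filter)."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out, state = [], 0.0
    for x in signal:
        state += alpha * (x - state)
        out.append(state)
    return out


def split_two_bands(signal, crossover_hz, sample_rate):
    """Split a signal into a low band and the complementary high band."""
    low = one_pole_lowpass(signal, crossover_hz, sample_rate)
    high = [x - l for x, l in zip(signal, low)]
    return low, high


def envelope(band, smooth_hz, sample_rate):
    """Temporal envelope: full-wave rectify, then smooth with a low-pass."""
    return one_pole_lowpass([abs(x) for x in band], smooth_hz, sample_rate)


def noise_vocode_two_channel(signal, sample_rate, crossover_hz=1500.0,
                             smooth_hz=50.0, seed=0):
    """Replace each band's fine structure with band-limited noise.

    crossover_hz and smooth_hz are assumed values, not taken from the study.
    """
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    speech_bands = split_two_bands(signal, crossover_hz, sample_rate)
    noise_bands = split_two_bands(noise, crossover_hz, sample_rate)
    out = [0.0] * len(signal)
    for speech_band, noise_band in zip(speech_bands, noise_bands):
        env = envelope(speech_band, smooth_hz, sample_rate)
        # The noise carrier is modulated by the speech band's envelope.
        for i, (e, n) in enumerate(zip(env, noise_band)):
            out[i] += e * n
    return out
```

Because each output band is noise shaped only by the envelope of the corresponding speech band, the spectral detail within a band is lost while the slow amplitude fluctuations survive, which is the property the two-channel stimuli rely on.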

The final step involved burning the stimulus sets onto DVDs using Sonic MY DVD. Four DVDs were created for each of the five talkers. Each of these DVDs contained sixty stimuli arranged in random order to eliminate the possibility of memorization by the participants.

Visual Presentation

All participants were initially pre-tested using degraded auditory, visual and audio-visual conditions, and then received training in all three of these conditions. The visual portion of the stimulus was presented using a 50 cm video monitor positioned approximately 60 cm outside the window of a sound-attenuating booth. The monitor was at eye level for the participants and positioned about 120 cm away from them. The stimuli were presented using recorded DVDs on a DVD player. During auditory-only presentation the monitor screen was darkened.

Degraded Auditory Presentation

The degraded auditory stimuli were presented from the headphone output of the DVD player through 300-ohm TDH-39 headphones at a level of approximately 75 dB SPL.

Testing Procedure

Testing was conducted in the Ohio State University’s Speech and Hearing Department located in Pressey Hall. Participants were instructed to read over a set of instructions explaining the procedure and listing a closed set of 14 possible responses. The response set included the 8 presented stimuli along with 6 other possibilities, which reflected McGurk-type fusion and combination responses for the discrepant stimuli. These additional responses were the syllables dat, nat, pcat, ptat, bgat and bdat.

Each participant was tested individually in a sound-attenuating booth that faced the video monitor located outside of the booth. Auditory stimuli were transmitted through headphones inside the booth. The examiner recorded and scored the participant’s verbal responses as heard via an intercom system. Each participant was initially administered a pre-test including stimuli selected from a set of 15 DVDs, three for each of the five talkers, each DVD containing 60 randomly ordered syllables. In the pre-test, the listeners were presented with one DVD from each talker in each of the three listening conditions (auditory-only, visual-only and audio-visual). Each DVD contained 30 congruent stimuli expected to elicit the correct response. The remaining 30 stimuli were discrepant, designed to elicit McGurk-type responses. Participants were instructed to listen to or watch each DVD and to report verbally the syllable they perceived for each stimulus. During the pre-test no feedback was provided.
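As a rough illustration of the test layout just described, the following Python sketch enumerates the 15 DVD presentations (five talkers by three listening conditions) and counts the resulting trials. The talker initials are those reported in the Results; the constants, data structure and function names are illustrative assumptions, not part of the original procedure.

```python
from itertools import product

TRAINING_TALKERS = ["JK", "EA", "LG"]   # talkers used in training
TESTING_TALKERS = ["KS", "DA"]          # talkers heard only in the tests
CONDITIONS = ["A-only", "V-only", "A+V"]
STIMULI_PER_DVD = 60
CONGRUENT_PER_DVD = 30


def build_test_plan():
    """Return one (talker, condition) entry per DVD presented in a test."""
    talkers = TRAINING_TALKERS + TESTING_TALKERS
    return list(product(talkers, CONDITIONS))


def count_trials(plan):
    """Count DVDs and total, congruent, and discrepant trials for a plan."""
    total = len(plan) * STIMULI_PER_DVD
    congruent = len(plan) * CONGRUENT_PER_DVD
    return {
        "dvds": len(plan),
        "total": total,
        "congruent": congruent,
        "discrepant": total - congruent,
    }
```

Under these figures a single test comprises 15 DVDs and 900 responses per listener, which fits the reported test duration of roughly 2-3 hours.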

The pre-test was followed by five training sessions in which participants received audio-visual training on two DVDs for each of the three training talkers. When presented with congruent stimuli, if the participant provided the correct response the examiner visually reinforced the response with a head nod. If the response was incorrect the examiner provided the correct response via an intercom system. For the discrepant stimuli the appropriate responses were as follows, with the first column representing the visual stimulus, the second representing the auditory and the third representing the expected McGurk-type response:

–  –  –

As with the congruent stimuli, if the participant responded correctly the examiner provided visual reinforcement, whereas if they responded incorrectly they were told the appropriate McGurk-type response via an intercom system. The McGurk-type responses were treated as the appropriate responses because Ranta’s study provided evidence that these responses can be trained; using them allowed us to determine whether this training would generalize to other talkers.

Upon completing the five training sessions a mid-test identical to the pre-test was administered. Next, participants had five more training sessions identical to the first five.

Upon completing the additional five training sessions, a post-test identical to the mid-test and the pre-test was administered to the participants. Each test took approximately 2-3 hours and the training sessions took approximately 8-10 hours. Training was divided into 1 or 2 sessions at a time. The participants were frequently encouraged to take breaks in order to prevent fatigue.

–  –  –

Results of the pre-test, mid-test and post-test were analyzed to determine whether or not improvements were seen in all three modalities and whether or not these improvements generalized from the training talkers to the testing talkers. Percent correct performance data for the congruent stimuli are presented first, followed by the percent response results for the discrepant stimuli.

Percent Correct Performance

Figure 1 displays the averaged results for overall percent correct intelligibility performance in the auditory-only (A-only), visual-only (V-only) and congruent audio-visual (A+V) conditions for each testing situation (pre-test, mid-test and post-test). Results are shown for the stimuli produced by training talkers. Listeners showed improvements from pre-test to post-test in all three modalities. A two-factor repeated measures analysis of variance (ANOVA) was performed on arcsine-transformed percentages to assess the improvements and evaluate whether differences observed across testing sessions were statistically significant. ANOVA results indicated a significant main effect of test (pre vs. post), F(1,4)=50.525, p=.002, as well as a significant main effect of modality (A-only, V-only, A+V), F(2,8)=87.364, p<.001. There was no significant interaction found between test and modality, F(2,8)=2.65, p=.13 (ns).
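For readers unfamiliar with the arcsine transformation mentioned above, the sketch below shows one common form, the arcsine of the square root of the proportion. Which variant was actually applied in this analysis is not stated, so the formula and the function names are assumptions for illustration only.

```python
import math


def arcsine_transform(proportion):
    """Variance-stabilizing transform for a proportion in [0, 1]."""
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must lie in [0, 1]")
    return math.asin(math.sqrt(proportion))


def transform_percent_scores(percent_scores):
    """Convert percent-correct scores (0-100) to arcsine units (radians)."""
    return [arcsine_transform(p / 100.0) for p in percent_scores]
```

The transform stretches scores near 0% and 100%, where raw percentages have compressed variance, which makes them better suited to ANOVA.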

Pairwise comparisons were also performed for these data. Results showed that there was no significant difference between the means of A-only and V-only performance, mean difference=.194, p=.015. A significant difference was found between A-only and A+V, mean difference=.456, p=.001, and between V-only and A+V, mean difference=.65, p<.001.

It is important to note that the significant improvement from pre-test to post-test in all three modalities generalized to the testing talkers as well, as shown in Figure 2.

Figure 2 shows the results for overall percent correct intelligibility performance in each of the listening conditions, A-only, V-only and A+V, for each testing situation, pre-test, mid-test and post-test, for the talkers not used in the training sessions (i.e., the testing talkers). ANOVA results for the testing talkers revealed a significant main effect of test (pre vs. post), F(1,4)=45.499, p=.003, as well as a significant main effect of modality (A-only, V-only, A+V), F(2,8)=115.052, p<.001. As with the training talkers, there was no significant interaction found between test and modality, F(2,8)=1.431, p=.29 (ns).

Pairwise comparisons also revealed results similar to those of the training talkers. There was no significant difference between A-only and V-only, mean difference=.027, p=.591. A significant difference was seen between A-only and A+V, mean difference=.550, p<.001, as well as between V-only and A+V, mean difference=.523, p<.001.

Figures 3-5 display these data in a format allowing easier comparison. In Figure 3 results are shown for percent correct performance in the A-only condition across tests with training and testing talkers, for side-by-side comparison. This graph shows that the listeners improved their performance from pre-test to post-test with both the training talkers and the testing talkers. ANOVA results revealed that there was a significant main effect of test (pre vs. post), F(1,4)=37.440, p=.004, as well as a significant main effect of talker (training vs. testing), F(1,4)=252.066, p<.001. In Figure 4, results for the V-only condition are displayed. ANOVA results for these data show a significant effect of test, F(1,4)=141.307, p<.001, but no difference across talkers, F(1,4)=.385, ns, and no significant interaction, F(1,6)=.234, ns. Figure 5 shows data for the A+V condition.

Here no significant effects were observed across tests, F(1,4)=4.550, p=.100, nor across talkers, F(1,4)=4.369, p=.105. Again, no interaction was observed, F(1,4)=1.395, p=.303.
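The p-values reported in this section for F statistics with (1, 4) degrees of freedom can be cross-checked by hand. The Python sketch below is not part of the original analysis; it uses the fact that an F(1, 4) statistic is the square of a Student's t statistic with 4 degrees of freedom, for which the cumulative distribution has a closed form. The function name is an illustrative assumption.

```python
import math


def p_value_f_1_4(f_stat):
    """Upper-tail p-value for an F statistic with (1, 4) degrees of freedom.

    Uses F(1, 4) = t(4)**2 and the closed-form CDF of Student's t with 4 df.
    """
    if f_stat < 0:
        raise ValueError("F statistic must be non-negative")
    t = math.sqrt(f_stat)
    denom = 1.0 + f_stat / 4.0
    u = t / math.sqrt(denom)
    w = f_stat / denom
    cdf = 0.5 + 0.375 * u * (1.0 - w / 12.0)
    # Two-sided t probability equals the upper-tail F probability.
    return 2.0 * (1.0 - cdf)
```

For example, `p_value_f_1_4(50.525)` comes out at about .002 and `p_value_f_1_4(4.550)` at about .100, in line with the values reported for the training-talker test effect and the A+V test effect.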

Integration performance with the congruent stimuli across tests is shown in Figure 6. The averages for training talkers and testing talkers are shown. Here integration is defined as the difference between the percent correct in the A+V condition and the best single modality performance (A-only or V-only). Using this measure, the amount of integration actually declines slightly from pre-test to post-test for both the training talkers and the testing talkers. A two-factor ANOVA revealed that there was no significant main effect of test (pre vs. post), F(1,4)=3.642, p=.13. There was also no significant main effect of talker (training vs. testing), F(1,4)=1.359, p=.30. This decrease in integration could be attributed to the fact that the listeners showed greater improvements in the A-only and V-only conditions as compared to the A+V condition.
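The integration measure defined above is simple arithmetic, sketched here in Python. The function name and the scores in the usage note are hypothetical and serve only to show how the measure can fall even when every modality improves.

```python
def integration_score(a_only, v_only, a_plus_v):
    """A+V percent correct minus the better single-modality percent correct."""
    return a_plus_v - max(a_only, v_only)
```

With hypothetical pre-test scores of 40, 50 and 80 percent (A-only, V-only, A+V) the measure is 30 points; if the post-test scores rise to 60, 70 and 90 percent, it drops to 20 points, because the single-modality scores gained more than the A+V score did.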

Figure 7 examines the results for stimuli produced by individual talkers. This figure shows the pre-test and post-test percent correct responses in the A-only condition across listeners for the three training talkers as well as the two testing talkers. It is important to note that training talkers JK and EA and testing talkers KS and DA all began with similar baseline percent correct intelligibility. However, training talker LG started with a percent correct intelligibility that was slightly higher than the others, and listeners showed a greater improvement in this modality with this talker.
