

«A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Engineering - Electrical Engineering ...»


Next, I consider a number of plots focusing on showing evidence of goat-, lamb-, and wolf-type speaker populations, similar to those shown in the prior work of Doddington et al. [22]. The first plot, addressing goat-like tendencies, that is, causing missed detection errors in true speaker trials, is shown in Figures 3.12 and 3.13 for males and females, respectively.

Here, the average true speaker score is plotted against the number of true speaker trials for each speaker. In these plots, a large number of outlying average target scores would indicate greater variability across speakers. However, it is not clear in either plot that there are more outliers than would be expected if the target score distribution did not depend on the speaker, though there appear to be a handful of goat-ish speakers among females with fewer than five target trials (indicated by the points showing the lowest average target scores).

To look for a population of lambs, namely those speakers who cause false alarm errors as target speakers, I plot the true speaker scores for a target model against the highest impostor score for that target model. This plot is shown in Figure 3.14 for males, and in Figure 3.15 for female speakers. In both male and female plots there is a large cluster of points indicating speakers without lamb-ish tendencies, i.e., those with maximum impostor scores less than, or on par with, true speaker scores. However, there are also many instances showing maximum impostor scores greater than the target scores for target models, and also greater than most other maximum impostor scores, suggesting lamb-like tendencies for some speakers.

Finally, in Figures 3.16 and 3.17, I plot the average maximum impostor score against the number of test conversation sides for each impostor speaker, for male and female speakers, respectively. As was the case with the earlier plots of average target scores, there is no clear evidence that the average maximum impostor score distributions are speaker-dependent.

Interestingly, there seem to be more outliers on the low end, i.e., with low average maximum impostor scores, than on the high end (which would indicate wolf-ish tendencies).

3.1.4 Effects of Speaker Demographics on System Scores

Continuing with the UBM-GMM system with T-norm, using Switchboard-1 electret conversation sides, I now switch focus to consider whether speaker demographics are evident in system scores. For the Switchboard-1 corpus, the following information is available for each speaker: sex, birth year, education level, and dialect area. The possible education levels are less than high school, less than college, college, and more than college. The dialect area corresponds to the region where the speaker lived for his first 10 years; the possible areas include New England, North Midland, South Midland, Western, New York City, Northern, Southern, and Mixed. In order to assess what characteristics have an impact on the scores produced by the system, I performed an analysis of variance (ANOVA) test for a number of different score distributions, described below. In each case, the probability (p) given to show significance level is the probability of being incorrect in concluding that the distributions are not the same.

[Figure residue: plots of average true trial score; images not reproduced.]

Since trial independence is an incorrect assumption, target scores were averaged over all target trials for each target speaker before the ANOVA was performed. I first looked at the target scores for female speakers compared to the target scores for male speakers. In this case, I found that the distribution of male target scores differed significantly from the distribution of female target scores (p < 0.01), with male target trials having a higher average score. Next, for female and male speakers separately, I considered the effect of age, education level, and dialect. The target score distributions for different age groups (20-29, 30-39, 40-49, and 50-69) did show a significant difference (p = 0.054 for females and p = 0.013 for males), meaning that the score distribution for at least one age group differed from the rest. However, a pair-wise comparison test (designed to keep the total probability of error to less than 10%) showed that significant differences only occurred between two pairs of distributions: 20-29 versus 30-39 (for males only) and 20-29 versus 50-69 (for both males and females). Education level did not result in differing target score distributions for either sex. Finally, although there appeared to be some differences in distributions for different dialects, only males showed a significant difference (i.e., at least one dialect's distribution was different, with p = 0.013), and pair-wise comparisons found only two pairs of dialects to have significantly different score distributions (New England versus New York City and Northern versus New York City).
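As a rough illustration of the averaging step described above, the following Python sketch averages target scores per speaker and then computes a one-way ANOVA F statistic over the per-speaker averages. The function names, speaker labels, and scores are hypothetical and made up for illustration; this is not the dissertation's code, and it computes only the F statistic, not the p-value.

```python
from collections import defaultdict
from statistics import mean


def average_by_speaker(trials):
    """Average scores per speaker; trials is an iterable of (speaker_id, score)."""
    by_speaker = defaultdict(list)
    for speaker, score in trials:
        by_speaker[speaker].append(score)
    return {spk: mean(scores) for spk, scores in by_speaker.items()}


def one_way_anova_f(groups):
    """One-way ANOVA F statistic for a list of groups (each a list of floats)."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Toy data (hypothetical): two male and two female speakers.
trials = [("m1", 0.6), ("m1", 0.8), ("m2", 0.5),
          ("f1", 0.2), ("f1", 0.4), ("f2", 0.3)]
sex = {"m1": "M", "m2": "M", "f1": "F", "f2": "F"}

averages = average_by_speaker(trials)
male = [s for spk, s in averages.items() if sex[spk] == "M"]
female = [s for spk, s in averages.items() if sex[spk] == "F"]
f_stat = one_way_anova_f([male, female])  # close to 9.0 for this toy data
```

Averaging first means each speaker contributes one value to the test, so the independence assumption applies to speakers rather than to individual trials. Turning the F statistic into a p-value requires the F distribution; a statistics library such as SciPy (`scipy.stats.f_oneway`) provides both.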

For impostor scores, the assumption of trial independence is again incorrect. In this case, I considered three different approaches: an assumption of target-test speaker pair independence, wherein impostor scores are averaged for each target-test speaker pair; an assumption of target speaker independence, wherein impostor scores are averaged for each target speaker; and an assumption of impostor speaker independence, wherein impostor scores are averaged for each impostor speaker. As would be expected, there is a significant difference in score distributions for same-sex speaker trials and different-sex speaker trials (p < 0.001 for all averaging approaches). When comparing scores for which the target and impostor speakers have an age difference of 5 years or less to scores for which the age difference between speakers is greater than 5 years, there is also a significant difference for both females and males (p < 0.001 when averaging for each speaker pair or for impostor speakers, p ≤ 0.028 when averaging for target speakers). A comparison of the scores where the target and impostor speakers have the same education level to scores where the speakers have different education levels did not show any significant differences for either sex. Finally, looking at trials with speakers of the same dialect area versus trials with speakers of different dialect areas, there were significant differences when treating the speaker pairs independently (p = 0.079 for females and p = 0.013 for males), and for females when treating the impostor speakers independently (p = 0.063). Perhaps more significant differences are not found in this case because the dialect region information collected does not accurately reflect dialectal differences for all the speakers.
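The three averaging approaches differ only in which key the impostor scores are grouped by before averaging. A minimal Python sketch follows, assuming each impostor trial is stored as a (target speaker, impostor speaker, score) tuple; the function name, the `unit` parameter, and the data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def average_impostor_scores(trials, unit):
    """Average impostor scores per independence unit.

    trials: iterable of (target_speaker, impostor_speaker, score).
    unit: 'pair', 'target', or 'impostor'.
    """
    key_for = {
        "pair": lambda target, impostor: (target, impostor),
        "target": lambda target, impostor: target,
        "impostor": lambda target, impostor: impostor,
    }[unit]
    buckets = defaultdict(list)
    for target, impostor, score in trials:
        buckets[key_for(target, impostor)].append(score)
    return {key: mean(scores) for key, scores in buckets.items()}


# Toy impostor trials (hypothetical speakers and scores).
trials = [("A", "B", 0.1), ("A", "B", 0.3), ("A", "C", 0.5), ("B", "C", 0.2)]

per_pair = average_impostor_scores(trials, "pair")          # 3 averaged values
per_target = average_impostor_scores(trials, "target")      # 2 averaged values
per_impostor = average_impostor_scores(trials, "impostor")  # 2 averaged values
```

Each choice yields a different number of averaged values, and therefore a different sample size for the subsequent test, which is one reason the reported p-values can differ between approaches.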


3.2 Analysis of Recent System and Data Set

I now move on from Switchboard-1 analysis to the more recent SRE08 corpus, which contains greater degrees of channel variability. The SRE08 short2-short3 condition uses roughly 2.5-3 minutes of speech for both training and testing [53]. This speech may be taken from one side of a conversation between two people, or from part of an interview. Furthermore, the data includes both telephone and microphone channels (there are 14 types of microphones).

Using the short2 and short3 conversation sides, I generate a set of trials different from those used in the NIST evaluation. For my purposes, I use conversation sides from all speakers with at least 5 available speech utterances. In some cases, the same conversation side was recorded on multiple channels (telephone and microphones, or just microphones). In these cases, I selected only one instance of that conversation side, in order to prevent the introduction of confounding factors due to having the same lexical content across different speech samples. There are 416 speakers (256 female, 160 male), with 3049 conversation sides, and a total of 22,210 target trials. For each impostor speaker pair, five impostor trials are chosen (along with the corresponding trials that have the train and test data switched), for a total of 453,600 impostor trials.
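The impostor trial total can be checked with a short calculation. The sketch below assumes that impostor speaker pairs are formed only between speakers of the same sex; the text here does not state this restriction, but it is one reading that reproduces the reported total exactly. Variable names are illustrative.

```python
from math import comb

n_female, n_male = 256, 160
trials_per_pair = 5
directions = 2  # each chosen trial plus its train/test-swapped counterpart

# Assumption: impostor pairs are unordered same-sex speaker pairs.
same_sex_pairs = comb(n_female, 2) + comb(n_male, 2)  # 32,640 + 12,720
impostor_trials = same_sex_pairs * trials_per_pair * directions

print(same_sex_pairs)   # 45360
print(impostor_trials)  # 453600
```

Under this assumption, 45,360 speaker pairs with ten trials each (five chosen, five swapped) give the 453,600 impostor trials stated above.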

In order to better address the effects of channel variability, I use a UBM-GMM system with simplified factor analysis applied, implemented with the ALIZE toolkit [9]. The UBM is trained using 1553 conversation sides from Fisher and Switchboard-2. The rank 70 eigenchannel U matrix for simplified factor analysis is trained using 1900 conversation sides from SRE04 telephone data (99 speakers with 10 conversation sides each) and SRE05 microphone data (91 speakers with 10 conversation sides each). For the given set of trials, the system has a minimum DCF of 0.382 and an EER of 8.93%.
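For readers unfamiliar with the two summary metrics, the following Python sketch shows one simple way to estimate an EER and a minimum detection cost from lists of target and impostor scores. It is an illustration only: the cost parameters (`c_miss`, `c_fa`, `p_target`), the normalization by the best trivial system, and the rule "accept when score >= threshold" are assumptions made here, not details given in the text, and the scores are made up.

```python
def error_rates(targets, impostors, threshold):
    """Miss and false alarm rates when trials with score >= threshold are accepted."""
    miss = sum(s < threshold for s in targets) / len(targets)
    fa = sum(s >= threshold for s in impostors) / len(impostors)
    return miss, fa


def eer_and_min_dcf(targets, impostors, c_miss=10.0, c_fa=1.0, p_target=0.01):
    """Sweep thresholds over observed scores; return (approximate EER, min DCF).

    The default cost parameters and the normalization are assumptions.
    """
    thresholds = sorted(set(targets) | set(impostors))
    thresholds.append(thresholds[-1] + 1.0)  # operating point that rejects everything
    norm = min(c_miss * p_target, c_fa * (1 - p_target))
    best_gap, eer, min_dcf = None, None, None
    for thr in thresholds:
        miss, fa = error_rates(targets, impostors, thr)
        gap = abs(miss - fa)
        if best_gap is None or gap < best_gap:
            # EER is approximated where miss and false alarm rates are closest.
            best_gap, eer = gap, (miss + fa) / 2
        dcf = (c_miss * p_target * miss + c_fa * (1 - p_target) * fa) / norm
        if min_dcf is None or dcf < min_dcf:
            min_dcf = dcf
    return eer, min_dcf


# Toy scores (hypothetical).
target_scores = [0.9, 0.8, 0.4]
impostor_scores = [0.5, 0.2, 0.1]
eer, min_dcf = eer_and_min_dcf(target_scores, impostor_scores)
```

The threshold that minimizes the detection cost is generally not the EER threshold, since the cost function weights misses and false alarms differently.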

3.2.1 Target Trials and Goat-ish Behavior

I begin by performing an analysis of variance (ANOVA) test using all target trial scores for each speaker in order to determine if there is a speaker effect on the means. With a resulting p < 0.001, the null hypothesis that the target scores come from the same (speaker-independent) distribution can be rejected. Figure 3.18 shows a box plot for the male target scores, by speaker. It is clear that the distributions vary across speakers in this case.

Similarly, application of the Bartlett multiple-sample test for equal variances to the target scores also rejects the hypothesis that the scores come from normal distributions with the same variance.

Next, I perform a Kruskal-Wallis test, a non-parametric analysis of variance test that uses ranks and avoids the need for an assumption that the scores are normally distributed. Once again, the results of such a test for the target scores are conclusive in rejecting the null hypothesis that the score distributions do not depend on speaker, with p < 0.001.
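To make the rank-based idea concrete, here is a minimal Python sketch of the Kruskal-Wallis H statistic, with one group of scores per speaker. It assigns average ranks to tied values but omits the tie correction factor, and it does not compute a p-value; the function name and data are hypothetical.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (average ranks for ties, no tie correction)."""
    # Pool all values, remembering which group each one came from.
    pooled = sorted((v, gi) for gi, g in enumerate(groups) for v in g)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        for _, gi in pooled[i:j]:
            rank_sums[gi] += avg_rank
        i = j
    weighted = sum(r * r / len(g) for r, g in zip(rank_sums, groups))
    return 12 / (n * (n + 1)) * weighted - 3 * (n + 1)


# Toy data (hypothetical): target scores for two speakers.
h = kruskal_wallis_h([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])  # about 3.857
```

Under the null hypothesis, H is approximately chi-square distributed with (number of groups minus 1) degrees of freedom, which is how a p-value is usually obtained. SciPy's `scipy.stats.kruskal` implements the test, including the tie correction.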



3.2.2 Impostor Trials and Lamb-ish or Wolf-ish Behavior

After averaging impostor scores for each impostor speaker pair, I considered both the set of these average impostor scores for each target speaker (looking for lambs) and the set of average impostor scores for each test speaker (looking for wolves). In both cases, application of ANOVA did not reject the null hypothesis (p ≥ 0.44 for female, male, and all speakers).

Similarly, the Kruskal-Wallis test did not reject the null hypothesis that these scores do not depend on the speaker, though the female speakers came closest to significant differences, with p = 0.11.

3.2.3 Distribution of Errors Across Speakers

Using the threshold corresponding to the minimum DCF, errors for each speaker are counted. In particular, I count the number of false rejections (to find goats), the number of false acceptance errors as the target speaker (to find lambs), and the number of false acceptance errors as the test speaker (to find wolves). Cumulative distributions of these errors are plotted for female and male speakers in Figures 3.19 and 3.20, respectively.
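The per-speaker counting described above can be sketched in Python as follows. The trial layout and the decision rule (accept when the score is at or above the threshold) are assumptions for illustration, and the speaker labels and scores are made up.

```python
from collections import Counter


def count_errors(target_trials, impostor_trials, threshold):
    """Count errors per speaker at a fixed threshold.

    target_trials: iterable of (speaker, score).
    impostor_trials: iterable of (target_speaker, test_speaker, score).
    Returns (goats, lambs, wolves) as Counters keyed by speaker.
    """
    goats, lambs, wolves = Counter(), Counter(), Counter()
    for speaker, score in target_trials:
        if score < threshold:      # false rejection of the true speaker
            goats[speaker] += 1
    for target, test, score in impostor_trials:
        if score >= threshold:     # false acceptance of an impostor
            lambs[target] += 1     # error attributed to the target speaker
            wolves[test] += 1      # same error attributed to the test speaker
    return goats, lambs, wolves


# Toy trials (hypothetical).
target_trials = [("A", 0.9), ("A", 0.2), ("B", 0.7)]
impostor_trials = [("A", "B", 0.6), ("A", "C", 0.1), ("B", "C", 0.8)]
goats, lambs, wolves = count_errors(target_trials, impostor_trials, threshold=0.5)
```

Every false acceptance is counted twice, once for the target speaker and once for the test speaker, which is why the lamb and wolf totals are equal even though their distributions across speakers can differ.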

The distribution of errors is strongly speaker-dependent for female speakers, for all three types of errors. In the case of false rejections, 50% of the errors are due to 38, or roughly 15%, of the speakers. This is even more drastic for false acceptances as the target speaker, for which 18, or roughly 7%, of the speakers cause 50% of the errors. For false acceptances as the test speaker, 61, or about 24%, of the speakers account for 50% of the errors.

The story is similar for male speakers. Once again, a speaker-dependent distribution of missed detection errors is observed, with 23, or about 14%, of the speakers producing 50% of the errors. Only 25, or 16%, of the speakers account for 50% of the false alarms as targets, while 33, or 21%, of the speakers produce 50% of false alarms as impostor speakers.

The uneven distribution of errors across speakers suggests goat-like, lamb-like, and wolf-like tendencies for both male and female speakers.
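The "N speakers account for 50% of the errors" figures can be derived from per-speaker error counts as in the sketch below, which also rechecks the percentages quoted above against the speaker totals (256 female, 160 male). The function name and the toy counts are hypothetical; only the speaker counts come from the text.

```python
def speakers_for_share(error_counts, share=0.5):
    """Smallest number of speakers whose errors cover `share` of all errors."""
    total = sum(error_counts)
    if total == 0:
        return 0
    running = 0
    # Take speakers in order of decreasing error count.
    for n, count in enumerate(sorted(error_counts, reverse=True), start=1):
        running += count
        if running >= share * total:
            return n
    return len(error_counts)


# Toy per-speaker error counts (hypothetical): two speakers cover half the errors.
toy = speakers_for_share([4, 3, 2, 1])

# Percentages implied by the speaker counts reported in the text.
n_female, n_male = 256, 160
female_pct = [round(100 * k / n_female) for k in (38, 18, 61)]  # [15, 7, 24]
male_pct = [round(100 * k / n_male) for k in (23, 25, 33)]      # [14, 16, 21]
```

If errors were spread evenly, about half of the speakers would be needed to reach 50% of the errors, so values between 7% and 24% indicate a strong concentration of errors in a minority of speakers.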

3.3 Discussion

The examination and analysis of system scores presented here has demonstrated that automatic speaker recognition system performance is dependent on the speakers. Speakers may be difficult to correctly verify as the true speaker, and speakers may generate high impostor scores, as either the target speaker, the test speaker, or both.

However, I have also observed a dependence on which segments are selected for training and testing; certain conversation side train-test pairings may produce errors, while others corresponding to the same speaker or speaker pair may not, and scores are not symmetric for a given pair of conversation sides (i.e., switching which utterance is used to train the target model will change the score). Such results suggest that any attempts to predict or use information about how a system will respond to speakers may need to take an approach involving conversation pairs. At the same time, averaging scores over sets of trials corresponding to a speaker can give a better sense of overall tendencies.

Figure 3.20: Cumulative distribution of errors across male speakers, for false rejections, false acceptances as the target, and false acceptances as the impostor. [Plot not reproduced; axes: cumulative false rejection errors (in %) and cumulative false alarm errors (in %).]

Furthermore, I have observed that there can often be a large degree of variation across speaker pairs; for the same target speaker, impostor scores may change significantly from impostor speaker to impostor speaker. As such, I move away from the separate concepts of lamb and wolf, into a discussion of difficult-to-distinguish impostor speaker pairs, i.e., those pairs for whom the system is likely to produce false alarm errors. At the same time, it is useful to keep in mind that within a given speaker population, there may well be an overall tendency for a particular speaker to cause false alarms, for a number of speaker pairings.
