
«A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Engineering - Electrical Engineering ...»


However, the results of MANOVA indicated that the phonetic distribution across the sets did not differ significantly for either female or male speakers, nor did the number of selected frames. A MANOVA testing differences across the acoustic features, in particular linear frequency cepstral coefficients (LFCCs), delta LFCCs, and delta-delta LFCCs, did show some significant differences in the case of LFCCs and delta LFCCs.

Chapter 3 Speaker-Dependent System Performance

The first component of this thesis work is to establish the effects that inherent speaker qualities have on automatic speaker recognition system performance. To this end, I analyze scores from a UBM-GMM system, as well as a UBM-GMM system with simplified factor analysis, in several ways. I begin with a small subset of data with limited channel variability, and gradually extend this to further exploration.

3.1 Preliminary UBM-GMM System Analysis

3.1.1 System and Data

The corpus under investigation in the following analysis is Switchboard-1 [45]. This corpus of conversational telephone speech, which has roughly 2.5 minutes of speech per conversation side, was chosen for several reasons. First, there is less channel variability than in more recently collected corpora. This is desirable for my analysis because my focus is on intrinsic speaker effects, rather than extrinsic factors like channel. Second, there is a variety of information available for the speakers, including age, education level, and dialect area.

In order to further control for channel effects, I consider only those conversation sides with electret handset labels (as determined by SRI’s automatic handset labeler). This results in 3429 conversation sides from 407 speakers, of whom 199 are female and 208 are male. For my analysis, I obtain the full set of one conversation side training and testing scores, i.e., training on each conversation side, and testing every model against every conversation side, for a total of 11,754,612 trials (not including the trials where the train and test conversation sides are the same). Of these, 38,676 are target trials.
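The total trial count follows directly from the design: every conversation side trains one model, and that model is tested against every other conversation side. A minimal sketch of this arithmetic is given below; the function name is illustrative and not part of the original system, and the target trial count is simply copied from the text, since it depends on the number of conversation sides per speaker.

```python
def count_trials(num_sides: int) -> int:
    """Number of ordered (train, test) pairs in which train and test sides differ."""
    return num_sides * (num_sides - 1)


total_trials = count_trials(3429)

# Target trial count as reported in the text (not derived here).
target_trials = 38_676
impostor_trials = total_trials - target_trials

print(total_trials, target_trials, impostor_trials)
```

Because train and test roles are not interchangeable, the pairs are ordered: side A as model against side B as test is a different trial from the reverse.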

The automatic speaker recognition system used for this data is a basic cepstral gender-independent UBM-GMM. Specifically, the input features are 12th order MFCCs plus energy, with deltas and double-deltas, with CMS applied. There are 1024 Gaussian mixture components, and the UBM is trained using a small set of 286 conversation sides from the Fisher corpus [17], a conversational speech corpus collected on the telephone. This set was chosen to be balanced in terms of sex and handset type. The conversations are about 5 minutes in length, so each conversation side contains roughly 2.5 minutes of speech. I use SRI’s UBM-GMM system implementation [37].

For additional channel compensation, I apply T-norm to this UBM-GMM system, using conversation sides from Fisher and Switchboard-1 (separate from conversation sides used for the aforementioned Switchboard-1 experimental set) as the impostor cohort. There are 327 impostor models in total, 163 female and 164 male.
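As an illustration of T-norm in its standard form, the sketch below normalizes a raw trial score by the mean and standard deviation of the scores that the same test segment obtains against the impostor cohort models. This is a minimal sketch and not SRI's implementation; the function name, the toy cohort values, and the use of the population standard deviation are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def t_norm(raw_score, cohort_scores):
    """Normalize one trial score with the cohort scores of the same test segment.

    cohort_scores holds the scores of the test segment against each impostor
    cohort model (327 models in the setup described above).
    """
    mu = mean(cohort_scores)
    # Population standard deviation; which estimator a real system uses is an assumption.
    sigma = pstdev(cohort_scores)
    if sigma == 0:
        raise ValueError("cohort scores have zero variance")
    return (raw_score - mu) / sigma


# Toy example with a hypothetical four-model cohort.
cohort = [-0.5, 0.0, 0.5, 1.0]
print(t_norm(2.0, cohort))
```

The normalization statistics depend only on the test segment, so they can be computed once per test segment and reused for every target model.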

3.1.2 Speaker Subset

Due to the large number of trials in this experiment, it is not feasible to visualize all of the scores at once. However, it is informative to consider a confusion matrix in order to see how the system scores vary depending on the speaker(s). Thus, limiting the speakers to those with 10 electret conversation sides, I obtain a set of scores for 15 male speakers and 19 female speakers, with a total of 340 conversation sides. A plot of the scores for these speakers is shown in Figure 3.1 for the UBM-GMM system without T-norm applied. The blocks of 10 conversation sides are labeled according to speaker number. The first 15 speakers are male, and the last 19 are female (labels 16-34). Thus, the target trials correspond to 10x10 blocks along the diagonal, with impostor trials elsewhere. The lower left and upper right quadrants are same-sex trials (male and female, respectively), while the upper left and lower right quadrants correspond to mixed-sex trials. The male only and female only quadrants are shown in Figures 3.2 and 3.3 for closer examination.
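The block structure of this confusion matrix can be sketched in a few lines. The example assumes that conversation sides are ordered by speaker, with the 15 male speakers first, as described above; the function names are hypothetical, and excluding trials where the train and test sides are identical is an assumption carried over from the full experiment.

```python
SIDES_PER_SPEAKER = 10
NUM_MALE, NUM_FEMALE = 15, 19


def speaker_of(side_index):
    """1-based speaker label for a 0-based conversation-side index."""
    return side_index // SIDES_PER_SPEAKER + 1


def trial_type(train_side, test_side):
    """Classify one cell of the 340 x 340 score matrix."""
    if train_side == test_side:
        return "excluded"
    train_spk, test_spk = speaker_of(train_side), speaker_of(test_side)
    if train_spk == test_spk:
        return "target"
    same_sex = (train_spk <= NUM_MALE) == (test_spk <= NUM_MALE)
    return "same-sex impostor" if same_sex else "mixed-sex impostor"


num_sides = SIDES_PER_SPEAKER * (NUM_MALE + NUM_FEMALE)
counts = {}
for train in range(num_sides):
    for test in range(num_sides):
        kind = trial_type(train, test)
        counts[kind] = counts.get(kind, 0) + 1

print(counts)
```

Under these assumptions, each speaker contributes 10 x 9 target trials, since every one of the 10 models is tested against the speaker's nine other conversation sides.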

One thing to notice is the variation among target trial scores. Different speakers vary in the degree to which their target trials produce high scores. For instance, male speaker 14 and female speaker 29 tend to have higher target trial scores, while female speaker 33 tends to have lower target scores. Furthermore, speakers vary in the degree of consistency across their target scores. While some speakers appear to have fairly similar scores across all target trials, e.g. male speaker 3 and female speaker 16, others have much more variation in the range of target scores, e.g. male speaker 13 and female speaker 20.

In terms of impostor trials, it is also clear that scores are more confusable for certain speaker pairs, such as male speakers 3 and 5 or female speakers 19 and 29, and less confusable for other speaker pairs, such as male speakers 5 and 13 or female speakers 16 and 20.

Additionally, we can observe tendencies for the same speaker to produce higher or lower scores as the impostor model or test segment. Those speakers with higher scores as the target model (column blocks) are potential lambs, while those speakers with higher scores as the test segment speaker (row blocks) are potential wolves. Another observation of note is that some higher scores are even produced for mixed-sex trials, such as those for male speaker 8. Finally, it is apparent that scores are not symmetric, indicating that for the UBM-GMM system, there is a dependence on which conversation side is used to train the target model.

To get a further sense of speaker behavior, I average the scores in various ways. First, consider only the true speaker trials. For each target model (of each speaker), I average the true speaker scores over the nine such trials for that target model. A plot of these averages is shown across all male and all female speakers, in Figures 3.4 and 3.5, respectively. Male speaker 14 has high average true speaker scores (an observation consistent with the notes from the confusion matrix plot), while male speakers 9 and 10 tend to have average true speaker scores on the lower end of the range. Similarly, female speakers 21 and 29 have higher average target scores and female speaker 33 has lower averages; again, such observations are consistent with those made from examination of the plot of the confusion matrix, though Figures 3.4 and 3.5 are better able to give a sense of the relative performance of speakers as true targets. It is interesting to consider the differences between averages for different target models corresponding to the same speaker. In certain cases, there are outlier target models, whose average true speaker scores are much lower than the rest, as with male speakers 2 and 13 and female speakers 20 and 24. Female speaker 22 appears to have two sets of target models, which cluster among relatively higher or lower average true speaker scores. Male speaker 3 and female speakers 19 and 23 appear to be the most consistent across target models. Clearly, the degree of consistency across true speaker trial scores is a factor in how difficult it is for a system to make a correct decision about whether the train and test speakers are the same.
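The per-model averaging of true speaker scores described above can be sketched as follows. The layout `scores[train][test]`, the function name, and the toy values are assumptions made for illustration; the real analysis uses 10 conversation sides per speaker, which gives nine true speaker trials per target model.

```python
def mean_target_score_per_model(scores, speaker_of_side):
    """Average true-speaker score for each target model.

    scores[train][test] is the score of the model trained on side `train`
    against test side `test`; speaker_of_side[i] is the speaker of side i.
    """
    averages = {}
    for train, row in enumerate(scores):
        same_speaker = [
            row[test]
            for test in range(len(row))
            if test != train and speaker_of_side[test] == speaker_of_side[train]
        ]
        if same_speaker:
            averages[train] = sum(same_speaker) / len(same_speaker)
    return averages


# Toy data: speaker "A" has three sides, speaker "B" has two (values are made up).
speaker_of_side = ["A", "A", "A", "B", "B"]
scores = [
    [9.0, 1.0, 3.0, -1.0, -2.0],
    [2.0, 9.0, 4.0, -1.0, 0.0],
    [1.0, 1.0, 9.0, -3.0, -1.0],
    [-2.0, -1.0, 0.0, 9.0, 0.5],
    [-1.0, -2.0, -1.0, 1.5, 9.0],
]
print(mean_target_score_per_model(scores, speaker_of_side))
```

Comparing the averages of models that belong to the same speaker is what exposes outlier target models of the kind noted for male speakers 2 and 13.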

Next, for a given target speaker, I average the impostor speaker scores for every test segment over all the target models of the given speaker. To begin, Figure 3.6 shows these average impostor scores for four of the male speakers. The plots in the figure contain clusters of points denoted by a symbol+color combination. Each symbol+color combination designates one particular (impostor) test speaker, and each point with that symbol+color combination corresponds to one conversation side of that test speaker. The value at each point is the average score for the given impostor conversation side, averaged over all models of the target speaker (who is the constant across all points). If the average impostor scores are typically on the high end of the range over all impostor speakers, this suggests the target speaker in question has lamb-ish tendencies, i.e., a tendency to produce high impostor scores as the target model.
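The impostor averaging in this paragraph can be sketched in the same style: for a fixed target speaker, each impostor test side receives one value, the mean of its scores over all models of that target speaker. The `scores[train][test]` layout, the function name, and the toy values are assumptions made for illustration.

```python
def mean_impostor_score_per_test_side(scores, speaker_of_side, target_speaker):
    """Average impostor score for each test side, over all models of one target speaker."""
    model_rows = [i for i, spk in enumerate(speaker_of_side) if spk == target_speaker]
    averages = {}
    for test, spk in enumerate(speaker_of_side):
        if spk == target_speaker:
            continue  # skip true-speaker trials
        values = [scores[model][test] for model in model_rows]
        averages[test] = sum(values) / len(values)
    return averages


# Toy data: speaker "A" has three sides, speaker "B" has two (values are made up).
speaker_of_side = ["A", "A", "A", "B", "B"]
scores = [
    [9.0, 1.0, 3.0, -1.0, -2.0],
    [2.0, 9.0, 4.0, -1.0, 0.0],
    [1.0, 1.0, 9.0, -3.0, -1.0],
    [-2.0, -1.0, 0.0, 9.0, 0.5],
    [-1.0, -2.0, -1.0, 1.5, 9.0],
]
print(mean_impostor_score_per_test_side(scores, speaker_of_side, "B"))
```

Grouping the resulting values by the speaker of each test side reproduces the clusters in the plots: one cluster per impostor speaker, one point per conversation side.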

Among the male target speakers, male speakers 1 and 3 have a lot of variation across the average scores for different test segments of the same impostor speaker. Speaker 3 appears to have more lamb-ish qualities, with higher average impostor scores across several impostor speakers. On the other hand, speakers 8 and 15 appear to have greater consistency in average scores across test segments of the same impostor speaker. Speaker 15 is the least lamb-ish, with fairly low average impostor scores over all impostor speakers. In many instances, the target speakers produce average impostor scores that vary across impostor speakers.

Figure 3.7 shows similar plots of average impostor scores for four of the female speakers. Examination of the female speaker plots indicates the most lamb-ish tendencies for speaker 18, and the least lamb-ish for speaker 20. Female speaker 18 shows a great deal of variation in average scores across impostor speakers, and impostor test segments of the same speaker. In contrast, female speaker 33 (and to a lesser extent speaker 20) shows very similar average scores across most of the impostor speakers.

Figure 3.6: Average scores for each impostor test segment, averaged over all target models of male speakers 1, 3, 8, and 15. Each color+symbol combination designates a particular (impostor) test speaker, whose corresponding speaker number is labeled on the abscissa. Each individual point within a color+symbol combination corresponds to a particular test utterance of that test speaker.

These plots clearly show different types of score distributions depending on the speaker. While some speakers produce low impostor scores and high target scores, making them less likely to cause errors given a threshold, other speakers have tendencies towards high impostor scores, or a wide range of target scores, making them more likely to produce false alarms or false rejections. Furthermore, differences have been observed not only at the speaker level, but also at the level of train and test conversation sides.

3.1.3 All Electret Trials

In order to see and analyze more speaker data, I extend the data to include all speakers, and all trials with conversation sides labeled as electret, using scores from the UBM-GMM system with T-norm. Again, the aim of such analysis is to gain better understanding of the different types of speaker behavior.

Some interesting scatter plots are shown below for female speakers. Figure 3.8 shows the average impostor score for each impostor speaker (averaged over all targets) versus the average impostor score for each target speaker (averaged over all impostors); these values have a correlation coefficient ρ = 0.598, implying that lambs (target speakers with high impostor scores) also have a tendency to be wolves (test speakers with high impostor scores). This correlation is reasonable since the same speaker pairs are used in the trials for calculating both impostor score averages; the only difference between the averages is whether the constant speaker is the target or the impostor. Furthermore, if false acceptance errors are caused by “average” speakers being confusable, then it makes sense that an “average” speaker would be confusable with other speakers, both as the target and as the test.
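The comparison of lamb-ish and wolf-ish behavior can be illustrated with a short sketch that computes, for each speaker, the average impostor score as the target and as the test speaker, and then the Pearson correlation between the two. The trial tuples, the function names, and the choice to pool all impostor trials of a speaker before averaging are assumptions made for illustration; the toy values are made up and are not meant to reproduce the reported coefficients.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def impostor_averages(trials):
    """Per-speaker mean impostor score as target ("lamb") and as test ("wolf").

    trials is an iterable of (target_speaker, test_speaker, score) tuples.
    """
    as_target, as_test = defaultdict(list), defaultdict(list)
    for target_spk, test_spk, score in trials:
        if target_spk != test_spk:  # impostor trials only
            as_target[target_spk].append(score)
            as_test[test_spk].append(score)
    speakers = sorted(set(as_target) & set(as_test))
    lamb = [sum(as_target[s]) / len(as_target[s]) for s in speakers]
    wolf = [sum(as_test[s]) / len(as_test[s]) for s in speakers]
    return speakers, lamb, wolf


# Toy trials for three hypothetical speakers.
trials = [
    ("A", "B", 1.0), ("A", "C", 3.0),
    ("B", "A", 2.0), ("B", "C", 4.0),
    ("C", "A", 0.0), ("C", "B", 2.0),
]
speakers, lamb, wolf = impostor_averages(trials)
rho = pearson(lamb, wolf)
print(speakers, lamb, wolf, rho)
```

Replacing the test-speaker averages with each speaker's average target score gives the second type of comparison, the one relating goat-ish and lamb-ish behavior.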

Figure 3.9 shows the average impostor score versus the average target score for speakers as the target speaker. With a correlation coefficient of ρ = −0.485, the implication is that goats (speakers with low target scores) may also tend to be lambs (target speakers with high impostor scores).

The same plots are shown for male speakers in Figures 3.10 and 3.11. In the first plot, showing average impostor score as the test versus average impostor score as the target, there is an even higher correlation of ρ = 0.682, again suggesting that lamb-ish and wolf-ish behavior are related. The second plot, showing average impostor score versus average target score as the target speaker, yields a weaker correlation in the male case, with ρ = −0.277, though it is still negative. There is therefore less evidence to suggest that male goats also tend to be lambs. It is possible that the correlations in these plots may be due to other factors, such as differing numbers of target and impostor trials per speaker, or poor audio quality for some conversation sides.

Figure 3.7: Average scores for each impostor test segment, averaged over all target models of female speakers 3, 5, 6, and 18. Each color+symbol combination designates a particular (impostor) test speaker, whose corresponding speaker number is labeled on the abscissa. Each individual point within a color+symbol combination corresponds to a particular test utterance of that test speaker.


