
# «A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Engineering - Electrical Engineering ...»


### 4.2 Results

System performance for the selected speaker pairs is reported using the minimum detection cost function (DCF) and the false alarm (FA) rate, since I am concerned with finding difficult-to-distinguish impostor pairs. The DCF is defined as a weighted sum of the miss (i.e., failing to identify a target speaker match) and false alarm (i.e., identifying an impostor speaker as the target speaker) error probabilities:

$$\mathrm{DCF} = C_{\mathrm{Miss}} \times P_{\mathrm{Miss}|\mathrm{Target}} \times P_{\mathrm{Target}} + C_{\mathrm{FalseAlarm}} \times P_{\mathrm{FalseAlarm}|\mathrm{NonTarget}} \times (1 - P_{\mathrm{Target}}) \tag{4.2}$$

In Equation (4.2), $C_{\mathrm{Miss}}$ and $C_{\mathrm{FalseAlarm}}$ are the relative costs of detection errors, and $P_{\mathrm{Target}}$ is the a priori probability of the specified target speaker. SRE08 used $C_{\mathrm{Miss}} = 10$, $C_{\mathrm{FalseAlarm}} = 1$, and $P_{\mathrm{Target}} = 0.01$.

For a given decision threshold, the FA rate is defined as:

$$P_{\mathrm{FalseAlarm}} = \frac{\text{number of false alarm errors}}{\text{total number of nontarget trials}} \tag{4.3}$$

For each speaker recognition system, I compute the percent difference in minimum DCF for the most (and least) similar speaker pairs relative to the minimum DCF over all speaker pairs. Similarly, I calculate the percent difference in FA rate for the most (and least) similar pairs, at the decision threshold yielding a 1% FA rate on all trials, relative to that 1% baseline. These relative differences are then averaged over all systems. For each feature-measure, if more similar (i.e., closer) speaker pairs correspond to difficult-to-distinguish speaker pairs, then the differences in DCF and FA rate should be positive and significant. The converse holds for less similar speaker pairs, which will show significant negative differences if they are easier for systems to distinguish.
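As a minimal sketch of Equations (4.2) and (4.3), the following Python computes the error rates at a threshold, the DCF, the minimum DCF, and the percent difference relative to a baseline. The function names, the convention that a trial is accepted when its score is at or above the threshold, and the search over thresholds placed at the observed scores are all assumptions made for illustration; they are not taken from the evaluation tools used in this work.

```python
def error_rates(target_scores, nontarget_scores, threshold):
    """Return (P_miss, P_fa), accepting trials whose score is >= threshold."""
    p_miss = sum(s < threshold for s in target_scores) / len(target_scores)
    # Equation (4.3): false alarms divided by the number of nontarget trials.
    p_fa = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
    return p_miss, p_fa


def dcf(p_miss, p_fa, c_miss=10.0, c_fa=1.0, p_target=0.01):
    """Equation (4.2); the defaults are the SRE08 parameters quoted above."""
    return c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)


def min_dcf(target_scores, nontarget_scores):
    """Minimum DCF over thresholds at every observed score, plus reject-all."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    thresholds.append(float("inf"))  # a threshold that rejects every trial
    return min(
        dcf(*error_rates(target_scores, nontarget_scores, t)) for t in thresholds
    )


def percent_difference(subset_value, overall_value):
    """Percent change of a subset's metric relative to the all-pairs metric."""
    return 100.0 * (subset_value - overall_value) / overall_value


# Made-up scores, for illustration only.
targets = [2.0, 1.0, 0.5]
nontargets = [0.0, 0.2, 0.6, 1.5]
print(error_rates(targets, nontargets, 1.0))
print(min_dcf(targets, nontargets))
```

With this sign convention, a positive percent difference for a subset of speaker pairs means the subset is harder for the system than the full set.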

Figures 4.1 and 4.2 show performance differences for the top 1% most and least similar speaker pairs, respectively. For each feature group, the feature-measure pair yielding the largest DCF and FA changes is presented. Similarly, Figures 4.3 and 4.4 show results when considering the top 5% most and least similar speaker pairs, respectively.

Features of every type can select speaker pairs such that the most similar pairs perform worse, and the least similar pairs perform better, than the full set of speaker pairs. Furthermore, this difference in performance typically increases when a smaller fraction of speaker pairs is used, i.e., there is a bigger difference for the most similar 1% of speaker pairs than for the most similar 5%. Note, however, that the differences in performance are not uniform across speaker verification systems.

The feature-measure that yields the largest average difference in performance for the 1% most similar speaker pairs is the Euclidean distance between vectors of the mean first, second, and third formant frequencies. The next best feature-measures include other formant-based measures, the percent difference of median energy, and the correlation of histograms of LPC frequencies with a minimum magnitude requirement. For the 1% least similar speaker pairs, results are fairly similar across feature-measures, with the correlation of LPC frequency histograms and spectral slope yielding the smallest differences. The Euclidean distance between vectors of the mean first, second, and third formant frequencies also appears to be the best feature-measure for finding the 5% most difficult-to-distinguish speaker pairs, with the percent difference of the sum of formants and the absolute difference in LTAS local peak height being the next best. As with the 1% least similar speaker pairs, the 5% least similar show very consistent results across feature-measures, with reduced effectiveness for the correlation of LPC frequency histograms and spectral slope.

## CHAPTER 4. PREDICTING DIFFICULT-TO-DISTINGUISH SPEAKER PAIRS 61

Figure 4.4: Relative differences in DCF and FA rate for the least similar 5% of speaker pairs, compared to all speaker pairs.

Detection error tradeoff (DET) curves are shown for example systems in Figures 4.5 and 4.6, with speaker pairs selected using the Euclidean distance between vectors of the mean first, second, and third formant frequencies and the percent difference of median energy, respectively. Although the system in Figure 4.6 has good separation among the different DET curves, there is more overlap in the DET curves of Figure 4.5. Furthermore, Figure 4.5 reveals an asymmetry in behavior between dissimilar and similar speaker pairs: performance on difficult-to-distinguish speaker pairs is closer to performance on all speaker pairs than performance on easy-to-distinguish pairs is. While this asymmetry does not exist for all systems and all sets of selected speaker pairs (as evidenced by Figure 4.6), the trend does hold in most cases.

Given that I am using at most a few coarsely calculated features, the differences in performance obtained by using these measures to select easy- or difficult-to-distinguish speaker pairs are impressive. Much of this success comes from the information gained by the relative ranking of speaker pairs. As a single, standalone number, a feature-measure may not have much use. However, when taken in the context of a group of feature-measure values corresponding to a set of speaker pairs, the absolute values no longer matter; instead, the gain lies in being able to order the speaker pairs from least to most similar.
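The ranking idea can be illustrated with a short sketch that orders speaker pairs by the Euclidean distance between their mean formant vectors and then takes a fraction from each end of the ordering. The speaker labels and formant values are made up, the helper names are hypothetical, and the sketch pairs every speaker with every other one rather than restricting to same-sex pairs.

```python
import math


def rank_pairs_by_distance(mean_formants):
    """Return (distance, speaker_a, speaker_b) tuples, most similar first.

    mean_formants maps a speaker label to a (mean F1, mean F2, mean F3) tuple in Hz.
    """
    speakers = sorted(mean_formants)
    ranked = []
    for i, a in enumerate(speakers):
        for b in speakers[i + 1:]:
            distance = math.dist(mean_formants[a], mean_formants[b])
            ranked.append((distance, a, b))
    ranked.sort()
    return ranked


def select_extremes(ranked, fraction):
    """Return the most similar and least similar `fraction` of the ranked pairs."""
    n = max(1, round(fraction * len(ranked)))  # always keep at least one pair
    return ranked[:n], ranked[-n:]


# Hypothetical speakers and formant means, for illustration only.
formants = {
    "spk_a": (500.0, 1500.0, 2500.0),
    "spk_b": (503.0, 1504.0, 2500.0),
    "spk_c": (600.0, 1700.0, 2700.0),
}
most_similar, least_similar = select_extremes(rank_pairs_by_distance(formants), 0.01)
print(most_similar, least_similar)
```

Only the order of the distances is used for selection, which matches the observation that the absolute values matter less than the ranking.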

While the results presented thus far are promising, the differences in performance for similar speaker pairs (relative to all speaker pairs) could potentially be increased further. Accordingly, I test a measure that utilizes Gaussian mixture models (GMMs), with the motivation that GMMs may better predict speaker recognition system performance, given that many systems utilize cepstral feature-trained GMMs. Using SRI's tools for training GMMs for speaker recognition [37], I trained speaker-specific GMMs via maximum a posteriori (MAP) adaptation from a universal background model trained on Fisher data. The input features were 12th-order MFCCs plus energy, with deltas and double-deltas, and the models used 1024 Gaussians. For each unique pair of speaker-specific GMMs, an approximation to the Kullback-Leibler (KL) divergence (based on the unscented transform [26]) was used to measure similarity. Results are shown in Figure 4.7.
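The exact form of the approximation is not given here, so the following is only a sketch of one common unscented-transform approximation to the KL divergence between two GMMs, under the assumption of diagonal covariances. Each component of the first mixture contributes 2d sigma points (its mean shifted by plus or minus sqrt(d times the variance) along each axis), and the log-density ratio is averaged over those points and weighted by the component weight. The data layout (a list of `(weight, mean, variance)` tuples) and the function names are hypothetical. The quantity is directed, KL(f || g); whether a symmetrized version was used in this work is not stated.

```python
import math


def gmm_logpdf(x, gmm):
    """Log-density at point x of a diagonal-covariance GMM.

    gmm is a list of (weight, mean, variance) tuples, where mean and variance
    are sequences with the same length as x.
    """
    logs = []
    for weight, mean, var in gmm:
        quad = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))
        log_norm = sum(math.log(2.0 * math.pi * vi) for vi in var)
        logs.append(math.log(weight) - 0.5 * (quad + log_norm))
    top = max(logs)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(v - top) for v in logs))


def unscented_kl(f, g):
    """Approximate KL(f || g) with 2d sigma points per component of f."""
    total = 0.0
    for weight, mean, var in f:
        d = len(mean)
        acc = 0.0
        for i in range(d):
            step = math.sqrt(d * var[i])
            for sign in (1.0, -1.0):
                point = list(mean)
                point[i] += sign * step  # sigma point along axis i
                acc += gmm_logpdf(point, f) - gmm_logpdf(point, g)
        total += weight * acc / (2 * d)
    return total


# Toy single-component mixtures (made-up values).
f = [(1.0, [0.0, 0.0, 0.0], [1.0, 1.0, 1.0])]
g = [(1.0, [1.0, 2.0, 0.0], [1.0, 1.0, 1.0])]
print(unscented_kl(f, g))
```

For two single Gaussians the log-density ratio is quadratic, so the sigma points reproduce the closed-form KL divergence exactly; for real mixtures the result is only an approximation.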

Compared to previous feature-measures, the KL divergence is indeed more effective at finding difficult- and easy-to-distinguish speaker pairs. DET curves for an example system are shown in Figure 4.8. Again, relative to performance on all speaker pairs, there is a larger performance gap for dissimilar speaker pairs than for similar speaker pairs.

Figure 4.5: DET curves for an illustrative speaker recognition system, using the Euclidean distance between vectors of the mean first, second, and third formant frequencies for speaker pair selection.

Figure 4.6: DET curves for an illustrative speaker recognition system, using the percent difference of median energy for speaker pair selection.

Figure 4.8: DET curves for an illustrative speaker recognition system, using the approximated KL divergence between speaker-specific GMMs to select speaker pairs.

Returning to the groups of speaker pairs selected by the KL divergence approximation for GMMs, I more closely examine the 1%, 3%, 5%, 10%, and 20% most and least similar speaker pairs. Overall, there are 150 speakers (87 female and 63 male), for which there are 1815 same-sex impostor speaker pairs with impostor trials in the SRE08 short2-short3 task. For the groups of speaker pairs with larger values of KL divergence, that is, those expected to be easier for systems to distinguish, the majority are male (close to 75% on average). The opposite tendency, with more similar pairs tending to be female, holds to a lesser extent, although the groups with the lowest 1% and 3% of KL divergence values still have more male speaker pairs. These results suggest that there is a greater range of differences among male speakers, so that there are likely to be more dissimilar male speaker pairs.

Furthermore, examining the number of times a particular speaker appears in a group of similar or dissimilar speaker pairs, I note that there tend to be two types of speakers: those who appear frequently as members of difficult-to-distinguish speaker pairs, and those who appear frequently as members of easy-to-distinguish speaker pairs. In fact, there are 15 speakers (1 male, 14 female) who never appear in the most-dissimilar groups, and 24 speakers (10 male, 14 female) who never appear in the most-similar groups. Such a result is consistent with the existence of wolves and lambs, that is, speakers with a tendency to cause false alarm errors.
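A count of this kind can be sketched as follows, assuming each selected group is available as a list of speaker-label pairs. The labels and helper names are hypothetical.

```python
from collections import Counter


def appearance_counts(pairs):
    """Count how often each speaker occurs across a list of (a, b) pairs."""
    return Counter(speaker for pair in pairs for speaker in pair)


def never_appearing(all_speakers, pairs):
    """Speakers that are absent from every pair in the group."""
    seen = appearance_counts(pairs)
    return sorted(s for s in all_speakers if s not in seen)


# Made-up group of selected pairs, for illustration only.
group = [("s1", "s2"), ("s1", "s3"), ("s2", "s1")]
print(appearance_counts(group).most_common())
print(never_appearing(["s1", "s2", "s3", "s4"], group))
```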

### 4.3 Discussion

In summary, the results of this investigation demonstrate that it is possible to predict which speaker pairs will be difficult for a typical speaker recognition system to distinguish. Both difficult- and easy-to-distinguish speaker pairs can be selected using a measure of similarity calculated from features like pitch, energy, or spectral slope. For the features considered here, using the Euclidean distance between vectors of the mean first, second, and third formant frequencies produces the largest difference in performance for similar and dissimilar speaker pairs. An even more successful measure is the KL divergence calculated between speaker-specific GMMs. Overall, the degree of success is higher for selecting dissimilar speaker pairs than it is for selecting similar speaker pairs, possibly because similarity in a single characteristic is not necessarily sufficient to identify a difficult-to-distinguish speaker pair. Although the feature-measures cannot match the effectiveness of selecting difficult-to-distinguish speaker pairs directly from the results of a given system, they still provide potentially useful information about speakers. In particular, one may be able to determine an overall tendency of a speaker to be similar or dissimilar to other speakers. Additionally, being able to rank a set of speaker pairs can be quite informative.

In the next chapter, I build upon this approach by using a set of feature statistics in order to detect difficult speakers. I consider the task of finding difficult target speakers, who are prone to causing false rejection errors, separately from the task of finding difficult impostor speakers, who are prone to causing false alarms. Specifically, I train support vector machine (SVM) classifiers using examples of the most and least difficult target and impostor speakers.

## Chapter 5. Detecting Difficult Speakers

It has been observed that simple feature statistics can be used to provide measures of similarity between speakers. Up to this point, I have used these feature statistics individually. Now, I investigate one method for using them jointly in order to predict whether a speaker will be difficult, either as a true speaker or as an impostor speaker. In particular, I train a support vector machine (SVM) to distinguish between examples of the speakers who cause the most and the fewest errors, corresponding to the most and least difficult speakers, respectively. Since speaker behavior is different for target and impostor speakers, I train separate SVMs for detecting difficult true speakers (who will cause false rejections) and difficult impostor speakers (who will cause false alarms).

I begin by discussing the data set that will be used for these experiments in Section 5.1. Section 5.2 describes the selection of feature statistics used as input to the SVMs. Details of SVM training are covered in Section 5.3, including the method for determining the difficult and easy speakers to use for training. The results of experiments are given in Section 5.4, and Section 5.5 concludes with a discussion of lessons learned.

### 5.1 Data Set for SVM Experiments

For this approach, I need to find speakers who cause very many or very few errors (of either the false rejection or false alarm type). Accordingly, these speakers need to have enough true speaker and impostor trials available for me to make a reliable decision about these error tendencies. This is especially an issue for the true speaker errors, given the limited number of target trials that are available.

In order to maximize the number of true speaker trials, as well as have a reasonable number of impostor trials, I use the same set of SRE08 data that I used for the analysis of Section 3.2. In particular, I take selected conversation sides from the SRE08 short2 and short3 train and test conditions, which correspond to roughly 2.5-3 minutes of speech per sample. I choose conversation sides from all speakers with at least 5 available speech utterances. Some conversation sides were recorded on multiple channels (telephone and microphones, or just microphones). In these cases, I select only one instance of that conversation side, in order to prevent the introduction of confounding factors due to having the same lexical content across different speech samples. There are 416 speakers (256 female, 160 male), with 3049 conversation sides, and a total of 22,210 target trials. For each impostor speaker pair, five impostor trials are chosen (along with the corresponding trials that have the train and test data switched), for a total of 453,600 impostor trials.
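The stated impostor trial total can be reproduced with a short calculation, under the assumption (not stated explicitly in this passage) that impostor speaker pairs are unordered same-sex pairs. The variable names are illustrative.

```python
from math import comb

n_female, n_male = 256, 160  # speaker counts given in the text
trials_per_pair = 5          # impostor trials chosen per speaker pair

# Assumption: impostor pairs are unordered pairs of same-sex speakers.
same_sex_pairs = comb(n_female, 2) + comb(n_male, 2)

# Each chosen trial is also used with train and test data switched.
impostor_trials = same_sex_pairs * trials_per_pair * 2

print(same_sex_pairs)   # 45360
print(impostor_trials)  # 453600
```

Under this assumption there are 32,640 female pairs and 12,720 male pairs, and ten trials per pair gives the 453,600 impostor trials reported above.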
