
«A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Engineering - Electrical Engineering ...»


Table 5.5 shows the tradeoff in recall, precision, specificity, and F-measure values observed when varying the threshold for making a difficult speaker decision, again considering threshold values of -0.5, 0 (corresponding to the results of Table 5.4), and 0.5. These results are given for the SVM using a third order polynomial kernel, with rank normalized feature statistics.

Threshold   Recall   Precision   Specificity   F-measure
-0.5        0.895    0.653       0.415         0.755
 0          0.736    0.737       0.677         0.737
 0.5        0.561    0.848       0.877         0.675

Table 5.5: Recall, precision, specificity, and F-measure values for detecting difficult target speakers using a third order polynomial kernel SVM trained with rank normalized feature statistics, comparing three different decision thresholds for difficult target speaker detection.

In the difficult target speaker case, the drop in specificity (for a high recall threshold) and the drop in recall (for a high precision threshold) are larger than in the difficult impostor speaker case. In order to obtain close to 90% recall, the false alarm rate becomes almost 60%. Again considering the operating point for low false alarms, with 5% of the difficult speaker labels being incorrect (a threshold around 0.95), the average recall is 0.374, and the average precision is 0.922. Thus, in order to avoid incorrectly labeling difficult target speakers, almost two-thirds of the difficult target speakers will not be found. Such a low recall rate may not be sufficient in many applications. Given the difficult nature of the task, it nevertheless provides an initial starting point that may be improved upon in the future.
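The threshold tradeoff discussed above can be illustrated with a minimal sketch. The function name `detection_metrics`, the toy scores, and the toy labels are hypothetical and are not taken from the dissertation's data; the sketch only assumes that a speaker is labeled difficult when the classifier output exceeds the threshold, and that the F-measure is the harmonic mean of precision and recall (which agrees with the rows of Table 5.5).

```python
def detection_metrics(scores, labels, threshold):
    """Return (recall, precision, specificity, f_measure) for one threshold.

    scores: classifier outputs, where a higher score means "more likely difficult".
    labels: True for difficult speakers, False for easy speakers.
    """
    tp = fp = tn = fn = 0
    for score, is_difficult in zip(scores, labels):
        predicted_difficult = score > threshold
        if predicted_difficult and is_difficult:
            tp += 1
        elif predicted_difficult:
            fp += 1
        elif is_difficult:
            fn += 1
        else:
            tn += 1
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return recall, precision, specificity, f_measure


# Hypothetical toy data, for illustration only.
toy_scores = [-1.2, -0.7, -0.3, -0.1, 0.2, 0.4, 0.6, 0.9, 1.3, -0.4]
toy_labels = [False, False, False, True, False, True, True, True, True, True]

for threshold in (-0.5, 0.0, 0.5):
    r, p, s, f = detection_metrics(toy_scores, toy_labels, threshold)
    print(f"threshold={threshold:+.1f} recall={r:.3f} precision={p:.3f} "
          f"specificity={s:.3f} F={f:.3f}")
```

On the toy data, lowering the threshold raises recall while lowering specificity, and raising it does the opposite, which is the same qualitative pattern as in Table 5.5. The false alarm rate quoted in the text is one minus the specificity.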

Next, Table 5.6 shows results for an SVM using a third order polynomial kernel and rank normalized input features, for the three sets of speech feature statistics.

Feature set   Recall   Precision   Specificity   F-measure
speech1       0.736    0.737       0.677         0.737
speech2       0.747    0.749       0.694         0.748
speech3       0.753    0.746       0.686         0.749

Table 5.6: Recall, precision, specificity, and F-measure values for detecting difficult target speakers using a third order polynomial kernel SVM trained with rank normalized feature statistics, comparing three sets of speech feature statistics, [speech1], [speech2], and [speech3].

As with the difficult impostor speaker detection task, adding feature statistics (mean and variance normalized MFCCs or formant frequencies g1-g4) does not change results by much, though there are some small improvements.
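The rank normalization applied to the input feature statistics is not defined in this part of the text. As a hedged illustration, one common form replaces each value by the fraction of values in a background set that do not exceed it, which maps every feature statistic into the range 0 to 1. The function name `rank_normalize` and the background values below are hypothetical.

```python
from bisect import bisect_right


def rank_normalize(value, background_sorted):
    """Map a feature statistic to its normalized rank in a background set.

    background_sorted: ascending list of the same statistic computed on
    background data (an assumption of this sketch, not a detail from the text).
    Returns the fraction of background values less than or equal to value.
    """
    return bisect_right(background_sorted, value) / len(background_sorted)


# Hypothetical background values of one statistic (for example, mean F0 in Hz).
background = sorted([110.0, 125.0, 140.0, 180.0, 210.0])

print(rank_normalize(150.0, background))  # three of the five values are <= 150
```

Because the output depends only on ordering, this kind of normalization puts statistics with very different scales (energy, formant frequencies, MFCCs) on a common footing before they reach the SVM.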

CHAPTER 5. DETECTING DIFFICULT SPEAKERS 79

To this point, my approach has treated male and female speakers together. However, male and female speakers may behave differently. In order to see if difficult target speaker detection improves when females and males are treated as two different cases, female- and male-specific SVMs are trained. One disadvantage of this approach is that there are fewer easy and difficult speakers to use for training. I consider two sets of SVMs, one trained using the 20% most and least difficult speakers (female or male), and one trained using the 25% most and least difficult speakers. Table 5.7 shows recall, precision, specificity, and F-measure values for male and female difficult target speaker detection. In each case, the number of easy and difficult speaker examples is given (note that there are 256 female speakers and 160 male speakers total). These results are all for SVMs using third order polynomial kernels and rank normalized input feature statistics.


Table 5.7: Recall, precision, specificity, and F-measure values for detecting difficult target speakers using a third order polynomial kernel SVM trained with rank normalized feature statistics, using SVMs trained separately for female and male speakers, with either 20% or around 25% of speakers taken as difficult or easy examples.

In both female and male cases, the results do not improve over treating both sexes together. Recall increases slightly, at the cost of lower precision and specificity. Furthermore, note that increasing the number of speakers used as difficult and easy examples does not improve results. Including more speakers also means that the speakers used for training are not necessarily the best examples of difficult (or easy) ones, which potentially counteracts any gain from having more training examples. Training separate female and male SVMs for finding difficult impostor speakers gave results similar to those observed here: there were no gains over using a sex-independent SVM, and increasing the number of training examples to 25% also failed to improve results compared to using the top and bottom 20%. Given more female and male speakers for training, an approach using separate female and male SVMs may yield improvements. However, for the data available here, it is better to maximize the training examples and use the same SVM to detect difficult female and male speakers.
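The selection of training examples described above (the top and bottom 20% or 25% of speakers) can be sketched as follows. This is a minimal illustration that assumes each speaker already has a single numeric difficulty value; the function name `select_training_speakers`, the speaker identifiers, and the difficulty values are hypothetical.

```python
def select_training_speakers(difficulty, fraction=0.2):
    """Pick the least and most difficult speakers as training examples.

    difficulty: dict mapping speaker id to a difficulty value, where a larger
    value means the speaker causes more errors (an assumption of this sketch).
    fraction: share of speakers taken from each end of the ranking.
    Returns (easy_speakers, difficult_speakers).
    """
    if not 0 < fraction <= 0.5:
        raise ValueError("fraction must be in (0, 0.5] so the two sets are disjoint")
    ranked = sorted(difficulty, key=difficulty.get)  # easiest first
    n = max(1, int(len(ranked) * fraction))
    return ranked[:n], ranked[-n:]


# Hypothetical difficulty values for 20 speakers.
toy_difficulty = {f"spk{i:02d}": i / 20 for i in range(20)}

easy_20, difficult_20 = select_training_speakers(toy_difficulty, 0.2)
easy_25, difficult_25 = select_training_speakers(toy_difficulty, 0.25)
print(len(easy_20), len(easy_25))
```

Raising the fraction adds training examples, but the added speakers sit closer to the middle of the ranking, which is the dilution effect the text points to when the 25% condition fails to improve on the 20% condition.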

[...]

and difficult speakers, for both target (true) speakers and impostor speakers. As input, I used a set of feature statistics calculated over speech regions, where the features include fundamental frequency, formant frequencies, energy, spectral slope, and MFCCs (both with and without mean and variance normalization).

Based on the results for the data set used here, this approach is more successful at finding difficult impostor speakers than difficult target speakers. One reason why finding difficult target speakers is more challenging than finding difficult impostors is that while there may be similar characteristics across difficult impostor speakers (which make them confusable with other speakers), the characteristics that make target speakers difficult may vary more from speaker to speaker. In both cases, however, recall and precision rates over 0.7 (or 0.8 in the case of difficult impostors) can be obtained. Furthermore, the threshold for picking a difficult speaker can be varied according to what errors are most important to minimize.





For a false alarm rate of 5%, over 60% of difficult impostor speakers will still be found, and 37% of difficult target speakers. Given the challenging nature of the task, these recall rates are not particularly high (especially in the case of target speakers). However, for certain applications, the loss in recall may still be worth the gain in precision and specificity. Given enough training examples of difficult and easy speakers, there may be gains from treating female and male speakers separately. With limited data, though, better results are obtained by using the combined set of training examples in one sex-independent SVM.

One advantage of using feature statistics as the input to the SVM is that the statistics can be calculated over an individual conversation side or a set of conversation sides for the given speaker. This allows difficult speaker detection to work for varying amounts of available data. In my approach here, each conversation side of the easy and difficult speakers is used separately, with no exploitation of having more than one conversation side per speaker.
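The flexibility described above can be illustrated with a short sketch in which the same statistics are computed either over one conversation side or over all sides pooled for a speaker. The choice of mean and variance, the function names, and the frame values are assumptions made for illustration; the text here does not list the exact statistics used.

```python
from statistics import mean, pvariance


def side_statistics(frames):
    """Statistics of one feature over the speech frames of one conversation side."""
    return {"mean": mean(frames), "var": pvariance(frames)}


def speaker_statistics(sides):
    """Same statistics, pooled over all conversation sides of one speaker."""
    pooled = [value for side in sides for value in side]
    return side_statistics(pooled)


# Hypothetical per-frame values of one feature for two conversation sides.
side_a = [1.0, 2.0, 3.0]
side_b = [5.0]

print(side_statistics(side_a))
print(speaker_statistics([side_a, side_b]))
```

Either output has the same shape, so the same classifier input format applies whether one or many conversation sides are available for a speaker.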

One avenue for future exploration is to see how results change depending on the number of utterances used for each speaker. It may also be possible to find better feature statistics for detecting difficult speakers; the optimal feature statistics may be different for difficult target and impostor speakers, as well as for female and male speakers.

Another possible direction for future investigation is to see how well difficult conversation sides can be detected. The results of my error analysis, as well as the related work of Kahn et al. [34, 33], have shown that there can be particular conversation sides of a speaker that cause more errors than others. Being able to detect these “bad” utterances may provide very useful information for improving system performance.

Chapter 6 Conclusions and Future Work

The focus of this dissertation was on the intrinsic, speaker-based factors that contribute to errors in automatic speaker recognition systems. Inspired by the well-known work of Doddington et al. [22], which both categorized speakers according to their tendencies to cause errors and demonstrated the existence of such speaker types, I aimed to further explore the phenomenon of speaker-dependent system performance. In particular, there are two main components of this exploration, which are reviewed in the following sections. Section 6.1 describes the analysis of speaker behavior for two data sets and two types of automatic speaker recognition systems, with which I both confirm and build upon previous results demonstrating that system performance depends on speaker characteristics. Having established that certain speakers are more likely to cause errors than others, I then discuss a simple approach for finding these difficult speakers in Section 6.2. Section 6.3 concludes with a discussion of contributions and possible future work.

6.1 Analysis of Speaker Behavior

The aforementioned work of Doddington et al. analyzed errors only for female speakers, using data from the NIST 1998 Speaker Recognition Evaluation. In order to expand such analysis, I examined two data sets and two types of automatic speaker recognition systems, looking for speaker-dependent behaviors for both male and female speakers. The first data set was Switchboard-1, a corpus of conversational speech collected over the telephone. I further restricted this data to one type of telephone handset in order to limit the effects of extrinsic channel variability. Using scores from a GMM-UBM system, I began by considering a score confusion matrix for a set of 34 speakers with 10 conversation sides each. It was observed that the speakers varied both in how high their average true speaker scores were and in how consistent the true speaker scores were across target-test pairs. There was also variability in how different target models of the same speaker behaved; for some speakers, scores were consistent across all models, while for others there was greater score variation.


Some impostor speaker pairs were more confusable than others, and some speakers had overall tendencies to have higher impostor scores.

Extending this analysis to include a large number of trials and speakers in Switchboard-1, I continued to show examples of varying speaker behavior, in terms of tendencies to have high or low target or impostor scores. For both female and male speakers, there was a correlation (around 0.6) between a tendency to cause high impostor scores as the target speaker and a tendency to cause high impostor scores as the test speaker.
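As an illustration of the kind of correlation reported above, the sketch below computes a Pearson correlation between two per-speaker quantities: the average impostor score when the speaker is the target, and the average impostor score when the speaker is the test speaker. The text does not state which correlation coefficient was used, so Pearson is an assumption here, and the function name and values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical per-speaker average impostor scores (one entry per speaker).
as_target = [0.10, 0.35, 0.20, 0.50, 0.05]
as_test = [0.15, 0.25, 0.30, 0.45, 0.10]

print(f"correlation = {pearson(as_target, as_test):.2f}")
```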

For the Switchboard-1 data, I also investigated the possible effects of speaker sex, age, education level, and dialect area on system scores. Using analysis of variance (ANOVA) tests, I found significant differences between male and female score distributions. Significant differences were also found for score distributions with impostor speakers who have less than a five year age difference compared to impostor speakers with more than a five year age difference. The results for education level and dialect area were inconclusive. Based on such findings, I concluded that the most salient of these speaker demographics was sex, a result in line with other observations regarding differences in speaker recognition behavior between males and females.
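The ANOVA design is not detailed in this summary. As a hedged sketch, a one-way ANOVA compares the variance between group means with the variance within groups; the F statistic below is the standard ratio of the two mean squares. The function name and the toy score groups are hypothetical, and the p-value (which requires the F distribution) is omitted because the sketch uses only the Python standard library.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of score groups.

    F = (between-group sum of squares / (k - 1))
        / (within-group sum of squares / (N - k)),
    where k is the number of groups and N the total number of scores.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = 0.0
    ss_within = 0.0
    for g in groups:
        group_mean = sum(g) / len(g)
        ss_between += len(g) * (group_mean - grand_mean) ** 2
        ss_within += sum((v - group_mean) ** 2 for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Hypothetical scores for two groups of trials (for example, split by speaker sex).
group_one = [1.0, 2.0, 3.0]
group_two = [2.0, 3.0, 4.0]

print(one_way_anova_f([group_one, group_two]))
```

A larger F value indicates that the group means differ by more than the within-group spread would suggest; significance is then judged against the F distribution with (k - 1, N - k) degrees of freedom.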

For the second data set, I used a more recent collection of conversational and interview speech used in the 2008 NIST Speaker Recognition Evaluation (SRE08); this data contains much more channel variability, including not only landline and cellular telephone data, but also data from a variety of microphones. For this corpus, I used a GMM-UBM system with simplified factor analysis, in order to better handle the differences in channel. Once again, a variety of speaker-dependent system performance was observed, including tendencies to cause false alarm or false rejection errors. For both female and male speakers, 50% of the false rejection and false alarm errors were caused by only 15-25% of the speakers.
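The concentration of errors among a small share of speakers can be measured with a simple cumulative count. The sketch below is an illustration only: the function name and the per-speaker error counts are hypothetical, and it assumes that each error is attributed to exactly one speaker.

```python
def speaker_share_for_errors(errors_per_speaker, error_share=0.5):
    """Smallest fraction of speakers that accounts for error_share of all errors.

    errors_per_speaker: dict mapping speaker id to that speaker's error count.
    Speakers are taken in order of decreasing error count.
    """
    counts = sorted(errors_per_speaker.values(), reverse=True)
    total = sum(counts)
    if total == 0:
        return 0.0
    running = 0
    for n_speakers, count in enumerate(counts, start=1):
        running += count
        if running >= error_share * total:
            return n_speakers / len(counts)
    return 1.0


# Hypothetical false alarm counts for ten speakers.
toy_errors = {"a": 10, "b": 6, "c": 2, "d": 1, "e": 1,
              "f": 0, "g": 0, "h": 0, "i": 0, "j": 0}

print(speaker_share_for_errors(toy_errors))
```

In the toy data a single speaker out of ten accounts for half of the errors; the text reports a corresponding share of 15-25% of speakers for the SRE08 data.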

6.2 Difficult Speaker Detection

My approach for finding difficult speakers began with a method for calculating measures of similarity between impostor speaker pairs. Using statistics of features such as energy, formant frequencies, fundamental frequency, and spectral slope, calculated over all speech, I obtained a variety of simple distance measures that could successfully select both easy- and difficult-to-distinguish speaker pairs, as evaluated by differences in detection cost and false alarm probability across a large number of systems. Of the measures tested, the best feature-based measure for finding the most and least difficult-to-distinguish speaker pairs was the Euclidean distance between vectors of the mean first, second, and third formant frequencies. Even greater success was attained by the Kullback-Leibler (KL) divergence between pairs of speaker-specific GMMs. Furthermore, an examination of the smallest and biggest distances (as computed by the KL divergence) revealed individual speaker tendencies to consistently fall among the most (or least) difficult-to-distinguish speaker pairs.
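The Euclidean distance between formant mean vectors can be sketched as follows. The speaker identifiers and formant values are hypothetical, and the sketch assumes that a smaller distance marks a pair as harder to distinguish.

```python
from itertools import combinations
from math import dist


def rank_speaker_pairs(formant_means):
    """Rank speaker pairs by Euclidean distance between mean (F1, F2, F3) vectors.

    formant_means: dict mapping speaker id to a tuple of mean formant
    frequencies in Hz. Returns (distance, speaker_a, speaker_b) tuples,
    closest pair first.
    """
    pairs = [
        (dist(formant_means[a], formant_means[b]), a, b)
        for a, b in combinations(sorted(formant_means), 2)
    ]
    return sorted(pairs)


# Hypothetical mean F1, F2, F3 values in Hz for three speakers.
toy_means = {
    "spkA": (500.0, 1500.0, 2500.0),
    "spkB": (503.0, 1504.0, 2500.0),
    "spkC": (600.0, 1700.0, 2700.0),
}

for distance, a, b in rank_speaker_pairs(toy_means):
    print(f"{a}-{b}: {distance:.1f} Hz")
```

Under the stated assumption, the pairs at the top of the ranking would be the candidates for difficult-to-distinguish impostor pairs, and those at the bottom the candidates for easy ones.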

I then used a set of feature statistics calculated over speech regions to train a support
