Finding Difficult Speakers in Automatic Speaker Recognition

by

Lara Lynn Stoll

A dissertation submitted in partial satisfaction of the

requirements for the degree of

Doctor of Philosophy

in

Engineering - Electrical Engineering and Computer Sciences

in the

Graduate Division

of the

University of California, Berkeley

Committee in charge:

Professor Nelson Morgan, Co-chair

Dr. N. Nikki Mirghafori, Co-chair

Professor Michael Jordan

Professor John J. Ohala

Fall 2011

Finding Difficult Speakers in Automatic Speaker Recognition

Copyright 2011

by

Lara Lynn Stoll

Abstract

Finding Difficult Speakers in Automatic Speaker Recognition

by

Lara Lynn Stoll

Doctor of Philosophy in Engineering - Electrical Engineering and Computer Sciences

University of California, Berkeley

Professor Nelson Morgan, Co-chair

Dr. N. Nikki Mirghafori, Co-chair

The task of automatic speaker recognition, wherein a system verifies or determines a speaker’s identity using a sample of speech, has been studied for a few decades. In that time, a great deal of progress has been made in improving the accuracy of the system’s decisions, through the use of more successful machine learning algorithms, and the application of channel compensation techniques and other methodologies aimed at addressing sources of errors such as noise or data mismatch. In general, errors can be expected to have one or more causes, involving both intrinsic and extrinsic factors. Extrinsic factors correspond to external influences, including reverberation, noise, and channel or microphone effects. Intrinsic factors relate inherently to the speaker himself, and include sex, age, dialect, accent, emotion, speaking style, and other voice characteristics. This dissertation focuses on the relatively unexplored issue of dependence of system errors on intrinsic speaker characteristics. In particular, I investigate the phenomenon that some speakers within a given population have a tendency to cause a large proportion of errors, and explore ways of finding such speakers.

There are two main components to this thesis. First, I establish the dependence of system performance on speaker characteristics, building upon and expanding previous work demonstrating the existence of speakers with tendencies to cause false alarm or false rejection errors. To this end, I explore two different data sets: one that is an older collection of telephone channel conversational speech, and one that is a more recent collection of conversational speech recorded on a variety of channels, including the telephone, as well as various types of microphones. Furthermore, in addition to considering a traditional speaker recognition system approach, for the second data set I utilize the outputs of a more contemporary approach that is better able to handle variations in channel. The results of such analysis repeatedly show variations in behavior across speakers, both for true speaker and impostor speaker cases. Variation occurs both at the utterance level, wherein a given speaker’s performance can depend on which of his speech utterances is used, and at the speaker level, wherein some speakers have overall tendencies to cause false rejection or false alarm errors. Additionally, lamb-ish speaker behavior (where the speaker tends to produce false alarms as the target) is correlated with wolf-ish behavior (where the speaker tends to produce false alarms as the impostor). On the more recent data set, 50% of the false rejection and false alarm errors are caused by only 15-25% of the speakers.
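As an illustration of how a statement such as "50% of the errors are caused by only 15-25% of the speakers" can be checked, the following is a minimal sketch rather than the analysis code used in this work. The function name, the speaker labels, and the error counts are all hypothetical; the only assumption is that a per-speaker error count is available.

```python
def speaker_share_for_error_fraction(errors_per_speaker, fraction=0.5):
    """Return the smallest share of speakers (0..1) that accounts for at
    least `fraction` of all errors, counting the most error-prone first."""
    counts = sorted(errors_per_speaker.values(), reverse=True)
    total = sum(counts)
    if total == 0:
        return 0.0  # no errors at all, so no speakers are needed
    running = 0
    for n_speakers, count in enumerate(counts, start=1):
        running += count
        if running >= fraction * total:
            return n_speakers / len(counts)
    return 1.0


# Hypothetical error counts for ten speakers (not data from the thesis).
toy_errors = {
    "spk01": 12, "spk02": 9, "spk03": 3, "spk04": 2, "spk05": 2,
    "spk06": 1, "spk07": 1, "spk08": 0, "spk09": 0, "spk10": 0,
}
share = speaker_share_for_error_fraction(toy_errors, fraction=0.5)
print(f"{share:.0%} of speakers account for half of the errors")
```

Sorting speakers by error count before accumulating is what turns the raw counts into a cumulative distribution of errors across speakers.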

The second component of this thesis investigates a straightforward approach to predict speakers that will be difficult for a system to correctly recognize. I use a variety of features to calculate feature statistics that are then used to compute a measure of similarity between speaker pairs. By ranking these similarity measures for a set of impostor speaker pairs, I determine those speaker pairs that are easy for a system to distinguish and those that are difficult to distinguish. A variety of these simple distance measures could successfully select both easy- and difficult-to-distinguish speaker pairs, as evaluated by differences in detection cost and false alarm probability across a large number of systems. Of the performance measures tested, the best feature-measure at finding the most and least difficult-to-distinguish speaker pairs was the Euclidean distance between vectors of the mean first, second, and third formant frequencies. Even greater success was attained by the Kullback-Leibler (KL) divergence between pairs of speaker-specific GMMs. Furthermore, an examination of the smallest and biggest distances (as computed by the KL divergence) revealed individual speaker tendencies to consistently fall among the most (or least) difficult-to-distinguish speaker pairs.
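The pair-ranking idea can be sketched for the simplest measure mentioned above, the Euclidean distance between vectors of mean first, second, and third formant frequencies. This is an illustrative sketch only: the speaker labels and formant values are invented, and it assumes that a smaller distance marks a pair as more similar and therefore harder to distinguish. The KL divergence between speaker-specific GMMs is not covered here.

```python
import math
from itertools import combinations


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def rank_speaker_pairs(mean_formants):
    """Return (distance, speaker_a, speaker_b) tuples sorted from the
    smallest distance (most similar pair) to the largest."""
    pairs = [
        (euclidean_distance(mean_formants[a], mean_formants[b]), a, b)
        for a, b in combinations(sorted(mean_formants), 2)
    ]
    return sorted(pairs)


# Hypothetical mean (F1, F2, F3) values in Hz for three speakers.
toy_formants = {
    "spkA": (500.0, 1500.0, 2500.0),
    "spkB": (510.0, 1490.0, 2520.0),
    "spkC": (650.0, 1800.0, 2900.0),
}
ranked = rank_speaker_pairs(toy_formants)
most_similar = ranked[0]    # assumed hardest pair to distinguish
least_similar = ranked[-1]  # assumed easiest pair to distinguish
```

The head and tail of the ranked list are then the candidate difficult and easy pairs whose system performance can be compared.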

I then develop an approach for finding those individual speakers who will be difficult for the system, using a set of feature statistics calculated over regions of speech. In particular, a support vector machine (SVM) classifier is trained to distinguish between difficult and easy speaker examples, in order to produce an overall measure of speaker difficulty as a target or impostor. The resulting precision and recall measures were over 0.8 for difficult impostor speaker detection, and over 0.7 for difficult target speaker detection. Depending on the application, the detection threshold can be tuned to improve precision, recall, or specificity in order to best suit the needs of a particular task. The same approach can be taken with single conversation sides, as with a set of conversation sides corresponding to the same speaker, since the input feature statistics can be calculated over any number of speech samples.
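To make the threshold trade-off concrete, the sketch below computes precision, recall, specificity, and F-measure for a set of detector scores at two different thresholds. It is a hedged illustration, not the evaluation code of this work: the scores and labels are made up, and it assumes that a higher score means "more likely difficult" and that a score at or above the threshold is a detection.

```python
def detection_metrics(scores, labels, threshold):
    """Precision, recall, specificity, and F-measure for a detector that
    flags an example as difficult when its score is >= threshold."""
    tp = fp = tn = fn = 0
    for score, is_difficult in zip(scores, labels):
        flagged = score >= threshold
        if flagged and is_difficult:
            tp += 1
        elif flagged and not is_difficult:
            fp += 1
        elif not flagged and is_difficult:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f_measure": f_measure,
    }


# Hypothetical classifier outputs and ground-truth "difficult" labels.
toy_scores = [0.9, 0.7, 0.4, 0.2, -0.1, -0.5]
toy_labels = [True, True, False, True, False, False]
strict = detection_metrics(toy_scores, toy_labels, threshold=0.5)
lenient = detection_metrics(toy_scores, toy_labels, threshold=0.0)
```

In this toy example the stricter threshold gives higher precision and specificity, while the more lenient one gives higher recall, which is the kind of tuning choice an application would make.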






3.14 Highest impostor score for a target model versus the true speaker scores for that target model, for male speakers.

3.15 Highest impostor score for a target model versus the true speaker scores for that target model, for female speakers.

3.16 Average maximum impostor score versus number of test conversation sides, for male impostor speakers.

3.17 Average maximum impostor score versus number of test conversation sides, for female impostor speakers.

3.18 Box plots of target score distributions per speaker, for male speakers, using SRE08 data.

3.19 Cumulative distribution of errors across female speakers, for false rejections, false acceptances as the target, and false acceptances as the impostor.

3.20 Cumulative distribution of errors across male speakers, for false rejections, false acceptances as the target, and false acceptances as the impostor.


5.1 Recall, precision, specificity, and F-measure values for detecting difficult impostor speakers using SVMs with different kernels (linear, second order polynomial [poly2], and third order polynomial [poly3]), with the [speech1] set of feature statistics as input, with or without rank normalization applied [rank,nonorm].

5.2 Recall, precision, specificity, and F-measure values for detecting difficult impostor speakers using a linear kernel SVM trained with rank normalized feature statistics, comparing three different decision thresholds for difficult impostor speaker detection.

5.3 Recall, precision, specificity, and F-measure values for detecting difficult impostor speakers using a linear kernel SVM trained with rank normalized feature statistics, comparing three sets of speech feature statistics, [speech1], [speech2], and [speech3].

5.4 Recall, precision, specificity, and F-measure values for detecting difficult target speakers using SVMs with different kernels (linear, second order polynomial, and third order polynomial), with the [speech1] set of feature statistics as input, with or without rank normalization applied.

5.5 Recall, precision, specificity, and F-measure values for detecting difficult target speakers using a third order polynomial kernel SVM trained with rank normalized feature statistics, comparing three different decision thresholds for difficult target speaker detection.

5.6 Recall, precision, specificity, and F-measure values for detecting difficult target speakers using a third order polynomial kernel SVM trained with rank normalized feature statistics, comparing three sets of speech feature statistics, [speech1], [speech2], and [speech3].

5.7 Recall, precision, specificity, and F-measure values for detecting difficult target speakers using a third order polynomial kernel SVM trained with rank normalized feature statistics, using SVMs trained separately for female and male speakers, with either 20% or around 25% of speakers taken as difficult or easy examples.

Acknowledgments

Given the many years of my graduate career, there is a long list of people to thank. I begin with my adviser, Professor Nelson Morgan, who welcomed me into the speech group and gave me a research home at the International Computer Science Institute (ICSI). In addition to providing support throughout my academic career, Morgan was also instrumental in helping me find the last puzzle piece to fit in my dissertation work.

Next, I have to express my deep gratitude for my mentor, Nikki Mirghafori. It is hard to describe all the ways in which Nikki has positively influenced me. When I first started in the speaker recognition group in 2005, she provided excellent technical and professional guidance, helping me to gain understanding and confidence, improve my communication skills, and grow as a contributing member of the group. After an interlude without her at ICSI, Nikki returned in 2010 to once again lead the speaker recognition group, introducing a wonderful balance between research and personal concerns to our meetings, and helping me to learn how to better deal with stress, fatigue, and other distractions that arise in daily life. I am truly appreciative of Nikki’s encouragement and support, and it is reassuring to know that it will continue as I move on to the next challenge.

There are many other researchers to be thanked for helping me along the way. One particularly influential person in my thesis work was George Doddington, who was a wonderful source of ideas to try, and a most interesting person to work with. I must also thank Joe Frankel, with whom I collaborated on my Master’s work. Additional members of the speaker recognition community who have provided feedback and help throughout my career include Andreas Stolcke, Liz Shriberg, Sachin Kajarekar, Howard Lei, Andy Hatch, Christian Müller, David van Leeuwen, Eduardo Lopez-Gonzalo, and Joaquin Gonzalez.

Of course, I must also mention some of the many students, post-docs, visitors, and staff at ICSI, who have helped make it the wonderful place that it is. Among these are Kofi Boakye, Marios Athineos, Dan Gillick, Arlo Faria, Oriol Vinyals, Jaeyoung Choi, David Imseng, Benoit Favre, Korbinian Riedhammer, Adam Janin, and Jacob Wolkenhauer.

I would be remiss if I did not acknowledge my friendly officemates throughout the years: Madelaine Plauché, Matthew Aylett, and Vijay Ullal. Special recognition goes to my officemates of the past several years, Mary Knox and Suman Ravuri, who are not only lovely work companions (and excellent contributors on Sporcle quizzes), but also dear friends.

Finally, I want to thank my family. My mom made me who I am, and I would not have been successful without her influence in my life. My dad has been truly supportive of me in every way imaginable, despite the fact that it often appeared that I might never finish. My sister and brother (and their spouses as well) have always been there for me, and are due to receive many a dinner in thanks once I finally have a job. Lastly, I thank my nieces, Lynn and Magnolia, for always reminding me of the simple joys in life.

Chapter 1 Introduction

1.1 Automatic Speaker Recognition

The task of automatic speaker recognition, wherein a system verifies or determines a speaker’s identity using a sample of speech, has been studied for a few decades. In that time, a great deal of progress has been made in improving the accuracy of the system’s decisions, through the use of more successful machine learning algorithms, and the application of channel compensation techniques and other methodologies aimed at addressing sources of errors such as noise or data mismatch. This dissertation focuses on the relatively unexplored issue of dependence of system errors on speaker characteristics. In particular, I investigate the phenomenon that some speakers within a given population have a tendency to cause a large proportion of errors, and explore ways of finding such speakers.

There are a number of tasks that fall into the category of speaker recognition. My work uses the speaker verification paradigm, in which there is a hypothesized target speaker identity, with an associated training speech utterance, and the system must decide whether a given test utterance was spoken by the target speaker. In this case, there are two types of errors: false rejections, in which the true speaker is rejected as such, and false acceptances, in which an impostor speaker is accepted as the target speaker. In general, these errors can be expected to have one or more causes, involving both intrinsic and extrinsic factors. Extrinsic factors correspond to external influences, including reverberation, noise, and channel or microphone effects. Intrinsic factors relate inherently to the speaker himself, and include sex, age, dialect, accent, emotion, speaking style, and other voice characteristics. This dissertation analyzes errors in terms of intrinsic speaker attributes.
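The two error types of the verification paradigm can be illustrated with a short sketch. It assumes, purely for illustration, that the system outputs a numeric score per trial and accepts the trial when the score is at or above a decision threshold; the trial scores, the threshold, and the function name are hypothetical.

```python
def verification_error_rates(trials, threshold):
    """Compute (false rejection rate, false acceptance rate).

    `trials` is a list of (score, is_target) pairs, where is_target is
    True when the test utterance really was spoken by the target speaker.
    A trial is accepted when score >= threshold (an assumed convention).
    """
    n_target = sum(1 for _, is_target in trials if is_target)
    n_impostor = len(trials) - n_target
    false_rejections = sum(
        1 for score, is_target in trials if is_target and score < threshold
    )
    false_acceptances = sum(
        1 for score, is_target in trials if not is_target and score >= threshold
    )
    fr_rate = false_rejections / n_target if n_target else 0.0
    fa_rate = false_acceptances / n_impostor if n_impostor else 0.0
    return fr_rate, fa_rate


# Hypothetical trials: three true-speaker trials and four impostor trials.
toy_trials = [
    (2.1, True), (0.3, True), (-0.4, True),
    (1.2, False), (-0.8, False), (-1.5, False), (-2.0, False),
]
fr_rate, fa_rate = verification_error_rates(toy_trials, threshold=0.0)
```

Raising the threshold trades false acceptances for false rejections, so both rates need to be reported together for a given operating point.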
