«A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Engineering - Electrical Engineering ...»

vector machine (SVM) classifier to distinguish between difficult and easy speakers. Using scores from an automatic speaker recognition system, I ranked speakers according to the rate at which they caused false rejection and false alarm errors, taking the 20% of speakers with the most errors as difficult training examples and the 20% with the fewest errors as easy training examples. Two separate SVMs were trained: one to detect difficult target speakers (who will cause false rejections) and one to detect difficult impostor speakers (who will cause false alarms). The resulting precision and recall measures were over 0.8 for difficult impostor speaker detection, and over 0.7 for difficult target speaker detection. Depending on the application, the detection threshold can be tuned to improve precision, recall, or specificity to best suit the needs of a particular task. At a 5% false alarm threshold, over 60% of difficult impostor speakers were found, and over 37% of difficult target speakers. These low recall rates (especially in the case of difficult target speakers) indicate how difficult it is to find error-prone speakers using a single conversation side. Nevertheless, the results are promising for a first attempt at such a task. The same approach applies whether the input is a single conversation side or a set of conversation sides from the same speaker, since the input feature statistics can be calculated over any number of speech samples.
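The two bookkeeping steps described above, selecting the extreme 20% of speakers as training examples and reading off recall at a fixed false alarm rate, can be sketched as follows. This is only an illustration under stated assumptions: the function names `select_extremes` and `recall_at_false_alarm`, the dictionary of per-speaker error rates, and the toy scores are all hypothetical, and the SVM training itself is not shown.

```python
import math

def select_extremes(error_rates, fraction=0.2):
    """Return (easy, difficult) speaker IDs: the `fraction` of speakers
    with the fewest errors and the `fraction` with the most errors."""
    ranked = sorted(error_rates, key=error_rates.get)  # ascending error rate
    k = max(1, int(len(ranked) * fraction))
    return ranked[:k], ranked[-k:]

def recall_at_false_alarm(pos_scores, neg_scores, max_fa=0.05):
    """Choose a threshold so that at most `max_fa` of the negative scores
    lie strictly above it, then report recall on the positive scores."""
    allowed = math.floor(max_fa * len(neg_scores))  # negatives allowed above threshold
    ranked_neg = sorted(neg_scores, reverse=True)
    # The threshold sits at the (allowed + 1)-th highest negative score;
    # only scores strictly above it count as detections.
    threshold = ranked_neg[allowed] if allowed < len(ranked_neg) else float("-inf")
    recall = sum(s > threshold for s in pos_scores) / len(pos_scores)
    return threshold, recall

# Hypothetical per-speaker error rates (illustrative values only).
rates = {"spk%d" % i: i / 10 for i in range(10)}
easy, difficult = select_extremes(rates)
```

Lowering `max_fa` trades recall for fewer false detections, which is the kind of tuning the paragraph above refers to.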

6.3 Contributions and Future Work

My analysis of how system scores depend on the speakers built upon and extended prior error analysis work. Considering two data sets, with differing degrees of channel and other extrinsic variability, along with two types of speaker recognition systems, I found that speaker-dependent behavior is observed in both cases. I also noted differences between female and male speakers: there tend to be more confusable female impostor speaker pairs, perhaps due to the more limited range of certain acoustic characteristics, such as fundamental frequency, in female speech. Additionally, not only do certain speakers differ in their tendency to cause errors, there is also variability at a lower level, across different conversation sides of the same speaker. Furthermore, the tendency to produce false alarms as the target speaker is correlated with the tendency to produce false alarms as the impostor speaker.
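As a rough illustration of the last observation, a correlation of this kind could be measured with a Pearson coefficient over per-speaker false alarm rates. The sketch below assumes two equal-length lists, one holding each speaker's false alarm rate in the target role and one in the impostor role; the function name `pearson` and the numbers are hypothetical, and the choice of the Pearson coefficient is itself an assumption rather than something stated in the text.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical false alarm rates per speaker, in each role.
fa_as_target = [0.01, 0.03, 0.02, 0.08]
fa_as_impostor = [0.02, 0.04, 0.02, 0.09]
r = pearson(fa_as_target, fa_as_impostor)
```

A value of `r` near 1 would indicate that speakers who attract false alarms as targets also tend to cause them as impostors.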

Given such observations, I was then able to successfully predict difficult-to-distinguish impostor speaker pairs through the use of distance measures calculated with statistics of features such as fundamental frequency, formant frequencies, energy, and spectral slope. In addition to considering feature-measures that can give relative rankings of similarity between a pair of speakers, I also generalized the approach to simply detect a difficult individual speaker. Distinguishing between difficult target speakers and difficult impostor speakers, I trained SVMs using examples of the easiest and most difficult speakers in terms of causing errors. Both of these are novel approaches that can be used to address the effects of inherent speaker characteristics on automatic speaker recognition systems. Further exploration of this problem may yield better feature statistics or other improved approaches for finding difficult speakers. Additionally, it may be possible to adapt this technique in order to detect particular conversation sides of a given speaker that will produce errors.
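The idea of ranking impostor speaker pairs by a distance between feature statistics can be sketched as below. This is a minimal illustration, not the measure used in the dissertation: it assumes each speaker is summarized by the mean and standard deviation of a single feature (for example fundamental frequency in Hz) and uses plain Euclidean distance. The names `feature_stats`, `stat_distance`, and `rank_pairs`, and the sample values, are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean, pstdev

def feature_stats(values):
    """Summarize one speaker's feature values as (mean, standard deviation)."""
    return (mean(values), pstdev(values))

def stat_distance(a, b):
    """Euclidean distance between two statistics vectors (an assumed measure)."""
    return math.dist(a, b)  # requires Python 3.8+

def rank_pairs(speaker_stats):
    """Return all speaker pairs, most similar (smallest distance) first."""
    pairs = combinations(sorted(speaker_stats), 2)
    return sorted(
        pairs,
        key=lambda p: stat_distance(speaker_stats[p[0]], speaker_stats[p[1]]),
    )

# Hypothetical fundamental frequency samples per speaker.
stats = {
    "s1": feature_stats([100, 110, 120]),
    "s2": feature_stats([101, 111, 121]),
    "s3": feature_stats([200, 220, 240]),
}
ranking = rank_pairs(stats)
```

Pairs at the head of `ranking` are the candidates one would expect to be hardest to tell apart under this assumed measure. In practice the statistics would likely need scaling so that features with large numeric ranges do not dominate the distance.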

