«A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Engineering - Electrical Engineering ...»
6.3 Contributions and Future Work

The analysis of how system scores depend on the speakers builds upon and adds to prior error analysis work. Considering two data sets with differing degrees of channel and other extrinsic variability, along with two types of speaker recognition systems, I found that speaker-dependent behavior is observed in both cases. I also noted differences between female and male speakers: there tend to be more confusable female impostor speaker pairs, perhaps due to the more limited range of certain acoustic characteristics, such as fundamental frequency, in female speech. Additionally, not only do speakers differ in their tendency to cause errors, but there is also variability at a lower level, across different conversation sides of the same speaker. Furthermore, a speaker's tendency to produce false alarms as the target speaker is correlated with the tendency to produce false alarms as the impostor speaker.
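To make the last observation concrete, the following is a minimal sketch, not the analysis code used in this work, of how per-speaker false alarm tendencies in the two roles could be tallied and correlated. The trial layout `(target_id, test_id, score)`, the decision threshold, and the function names are all assumptions made for illustration.

```python
from collections import defaultdict
from math import sqrt


def false_alarm_rates(trials, threshold):
    """Per-speaker false alarm rates in the target and impostor roles.

    trials: iterable of (target_id, test_id, score) covering impostor
    trials only (hypothetical layout). An impostor trial counts as a
    false alarm when its score is at or above the threshold.
    Returns {speaker_id: (rate_as_target, rate_as_impostor)}.
    """
    # [false alarms as target, trials as target,
    #  false alarms as impostor, trials as impostor]
    counts = defaultdict(lambda: [0, 0, 0, 0])
    for target_id, test_id, score in trials:
        fa = 1 if score >= threshold else 0
        counts[target_id][0] += fa
        counts[target_id][1] += 1
        counts[test_id][2] += fa
        counts[test_id][3] += 1

    rates = {}
    for spk, (fa_t, n_t, fa_i, n_i) in counts.items():
        if n_t and n_i:  # keep only speakers observed in both roles
            rates[spk] = (fa_t / n_t, fa_i / n_i)
    return rates


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(vx * vy)
```

Given the rates for all speakers, `pearson` applied to the two columns gives one number summarizing how strongly the two tendencies move together. A rank-based coefficient would be a reasonable alternative if the rates are heavily skewed.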
Given such observations, I was then able to predict difficult-to-distinguish impostor speaker pairs using distance measures calculated from statistics of features such as fundamental frequency, formant frequencies, energy, and spectral slope. In addition to considering feature-measures that give relative rankings of similarity between a pair of speakers, I also generalized the approach to detect a difficult individual speaker. Distinguishing between difficult target speakers and difficult impostor speakers, I trained SVMs using examples of the easiest and most difficult speakers in terms of causing errors. Both of these are novel approaches that can be used to address the effects of inherent speaker characteristics on automatic speaker recognition systems. Further exploration of this problem may yield better feature statistics or other improved approaches for finding difficult speakers. Additionally, it may be possible to adapt this technique to detect particular conversation sides of a given speaker that will produce errors.
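As an illustration of ranking speaker pairs by a feature-statistic distance, here is a minimal sketch. It is not the specific measure used in this work: the choice of a pooled-standard-deviation-scaled Euclidean distance between feature means, the data layout, and all function names are assumptions made for the example.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, pstdev


def speaker_stats(frames):
    """Summarize one speaker as {feature_name: (mean, std)}.

    frames: dict mapping a feature name (e.g. "f0") to a list of
    per-frame values (hypothetical layout).
    """
    return {name: (mean(vals), pstdev(vals)) for name, vals in frames.items()}


def pair_distance(stats_a, stats_b, eps=1e-9):
    """Euclidean distance between feature means, with each feature's
    difference scaled by the pooled standard deviation (an assumed measure)."""
    total = 0.0
    for name in stats_a.keys() & stats_b.keys():
        (mu_a, sd_a), (mu_b, sd_b) = stats_a[name], stats_b[name]
        pooled = sqrt((sd_a ** 2 + sd_b ** 2) / 2) + eps  # eps avoids division by zero
        total += ((mu_a - mu_b) / pooled) ** 2
    return sqrt(total)


def rank_confusable_pairs(speakers):
    """Return (distance, id_a, id_b) tuples, closest pairs first.

    speakers: dict mapping speaker id to the output of speaker_stats.
    Small distances flag candidate difficult-to-distinguish pairs.
    """
    pairs = [
        (pair_distance(speakers[a], speakers[b]), a, b)
        for a, b in combinations(sorted(speakers), 2)
    ]
    return sorted(pairs)
```

The pairs at the top of the ranking are those whose feature statistics are most alike, and thus the candidates for difficult-to-distinguish impostor pairs. Detecting a difficult individual speaker would instead use each speaker's own feature statistics as the input vector to a classifier such as an SVM.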