Louise Whiteley

Uncertainty, Reward, and
Attention in the Bayesian Brain

Dissertation submitted for the degree of
Doctor of Philosophy
University of London

Gatsby Computational Neuroscience Unit
University College London
I, Louise Emma Whiteley, confirm that the work presented in this thesis is my own. Where information has been derived from other sources, I confirm that this has been indicated in the thesis.
17th September 2008
Abstract

The ‘Bayesian Coding Hypothesis’ formalises the classic Helmholtzian picture of perception as inverse inference, stating that the brain uses Bayes’ rule to compute posterior belief distributions over states of the world. There is much behavioural evidence that human observers can behave Bayes-optimally, and there is theoretical work that shows how populations of neurons might perform the underlying computations. There are, however, many remaining questions, three of which are addressed in this thesis. First, we investigate the limits of optimality, demonstrating that observers can correctly integrate an external loss function with their uncertainty about a very simple stimulus, but behave suboptimally with respect to highly complex stimuli. Second, we use the same paradigm in a collaborative fMRI study, asking where along the path from sensory to motor areas a loss function is integrated with sensory uncertainty. Our results suggest that value affects a fronto-striatal action selection network rather than directly impacting on sensory processing. Finally, we consider a major theoretical problem: the demonstrations of optimality that dominate the field have been obtained in tasks with a small number of objects in the focus of attention. When faced instead with a complex scene, the brain cannot be Bayes-optimal everywhere. We suggest that a general limitation on the representation of complex posteriors causes the brain to make approximations, which are then locally refined by attention. This framework extends ideas of attention as Bayesian prior, and unifies apparently disparate attentional ‘bottlenecks’. We present simulations of three key paradigms, and discuss how such modelling could be extended to more detailed, neurally inspired settings.

Broadening the Bayesian picture of perception and strengthening its connection to neuroscientific and psychological literatures is critical to its future as a comprehensive theory of neural inference, and the thesis concludes with a brief discussion of future challenges in this direction.
List of Figures

5.1 Behavioural data for individual observers in the scanner
5.2 Decision parameters inside and outside the scanner
5.3 Parameters from signal detection analysis
5.4 Category-selective activation in extrastriate visual areas
5.5 There is no effect of value in category-selective extrastriate visual areas
5.6 Effects of asymmetric value are consistent across category

List of Tables

4.1 Results of Bayesian model comparison
4.2 Psychometric curve parameters for the best model for each observer
4.3 Psychometric curve parameters for the best model for each observer, cont.
5.1 Signal detection response types for a 2-AFC categorisation
Acknowledgements

It seems to be the inevitable lot of the new PhD student to scoff at tales of adversity imparted by battle-scarred final years, only to find themselves immersed in those very trials: the second year blues, the upgrade frustrations, the null result. I am extremely lucky to have gone through these rites of academic passage in a department, and with a supervisor, who have provided exceptional support. Being around so many talented scientists, with a range of approaches, interests, and backgrounds has greatly enriched my perspective on the brain and on the paradigmatic wranglings of the relatively novel neurosciences. My supervisor, Maneesh Sahani, has given me both freedom and guidance, and has been instrumental in the development of my thinking. His comments and suggestions never fail to be insightful, and his ability to combine technical expertise with philosophical thinking presents an inspiring example. To acknowledge his involvement in every aspect of this thesis I will use the first person plural throughout.
I have learnt from too many people at Gatsby to mention them all, but would like to thank Richard Turner and Misha Ahrens in particular: office mates, good friends, and much-valued sounding boards. I would also like to thank Stephen Fleming for his interest in our work on Bayesian decision making, and for exciting discussions and methodological guidance during the fMRI collaboration that followed. Throughout my PhD, the other students on the Wellcome Trust PhD program (Hanneke Den Ouden, Rosemary Milton, David Barker, Kieran Boyle, and Curtis Asante) and my fellow East-Enders (Lucy Neville and Sarah Brunell) have provided mutual encouragement and lots of fun. To my family, thanks for the ongoing support you have always shown, and for giving me from my earliest years a love of books and learning. Last but certainly not least, my partner Ollie Hulme.
From our ﬁrst meetings at ‘Consciousness Club’ to recent co-authorship, he has been a ﬁrst port of call for matters academic and beyond, and has been an amazing source of inspiration and support in all my endeavours.
For reading part or all of this thesis in various forms, I would like to thank Peter Dayan, Josh Solomon, Richard Turner, Oliver Hulme and Stephen Fleming.
The work reported in Chapters 4 and 5 was the result of a collaboration between myself and Dr Maneesh Sahani at the Gatsby Unit, Mr Stephen Fleming, Prof. Ray Dolan, and Prof. Chris Frith at the Wellcome Trust Centre for Neuroimaging, and Dr Oliver Hulme at the Institute for Ophthalmology. The project was led by Stephen Fleming as part of his PhD, and was based on the paradigm we present in Chapter 3. I contributed to adapting the experimental design for use in the scanner, and to its development based on the analysis of pilot behavioural data. Stephen Fleming collected the behavioural and fMRI data, and I coded psychometric and optimality analysis of the behavioural data. I then contributed to the design and interpretation of the fMRI analysis, and co-wrote a paper on the imaging results for submission to Science.
Publications and Other Work During the PhD
Chapter 3 has been published in the Journal of Vision (Whiteley and Sahani, 2008). A paper based on Chapters 4 and 5 has been submitted to Science, and Chapter 6 is in preparation for submission to Psychological Review. During my PhD I also contributed to two fMRI studies investigating the subcortical basis of salience computation in the human brain, both in preparation. I co-wrote a commentary on a target article by Ned Block in Behavioral and Brain Sciences (Hulme and Whiteley, 2007), and a paper based on work done prior to my PhD was published in Acta Psychologica (Whiteley et al., 2008).
1 Introduction
1.1 Introducing Bayesian Inference

Bayes’ rule is a simple equation with a very complicated life. It was first presented in the 18th century by the Reverend Thomas Bayes as a solution to the ‘inverse probability’ problem central to the rather unholy pursuits of gambling and insurance1 (Bayes, 1764).
This problem occurs whenever we have to work backwards from an observation (or ‘data’) to a belief about the state of the world that generated it. For example, imagine a farmer has two varieties of tomato seed, one of which tends to produce much bigger fruits. If the labels on the bags of seeds were lost, he might decide to plant 20 seeds from each bag to work out which contained the larger variety. However, observing that the 20 tomatoes grown from the first bag were on average larger than the 20 grown from the second bag should not make him 100% confident that the first bag was the one he was after; it could have been a fluke. According to Bayes’ rule, making an inference about a state of the world (here, seed type) from an observation (here, the size of tomatoes in two samples) requires a likelihood model of how observations are generated, and prior beliefs about the state of the world that generated them, which are used to compute a posterior belief distribution according to probability theory:

P(state | data) = P(data | state) P(state) / P(data)

The Bayesian farmer could use this equation to compute the posterior probability of each bag containing the larger seed variety. The likelihoods might embody knowledge such as the typical size of each tomato variety, and perhaps information about variability; for example, the larger variety might also yield a greater range of sizes. The prior might embody a bias, for example a suspicion based on where the bags were stored that the second bag contained the larger seed variety, and the stronger the prior, the more impact it has on the posterior. As can be seen in this simple example, despite its perhaps off-putting mathematical formulation, Bayesian reasoning embodies many intuitions about how we should combine information in arriving at a belief.

1 When hurricane Barbara strikes Springfield, Marge Simpson reassures the long-suffering wife of evangelical Ned Flanders that insurance will cover their damaged house, and Maude replies “Uh, well, no. Neddy doesn’t believe in insurance. He considers it a form of gambling.”
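As a toy illustration of how the farmer might carry out this computation, the following sketch (all sizes, variabilities, and priors are invented for illustration, not taken from any experiment) assumes the 20-tomato sample averages are Gaussian around each variety’s typical size, and returns the posterior probability that the first bag holds the larger variety:

```python
import math

def gaussian(x, mean, sd):
    """Gaussian probability density at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def posterior_large_first(avg_first, avg_second, prior_large_first=0.5,
                          large=(100.0, 15.0), small=(70.0, 10.0)):
    """P(first bag holds the large variety | the two observed sample averages).

    large/small are (mean, sd) of each variety's sample average, invented here;
    note the larger variety is also given greater variability.
    """
    # Hypothesis A: first bag large, second bag small
    like_a = gaussian(avg_first, *large) * gaussian(avg_second, *small)
    # Hypothesis B: first bag small, second bag large
    like_b = gaussian(avg_first, *small) * gaussian(avg_second, *large)
    # Bayes' rule: posterior is proportional to likelihood times prior;
    # the denominator (sum over both hypotheses) normalises.
    num_a = like_a * prior_large_first
    num_b = like_b * (1.0 - prior_large_first)
    return num_a / (num_a + num_b)

# A bigger average from the first bag favours, but does not prove, hypothesis A.
p = posterior_large_first(avg_first=95.0, avg_second=75.0)
print(round(p, 3))
```

Strengthening the prior toward the second bag (lowering `prior_large_first`) pulls the posterior down, exactly as in the prose above: the stronger the prior, the more impact it has on the posterior.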
The denominator in Bayes’ rule can be treated simply as a normalisation constant, functioning to ensure that the posterior probability distribution sums to one. It also measures the agreement between the likelihood and the prior, providing evidence for the choice of model structure independent of parameter settings (this quantity is also known as the ‘marginal likelihood’). In order to compare competing models, the evidence term then plays the role of the likelihood for each model in a further application of Bayes’ rule (MacKay, 2004). If our tomato farmer suspected that one bag in fact contained a mixture of the two seed varieties, he could compare the marginal likelihood of a model that predicts clusters of tomatoes around the average size of each variety against the marginal likelihood of the single-variety model.

The posterior constitutes a full representation of the degree of belief about each possible state of the world, but in many scenarios a specific estimate is required. The state of the world that gives the highest value of the likelihood is known as the maximum likelihood or ‘ML’ estimate, and the state of the world that is accorded the highest probability by the posterior is known as the maximum a posteriori or ‘MAP’ estimate. With an uninformative prior, which is insensitive to reparameterisation, the two are equivalent (see e.g. Jaynes, 2003).
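The distinction between the ML and MAP estimates can be made concrete with a minimal sketch (the two-state grid and all probability values are invented): a strong prior can make the MAP estimate differ from the ML estimate, while a flat, uninformative prior makes the two coincide.

```python
# Candidate states of the world and an invented likelihood P(data | state).
states = ["small", "large"]
likelihood = {"small": 0.2, "large": 0.6}

def map_estimate(prior):
    """Return the MAP estimate and the normalised posterior for a given prior."""
    post = {s: likelihood[s] * prior[s] for s in states}
    z = sum(post.values())                 # the denominator / 'evidence'
    post = {s: p / z for s, p in post.items()}
    return max(post, key=post.get), post

# ML estimate: the state maximising the likelihood alone, ignoring the prior.
ml = max(states, key=likelihood.get)

# A strong prior favouring 'small' overturns the likelihood...
map_biased, _ = map_estimate({"small": 0.9, "large": 0.1})
# ...while a flat prior leaves the MAP estimate equal to the ML estimate.
map_flat, _ = map_estimate({"small": 0.5, "large": 0.5})
print(ml, map_biased, map_flat)
```

Here the posterior over the two states is the full belief representation described above; the ML and MAP estimates each collapse it to a single point, discarding the uncertainty it carries.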
1.2 Application and Epistemology
The applications of Bayesian inference were considered by Bayes and Laplace in the 18th and 19th centuries (Dale, 1982), but its full potential was not realised until the late 20th century, when machine learning techniques and computational power allowed priors and likelihoods of real-world complexity to be used (see Fienberg, 2006). In the meantime, alternative approaches to statistical inference were developed, leading to philosophical arguments about the meaning of probability itself. The frequentist position prominent in the first half of the 20th century constitutes a complementary, non-Bayesian framework for statistical inference and hypothesis-testing, differing from the Bayesian approach in the absence of priors and in a focus on punctate statistics rather than degrees of belief (see Wonnacott and Wonnacott, 1990, for a description of both approaches). Behind these concrete differences lies a deeper disagreement about what a probability is: the frequentist treats probabilities as the relative frequency of each possible outcome in an infinite number of repetitions of a well-defined random experiment, whereas the Bayesian thinks of probability as a subjective degree of belief that can be assigned to any proposition2.

To come back to our Bayesian farmer, if he wanted to repeat the tomato experiment, it would be hard to ensure that the conditions were identical. Scientists deal with such issues on a daily basis, designing well-controlled experiments according to accepted principles. However, the philosophical concept of an infinitely repeatable random experiment is rather strange, reflecting the difficulty of a human observer ever having access to this hypothetical scenario. If we configure the same scenario in terms of the beliefs of the farmer, this conceptual discomfort is ameliorated, though of course at the risk of sacrificing some degree of observer-independent ‘objectivity’.