methods. Rosser’s second-generation instrument, the Index of Health-related Quality of Life (IHQL), was based on a form of SG methodology that had never been the subject of peer-review scrutiny and made no claim to represent population preferences. Nevertheless both systems have been used in QALY computations published by UK health economists as part of the NICE appraisal process.

It seems as though mainstream health economics continues to endorse the view that ‘utilities’ constitute the only legitimate form of adjustment weight in QALYs. This view appears to arise through vestigial attachment to the need for a methodological foundation grounded in economics or at any rate in a contiguous discipline. This in turn gives rise to the questionable position in which any ‘utility’-based method is deemed acceptable, despite the manifest failure of the classical theory that provides its source DNA and from which these derivatives are formed. This is a weak and fundamentally indefensible position from which to operate, reminiscent of the last throes of the Flat Earth Society in the early days of space flight. Of course we can always find a justification for why the theory does not quite work as a model of real-world behaviours – we may simply regret man’s inability to conform to expected utility theory; or we may construct experiments that test alternative explanations for the behaviours that violate the classical theory. When challenged over the friable nature of the theory to which health economics is apparently wedded, the response seems to be that at least there is a theoretical underpinning, unlike the position in other disciplines that deal with similar issues and where theory is held to be absent.


As practitioners in the field of health economics we can choose between two alternatives. We might take the view that social preferences needed for computing QALYs must be expressed in terms of utilities that are derived from a choice-based methodology linked to relevant theory. In this situation, it is likely that the method by which utilities are generated would simply follow as a logical progression from theory into practice. This fortunate state of affairs would be further complemented by a high degree of consensus in academic circles about the theoretical basis of such measurement and the practical ways of achieving it. Furthermore, novel techniques could be empirically tested against that existing standard as a mechanism for determining their suitability as substitutes. Alternatively, we might consider that social preferences may be expressed as utilities, but that this is not an absolute requirement. The value associated with a health state may be determined by any one of a larger set of methods, the only constraint being that it must produce a single index value on a scale that assigns values of 0 and 1 to ‘dead’ and ‘full health’ respectively.
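The scale constraint described above can be made concrete with a small sketch. It assumes a simple linear rescaling of raw scores, which is one common convention rather than anything the argument here prescribes, and every number in it is hypothetical. The function names are illustrative only.

```python
def rescale_to_qaly_scale(raw, raw_dead, raw_full_health):
    """Linearly map a raw score so that 'dead' -> 0 and 'full health' -> 1.

    Linear rescaling is an assumption made for illustration; other
    transformations would satisfy the same two anchor points.
    """
    if raw_full_health == raw_dead:
        raise ValueError("'dead' and 'full health' need distinct raw scores")
    return (raw - raw_dead) / (raw_full_health - raw_dead)


def qalys(profile):
    """Sum index value * duration over (index_value, years) pairs."""
    return sum(weight * years for weight, years in profile)


# Hypothetical raw scores on an arbitrary 0-100 rating scale (not real data).
RAW_DEAD, RAW_FULL_HEALTH = 10.0, 100.0
raw_state_score = 55.0

index_value = rescale_to_qaly_scale(raw_state_score, RAW_DEAD, RAW_FULL_HEALTH)

# Four years in the illustrative state, then one year in full health.
total_qalys = qalys([(index_value, 4.0), (1.0, 1.0)])
print(index_value, total_qalys)
```

Any method that yields a number anchored in this way can be fed into the same QALY arithmetic, which is why the second alternative admits so many candidate methods.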

Both alternatives leave us well short of a sustainable position. Since different procedures for preference measurement tend to generate different values for a given health state, it will require an extraordinary piece of good fortune to come up with a plausible explanation or a unifying theory that allows for transformation between competing value sets. It could be that a retreat into an exclusive utility-based approach has some merit, since this would reduce the range of candidate methods. However, it would still leave us some way short of an accepted (or acceptable) common method.

In the absence of a recognised standard, multiple measurement methods must be tolerated as having some claim to legitimacy. The occasional happy accidental convergence of results offers some comfort that perhaps the picture is less complicated than others would have us believe. Widely differing results give further support for the view that different methods necessarily yield divergent results. The usual response to such a multiplicity of choice is to take refuge in sensitivity analysis rather than attack the problem head on. Does it make any difference to the conclusions if we apply one set of values/utilities or another? If quality adjustment is such a problematic task, then, despite the theoretical niceties, is it imperative that it is always undertaken as part of any cost-effectiveness analysis? Recent attention given to this question suggests that in many studies, quality adjustment had relatively little effect on the final cost-effectiveness ratio. Its impact was important in moving ratios across a $50 000/QALY cost-effectiveness threshold in only some 20% of the investigated cases (Chapman et al., 2004). Where quality adjustment was indicated, then low-level investment in collecting preference data – for example using ad hoc adjustments – may be sufficient. Accepting the luxury of this approach leads to the inescapable conclusion that the choice of preference-elicitation method is an irrelevancy, and that ultimately any number will do. One way of addressing this decline into darkness would be to revisit the requirements of the reference case. Were NICE technical guidance to stipulate that all cost–utility analysis should be based upon a single generic instrument scored using a standard set of weights (perhaps regardless of their pedigree), then many of the problems associated with variability in quality-of-life data would be overcome. At least then the variability in reporting health outcomes could be contained.

True, where one door opens another closes and it would have to be recognised that some clinical studies would lack data based on that standard.

But that is precisely the situation that holds today.
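The sensitivity question posed earlier, whether applying one set of values or another changes the conclusion, can be illustrated with a minimal sketch. The incremental cost, the life-years gained and the two value sets are invented for the purpose and come from no study; only the $50 000/QALY threshold is taken from the discussion above. The sketch checks whether quality adjustment moves a cost-effectiveness ratio across that threshold relative to the unadjusted cost per life-year.

```python
THRESHOLD = 50_000.0  # dollars per QALY, the threshold cited in the text


def cost_per_unit(delta_cost, delta_effect):
    """Incremental cost divided by incremental effect."""
    if delta_effect <= 0:
        raise ValueError("this sketch assumes a positive incremental effect")
    return delta_cost / delta_effect


# Hypothetical inputs (not real data): the treatment costs an extra
# 30 000 dollars and adds one year of life in a given health state.
DELTA_COST = 30_000.0
LIFE_YEARS_GAINED = 1.0

# Two hypothetical value sets that score that same health state differently.
VALUE_SETS = {"value set A": 0.8, "value set B": 0.5}

unadjusted = cost_per_unit(DELTA_COST, LIFE_YEARS_GAINED)
results = {"unadjusted (life-years)": unadjusted}
for name, weight in VALUE_SETS.items():
    results[name] = cost_per_unit(DELTA_COST, weight * LIFE_YEARS_GAINED)

# Does the verdict against the threshold differ from the unadjusted one?
baseline_ok = unadjusted <= THRESHOLD
crossings = {
    name: (ratio <= THRESHOLD) != baseline_ok for name, ratio in results.items()
}

for name, ratio in results.items():
    flag = "crosses threshold" if crossings[name] else "same verdict"
    print(f"{name}: {ratio:,.0f} per unit of effect ({flag})")
```

With these invented figures one value set leaves the verdict unchanged while the other reverses it, which is the situation in which the choice of preference-elicitation method cannot be dismissed as an irrelevancy.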

So for now we are faced with a real world that remains free of a consensus over the means by which social preferences of the population should be established. One consequence of this laissez-faire approach is that it permits the use of utility weights that only remotely connect with the specifications demanded for NICE appraisals. At this point, what seems to be the narrow issue about how to measure social preferences assumes a broader and more fundamental importance. The worldly pragmatists argue that decisions about the cost-effectiveness of new treatments have to be made, that we cannot wait for perfect measures or analytical tools, that uncertainty is endemic, that qualms about quality are not restricted to quality-of-life data, that NICE’s determinations are not based solely on the cost-effectiveness evidence. All these arguments carry some weight of course, but they need to be seen from the perspective of society as a whole, not just from the vantage point of health economics or the scientific research community. Key to the long-term sustainability of NICE-type moderation of new health technologies is the extent to which the public remains convinced of the probity of the process and its outcomes. Decisions that appear to rely heavily on technically opaque methods offer natural targets for those disadvantaged by those decisions.

It is too easy to dismiss such reactions as being the expected consequences from the usual suspects. Those close to the quality-of-life technology and its application in cost–utility analysis have a responsibility to act in ways that are compatible with the discharge of their roles as both scientists and citizens. To ignore or conceal issues that bear on the process of analysis is to risk long-term consequences that could disadvantage us all.

Nearly half a century has passed since Bush, Torrance, Rosser, Williams and others first took up the challenge of measuring health outcomes. In the intervening period, the research landscape has profoundly changed with a complexity today that might have been difficult to envisage in those early days. The academic discipline of health economics was spawned during this time, and with it the emergence of cost–utility analysis in health. Despite some 25 years of sustained enquiry this central question of how to value health in QALY calculations remains both topical and largely unresolved. Perhaps now would be a good time to free ourselves from the self-imposed straitjacket of utility.

CHAPTER 11 Discussion of Paul Kind's paper: 'Putting the "Q" in QALYs'... Ben van Hout


Just as Kind’s paper describes the background behind the development of the QALY, it may be useful to provide some historical notes from a slightly different perspective.

