«Edited by ANNE MASON Research Fellow, Centre for Health Economics University of York and ADRIAN TOWSE Director, Ofﬁce of Health Economics Radcliffe ...»
The central role accorded to QALYs by today’s analysts and decision makers bears witness to the leadership and perseverance of those who struggled with a technology that continues to provoke challenge.
The early mantra of ‘a QALY is a QALY is a QALY’ has given way to a more complex debate concerning issues of distribution, of equity and (paradoxically) monetary value. It is not the purpose of this paper to rehearse any of this material but rather to focus on the one issue that has remained at the heart of the QALY matter, namely the mechanism by which the quality-adjustment factor needed in computing QALYs is (and should be) established. Today we can turn for advice on such technical matters to any number of agencies (National Institute for Clinical Excellence (NICE), 2004), books (Gold et al., 1996; Drummond et al., 2005) or indeed health economists. It was not always so. Literature that is currently listed under the keyword ‘quality of life’ was referenced in the days of Index Medicus under the category of ‘health status’.
Neither was the combination of information on duration/survival and health status the exclusive creation of the health economist; for example, well-years were proposed by clinicians as a unit of measure, defined as the integral of health status over time (Grogono and Woodgate, 1971). The richness and diversity of today’s research contrasts with the simplicity and single-mindedness of the work that predates it.
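The idea of a unit of measure formed as the integral of health status over time can be illustrated with a minimal sketch. It assumes, purely for illustration, that health status is scored on a 0 to 1 scale and varies linearly between observations; the function name `well_years` and the example profile are hypothetical and are not taken from Grogono and Woodgate.

```python
def well_years(times, health_status):
    """Integrate a health-status profile over time (trapezoidal rule).

    times: observation times in years, strictly increasing.
    health_status: scores at those times, assumed here to lie between
    0 (worst) and 1 (best). Both the scale and the linear interpolation
    between observations are illustrative assumptions.
    """
    if len(times) != len(health_status):
        raise ValueError("times and health_status must have the same length")
    if len(times) < 2:
        raise ValueError("need at least two observations")
    total = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        if width <= 0:
            raise ValueError("times must be strictly increasing")
        # Area of the trapezoid between two consecutive observations.
        total += 0.5 * (health_status[i] + health_status[i - 1]) * width
    return total


# Hypothetical profile: full health declining to 0.5 over two years,
# then stable at 0.5 for a further three years.
profile_times = [0.0, 2.0, 5.0]
profile_status = [1.0, 0.5, 0.5]
print(well_years(profile_times, profile_status))
```

For this made-up profile the five calendar years amount to three well-years.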
BACKGROUND AND EVOLUTION
At about the same time that interest in cost–benefit analysis was emerging in the 1960s, we can see the first serious attempts to address ways of measuring health status, although the focus was largely directed at the level of the population and relied principally on the use of mortality data. The influential contributions of Sanders (1964) and Sullivan (1966) continue to resonate today in the use of summary measures of population health (SMPH), and more generally in the concept of health-adjusted life expectancy (HALE).
By assigning a ‘value’ of 1 to a year of life with no disability and 0 to a year of life with any disability, and combining this coding convention with actuarial life tables, it is possible to compute the years of life with or without disability. Disability-free life expectancy (DFLE) is of greatest interest to medical demographers and others concerned with an ageing population. The concern is not just that life expectancy is increasing, but whether the ill-health component is on average being compressed and delayed – do people enjoy relative improvements in survival without problems in young to middle age, only to have to face problems in later years? While the question itself may be interesting, it is the techniques applied to its investigation that are the focus of this paper, namely the convention applied to the description and valuation of health. The very mention of ‘disability’ can be highly provocative, especially in the context of the disability/impairment/handicap debate. The data used more recently to compute DFLE in the UK were derived from the limiting long-standing illness question contained in the General Household Survey. Individuals who report any degree of limitation in the past two weeks and who have a chronic problem of any sort are categorised as ‘disabled’ and assigned an identical value of zero. All other individuals, regardless of their experience in the remainder of the year, share the value 1.
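The combination of the 1/0 coding convention with a life table can be sketched as follows. This is a minimal illustration of a prevalence-weighted life table calculation, not a reproduction of any published UK computation; the function names, the three age bands and all the numbers are hypothetical.

```python
def life_expectancy(person_years, radix):
    """Life expectancy at birth: total person-years lived divided by the
    size of the starting cohort (the life table radix)."""
    return sum(person_years) / radix


def disability_free_life_expectancy(person_years, disability_prevalence, radix):
    """Weight each age band's person-years by the proportion free of disability.

    A year with any disability is valued at 0 and a year without at 1, so
    each band contributes person_years * (1 - prevalence).
    """
    if len(person_years) != len(disability_prevalence):
        raise ValueError("one prevalence value is needed per age band")
    if radix <= 0:
        raise ValueError("radix must be positive")
    for p in disability_prevalence:
        if not 0.0 <= p <= 1.0:
            raise ValueError("prevalence must lie between 0 and 1")
    healthy_years = sum(
        years * (1.0 - p) for years, p in zip(person_years, disability_prevalence)
    )
    return healthy_years / radix


# Hypothetical abridged life table with three broad age bands.
RADIX = 100_000                                  # assumed cohort size at birth
PERSON_YEARS = [5_000_000, 2_000_000, 500_000]   # assumed person-years lived per band
PREVALENCE = [0.1, 0.3, 0.6]                     # assumed share reporting disability

print(life_expectancy(PERSON_YEARS, RADIX))
print(disability_free_life_expectancy(PERSON_YEARS, PREVALENCE, RADIX))
```

With these invented inputs, 75 years of life expectancy contain 61 disability-free years. Because every ‘disabled’ year counts as zero whatever its severity, the result is sensitive only to prevalence and not to the degree of limitation.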
At about this time in the US, Fanshel and Bush (1970) published their seminal work describing a health status index for use in investigating changes in population health status engineered by health treatment. They noted with dismay the shortcomings of (then) current indicators based on mortality and the reliance on measures of activity rather than outcome.* The model described for their index was hugely influential, leading directly to the Quality of Well Being (QWB) scale (Patrick et al., 1973) and to related methodological research that was widely cited over the following decade. Set within their original paper are suggestions for obtaining values for functional states defined by any health status index. Their general procedure of choice is that of paired comparisons (Thurstone, 1927) and they describe a range of variants based on weighting through equivalence in time, population and dysfunctional history. These correspond to the current techniques of time trade-off (TTO) and person trade-off (PTO).
* The provision of health services is a $60 billion-a-year enterprise, yet no comparable industry spends so little on evaluating its own performance. More is known about the consumption of macaroni and corsets than the health status of the population.
PUTTING THE ‘Q’ IN QALYS 113
BOX 10.1 THREE-DIMENSIONAL DESCRIPTIVE SYSTEM
Mobility
1(a) Ability to get in and out of bed and/or chair
1(b) Ability to negotiate a level surface
1(c) Ability to climb stairs
1(d) Ability to walk outdoors
Capacity for self-care
2(a) Ability to feed self
2(b) Ability to dress self
2(c) Ability to wash self
2(d) Ability to make a hot drink
2(e) Ability to cook a meal
2(f) Ability to light a fire
2(g) Ability to shop
2(h) Whether or not continent
Mental state
3(a) Intellectual processes – memory and orientation of person and place
3(b) Loneliness and desolation
3(c) Depression
3(d) Boredom
3(e) Motivation towards independence
3(f) Anxiety
3(g) Antisocial or self-harming behaviour
Source: Williams A (1974) Measuring the effectiveness of health care systems. British Journal of Preventive and Social Medicine 28: 196–202. Reproduced with permission from the BMJ Publishing Group.
114 THE IDEAS AND INFLUENCE OF ALAN WILLIAMS
Williams described a health status classification system, first articulated in a paper co-authored with Culyer and Lavers (Culyer et al., 1972), based on three ‘divisions’ – mobility, capacity for self-care and mental state as shown in Box 10.1 – which defined 64 possible health states. He observed that ‘the major stumbling block at present is the absence of any widely used standardized descriptive categories of social functioning and that without these we cannot get off first base’ (Williams, 1974). He foresaw a two-stage solution to the problem of measuring the (dis)benefits of health care which ideally would be expressed ‘in monetary units commensurate with the relevant cost estimates’.
Health states defined by a set of descriptive categories of the type proposed in Box 10.1 would be assigned index values on a scale of ill-health intensity. The 10-point scale he proposed was not dissimilar to that originally described by Karnofsky et al. (1948), with endpoints ‘normal’ and ‘dead’ being valued as 0 and 10 respectively. The final step in the process was to attach money values to the index points, a process that he suggested might be undertaken using methods applied in constructing an index of the seriousness of crime (Sellin and Wolfgang, 1964).
At roughly the same time, Rachel Rosser was developing a separate descriptive classification system based on twin dimensions of disability and distress, divided into 8 and 4 levels respectively. This 8 × 4 system defined a total of 28 health states, since it was held that being unconscious (disability level 8) necessarily implied that there could be no distress. This simple generic classification had been designed using clinician focus groups. Initial attempts to associate a value with these 28 health states had involved the analysis of legal awards data determined in civil actions in English courts (Rosser and Watts, 1972). Judgements in these cases specifically dealt with the compensation awarded to plaintiffs in respect of loss of physical function (disability) and pain (distress). Rosser recognised important limitations in this approach: not least the metric itself, but also that other factors might be relevant in determining the value associated with different health states. These included the personal characteristics of those who made such value assignments, in particular their age, current health status and exposure to those in poor health. Additionally, she identified the importance of framing effects, including prognosis and the time spent in ill-health states. Using techniques imported from management science, Rosser conducted a series of interviews in which multiple valuation methods were used to elicit scores for disability/distress health states, including magnitude estimation, equivalence scaling and standard gamble procedures (Churchman et al., 1957). However, it was the magnitude estimation values derived from interviews with a convenience sample of 70 individuals with different current health experiences that formed the centrepiece of this work (Rosser and Kind, 1978).
Following the publication of these values for disability/distress states and further analysis around the valuation of death (Kind and Rosser, 1980), it was Alan Williams who proposed the notion of transforming the original values so that they took on the anchor points of 1 for full health and 0 for dead. It is worth noting that more than half the health states occupy the value space between 0.9 and 1.0 (as can be seen in Table 10.1). The subsequent publication of a scale of health state values based on the transformed median magnitude estimation data (Kind et al., 1982) ultimately proved to be something of a turning point, since it provided health economists, in the UK at least, with the first standardised generic measure with which QALYs could be computed using weights of domestic origin.
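The effect of re-anchoring a set of raw scores on 1 for full health and 0 for dead can be illustrated with a short sketch. The text does not state the form of the transformation that was applied, so the linear rescaling below is an assumption, and the function name and raw scores are hypothetical rather than values from the published matrix.

```python
def rescale_to_qaly_anchors(raw, raw_full_health, raw_dead):
    """Linearly rescale a raw score so that full health maps to 1 and dead to 0.

    The linear form is an illustrative assumption. It works whether the raw
    scale runs upwards or downwards with severity, and states scored as
    worse than dead receive negative values.
    """
    if raw_full_health == raw_dead:
        raise ValueError("the two anchor scores must differ")
    return (raw_dead - raw) / (raw_dead - raw_full_health)


# Hypothetical raw severity scores (higher means worse).
RAW_FULL_HEALTH = 0.0
RAW_DEAD = 200.0

for raw_score in (0.0, 50.0, 200.0, 300.0):
    print(raw_score, rescale_to_qaly_anchors(raw_score, RAW_FULL_HEALTH, RAW_DEAD))
```

Under these assumed anchors a raw score of 50 becomes 0.75, while a state scored at 300, that is worse than dead, becomes -0.5.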
TABLE 10.1 MEDIAN DISABILITY/DISTRESS STATE VALUATIONS BASED ON MAGNITUDE ESTIMATION
Writing at about this time, Weinstein and Stason (1977) observed that, although still controversial, methods for explicitly incorporating quality-of-life concerns into formal cost-effectiveness analyses were becoming more widely used and accepted. They went on to exemplify the mechanism by which the weighting system for such quality adjustment should be made, namely using standard gamble (SG) or TTO. A formal justification for the selection of ‘utility’ weights in this role in computing QALYs was not put forward in this paper (Pliskin et al. (1980) appear to provide that methodological argument).
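How SG or TTO responses become quality-adjustment weights, and how those weights enter a QALY total, can be sketched as below. The formulas are the usual textbook forms for a chronic state regarded as better than dead; they are offered as an illustration under that assumption and are not taken from Weinstein and Stason. All function names and figures are hypothetical.

```python
def tto_weight(years_full_health, years_in_state):
    """Time trade-off: the respondent is indifferent between years_in_state
    in the health state and a shorter years_full_health in full health."""
    if years_in_state <= 0:
        raise ValueError("years_in_state must be positive")
    if not 0 <= years_full_health <= years_in_state:
        raise ValueError("years_full_health must lie between 0 and years_in_state")
    return years_full_health / years_in_state


def sg_weight(indifference_probability):
    """Standard gamble: the respondent is indifferent between the state for
    certain and a gamble giving full health with this probability and
    death otherwise. The weight is the indifference probability itself."""
    if not 0.0 <= indifference_probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    return indifference_probability


def qalys(profile):
    """Sum weight * duration over a sequence of (weight, years) pairs."""
    return sum(weight * years for weight, years in profile)


# Hypothetical respondent: 10 years in the state judged equal to 7.5 healthy years.
weight = tto_weight(7.5, 10.0)

# Hypothetical profile: 4 years at that weight, then 2 years at a weight of 0.5.
print(weight, qalys([(weight, 4.0), (0.5, 2.0)]))
```

In this invented case the TTO weight is 0.75 and the six-year profile yields 4 QALYs. Discounting of future years is omitted to keep the sketch short.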
By the mid-1980s the ground had been prepared for a veritable explosion of research activity related to the science of quality-of-life measurement.
Established generic measures included the QWB and the Sickness Impact Proﬁle (SIP) (Bergner et al., 1976) of US origin as well as the UK analogue, the Nottingham Health Proﬁle (NHP) (Hunt et al., 1985). In Finland, Harri Sintonen (1981) had developed the 15D and in Canada the Health Utilities Index (HUI) was in being (Torrance et al., 1982). The long-form precursor to what became the SF-36 was already in place (Ware and Sherbourne, 1992).
At York, researchers were concentrating on the issue of valuation in general, and values for the disability/distress states in particular. The use of magnitude estimation methods was not itself a concern; rather, it was that the size and nature of Rosser’s sample of respondents rendered questionable the status of the resulting value matrix as representative of social preferences. Efforts to replicate the original value set produced equivocal results (Gudex et al., 1993).
In 1987 a group of researchers met in Rotterdam, at Alan Williams’ behest, with the objective of exploring their common interest in the valuation of health.