«Edited by ANNE MASON Research Fellow, Centre for Health Economics University of York and ADRIAN TOWSE Director, Ofﬁce of Health Economics Radcliffe ...»
At the end of the 1980s, Dutch policy makers were searching for ways to limit the growth of expenditure on health care. Reports were published with titles such as ‘Limits to growth’ (of the insurance package), and a national debate was fought over ‘Dunning’s filter’ (named after a professor of cardiology), a system devised to stop inefficient, costly and unnecessary technologies from receiving reimbursement. All this activity implied that choices had to be made about funding health care: choices about what should and should not be reimbursed, for whom, and under what conditions. Moreover, the mechanisms by which such choices were made needed to be transparent, because explicit choices are open to debate. At the same time, the Dutch government initiated a number of large studies, fashionably called ‘technology assessments’, to evaluate heart transplantation, liver transplantation and in vitro fertilisation. At the centre of these studies was the assessment of costs and effects. In those days, methods to assess the cost-effectiveness of public programmes could be found in the textbooks of Mishan (1982), Dasgupta and Pearce (1978) and Sugden and Williams (1978). They offered researchers a frame of reference in which changes were assessed using concepts such as ‘opportunity costs’, ‘equivalent variation’ and ‘compensating variation’.
‘Compensating variation’ refers to the amount of money that one has to give a person to make him as happy after a change as he was before that change. In 1978, Broome pointed out that it is somewhat difficult to establish
THE IDEAS AND INFLUENCE OF ALAN WILLIAMS
the amount of money needed to make dead people as happy as when they were alive and that ‘the attempt to value life in terms of money is more or less doomed to failure’ (Broome, 1978). A lively debate followed in which Alan Williams (Williams, 1979) was one of several famous economists (Jones-Lee, 1979; Mishan, 1981) who wrote replies to Broome’s critique. Mishan’s offer to help ‘to clear the cobwebs from his [Broome’s] mind and to restore perspective’ illustrates that the economists were not persuaded by Broome’s arguments. However, an atmosphere was created in which it seemed politically incorrect to value the effects of medical treatments in monetary terms. It is possible that this catalysed Alan Williams’s efforts to try to develop a less controversial measure that could be used without the accusation of applying a single-minded, short-sighted, internally inconsistent pseudo-science.
THE EUROQOL GROUP
Researchers in York were not the only ones searching for a scale that would allow treatment effects in different therapeutic areas to be compared, for the sole purpose of application in economic analyses. Dutch researchers were facing the same questions, and the contact between York and Rotterdam was also mediated via Brunel, where Martin Buxton was sharing similar experiences concerning the evaluation of heart and liver transplantation. In my view, the central problem posed by the founding members of the EuroQol group was not – as Kind states – whether values for health differ between countries, but rather an economic one: to devise a metric that could be used in economic evaluations that would facilitate the decision-making process for policy makers. The presence of psychologists in the group was – from the perspective of the economists – instrumental. They didn’t share the same problem, but they did hold most of the solution. And while the landmark publication about the cost-effectiveness of bypass surgery (Williams, 1985) could never have been written without the work of Rosser and Kind, it should be noted that QALYs have always been designed as a solution to an economic problem: the allocation of scarce resources within health care based on the assessment of costs and effects.
QALY
The ‘best’ way to derive the values which inform the Q element of the QALY may remain the subject of debate for many years to come. This process may be prolonged if the same system is also expected to be used for decisions other than those to do with resource allocation. One reason for this is that it is often very unclear what people mean by ‘best’. When the goal is to support decision making, ‘best’ is indeed the extent to which the public remains convinced of the probity of the process and its outcomes. And indeed, when harsh decisions are taken there will always be groups who are disadvantaged and who will challenge the methods that were used.
DISCUSSION OF ‘PUTTING THE “Q” IN QALYS’
However, as Kind notes, the occasions where QALY calculations have been crucial are limited. From a decision maker’s perspective, the calculation of the balance between cost and effects is like a diagnostic test. Sometimes an intervention is clearly not cost-effective, sometimes it clearly is cost-effective, but often one has to do some additional work. This may be in the form of additional research, additional considerations or both. The balance between costs and effects is assessed to define whether we are in a white, a black or a grey area. Thresholds, such as, say, £30 000 per QALY gained, may be a trigger for doubts and for further thoughts. Such doubts are usually more about whether the threshold is correct, whether there are sufficient numbers of patients, whether the trial data are generalisable or whether there should be certain restrictions, than about the valuation technique used to calculate the QALY weights. This leads to a more pragmatic approach, or what may be termed ‘a decision-maker’s approach’. Such a view is probably quite close to the one held by Alan Williams, and one wonders whether he was one of the purist health economists Kind refers to.
Alan’s decisions to transform the values under 0 to a limit of –1, and to use means instead of medians (something he later seemed to regret), suggest that he was not. For him, political acceptability seemed to score more highly than scientific rigour. Additionally, Alan was also very aware of Joan Robinson’s view that ‘Utility is a metaphysical concept of impregnable circularity; utility is the quality in commodities that makes individuals want to buy them, and the fact that individuals want to buy them shows that they have utility’ (Robinson, 1962). In other words, ‘real economists’ know that they can’t measure utility, but can only derive it by observing real behaviour, something that in health care is rarely done. Instead, economists are asked to ‘prescribe’ what a society should decide rather than ‘describe’ what ‘typical’ people do decide. Moreover, they do not really have any experience with doing this. Given that the whole question is about choices, and given that it has to be done on a collective basis, it may be best to derive the answers by asking people to decide in imaginary situations. Whatever the method, the concept of choice is eminently present.
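The diagnostic-test view of cost-effectiveness described in this section can be illustrated with a minimal Python sketch. Only the £30 000 per QALY threshold comes from the text. The function names `icer` and `triage`, and the width of the grey band (here 50% either side of the threshold), are hypothetical choices made purely for illustration and are not values used by any decision-making body.

```python
THRESHOLD_GBP_PER_QALY = 30_000  # example threshold mentioned in the text
GREY_BAND = 0.5  # hypothetical: ratios within +/-50% of the threshold are 'grey'


def icer(extra_cost: float, extra_qalys: float) -> float:
    """Incremental cost-effectiveness ratio: extra cost per QALY gained."""
    if extra_qalys <= 0:
        raise ValueError("this sketch only handles interventions that gain QALYs")
    return extra_cost / extra_qalys


def triage(extra_cost: float, extra_qalys: float,
           threshold: float = THRESHOLD_GBP_PER_QALY,
           band: float = GREY_BAND) -> str:
    """Classify an intervention the way a diagnostic test classifies a patient."""
    ratio = icer(extra_cost, extra_qalys)
    if ratio < threshold * (1 - band):
        return "clearly cost-effective"
    if ratio > threshold * (1 + band):
        return "clearly not cost-effective"
    return "grey area"  # additional research or considerations needed


for cost, qalys in [(5_000, 1.0), (30_000, 1.0), (100_000, 1.0)]:
    print(f"£{icer(cost, qalys):,.0f} per QALY -> {triage(cost, qalys)}")
```

The point of the sketch is that the ratio only sorts interventions into three zones. What happens inside the grey zone is a matter of further judgement and is not computed here.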
VALUATION METHODS
The fact that decision makers might be less worried about the theoretical underpinnings of their value sets than purist economists (whoever these are) does not mean that they do not have any preferences about the ideal attributes for a value set. A good start might be a scale that puts 1 at perfect health, dead at 0, non-perfect states that are better than dead between 0 and 1, and states worse than dead below 0. Additionally, the scale would ideally have cardinal properties such that a year in a state of 0.50 is about equal to two years in
0.25. The latter is a harsh requirement whatever the technique used to derive the scale. As Kind points out, time trade-off and standard gamble seem to be the preferred methods, with visual analogue scaling, category rating and
some discrete choice models following at a respectable distance. According to Kind, this distance was created by NICE and not by theory. Unfortunately, nobody will ever know whether he is right or wrong. There is no gold standard, and the fact that discrete choice models are often used in other areas does not mean that they give the right scale, either in their existing applications or in health.
When the aim is to value and compare a number of life years gained in states x, y and z, time trade-off (TTO) seems to be the method that most closely reflects the question, in that it explicitly asks for the value of a given number of years in these health states. In contrast, standard gamble requires one to imagine a risk-taking situation. Visual analogue scaling and category rating only have an implicit choice element. Furthermore, all the discrete-choice methods need additional heroic assumptions before being able to derive a meaningful scale.
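The link between the time trade-off question and the cardinal scale described above can be made concrete with a small Python sketch. It assumes the usual reading of a TTO answer for states better than dead: a respondent who is indifferent between `years_in_state` years in a health state and `years_full_health` years in perfect health implies a weight equal to the ratio of the two. The function names and the 10-year horizon in the example are illustrative assumptions, not details taken from the text.

```python
def tto_weight(years_full_health: float, years_in_state: float) -> float:
    """Weight implied by indifference between the two options (better than dead only)."""
    if years_in_state <= 0 or not 0 <= years_full_health <= years_in_state:
        raise ValueError("expected 0 <= years_full_health <= years_in_state, horizon > 0")
    return years_full_health / years_in_state


def qalys(profile) -> float:
    """Total QALYs for a list of (weight, years) pairs."""
    return sum(weight * years for weight, years in profile)


# Indifferent between 5 years in perfect health and 10 years in the state.
print(tto_weight(5, 10))  # 0.5

# Cardinality: one year at 0.50 counts the same as two years at 0.25.
print(qalys([(0.50, 1)]) == qalys([(0.25, 2)]))  # True
```

Because the TTO question is itself phrased in years, the resulting weight can be multiplied by duration without any further transformation, which is the sense in which the method reflects the original question most closely.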
Research has shown that each method produces a slightly different scale.
All methods seem to have their pros and cons. In order to decide whether one is better than any other, one has to define ‘better’. Better – in light of the use in economic evaluations (the beginning of the EuroQol group) – should be concerned with whether a different method would lead to decisions that better reflect what the majority of society prefers. At this point one may also wonder whether a different method would lead to any change in the decisions that are currently being taken. In addition, the decisions have to be defended in public and, in the absence of a gold standard, basing them on a method that seems intuitively closest to the original question seems as good as any other.
There are a number of different value functions for EQ-5D available, based on different sets of data. The so-called ‘A1 tariff’, as derived from the TTO questions in the MVH study, is probably most often used. This is not only in the UK but also in the Netherlands, where the logical preference for a Dutch tariff does not always prevail. The perceived effect on the probability of an international publication often outweighs any other argument. Indeed, it is questionable whether using other value sets leads to different orderings of therapies which should be reimbursed. And indeed it may be suggested – as Kind does – that it is a scientific duty to keep on checking whether we are still on the right path. However, there is also something of a scientific duty to focus one’s brain-power where it is most needed. This may be what Claxton (1999) refers to when considering the value of perfect information. And one may wonder whether Alan Williams wasn’t one of the first to apply this concept when allocating his research activities away from QALYs, towards other issues such as those about equity weights.
THE GOLD STANDARD
Each valuation method has its own advantages and disadvantages, and one may never get to a gold standard in the sense of a perfect diagnostic test. The term ‘gold standard’, however, may apply in its more traditional meaning. Just as in the late 19th century gold was arbitrarily chosen above silver after years of attempts to maintain a bimetallic standard, TTO might just be accepted as the standard. Maybe one should just accept that Alan Williams – or the MVH study for that matter – has defined TTO as the gold standard, just like the Germans decided in favour of gold when they wanted to be paid after the Franco-Prussian War.
ABOUT TTO
That TTO may be identified as the ‘reasonable’ Alan Williams way, and thus as the way to go, does not imply that it is beyond improvement. For example, TTO was used to estimate the impact of various degrees of erectile dysfunction, yielding a QALY weight of 0.74 for the completely dysfunctional state (Stolk et al., 2000). This led to a very favourable cost-effectiveness ratio for Viagra.
But the Dutch government decided not to reimburse it. This suggests that the valuation was not accepted as really reflecting the disease burden. And indeed, taking a 25% chance of dying on an operating table, or being in a coma for almost two days a week, just to be perfectly ‘erectile ready’ for the rest of the week, may seem a rather high price. This is especially so considering that medications that have to be injected into the penis – and that offer effective relief of the problem – are hardly ever used. This type of revealed preference suggests that maybe this isn’t as serious a health problem as Stolk’s work implies.
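The two illustrations in the paragraph above follow from simple arithmetic on the weight of 0.74, sketched here in Python. The function names are hypothetical. Reading the weight as an acceptable risk of death borrows the logic of the standard gamble, and so assumes that a standard gamble would have yielded the same value as TTO, which, as noted earlier, need not hold exactly.

```python
DAYS_PER_WEEK = 7


def share_of_life_given_up(weight: float) -> float:
    """Fraction of remaining time (TTO) or risk of death (standard gamble
    reading) that the weight implies a person would accept to be cured."""
    return 1 - weight


def days_given_up_per_week(weight: float) -> float:
    """The same fraction expressed as days per week."""
    return share_of_life_given_up(weight) * DAYS_PER_WEEK


weight = 0.74  # QALY weight reported for complete erectile dysfunction
print(f"{share_of_life_given_up(weight):.0%}")       # 26%
print(f"{days_given_up_per_week(weight):.1f} days")  # 1.8 days
```

A weight of 0.74 thus corresponds to roughly a one-in-four risk, or a little under two days out of every seven, which is the magnitude the text questions.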
Does this study mean that we should use a different method to TTO? Not necessarily, but it does suggest that maybe one should be careful applying TTO in a disease-specific context.
Deﬁning TTO as the gold standard does not mean that one should stop exploring alternatives. However, it would be foolish not to anchor that work within the rich experience that is already available. Additionally, it would be rather foolish to step away from the face-validity that TTO brings with it.
Any discrete-choice study aiming to establish value sets could be improved if informed by TTO values for a number of the states being evaluated. Analyses are needed in which times are traded off without using ‘perfect health’ as a comparator. Additionally, a deeper understanding of the values elicited using TTO is needed. The observation that respondents especially disagree about the positioning of ‘death’ (Macran and Kind, 2001) warrants further research.
ABOUT ALAN WILLIAMS
I think that knowing Alan Williams personally – talking to him at conferences – increased my scientific ‘street cred’ among other Dutch researchers. Knowing Alan was ‘cool’. Moreover, he has helped us, less talented health economists, so often. Whenever some ethicist stood up to challenge the fruits of our research, he was the first to take up the challenge and did so with a flair that many can
only aspire to ever reach. He led an international battle, not just a personal or a UK one.
Alan Williams made an impact on many, and those who met him or read his work will easily remember him. There are parts of the world where one is not really dead as long as one is still remembered. This may imply that one is ‘more dead’ when remembered by only one person than when remembered by thousands; perhaps this might be scored on a type of ‘scale’, measuring how much someone lives on in other people’s memories. On this scale – assuming that health economists count too – he might be close to being alive. I like that thought.
ACKNOWLEDGEMENTS
I would like to thank Susan Macran for very helpful comments.