Abstract. Two measures of association for dichotomous variables, the phi-coefficient and the tetrachoric correlation coefficient, are reviewed and differences between the two are discussed in the context of the famous so-called Pearson-Yule debate, which took place in the early 20th century. The two measures of association are given mathematically rigorous definitions, their underlying assumptions are formalized, and some key properties are derived. Furthermore, the existence of a continuous bijection between the phi-coefficient and the tetrachoric correlation coefficient under given marginal probabilities is shown. As a consequence, the tetrachoric correlation coefficient can be computed using the assumptions of the phi-coefficient construction, and the phi-coefficient can be computed using the assumptions of the tetrachoric correlation construction. The efforts lead to an attempt to reconcile the Pearson-Yule debate, showing that the two measures of association are in fact more similar than different and that between the two, the choice of measure of association does not carry a substantial impact on the conclusions of the association analysis.

Key words and phrases. Phi-coefficient, Tetrachoric Correlation Coefficient, 2×2 Contingency Tables, Measures of Association, Dichotomous Variables.

Financial support from the Jan Wallander and Tom Hedelius Research Foundation, project P2008-0102:1, is gratefully acknowledged.



Figure 1. (a) Karl Pearson (1857-1936); (b) George Udny Yule (1871-1951). The Pearson portrait is from Pearson (1938) and is in the public domain. The Yule portrait is from Yule et al. (1971), reproduced with the kind permission of Hodder & Stoughton.

1. Introduction

The phi-coefficient and the tetrachoric correlation coefficient are two measures of association for dichotomous variables. The association between variables is of fundamental interest in most scientific disciplines, and dichotomous variables occur in a wide range of applications. Consequently, measures of association for dichotomous variables are useful in many situations. In medicine, for example, many phenomena can only be reliably measured in terms of dichotomous variables. Another example is psychology, where many conditions can only be reliably measured in terms of, for instance, diagnosed or not diagnosed. Data is often presented in the form of 2 × 2 contingency tables. A historically prominent example is Pearson’s smallpox recovery data, see Table 1, studying possible association between vaccination against, and recovery from, smallpox infection.

Another interesting data set is Pearson’s diphtheria recovery data, Table 2, studying possible association between antitoxin serum treatment and recovery from diphtheria.

Figure 2. Care at the Hampstead fever hospital, London 1872. One of many hospitals opened for the sick poor by the Metropolitan Asylums Board in the late 19th century. With the kind permission of workhouses.org.uk.

Measures of association for dichotomous variables is an area that has been studied from the very infancy of modern statistics. One of the first scholars to treat the subject was Karl Pearson, one of the fathers of modern statistics. In the 7th article in the seminal series Mathematical contributions to the theory of evolution, Pearson (1900) proposed what later became known as the tetrachoric correlation coefficient, as well as, Pearson would later argue, the phi-coefficient. The fundamental idea of the tetrachoric correlation coefficient is to consider the 2 × 2 contingency table as a double dichotomization of a bivariate standard normal distribution, and then to solve for the parameter such that the volumes of the dichotomized bivariate standard normal distribution equal the joint probabilities of the contingency table. The tetrachoric correlation coefficient is then defined as that parameter, which, of course, corresponds to the linear correlation of the bivariate normal distribution.
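The construction just described can be sketched numerically. The following Python sketch is not taken from the article and is not Pearson's own computational procedure; it assumes that all four joint probabilities are strictly positive, that category 1 of each variable corresponds to the latent normal variable falling at or below its threshold, and it uses bisection and Simpson's rule as convenient (not canonical) numerical choices. The names `bvn_cdf`, `tetrachoric`, and `p11`, `p10`, `p01`, `p00` are hypothetical.

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # standard normal: mean 0, standard deviation 1


def bvn_cdf(h, k, rho, n=400):
    """P(U <= h, V <= k) for a standard bivariate normal with correlation rho.

    Uses Phi2(h, k; rho) = Phi(h) * Phi(k) + integral_0^rho phi2(h, k; r) dr
    with the substitution r = sin(t), evaluated by composite Simpson's rule.
    Requires |rho| < 1 and an even number of subintervals n.
    """
    upper = math.asin(rho)

    def integrand(t):
        c = math.cos(t)
        return math.exp(-(h * h - 2.0 * h * k * math.sin(t) + k * k) / (2.0 * c * c))

    step = upper / n  # negative when rho < 0; Simpson's rule still applies
    total = integrand(0.0) + integrand(upper)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * integrand(i * step)
    return _STD.cdf(h) * _STD.cdf(k) + (total * step / 3.0) / (2.0 * math.pi)


def tetrachoric(p11, p10, p01, p00, tol=1e-10):
    """Solve bvn_cdf(h, k, rho) = p11 for rho by bisection.

    Assumes the four joint probabilities are strictly positive and sum to 1.
    """
    h = _STD.inv_cdf(p11 + p10)  # threshold matching the marginal P(X = 1)
    k = _STD.inv_cdf(p11 + p01)  # threshold matching the marginal P(Y = 1)
    lo, hi = -0.999999, 0.999999
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        # The orthant probability is increasing in rho, so bisection applies.
        if bvn_cdf(h, k, mid) < p11:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical table with both marginals equal to 1/2. In that case the
# orthant probability has the closed form 1/4 + asin(rho) / (2 * pi),
# so the cell p11 = 1/3 corresponds to rho = 0.5.
print(round(tetrachoric(1 / 3, 1 / 6, 1 / 6, 1 / 3), 6))
```

The table in the usage line is made up for illustration and chosen only because the answer can be checked by hand; it is not one of Pearson's data sets.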

According to Pearson’s colleague Burton H. Camp (1933), Pearson considered the tetrachoric correlation coefficient one of his most important contributions to the theory of statistics, right beside his system of continuous curves, the chi-square test, and his contributions to small sample statistics. However, the tetrachoric correlation coefficient suffered in popularity because of the difficulty of its computation. Throughout his career, Pearson published statistical tables aimed at reducing that difficulty (Camp, 1933), reflecting an interest in promoting a wider adoption of the tetrachoric correlation coefficient among practitioners.

While the tetrachoric correlation coefficient is the linear correlation of a so-called underlying bivariate normal distribution, the phi-coefficient is the linear correlation of an underlying bivariate discrete distribution. This measure of association was independently proposed by Boas (1909), Pearson (1900), Yule (1912), and possibly others.
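As a concrete reading of "the linear correlation of an underlying bivariate discrete distribution", the sketch below computes the phi-coefficient from the four joint probabilities of a 2 × 2 table and compares it with the ordinary Pearson correlation of numerically coded variables. The cell notation `p11`, `p10`, `p01`, `p00`, the function names, and the numeric codes are assumptions made for illustration; they are not notation from the article.

```python
import math


def phi_coefficient(p11, p10, p01, p00):
    """Phi-coefficient from joint probabilities (assumed to sum to 1)."""
    row = p11 + p10  # P(X = 1)
    col = p11 + p01  # P(Y = 1)
    return (p11 * p00 - p10 * p01) / math.sqrt(row * (1 - row) * col * (1 - col))


def coded_correlation(p11, p10, p01, p00, a=(0.0, 1.0), b=(0.0, 1.0)):
    """Pearson correlation when X is coded a[1] in category 1 and a[0]
    otherwise, and Y is coded b[1] in category 1 and b[0] otherwise."""
    cells = [(a[1], b[1], p11), (a[1], b[0], p10),
             (a[0], b[1], p01), (a[0], b[0], p00)]
    ex = sum(x * p for x, _, p in cells)
    ey = sum(y * p for _, y, p in cells)
    cov = sum((x - ex) * (y - ey) * p for x, y, p in cells)
    vx = sum((x - ex) ** 2 * p for x, _, p in cells)
    vy = sum((y - ey) ** 2 * p for _, y, p in cells)
    return cov / math.sqrt(vx * vy)


# Hypothetical joint probabilities, made up for illustration.
table = (0.4, 0.1, 0.2, 0.3)
print(phi_coefficient(*table))
print(coded_correlation(*table))                                   # 0/1 coding
print(coded_correlation(*table, a=(-3.0, 7.0), b=(2.0, 5.0)))      # other codes
```

Because a two-valued variable can be recoded only by an affine map, the correlation does not depend on which numeric codes are chosen, as long as the order of the two codes is kept; reversing the order of one variable's codes flips the sign.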

The question of whether the underlying bivariate distribution should be considered continuous or discrete is at the core of the so-called Pearson-Yule debate. In the historical context of the Pearson-Yule debate, though, it is important to understand that no one at the time looked upon these two measures of association as the linear correlations of different underlying distributions, the framework in which both were presented in the preceding paragraph. On the contrary, according to Yule (1912) the tetrachoric correlation coefficient is founded upon ideas entirely different from those upon which the phi-coefficient is founded. The sentiment is echoed by Pearson & Heron (1913), which even claims that the phi-coefficient is not based on a reasoned theory, while at the same time arguing for the soundness of the tetrachoric correlation coefficient. In fact, the point of view that both measures of association are the linear correlations of underlying distributions is one of the contributions of the present article.

1.1. The Pearson-Yule debate. George Udny Yule, a former student of Pearson, favored the approach of an inherently discrete underlying distribution. Yule (1912) is a comprehensive review of the area of measures of association for dichotomous variables, as well as a response to Heron (1911), and contains blunt criticism of Pearson’s tetrachoric correlation coefficient. Regarding the tetrachoric correlation coefficient’s assumptions of underlying continuous variables, Yule (1912) reads:

Here, I am concerned rather with the assumptions and their applicability. [...] Those who are unvaccinated are all equally non-vaccinated, and similarly, all those who have died of small-pox are all equally dead. [...] From this standpoint Professor Pearson’s assumptions are quite inapplicable, and do not lead to the true correlation between the attributes. But this is not, apparently, the standpoint taken by Professor Pearson himself.

The example that Yule (1912) refers to is the smallpox recovery data, which was prominently featured in Pearson (1900), see Table 1.

Yule (1912) also contains a bibliographical discussion which could be interpreted as questioning whether Pearson really is the originator of some of the ideas that Pearson claimed credit for. In all, Pearson quite evidently felt offended by some of Yule’s wording and was upset by his former student’s publicly expressed, and in Pearson’s opinion uninformed, misgivings about the tetrachoric correlation coefficient. From there on, it is by most accounts fair to say that the debate lost all proportion.

Pearson & Heron (1913) is a scathing reply, almost 200 pages long. The introduction reads:


The recent paper by Mr Yule calls for an early reply on two grounds, first because of its singularly acrimonious tone [...], and secondly because we believe that if Mr Yule’s views are accepted, irreparable damage will be done to the growth of modern statistical theory. Mr Yule has invented a series of methods which are in no case based on a reasoned theory, but which possess the dangerous fascination of easy application [...], and therefore are seized upon by those who are without adequate training in statistical theory.

With regard to the smallpox recovery example, Pearson & Heron (1913) replies:

Recovery and death in cases of small-pox were used to measure a continuous variable - the severity of the attack. [Moreover,] vaccination regarded as conferring immunity is an essentially continuous variable.

With respect to Yule’s contrasting view of the dichotomous variables as inherently discrete, while still unidimensional, Pearson & Heron (1913) rhetorically counter-asks:

Does Mr Yule look upon death as the addition of one unit to recovery?

Pearson may also have taken offense at the fact that Yule wrote a review on one of the regarded Professor’s favorite topics. Pearson & Heron (1913) mentions Yule’s statistical textbook on several occasions.

It may be said that a vigorous protest against Mr Yule’s coefficient is unnecessary. We believe on the contrary that, if not made now and made strongly, there will be great set-back to both modern statistical theory and practice. The publication of Mr Yule’s text-book has resuscitated the use of his coefficient of association; it is now being used in all sorts of quarters on all sorts of unsuitable data. The coefficient of association is in our opinion wholly fallacious, it represents no true properties of the actual distribution, and it has no adequate physical interpretation.

JOAKIM EKSTRÖM

The exchange became known as the Pearson-Yule debate. The tone was indeed caustic, and many readers likely felt intimidated by the gravity of the accusations; Camp (1933) acknowledges that it may have contributed to Pearson’s reputation of being unkind.

In the end, though, it is important to point out that Yule wrote Pearson’s obituary for the Royal Society (Yule & Filon, 1936) and that, according to Kendall (1952), Yule was deeply affected by Pearson’s death.

The unresolved nature of the debate must also have had the negative effect that practitioners and fellow statisticians alike were left in doubt about what measure of association to use in different situations. The tone of the debate leaves the reader with the impression that the choice of measure of association is almost a matter of life and death. That is, of course, not quite the case. In fact, one of the conclusions of the present article is that between the two, the choice does not carry a substantial impact on the conclusions of the association analysis. So quite on the contrary, as will be seen, practitioners have no reason to be anxious. And neither Pearson nor Yule, as will also be seen, really had any reason to fear for the future of modern statistics.

1.2. Outline of the present article. The core of the Pearson-Yule debate is about the assumptions implied by the two measures of association. In this article, a close look at the two measures of association will be taken and the implied assumptions will be pinpointed and formalized. Pearson & Heron (1913) argued that dichotomous variables should be considered dichotomizations of continuous underlying variables, while Yule (1912) argued that they should be considered inherently discrete. In this article, however, it is shown that under given marginal probabilities there exists a continuous bijection between the two measures of association, which moreover has a fixed point at zero for all marginal probabilities. Consequently, both measures of association can be computed equally well no matter whether the variables are considered dichotomizations of continuous variables or not. As long as one of the assumptions is deemed appropriate, it does not make a difference which one it is. As a consequence, it turns out, whether to use the tetrachoric correlation coefficient or the phi-coefficient is in principle a matter of preference only.
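The relationship summarized in this outline can be explored numerically. The sketch below is an illustration under stated assumptions, not the article's proof: it fixes two marginal probabilities, maps a tetrachoric correlation to the phi-coefficient through the bivariate normal orthant probability, and inverts that map by bisection. The function names, the threshold convention (category 1 when the latent variable is at or below its threshold), the example marginals 0.3 and 0.6, and the requirement that the phi value lie strictly inside the range attainable for those marginals are all assumptions of this sketch.

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # standard normal: mean 0, standard deviation 1


def bvn_cdf(h, k, rho, n=400):
    """P(U <= h, V <= k) for a standard bivariate normal, |rho| < 1.

    Phi(h) * Phi(k) plus the integral of the bivariate density over the
    correlation from 0 to rho, after substituting r = sin(t); the integral
    is evaluated with composite Simpson's rule (n must be even).
    """
    upper = math.asin(rho)

    def integrand(t):
        c = math.cos(t)
        return math.exp(-(h * h - 2.0 * h * k * math.sin(t) + k * k) / (2.0 * c * c))

    step = upper / n
    total = integrand(0.0) + integrand(upper)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * integrand(i * step)
    return _STD.cdf(h) * _STD.cdf(k) + (total * step / 3.0) / (2.0 * math.pi)


def phi_from_tetrachoric(rho, row, col):
    """Phi-coefficient of the 2 x 2 table implied by rho and the marginals."""
    h, k = _STD.inv_cdf(row), _STD.inv_cdf(col)
    p11 = bvn_cdf(h, k, rho)
    return (p11 - row * col) / math.sqrt(row * (1 - row) * col * (1 - col))


def tetrachoric_from_phi(phi, row, col, tol=1e-10):
    """Invert phi_from_tetrachoric by bisection (the map is increasing)."""
    lo, hi = -0.999999, 0.999999
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if phi_from_tetrachoric(mid, row, col) < phi:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical marginal probabilities P(X = 1) = 0.3 and P(Y = 1) = 0.6.
row, col = 0.3, 0.6
for rho in (-0.8, -0.4, 0.0, 0.4, 0.8):
    phi = phi_from_tetrachoric(rho, row, col)
    back = tetrachoric_from_phi(phi, row, col)
    print(f"rho={rho:+.2f}  phi={phi:+.4f}  recovered rho={back:+.4f}")
```

In this sketch a tetrachoric correlation of zero maps to a phi-coefficient of zero, because at zero correlation the orthant probability factorizes into the product of the marginals.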

The main result of this article, that there exists a continuous bijection between the phi-coefficient and the tetrachoric correlation coefficient under given marginal probabilities, has not been found in the literature. Guilford & Perry (1951) and Perry & Michael (1952) use series expansion of the integral equation of the tetrachoric correlation coefficient to find an approximate formula of the tetrachoric correlation coefficient as a function of the phi-coefficient whose errors, according to Perry & Michael, “are negligible for values of [the approximate tetrachoric correlation coefficient] less than |0.35| and probably relatively small for values of [the approximate tetrachoric correlation coefficient] between |0.35| and |0.6|”. Though Guilford & Perry and Perry & Michael consider the relationship between the phi-coefficient and the tetrachoric correlation coefficient, their result does not imply a continuous bijection.

In Section 2, the phi-coefficient and the tetrachoric correlation coefficient are introduced, necessary assumptions formalized, and a proof that the tetrachoric correlation coefficient is well defined is given. In Section 3, the main theorem of this article is stated and proved, and its implications are briefly discussed. Thereafter, in Section 4, some numerical examples and graphs of the relation between the phi-coefficient and the tetrachoric correlation coefficient are considered. And finally, the article is concluded with Section 5.

2. The two measures of association

2.1. Dichotomous variables. Let X and Y be two dichotomous variables. In the most general setting, the values of a dichotomous variable cannot be added, multiplied, ordered, or otherwise acted on by any binary operator, save projection. The algebraically most stringent way to model a dichotomous variable is to define it as a random element X : Ω → C, where the sample space C is an

