
University of California

Los Angeles

Bridging the Gap Between Tools for Learning

and for Doing Statistics

A dissertation submitted in partial satisfaction

of the requirements for the degree

Doctor of Philosophy in Statistics

by

Amelia Ahlers McNamara

© Copyright by

Amelia Ahlers McNamara

Abstract

of the Dissertation

Bridging the Gap Between Tools for Learning

and for Doing Statistics

by

Amelia Ahlers McNamara

Doctor of Philosophy in Statistics

University of California, Los Angeles, 2015

Professor Robert L Gould, Co-chair

Professor Fredrick R Paik Schoenberg, Co-chair

Computers have changed the way we think about data and data analysis. While statistical programming tools have attempted to keep pace with these developments, there is room for improvement in interactivity, randomization methods, and reproducible research.

In addition, as in many disciplines, statistical novices tend to have a reduced experience of the true practice. This dissertation discusses the gap between tools for teaching statistics (like TinkerPlots, Fathom, and web applets) and tools for doing statistics (like SAS, SPSS, or R). We posit this gap should not exist, and provide some suggestions for bridging it. Methods for this bridge can be curricular or technological, although this work focuses primarily on the technological.

We provide a list of attributes for a modern data analysis tool to support novices through the entire learning-to-doing trajectory. These attributes include easy entry for novice users, data as a first-order persistent object, support for a cycle of exploratory and confirmatory analysis, flexible plot creation, support for randomization throughout, interactivity at every level, inherent visual documentation, simple support for narrative, publishing, and reproducibility, and flexibility to build extensions.

While this ideal data analysis tool is still a few years in the future, we describe several projects attempting to close the gap between tools for learning and tools for doing statistics. One is curricular, a high school level Introduction to Data Science curriculum. Others are technological, like the experimental LivelyR interface for interactive analysis, the MobilizeSimple R package, and Shiny microworlds.

Much of this work was inspired by Biehler (1997), which describes the attributes necessary for a software package for learning statistics. Biehler’s vision has largely been realized, but advancements in computing and ‘data science’ have moved the goalposts, and there is more to learning statistics than before. This work envisions a tool not only encompassing these advancements, but providing an extensible system capable of growing with the times.

List of Figures

5.5 LivelyR interface showing a histogram cloud of miles per gallon.

5.6 Regression guess functionality in LivelyR.

Acknowledgments

This work owes a debt of gratitude to Rolf Biehler. I have been wrestling with these issues for several years, and after coming across Biehler’s 1997 paper “Software for Learning and for Doing Statistics,” things started to click into place.

I need to acknowledge all the people who have helped and supported me throughout this process. First, none of this would have been possible without the incomparable Glenda Jones, whose salary should be at least doubled.

Of course, my committee helped me frame and work through my dissertation.

In particular, Mark Hansen, Rob Gould, and Alan Kay discussed this work with me repeatedly over the course of years.

Also instrumental was my family, Kathy Ahlers, Curt McNamara, Alena McNamara, and Mitch Halverson. They taught me math, encouraged me to change paths to follow my heart, offered countless words of advice, and cooked me dinner when I really needed it.

I wouldn’t have made it through without my friends and colleagues at UCLA, particularly Terri Johnson, LeeAnn Trusela, James Molyneux, and Josh EmBree.

And, from the Macalester math department, my friend and mentor Chad Higdon-Topaz. Chad was with me through college, during my graduate school applications and decision making, and has continued to be available to me at the drop of a hat. He has given me his opinion on everything from the direction my research might go to the appropriate outfit to wear to an interview.

It takes a village to get through graduate school. I am very grateful to my village.

Some of the material in Chapter 5 is based on work presented or published elsewhere:

Gould, R., Johnson, T., McNamara, A., Molyneux, J., and Moncada-Machado, S. (2015). Introduction to Data Science. Mobilize: Mobilizing for Innovative Computer Science Teaching and Learning.

Lunzer, A. and McNamara, A. (2014). It ain’t necessarily so: Checking charts for robustness. In IEEE Vis 2014.

Lunzer, A., McNamara, A., and Krahn, R. (2014). LivelyR: Making R charts livelier. In useR! Conference.

McNamara, A. (2013a). Mobilize wiki: RStudio. http://wiki.mobilizingcs.org/rstudio

McNamara, A. and Hansen, M. (2014). Teaching data science to teenagers. In ICOTS-9.

McNamara, A. and Molyneux, J. (2014). Teaching R to high school students. In useR! Conference.

This material is also based upon work supported by the National Science Foundation Grant No. DRL-0962919.





The LivelyR work was done jointly with Aran Lunzer, who deserves most of the credit. Both Lunzer and I are associated with the Communications Design Group.

The bibliography is formatted in APA-like format, and programming languages and R packages are denoted using the styles prescribed by the Journal of Statistical Software.

Vita

2010: B.A. Mathematics and English, Macalester College

2013-2015: Teaching Assistant/Teaching Fellow, Statistics Department, University of California, Los Angeles

2011-2014: Graduate Student Researcher, Mobilize Project

2013-2015: Graduate Intern, Communications Design Group

1 Introduction

We live in a world where the term “Big Data” is splashed across every media outlet. From the Large Hadron Collider to Twitter and Google Maps, we and our built world are creating data constantly. This data deluge has expanded society’s view of data. No longer are we limited to spreadsheets filled with numbers; now almost anything can become data in statistical applications. Photos, maps, the text of books – the field is limitless. While there is no accepted definition of Big Data, it is often spoken of in terms of volume, velocity, and variety (Dumbill, 2012). That is, big data isn’t just big in terms of storage space. It can also be big because it is streaming in at high velocity (as Twitter data does), or because of the extreme variety of data types (as in the use of photos and text as data). All of these features require new computational methods to address, and they can lead to new ethical issues as well (Crawford et al., 2013).

Because of the volume and variety of data now available, it is even easier to mash up or de-anonymize data. The vast majority of people can be uniquely identified using just zip code, gender, and date of birth, which has prompted work on new methods to keep data truly private (Sweeney, 2002). Having the power to merge data across many sources can expose facts about the world, create targeted advertisements, or provide arguments for new legislation.

‘Power’ is a crucial word here. For those with access to the data and tools to analyze it (corporations, newspapers, policy makers, scientists), data are power.

Data and data products, such as statistics, models, and visualization, are used to convey information, to argue, and to tell stories. Hal Varian thinks that being a statistician is a sexy job because Google has built their company on data – optimizing search results, doing machine translation, developing self-driving cars. Gone are the days when statisticians were only employed as academics or insurance agents. Today, every major corporation keeps data scientists on staff.

Outside of the corporate world, every scientific domain has recognized the value of data, and the humanities are not far behind. Whether it is a computational biologist sequencing genes or an English professor doing ‘far reading’ analysis on a corpus of text, practitioners have begun incorporating data everywhere. For the informed consumer of data, statistics and statistical graphics are as susceptible to biases and inaccuracies as journalistic accounts. But, without at least some statistical knowledge, citizens are highly unlikely to critique what they read, and thus tend to accept it as fact. So the ability to unpack data analysis and statistics is crucial to the informed citizen today.

This explosion of data and data analysis was wrought by computers. The volume, velocity, and variety of data are all made possible by computerized sensors, massive server space, the internet, and digital technologies of all kinds.

Statistical methods to deal with these data have lagged behind, but data scientists are learning to compute on the data explosion and are gleaning insights that allow them to predict behavior, recommend movies, tailor advertisements, and uncover corruption in government.

Yet the average citizen does not have the opportunity to experience this type of data analysis. The current conception of statistics still echoes that of the 1970s – performing computations like mean and standard deviation on small, lifeless data sets presented in a textbook. However, with access to statistical software packages, computing summary statistics can become a small part of a much deeper analysis.

The word ‘access’ is crucial. Given the current state of computing, the practical reality is most citizens do not have access to statistical computing tools.

They are either prohibitively expensive or prohibitively difficult to learn, and those tools that might be considered accessible to a large audience have not been kept up to date with the methods necessary for modern data analysis.

These and other issues will be discussed in more detail in this dissertation, which identifies and begins to develop the most important aspects of software to promote statistical analysis and ‘data science’ as broadly as possible.

While the explosion of data and its accompanying challenges may sound like young problems, they are not. Much of the inspiration for this project has come from decades ago. John Tukey argued for more flexible tools for statistical programming in 1965, Jacques Bertin was developing matrix manipulation techniques by hand in 1967, and Seymour Papert was creating computer tools to better support users and learners in 1967. In other words, most of these ideas are almost 50 years old, and we are still struggling to see them implemented.

1.1 State of statistics knowledge

While statisticians have a vested interest in improving the statistical understanding of our society (Wild et al., 2011) and it is clear data skills are becoming more necessary and relevant, many people still find statistics boring, hard, or downright impossible (Gal et al., 1997).

Even teachers of statistics often have major misconceptions about statistical concepts or are unable to explain the reasoning behind them (Thompson et al., 2007; Hammerman and Rubin, 2004; Kennedy, 2001). So it is no surprise students often struggle to understand proportional groups (Rubin and Hammerman, 2006), use aggregates (Hancock et al., 1992), or find information from graphs (Vellom and Pape, 2000).

While some argue that attempting to incorporate computational methods into introductory statistics courses will only make this worse, research suggests the opposite. For example, teachers in professional development (even with a low baseline of knowledge) have more success when the training incorporates innovative practices and technology (Hall, 2008). Research shows similar findings among students, as active learning and exploratory analysis help users integrate their knowledge (Garfield and Ben-Zvi, 2007). This suggests courses integrating computer technology into data analysis topics – even outside the statistics classroom – could be more successful in both engaging students and in teaching the material.

However, there is an important distinction to be made between statistical computation and computational statistics (Horton et al., 2014b). Statistical computation is using a computer to remove the need for pencil-and-paper calculations, while computational statistics takes the computer into the practice.

This work discusses the development of computational statistics.
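The distinction between statistical computation and computational statistics can be made concrete with a small sketch. The example below is illustrative only and is not drawn from any of the projects described in this dissertation: the function names summarize and permutation_test, the default of 2000 resamples, and the two small groups of numbers are all hypothetical, and only Python's standard library is assumed. The first function uses the computer to do arithmetic that could be done by hand; the second uses randomization, so the computer carries out the inference itself.

```python
import random
import statistics


def summarize(values):
    """Statistical computation: the computer replaces pencil-and-paper
    arithmetic for the mean and the sample standard deviation."""
    return statistics.mean(values), statistics.stdev(values)


def permutation_test(group_a, group_b, n_resamples=2000, seed=0):
    """Computational statistics: estimate a two-sided p-value for the
    difference in group means by repeatedly shuffling the pooled data
    and re-splitting it into two groups of the original sizes."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Proportion of shuffles at least as extreme as the observed difference
    return extreme / n_resamples


if __name__ == "__main__":
    # Made-up data for illustration only
    control = [1, 2, 3, 4, 5]
    treatment = [11, 12, 13, 14, 15]
    print("mean and sd of control:", summarize(control))
    print("estimated p-value:", permutation_test(control, treatment))
```

The second function has no pencil-and-paper counterpart at realistic sample sizes, which is one way to read the sense in which the computer enters the practice rather than merely speeding it up.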

1.2 Relevancy at a variety of experience levels

Much of the research cited here refers specifically to high school education, but the problems discussed are more general. Introducing novices to new computer tools is a general problem in many disciplines (Deek and McHugh, 1998; Muller and Kidd, 2014), and when the tool requires broader contextual knowledge, the challenge can be even more complex. While secondary school students conform to standard ideas about what a novice looks like, there are novices at many ages and education levels. For example, master’s students in data journalism are typically novices when it comes to statistical analysis and programming. Given the current state of statistics education at the secondary school level, most adults who have not specialized in statistics are also novices. Ideally, the statistical programming tools of the future will be useful to a broad variety of novices, regardless of age.

However, this work does not try to consider students below secondary school.


