«University of California Los Angeles Bridging the Gap Between Tools for Learning and for Doing Statistics A dissertation submitted in partial ...»

On the Mobilize project, we have done our best to provide satisfactory user rewards. In fact, being able to analyze their own data is one of the main payoffs students receive for participating in the collection exercise. The dashboards produced by the technical team (discussed in Section [...]) help with this. But again, the tools we are using fall short, as they encourage students to become users of a dashboard tool and not true producers of statistics. It is also clear that other participatory projects could benefit from an easy-to-use data analysis platform.

Analyzing participatory sensing data

While participatory sensing data have many benefits, in particular giving power back to data creators, they are always messy. They rarely represent a random sample, either in terms of the people collecting data or their collection methods.

For example, classes involved in the Mobilize project have collected data on snacking habits every year since the grant began in 2010. If students are completely successful at collecting every snack they eat, the data represent a census rather than a sample, and it is hard to say what statistics should be used to analyze it. However, even the most conscientious of data collectors usually miss a few observations. It is unknown if those observations are missing at random or missing not at random, and even less clear what to do with the data.

Up to this point, the grant has side-stepped this issue by focusing on exploratory data analysis rather than formal inference. The IDS course has used randomization to make limited conclusions from the data. However, it is often unsatisfying for teachers, students, and even non-statistical PIs to learn we cannot make formal inference from these data.
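As a rough illustration of the kind of randomization reasoning mentioned above, the following Python sketch runs a two-sided permutation test on a difference in group means. It is not taken from the IDS materials (which are written in R); the function name `permutation_test`, the parameters, and the snack counts are all hypothetical, and the conclusion only describes the students who collected the data, not a wider population.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_shuffles=5000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns the observed difference (mean of a minus mean of b) and the
    share of shuffles whose difference is at least as extreme.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_shuffles):
        # Shuffling breaks any real link between group label and value.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / n_shuffles


# Hypothetical data: snacks logged per day by one class (made-up numbers).
weekday = [2, 3, 1, 4, 2, 3, 2]
weekend = [4, 5, 3, 4, 6, 5, 4]
observed_diff, p_value = permutation_test(weekday, weekend)
print(observed_diff, p_value)
```

A small proportion of extreme shuffles suggests the weekday and weekend counts differ by more than relabeling alone would produce, which is a limited conclusion of the sort the text describes rather than formal inference about a population.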

Many of the examples of participatory sensing and dealing with missing data come from ornithology. The Cornell Laboratory of Ornithology has found data mining methods can be used to help deal with missing data (Caruana et al., 2006). Decision trees, bagging, and boosting can all be used to help fill in data where it is missing not at random. For example, bird data tends to be biased toward higher population areas, where more people exist to track bird sightings (Hochachka et al., 2010). Another approach is to do data augmentation, combining two data sets: one collected using a participatory method and one more rigorously collected (perhaps by paid researchers). Using these two data sets, data augmentation suggests researchers fit a model to each data set and compare the predictions (Munson et al., 2010).

In environmental contexts, where the variable of interest is assumed to be smoothly distributed, researchers have used interpolation (Mendez et al., 2013). If this results in a model with high variability in certain areas, researchers can then try to incentivize data collectors to collect data in highly variable spatial areas (Mendez and Labrador, 2012). There are also examples predicting election outcomes using non-representative polls (Wang et al., 2014).

Finally, because many psychological researchers are using the Amazon Mechanical Turk[3], there is a growing field of research for dealing with those data. While Turkers do not represent a random sample of humanity, they do seem to represent the demographics of the internet well, although they tend to skew a bit young (Ipeirotis, 2010; Ross et al., 2010). One method for increasing the quality of data from Turkers is to do repeated labeling (i.e., have several Turkers answer the same question to see what the consensus is). But too much repeated labeling is costly. An alternative is to use the EM algorithm to convert ‘hard’ labels to ‘soft’ (probabilistic) labels (Ipeirotis et al., 2010). A simpler method is to determine some measure of trustworthiness for each data collector, and use weightings to put emphasis on the more trusted data (Welinder et al., 2010).

[3] The Amazon Mechanical Turk is an online forum to find humans to perform tasks difficult for computers. It was first developed to create reference data sets for image recognition research, as humans can easily identify a number shown in an image, for example.
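To make the idea of repeated labeling with trust weightings concrete, here is a minimal Python sketch. It is a plain weighted vote, not the method of Welinder et al. (2010) or the EM approach of Ipeirotis et al. (2010); the function `weighted_consensus`, the worker IDs, the labels, and the trust weights are all hypothetical.

```python
from collections import defaultdict


def weighted_consensus(labels, trust):
    """Pick, for each item, the label with the largest total trust weight.

    labels: list of (worker_id, item_id, label) tuples.
    trust:  dict mapping worker_id to a non-negative weight; workers
            missing from the dict default to a weight of 1.0.
    """
    scores = defaultdict(lambda: defaultdict(float))
    for worker, item, label in labels:
        scores[item][label] += trust.get(worker, 1.0)
    # For each item, keep the label whose summed weight is highest.
    return {item: max(votes, key=votes.get) for item, votes in scores.items()}


# Hypothetical repeated labels: three workers label the same two images.
labels = [
    ("w1", "img1", "7"), ("w2", "img1", "1"), ("w3", "img1", "1"),
    ("w1", "img2", "4"), ("w2", "img2", "4"), ("w3", "img2", "9"),
]
equal_trust = {"w1": 1.0, "w2": 1.0, "w3": 1.0}   # plain majority vote
skewed_trust = {"w1": 3.0, "w2": 1.0, "w3": 1.0}  # w1 is trusted more

majority = weighted_consensus(labels, equal_trust)
weighted = weighted_consensus(labels, skewed_trust)
print(majority, weighted)
```

With equal weights the function reduces to a majority vote; raising one worker's weight lets that worker's label win on an item where they would otherwise be outvoted. How the trust weights are estimated is left open here, since that is the hard part the cited work addresses.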

5.1.2 Computational and statistical thinking

“Computational thinking” is a term first described by Jeannette Wing, the head of the computer science department at Carnegie Mellon (Wing, 2006). Wing is concerned with access to computer science in the general population, and has developed the concept of computational thinking to describe the skills she believes are foundational. Computational thinking encompasses much more than the traditional view of computer science. Wing describes it as “a fundamental skill for everyone” (Wing, 2006). It includes many facets of thinking like a computer: problem solving, algorithmic thinking, recursive thinking, and abstraction. Computational thinking, therefore, is one of the crucial skills in today’s economy. Not coincidentally, true statistical literacy requires many computational thinking skills.

‘Statistical thinking’ is a similarly general term, encompassing fundamental skills of thinking related to statistical concepts. For example, statistical thinking means considering the tendency of phenomena in a distribution or over time, as well as the variation within data or with repeated sampling. While these topics are included in introductory statistics curricula, they are often presented as skills to be applied within class, not generally in life. Students presented with data visualizations often do not connect them to the context of the data from which they came (Meirelles, 2011; Wickham, 2010).

Statistical thinking also includes what Darrell Huff called “talking back to a statistic,” or checking for bias in reporting on statistical issues, and asking if it makes sense when presented with a statistical argument (Huff, 1954). On a broader level, it can be argued that people should develop a “data habit of mind” and look for evidence to ground things in their life, even outside of a statistics class (Finzer, 2013; Pfannkuch et al., 2014).

Computational and statistical thinking tie together because computers are so necessary for statistics today. There is no ‘data science’ without computation, and statistics provides an excellent place to introduce computing because it is inherently contextualized. Traditional methods of motivating programming in the classroom (e.g., building a game) are often not as appealing to women (Kelleher and Pausch, 2005; Cooper and Cunningham, 2010). However, couching statistics in data in which students are interested (particularly data in which they can see themselves, like participatory sensing data) makes the desire for inference natural (Wild et al., 2011). Speaking about the difference between mathematics and statistics, Cobb and Moore note, “in mathematics, context obscures structure. [...] In data analysis, context provides meaning” (Cobb and Moore, 1997).

5.1.3 Curriculum

As previously noted, Mobilize has created curricular units across content areas. They are all grounded in participatory sensing, computational thinking, and statistical thinking. While the grant has units to insert into computer science, algebra, and biology courses, as well as a stand-alone, year-long Introduction to Data Science (IDS) curriculum, my primary contributions were on the Exploring Computer Science (ECS) and IDS material. The ECS and IDS curricula were also the two most computationally based courses.

Exploring Computer Science unit

The first Mobilize curriculum to be developed was a six-week-long unit on data analysis, written to fit within a year-long curriculum called Exploring Computer Science (ECS)[4]. ECS was initially piloted in the LAUSD, but has now grown to include schools in Chicago, Oregon, Utah, Washington, D.C. and New York. Thousands of high school students are exposed to ECS annually. ECS includes six-week-long units on human computer interaction, problem solving, HTML and web design, Scratch programming (animation and game design), LEGO Mindstorms robotics, and data analysis.

Members of the Mobilize team, including myself, helped develop the data analysis unit for ECS. In the unit, students engage in exploratory data analysis, creatively interacting with their own data.

The initial version of the curriculum involved students collecting data using one of two ‘canonical’ surveys provided: one for collecting data about advertising in the community, and one for collecting data about personal snacking habits. In the example of the advertising survey, a student would take a photo of an ad (e.g., a billboard), and then answer questions about the demographic they believe the ad is targeting, what product it is selling, how much they want the product, etc.

In keeping with participatory sensing, the survey is implemented on a smartphone, and the data are automatically uploaded to a server, along with information gathered by the phone. The unit incorporated student-collected data alongside previously-collected data from sources like the CDC to expose students to a variety of data analysis topics.

The curriculum and its implementation struggled with many issues. A major stumbling block for the teachers in professional development was learning the data analysis tool. However, as we moved through the various technical tools discussed in Section 5.1.4, we realized that not only was our professional development too short to get the teachers up to speed, but the six-week unit really was not enough time for students to truly engage with R. Additionally, as ECS grew nationally, it became harder to support a particular data analysis tool.

[4] Mobilize is essentially a sibling grant to ECS, with many overlapping PIs.

The curriculum was later re-written to be tool agnostic, and the need for smartphones for data collection was minimized. In the most current version, students decide what topic they want to study, collect data using pencil and paper, then analyze it using a tool of their teacher’s choice.

This experience underscored the need for a longer data science curriculum, which formed the inspiration for the Introduction to Data Science curriculum.

Math and science units

The Mobilize math and science curricula comprise shorter units, and were designed to be inserted into existing courses: Algebra I and Biology, respectively. In math, students collect participatory sensing data on their snacking habits, and learn to connect linear modeling and predictions to the equation of the line, y = mx + b. In science, the participatory sensing campaign is related to trash: whether recyclables are being put into the incorrect container or not. This is tied to environmental concerns biology teachers are comfortable discussing.

The math and science units were loosely based on the ECS curriculum I helped develop. For more on the curricula, see (Board et al., 2015; Perez et al., 2015).
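The connection between linear modeling and the equation y = mx + b in the Algebra I unit can be sketched in a few lines of Python. This is an ordinary least-squares fit written out by hand, offered as an illustration only; the function `fit_line`, the variable names, and the snack data are hypothetical and do not come from the Mobilize materials.

```python
def fit_line(xs, ys):
    """Least-squares slope m and intercept b for the line y = m*x + b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: covariation of x and y divided by the variation in x.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    m = sxy / sxx
    # The fitted line always passes through the point of means.
    b = y_bar - m * x_bar
    return m, b


# Hypothetical snack data: grams of sugar and calories per snack.
# The numbers were chosen to lie exactly on y = 4x + 50.
sugar_grams = [0, 5, 10, 15, 20]
calories = [50, 70, 90, 110, 130]

m, b = fit_line(sugar_grams, calories)
predicted = m * 12 + b  # predicted calories for a 12-gram snack
print(m, b, predicted)
```

Here the slope m reads as "calories added per extra gram of sugar" and the intercept b as "calories in a snack with no sugar", which is the kind of contextual reading of the line the unit aims for. Real student data would scatter around the line rather than fall on it.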

Because these curricula were on much shorter time scales, and because the teachers we trained were even less comfortable with statistics, neither the Algebra I nor the Biology unit includes true computational statistics. Instead, students use bespoke dashboards to analyze their data. By using the dashboards, they are able to ask and answer questions with data, but the questions are somewhat limited by the abilities of the tool.

Introduction to Data Science course

The most exciting curricular development in Mobilize is the year-long course, Introduction to Data Science (IDS). This course piloted in 10 schools in the 2014-2015 school year, and will be expanded to 25 teachers in 34 schools in 2015-2016. Unlike the previous curricula mentioned, the IDS course is a full, free-standing course for high school students.

Historically, it has been difficult to schedule students into courses in statistics and computer science because these courses do not typically fulfill graduation requirements. The state of California adds an additional hitch, as all high school courses must be approved by the University of California Office of the President (UCOP) to be used for admission to college within the UC system, the Cal States, or the California community colleges. One boon to the IDS course is that we were able to get it approved by the UCOP, so as of fall 2014, taking the IDS course counts for “C” credit[5]. As a result of the UCOP approval, high school counselors are interested in scheduling students into the course, and students are interested in taking it.

I was involved in the overall planning for the curriculum, the UCOP application, and the detailed development of the first two units of the curriculum. However, I was not involved in the detailed development of the second two units. The rest of the IDS team includes Suyen Moncada-Machado, Robert Gould, Terri Johnson, and James Molyneux.

The topics covered by the class include visualizing data in one and multiple dimensions, statistical questioning, randomization methods, linear modeling, classification and regression trees, and k-means clustering. For the full curriculum, see (Gould et al., 2015).

[5] California has “A-G” requirements, which require high school students to take two years of history, four years of English, three years of math, two years of science, two years of a foreign language, one year of visual or performing arts, and one year of an elective. “C” credit indicates the IDS course satisfies one of the three years of college preparatory math.

The entire curriculum is based on R within RStudio, with regular labs for students to work through the techniques they are learning in class. There are also numerous ‘hands-on’ activities allowing students to enact data analysis physically, whether creating a human box plot as captured in (Menezes, 2015) or doing randomization activities with notecards.

Labs

All of the IDS coursework is grounded in computational labs (Figure 5.1), which take place in RStudio (Molyneux et al., 2014). The labs take advantage of the features of RStudio, and provide an integrated method for viewing lab prompts and accomplishing the associated tasks[6].
