«Creativity Support for Computational Literature By Daniel C. Howe A dissertation submitted in partial fulfillment of the requirements for the degree ...»
Our choice of Parsons problems was motivated primarily by two factors: available time and student diversity. Because limited time was available for this component of our evaluation, separate code-reading and code-writing problems (the latter being potentially time-consuming) would have reduced the number of additional measures we could test.
Rather than asking students to write code from scratch in one problem then to trace provided code in another, Parsons problems embed both skills into a single problem in which no code need be written from scratch. Further, the task presented is immediately comprehensible to students, even those who have no idea as to an answer. As regards the diversity of students taking the quiz, allowing students with little or no programming experience to simply guess at answers to the Parsons problems reduced, in our opinion, the chance that they would become intimidated by the material in the pre-test, possibly even to the extent that they might drop the class.
While both code reading and writing are embedded in a single Parsons problem, they can be coded and evaluated separately [note 100]. Further, as discussed in Denny [2008], this type of question simplifies distinguishing syntax errors from logic errors. For example, if a student were to select the incorrect line from the second, fourth, or sixth pairs, this would prevent the code from compiling and hence be classified as a syntax error. Selecting the incorrect line from the first, third, or fifth pairs would generate incorrect programs, and would therefore be classified as a logic error. While various strategies have been employed for evaluating such questions, in this study we chose to use a negative marking scheme, following Denny [2008], by defining the types of errors for which marks would be deducted.
Note 100: We should note that, in contrast to scores on code-writing tests, with which scores on Parsons problems show a high degree of correlation, scores on code-tracing problems seem to vary substantially from those on Parsons problems, suggesting that this somewhat novel approach needs further refinement in order to more adequately capture both skills [Denny 2008].
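The classification rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the grading code used in the study: the function name `classify_errors` and the representation of a student's answer as six booleans (True where the correct line of a pair was chosen) are assumptions made for the example, and the pair numbers apply only to the example problem discussed here.

```python
# Hypothetical sketch: pair numbers (1-based) follow the example in the text.
# A wrong choice in pairs 2, 4, 6 prevents compilation (syntax error);
# a wrong choice in pairs 1, 3, 5 compiles but misbehaves (logic error).
SYNTAX_PAIRS = {2, 4, 6}
LOGIC_PAIRS = {1, 3, 5}


def classify_errors(chose_correct):
    """Return (syntax_errors, logic_errors) as lists of pair numbers.

    `chose_correct` holds one boolean per pair, True when the student
    picked the correct line of that pair.
    """
    if len(chose_correct) != 6:
        raise ValueError("expected one choice for each of the six pairs")
    syntax_errors, logic_errors = [], []
    for pair, ok in enumerate(chose_correct, start=1):
        if ok:
            continue
        if pair in SYNTAX_PAIRS:
            syntax_errors.append(pair)
        else:
            logic_errors.append(pair)
    return syntax_errors, logic_errors


# A student who picked the wrong line in pairs 2 and 3:
print(classify_errors([True, False, False, True, True, True]))
```

Because every distractor belongs to exactly one of the two sets, each wrong choice is attributed to a single error type without any judgment by the coder.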
Each Parsons problem consisted of six pairs of statements and was evaluated for both syntax and ordering on a scale of 0 to 9. For syntax, as in Denny [2008], we deducted one mark for each incorrectly chosen line from the pairs. For ordering, however, we used a minimum-edit, or Levenshtein, distance [note 101] to measure how far a student's answer departed from the correct result. This has several advantageous properties, including: a) it can be reliably (and automatically) calculated by treating student answers as simple strings, so that inter-coder reliability is not an issue; and b) misplacing a single line of code does not destroy a student's score, as could happen when using a simple binary (correct/incorrect) rubric for ordering.
Consider the two example answers in the following table:
Table 7: Comparison of ordering metrics (incorrect lines are bolded).
Note 101: Levenshtein distance is a measure of the similarity between two strings, generally referred to as the source string and the target string. The Levenshtein (or ‘min-edit’) distance is the minimal number of ‘edits’ needed to transform the source string into the target string, where an ‘edit’ consists of the deletion, insertion, or substitution of one character [Levenshtein 1966; Marzal and Vidal 1993].
In both of these cases, the ordering is correct with the exception of two lines that have been swapped. According to a naive binary metric, Student A would be marked as having four lines in the correct locations and two misplaced, while Student B would be marked as having all six lines incorrectly placed. According to the Min-Edit metric, however, both students would be assessed an edit distance of two. Since both represent optimal non-correct answers as regards ordering (it is impossible in this format to have only one line out of place), each was assessed a one-point penalty, obtained by halving the Min-Edit Distance (MED). While Denny [2008] deducts a maximum of two out of nine total points for ordering, our metric, in the worst case, deducts three points (an MED of six). Thus the ratio of syntax to ordering in our evaluation was 2:1 (six of nine points for syntax, three for ordering), while in Denny [2008] still more weight was given to syntax (~78%) relative to ordering (~22%).
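The ordering metric can be illustrated with a short Python sketch. It is a minimal example under stated assumptions, not the scoring script used in the study: the six lines of the correct solution are labelled with the hypothetical letters A to F, the example answers are invented for illustration (they are not the entries of Table 7), and the helper names `levenshtein`, `ordering_penalty`, and `parsons_score` are our own. The total score is assumed to be nine minus the syntax and ordering deductions.

```python
def levenshtein(source, target):
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn `source` into `target`."""
    previous = list(range(len(target) + 1))  # distances from the empty prefix
    for i, s_char in enumerate(source, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(
                previous[j] + 1,         # delete s_char
                current[j - 1] + 1,      # insert t_char
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


def ordering_penalty(answer, correct="ABCDEF"):
    """Half the min-edit distance between the answer and the correct order."""
    return levenshtein(answer, correct) / 2


def parsons_score(answer, wrong_choices, correct="ABCDEF", max_score=9):
    """Assumed total: one mark off per wrongly chosen line, plus the
    ordering penalty, deducted from a maximum of nine."""
    return max_score - wrong_choices - ordering_penalty(answer, correct)


print(levenshtein("ABDCEF", "ABCDEF"))  # two adjacent lines swapped
print(levenshtein("FABCDE", "ABCDEF"))  # last line moved to the front
print(ordering_penalty("FEDCBA"))       # fully reversed order
print(parsons_score("ABDCEF", wrong_choices=1))
```

In the second hypothetical answer every line sits in the wrong position, yet the edit distance is still two, which is the property that motivates preferring this metric to a position-by-position comparison.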
5.6.2 Results
The multimedia terrain, with its strata of meanings, its combination of media, its compilation of data, and its branching, tangential connections would seem the ideal tool for this ‘postmodern’ age. But its chameleon character – a tool for writing, reading, talking and listening, a tool for drawing and looking, a tool for animating and viewing and a tool for gaming, interacting and consuming – makes it less easy to gauge in evaluative terms. [Sinker 2000]
The Parsons question, measuring students’ code reading and writing skills, showed a significant improvement between students’ pre- and post-test scores: t(7) = 7.5863, p < .00001, d = 2.1702. The mean improvement was 3.3571 with a standard deviation of 1.5468, strongly suggesting that students’ programming skills had increased through exposure to RiTa and PDAL.
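For readers unfamiliar with these statistics, the following Python sketch shows one common way a paired-samples t statistic and Cohen's d are derived from pre- and post-test scores. It is an illustration under stated assumptions, not the analysis actually run for this study: the score lists are invented, the function name `paired_summary` is hypothetical, and d is taken here as the mean difference divided by the standard deviation of the differences. Under that definition the reported mean (3.3571) and standard deviation (1.5468) give d of approximately 2.170, in line with the value above.

```python
from math import sqrt
from statistics import mean, stdev


def paired_summary(pre, post):
    """Summarize paired pre/post scores (one pair per student)."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need at least two matched pre/post scores")
    diffs = [after - before for before, after in zip(pre, post)]
    n = len(diffs)
    mean_diff = mean(diffs)
    sd_diff = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t and d are undefined")
    return {
        "mean_diff": mean_diff,
        "sd_diff": sd_diff,
        "cohens_d": mean_diff / sd_diff,
        "t": mean_diff / (sd_diff / sqrt(n)),
        "df": n - 1,
    }


# Invented scores for five hypothetical students (not data from the study).
hypothetical_pre = [2, 3, 4, 1, 5]
hypothetical_post = [5, 6, 6, 5, 9]
print(paired_summary(hypothetical_pre, hypothetical_post))

# Effect size implied by the reported mean improvement and standard deviation.
print(round(3.3571 / 1.5468, 3))
```

The p-value is omitted because it requires the t distribution, which the Python standard library does not provide.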
Table 8. Breakdown of the Parsons problem results (syntax).
Table 8 shows the breakdown of results obtained for the Parsons problem. The proportion of students that selected the correct statement from each pair is listed. In addition, we looked at the correctness of the ordering of the statements according to the MED algorithm described above. The results for the ordering of the statements in the Parsons problem are shown in Table 9.
Table 9. Breakdown of the Parsons problem results (ordering).
Since Parsons problems consist of lines of code arranged in pairs, and students are asked to select the correct line of each pair to use in the solution, it was our hope to more easily measure the kinds of errors that students make. Since the correct option is always visible to students, when they choose the incorrect option we know that it is not a typo but rather a deliberate choice of the wrong option. This potentially provides insight into the kinds of mistakes and misconceptions students have, and allows us to identify elements of the course that students are struggling with. The use of Parsons problems thus allows us to test knowledge needed for code writing and reading in a manner in which we can isolate specific misconceptions.
For example, Table 8 shows that only 29% of students chose the correct line for accumulating values on the pre-test. Thus, when presented with both alternatives, 71% chose the simple assignment statement rather than the correct statement. In a different context this might lead one to consider spending more time discussing the different roles that a variable can play in a program, and explicitly distinguishing between an assignment operation and the accumulation of a value in a variable. Although neither survey was scored until after the end of the class, we see that in the post-survey 100% of students selected the correct accumulation statement.
Table 10. Breakdown of the Parsons problem results (total score).
Table 10 presents students’ total scores on the programming quiz in both pre- and post-tests.
Clearly, according to this metric, significant gains were made in code reading and writing skills over the course of the semester.
5.7 Observation Set 3: Creativity Support
One important implication of this inescapable trade-off between control and generalizability is that laboratory definitions of “creativity” are often so tightly constrained that they do not capture more than a piece of a person, product or process. [Hewett et al. 2005]
As one of the project’s hypotheses was that the RiTa tools provided a significant degree of creativity support for students and artists working in digital media, we attempted to address this further (beyond the survey data) by evaluating several aspects of students’ final projects. Final project topics were proposed by students and developed over the last month of the course. While each proposal required the instructor’s approval, the only requirements were that the project utilize computational methods and be of appropriate scope. Students were not required to use any of the RiTa modules, the Eclipse Plugin, the Processing environment, or even Java itself, nor to focus specifically on language-based art. For those interested, a number of these projects have been included in the RiTa gallery [note 102].
5.7.1. Evaluating Creativity
Basically, creativity can be considered to be the development of a novel product that has some value to the individual and to a social group. However, it seems that the research conducted by psychologists on creativity does not allow us to clarify or simplify this definition any further. Different authors may provide a slightly different emphasis in their definition but most (if not all) include such notions as novelty and value. [Hewett et al. 2005]
To evaluate the degree of creativity support provided by tools like RiTa, it would be ideal to first decide unequivocally on a definition of creativity to employ. At the same time, such a definition has been highly contested in the literature [Turner 2007] and is beyond the scope of this research. In the various definitions proposed in recent research, however, there appear to be at least two components common to a majority [Sternberg 1999], specifically novelty and value [note 103]. With this in mind, we have chosen to adopt the rather generic definition used in the 2005 Creativity Support Tools conference [Hewett et al. 2005], which is in turn based on Sternberg [1999] and focuses on the notion of creative outputs. Creative outputs can be conceptualized as artifacts generated in a specific context that score highly on both of these metrics, not only differing significantly from those artifacts already in existence (novelty), but doing so in a way that demonstrates value to the individual and/or social group (value) [Sternberg 1999]. To evaluate the creative outputs of our participant group we analyzed and coded students’ final projects on a number of dimensions, believing these to be the most representative of the semester’s outputs.
Note 102: See http://www.rednoise.org/rita/rita_gallery.htm.
Note 103: For example, Gardner (1989) emphasizes that creativity is a human capacity but includes novelty and social value in his definition. Thus, our decision to adopt Sternberg’s [1999] definition is not arbitrary, as it represents something of a consensus in the field. As Hewett et al. [2005] note, the authors in Sternberg’s collection provide a high-level view of the state-of-the-art… “the work in this Handbook is highly consistent with the work of several other authors who have also surveyed major aspects of the research findings, e.g., Csikszentmihalyi and Gardner.”
5.7.1.1 Evaluating “Value”
The National Advisory Committee on Creative and Cultural Education draws upon a range of conceptualisations of creativity and presents a definition which is a useful framework for educators - ‘imaginative activity fashioned so as to produce outcomes that are both original and of value’. [Loveless 2002]
As discussed above, the tools and techniques employed in the context of the PDAL course appear to have facilitated significant increases in students’ programming efficacy and ability. Students’ self-assessed ability to creatively express themselves through programming, as noted above, showed a significant improvement over the course of the semester: t(12) = 2.560, p < .001, d = .739. This, in combination with the fact that a broad range of students with highly variable prior experience [note 104] were able to complete works of significant depth and breadth in digital media, suggests that some non-trivial degree of creative utility (or value) was achieved. But what kinds of creative outputs were these?
As noted in our previous discussion of support software (see Chapter 3: Pedagogy), various tools will support diversity of output to varying degrees (often in inverse proportion to the specificity of the context for which they were designed), and this is a key property to consider when assessing their efficacy. A common critique of ‘user-level’ tools like Photoshop or PowerPoint is that they tend to generate outputs that converge toward a distinct ‘style’ or ‘signature’. General-purpose languages like C++ or Python, on the other hand, tend to support a wide variety of outputs, but exhibit steeper learning curves and often require significant scaffolding, especially for those users with diverse, or non-typical, backgrounds [Kelleher 2007].
Note 104: See demographic data on majors and prior computing experience above.
In contrast to these approaches, we have positioned RiTa in a productive middle-ground position, attempting to satisfy all three of Resnick’s primary design criteria for creativity support software: “low steps, wide walls, and high ceilings”. In his discussion of these design principles, he uses low steps to refer to the incline of the learning curve, which should be as shallow as possible; wide walls to refer to the degree to which different learning paths can be followed and diverse outcomes achieved; and high ceilings to refer to the degree to which the tools grow with users through various levels of mastery (tools with low ceilings are easily mastered and thus do not continue to challenge and inspire learning) [Resnick 2005].
The RiTa tools were designed with these principles in mind, targeting the joint goal of providing a) adequate scaffolding for new users, b) adequate flexibility and expressive power for advancing users, and c) support for a diverse range of creative outputs reflecting the diversity of users themselves. As all of the participants in the study were able to complete multiple computational literary projects, at least one of which was a mid- to large-scale artwork, it would appear this goal was, at least partially, achieved. While it is beyond our scope to argue for the societal utility of such artwork in the abstract, we can, however, note the perceived utility to students in the class, who felt their work (and that of their peers) to represent an important means of expression. The following comments, from four different PDAL students, demonstrate the point: