«Creativity Support for Computational Literature By Daniel C. Howe A dissertation submitted in partial fulfillment of the requirements for the degree ...»
4.2.1 For Natural Language Processing

As one might expect, there is significant overlap between the functionality required for computational literature and that provided by tools designed for Natural Language Processing (NLP) and/or Natural Language Generation (NLG). Over the years, a number of general-purpose NLP toolkits have been created for research purposes, including the CMU-Cambridge Statistical Language Modeling Toolkit [Clarkson and Rosenfeld 1997], the EMU Speech Database System [Harrington and Cassidy 1999], the General Architecture for Text Engineering (GATE) [Bontcheva et al., 2002], the Maxent Package for Maximum Entropy Models [Baldridge et al., 2002b], the Annotation Graph Toolkit (AGT) [Maeda et al. 2001], and MontyLingua [Liu 2004]. Although some of these resources have educational applications and have been used in teaching, their development has not been motivated primarily by pedagogical needs. On the other hand, several toolkits designed with less experienced users and/or students in mind directly address educational concerns. While not directed at an art or literary context, or (with the exception of SimpleNLG) even at generation specifically, these tools have at least indirectly informed the development of RiTa, and as such warrant brief coverage here.
Specifically, we look at two below: the Natural Language Toolkit (NLTK) by Loper and Bird, and the SimpleNLG package by Ehud Reiter.
4.2.1.1 NLTK

The Natural Language Toolkit (NLTK) was designed by Edward Loper and Steven Bird as an end-to-end solution for new students in the field of Natural Language Processing. The first paper on the NLTK was published in 2002 and provides the following description:

“NLTK, the Natural Language Toolkit, is a suite of open source program modules, tutorials and problem sets, providing ready-to-use computational linguistics courseware. NLTK covers symbolic and statistical natural language processing, and is interfaced to annotated corpora. Students augment and replace existing components, learn structured programming by example, and manipulate sophisticated models from the outset.” [Bird and Loper, 2002]

In its first iteration, the NLTK, implemented as a command-line tool written in Python, provided modules for the following tasks: Parsing, Chunking, Tagging, Finite State Automata, Type Checking, Visualization, and Text Classification. While the functionality of the NLTK overlaps only slightly with that of RiTa, its emphasis on pedagogical concerns, specifically the difficulties inherent in teaching language processing to new students in a classroom context, is directly relevant. See Figure 17 below for a detailed comparison of the various components and functionalities.
In fact, several of the design considerations listed in chapter two were first discovered and/or confirmed in the NLTK research. Although none of these can be claimed to have originated with the authors of the NLTK (or RiTa), the two do share the following explicit design requirements: Ease of Use, Consistency, Extensibility, Documentation, Simplicity, Modularity, Comprehensiveness, and Efficiency. Additionally, the NLTK authors’ concern with in-class demonstration, open-source deployment, exhaustive documentation, and online tutorials all provided inspiration for the development of RiTa.
4.2.1.2 SimpleNLG

The SimpleNLG package, written by Ehud Reiter, was developed contemporaneously with RiTa and focuses on providing a simple interface for a range of Natural Language Generation (NLG) tasks. The website (http://www.csd.abdn.ac.uk/~ereiter/simplenlg/) provides the following information: “SimpleNLG is a simple Java class library which does basic NLG lexicalisation and realisation; it is primarily designed for data-to-text applications.”

Reiter further describes the package as:

“[A] realisation engine which grew out of recent experiences in building large-scale data-to-text NLG systems, whose goal is to summarise large volumes of numeric and symbolic data (Reiter, 2007). Sublanguage requirements and efficiency are important considerations in such systems.
Although meeting these requirements was the initial motivation behind SimpleNLG, it has since been developed into an engine with significant coverage of English syntax and morphology, while at the same time providing a simple API that offers users direct programmatic control over the realisation process” [Reiter, 09]

Here, as with the NLTK, there are a few areas of overlap between the realization process employed in SimpleNLG and the core RiTa functionality, specifically stemming, noun pluralization, and verb conjugation. In fact, verb conjugation and noun pluralization in both packages use the often-cited morphological rules specified in Minnen. Like RiTa, SimpleNLG places all functions under users' direct programmatic control and, like the NLTK, appears to place high importance on documentation and tutorials. Reiter says that the simplicity of use of SimpleNLG is reflected in its community of users:

“The currently available public distribution has been used by several groups for three main purposes: (a) as a front-end to NLG systems in projects where realisation is not the primary research focus; (b) as a simple natural language component in user interfaces for other kinds of systems, by researchers who do not work in NLG proper; (c) as a teaching tool in advanced undergraduate and postgraduate courses on Natural Language Processing.” [Reiter, 09]

Further, as perhaps expected from its name, the syntax is both simple and consistent, with most method calls exposed as setters, e.g., setSubject() or setInterrogative(), as below:
Phrase s1 = new SPhraseSpec("leave");
// OUTPUTS - ‘Did the boys leave home?’
s1.setInterrogative(WHERE, OBJECT);
// OUTPUTS - ‘Where did the boys leave?’

While SimpleNLG represents a useful solution to the specific tasks for which it was designed, its limitations become noticeable when it is applied to a literary context. First, because it employs a so-called “pipeline” architecture, it requires all parts of a sentence to be known before realization, as opposed to the varieties of ‘incremental’ generation allowed by other systems [Manurung 2003]. The standard pipeline architecture (as presented in Reiter and Dale) presents other problems for the realm of literature. Its limitations become apparent in applications that set specific goals for the resulting surface text, whether the inclusion of particular literary features or idiomatic constructions, or a target phrase, paragraph, or document length. Manurung summarizes the issue by describing how the elements of text generation are, in the case of poetry, not independent:
Making choices in one aspect can preclude the possibility of choices in other aspects. When these decisions are made by the separate modules in the pipeline architecture... the resulting texts may be suboptimal, and in the worst case, the system may fail to generate a text at all. This problem has been identified in Meteer (1991) as the 'generation gap', in Kantrowitz and Bates (1992) as 'talking oneself into a corner', and also in Eddy (2002), who notes that it is not easy to determine what effect a decision taken at an early stage in the pipeline will have at the surface, and decisions taken at one stage may preclude at a later stage a choice which results in a more desirable surface form.
Additionally, SimpleNLG provides no support for custom features or constraints, literary or otherwise, at the level of the phrase or sentence. Thus, realizing a sentence in which formal and semantic elements are intended to receive equal weight in the choice of a final surface form can be problematic. In fairness, however, we should note again that this is not a use for which SimpleNLG was intended.
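The “generation gap” described above can be illustrated with a toy sketch. The code below is not SimpleNLG or RiTa code; the class name, the word lists, and the character-length constraint are all hypothetical choices made for illustration. It contrasts a pipeline that commits to one word per slot before checking the surface form with a joint search that is allowed to revisit earlier choices.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Toy illustration only: all names and data here are hypothetical.
final class GenerationGapSketch {

    // Candidate words for each slot, listed in order of stage-one preference.
    static final List<List<String>> SLOTS = List.of(
        List.of("the", "a"),
        List.of("nightingale", "bird"),
        List.of("sings", "warbles"));

    // Surface constraint, checkable only after realization:
    // the sentence must have exactly targetLength characters.
    static boolean fits(String sentence, int targetLength) {
        return sentence.length() == targetLength;
    }

    // Pipeline: every slot is decided early and independently, then realized.
    // If the surface constraint fails, there is no way to go back.
    static Optional<String> pipeline(int targetLength) {
        List<String> chosen = new ArrayList<>();
        for (List<String> options : SLOTS) {
            chosen.add(options.get(0)); // commit to the preferred word
        }
        String sentence = String.join(" ", chosen);
        return fits(sentence, targetLength) ? Optional.of(sentence) : Optional.empty();
    }

    // Joint search: lexical choices are revisited (by backtracking)
    // until some combination satisfies the surface constraint.
    static Optional<String> jointSearch(int targetLength) {
        return search(0, new ArrayList<>(), targetLength);
    }

    private static Optional<String> search(int slot, List<String> chosen, int targetLength) {
        if (slot == SLOTS.size()) {
            String sentence = String.join(" ", chosen);
            return fits(sentence, targetLength) ? Optional.of(sentence) : Optional.empty();
        }
        for (String option : SLOTS.get(slot)) {
            chosen.add(option);
            Optional<String> result = search(slot + 1, chosen, targetLength);
            if (result.isPresent()) {
                return result;
            }
            chosen.remove(chosen.size() - 1); // undo the choice and try the next option
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(pipeline(14));    // the early choices yield 21 characters, so no text
        System.out.println(jointSearch(14)); // finds "the bird sings"
    }
}
```

With a target of 14 characters, the pipeline fails because its preferred words produce a 21-character sentence, while the joint search recovers by replacing an earlier lexical choice. Exhaustive backtracking is used here only because the example is tiny; it is one of many possible ways to let surface constraints influence earlier decisions.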
4.2.2 For Computational Art

In recent years there has been impressive growth in both the number and quality of libraries and environments designed specifically for computational artists. Where artists and art students were once forced to work either in medium-specific tools (e.g., Photoshop or ProTools), which are only minimally programmable, or in so-called “general-purpose” languages (e.g., C/C++ or Basic), which provide little specific support for art practice, they now have a wide range of languages, libraries, and environments designed specifically for the art context. None of those listed below targets the domain of literature or even language processing (nearly all target visual media, and if not visual, then aural media), but their approach to providing programmatic support for the arts has influenced the development of the RiTa tools to varying degrees. Tools that have been peripherally important to RiTa are included in the comparison table below in order to present an overview of the domain. The most direct influence, however, has been the Processing environment, with which RiTa optionally integrates, and which is discussed in detail below.
4.2.2.1 The Processing Environment

Processing is an open-source programming library, development environment, and online community that has promoted software literacy within the visual arts since 2001.
Initially created to serve as a software sketchbook and to teach the fundamentals of computer programming within a visual context, Processing quickly developed into a professional production tool for programming images, animation, and interactions. It is used by students, artists, designers, researchers, and hobbyists for learning, prototyping, and production.
Processing is a free alternative to proprietary software tools with expensive licenses, making it accessible to schools and individual students. Its open-source status encourages the community participation and collaboration that is vital to its growth. Contributors share programs, contribute code, answer questions in the discussion forum, and build libraries to extend the possibilities of the software. The Processing community has written over seventy libraries to facilitate computer vision, data visualization, music, networking, and electronics.
Processing was founded by Ben Fry and Casey Reas in 2001 while both were John Maeda's students at the MIT Media Lab and was developed as a direct descendant of Maeda's “Design By Numbers” language [Maeda 2001].
Like Processing itself, which can be used directly as a Java library, RiTa is only loosely coupled with the Processing environment. It can be used with or without the Processing IDE and libraries; the one exception is RiTa's text display capabilities, which (as of v80) require Processing’s core.jar archive. Both the Processing libraries and development environment have been integral elements in the practical implementation of the PDAL class, and in the development of the RiTa tools. Having discussed the tools currently available to students and programmers, we turn in the next section to a chronological survey of procedural writing methods, experiments, and tools in the contexts of both computer science and Literary Arts.
Figure 17: Comparison of educational programming environments
4.3 Literary-focused Computer Science

This section presents a series of literary experiments in mainstream computer and information science that have led to important research results. Our goal here is to introduce the reader to early efforts that provide the underpinnings for subsequent work in the field, including, but not limited to, the RiTa project. In Christopher Strachey’s “Love Letter Generator”, we find a very early experiment with computer programs designed to produce creative literary outputs, “composed” without intervention by a human author. Claude Shannon later focuses on how literary language can be represented via probabilistic models. Joseph Weizenbaum uses simple transformational rules on natural language to create the first conversational agent, and Selmer Bringsjord argues convincingly for a new “Turing Test” based on literary creativity, and implements a state-of-the-art story generation system designed to pass it. What all the researchers in this section have in common is that they use what have become core computer science methods to engage with the literary, a methodology that has significantly influenced the construction of the RiTa tools.
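As a rough illustration of what a probabilistic model of language can look like in practice, the sketch below builds a word-level bigram (first-order Markov) model and generates text from it. It is a minimal example in the general spirit of Shannon's approach, not a reconstruction of his method or of any RiTa component; the class name, method names, and sample text are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Illustrative sketch only: names and sample text are hypothetical.
final class BigramSketch {

    // Maps each word to the list of words observed immediately after it.
    // Repeated followers are kept, so sampling uniformly from a list
    // reproduces the observed bigram frequencies.
    static Map<String, List<String>> train(String text) {
        Map<String, List<String>> followers = new HashMap<>();
        String[] words = text.trim().split("\\s+");
        for (int i = 0; i + 1 < words.length; i++) {
            followers.computeIfAbsent(words[i], k -> new ArrayList<>()).add(words[i + 1]);
        }
        return followers;
    }

    // Starting from 'start', repeatedly samples a follower of the current word.
    // Stops after maxWords words or when the current word has no known follower.
    static String generate(Map<String, List<String>> followers, String start,
                           int maxWords, Random rng) {
        List<String> out = new ArrayList<>();
        String current = start;
        out.add(current);
        while (out.size() < maxWords) {
            List<String> next = followers.get(current);
            if (next == null) {
                break;
            }
            current = next.get(rng.nextInt(next.size()));
            out.add(current);
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        Map<String, List<String>> model = train("the cat sat on the mat and the cat slept");
        System.out.println(generate(model, "the", 8, new Random()));
    }
}
```

Because the model records only which word follows which, its output is locally plausible but has no global structure, which is one reason such models are usually combined with other constraints in literary applications.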
4.3.1 Christopher Strachey

The first known literary experiment with a modern computer was Christopher Strachey’s “Love Letter Generator”, written for the Manchester Mark I and completed in 1952. Much of the important research on Strachey, and on his work with the seminal computer scientist Alan Turing, was done only recently by Noah Wardrip-Fruin in his 2006 dissertation at Brown University, in which he argues rather convincingly that Strachey's piece is actually the first work of digital art of any kind, a not insignificant fact when considering the importance of literary experiments to the burgeoning fields of both digital art and computer science. In this light, Strachey is a figure of some importance, not least because of the insight his story provides into Turing’s early career. And of course, without Turing, Strachey would not have had access to the computer on which the “Love Letter Generator” was programmed. Their association begins in 1951, when Strachey, still only a teacher at the Harrow School, asked Turing for a copy of the manual Turing had recently written for the Mark I computer. Turing's somewhat surprising acceptance of the request facilitated Strachey’s sudden appearance in the world of modern computing [Wardrip-Fruin 2006].
Strachey visited Manchester for the first time in July of 1951 and discussed his ideas for a checkers-playing program with Turing. These ideas impressed Turing, who suggested that the problem of making the machine simulate itself using interpretive trace routines would also be interesting. Strachey, taken with this suggestion, wrote such a program. As Strachey's biographer Martin Campbell-Kelly writes: