«Creativity Support for Computational Literature By Daniel C. Howe A dissertation submitted in partial fulfillment of the requirements for the degree ...»
The final trace program was some 1000 instructions long, by far the longest program that had yet been written for the machine, although Strachey was unaware of this. Some weeks later he visited Manchester for a second time to try out the program. He arrived in the evening, and after a “typical high-speed high-pitched” introduction from Turing, he was left to it. By the morning, the program was mostly working, and it finished with a characteristic flourish by playing the national anthem on the “hooter”.75 This was a considerable achievement for an unknown amateur. He had written, within a single session, the longest program for the Mark I thus far. As Martin Campbell-Kelly asserts, his reputation was virtually established overnight. By June 1952 Strachey had completed his responsibilities at the school and officially began full-time computing work as an employee of the NRDC. That summer he developed, probably with some input from others including Turing, the Mark I program that [...] it is unlikely that Strachey had digital art of the sort we see today in mind. For one thing, there would have been little thought of an audience. As with his checkers-playing program, the love letter generator could be reported to a wider public, but only experienced directly by a small audience of his fellow computing researchers.
Turing biographer Andrew Hodges reports that “Those doing real men’s jobs on the computer, concerned with optics or aerodynamics, thought this silly, but... it greatly amused Alan and Christopher”. Looking at the program’s output today, we can understand why Turing and Strachey’s colleagues thought the project silly. In 1954, Strachey published the following article in the art journal Encounter (immediately following texts by William Faulkner and P. G. Wodehouse):
For further information, see http://grandtextauto.org/2005/08/01/christopher-strachey-firstdigital-artist/.
You are my avid fellow feeling. My affection curiously clings to your passionate wish. My liking yearns for your heart. You are my wistful
Clearly there are a number of shortcomings apparent in the “letter” above. As with many creative computing experiments, however, the outputs themselves are not the most interesting part of the project; more interesting is the vast combinatory potential that such programs afford. It is likely that this unpredictability and procedural “expansion” is part of what amused Strachey and Turing, though, as has often been the case, the process itself is now lost to us and only sample outputs remain. As Wardrip-Fruin argues, this is a problem for work in digital literature and art generally. We tend to focus on surface output, and as a result our understandings do not include the “hidden” procedural elements that work to create such outputs. He goes on to say:
[Process and data] are integral parts of computational works, and to fail to consider them means we only see digital literature from the audience's quite partial perspective. The fundamental fact about digital works is that they operate, that they are in process, and only once our interpretations begin to grapple with the specifics of these operations will we be practicing a method commensurate with our objects of study.
This is a problem that RiTa, with its focus on interpretation at multiple levels (from outputs, to source code, to intermediate ‘texts’ like templates and grammar files) addresses directly.
Here is another example from Strachey's Encounter article:
My sympathetic affection beautifully attracts your affectionate enthusiasm.
You are my loving adoration: my breathless adoration. My fellow feeling breathlessly hopes for your dear eagerness. My lovesick adoration
“M. U. C.” is of course a reference to the Manchester University Computer, or Mark I, who “plays” the part of a love letter author by carrying out the process outlined in the article:
Apart from the beginning and the ending of the letters, there are only two basic types of sentence. The first is “My, (adj.), (noun), (adv.), (verb) your, (adj.), (noun).” There are lists of appropriate adjectives, nouns, adverbs, and verbs from which the blanks are filled in at random. There is also a further random choice as to whether or not the adjectives and adverb are included at all. The second type is simply “You are my, (adj.), (noun),” and in this case the adjective is always present. There is a random choice of which type of sentence is to be used, but if there are two consecutive sentences of the second type, the first ends with a colon (unfortunately the teleprinter of the computer had no comma) and the initial “You are” of the second is omitted.
The letter starts with two words chosen from the special lists; there are then five sentences of one of the two basic types, and the letter ends “Yours, (adv.) M. U. C.” Words in parentheses are randomly substituted according to the
Adjectives: anxious, wistful, curious, craving, covetous,...
Nouns: desire, wish, fancy, liking, love, fondness,...
Adverbs: anxiously, wistfully, curiously, covetously,...
Verbs: desires, wishes, longs for, hopes for, likes,...
Letter start: dear, darling, honey, jewel, love, duck, moppet, sweetheart
Table 5: Examples from the Love Letter Generator’s Input Data.
As we can see in the data presented above, Strachey’s generator involves a high degree of combinatorial choice, with many options provided for nearly every word. It is at once a literary work and a work of computer science, exploiting non-determinism over a clearly defined search space to achieve a specific (creative) effect. These days, process-oriented works of digital literature tend to use algorithms far more complex than those in Strachey's generator, but the importance of the context (computers that can emulate human creative processes) has in no way diminished.
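The procedure Strachey outlines in the Encounter article can be sketched in a few lines of Python. This is a minimal illustration rather than a reconstruction of the Mark I program: the word lists contain only the samples shown in Table 5, the two opening words are drawn from the single "letter start" list given there, each optional adjective or adverb is assumed to be included independently with probability one half, and a run of three short sentences is assumed to be treated as a merged pair followed by a fresh sentence. The function names are hypothetical.

```python
import random

# Word lists copied from Table 5; the originals were longer (note the "...").
ADJECTIVES = ["anxious", "wistful", "curious", "craving", "covetous"]
NOUNS = ["desire", "wish", "fancy", "liking", "love", "fondness"]
ADVERBS = ["anxiously", "wistfully", "curiously", "covetously"]
VERBS = ["desires", "wishes", "longs for", "hopes for", "likes"]
LETTER_START = ["dear", "darling", "honey", "jewel", "love", "duck",
                "moppet", "sweetheart"]


def long_sentence(rng):
    """First type: 'My (adj) (noun) (adv) (verb) your (adj) (noun).'"""
    words = ["My"]
    if rng.random() < 0.5:  # assumed 50/50 chance for each optional word
        words.append(rng.choice(ADJECTIVES))
    words.append(rng.choice(NOUNS))
    if rng.random() < 0.5:
        words.append(rng.choice(ADVERBS))
    words += [rng.choice(VERBS), "your"]
    if rng.random() < 0.5:
        words.append(rng.choice(ADJECTIVES))
    words.append(rng.choice(NOUNS))
    return " ".join(words) + "."


def short_phrase(rng):
    """Core of the second type; here the adjective is always present."""
    return f"my {rng.choice(ADJECTIVES)} {rng.choice(NOUNS)}"


def love_letter(rng=None, sentences=5):
    """Return a three-line letter: opening, body, closing."""
    rng = rng or random.Random()
    kinds = [rng.choice(("long", "short")) for _ in range(sentences)]
    body, i = [], 0
    while i < len(kinds):
        if kinds[i] == "long":
            body.append(long_sentence(rng))
            i += 1
        elif i + 1 < len(kinds) and kinds[i + 1] == "short":
            # Two consecutive short sentences: the first ends with a colon
            # and the second drops its initial "You are".
            body.append(f"You are {short_phrase(rng)}: {short_phrase(rng)}.")
            i += 2
        else:
            body.append(f"You are {short_phrase(rng)}.")
            i += 1
    first, second = rng.sample(LETTER_START, 2)
    opening = f"{first.capitalize()} {second.capitalize()}"
    closing = f"Yours, {rng.choice(ADVERBS)} M. U. C."
    return "\n".join([opening, " ".join(body), closing])


if __name__ == "__main__":
    print(love_letter())
```

Passing a seeded generator, for example `love_letter(random.Random(1952))`, makes a given letter reproducible, which is convenient when comparing the generator's process against its surface output.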
4.3.2 Claude Shannon
Claude Shannon was a seminal thinker in both computer science and information theory, the latter of which he arguably invented,76 and his work laid the foundations for the statistical methods we find in such widespread use today. Though the experiment described below does not target literary outputs as directly as Strachey’s “Love Letter Generator”, it presents another example of fruitful synthesis between the literary context and computer science research. Working from already-constructed literary texts, Shannon created probabilistic models that could approximate various properties of the text being examined. As with the other researchers presented in this section, the context for Shannon’s experiments was not only natural language but specifically literature. To quote Golomb, “it is no exaggeration to refer to Claude Shannon as the ‘father of the information age’, and his intellectual achievement as one of the greatest of the twentieth century”.
One of Shannon’s important early contributions was his work with n-grams, based on the notion of Markov models as invented by Andrey Markov in 1906. Like Shannon, Markov himself used literary language, specifically the novels of Pushkin, as a means of analyzing the general statistical properties of natural language. The basic question the two researchers considered was, given any sequence of English letters or words, what is the likelihood of the occurrence of the next letter or word? Shannon published the answer to his question in a paper entitled “A Mathematical Theory of Communication” [Shannon 1949] where he formalized, among other things, the notion of n-grams. To illustrate the concept, he provided six sample “messages” from the English alphabet.
In the first message, each of the alphabet’s 26 letters and the space appear with equal probability:
XFOML RXKHRJFFJUJ ZLPWCFWKCYJ FFJEYVKCQSGHYD...
See [Golomb 2002].
In the second, the symbols appear with frequencies weighted by how commonly they appear in English text (i.e., “E” is more likely than “W”):
OCRO HLI RGWR NMIELWIS EU LL NBNESEBYA TH EEI ALHENHTTPA...
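The difference between these first two messages can be sketched as follows. This is an illustrative sketch only: Shannon worked from published frequency tables, whereas here the symbol weights are simply counted from whatever sample text the caller supplies, and the function names `zero_order` and `first_order` are hypothetical.

```python
import random
import string
from collections import Counter

ALPHABET = string.ascii_uppercase + " "  # 26 letters plus the space


def zero_order(length, rng):
    """First message: every symbol is equally likely."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def first_order(sample_text, length, rng):
    """Second message: symbols weighted by their frequency in sample_text."""
    counts = Counter(ch for ch in sample_text.upper() if ch in ALPHABET)
    symbols = list(counts)
    weights = [counts[s] for s in symbols]
    return "".join(rng.choices(symbols, weights=weights, k=length))


if __name__ == "__main__":
    rng = random.Random()
    sample = "The resemblance to ordinary English text increases quite noticeably"
    print(zero_order(40, rng))
    print(first_order(sample, 40, rng))
```

In both cases each symbol is drawn independently of its neighbors, which is why neither message contains anything resembling a word.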
Shannon’s remaining four sample messages were produced with a somewhat different process. In the third, symbols appear based on the frequencies with which sets of two of the symbols appear in English. That is to say, after one letter is recorded, the next is chosen in a manner weighted by how commonly different letters follow the just-recorded letter. So, for example, in generating the previous message it is only important that “E” is a more common letter than “U”. However, in creating the third message, it is also important that if a pair of letters begins with “Q” it is more likely that the complete pair will be “QU” than “QE”. Taking the frequencies of pairs into account in this manner means paying attention to the frequencies of “bigrams” [Wardrip-Fruin 2006]. The sample message created in this way begins:
In the fourth, symbols appear based on the frequencies with which sets of three of the symbols appear in English. This is called a “trigram”, with the choice of the next letter weighted by the frequencies with which various letters follow the set of two just recorded.
Shannon’s sample message begins:
IN NO IST LAT WHEY CRATICT FROURE BIRS GROCID PONDENOME...
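The letter-level bigram and trigram processes described above differ only in how many preceding symbols are consulted. A minimal sketch, assuming the statistics are tallied from a caller-supplied text rather than from the tables Shannon used, might look like this (with `order=1` corresponding to bigrams and `order=2` to trigrams; the function names are hypothetical):

```python
import random
from collections import Counter, defaultdict


def build_model(text, order):
    """Map each context of `order` symbols to a Counter of following symbols."""
    if order < 1:
        raise ValueError("order must be at least 1")
    model = defaultdict(Counter)
    for i in range(len(text) - order):
        context = text[i:i + order]
        model[context][text[i + order]] += 1
    return model


def generate(model, order, length, rng):
    """Emit up to `length` symbols, each weighted by what followed the
    preceding `order` symbols in the source text."""
    out = list(rng.choice(list(model)))  # start from a random known context
    while len(out) < length:
        followers = model.get("".join(out[-order:]))
        if not followers:
            break  # context only occurred at the very end of the source
        symbols = list(followers)
        weights = [followers[s] for s in symbols]
        out.append(rng.choices(symbols, weights=weights)[0])
    return "".join(out)[:length]
```

Every run of `order + 1` consecutive symbols in the output is guaranteed to occur somewhere in the source text, which is the property that makes higher-order messages look progressively more like English.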
In the fifth, the unit is moved from letters to words. In this message, words appear in a manner weighted by their frequency in English:
REPRESENTING AND SPEEDILY IS AN GOOD APT OR COME...
Finally, in the sixth sample message, words are chosen based on the frequency with which pairs of words appear in English. This, again, like the technique of choosing based on pairs of letters, is called a “bigram” technique, but here applied to words. The complete final message Shannon used was:
THE HEAD AND IN FRONTAL ATTACK ON AN ENGLISH WRITER
THAT THE CHARACTER OF THIS POINT IS THEREFORE ANOTHER
EVER TOLD THE PROBLEM FOR AN UNEXPECTED.
Shannon describes the progression as follows:
The resemblance to ordinary English text increases quite noticeably at each of the above steps. Note that these samples have reasonably good structure out to about twice the range that is taken into account in their construction.
Thus in (3) the statistical process insures reasonable text for two-letter sequences, but four-letter sequences from the sample can usually be fitted [sic] into good sentences. In (6) sequences of four or more words can easily be placed in sentences without unusual or strained constructions. The particular sequence of ten words “attack on an English writer that the character of this” is not at all unreasonable. It appears then that a sufficiently complex stochastic process will give a satisfactory representation of a discrete source. [Shannon 1949]
To create the last four messages, Shannon worked from ordinary books, using a procedure he explains as follows:
To construct (3) for example, one opens a book at random and selects a letter at random on the page. This letter is recorded. The book is then opened to another page and one reads until this letter is encountered. The succeeding letter is then recorded. Turning to another page this second letter is searched for and the succeeding letter recorded, etc. A similar process was used for (4), (5) and (6). It would be interesting if further approximations could be constructed, but the labor involved becomes enormous at the next stage.
That is to say that the last sample message (which begins with a sequence that sounds surprisingly coherent) was generated by opening a book to a random page, writing down a random word, opening the book again, reading until the just-recorded word was found, writing down the following word, opening the book again, reading until that second word is found, writing down the following word, and so on.
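This manual procedure can be simulated directly, without ever tallying a frequency table. The sketch below treats a list of words as the "book" and makes one assumption not found in Shannon's description: when the reader reaches the end of the book, reading wraps around to the beginning, so that the search for the current word always terminates. The function name is hypothetical.

```python
import random


def shannon_book_walk(words, length, rng):
    """Simulate Shannon's procedure for message (6) over a list of words."""
    n = len(words)
    current = rng.choice(words)  # open at random and record a word
    out = [current]
    while len(out) < length:
        start = rng.randrange(n)  # open the book at another random place
        for offset in range(n):   # read on (wrapping) until the word is found
            i = (start + offset) % n
            if words[i] == current:
                current = words[(i + 1) % n]  # record the succeeding word
                out.append(current)
                break
    return out


if __name__ == "__main__":
    book = "the cat sat on the mat and the dog sat on the log".split()
    print(" ".join(shannon_book_walk(book, 12, random.Random())))
```

Because each recorded word is always one that actually follows its predecessor somewhere in the book, the walk approximates the book's word-pair frequencies even though no counts are ever computed.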
So why does the sixth message sound so coherent, if all Shannon did was repeatedly open a book at random? As Wardrip-Fruin points out, the answer is that Shannon is operating on the assumption that ordinary books reflect (more or less) the frequencies of letters and words in English. This, in turn, is why we find a passage of unexpected coherence in the last sample message: choosing an ordinary book is actually choosing a piece of highly-shaped textual data (shaped, for example, by the frequencies of words and sequences of words in the book’s language, the topic of the book, and the author’s particular style).77 When these “statistical” measures are aggregated for one or more input texts, we have what is called a language model.78 Further, any new text that we generate from this model will reflect the data we used to create it.
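To make the notion of an aggregated language model concrete, the following sketch estimates, for each word in an input text, the relative frequency of every word that follows it. It is a simple word-bigram estimate offered for illustration only; the function name is hypothetical, and practical models typically add smoothing and longer contexts.

```python
from collections import Counter, defaultdict


def bigram_probabilities(words):
    """Estimate P(next word | previous word) by relative frequency."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return {
        prev: {w: c / sum(followers.values()) for w, c in followers.items()}
        for prev, followers in counts.items()
    }


if __name__ == "__main__":
    model = bigram_probabilities("the cat saw the dog".split())
    print(model["the"])  # "cat" and "dog" each follow "the" half the time
```

A model built from a different input text will assign different probabilities, which is the sense in which generated text reflects the data used to create the model.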
Shannon was not working in a literary or artistic context, but his ideas were applied by practitioners for years to come, even though it is only in relatively recent time that
This type of analysis is often referred to as “computational stylistics”.