Andrés Gregor Zelman The University of Amsterdam 2002

The differences between SOEIS print and electronic communications will necessarily also be theorized in terms of function – where print served to integrate the SOEIS in terms of research output, electronic communications integrated the SOEIS group in a more individual and less formal way. In either case, our a priori assumptions are that these media differ in their modes of enabling communication, and that within the context of the SOEIS project each medium was designated a specific function. A perspective is gained whereby individual events compile to become collective actions, and the complex interrelation of these elements can be broken down into architectural, network, and systemic dimensions.

The use of the word in different contexts provides a focal point for questions concerning both individual and collective behaviour. Through analyzing word use as an event in the hermeneutic sense, keyword collocates and associates can be understood to integrate these events into sentences, articles, and indeed into larger social ramifications. By analysing how individuals use words in certain circumstances, one can theorize how the changes in word distribution over time may indicate fluctuations in processes of knowledge production that taken together may indicate a collective codification of knowledge.

The SOEIS print communications can be expected to be more codified than the electronic communications because of the inscribed nature of the output itself. Similar information has been exchanged over these two different information channels, and differences between the respective databases are therefore expected to be significant in revealing particular media biases. The print output is examined for its basic architectural features, including document size, word frequency, unique-word frequency and the ratio between the two; for network features such as the rate of word-use change; for systemic features such as phases of ‘pathway dependence’ or critical revisions in the dataset; and finally for differences in overall word distribution. Relevant here are the differences in consistency across the four print document sets representing the four six-month time periods of the project. In the next chapter we compare the consistency of print and electronic communications and seek to reveal the distinct biases of each medium in question. The analyses described herein should be understood in the context of the Architecture – Network – System theoretical triad outlined in Chapter II: Theoretical Grounding.

Research Questions & Expectations

There are three research questions, each stated with reference to a particular theoretical body as outlined by the theoretic triad of Architecture – Network – System.

The research questions and expectations of each analysis are introduced in tandem, as each is theoretically informed by a part of the triad. Each theoretic perspective enables additional questions to be posed and alternative, yet complementary, frames of analysis to be employed.

First, does the SOEIS print communication have a discernible architecture, and can particular qualities be identified with a decidedly print mode of communication? This question will be addressed through an analysis and comparison of the ratio of unique words to overall word occurrence in each of the four six-month time segments of the print database. This question is informed by the Medium Theory notion of the information network, and aims to characterize print word-use as central to the functioning of the SOEIS project, as distinct from electronic word-use, which is expected to perform a more supplementary role in the SOEIS project. Changes in the percentage of unique words used over the two years of the project should indicate processes of information codification, thereby revealing the network architecture of the SOEIS print communications. The primary expectation here is that from this basic architecture, qualities particular to print communication in general can be identified.

The print communications are suggestive of Mode I processes of knowledge production, whereas electronic communications appear more Mode II oriented, and this difference should be observable through a comparison of the results of the analysis of the print and electronic databases.

Second: are network properties of SOEIS print communication discernable by comparing the fluctuation of keyword use over the four time periods in the analysis?

Here the primary concern is to isolate words which can be identified as ‘key’, and to examine their distribution over the four respective time periods to find evidence of changing emphases in the cognitive orientation of the research group. This is accomplished by examining SOEIS print keyword distribution in several ways: by comparing the top 50 keywords, by isolating keywords through a comparison of the text of each time period with the full document set, and by comparing the text of each time period with that of every other. By identifying the increase, decrease or disappearance of keywords in each time period it is expected that changes in the cognitive orientation of the research group can be identified. The question is informed by Actor Network Theory and outlines the network features of print communication as a product of collective action. A key focus here is how individuals collectively contributed to produce texts and thereby formed information networks.

The primary expectation here is that a readily discernible pattern of keyword reoccurrence will be visible in the print dataset. A point of concern is the tracing of the transmission of keywords across the four time periods of the SOEIS project to locate evidence of discourse codification. If certain words or word patterns occur in particular time periods, or only in either the print or the electronic dataset, then certain assumptions can be made about the biases of each medium and its attributed function. Thus, within the print dataset it is expected that some words will be distributed more or less evenly across the time periods, but that certain words or word patterns will likely occur only in particular periods. By tracing which words occur when, it should be possible to discern shifting emphases and patterns of information codification in the print dataset.

The analysis of the network dimension of the SOEIS print communication therefore aims to reveal patterns of keyword distribution and thereby describe major developments in the concepts being exchanged, specifically by isolating patterns of relation between keywords. Importantly, keywords are positively or negatively emphasized, and this designation can provide insight into the general overlap between the texts, as changes in the position of keywords should reveal constancy or a relative increase or decrease in emphasis or importance. Collective cognitive changes can thereby be identified by observing fluctuations in keyword distribution. Unlike the electronic dataset, which is composed of individual email communications, the print document set is an amalgamation of co-authored texts. Nevertheless, changes in the cognitive orientation of the SOEIS project itself can be identified by observing changes in keyword distribution at the aggregate level.

Word collocates are also important here. Collocates are defined as words which occur near other words, and a distinction is normally drawn between associate collocates and neighbourhood collocates. Associate collocates are words which would necessarily be cognitively connected; for example, the word car might have as its associate collocates words like gasoline, road, or driver. These associations are difficult to locate using textual analysis programs, and are better suited to manual analyses of individual texts or a small group of texts. Neighbourhood collocates, by contrast, are words which occur frequently near other words, and these are easy to locate over large document sets. More precisely, neighbourhood collocates are words that occur within a designated window of analysis; here only those words which occur within five words of the query word are used for the analysis.1 Fluctuation in keyword collocates across the time series may be interpreted as evidence of changing cognitive biases. In this way we can appreciate the means through which individual communications aggregate and reflect collective orientation. This reinforces the Actor Network Theory contention that textual analyses can indeed be conceptualized using network approaches, and positions the analysis toward the systemic aspects of the aggregate data sets.
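The neighbourhood-collocate count described above can be illustrated with a minimal sketch. It is not the WordSmith procedure used in this study; the function name, the simple regular-expression tokenizer and the optional stop-word argument are assumptions made for illustration only. The window is measured on the unfiltered token sequence, which is itself an assumption, since the text does not say whether stop words are removed before or after the window is applied.

```python
import re
from collections import Counter


def neighbourhood_collocates(text, query, window=5, stop_words=frozenset()):
    """Count words found within `window` tokens to the left or right of `query`.

    Hypothetical helper for illustration; WordSmith's own collocate
    routine may tokenize and count differently.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != query:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            # Skip the query word itself and any stop-listed word.
            if j != i and tokens[j] not in stop_words:
                counts[tokens[j]] += 1
    return counts


sample = "the driver took the car down the road and the car needed gasoline"
collocates = neighbourhood_collocates(sample, "car", stop_words={"the", "and"})
```

Running the same function over each of the four period document sets and comparing the resulting counts would give a rough picture of collocate fluctuation across the time series.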

Finally, can we identify path dependencies in the SOEIS print dataset, indicating systemically that critical transitions were necessary for the communication to develop in the way that it did? Self-Organization Theory informs this approach; here the analysis aims to determine if the communicated information followed particular pathways over others, thereby indicating processes of critical revision. The analysis compares linear and non-linear associations between the time periods of the SOEIS communications, by comparing the expected information content of each time period with the previous state of the communication. With respect to this final research question it is expected that when examined for path dependency, points of critical transition will be observable, thereby indicating that each stage of the communication was necessary for the project’s productivity. As argued above, fluctuations in keyword use may lead one to expect that certain words will likely appear in later time periods, and this is interesting for the network analysis. By contrast, the systems analysis measures the expected information content by comparing the overall information dataflow in each period with every other, that is, by comparing the linear associations (first period to the second, second to the third, third to the fourth) with the non-linear (first to the fourth, first to the third, second to the fourth). It is expected that the SOEIS print communications can be shown to have an evolutionary component, precisely by observing the overall keyword distributions over the body of texts to determine if there have been crucial transitions in the ways that they are distributed over time.

1 The window of five words to the left and right of the query word was used in the collocate analysis as windows of less than five words often render too few collocates, and windows of more than five words deliver too wide a scope.
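The comparison of linear and non-linear associations can be sketched in code, under stated assumptions. The text above does not give a formula for the expected information content; the sketch assumes the Kullback-Leibler form, I = Σ q · log2(q / p), with p the word distribution of the earlier period and q that of the later one. The additive smoothing (needed so that words absent from one period do not produce a division by zero) and all function names are likewise assumptions for illustration, not the procedure used to produce the results reported here.

```python
import math
from collections import Counter


def distribution(tokens, vocabulary, smoothing=1.0):
    """Relative word frequencies over a shared vocabulary (additive smoothing assumed)."""
    counts = Counter(tokens)
    total = sum(counts[w] for w in vocabulary) + smoothing * len(vocabulary)
    return {w: (counts[w] + smoothing) / total for w in vocabulary}


def expected_information(posterior, prior):
    """Assumed measure: sum of q * log2(q / p), in bits."""
    return sum(q * math.log2(q / prior[w]) for w, q in posterior.items() if q > 0)


def pairwise_information(periods):
    """Expected information for every (earlier, later) pair of periods.

    Keys are 1-based period numbers: (1, 2), (2, 3) and (3, 4) are the
    linear associations; (1, 3), (2, 4) and (1, 4) are the non-linear ones.
    """
    vocabulary = sorted(set().union(*periods))
    dists = [distribution(p, vocabulary) for p in periods]
    n = len(dists)
    return {
        (i + 1, j + 1): expected_information(dists[j], dists[i])
        for i in range(n)
        for j in range(i + 1, n)
    }


# Toy token lists standing in for the four period word lists.
p1 = ["project", "application", "network", "project"]
p2 = ["project", "milestone", "network", "network"]
p3 = ["report", "milestone", "network", "system"]
p4 = ["results", "report", "system", "system"]
table = pairwise_information([p1, p2, p3, p4])
```

The resulting table holds one value per pair of periods, so the linear chain can be set against the non-linear jumps; how those values are then interpreted as evidence of a critical transition is left to the analysis itself.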

Results

Architecture

All documents were collated into four individual document sets corresponding to the four major periods of the SOEIS in which print writing was generated2, representing roughly four time periods of six months each. After the basic information for each of the six-month chunks was identified to provide an initial basis for comparison, an adapted stop-list was used to filter out the most commonly repeated words that are useless for the analysis (e.g. ‘if’, ‘and’, ‘but’).3 The collected and collated documents representing the four time periods were run through the WordSmith program. Each document set and the aggregated collection were examined for basic statistics including size, word count, unique-word count, the percentage of unique words and the standardized or mean ratio percentage of unique words.4 The program also produced the word lists that are used in the subsequent analyses of network and systemic properties. Table 4.1: Print Architecture below provides the basic descriptive information; note that P1, P2, P3 and P4 represent the four respective six-month time periods of the print communications of the SOEIS research project.

[Table 4.1: Print Architecture]

From this distribution it is observed that the final time period has almost 60% of the word occurrences, and over 40% of the unique words. Perhaps more interesting is the unique-word percentage, which shows a considerable increase in new word use in the third time period and a marked reduction in the last. However, the standardized type/token ratio (mean ratio %) remains consistent across the four time periods. Both observations may indicate evidence of codification in the print document set, but for different reasons. The general rise in the unique-word percentage over the first three time periods, and then the marked reduction in the final time period, is suggestive of a ‘cut and paste’ environment wherein considerable segments of previous writings (project milestones, for example) were reused in later submissions.5 The codification evident here is a matter of process; indeed the stabilized unique-word percentage in the print document set could be due to the influence of a single author or group of authors.6 By contrast, the mean ratio percent suggests an a priori codification, whereby the project is kept within certain structural boundaries, as is the case with EU-funded research projects, which are characterized by time constraints, project deliverables, and precise expectations about the format of final results.

2 The four periods correspond with the project Application, Milestones, Reports and final Results.

3 See Appendix A.1 for the complete list of words included in the stop-list. The results presented below are all necessarily abstractions from the larger set of results located in Appendix A.

4 Importantly, the unique word / word ratio percent varies widely in accordance with the length of the texts being analyzed. A 1,000-word article might have a type/token ratio of 40%; a shorter one might reach 70%; a million words will probably give a type/token ratio of about 2%, and so on. Such raw type/token information is arguably rather meaningless in most cases, so WordSmith uses a different strategy for this computation. The standardized or mean ratio percent is computed every 1,000 words as WordSmith goes through each text file. Thus, the ratio is calculated for the first 1,000 running words, then calculated afresh for the next 1,000, and so on through the entire document set. A running average is computed which provides an average type/token ratio based on consecutive 1,000-word chunks of text. The standardized type/token ratio is interpreted here as an indicator of the style of variation between unique words and word occurrence over the entire print dataset.
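The difference between the raw unique-word percentage and the standardized (mean ratio) percentage can be made concrete with a short sketch. It follows the chunk-by-chunk description given in footnote 4 and is not WordSmith's own code; the function names and the tokenizer are assumptions, as is the decision to discard a trailing chunk shorter than the chunk size.

```python
import re


def tokenize(text):
    """Lower-case word tokens (a simplified tokenizer, assumed for illustration)."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(tokens):
    """Unique words as a percentage of all running words."""
    return 100.0 * len(set(tokens)) / len(tokens) if tokens else 0.0


def standardized_ttr(tokens, chunk_size=1000):
    """Mean type/token ratio over consecutive full chunks of `chunk_size` tokens.

    Returns None when the text is shorter than one chunk. Dropping the
    final partial chunk is an assumption of this sketch.
    """
    ratios = [
        type_token_ratio(tokens[start:start + chunk_size])
        for start in range(0, len(tokens) - chunk_size + 1, chunk_size)
    ]
    return sum(ratios) / len(ratios) if ratios else None


# Toy example with a chunk size of 4: the same four words repeated twice.
toy = ["codes", "media", "print", "email"] * 2
raw = type_token_ratio(toy)               # falls as the text grows longer
mean_ratio = standardized_ttr(toy, 4)     # computed afresh for each chunk
```

In the toy example the raw ratio falls because the second half repeats the first, while the chunk-wise ratio does not. This is the same contrast observed above, where reuse of earlier material lowers the unique-word percentage of the final period although the mean ratio percent stays consistent.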

