«Andrés Gregor Zelman The University of Amsterdam 2002 ii Mediated Communication and the Evolving Science System: Mapping the Network Architecture of ...»
The four six-month time chunks of the print communications were thereby grouped and analyzed as a time series to determine features of their collective architecture. The expectations of the first analysis were confirmed: the analysis not only identified evidence of information codification, but revealed two distinct forms of codification. The architectural statistics about the groupings of texts have thereby served to ground the analysis of print writing as a distinct medium of communication employed throughout the course of the SOEIS research project, one with evidence of both a priori and processual codification.
The analysis of the print communications of the SOEIS project entailed two distinct stages beyond the basic statistics described above: the examination of the texts for network properties of collective co-word and collocate usage, and the search for evidence of systems transformation. The results of the network- and system-oriented analyses are described below.
Network
As outlined above, the expectations of this network analysis revolve around observed fluctuations in the distribution of keywords. Changes in the distribution of words are expected to reveal changes in the emphasis of the group over the 4 time periods, and thereby show changes in its cognitive orientation. Three keyword analyses are described below: the first isolates the top 50 keywords in the four texts, the second compares each text with the print dataset (P1 with P-All, P2 with P-All, and so on), and the last compares each text sequentially (P1 by P2, P2 by P3, and P3 by P4).
The first keyword isolation observed the most commonly occurring words in each time period. Ten salient examples from the top 50 keywords occurring in each time period are selected on the basis of their relevance to the SOEIS project in order to illustrate the range of topics found in its distribution; they are shown below in Table 4.2: Top Print Keywords.
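The keyword isolation described above can be sketched in a few lines of Python. This is a minimal illustration only, not the WordSmith procedure itself: the tokenizer, the tiny stoplist, and the sample sentence are all hypothetical stand-ins, since the actual stoplist and texts are not reproduced here.

```python
import re
from collections import Counter

# Hypothetical miniature stoplist; the stoplist actually used is not given in the text.
STOPLIST = {"the", "of", "and", "a", "in", "to", "is", "for", "on", "are"}

def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def top_keywords(text, n=50, stoplist=STOPLIST):
    """Return the n most frequent non-stoplist words with their counts."""
    counts = Counter(t for t in tokenize(text) if t not in stoplist)
    return counts.most_common(n)

# Invented sample standing in for one time period (e.g. P1).
sample = "The project task is the analysis of data. Data on firms and data on networks."
print(top_keywords(sample, n=3))
```

Applied to each of the four time periods in turn, a function of this kind would yield one ranked list per period, from which salient examples could then be selected by hand.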
5 This cut and paste phenomenon is a normal procedure in EU funded research projects in general, as identified in Leydesdorff “Scientometric Indicators and the Evaluation of Research” (forthcoming, 2003).
6 The final project reports were edited as one unified text prior to the publication of the final results.
The codification of keyword use may therefore be related to the bias of an individual or group of individuals.
These examples have been selected to highlight the most relevant commonly occurring keywords and their different distributions across the time periods. For example, the words European, Information and Analysis appear to occur with proportionately increasing frequency over the course of the project, as one would expect given their status as project title words; the slight exception is that Information occurs somewhat more often in the second period than expected.
Likewise, the word Data occurs more often than expected in the second time period but its use decreases by the end of the project.
The words Networks, Systems, Task and Project occur very often in the first time period, and have relatively high occurrence rates in the final time period. Indeed, in the context of the SOEIS project one would expect the words Task and Project to occur with a high frequency in the first period, which included the text of the initial application to the EU. However, as with the words Networks and Systems, one might expect them to occur with a higher frequency in the final period, with the closure of the project. It is also quite notable that the word Firms is included in the top 50 keywords of the print dataset, despite the unusually low rates of occurrence in both the first and third time periods.
The first of the keyword analyses conforms to our expectations that a general sense of the project’s cognitive orientation can be gleaned by comparing the most frequently occurring words. Many examples can be found by comparing the wordlists alone to understand the parameters of the shared informational worlds comprising the SOEIS research project. However, it is arguable that the most commonly occurring words in any textual analysis reveal less about the exchange of critical topics than other less frequently used words, since collectively the words exchanged are so general in the context of the specific research project.7 The limitation identified is that the most commonly occurring keywords found using this approach tend to be project title words, which tell us very little about the actual content of the communications. The next analysis therefore compares these lists for the most commonly occurring shared words; this approach revealed more interesting emphases in the word distribution patterns.
7 See: Gerard Salton, “Some Research Problems in Automatic Information Retrieval,” International Conference on Research and Development in Information Retrieval (SIGIR) Proceedings, 252-263, 1983.
The additional analyses were performed on the print document set using two different approaches: the first compared P1, P2, P3, and P4 against P-All, yielding four lists of varying length, and the second compared P1 to P2, P2 to P3, and P3 to P4, yielding three wordlists representing the periods of transition between the four defined segments. Importantly, the shared keywords revealed by comparing wordlists are marked as either positive (+) or negative (–), representing those words which occur more or less often than expected, given the texts that are compared (P1 compared with P2, etc.). The equal (=) signs do not indicate that these words are absent from the texts, but that when the two texts under analysis were compared, the words were not designated as key.8
The individual wordlists were compared with the full document set (P-All). Twelve salient examples are selected here on the basis of their direct relevance to the research project. They are presented below in Table 4.3: Print x All Documents; see Appendix A.4 for the complete results.
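The positive (+), negative (–) and equal (=) designations can be illustrated with a short Python sketch. The text does not state which keyness statistic was used, so the sketch assumes a Dunning-style log-likelihood score with a cut-off of 3.84 (the chi-square critical value for p < 0.05 at one degree of freedom); the function names and all word counts are hypothetical.

```python
import math
from collections import Counter

def log_likelihood(a, b, n1, n2):
    """Log-likelihood score for a word seen a times in n1 tokens and b times in n2 tokens."""
    e1 = n1 * (a + b) / (n1 + n2)  # expected count in the study text
    e2 = n2 * (a + b) / (n1 + n2)  # expected count in the reference text
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll

def keyness(study, reference, threshold=3.84):
    """Label each word '+', '-' or '=' relative to the reference counts.

    The threshold is an assumption, not a value taken from the source.
    """
    n1, n2 = sum(study.values()), sum(reference.values())
    labels = {}
    for word in set(study) | set(reference):
        a, b = study[word], reference[word]
        if log_likelihood(a, b, n1, n2) < threshold:
            labels[word] = "="   # not designated as key
        elif a / n1 > b / n2:
            labels[word] = "+"   # more frequent than expected
        else:
            labels[word] = "-"   # less frequent than expected
    return labels

# Invented counts standing in for one period (P2) and the full set (P-All).
p2 = Counter({"firms": 40, "task": 10, "other": 950})
p_all = Counter({"firms": 45, "task": 40, "models": 30, "other": 3885})
labels = keyness(p2, p_all)
print(labels["firms"], labels["task"], labels["models"])
```

The same function serves both approaches: passing a period and P-All gives the first comparison, while passing two consecutive periods (P1 and P2, and so on) gives the transitional comparison.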
When compared by frequency and by distribution, the positively or negatively emphasized keyword occurrence provides insight into the cognitive orientation of the different texts. It is interesting to note which words are (or are not) presented as keywords in the different time periods. For example, it is observed that the keywords Leydesdorff and Models have a peculiar distribution whereby they are not deemed central to either P2 or P3 when compared with P-All. Moreover, both words are negatively denoted in the final time period, and this reveals something about the SOEIS project. To a project participant the reason is clear: Dr. Leydesdorff was a SOEIS project co-ordinator and a participant during its first year, and his presence explains in part why there is such a high occurrence of the word Models. In the latter half of the SOEIS project the project assumed a slightly different perspective, as evidenced by the high frequency and positive designation of the word Discourse in the final time period, as compared to the negatively denoted Leydesdorff and Models in the same time period.
8 The WordSmith program refers to the words that remain in a wordlist after filtering with a stoplist as keywords, and to the words identified by comparing texts as key-keywords. Here, words located by comparing texts are referred to simply as keywords.
Other interesting examples gained from the above distribution are the keywords Communication, Networks, and Systems. It is observed that these are deemed keywords in the first two time periods, but are absent from the last two. The three keywords are all negatively denoted in P2, suggesting that in this part of the project there was a particularly different focus: one concerning the use of the word Firms, which occurs with an extremely high frequency in the second time period, while exhibiting a particularly low frequency in the first and third periods. We learn that the occurrence of Firms was emphasized in the second time period, revealing the cognitive orientation of the group to have been organized around this particular research topic, and that in terms of its overall impact, the word Firms did not comprise a central part of the final output.9 Additionally, it is interesting that both Self and Organization were deemed keywords in the second and third time periods but not in the first and last. This shows that when compared to the entire document set these words occurred less frequently than would be expected given their status as project title words. Task, by contrast, was the only isolated keyword that occurred in all time periods, as one would expect, since the word serves to collate the individual reports. It is also notable that the keyword Project does not occur in the second time period and that it is negatively denoted in the final time period. This suggests a decreasing importance of the use of the word.
Finally, the keyword Data was found to be key in the first period, where it was negatively denoted, suggesting little use of the term here; but it was found to be positively denoted in the third period, revealing that during this period the collection, exchange or discussion of data was prevalent. These examples confirm the expectation that comparing wordlists can reveal different emphases of the research project during different time periods.
Table 4.4: Print x Each Document, below, shows the results of comparing the word lists from each time period with each other, not the full document set.
9 As with many other examples used in this analysis, this can be confirmed simply by referring back to the original documents. However, the aim of this analysis is in part to illustrate the ability to represent these features without direct reference to the specific texts under analysis.
This second approach, whereby the individual wordlists were compared with each other, emphasizes the transmission of word use as a process over the time periods.
Several interesting examples were found using this approach. Most notably, many keywords appear to decrease in emphasis during the middle of the research project, only to increase in relative importance in the final period; examples include Amsterdam, Empirical, Methods, Partners, and Task. By contrast, there are keywords which clearly decrease in emphasis over the course of the project: Models, Organization, Project and Work. These latter examples occur with more frequency in the middle transition (from P2 to P3), yet in every case they are negatively denoted, thereby suggesting they occurred with much more frequency in the first and final texts in the analysis.
These indicators suggest that the second and third texts lacked a certain orientation that was central to the first and final texts, as exemplified by the decrease in importance of many theory-oriented keywords in the second transition. The fact that the project proposal and final results comprise the first and last time periods, respectively, suggests a particular mode of communication whereby the keywords which gave form to the project are maintained, but the keywords in the middle transition (represented by the middle time periods) suggest a different cognitive emphasis. This is shown by the high occurrence of process-oriented keywords in the middle transition (Models, Organization, Project and Work), as compared with those project-oriented words which emphasize the project’s functional component (Amsterdam, Empirical, Methods, Partners, and Task). It may be, however, that the unusually high occurrence of Firms in the second time period somehow deemphasizes the weight of the many negatively denoted keywords identified in the P2 to P3 transition phase, given its crucial role in the second transition.
It was also found in this transitional analysis that the word Analysis was not given keyword status in the first transition stage, but occurred with a very high frequency in the second transition (from P2 to P3) and relatively high frequency in the third transition (P3 to P4). The word Progress was only given keyword status in the middle transition (P2 to P3), and is negatively denoted. This shows that, like the function-oriented words identified above, the word Progress was not designated as central in the middle of the research project but was evidently an important statement in the beginning and end of the project.
These procedures provided an additional filtering of the most frequently occurring words in the original wordlists, and revealed important changes in emphasis. The expectation that the cognitive orientation of the SOEIS project could be shown using different forms of keyword analyses has been confirmed. The fluctuating occurrence of different words designated as ‘key’ in each approach has clearly illustrated these changes. However, while the observation of individual word occurrence provides a sense of cognitive orientation, capturing the meaning of words demands an observation of words in context. As argued by the Structuralist, Poststructuralist, and Structurational approaches to understanding meaning, words only mean by association with other words.
To capture the meaning of the cognitive realm revealed by the network analysis of keyword distribution described above, keyword collocates were sought out for their possible significance. A collocate analysis was performed on the four print document sets using a handful of keywords isolated in the preceding analysis. The keyword Firms was selected given its primacy in P2, and Organization was selected on the basis of its centrality as a title keyword and its designation as key in P2 and P3.
Task and Project were selected because of their specific centrality to the operation of the SOEIS research project. The analysis entailed the examination of the original texts (P1, P2, P3, and P4) for the frequently occurring neighbourhood collocates of these four query keywords.
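A neighbourhood collocate search of this kind can be sketched as follows. This is a simplified illustration under stated assumptions: the window size (`span`), the tokenizer, and the sample text are hypothetical, and the window is allowed to cross sentence boundaries, which a dedicated concordance tool may or may not do.

```python
import re
from collections import Counter

def collocates(text, query, span=5, stoplist=frozenset()):
    """Count words occurring within `span` tokens on either side of `query`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    query = query.lower()
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != query:
            continue
        # Window boundaries, clipped to the start and end of the text.
        lo, hi = max(0, i - span), min(len(tokens), i + span + 1)
        for j in range(lo, hi):
            if j != i and tokens[j] not in stoplist:
                counts[tokens[j]] += 1
    return counts

# Invented sample text; the query word is one of the four named in the analysis.
text = "Small firms innovate. Large firms innovate slowly. Firms in networks innovate."
neighbours = collocates(text, "Firms", span=1, stoplist={"in"})
print(neighbours.most_common(3))
```

Running the function once per query word (Firms, Organization, Task, Project) and per text (P1 to P4) would give the frequency-ranked neighbourhoods that the collocate analysis compares.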
The occurrence of each query word was first plotted over the print document set, and visual displays were generated to show the distribution.10 The results of the initial analysis are presented below in Table 4.5: Distribution of Print Keywords for the Collocate Analysis. Here the query words are measured for their distribution across each document set, and for their relative distribution per 1000 words (as a standardized mean).
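The standardized measure is simple arithmetic: the raw count of the query word divided by the total number of words in the text, multiplied by 1000. A minimal Python sketch follows, assuming a plain alphabetic tokenizer and an invented sample sentence; neither is taken from the source.

```python
import re

def per_thousand(text, query):
    """Return (raw count, occurrences per 1000 words) for a query word."""
    tokens = re.findall(r"[a-z]+", text.lower())
    hits = tokens.count(query.lower())
    rate = 1000 * hits / len(tokens) if tokens else 0.0
    return hits, rate

# Invented eight-word sample: "task" occurs twice, i.e. 250 per 1000 words.
sample_text = "Task one is done. Task two is pending."
print(per_thousand(sample_text, "Task"))
```

Normalizing in this way allows texts of different lengths, such as the four time periods, to be compared on a common scale.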