Andrés Gregor Zelman, The University of Amsterdam, 2002. Mediated Communication and the Evolving Science System: Mapping the Network Architecture of ...
Briefly defined, textual analysis is the examination of texts for underlying structure; it entails the search for patterns of word use, which are then used to determine the cognitive influences upon an author’s text. Hermeneutic procedures are often used to settle arguments concerning authorship; obvious examples include the Testaments and the Shakespearean works. Medieval scholarship aimed to find parallels between the Old and New Testaments by identifying places where words used in the Old Testament foreshadow a passage in the New (Bradley & Rockwell, 1997). Indeed, many of Christ’s parables referred to older teachings appearing in the Old Testament. It became clear that it would be useful to group words into categories and then develop indexes pointing to occurrences of those words in the different books of the Bible. Using this technique, early hermeneutic researchers formed an understanding of the various influences upon Christ’s teachings. Formally this process is known as thematic concordance; it involves finding the co-occurrence of names, places, events, et cetera, in the Bible. Hermeneutics has a rich history that includes not only examinations of biblical texts but also the extensive analysis of other texts, such as Homer’s Iliad and Odyssey (Luria 1976; Ong 1977, 1982), the Shakespearean and Joycean works (Theall 1997, 2000), and, more recently, discourse analysis of a myriad of accessible data sets (Mehta et al.
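The indexing technique described above can be sketched in code. The following is a minimal illustration, not a reconstruction of any historical tool: it builds a word index over a set of "books" so that occurrences of a term, and the texts in which it recurs, can be looked up across a corpus. The book names and sample phrases are invented for the example.

```python
from collections import defaultdict

def build_index(books):
    """Map each word to the (book, position) pairs where it occurs,
    in the spirit of a medieval concordance."""
    index = defaultdict(list)
    for book, text in books.items():
        for pos, word in enumerate(text.lower().split()):
            index[word].append((book, pos))
    return index

def books_containing(index, word):
    """Which books contain the word at all -- the basis for finding
    parallel passages across texts."""
    return {book for book, _ in index[word.lower()]}
```

A thematic concordance in the sense described above would then group related words into a category and intersect the sets of books each member occurs in.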
The current analysis concerns changes in textual expression from print writing to electronic writing. Textual analysis techniques are employed to determine the similarities and differences between print and electronic modes of SOEIS communication. Here it is important to understand that the historical development and use of textual analysis techniques stretches back to scribal culture and extends to present-day analyses enhanced by computer-aided techniques. Computer-assisted textual analysis is thus more than the ‘find’ function used to locate words in standard word-processing programs. It is characterized by the ability to search large texts quickly by creating electronic indexes to locate and reorganize information. With such programs one can both conduct complex searches for lists of words or patterns of association and present the results in a way that suits the study of texts. That is, the output of textual analysis provides a visual means of interpreting results, for example by displaying Key Words In Context (KWIC).
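A KWIC display of the kind just mentioned can be sketched as follows. This is an illustrative routine rather than the output format of any particular package: each occurrence of a keyword is listed with a fixed window of surrounding context, keyword aligned in a centre column.

```python
def kwic(text, keyword, width=20):
    """List every occurrence of `keyword` with `width` characters of
    left and right context, keyword aligned in a centre column."""
    rows = []
    lower, key = text.lower(), keyword.lower()
    i = lower.find(key)
    while i != -1:
        left = text[max(0, i - width):i]
        right = text[i + len(key):i + len(key) + width]
        rows.append(f"{left:>{width}}[{text[i:i + len(key)]}]{right}")
        i = lower.find(key, i + len(key))
    return rows
```

Sorting such rows by the word to the right or left of the keyword is what yields the "patterns of association" referred to above.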
In the 1940s the study of texts became aided by machinery; the first such aid was the machine-generated concordance, which uses essentially the same methods of co-occurrence indexing developed in the Middle Ages. In the mid-1940s, an Italian scholar named Roberto Busa was examining the use of the term ‘presence’ in the writings of Saint Thomas Aquinas.5 After manually searching the texts he realized that for Aquinas the meaning of the word was connected with the use of the preposition ‘in’. He came to believe that function words (such as ‘in’ or ‘sum’) provided clues for understanding Aquinas’ conceptual world through deconstructing the words he used to describe it. By the late 1940s Busa had begun to create a complete concordance of the more than 10 million words in Aquinas’ writings. This was done using punch cards and card-sorting machines. The project lasted over 30 years, since the technology then available for the electronic transfer of text was still lacking. In the 1970s technology became available for this type of large-scale project, and with the help of large IBM mainframe computers and computer-driven typesetting equipment the project was finally completed.6
By the late 1970s computing had become well established within the humanities, and particularly within the field of hermeneutics. It became apparent that if the analysis of texts was to evolve, two developments were necessary: in order to compare texts there was the need for a machine-readable format, and in order to share results the need arose for a standardized software package that many could easily use.
5 For additional details see Bradley & Rockwell, “TactWeb 1.0: ‘Midsummer’ Workbook”, http://tactweb.humanities.mcmaster.ca/tactweb/doc/catahist.htm, 1997.
6 It is estimated that over 1 million person-hours were used inputting the text and creating the indexes (Bradley & Rockwell, 1997).
Oxford University was an early leader in this regard with two major projects. The first is the Oxford Text Archive, through which all scholars are invited to submit electronic texts. The second is the Oxford Concordance Program, which serves to provide links between all submitted texts. It is important to realize that the Oxford Text Archive and the Oxford Concordance Program are primarily indexing tools, not textual analysis programs in and of themselves, yet their development made more efficient and effective computer-assisted textual analysis possible.
In the 1980s, computer-assisted textual analysis took another great leap with the advent of desktop computing and standardized word processing. The first standard software for computer-assisted textual analysis in the humanities was the Brigham Young Concordance (BYC) program, later commercially packaged as WordCruncher. The 1980s and 1990s saw the development of a myriad of other textual analysis programs, thereby extending the possibilities of bibliometric analysis techniques in understanding media bias. Bibliometrics can be understood as the quantitative study of patterns in individual texts and of their correlation with each other – it is hermeneutics enhanced by metrics, or specifically, by bibliographic coupling.7 In the 1990s enormous effort was put into the development of sophisticated techniques for the analysis of textual patterns. There is an emerging discourse that addresses new and sophisticated means of mapping processes of knowledge production. Indeed, significant movements are being made towards understanding the visualization of text itself as the new research object. That is to say, textual analysts are largely dependent upon how effective they are in visually representing their findings; different representations of research results will generate different understandings of a text’s significance.8 In the last several years the central arguments concerning scientific visualization have focused on the development of new software tools to aid in analysis. For this analysis the WordSmith program of the Oxford University Press was selected for its ease of use, data exportability, and sophisticated sorting logics.
Leydesdorff provides arguments both for (1989) and against (1997) the use of co-word analysis to map the intellectual development of the sciences. In the case of the former, indicators of intellectual structure and organization were found by comparing titles of scientific articles; in the case of the latter, a comparison of biotechnology articles revealed a document structure at the micro level, but at the level of the document set the structure was no longer discernible. Thus, Leydesdorff (1997) argues that co-word analysis cannot adequately map intellectual organization – this argument will be employed as a methodological caveat. This study will illustrate how ...
7 The reader will note that bibliographic coupling has a different meaning in citation analysis; here the term is used to emphasize the comparison of textual patterns across many texts, not simply citation patterns.
8 For details concerning the importance of visualization to the field of textual analysis see: Bradley (1991), Bradley & Rockwell (1992, 1994) and Monger & Rockwell (1999).
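The basic operation of co-word analysis can be illustrated with a minimal sketch: counting how often pairs of words co-occur within article titles. The titles below are invented for the example, and real co-word studies add normalization and mapping steps that are omitted here; the sketch shows only the raw pair counting on which such maps rest.

```python
from collections import Counter
from itertools import combinations

def coword_counts(titles):
    """Count how often each pair of distinct words co-occurs
    within a single title."""
    counts = Counter()
    for title in titles:
        # Sort the deduplicated words so each pair has one canonical key.
        words = sorted(set(title.lower().split()))
        counts.update(combinations(words, 2))
    return counts
```

Leydesdorff’s (1997) caveat can be read against exactly this construction: the pair counts are well defined for each document, but aggregating them over a document set need not recover any stable intellectual structure.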
Above and beyond mere textual analysis, bibliometric techniques have aided many recent attempts to develop more sophisticated means of measuring and understanding the changes in our mode of information and the subsequent changes in our processes of knowledge production. In particular, Hicks & Katz (1997) argue for the use of bibliometric indicators to understand changes in the emerging knowledge-based economy, Hjortgaard (1997) imported bibliometric techniques for analysing online research publications, and Larson (1996) used bibliometric indicators to map the World Wide Web as an architecture, or ‘intellectual structure’.
Scientometrics operates on the same principles of co-analysis; it is basically bibliometrics applied to scientific texts – a study of quantitative patterns in individual texts and in their correlation. In a word, it is hermeneutics enhanced by metrics, but it is distinct from bibliometric analyses in that formal publications can be understood as indicators of scientific codification (Price, 1970). Publications refer to each other, and this leads to networks of scientific papers (Price, 1965), which thereby provide a geography of science (Small & Garfield, 1985). Importantly, these networks can be examined to distinguish between academic specialties (Leydesdorff & Cozzens, 1993) and to assess national research performance (Leydesdorff & Gauthier, 1996).
Networks are established by the researcher in deciding what to measure, be it nations, institutions, disciplines, or otherwise. Indeed, Small (1973) argues for a co-citation analysis whereby the citation of two texts by a third is registered and a network delineated. Similarly, author co-citation is a measure of the frequency with which two authors are cited together by others, and this provides a measure of proximity useful in mapping these relationships (White & Griffith, 1981).
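Small’s co-citation measure rests on the same pair-counting principle, applied to reference lists rather than title words: two documents are co-cited whenever a third paper’s reference list contains both. The sketch below is illustrative, with invented document labels; actual analyses normalize these raw counts before mapping proximities.

```python
from collections import Counter
from itertools import combinations

def cocitation_counts(reference_lists):
    """For each pair of cited documents, count how many citing papers
    reference both (Small-style co-citation strength)."""
    counts = Counter()
    for refs in reference_lists:
        # Each citing paper contributes one co-citation per cited pair.
        counts.update(combinations(sorted(set(refs)), 2))
    return counts
```

Author co-citation, as in White & Griffith (1981), follows the same scheme with cited authors substituted for cited documents.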
Scientometricians not only concern themselves with the normative aspect of modelling these networks, but contend with the symbolic dimensions of publication behaviour as well. For example, Small (1978) argued that cited documents operate as concept symbols, and Gilbert (1984) argued that references operate as property or as a means of persuasion. Further, Leydesdorff & Amsterdamska (1990) have argued that scientists’ subjective reasons for referencing another paper are not equivalent to the argumentative uses normally assumed for cited articles.
As the field of scientometrics matures, there is a notable increase in attempts to create a macro theory of citation behaviour. Van den Besselaar (2000) aimed to discern relationships between scientometrics and communications theory by developing theoretically informed indicators. Leydesdorff & Wouters (1999) argued for an emerging macro theory of citation culture, and Wouters (1999) provides a thorough history of scientometrics. Finally, Fujigaki (1998) argues that the citation system operates as a recursive network involving a process of continual re-evaluation and knowledge accumulation. It is notable that scientometric methods are increasingly imported into other domains to understand the networked dynamics of other social phenomena. Rousseau’s ‘Sitations’ (1997) offers a salient example whereby bibliometric and scientometric methods are combined to understand relationships between linked websites (and hence an indicator of an emerging cybermetrics).
Like bibliometrics and scientometrics, cybermetrics shares a common emphasis on co-analysis. Cybermetrics is used here as a broad term incorporating a number of notable developments; namely, infometrics, webometrics, and cybermetrics in the narrower sense.
Infometrics was formalized as a research agenda in the 1980s and differs from bibliometrics and scientometrics in that those fields are restricted to communications media and to the sciences, respectively. Infometrics is limited neither to specific media nor to the sciences; it refers to the quantitative analysis and modelling of a myriad of different information sources, including credit databases, information flow on the Internet, and genetic histories, for such purposes as demographic monitoring, enhanced surveillance of workplace activities, and insurance eligibility (Egghe & Rousseau, 1995). Abraham (1996, 1997) develops the field of what he terms ‘Webometry’, whereby he aims to sort the complexity of the World Wide Web by creating a chronotopography.
Almind and Ingwersen (1997) use infometric methods for webometric analyses, and Ingwersen (1998) offers a means of calculating Web Impact Factors by adapting infometric methods to test national, sector, and institutional impact factors.
Data sources other than those obtained infometrically are also relevant here. In particular, there are the cybermetric approaches of Korenman & Wyatt’s (1996) analysis of group dynamics within email lists, Hernandez-Borges’ (1997) comparative study of pediatric mailing lists on the Internet, and Matzat’s (1998) analysis of informal academic communication via Internet mailing lists. These approaches all use email communications as the hermeneutic unit of analysis, much as scientometric analyses employ scientific publications. Finally, some notable work has been done recently with respect to mapping the networked dimensions of online web debates (Rogers & Zelman, 2002; Rogers & Marres, 2000), Triple Helix relations on the web (Leydesdorff, 2001; Boudourides & Sigrist, 1999), and purely quantitative approaches such as Kugiumtzis & Boudourides’ (1998) analysis of Internet ping data.
These approaches have each addressed either the print medium (predominantly in the case of bibliometrics and scientometrics) or the electronic medium (in the case of cybermetrics). However, they lack sufficient comparison of media forms with one another. One would expect that an analysis of media impact on processes of knowledge production would be concerned with the nature of the medium itself in biasing those processes. In this study a number of metrically oriented approaches have been combined and employed to discern notable differences between the print medium and the electronic medium; in this way the analysis aims to contribute a new perspective to this discourse. Further, while representative of an extremely interesting and burgeoning field, the literature outlined above lacks sufficient theoretical underpinnings to make the studies relevant to larger audiences. In part this study is motivated by that lack, and it thereby weaves together distinct theoretical perspectives with empirical analyses to provide a more holistic account of the changing science system.