Andrés Gregor Zelman, Mediated Communication and the Evolving Science System: Mapping the Network Architecture of ... The University of Amsterdam, 2002.
The remainder of this section provides a brief description of the limitations perceived in the analyses performed in the previous chapters. With respect to the textual analyses of Chapters IV and V, the individual datasets proved relatively easy to overlay, despite the differing origin and termination dates of the print and electronic communications. The four time periods were delineated according to the project application, the two milestone report periods and the final report, as contained in the print communications dataset. The limitation in the data was that the SOEIS project output was collectively written as print documents, whereas the emails that comprised the project mailing lists were individually written and collated into the same four six-month periods. The email documents were filtered manually for redundancy, as no tool was available to perform the filtering.
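The redundancy filtering that was performed manually here could be automated. The following is a minimal sketch, in Python rather than any tool used in this study, of removing duplicate emails by comparing normalized message bodies; the function names are illustrative assumptions.

```python
import hashlib
import re

def normalize(body: str) -> str:
    """Strip quoted lines and collapse whitespace so that re-sent or
    forwarded copies of the same message compare as equal."""
    lines = [ln for ln in body.splitlines() if not ln.lstrip().startswith(">")]
    return re.sub(r"\s+", " ", " ".join(lines)).strip().lower()

def filter_redundant(emails):
    """Return the emails with normalized duplicates removed,
    preserving the original order of the archive."""
    seen, unique = set(), []
    for mail in emails:
        digest = hashlib.sha1(normalize(mail).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(mail)
    return unique
```

A hash of the normalized body, rather than the raw text, is used so that trivial differences in quoting and whitespace do not mask redundancy.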
Where the analysis of the print and electronic communications revealed the internal communicative dynamics of the SOEIS, the publication analysis and mailing list analysis uncovered its external communicative dynamics. Both analyses imparted valuable information concerning the larger scientific environment in which the SOEIS operated, yet the two databases were not truly comparable as print versus electronic modes of publication. It would be valuable if a means of collecting comparable datasets were developed. One could conceivably compare traditional journal publication – as recorded by the Web of Science in the Science Citation Index (SCI), Social Science Citation Index (SSCI) and the Arts & Humanities Index (A&HI) – with indexes of online articles such as the NEC Research Index [2], though it is likely to be some time before the latter becomes an academic standard. Nevertheless, efforts are underway to map the hyperlinking between websites in a manner similar to scientometric approaches (Rogers, 2000). The mailing list analysis performed here observed threaded messages, not online journals, indexes or individual web pages. Nevertheless, comparisons of email archives or of the content of web pages would present new and interesting research avenues. Webmasters and mailing list providers alike employ a myriad of different programs for enabling, indexing and archiving messages – presenting the problem that there is no standardized form of archiving mailing list output, or of copying and filtering a web site. In principle it is possible to write a program to perform the necessary standardization of indexing, archival and retrieval functions, but a standardized mark-up language among information providers would still be needed.
There is a general consensus that a move towards an encoding standard is needed. In 1994 the Text Encoding Initiative (TEI) guidelines were published.
However, the flexibility of TEI permits individual scholars to create their own marking schemes, making it much more difficult to develop a standard data-interchange format. Future developers will need to decide whether to provide a fixed standard mark-up (like TEI) or to build on XML (eXtensible Mark-up Language). Such standardization of mailing list archival functions, for example, would make it far easier to compare larger datasets. If future researchers were to look for evidence of self-organized criticality in Internet mailing lists, for example, the examination would demand that a much greater amount of raw data be collected.
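To make the idea of a standardized interchange format concrete, the following sketch serializes one archived mailing-list message into a uniform XML record using Python's standard library. The element names are assumptions for illustration only, not a published schema such as TEI.

```python
import xml.etree.ElementTree as ET

def message_record(sender: str, date: str, subject: str, body: str) -> str:
    """Build one standardized XML record for an archived message."""
    msg = ET.Element("message")
    ET.SubElement(msg, "sender").text = sender
    ET.SubElement(msg, "date").text = date
    ET.SubElement(msg, "subject").text = subject
    ET.SubElement(msg, "body").text = body
    return ET.tostring(msg, encoding="unicode")
```

Once every provider's archive is converted into such uniform records, archives from different mailing list software become directly comparable units of analysis.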
Moreover, an analysis performed on a greater number of emails would reveal even richer keyword distributions.
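The kind of keyword distribution meant here can be sketched as a simple rank-frequency count over an email corpus; with more emails the tail of this distribution becomes correspondingly richer. This is a minimal illustration, not the tool used in the preceding chapters.

```python
from collections import Counter
import re

def keyword_distribution(texts):
    """Return (keyword, frequency) pairs ranked by frequency."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts.most_common()
```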
The research performed in this dissertation has clearly shown that effective future research of this sort would benefit from a software program that aids in the performance and integration of a range of related tasks. While the lack of standardized formatting of email archives and the like remains a limitation, it is not an obstacle; indeed, the recent acceptance of XML as a de facto Internet standard holds promise. Thus, some of the most interesting insights obtained during this research can be communicated here as a set of recommended features for a modularized software program that would enable such integrated analyses.
MAT Program Design:
The core of this final discussion and of the suggestions for future research lies in a proposal for the development of a modularized software program – a Media Analysis Toolbox (MAT) – to aid in the performance and integration of particular tasks which have proven problematic in this research. This line of argumentation is pursued purely as an intellectual enterprise, thereby providing another type of reflection upon the understanding gained through this study. The design of any academic software tool would necessarily have to include several basic features, such as an indexing program to record, archive and retrieve one's data, and an export function to enable further interpretation of the results in standard statistical and graphing programs. Below I outline major considerations concerning the design and integration of 3 core tools and 3 periphery tools, collectively containing a range of distinct sub-modules, all encompassed under the general rubric of MAT. [3] The program could be written in Visual Basic, in order to permit the easy development of additional modules. The program under consideration is open-ended and, in principle, infinitely expandable.
[2] The Computer Science oriented Research Index is located at http://citeseer.nj.nec.com/cs
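The open-ended, modular design described above can be sketched as a simple registry into which tools are plugged and replaced without changing the core. The dissertation proposes Visual Basic; Python is used here only to keep the sketch concise, and the class and tool names are illustrative.

```python
class MediaAnalysisToolbox:
    """Core registry: tools are added or replaced by name at run time."""

    def __init__(self):
        self._tools = {}

    def register(self, name, tool):
        """Add or replace a tool module under a short name (e.g. 'TAT')."""
        self._tools[name] = tool

    def run(self, name, data):
        """Dispatch data to the named tool and return its result."""
        return self._tools[name](data)

mat = MediaAnalysisToolbox()
# A toy 'textual analysis' module: count the tokens in a text.
mat.register("TAT", lambda text: len(text.split()))
```

Because tools are looked up by name, new modules (a citation tool, a list tool) can be registered later without modifying the toolbox itself – the "infinitely expandable" property the design calls for.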
The need to develop new forms of scientific instrumentation in order to understand the dynamics of the technological shift from print to electronic media as the predominant medium of exchange has never been more salient. The development of electronic publishing, sophisticated modes of online indexing, and academic mailing lists (to name but a few developments) has introduced radical changes to the traditionally print-biased modes of knowledge production and print-based modes of information storage. How can we further assess the impact of ICTs on current modes of knowledge production? One possible means is the development of a research software program to aid the execution of analyses similar to those covered in this dissertation.
The proposed modularized software package incorporates advantages from a range of existing research tools, and implies the formalization of a new research methodology similar to that developed in this study. This methodology enables researchers to use a combination of existing tools to locate relevant networks of academics, institutes and research foci, and to compare the substance of publications, websites and emails on the basis of their textual content. The proposed package would thus provide researchers with a unique set of research tools to categorize, analyze and compare both online and offline research behaviour. Here I outline several major considerations concerning the design of the MAT software package, and thereby sketch the parameters of a program better suited to the analysis of emails, websites and the content of other online textual environments.
It is important to highlight that any research software tool designed for academic or other purposes should keep the program as simple as possible, as the people likely to use it will not be familiar with programming per se (i.e. not scientists or engineers), but will rather be people involved in the Social Sciences (namely Communication Science) and the Humanities (namely Hermeneutics). With respect to program users, the interactivity of the design must take account of all of the distinct modules used in the program. While the seasoned researcher may know and understand all of the different calculations being performed, a novice may not. Thus, designers and developers would be wise to consider an interactive program that provides immediate feedback on changes made by the user.
[3] This program design has been largely inspired by the design specifications of the Réseau-Lu Network Laboratory Research Software Package, and the textual analysis tool EyeConTACT. See: Monger & Rockwell (1999), and Rockwell & Bradley (1998).
The 3 core tools of the MAT are: the Visual Mapping Tool (VMT), the Web Harvester Tool (WHT), and the Data Filtering Tool (DFT). The 3 periphery tools are: the Textual Analysis Tool (TAT), the Citation Analysis Tool (CAT), and the List Analysis Tool (LAT). Each individual tool is comprised of several sub-modules that can be added or replaced as needed. The aim is to develop a cohesive set of modules and sub-modules to perform multiple tasks on a myriad of related data sets. In what follows, the tools and their respective sub-modules are described in detail. Figure 9.1: Media Analysis Toolbox, below, shows the interrelation of the conceptualized research software toolbox and its constituent parts.
Of the three core tools of the MAT (VMT, WHT, and DFT), the most integral to the program is the Visual Mapping Tool (VMT). The VMT has 2 integral components: the visual display module and the mapping module. The VMT is conceptualized as the centre of analytic operations – it is through this tool that the entire research project can be conducted, tracked and visualized. [4] The visual display module permits the researcher to visually organize all data under analysis, and additionally permits the use of the other MAT tools from the same interface. The mapping module permits the analyst to record all tasks performed in the research, and to manipulate the results accordingly.
The Web Harvester Tool (WHT) and the Data Filtering Tool (DFT) are also core tools of the MAT. The WHT permits the gathering of data from various online resources through two sub-modules: the crawler module and the site ripper module. The crawler module permits users to browse the World Wide Web (WWW) in a way similar to Netscape or Microsoft Internet Explorer, in order to locate relevant information to be imported for analysis. The site ripper module serves to download and filter the content of websites relevant to the analysis. This results in output that can be used for text analysis using the TAT, or for citation analysis using the CAT.
[4] The Visual Mapping Tool reflects the design of the Réseau-Lu software, in which the network laboratory operates as the central visualization and project mapping tool.
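The filtering step of the site ripper module can be sketched as follows: mark-up is stripped from a downloaded page so that only its text remains for the TAT or CAT. The fetching itself (e.g. with urllib.request) is omitted here; the function assumes the HTML is already in hand, and the names are illustrative.

```python
from html.parser import HTMLParser

class TextRipper(HTMLParser):
    """Collect the text nodes of an HTML page, discarding the mark-up."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())

def rip_text(html: str) -> str:
    """Return the plain text content of a page, ready for textual analysis."""
    ripper = TextRipper()
    ripper.feed(html)
    return " ".join(ripper.chunks)
```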
The DFT serves as the tool for filtering a variety of different data forms; its sub-modules include a text filter module, a citation filter module, and a list filter module. In addition, the DFT has a general conversion module to convert documents (e.g. .doc, .rtf) into ASCII text files in preparation for analysis in one of the three periphery modules described below; the general conversion module also permits the viewing and splitting of texts. [5] The text filter module permits the analyst to create and use stop lists to filter texts prior to analysis; the citation filter module permits the analyst to filter Web of Science and Dialogue Medline data obtained from the WWW or CD-ROM in order to perform scientometric analyses using the CAT periphery tool. [6] The list filter module enables the researcher to filter and standardize the output of mailing list archives on the WWW. Indeed, one of the major limitations identified in Chapter VII: Analysis of Mailing List Environment was that Internet mailing list providers use a myriad of different ways of archiving messages; the list filter module would permit the standardization of mailing list output into comparable units for analysis.
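The stop-list filtering that the text filter module would perform can be sketched in a few lines; the stop list shown is illustrative, and this is not the WordSmith implementation.

```python
import re

def apply_stop_list(text: str, stop_list) -> list:
    """Tokenize a text and drop every word that appears on the stop list."""
    stop = {w.lower() for w in stop_list}
    return [w for w in re.findall(r"[A-Za-z]+", text) if w.lower() not in stop]
```

The remaining tokens are what the keyword and co-word analyses of the periphery tools would then operate on.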
Collectively, the VMT, WHT and DFT comprise the core elements of the proposed Media Analysis Toolbox; each is integral to the functioning of the software program. In short, it is through these 3 core tools that data are collected in preparation for analysis in the periphery tools. The output of the periphery tools is in turn imported and displayed using the VMT, uploaded for publication on the web using the WHT, and finally converted into a range of formats for use in other programs such as Microsoft Word, Excel or Access using the DFT.
The 3 periphery tools – the Textual Analysis Tool (TAT), the Citation Analysis Tool (CAT), and the List Analysis Tool (LAT) – are additional components created to perform particular research tasks (corresponding to those performed through the course of this dissertation). Importantly, these individual programs can be added or removed as the analysis requires, and are fully compatible with the other programs of the MAT. Moreover, the MAT is conceptualized to be expandable, in order to incorporate new tools and sub-modules that perform additional tasks as necessary. As indicated, the program would be best designed to permit the easy creation of modularized tools to be used in combination with existing tools. The three periphery tools are best conceptualized as enabling architectural, networked and systemic analyses, as exemplified by the methodology developed in this study.
The Text Analysis Tool (TAT) will permit the same basic analyses as the WordSmith program used for the textual analyses of Chapter IV: Analysis of Print Communication and Chapter V: Analysis of Electronic Communication. Traditionally, textual analysis programs have been designed to do text-retrieval on literary texts. The
[5] Provided that Visual Basic was used to generate visual displays of data in texts, one would want the texts to be in XML. Rendering software permits the inclusion of images and non-textual elements of documents that can be very important in the case of some texts. Finnegans Wake provides a salient example of a text which contains drawings, unique means of displaying text (using blank characters), as well as obscure references and symbols (from the I Ching, the Masonic tradition, the chakra system, etc.). Including these features can be crucial to the hermeneutic examination of complex texts.