«Doctoral dissertations of Library and Information Science in China: A co-word analysis Qian-Jin Zong • Hong-Zhou Shen • Qin-Jian Yuan • ...»
In this study, a program written in Ruby was developed to process the raw data. BibExcel (Persson et al. 2009) was then employed to calculate how often two keywords appeared together in the same doctoral dissertation. Subsequently, a symmetrical co-occurrence matrix was built from these counts: the value of the cell for two keywords is the number of dissertations in which both keywords appear. A higher co-occurrence frequency means a closer relationship between the two keywords (Ding et al. 2001). The symmetrical co-occurrence matrix was then transformed into a correlation matrix by using the equivalence index (Cahlik 2000; Callon et al. 1991; Coulter et al. 1998). The equivalence index $E_{ij}$ describes the strength of the association between words i and j in each word pair ij (Neff and Corley 2009; Callon et al.):
$$E_{ij} = \frac{C_{ij}^{2}}{C_{ii}\,C_{jj}}$$
where $C_{ij}$ is the number of dissertations in which keywords i and j co-occur, and $C_{ii}$ and $C_{jj}$ are the diagonal cells of the co-occurrence matrix for keywords i and j.
As in the co-occurrence matrix, the value of each cell indicates how closely two keywords are related: the higher the value, the closer the relationship.
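The two steps described above can be sketched in a few lines of Python. This is an illustration only, not the Ruby program or the BibExcel procedure used in the study. It assumes the standard form of the equivalence index, E_ij = C_ij^2 / (C_ii C_jj), with the diagonal cell C_ii holding the occurrence frequency of keyword i. The function names and the toy data are hypothetical.

```python
from itertools import combinations

def cooccurrence_matrix(dissertations, keywords):
    """Symmetric matrix: cell (i, j) counts dissertations containing both keywords.

    The diagonal cell (i, i) holds the occurrence frequency of keyword i
    (an assumption of this sketch).
    """
    index = {kw: n for n, kw in enumerate(keywords)}
    size = len(keywords)
    matrix = [[0] * size for _ in range(size)]
    for kws in dissertations:
        # Deduplicate so a keyword listed twice is counted once per dissertation.
        present = sorted({index[k] for k in kws if k in index})
        for i in present:
            matrix[i][i] += 1
        for i, j in combinations(present, 2):
            matrix[i][j] += 1
            matrix[j][i] += 1
    return matrix

def equivalence_matrix(c):
    """E_ij = C_ij^2 / (C_ii * C_jj); 0.0 when either keyword never occurs."""
    size = len(c)
    return [
        [
            (c[i][j] ** 2) / (c[i][i] * c[j][j]) if c[i][i] and c[j][j] else 0.0
            for j in range(size)
        ]
        for i in range(size)
    ]

# Toy data (hypothetical, not taken from the study).
toy_dissertations = [
    ["ontology", "semantic web"],
    ["ontology", "semantic web", "digital library"],
    ["digital library"],
    ["ontology"],
]
toy_keywords = ["ontology", "semantic web", "digital library"]

C = cooccurrence_matrix(toy_dissertations, toy_keywords)
E = equivalence_matrix(C)
```

In the toy data, "ontology" and "semantic web" co-occur twice and occur three and two times respectively, so their equivalence index is 2^2 / (3 × 2) ≈ 0.67.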
We converted the co-occurrence matrix to a binary matrix with the program developed in Ruby. The average number of co-occurrences between the high-frequency keywords (in the co-occurrence matrix) was 0.41, so we set one as the threshold. If the value of a cell in the co-occurrence matrix was less than one, the value of the corresponding cell in the binary matrix was zero; otherwise, it was one.
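The thresholding rule can be illustrated with a short Python sketch. It is not the authors' Ruby program. It assumes that the reported average is taken over the off-diagonal cells, and it treats the diagonal like any other cell. The function names and the toy matrix are hypothetical.

```python
def mean_off_diagonal(matrix):
    """Average of all cells outside the main diagonal."""
    size = len(matrix)
    values = [matrix[i][j] for i in range(size) for j in range(size) if i != j]
    return sum(values) / len(values)

def binarize(matrix, threshold=1):
    """Return a 0/1 matrix: 1 where the cell value is >= threshold, else 0."""
    return [[1 if value >= threshold else 0 for value in row] for row in matrix]

# Toy co-occurrence matrix (hypothetical values).
toy_cooccurrence = [
    [3, 2, 0],
    [2, 2, 1],
    [0, 1, 2],
]
toy_average = mean_off_diagonal(toy_cooccurrence)
toy_binary = binarize(toy_cooccurrence, threshold=1)
```

With a threshold of one, any pair of keywords that co-occurs at least once is linked in the binary matrix, and pairs that never co-occur are not.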
Method of data analysis
Similar to other studies using co-word analysis, we chose hierarchical cluster analysis, the strategic diagram and social network analysis (SNA). Hierarchical cluster analysis and the strategic diagram are both commonly used in co-occurrence analysis.
A hierarchical clustering algorithm helps us find the clusters, and the result of clustering can be displayed graphically as a tree that shows the merging process and the intermediate clusters (Yang et al. 2012). Hierarchical clustering coupled with co-word analysis has been used widely in many studies (An and Wu 2011; Ding et al. 2001; Milojevic et al. 2011).
The strategic diagram is mainly used to describe the internal relations within a certain field and the interactions between fields (Law et al. 1988). This diagram (Bredillet 2009) is created by putting the strength of the global context on the X axis (called centrality) and the strength of the local context on the Y axis (called density). These two indexes, density and centrality, measure the strength of the local context and the global context, respectively. Centrality measures the strength of a subject area's interaction with other subject areas. The centrality of a given cluster can be the sum of all external link values (Courtial et al. 1993; Turner 1988) or the square root of the sum of the squares of all external link values (Coulter et al. 1998). In this study, we take the square root of the sum of the squares of all external link values (Coulter et al. 1998) as centrality. Density measures the strength of the links that tie together the words making up the cluster, that is, the internal strength of a cluster (He 1999). The density value can be the mean of the internal links (Coulter et al. 1998; Turner 1988), the median of the internal links (Courtial et al. 1993), or the sum of the squares of the internal link values (Bauin et al. 1991). In this paper, we take the mean of the internal links as density (Coulter et al. 1998).
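The two indicators chosen here can be written down compactly. The following Python sketch is illustrative only. It assumes that links are read from a symmetric link-strength matrix and that every pair of keywords inside a cluster counts as an internal link, including pairs with value zero. The function names and the toy matrix are hypothetical.

```python
from math import sqrt

def density(matrix, cluster):
    """Mean link value over all pairs of distinct keywords inside the cluster."""
    links = [
        matrix[i][j]
        for position, i in enumerate(cluster)
        for j in cluster[position + 1:]
    ]
    return sum(links) / len(links) if links else 0.0

def centrality(matrix, cluster):
    """Square root of the sum of squared links from the cluster to outside keywords."""
    members = set(cluster)
    outside = [j for j in range(len(matrix)) if j not in members]
    return sqrt(sum(matrix[i][j] ** 2 for i in cluster for j in outside))

# Toy symmetric link matrix for four keywords (hypothetical values).
toy_links = [
    [0, 3, 1, 0],
    [3, 0, 0, 2],
    [1, 0, 0, 4],
    [0, 2, 4, 0],
]
cluster_a = [0, 1]  # keyword indices forming one cluster
cluster_b = [2, 3]

density_a = density(toy_links, cluster_a)
centrality_a = centrality(toy_links, cluster_a)
density_b = density(toy_links, cluster_b)
```

For cluster A, the only internal link has value 3, so the density is 3. Its external links have values 1, 0, 0 and 2, so the centrality is the square root of 1 + 4 = 5.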
Social network analysis assesses the unique structure of interrelationships among individuals (Lurie et al. 2009), and has been extensively used in social science, management science, scientometrics, etc. SNA can also map the network by using methods of information visualization. We map the co-word network to show the relationships among research topics. In addition, k-core analysis is commonly used in SNA. A k-core is a maximal group of nodes, all of which are connected to at least k other nodes in the group (Eschenfelder 1980; Maimon and Rokach 2005). By varying the value of k (that is, the number of group members to which each member must be connected), different pictures can emerge. As the value of k becomes larger, group sizes decrease and the relationships among the members become tighter (Yang et al. 2012). In bibliometrics, some studies have investigated hot research topics through co-word analysis coupled with k-core analysis (Yang et al. 2012; Zhao and Zhang 2011).
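The k-core definition above can be made concrete with a small peeling procedure. It repeatedly removes the node with the fewest remaining neighbours and records the largest such degree seen so far as that node's core number. The Python sketch below is a plain illustration and is not the Ucinet routine used in the study. The function names and the toy network are hypothetical.

```python
def adjacency_from_binary(matrix, labels):
    """Build {label: set of neighbour labels} from a symmetric 0/1 matrix.

    Diagonal cells are ignored so that no node is its own neighbour.
    """
    return {
        labels[i]: {labels[j] for j, value in enumerate(row) if value and j != i}
        for i, row in enumerate(matrix)
    }

def core_numbers(adjacency):
    """Map each node to the largest k for which it belongs to a k-core."""
    neighbours = {node: set(ns) - {node} for node, ns in adjacency.items()}
    degree = {node: len(ns) for node, ns in neighbours.items()}
    remaining = set(neighbours)
    core = {}
    k = 0
    while remaining:
        # Peel off the node with the fewest neighbours still in the network.
        node = min(remaining, key=degree.get)
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for other in neighbours[node]:
            if other in remaining:
                degree[other] -= 1
    return core

# Toy binary matrix (hypothetical): a triangle a-b-c, d attached to a, e isolated.
toy_labels = ["a", "b", "c", "d", "e"]
toy_matrix = [
    [0, 1, 1, 1, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
toy_cores = core_numbers(adjacency_from_binary(toy_matrix, toy_labels))
```

In the toy network, the triangle forms a 2-core, the pendant node d belongs only to the 1-core, and the isolated node e has core number 0.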
In this study, the hierarchical cluster analysis and the strategic diagram were conducted using SPSS 20. The network maps were obtained by analyzing the original co-occurrence matrix and the binary matrix with Ucinet 6.0 (Borgatti et al. 2002).
Results and discussion
Descriptive statistics of doctoral dissertations and keywords
We obtained 640 LIS doctoral dissertations in this study. Table 3 shows the distribution of institutions to which these dissertations belong. As shown in Table 3, Wuhan University has the largest number of LIS doctoral dissertations, indicating it is the most important institution of LIS doctoral education in China.
For two dissertations, we could not obtain the keywords. In total, we obtained 3,015 keywords (4.7 keywords per dissertation) from the remaining 638 dissertations, and took these 3,015 keywords as the data sample for the co-word analysis. Because keyword indexing is not unified, we standardized the keywords by merging synonyms (e.g., "Bibliometric analysis" is replaced by "Bibliometric"). Finally, 56 keywords with a frequency of more than six were selected, as shown in Table 4. These 56 keywords occur 612 times in total (about 20.3 % of all keyword occurrences), covering the main research topics of LIS doctoral dissertations in China. Notice that the keywords "library", "China", "countermeasure", "Information Science" and "information study" have very broad meanings. Keywords of this kind carry no useful information for this study, and we excluded them from the analysis below.
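The keyword preparation described above (merging synonyms, keeping keywords above a frequency threshold, dropping overly broad terms) can be sketched as follows. This is a hypothetical illustration, not the authors' Ruby program. Lowercasing the keywords is an added assumption, and the synonym table, the function name and the toy data are invented for the example.

```python
from collections import Counter

def select_keywords(dissertations, synonyms, min_frequency, excluded=()):
    """Count standardized keywords and keep those occurring more than min_frequency times.

    Each keyword is counted at most once per dissertation.
    """
    counts = Counter()
    for kws in dissertations:
        cleaned = {k.strip().lower() for k in kws}
        # Replace each synonym by its preferred form, if one is listed.
        counts.update({synonyms.get(k, k) for k in cleaned})
    return {
        kw: n
        for kw, n in counts.items()
        if n > min_frequency and kw not in excluded
    }

# Toy data (hypothetical). The study kept keywords with frequency greater than six;
# the threshold here is lowered to one so that the toy sample is not empty.
toy_records = [
    ["Bibliometric analysis", "library"],
    ["Bibliometric", "ontology"],
    ["ontology", "library"],
    ["open access"],
]
toy_synonyms = {"bibliometric analysis": "bibliometric"}
toy_selected = select_keywords(
    toy_records, toy_synonyms, min_frequency=1, excluded=("library",)
)
```

Here "Bibliometric analysis" is merged into "bibliometric", "open access" falls below the threshold, and "library" is dropped as too broad.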
Words with a high frequency of occurrence and co-occurrence can reflect research focuses to some extent. The top ten keywords by frequency of occurrence are knowledge management (39), digital library (34), network (31), ontology (22), information service (20), evaluation (20), electronic government (19), information resource (16), competitive intelligence (16), and library (15). The top ten keywords by frequency of co-occurrence are network (32), knowledge management (32), digital library (30), information resource (26), ontology (25), knowledge organization (20), information resource management (19), electronic government (18), information retrieval (16), and evaluation (19). Notice that knowledge management, digital library, network, ontology, evaluation, electronic government and information resource rank high in both occurrence and co-occurrence frequency, indicating that these research topics are major focuses and act as bridges connecting other research topics (Liu et al. 2011) in LIS doctoral dissertations in China.
We conducted the hierarchical cluster analysis with Ward's method (Ding et al. 2001; Gordon 1996; Neff and Corley 2009; Lee and Jeong 2008; Liu et al. 2011), using squared Euclidean distance as the distance measure, as recommended by Bacher (2002). The 51 keywords of LIS doctoral dissertations in China were divided into 15 clusters, which indicates that the research fields of LIS doctoral dissertations in China are varied. The dendrogram of the cluster analysis is shown in Fig. 1. The names given to the clusters are shown in Table 5.
As can be seen, cluster 10 has the largest number of keywords, indicating that it is the most concentrated research field. The keywords in cluster 10, that is, its research topics, receive close attention in LIS doctoral dissertations in China.
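Ward's method merges, at each step, the two clusters whose union gives the smallest increase in the within-cluster sum of squared Euclidean distances. For clusters A and B with centroids c_A and c_B, this increase is |A||B| / (|A| + |B|) times the squared Euclidean distance between c_A and c_B. The Python sketch below is a naive illustration of that criterion, not the SPSS procedure used in the study. It assumes each keyword is represented by a numeric vector, for example its row in the correlation matrix. The function name and the toy points are hypothetical.

```python
def ward_clusters(points, n_clusters):
    """Agglomerative clustering with Ward's criterion, stopped at n_clusters.

    Returns clusters as sorted lists of point indices.
    """
    clusters = [[i] for i in range(len(points))]
    dimensions = len(points[0])

    def centroid(members):
        return [
            sum(points[m][d] for m in members) / len(members)
            for d in range(dimensions)
        ]

    def merge_cost(a, b):
        # Increase in within-cluster sum of squares caused by merging a and b.
        ca, cb = centroid(a), centroid(b)
        squared_distance = sum((x - y) ** 2 for x, y in zip(ca, cb))
        return len(a) * len(b) / (len(a) + len(b)) * squared_distance

    while len(clusters) > n_clusters:
        candidates = [
            (merge_cost(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        ]
        _, i, j = min(candidates)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

# Toy vectors (hypothetical): two tight pairs and one outlier.
toy_points = [[0, 0], [0, 1], [5, 5], [5, 6], [10, 0]]
toy_result = ward_clusters(toy_points, n_clusters=3)
```

Recording the sequence of merges instead of stopping at a fixed number of clusters would give the information needed to draw a dendrogram.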
Drawing strategic diagram
Centrality and density, the two indicators of the strategic diagram, reflect the strength of the relations within and between clusters. The strategic diagram displays the structure of research fields, and it can also reveal the focuses and trends of research fields by dividing the clusters into four quadrants. We calculated the centrality and density of every cluster (as shown in Table 6) and drew the strategic diagram (Fig. 2).
As shown in Fig. 2, the clusters in quadrant I (upper right-hand quadrant) are cluster 1 (information resource), cluster 2 (ontology), cluster 3 (electronic government), cluster 6 (knowledge management) and cluster 14 (digital library). Both the centrality and the density of these clusters are high, indicating that these clusters not only have close internal connections but are also widely connected with other clusters. These fields are the research focuses in LIS doctoral dissertations in China and tend to be mature.
Quadrant II (upper left-hand quadrant) contains only cluster 4 (information retrieval). This cluster has close internal connections, indicating that research in this cluster has reached a relatively stable scale. In contrast to its internal connections, the connections between this cluster and the other clusters are not so close. That is to say, this field is located on the edge of the whole research network.
The clusters in quadrant III (lower left-hand quadrant) are cluster 5 (social network), cluster 7 (evaluation of humanities and social sciences), cluster 8 (performance evaluation), cluster 9 (academic journal), cluster 10 (competitive intelligence), cluster 11 (library management), cluster 12 (bibliometrics) and cluster 15 (open access). These clusters have low centrality and low density, so both their internal and their external connections are loose. These fields are still immature and are located on the edge of the research network.
Quadrant IV (lower right-hand quadrant) contains only cluster 13 (information management). Although this field has loose internal connections, it has attracted the attention of many researchers. Consequently, there is considerable room for further development in this field. In other words, information management is likely to become a research trend in the future and needs further study.
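The quadrant reading above amounts to comparing each cluster's centrality and density with the origin of the diagram. The text does not state how the origin was chosen, so the Python sketch below assumes the mean centrality and mean density of all clusters. The function name and the cluster values are hypothetical and are not the values of Table 6.

```python
def quadrant(centrality, density, origin_centrality, origin_density):
    """Return the strategic-diagram quadrant (I to IV) for one cluster."""
    right = centrality >= origin_centrality  # strong external links
    upper = density >= origin_density        # strong internal links
    if upper and right:
        return "I"    # central and well developed
    if upper:
        return "II"   # well developed but peripheral
    if right:
        return "IV"   # central but not yet well developed
    return "III"      # peripheral and not yet well developed

# Toy clusters as (centrality, density) pairs (hypothetical values).
toy_clusters = {"A": (4.0, 3.0), "B": (1.0, 3.5), "C": (1.5, 0.5), "D": (3.5, 1.0)}

# Assumed origin: the mean of each indicator over all clusters.
mean_centrality = sum(c for c, _ in toy_clusters.values()) / len(toy_clusters)
mean_density = sum(d for _, d in toy_clusters.values()) / len(toy_clusters)

toy_quadrants = {
    name: quadrant(c, d, mean_centrality, mean_density)
    for name, (c, d) in toy_clusters.items()
}
```

Using the median instead of the mean as the origin would be an equally plausible choice and can move clusters that lie near the axes into a different quadrant.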
Social network analysis
We constructed two types of co-word networks with NetDraw. In each network, nodes represent keywords, and a line between two nodes indicates that the two keywords have appeared in the same dissertation.
The first network was generated from the original co-occurrence matrix (Fig. 3). It shows intuitively the relationships among the research topics of LIS doctoral dissertations in China. The relative size of a node is proportional to the frequency of its keyword. Line thickness reflects the closeness of the connection between two keywords: the thicker the line, the closer the connection. As shown in Fig. 3, the "KM (Knowledge management)" node is the largest, which means that this keyword has the highest frequency. The thicker lines between two keywords, such as those between "IRM (Information Resource Management)" and "EG (Electronic Government)" or between "Onto (Ontology)" and "Sema-web (Semantic Web)", represent closer relationships.
We constructed the second network from the binary matrix, which was converted from the original co-occurrence matrix. As shown in Fig. 4, five cores are identified by k-core analysis. To display the cores clearly, different node shapes are used: thirty up-triangle nodes (k = 5) represent the core themes of the network; ten square nodes (k = 4) represent the secondary core themes; six circle nodes (k = 3) are themes located between the core and the periphery; and three down-triangle nodes (k = 2) and two plus nodes (k = 1) are the peripheral themes.
It should be noted that Figs. 3 and 4 show two different networks of the research topics of LIS dissertations in China. Figure 3 focuses on the relationships between research topics, whereas Fig. 4 focuses on distinguishing core research topics from peripheral ones.
Fig. 2 Strategic diagram of clusters
Fig. 3 Social network map of the original co-occurrence matrix
Fig. 4 k-cores analysis of the binary matrix
Conclusions
In this paper, we investigated the intellectual structure of LIS doctoral dissertations in China by using co-word analysis, including hierarchical cluster analysis, the strategic diagram and SNA. We obtained some clear and reasonable results about the research in LIS doctoral dissertations in China.
The distribution of LIS doctoral dissertations in universities/institutes implies that Wuhan University is the most important institution of doctoral education in LIS in China.
The School of Information Management of Wuhan University is the earliest LIS education institute in China. After 92 years of development, it has become the largest comprehensive, research-oriented LIS education and research institute in China.
According to keyword frequency, the strategic diagram and the k-cores, we identified the research focuses of LIS doctoral dissertations in China, including information resource and allocation, ontology, semantic web, semantic search, electronic government, information resource management, knowledge management, knowledge innovation, knowledge sharing, knowledge organization, network, information service, information need and digital library. Among these research focuses, only a small percentage of topics concern libraries or library-related matters. This may have many causes, for example that researchers are no longer studying topics relevant to the practical field (Finlay et al. 2012). A lack of connections between research, education and practice is harmful not only to the development of the LIS discipline but also to the future of the practice.