DOI 10.1007/s11192-012-0799-1

Doctoral dissertations of Library and Information Science in China: A co-word analysis

Qian-Jin Zong • Hong-Zhou Shen • Qin-Jian Yuan • Xiao-Wei Hu • Zhi-Ping Hou • Shun-Guo Deng

Received: 29 February 2012

The aim of this paper is to map the intellectual structure of research in doctoral dissertations of Library and Information Science (LIS) in China. Using co-word analysis, including cluster analysis, a strategic diagram, and social network analysis, we studied the internal and external structure and relationships of the research fields in these dissertations. Data covering 1994–2011 were collected from six public dissertation databases and ten degree databases provided by the universities/institutes authorized to grant doctoral degrees in Library and Information Science in China. The results show that Wuhan University is the most important institution of doctoral education in LIS in China. The research foci include information resource, ontology, semantic web, semantic search, electronic government, information resource management, knowledge management, knowledge innovation, knowledge sharing, knowledge organization, network, information service, information need, and digital library. The research fields of LIS doctoral dissertations in China are varied. Many of these research fields are still immature; accordingly, well-developed and core research fields are few.

Keywords Co-word analysis · Strategic diagram · LIS doctoral dissertations in China · Library and Information Science

Introduction

A dissertation shows proof of both an original contribution to knowledge and substantial subject knowledge in a discipline, and it also provides evidence of significant scholarly achievement (Kushkowski et al. 2003). A doctoral dissertation is not only a measure of research output but also a measure of the production of qualified manpower, a resource that is essential for contemporary knowledge societies (Andersen and Hammarfelt 2011).

For these reasons, research on doctoral dissertations in different disciplines and countries (Anwar 2004; Breimer 1996; Herubel 2007; Macauley et al. 2010; Villarroya et al. 2008; Yaman and Atay 2007) has received a lot of attention. Doctoral dissertations of Library and Information Science (LIS) are also a valuable research subject. For example, Anwar (2004) examined the publication output of LIS graduates that was derived from their doctoral dissertations.

Since the "opening-up" of China to the West in 1978, Library and Information Science in China has developed rapidly along with the growth of the Chinese economy (Wu and Yuan 1994). In 1990, Peking University and Wuhan University were authorized to grant doctorates in Library Science and Information Science, respectively, and recruitment of LIS doctoral students in China began in 1991. Now, 20 years later, understanding the development of the discipline is particularly necessary. Doctoral dissertations offer unique insight into the field by revealing the foci of research and instruction within the institutions that produce LIS scholars (Finlay et al. 2012), and serve a critical function in the exploration of disciplinary development (Sugimoto et al. 2011). Thus, in this study, we map the intellectual structure of the research fields of LIS doctoral dissertations in China by using co-word analysis. This will contribute to a better understanding of the field and provide a basis for its future development. The remainder of this paper is organized as follows. First, we review the literature. Next, we introduce the methodology of this paper, including data collection, data processing, and the method of data analysis. Then descriptive statistics, cluster analysis, a strategic diagram, and social network analysis (SNA) are used to analyze the dataset, and the results are interpreted. Finally, we draw our conclusions in the last section.

Literature review

This section reviews co-word analysis and its latest developments (especially Latent Dirichlet Allocation (LDA), a recently developed approach to topic modeling), and bibliometric research on dissertations (especially LIS doctoral dissertations).

Co-word analysis and its latest development

Similar to the method of co-citation analysis (Small 1973; Small and Griffith 1974), co-word analysis rests on the assumption that a paper's keywords constitute an adequate description of its content. Two keywords co-occurring within the same paper are an indication of a link between the topics to which they refer (Cambrosio et al. 1993). Many co-occurrences of the same pair of words across scientific papers suggest that the pair may correspond to a research theme. The main feature of co-word analysis is that it visualizes the intellectual structure of one specific discipline as maps of the conceptual space of this field, and that a time series of such maps traces the changes in this conceptual space (Ding et al. 2001). Many researchers have used co-word analysis as an important method to explore the concept network in different fields, for instance, biology (An and Wu 2011; Cambrosio et al. 1993; Rip and Courtial 1984), education (Ritzhaupt et al. 2010), and Library and Information Science (Ding et al. 2000; Milojevic et al. 2011; Uzun 2002; Zhao and Zhang 2011). With the support of the strategic diagram and SNA, co-word analysis, unlike other co-occurrence methods, is able to visualize the intellectual structure of a specific discipline by measuring the association strength of keywords from publications in a research area (Liu et al. 2011).
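The counting step behind co-word analysis can be illustrated with a minimal sketch. It assumes each document is reduced to a list of keywords, and it uses the equivalence index, E_ij = c_ij^2 / (c_i c_j), as the association strength. That index is a common choice in the co-word literature, but it is an assumption here: the text above does not say which measure is used. The function names and the toy keyword lists are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(keyword_lists):
    """Count keyword frequencies and pairwise co-occurrences across documents."""
    freq = Counter()
    pairs = Counter()
    for keywords in keyword_lists:
        unique = sorted(set(keywords))  # each keyword counts once per document
        freq.update(unique)
        pairs.update(combinations(unique, 2))  # sorted input, so pairs are (a, b) with a < b
    return freq, pairs


def equivalence_index(freq, pairs):
    """Assumed association measure: E_ij = c_ij^2 / (c_i * c_j), a value in (0, 1]."""
    return {(a, b): c * c / (freq[a] * freq[b]) for (a, b), c in pairs.items()}


# Hypothetical toy data: one keyword list per dissertation.
docs = [
    ["ontology", "semantic web", "knowledge organization"],
    ["ontology", "semantic web"],
    ["digital library", "information service"],
    ["ontology", "digital library"],
]
freq, pairs = cooccurrence(docs)
strength = equivalence_index(freq, pairs)
```

Pairs with high strength are candidates for the same research theme. Clustering and network measures would then operate on the resulting keyword-by-keyword matrix.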

A topic model is a kind of statistical model for discovering the scientific topics in a literature corpus, and can be considered the latest development of co-word analysis. One of the recently developed approaches to topic modeling is LDA, which was proposed by Blei, Ng, and Jordan (Blei et al. 2003) as a generative probabilistic model for identifying the topics in a set of documents. LDA has been used to study the topical structure of a field, and it performs well. Griffiths and Steyvers (2004) presented a statistical inference algorithm for LDA to analyze abstracts from PNAS. The topics recovered by their algorithm picked out meaningful aspects of the structure of science and revealed some of the relationships between scientific papers in different disciplines. Zheng et al. (2006) extracted a set of major semantic concepts from a protein-related corpus of text words from MEDLINE titles and abstracts by applying the LDA model. They found that the identified concepts are semantically coherent, and most of them are biologically relevant. Extensions of LDA have also been used (Sugimoto et al. 2011) to understand correlations between topics (Blei and Lafferty 2007), authors (Rosen-Zvi et al. 2004), academic networks (Tang et al. 2008), and changes in topics over time (Pruteanu-Malinici et al. 2010; Rzeszutek et al. 2010).

Moreover, topic modeling approaches are also used in the field of community detection (Ding 2011), and some models (e.g., CTM, DCTM, etc.) have been proposed to better understand the dynamic features of social networks and to make improved personalized recommendations (Li et al. 2012).

Bibliometric research on LIS dissertations

As a bibliometric approach, citation analysis of LIS dissertations has been conducted to study the sources and the rankings of disciplines and authors (Buttlar 1999; Gao et al. 2009; Sugimoto 2011). A few studies have investigated the topics of LIS dissertations. Schlater and Thomison investigated the methods used in Library Science dissertations (Schlater and Thomison 1974, 1982). Franklin and Jaeger (2007) examined the LIS doctoral dissertations written by African American women between 1993 and 2003, and the research fields were divided into five categories of research topics: information issues, library/librarianship issues, literature, and technology. Sugimoto et al. (2011) identified changes in dominant topics in LIS over time by analyzing the 3,121 doctoral dissertations completed between 1930 and 2009 at North American Library and Information Science programs. In that study, core research areas (library history, citation analysis, and information-seeking behavior) were identified; meanwhile, one of the notable changes in the topics was the diminishing use of the word library (and related terms). Finlay et al. (2012) examined the topicality of LIS dissertations written between 1930 and 2009 at schools with American Library Association (ALA)-accredited university programs in North America. Their results indicated that the percentage of dissertations containing no instance of any of the selected library keywords had steadily risen since 1980; similarly, the percentage of dissertations containing instances of keywords in both the title and abstract had steadily declined.

Similarly, some researchers have analyzed the doctoral dissertations of Library Science or Information Science in China from a quantitative perspective. Gao et al. (2009) conducted a citation analysis of 14 doctoral dissertations in LIS at Wuhan University; the results revealed that the cited literature came primarily from Chinese sources. Based on 110 doctoral dissertations of Information Science from the China Doctoral Dissertations Full-text Database (CDFD), Wanfang Data, and the National Science and Technology Library (NSTL), Wang et al. (2009) revealed the distribution of time and themes, as well as the research foci according to the frequency of keywords. They found that knowledge management and information services were the two most frequent keywords. Jin (2010) gathered 256 doctoral dissertations of LIS in China from 1994 to 2010. In that study, she investigated the method systems of these dissertations, and the results showed that researchers paid close attention to research methods but ignored methodology. Similarly, Yang (2011) surveyed the research methods of 70 LIS doctoral dissertations of the National Science Library of Chinese Academy of Sciences from 2000 to 2009, and the findings indicated that conventional methods, such as investigation and experimental methods, were mainly used. Ye (2011) conducted a statistical analysis of doctoral dissertations of Library Science published in CDFD, Wanfang Data, and the National Library of China (NLC) from 1994 to 2010, and analyzed the keywords by word frequency statistics. That article concluded that the keywords publishing, digital libraries, and library and information management each had a frequency exceeding 10, and that such high-frequency keywords can reflect the research hotspots in Library Science.

However, until now, little attention has been paid to the internal and external relationships of research fields in doctoral dissertations of LIS in China. Furthermore, the datasets of previous studies in China came only from public doctoral dissertation databases, such as CDFD, Wanfang Data, and NLC. Thus, the data used in these previous studies remain limited.

This paper aims to map the intellectual structure of the research fields of LIS doctoral dissertations in China. Unlike previous studies, and in order to obtain more data, we gather doctoral dissertations not only from public degree databases but also from the degree databases provided by the universities/institutes which have been authorized to grant LIS doctoral degrees.

Methodology

Data collection and process

Keywords are the most important research elements in co-word analysis and should be extracted from publications once the research area is selected. We gathered data (1994–2011) from 16 databases: six public degree databases and ten degree databases provided by the universities/institutes which have been authorized to grant LIS doctoral degrees.

In China, doctoral dissertations should be submitted to the library/archive of the university/institute when the authors obtain their doctoral degrees. To date, nine universities/institutes have been authorized to grant Library Science, Information Science, or LIS doctoral degrees in China. They are Wuhan University (WHU), Nanjing University (NJU), Peking University (PKU), The National Science Library of Chinese Academy of Sciences (NSLC), Nankai University (NanKai), Jilin University (JLU), Renmin University of China (RUC), Central China Normal University (CCNU), and Sun Yat-sen University (SYSU). Table 1 shows the degree databases of the nine universities/institutes.

Besides being submitted to the library/archive of the degree-conferring universities/institutes, some doctoral dissertations (full text or bibliography) are also submitted to the public degree databases. There are six common public degree databases: CDFD, Wanfang Data, NLC, NSTL, the China Academic Library and Information System (CALIS), and the Institute of Scientific and Technical Information of China (ISTIC). Table 2 shows the six public degree databases in China.

We obtained the data on doctoral dissertations in four steps. First, we gathered bibliographic records of LIS doctoral dissertations from the 16 degree databases. Second, we merged these data and removed duplicates. Note that some bibliographic records in the public databases are inaccurate, especially the "Keywords" field. Therefore, third, we checked all keywords in the dataset one by one by reading the full text or the first 16/24 pages available in the databases above. Unfortunately, some doctoral dissertations still had no full text (or no first 16/24 pages) in the databases or on the Internet. Thus, fourth, we obtained these doctoral dissertations through document delivery services to fill the gap.
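The merging and de-duplication in the second step can be sketched as follows. This is an illustration only: the paper does not describe how duplicate records were matched, so the matching key (normalized title, author, and year), the record field names, and the sample records are all hypothetical.

```python
def normalize(text):
    """Lowercase and collapse whitespace so near-identical strings compare equal."""
    return " ".join(text.lower().split())


def merge_records(sources):
    """Merge record lists from several databases, keeping one record per dissertation."""
    merged = {}
    for records in sources:
        for rec in records:
            # Assumed matching key; the real matching rule is not given in the text.
            key = (normalize(rec["title"]), normalize(rec["author"]), rec["year"])
            kept = merged.setdefault(key, dict(rec, keywords=list(rec["keywords"])))
            # Union the keyword lists so the later manual check sees every variant.
            for kw in rec["keywords"]:
                if kw not in kept["keywords"]:
                    kept["keywords"].append(kw)
    return list(merged.values())


# Hypothetical records from one public database and one university database.
public_db = [
    {"title": "Ontology-based  Retrieval", "author": "Li Ming", "year": 2008,
     "keywords": ["ontology"]},
]
university_db = [
    {"title": "ontology-based retrieval", "author": "li ming", "year": 2008,
     "keywords": ["ontology", "semantic search"]},
    {"title": "Digital Library Services", "author": "Wang Fang", "year": 2010,
     "keywords": ["digital library"]},
]
merged = merge_records([public_db, university_db])
```

Keeping the union of keywords rather than discarding one copy preserves every variant for the manual keyword check in the third step, which remains necessary because automatic matching cannot tell which source recorded the keywords correctly.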

