
«by MICHAŁ WRÓBLEWSKI Supervisor JERZY STEFANOWSKI, Assistant Professor Referee ROBERT SUSMAGA, Assistant Professor MASTER THESIS Submitted in ...»


• extension of input pre-processing
The pre-processing of input snippets performed by other modules of the Carrot2 system, such as stemming and stop-word removal, also has a strong influence on the final results. We feel, for instance, that extending the stop-lists would considerably improve the quality of the resulting clusters, because the cluster descriptions created by our algorithm often consist of single terms only. Such descriptions are unclear when these terms carry no information and should in fact have been placed on the stop-list.

• overlapping clusters
Unfortunately, in the results produced by AHC a document may appear in only one group (not counting that group's parent groups in the hierarchy). Yet in the real world a single document may correspond to several different topics and should be contained in an appropriate number of clusters. However, during our work on this thesis we did not find any publications mentioning attempts to create a version of the AHC algorithm with such capabilities.

• improvement of the usefulness of created clusters
We feel that the main problem with the quality of AHC results is that the algorithm quite often creates clusters from documents that share just one or two single words, which usually tell us nothing about their real topic. Rejecting some of the created clusters and moving the documents they contain to the "Other Topics" group may therefore in fact have a good overall influence on the results.

• speed of implementation
Unfortunately, our implementation of the AHC algorithm is certainly too slow for practical application. The main reason is the cubic complexity of the version of the clustering algorithm that we used.
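
The idea of rejecting weak clusters, combined with an extended stop-list, could be sketched roughly as follows. This is only an illustration in Python and not the Carrot2 implementation: the `Cluster` class, the `prune_clusters` function, the contents of `EXTENDED_STOP_WORDS` and the threshold `min_informative_terms` are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Hypothetical extended stop-list; the real Carrot2 stop-lists are not reproduced here.
EXTENDED_STOP_WORDS: Set[str] = {"home", "page", "click", "free", "new", "site"}


@dataclass
class Cluster:
    """A flat view of one cluster: the terms of its description and its documents."""
    label_terms: List[str]
    documents: List[str] = field(default_factory=list)


def prune_clusters(clusters: List[Cluster],
                   stop_words: Set[str] = EXTENDED_STOP_WORDS,
                   min_informative_terms: int = 2) -> List[Cluster]:
    """Keep clusters whose description has enough informative terms.

    Documents of rejected clusters are moved to a single "Other Topics" group.
    The threshold is an assumption made for this sketch.
    """
    kept: List[Cluster] = []
    other = Cluster(label_terms=["Other Topics"])
    for cluster in clusters:
        # Terms on the stop-list carry no information about the topic.
        informative = [t for t in cluster.label_terms if t.lower() not in stop_words]
        if len(informative) >= min_informative_terms:
            kept.append(cluster)
        else:
            other.documents.extend(cluster.documents)
    if other.documents:
        kept.append(other)
    return kept


clusters = [
    Cluster(["suffix", "tree", "clustering"], ["d1", "d2"]),
    Cluster(["page"], ["d3"]),
    Cluster(["free", "download"], ["d4", "d5"]),
]
result = prune_clusters(clusters)
```

In this example only the first cluster is kept, and the documents of the other two end up in "Other Topics". A real implementation would also have to decide how rejection interacts with the cluster hierarchy, which this flat sketch ignores.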

