A qualitative analysis of the Wikipedia N-Substate Algorithm's Enhancement Terms

Kyle Goslin, Markus Hofmann


Automatic Search Query Enhancement (ASQE) is the process of modifying a user-submitted search query by identifying terms that can be added or removed to improve the relevance of the documents retrieved from a search engine. ASQE differs from other enhancement approaches in that no human interaction is required. ASQE algorithms typically rely on a source of a priori knowledge to help identify relevant enhancement terms. This paper describes the results of a qualitative analysis of the enhancement terms generated by the Wikipedia N-Substate Algorithm (WNSSA) for ASQE. The WNSSA uses Wikipedia as its sole source of a priori knowledge during the query enhancement process. Because each Wikipedia article typically represents a single topic, the WNSSA maps the user’s original search query to the Wikipedia articles relevant to that query. If this mapping is performed correctly, a collection of potentially relevant terms and acronyms becomes available for ASQE. The paper reviews the results of a qualitative analysis of the individual enhancement terms generated for each of the 50 test topics in the TREC-9 Web Topic collection. The contributions of this paper are: (a) a qualitative analysis of the search query enhancement terms generated by the WNSSA and (b) an analysis of the concepts represented in the TREC-9 Web Topics, detailing the interpretation issues that arise during the query-to-Wikipedia article mapping performed by the WNSSA.
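The query-to-article mapping described above can be illustrated with a toy sketch. This is not the WNSSA itself: the in-memory `ARTICLES` dictionary stands in for Wikipedia, and the token-overlap matching, the stopword list, and the function names `map_query_to_article` and `enhancement_terms` are all assumptions made for illustration only.

```python
import re

# Hypothetical stand-in for Wikipedia: article title -> opening sentence.
ARTICLES = {
    "Parkinson's disease": "Parkinson's disease (PD) is a degenerative disorder of the nervous system.",
    "Hubble Space Telescope": "The Hubble Space Telescope (HST) is a space telescope launched in 1990.",
}

# Small illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"the", "a", "an", "of", "is", "in", "for", "and", "to", "what"}


def tokens(text):
    """Lowercase the text and split it into alphanumeric tokens, dropping stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def map_query_to_article(query):
    """Return the title sharing the most tokens with the query, or None if no title overlaps."""
    query_tokens = set(tokens(query))
    best_title, best_overlap = None, 0
    for title in ARTICLES:
        overlap = len(query_tokens & set(tokens(title)))
        if overlap > best_overlap:
            best_title, best_overlap = title, overlap
    return best_title


def enhancement_terms(query):
    """Collect title words and parenthesised acronyms that the query does not already contain."""
    title = map_query_to_article(query)
    if title is None:
        return []
    acronyms = re.findall(r"\(([A-Z]{2,})\)", ARTICLES[title])
    seen = set(tokens(query))
    terms = []
    for term in tokens(title) + [a.lower() for a in acronyms]:
        if term not in seen:
            seen.add(term)
            terms.append(term)
    return terms
```

With this toy data, a query that maps to the wrong article, or to no article at all, yields irrelevant or empty candidate terms, which is the kind of interpretation issue the qualitative analysis examines.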


Automatic Search Query Enhancement; Text Analysis; Wikipedia




This journal is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Universitat Politècnica de València

e-ISSN: 2530-9455   https://doi.org/10.4995/jclr