A qualitative analysis of the Wikipedia N-Substate Algorithm's Enhancement Terms


  • Kyle Goslin Technological University Dublin
  • Markus Hofmann Technological University Dublin




Automatic Search Query Enhancement, Text Analysis, Wikipedia


Automatic Search Query Enhancement (ASQE) is the process of modifying a user-submitted search query by identifying terms that can be added or removed to improve the relevance of the documents retrieved from a search engine. ASQE differs from other enhancement approaches in that no human interaction is required. ASQE algorithms typically rely on a source of a priori knowledge to help identify relevant enhancement terms. This paper describes the results of a qualitative analysis of the enhancement terms generated by the Wikipedia N-Substate Algorithm (WNSSA) for ASQE. The WNSSA uses Wikipedia as its sole source of a priori knowledge during the query enhancement process. Because each Wikipedia article typically represents a single topic, the WNSSA maps the user's original search query to the Wikipedia articles relevant to that query. If this mapping is performed correctly, a collection of potentially relevant terms and acronyms becomes available for ASQE. This paper reviews the results of a qualitative analysis of the individual enhancement terms generated for each of the 50 test topics from the TREC-9 Web Topic collection. The contributions of this paper are: (a) a qualitative analysis of the search query enhancement terms generated by the WNSSA and (b) an analysis of the concepts represented in the TREC-9 Web Topics, detailing the interpretation issues that arise during the query-to-Wikipedia article mapping performed by the WNSSA.
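The general idea of mapping a query to an article and drawing enhancement terms from it can be sketched in a few lines of Python. This is a toy illustration only, not the WNSSA: the in-memory `TOY_ARTICLES` dictionary stands in for Wikipedia, the token-overlap mapping and frequency-based term selection are simplifying assumptions, and all names (`tokenize`, `map_query_to_article`, `enhancement_terms`) are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stand-in for Wikipedia: article title -> article text.
TOY_ARTICLES = {
    "Hubble Space Telescope": (
        "The Hubble Space Telescope (HST) is a space telescope "
        "launched by NASA into low Earth orbit."
    ),
    "Parkinson's disease": (
        "Parkinson's disease (PD) is a disorder of the nervous "
        "system affecting movement."
    ),
}

# Small illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"the", "a", "an", "is", "of", "by", "into", "in", "and", "to"}


def tokenize(text):
    """Return lowercase word tokens, keeping apostrophes inside words."""
    return re.findall(r"[a-z0-9']+", text.lower())


def map_query_to_article(query, articles):
    """Return the title sharing the most tokens with the query, or None."""
    query_tokens = set(tokenize(query))
    best_title, best_overlap = None, 0
    for title in articles:
        overlap = len(query_tokens & set(tokenize(title)))
        if overlap > best_overlap:
            best_title, best_overlap = title, overlap
    return best_title


def enhancement_terms(query, articles, limit=3):
    """Return up to `limit` frequent article terms not already in the query."""
    title = map_query_to_article(query, articles)
    if title is None:
        return []  # no article mapped, so no enhancement terms
    query_tokens = set(tokenize(query))
    counts = Counter(
        token
        for token in tokenize(articles[title])
        if token not in STOPWORDS and token not in query_tokens
    )
    return [term for term, _ in counts.most_common(limit)]


if __name__ == "__main__":
    print(enhancement_terms("hubble telescope achievements", TOY_ARTICLES))
```

In this sketch an incorrect mapping yields terms from the wrong topic, and a query that matches no title yields no terms at all, which is the kind of query-to-article interpretation issue the analysis examines.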



Author Biographies

Kyle Goslin, Technological University Dublin

Lecturer in Computing, Department of Informatics and Engineering

Markus Hofmann, Technological University Dublin

Department of Informatics and Engineering

