Use and application of artificial intelligence in public policy evaluation. A Scoping Review.
Submitted: 2025-05-09
Accepted: 2025-09-03
Published: 2025-11-04
Copyright (c) 2025 Barbara Branchini, Beatriz Vallina Acha

This work is licensed under a Creative Commons Attribution 4.0 International License.
Keywords:
big data, evaluation, public policy, artificial intelligence, review
Supporting agencies:
Besaldi - Evaluation Body for Employment and Inclusion Policies. Basque Government
Abstract:
Context: Interest in using AI to improve public policies and services is growing, yet evidence on its application in evaluation has not been systematised, and few scientific publications address the implications of these technologies for evaluation practice.
Objective: To map, through available literature, the current and emerging state of AI application in public policy evaluation.
Methods: We conducted an exploratory literature review (scoping review) following the methodological framework of Levac et al. (2010). The findings were synthesised through a thematic analysis of 27 studies and analytical-theoretical literature.
Results: AI is increasingly being applied in various phases of the evaluation cycle, primarily as a support tool (“human-in-the-loop”), especially in the operationalisation, report preparation and results dissemination phases. Use cases include the analysis of large volumes of administrative and textual data through Machine Learning (ML) and Natural Language Processing (NLP), the performance of simulations and counterfactual analyses, the potential for real-time monitoring, and the use of Large Language Models (LLMs) for synthesis or visualisation tasks, among others.
Conclusions and implications: Current evidence points toward the predominance of human-machine collaboration models (human-in-the-loop), indicating that realising the benefits of AI in this field does not involve total automation, but rather strategic, critically reflective and contextually adapted implementation.
References:
Alexander, W. (2022). Applying Artificial Intelligence to Public Sector Decision Making [Major Research Paper]. University of Ottawa.
Arguelles Toache, E. (2023). Ventajas y desventajas del uso de la Inteligencia Artificial en el ciclo de las políticas públicas: Análisis de casos internacionales. Acta universitaria, 33. https://www.redalyc.org/journal/416/41677664054/html/
Babšek, M., Ravšelj, D., Umek, L., & Aristovnik, A. (2025). Artificial Intelligence Adoption in Public Administration: An Overview of Top-Cited Articles and Practical Applications. AI, 6(3), Article 3. https://doi.org/10.3390/ai6030044
Bajgar, M., & Criscuolo, C. (2019). Designing Evaluation of Modern Apprenticeships in Scotland. In N. Crato & P. Paruolo (Eds.), Data-Driven Policy Impact Evaluation (pp. 289–311). Springer International Publishing. https://doi.org/10.1007/978-3-319-78461-8_18
Bamberger, M., & York, P. (2020). Transforming Evaluation in the 4th Industrial Revolution: Exciting Opportunities and New Challenges. eVALUation Matters, Second Quarter 2020, 11–21. African Development Bank Group. https://idev.afdb.org/sites/default/files/documents/files/EM%20Q2-2020-article1-challenges%20and%20opportunities%204th%20industrial%20revolution%28En%29.pdf
Barredo Arrieta, A., Díaz-Rodríguez, N., Del Ser, J., Bennetot, A., Tabik, S., Barbado, A., Garcia, S., Gil-Lopez, S., Molina, D., Benjamins, R., Chatila, R., & Herrera, F. (2020). Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Information Fusion, 58, 82–115. https://doi.org/10.1016/j.inffus.2019.12.012
Beer, D. (2017). The social power of algorithms. Information, Communication & Society, 20(1), 1–13. https://doi.org/10.1080/1369118X.2016.1216147
Bertolucci, M. (2024). L’intelligence artificielle dans le secteur public: Revue de la littérature et programme de recherche. Gestion et Management Public, 12(3), 71–91. https://doi.org/10.3917/gmp.123.0071
Better Evaluation. (2014). Rainbow Framework. https://www.betterevaluation.org/frameworks-guides/rainbow-framework
Bilbao-Goyoaga, E. (2023). Perceptions Matter: Quasi-Experimental Evidence on the Effects of Spain’s New Minimum Income on Households’ Financial Wellbeing (Social Policy Working Paper No. 02-23). LSE Department of Social Policy. https://www.lse.ac.uk/social-policy/Assets/Documents/PDF/working-paper-series/WPS-02-23-Eugenia-Bilbao-Goyoaga.pdf
Bohni Nielsen, S., Mazzeo Rinaldi, F., & Petersson, G. J. (Eds.). (2024). Artificial Intelligence and Evaluation: Emerging Technologies and Their Implications for Evaluation (1st ed.). Routledge. https://doi.org/10.4324/9781003512493
Bouyousfi, S. E., & Ouedraogo, M. (2024). Artificial intelligence and big data-driven evaluation research and practices: A systematic literature review. Evaluation, 13563890241289937. https://doi.org/10.1177/13563890241289937
Brioscú, A., Lauringson, A., Saint-Martin, A., & Xenogiani, T. (2024). A new dawn for Public Employment Services. Service Delivery in the age of Artificial Intelligence (No. 19; OECD Artificial Intelligence Papers). https://www.oecd.org/content/dam/oecd/en/publications/reports/2024/06/a-new-dawn-for-public-employment-services_25e1e70e/5dc3eb8e-en.pdf
Buttow, C. V. (2024). Data-Driven Policy Making and Its Impacts on Regulation: A Study of the OECD Vision in the Light of Data Critical Studies. European Journal of Risk Regulation, 1–19. https://doi.org/10.1017/err.2024.73
Carabantes, M. (2020). Black-box artificial intelligence: An epistemological and critical analysis. AI & SOCIETY, 35(2), 309–317. https://doi.org/10.1007/s00146-019-00888-w
Carlizzi, D. N., & Quattrone, A. (2023). Artificial Intelligence and Data Governance for Precision ePolicy Cycle. In D. Marino & M. Monaca (Eds.), Artificial Intelligence and Economics: The Key to the Future (pp. 67–84). Springer International Publishing. https://doi.org/10.1007/978-3-031-14605-3_6
Crato, N., & Paruolo, P. (2019). The Power of Microdata: An Introduction. In N. Crato & P. Paruolo (Eds.), Data-Driven Policy Impact Evaluation (pp. 1–14). Springer International Publishing. https://doi.org/10.1007/978-3-319-78461-8_1
Directorate-General for Employment, Social Affairs and Inclusion, European Commission, ICF, & Willen, P. (2025). Opportunities of AI within PES processes and services: Exploring PES experiences, best practices and emerging business value. Publications Office. https://data.europa.eu/doi/10.2767/84293
Dwivedi, Y. K., Hughes, L., Ismagilova, E., Aarts, G., Coombs, C., Crick, T., Duan, Y., Dwivedi, R., Edwards, J., Eirug, A., Galanos, V., Ilavarasan, P. V., Janssen, M., Jones, P., Kar, A. K., Kizgin, H., Kronemann, B., Lal, B., Lucini, B., … Williams, M. D. (2021). Artificial Intelligence (AI): Multidisciplinary perspectives on emerging challenges, opportunities, and agenda for research, practice and policy. International Journal of Information Management, 57, 101994. https://doi.org/10.1016/j.ijinfomgt.2019.08.002
Elevati, C. (2025). 73 - L’Intelligenza Artificiale come alleato strategico per i professionisti MEAL. LinkedIn. https://www.linkedin.com/pulse/73-lintelligenza-artificiale-come-alleato-per-i-meal-elevati-4xvff/?trackingId=rvljDuu5T5yjRNJAMx9XPQ%3D%3D
European Commission. (2018). Communication from the Commission: Artificial Intelligence for Europe (COM(2018) 237 final). https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=COM:2018:237:FIN
European Commission. (2024). Evaluation Handbook (2024). https://capacity4dev.europa.eu/library/evaluation-handbook-2024_en
Ferretti, S. (2023). Hacking by the prompt: Innovative ways to utilize ChatGPT for evaluators. New Directions for Evaluation, 2023(178–179), 73–84. https://doi.org/10.1002/ev.20557
Franzen, S., Quang, C., Schweizer, L., Budzier, A., Gold, J., Vellez, M., Ramirez, S., & Raimondo, E. (2022). Advanced Content Analysis: Can Artificial Intelligence Accelerate Theory-Driven Complex Program Evaluation? (IEG Methods and Evaluation Capacity Development Working Paper Series). International Bank for Reconstruction and Development / The World Bank.
Goodfellow, I., Courville, A., & Bengio, Y. (2016). Deep learning. The MIT Press.
Hasan Chy, M. K., & Nana Buadi, O. (2024). Role of Machine Learning in Policy Making and Evaluation. International Journal of Innovative Science and Research Technology (IJISRT), 456–463. https://doi.org/10.38124/ijisrt/IJISRT24OCT687
Head, C. B., Jasper, P., McConnachie, M., Raftree, L., & Higdon, G. (2023). Large language model applications for evaluation: Opportunities and ethical implications. New Directions for Evaluation, 2023(178–179), 33–46. https://doi.org/10.1002/ev.20556
International Organization for Standardization & International Electrotechnical Commission. (2022). ISO/IEC 22989:2022(en), Information technology—Artificial intelligence—Artificial intelligence concepts and terminology. https://www.iso.org/obp/ui/#iso:std:iso-iec:22989:ed-1:v1:en
Jacob, S. (2025). Artificial Intelligence and the Future of Evaluation: From Augmented to Automated Evaluation. Digital Government: Research and Practice, 6(1), 10:1–10:10. https://doi.org/10.1145/3696009
Kates, A. W., & Wilson, K. (2023). AI for Evaluators: Opportunities and Risks. Journal of MultiDisciplinary Evaluation, 19(45), 99–104. https://doi.org/10.56645/jmde.v19i45.907
Kitchin, R. (2014). Big Data, new epistemologies and paradigm shifts. Big Data & Society, 1(1), 2053951714528481. https://doi.org/10.1177/2053951714528481
Levac, D., Colquhoun, H., & O’Brien, K. K. (2010). Scoping studies: Advancing the methodology. Implementation Science, 5(1), 69. https://doi.org/10.1186/1748-5908-5-69
Ligero Lasa, J. A. (2015). Tres métodos de evaluación de programas y servicios. Fundación Caja Madrid.
MacArthur, J., Moung, V., Carrard, N., & Willetts, J. (2025). Personas for program evaluation: Insights from a gender-focused evaluation in Cambodia. Evaluation, 31(1), 70–91. https://doi.org/10.1177/13563890241284425
Mason, S. (2023). Finding a safe zone in the highlands: Exploring evaluator competencies in the world of AI. New Directions for Evaluation, 2023(178–179), 11–22. https://doi.org/10.1002/ev.20561
Ministerio para la Transformación Digital y de la Función Pública. (2024). Estrategia de Inteligencia Artificial 2024. https://portal.mineco.gob.es/es-es/digitalizacionIA/Documents/Estrategia_IA_2024.pdf
Moyano-Arias, R. J., Salazar-Alvarez, E. G., & Toalombo-Vargas, V. M. (2024). Matemáticas Aplicadas a la Programación: Una Revisión sobre la Solución de Algoritmos Complejos. MQRInvestigar, 8(4), 3667–3692. https://doi.org/10.56048/MQR20225.8.4.2024.3667-3692
Newman, J., & Mintrom, M. (2023). Mapping the discourse on evidence-based policy, artificial intelligence, and the ethical practice of policy analysis. Journal of European Public Policy, 30(9), 1839–1859. https://doi.org/10.1080/13501763.2023.2193223
Picciotto, R. (2020). Evaluation and the Big Data Challenge. American Journal of Evaluation, 41(2), 166–181. https://doi.org/10.1177/1098214019850334
Potasznik, A. (2023). ABCs: Differentiating Algorithmic Bias, Automation Bias, and Automation Complacency. 2023 IEEE International Symposium on Ethics in Engineering, Science, and Technology (Ethics), 1–5. https://doi.org/10.1109/ETHICS57328.2023.10155094
Powell, S., Copestake, J., & Remnant, F. (2024). Causal mapping for evaluators. Evaluation, 30(1), 100–119. https://doi.org/10.1177/13563890231196601
Raveh, E., Ofek, Y., Bekkerman, R., & Cohen, H. (2020). Applying Big Data visualization to detect trends in 30 years of performance reports. Evaluation, 26(4), 516–540. https://doi.org/10.1177/1356389020905322
Recommendation of the Council on Artificial Intelligence, No. OECD/LEGAL/0449, Compendium of Legal Instruments of the OECD (2024). https://legalinstruments.oecd.org/en/instruments/OECD-LEGAL-0449
Schimpf, C., Barbrook-Johnson, P., & Castellani, B. (2021). Case-based modelling and scenario simulation for ex-post evaluation. Evaluation, 27(1), 116–137. https://doi.org/10.1177/1356389020978490
Stadelmann, T. (2025). Evidence-based AI risk assessment for public policy. Public Money & Management, 1–3. https://doi.org/10.1080/09540962.2025.2541304
Stern, E. (2020). Editorial. Evaluation, 26(4), 401–403. https://doi.org/10.1177/1356389020966442
Straub, V. J., Morgan, D., Bright, J., & Margetts, H. (2023). Artificial intelligence in government: Concepts, standards, and a unified framework. Government Information Quarterly, 40(4), 101881. https://doi.org/10.1016/j.giq.2023.101881
Tangi, L., van Noordt, C., Combetto, M., Gattwinkel, D., & Pignatelli, F. (2022). AI Watch: European landscape on the use of artificial intelligence by the public sector. European Commission. Publications Office of the European Union. https://data.europa.eu/doi/10.2760/39336
Tilton, Z., Lavelle, J. M., Ford, T., & Montenegro, M. (2023). Artificial intelligence and the future of evaluation education: Possibilities and prototypes. New Directions for Evaluation. https://doi.org/10.1002/EV.20564
Valle-Cruz, D., Criado, J. I., Sandoval-Almazán, R., & Ruvalcaba-Gomez, E. A. (2020). Assessing the public policy-cycle framework in the age of artificial intelligence: From agenda-setting to policy evaluation. Government Information Quarterly, 37(4), 101509. https://doi.org/10.1016/j.giq.2020.101509
Yar, M. A., Hamdan, M., Anshari, M., Fitriyani, N. L., & Syafrudin, M. (2024). Governing with Intelligence: The Impact of Artificial Intelligence on Policy Development. Information, 15(9), 556. https://doi.org/10.3390/info15090556
York, P. (2024). The Future of Evaluation Analytics. In S. Bohni Nielsen, F. Mazzeo Rinaldi, & G. J. Petersson (Eds.), Artificial Intelligence and Evaluation (1st ed., pp. 219–241). Routledge. https://doi.org/10.4324/9781003512493-11
York, P., & Bamberger, M. (2020). Measuring Results and Impact in the Age of Big Data: The Nexus of Evaluation, Analytics, and Digital Technology. The Rockefeller Foundation. https://www.rockefellerfoundation.org/reports/measuring-results-and-impact-in-the-age-of-big-data-the-nexus-of-evaluation-analytics-and-digital-technology/
Ziulu, V., Anuj, H., Hagh, A., Raimondo, E., & Vaessen, J. (2024). Extracting Meaning from Textual Data for Evaluation. In S. Bohni Nielsen, F. Mazzeo Rinaldi, & G. J. Petersson (Eds.), Artificial Intelligence and Evaluation (1st ed., pp. 78–102). Routledge. https://doi.org/10.4324/9781003512493-5
Züger, T., & Asghari, H. (2023). AI for the public. How public interest theory shifts the discourse on AI. AI & Society, 38(2), 815–828. https://doi.org/10.1007/s00146-022-01480-5


