Use and application of artificial intelligence in public policy evaluation. A Scoping Review.

Barbara Branchini

https://orcid.org/0000-0002-5202-0914

Spain

Fresno Servicios Sociales S.L.

Barbara Branchini is Project Manager in the Studies, Evaluation and Strategies department at Fresno. She holds a degree in Political Sciences from the University of Florence and a Master's in Evaluation of Programmes and Public Policies from the Complutense University of Madrid, work that earned her the Third Prize from the European Evaluation Society.

She has over 12 years of experience in strategic consultancy for organisations, management of European research and social innovation projects, evaluation of projects and programmes, and implementation of participatory processes in urban innovation interventions. She actively participates in the study and dissemination of evaluation through organisations such as APROEVAL, the European Evaluation Society (EES) and Avalua·lab, the Laboratory for Analysis and Evaluation of Public Policies in València.

Since 2022, she has been part of the Fresno team, where she focuses on developing monitoring systems and evaluating social inclusion and employment programmes and policies.

Beatriz Vallina Acha

https://orcid.org/0000-0002-6338-8028

Spain

Universitat Politècnica de Catalunya

Beatriz Vallina Acha is an interdisciplinary researcher with extensive experience in social innovation, public policy evaluation and operations management. She holds a PhD in Social Sciences (cum laude) from the Universitat de València, in Design, Management and Evaluation of Social Welfare Public Policies.

Her career spans over 10 years as a researcher and consultant in European projects focusing on social innovation, technology, health and sustainability. She has authored scientific publications on co-production methodologies and interventions with vulnerable populations.

She is currently pursuing an MSc in Supply Chain Management and Technology (Universitat Ramon Llull - La Salle BCN) and developing her second doctoral thesis, in Agrifood Economics at the UPV, which examines the socioeconomic impacts of Generative AI in the agro-industry.

Accepted: 2025-09-03

Published: 2025-11-04

DOI: https://doi.org/10.4995/jpeval.2025.23837

Keywords:

big data, evaluation, public policy, artificial intelligence, review

Supporting agencies:

Besaldi - Evaluation Body for Employment and Inclusion Policies. Basque Government

Abstract:

Context: Despite growing interest in using AI to improve public policies and services, there is little systematised evidence on its application in evaluation, and few scientific publications address the implications of these technologies for evaluation practice.

Objective: To map, through available literature, the current and emerging state of AI application in public policy evaluation.

Methods: We conducted an exploratory literature review (scoping review) following the methodological framework of Levac et al. (2010). The synthesis was carried out through a thematic analysis of 27 studies and analytical-theoretical literature.

Results: AI is increasingly being applied in various phases of the evaluation cycle, primarily as a support tool (“human-in-the-loop”), especially in the operationalisation, report preparation and results dissemination phases. Use cases include the analysis of large volumes of administrative and textual data through Machine Learning (ML) and Natural Language Processing (NLP), the performance of simulations and counterfactual analyses, the potential for real-time monitoring, and the use of Large Language Models (LLMs) for synthesis or visualisation tasks, among others.

Conclusions and implications: Current evidence points toward the predominance of human-machine collaboration models (human-in-the-loop), indicating that realising the benefits of AI in this field does not involve total automation, but rather strategic, critically reflective and contextually adapted implementation.

References:

Alexander, W. (2022). Applying Artificial Intelligence to Public Sector Decision Making [Major Research Paper]. University of Ottawa.

Arguelles Toache, E. (2023). Ventajas y desventajas del uso de la Inteligencia Artificial en el ciclo de las políticas públicas: Análisis de casos internacionales. Acta universitaria, 33. https://www.redalyc.org/journal/416/41677664054/html/

Babšek, M., Ravšelj, D., Umek, L., & Aristovnik, A. (2025). Artificial Intelligence Adoption in Public Administration: An Overview of Top-Cited Articles and Practical Applications. AI, 6(3), Article 3. https://doi.org/10.3390/ai6030044

Bajgar, M., & Criscuolo, C. (2019). Designing Evaluation of Modern Apprenticeships in Scotland. In N. Crato & P. Paruolo (Eds.), Data-Driven Policy Impact Evaluation (pp. 289–311). Springer International Publishing. https://doi.org/10.1007/978-3-319-78461-8_18

Bamberger, M., & York, P. (2020). Transforming Evaluation in the 4th Industrial Revolution: Exciting Opportunities and New Challenges. eVALUation Matters, Second Quarter 2020, 11–21. African Development Bank Group. https://idev.afdb.org/sites/default/files/documents/files/EM%20Q2-2020-article1-challenges%20and%20opportunities%204th%20industrial%20revolution%28En%29.pdf

Barredo Arrieta, A., Díaz-Rodríguez, N., Del Ser, J., Bennetot, A., Tabik, S., Barbado, A., Garcia, S., Gil-Lopez, S., Molina, D., Benjamins, R., Chatila, R., & Herrera, F. (2020). Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Information Fusion, 58, 82–115. https://doi.org/10.1016/j.inffus.2019.12.012

Beer, D. (2017). The social power of algorithms. Information, Communication & Society, 20(1), 1–13. https://doi.org/10.1080/1369118X.2016.1216147

Bertolucci, M. (2024). L’intelligence artificielle dans le secteur public: Revue de la littérature et programme de recherche. Gestion et Management Public, 12(3), 71–91. https://doi.org/10.3917/gmp.123.0071

Better Evaluation. (2014). Rainbow Framework. https://www.betterevaluation.org/frameworks-guides/rainbow-framework

Bilbao-Goyoaga, E. (2023). Perceptions Matter: Quasi-Experimental Evidence on the Effects of Spain’s New Minimum Income on Households’ Financial Wellbeing (Social Policy Working Paper No. 02-23). LSE Department of Social Policy. https://www.lse.ac.uk/social-policy/Assets/Documents/PDF/working-paper-series/WPS-02-23-Eugenia-Bilbao-Goyoaga.pdf

Bohni Nielsen, S., Mazzeo Rinaldi, F., & Petersson, G. J. (2024). Artificial Intelligence and Evaluation: Emerging Technologies and Their Implications for Evaluation (1st ed.). Routledge. https://doi.org/10.4324/9781003512493

Bouyousfi, S. E., & Ouedraogo, M. (2024). Artificial intelligence and big data-driven evaluation research and practices: A systematic literature review. Evaluation, 13563890241289937. https://doi.org/10.1177/13563890241289937

Brioscú, A., Lauringson, A., Saint-Martin, A., & Xenogiani, T. (2024). A new dawn for Public Employment Services. Service Delivery in the age of Artificial Intelligence (No. 19; OECD Artificial Intelligence Papers). https://www.oecd.org/content/dam/oecd/en/publications/reports/2024/06/a-new-dawn-for-public-employment-services_25e1e70e/5dc3eb8e-en.pdf

Buttow, C. V. (2024). Data-Driven Policy Making and Its Impacts on Regulation: A Study of the OECD Vision in the Light of Data Critical Studies. European Journal of Risk Regulation, 1–19. https://doi.org/10.1017/err.2024.73

Carabantes, M. (2020). Black-box artificial intelligence: An epistemological and critical analysis. AI & Society, 35(2), 309–317. https://doi.org/10.1007/s00146-019-00888-w

Carlizzi, D. N., & Quattrone, A. (2023). Artificial Intelligence and Data Governance for Precision ePolicy Cycle. In D. Marino & M. Monaca (Eds.), Artificial Intelligence and Economics: The Key to the Future (pp. 67–84). Springer International Publishing. https://doi.org/10.1007/978-3-031-14605-3_6

Crato, N., & Paruolo, P. (2019). The Power of Microdata: An Introduction. In N. Crato & P. Paruolo (Eds.), Data-Driven Policy Impact Evaluation (pp. 1–14). Springer International Publishing. https://doi.org/10.1007/978-3-319-78461-8_1

Directorate-General for Employment, Social Affairs and Inclusion, European Commission, ICF, & Willen, P. (2025). Opportunities of AI within PES processes and services: Exploring PES experiences, best practices and emerging business value. Publications Office. https://data.europa.eu/doi/10.2767/84293

Dwivedi, Y. K., Hughes, L., Ismagilova, E., Aarts, G., Coombs, C., Crick, T., Duan, Y., Dwivedi, R., Edwards, J., Eirug, A., Galanos, V., Ilavarasan, P. V., Janssen, M., Jones, P., Kar, A. K., Kizgin, H., Kronemann, B., Lal, B., Lucini, B.,…Williams, M. D. (2021). Artificial Intelligence (AI): Multidisciplinary perspectives on emerging challenges, opportunities, and agenda for research, practice and policy. International Journal of Information Management, 57, 101994. https://doi.org/10.1016/j.ijinfomgt.2019.08.002

Elevati, C. (2025). 73—L’Intelligenza Artificiale come alleato strategico per i professionisti MEAL. LinkedIn. https://www.linkedin.com/pulse/73-lintelligenza-artificiale-come-alleato-per-i-meal-elevati-4xvff/?trackingId=rvljDuu5T5yjRNJAMx9XPQ%3D%3D

European Commission. (2018). Communication from the Commission: Artificial Intelligence for Europe (COM(2018) 237 final). https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=COM:2018:237:FIN

European Commission. (2024). Evaluation Handbook (2024). https://capacity4dev.europa.eu/library/evaluation-handbook-2024_en

Ferretti, S. (2023). Hacking by the prompt: Innovative ways to utilize ChatGPT for evaluators. New Directions for Evaluation, 2023(178–179), 73–84. https://doi.org/10.1002/ev.20557

Franzen, S., Quang, C., Schweizer, L., Budzier, A., Gold, J., Vellez, M., Ramirez, S., & Raimondo, E. (2022). Advanced Content Analysis: Can Artificial Intelligence Accelerate Theory-Driven Complex Program Evaluation? (IEG Methods and Evaluation Capacity Development Working Paper Series). International Bank for Reconstruction and Development / The World Bank.

Goodfellow, I., Courville, A., & Bengio, Y. (2016). Deep learning. The MIT Press.

Hasan Chy, M. K., & Nana Buadi, O. (2024). Role of Machine Learning in Policy Making and Evaluation. International Journal of Innovative Science and Research Technology (IJISRT), 456–463. https://doi.org/10.38124/ijisrt/IJISRT24OCT687

Head, C. B., Jasper, P., McConnachie, M., Raftree, L., & Higdon, G. (2023). Large language model applications for evaluation: Opportunities and ethical implications. New Directions for Evaluation, 2023(178–179), 33–46. https://doi.org/10.1002/ev.20556

International Organization for Standardization & International Electrotechnical Commission. (2022). ISO/IEC 22989:2022(en), Information technology—Artificial intelligence—Artificial intelligence concepts and terminology. https://www.iso.org/obp/ui/#iso:std:iso-iec:22989:ed-1:v1:en

Jacob, S. (2025). Artificial Intelligence and the Future of Evaluation: From Augmented to Automated Evaluation. Digital Government: Research and Practice, 6(1), 10:1–10:10. https://doi.org/10.1145/3696009

Kates, A. W., & Wilson, K. (2023). AI for Evaluators: Opportunities and Risks. Journal of MultiDisciplinary Evaluation, 19(45), 99–104. https://doi.org/10.56645/jmde.v19i45.907

Kitchin, R. (2014). Big Data, new epistemologies and paradigm shifts. Big Data & Society, 1(1), 2053951714528481. https://doi.org/10.1177/2053951714528481

Levac, D., Colquhoun, H., & O’Brien, K. K. (2010). Scoping studies: Advancing the methodology. Implementation Science, 5(1), 69. https://doi.org/10.1186/1748-5908-5-69

Ligero Lasa, J. A. (2015). Tres métodos de evaluación de programas y servicios. Fundación Caja Madrid.

MacArthur, J., Moung, V., Carrard, N., & Willetts, J. (2025). Personas for program evaluation: Insights from a gender-focused evaluation in Cambodia. Evaluation, 31(1), 70–91. https://doi.org/10.1177/13563890241284425

Mason, S. (2023). Finding a safe zone in the highlands: Exploring evaluator competencies in the world of AI. New Directions for Evaluation, 2023(178–179), 11–22. https://doi.org/10.1002/ev.20561

Ministerio para la Transformación Digital y de la Función Pública. (2024). Estrategia de Inteligencia Artificial 2024. https://portal.mineco.gob.es/es-es/digitalizacionIA/Documents/Estrategia_IA_2024.pdf

Moyano-Arias, R. J., Salazar-Alvarez, E. G., & Toalombo-Vargas, V. M. (2024). Matemáticas Aplicadas a la Programación: Una Revisión sobre la Solución de Algoritmos Complejos. MQRInvestigar, 8(4), 3667–3692. https://doi.org/10.56048/MQR20225.8.4.2024.3667-3692

Newman, J., & Mintrom, M. (2023). Mapping the discourse on evidence-based policy, artificial intelligence, and the ethical practice of policy analysis. Journal of European Public Policy, 30(9), 1839–1859. https://doi.org/10.1080/13501763.2023.2193223

Picciotto, R. (2020). Evaluation and the Big Data Challenge. American Journal of Evaluation, 41(2), 166–181. https://doi.org/10.1177/1098214019850334

Potasznik, A. (2023). ABCs: Differentiating Algorithmic Bias, Automation Bias, and Automation Complacency. 2023 IEEE International Symposium on Ethics in Engineering, Science, and Technology (Ethics), 1–5. https://doi.org/10.1109/ETHICS57328.2023.10155094

Powell, S., Copestake, J., & Remnant, F. (2024). Causal mapping for evaluators. Evaluation, 30(1), 100–119. https://doi.org/10.1177/13563890231196601

Raveh, E., Ofek, Y., Bekkerman, R., & Cohen, H. (2020). Applying Big Data visualization to detect trends in 30 years of performance reports. Evaluation, 26(4), 516–540. https://doi.org/10.1177/1356389020905322

Recommendation of the Council on Artificial Intelligence, No. OECD/LEGAL/0449, Compendium of Legal Instruments of the OECD (2024). https://legalinstruments.oecd.org/en/instruments/OECD-LEGAL-0449

Schimpf, C., Barbrook-Johnson, P., & Castellani, B. (2021). Case-based modelling and scenario simulation for ex-post evaluation. Evaluation, 27(1), 116–137. https://doi.org/10.1177/1356389020978490

Stadelmann, T. (2025). Evidence-based AI risk assessment for public policy. Public Money & Management, 1–3. https://doi.org/10.1080/09540962.2025.2541304

Stern, E. (2020). Editorial. Evaluation, 26(4), 401–403. https://doi.org/10.1177/1356389020966442

Straub, V. J., Morgan, D., Bright, J., & Margetts, H. (2023). Artificial intelligence in government: Concepts, standards, and a unified framework. Government Information Quarterly, 40(4), 101881. https://doi.org/10.1016/j.giq.2023.101881

Tangi, L., van Noordt, C., Combetto, M., Gattwinkel, D., & Pignatelli, F. (2022). AI Watch: European landscape on the use of artificial intelligence by the public sector. European Commission. Publications Office of the European Union. https://data.europa.eu/doi/10.2760/39336

Tilton, Z., Lavelle, J. M., Ford, T., & Montenegro, M. (2023). Artificial intelligence and the future of evaluation education: Possibilities and prototypes. New Directions for Evaluation. https://doi.org/10.1002/EV.20564

Valle-Cruz, D., Criado, J. I., Sandoval-Almazán, R., & Ruvalcaba-Gomez, E. A. (2020). Assessing the public policy-cycle framework in the age of artificial intelligence: From agenda-setting to policy evaluation. Government Information Quarterly, 37(4), 101509. https://doi.org/10.1016/j.giq.2020.101509

Yar, M. A., Hamdan, M., Anshari, M., Fitriyani, N. L., & Syafrudin, M. (2024). Governing with Intelligence: The Impact of Artificial Intelligence on Policy Development. Information, 15(9), 556. https://doi.org/10.3390/info15090556

York, P. (2024). The Future of Evaluation Analytics. In S. Bohni Nielsen, F. Mazzeo Rinaldi, & G. J. Petersson (Eds.), Artificial Intelligence and Evaluation (1st ed., pp. 219–241). Routledge. https://doi.org/10.4324/9781003512493-11

York, P., & Bamberger, M. (2020). Measuring Results and Impact in the Age of Big Data: The Nexus of Evaluation, Analytics, and Digital Technology. The Rockefeller Foundation. https://www.rockefellerfoundation.org/reports/measuring-results-and-impact-in-the-age-of-big-data-the-nexus-of-evaluation-analytics-and-digital-technology/

Ziulu, V., Anuj, H., Hagh, A., Raimondo, E., & Vaessen, J. (2024). Extracting Meaning from Textual Data for Evaluation. In S. Bohni Nielsen, F. Mazzeo Rinaldi, & G. J. Petersson (Eds.), Artificial Intelligence and Evaluation (1st ed., pp. 78–102). Routledge. https://doi.org/10.4324/9781003512493-5

Züger, T., & Asghari, H. (2023). AI for the public. How public interest theory shifts the discourse on AI. AI & Society, 38(2), 815–828. https://doi.org/10.1007/s00146-022-01480-5
