PhD in Economics · Researcher in Statistics


Explaining the Unexplainable
Machine Learning meets Statistical Interpretation

I work at the intersection of statistics and machine learning, developing methods to make complex models genuinely understandable. Most of my research revolves around ensemble models and the question of what they actually do.

01

About Me

Agostino Gnasso

I hold a PhD in Economics from the University of Naples Federico II. My research sits at the boundary between statistical methodology and machine learning, with a focus on Explainable Machine Learning: developing frameworks that make powerful black-box models transparent and interpretable in ways that are meaningful for researchers and decision-makers alike.

The main thread of my work is E2Tree (Explainable Ensemble Trees), a method that represents Random Forests and other ensemble models through a single, interpretable tree structure. The goal is to produce explanations that are genuinely faithful to what the model does, in both classification and regression settings. Several papers from this line of work have appeared in leading statistical journals.

Beyond interpretability, I work on applied problems in health economics and social science — predicting depression from national survey data, evaluating hospital quality, and studying the relationship between scientific output and patient outcomes. I am part of the K-Synth Research Lab and contribute to Bibliometrix, an R package used by thousands of researchers worldwide for systematic literature reviews and science mapping.

Research Interests
Explainable AI (XAI) · Machine Learning · Random Forests · Ensemble Methods · Decision Trees · Statistical Inference · Bibliometrics · Health Economics · Science Mapping · R Programming · Applied Statistics · Data Science
02

Research Areas

Explainable AI (XAI)

My main line of research. I develop methods — in particular E2Tree — that translate the internal logic of ensemble models like Random Forests into a single interpretable tree, covering both classification and regression settings.

Machine Learning & Ensembles

Random Forests, XGBoost, and other ensemble methods, studied both for their predictive properties and their interpretability. Part of this work focuses on how to measure whether an explanation actually captures what a model does.

Bibliometrics & Science Mapping

Quantitative analysis of scientific literature, citation networks, and research trends. I contribute to bibliometrix and Biblioshiny, open-source R tools used by researchers worldwide for systematic literature reviews and science mapping.

Applied Health Analytics

Applying machine learning to health and clinical data: depression prediction from national survey data, hospital service quality evaluation, and the relationship between research output and patient outcomes. Part of two PRIN-funded national projects.

Applied Economics

Statistical and machine learning methods applied to economic questions — composite indicators, stochastic evaluation frameworks, and public sector performance analysis. The common thread is bringing methodological rigour to problems that matter for policy.

Applied Quantitative Methods

Applying quantitative methods across disciplines — from social surveys and bibliographic databases to environmental and geophysical data. Some of the most interesting problems sit at the boundary between fields, and that is where I find myself most often.

Live · Currently in Progress

What I'm working on now

A live snapshot of the work currently moving forward — papers under review, methods in preparation, and projects in the pipeline.

Under Review since 2025

A Family of Divergence Measures for Evaluating the Reconstruction Quality of Explainable Ensemble Trees

Defining a principled way to measure how faithfully a global surrogate (E2Tree) reconstructs the predictive behaviour of the underlying ensemble — across classification and regression.

Computational Statistics and Data Analysis
Under Review since 2025

Hospital Service Quality: A Stochastic Composite and Genetic-Matching Framework

Comparing service quality between research and non-research Italian public hospitals through stochastic composite indicators and genetic matching.

Applied Stochastic Models in Business and Industry
Under Submission since 2026

What Drives Saltwater Intrusion? A Hierarchical Bayesian Model for the Volturno River Estuary

Hierarchical Bayesian errors-in-variables model with Gaussian-process depth dependence for saltwater intrusion in the Volturno River. Temperature and dissolved oxygen identified as robustly associated covariates across seasons.

Environmental Statistics · Applied
Working Paper since 2026

Extending Explainable Ensemble Trees to Boosting-Based Ensembles

Extending the E2Tree explainability framework to gradient boosting ensembles (XGBoost, LightGBM), working towards a unified interpretation approach across bagging and boosting paradigms.

With Massimo Aria
Working Paper since 2026

Asymptotic Theory for the Normalized Loss of Interpretability (nLoI)

Establishing formal statistical foundations for the nLoI index, deriving its asymptotic properties and limit distributions as both sample size and ensemble size grow.

Statistical Theory · E2Tree Framework
Working Paper since 2026

Inference for Conditional Shapley Values via Vine Copulas

Developing inference tools for conditional Shapley-based feature importance, enabling asymptotically valid confidence intervals without resampling.

XAI · Statistical Inference
Working Paper since 2026

Turning Regularization into Inference: A Cross-Fitted Test for Pairwise Interactions in Gradient Boosting

Connecting regularization mechanisms in gradient boosting to formal hypothesis testing for pairwise feature interactions, yielding calibrated p-values.

Gradient Boosting · Hypothesis Testing
Working Paper since 2026

Decision-Support Mapping of Pyroclastic Cover Thickness: Spatial Validation, Uncertainty and the Sampling-Requirement Curve

Spatial validation, calibrated uncertainty intervals, and a sampling-requirement curve for machine-learning maps of pyroclastic cover thickness around Somma–Vesuvius and the Phlegrean Fields.

Applied Spatial Statistics · Landslide Hazard
Working Paper since 2026

Socio-sanitary Configuration and the Shape of National Longevity Trajectories

Investigating how socio-sanitary conditions and health system configurations shape national longevity trajectories, examining cross-country patterns in population-level lifespan dynamics.

Longevity · Comparative Health Policy
Working Paper since 2026

e2tree: Explainable Ensemble Trees in R

An R package that generates a single human-readable tree explaining the latent similarity structure learned by a tree ensemble. Dissimilarity between observations is derived from terminal-node co-occurrence frequencies — not a competing predictor, but a faithful reconstruction of the ensemble's grouping logic.

R Package · XAI · CRAN
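
The co-occurrence idea described above can be illustrated with a small sketch. This is not the e2tree implementation (the package is written in R); it is a minimal Python illustration that assumes the dissimilarity between two observations is one minus the share of trees in which they reach the same terminal node. The function name `cooccurrence_dissimilarity` and the leaf assignments are hypothetical.

```python
from itertools import combinations


def cooccurrence_dissimilarity(leaf_ids):
    """Build a dissimilarity matrix from terminal-node assignments.

    leaf_ids[t][i] is the terminal node reached by observation i in tree t.
    Assumption: dissimilarity = 1 - (share of trees where i and j co-occur).
    """
    n_trees = len(leaf_ids)
    n_obs = len(leaf_ids[0])
    # The diagonal stays 0: an observation always co-occurs with itself.
    dissim = [[0.0] * n_obs for _ in range(n_obs)]
    for i, j in combinations(range(n_obs), 2):
        shared = sum(1 for tree in leaf_ids if tree[i] == tree[j])
        dissim[i][j] = dissim[j][i] = 1.0 - shared / n_trees
    return dissim


# Hypothetical leaf assignments: 3 trees, 4 observations.
leaf_ids = [
    [1, 1, 2, 2],
    [5, 5, 5, 6],
    [3, 4, 4, 4],
]
D = cooccurrence_dissimilarity(leaf_ids)
```

Observations that the ensemble rarely separates end up close to each other in `D`, which is the grouping information a single explanatory tree would then try to reproduce.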
Open to Collaboration

Think you can contribute? Let's talk.

Do you see a point of contact, or have an idea worth developing together? Let's find out.

If any of these threads intersect with your own work — methodological (interpretability, ensembles, statistical frameworks) or applied (health, policy, science mapping) — I'd be glad to explore a joint paper, exchange data, or simply trade ideas over a coffee.

Last updated · May 2026
03

Publications

2025
Extending Explainable Ensemble Trees to Regression Contexts
Aria, M., Gnasso, A., Iorio, C., & Fokkema, M.
Applied Stochastic Models in Business and Industry, 42(1), e70064
Fascia A · Area 13 / STAT-01/A
@article{aria2025extending,
  title     = {Extending Explainable Ensemble Trees to Regression Contexts},
  author    = {Aria, Massimo and Gnasso, Agostino and Iorio, Carmela and Fokkema, Marjolein},
  journal   = {Applied Stochastic Models in Business and Industry},
  volume    = {42},
  number    = {1},
  pages     = {e70064},
  year      = {2025},
  publisher = {Wiley},
  doi       = {10.1002/asmb.70064}
}
2025
Predicting depression in Italy using random forest through the E2Tree methodology
Aria, M., Gnasso, A., Rivieccio, R., & Siciliano, R.
Annals of Operations Research
Fascia A · Area 13 / STAT-01/A
@article{aria2025predicting,
  title     = {Predicting depression in Italy using random forest through the {E2Tree} methodology},
  author    = {Aria, Massimo and Gnasso, Agostino and Rivieccio, Roberta and Siciliano, Roberta},
  journal   = {Annals of Operations Research},
  year      = {2025},
  publisher = {Springer},
  doi       = {10.1007/s10479-025-06758-7}
}
2024
Explainable ensemble trees
Aria, M., Gnasso, A., Iorio, C., & Pandolfo, G.
Computational Statistics, 39(1), 3–19
Fascia A · Area 13 / STAT-01/A
@article{aria2024explainable,
  title     = {Explainable ensemble trees},
  author    = {Aria, Massimo and Gnasso, Agostino and Iorio, Carmela and Pandolfo, Giuseppe},
  journal   = {Computational Statistics},
  volume    = {39},
  number    = {1},
  pages     = {3--19},
  year      = {2024},
  publisher = {Springer},
  doi       = {10.1007/s00180-022-01312-6}
}
2021
A comparison among interpretative proposals for Random Forests
Aria, M., Cuccurullo, C., & Gnasso, A.
Machine Learning with Applications
@article{aria2021comparison,
  title     = {A comparison among interpretative proposals for {Random Forests}},
  author    = {Aria, Massimo and Cuccurullo, Corrado and Gnasso, Agostino},
  journal   = {Machine Learning with Applications},
  volume    = {6},
  pages     = {100094},
  year      = {2021},
  publisher = {Elsevier},
  doi       = {10.1016/j.mlwa.2021.100094}
}
2021
Assessment of Sleep Disturbance in Oral Lichen Planus and Validation of PSQI: a case-control multicenter study from the SIPMO
Adamo, D., Gnasso, A., et al.
Journal of Oral Pathology & Medicine
@article{adamo2021assessment,
  title   = {Assessment of Sleep Disturbance in Oral Lichen Planus and Validation of {PSQI}: a case-control multicenter study from the {SIPMO}},
  author  = {Adamo, Daniela and Gnasso, Agostino and others},
  journal = {Journal of Oral Pathology \& Medicine},
  year    = {2021},
  doi     = {10.1111/jop.13255}
}
WP
A Family of Divergence Measures for Evaluating the Reconstruction Quality of Explainable Ensemble Trees
Aria, M., Gnasso, A., Iorio, C.
Computational Statistics and Data Analysis
Under Review
WP
Evaluating the Quality of Hospital Services: A Stochastic Composite and Genetic-Matching Framework for Research and Non-Research Italian Public Hospitals
Gnasso, A., Aria, M., Beraldo, S., Collaro, M.
Applied Stochastic Models in Business and Industry
Under Review
WP
e2tree: Explainable Ensemble Trees in R
Aria, M., Gnasso, A.
Working Paper
WP
Extending Explainable Ensemble Trees to Boosting-Based Ensembles: A Unified Framework for the Interpretation of Tree Ensembles
Aria, M., Gnasso, A.
Working Paper
WP
Asymptotic Theory for the Normalized Loss of Interpretability (nLoI): U-Statistic Representation, Limit Distributions, and Two-Stage Consistency
Gnasso, A.
Working Note for JRSS-B / JASA Submission
Working Paper
WP
Inference for Conditional Shapley Values via Vine Copulas
Gnasso, A.
Working Paper
WP
Turning Regularization into Inference: A Cross-Fitted Test for Pairwise Interactions in Gradient Boosting
Gnasso, A.
Working Paper
WP
What Drives Saltwater Intrusion? A Hierarchical Bayesian Errors-in-Variables Model with Gaussian-Process Depth Dependence for the Volturno River Estuary
Gnasso, A., et al.
Under Submission
2026
A statistical approach on environmental data towards a Digital Twin of Volturno river transitional water system
Pacifico, L., D'Adamo, R., Matano, F., Gnasso, A., Scepi, G.
SDS 2026 – Statistics and Data Science Conference Proceedings, Caserta
2025
"Can You Explain That?" E2Tree, SHAP, and LIME for Interpretable Random Forests
Gnasso, A., Aria, M.
CLADAG-VOC 2025 · Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham.
@inproceedings{gnasso2025canyou,
  title     = {``{Can You Explain That?}'' {E2Tree}, {SHAP}, and {LIME} for Interpretable {Random Forests}},
  author    = {Gnasso, Agostino and Aria, Massimo},
  booktitle = {CLADAG-VOC 2025 -- Studies in Classification, Data Analysis, and Knowledge Organization},
  publisher = {Springer, Cham},
  year      = {2025},
  doi       = {10.1007/978-3-032-03042-9_20}
}
2025
From Prediction to Explanation: Interpreting Risk Factors in Health Survey Analytics
Gnasso, A., Aria, M., Siciliano, R.
CLADAG-VOC 2025 · Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham.
@inproceedings{gnasso2025fromprediction,
  title     = {From Prediction to Explanation: Interpreting Risk Factors in Health Survey Analytics},
  author    = {Gnasso, Agostino and Aria, Massimo and Siciliano, Roberta},
  booktitle = {CLADAG-VOC 2025 -- Studies in Classification, Data Analysis, and Knowledge Organization},
  publisher = {Springer, Cham},
  year      = {2025},
  doi       = {10.1007/978-3-032-03042-9_21}
}
2025
Research excellence and patient perception: investigating the impact of AHSCs' scientific output
Gnasso, A., Sacco, D., Celardo, L., Smecca, M.A., Alabiso, C., & Spano, M.
IES 2025 – Innovation & Society · Invited Session IPS40
2025
From research to care: measuring the impact of AHSC on patient experience
Gnasso, A., Aria, M.
RC33 2025 – 9th International Conference on Social Science Methodology, Naples, Italy · Session 44: Sustainability and High-dimensional Data Analysis
2025
Explainable Decision Tree Ensembles
Gnasso, A., Aria, M., Iorio, C., & Fokkema, M.
SIS 2024 · Italian Statistical Society Series on Advances in Statistics. Springer, Cham.
@inproceedings{gnasso2025explainable,
  title     = {Explainable Decision Tree Ensembles},
  author    = {Gnasso, Agostino and Aria, Massimo and Iorio, Carmela and Fokkema, Marjolein},
  booktitle = {SIS 2024 -- Italian Statistical Society Series on Advances in Statistics},
  publisher = {Springer, Cham},
  year      = {2025},
  doi       = {10.1007/978-3-031-64447-4_21}
}
2024
The evolution of Explainable Artificial Intelligence (XAI): a preliminary systematic literature review
Gnasso, A., Aria, M.
Book of Short Papers – ASA Conference 2024
2024
Inside the black-box models through explainable decision tree ensembles
Iorio, C., Gnasso, A., Aria, M.
Programme & Abstracts – COMPSTAT 2024
2023
Unlocking explainability in ensemble trees
Aria, M., Gnasso, A., Iorio, C., Pandolfo, G.
Programme & Abstracts – CMStatistics 2023 and CFE 2023, ECOSTA
2022
Twenty Years of Random Forest: preliminary results of a systematic literature review
Aria, M., Gnasso, A., D'Aniello, L.
IES 2022, pp. 225–230
2022
AI and ML in accounting and finance: a bibliometric review
Belfiore, A., Gnasso, A., Cuccurullo, C., Aria, M.
JADT 2022 – 16th International Conference on Statistical Analysis of Textual Data, Vol. 1, pp. 95–101
2021
Supporting decision-makers in healthcare domain. A comparative study of two interpretative proposals for Random Forests
Aria, M., Cuccurullo, C., Gnasso, A.
ASA 2021 – Book of Short Papers, Vol. 132, pp. 179–184. Firenze University Press
04

Talks & Conferences

Invited Seminars
Invited Seminar
"E2Tree: Explaining Decision Tree Ensembles"
Leiden University · Institute of Psychology, Methodology and Statistics
STAT-TALK Colloquium — February 2024
PhD Seminar
"Explainable Ensemble Trees (E2Tree)"
University of Naples Federico II · Department of Economics and Statistics
October 2024
PhD Seminar
"Applying E2Tree to Economic Contexts"
University of Naples Federico II · Department of Economics and Statistics
April 2025
Conference Talks
2026
SDS 2026
Caserta, Italy · Mar. 2026
Environmental Data Science
A statistical approach on environmental data towards a Digital Twin of Volturno river transitional water system.
05

Teaching

Quantitative Methods
Metodi Quantitativi
Statistical Inference
Inferenza Statistica
Statistical Methods for Evaluation
Metodi Statistici per la Valutazione
Data Analysis
Analisi dei Dati
Statistics & Time Series Analysis
Statistica e Analisi delle Serie Storiche
Survey Methods
Indagini Campionarie
Statistics for Finance
Statistica per la Finanza
Student Support

Tutoring & Academic Assistance

Are you a university student who needs support with your studies, thesis, or academic projects? I can help in the following areas:

  • Statistics, Probability, and Econometrics
  • Data Analysis with R and Python
  • Machine Learning and Predictive Modeling
  • Thesis writing and research methodology
  • Project development and data-driven assignments
06

Projects & Software

e2tree
CRAN v0.2.0
Explainable Ensemble Trees

The e2tree package implements the Explainable Ensemble Trees (E2Tree) methodology. Rather than fitting a CART tree directly on raw data, E2Tree learns the relational structure that the ensemble has already established: it extracts a co-occurrence matrix from the trained model and uses hierarchical clustering to build a transparent, interpretable dendrogram. The result is a global surrogate designed to stay faithful to Random Forests, XGBoost, and other tree-based ensembles.
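
To make the clustering step concrete, here is a minimal sketch of agglomerative clustering applied to a dissimilarity matrix. It is an illustration only, not the package's code: average linkage is an assumption chosen for simplicity, the function name `average_linkage` is hypothetical, and the matrix values are invented. The procedure actually used in e2tree may differ.

```python
def average_linkage(dissim):
    """Agglomerative clustering on a square dissimilarity matrix.

    Returns the merge history as (members_a, members_b, height) tuples,
    which is the information a dendrogram is drawn from.
    Assumption: average linkage; other linkage rules are equally possible.
    """
    clusters = [[i] for i in range(len(dissim))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average pairwise dissimilarity between the two clusters.
                pairs = [dissim[i][j] for i in clusters[a] for j in clusters[b]]
                height = sum(pairs) / len(pairs)
                if best is None or height < best[0]:
                    best = (height, a, b)
        height, a, b = best
        merges.append((clusters[a], clusters[b], height))
        merged = clusters[a] + clusters[b]
        # Drop the two merged clusters and append their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges


# Hypothetical dissimilarity matrix for 4 observations.
D = [
    [0.0, 0.1, 0.8, 0.9],
    [0.1, 0.0, 0.7, 0.8],
    [0.8, 0.7, 0.0, 0.2],
    [0.9, 0.8, 0.2, 0.0],
]
merges = average_linkage(D)
for left, right, height in merges:
    print(left, right, round(height, 3))
```

Each tuple in `merges` records which groups were joined and at what dissimilarity, so low merge heights correspond to observations the ensemble treats as nearly interchangeable.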

Key Features

  • Global interpretability: converts ensemble models into a single explanatory tree
  • Classification and regression: frequency-based and weighted connectivity modes
  • Visual output: dendrogram-like visualisations with decision rules for each cluster
  • XGBoost support: extended to gradient boosting in v0.2.0
  • rpart integration: outputs compatible with rpart objects
# Stable version from CRAN
install.packages("e2tree")
# Development version from GitHub
devtools::install_github("agostinognasso/e2tree")
Aria, M., Gnasso, A., Iorio, C., & Pandolfo, G. (2024). Explainable ensemble trees. Computational Statistics, 39(1), 3–19. DOI: 10.1007/s00180-022-01312-6
Aria, M., Gnasso, A., Iorio, C., & Fokkema, M. (2025). Extending Explainable Ensemble Trees to Regression Contexts. Applied Stochastic Models in Business and Industry, 42(1), e70064. DOI: 10.1002/asmb.70064
bibliometrix
CRAN v5.2.1
Comprehensive Science Mapping Analysis

An R package for quantitative research in scientometrics and bibliometrics. It provides a comprehensive workflow for science mapping analysis and supports data import from major bibliographic databases, including Scopus, Web of Science, Dimensions, OpenAlex, PubMed, Cochrane Library, and Lens. As a member of the core developer team, I contribute to development, maintenance, and the Biblioshiny web interface.

Key Features

  • Import and convert data from 7+ bibliographic databases
  • Descriptive bibliometric analysis: author productivity, citation networks, h-index
  • Network matrices for co-citation, coupling, collaboration, and co-word analysis
  • Thematic maps, thematic evolution plots, and conceptual structure visualisations
  • Interactive web interface via Biblioshiny for non-coders
Aria, M. & Cuccurullo, C. (2017). bibliometrix: An R-tool for comprehensive science mapping analysis. Journal of Informetrics, 11(4), 959–975.
Research Projects
PRIN 2022 · MUR National Research
SciK-Health

Mapping Scientific Knowledge about Health for Decision-Making. 24-month PRIN project at the intersection of bibliometrics, knowledge synthesis, and health policy (Code: 2022825Y5E, PI: Prof. Massimo Aria). Junior Researcher.

2023 – 2026
PRIN 2022 PNRR · MUR National Research
AHS Centres

The value of scientific production for patient care in Academic Health Science Centres. A study on the relationship between research output and clinical outcomes across Italian public hospitals (PI: Prof. Corrado Cuccurullo). Junior Researcher.

2023 – 2026
07

Network & Affiliations

I work with researchers across several institutions in Italy and abroad, with shared interests in statistics, machine learning, and the methodological problems that sit between them.

Institutions & Research Groups
PhD & Research Home
The oldest public non-sectarian university in the world, and where most of my research takes place.
Department of Economics and Statistics
My home department, with a strong tradition in economic and statistical research.
Academic Spin-off · Member
Spin-off of Federico II University focused on knowledge synthesis through advanced quantitative analysis of large heterogeneous datasets.
Core Developer Team
Open-source R-tool for comprehensive science mapping analysis — used by thousands of researchers worldwide.
Visiting Scholar (2023–2024)
Department of Methodology and Statistics. Visiting period focused on psychometrics and statistical learning methods.
PhD Program
The PhD program I completed, with a strong international dimension in economics and finance.
Colleagues & Collaborators
Full Professor in Statistics · University of Naples Federico II
Full Professor in Statistics · University of Naples Federico II
Germana Scepi
Full Professor in Statistics · University of Naples Federico II
Roberta Siciliano
Full Professor in Statistics · University of Naples Federico II
Full Professor of Management · University of Campania Luigi Vanvitelli
Associate Professor · Leiden University
Associate Professor in Statistics · University of Naples Federico II
Giuseppe Pandolfo
Associate Professor in Statistics · University of Naples Federico II
Associate Professor in Statistics · University of Salerno
Assistant Professor of Statistics · University of Naples Federico II
Maria Spano
Assistant Professor of Statistics · University of Naples Federico II
Assistant Professor of Management · UniPegaso
08

Consulting

Statistical Modeling

Regression analysis, hypothesis testing, survey design, and advanced multivariate methods tailored to your research or business questions.

Machine Learning & Predictive Analytics

Development of classification and regression models using Random Forests, XGBoost, and ensemble methods with a focus on interpretability and robustness.

Explainable AI (XAI)

Making black-box models transparent through interpretable tree structures, SHAP values, LIME, and custom XAI solutions for high-stakes applications.

Bibliometrics & Science Mapping

Systematic literature reviews, citation analysis, research trend mapping, and science mapping using Bibliometrix and Biblioshiny.

R & Python Development

Custom scripts, data pipelines, dashboards (Shiny, Streamlit), and R/Python package development for reproducible data analysis workflows.

Data Visualisation & Reporting

Publication-quality charts, interactive visualisations, and comprehensive analytical reports to communicate your findings effectively.

How It Works
1
Initial Consultation

We discuss your project goals, data availability, and expected outcomes. This first meeting is free and without commitment.

2
Proposal & Planning

I prepare a tailored proposal outlining the methodology, timeline, and deliverables for your project.

3
Analysis & Development

I carry out the analytical work, keeping you updated with regular progress reports and intermediate results.

4
Delivery & Support

You receive the final output — reports, code, models — with documentation and follow-up support as needed.


Let's Work Together

If you have a project in mind, feel free to reach out.

Get in Touch

Let’s talk. Whether it’s for a project or just a virtual coffee, you can find me right here.

agostino.gnasso@unina.it

Office

Department of Economics and Statistics
University of Naples Federico II

Monte S. Angelo, Via Cinthia, 80126 Napoli, Italy
Room D-22 · Sector D · 2nd Floor · Building 3
