Software
I actively contribute to the development of open-source software for statistical analysis and machine learning, primarily in the R programming environment. Below are the main packages I work on.
e2tree 
Explainable Ensemble Trees
The e2tree package implements the Explainable Ensemble Trees (E2Tree) methodology, a global surrogate model designed to approximate the prediction mechanism of tree-based ensembles (such as Random Forests and Boosting) through a single, interpretable tree structure.
Unlike a traditional single decision tree (CART), which optimizes a loss function directly on the data, E2Tree learns the relational structure established by the black-box model. It does this by extracting a co-occurrence matrix from the trained ensemble, which records how frequently each pair of observations falls into the same terminal node. Through hierarchical clustering of this matrix, E2Tree constructs a representative dendrogram that stays faithful to the complex non-linear patterns captured by the original model while remaining transparent.
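The co-occurrence idea described above can be sketched in base R. This is only an illustration under stated assumptions, not the e2tree implementation: the `nodes` matrix of terminal-node IDs is made up, the helper `co_occurrence()` is hypothetical, and plain average-linkage `hclust()` stands in for whatever procedure the package actually uses.

```r
# Hypothetical terminal-node IDs: rows = observations, columns = trees.
# In practice these would be read off a fitted ensemble; here they are invented.
nodes <- matrix(
  c(1, 1, 2,
    1, 1, 2,
    1, 2, 2,
    3, 2, 5,
    3, 4, 5),
  ncol = 3, byrow = TRUE,
  dimnames = list(paste0("obs", 1:5), paste0("tree", 1:3))
)

# Share of trees in which two observations land in the same terminal node.
# (Hypothetical helper, written for this sketch only.)
co_occurrence <- function(nodes) {
  n <- nrow(nodes)
  out <- matrix(0, n, n, dimnames = list(rownames(nodes), rownames(nodes)))
  for (b in seq_len(ncol(nodes))) {
    # outer(..., "==") marks the pairs that share a leaf in tree b
    out <- out + outer(nodes[, b], nodes[, b], "==")
  }
  out / ncol(nodes)
}

O <- co_occurrence(nodes)

# Convert similarity to dissimilarity and cluster hierarchically.
hc <- hclust(as.dist(1 - O), method = "average")
groups <- cutree(hc, k = 2)
```

Calling `plot(hc)` draws the resulting dendrogram. Observations that the ensemble routinely places in the same leaves merge early, which is the sense in which the clustering reflects the structure the ensemble has learned.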
Key features:
- Global Interpretability: Converts a complex ensemble model into a single explanatory tree.
- Structure Preservation: Uses hierarchical clustering on connectivity matrices to capture the “proximity” learned by the ensemble.
- Dual Mode: Supports Classification (via frequency-based connectivity) and Regression (via weighted connectivity based on leaf predictions).
- Visual Insights: Generates dendrogram-like visualizations and extracts clear decision rules (prototypes) for each cluster.
- Integration: Outputs are compatible with rpart objects for seamless analysis.
Installation:
# Stable version from CRAN
install.packages("e2tree")
# Development version from GitHub
devtools::install_github("agostinognasso/e2tree")
References:
- Aria, M., Gnasso, A., Iorio, C., & Pandolfo, G. (2024). “Explainable ensemble trees”. Computational Statistics, 39(1), 3-19. DOI
- Aria, M., Gnasso, A., Iorio, C., & Fokkema, M. (2025). “Extending Explainable Ensemble Trees to Regression Contexts”. Applied Stochastic Models in Business and Industry, 42(1), e70064. DOI
bibliometrix 
Comprehensive Science Mapping Analysis
bibliometrix is an R package for quantitative research in scientometrics and bibliometrics. It provides a comprehensive workflow for science mapping analysis, supporting data import from major bibliographic databases including Scopus, Web of Science, Dimensions, OpenAlex, PubMed, Cochrane Library, and Lens.
As a member of the core development team, I contribute to the development, maintenance, and evolution of the package and its web interface, Biblioshiny.
Key features:
- Import and convert data from 7+ bibliographic databases
- Perform descriptive bibliometric analysis (author productivity, citation analysis, h-index)
- Build network matrices for co-citation, coupling, collaboration, and co-word analysis
- Generate thematic maps, thematic evolution plots, and conceptual structure visualizations
- Interactive web interface via Biblioshiny for non-coders
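As a small illustration of one of the descriptive indicators listed above, the h-index can be computed from a vector of citation counts in a few lines of base R. This is a sketch of the standard definition (the largest h such that h papers have at least h citations each), not bibliometrix's own code; the function name `h_index()` and the citation counts are made up for the example.

```r
# h-index: largest h such that h papers have at least h citations each.
# (Hypothetical helper, written for this sketch only.)
h_index <- function(citations) {
  sorted <- sort(citations, decreasing = TRUE)
  # After sorting, the test "citations >= rank" is TRUE for the first h
  # papers and FALSE afterwards, so counting the TRUEs gives h.
  sum(sorted >= seq_along(sorted))
}

# Invented citation counts for six papers by one author.
h <- h_index(c(25, 8, 5, 3, 3, 0))
```

Here three papers have at least three citations each, but there are not four papers with at least four, so `h` is 3.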
Reference:
- Aria, M. & Cuccurullo, C. (2017). “bibliometrix: An R-tool for comprehensive science mapping analysis”. Journal of Informetrics, 11(4), 959-975.