Software

I actively contribute to the development of open-source software for statistical analysis and machine learning, primarily in the R programming environment. Below are the main packages I work on.

e2tree

Explainable Ensemble Trees

The e2tree package implements the Explainable Ensemble Trees (E2Tree) methodology, a global surrogate model designed to approximate the prediction mechanism of tree-based ensembles (such as Random Forests and Boosting) through a single, interpretable tree structure.

Unlike traditional single decision trees (CART), which optimize a loss function directly on the data, E2Tree learns the relational structure established by the black-box model. It does so by extracting a co-occurrence matrix from the trained ensemble, which records how often each pair of observations falls in the same terminal node. Through hierarchical clustering of this matrix, E2Tree constructs a representative dendrogram that stays faithful to the complex non-linear patterns captured by the original model while remaining transparent.
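To make the co-occurrence idea concrete, here is a minimal base-R sketch. It is not the e2tree implementation: the toy `leaf_ids` matrix and the `co_occurrence()` helper are hypothetical, and the clustering step uses plain `hclust()` with average linkage as a stand-in. It covers only the frequency-based (classification-style) case, not the weighted regression variant.

```r
# Toy input (made up for illustration): rows are observations, columns are
# trees, and each entry is the ID of the terminal node the observation
# reaches in that tree.
leaf_ids <- matrix(
  c(1, 1, 2,
    1, 1, 2,
    1, 2, 2,
    2, 3, 1,
    2, 3, 1),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("obs", 1:5), paste0("tree", 1:3))
)

# Hypothetical helper: share of trees in which each pair of observations
# lands in the same terminal node.
co_occurrence <- function(leaf_ids) {
  n <- nrow(leaf_ids)
  co <- matrix(0, n, n,
               dimnames = list(rownames(leaf_ids), rownames(leaf_ids)))
  for (t in seq_len(ncol(leaf_ids))) {
    # outer() flags the pairs that share a leaf in tree t
    co <- co + outer(leaf_ids[, t], leaf_ids[, t], FUN = "==")
  }
  co / ncol(leaf_ids)
}

O <- co_occurrence(leaf_ids)

# Turn co-occurrence into a dissimilarity and cluster it hierarchically
# (average linkage is an assumption of this sketch).
hc <- hclust(as.dist(1 - O), method = "average")
groups <- cutree(hc, k = 2)
```

In this toy example, obs1 and obs3 share a leaf in two of the three trees, so their co-occurrence is 2/3, while obs1 and obs4 never share a leaf. Cutting the dendrogram at two groups separates obs1 to obs3 from obs4 and obs5.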

Key features:

Global Interpretability: Converts complex Ensemble Models into a single explanatory tree.
Structure Preservation: Uses hierarchical clustering on connectivity matrices to capture the “proximity” learned by the ensemble.
Dual Mode: Supports Classification (via frequency-based connectivity) and Regression (via weighted connectivity based on leaf predictions).
Visual Insights: Generates dendrogram-like visualizations and extracts clear decision rules (prototypes) for each cluster.
Integration: Outputs are compatible with rpart objects for seamless analysis.

Installation:

# Stable version from CRAN
install.packages("e2tree")

# Development version from GitHub
devtools::install_github("agostinognasso/e2tree")

References:

Aria, M., Gnasso, A., Iorio, C., & Pandolfo, G. (2024). “Explainable ensemble trees”. Computational Statistics, 39(1), 3-19.
Aria, M., Gnasso, A., Iorio, C., & Fokkema, M. (2025). “Extending Explainable Ensemble Trees to Regression Contexts”. Applied Stochastic Models in Business and Industry, 42(1), e70064.


bibliometrix

Comprehensive Science Mapping Analysis

bibliometrix is an R package for quantitative research in scientometrics and bibliometrics. It provides a comprehensive workflow for science mapping analysis, supporting data import from major bibliographic databases including Scopus, Web of Science, Dimensions, OpenAlex, PubMed, Cochrane Library, and Lens.

As a member of the core development team, I contribute to the development, maintenance, and evolution of the package and its web interface, Biblioshiny.

Key features:

Import and convert data from 7+ bibliographic databases
Perform descriptive bibliometric analysis (author productivity, citation analysis, h-index)
Build network matrices for co-citation, coupling, collaboration, and co-word analysis
Generate thematic maps, thematic evolution plots, and conceptual structure visualizations
Interactive web interface via Biblioshiny for non-coders
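The features above map onto a short workflow, sketched here under assumptions: the file name `savedrecs.bib` is a hypothetical Web of Science export in BibTeX format, and the code is guarded so that it does nothing when the package or the file is missing. The `sep = ";"` and `k = 10` values are illustrative choices rather than recommendations.

```r
# Hypothetical path to a Web of Science export in BibTeX format.
export_file <- "savedrecs.bib"

if (requireNamespace("bibliometrix", quietly = TRUE) && file.exists(export_file)) {
  library(bibliometrix)

  # Import the export and convert it into a bibliographic data frame.
  M <- convert2df(file = export_file, dbsource = "wos", format = "bibtex")

  # Descriptive bibliometric analysis; print the top 10 entries per table.
  results <- biblioAnalysis(M, sep = ";")
  summary(results, k = 10, pause = FALSE)

  # Co-citation network matrix built from the cited references.
  net <- biblioNetwork(M, analysis = "co-citation",
                       network = "references", sep = ";")
}

# The Shiny interface is launched interactively with:
# biblioshiny()
```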

Reference:

Aria, M. & Cuccurullo, C. (2017). “bibliometrix: An R-tool for comprehensive science mapping analysis”. Journal of Informetrics, 11(4), 959-975.

