Software

I actively contribute to the development of open-source software for statistical analysis and machine learning, primarily in the R programming environment. Below are the main packages I work on.


e2tree

Explainable Ensemble Trees

The e2tree package implements the Explainable Ensemble Trees (E2Tree) methodology, a global surrogate model designed to approximate the prediction mechanism of tree-based ensembles (such as Random Forests and Boosting) through a single, interpretable tree structure.

Unlike traditional single decision trees (CART), which optimize a loss function directly on the data, E2Tree learns the relational structure established by the black-box model. It does this by extracting a co-occurrence matrix from the trained ensemble, which records how often each pair of observations falls in the same terminal node. Through hierarchical clustering of this matrix, E2Tree constructs a representative dendrogram that stays faithful to the complex non-linear patterns captured by the original model while remaining transparent.
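The co-occurrence idea can be illustrated with a minimal base-R sketch. This is only an illustration of the concept described above, not the e2tree implementation: the leaf-membership matrix `leaf_ids` is hypothetical toy data, and the choice of average linkage and of two clusters is an assumption made for the example.

```r
# Toy leaf-membership matrix (hypothetical values):
# rows = observations, columns = trees in the ensemble.
# Entry [i, t] is the id of the terminal node that observation i reaches in tree t.
leaf_ids <- matrix(
  c(1, 1, 2,
    1, 1, 2,
    1, 2, 2,
    2, 3, 1,
    2, 3, 1),
  ncol = 3, byrow = TRUE,
  dimnames = list(paste0("obs", 1:5), paste0("tree", 1:3))
)

n_obs   <- nrow(leaf_ids)
n_trees <- ncol(leaf_ids)

# Co-occurrence matrix: share of trees in which two observations
# end up in the same terminal node.
co_occurrence <- matrix(
  0, n_obs, n_obs,
  dimnames = list(rownames(leaf_ids), rownames(leaf_ids))
)
for (t in seq_len(n_trees)) {
  co_occurrence <- co_occurrence + outer(leaf_ids[, t], leaf_ids[, t], "==")
}
co_occurrence <- co_occurrence / n_trees

# Convert similarity to dissimilarity and cluster hierarchically.
# Average linkage and k = 2 are assumptions made for this toy example.
dissimilarity <- as.dist(1 - co_occurrence)
dendro <- hclust(dissimilarity, method = "average")
groups <- cutree(dendro, k = 2)

print(round(co_occurrence, 2))
print(groups)
```

With a real forest, the leaf ids would come from the fitted ensemble rather than being typed in by hand, and the package takes care of that step.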

Key features:

  • Global Interpretability: Converts a complex ensemble model into a single explanatory tree.
  • Structure Preservation: Uses hierarchical clustering on connectivity matrices to capture the “proximity” learned by the ensemble.
  • Dual Mode: Supports Classification (via frequency-based connectivity) and Regression (via weighted connectivity based on leaf predictions).
  • Visual Insights: Generates dendrogram-like visualizations and extracts clear decision rules (prototypes) for each cluster.
  • Integration: Outputs are compatible with rpart objects for seamless analysis.

Installation:

# Stable version from CRAN
install.packages("e2tree")

# Development version from GitHub
devtools::install_github("agostinognasso/e2tree")

References:

  • Aria, M., Gnasso, A., Iorio, C., & Pandolfo, G. (2024). “Explainable ensemble trees”. Computational Statistics, 39(1), 3-19.
  • Aria, M., Gnasso, A., Iorio, C., & Fokkema, M. (2025). “Extending Explainable Ensemble Trees to Regression Contexts”. Applied Stochastic Models in Business and Industry, 42(1), e70064.


bibliometrix

Comprehensive Science Mapping Analysis

bibliometrix is an R package for quantitative research in scientometrics and bibliometrics. It provides a comprehensive workflow for science mapping analysis, supporting data import from major bibliographic databases including Scopus, Web of Science, Dimensions, OpenAlex, PubMed, Cochrane Library, and Lens.

As a member of the core development team, I contribute to the development, maintenance, and evolution of the package and its web interface, Biblioshiny.

Key features:

  • Import and convert data from 7+ bibliographic databases
  • Perform descriptive bibliometric analysis (author productivity, citation analysis, h-index)
  • Build network matrices for co-citation, coupling, collaboration, and co-word analysis
  • Generate thematic maps, thematic evolution plots, and conceptual structure visualizations
  • Offer an interactive web interface, Biblioshiny, for non-coders
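To make two of the ideas in the list above concrete, here is a minimal base-R sketch of how coupling and co-citation counts can be derived from a document-by-reference incidence matrix, together with a simple h-index calculation. It deliberately does not call bibliometrix, and it is not a description of the package's internals: the matrix `A`, the citation counts, and the helper `h_index()` are hypothetical and exist only for this example.

```r
# Toy document-by-reference incidence matrix (hypothetical data):
# A[i, j] = 1 if document i cites reference j, 0 otherwise.
A <- matrix(
  c(1, 1, 0, 0,
    1, 1, 1, 0,
    0, 0, 1, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("doc", 1:3), paste0("ref", 1:4))
)

# Bibliographic coupling: number of references shared by two documents.
coupling <- A %*% t(A)

# Co-citation: number of documents that cite two references together.
cocitation <- t(A) %*% A

# h-index: the largest h such that h papers have at least h citations each.
# (Illustrative helper, not a bibliometrix function.)
h_index <- function(citations) {
  sorted <- sort(citations, decreasing = TRUE)
  sum(sorted >= seq_along(sorted))
}

print(coupling)
print(cocitation)
print(h_index(c(10, 8, 5, 4, 3, 0)))
```

In both network matrices the diagonal carries marginal counts (references per document, or citations per reference), so it is usually ignored or zeroed before plotting.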

Reference:

  • Aria, M. & Cuccurullo, C. (2017). “bibliometrix: An R-tool for comprehensive science mapping analysis”. Journal of Informetrics, 11(4), 959-975.
