| Title: | Explainable Ensemble Trees |
|---|---|
| Description: | The Explainable Ensemble Trees 'e2tree' approach has been proposed by Aria et al. (2024) <doi:10.1007/s00180-022-01312-6>. It aims to explain and interpret decision tree ensemble models using a single tree-like structure. 'e2tree' is a new way of explaining an ensemble tree trained through 'randomForest' or 'xgboost' packages. |
| Authors: | Massimo Aria [aut, cre, cph] (ORCID: <https://orcid.org/0000-0002-8517-9411>), Agostino Gnasso [aut, cph] (ORCID: <https://orcid.org/0000-0002-8046-3923>) |
| Maintainer: | Massimo Aria <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.2.0.9000 |
| Built: | 2026-05-16 11:06:02 UTC |
| Source: | https://github.com/massimoaria/e2tree |
Coerces an e2tree object into an rpart object, which can
then be used with standard rpart methods for printing, plotting
(e.g., via rpart.plot), and prediction.
    as.rpart(x, ...)

    ## S3 method for class 'e2tree'
    as.rpart(x, ensemble, ...)
x |
An e2tree object. |
... |
Additional arguments (ignored). |
ensemble |
The ensemble model used to build the E2Tree. Supported classes:
|
An rpart object.
See also as.party.e2tree for conversion to partykit format.
    data(iris)
    smp_size <- floor(0.75 * nrow(iris))
    train_ind <- sample(seq_len(nrow(iris)), size = smp_size)
    training <- iris[train_ind, ]

    ensemble <- randomForest::randomForest(Species ~ ., data = training,
                                           importance = TRUE, proximity = TRUE)

    D <- createDisMatrix(ensemble, data = training, label = "Species",
                         parallel = list(active = FALSE, no_cores = 1))

    setting <- list(impTotal = 0.1, maxDec = 0.01, n = 2, level = 5)
    tree <- e2tree(Species ~ ., training, D, ensemble, setting)

    rpart_obj <- as.rpart(tree, ensemble)
The function createDisMatrix creates a dissimilarity matrix among observations from an ensemble tree. This optimized version is designed for large datasets (50K-500K observations) with improved memory management and chunking capabilities.
    createDisMatrix(
      ensemble,
      data,
      label,
      parallel = list(active = FALSE, no_cores = 1),
      verbose = FALSE,
      chunk_size = NULL,
      memory_limit = NULL,
      use_disk = FALSE,
      temp_dir = tempdir(),
      batch_aggregate = 10
    )
ensemble |
is an ensemble tree object |
data |
is a data frame containing the variables in the model. It is the data frame used for ensemble learning. |
label |
is a character. It indicates the response label. |
parallel |
A list with two elements: |
verbose |
Logical. If TRUE, the function prints progress messages and other information during execution. If FALSE (the default), messages are suppressed. |
chunk_size |
Integer. Number of rows to process in each chunk. If NULL, automatically determined based on available memory and dataset size. Default: NULL (auto). |
memory_limit |
Numeric. Maximum memory to use in GB. Default: NULL (no limit). |
use_disk |
Logical. If TRUE and dataset is very large, intermediate results are saved to disk. Default: FALSE. |
temp_dir |
Character. Directory for temporary files if use_disk = TRUE. Default: tempdir(). |
batch_aggregate |
Integer. Number of tree results to aggregate at once before adding to main matrix (reduces memory peaks). Default: 10. |
This optimized version implements several strategies for handling large datasets:
Memory-efficient aggregation: Results from parallel trees are aggregated in batches to avoid memory peaks
Chunking: For very large matrices, computation can be split into manageable chunks
Sparse matrix optimization: Maintains sparsity throughout computation
Automatic garbage collection: Explicit memory cleanup at critical points
Disk-based computation: Optional saving of intermediate results for datasets exceeding memory capacity
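The batch-aggregation strategy above can be illustrated with a minimal base-R sketch. It assumes a plain, unweighted leaf co-occurrence count and a toy terminal-node matrix; the names `cooccurrence`, `aggregate_in_batches`, and `batch_size` are hypothetical, and the computation inside createDisMatrix may differ (for example, it may weight co-occurrences).

```r
# Toy terminal-node matrix: 6 observations (rows) x 4 trees (columns).
# Values are hypothetical leaf identifiers, not output from a real ensemble.
nodes <- matrix(
  c(1, 1, 2, 2, 3, 3,
    1, 2, 2, 3, 3, 3,
    1, 1, 1, 2, 2, 2,
    1, 2, 1, 2, 1, 2),
  nrow = 6
)

# Co-occurrence counts for a set of trees: entry (i, j) is the number of
# trees in which observations i and j fall in the same leaf.
cooccurrence <- function(node_mat) {
  n <- nrow(node_mat)
  total <- matrix(0, n, n)
  for (t in seq_len(ncol(node_mat))) {
    total <- total + outer(node_mat[, t], node_mat[, t], "==")
  }
  total
}

# Accumulate the counts batch by batch, so that only one batch of per-tree
# results is processed before being added to the main matrix.
aggregate_in_batches <- function(node_mat, batch_size = 2) {
  n <- nrow(node_mat)
  total <- matrix(0, n, n)
  tree_ids <- seq_len(ncol(node_mat))
  batches <- split(tree_ids, ceiling(tree_ids / batch_size))
  for (idx in batches) {
    total <- total + cooccurrence(node_mat[, idx, drop = FALSE])
  }
  total
}

prox_batched <- aggregate_in_batches(nodes, batch_size = 2)
prox_full <- cooccurrence(nodes)

# Unweighted dissimilarity: 1 minus the share of trees with a shared leaf.
D_toy <- 1 - prox_batched / ncol(nodes)
```

Batching changes only the order in which per-tree results are summed, so the batched and unbatched matrices agree; the benefit is a lower memory peak when many trees are processed in parallel.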
Supported ensemble types for classification or regression tasks:
randomForest
ranger
xgb.Booster (xgboost)
lgb.Booster (lightgbm)
gbm (gbm)
catboost.CatBoost (catboost)
A dissimilarity matrix measuring the discordance between pairs of observations with respect to the given ensemble model.
For bagging ensembles (randomForest, ranger) the trees are
grown independently on bootstrap samples; co-occurrence in the same leaf
captures local similarity in the predictor space. For boosting ensembles
(xgb.Booster, lgb.Booster, gbm, catboost)
each tree is fit to the residual of the previous ones, so leaf
co-occurrence reflects similarity in the error-correction trajectory
rather than in the final prediction space. The resulting dissimilarity
matrices therefore have systematically different scales for bagging and
for boosting ensembles. The surrogate tree built on top of D should be
interpreted accordingly.
The returned matrix carries an ensemble_backend attribute identifying
the backend used, which downstream functions check to detect mismatched
(D, ensemble) pairs.
    data("iris")

    # Create training and validation set:
    smp_size <- floor(0.75 * nrow(iris))
    train_ind <- sample(seq_len(nrow(iris)), size = smp_size)
    training <- iris[train_ind, ]
    validation <- iris[-train_ind, ]
    response_training <- training[, 5]
    response_validation <- validation[, 5]

    # Perform training:
    ## "randomForest" package
    ensemble <- randomForest::randomForest(Species ~ ., data = training,
                                           importance = TRUE, proximity = TRUE)

    ## "ranger" package (fitted on the same training set passed to createDisMatrix)
    if (requireNamespace("ranger", quietly = TRUE)) {
      ensemble <- ranger::ranger(Species ~ ., data = training,
                                 num.trees = 1000, importance = "impurity")
    }

    # Compute dissimilarity matrix with optimizations
    D <- createDisMatrix(
      ensemble,
      data = training,
      label = "Species",
      parallel = list(active = FALSE, no_cores = 1),
      chunk_size = 10000,    # Process 10K rows at a time
      batch_aggregate = 20,  # Aggregate 20 trees at once
      verbose = TRUE
    )
A dataset containing socio-economic and banking information for 468 bank clients, used to assess creditworthiness. All variables are categorical.
    credit
A data frame with 468 rows and 12 columns:
Credit evaluation outcome: "Creditworthy" or
"Non-Creditworthy".
Age class of the client (e.g., "less than 23 years",
"from 23 to 35 years", "from 35 to 50 years",
"over 50 years").
Marital/family status of the client (e.g.,
"single", "married", "divorced").
Length of the client's relationship with the bank
(e.g., "1 year or less", "from 2 to 5 years",
"plus 12 years").
Whether the client's salary is
credited to the bank account (e.g., "domicile salary",
"no domicile salary").
Client's level of savings (e.g.,
"no savings", "less than 5 thousand",
"from 5 to 30 thousand", "more than 30 thousand").
Employment category of the client (e.g.,
"employee", "self-employed", "retired").
Average balance held in the account (e.g.,
"from 2 to 5 thousand", "more than 5 thousand").
Average monthly turnover on the account
(e.g., "Less than 10 thousand", "from 10 to 50 thousand",
"more than 50 thousand").
Number of credit card
transactions per month (e.g., "less than 40", "from 40 to 100",
"more than 100").
Whether the client has an authorized
overdraft facility ("Authorised" or "forbidden").
Whether the client is authorized
to issue bank checks ("Authorised" or "forbidden").
Returns the split matrix and categorical split encoding from a fitted E2Tree model.
    e2splits(x, ...)

    ## S3 method for class 'e2tree'
    e2splits(x, ...)
x |
An e2tree object. |
... |
Additional arguments (ignored). |
A list with components:
The split information matrix.
The categorical split encoding matrix.
Creates an explainable tree for a random forest. Explainable Ensemble Trees (E2Tree) aims to generate a "new tree" that explains and represents the relational structure between the response variable and the predictors. This leads to a tree structure similar to that of a single decision tree, while exploiting the advantages of a dendrogram-like output.
    e2tree(
      formula,
      data,
      D,
      ensemble,
      setting = list(impTotal = 0.1, maxDec = 0.01, n = 2, level = 5)
    )
formula |
is a formula describing the model to be fitted, with a response but no interaction terms. |
data |
a data frame containing the variables in the model. It is a data frame in which to interpret the variables named in the formula. |
D |
is the dissimilarity matrix. This is a dissimilarity matrix measuring the discordance between two observations concerning a given classifier of a random forest model. The dissimilarity matrix is obtained with the createDisMatrix function. |
ensemble |
is an ensemble tree object (for the moment ensemble works only with random forest objects) |
setting |
is a list containing the set of stopping rules for the tree building procedure.
Default is list(impTotal = 0.1, maxDec = 0.01, n = 2, level = 5). |
An e2tree object, which is a list with the following components:
tree
|
A data frame representing the main structure of the tree aimed at explaining and graphically representing the relationships and interactions between the variables used to perform an ensemble method. | |
call
|
The matched call | |
terms
|
A list of terms and attributes | |
control
|
A list containing the set of stopping rules for the tree building procedure | |
varimp
|
A list containing a table and a plot for the variable importance. Variable importance refers to a quantitative measure that assesses the contribution of individual variables within a predictive model towards accurate predictions. It quantifies the influence or impact that each variable has on the model's overall performance. Variable importance provides insights into the relative significance of different variables in explaining the observed outcomes and aids in understanding the underlying relationships and dynamics within the model |
    ## Classification:
    data(iris)

    # Create training and validation set:
    smp_size <- floor(0.75 * nrow(iris))
    train_ind <- sample(seq_len(nrow(iris)), size = smp_size)
    training <- iris[train_ind, ]
    validation <- iris[-train_ind, ]
    response_training <- training[, 5]
    response_validation <- validation[, 5]

    # Perform training:
    ## "randomForest" package
    ensemble <- randomForest::randomForest(Species ~ ., data = training,
                                           importance = TRUE, proximity = TRUE)

    ## "ranger" package (fitted on the same training set passed to createDisMatrix)
    if (requireNamespace("ranger", quietly = TRUE)) {
      ensemble <- ranger::ranger(Species ~ ., data = training,
                                 num.trees = 1000, importance = "impurity")
    }

    D <- createDisMatrix(ensemble, data = training, label = "Species",
                         parallel = list(active = FALSE, no_cores = 1))

    setting <- list(impTotal = 0.1, maxDec = 0.01, n = 2, level = 5)
    tree <- e2tree(Species ~ ., training, D, ensemble, setting)

    ## Regression
    data("mtcars")

    # Create training and validation set:
    smp_size <- floor(0.75 * nrow(mtcars))
    train_ind <- sample(seq_len(nrow(mtcars)), size = smp_size)
    training <- mtcars[train_ind, ]
    validation <- mtcars[-train_ind, ]
    response_training <- training[, 1]
    response_validation <- validation[, 1]

    # Perform training
    ## "randomForest" package
    ensemble <- randomForest::randomForest(mpg ~ ., data = training, ntree = 1000,
                                           importance = TRUE, proximity = TRUE)

    ## "ranger" package
    if (requireNamespace("ranger", quietly = TRUE)) {
      ensemble <- ranger::ranger(formula = mpg ~ ., data = training,
                                 num.trees = 1000, importance = "permutation")
    }

    D <- createDisMatrix(ensemble, data = training, label = "mpg",
                         parallel = list(active = FALSE, no_cores = 1))

    setting <- list(impTotal = 0.1, maxDec = 1e-6, n = 2, level = 5)
    tree <- e2tree(mpg ~ ., training, D, ensemble, setting)
Predicts classification and regression tree responses.
    ePredTree(fit, data, target = "1")
fit |
An e2tree object. |
data |
A data frame with new observations. |
target |
Target class for classification scoring. |
Deprecated: Use predict.e2tree instead.
A data frame with predictions.
    ## Classification:
    data(iris)

    # Create training and validation set:
    smp_size <- floor(0.75 * nrow(iris))
    train_ind <- sample(seq_len(nrow(iris)), size = smp_size)
    training <- iris[train_ind, ]
    validation <- iris[-train_ind, ]
    response_training <- training[, 5]
    response_validation <- validation[, 5]

    # Perform training:
    ensemble <- randomForest::randomForest(Species ~ ., data = training,
                                           importance = TRUE, proximity = TRUE)

    D <- createDisMatrix(ensemble, data = training, label = "Species",
                         parallel = list(active = FALSE, no_cores = 1))

    setting <- list(impTotal = 0.1, maxDec = 0.01, n = 2, level = 5)
    tree <- e2tree(Species ~ ., training, D, ensemble, setting)

    ## Preferred method:
    predict(tree, newdata = validation, target = "1")

    ## Legacy function (deprecated):
    ePredTree(tree, validation, target = "1")

    ## Regression
    data("mtcars")

    # Create training and validation set:
    smp_size <- floor(0.75 * nrow(mtcars))
    train_ind <- sample(seq_len(nrow(mtcars)), size = smp_size)
    training <- mtcars[train_ind, ]
    validation <- mtcars[-train_ind, ]
    response_training <- training[, 1]
    response_validation <- validation[, 1]

    # Perform training
    ensemble <- randomForest::randomForest(mpg ~ ., data = training, ntree = 1000,
                                           importance = TRUE, proximity = TRUE)

    D <- createDisMatrix(ensemble, data = training, label = "mpg",
                         parallel = list(active = FALSE, no_cores = 1))

    setting <- list(impTotal = 0.1, maxDec = 1e-6, n = 2, level = 5)
    tree <- e2tree(mpg ~ ., training, D, ensemble, setting)

    ## Preferred method:
    predict(tree, newdata = validation)

    ## Legacy function (deprecated):
    ePredTree(tree, validation)
Compares the ensemble proximity matrix with the E2Tree-estimated proximity matrix using multiple divergence and similarity measures. Can perform the Mantel test, permutation tests on divergence/similarity measures (nLoI, Hellinger, wRMSE, RV, SSIM), or both.
    eValidation(
      data,
      fit,
      D,
      test = c("both", "mantel", "measures"),
      graph = TRUE,
      n_perm = 999,
      conf.level = 0.95,
      seed = NULL
    )
data |
A data frame containing the variables in the model. |
fit |
An e2tree object. |
D |
The dissimilarity matrix obtained with createDisMatrix. |
test |
Character string specifying which tests to perform. One of "both" (the default), "mantel", or "measures". |
graph |
Logical (default TRUE). If TRUE, heatmaps are displayed. |
n_perm |
Integer. Number of permutations for the permutation
test on measures. Default is 999. Set to 0 to skip permutation testing.
Ignored when test = "mantel". |
conf.level |
Numeric. Confidence level for intervals. Default is 0.95. |
seed |
Integer or NULL. Random seed for reproducibility. |
An object of class "eValidation" containing:
Ensemble proximity matrix (reordered)
E2Tree proximity matrix (reordered)
Mantel test result (NULL if test = "measures")
LoI object with decomposition (NULL if test = "mantel")
Data frame with all measures (NULL if test = "mantel")
Permutation test results for measures (if applicable)
    ## Classification:
    data(iris)
    smp_size <- floor(0.75 * nrow(iris))
    train_ind <- sample(seq_len(nrow(iris)), size = smp_size)
    training <- iris[train_ind, ]

    ensemble <- randomForest::randomForest(Species ~ ., data = training,
                                           importance = TRUE, proximity = TRUE)

    D <- createDisMatrix(ensemble, data = training, label = "Species",
                         parallel = list(active = FALSE, no_cores = 1))

    setting <- list(impTotal = 0.1, maxDec = 0.01, n = 2, level = 5)
    tree <- e2tree(Species ~ ., training, D, ensemble, setting)

    val <- eValidation(training, tree, D, n_perm = 199)
    print(val)
    summary(val)
    plot(val)
Returns a data.frame with n_obs rows and n_trees
columns where each cell is the terminal-node index assigned to that
observation by that tree.
    extract_terminal_nodes(ensemble, data)
ensemble |
A trained ensemble model. |
data |
A |
A data.frame with n_obs rows and n_trees columns
of integer terminal-node identifiers.
Returns the fitted values (predictions) for the training data used to build the E2Tree model.
    ## S3 method for class 'e2tree'
    fitted(object, ...)
object |
An e2tree object. |
... |
Additional arguments (ignored). |
A vector of fitted values.
    data(iris)
    smp_size <- floor(0.75 * nrow(iris))
    train_ind <- sample(seq_len(nrow(iris)), size = smp_size)
    training <- iris[train_ind, ]

    ensemble <- randomForest::randomForest(Species ~ ., data = training,
                                           importance = TRUE, proximity = TRUE)

    D <- createDisMatrix(ensemble, data = training, label = "Species",
                         parallel = list(active = FALSE, no_cores = 1))

    setting <- list(impTotal = 0.1, maxDec = 0.01, n = 2, level = 5)
    tree <- e2tree(Species ~ ., training, D, ensemble, setting)

    fitted(tree)
Returns a numeric vector of length n_obs with the ensemble's
prediction for every training observation. For models that store
out-of-bag (OOB) predictions (randomForest, ranger) the
stored OOB vector is returned; for other models in-sample predictions
are computed from the training data.
    get_ensemble_predictions(ensemble, data, type)
ensemble |
A trained ensemble model. |
data |
The training |
type |
Character: |
Numeric vector of length nrow(data).
Returns "classification" or "regression" depending on the
objective used to train the ensemble.
    get_ensemble_type(ensemble)
ensemble |
A trained ensemble model. Supported classes:
|
Character scalar: "classification" or "regression".
Computes the LoI index and its decomposition, measuring how well the E2Tree-estimated proximity matrix reconstructs the original ensemble proximity matrix.
    loi(O, O_hat, normalize = TRUE)
O |
Proximity matrix from the ensemble model (n x n), values in the interval 0 to 1 |
O_hat |
Proximity matrix estimated by E2Tree (n x n), values in the interval 0 to 1 |
normalize |
Logical. If TRUE (default), returns nLoI (divided by M). If FALSE, returns raw LoI. |
The statistic is defined as:
The normalized LoI (nLoI) divides LoI by the number of unique pairs, M = n(n - 1)/2, so that nLoI = LoI / M.
The LoI decomposes into two components:
LoI_in: within-node loss (pairs grouped together by E2Tree)
LoI_out: between-node loss (pairs separated by E2Tree)
The per-pair averages mean_in and mean_out enable direct
comparison between the two components despite their different pair counts.
The statistic uses a normalized squared difference, where each cell's contribution is weighted by the maximum of the two proximity values. This gives more weight to discrepancies in high-proximity regions.
Decomposition interpretation (per-pair averages):
mean_out: average ensemble proximity lost by the partition.
Low values (< 0.1) indicate the tree correctly separates low-proximity
pairs. High values (> 0.3) suggest the tree splits apart pairs that
the ensemble considers similar; more terminal nodes may help.
mean_in: average calibration error within nodes. Low values
(< 0.01) indicate excellent within-node reconstruction. Higher values
reflect the inherent fuzzy-to-crisp transition.
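The mechanics of the within-node and between-node split can be sketched in base R. This is an illustration under stated assumptions: `node_id` is a hypothetical terminal-node assignment, and `loss` is a random stand-in for the per-pair contributions, not the LoI formula used by the package.

```r
# Hypothetical crisp partition of 5 observations into two terminal nodes.
node_id <- c(1, 1, 2, 2, 2)

# Stand-in symmetric per-pair loss matrix; NOT the package's LoI formula.
set.seed(1)
loss <- matrix(runif(25), 5, 5)
loss <- (loss + t(loss)) / 2

# Work on unique pairs only (upper triangle, i < j).
upper <- upper.tri(loss)
same_node <- outer(node_id, node_id, "==")

within  <- upper & same_node    # pairs grouped together by the partition
between <- upper & !same_node   # pairs separated by the partition

n_within  <- sum(within)
n_between <- sum(between)
loss_in   <- sum(loss[within])
loss_out  <- sum(loss[between])

# Per-pair averages make the two components comparable even though
# they are computed over different numbers of pairs.
mean_in  <- loss_in / n_within
mean_out <- loss_out / n_between
```

Every unique pair falls in exactly one of the two groups, so the two totals add up to the overall loss and the two pair counts add up to n(n - 1)/2.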
An object of class "loi" containing:
loi |
Raw LoI value (unnormalized) |
nloi |
Normalized LoI (LoI / M) |
loi_in |
Within-node component (total) |
loi_out |
Between-node component (total) |
mean_in |
Per-pair average within-node loss (comparable with mean_out) |
mean_out |
Per-pair average between-node loss (comparable with mean_in) |
n |
Matrix dimension |
m |
Number of unique pairs |
n_within |
Number of within-node pairs |
n_between |
Number of between-node pairs |
    data(iris)
    smp_size <- floor(0.75 * nrow(iris))
    set.seed(42)
    train_ind <- sample(seq_len(nrow(iris)), size = smp_size)
    training <- iris[train_ind, ]

    ensemble <- randomForest::randomForest(Species ~ ., data = training,
                                           importance = TRUE, proximity = TRUE)

    D <- createDisMatrix(ensemble, data = training, label = "Species",
                         parallel = list(active = FALSE, no_cores = 1))

    setting <- list(impTotal = 0.1, maxDec = 0.01, n = 2, level = 5)
    tree <- e2tree(Species ~ ., training, D, ensemble, setting)

    vs <- eValidation(training, tree, D)
    prox <- proximity(vs)
    O <- prox$ensemble
    O_hat <- prox$e2tree

    # Compute LoI with decomposition
    result <- loi(O, O_hat)
    print(result)
    summary(result)
    plot(result)

    # Permutation test
    perm <- loi_perm(O, O_hat, n_perm = 999, seed = 42)
    print(perm)
    plot(perm)
Performs a permutation test using row/column permutation to assess whether the E2Tree reconstruction is significantly better than expected by chance.
    loi_perm(O, O_hat, n_perm = 999, conf.level = 0.95, seed = NULL)
O |
Proximity matrix from the ensemble model (n x n) |
O_hat |
Proximity matrix estimated by E2Tree (n x n) |
n_perm |
Number of permutations (default: 999) |
conf.level |
Confidence level for intervals (default: 0.95) |
seed |
Random seed for reproducibility. Default is NULL. |
The test uses simultaneous row/column permutation of O_hat: for each
replicate, a random permutation of the indices 1, ..., n is drawn and nLoI
is computed between O and the permuted O_hat. This preserves the
block-diagonal structure of O_hat while breaking the correspondence with O.
The null hypothesis is: the E2Tree labeling is unrelated to the ensemble structure. Under H1 (good reconstruction), the observed nLoI should be significantly lower than the null distribution.
P-values include the +1 correction of Phipson & Smyth (2010).
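The permutation scheme and the +1 correction can be sketched in base R as follows. This is a sketch under stated assumptions: `discrepancy` is a stand-in statistic (mean squared difference over unique pairs), not the package's nLoI, and `perm_test` is a hypothetical helper, not the implementation of loi_perm.

```r
# Stand-in discrepancy between two proximity matrices (mean squared
# difference over unique pairs). This is NOT the package's nLoI.
discrepancy <- function(O, O_hat) {
  upper <- upper.tri(O)
  mean((O[upper] - O_hat[upper])^2)
}

perm_test <- function(O, O_hat, n_perm = 199, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  n <- nrow(O)
  observed <- discrepancy(O, O_hat)
  null_dist <- numeric(n_perm)
  for (b in seq_len(n_perm)) {
    perm <- sample(n)  # random permutation of 1..n
    # Permute rows and columns of O_hat together.
    null_dist[b] <- discrepancy(O, O_hat[perm, perm])
  }
  # One-sided (less) p-value with the +1 correction.
  p_value <- (1 + sum(null_dist <= observed)) / (n_perm + 1)
  list(observed = observed, null_dist = null_dist, p.value = p_value)
}

# Toy example: a block-diagonal O_hat with two groups of 10 observations,
# and an O equal to O_hat plus small symmetric noise.
groups <- rep(1:2, each = 10)
O_hat <- outer(groups, groups, "==") * 1
set.seed(42)
noise <- matrix(rnorm(400, 0, 0.05), 20, 20)
O <- pmin(pmax(O_hat + (noise + t(noise)) / 2, 0), 1)
diag(O) <- 1

res <- perm_test(O, O_hat, n_perm = 199, seed = 42)
```

With the +1 correction the smallest attainable p-value is 1 / (n_perm + 1), so the test never reports a p-value of exactly zero.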
An object of class "loi_perm" containing:
observed |
Observed nLoI value and decomposition (loi object) |
statistic |
Observed nLoI value (scalar) |
p.value |
Test p-value (one-sided, less) |
ci |
Permutation-based confidence interval for nLoI |
null_dist |
Null distribution of nLoI values |
null_mean |
Mean of the null distribution |
null_sd |
Standard deviation of the null distribution |
z_stat |
Standardized Z statistic |
n_perm |
Number of permutations |
conf.level |
Confidence level |
    n <- 50
    O <- matrix(runif(n^2, 0.3, 1), n, n)
    O <- (O + t(O)) / 2; diag(O) <- 1
    O_hat <- O + matrix(rnorm(n^2, 0, 0.05), n, n)
    O_hat <- pmin(pmax((O_hat + t(O_hat)) / 2, 0), 1); diag(O_hat) <- 1

    result <- loi_perm(O, O_hat, n_perm = 199, seed = 42)
    print(result)
    summary(result)
    plot(result)
Extracts the data frame of validation measures from an eValidation object, including divergence and similarity metrics between the ensemble and E2Tree proximity matrices.
    measures(x, ...)

    ## S3 method for class 'eValidation'
    measures(x, ...)
x |
An eValidation object. |
... |
Additional arguments (ignored). |
A data frame with columns for method name, type, observed value, and (if permutation tests were performed) null distribution statistics and p-values.
Extracts the data frame describing the nodes of an E2Tree model, including split rules, predictions, and node statistics.
    nodes(x, ...)

    ## S3 method for class 'e2tree'
    nodes(x, terminal = FALSE, ...)
x |
An e2tree object. |
... |
Additional arguments (ignored). |
terminal |
Logical. If TRUE, only terminal nodes are returned. Default is FALSE. |
A data frame with one row per node.
    data(iris)
    smp_size <- floor(0.75 * nrow(iris))
    train_ind <- sample(seq_len(nrow(iris)), size = smp_size)
    training <- iris[train_ind, ]

    ensemble <- randomForest::randomForest(Species ~ ., data = training,
                                           importance = TRUE, proximity = TRUE)

    D <- createDisMatrix(ensemble, data = training, label = "Species",
                         parallel = list(active = FALSE, no_cores = 1))

    setting <- list(impTotal = 0.1, maxDec = 0.01, n = 2, level = 5)
    tree <- e2tree(Species ~ ., training, D, ensemble, setting)

    nodes(tree)
    nodes(tree, terminal = TRUE)
Displays an E2Tree as a static plot using rpart.plot. For interactive exploration, use plot_e2tree_click().
    plot_e2tree(fit, ensemble, main = "E2Tree", ...)
fit |
An e2tree object |
ensemble |
The ensemble model (randomForest or ranger) |
main |
Plot title |
... |
Additional arguments passed to rpart.plot |
Invisibly returns the rpart object
Displays an E2Tree as an interactive plot in the R graphics device. Click on nodes to see detailed information in the console. Right-click or press ESC to exit interactive mode.
    plot_e2tree_click(
      fit,
      data,
      ensemble,
      main = "E2Tree - Click on nodes (ESC to exit)",
      ...
    )
fit |
An e2tree object |
data |
The training data used to build the tree |
ensemble |
The ensemble model (randomForest or ranger) |
main |
Plot title (default: "E2Tree - Click on nodes (ESC to exit)") |
... |
Additional arguments passed to rpart.plot |
This function converts the e2tree object to an rpart object and displays it using rpart.plot. You can then click on any node to see:
Node ID and type (terminal/internal)
Number of observations
Prediction and probability/purity
Decision path to reach the node
Class distribution (for classification)
Split rule (for internal nodes)
Observations in the node (for terminal nodes)
Invisibly returns the rpart object
    # After creating an e2tree object (requires an interactive session)
    if (interactive()) {
      plot_e2tree_click(tree, training, ensemble)
    }
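Displays an E2Tree as an interactive network plot using visNetwork. The plot supports zooming, panning, dragging nodes, and showing node details (see details_on). It starts with a hierarchical layout; nodes can then be dragged in all directions when free_drag = TRUE, and only horizontally within their level otherwise.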
    plot_e2tree_vis(
      fit, data, ensemble,
      width = "100%", height = "100%",
      direction = "UD",
      node_spacing = 200, level_separation = 200,
      colors = NULL,
      show_percent = TRUE, show_prob = TRUE, show_n = TRUE,
      font_size = 14, edge_font_size = 12,
      split_label_style = "rpart",
      max_label_length = 50,
      details_on = "hover",
      navigation_buttons = FALSE,
      free_drag = FALSE
    )
| Argument | Description |
|---|---|
| fit | An e2tree object |
| data | The training data used to build the tree |
| ensemble | The ensemble model (randomForest or ranger) |
| width | Width of the widget (default: "100%") |
| height | Height of the widget (default: "100%") |
| direction | Layout direction: "UD" (top-down), "DU" (bottom-up), "LR" (left-right), "RL" (right-left) |
| node_spacing | Spacing between nodes at the same level (default: 200) |
| level_separation | Spacing between levels (default: 200) |
| colors | Named vector of colors for classes, or NULL for automatic colors |
| show_percent | Show percentage in nodes (default: TRUE) |
| show_prob | Show class probabilities in nodes (default: TRUE) |
| show_n | Show observation count in nodes (default: TRUE) |
| font_size | Font size for node labels (default: 14) |
| edge_font_size | Font size for edge labels (default: 12) |
| split_label_style | How to display split information (default: "rpart") |
| max_label_length | Maximum number of characters for edge labels before truncating (default: 50) |
| details_on | When to show node details (default: "hover") |
| navigation_buttons | Show navigation buttons (default: FALSE) |
| free_drag | If TRUE, nodes can be dragged in all directions (horizontal, vertical, diagonal). If FALSE (default), nodes can only be moved horizontally within their level. |
A visNetwork htmlwidget object
    data(iris)
    set.seed(42)
    smp_size <- floor(0.75 * nrow(iris))
    train_ind <- sample(seq_len(nrow(iris)), size = smp_size)
    training <- iris[train_ind, ]

    ensemble <- randomForest::randomForest(Species ~ ., data = training,
                                           importance = TRUE, proximity = TRUE)
    D <- createDisMatrix(ensemble, data = training, label = "Species",
                         parallel = list(active = FALSE, no_cores = 1))
    setting <- list(impTotal = 0.1, maxDec = 0.01, n = 2, level = 5)
    tree <- e2tree(Species ~ ., training, D, ensemble, setting)

    # Basic usage
    plot_e2tree_vis(tree, training, ensemble)
Displays the tree structure using rpart.plot.
This is a convenience wrapper around plot_e2tree.
    ## S3 method for class 'e2tree'
    plot(x, ensemble = NULL, main = "E2Tree", ...)
| Argument | Description |
|---|---|
| x | An e2tree object |
| ensemble | The ensemble model (randomForest or ranger). Required for converting the tree to rpart format. Supported classes: |
| main | Plot title. Default is "E2Tree". |
| ... | Additional arguments passed to |
Predicts classification or regression responses for new data using the fitted E2Tree model.
    ## S3 method for class 'e2tree'
    predict(object, newdata, target = NULL, ...)
| Argument | Description |
|---|---|
| object | An e2tree object. |
| newdata | A data frame containing the new observations. If missing, the fitted values for the training data are returned. |
| target | Character string specifying the target class for computing classification scores. Only used for classification trees. Default is NULL. |
| ... | Additional arguments (ignored). |
For regression: a data frame with columns fit (predicted
value) and sd (standard deviation of the response within the
terminal node, computed from the training data).
For classification: a data frame with columns fit (predicted class),
accuracy (probability of the predicted class), and score
(probability of the target class).
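To make the classification columns concrete, the following base R sketch derives fit, accuracy, and score for a single terminal node. It assumes, for illustration only, that the probabilities are the class proportions of the training observations in that node; the `node_classes` vector and the `target` value are hypothetical and not taken from the package.

```r
# Hypothetical class labels of the training observations in one terminal node
node_classes <- factor(
  c("setosa", "setosa", "versicolor", "setosa", "virginica"),
  levels = c("setosa", "versicolor", "virginica")
)
target <- "versicolor"  # hypothetical target class

# Assumption: class probabilities are the class proportions within the node
probs <- prop.table(table(node_classes))

pred <- data.frame(
  fit      = names(probs)[which.max(probs)],  # most frequent class in the node
  accuracy = max(probs),                      # probability of the predicted class
  score    = as.numeric(probs[target])        # probability of the target class
)
print(pred)
```

Every new observation that falls into the same terminal node would receive the same three values under this assumption.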
    data(iris)
    smp_size <- floor(0.75 * nrow(iris))
    train_ind <- sample(seq_len(nrow(iris)), size = smp_size)
    training <- iris[train_ind, ]
    validation <- iris[-train_ind, ]

    ensemble <- randomForest::randomForest(Species ~ ., data = training,
                                           importance = TRUE, proximity = TRUE)
    D <- createDisMatrix(ensemble, data = training, label = "Species",
                         parallel = list(active = FALSE, no_cores = 1))
    setting <- list(impTotal = 0.1, maxDec = 0.01, n = 2, level = 5)
    tree <- e2tree(Species ~ ., training, D, ensemble, setting)

    ## Predict on new data
    pred <- predict(tree, newdata = validation)
Prints a comprehensive summary of an E2Tree model including all decision rules, variable importance, and node statistics.
    print_e2tree_summary(fit, data)
| Argument | Description |
|---|---|
| fit | An e2tree object |
| data | The training data |
Displays a compact summary of the fitted E2Tree model including task type, tree size, terminal nodes, and splitting variables.
    ## S3 method for class 'e2tree'
    print(x, ...)
| Argument | Description |
|---|---|
| x | An e2tree object |
| ... | Additional arguments (ignored) |
Extracts proximity matrices from an eValidation object. The ensemble proximity matrix is derived from the original ensemble model, while the E2Tree proximity matrix is estimated from the fitted E2Tree.
    proximity(x, ...)

    ## S3 method for class 'eValidation'
    proximity(x, type = c("both", "ensemble", "e2tree"), ...)
| Argument | Description |
|---|---|
| x | An eValidation object. |
| ... | Additional arguments (ignored). |
| type | Character string specifying which proximity matrix to extract. One of "both" (default), "ensemble", or "e2tree". |
A matrix (if type is "ensemble" or "e2tree")
or a list of two matrices (if type is "both").
Returns the residuals (observed minus fitted) for regression E2Tree models. Not available for classification models.
    ## S3 method for class 'e2tree'
    residuals(object, ...)
| Argument | Description |
|---|---|
| object | An e2tree object. |
| ... | Additional arguments (ignored). |
A numeric vector of residuals.
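The "observed minus fitted" definition can be illustrated with a small base R sketch. It assumes, purely for illustration, that the fitted value of each observation is the mean response of its terminal node; the `y` and `node` vectors are hypothetical.

```r
# Hypothetical responses and terminal-node assignments (illustrative only)
y    <- c(21.0, 22.8, 18.7, 14.3, 24.4, 19.2)
node <- c(2, 2, 3, 3, 2, 3)

# Assumption: the fitted value is the mean response within each terminal node
fitted_values <- ave(y, node)   # ave() uses mean() by default

# Residuals are observed minus fitted
res <- y - fitted_values
print(res)

# Under this assumption the residuals sum to (numerically) zero within each node
print(tapply(res, node, sum))
```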
    data("mtcars")
    smp_size <- floor(0.75 * nrow(mtcars))
    train_ind <- sample(seq_len(nrow(mtcars)), size = smp_size)
    training <- mtcars[train_ind, ]

    ensemble <- randomForest::randomForest(mpg ~ ., data = training, ntree = 500,
                                           importance = TRUE, proximity = TRUE)
    D <- createDisMatrix(ensemble, data = training, label = "mpg",
                         parallel = list(active = FALSE, no_cores = 1))
    setting <- list(impTotal = 0.1, maxDec = 1e-6, n = 2, level = 5)
    tree <- e2tree(mpg ~ ., training, D, ensemble, setting)

    residuals(tree)
Computes and plots the Receiver Operating Characteristic (ROC) curve for a binary classification model, along with the Area Under the Curve (AUC). The ROC curve is a graphical representation of a classifier’s performance across all classification thresholds.
    roc(response, scores, target = "1")
| Argument | Description |
|---|---|
| response | Vector of observed values of the response variable |
| scores | Vector of predicted probabilities for the target class |
| target | The target response class (default: "1") |
an object.
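For readers unfamiliar with how an ROC curve and its AUC are obtained, here is a minimal base R sketch of the general technique: sweep a threshold over the scores, record the true and false positive rates, and integrate with the trapezoidal rule. It is not the implementation of roc(); the `response` and `scores` values are hypothetical.

```r
# Hypothetical binary responses and scores (illustrative only)
response <- c("setosa", "other", "setosa", "other", "setosa", "other")
scores   <- c(0.90, 0.80, 0.70, 0.40, 0.35, 0.10)
target   <- "setosa"

is_pos <- response == target

# Sweep thresholds from high to low; each threshold yields one (FPR, TPR) point
thresholds <- c(Inf, sort(unique(scores), decreasing = TRUE))
tpr <- sapply(thresholds, function(t) sum(scores >= t & is_pos) / sum(is_pos))
fpr <- sapply(thresholds, function(t) sum(scores >= t & !is_pos) / sum(!is_pos))

# Area under the curve by the trapezoidal rule
auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
print(auc)
```

The points (fpr, tpr) can be drawn with `plot(fpr, tpr, type = "s")`; an AUC of 0.5 corresponds to random ranking and 1 to perfect separation of the target class.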
    ## Classification:
    data(iris)

    # Create training and validation set:
    smp_size <- floor(0.75 * nrow(iris))
    train_ind <- sample(seq_len(nrow(iris)), size = smp_size)
    training <- iris[train_ind, ]
    validation <- iris[-train_ind, ]
    response_training <- training[, 5]
    response_validation <- validation[, 5]

    # Perform training:
    ensemble <- randomForest::randomForest(Species ~ ., data = training,
                                           importance = TRUE, proximity = TRUE)
    D <- createDisMatrix(ensemble, data = training, label = "Species",
                         parallel = list(active = FALSE, no_cores = 1))
    setting <- list(impTotal = 0.1, maxDec = 0.01, n = 2, level = 5)
    tree <- e2tree(Species ~ ., training, D, ensemble, setting)

    # Scores are predicted on the validation set, so they are compared
    # with the validation responses
    pr <- ePredTree(tree, validation, target = "setosa")
    roc(response_validation, scores = pr$score, target = "setosa")
Converts an e2tree output into an rpart object.
    rpart2Tree(fit, ensemble)
| Argument | Description |
|---|---|
| fit | An e2tree object. |
| ensemble | A trained ensemble model. Supported classes: |
Note: as.rpart.e2tree is the preferred coercion method.
This function is kept for backward compatibility.
An rpart object. It contains the following components:

| Component | Description |
|---|---|
| frame | A data frame with one row per node in the tree. The row.names are unique node numbers that follow a binary ordering indexed by node depth. Its columns are: (var) the name of the variable used in the split at each node, with the level "leaf" marking terminal nodes; (n) the number of observations reaching the node; (wt) the sum of case weights of the observations reaching the node; (dev) the deviance of the node, a measure of its impurity or lack of fit; (yval) the fitted value of the response at the node; (splits) a two-column matrix of labels for the left and right splits at each node; (complexity) the complexity parameter value at which the split is likely to collapse; (ncompete) the number of competitor splits recorded for the node; (nsurrogate) the number of surrogate splits recorded for the node |
| where | An integer vector with one element per observation in the root node, giving the row number of frame for the leaf node to which each observation is assigned |
| call | The matched call |
| terms | A list of terms and attributes |
| control | A list containing the stopping rules for the tree-building procedure |
| functions | The summary, print, and text functions for the specific method required |
| variable.importance | A quantitative measure of how much each variable contributes to the model's predictions, indicating the relative importance of the variables in explaining the observed outcomes |
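The binary node numbering used for the row names of frame follows the usual rpart convention: the root is node 1 and the children of node k are 2k and 2k + 1. The base R sketch below shows how depth, parent, and the path from the root can be recovered from a node number; the `node_ids` vector and the `path_to()` helper are hypothetical and only for illustration.

```r
# Node numbers as they might appear in row.names(rpart_obj$frame) (hypothetical)
node_ids <- c(1, 2, 3, 6, 7, 14, 15)

depth  <- floor(log2(node_ids))  # the root has depth 0
parent <- node_ids %/% 2         # parent of node k is k %/% 2 (0 for the root)
left   <- 2 * node_ids           # number of the left child
right  <- 2 * node_ids + 1       # number of the right child

# Hypothetical helper: path of node numbers from the root down to node k
path_to <- function(k) {
  path <- k
  while (k > 1) {
    k <- k %/% 2
    path <- c(k, path)
  }
  path
}

print(data.frame(node = node_ids, depth = depth, parent = parent))
print(path_to(14))
```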
    ## Classification:
    data(iris)

    # Create training and validation set:
    smp_size <- floor(0.75 * nrow(iris))
    train_ind <- sample(seq_len(nrow(iris)), size = smp_size)
    training <- iris[train_ind, ]
    validation <- iris[-train_ind, ]
    response_training <- training[, 5]
    response_validation <- validation[, 5]

    # Perform training:
    ## "randomForest" package
    ensemble <- randomForest::randomForest(Species ~ ., data = training,
                                           importance = TRUE, proximity = TRUE)

    ## "ranger" package (replaces the randomForest ensemble when available);
    ## fitted on the training set so that it matches the data used below
    if (requireNamespace("ranger", quietly = TRUE)) {
      ensemble <- ranger::ranger(Species ~ ., data = training,
                                 num.trees = 1000, importance = "impurity")
    }

    D <- createDisMatrix(ensemble, data = training, label = "Species",
                         parallel = list(active = FALSE, no_cores = 1))
    setting <- list(impTotal = 0.1, maxDec = 0.01, n = 2, level = 5)
    tree <- e2tree(Species ~ ., training, D, ensemble, setting)

    ## Preferred coercion method:
    rpart_obj <- as.rpart(tree, ensemble)

    ## Legacy function (see as.rpart):
    rpart_obj <- rpart2Tree(tree, ensemble)

    # Plot using rpart.plot package:
    rpart.plot::rpart.plot(rpart_obj)
Save E2Tree visNetwork Plot to HTML
    save_e2tree_html(vis, file = "e2tree_plot.html", selfcontained = TRUE)
| Argument | Description |
|---|---|
| vis | A visNetwork object from plot_e2tree_vis() |
| file | Output file path (should end with .html) |
| selfcontained | Include all dependencies in a single file |
Displays a comprehensive summary including tree structure, decision rules, terminal node statistics, and variable importance.
    ## S3 method for class 'e2tree'
    summary(object, ...)
| Argument | Description |
|---|---|
| object | An e2tree object |
| ... | Additional arguments (ignored) |
Computes variable importance for an E2Tree model based on mean impurity decrease and (for classification) mean accuracy decrease.
    vimp(fit, data, type = NULL)
| Argument | Description |
|---|---|
| fit | An e2tree object. |
| data | A data frame containing the variables in the model. |
| type | Character string: |
A list containing:
A data frame with variable importance metrics.
A ggplot bar chart of Mean Impurity Decrease.
(Classification only) A ggplot bar chart of Mean Accuracy Decrease.
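As a rough illustration of what an impurity decrease measures, the base R sketch below computes the decrease produced by one split. It uses the Gini index as the impurity, which is an assumption made for illustration; the impurity that vimp() actually uses may differ. The `gini()` helper and the toy data are hypothetical.

```r
# Hypothetical helper: Gini impurity of a vector of class labels
gini <- function(y) {
  p <- table(y) / length(y)
  1 - sum(p^2)
}

# Toy data: one predictor, one two-class response (illustrative only)
x <- 1:8
y <- factor(c("a", "a", "a", "a", "b", "b", "b", "b"))

# Candidate split: observations with x <= 3 go to the left child
go_left <- x <= 3
n <- length(y)

# Impurity decrease = parent impurity minus the weighted child impurities
decrease <- gini(y) -
  (sum(go_left) / n) * gini(y[go_left]) -
  (sum(!go_left) / n) * gini(y[!go_left])
print(decrease)
```

Averaging such decreases over the splits in which a variable is used gives a mean impurity decrease for that variable, in the general sense of the term.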
    ## Classification:
    data(iris)

    # Create training set:
    smp_size <- floor(0.75 * nrow(iris))
    train_ind <- sample(seq_len(nrow(iris)), size = smp_size)
    training <- iris[train_ind, ]

    # Perform training:
    ensemble <- randomForest::randomForest(Species ~ ., data = training,
                                           importance = TRUE, proximity = TRUE)
    D <- createDisMatrix(ensemble, data = training, label = "Species",
                         parallel = list(active = FALSE, no_cores = 1))
    setting <- list(impTotal = 0.1, maxDec = 0.01, n = 2, level = 5)
    tree <- e2tree(Species ~ ., training, D, ensemble, setting)

    vi <- vimp(tree, training)
    vi$vimp
    vi$g_imp

    ## Regression:
    data("mtcars")

    # Create training set:
    smp_size <- floor(0.75 * nrow(mtcars))
    train_ind <- sample(seq_len(nrow(mtcars)), size = smp_size)
    training <- mtcars[train_ind, ]

    # Perform training:
    ensemble <- randomForest::randomForest(mpg ~ ., data = training, ntree = 1000,
                                           importance = TRUE, proximity = TRUE)
    D <- createDisMatrix(ensemble, data = training, label = "mpg",
                         parallel = list(active = FALSE, no_cores = 1))
    setting <- list(impTotal = 0.1, maxDec = 1e-6, n = 2, level = 5)
    tree <- e2tree(mpg ~ ., training, D, ensemble, setting)

    vi <- vimp(tree, training)
    vi$vimp
    vi$g_imp