
Search for an Optimal Multi-Regime (Shift) Configuration on a Phylogeny
Source:R/searchOptimalConfiguration.R
searchOptimalConfiguration.RdGreedy, stepwise search for evolutionary regime shifts on a SIMMAP-style phylogeny
using multivariate mvgls fits from mvMORPH. The routine:
builds one-shift candidate trees for all internal nodes meeting a tip-size threshold (via
generatePaintedTrees),fits each candidate in parallel and ranks them by improvement in the chosen information criterion (IC;
GICorBIC),iteratively adds shifts that pass a user-defined acceptance threshold,
optionally revisits accepted shifts to prune overfitting using a small IC tolerance window,
optionally computes per-shift IC weights by refitting the model with each shift removed.
Models are fitted directly in multivariate trait space (no PCA), assuming a multi-rate
Brownian Motion with proportional VCV scaling across regimes. Extra arguments in ...
are forwarded to mvgls (e.g., method = "LL" or
method = "PL-LOOCV", penalty, error = TRUE, etc.).
Usage
searchOptimalConfiguration(
baseline_tree,
trait_data,
formula = "trait_data~1",
min_descendant_tips,
num_cores = 2,
ic_uncertainty_threshold = 1,
shift_acceptance_threshold = 1,
uncertaintyweights = F,
uncertaintyweights_par = F,
plot = T,
IC = "GIC",
store_model_fit_history = TRUE,
...
)Arguments
- baseline_tree
A
phylobaseline tree.- trait_data
Data frame of trait values with tip rownames.
- formula
Model formula, e.g.
cbind(y1, y2) ~ x.- min_descendant_tips
Integer minimum tips per candidate shift.
- num_cores
Integer, cores for parallel search.
- ic_uncertainty_threshold
Numeric, IC change threshold.
- shift_acceptance_threshold
Numeric, acceptance cutoff.
- uncertaintyweights
Numeric vector of weights.
- uncertaintyweights_par
List of parameters for weights.
- plot
Logical, produce plots during search.
- IC
Character, information criterion, e.g.
"BIC".- store_model_fit_history
Logical, keep fit history.
- ...
Passed to mvgls internally.
Value
A named list with (at minimum):
user_input: captured call (as a list) for reproducibility.tree_no_uncertainty_transformed: SIMMAP tree from the optimal (no-uncertainty) model on the transformed scale used internally bymvgls.tree_no_uncertainty_untransformed: same topology with original edge lengths restored.model_no_uncertainty: the finalmvglsmodel object.shift_nodes_no_uncertainty: integer vector of accepted shift nodes.optimal_ic: final IC value;baseline_ic: baseline IC.IC_used:"GIC"or"BIC";num_candidates: count of candidate one-shift models evaluated.model_fit_history: ifstore_model_fit_history = TRUE, a list of per-iteration fits and anic_acceptance_matrix(IC value and acceptance flag per step).VCVs: named list of regime-specific VCV matrices extracted from the final model (penalized-likelihood estimates if PL was used).
Additional components appear conditionally:
ic_weights: adata.frameof per-shift IC weights and evidence ratios whenuncertaintyweightsoruncertaintyweights_parisTRUE.tree_uncertainty_transformed,tree_uncertainty_untransformed,warnings: character vector of warnings/errors encountered during fitting (if any).
Details
Search outline.
Baseline: Fit
mvglson the baseline tree (single regime) to obtain the baseline IC.Candidates: Build one-shift trees for eligible internal nodes (
generatePaintedTrees); fit each withfitMvglsAndExtractGIC.formulaorfitMvglsAndExtractBIC.formula(internal helpers; not exported) and rank by \(\Delta\)IC.Greedy add: Add the top candidate, refit, and accept if \(\Delta\)IC \(\ge\)
shift_acceptance_threshold; continue down the ranked list.Optional IC weights: If
uncertaintyweights(oruncertaintyweights_par) isTRUE, compute an IC weight for each accepted shift by refitting the final model with that shift removed and comparing the two ICs viaaicw.
Parallelization. Candidate sub-model fits are distributed with future + future.apply.
On Unix, multicore is used; on Windows, multisession. A sequential plan is restored afterward.
Plotting. If plot = TRUE, trees are rendered with
plotSimmap(); shift IDs are labeled with nodelabels().
Regime VCVs. The returned $VCVs are extracted from the fitted multi-regime model via
extractRegimeVCVs and reflect regime-specific covariance
estimates (when mvgls is fitted under a PL/ML method).
Note
Internally, this routine coordinates multiple unexported helper functions:
generatePaintedTrees, fitMvglsAndExtractGIC.formula,
fitMvglsAndExtractBIC.formula, addShiftToModel,
removeShiftFromTree, and extractRegimeVCVs. Through these,
it may also invoke lower-level utilities such as paintSubTree_mod
and paintSubTree_removeShift. These helpers are internal
implementation details and are not part of the public API.
Convergence and robustness
The search is greedy and may converge to a local optimum. Use a stricter
shift_acceptance_threshold and/or enable uncertainty = TRUE to reduce overfitting.
Re-run with different min_descendant_tips and IC choices (GIC vs BIC) to assess stability.
See also
mvgls, GIC, BIC,
plotSimmap, nodelabels,
packages: mvMORPH, phytools, ape, future, future.apply.
Examples
if (FALSE) { # \dontrun{
library(ape)
library(phytools)
set.seed(1)
tr <- pbtree(n = 80, scale = 1)
# Paint a single global baseline state "0"
base <- phytools::paintBranches(tr, edge = unique(tr$edge[,2]),
state = "0", anc.state = "0")
# Fake multivariate data (2 traits)
X <- cbind(trait1 = rnorm(Ntip(base)), trait2 = rnorm(Ntip(base)))
rownames(X) <- base$tip.label
res <- searchOptimalConfiguration(
baseline_tree = base,
trait_data = as.data.frame(X),
formula = "trait_data ~ 1",
min_descendant_tips = 10,
num_cores = 2,
shift_acceptance_threshold = 10, # conservative
IC = "GIC",
plot = FALSE
)
res$shift_nodes_no_uncertainty
res$optimal_ic - res$baseline_ic
str(res$VCVs)
} # }