Search for an Optimal Multi-Regime (Shift) Configuration on a Phylogeny

Greedy, stepwise search for evolutionary regime shifts on a SIMMAP-style phylogeny using multivariate mvgls fits from mvMORPH. The routine:

builds one-shift candidate trees for all internal nodes meeting a tip-size threshold (via generatePaintedTrees),
fits each candidate in parallel and ranks them by improvement in the chosen information criterion (IC; GIC or BIC),
iteratively adds shifts that pass a user-defined acceptance threshold,
optionally revisits accepted shifts to prune overfitting using a small IC tolerance window,
optionally computes per-shift IC weights by refitting the model with each shift removed.

Models are fitted directly in multivariate trait space (no PCA), assuming a multi-rate Brownian Motion with proportional VCV scaling across regimes. Extra arguments in ... are forwarded to mvgls (e.g., method = "LL" or method = "PL-LOOCV", penalty, error = TRUE, etc.).

Usage

searchOptimalConfiguration(
  baseline_tree,
  trait_data,
  formula = "trait_data~1",
  min_descendant_tips,
  num_cores = 2,
  ic_uncertainty_threshold = 1,
  shift_acceptance_threshold = 1,
  uncertaintyweights = F,
  uncertaintyweights_par = F,
  plot = T,
  IC = "GIC",
  store_model_fit_history = TRUE,
  ...
)

Arguments

baseline_tree: A phylo baseline tree.
trait_data: Data frame of trait values with tip rownames.
formula: Model formula, e.g. cbind(y1, y2) ~ x.
min_descendant_tips: Integer minimum tips per candidate shift.
num_cores: Integer, cores for parallel search.
ic_uncertainty_threshold: Numeric, IC change threshold.
shift_acceptance_threshold: Numeric, acceptance cutoff.
uncertaintyweights: Numeric vector of weights.
uncertaintyweights_par: List of parameters for weights.
plot: Logical, produce plots during search.
IC: Character, information criterion, e.g. "BIC".
store_model_fit_history: Logical, keep fit history.
...: Passed to mvgls internally.

Value

A named list with (at minimum):

user_input: captured call (as a list) for reproducibility.
tree_no_uncertainty_transformed: SIMMAP tree from the optimal (no-uncertainty) model on the transformed scale used internally by mvgls.
tree_no_uncertainty_untransformed: same topology with original edge lengths restored.
model_no_uncertainty: the final mvgls model object.
shift_nodes_no_uncertainty: integer vector of accepted shift nodes.
optimal_ic: final IC value; baseline_ic: baseline IC.
IC_used: "GIC" or "BIC"; num_candidates: count of candidate one-shift models evaluated.
model_fit_history: if store_model_fit_history = TRUE, a list of per-iteration fits and an ic_acceptance_matrix (IC value and acceptance flag per step).
VCVs: named list of regime-specific VCV matrices extracted from the final model (penalized-likelihood estimates if PL was used).

Additional components appear conditionally:

ic_weights: a data.frame of per-shift IC weights and evidence ratios when uncertaintyweights or uncertaintyweights_par is TRUE.
tree_uncertainty_transformed, tree_uncertainty_untransformed,
warnings: character vector of warnings/errors encountered during fitting (if any).

Details

Search outline.

Baseline: Fit mvgls on the baseline tree (single regime) to obtain the baseline IC.
Candidates: Build one-shift trees for eligible internal nodes (generatePaintedTrees); fit each with fitMvglsAndExtractGIC.formula or fitMvglsAndExtractBIC.formula (internal helpers; not exported) and rank by $\Delta$IC.
Greedy add: Add the top candidate, refit, and accept if $\Delta$IC $\ge$ shift_acceptance_threshold; continue down the ranked list.
Optional IC weights: If uncertaintyweights (or uncertaintyweights_par) is TRUE, compute an IC weight for each accepted shift by refitting the final model with that shift removed and comparing the two ICs via aicw.

Parallelization. Candidate sub-model fits are distributed with future + future.apply. On Unix, multicore is used; on Windows, multisession. A sequential plan is restored afterward.

Plotting. If plot = TRUE, trees are rendered with plotSimmap(); shift IDs are labeled with nodelabels().

Regime VCVs. The returned $VCVs are extracted from the fitted multi-regime model via extractRegimeVCVs and reflect regime-specific covariance estimates (when mvgls is fitted under a PL/ML method).

Note

Internally, this routine coordinates multiple unexported helper functions: generatePaintedTrees, fitMvglsAndExtractGIC.formula, fitMvglsAndExtractBIC.formula, addShiftToModel, removeShiftFromTree, and extractRegimeVCVs. Through these, it may also invoke lower-level utilities such as paintSubTree_mod and paintSubTree_removeShift. These helpers are internal implementation details and are not part of the public API.

Convergence and robustness

The search is greedy and may converge to a local optimum. Use a stricter shift_acceptance_threshold and/or enable uncertainty = TRUE to reduce overfitting. Re-run with different min_descendant_tips and IC choices (GIC vs BIC) to assess stability.

Examples

if (FALSE) { # \dontrun{
library(ape)
library(phytools)
set.seed(1)
tr <- pbtree(n = 80, scale = 1)
# Paint a single global baseline state "0"
base <- phytools::paintBranches(tr, edge = unique(tr$edge[,2]),
                                state = "0", anc.state = "0")

# Fake multivariate data (2 traits)
X <- cbind(trait1 = rnorm(Ntip(base)), trait2 = rnorm(Ntip(base)))
rownames(X) <- base$tip.label

res <- searchOptimalConfiguration(
  baseline_tree = base,
  trait_data    = as.data.frame(X),
  formula       = "trait_data ~ 1",
  min_descendant_tips = 10,
  num_cores = 2,
  shift_acceptance_threshold = 10,  # conservative
  IC = "GIC",
  plot = FALSE
)

res$shift_nodes_no_uncertainty
res$optimal_ic - res$baseline_ic
str(res$VCVs)
} # }