Skip to contents

Greedy, stepwise search for evolutionary regime shifts on a SIMMAP-style phylogeny using multivariate mvgls fits from mvMORPH. The routine:

  1. builds one-shift candidate trees for all internal nodes meeting a tip-size threshold (via generatePaintedTrees),

  2. fits each candidate in parallel and ranks them by improvement in the chosen information criterion (IC; GIC or BIC),

  3. iteratively adds shifts that pass a user-defined acceptance threshold,

  4. optionally revisits accepted shifts to prune overfitting using a small IC tolerance window,

  5. optionally computes per-shift IC weights by refitting the model with each shift removed.

Models are fitted directly in multivariate trait space (no PCA), assuming a multi-rate Brownian Motion with proportional VCV scaling across regimes. Extra arguments in ... are forwarded to mvgls (e.g., method = "LL" or method = "PL-LOOCV", penalty, error = TRUE, etc.).

Usage

searchOptimalConfiguration(
  baseline_tree,
  trait_data,
  formula = "trait_data~1",
  min_descendant_tips,
  num_cores = 2,
  ic_uncertainty_threshold = 1,
  shift_acceptance_threshold = 1,
  uncertaintyweights = F,
  uncertaintyweights_par = F,
  plot = T,
  IC = "GIC",
  store_model_fit_history = TRUE,
  ...
)

Arguments

baseline_tree

A phylo baseline tree.

trait_data

Data frame of trait values with tip rownames.

formula

Model formula, e.g. cbind(y1, y2) ~ x.

min_descendant_tips

Integer minimum tips per candidate shift.

num_cores

Integer, cores for parallel search.

ic_uncertainty_threshold

Numeric, IC change threshold.

shift_acceptance_threshold

Numeric, acceptance cutoff.

uncertaintyweights

Numeric vector of weights.

uncertaintyweights_par

List of parameters for weights.

plot

Logical, produce plots during search.

IC

Character, information criterion, e.g. "BIC".

store_model_fit_history

Logical, keep fit history.

...

Passed to mvgls internally.

Value

A named list with (at minimum):

  • user_input: captured call (as a list) for reproducibility.

  • tree_no_uncertainty_transformed: SIMMAP tree from the optimal (no-uncertainty) model on the transformed scale used internally by mvgls.

  • tree_no_uncertainty_untransformed: same topology with original edge lengths restored.

  • model_no_uncertainty: the final mvgls model object.

  • shift_nodes_no_uncertainty: integer vector of accepted shift nodes.

  • optimal_ic: final IC value; baseline_ic: baseline IC.

  • IC_used: "GIC" or "BIC"; num_candidates: count of candidate one-shift models evaluated.

  • model_fit_history: if store_model_fit_history = TRUE, a list of per-iteration fits and an ic_acceptance_matrix (IC value and acceptance flag per step).

  • VCVs: named list of regime-specific VCV matrices extracted from the final model (penalized-likelihood estimates if PL was used).

Additional components appear conditionally:

  • ic_weights: a data.frame of per-shift IC weights and evidence ratios when uncertaintyweights or uncertaintyweights_par is TRUE.

  • tree_uncertainty_transformed, tree_uncertainty_untransformed,

  • warnings: character vector of warnings/errors encountered during fitting (if any).

Details

Search outline.

  1. Baseline: Fit mvgls on the baseline tree (single regime) to obtain the baseline IC.

  2. Candidates: Build one-shift trees for eligible internal nodes (generatePaintedTrees); fit each with fitMvglsAndExtractGIC.formula or fitMvglsAndExtractBIC.formula (internal helpers; not exported) and rank by \(\Delta\)IC.

  3. Greedy add: Add the top candidate, refit, and accept if \(\Delta\)IC \(\ge\) shift_acceptance_threshold; continue down the ranked list.

  4. Optional IC weights: If uncertaintyweights (or uncertaintyweights_par) is TRUE, compute an IC weight for each accepted shift by refitting the final model with that shift removed and comparing the two ICs via aicw.

Parallelization. Candidate sub-model fits are distributed with future + future.apply. On Unix, multicore is used; on Windows, multisession. A sequential plan is restored afterward.

Plotting. If plot = TRUE, trees are rendered with plotSimmap(); shift IDs are labeled with nodelabels().

Regime VCVs. The returned $VCVs are extracted from the fitted multi-regime model via extractRegimeVCVs and reflect regime-specific covariance estimates (when mvgls is fitted under a PL/ML method).

Note

Internally, this routine coordinates multiple unexported helper functions: generatePaintedTrees, fitMvglsAndExtractGIC.formula, fitMvglsAndExtractBIC.formula, addShiftToModel, removeShiftFromTree, and extractRegimeVCVs. Through these, it may also invoke lower-level utilities such as paintSubTree_mod and paintSubTree_removeShift. These helpers are internal implementation details and are not part of the public API.

Convergence and robustness

The search is greedy and may converge to a local optimum. Use a stricter shift_acceptance_threshold and/or enable uncertainty = TRUE to reduce overfitting. Re-run with different min_descendant_tips and IC choices (GIC vs BIC) to assess stability.

See also

mvgls, GIC, BIC, plotSimmap, nodelabels, packages: mvMORPH, phytools, ape, future, future.apply.

Examples

if (FALSE) { # \dontrun{
library(ape)
library(phytools)
set.seed(1)
tr <- pbtree(n = 80, scale = 1)
# Paint a single global baseline state "0"
base <- phytools::paintBranches(tr, edge = unique(tr$edge[,2]),
                                state = "0", anc.state = "0")

# Fake multivariate data (2 traits)
X <- cbind(trait1 = rnorm(Ntip(base)), trait2 = rnorm(Ntip(base)))
rownames(X) <- base$tip.label

res <- searchOptimalConfiguration(
  baseline_tree = base,
  trait_data    = as.data.frame(X),
  formula       = "trait_data ~ 1",
  min_descendant_tips = 10,
  num_cores = 2,
  shift_acceptance_threshold = 10,  # conservative
  IC = "GIC",
  plot = FALSE
)

res$shift_nodes_no_uncertainty
res$optimal_ic - res$baseline_ic
str(res$VCVs)
} # }