Overview
bifrost performs branch-level inference of multi-regime,
multivariate trait evolution on a phylogeny using penalized-likelihood
multivariate GLS fits. The current implementation searches for
evolutionary model shifts under a multi-rate Brownian Motion (BMM) model
with proportional regime variance-covariance (VCV) scaling. It operates
directly in trait space, works with fossil tip-dated trees, and is
designed for high-dimensional datasets (p > n) and large
trees (> 1000 tips).
If you have not installed the package yet, see the installation instructions on the package README.
In practical terms, bifrost asks:
- Where on the tree are shifts supported?
- When do those shifts occur?
- How does covariance and rate structure differ across inferred regimes?
Minimal worked example
The smallest end-to-end workflow is a simulated single-regime
example. Running it is mainly a quick way to confirm that
bifrost and its dependencies are installed correctly and
that the core search routine runs on your machine. In this case, we
typically expect bifrost to recover no shifts.
library(bifrost)
library(ape)
library(phytools)
library(mvMORPH)
set.seed(1)
# Simulate a tree
tr <- pbtree(n = 50, scale = 1)
# Simulate multivariate traits under a single-regime BM1 model (no shifts)
sigma <- diag(0.1, 2) # 2 x 2 variance-covariance matrix for two traits
theta <- c(0, 0) # ancestral means for the two traits
sim <- mvSIM(
tree = tr,
nsim = 1,
model = "BM1",
param = list(
ntraits = 2,
sigma = sigma,
theta = theta
)
)
# mvSIM returns either a matrix or a list of matrices depending on mvMORPH version
X <- if (is.list(sim)) sim[[1]] else sim
rownames(X) <- tr$tip.label
# Run bifrost's greedy search for shifts
res <- searchOptimalConfiguration(
baseline_tree = tr,
trait_data = X,
formula = "trait_data ~ 1",
min_descendant_tips = 10,
num_cores = 1,
shift_acceptance_threshold = 20, # conservative GIC threshold
IC = "GIC",
plot = FALSE,
store_model_fit_history = FALSE,
verbose = FALSE
)
# For this single-regime BM1 simulation, we typically expect no inferred shifts
res$shift_nodes_no_uncertainty
res$optimal_ic - res$baseline_ic
str(res$VCVs)What to expect from this example:
-
res$shift_nodes_no_uncertaintyis typicallyinteger(0), -
res$optimal_ic - res$baseline_icis typically close to0, -
res$VCVscontains a single regime-specific VCV for the baseline regime"0".
Before you run on real data
At minimum, provide:
- A rooted phylogeny with branch lengths interpreted in units of time.
- A multivariate trait matrix or data frame with taxa as row names.
Important checks before fitting:
-
Tree and data alignment.
rownames(trait_data)must matchtree$tip.labelexactly, including both names and order. - Branch lengths. Branch lengths are interpreted in units of time; ultrametricity is not required.
-
Starting tree format. The input can be a rooted
phylotree; it does not need to already be asimmapobject. Internally,bifrostpaints a single baseline regime"0"and stores inferred regimes using SIMMAP conventions. -
Multi-dimensional traits.
bifrostworks directly in trait space. For high-dimensional settings, tunemvMORPH::mvglsarguments such asmethod,penalty,target, anderrorto match your data structure. -
Formulas. The default
formula = "trait_data ~ 1"fits an intercept-only multivariate model, but more general formulas are also supported when needed.
The arguments you will tune most often are:
-
min_descendant_tips: minimum clade size required for a candidate shift. -
shift_acceptance_threshold: minimum IC improvement required to accept a shift. Larger values yield more conservative models. -
IC: model comparison metric, either"GIC"or"BIC". -
num_cores: number of workers used for candidate scoring. -
formula: model formula passed tomvgls. -
method,penalty,target,error, and relatedmvglsarguments passed through.... -
uncertaintyweightsoruncertaintyweights_par: request serial or parallel per-shift IC support calculations. -
store_model_fit_history: retain the search path for later inspection and plotting. -
plot: draw or update SIMMAP plots during the search. -
verbose: emit progress messages during fitting.
Practical starting points:
- use larger
min_descendant_tipsvalues to avoid tiny-clade shifts, - start with a conservative
shift_acceptance_thresholdfor exploratory work, - compare
"GIC"and"BIC"on the same dataset when model-selection sensitivity matters, - perform sensitivity analyses rather than relying on a single run.
The ic_uncertainty_threshold argument is currently
reserved for future pruning and uncertainty workflows and can typically
be left at its default.
How the search works
searchOptimalConfiguration() performs a greedy,
step-wise search:
- Fit a baseline single-regime model.
- Generate one-shift candidate trees for internal nodes satisfying
min_descendant_tips. - Score candidate models, optionally in parallel.
- Iteratively accept shifts that improve the chosen information
criterion by at least
shift_acceptance_threshold. - Optionally compute per-shift IC support by removing each accepted shift in turn and refitting.
The accepted shift set defines the optimal no-uncertainty
configuration under the selected criterion ("GIC" or
"BIC").
High-level workflow
flowchart TD
subgraph S[Setup]
A([Start]) --> B[Init and validate]
B --> C[Paint baseline state 0]
C --> D[Generate one-shift candidates]
end
subgraph CS[Baseline and scoring]
D --> E[Fit baseline mvgls]
E --> F[Baseline IC GIC or BIC]
F --> G[Score candidates in parallel]
G --> H[Compute delta IC and sort]
end
subgraph GS[Greedy search]
H --> I[Init best tree and IC]
I --> J{More candidates?}
J -- Yes --> K[Add shift]
K --> L[Fit shifted model]
L --> M[Delta IC best minus new]
M --> N{Delta IC >= threshold?}
N -- Yes --> O[Accept update best]
N -- No --> P[Reject keep best]
O --> Q[Record status]
P --> Q
Q --> J
end
subgraph PP[Post processing]
J -- No --> U[Finalize best model]
U --> V{Compute IC weights?}
V -- No --> V0[Skip weights]
V -- Yes --> W{Any shifts?}
W -- No --> V0
W -- Yes --> X{Weights parallel?}
X -- Yes --> X1[Drop-one refits parallel]
X -- No --> X2[Drop-one refits serial]
X1 --> Y[Compute weights aicw and ER]
X2 --> Y
end
subgraph OUT[Output]
V0 --> Z[Assemble result]
Y --> Z
Z --> ZA[Extract regime VCVs]
ZA --> ZB([Return])
end
Because the search is greedy, it can converge to a local optimum. In
practice, it is useful to repeat analyses with different
min_descendant_tips, thresholds, and IC choices to assess
robustness.
Core functions
-
searchOptimalConfiguration()
The main end-to-end function for candidate generation, parallel scoring, greedy shift acceptance, optional IC-weight calculations, and result assembly. -
plot_ic_acceptance_matrix()
Visualize shift acceptance and information-criterion behavior across search iterations.
Internally, the search also relies on unexported helpers such as
generatePaintedTrees(),
fitMvglsAndExtractGIC(),
fitMvglsAndExtractBIC(),
calculateAllDeltaGIC(), paintSubTree_mod(),
addShiftToModel(), removeShiftFromTree(),
paintSubTree_removeShift(), whichShifts(), and
extractRegimeVCVs(). Most users will not need to call these
directly.
Reading the result
searchOptimalConfiguration() returns a named list that
typically includes:
-
Inputs and fitted objects:
user_input,tree_no_uncertainty_transformed,tree_no_uncertainty_untransformed, andmodel_no_uncertainty. -
Search summary:
shift_nodes_no_uncertainty,optimal_ic,baseline_ic,IC_used,num_candidates, andmodel_fit_history. -
Support and covariance summaries:
VCVs,ic_weights, andwarningswhen present.
The two returned trees distinguish transformed versus original branch
lengths, and ic_weights stores per-shift support summaries
such as IC weights and evidence ratios when those calculations are
requested.
Interpretation guidance:
- lower IC is better, so large positive values of
baseline_ic - optimal_icindicate stronger support for heterogeneity relative to the baseline model, - values of
optimal_ic - baseline_icclose to0suggest little support for adding shifts, -
shift_nodes_no_uncertaintyidentifies where supported changes in rate and covariance structure occur, -
VCVssummarize the estimated evolutionary covariance structure within each inferred regime, -
ic_weightshelp rank how strongly each accepted shift is supported in the final configuration, -
model_fit_historyis especially useful for diagnosing search behavior withplot_ic_acceptance_matrix().
Practical guidance
When moving from this smoke test to a real dataset:
- set
plot = FALSEunless you specifically want interactive plotting during the search, - use larger
min_descendant_tipsvalues to avoid tiny-clade shifts and shrink the candidate space, - start with a conservative
shift_acceptance_threshold, then compare nearby settings or both"GIC"and"BIC"if model-selection sensitivity matters, - plan for higher memory use on large or high-dimensional analyses,
- record
sessionInfo()and themvMORPHversion, and consider usingrenvfor fully reproducible runs, - archive the tree, trait matrix, formula, thresholds, and random seed used for final fits,
- repeat searches under nearby settings before treating individual inferred shifts as robust.
Once this smoke test runs cleanly, the next step is usually to move to a real dataset and inspect the returned shifts, IC support values, and regime-specific VCVs in more detail. For a full empirical workflow, continue to the jaw-shape vignette; for broader conceptual background, see Brownian Motion and Multivariate Shifts and Whole-Tree Models, PCA, and bifrost.
