ML trees with bootstraps

Overview

Objectives
  • Creating maximum likelihood trees with RAxML

  • Creating bootstrapped trees with RAxML

  • Creating a bipartition tree (ML + bootstrap)

We showed how to generate a simple tree in the lesson “Phylogenetic tree”. For publication, however, you are expected to provide further evidence, such as the stability of the tree topology. One way to show this is with bootstrap values: the proportion of trees, among a large number generated by resampling columns of the sequence alignment, in which a particular node (i.e. a dichotomous branching with a particular set of samples in each branch) appears. Bootstrap values are often interpreted as a measure of confidence in a node, in much the same fashion as confidence intervals.
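To make the resampling step concrete: each bootstrap replicate draws alignment columns with replacement until it has as many columns as the original alignment. The following is only a toy sketch of that draw, not how RAxML implements it; the helper name resample_columns and its two arguments are made up for illustration.

```shell
# Toy illustration: draw n column indices (1..n) with replacement.
# resample_columns is a hypothetical helper, not part of RAxML.
resample_columns() {
    # $1 = number of alignment columns, $2 = random seed
    awk -v n="$1" -v seed="$2" 'BEGIN {
        srand(seed)
        for (i = 1; i <= n; i++) {
            # int(rand() * n) + 1 picks an index between 1 and n
            printf "%d%s", int(rand() * n) + 1, (i < n ? " " : "\n")
        }
    }'
}

# Example: one replicate for a 10-column alignment
resample_columns 10 144
```

In a typical draw some indices appear more than once and others not at all; a tree is then built from the resampled columns, and the process is repeated for every replicate.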

A common method for creating bootstrapped trees with RAxML is a three-step approach:

Generating an ML tree

Maximum likelihood tree generation is computationally expensive, but the resulting tree is generally considered superior to those produced by faster methods. Thus, we usually generate only a small number of ML trees (16 in the following example).

$ raxmlHPC -T 8 -m GTRGAMMA -p 144 -# 16 -s input.fasta -n treeML -w outdir
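Before running the command above, the output directory can be prepared along these lines. This is a minimal sketch, assuming the directory is called outdir and lives under the current working directory; the variable name is illustrative.

```shell
# Build an absolute path for the RAxML output directory and create it.
# Assumption: outdir sits directly under the current working directory.
outdir="$(pwd)/outdir"
mkdir -p "$outdir"
echo "RAxML output directory: $outdir"
# The path can then be passed to RAxML as: -w "$outdir"
```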

The output directory specified by -w must be an absolute path in RAxML; $(pwd)/outdir can be used when outdir is given relative to the current directory. The directory needs to be created before running the command. Alternatively, omit the -w parameter to write the output to the current directory.

$ ls -t outdir/*treeML*

RAxML_parsimonyTree.treeML.RUN.0  RAxML_parsimonyTree.treeML.RUN.1
RAxML_log.treeML.RUN.0            RAxML_parsimonyTree.treeML.RUN.2
RAxML_log.treeML.RUN.1            RAxML_parsimonyTree.treeML.RUN.3
...
...
RAxML_result.treeML.RUN.0         RAxML_result.treeML.RUN.1
RAxML_result.treeML.RUN.2         RAxML_result.treeML.RUN.3
...
...
RAxML_info.treeML                  RAxML_bestTree.treeML

RAxML automatically selects the best-scoring tree among the outputs and stores it in the file RAxML_bestTree.xxx, where xxx is the name given with -n (treeML here).

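In the command above, -T sets the number of threads, -m the substitution model, -p the random seed for the parsimony starting trees, -# the number of ML searches, -s the input alignment, -n the name appended to the output files, and -w the output directory.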

Generating bootstraps

Next, we generate a large number of less computationally demanding trees for calculating bootstrap values.

$ raxmlHPC -T 8 -m GTRGAMMA -p 144 -b 144 -# 1000 -s input.fasta -n treeBS -w outdir

$ ls outdir/*treeBS*
RAxML_info.treeBS    RAxML_bootstrap.treeBS

In the command above, -b turns on bootstrapping with the supplied random seed, and -# specifies the number of bootstrap replicates.

A newer rapid bootstrap method can be used in place of standard bootstrapping by passing the argument -x instead of -b.
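For illustration, the bootstrap command with -x swapped in for -b might look like the following. The run name treeRBS is a made-up label, chosen so the output does not collide with earlier runs; the seeds and replicate count simply mirror the command above. The sketch only prints the command so it can be inspected before it is run.

```shell
# Assemble the rapid-bootstrap variant of the bootstrap command.
# treeRBS is a hypothetical run name; all other values mirror the text.
run_name="treeRBS"
cmd="raxmlHPC -T 8 -m GTRGAMMA -p 144 -x 144 -# 1000 -s input.fasta -n $run_name -w outdir"
# Print rather than execute; paste the printed line into a shell to run it.
echo "$cmd"
```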

RAxML can also perform an a posteriori bootstrap convergence analysis to determine whether the number of bootstrap replicates is adequate.

$ raxmlHPC -m GTRGAMMA -p 144 -z outdir/RAxML_bootstrap.treeBS -I autoMRE -n BStest -w outdir

$ tail -n1 outdir/RAxML_info.BStest
Converged after 900 replicates

In the command above, -I initiates convergence testing and specifies which criterion to use for the test (autoMRE here), and -z specifies the bootstrap tree file to test.
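When the convergence test is part of a script, the check on the info file can be automated. The sketch below assumes the last line of the info file has the form shown above ("Converged after N replicates"); the helper name bootstrap_converged is hypothetical, and the message wording may differ between RAxML versions.

```shell
# Hypothetical helper: succeed if the last line of a RAxML info file
# reports convergence in the form shown above.
bootstrap_converged() {
    # $1 = path to the info file written by the -I run
    [ -f "$1" ] && tail -n 1 "$1" | grep -q '^Converged after [0-9][0-9]* replicates'
}

info_file="outdir/RAxML_info.BStest"
if bootstrap_converged "$info_file"; then
    echo "Bootstrap replicates converged: $(tail -n 1 "$info_file")"
else
    echo "No convergence message found in $info_file"
fi
```

If no convergence message is found, one option is to generate more bootstrap replicates and repeat the test.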

Applying bootstrap values to the best ML tree

The final step is to apply the bootstrap values to the best ML tree.

$ raxmlHPC -T 8 -m GTRGAMMA -p 144 -f b -t outdir/RAxML_bestTree.treeML -z outdir/RAxML_bootstrap.treeBS -n treeBP -w outdir

$ ls outdir/*treeBP*
RAxML_bipartitionsBranchLabels.treeBP    RAxML_bipartitions.treeBP

In the command above, -f b instructs RAxML to create a bipartition tree by drawing support values from the bootstrap trees (specified with -z) onto the best ML tree (supplied with -t).

The output files can be used to visualize the trees. The two files contain similar information, except that branch support is stored in a slightly different format (node labels vs. branch labels). Select the file that is correctly interpreted by your visualization program.
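For a quick look at the support values without opening a tree viewer, the node-label file can be scanned from the shell. This is a rough sketch that assumes support is stored as an integer directly after each closing parenthesis in the Newick text; the helper name list_support_values is hypothetical, and the branch-label file would need a different pattern.

```shell
# Hypothetical helper: list support values stored as node labels,
# i.e. integers that directly follow a closing parenthesis in the Newick text.
list_support_values() {
    # $1 = Newick tree file with node-label support
    grep -o ')[0-9][0-9]*' "$1" | tr -d ')'
}

tree_file="outdir/RAxML_bipartitions.treeBP"
if [ -f "$tree_file" ]; then
    # Sorted numerically, so the weakest nodes are listed first
    list_support_values "$tree_file" | sort -n
fi
```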

A single-step approach

RAxML can perform all three steps above with a single command. However, only the newer rapid approach can be used for bootstrapping. By default, 20 ML trees are generated.

$ raxmlHPC -T 8 -m GTRGAMMA -p 144 -f a -x 144 -# 1000 -s input.fasta -n treeALL -w outdir

$ ls outdir/*treeALL*
RAxML_bestTree.treeALL                    RAxML_bootstrap.treeALL
RAxML_bipartitionsBranchLabels.treeALL    RAxML_info.treeALL
RAxML_bipartitions.treeALL

RAxML resources