Jun Adachi and Masami Hasegawa
have written a package MOLPHY, version 2.3b3,
carrying out maximum likelihood inference of phylogenies for either nucleotide
sequences or protein sequences. Their protein sequence maximum likelihood
program, ProtML, is a successor to the one they made available to me and
which I formerly distributed on a nonsupported basis in PHYLIP.
The package is
distributed free in C source code, with documentation.
MOLPHY is available from its web site at http://www.ism.ac.jp/ismlib/softother.e.html
A monograph describing MOLPHY (number 28 in the Computer Science
Monographs of the Institute of Statistical Mathematics) is available
from the same source (see folder csm96 on the distribution web page),
as TeX source and as a .dvi
file. The monograph can also be ordered from the Institute.
Executable versions of MOLPHY 2.2 for Windows 95 or Windows NT on Intel
processors, and for Windows NT on DEC Alpha processors, are
distributed by Russell Malmberg of the Botany Department of the
University of Georgia (russell (at) plantbio.uga.edu)
from his software web site
at http://www.plantbio.uga.edu/~russell/index.php?s=1&n=5&r=0
Gary Olsen, of the Department of Microbiology,
University of Illinois, Urbana, Illinois (gary (at) life.uiuc.edu),
has developed a speeded-up replacement, coded in C, for my program DNAML, called
fastDNAml version 1.2.2. It achieves a number of economies and is also organized so that
it can be run on parallel processors -- he and his co-workers have constructed
trees of very large size on a high-speed parallel processor. The program can
be compiled using the "p4" portable parallel processing toolkit. It can also
be run in ordinary serial mode on workstations, where it is faster than DNAML.
fastDNAml is described in a paper: Olsen, G. J., H. Matsuda, R. Hagstrom, and
R. Overbeek. 1994. fastDNAml: A tool for construction of phylogenetic trees of
DNA sequences using maximum likelihood. Computer Applications in the
Biosciences (CABIOS) 10: 41-48.
It is available in the following places:
- The C program and MacOS 9 and MacOS X executables are available
by ftp
from the Indiana University Biology ftp server at
ftp.bio.indiana.edu
in directory molbio/evolve.
- A Debian Linux executable package for fastDNAml was made available
by Stephane Bortzmeyer and is maintained by Andreas Tille. It is available
through
its web page
at
http://packages.debian.org/unstable/science/fastdnaml.
Bette Korber
of the Theoretical Division,
Los Alamos National Laboratory, Los Alamos, New Mexico
(btk (at) t10.lanl.gov) and her colleagues have released
a version of fastDNAml which uses the REV (general
reversible) model of DNA evolution. They used it for the
results in the paper: B. Korber, M. Muldoon, J. Theiler, F. Gao, R. Gupta,
A. Lapedes, B. H. Hahn, S. Wolinsky and T. Bhattacharya. 2000. Timing the
ancestor of the HIV-1 pandemic strains. Science 288:
1789-1796. The program is available both in a version that uses the MPI
(Message-Passing Interface) for parallel computers and in a non-parallel
version. It is available as C source code for Unix from
the web site for the programs from that paper at
http://www.santafe.edu/~btk/science-paper/bette.html.
Alexandros Stamatakis
(alexandros.stamatakis
(at) h-its.org) of the
Heidelberger Institut für Theoretische Studien, Heidelberg, Germany
and his colleagues have released RAxML, version 7.2.8,
a program for faster
reconstruction of
phylogenies by maximum likelihood. It provides a faster heuristic
search, parallel processing, and a simulated annealing algorithm.
RAxML can also carry out parsimony, bootstrapping, and consensus tree methods.
There are a number of papers describing RAxML:
- Stamatakis, A., T. Ludwig, H. Meier, and M. J. Wolf. 2002. AxML: A fast program for sequential and parallel phylogenetic tree calculations based on the maximum likelihood method. pp. 21-28 in Proceedings of 1st IEEE Computer Society Bioinformatics Conference (CSB2002), Palo Alto, California, August 2002.
- Stamatakis, A., T. Ludwig, and H. Meier. 2003. RAxML-II: A program for sequential, parallel and distributed inference of large phylogenetic trees.
Concurrency and Computation: Practice and Experience (CCPE)
16: 975-988.
- Stamatakis, A., T. Ludwig, and H. Meier. 2004. RAxML-III: A fast program for maximum likelihood-based inference of large phylogenetic trees. Bioinformatics Advance Access published on December 17, 2004.
RAxML is available in two versions: RAxML-Light, now at version 1.0.9, and
the regular version, RAxML. RAxML-Light uses an approximate model of rate
variation among sites,
and can only analyze DNA sequence data, but is able to run on larger cases
than the full version of RAxML.
The programs are available as C source code, Windows executables, and Mac OS X
executables at the Exelixis Lab
software web page
at http://sco.h-its.org/exelixis/software.html.
Daniele Silvestro and Ingo Michalak
of the Department of Botany and Molecular Evolution
of the Senckenberg Research Institute, Frankfurt am Main, Germany
(raxmlgui.help (at) googlemail.com)
have released raxmlGUI
(RAxML Graphical User Interface),
version 1.0, a graphical user interface for RAxML. raxmlGUI is intended to
accelerate and simplify the use of RAxML, allowing interactive control of
all its major features (as of RAxML version 7.2.8). The graphical interface is
designed to be self-explanatory and intuitive to use. In
addition, a detailed built-in help file is available. Because it can run
multithreaded versions of RAxML, the GUI can make optimal
use of the available computational resources.
It is described in the paper:
Silvestro, D. and I. Michalak. 2011. raxmlGUI: a graphical front-end for RAxML. Organisms Diversity & Evolution DOI: 10.1007/s13127-011-0056-0.
It is available as a Python script and as Intel Mac OS X executables. It can be downloaded from
its web site
at https://sites.google.com/site/raxmlgui/
Thomas Keane
(thomas.m.keane (at) nuim.ie)
and Thomas Naughton (tom.naughton (at) nuim.ie), both of the Department of Computer Science of the
National University of Ireland, Maynooth have released DPRML,
a distributed cross-platform tree-building program that can harness the idle
clock cycles of hundreds of machines. It uses the PAL Java framework. It is described in a paper: Keane, T.M., T. J. Naughton,
S. A. A. Travers, J. O. McInerney, and G. P. McCormack. 2005. DPRml:
Distributed Phylogeny Reconstruction by Maximum Likelihood. Bioinformatics 21: 969-974. DPRML can be downloaded from its web page at
http://distributed.cs.nuim.ie/dprml.php. Its authors note
that it is slower than their more recent distributed phylogeny platform
MULTIPHYL, and they urge use of that instead of DPRML.
T. M. Keane, T. J. Naughton, S. A. A. Travers, J. O. McInerney, and G. P. McCormack, of the Department of Computer Science at the National University of Ireland, Maynooth, Ireland (tkeane (at) cs.nuim.ie), have produced MultiPhyl,
version 1.06, a distributed phylogeny platform enabling maximum likelihood
runs across a large number of heterogeneous machines.
MultiPhyl is a high-throughput implementation of a distributed phylogenetics
platform that is capable of using the idle computational resources of many
heterogeneous non-dedicated machines to form a phylogenetics supercomputer. It
allows a user to upload hundreds or thousands of amino acid or nucleotide
alignments simultaneously and perform computationally intensive tasks such as
model selection, tree searching, and bootstrapping of each of the alignments.
The program implements a set of 80 amino acid models and 56 nucleotide ML
models and a variety of statistical methods for choosing between alternative models.
It is described in the paper:
Keane, T.M., T.J. Naughton, S.A.A. Travers, J.O. McInerney, and G.P. McCormack. 2005. DPRml: Distributed Phylogeny Reconstruction by Maximum Likelihood. Bioinformatics 21(7): 969-974.
It is available as Java code.
It can be downloaded from the downloads web site
at http://distributed.cs.nuim.ie/downloads.php
for the distributed Java-based platform produced by this group. The platform
itself can also be downloaded from the same site.
MultiPhyl can also be tried out through their web server version.
Ziheng Yang
of the Department of
Genetics and Biometry, University College London
(z.yang (at) ucl.ac.uk) has released
PAML, version 4.4, a package of programs for the maximum
likelihood analysis of nucleotide or protein sequences, including codon-based
methods that take into account both amino acids and nucleotides.
The programs can estimate branch lengths in a phylogenetic tree and parameters
in the evolutionary model such as the transition/transversion rate ratio, the
gamma parameter for variable substitution rates among sites, rate
parameters for different genes, and synonymous and nonsynonymous substitution
rates. They can also test evolutionary models, calculate
substitution rates at particular sites, reconstruct ancestral nucleotide or
amino acid sequences, simulate DNA and protein sequence evolution,
compute distances based on the synonymous and nonsynonymous changes,
and of course do phylogenetic tree reconstruction by
maximum likelihood and Bayesian Markov chain Monte Carlo methods. The
strength of the package lies in its rich implementation of
evolutionary models, though Yang comments that
tree-making is not a strong point of the current version.
Another notable point is the availability of codon models, which Yang
pioneered. The package is
available as Windows executables and as
C source code for Unix and MacOS X systems. An Old Versions folder
on the ftp site that distributes these also contains Mac OS executables for
the earlier versions 3.0a and 3.0c.
It is available from the PAML
web page at http://abacus.gene.ucl.ac.uk/software/paml.html
Amy Egan and Joana Silva
of The Institute for Genomic Research (TIGR)
in Rockville, Maryland
(aegan (at) jcvi.org)
have produced IDEA
(Interactive Display for Evolutionary Analysis),
version 2.4, a graphical interface for
PAML. IDEA allows you to run either of
the PAML programs codeml or baseml on a single dataset or on
multiple datasets simultaneously. They allow you to obtain maximum likelihood
estimates of numbers of substitutions per branch and per site and to compare
multiple models of molecular evolution given the data and a phylogenetic tree
for the sequences. You can optionally generate phylogenies with
PHYLIP, using maximum parsimony (on
small datasets) or neighbor-joining. IDEA can perform multiple runs of
codeml with different starting dN/dS values and merge their results
for increased accuracy. It can also analyze multiple datasets in parallel,
save parameter values for future use, and monitor progress step by step. For
codeml analyses of site-based evolutionary models it provides an interactive
tabular summary of results, visualizations of selective pressure along genes,
interactive histograms, and depictions of phylogenetic trees with branch
lengths proportional to the estimated number of nucleotide substitutions.
It is available as a combination of Perl scripts, Java executables, and Linux
or Solaris executables. It can be run on systems that have Perl, Java,
PAML 3.14 or 3.15, and PHYLIP. If parallel execution is desired, you need SGE (Sun Grid Engine) or Condor; otherwise it will just run on the machine on which it is launched. It can be downloaded from
its web site
at http://ideanalyses.sourceforge.net/main.html
Tim Massingham and Nick Goldman
of the European Bioinformatics Institute
in Hinxton, U. K.
(timm (at) ebi.ac.uk and goldman (at) ebi.ac.uk)
have produced SLR
(Sitewise Likelihood Ratio),
a program to compute and test the nonsynonymous/synonymous ratio of
substitutions at each site. For coding sequences it makes a maximum
likelihood estimate for each amino acid position of the ratio of nonsynonymous
substitutions to synonymous substitutions, and does a likelihood ratio test
for that site. The many sitewise tests are then corrected for multiple
comparisons to indicate which sites have strong evidence of purifying or
positive selection and so whether there is any reliable evidence for the
presence of selection in the alignment. Alternatively, SLR can be restricted
to detecting only unusually variable sites, identifying such sites and providing
evidence for the presence of positive selection in the alignment.
It is described in the paper:
Massingham, T. and N. Goldman. 2005. Detecting amino acid sites under positive
selection and purifying selection. Genetics 169: 1753-1762.
It is available as C source code, Windows executables, Linux executables and
Powermac Mac OS X executables. It can be downloaded from
its web site
at http://www.ebi.ac.uk/goldman/SLR/
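The entry above says that SLR corrects its many sitewise tests for multiple comparisons, but not which correction it uses. As an illustration only, and not necessarily SLR's own method, here is a minimal Python sketch of the Benjamini-Hochberg false-discovery-rate procedure applied to per-site p-values; the function name and the p-values are hypothetical.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float], fdr: float = 0.05) -> List[int]:
    """Return indices of tests rejected at false discovery rate `fdr`."""
    m = len(pvalues)
    # Rank the sites from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Track the largest rank whose p-value lies under the line rank * fdr / m.
        if pvalues[idx] <= rank * fdr / m:
            cutoff_rank = rank
    # Every site ranked at or below that cutoff is declared significant.
    return sorted(order[:cutoff_rank])


# Hypothetical per-site p-values from sitewise likelihood ratio tests.
site_p = [0.001, 0.2, 0.012, 0.04, 0.9]
print(benjamini_hochberg(site_p))  # prints [0, 2]
```

Controlling the false discovery rate rather than the family-wise error rate is a design choice: it tolerates a small expected fraction of false positives among the flagged sites in exchange for more power when many sites are tested.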
Gangolf Jobb
(gangolf (at)
treefinder.de), formerly of the
Institut für Statistik of the University of München, Germany,
has produced Treefinder, a maximum likelihood program for
nucleotide sequence data.
It makes available a variety of models of base change, including codon-position-specific models. It searches for the best tree by its own method of
tree rearrangement, and can assess statistical support for groups by either
bootstrap or a local paired-sites method. All parameters of the models can be
optimized by searching for the values that maximize the likelihood. The
program is fast, and has both a graphical user interface and a general
language in which its operation can be programmed. Trees can be interactively
manipulated and constrained in various ways. Treefinder is described in
a paper: Jobb, G., A. von Haeseler, and K. Strimmer. 2004. TREEFINDER: A
powerful graphical analysis environment for molecular phylogenetics.
BMC Evolutionary Biology 4: 18. It has been available for
download from
its web site at
http://www.treefinder.de as executables for Windows, Mac OS X, and
Linux. It requires the Java runtime environment to be present.
However, Jobb has currently declared himself "on strike" at this web site
and asks that people first email him to discuss whether he should be
compensated for his work. I do not know whether that means that the program
is available for free currently, or whether he will soon start charging for it.
He certainly deserves compensation for this good program.
Stéphane Guindon (currently at the University of Auckland, New Zealand,
s.guindon (at) auckland.ac.nz) and Olivier Gascuel (gascuel (at) lirmm.fr) at the
LIRMM, of the CNRS and the University of Montpellier II, France, have
released PHYML version 3.0, a fast maximum likelihood
program for
nucleotide or protein sequence data. It has 6 possible DNA substitution
models and 5 amino acid substitution models, allows estimation of many of
the model parameters, and can allow for a gamma
distribution of rates among sites and a proportion of invariable sites.
It can also do bootstrapping of the trees.
PHYML is described in a paper: Guindon, S., and O. Gascuel. 2003. A simple,
fast, and accurate algorithm to estimate large phylogenies by maximum
likelihood. Systematic Biology 52: 696-704. It is
available as Linux, SunOS, Windows, and Mac OS X executables from
its web site in Montpellier at
http://www.atgc-montpellier.fr/phyml/binaries.php, where it is also available as a
web server.
Johan Nylander (Johan.Nylander
(at) abc.se)
has written
BootPHYML version 3.4. This is a Perl script that performs
bootstrapping using programs from PHYLIP,
substituting PHYML for the PHYLIP program DNAML.
It works with Mac OS X and Linux or Unix. It is available on
a web page
at Nylander's web site in Sweden.
Bastien Boussau
of the Laboratoire de Biométrie et Biologie Evolutive
of the Université Lyon 1, CNRS, Lyon
(bastien.boussau (at) univ-lyon1.fr)
has released nhPhyML
(non-homogeneous PhyML),
version 1.00, a program based on PHYML to reconstruct
trees with a non-homogeneous model of DNA sequence evolution. nhPhyML
reconstructs phylogenies in a maximum likelihood framework using the Galtier and
Gouy (1998) nonhomogeneous model of DNA sequence evolution. This model
allows different equilibrium G+C contents to be associated with different
branches of the phylogeny. nhPhyML will locally rearrange a given rooted
phylogeny without changing the root position, and will optimize parameters of
the model of sequence evolution. It is described in the paper:
Boussau, B., and M. Gouy. 2006. Efficient likelihood computations with
nonreversible models of evolution. Systematic Biology 55 (5):
756-768.
It is available as C source code and Linux executables. It can be downloaded from
its web site
at http://pbil.univ-lyon1.fr/software/nhphyml/
Bastien Boussau
of the Laboratoire de Biométrie et Biologie Evolutive
of the Université Lyon 1, CNRS, Lyon
(bastien.boussau (at) univ-lyon1.fr)
has released PhyML-Multi
(a PhyML-derived program to detect recombination which uses multiple trees for one alignment),
version 1.00, a program that can infer recombination breakpoints and infer
multiple phylogenies from one alignment. It takes as input a
sequence alignment and a putative number k of trees to reconstruct.
Then, using a hidden Markov model (HMM) or a mixture model, it infers k trees from the
alignment and predicts recombination breakpoints, under the maximum likelihood
criterion. PhyML-Multi can work on DNA sequences as well as protein sequences,
and can handle dozens of sequences.
It is described in the paper:
Boussau, B., L. Guéguen, and M. Gouy. 2009. A mixture model and a hidden
Markov model to simultaneously detect recombination breakpoints and reconstruct
phylogenies. Evolutionary Bioinformatics Online 5 (June 25): 67-79.
It is available as C source code and Linux executables. It can be downloaded from
its web site
at http://pbil.univ-lyon1.fr/software/phyml_multi/
Pierre Rioux and Tim Littlejohn, then of the Informatics
Division of the
Organelle Genome Megasequencing Program at the Université de
Montréal (Littlejohn is currently at BioLateral Group,
in Sydney, Australia, tim (at) biolateralgroup.com), made available PARBOOT (PARallel
BOOTstrapping), a program that
takes bootstrap sampled data sets and splits
them up, submitting each to a different computer, so as to run bootstrapping
quickly on networks of computers. It is intended for use with
PHYLIP programs.
It is available free as C source code
from the Indiana University IUBIO software archive
at http://microbe.bio.indiana.edu:7131/soft/iubionew/molbio/evolution/phylo/ParBoot/. It is no longer available by ftp from Montréal.
It is described on a web page at the Université de Montréal at
http://megasun.bch.umontreal.ca/aboutpb.html. It requires a
networked system of computers with PHYLIP, a Perl interpreter, and
appropriate accounts and permissions.
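PARBOOT's basic idea, taking bootstrap data sets made by resampling alignment columns with replacement and farming them out to different computers, can be sketched in a few lines. This is a hypothetical Python illustration, not PARBOOT's own code: in a real PHYLIP workflow the replicates would come from a resampling program such as SEQBOOT, and the host names are placeholders.

```python
import random
from typing import Dict, List


def bootstrap_replicates(alignment: List[str], n_reps: int, seed: int = 1) -> List[List[str]]:
    """Build bootstrap data sets by resampling alignment columns with replacement."""
    rng = random.Random(seed)
    n_sites = len(alignment[0])
    replicates = []
    for _ in range(n_reps):
        # Draw n_sites column indices with replacement; the same column may repeat.
        cols = [rng.randrange(n_sites) for _ in range(n_sites)]
        replicates.append(["".join(seq[c] for c in cols) for seq in alignment])
    return replicates


def split_among_hosts(replicates: List[List[str]], hosts: List[str]) -> Dict[str, List[List[str]]]:
    """Deal replicates round-robin so each (hypothetical) host gets a near-equal share."""
    jobs: Dict[str, List[List[str]]] = {h: [] for h in hosts}
    for i, rep in enumerate(replicates):
        jobs[hosts[i % len(hosts)]].append(rep)
    return jobs


aln = ["ACGTAC", "ACGTTC", "AGGTAC"]
jobs = split_among_hosts(bootstrap_replicates(aln, 10), ["hostA", "hostB", "hostC"])
print({host: len(reps) for host, reps in jobs.items()})  # prints {'hostA': 4, 'hostB': 3, 'hostC': 3}
```

Each host would then run the tree-building program on its share, and the resulting trees would be pooled for a consensus. Round-robin dealing is the simplest balancing scheme; it assumes the machines are roughly equally fast.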
Laura Salter Kubatko
of the Departments of Statistics and Evolution, Ecology, and Organismal Biology
at the Ohio State University, Columbus, Ohio
(lkubatko (at) stat.ohio-state.edu)
has written SSA
(inference of maximum likelihood phylogenetic trees using a Stochastic Search Algorithm),
version 1.0, a program that uses a stochastic search to find maximum likelihood
phylogenies from DNA sequences. Two versions of the program are available: one that
assumes a molecular clock and one that does not make this assumption. The
method for searching the space of trees for the ML tree is based on a
simulated-annealing type algorithm. The program implements the F84 model of
nucleotide substitution and associated sub-models. It estimates the ML tree
and branch lengths, and can optionally estimate the transition/transversion
ratio. Upon termination, the program returns the k trees of highest likelihood
found during the search, where k can be set by the user.
It is described in the paper:
Salter, L. A. and D. K. Pearl. 2001. Stochastic search strategy for estimation of maximum likelihood phylogenetic trees. Systematic Biology 50(1): 7-17.
It is available as executables for Windows, Linux, AIX, and SPARC systems.
Laura is also willing to send the source code to users who own the book
Numerical Recipes in C by Press, Teukolsky, Vetterling and Flannery,
and who thus have permission to use routines from that book.
The documentation and executables can be downloaded from
its web site
at http://www.stat.ohio-state.edu/~lkubatko/software/ssa/ssa.html
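SSA's combination of a simulated-annealing search with a list of the k best trees found can be illustrated with a generic sketch. This is not SSA's algorithm: the geometric cooling schedule, the Metropolis acceptance rule, and all parameter names are assumptions, and a toy integer objective stands in for the log-likelihood of a tree. In real use each state would be a tree (for example a Newick string) and `neighbor` a tree rearrangement.

```python
import math
import random
from typing import Callable, Hashable, List, Tuple


def anneal_top_k(
    start: Hashable,
    score: Callable[[Hashable], float],
    neighbor: Callable[[Hashable, random.Random], Hashable],
    k: int = 5,
    steps: int = 2000,
    t0: float = 1.0,
    cooling: float = 0.995,
    seed: int = 1,
) -> List[Tuple[Hashable, float]]:
    """Simulated annealing that remembers the k best states evaluated."""
    rng = random.Random(seed)
    current, current_score = start, score(start)
    evaluated = {current: current_score}
    t = t0
    for _ in range(steps):
        cand = neighbor(current, rng)
        cand_score = score(cand)
        evaluated[cand] = cand_score
        delta = cand_score - current_score
        # Always accept improvements; accept worse states with probability exp(delta / t).
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current, current_score = cand, cand_score
        t *= cooling  # geometric cooling (an assumption, not SSA's schedule)
    # Return the k highest-scoring states seen during the whole search.
    return sorted(evaluated.items(), key=lambda kv: kv[1], reverse=True)[:k]


# Toy stand-in for a log-likelihood surface that peaks at x = 7.
top = anneal_top_k(0, lambda x: -(x - 7) ** 2, lambda x, rng: x + rng.choice((-1, 1)), k=3)
print(top)
```

Accepting some downhill moves while the temperature is high is what lets an annealing search escape local optima; keeping every evaluated state's score is what makes it cheap to report the k best at the end.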
Nir Friedman, Matan Ninio, Tal Pupko, Eyal Privman, and Itshak Pe'er
of the Department of Computer Science and Engineering of Hebrew University, Jerusalem, Israel,
and the Department of Cell Research and Immunology of Tel Aviv University,
Israel
(semphy (at) cs.huji.ac.il)
have written SEMPHY
(Structural EM PHYlogenetic reconstruction),
version 2.0, which uses the structural EM algorithm to search for maximum likelihood
phylogenies. The structural EM algorithm is proven to go uphill on the
likelihood surface at each step, and should search that surface more efficiently and
thoroughly than other likelihood algorithms. The program
can use DNA or protein sequences, and can use a variety of DNA models and
amino acid replacement models including the general reversible model and the
HKY model (for DNA) and the JTT model (for protein sequences). It also allows
for Gamma-distributed among-sites rate variation. SEMPHY also makes available
an iterative distance matrix method which computes Bayesian posterior rates of
change at individual sites, uses these to compute distances and find a
neighbor-joining tree. The program and methods are described in the papers:
- Friedman, N., M. Ninio, I. Pe'er, and T. Pupko. 2002. A Structural EM Algorithm for phylogenetic inference. Journal of Computational Biology 9(2): 331-353.
- Ninio, M., E. Privman, T. Pupko, and N. Friedman. 2007. Phylogeny reconstruction: increasing the accuracy of pairwise distance estimation using Bayesian inference of evolutionary rates. Bioinformatics 23: e136-e141.
It is available as C++ source code, Windows executables, Linux executables and Powermac Mac OS X executables. It can be downloaded from
its web site
at http://compbio.cs.huji.ac.il/semphy/
Simon Whelan
of the Faculty of Life Sciences
at the University of Manchester, U.K.
(simon.whelan (at) manchester.ac.uk)
has released Leaphy
(Likelihood Estimation Algorithms for PHYlogenetics),
version 1.0beta, a fast and accurate program for maximum likelihood
phylogenetic inference. Leaphy uses maximum likelihood to estimate trees from
aligned amino acid and nucleotide sequences under a variety of commonly used
models. The methods for searching for the best tree
topology are described in the paper:
Whelan, S. 2007. New approaches to phylogenetic tree search and their
application to large numbers of protein alignments. Systematic Biology
56: 727-740.
It is available as Windows executables and Linux executables. It can be downloaded from
its web site
at http://www.bioinf.manchester.ac.uk/leaphy/Leaphy.htm
Daniele Catanzaro
of the Computer Science Department
of the Université Libre de Bruxelles (U.L.B.)
(dacatanz (at) ulb.ac.be)
has released PhyloCoco
version 2.3, a molecular phylogeny package for Intel iMacs running Mac OS X 10.4.9 or
higher with Java 1.4 or higher. PhyloCoco is a minimalist tool for rebuilding
molecular phylogenies by means of the likelihood criterion or the
minimum evolution criterion. PhyloCoco
selects the best substitution model of DNA evolution for the dataset of
sequences to be analyzed and displays the best phylogeny found so far.
It uses the GTR model of DNA evolution and uses different optimization
methods including the Very Large Scale Neighborhood (VLSN) search for the
topology and Iterated Local Search (ILS) to explore the solution space.
PhyloCoco uses FigTree
to display the resulting phylogeny. It is described in the paper:
Catanzaro, D., R. Pesenti and M. C. Milinkovitch. 2007. Estimating phylogenies
under maximum likelihood: a very large-scale neighborhood approach.
Submitted to BMC Bioinformatics.
It is available as Java source code and Intel Mac OS X executables. It can be downloaded from
its web site
at http://homepages.ulb.ac.be/~dacatanz/Site/PhyloCoco.html
Vivek Gowri-Shankar
and Howsun Jow (vivek.gowri-shankar (at) s.man.ac.uk) of the Department of Computer Sciences of the University of Manchester,
Manchester, U.K. have written PHASE, version 1.1,
a software package for PHylogenetics And Sequence Evolution. It infers
phylogenies with models for RNA evolution that include models for both
paired sites and unpaired sites. The models for the unpaired sites have the
usual 4 states, while the models for the paired sites have 6, 7, or 16
states, depending on the model chosen. The programs carry out a Bayesian
Markov chain Monte Carlo (MCMC) analysis that samples trees from the
posterior distribution given the data. PHASE is described in two papers:
- Hudelot, C., V. Gowri-Shankar, H. Jow, M. Rattray and P. Higgs. 2003.
RNA-based phylogenetic methods: Application to mammalian mitochondrial RNA sequences. Molecular Phylogenetics and Evolution 28: 241-252.
- Jow, H., C. Hudelot, M. Rattray and P. Higgs. 2002. Bayesian phylogenetics
using an RNA substitution model applied to early mammalian evolution.
Molecular Biology and Evolution 19: 1591-1601.
It is available as C++ source code and Linux or Windows executables from
its web page at
http://www.bioinf.man.ac.uk/resources/phase/.
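The state counts for paired sites follow from the pairs of bases involved. Under the usual reading of these RNA models (an assumption here, not taken from the PHASE documentation), the 16-state model uses every ordered pair of bases, the 6-state model keeps only the Watson-Crick pairs plus the G-U wobble pairs, and the 7-state model adds one state lumping all mismatches together. A short Python enumeration:

```python
from itertools import product
from typing import List

BASES = "ACGU"
# Watson-Crick pairs plus the G-U "wobble" pairs.
CANONICAL = {"AU", "UA", "CG", "GC", "GU", "UG"}


def pair_states(n_states: int) -> List[str]:
    """List the paired-site states for a 16-, 6- or 7-state RNA pair model."""
    all_pairs = ["".join(p) for p in product(BASES, repeat=2)]
    if n_states == 16:
        return all_pairs  # every ordered pair of bases
    canonical = [p for p in all_pairs if p in CANONICAL]
    if n_states == 6:
        return canonical
    if n_states == 7:
        return canonical + ["MM"]  # one extra state standing for any mismatch
    raise ValueError("expected 6, 7 or 16 states")


for n in (16, 6, 7):
    print(n, pair_states(n))
```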
Le Sy Vinh
(vinh (at) cs.uni-duesseldorf.de)
and Heiko Schmidt (heiko (at) cs.uni-duesseldorf.de)
of the Institut für Bioinformatik of the University of Düsseldorf,
Germany and Arndt von Haeseler (arndt.von.haeseler (at) univie.ac.at)
of the Center for Integrative Bioinformatics Vienna (CIBIV), Austria,
have written Phylogenetic Navigator (PhyNav) version 1.0.
This program finds subsets of species in a dataset that are "minimal k-distance
subsets" and analyzes each of them by maximum likelihood. It then stitches these
groups together using likelihood. This makes it possible to analyze larger
datasets. The program is described in a paper: Vinh, L. S., H. A. Schmidt,
and A. von Haeseler. 2005. PhyNav: A novel approach to reconstruct large
phylogenies. pp. 386-393 in Classification, the Ubiquitous Challenge (Proceedings of the 28th Annual Conference of the GfKl 2004), ed. C. Weihs and
W. Gaul. Series Studies in Classification, Data Analysis, and Knowledge
Organization. Springer-Verlag, Heidelberg/New York. It is available
as Linux executables from
its web site at
http://www.cibiv.at/software/phynav/
Shu-chuan (Grace) Chen
of the School of Mathematical and Statistical Sciences
of Arizona State University, Tempe, Arizona
(shu-chuan.chen (at) asu.edu)
has released MixtureTree
version 2.0, a phylogenetic tree package based on mixture models for reconstructing phylogeny from binary sequence data. MixtureTree is a Linux-based program (written in C/C++) that implements an algorithm for binary sequence data, such as single-nucleotide polymorphisms (SNPs). In addition to
the mixture algorithm with three different optimization options, the program
also implements a bootstrap procedure with a majority-rule consensus tree.
The program is described in the papers:
- Chen, S. C., M. Rosenberg, and B. Lindsay. 2011. MixtureTree: a program
for constructing phylogeny, BMC Bioinformatics 12: 111.
- Chen, S. C., M. Li, M. Rosenberg, and B. Lindsay. 2011. Mixture tree
construction and its applications. pp. 135-147 in Handbook of Statistical Bioinformatics, ed. by H. S. Lu,
B. Schölkopf, and H. Zhao. Springer Handbooks of Computational Statistics. Springer-Verlag.
The methods are described in the paper: Chen, S. C., and B. Lindsay. 2006.
Building mixture trees from binary sequence data, Biometrika 93
(4): 843-860.
MixtureTree is available as C source code and Linux executables. It can be downloaded from
its web site
at http://www.mixturetree.net/download.html
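MixtureTree summarizes its bootstrap with a majority-rule consensus tree. The core of that method is counting how often each cluster (group of taxa) appears among the bootstrap trees and keeping those present in more than half of them; any two such clusters must occur together in at least one tree, so they are compatible and define a tree. A minimal Python sketch, assuming each tree has already been reduced to its set of clusters (the function name and example trees are hypothetical):

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, Set


def majority_rule_clusters(
    trees: Iterable[Set[FrozenSet[str]]], threshold: float = 0.5
) -> Dict[FrozenSet[str], float]:
    """Return clusters found in more than `threshold` of the trees, with their frequencies."""
    counts: Counter = Counter()
    n_trees = 0
    for clusters in trees:
        n_trees += 1
        counts.update(set(clusters))  # count each cluster at most once per tree
    return {c: k / n_trees for c, k in counts.items() if k / n_trees > threshold}


# Three hypothetical bootstrap trees on taxa A-E, each given as its set of clusters.
AB, ABC, ABD = frozenset("AB"), frozenset("ABC"), frozenset("ABD")
boot_trees = [{AB, ABC}, {AB, ABD}, {AB, ABC}]
for cluster, freq in majority_rule_clusters(boot_trees).items():
    print(sorted(cluster), round(freq, 2))
```

The reported frequencies are the familiar bootstrap support values written on the branches of the consensus tree.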
Morgan Price
of Adam Arkin's group in the Physical Biosciences Division
of Lawrence Berkeley National Laboratory, Berkeley, California
(fasttree (at) microbesonline.org)
has produced FastTree
version 2, a fast maximum likelihood program that starts from a
neighbor-joining-like estimate of the tree, which is
inferred using profiles of sequences instead of a distance matrix. It then
rearranges
that tree using a minimum evolution criterion, and finally optimizes the
tree by nearest-neighbor rearrangements using likelihood as the criterion.
The final tree is thus at least a local optimum using likelihood.
It is described as having run times proportional to N²L,
where N is the
number of species and L the sequence length. It can handle either nucleotide
or protein sequences. It also computes a measure of local support for nodes
using bootstrapping and a Shimodaira-Hasegawa (SH) method involving nearby
nodes.
It can use multithreading to take advantage of multiple processors.
It is described as much faster than other likelihood programs.
FastTree uses a GTR model of substitution, with gamma-distributed rate
variation.
It is described in two papers:
- Price, M. N., P. S. Dehal, and A. P. Arkin. 2009. FastTree: Computing
large minimum-evolution trees with profiles instead of a distance matrix.
Molecular Biology and Evolution 26: 1641-1650.
- Price, M. N., P. S. Dehal, and A. P. Arkin. 2010. FastTree 2 --
approximately maximum-likelihood trees for large alignments. PLoS ONE
5(3): e9490.
FastTree is available as C source code, Windows executables and Linux executables.
It can be downloaded from
its web site
at http://www.microbesonline.org/fasttree/
Paul Michael Agapow, then
of the Department of Biology
of Imperial College, Silwood Park, U.K., and more recently of the
Health Protection Agency, U.K.
(agapow (at) agapow.net),
has written Mac5, version 1.7.3,
a program for phylogenetic reconstruction using gapped data.
MAC5 implements MCMC sampling to estimate a phylogenetic tree from a DNA
multiple alignment. What differentiates MAC5 from similar programs is its use
of five-state sequence evolution models as a means to include the gap
information.
It is available as C source code, Windows executables and Powermac Mac OS X executables.
Its author says that owing to other projects, Mac5 is not being further
developed and is not being supported by him. It can be downloaded from
its web site
at http://www.agapow.net/software/mac5
David Posada
(dposada (at) uvigo.es)
of the Department of Biochemistry, Genetics and Immunology of the
University of Vigo, Spain, and Keith Crandall
of the Department of Biology, Brigham Young University,
have released
Modeltest version 3.7, a program to test a hierarchy
of statistical models of DNA evolution using the Likelihood Ratio Test
criterion and the AIC (Akaike Information Criterion). The likelihood
values are obtained by running PAUP*.
MODELTEST accepts likelihood scores corresponding to 56 models of DNA
substitution, which differ in whether transition and transversion rates are
equal, whether rates at different sites are equal, and whether there are
invariant sites. Modeltest is described in the paper:
Posada, D. and K. A. Crandall. 1998. MODELTEST: testing the model of DNA
substitution. Bioinformatics 14: 817-818.
It is available as executables for Macintosh and Windows,
and as C source code that can be compiled on many other systems.
It is distributed from
its web site at
http://darwin.uvigo.es/software/modeltest.html.
Modeltest was the basis for two further developments: the MrModeltest
program which uses MrBayes
and the FindModel server at Los Alamos National Laboratory, which
is a revised version of Modeltest that uses the Weighbor program to infer the trees.
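A likelihood ratio test between two nested models, as used in Modeltest's hierarchy of tests, compares twice the difference in their log-likelihoods with a chi-square distribution whose degrees of freedom equal the number of extra free parameters in the richer model. The Python sketch below shows that arithmetic with only the standard library; the log-likelihoods are hypothetical, and Modeltest's own implementation may differ in details. When the extra parameter sits on the boundary of its range (for example a proportion of invariable sites of zero), the plain chi-square is known to be conservative.

```python
import math
from typing import Tuple


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        # Even df: closed form exp(-h) * sum_{i < df/2} h^i / i!
        return math.exp(-h) * sum(h ** i / math.factorial(i) for i in range(df // 2))
    # Odd df: erfc term plus a sum over half-integer gamma functions.
    tail = math.erfc(math.sqrt(h))
    for j in range((df - 1) // 2):
        tail += math.exp(-h) * h ** (j + 0.5) / math.gamma(j + 1.5)
    return tail


def likelihood_ratio_test(lnl_simple: float, lnl_complex: float, extra_params: int) -> Tuple[float, float]:
    """Return (statistic, p-value) for nested models differing by `extra_params` parameters."""
    stat = 2.0 * (lnl_complex - lnl_simple)
    return stat, chi2_sf(stat, extra_params)


# Hypothetical log-likelihoods: a simpler model and a richer one with one extra parameter.
stat, p = likelihood_ratio_test(-2500.0, -2490.0, extra_params=1)
print(f"LRT statistic = {stat:.1f}, p = {p:.2e}")
```

A small p-value favors the richer model; in a hierarchical scheme the winner of each comparison goes on to be tested against the next model in the hierarchy.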
David Posada
(dposada (at) uvigo.es)
of the Department of Biochemistry, Genetics and Immunology of the
University of Vigo, Spain has released jMODELTEST version
0.1.1, a
Java version of Modeltest. Like Modeltest, it
carries out statistical selection of best-fit models of nucleotide substitution.
It implements five different model selection strategies: hierarchical and
dynamical likelihood ratio tests (hLRT and dLRT), Akaike and Bayesian
information criteria (AIC and BIC), and a decision theory method (DT). It also
provides estimates of model selection uncertainty, parameter importances and
model-averaged parameter estimates, including model-averaged phylogenies.
It is described in the paper: Posada D. 2008. jModelTest: Phylogenetic Model
Averaging. Molecular Biology and Evolution 25: 1253-1256.
It is distributed as Java executables that will run on Java-equipped Windows
systems, on Mac OS X, and on Linux systems that have Java installed. It
also uses PHYML to compute maximum likelihood trees under
the various models. I do not know whether it comes with PHYML installed
or requires the user to install it. jMODELTEST
will be found at
its web site at
http://darwin.uvigo.es/software/jmodeltest.html
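The information criteria, model weights, parameter importances and model-averaged estimates mentioned above can be sketched in a few lines. The model names, parameter counts (which exclude branch lengths here, an assumption) and log-likelihoods below are invented, and jModelTest's own bookkeeping may differ.

```python
import math


def aic(lnl, k):
    """Akaike Information Criterion for log-likelihood lnl and k free parameters."""
    return -2.0 * lnl + 2.0 * k


def bic(lnl, k, n):
    """Bayesian Information Criterion; n is the sample size (here, alignment sites)."""
    return -2.0 * lnl + k * math.log(n)


def weights(scores):
    """Turn a dict {model: AIC or BIC} into normalised weights exp(-delta/2)."""
    best = min(scores.values())
    raw = {m: math.exp(-0.5 * (s - best)) for m, s in scores.items()}
    total = sum(raw.values())
    return {m: r / total for m, r in raw.items()}


def importance(param_models, w):
    """Relative importance of a parameter: total weight of models that contain it."""
    return sum(w[m] for m in param_models)


def model_average(estimates, w):
    """Average a parameter over the models that estimate it ({model: value}),
    renormalising the weights over those models."""
    norm = sum(w[m] for m in estimates)
    return sum(w[m] * v for m, v in estimates.items()) / norm


# Hypothetical fits: {model: (log-likelihood, free parameters)}.
fits = {"JC": (-3050.0, 0), "HKY+G": (-3000.0, 5), "GTR+G": (-2998.0, 9)}
aic_scores = {m: aic(lnl, k) for m, (lnl, k) in fits.items()}
w = weights(aic_scores)
# Gamma shape alpha, estimated only in the +G models (invented values).
alpha_avg = model_average({"HKY+G": 0.50, "GTR+G": 0.60}, w)
```

The same `weights` function works on BIC scores; only the penalty term changes.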
Paulo Nuin (nuinp (at) mcmaster.ca) of the Department of Biology,
McMaster University, Hamilton, Ontario, Canada has released
MrMTgui version 1.01. This is a graphic user interface for
running Modeltest and MrModeltest. It is available for Windows
as executables from
the MrMTgui web site
at http://genedrift.org/mtgui.php. Source code of a
Linux version is also available which can be compiled using the WxWindows
windowing software. The Linux sources are available from an
SVN (Subversion) version-control repository, using instructions available
at the above site. In its earlier version, which could not run MrModeltest,
MrMTgui was known as MTgui.
Johan Nylander (Johan.Nylander
(at) abc.se)
has released MrModeltest version 2.2, a simplified version of
Modeltest 3.7. It performs hierarchical
likelihood ratio tests and calculates approximate AIC, AICc, and Akaike weights
of the nucleotide substitution models currently implemented in both
PAUP* and MrBayes.
Version 2 added four different hierarchies for the likelihood ratio
tests, and it prints the selected model in a MrBayes block.
MrModeltest is available as an executable and source code for Windows,
for Mac OS, and for Mac OS X, and as source code for Linux and Unix.
It is available from
Nylander's software download site
at http://www.abc.se/~nylander/ in Sweden.
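AICc adds a small-sample correction to AIC that matters when the number of sites is not large relative to the number of parameters; as the sample size grows it approaches AIC. A minimal sketch, taking the sample size n to be the number of alignment sites (a common convention, but an assumption about MrModeltest's exact choice):

```python
def aicc(lnl, k, n):
    """Small-sample corrected AIC: AIC + 2k(k+1)/(n-k-1), for k free
    parameters and sample size n."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    return -2.0 * lnl + 2.0 * k + 2.0 * k * (k + 1) / (n - k - 1)
```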
Johan Nylander (Johan.Nylander
(at) abc.se)
has written Modelfit version 1.2, and MrModelfit version 1.2. These are Perl
scripts that can run (respectively) Modeltest and
MrModeltest simply by typing a single command
line. They are available from
Nylander's software download site
at http://www.abc.se/~nylander/ in Sweden.
Charles Bell
of the Department of Biology of Xavier University of
Louisiana, New Orleans (cbell3 (at) xula.edu)
has written Porn*
(Phylogenetics On Rick's Network, as it was originally hosted on Rick
Ree's site)
version 2.0, a Linux clone of Modeltest using the Python
language. It enables command-line computations equivalent to Modeltest
under the Linux operating system. It creates command blocks for PAUP* which can be used when running PAUP*.
Porn* is written as a shell script invoking Python modules.
It is available at its web site at http://www.phylodiversity.net/cbell/pornstar/
David Posada
(dposada (at) uvigo.es)
of the Department of Biochemistry, Genetics and Immunology of the
University of Vigo, Spain has released ProtTest,
version 2.4, a Java program allowing testing of 64 different models of protein
evolution, using the AIC, AICc, and BIC criteria for choosing among
models that include different substitution models, invariant sites,
rate heterogeneity, and empirical amino acid frequency variants of the
models. ProtTest uses the PAL library
of phylogenetic Java routines and also uses the PHYML program to compute likelihoods.
It is described in the paper: Abascal, F., R. Zardoya and D. Posada. 2005.
ProtTest: Selection of best-fit models of protein evolution.
Bioinformatics 21: 2104-2105. It is available from
its web site
at http://darwin.uvigo.es/software/prottest.html
Thomas Keane, of the Bioinformatics and
Pharmacogenomics Lab of the Department of Biology,
National University of Ireland, Maynooth
(thomas.m.keane (at) nuim.ie)
has written ModelGenerator, version 0.85.
It is a Java program for model selection that selects
amino acid and nucleotide substitution models using
FASTA or PHYLIP alignments.
It supports 56 nucleotide and 80 amino acid substitution models.
It is described in the paper: Keane, T. M., C. J. Creevey, M. M.
Pentony, T. J. Naughton and J. O. McInerney. 2006. Assessment of methods
for amino acid matrix selection and their use on empirical data shows that ad
hoc assumptions for choice of matrix are not justified. BMC Evolutionary
Biology 6: 29.
It is available from its web site at http://bioinf.may.ie/modelgenerator/.
Johan Nylander (Johan.Nylander
(at) abc.se)
has written MrAIC
version 1.4.4. This is a Perl script that carries out AIC, AICc, BIC, and
Akaike weights model comparison methods for nucleotide substitution models
by invoking the PHYML program. It is distributed from
Nylander's software download site
at http://www.abc.se/~nylander/ in Sweden.
Vladimir Minin, Zaid Abdo, Paul Joyce, and Jack Sullivan
of the Department of Biological Sciences
at the University of Idaho, Moscow, Idaho
(jacks (at) uidaho.edu)
or
(vminin (at) u.washington.edu)
(Minin is now at the University of Washington)
have released DT-ModSel
(Decision Theory MODel SELection),
a performance-based method for selecting a likelihood model for phylogenetic estimation.
It implements a model selection method which is based on the Bayesian Information Criterion, but incorporates relative branch-length error as a performance measure in a decision theory (DT) framework. This DT method includes a penalty for overfitting, is applicable prior to running extensive analyses, and simultaneously compares all models being considered and thus does not rely on a series of pairwise comparisons of models to traverse model space. It can compare 56 different models of molecular sequence evolution on a given tree.
It is described in the paper: Minin, V., Z. Abdo, P. Joyce, and J. Sullivan. 2003. Performance-based selection of likelihood models for phylogeny estimation. Systematic Biology 52: 674-683.
It is available as a Perl script. It can be downloaded from
its web site
at http://www.webpages.uidaho.edu/~jacks/DTModSel.html
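The decision-theory idea can be sketched as choosing the model whose estimated branch lengths have the smallest expected error, with the expectation taken over approximate posterior model probabilities derived from BIC. This sketch assumes branch-length vectors estimated for every model on the same tree and uses Euclidean distance as the error measure; both are assumptions of the illustration rather than a transcription of DT-ModSel, and the numbers are invented.

```python
import math


def bic_posteriors(bics):
    """Approximate posterior model probabilities proportional to exp(-BIC/2)."""
    best = min(bics)
    raw = [math.exp(-0.5 * (b - best)) for b in bics]
    total = sum(raw)
    return [x / total for x in raw]


def dt_risks(branch_lengths, bics):
    """Risk of choosing each model: the posterior-weighted average Euclidean
    distance between its branch-length vector and every model's vector."""
    post = bic_posteriors(bics)
    return [sum(p * math.dist(bi, bj) for bj, p in zip(branch_lengths, post))
            for bi in branch_lengths]


def dt_select(branch_lengths, bics):
    """Index of the model with the smallest risk."""
    risks = dt_risks(branch_lengths, bics)
    return min(range(len(risks)), key=risks.__getitem__)


# Invented example: three models, three branches on one fixed tree.
branch_lengths = [[0.10, 0.20, 0.30], [0.12, 0.21, 0.33], [0.50, 0.50, 0.50]]
bics = [1000.0, 998.0, 1050.0]
chosen = dt_select(branch_lengths, bics)
```

All models are compared at once through their risks, so no sequence of pairwise tests is needed; the chosen model can differ from the BIC-best one when that model's branch lengths lie far from those of the other well-supported models.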
Sergei Kosakovsky Pond
and Simon Frost of the Antiviral Research Center,
University of California, San Diego
and Spencer Muse of the Department of Statistics, North Carolina State
University, Raleigh, North Carolina (muse (at) stat.ncsu.edu)
have released HY-PHY (HYpothesis testing using
PHYlogenies), version 0.99Beta. HY-PHY has general ways of
enabling the user to perform a wide variety of statistical tests of
different models of molecular sequence change. It is actually a
higher-level programming language which enables the user to set
up many different kinds of tests. The user can define their own
alphabet of symbols and test any reversible substitution model.
Examples of tests that can be performed include molecular clock tests,
relative rate tests, relative ratio tests, and tests of positive
selection.
It is described in a paper: Kosakovsky Pond, S. L., S. D. Frost, and S. V. Muse.
2005. HyPhy: hypothesis testing using phylogenies. Bioinformatics
21(5): 676-679.
Although not primarily intended as a
phylogeny estimation package, it also can infer trees by
Neighbor-Joining and UPGMA methods, and a number of search
strategies are also available for likelihood inference.
HY-PHY is freely available as executables for Mac OS, for Mac OS X, for
Windows, and as source code for Unix and Linux. It is available at
the HY-PHY web page
at http://www.hyphy.org.
Akifumi S. Tanabe
of the Division of Ecology and Evolutionary Biology, Department of
Environmental Life Sciences, Graduate School of Life Sciences
of Tohoku University, Japan
(astanabe (at) mail.tains.tohoku.ac.jp)
has released Kakusan4,
a parallelized nucleotide substitution model selection
script written in the Perl language for data sets with multiple partitions.
Kakusan4 supports nucleotide substitution model selection on each partition and/or each
codon position by AIC, AICc or BIC. Because the optimization of likelihoods
is executed using BASEML,
PAUP* or
Treefinder and these can be run in
parallel, Kakusan can take advantage of multi-core systems or multiple
processor systems. The Kakusan Perl script can be run on Windows,
MacOS X, Linux, FreeBSD and on other UNIX operating systems. It accepts
several different input file formats.
It outputs configuration files for Treefinder, MrBayes and PAUP*.
It is described in the paper:
Tanabe, A. S., 2007, Kakusan: a computer program to automate the selection of
a nucleotide substitution model and the configuration of a mixed model on
multilocus data. Molecular Ecology Notes 7: 962-964.
It is available as a Perl script, Windows executables and Mac OS X universal executables. It can be downloaded from
its web site
at http://www.fifthdimension.jp/products/kakusan/. Earlier
versions, Kakusan, Kakusan2, and Kakusan3 can also be downloaded there.
Jonathan Bollback
of the University of Edinburgh, Edinburgh, U.K., and of the Institute
of Science and Technology, Austria
(j.p.bollback (at) ed.ac.uk)
has written MAPPS
(Model Adequacy in Phylogenetics by Predictive Simulation)
version 1.1.6, a program to evaluate the fit of a group of phylogenetic models
to DNA sequence data. The rationale behind this approach is that an adequate
model should be able to predict future data (nucleotide site patterns). In the
absence of future data the model's predictive ability is compared to the
original data set. The model's predictive ability is evaluated through
simulation under the model. Simulated (predictive) data sets are compared
with the original data using the multinomial test statistic. The program uses data and
trees in a format compatible with the output from
MrBayes.
It is described in the paper:
Bollback, J. P. 2002. Bayesian model adequacy and choice in phylogenetics.
Molecular Biology and Evolution 19(7): 1171-1180.
It is available as Mac OS X universal executables. It can be downloaded from
its web site
at http://www.simmap.com/bollback/software.html
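The multinomial test statistic is the log-likelihood of an alignment when each distinct site pattern gets its own free probability: T = sum over patterns i of n_i ln(n_i / N), where n_i is the count of pattern i and N is the number of sites. The sketch computes T and locates the observed value within values from simulated data sets; how the predictive data sets are simulated (MAPPS works from MrBayes output) is outside the sketch, and the quantile summary is an illustrative choice.

```python
import math
from collections import Counter


def site_pattern_counts(alignment):
    """Count distinct column patterns in an alignment given as equal-length strings."""
    return Counter(zip(*alignment))


def multinomial_statistic(alignment):
    """T = sum_i n_i * ln(n_i / N) over distinct site patterns i."""
    counts = site_pattern_counts(alignment)
    n_sites = sum(counts.values())
    return sum(n * math.log(n / n_sites) for n in counts.values())


def predictive_quantile(observed_t, simulated_ts):
    """Fraction of predictive (simulated) statistics below the observed value;
    values near 0 or 1 suggest the model does not reproduce the data well."""
    return sum(t < observed_t for t in simulated_ts) / len(simulated_ts)
```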
Hidetoshi Shimodaira ("Shimo")
of the Department of
Mathematical and Computing Sciences, Tokyo Institute
of Technology, Japan (shimo (at) is.titech.ac.jp)
has released CONSEL version 0.1k, a package of small
programs to calculate P values for tests of phylogenies. It uses
output from other phylogeny programs (in particular it can use output
from PAUP,
PAML,
PHYML, and
MOLPHY) which makes available
to it the sitewise log-likelihoods for some trees and the trees themselves.
It uses these to carry out the Kishino-Hasegawa test, the Shimodaira-Hasegawa
test, a weighted version of the SH test, and a new "approximately
unbiased" test of Shimodaira's. CONSEL is available as C source code
that will compile on Linux and Unix systems that have the gcc
compiler, and it is also available as a DOS executable that will
run on DOS or Windows systems. It can be downloaded from
its web site
at http://www.ism.ac.jp/~shimo/prog/consel/index.html.
It is described in a paper: Shimodaira, H. and M. Hasegawa. 2001.
CONSEL: for assessing the confidence of phylogenetic tree selection.
Bioinformatics 17: 1246-1247. That paper cites the statistical
papers describing the methods.
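The Kishino-Hasegawa test compares two trees through the per-site log-likelihoods that CONSEL reads from other programs: the total difference is divided by a standard error estimated from the variance of the per-site differences. The sketch uses a normal approximation instead of bootstrap resampling, and it is valid only for two trees chosen in advance, which is why the SH and AU tests exist for trees selected using the data.

```python
import math
from statistics import NormalDist, variance


def kh_test(site_lnl_a, site_lnl_b):
    """Two-sided Kishino-Hasegawa test using a normal approximation.

    Inputs are per-site log-likelihoods of trees A and B on the same alignment.
    Returns (total log-likelihood difference, z score, p-value)."""
    diffs = [a - b for a, b in zip(site_lnl_a, site_lnl_b)]
    delta = sum(diffs)
    se = math.sqrt(len(diffs) * variance(diffs))
    if se == 0.0:
        raise ValueError("sitewise differences have zero variance")
    z = delta / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return delta, z, p
```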
Hidetoshi Shimodaira
("Shimo")
of the Department of Mathematical and Computing Sciences
of the Tokyo Institute of Technology, Ookayama, Meguroku, Tokyo, Japan
(shimo (at) is.titech.ac.jp)
has written scaleboot
(approximately unbiased P-values via multiscale bootstrap),
version 0.3-2, an R package for making approximately unbiased P values for
tree topologies. scaleboot implements Shimodaira's Approximately Unbiased
method of putting P values on regions of parameter space, including tree
topologies. The P-values are computed from a set of multiscale bootstrap
probabilities, computed by sampling different fractions of the characters.
The multiscale bootstrap method is also implemented in the program
CONSEL. scaleboot has an
interface for the pvclust clustering package in R. It also has a front end
for phylogenetic inference, and it can replace the CONSEL program for testing
phylogenies. Currently, scaleboot cannot convert files
from other phylogeny packages, so CONSEL must be used for this purpose before
applying scaleboot to calculate an improved version of AU p-values for trees
and branches. The methods are described in the papers:
- Shimodaira, H. 2002. An approximately unbiased test of phylogenetic tree selection. Systematic Biology 51: 492-508.
- Shimodaira, H. 2004. Approximately unbiased tests of regions using multistep-multiscale bootstrap resampling. Annals of Statistics 32: 2616-2641.
It is available as an R package. It can be downloaded from
its web site
at http://www.is.titech.ac.jp/~shimo/prog/scaleboot/index.html
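The core of the multiscale bootstrap can be sketched as a curve fit. With bootstrap replicates of n' characters drawn from n sites, let sigma^2 = n/n'; following Shimodaira (2002), sigma times the normal quantile of 1 - BP is modelled as v + c sigma^2, and then AU = 1 - Phi(v - c) while the ordinary bootstrap probability is 1 - Phi(v + c). This sketch fits the line by ordinary least squares; scaleboot itself fits more flexible curves, so treat this as a simplified illustration with hypothetical function names.

```python
import math
from statistics import NormalDist

_N = NormalDist()


def au_from_multiscale(scales, bps):
    """Approximately unbiased (AU) and ordinary bootstrap (BP) p-values.

    scales: r = n'/n, the bootstrap sample size relative to the number of sites.
    bps: bootstrap probability of the hypothesis (a tree or branch) at each r.
    With sigma^2 = 1/r and z = Phi^{-1}(1 - BP), fits sigma * z = v + c * sigma^2
    by ordinary least squares, then returns (1 - Phi(v - c), 1 - Phi(v + c)).
    """
    xs, ys = [], []
    for r, bp in zip(scales, bps):
        if 0.0 < bp < 1.0:  # BP of exactly 0 or 1 gives an infinite z
            sigma2 = 1.0 / r
            xs.append(sigma2)
            ys.append(math.sqrt(sigma2) * _N.inv_cdf(1.0 - bp))
    if len(set(xs)) < 2:
        raise ValueError("need usable BPs at two or more distinct scales")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    c = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    v = my - c * mx
    return 1.0 - _N.cdf(v - c), 1.0 - _N.cdf(v + c)
```

When the curvature c is positive, the AU value exceeds the ordinary bootstrap probability, reflecting the bias that the multiscale approach corrects.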
Maria Anisimova, Olivier Gascuel, and Jean-François Dufayard
of the Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier (LIRMM)
of the Université de Montpellier II, Montpellier, France
(manisimova (at) hotmail.com)
have produced PHYML-aLRT
(PHYML approximate Likelihood Ratio Test),
version 1.1, a program to carry out likelihood ratio tests of the presence of
branches in a phylogeny. PHYML-aLRT is a modification of the original
PHYML program, designed to test the
reality of branches in a given phylogeny. Five branch support tests are
available: (1) the bootstrap, (2) aLRT statistics, (3) aLRT parametric
(Chi2-based) branch support, (4) aLRT non-parametric branch support
based on a Shimodaira-Hasegawa-like procedure, and (5) a combination of the
last two supports, namely the minimum of the two.
The methods are described in the paper:
Anisimova, M., and O. Gascuel. 2006. Approximate likelihood ratio test for branches: A fast, accurate and powerful alternative. Systematic Biology 55(4): 539-552.
It is available as Windows executables, Linux executables, Solaris executables,
Powermac Mac OS X executables and Intel Mac OS X executables. It can be downloaded from
its web site
at http://atgc.lirmm.fr/phyml/alrt/. This program was of only temporary
usefulness: the
method was made available in PHYML 3.0 and should probably be used from that
program, although these executables are still available for download.
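The aLRT statistic for a branch is twice the log-likelihood difference between the best tree and the better of the two nearest-neighbour-interchange (NNI) alternatives around that branch. The sketch computes it and turns it into the chi-square-based support of option (3), taking the null distribution as an equal mixture of chi-square distributions with 0 and 1 degrees of freedom, as Anisimova and Gascuel describe. Reporting the support as 1 minus the p-value is an assumption about the program's convention, and the log-likelihoods are invented.

```python
import math


def alrt_statistic(lnl_best, lnl_alt1, lnl_alt2):
    """Twice the log-likelihood gain of the best configuration around a branch
    over the better of its two NNI alternatives."""
    return 2.0 * (lnl_best - max(lnl_alt1, lnl_alt2))


def chi2_mixture_support(stat):
    """Parametric support 1 - p, where p comes from a 50:50 mixture of
    chi-square(0) and chi-square(1); chi-square(0) puts all mass at 0."""
    if stat <= 0.0:
        return 0.0
    p = 0.5 * math.erfc(math.sqrt(stat / 2.0))  # 0.5 * P(chi-square(1) > stat)
    return 1.0 - p


def min_support(parametric, sh_like):
    """Option (5): the minimum of the chi-square-based and SH-like supports."""
    return min(parametric, sh_like)


stat = alrt_statistic(-1000.0, -1003.0, -1005.0)  # invented values
support = chi2_mixture_support(stat)
```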
Nick Grassly,
of the Department of Infectious Disease Epidemiology of
the School of Public Health,
Imperial College School of Medicine, St. Mary's Campus, London
(n.grassly (at) imperial.ac.uk)
has written PLATO,
version 2.11, (Partial Likelihoods Assessed Through Optimisation),
a program that takes DNA sequences in sequential PHYLIP format, followed by
their maximum likelihood phylogeny, and detects anomalously evolving regions in
the sequences. It uses a likelihood approach with sliding window analysis,
and assesses the significance of the regions it finds by Monte Carlo
simulation of the null distribution.
This may lead to the detection of, for example, recombination, gene conversion
or convergence, or reveal variable selective pressures along the gene sequence.
A general substitution model is used that can allow the test to reveal
differences due to recombination while ignoring those due to varying rate
of evolution.
The method is described in the paper: Grassly, N. C., and E. C. Holmes. 1997.
A likelihood method for the detection of selection and recombination using
sequence data. Molecular Biology and Evolution 14: 239-247.
It is available as a Mac OS Macintosh binary executable, or in source code for
Unix systems. Although no longer distributed by Grassly, it is available
at the IUBIO web site
at http://microbe.bio.indiana.edu:7131/soft/iubionew/molbio/dna/analysis/Plato/
Iain Milne, Dominik Lindner, and Frank Wright
of Biomathematics and Statistics Scotland
at the Scottish Crop Research Institute, Invergowrie, Dundee, Scotland
(help (at) topali.org)
have released TOPALi
version 2, a program for statistical and evolutionary analysis of multiple
sequence alignments. It is currently at version 2.5. It checks for evidence
of past recombination events by looking for changes in the inferred
phylogenetic tree TOPology between adjacent regions of a multiple
sequence ALignment. Their method detects recombinations by sliding a
window along a sequence alignment, and measuring the discrepancy between
the trees suggested by the first and second halves of the window, using
distance matrix methods. Version 2 includes further statistical tests
for recombination based on nonparametric bootstrapping and allowing for rate
heterogeneity between sites. It can also launch a range of statistical and
evolutionary analyses of multiple sequence alignments as web services running
either locally on your PC or on the HPC cluster in Dundee. These
include phylogenetic model selection (via
ModelGenerator), Bayesian
and maximum Likelihood phylogenetic tree estimation (via
PHYML and
MrBayes),
detection of sites under positive selection (using
PAML), and the recombination
breakpoint location analysis methods.
The versions of TOPALi are described in two papers:
- Milne, I., D. Lindner, M. Bayer, D. Husmeier, G. McGuire,
D. F. Marshall and F. Wright. 2009. TOPALi v2: a rich graphical interface
for evolutionary analyses of multiple alignments on HPC clusters and
multi-core desktops. Bioinformatics 25 (1): 126-127.
- Milne, I., F. Wright, G. Rowe, D. F. Marshall, D. Husmeier, and G. McGuire.
2004. TOPALi: Software for automatic identification of recombinant sequences
within DNA multiple alignments. Bioinformatics 20 (11):
1806-1807.
It is available as Java source code, Java executables, Windows
executables and Linux executables. They can be downloaded from
its web site
at http://www.topali.org. Version 1 of TOPALi has been superseded
by version 2 but is also available, at
the version 1 web page
at http://www.topali.org/topali-v1/
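The sliding-window idea can be illustrated by computing, at each window position, pairwise distance matrices for the left and right halves of the window and measuring how much they disagree. The discrepancy used here (a sum of squared differences between uncorrected p-distances) is a simplified stand-in chosen for illustration, not TOPALi's exact statistic, and the function names are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two equal-length aligned strings."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(seqs, start, end):
    """Pairwise p-distances restricted to alignment columns [start, end)."""
    return {(i, j): p_distance(seqs[i][start:end], seqs[j][start:end])
            for i, j in combinations(range(len(seqs)), 2)}


def window_discrepancies(seqs, window, step):
    """Slide a window along the alignment; for each position return the window
    midpoint and the sum of squared differences between the distance matrices
    of its left and right halves."""
    half = window // 2
    results = []
    for start in range(0, len(seqs[0]) - window + 1, step):
        left = distance_matrix(seqs, start, start + half)
        right = distance_matrix(seqs, start + half, start + window)
        score = sum((left[k] - right[k]) ** 2 for k in left)
        results.append((start + half, score))
    return results
```

A peak in the discrepancy marks a point where the two halves of the window suggest different groupings of the sequences, which is the signature of a possible recombination breakpoint.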
Kim Fisker, then
of the Computer Science Department at Aarhus University, Denmark
released RecPars, which does
a parsimony analysis of DNA sequences. It was more
recently maintained by Thomas Christensen of that department. It tries to find
the best phylogenies for different regions of the sequences,
thereby postulating recombination events between these segments.
The method is described in a paper: Hein, J. 1993. A heuristic method to
reconstruct the history of sequences subject to recombination.
Journal of Molecular Evolution 36: 396-406.
RecPars is available as C source code for Unix. It is distributed from
its
web site
at http://www.daimi.au.dk/~compbio/recpars/recpars.html.
A web server is available there as well.
Dan Gusfield (gusfield (at) cs.ucdavis.edu) and Ren-Hua Chung
(rchung (at) ucdavis.edu), both of the Department of Computer Science
at the University of California, Davis, have released PPH
(Perfect Phylogeny Haplotyper). PPH takes a set of diploid genotypes for SNP
(single nucleotide polymorphism) markers, and infers haplotypes for them. It
does this by seeing whether it can find a set of haplotypes that resolve all
diploid genotypes and that fit onto a tree without requiring any extra changes
of nucleotides (in other words, they are all compatible with the same tree).
The result is not only the haplotype resolution but the resulting tree, if any.
The method is described in a paper: Gusfield, D. 2002. Haplotyping as perfect
phylogeny: conceptual framework and efficient solutions, pp. 165-175 in
Proceedings of RECOMB 2002, edited by G. Myers, S. Hannenhalli,
D. Sankoff, S. Istrail, P. Pevzner et al. ACM Press, New York. The program is
available as C++ and Perl source code, and as executables for Windows, for
SUN SPARC Solaris, for Intel/AMD-compatible Linux, and for Mac OS X from
its web site at http://wwwcsif.cs.ucdavis.edu/~gusfield/pph.html.
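Whether a set of binary (0/1) haplotypes fits a perfect phylogeny can be checked with the four-gamete test: the haplotypes fit a tree with no extra changes exactly when no pair of sites shows all four combinations 00, 01, 10 and 11. The sketch applies that test to candidate phasings of genotypes written over {0, 1, 2}, with 2 marking a heterozygous site. It tries every phasing, which takes exponential time and is only suitable for tiny examples; PPH itself uses far more efficient algorithms, and the function names are hypothetical.

```python
from itertools import combinations, product


def four_gamete_conflicts(haplotypes):
    """Pairs of sites (i, j) at which all four gametes 00, 01, 10, 11 occur."""
    conflicts = []
    for i, j in combinations(range(len(haplotypes[0])), 2):
        if len({(h[i], h[j]) for h in haplotypes}) == 4:
            conflicts.append((i, j))
    return conflicts


def fits_perfect_phylogeny(haplotypes):
    return not four_gamete_conflicts(haplotypes)


def resolutions(genotype):
    """All unordered haplotype pairs explaining a genotype string over {0,1,2}."""
    het = [k for k, g in enumerate(genotype) if g == "2"]
    pairs = set()
    for bits in product("01", repeat=len(het)):
        h1, h2 = list(genotype), list(genotype)
        for k, b in zip(het, bits):
            h1[k] = b
            h2[k] = "1" if b == "0" else "0"
        pairs.add(tuple(sorted(("".join(h1), "".join(h2)))))
    return sorted(pairs)


def brute_force_pph(genotypes):
    """Return the first phasing (one haplotype pair per genotype) whose
    haplotypes pass the four-gamete test, or None if there is none."""
    for choice in product(*(resolutions(g) for g in genotypes)):
        haps = [h for pair in choice for h in pair]
        if fits_perfect_phylogeny(haps):
            return choice
    return None
```

For genotypes "22", "01" and "10", phasing "22" as 00/11 would create all four gametes, so the only perfect-phylogeny phasing is 01/10.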
Marc Suchard and Vladimir Minin
of the Department of Biomathematics
at the University of California, Los Angeles
(msuchard (at) ucla.edu)
have released DualBrothers
version 1.1, recombination detection software based on the dual Multiple Change-Point (MCP) model.
The model allows for changes in topology and evolutionary rates across sites in a multiple sequence alignment. It uses a Bayesian approach together with MCMC (Markov chain Monte Carlo) sampling to simulate from the posterior distribution of the dual MCP model parameters.
It is described in the papers:
- Minin, V. N., K. S. Dorman, and M. A. Suchard. 2005. Dual multiple change-point model leads to more accurate recombination detection,
Bioinformatics 21: 3034-3042.
- Suchard, M. A., R. E. Weiss, K. S. Dorman, and J. S. Sinsheimer. 2003. Inferring spatial phylogenetic variation along nucleotide sequences: a multiple change-point model. Journal of the American Statistical Association 98: 427-437.
- Suchard, M. A., R. E. Weiss, K. S. Dorman, and J. S. Sinsheimer. 2002. Oh brother, where art thou? a Bayes factor test for recombination with uncertain heritage. Systematic Biology 51: 715-728.
It is available as Java code which needs the user to also download
the Colt scientific library for Java. It can be downloaded from
its web site
at http://www.biomath.ucla.edu/msuchard/DualBrothers/
Karin Dorman
of the Department of Genetics, Development and Cell Biology
of Iowa State University, Ames, Iowa
(kdorman (at) iastate.edu)
has written cBrother, a C version, with extensions, of the Java program
DualBrothers, which was developed by
Suchard et al. as a Bayesian multiple change-point model to test for the presence of rare recombination events in the history of a set of sampled sequences.
It is available as C source code. It can be downloaded from
its web site
at http://rumi.gdcb.iastate.edu/software/index.xml
Simone Linz, Achim Radtke, and Arndt von Haeseler
of the Center of Integrative BioInformatics Vienna
of the University of Vienna, Austria
(arndt.von.haeseler (at) univie.ac.at
and linz (at) cs.uni-duesseldorf.de)
have written HGT
(Horizontal Gene Transfer),
a program to test for the presence of horizontal gene transfer. HGT considers
the distribution of trees obtained from a set of different genes, and then
simulates the trees obtained with a single species tree and different rates of
horizontal gene transfer. The estimation of the rate of horizontal gene
transfer is made based on the extent of differences among individual gene
trees in the simulation and in the observed set of loci.
The methods are described in the paper:
Linz, S., A. Radtke, and A. von Haeseler. 2007. A likelihood framework to
measure horizontal gene transfer. Molecular Biology and Evolution
24: 1312-1319.
HGT is available as C source code. It can be downloaded from
its web site
at http://www.cibiv.at/software/hgt/
Darren P. Martin and Ed Rybicki
of the Microbiology Department
of the University of Cape Town, Cape Town, South Africa
(Darrin.Martin (at) uct.ac.za)
have released RDP3
(Recombination Detection Program),
version 3.27, a program that applies a large number of recombination detection
and analysis algorithms. This includes many of the methods used in other
recombination-detection programs. In all it has about 12 different methods.
The software runs under Windows
and combines highly automated screening of large numbers of sequences with a
highly interactive interface for examining the results of the analyses.
It is described in the paper:
Martin, D. P., and E. P. Rybicki. 2000. RDP: detection of recombination
amongst aligned sequences. Bioinformatics 16: 562-563.
It is available as Windows executables. It can be downloaded from
its web site
at http://darwin.uvigo.es/rdp/rdp.html. An older
version, RDP2, is also available there, as is an "unstable" early release of
RDP4.
Robert Beiko and Nicholas Hamilton
of the Institute for Molecular Bioscience
at the University of Queensland, Australia
(beiko (at) cs.dal.ca)
have released EEEP
(Efficient Evaluation of Edit Paths),
version 1.0, a program for inference of lateral genetic transfer by
comparison of phylogenetic trees. EEEP performs subtree prune-and-regraft
(SPR) operations on a rooted reference tree to reconcile it with a
user-supplied tree inferred from data. The rooting of the reference tree is
used to constrain the SPR operations that are allowed. The test tree need not
be rooted or binary, and may contain an incomplete subset of the taxa
represented in the reference tree.
EEEP has been successfully compiled under RedHat Linux and AIX, as well as
in Mac OS X and Windows XP. It is described in the paper:
Beiko, R. G., and N. Hamilton. 2006. Phylogenetic identification of lateral
genetic transfer events. BMC Evolutionary Biology 6: 15,
in which it was used to infer lateral genetic transfer (LGT) events in 16,000 genes.
It is available as C++ source code, Windows executables and Linux executables. It can be downloaded from
its web site
at http://bioinformatics.org.au/eeep
Gary Olsen
of the Department of Microbiology, University
of Illinois, Urbana, Illinois (gary (at) phylo.life.uiuc.edu) has written
dnarates version 1.1.0. It reads a set of DNA sequences
and a tree, and for that tree makes a maximum likelihood estimate of the
rate of evolution at each site. This is done by taking the rate at each
site as a separate parameter and maximizing the likelihood with respect to
all those parameters. The program is available as generic C source code.
It is based in part (with my permission) on code from my PHYLIP program DNAML. dnarates is available from
the IUBIO phylogeny software page at
http://iubio.bio.indiana.edu/soft/molbio/evolve/
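Estimating a separate rate for each site amounts to maximizing that site's likelihood on the fixed tree over a single rate multiplier applied to every branch length. The sketch computes site likelihoods by the pruning algorithm under the Jukes-Cantor model (chosen for brevity; dnarates' own substitution model may differ) and maximizes over the rate by golden-section search on a log scale, which assumes the site likelihood is unimodal in the rate. The tree format and function names are hypothetical.

```python
import math

BASES = "ACGT"


def jc_prob(same, t):
    """Jukes-Cantor probability of ending in the same (or one given different)
    base after branch length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if same else 0.25 - 0.25 * e


def partials(node, site, rate):
    """Conditional likelihoods of A, C, G, T at a node (pruning algorithm).
    A node is a leaf name (str) or a list of (child, branch_length) pairs;
    every branch length is multiplied by the site's rate."""
    if isinstance(node, str):
        return [1.0 if b == site[node] else 0.0 for b in BASES]
    result = [1.0] * 4
    for child, t in node:
        below = partials(child, site, rate)
        for i in range(4):
            result[i] *= sum(jc_prob(i == j, rate * t) * below[j] for j in range(4))
    return result


def site_log_likelihood(tree, site, rate):
    """Log-likelihood of one site, with equal base frequencies at the root."""
    return math.log(sum(0.25 * x for x in partials(tree, site, rate)))


def ml_site_rate(tree, site, lo=1e-6, hi=100.0, iters=100):
    """Golden-section search for the maximizing rate over log(rate) in [log lo, log hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    f = lambda x: site_log_likelihood(tree, site, math.exp(x))
    a, b = math.log(lo), math.log(hi)
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(iters):
        if fc > fd:  # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:  # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    return math.exp((a + b) / 2.0)
```

A site with no changes has its maximum at the lower search bound, and for some patterns (for example two sequences that differ) the likelihood keeps rising towards a plateau, so the search ends at the upper bound.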
Bette Korber
of the Theoretical Division,
Los Alamos National Laboratory, Los Alamos, New Mexico
(btk (at) t10.lanl.gov) and her colleagues have released
RevDNArates which is a version of Gary Olsen's
program dnarates which uses the REV (general
reversible) model of DNA evolution and calculates the maximum likelihood
estimate of rate of change at each site (one parameter per site). They
used it for the
results in the paper: B. Korber, M. Muldoon, J. Theiler, F. Gao, R. Gupta,
A. Lapedes, B. H. Hahn, S. Wolinsky and T. Bhattacharya. 2000. Timing the
ancestor of the HIV-1 pandemic strains. Science 288:
1789-1796. The program is available as C source code for Unix from
the web site for the programs from that paper at
http://www.santafe.edu/~btk/science-paper/bette.html.
Sonja Meyer and Arndt von Haeseler, then of the Institut
für Bioinformatik, Heinrich Heine Universität, Düsseldorf,
Germany (von Haeseler is now at the Center for Integrative Bioinformatics
Vienna, and his email address is arndt.von.haeseler (at)
univie.ac.at) have released PARAT,
version 0.9.1. This program infers a phylogeny and also site-specific
evolutionary rates (one for each site). It can do so for up to 100 sequences
directly. Above 100 sequences, it samples sets of sequences and estimates
the rates from each such set, and then averages the resulting rates.
It is distributed as open source C source code, which can readily be compiled
and installed. PARAT is described in a paper: Meyer, S. and A. von Haeseler.
2003. Identifying site specific substitution rates. Molecular Biology
and Evolution 20: 182-189. It is available at
its web site
at http://www.cibiv.at/software/parat/
Itay Mayrose
of the Department of Cell Research and Immunology
of the George S. Wise Faculty of Life Sciences, Tel Aviv University, Israel
(itaymay (at) post.tau.ac.il)
has written Rate4Site
version 2.01, a program to estimate rates of evolution at different sites in
protein sequences. Rate4Site uses aligned protein sequences, constructs a
tree by neighbor-joining or uses a user-defined input tree, and then infers
the branch lengths and the rates of evolution at the sites. These are assumed
to be drawn from a Gamma distribution and can be estimated either by
maximizing the likelihood of the tree with respect to each of the rates, or by
using a Bayesian inference with the Gamma distribution as the prior (the
parameters of the Gamma distribution are estimated empirically so that this is
an Empirical Bayes method).
The methods are described in the paper:
Mayrose, I., D. Graur, N. Ben-Tal and T. Pupko. 2004. Comparison of
site-specific rate-inference methods for protein sequences: Bayesian methods are superior.
Molecular Biology and Evolution 21: 1781-1791.
It is available as C++ source code and Windows executables. It can be downloaded from
its web site
at http://www.tau.ac.il/~itaymay/cp/rate4site.html
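With a discrete Gamma prior, the empirical Bayes estimate for a site is the posterior mean of its rate over the rate categories, each category weighted by the site likelihood at that rate (the categories have equal prior probability, so the prior weights cancel). In the sketch the category rates are placeholder values; in practice they would come from a Gamma distribution whose shape is estimated from the data, which is what makes the method empirical Bayes. The example likelihood is a hypothetical two-sequence case, not Rate4Site's tree-based computation.

```python
import math


def posterior_mean_rate(site_likelihood, category_rates):
    """Empirical Bayes rate for one site: the posterior mean over
    equal-probability rate categories.
    site_likelihood(rate) returns the site's likelihood at that rate."""
    likes = [site_likelihood(r) for r in category_rates]
    total = sum(likes)
    return sum(r * l for r, l in zip(category_rates, likes)) / total


def jc_pair_site(differs, t):
    """Hypothetical example likelihood: one site in two sequences separated by
    branch length t under Jukes-Cantor, as a function of the site's rate."""
    def like(rate):
        e = math.exp(-4.0 * rate * t / 3.0)
        return 0.25 * (0.25 - 0.25 * e if differs else 0.25 + 0.75 * e)
    return like
```

Unlike a per-site maximum likelihood estimate, the posterior mean is always finite and is pulled towards the bulk of the prior, which helps at sites carrying little information.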
Itay Mayrose and Tal Pupko
of the Department of Cell Research and Immunology
of Tel Aviv University, Tel Aviv, Israel
(itaymay (at) post.tau.ac.il)
have produced McRate
(Markov Chain monte carlo RATE estimation),
version 1.0, a program to estimate rates of evolution at different sites.
McRate calculates the relative evolutionary rate at each site using a
probabilistic evolutionary model. This allows taking into account the
stochastic process underlying sequence evolution within protein families. Most
importantly, McRate uses Bayesian Markov chain Monte Carlo (MCMC) methodology
to integrate over the space of all possible trees. Hence McRate does not
assume a single known phylogenetic tree relating the sequences.
Its authors describe McRate as superior to methods that rely on a single tree only.
Its methods and the program are described in the papers:
- Mayrose, I., D. Graur, N. Ben-Tal, and T. Pupko. 2004. Comparison of site-specific rate-inference methods for protein sequences: Bayesian methods are superior. Molecular Biology and Evolution 21: 1781-1791.
- Mayrose, I., A. Mitchell, and T. Pupko. 2004. Site-specific evolutionary rate inference: taking phylogenetic uncertainty into account. Journal of Molecular Evolution 60(3): 315-326.
It is available as C++ source code and Windows executables. It can be downloaded from
its web site
at http://www.tau.ac.il/~talp/MCMC/McRate.html
Jianzhi George Zhang, now of the Laboratory of Genomic and Molecular Evolution
in the Department of Ecology and Evolutionary Biology of the University of
Michigan, Ann Arbor, Michigan
(jianzhi (at) umich.edu)
and Xun Gu, now at the Department of Genetics, Development, and Cell Biology at
Iowa State University, Ames, Iowa
(xgu (at) iastate.edu)
wrote GZ-Gamma, a program
for estimation of the expected number of substitutions at each amino acid
or nucleotide site and the shape parameter of a Gamma distribution of rates of
evolution at different sites. The program
takes a phylogeny and infers the sequences at interior nodes of the tree using
a Bayesian method, and then uses these to infer changes and make a histogram
of changes among sites, then using that to infer the shape parameter of a
Gamma distribution that fits that histogram. The method and
program were described in a paper:
Gu, X. and J. Zhang. 1997. A simple method for estimating the parameter of
substitution rate variation among sites. Molecular Biology and Evolution
14: 1106-1113. It is available as C source code and as MSDOS
executables from the software web site of Masatoshi Nei's lab in
which the work was done. A zip archive of the files can be downloaded from
the link there for “Gamma”. A documentation file is also available
there from the “readme” link.
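Fitting a Gamma shape parameter to a histogram of changes per site can be illustrated with the classical method-of-moments estimator: if the number of changes at a site is Poisson with a Gamma-distributed mean, the counts have variance m + m^2/alpha, where m is their mean, so alpha is estimated as m^2 / (s^2 - m). This is shown only to illustrate the idea; it is an assumption that it matches Gu and Zhang's estimator, which works from changes inferred on the tree and may include further corrections. The counts in the test are invented.

```python
import math
from statistics import fmean, variance


def gamma_shape_moments(changes_per_site):
    """Method-of-moments estimate of the Gamma shape alpha from the number of
    inferred changes at each site (Poisson counts with Gamma-distributed means)."""
    m = fmean(changes_per_site)
    v = variance(changes_per_site)
    if v <= m:
        return math.inf  # no over-dispersion: consistent with equal rates
    return m * m / (v - m)
```

A small alpha indicates strong rate variation among sites, while an infinite estimate means the counts show no more spread than equal rates would produce.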
Jessica Leigh, Ed Susko, Manuela Baumgartner, and Andrew Roger
of the Department of Biochemistry and Molecular Biology and the Department
of Mathematics and Statistics of Dalhousie University, Halifax,
Nova Scotia, Canada
(jleigh (at) dal.ca)
have written Concaterpillar
version 1.2, a program that carries out a hierarchical likelihood ratio test
for phylogenetic congruence. It tests two kinds of hypotheses in
supermatrix analysis. The first is the null hypothesis (H0) that the
phylogenies of the markers in the supermatrix are congruent. If congruence
cannot be rejected for a set of markers, the second hypothesis tested is
whether the markers to be combined have significantly different evolutionary
dynamics (branch lengths and rates-across-sites parameters), that is, whether
they should be concatenated or analyzed separately.
The methods are described in the paper:
Leigh, J. W., E. Susko, M. Baumgartner, and A. J. Roger. 2008. Assessing congruence
in phylogenomic data. Systematic Biology 57: 104-115.
It is available as a Python script. It uses the program
RAxML to infer trees, and the SciPy
Python library as well. It can be downloaded from
its web site
at http://rogerlab.biochemistryandmolecularbiology.dal.ca/Software/Software.htm#Concaterpillar
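The building block of such tests is the likelihood ratio statistic 2(ln L1 - ln L0), compared with a chi-squared distribution whose degrees of freedom equal the number of extra free parameters. The sketch below computes the statistic and its chi-squared tail probability in plain Python. The log-likelihoods, the degrees of freedom, and the function names are hypothetical, and the chi-squared reference distribution is itself an assumption. This is not Concaterpillar's code and does not reproduce its hierarchical procedure.

```python
import math

# Hypothetical helper names; this is not Concaterpillar's code.

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-squared distribution with
    integer df, from the closed-form series for even and odd df."""
    if x <= 0.0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)**k / k!
        term = total = 1.0
        for k in range(1, df // 2):
            term *= (x / 2.0) / k
            total += term
        return math.exp(-x / 2.0) * total
    # Odd df: erfc(sqrt(x/2)) plus correction terms when df >= 3.
    total = math.erfc(math.sqrt(x / 2.0))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total

def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return (2 * (lnL_alt - lnL_null), chi-squared p-value)."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf(stat, df)

# Hypothetical values: two markers sharing one set of branch lengths and one
# Gamma shape (null) versus separate values for each marker (alternative).
# For an unrooted 5-taxon tree that is 7 branch lengths + 1 shape, so df = 8.
stat, p = likelihood_ratio_test(lnl_null=-10234.7, lnl_alt=-10226.2, df=8)
print(f"LRT statistic = {stat:.2f}, p = {p:.4f}")
```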
Haichun Wang, Matthew Spencer, Ed Susko, and Andrew Roger
of the Department of Mathematics and Statistics and of the Department of Biochemistry and Molecular Biology
of Dalhousie University, Halifax, Nova Scotia, Canada
(hcwang (at) mathstat.dal.ca)
have produced PROCOV
(PROtein COVarion analysis),
version 1.3.2, a program for maximum likelihood estimation of phylogeny under
protein covarion models. PROCOV computes the likelihood of a given tree under
the rates-across-sites model or under the covarion-like models of Tuffley and
Steel, of Huelsenbeck, and of Galtier, as well as under a
general model that combines features of both the Huelsenbeck and Galtier
models. PROCOV can also search tree space by subtree
pruning and regrafting (SPR). It is computationally very slow,
so it is most useful for small trees.
It is described in the paper:
Wang, H-C, M. Spencer, E. Susko, and A. J. Roger. 2007. Testing for covarion-like evolution in protein sequences. Molecular Biology and Evolution 24: 294-305.
It is available as C source code. The authors suggest using the BLAS matrix
library when compiling it. It can be downloaded from
its web site
at http://www.mathstat.dal.ca/~hcwang/procov.html
Nick Goldman
(goldman (at) ebi.ac.uk) of the European
Bioinformatics Institute, Hinxton, UK and his group have produced
EDIBLE, a program for Experimental Design and Information By
Likelihood Exploration, version 1.00. It allows the user to read in a phylogeny
and explore how changing branch lengths in the tree, or moving branch lengths
around, affects the likelihood, the information matrix (the second derivatives
of the likelihood with respect to the parameters), and measures of overall
information.
It can also carry out simulations, producing multiple data sets on the
tree in question.
The program is described in two papers:
- Goldman, N. 1998. Phylogenetic information and experimental design in molecular systematics. Proceedings of the Royal Society of London B 265: 1779-1786.
- Massingham, T. and N. Goldman. 2000. EDIBLE: experimental design and information calculations in phylogenetics. Bioinformatics 16: 294-295.
The program is available as C source
code and as Windows and Digital Unix executables. It can be downloaded from
its web site at
http://www.ebi.ac.uk/goldman/info/edible.html at the EBI site.
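To make "information" concrete in the simplest case: for one branch separating two sequences under the Jukes-Cantor model, the observed information about the branch length t is minus the second derivative of the log-likelihood at its maximum. The sketch below computes it by finite differences. The model, the data, and the function names are assumptions for illustration. This is not EDIBLE's implementation, which works with whole trees and full information matrices.

```python
import math

# Hypothetical helper names; this is not EDIBLE's code.

def jc_loglik(t, n_sites, n_diff):
    """Jukes-Cantor log-likelihood of branch length t (expected substitutions
    per site) for two aligned sequences with n_diff differing sites, up to a
    constant."""
    e = math.exp(-4.0 * t / 3.0)
    p_same = 0.25 + 0.75 * e   # P(base unchanged at the far end of the branch)
    p_other = 0.25 - 0.25 * e  # P(one particular different base at the far end)
    return (n_sites - n_diff) * math.log(p_same) + n_diff * math.log(p_other)

def jc_mle(n_sites, n_diff):
    """Closed-form ML estimate of t; requires n_diff / n_sites < 3/4."""
    p = n_diff / n_sites
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def observed_information(f, x, h=1e-4):
    """Minus the second derivative of f at x, by central differences."""
    return -(f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

n, k = 500, 60   # hypothetical data: 60 differences among 500 sites
t_hat = jc_mle(n, k)
info = observed_information(lambda t: jc_loglik(t, n, k), t_hat)
print(f"t_hat = {t_hat:.4f}, observed information = {info:.1f}, "
      f"approximate standard error = {1.0 / math.sqrt(info):.4f}")
```

Larger information means a more precisely determined branch length; its inverse square root is the usual large-sample standard error.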
Bret Larget, of the Departments of Statistics and Botany at the
University of Wisconsin, Madison (larget (at) stat.wisc.edu)
and Donald Simon
of the Department of
Mathematics and Computer Science, Duquesne University, Pittsburgh,
Pennsylvania (simon (at) mathcs.duq.edu) have written
BAMBE (Bayesian Analysis in Molecular Biology and
Evolution) version 4.01a, a program for Bayesian analysis of phylogenies
with DNA sequence data. It uses a prior distribution of trees
and a rearrangement mechanism introduced in the paper:
Mau, B., M. A. Newton, and B. Larget. 1997. Bayesian phylogenetic inference
via Markov chain Monte Carlo methods. Molecular Biology and
Evolution 14: 717-724.
The trees and parameter values are sampled by Markov chain Monte Carlo
(MCMC) using the Metropolis algorithm.
The resulting posterior
distribution can be used to characterize the uncertainty about not
only the tree, but the parameters of the substitution model as
well.
The program is in C++ source code for Unix, and is distributed from
his web site
at http://www.stat.wisc.edu/~larget/. A Windows executable
of an earlier version is also available there. The 2.03 and earlier
versions are also available at a web page at Duquesne
University. BAMBE is also
available as a web server
at the Institut Pasteur in Paris.
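The Metropolis step at the heart of such a sampler can be shown on a much smaller problem. The sketch below samples the posterior of a single Jukes-Cantor branch length between two sequences, with an exponential prior and a symmetric random-walk proposal reflected at zero. The model, prior, proposal width, data, and function names are all assumptions for illustration. BAMBE itself samples whole trees and substitution-model parameters using the proposals described by Mau et al.

```python
import math
import random

# Toy problem: one branch length t between two sequences under Jukes-Cantor.
# The prior, proposal, data, and names below are assumptions, not BAMBE's.

def jc_loglik(t, n_sites, n_diff):
    """Jukes-Cantor log-likelihood of t for two sequences (up to a constant)."""
    e = math.exp(-4.0 * t / 3.0)
    return ((n_sites - n_diff) * math.log(0.25 + 0.75 * e)
            + n_diff * math.log(0.25 - 0.25 * e))

def log_posterior(t, n_sites, n_diff, prior_rate=10.0):
    """Unnormalized log posterior with an Exponential(prior_rate) prior on t."""
    if t <= 0.0:
        return -math.inf
    return jc_loglik(t, n_sites, n_diff) - prior_rate * t

def metropolis(n_sites, n_diff, n_iter=20000, width=0.05, seed=1):
    """Random-walk Metropolis sampler for t; returns samples and acceptance rate."""
    rng = random.Random(seed)
    t = 0.1
    lp = log_posterior(t, n_sites, n_diff)
    samples, accepted = [], 0
    for _ in range(n_iter):
        # Uniform step reflected at zero; the reflected proposal is still
        # symmetric, so no Hastings correction is needed.
        t_new = abs(t + rng.uniform(-width, width))
        lp_new = log_posterior(t_new, n_sites, n_diff)
        # Metropolis rule: accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, lp_new - lp)):
            t, lp = t_new, lp_new
            accepted += 1
        samples.append(t)
    return samples, accepted / n_iter

samples, acceptance = metropolis(n_sites=100, n_diff=10)
kept = samples[2000:]                      # discard burn-in
print(f"acceptance = {acceptance:.2f}, "
      f"posterior mean of t = {sum(kept) / len(kept):.3f}")
```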
Mark Pagel and Andrew Meade
of the School of Biological Sciences
of the University of Reading, Reading, U.K.
(m.pagel (at) reading.ac.uk)
have written BayesPhylogenies, version 1.1, a program for
estimating phylogenies by Bayesian inference. BayesPhylogenies uses Bayesian
Markov Chain Monte Carlo (MCMC) or Metropolis-coupled Markov chain Monte Carlo
(MCMCMC) methods. The program allows a range of models of gene sequence
evolution, models for morphological traits, models for rooted trees, gamma and
beta distributed rate-heterogeneity, and implements a mixture model that allows
the user to fit more than one model of sequence evolution without partitioning
the data. It is described in the paper:
Pagel, M. and A. Meade. 2004. A phylogenetic mixture model for detecting pattern-heterogeneity in gene sequence or character-state data. Systematic Biology 53: 571-581.
It is available as Windows executables, Linux executables, and PowerMac Mac OS
X executables. It can be downloaded from
its web site
at http://www.evolution.rdg.ac.uk/BayesPhy.html
Nicolas Lartillot
of the LIRMM (Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier)
of the Université de Montpellier II, Montpellier, France
(nicolas.lartillot (at) lirmm.fr)
has written PhyloBayes
version 2.1c, a Bayesian phylogeny package for protein sequences using a mixture model. PhyloBayes is a Bayesian Markov chain Monte Carlo (MCMC) sampler for phylogenetic reconstruction using protein alignments. Compared to other phylogenetic MCMC samplers, the main distinguishing feature of PhyloBayes is the underlying probabilistic model, CAT. This is a mixture model especially devised to account for site-specific features of protein evolution. It is particularly well suited for large multigene alignments. PhyloBayes can also
do divergence time estimation with a relaxed molecular clock, posterior predictive analyses, including a compositional homogeneity test,
and data recoding (analogous to R/Y coding, but for amino-acids).
The CAT model is described in the paper: Lartillot, N. and H. Philippe. 2004. A Bayesian mixture model for across-site heterogeneities in the amino-acid replacement process. Molecular Biology and Evolution 21(6): 1095-1109.
PhyloBayes is a package of programs that operate together to do the steps of
the analysis. It is distributed as C++ source code and Linux executables. The
C++ source code can be compiled on Linux, Windows, or Mac OS X systems.
It can be downloaded from
its web site
at http://www.atgc-montpellier.fr/phylobayes/binaries.php
A web server version is also available.
John Huelsenbeck
(johnh
(at) berkeley.edu)
of the Department of Integrative Biology of the University of
California, Berkeley, and Fredrik Ronquist
(Fredrik.Ronquist (at) nrm.se)
of the Naturhistoriska riksmuseet, Stockholm, Sweden
have written
MrBayes, version 3.1.2, a program for Bayesian inference of
phylogenies from nucleic acid sequences, protein sequences, and
morphological characters. It assumes a
prior distribution of tree topologies and uses Markov Chain Monte
Carlo (MCMC) methods to search tree space and infer the posterior
distribution of topologies. It reads sequence data in the NEXUS file
format, and outputs posterior distribution estimates of trees and
parameters. It can also use a hierarchical Bayesian framework to infer
sites that are under natural selection. It allows for rate variation
among sites and a variety of models of sequence evolution.
MrBayes is available as a Macintosh (PowerMac) executable, a Windows
executable, or as source code in C. It allows for multiple-chain
Metropolis-coupled Markov Chain Monte Carlo (MC3) runs for
more extensive search, and can be asked to spread jobs over a cluster of
computers using the MPI message-passing interface.
(Incidentally, since
Bayes was Reverend Bayes, shouldn't it be named RevBayes?).
MrBayes executables, source code, and documentation are available from
the MrBayes web page
at http://mrbayes.net.
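In Metropolis-coupled MCMC, chain i samples the posterior raised to a power beta_i of at most 1 (the "cold" chain has beta = 1), and pairs of chains periodically propose to exchange states. The sketch below shows the standard swap acceptance rule together with an incremental heating scheme of the form beta_i = 1/(1 + lambda * i). The value of lambda, the state labels, and the function names are assumptions; MrBayes's own heating scheme and defaults should be checked in its manual.

```python
import math
import random

# Hypothetical helper names; this is not MrBayes code.

def heating_betas(n_chains, lam=0.1):
    """Incremental heating: chain 0 is cold (beta = 1), later chains hotter."""
    return [1.0 / (1.0 + lam * i) for i in range(n_chains)]

def swap_log_ratio(beta_i, beta_j, logpost_i, logpost_j):
    """Log acceptance ratio for exchanging the states of chains i and j.

    Chain k targets posterior**beta_k, so the ratio is
    [p(x_j)**b_i * p(x_i)**b_j] / [p(x_i)**b_i * p(x_j)**b_j].
    """
    return (beta_i - beta_j) * (logpost_j - logpost_i)

def try_swap(states, logposts, betas, rng):
    """Pick two distinct chains at random and swap their states if accepted."""
    i, j = rng.sample(range(len(states)), 2)
    log_r = swap_log_ratio(betas[i], betas[j], logposts[i], logposts[j])
    if rng.random() < math.exp(min(0.0, log_r)):
        states[i], states[j] = states[j], states[i]
        logposts[i], logposts[j] = logposts[j], logposts[i]
        return True
    return False

rng = random.Random(42)
print([round(b, 3) for b in heating_betas(4)])   # [1.0, 0.909, 0.833, 0.769]
states = ["tree_a", "tree_b"]        # hypothetical chain states
logposts = [-5230.0, -5221.5]        # the hotter chain has found a better state
print(try_swap(states, logposts, [1.0, 0.909], rng), states)
# True ['tree_b', 'tree_a']
```

A better state found by a hot chain is always passed down to a colder chain, which is how heated chains help the cold chain escape local peaks in tree space.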
Torsten Eriksson
of the Bergius Botanical
Garden, Stockholm, Sweden (torsten (at) bergianska.se)
makes available MrBayes tree scanners. These are
two Perl scripts that scan the output parameter files produced by MrBayes.
One saves the tree corresponding to the best sample. The other saves all
trees that contain a specific node (a specific grouping). They are
distributed together, and available from
his software distribution site
at http://www.bergianska.se/index_forskning_soft.html.
Marc Suchard
of the Department of Biomathematics
of the David Geffen School of Medicine at UCLA, Los Angeles
(msuchard (at) ucla.edu)
has written MrBayesPlugin, a Java plugin module enabling
Geneious to run MrBayes. With it, Geneious v2.5.4 (or later) can perform and analyze simple Bayesian phylogenetic reconstructions using MrBayes.
It is available as Java executables. It can be downloaded from
its web site
at http://www.biomath.ucla.edu/msuchard/software/software.htm
Alexei Drummond, of the Department of
Computer Science of the University of Auckland, New Zealand (alexei (at) cs.auckland.ac.nz)
and Andrew Rambaut
(a.rambaut (at) ed.ac.uk), of
the Institute for Evolutionary Biology, University of Edinburgh, Scotland, and
formerly of the Department of Zoology, University of Oxford,
Oxford, U.K., have developed BEAST (Bayesian Evolutionary
Analysis Sampling Trees), version 1.4.1. This is a general Bayesian
inference program for parameters of evolutionary models when the trees
are coalescent trees. A variety of nucleotide substitution models
including relaxed molecular clocks are allowed, and population models that
include exponential population growth and divergence time between populations
are included. Most of the analyses use Bayesian sampling to infer
parameters by averaging over the posterior on the trees. For the purposes
of this listing, the two relevant features are the ability to output a
sample of the trees, so that the program can be used for Bayesian tree
inference in clocklike models, and the ability to infer the divergence time
between populations. The general approach used by BEAST is described
in the paper: Drummond, A. J., G. K. Nicholls, A. G. Rodrigo, and W. Solomon.
2002. Estimating mutation parameters, population history and genealogy
simultaneously from temporally spaced sequence data. Genetics
161: 1307-1320. BEAST is available as a Java executable which will
run on any system with Java 1.4 or later. There are specific packages
available for Mac OS X and for Windows as well as the general distribution.
These are all distributed from
its web site at http://beast.bio.ed.ac.uk/Main_Page
Alexei Drummond, of the Department of
Computer Science of the University of Auckland, New Zealand
(alexei (at) cs.auckland.ac.nz)
and Andrew Rambaut
(a.rambaut (at) ed.ac.uk), of the
Institute of Evolutionary Biology at the University of Edinburgh, Scotland, U.K.
have released Tracer, version 1.2. This is
a program for analyzing the results of Bayesian sampling runs using either
BEAST or MrBayes. It allows
analysis of the progress of sampling the parameters. For the purposes of
this listing, the relevant feature is an ability to use the trees sampled
by these programs to do a Bayesian skyline plot analysis of changes in
effective population size through time. Tracer is available as a Java executable from its web site
at http://tree.bio.ed.ac.uk/software/tracer/
with specific packages for Mac OS X and Windows as well.
Johan Nylander
of the Department of Botany of the University of Stockholm, Stockholm, Sweden
(johan.nylander (at) abc[dot]se)
has released burntrees
version 0.1.7, a script for manipulating the output of MCMC programs such as
MrBayes and BEAST.
It handles tree files (*.t, *.trprobs, *.con) and parameter files (*.p) from
MrBayes (v.3) and other MCMC programs. The script can extract any contiguous
interval of trees, or make a random selection of a fraction of them. It can
also thin the chain by sampling every nth iteration, and can remove branch
lengths from the trees it samples. Trees can also be converted from Nexus to
PHYLIP (Newick) format or to altnexus format (sequence labels instead of
numbers). In a similar fashion, lines can be extracted from a MrBayes *.p
file. The script comes with a helper script, catmb.pl, that concatenates files
from several runs.
It is available as a Perl script. It can be downloaded from
its web site
at http://www.abc.se/~nylander
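The basic operations described for burntrees (dropping a burn-in interval, thinning, random subsampling, and stripping branch lengths) are easy to express. The sketch below works on a list of Newick strings already read from a tree file. It is a hypothetical illustration in Python, not burntrees itself (a Perl script with its own options); it thins by position in the list rather than by MCMC generation number, and its regular expression assumes branch lengths are plain or scientific-notation numbers after a colon.

```python
import random
import re

# Hypothetical helper names; this is not code from burntrees.

# Matches ":0.0123", ":1e-05", ":2.5E+01", etc.
BRANCH_LENGTH = re.compile(r":[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?")

def select_trees(trees, burnin=0, every=1, strip_lengths=False):
    """Drop the first `burnin` trees, keep every `every`-th of the rest,
    and optionally remove branch lengths from each Newick string."""
    kept = trees[burnin::every]
    if strip_lengths:
        kept = [BRANCH_LENGTH.sub("", t) for t in kept]
    return kept

def random_subset(trees, fraction, seed=None):
    """Random selection of a fraction of the trees, in their original order."""
    rng = random.Random(seed)
    k = round(len(trees) * fraction)
    chosen = sorted(rng.sample(range(len(trees)), k))
    return [trees[i] for i in chosen]

trees = ["((A:0.1,B:0.2):0.05,C:0.3,D:0.4);"] * 10   # hypothetical samples
print(select_trees(trees, burnin=4, every=3, strip_lengths=True))
# ['((A,B),C,D);', '((A,B),C,D);']
```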
Dan Rabosky
of the Department of Ecology and Evolutionary Biology
of Cornell University, Ithaca, New York
(DLR32 (at) cornell.edu)
has written laser
(Likelihood Analysis of Speciation and Extinction Rates),
version 2.1, an R package for a variety of analyses of changes in speciation and extinction rates. laser implements maximum likelihood methods based on the birth-death process to test whether diversification rates have changed over time and whether rates vary among lineages.
The methods are described in the papers:
- Rabosky, D. L. 2006. Likelihood methods for inferring temporal shifts in diversification
rates. Evolution 60: 1152-1164.
- Rabosky, D. L., S. C. Donnellan, A. L. Talaba, and I. J. Lovette. 2007. Exceptional among-lineage variation in diversification rates during the radiation of Australia's largest
vertebrate clade. Proceedings of the Royal Society of London, Series B 274: 2915-2923.
It is available as an R package.
It is described at its web site at
http://www.eeb.cornell.edu/Rabosky/dan/software.html and is
distributed at
its web page in the CRAN-R archive of R packages at http://cran.r-project.org/web/packages/laser/index.html
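As a simple example of this kind of likelihood calculation: under a constant-rate pure-birth (Yule) process conditioned on the crown split, the maximum likelihood speciation rate is (n - 2) divided by the total lineage-time, that is, the sum over the intervals between branching times of (number of lineages) x (interval length). The sketch below computes it from a list of branching times (node ages). It covers only the pure-birth case, and the function name is hypothetical; laser's own functions also fit birth-death and rate-shift models.

```python
import math

# Hypothetical helper name; not a laser function.

def yule_mle(branching_times):
    """ML speciation rate and log-likelihood (up to a constant) under a
    constant-rate pure-birth model, conditioned on the crown split.

    branching_times: ages of the n - 1 internal nodes of a rooted,
    ultrametric tree with n tips, measured back from the present.
    """
    if len(branching_times) < 2:
        raise ValueError("need a tree with at least three tips")
    times = sorted(branching_times, reverse=True) + [0.0]
    n_tips = len(branching_times) + 1
    # Total lineage-time: the interval after the k-th node (oldest first,
    # k = 0 is the crown) is traversed by k + 2 lineages.
    total = sum((k + 2) * (times[k] - times[k + 1]) for k in range(n_tips - 1))
    speciations = n_tips - 2              # splits after the crown split
    rate = speciations / total
    loglik = speciations * math.log(rate) - rate * total
    return rate, loglik

# Hypothetical branching times (in Myr) for a 6-tip tree.
rate, loglik = yule_mle([10.0, 7.5, 4.2, 3.1, 1.0])
print(f"speciation rate = {rate:.4f} per Myr, log-likelihood = {loglik:.3f}")
```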
Pavel Morozov and Andrey Rzhetsky
of the Department of Biomedical Informatics and the Columbia Genome Center
of Columbia University, New York, New York
(pm259 (at) columbia.edu and andrey.rzhetsky (at) dbmi.columbia.edu)
have released PHYLLAB
version 1.1, a toolbox for sequence manipulation and phylogenetic analysis in
MATLAB. PHYLLAB takes as input a set of aligned nucleotide or amino-acid
sequences, and performs phylogeny inference. Besides traditional phylogenetic
methods it uses a Markov chain Monte Carlo method, evaluating the posterior
distribution over tree topologies and a variety of model parameters, including
parameters of substitution-rate variation under a wavelet model. The graphical
interface helps users to manage input data and to visualize the most likely
trees; they can also view substitution-rate plots that show the maximum
posterior density (confidence) intervals. It is written in the MATLAB
language, and interested users can extend it easily. The PHYLLAB toolbox is
continually expanding, and the authors expect to offer many more functions and
scripts for different purposes soon.
It is available as a MATLAB package. It can be downloaded from
its web site
at http://amdec-bioinfo.cu-genome.org/html/misc/Pavel/phyllab.html
Peter Foster (p.foster (at) nhm.ac.uk) of the Natural History Museum,
London, England has released p4 version 0.81, a Python
package for maximum likelihood and Bayesian phylogenetic analyses of molecular
sequences. This is not a program with menus and buttons; it is invoked
using the Python language, which the user should know before attempting to
use it. It can do Bayesian inference of phylogenies, as well as computation
of likelihoods of trees. It also has facilities for viewing large trees and for
manipulating trees. It needs Python 2.3 or later and the GNU Scientific
Library (GSL) installed on the machine. It is distributed as Python source code
at its web site at
http://www.bmnh.org/web_users/pf/p4.html
Mike Charleston
(mcharles (at) it.usyd.edu.au)
of the Sydney University Biological Informatics and Technology Centre,
Sydney, Australia
has developed Spectrum, a program for finding bipartition spectra
from molecular sequence and distance data by the Hadamard transform
method of Hendy et al. (1994), for moderately sized data sets (up to 18 taxa).
The program also implements a branch-and-bound search for the "closest
tree", that is, the tree whose
expected spectrum is closest to the spectrum derived from the observed
data. Mac OS PowerMac, 68k Mac OS, and Windows executables are
available from his software
web site
at http://www.it.usyd.edu.au/~mcharles/.
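For two-state characters, Hadamard conjugation links the edge-length spectrum gamma of a tree to the expected spectrum s of site patterns by s = H^-1 exp(H gamma), and inverts it by gamma = H^-1 ln(H s), where H is a Sylvester Hadamard matrix indexed by subsets of all taxa but one. The sketch below implements both directions with a fast Walsh-Hadamard transform. It assumes a symmetric two-state model and exact (expected) pattern frequencies, and its function names are hypothetical. It is not Spectrum's code; nucleotide data need the four-state extension of the method, which is not shown.

```python
import math

# Illustrative sketch of two-state Hadamard conjugation; function names are
# hypothetical and this is not code from Spectrum.

def fwht(values):
    """Fast Walsh-Hadamard transform: returns H v with
    H[a][b] = (-1) ** popcount(a & b); len(values) must be a power of 2."""
    v = list(values)
    h = 1
    while h < len(v):
        for start in range(0, len(v), 2 * h):
            for k in range(start, start + h):
                x, y = v[k], v[k + h]
                v[k], v[k + h] = x + y, x - y
        h *= 2
    return v

def spectrum_from_gamma(gamma):
    """Expected site-pattern spectrum s = H^-1 exp(H gamma), with H^-1 = H / m."""
    m = len(gamma)
    r = [math.exp(x) for x in fwht(gamma)]
    return [x / m for x in fwht(r)]

def gamma_from_spectrum(s):
    """Inverse direction, gamma = H^-1 ln(H s); every entry of H s must be > 0."""
    m = len(s)
    rho = [math.log(x) for x in fwht(s)]
    return [x / m for x in fwht(rho)]

def tree_gamma(n_taxa, edge_lengths):
    """Edge-length spectrum from {split bitmask: length}. Taxa 1..n-1 are
    bits 0..n-2; each split is named by its side that excludes taxon n."""
    gamma = [0.0] * 2 ** (n_taxa - 1)
    for split, length in edge_lengths.items():
        gamma[split] = length
    gamma[0] = -sum(edge_lengths.values())
    return gamma

# Hypothetical tree ((1,2),(3,4)) with pendant edges of length 0.1 and an
# internal edge of 0.05; taxon 4's pendant edge is the split {1,2,3} = 0b111.
edges = {0b001: 0.1, 0b010: 0.1, 0b100: 0.1, 0b111: 0.1, 0b011: 0.05}
s = spectrum_from_gamma(tree_gamma(4, edges))
for mask in (0b011, 0b101, 0b110):
    print(f"split mask {mask:03b}: expected frequency {s[mask]:.4f}")
```

The pattern for the split present in the tree (mask 011) comes out more frequent than the two conflicting ones, which is the signal a "closest tree" search exploits.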
Ingrid Jakobsen, Susan Wilson, and Simon Easteal,
of Australian National University,
Canberra, released partimatrix. (Ingrid Jakobsen is
currently at the Department of Mathematics of the University of Queensland, Australia,
i.jakobsen (at) uq.edu.au).
This program
computes a "partition matrix" from aligned DNA sequence data. The method
finds partitions of the sequences into two groups and presents a matrix
which describes the conflict and agreement among these partitions. The
objective is to discover parts of the DNA sequence which imply different
trees. It is described in the paper:
Jakobsen, I. B., S. R. Wilson, and S. Easteal. 1997.
The Partition Matrix: Exploring variable phylogenetic signals along nucleotide sequence alignments.
Molecular Biology and Evolution 14: 474-484.
The program is distributed as C source code for Unix systems with X Windows.
It seems not to be available from Dr. Jakobsen, but is
available from
a site at the Centro Nacional de Cálculo Científico
de la Universidad de Los Andes, Venezuela
at http://www.cecalc.ula.ve/BIOINFO/servicios/herr1/PARTIMATRIX/manual.htm
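Underlying a partition matrix is the question of whether two bipartitions of the sequences could both be on the same tree. Two splits A|A' and B|B' are compatible exactly when at least one of the four intersections A∩B, A∩B', A'∩B, A'∩B' is empty. The sketch below reads a bipartition off each two-state column of a hypothetical alignment and lists the pairs of sites whose splits conflict. It illustrates the idea only, with hypothetical function names; partimatrix's own matrix of agreement and conflict is defined in the paper.

```python
# Hypothetical helper names; this is not partimatrix's code.

def site_split(column):
    """Split of sequence indices implied by a column with exactly two
    states (returned as one side), or None if uninformative."""
    states = sorted(set(column))
    if len(states) != 2:
        return None
    side = frozenset(i for i, c in enumerate(column) if c == states[0])
    if len(side) < 2 or len(column) - len(side) < 2:
        return None   # a singleton split is compatible with every other split
    return side

def compatible(a, b, n):
    """Splits a|a' and b|b' (one side each, over n sequences) are
    compatible iff one of the four intersections is empty."""
    everything = frozenset(range(n))
    a2, b2 = everything - a, everything - b
    return not (a & b) or not (a & b2) or not (a2 & b) or not (a2 & b2)

def partition_conflicts(seqs):
    """Informative split at each site, and all pairs of conflicting sites."""
    n = len(seqs)
    splits = {}
    for pos in range(len(seqs[0])):
        split = site_split([seq[pos] for seq in seqs])
        if split is not None:
            splits[pos] = split
    sites = sorted(splits)
    conflicts = [(i, j) for k, i in enumerate(sites) for j in sites[k + 1:]
                 if not compatible(splits[i], splits[j], n)]
    return splits, conflicts

seqs = ["AAGT", "AAGC", "GGAT", "GAAC"]   # hypothetical alignment
splits, conflicts = partition_conflicts(seqs)
print(sorted(splits), conflicts)   # [0, 2, 3] [(0, 3), (2, 3)]
```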
Carla Cummins and James McInerney
of the Department of Biology
of the National University of Ireland Maynooth
(james.o.mcinerney (at) nuim.ie)
have released TIGER
version 1.02, a program for identifying rapidly evolving characters in a matrix
of evolutionary characters. TIGER is open source software for identifying
rapidly evolving sites (columns in an alignment, or characters in a
morphological dataset). It can deal with many kinds of data (molecular,
morphological etc.). Sites like these are often removed or reweighted in order
to improve phylogenetic reconstruction, as they might not hold much phylogenetic information and therefore might simply be a source of noise.
It is described in the paper:
Cummins, C. A. and J. O. McInerney. 2011. A method for inferring the rate of
evolution of homologous characters that can potentially improve phylogenetic
inference, resolve deep divergence and correct systematic biases. Systematic
Biology 60 (6): 833-844. doi: 10.1093/sysbio/syr064.
It is available as a Python script and as Mac OS X universal executables. It can be downloaded from
its web site
at http://bioinf.nuim.ie/tiger
Yasuo Ina
of the National Institute of Agrobiological
Resources, Tsukuba, Japan
developed ODEN version, a package of programs for doing
distance matrix analyses on nucleotide or protein sequences. It is described
in a paper: Ina, Y. 1994. ODEN: a program package for molecular evolutionary
analysis and database search of DNA and amino acid sequences. Computer
Applications in the Biosciences (CABIOS) 10: 11-12.
It is available free
by anonymous ftp from
directory pub/unix/oden on ftp.dna.affrc.go.jp as C source code for Unix systems.
Angela Lüttke and Rainer Fuchs
(then of the European
Molecular Biology Laboratory; Fuchs is currently at
Biogen, Inc., Cambridge, Massachusetts)
wrote MacT, a package of programs for
Mac OS Macintoshes that compute distances and compute
Neighbor-Joining phylogenies for
them. The programs work on 4 through 26 sequences, and source code in
Microsoft QuickBasic is provided as well as compiled executables. The package
is free and is available on the molecular biology software servers.
For example, it is available at the Indiana University
IUBIO server
at http://iubio.bio.indiana.edu/soft/molbio/mac/.
It is described in a paper: Luttke, A. and R. Fuchs. 1992. MacT: Apple Macintosh
programs for constructing phylogenetic trees. Computer Applications in
the Biosciences 8: 591-594.
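Neighbor-joining repeatedly joins the pair of nodes i, j that minimizes Q(i, j) = (r - 2) d(i, j) - R_i - R_j, where r is the current number of nodes and R_i is the sum of distances from node i, and then replaces the pair by a new node. The sketch below is a compact version of the standard algorithm that returns a Newick string. It is illustrative only, with a hypothetical function name; it is not MacT's QuickBasic code, and it does nothing special about negative branch-length estimates.

```python
# Hypothetical function name; this is not MacT's code.

def neighbor_joining(names, d):
    """Neighbor-joining on a symmetric distance matrix d (list of lists,
    zero diagonal). Returns an unrooted tree as a Newick string."""
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    nodes = list(names)
    dist = {(a, b): d[i][j] for i, a in enumerate(names)
            for j, b in enumerate(names)}
    while len(nodes) > 3:
        r = len(nodes)
        totals = {a: sum(dist[a, b] for b in nodes) for a in nodes}
        # Join the pair that minimizes Q(i, j) = (r - 2) d(i, j) - R_i - R_j.
        i, j = min(((a, b) for k, a in enumerate(nodes) for b in nodes[k + 1:]),
                   key=lambda p: (r - 2) * dist[p] - totals[p[0]] - totals[p[1]])
        # Branch lengths from i and j to their new common ancestor.
        li = 0.5 * dist[i, j] + (totals[i] - totals[j]) / (2 * (r - 2))
        lj = dist[i, j] - li
        new = f"({i}:{li:.4g},{j}:{lj:.4g})"
        # Distances from the new node to every remaining node.
        for k in nodes:
            if k not in (i, j):
                dk = 0.5 * (dist[i, k] + dist[j, k] - dist[i, j])
                dist[new, k] = dist[k, new] = dk
        dist[new, new] = 0.0
        nodes = [k for k in nodes if k not in (i, j)] + [new]
    # Three nodes remain: join them at a single central node.
    a, b, c = nodes
    la = 0.5 * (dist[a, b] + dist[a, c] - dist[b, c])
    lb = 0.5 * (dist[a, b] + dist[b, c] - dist[a, c])
    lc = 0.5 * (dist[a, c] + dist[b, c] - dist[a, b])
    return f"({a}:{la:.4g},{b}:{lb:.4g},{c}:{lc:.4g});"

# Hypothetical distances that fit the tree ((A:1,B:2):1,(C:1,D:3)) exactly.
names = ["A", "B", "C", "D"]
d = [[0, 3, 3, 5],
     [3, 0, 4, 6],
     [3, 4, 0, 4],
     [5, 6, 4, 0]]
print(neighbor_joining(names, d))   # (C:1,D:3,(A:1,B:2):1);
```

With distances that fit a tree exactly, as here, neighbor-joining recovers that tree and its branch lengths.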
Nicolas Galtier
of the University of Lyon (galtier (at) biomserv.univ-lyon1.fr)
has written Phylo_win, a "graphic interface" for molecular
phylogenetic inference. It performs neighbor-joining, parsimony and
maximum likelihood methods and can bootstrap with any of them. Many distances
can be used including Jukes and Cantor, Kimura, Tajima and Nei, Galtier and Gouy
(1995), and LogDet for nucleotide sequences; Poisson correction for protein
sequences; and Ka and Ks for codon sequences. Species and sites to include in the
analysis are selected with the mouse. Reconstructed trees can be drawn, edited,
printed, stored, and evaluated according to numerous criteria.
Taxonomic groups of species and sets of conserved regions can be defined with the
mouse (in both Phylo_win and SeaView) and stored in the sequence files, avoiding the
need for multiple data files. It is entirely mouse-driven. It reads the most common
sequence file formats: CLUSTAL, FASTA, PHYLIP, and MASE. It runs under X Windows on
many Unix workstations.
It is described in the paper:
Galtier, N., M. Gouy, and C. Gautier. 1996. SeaView and Phylo_win, two graphic
tools for sequence alignment and molecular phylogeny. Computer Applications
in the Biosciences 12: 543-548.
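Two of the simplest of the nucleotide distances listed for Phylo_win have closed forms. With p the proportion of differing sites, the Jukes-Cantor distance is d = -(3/4) ln(1 - 4p/3); with P and Q the proportions of transition and transversion differences, Kimura's two-parameter distance is d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q). The sketch below computes both for gap-free aligned DNA sequences; the function names are hypothetical, not Phylo_win's.

```python
import math

# Hypothetical helper names; this is not Phylo_win's code.

PURINES = set("AG")

def proportions(x, y):
    """Proportions of transition (P) and transversion (Q) differences
    between two aligned, gap-free DNA sequences."""
    transitions = transversions = 0
    for a, b in zip(x, y):
        if a == b:
            continue
        if (a in PURINES) == (b in PURINES):
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1
    n = len(x)
    return transitions / n, transversions / n

def jukes_cantor(x, y):
    """Jukes-Cantor distance from the total proportion of differences."""
    P, Q = proportions(x, y)
    return -0.75 * math.log(1.0 - 4.0 * (P + Q) / 3.0)

def kimura_2p(x, y):
    """Kimura two-parameter distance."""
    P, Q = proportions(x, y)
    return -0.5 * math.log(1.0 - 2.0 * P - Q) - 0.25 * math.log(1.0 - 2.0 * Q)

x, y = "ACGTACGTAC", "GCGTATGTAA"   # hypothetical sequences
print(f"JC = {jukes_cantor(x, y):.4f}, K2P = {kimura_2p(x, y):.4f}")
```

When transitions make up exactly one third of the differences, as the Jukes-Cantor model expects, the two distances coincide.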
Phylo_win is now considered by Galtier to have been superseded by his
program SeaView. Phylo_win is distributed as C source code (to compile it one
needs the NCBI Vibrant tool kit). It is also available
as executables for SunOS, Solaris, SGI Unix,
IBM RISC Unix, Linux, HP/UX, and DEC Alpha (Digital Unix). It can be
fetched from
its web page at http://pbil.univ-lyon1.fr/software/phylowin_legacy.html.
It can also be obtained by anonymous ftp from
biom3.univ-lyon1.fr in directory pub/mol_phylogeny.
A Digital OpenVMS executable is
also available
at http://www.tmk.com/ftp/vms-freeware/mathog/.
Heiko Schmidt, of the
Center for Integrative Bioinformatics of the University of Vienna
(heiko.schmidt (at) univie.ac.at),
Korbinian Strimmer, now at
the Department of Statistics of the University of Munich, Germany
(korbinian.strimmer (at) lmu.de),
and Arndt von Haeseler, now at the Center for Integrative Bioinformatics
Vienna (arndt.von.haeseler (at) univie.ac.at)
have developed TREE-PUZZLE
version 5.2 (formerly called PUZZLE), a program for maximum likelihood
analysis of nucleotide and amino acid alignments.
TREE-PUZZLE infers phylogenies by "quartet puzzling",
a method that applies maximum likelihood tree reconstruction to all possible
quartets of taxa and then combines as many of the four-taxon
maximum likelihood trees as possible into an overall tree.
Because the result depends on the order in which the taxa are added, this
combining step is repeated many times and usually yields several different
trees. A consensus tree of these quartet puzzling trees shows which nodes are
well supported. More details about the algorithm and its phylogenetic
accuracy can be found in the papers:
- Strimmer, K. and A. von Haeseler. 1996. Quartet puzzling:
A quartet maximum likelihood method for reconstructing tree topologies.
Molecular Biology and Evolution
13: 964-969.
- Strimmer, K., and A. von Haeseler. 1997. Likelihood-mapping: A simple method
to visualize phylogenetic content of a sequence alignment. Proceedings
of the National Academy of Sciences (USA) 94: 6815-6819.
- Schmidt, H.A., K. Strimmer, M. Vingron, and A. von Haeseler. 2002. TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics 18: 502-504.
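The Bayesian weights used in likelihood mapping are simple to compute: for the three possible unrooted trees of a quartet, with likelihoods L1, L2, L3, each tree gets weight Li / (L1 + L2 + L3), and the three weights are the coordinates of a point in a triangle. The sketch below computes the weights from log-likelihoods and assigns the quartet to the nearest corner (the simple three-region version of the map). The input values and function names are hypothetical; TREE-PUZZLE's seven-region map and its use of these weights in the puzzling step are not reproduced.

```python
import math

# Hypothetical helper names; this is not TREE-PUZZLE's code.

def quartet_weights(logliks):
    """Weights L_i / (L_1 + L_2 + L_3) for the three quartet topologies,
    computed from log-likelihoods without underflow."""
    top = max(logliks)
    scaled = [math.exp(ll - top) for ll in logliks]
    total = sum(scaled)
    return [s / total for s in scaled]

def to_triangle(weights):
    """Point in an equilateral triangle with the corners for topologies
    1, 2, 3 at (0, 0), (1, 0) and (0.5, sqrt(3)/2)."""
    w1, w2, w3 = weights
    return (w2 + 0.5 * w3, (math.sqrt(3.0) / 2.0) * w3)

def closest_corner(weights):
    """Three-region map: the nearest corner is the topology with the
    largest weight."""
    return max(range(3), key=lambda i: weights[i])

w = quartet_weights([-2510.4, -2514.9, -2516.0])   # hypothetical quartet
x, y = to_triangle(w)
print([round(v, 3) for v in w], (round(x, 3), round(y, 3)), closest_corner(w))
```

Quartets whose points cluster near the corners carry clear tree-like signal; points near the center of the triangle indicate unresolved quartets.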
TREE-PUZZLE supports the commonly used
models of sequence evolution for nucleotides and proteins, and can take
rate heterogeneity among sites into account. It computes pairwise maximum
likelihood distances for many different models of sequence evolution (TN,
HKY, F84, SH, Dayhoff, JTT, mtREV24, BLOSUM62, WAG, and VT),
and estimates the parameters
of the models. It can estimate maximum likelihood branch lengths for user-specified
trees and perform likelihood ratio tests of the molecular clock as well as Kishino-Hasegawa-Templeton
tests. The program is written in ANSI C and is compatible with PHYLIP files.
The current version can also do parallel computation using the
MPI message-passing interface, if this is available.
Precompiled executables are distributed for Mac OS,
Windows, and Linux. For
Unix and VMS systems, files for automated compilation are provided.
A version capable of parallel execution is also available.
TREE-PUZZLE is available from the TREE-PUZZLE
web page at http://www.tree-puzzle.de. Sites with mirrors of the distribution,
or with older versions, are listed there.
Its online manual can be downloaded at http://www.tree-puzzle.de/manual.html.
Mike Holder, formerly of
the High Performance Computing Center of the University of Houston
and Andrew Roger (aroger (at) is.dal.ca) of the Department of
Biochemistry and Molecular Biology of Dalhousie University, Halifax, Canada
have produced a shell script program for
Unix systems, puzzleboot, version 1.03, that allows the
analysis of multiple bootstrapped data sets with
TREE-PUZZLE. It is
designed for use with the distance matrix option of TREE-PUZZLE, to make use of
the distance calculation methods.
It is available from
the Roger lab software page at
http://rogerlab.biochemistryandmolecularbiology.dal.ca/Software/Software.htm#puzzleboot
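Each bootstrapped data set is made by sampling alignment columns with replacement until the original length is reached. A minimal sketch, assuming the alignment is held as a dictionary of equal-length sequence strings and written out one data set after another in sequential PHYLIP layout. The names and data are hypothetical, and this is not part of puzzleboot, which is a shell script that runs TREE-PUZZLE on such replicates.

```python
import random

# Hypothetical helper names; this is not puzzleboot's code.

def bootstrap_alignment(alignment, rng):
    """One bootstrap replicate: alignment columns resampled with replacement."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must all have the same length")
    length = lengths.pop()
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols)
            for name, seq in alignment.items()}

def phylip_replicates(alignment, n_reps, seed=None):
    """Yield n_reps replicates, each as text in sequential PHYLIP layout
    (a header line 'ntaxa nsites', then names padded to 10 characters)."""
    rng = random.Random(seed)
    for _ in range(n_reps):
        rep = bootstrap_alignment(alignment, rng)
        nsites = len(next(iter(rep.values())))
        lines = [f"{len(rep)} {nsites}"]
        lines += [f"{name[:10]:<10}{seq}" for name, seq in rep.items()]
        yield "\n".join(lines)

aln = {"Human": "ACGTTGCA", "Chimp": "ACGTTGCC", "Gorilla": "ACGATGCA"}
for text in phylip_replicates(aln, n_reps=2, seed=7):
    print(text, end="\n\n")
```

Resampling whole columns keeps the bases of each site together, so every replicate column is one of the original alignment's columns.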
Daniel Huson (huson (at) informatik.uni-tuebingen.de)
of the ZBIT Center for Bioinformatics at the University of Tübingen,
Germany and David Bryant (Bryant (at) math.auckland.ac.nz) of the University of Auckland, New Zealand,
distribute SplitsTree, a program for
analysis of conflicts among splits implied by different quartets or
different characters.
It provides a number of methods for computing split networks from sequences
(e.g. median networks), distances (e.g. split decomposition or neighbor-net)
and trees (consensus networks and super-networks). Additionally, it contains
simple combinatorial methods for computing hybridization networks and
recombination networks. It can process sequence or
restriction site data, and can do bootstrapping.
It is discussed in the papers:
- Huson, D. H. 1998. SplitsTree: analyzing
and visualizing evolutionary data. Bioinformatics 14: 68-73.
- Huson, D. H. and Bryant, D. 2006. Application of phylogenetic networks in
evolutionary studies. Molecular Biology and Evolution 23(2): 254-267.
A number of versions of SplitsTree are available at
the SplitsTree web site
at http://www.splitstree.org.
These include
- SplitsTree4, a Java version which can run under Linux, Windows,
and Mac OS X.
- SplitsTree 3.2, available as Windows, Linux, and
Solaris executables, and also in a Mac OS X version by Rod Page
- SplitsTree 3.1, also in Windows and Linux versions
- SplitsTree 2.4, for Mac OS.
Igor Kuznetsov and Pavel Morozov, then of
the Institute of Cytology and Genetics, Novosibirsk, Russia
produced GEOMETRY,
a package for nucleotide sequence analysis using the method of
statistical geometry in sequence space.
Kuznetsov
(ikuznetsov (at) albany.edu)
is currently at the Department of Epidemiology and Biostatistics
at the State University of New York in Albany, Morozov
(pm259 (at) columbia.edu)
is currently at the
Irving Cancer Research Center at Columbia University.
The method is described in this paper:
Eigen, M., R. Winkler-Oswatitsch, and
A. Dress. 1988. Statistical geometry in sequence space: A method of quantitative
comparative sequence analysis. Proceedings of the National Academy of Sciences, USA 85: 5913-5917. The program is described in the article:
Kuznetsov, I. and P. Morozov. 1996. GEOMETRY: a software package for
nucleotide sequence analysis using statistical geometry in sequence space.
Computer Applications in the Biosciences (CABIOS) 12: 297-301.
The package uses the same data formats for sequence and tree input as
the ones used in the VOSTORG package.
GEOMETRY is available as a DOS executable.
It is available for downloading by ftp
from the
EMBL file server ftp.ebi.ac.uk in directory
pub/software/dos as file geom.zip.
Vincent Berry
of the LIRMM, Université de
Montpellier, France (vberry (at) lirmm.fr) has released
PhyloQuart version 1.4, a package of programs for inferring
phylogenies from quartets. It is able to use either nucleotide sequences or
distances. It implements the Q* method of tree reconstruction, which is
inspired by the work of Bandelt and Dress, and is described in the
paper: Berry, V. and O. Gascuel. 2000. Inferring evolutionary trees with
strong combinatorial evidence. Theoretical Computer Science
240: 271-298.
PhyloQuart is available as C source code which can be compiled on
Unix systems, from
its web site at
http://www.lirmm.fr/~vberry/PHYLOQUART/phyloquart.html.
Le Sy Vinh, now of the
Computer Science Department, University of Engineering and Technology, Vietnam
National University Hanoi
(vinhbio (at) gmail.com), and Arndt von Haeseler, now of the Center for Integrative Bioinformatics
Vienna (arndt.von.haeseler (at)
univie.ac.at)
have released IQPNNI version 3.3.2
(Important Quartet Puzzling and NNI Operation). This program uses selected
quartets of species, called Important Quartets, to build a preliminary
tree, rearranges it by nearest-neighbor interchanges under the maximum
likelihood criterion, and then uses further
examination of quartets to remove and reposition some of the species.
It is described in the papers:
- Vinh, L. S. and A. von Haeseler. 2004.
IQPNNI: Moving fast through tree space and stopping in time. Molecular
Biology and Evolution 21: 1565-1571.
- Minh, B. Q., L. S. Vinh, A. von Haeseler, and H. A. Schmidt. 2005.
pIQPNNI: Parallel reconstruction of large maximum likelihood phylogenies.
Bioinformatics 21(19): 3794-3796.
- Minh, B. Q., L. S. Vinh, H. A. Schmidt, and A. von Haeseler. 2006.
Large maximum likelihood trees. Proceedings of the NIC Symposium 2006
pp. 357-365, Forschungszentrum Jülich, Germany.
It is available as
C source code and as Windows, Linux, and Mac OS X
binary executables (including a version that works with MPI parallel
execution) from
its web site
at http://www.cibiv.at/software/iqpnni/
Stephen J. Willson
(swillson (at) iastate.edu)
of the Department of Mathematics, Iowa State University, has produced
a package of programs to infer phylogenies from quartets of species.
They infer the phylogeny of each quartet by parsimony and, in
combining the quartets, use either information on how strongly the phylogeny
for each quartet is preferred over its alternatives, or measures of
how well a group fits into a given placement on a tree, as judged by
quartets. The methods are described in two papers:
- Willson, S. J. 1998. Measuring inconsistency in phylogenetic trees.
Journal of Theoretical Biology 190: 15-36.
- Willson, S. J. 1999. Building phylogenetic trees from quartets by
using local inconsistency measures. Molecular Biology and Evolution
16: 685-693.
The programs are in C and are described as having successfully been
compiled on Mac OS systems using the Codewarrior C compiler. Mac OS
executables are also provided. The programs are available at
Willson's software web site at
http://www.public.iastate.edu/~swillson/software.html.
James Lake
of the Department of Molecular, Cell and
Developmental Biology of the University of California, Los Angeles
(lake (at) mbi.ucla.edu) has released Gambit, which
implements a method called Bootstrapper's Gambit. The method involves
taking bootstrap samples of the sequences, computing trees for quartets of species, and
assembling larger trees out of quartets that have significant bootstrap
support. One of the methods available to estimate trees for
the quartets uses paralinear (LogDet) distances. Other distance methods and
parsimony are also available. The Bootstrapper's Gambit method is
described in a paper: Lake, J. A. 1995. Calculating the probability of
multitaxon evolutionary trees: Bootstrappers gambit. Proceedings of
the National Academy of Sciences, USA 92: 9662-9666.
The program is available as a DOS executable,
free as a beta release to noncommercial users on a trial basis until January 1,
2003. (It is unclear from the web site whether a free version will be
available to noncommercial users after that point; a previous deadline was
extended.) Commercial users are asked to pay $50 on a shareware basis.
The program is available at
its web site at
http://genomics.ucla.edu/gambit/.
Arne Röhl, Peter Forster, and Hans-Jürgen
Bandelt
(Forster has more recently been Senior Lecturer in Forensic Science
in the Faculty of Science and Technology of Anglia Ruskin University,
Cambridge, U.K.,
e-mail address pf223 (at) cam.ac.uk, and Bandelt is at the
Fachbereich Mathematik, University of Hamburg, Bundesstrasse 55,
20146 Hamburg, Germany, e-mail address bandelt (at) math.uni-hamburg.de)
have written Network version 4.516, a program to infer
networks (which have more connections than trees) from non-recombining
DNA, STR, amino acid, and RFLP data.
The networks are either reduced median networks or
median-joining networks, methods which are described
in the papers:
- Bandelt, H.-J., P. Forster, B. C. Sykes, and M. B. Richards.
1995. Mitochondrial portraits of human populations using median networks.
Genetics 141: 743-753.
- Bandelt, H.-J., P. Forster, and A. Röhl. 1999. Median-joining
networks for inferring intraspecific phylogenies. Molecular Biology
and Evolution 16: 37-48.
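A basic operation behind median networks is the median vector of three sequences. For two-state (binary) characters it is simply the site-wise majority state, which lies at the minimum total Hamming distance from the three sequences. The Python sketch below assumes aligned binary sequences and uses a hypothetical function name; it does not attempt the full reduced-median or median-joining construction carried out by Network.

```python
def median_vector(x, y, z):
    """Site-wise majority of three aligned two-state sequences."""
    if not (len(x) == len(y) == len(z)):
        raise ValueError("sequences must be aligned to equal length")
    out = []
    for a, b, c in zip(x, y, z):
        if len({a, b, c}) > 2:
            raise ValueError("this sketch assumes two states per site")
        # If a matches either other state it is the majority; otherwise b == c.
        out.append(a if a in (b, c) else b)
    return "".join(out)


print(median_vector("0011", "0101", "0110"))  # 0111 (one step from each input)
```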
The program is available for free
as a Windows executable from Fluxus Engineering at
its web site at
http://www.fluxus-engineering.com/sharenet.htm.
Mike Hendy, Katharina T. Huber, Michael Langton, Vincent Moulton, and
David Penny have written Spectronet version 1.27, a
program that computes a collection of weighted splits or partitions
and allows the user to interactively analyze the results with a series of
tools. Hendy and Penny are at Massey University, New Zealand (m.hendy (at)
massey.ac.nz and d.penny (at) massey.ac.nz), Huber and Moulton
are at the School of Computational Science of the University of East Anglia,
U.K. (Katharina.Huber (at) cmp.uea.ac.uk and Vincent.Moulton
(at) cmp.uea.ac.uk). Spectronet can read molecular sequence or discrete
character data, compute splits by Hadamard conjugation or directly, compute
and display compatibility matrices of characters, make reduced median networks,
and plot networks by making a Lentoplot.
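For two-state characters, the pairwise compatibility recorded in such a matrix can be checked with the four-gamete test: two binary characters can both be fitted to one unrooted tree without homoplasy unless all four combinations of states occur among the species. The Python sketch below assumes binary characters with no missing data, each given as a string of states across the same species; the function names are hypothetical and this is not Spectronet's code.

```python
from itertools import combinations


def compatible(char_i, char_j):
    """Four-gamete test for two binary characters."""
    return len(set(zip(char_i, char_j))) < 4


def compatibility_matrix(characters):
    """Symmetric True/False matrix of pairwise character compatibility."""
    n = len(characters)
    matrix = [[True] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        ok = compatible(characters[i], characters[j])
        matrix[i][j] = matrix[j][i] = ok
    return matrix


# Three characters scored on four species (one string per character).
print(compatibility_matrix(["0011", "0001", "0101"]))
# [[True, True, False], [True, True, True], [False, True, True]]
```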
Spectronet is described in a paper: Huber, K. T., M. Langton, D. Penny,
V. Moulton and M. Hendy. 2002. Spectronet: A package for computing spectra
and median networks. Applied Bioinformatics 1: 159-161.
It is available as a Windows executable from
its web site
at http://awcmee.massey.ac.nz/spectronet/index.html.
Steven Kelk, Leo van Iersel, Judith Keijsper, and Leen Stougie
of the Centrum voor Wiskunde en Informatica (CWI) and Technische Universiteit
Eindhoven (TU/e), Netherlands
(S.M.Kelk (at) cwi.nl)
have produced LEVEL2
version 0.91, which constructs level-2 phylogenetic networks from dense sets of
rooted triplets.
This program takes as input a dense set of rooted triplets and attempts
to construct a level-2 phylogenetic network from them (or level-1, or
level-0, if level-2 is not necessary). Triplets are the rooted analogue of
quartets, and a dense set of triplets is one where for every subset of three
taxa there is at least one triplet. A level-k phylogenetic network is a
rooted phylogenetic network where every biconnected component in the
underlying, undirected graph contains at most k recombination vertices.
The program produces an image of the resulting network, if it is found.
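The density condition defined above is easy to check directly. In this Python sketch, which uses a representation chosen for illustration rather than LEVEL2's own input format, a rooted triplet xy|z is written as the tuple (x, y, z), and the set is dense if every three-taxon subset is covered by at least one triplet.

```python
from itertools import combinations


def is_dense(taxa, triplets):
    """True if every 3-taxon subset of `taxa` is covered by at least one
    rooted triplet. A triplet xy|z is written as the tuple (x, y, z)."""
    covered = {frozenset(t) for t in triplets}
    return all(frozenset(trio) in covered for trio in combinations(taxa, 3))


# Triplets displayed by the rooted tree (((a,b),c),d):
tree_triplets = [("a", "b", "c"), ("a", "b", "d"), ("a", "c", "d"), ("b", "c", "d")]
print(is_dense("abcd", tree_triplets))       # True
print(is_dense("abcd", tree_triplets[:-1]))  # False: {b, c, d} is uncovered
```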
It is described in the paper:
van Iersel, L., J. Keijsper, S. Kelk, and L. Stougie. 2007. Constructing
level-2 phylogenetic networks from triplets. arXiv:0707.2890v1 [q-bio.PE].
It is available as Java source code, and also requires that the DOT graph
description package be installed. It can be downloaded from
its SourceForge site
at http://sourceforge.net/projects/level2/, and a general
web page about it is
at http://homepages.cwi.nl/~kelk/level2triplets.html
Luay Nakhleh, Derek Ruths, and Cuong Than
of the Department of Computer Science
of Rice University, Houston, Texas
(nakhleh (at) cs.rice.edu)
have released PhyloNet
(Phylogenetic Network analysis),
version 2.3, a phylogeny package with tools for reconstructing and analyzing
phylogenetic networks. It has programs for inferring horizontal gene transfer
events, by estimating the SPR distance between two trees (along with a
bootstrap-based measure of support), and interspecific recombination, by using
maximum parsimony. It also has tools for enumerating the trees and clusters of
taxa within a given network, comparing the topologies of networks, estimating
the strain tree of bacterial genomes from multi-locus data, and enumerating
valid coalescent histories of a gene tree within the branches of a species tree.
It is described in the paper:
Than, C., D. Ruths, and L. Nakhleh. 2008. PhyloNet: A Software Package for
Analyzing and Reconstructing Reticulate Evolutionary Relationships. Under Review.
It is available as Java executables. It can be downloaded from
its web site
at http://bioinfo.cs.rice.edu/phylonet/index.html
Guohua Jin and Luay Nakhleh
of the Department of Computer Science
of Rice University, Houston, Texas
(jin (at) cs.rice.edu and nakhleh (at) cs.rice.edu)
have produced NEPAL
(NEtwork Parsimony And Likelihood),
version 1.1, a suite of tools for reconstructing and analyzing reticulate
(non-treelike) evolutionary relationships using the maximum parsimony and
maximum likelihood criteria. It is used to identify horizontal gene (or
partial gene) transfers between species. NEPAL reads in a species tree in
Newick format or a network from NEPAL or RIATA-HGT output, and sequence data.
It returns the maximum parsimony or maximum likelihood score of the input or
generated trees or networks. The user can control the number of additional
edges added to the input tree. The methods are described in the papers:
- Jin, G., L. Nakhleh, S. Snir, and T. Tuller. 2006. Maximum likelihood of phylogenetic
networks. Bioinformatics 22(21): 2604-2611.
- Jin, G., L. Nakhleh, S. Snir, and T. Tuller. 2007. Efficient parsimony-based methods for phylogenetic network reconstruction. Bioinformatics 23: e123-e128.
It is available as Mac OS X and Linux executables. It can be downloaded from
its web site
at http://bioinfo.cs.rice.edu/nepal/index.html
Rasmus Nielsen, of the Centre for Bioinformatics of the University of
Copenhagen, Denmark and of the Department of Integrative Biology of the
University of California, Berkeley (rasmus
(at) binf.ku.dk),
Jody Hey, of the Department of Genetics, Rutgers University, Piscataway, New
Jersey
(hey (at) biology.rutgers.edu)
and Sang Chul Choi
have released IMa2, a program that estimates divergence
times between several populations along with the population sizes before and
after divergence, as well as the migration rate between the populations
after divergence. The program uses Markov chain Monte Carlo (MCMC)
coalescent methods. It is described in two papers:
- Hey, J., and R. Nielsen. 2004.
Multilocus methods for estimating population sizes, migration rates and
divergence time, with applications to the divergence of Drosophila
pseudoobscura and D. persimilis. Genetics 167:
747-760.
- Hey, J., and R. Nielsen. 2007.
Integration within the Felsenstein equation for improved Markov chain Monte
Carlo methods in population genetics.
Proceedings of the National Academy of Sciences USA 104(8): 2785-2790.
It allows Bayesian inference from a number of loci, each assumed to
be without intra-locus recombination. It can use a DNA mutation model,
a stepwise microsatellite mutation model, or an infinite-sites model.
The program estimates the population sizes and the times of divergence,
each relative to the mutation rate. It can
also estimate growth rates of population sizes after speciation.
IMa2 is distributed as a Windows executable with generic
C source code that can easily be compiled on Unix systems including Mac OS X.
It is available from
its web page
at the Hey lab web site,
http://genfaculty.rutgers.edu/hey/software#IMa2. Earlier
versions, IMa and IM, are also available there.
Liang Liu
of the Department of Agriculture and Natural Resources
of Delaware State University
(lliu (at) desu.edu),
together with Dennis Pearl of the Department of Statistics
at Ohio State University and Scott
Edwards of the Department of Organismic and Evolutionary Biology
at Harvard University,
has released BEST
(Bayesian Estimation of Species Trees),
version 2.2, a phylogeny package for estimating species trees from multilocus
DNA sequence data. BEST finds the joint posterior distribution of coalescent
gene trees and the species tree for multi-locus data under a hierarchical
Bayesian model. Proposal gene trees are made using a gene tree MCMC procedure
chosen by the user in MrBayes.
This vector of gene trees is then paired with a species tree chosen under the
constraint that the gene trees be consistent with the species tree. An
MCMC importance sampler is then used to sample the species trees.
It is described in the papers:
- Liu, L. and D. K. Pearl. 2007. Species trees from gene trees:
reconstructing Bayesian posterior distributions of a species phylogeny using
estimated gene tree distributions. Systematic Biology 56:
504-514.
- Edwards, S. V., L. Liu, and D. K. Pearl. 2007. High resolution
trees without concatenation. Proceedings of the National Academy of
Sciences 104: 5936-5941.
- Liu, L., D. K. Pearl, R. T. Brumfield, and S. V. Edwards. 2008.
Estimating species trees using multiple-allele DNA sequence data.
Evolution 62: 2080-2091.
It is available as C source code, Windows executables and Mac OS X universal executables. It can be downloaded from
its web site
at http://www.stat.osu.edu/~dkp/BEST/
Ruchi Chaudhary, Mukul S. Bansal, André Wehe, David
Fernández-Baca, and Oliver Eulenstein
of the Department of Computer
Science at Iowa State University, Ames, IA
(oeulenst (at) cs.iastate.edu)
have released iGTP version 1.0, a software package for
large-scale phylogenetic analyses using gene tree parsimony. iGTP implements
algorithms for inferring species supertrees that best reconcile the input gene
trees under the gene-duplication, gene-duplication and loss, and deep
coalescence cost models. iGTP extends the functionality and performance of
existing gene tree parsimony software and features building effective initial
trees using greedy stepwise leaf addition and the ability to have unrooted
gene trees in the input. Moreover, iGTP provides a user-friendly graphical
interface with integrated tree visualization software to facilitate analysis
of the results. It is described in the paper: Chaudhary, R., M. S. Bansal, A.
Wehe, D. Fernández-Baca and O. Eulenstein. 2010. iGTP: A software
package for large-scale gene tree parsimony analysis. BMC Bioinformatics
11: 574.
It is available as Windows executables, Linux executables and Mac OS X
universal executables. The authors can be contacted for the source code.
The executables can be downloaded from
its web site
at http://genome.cs.iastate.edu/CBL/iGTP/
Andrew Roger, of the
Department of Biochemistry and Molecular Biology, Dalhousie University,
Halifax, Nova Scotia, Canada
(aroger (at) is.dal.ca)
has written ELW (Expected Likelihood Weights),
two Perl scripts -- elw.pl and calcwts.pl -- that,
together with PAUP* and the PHYLIP program Seqboot, can be used to
implement the "expected likelihood weights" method of Strimmer and Rambaut,
described in the paper by Strimmer, K. and A. Rambaut. 2002. Inferring
confidence sets of possibly misspecified gene trees. Proceedings of the
Royal Society of London Series B 269: 137-142. The method
calculates a confidence set of trees, using the variation of the
likelihoods of the candidate trees among bootstrap replicates of the data.
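The weighting step can be sketched in Python as follows. This is a minimal illustration, not the elw.pl or calcwts.pl scripts: it assumes the log-likelihood of every candidate tree has already been computed on every bootstrap replicate (for example with PAUP*), converts each replicate's log-likelihoods into likelihood weights that sum to one, averages those weights over replicates, and then adds trees in order of decreasing weight until the cumulative weight reaches the chosen level (0.95 here). Function names are hypothetical.

```python
import math


def expected_likelihood_weights(boot_loglik):
    """boot_loglik[r][i] is the log-likelihood of candidate tree i on
    bootstrap replicate r. Returns the expected weight of each tree."""
    n_trees = len(boot_loglik[0])
    totals = [0.0] * n_trees
    for row in boot_loglik:
        top = max(row)
        rel = [math.exp(l - top) for l in row]  # subtract max to avoid underflow
        s = sum(rel)
        for i, r in enumerate(rel):
            totals[i] += r / s
    return [t / len(boot_loglik) for t in totals]


def elw_confidence_set(weights, level=0.95):
    """Indices of trees, best first, until cumulative weight >= level."""
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    chosen, cum = [], 0.0
    for i in order:
        chosen.append(i)
        cum += weights[i]
        if cum >= level:
            break
    return chosen
```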
ELW can be downloaded from its entry on
Roger's software web page
at http://rogerlab.biochemistryandmolecularbiology.dal.ca/Software/Software.htm#elw
Naoko Takezaki
of the Life Science Research Center
of Kagawa University, Japan
(takezaki (at) med.kagawa-u.ac.jp)
wrote Lintre (Phylogenetic tests of
the molecular clock and linearized tree), a package of programs for
Sun workstations. The programs include:
- njboot -- construct a neighbor-joining (NJ) tree
- postree -- create a postscript file of trees
- tpcv -- conduct the two-cluster test
- branch -- conduct the branch length test
- branbst -- conduct the branch length test by bootstrap
The two-cluster test is essentially the relative rate test for
many sequences. The branch length test examines whether the rate of each
sequence, measured from the root of the tree, differs from the average rate
of all sequences.
The tests are described in: Takezaki, N., A. Rzhetsky, and M. Nei. 1995.
Phylogenetic test of the molecular clock and linearized trees. Molecular
Biology and Evolution 12: 823-833. The programs are
available as C source code and also as DOS executables. They are distributed
as a compressed tar archive of the source code with examples and
documentation, and also as a self-extracting archive of the sources and DOS
executables.
They are available at Naoko Takezaki's software web site at
http://www.kms.ac.jp/~genomelb/takezaki.eng.html#software
and also at
the Nei lab
software web site at
http://www.bio.psu.edu/people/faculty/nei/software.htm. They
are also available from the IUBio
archive at http://iubio.bio.indiana.edu/soft/molbio/evolve/lintr/.
Andrew Rambaut
(a.rambaut (at) ed.ac.uk), of
the Institute for Evolutionary Biology, University of Edinburgh, Scotland, and
formerly of the Department of Zoology, University of Oxford,
has written TipDate version 1.2.
TipDate is an application for estimating the rate of molecular evolution
(and hence a time-scale) for a phylogeny consisting of dated tips. These will
most frequently be from viruses or other fast-evolving pathogens that have
been isolated over a range of dates. The program can also return the
likelihood for the simple molecular clock model (i.e., assuming that all
sequences are contemporary), for a model in which rates of change at
different times are drawn from a distribution, or for the non-clock model.
These are useful for likelihood ratio tests of the fit of the model to the data.
TipDate is described in a paper: Rambaut, A. 2000. Estimating the rate of
molecular evolution: incorporating non-contemporaneous sequences into maximum
likelihood phylogenies. Bioinformatics 16: 395-399.
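The basic idea of calibrating a rate from dated tips can be shown with root-to-tip regression, a simpler technique than TipDate's maximum likelihood approach, swapped in here purely for illustration. If sequences evolve at a constant rate, the distance from the root to each tip grows linearly with the tip's sampling date, so the least-squares slope estimates the rate and the x-intercept estimates the date of the root. The Python sketch assumes the root-to-tip distances have already been measured on a rooted tree; the function name is hypothetical.

```python
def root_to_tip_rate(dates, distances):
    """Least-squares slope of root-to-tip distance on sampling date.

    dates: sampling times (e.g. years); distances: root-to-tip path
    lengths in substitutions per site. Returns (rate, root_date), where
    rate is in substitutions/site per time unit and root_date is the
    date at which the fitted line reaches zero distance.
    """
    n = len(dates)
    mean_t = sum(dates) / n
    mean_d = sum(distances) / n
    sxx = sum((t - mean_t) ** 2 for t in dates)
    if sxx == 0:
        raise ValueError("need at least two distinct sampling dates")
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(dates, distances))
    rate = sxy / sxx
    root_date = mean_t - mean_d / rate
    return rate, root_date


rate, root = root_to_tip_rate([1990.0, 2000.0, 2010.0], [0.01, 0.02, 0.03])
# rate is about 0.001 substitutions/site/year; root is about 1980
```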
TipDate is available as Mac OS executables and as source code for
Linux or Unix from
the IUBIO software site
at http://microbe.bio.indiana.edu:7131/soft/iubionew/molbio/evolution/evolve/TipDate/.
It is also available in a web-based server version from the
Pasteur Institute server.
Thomas Wilcox,
formerly
of the Center for Computational Biology and Informatics
of the University of Texas, and more recently of Long Key Tropical Research Center, Florida
(tpwilcox (at) lktrc.org)
has produced Cadence
version 1.0.1, a program for Bayesian relative rate tests.
It is described in the paper:
Wilcox, T. P., F. J. García de Leon, D. A. Hendrickson, and D. M. Hillis. 2004. Convergence among cave catfishes: Long-branch attraction and a Bayesian relative rates test. Molecular Phylogenetics and Evolution 31: 1101-1113.
It is available as Powermac Mac OS X executables. It can be downloaded from
its web site at the University of Texas
at http://www.zo.utexas.edu/faculty/antisense/DownloadComputerPrograms.html
Jotun Hein,
of the Department of Statistics, University of Oxford
(hein (at) stats.ox.ac.uk) produced TreeAlign, a
multiple sequence alignment
program that builds trees as it aligns DNA or protein sequences. It uses a
combination of distance matrix and approximate parsimony methods, inspired
by the 1973 approach of David Sankoff. It is described in two papers:
- Hein, J. J. 1990. A unified approach to phylogenies and alignments.
Methods in Enzymology 183: 625-644.
- Hein, J. J. 1994. TreeAlign. pp. 349-364 in Computer Analysis of
Sequence Data, edited by A. M. Griffin and H. G. Griffin. Humana Press,
Totowa, New Jersey.
TreeAlign is available as C source code. It needs enough memory
that it will not be practical to run on older desktop systems.
It is available
by anonymous ftp at
the European Bioinformatics Institute molecular biology software
distribution site ftp.ebi.ac.uk
in directories pub/software/unix and pub/software/vms.
A widely-used multisequence alignment program
that estimates trees as it
aligns multiple sequences is ClustalW. Currently it is
in version 2.0.12. It is the latest incarnation
of the Clustal family of tree-based alignment programs.
Clustal was originally written by Des Higgins (now at the Conway Institute,
University College Dublin, Ireland)
(des.higgins (at) ucd.ie), and later versions were developed by
Julie Thompson (now at the Institut de Génétique et de Biologie
Moléculaire et Cellulaire at the Université de Strasbourg, France,
julie (at) igbmc.u-strasbg.fr), Toby Gibson
(Toby.Gibson (at) embl.de),
François Jeanmougin (jeanmougin
(at) igbmc.u-strasbg.fr), and many
others.
Recent features include the
ability to read different input formats (NBRF/PIR, Fasta,
EMBL/Swissprot), align old alignments, produce phylogenetic trees after
alignment (Neighbor Joining trees with a bootstrap option), write different
alignment formats (Clustal, NBRF/PIR, GCG, PHYLIP) and the presence of
a full command line interface. Clustal exists in two major variants:
- ClustalW which has a character-mode interface, in which the user
types responses to choose options from a menu.
- ClustalX which has a graphical user interface.
It is described in a number of papers:
- Larkin, M. A., G. Blackshields, N. P. Brown, R. Chenna, P. A. McGettigan,
H. McWilliam, F. Valentin, I. M. Wallace, A. Wilm, R. Lopez, J. D. Thompson, T.
J. Gibson, and D. G. Higgins. 2007. Clustal W and Clustal X version 2.0.
Bioinformatics 23: 2947-2948.
- Jeanmougin, F., J. D. Thompson, T. J. Gibson, M. Gouy, and D. G. Higgins.
1998. Multiple sequence alignment with Clustal X. Trends in Biochemical
Sciences 23: 403-405.
- Thompson, J. D., T. J. Gibson, F. Plewniak, F. Jeanmougin,
and D. G. Higgins. 1997. The ClustalX windows interface:
flexible strategies for multiple sequence alignment aided by quality analysis
tools. Nucleic Acids Research 25: 4876-4882.
- Higgins, D. G., J. D. Thompson, and T. J. Gibson. 1996. Using CLUSTAL
for multiple sequence alignments. Methods in Enzymology 266: 383-402.
- Thompson, J.D., D. G. Higgins and T. J. Gibson. 1994. CLUSTAL W: improving the sensitivity of progressive
multiple sequence alignment through sequence weighting, position-specific gap
penalties and weight matrix choice. Nucleic Acids Research 22: 4673-4680.
ClustalW is available from its main distribution page
at http://www.clustal.org. The downloads of the current version
there are also
available by ftp from the
European Bioinformatics Institute ftp server.
For the older ClustalV, there exists a Macintosh Hypercard stack, ClustToTree, that can
convert its tree files to Newick Standard format (used by many other programs).
ClustToTree is
made available by Kai-Uwe
Fröhlich at the University of Graz, Austria at
http://aaa-proteins.uni-graz.at/Archiv/ClustToTreecomp.html.
ClustalW is also available on web servers at the Genebee server of the Belozersky Institute
in Moscow and at the European Bioinformatics
Institute.
Cédric Notredame
of the Comparative Bioinformatics Group of the Center for Genomic Regulation
(CRG), Barcelona, Spain
(cedric.notredame (at) europe.com), Olivier Poirot,
Fabrice Armougom, and Sebastien Moretti of the Centre National de la Recherche
Scientifique Marseille-Nice Génopole, France, have produced
T-Coffee (Tree-based Consistency Objective Function For
alignmEnt Evaluation), version 8.93. This is a multiple sequence alignment
program that aims to improve on ClustalW. It is of the
same general approach as ClustalW, a "progressive alignment" method, but it
avoids some of the problems with the "greedy" nature of the ClustalW
algorithm by taking into account more information about how the sequences
all align with each other. T-Coffee is described in the paper: Notredame, C.,
D. Higgins, and J. Heringa. 2000. T-Coffee: A novel method for multiple
sequence alignments. Journal of Molecular Biology 302: 205-217.
From the point of view of this listing, the relevant features of T-Coffee are
that it makes a "guide tree" and can write that tree out. It also can
read in a guide tree supplied by the user. Versions from 2.00 on can align
both sequences and structures. T-Coffee is available as Unix
source code which can easily be compiled, and as Linux, Mac OS X and Windows
binaries. It is available from
its web site
at http://www.tcoffee.org/Projects_home_page/t_coffee_home_page.html
Ward Wheeler
of the Division of Invertebrate Zoology, American Museum of Natural History, New York (wheeler (at) amnh.org) and
David Gladstein (gladstein (at) gladstein.org)
have written MALIGN, version 2.7, a parsimony-based alignment program for molecular sequences. It implements the original
suggestion by Sankoff, Morel, and Cedergren (1973) that alignment and
phylogenies could be done at the same time by finding the tree that minimizes
the total alignment score along the tree. Jotun Hein's program TreeAlign
(mentioned above) is another, more approximate but possibly faster, attempt to
implement the Sankoff-Morel-Cedergren suggestion. MALIGN is one of only two
programs to calculate this optimality criterion exactly (the other is Wheeler
and Gladstein's program POY).
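The optimality criterion itself is easy to state in code, even though searching for its minimum is hard. The Python sketch below scores one tree whose internal nodes have already been assigned ancestral sequences, summing a unit-cost pairwise alignment (edit) distance over its edges. The unit costs, node labels, and function names are illustrative assumptions; the Sankoff-Morel-Cedergren criterion minimizes this sum over both the tree and the ancestral sequences, which this sketch does not attempt.

```python
def edit_distance(s, t, gap=1, mismatch=1):
    """Pairwise alignment cost with linear gap and mismatch penalties."""
    prev = [j * gap for j in range(len(t) + 1)]
    for i, a in enumerate(s, 1):
        cur = [i * gap]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j - 1] + (0 if a == b else mismatch),  # align a with b
                           prev[j] + gap,                              # gap in t
                           cur[j - 1] + gap))                          # gap in s
        prev = cur
    return prev[-1]


def tree_alignment_cost(edges, seq):
    """Sum of pairwise alignment costs along every edge of a tree whose
    internal nodes carry (hypothetical) ancestral sequences."""
    return sum(edit_distance(seq[u], seq[v]) for u, v in edges)


seq = {"root": "ACGT", "A": "ACGT", "B": "AGT", "C": "ACCT"}
edges = [("root", "A"), ("root", "B"), ("root", "C")]
print(tree_alignment_cost(edges, seq))  # 2
```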
MALIGN is described in a paper: Wheeler, W. C. and D. S. Gladstein. 1994.
MALIGN - A multiple sequence alignment program. Journal of Heredity
85: 417-418. MALIGN is available
from its download web site
at the Program in Scientific Computation of the American Museum of Natural
History at http://research.amnh.org/scicomp/projects/malign.php.
It is available as C source code and as binaries for
Linux, Windows, Sun Solaris, SGI, and HPUX. The
C source code is distributed in two forms, the ordinary one and a special
version for parallel computation.
MiraiBio, a Hitachi Software company, distributes
DNASIS, a general-purpose DNA and protein sequence analysis
system, produced by
Molecular Biology Insights, Inc. of Cascade, Colorado (but sold through
Hitachi).
It has many functions including primer design, plasmid maps, contig assembly,
alignment, database searching, and many kinds of protein plots. For our
purposes what is relevant is the ability to do multiple sequence alignment
by the Higgins-Sharp method of progressive sequence alignment (the one used
in ClustalV), with one of the results being a UPGMA tree based on pairwise
sequence alignment scores. DNASIS is available from MiraiBio
as version 3.0 (called DNASIS MAX) Windows executables, including
a demo version at its web site at http://www.miraibio.com/dnasis-max/dnasis-max-overview.html.
Prices are not stated there; there is an order form that can be sent to
them by email. It was formerly also available from MBI, and at that time
a Windows version cost $1,895 and a Mac OS X version cost $2,995 for a 1-10
user network license.
Karl Nicholas
(karlnicholas (at) hotmail.com)
with help from Hugh Nicholas (nicholas (at) psc.edu)
of the National Resource for Biomedical Supercomputing (NRBSC; www.nrbsc.org)
at the Pittsburgh Supercomputing Center has produced GeneDoc,
version 2.6.0.2, a program for the shading and
editing of multiple sequence alignments. It reads .MSF files and FASTA files.
The alignment can be edited by changing the position of residues in the
sequences. GeneDoc includes scoring functions to assist in determining
whether your alignment changes are improving the score. Support for obtaining
a score via sum-of-pairs or by a phylogenetic tree is included. Phylogenetic
trees can be built with the graphical interface or imported as NEXUS or PHYLIP
format tree descriptions. The program runs on Windows,
and both 16-bit and 32-bit executables are distributed, along with
the source code.
They can be downloaded from its web site at http://www.nrbsc.org/gfx/genedoc/gddl.htm
A Windows NT version for Digital Alpha processors was formerly available from
Russell Malmberg at the Botany Department of the
University of Georgia but is not currently in distribution.
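A sum-of-pairs score of the kind GeneDoc can report adds up a pairwise score over every pair of rows in the alignment. The Python sketch below uses a hypothetical scoring scheme (match +1, mismatch -1, residue against gap -2, gap against gap 0) chosen only for illustration; GeneDoc's own score tables and gap penalties are set within the program and are not reproduced here.

```python
from itertools import combinations


def sum_of_pairs(alignment, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score of equal-length aligned rows ('-' marks a gap).
    Gap-against-gap columns contribute nothing; gaps are scored linearly."""
    total = 0
    for r1, r2 in combinations(alignment, 2):
        for a, b in zip(r1, r2):
            if a == "-" and b == "-":
                continue
            if a == "-" or b == "-":
                total += gap
            elif a == b:
                total += match
            else:
                total += mismatch
    return total


print(sum_of_pairs(["AC-T", "ACGT", "AGGT"]))  # 2
```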
Ward Wheeler
of the Division of Invertebrate Zoology, American Museum of Natural History, New York (wheeler (at) amnh.org),
David Gladstein (gladstein (at) gladstein.org)
and Jan De Laet of the Royal Belgian Institute of Natural Sciences,
Brussels (jdelaet (at) natuurwetenschappen.be) have written
POY version 4.1.2, a program that approximately implements
David Sankoff's method of
searching for the tree that minimizes a parsimony criterion that includes
penalties for gaps, accomplishing both searching for phylogenies and
alignments. POY has algorithmic improvements by Wheeler and Gladstein that
speed up the algorithm. (Their program MALIGN is the
only program carrying out the full Sankoff proposal.)
POY implements two approximate methods, Fixed States Optimization and
Direct Optimization. The methods used are described in three papers:
- Varón, A., L. S. Vinh, and W. C. Wheeler. 2010. POY version 4: phylogenetic
analysis using dynamic homologies. Cladistics 26: 72-85.
- Wheeler, W. C. 1999. Fixed character states and the optimization of molecular sequence data. Cladistics 15: 379-385.
- Wheeler, W. C. 1996. Optimization alignment: the end of multiple sequence alignment in
phylogenetics? Cladistics 12: 1-9.
POY is available in C source or in executables for
Linux, Mac OS X, and Windows. It is distributed
from its download web site
at the Program in Scientific Computation of the American Museum of Natural
History at http://research.amnh.org/scicomp/projects/poy.php.
Russell Doolittle (rdoolittle (at) ucsd.edu) and Dafei
Feng, of the Section of Molecular Biology of the Division of Biological
Sciences of the University
of California at San Diego, released ALIGN in 1990. A
version for Macintoshes was coded by Peter Markeiwicz. ALIGN
implements the "progressive alignment" strategy described in their paper:
Feng, D.-F. and R. F. Doolittle. 1987. Progressive sequence alignment as a
prerequisite to correct phylogenetic trees. Journal of Molecular
Evolution 25: 351-360. This is also the basis for the
Clustal family of programs as well as the
(formerly distributed)
Pileup program in the GCG package.
The ALIGN program can align as well as print out a tree (which does not
have branch lengths). It uses Doolittle's own formats, and so three other
programs are included with ALIGN to convert formats. The programs are
distributed by ftp from the EBI ftp software server at ftp.ebi.ac.uk
in directory pub/software/mac as file align.hqx.
A set of C source programs presumably equivalent to these is also made
available by Milton Saier at UCSD on a
web page
at http://www-biology.ucsd.edu/~msaier/transport/software.html.
Roland Fleißner
of the Institut für Bioinformatik,
University of Duesseldorf, Germany
(fleissner (at) cs.uni-duesseldorf.de), Dirk Metzler of the Institut für Informatik, University of
Frankfurt, Germany (metzler (at) informatik.uni-frankfurt.de) and Arndt von Haeseler
of the Center for Integrative Bioinformatics Vienna (arndt.von.haeseler (at) univie.ac.at)
have written ALIFRITZ version 1.0. It simultaneously
infers phylogenies and alignments using a model of insertions,
deletions, and substitutions, using a Markov chain Monte Carlo method
to sample from alignments within given phylogenies.
It is described in the paper: Fleissner, R., D. Metzler, and A. von Haeseler.
2005. Simultaneous statistical multiple alignment and phylogeny reconstruction.
Systematic Biology 54: 548-561.
ALIFRITZ is available as C source code and as a Linux executable from
its web page at
http://www.cibiv.at/software/alifritz/
Ben Redelings, currently of the National Evolutionary Synthesis Center
(benjamin.redelings (at) nescent.org) and Marc Suchard
of the Department of Biomathematics
of the University of California, Los Angeles
(msuchard (at) ucla.edu)
have produced BAli-Phy
(Bayesian ALIgnments and PHYlogenies),
version 2.1.0, a program for joint Bayesian estimation of alignment and phylogeny. Instead of inferring trees based on a single fixed alignment, BAli-Phy considers near-optimal alignments when estimating the phylogeny. BAli-Phy can also make use of information in shared insertions or deletions to infer the phylogeny. It uses Markov chain Monte Carlo (MCMC) sampling to draw from the posterior distribution of alignments and trees. It can provide maximum posterior probability estimates of the tree and the alignment and indicate the extent of support for groups and alignment positions. It can analyze either DNA or protein sequences,
allowing for rate variation among sites and allowing a variety of substitution
models. It is described in the papers:
- Redelings, B. D., and M. A. Suchard. 2005. Joint Bayesian estimation of alignment and phylogeny. Systematic Biology 54(3): 401-418.
- Suchard, M. A. and B. D. Redelings. 2006. BAli-Phy: simultaneous Bayesian inference of
alignment and phylogeny. Bioinformatics 22: 2047-2048.
It is available as C++ source code, Windows executables, Powermac Mac OS X executables, and Intel Mac OS X executables. It can be
downloaded from
its web site
at http://www.biomath.ucla.edu/~msuchard/bali-phy/
Kazutaka Katoh, Hiroyuki Toh, K. Kuma, T. Miyata, and K. Misawa
of the Division of Bioinformatics
of the University of Kyushu, Japan
(katoh (at) bioreg.kyushu-u.ac.jp)
have written MAFFT
(Multiple sequence Alignment by Fast Fourier Transform),
version 6.821, a fast multiple sequence alignment program using Fast Fourier Transforms and progressive alignment. MAFFT has several methods for fast progressive alignment of very large numbers of sequences, allowing for large gaps, though tending to be limited to cases in which the blocks of aligned sites stay in the same order. It can build and output a tree as one option.
It is described in the papers:
- Katoh, K. and H. Toh. 2007. PartTree: an algorithm to build an approximate tree from a large number of unaligned sequences. Bioinformatics 23: 372-374.
- Katoh, K., K. Kuma, H. Toh and T. Miyata, 2005. MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Research 33: 511-518.
- Katoh, K., K. Misawa, K. Kuma and T. Miyata. 2002. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Research 30: 3059-3066.
It is available as C++ source code, Windows executables, Linux executables and Powermac Mac OS X executables. It can be downloaded from
its web site
at http://align.bmr.kyushu-u.ac.jp/mafft/software/ A
web server is also available there.
Robert C. Edgar and Kimmen Sjolander
(bob (at) drive5.com)
(Sjolander is of the Department of Bioengineering
of the University of California, Berkeley)
have produced LOBSTER, a package carrying out
Sequence Alignment and Tree Construction using Hidden Markov models
(SATCHMO), which
aligns protein sequences while constructing hidden Markov models of the
alignments. It creates a hidden Markov model of each protein structure,
aligning sequences to a profile and constructing a tree by clustering the most
similar sequences. The HMM profile constructed shows different numbers of
positions as alignable as one moves through the tree, so an interactive tree
viewer and sequence viewer are included to view the result.
It and related methods are described in the papers:
- Edgar, R. C. and K. Sjolander. 2003. SATCHMO: Sequence alignment and tree construction using hidden Markov models, Bioinformatics 19(11): 1404-1411.
- Edgar, R. C. and K. Sjolander. 2004. COACH: profile-profile alignment of protein families using hidden Markov models, Bioinformatics 20(8):1309-1318
It is available as C++ source code, Windows executables and Linux executables. It can be downloaded from
the LOBSTER web site
at http://www.drive5.com/lobster/ There is also a
web server available to run
the program.
Robert Edgar
of Mill Valley, California
(muscle (at) drive5.com)
has written MUSCLE
(Multiple Sequence Comparison by Log-Expectation),
version 3.8.31, a program for creating multiple alignments of amino acid or nucleotide sequences. MUSCLE counts k-tuples shared among pairs of sequences and makes a preliminary phylogenetic tree for the sequences from this. It then makes a preliminary multiple-sequence alignment, and iteratively reconsiders the tree and the alignment. The resulting program is many times faster than ClustalW and also more accurate. The trees produced by the first or second tree-building procedures can be written out as well.
It is described in the papers:
- Edgar, R. C. 2004. MUSCLE: multiple sequence alignment with high accuracy and high throughput, Nucleic Acids Research 32(5): 1792-1797.
- Edgar, R. C. 2004. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics 5: 113.
It is available as C++ source code, Windows executables, Linux executables and Powermac Mac OS X executables. It can be downloaded from
its web site
at http://www.drive5.com/muscle/
There is also a web server for MUSCLE available from Kimmen Sjölander's group, but it does not return the inferred trees.
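As a rough illustration of the k-tuple counting step described above, the sketch below computes a k-mer distance between two sequences. It assumes a distance of the form 1 - F, where F is the number of shared k-mers (each k-mer counted at most as often as it occurs in both sequences) divided by the number of k-mers in the shorter sequence. It does not use the compressed amino acid alphabet or the other refinements in MUSCLE itself, and the function names are hypothetical.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer (k-tuple) in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(x: str, y: str, k: int = 3) -> float:
    """1 minus the fraction of shared k-mers, normalised by the shorter sequence."""
    denom = min(len(x), len(y)) - k + 1
    if denom <= 0:
        raise ValueError("sequences must be at least k residues long")
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    # A k-mer present n times in x and m times in y contributes min(n, m).
    shared = sum(min(n, cy[t]) for t, n in cx.items())
    return 1.0 - shared / denom


def distance_matrix(seqs: dict, k: int = 3) -> dict:
    """All pairwise k-mer distances for a dict of name -> sequence."""
    names = list(seqs)
    return {(a, b): kmer_distance(seqs[a], seqs[b], k) for a in names for b in names}
```

A matrix of such distances for all pairs of sequences is the kind of input a clustering method such as UPGMA could then use to build a preliminary guide tree.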
Manolo Gouy
of the Laboratoire de Biometrie et Biologie Evolutive
of the Centre National de la Recherche Scientifique, France
(manolo.gouy (at) univ-lyon1.fr)
has released SeaView
version 4.3.3, a multiplatform graphical user interface for sequence alignment
and phylogenetic tree building. SeaView allows multiple sequence alignment
with the MUSCLE and ClustalW
programs, and can also drive many other external multiple alignment algorithms.
It also drives GBlocks to help select blocks of evolutionarily conserved
sequence sites. Tree building can be done using parsimony, distance, or maximum
likelihood (using PHYML) approaches. SeaView also allows
network access to sequence databases, and display, printing, and
copy-to-clipboard of rooted or unrooted, binary or multifurcating phylogenetic
trees. Because it makes so many different methods of phylogenetic
analysis available, SeaView will be especially useful for teaching and for occasional users of such software.
It is described in the paper:
Gouy, M., S. Guindon, and O. Gascuel. 2010. SeaView version 4: a multiplatform
graphical user interface for sequence alignment and phylogenetic tree building.
Molecular Biology and Evolution 27 (2): 221-224.
It is available as C++ source code, Windows executables, Linux executables and
Mac OS X universal executables, and SeaView is also available as Linux packages
for Debian, Fedora, and Gentoo Linux. It can be downloaded from
its web site
at http://pbil.univ-lyon1.fr/software/seaview.html
Pietro Liò, of the Computer Laboratory at the
University of Cambridge (Pietro.Lio (at) cl.cam.ac.uk),
has written PASSML and PASSML_TM,
which use likelihood methods with Hidden Markov models to infer
phylogeny and also secondary structure from protein data. PASSML is for
general proteins and PASSML_TM is for membrane proteins.
The underlying methods are described in the paper: Goldman, N., J. L. Thorne,
and D. T. Jones. 1998. Assessing the impact of secondary structure and
solvent accessibility on protein evolution. Genetics 149:
445-458.
PASSML is described in the paper: Liò, P., N. Goldman, J. L. Thorne
and D. T. Jones. 1998. PASSML: combining evolutionary inference and protein
secondary structure prediction. Bioinformatics 14: 726-733,
and PASSML_TM is described in the paper:
Liò, P. and N. Goldman. 1999. Using protein structural information in
evolutionary inference: transmembrane proteins. Molecular Biology and
Evolution 16: 1696-1710.
The programs are available as ANSI C source code.
The source code is available via
its web page at
http://www.ebi.ac.uk/goldman/hmm/passml.html.
Rod Page (r.page (at) bio.gla.ac.uk), of
the Division of Environmental and Evolutionary Biology of the University
of Glasgow has written COMPONENT version 2.0, a program for
Windows systems for
comparing cladograms for use in phylogeny and biogeography studies. It has
many tree comparison and consensus methods, and far more features for
biogeographic studies (such as comparing species and area cladograms) than any
other package. It also can generate random trees. It runs under Windows 3.0
or higher. There is a review of the
program in: Slowinski, J. 1993. Review of Component, Version 2.0, by Roderick D. M. Page. Cladistics 9: 351-353. COMPONENT is
available free from its
web site at
http://taxonomy.zoology.gla.ac.uk/rod/cpw.html.
Source code in Pascal and documentation (as PDFs) are also available there.
A very early development Macintosh version ("COMPONENT Lite") is available
from the
COMPONENT Lite web site
at http://taxonomy.zoology.gla.ac.uk/rod/cplite/guide.html.
Rod Page
(r.page (at) bio.gla.ac.uk), of the Division of Environmental and
Evolutionary Biology of the University of Glasgow
and Michael Charleston (mcharles (at) it.usyd.edu.au)
of the Biological Informatics and Technology Centre of the School of
Information Technologies of the University of Sydney,
Sydney, Australia have written TREEMAP, version 3,
a free program for comparing host and parasite
phylogenies. It allows you to interactively compare host and parasite
trees, construct reconstructions of the history of the association, and
perform some simple randomisation tests of hypotheses of cospeciation.
It also can use Charleston's "Jungles" method to fit parasite trees to host
trees by
parsimony. That method is described in his paper: Charleston, M. A. 1998.
Jungles: A new solution to the host/parasite phylogeny reconciliation problem.
Mathematical Biosciences 149: 191-223.
For a description of the method used by TreeMap, see Page, R. D. M. 1994.
Parallel phylogenies: Reconstructing the history of host-parasite
assemblages. Cladistics 10: 155-173.
It can also estimate the number of randomized parasite trees that map as well
to the host tree as does the original parasite tree.
The program is available as a Java executable, which can be downloaded from
its web site
at http://www.it.usyd.edu.au/~mcharles/software/treemap/treemap3.html.
A beta release executable for Mac OS of version 2.0, called version 2.0β,
is available at
the
Treemap 2.0β web site
at http://www.it.usyd.edu.au/~mcharles/software/treemap/treemap.html.
An earlier version, 1.0,
is available as an executable for Mac OS or as an executable for Windows PCs.
Both can be downloaded from
its web site at http://taxonomy.zoology.gla.ac.uk/rod/treemap.html.
Fredrik Ronquist (Fredrik.Ronquist
(at) nrm.se)
of the Naturhistoriska riksmuseet, Stockholm, Sweden
has released DIVA version 1.2, a program for
DIspersal Vicariance Analysis. It is for analyses in historical
biogeography, where one is reconstructing the distribution history of a group
of organisms from the distribution areas of extant species and their phylogeny.
It is a parsimony-style analysis based on optimization of the numbers of
dispersal and extinction events, where one assumes that speciations divide
species ranges allopatrically. It does not make any assumption about
the hierarchical nature of vicariance events.
It was formerly available as either a Windows executable or a Mac OS executable from
its web page
at http://www.ebc.uu.se/systzoo/research/diva/diva.html.
A download whose contents are not well described, possibly including source
code, is currently available from the SourceForge site at
http://diva.sourceforge.net/.
Yu Yan,
of the College of Life Sciences
of Sichuan University, Chengdu, China
(yuyan (at) mnh.scu.edu.cn)
and A. J. Harris of the Department of Botany, Oklahoma State University,
Stillwater, Oklahoma, USA
have produced S-DIVA
(Statistical Dispersal-Vicariance Analysis),
version 1.9β and 1.5c, a tool for inferring biogeographic histories. It
uses statistical
dispersal-vicariance analysis to check the ancestral reconstructions
and evaluate the alternative ancestral areas at each node in the tree. S-DIVA
provides a graphical user interface
and can export high resolution graphical results for further analysis.
It expands on the methods provided by DIVA by using a Bayesian approach to
uncertainty in the phylogeny.
It is described in a paper: Yu, Y., A. J. Harris, and X. J. He. 2010. S-DIVA
(Statistical Dispersal-Vicariance Analysis): a tool for inferring
biogeographic histories. Molecular Phylogenetics and Evolution
56: 848-850.
It is available as a Windows executable. The Microsoft .NET Framework 2.0
must be installed for S-DIVA to run. S-DIVA can be
downloaded from
its web site
at http://mnh.scu.edu.cn/s-diva/
Fredrik Ronquist (Fredrik.Ronquist
(at) nrm.se)
of the Naturhistoriska riksmuseet, Stockholm, Sweden
has written TreeFitter version 1.0. It fits parasite
trees to a host tree, and can also use them to infer the best host tree.
The program, which has many options, uses an event-based parsimony
method, which penalizes events using penalties chosen to reflect their
improbability. The NEXUS file format is used for the tree files.
It is available from its web site at
http://www.ebc.uu.se/systzoo/research/treefitter/treefitter.html
as either a Windows executable or a Mac OS executable. An on-line manual
is available at the web site.
Steffen Junick, Daniel Merkle, and Martin Middendorf,
of the research group on Parallel Computing and Complex Systems of the Faculty of Mathematics and Computer Science
at the Universität Leipzig, Germany (Merkle is currently
in the Department of Mathematics and Computer Science at the University
of Southern Denmark)
(daniel (at) imada.sdu.dk)
have written Tarzan, version 0.9
(so named because pairs of trees that are cophylogenies have been
called "Jungles").
It is a program for the reconstruction of
cophylogenies (host/parasite trees or fits of trees to biogeographic vicariance
patterns).
Tarzan uses an event-based method to find cost-minimal reconstructions, or
reconstructions that have a minimal (or maximal) number of certain
evolutionary events. Five different types of evolutionary events are
considered: cospeciation, duplication, sorting, switching, and extinction. For
host-parasite systems cospeciation events refer to simultaneous host and
parasite speciation, duplication events are independent parasite speciations,
sorting events correspond to lineage sorting, and switches correspond to host shifts.
It is described in the paper:
Merkle, D. and M. Middendorf. 2005. Reconstruction of the cophylogenetic history of related phylogenetic trees with divergence timing information. Theory in Biosciences 123(4): 277-299.
Tarzan is available as Java code. It can be downloaded from
its web site
at http://pacosy.informatik.uni-leipzig.de/pv/Software/Tarzan/PV-Tarzan.engl.html
Pierre Legendre
of the Département de Sciences Biologiques
of the Université de Montréal, Montréal, Quebec
(Pierre.Legendre (at) umontreal.ca)
has written ParaFit, a program that tests host-parasite
evolution. It tests the hypothesis of coevolution between a clade of hosts and a clade of parasites. The null hypothesis of the global test is that the evolution of the two groups, as revealed by the two phylogenetic trees and the set of host-parasite association links, has been independent. The method requires some estimates of the phylogenetic trees or phylogenetic distances, and also a description of the host-parasite associations (H-P links) observed in nature. Two types of test are produced by the program: a global test of coevolution and a test on each H-P link.
It is described in the paper:
Legendre, P., Y. Desdevises and E. Bazin. 2002. A statistical test for host-parasite coevolution. Systematic Biology 51(2): 217-234.
It is available as FORTRAN source code, Windows executables, Powermac Mac
OS X executables, and Mac OS 9 executables. It can be downloaded from
its web site
at http://www.bio.umontreal.ca/casgrain/en/labo/parafit.html
Alexandros Stamatakis, A. Auch, J. Meier-Kolthoff, and M. Göker,
of the Laboratory for Computational Biology and Bioinformatics (LCBB)
of the École Polytechnique Fédérale de Lausanne,
Switzerland and of the Center for Bioinformatics
(ZBIT) of the University of Tübingen, Germany
and the Lehrstuhl für Spezielle Botanik und Mykologie of the
Botanisches Institut, Universität Tübingen
(Stamatakis is currently at Lehrstuhl XII - Machine Learning and Data Mining in
Bioinformatics at the Technische Universität München, Germany)
(Alexandros.Stamatakis (at) h-its.org)
have written AxParafit
(AleXandros's version of Parafit),
a parallel version of the program ParaFit for fitting host and parasite
trees. AxParafit and AxPcoords are highly optimized versions of Pierre
Legendre's ParaFit and DistPCoA
programs for statistical analysis of host-parasite coevolution. AxParafit has
also been parallelized with MPI (Message Passing Interface) for compute
clusters. The AxParafit site also includes a parallel version of the program
AxPcoords which is used with AxParafit.
They are described in the paper:
Stamatakis, A., A. Auch, J. Meier-Kolthoff, and M. Göker. 2007. AxPcoords and Parallel AxParafit: Statistical co-phylogenetic analyses on thousands of taxa. BMC Bioinformatics 8: 405.
They are available as C source code, Windows executables, Linux executables and
Mac OS X universal executables. They can be downloaded from
their web site
at http://icwww.epfl.ch/~stamatak/AxParafit.html
Daniel Merkle, Martin Middendorf, and Nicolas Wieseke
of the Department of Computer Science
of the University of Leipzig, Germany
(middendorf (at) informatik.uni-leipzig.de)
have released CoRe-PA
version 1.0, a tool for reconstructing the coevolutionary history of host-parasite systems. Like Tarzan, it uses an event-based method to find cost-minimal reconstructions. The events considered are cospeciation, sorting, duplication, and (host) switching.
With CoRe-PA you can design host parasite scenarios with a graphical editor,
generate random coevolutionary scenarios using the beta-split model with beta
0, -1 or -1.5, generate random coevolutionary scenarios by simulating
coevolution, generate random coevolutionary scenarios which retain the
character of given host parasite systems, handle non-binary host and parasite
phylogenies,
choose between different ways of handling host switches,
use divergence timing information,
compute the best reconstructions for a given set of costs,
compute the best cost vector for a given host parasite system (where the cost
vector fits best to the reconstructed event frequencies),
do randomization tests for given host parasite systems to analyze the
evidence for coevolution,
and export host parasite scenarios and their reconstructions to SVG graphics
files. It is described in the paper:
Merkle, D., M. Middendorf, and N. Wieseke. 2010.
A parameter-adaptive dynamic programming approach for inferring cophylogenies.
BMC Bioinformatics 11 (Suppl 1): S60.
It is available as Java executables, Windows executables, Linux executables and Mac OS X universal executables. It can be downloaded from
its web site
at http://pacosy.informatik.uni-leipzig.de/58-1-Downloads.html
Ran Libeskind-Hadas
of the Department of Computer Science of Harvey Mudd College in
Claremont, California
(hadas (at) cs.hmc.edu)
has released Jane
version 3, a cophylogeny reconstruction package. The input to Jane is a file
containing a host tree, a parasite tree, and a mapping of the tips of the
parasite tree to tips of the host tree. The user may specify the costs of each
of five types of events: cospeciations, duplications, host switches, losses,
and failure to diverge. Jane then endeavors to find least cost mappings of the
parasite tree onto the host tree subject to the given tip mapping. Jane also
has features for performing randomization tests. It is described in the paper:
Conow, C., D. Fielder, Y. Ovadia and R. Libeskind-Hadas. 2010.
Jane: A new tool for the cophylogeny reconstruction problem. Algorithms for
Molecular Biology 5: 16.
It is available as Java executables. It can be downloaded from
its web site
at http://www.cs.hmc.edu/~hadas/jane/
Athanasia C. Tzika, Raphaël Helaers, and Michel Milinkovitch
of the Laboratory of Artificial and Natural Evolution (LANE) of the Department
of Zoology and Animal Biology at the University of Geneva, Switzerland, and
Yves Van de Peer, of the Department of Plant Systems Biology of the University
of Gent, Belgium
(info (at) mantisdb.org) or (Michel.Milinkovitch (at) unige.ch)
have produced MANTiS
version 1.1, a program using molecular databases to reconstruct gene
duplications and losses. MANTiS builds a relational database integrating, in
a phylogenetic framework, all Ensembl genes, corresponding PANTHER molecular
functions and biological processes, as well as GNF, e-genetics, and HMDEG
expression data. It makes use of the Ensembl ortholog/paralog prediction
pipeline to reconstruct gene duplication events, and implements a dynamic
programming approach for the mapping of gene gains, duplications, and losses
on the phylogenetic tree.
It allows the user to identify gains and losses on specific branches of the
tree, view the genome content of ancestral species, identify statistically
over- or under-represented molecular functions, biological processes and anatomical
systems (expression data), and reconstruct the tissue specificity of gained,
duplicated, and lost genes.
It is described in the paper:
Tzika, A. C., R. Helaers, Y. Van de Peer and M. C. Milinkovitch. 2008. MANTiS: a phylogenetic framework for multi-species genome comparisons. Bioinformatics 24(2): 151-157.
It is available as Java executables with a Windows executable installer, a
Linux executable installer, and a Mac OS X universal executable installer. It
can be downloaded from
its web site
at http://www.mantisdb.org
Jonathan Bollback, of the
Institute of Science and Technology Austria,
Klosterneuburg, Austria (bollback
(at) ist.ac.at)
has written SIMMAP (SIMulation MAPping) version 1.5.2.
It stochastically maps characters onto a tree, given the tree and a
probability model of character change among discrete states. It can handle
general models of nucleotide substitution as well as the Mk model of change
of discrete morphological characters. It is also able
to estimate covariation between molecular or morphological characters,
estimate dN and dS, while accounting for model and tree uncertainty,
and estimate a wide variety of descriptive statistics for patterns in
molecular or morphological evolution.
The method it uses was introduced in the papers:
- Nielsen, R. 2002. Mapping
mutations on phylogenies. Systematic Biology 51: 729-739.
- Huelsenbeck, J. P., R. Nielsen, and J. P. Bollback. 2003. Stochastic
mapping of morphological characters. Systematic Biology 52:
131-158.
- Bollback, J. P. 2006. SIMMAP: Stochastic character mapping of discrete traits on phylogenies. BMC Bioinformatics 7:88.
SIMMAP is a Mac OS X executable, available
from its web page at
http://www.simmap.com
at the University of Copenhagen.
Liran Carmel, Yuri I. Wolf, Igor B. Rogozin, and Eugene V. Koonin
of the National Center for Biotechnology Information, National Library of Medicine
of the National Institutes of Health, Bethesda, Maryland (Carmel is now at
the Department of Computer Science at Hebrew University, Jerusalem, Israel,
with email address
liran.carmel (at) carmellab.com) have released EREM
(Evolutionary Reconstruction by Expectation-Maximization),
a program for parameter estimation and ancestral reconstruction for evolution
of binary characters. EREM assumes a probabilistic model for evolution of
binary characters on a given bifurcating tree. EREM estimates rates of change
between states 0 and 1 of the model, and reconstructs ancestral states
(presence and absence in internal nodes) and the location of events (gains
and losses along branches). It can also be used to simulate data on a tree.
It is available as C++ source code and Windows executables. It can be downloaded from
its web site
at http://carmelab.huji.ac.il/software/EREM/erem.html
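The kind of two-state model EREM assumes can be sketched as follows. This is not EREM's code: it assumes one gain rate (0 to 1) and one loss rate (1 to 0) shared by all branches, a tree written as nested tuples (left, right, left branch length, right branch length) with tip names as strings, and a root state drawn from the stationary distribution. It computes the likelihood of a single binary character by Felsenstein's pruning algorithm; an expectation-maximization procedure such as EREM's would adjust the rates to increase the total log-likelihood over all characters.

```python
import math


def transition_matrix(gain, loss, t):
    """P[i][j] = Pr(state j after time t | state i) for the two-state 0/1 model."""
    total = gain + loss
    decay = math.exp(-total * t)
    p01 = gain / total * (1.0 - decay)  # probability of ending in 1 when starting in 0
    p10 = loss / total * (1.0 - decay)  # probability of ending in 0 when starting in 1
    return [[1.0 - p01, p01], [p10, 1.0 - p10]]


def partial_likelihood(node, states, gain, loss):
    """Pruning step: node is a tip name or (left, right, t_left, t_right)."""
    if isinstance(node, str):
        s = states[node]  # observed presence (1) or absence (0) at this tip
        return [1.0 if s == 0 else 0.0, 1.0 if s == 1 else 0.0]
    left, right, tl, tr = node
    L = partial_likelihood(left, states, gain, loss)
    R = partial_likelihood(right, states, gain, loss)
    Pl = transition_matrix(gain, loss, tl)
    Pr = transition_matrix(gain, loss, tr)
    return [
        sum(Pl[i][j] * L[j] for j in (0, 1)) * sum(Pr[i][j] * R[j] for j in (0, 1))
        for i in (0, 1)
    ]


def site_likelihood(tree, states, gain, loss):
    """Likelihood of one character, with the root at the stationary distribution."""
    root = partial_likelihood(tree, states, gain, loss)
    pi0 = loss / (gain + loss)  # stationary probability of state 0
    return pi0 * root[0] + (1.0 - pi0) * root[1]
```

Each row of the transition matrix sums to 1, and the likelihoods of all possible presence/absence patterns at the tips also sum to 1, which is a useful check on any implementation of this model.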
Antonio Marco and Ignacio Marín
of the Departamento de Genética
of the Universitat de València and of the Instituto de Biomedicina de
València, of the Consejo Superior de Investigaciones Científicas,
València, Spain
(marcasan (at) uv.es)
have written Tree Tracker, a Perl script to detect
overrepresented clusters in a tree. It takes a user-supplied tree and a
list of genes. The program uses a permutation analysis of ranked clusters to
test whether clusters within the tree are overrepresented for one state of a
two-state classification of the genes. It is described in the paper:
Marco, A. and I. Marín. 2007. A general strategy to determine the
congruence between a hierarchical and a non-hierarchical classification.
BMC Bioinformatics 8: 442.
It is available as a Perl script. It can be downloaded from
its web site
at http://www.uv.es/~genomica/treetracker/
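Tree Tracker's own procedure ranks clusters before permuting; the sketch below shows only the simpler underlying idea of a permutation test for a single cluster. It assumes each tip of the tree carries a 0/1 label, shuffles the labels among the tips, and reports how often a random labelling puts at least as many state-1 genes into the cluster as were observed. The function name and the choice of a one-sided test are assumptions for illustration.

```python
import random


def clade_enrichment_pvalue(clade, labels, n_perm=10000, seed=1):
    """One-sided permutation p-value that `clade` holds more state-1 genes than expected.

    clade:  set of tip names forming one cluster of the tree (hypothetical input).
    labels: dict mapping every tip name to 0 or 1.
    """
    rng = random.Random(seed)
    tips = list(labels)
    values = [labels[t] for t in tips]
    observed = sum(labels[t] for t in clade)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(values)  # random reassignment of states to tips
        permuted = dict(zip(tips, values))
        if sum(permuted[t] for t in clade) >= observed:
            hits += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return (hits + 1) / (n_perm + 1)
```

With many clusters tested at once, the resulting p-values would also need a correction for multiple testing, which this sketch does not attempt.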
Jianzhi George Zhang of the Department of Ecology and Evolutionary
Biology of the University of Michigan, Ann Arbor, Michigan
(jianzhi (at) umich.edu)
produced Ancestor, a program for inferring the ancestral
protein sequence of a set of species from their protein sequences.
The tree of the sequences is inferred by the minimum evolution
distance matrix method of Rzhetsky and Nei. It can estimate the ancestral
sequences at all nodes of the tree. The methods are described in a
paper: Zhang, J., and M. Nei. 1997. Accuracies of ancestral amino acid
sequences inferred by the parsimony, likelihood, and distance methods.
Journal of Molecular Evolution 44: S139-S146.
The program is distributed as a DOS executable with C source code. It will
run in a Windows Command Prompt window. It is
available from Masatoshi Nei's lab software site
at https://homes.bio.psu.edu/people/faculty/nei/software.htm
Jianzhi George Zhang of the Department of Ecology and Evolutionary
Biology of the University of Michigan, Ann Arbor, Michigan
(jianzhi (at) umich.edu)
has produced ANC-GENE, a program to infer ancestral
protein and DNA sequences from DNA sequences of a coding gene when the
phylogeny of the species is known. It first infers the amino acids by a
distance-based Bayesian method, and then infers the underlying nucleotide
sequences by fixing the inferred amino acids. It estimates branch lengths
on the phylogeny by a distance method before inferring the ancestral sequences.
It uses one of two possible models of amino acid changes (the Poisson-f or
JTT-f models), as well as the Jukes-Cantor model of nucleotide substitution.
It outputs both inferred pathways of change at each amino acid position and
inferred sequences at each node of the tree. The methods are discussed in
this paper: Zhang, J., and M. Nei. 1997. Accuracies of ancestral amino acid
sequences inferred by the parsimony, likelihood, and distance methods.
Journal of Molecular Evolution 44 (Suppl 1): S139-S146.
ANC-GENE is available as a DOS executable and C source code. The executable
can be run in
Windows in a Command Prompt window. It can be downloaded from
the Nei laboratory software web site
at https://homes.bio.psu.edu/people/faculty/nei/software.htm
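For reference, the Jukes-Cantor model mentioned above leads to a simple distance correction of the kind a distance-based branch-length estimate could use: d = -(3/4) ln(1 - (4/3) p), where p is the proportion of differing sites between two aligned sequences. The sketch below is a generic implementation of that formula, not code taken from ANC-GENE, and it assumes ungapped sequences of equal length with no ambiguity codes.

```python
import math


def p_distance(x: str, y: str) -> float:
    """Proportion of aligned sites at which the two sequences differ."""
    if len(x) != len(y) or not x:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(a != b for a, b in zip(x, y)) / len(x)


def jukes_cantor_distance(x: str, y: str) -> float:
    """Expected substitutions per site under the Jukes-Cantor model."""
    p = p_distance(x, y)
    if p >= 0.75:
        raise ValueError("distance is undefined: sequences are saturated")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

For example, one difference in ten sites (p = 0.1) gives a corrected distance of about 0.1073, slightly larger than p because the correction allows for multiple changes at the same site.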
Xun Gu, of the Department of Genetics, Development and Cell Biology
and the Center for Bioinformatics and Biological Statistics at
Iowa State University, Ames, Iowa (xgu (at) iastate.edu) has
released Mgenome version 1.0. It finds trees for multiple
genome rearrangement by signed reversals. For a collection of genomes
represented by signed permutations of genes, it finds a tree that connects
all given genomes by reversal paths such that the number of all signed
reversals is as small as possible. The methods seem to be described in a paper:
Wu, S., and X. Gu. 2003. Algorithms for multiple genome rearrangement by
signed reversals. Pacific Symposium on Biocomputing 8: 363-374,
although the paper does not refer to the program.
The paper is available as a PDF at the Gu lab web site.
The program is available as a Windows executable
at the Gu lab software web site at
http://xungulab.com/software.html.
Mathieu Blanchette, of the School of Computer Science, McGill
University, Montréal, Québec
(blanchem (at) mcb.mcgill.edu) has written BPAnalysis,
a program that infers phylogenies from a set of gene orders by minimizing
the number of breakpoints required in genome rearrangement (this is not the
same as minimizing the number of rearrangement events). It is a C++
program, distributed both as source code and as an executable
for DOS and Windows. The method employed is described in the paper:
Sankoff, D. and M. Blanchette. 1998. Multiple genome rearrangement and
breakpoint phylogeny. Journal of Computational Biology 5:
555-570. It is available from
Blanchette's software page
at http://www.mcb.mcgill.ca/~blanchem/software.html
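To make the distinction between breakpoints and rearrangement events concrete, the sketch below counts breakpoints between two signed gene orders. It assumes both genomes are linear, contain the same genes exactly once, and are written as lists of signed integers (a negative sign meaning the gene lies on the opposite strand). BPAnalysis itself searches for a breakpoint phylogeny over many genomes, which this sketch does not attempt; the function names are hypothetical.

```python
def adjacencies(order):
    """Signed adjacencies of a gene order, each stored in both reading directions."""
    adj = set()
    for a, b in zip(order, order[1:]):
        adj.add((a, b))
        adj.add((-b, -a))  # the same adjacency read from the other strand
    return adj


def breakpoints(g1, g2):
    """Adjacent gene pairs in g1 that are not adjacent, in either orientation, in g2."""
    if sorted(map(abs, g1)) != sorted(map(abs, g2)):
        raise ValueError("genomes must contain the same genes")
    adj2 = adjacencies(g2)
    return sum((a, b) not in adj2 for a, b in zip(g1, g1[1:]))
```

For example, reversing the segment 2, 3 in the order 1, 2, 3, 4 gives 1, -3, -2, 4, which differs from the original by one rearrangement event but by two breakpoints.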
Benjamin Vernot, Aiton Goldman and Dannie Durand
of the Departments of Biological Sciences and Computer Science
of Carnegie Mellon University, Pittsburgh, Pennsylvania
(notung (at) cs.cmu.edu)
have released Notung version 2.6, a unified framework for
incorporating gene duplication/loss parsimony in phylogenetic inference.
Given a gene and species tree as input, Notung can:
(1) Reconcile the trees, (2) Estimate upper and lower bounds on duplication
times in terms of speciation events, (3) Root an unrooted tree by minimizing
gene duplications and losses, and (4) Rearrange regions of a gene tree with
weak support in the sequence data to obtain alternate hypotheses.
Notung's graphical user interface supports exploratory data analysis
of very large trees and rapid review of many alternate
hypotheses. Notung also provides a command-line interface for
automated analysis of many trees in high-throughput genomic studies.
Notung can read and save trees in Newick, NHX, or Notung file format.
Images can be saved in PNG format for use in
publications.
Notung is freely available in a Java executable which can run on Mac OS X,
Windows and Linux systems. The distribution includes the Notung Java executable,
a manual in PDF format with worked examples, sample trees, and sample scripts for
automated analysis. Java 1.4 or higher is required.
It is described in the paper: Durand, D., B. V. Halldorsson, and B. Vernot.
2006. A hybrid micro-macroevolutionary approach to gene tree reconstruction.
Journal of Computational Biology 13(2): 320-335.
It can be downloaded from
its web site
at http://www.cs.cmu.edu/~durand/Notung/
Olivier Elemento,
then of the IMGT, the International imMunoGeneTics database and the LIRMM (Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier)
of the Université de Montpellier II, Montpellier, France
(He is now at the Institute for Computational Biomedicine
at Weill Cornell Medical College in New York City and his email
address is
ole2001 (at) med.cornell.edu)
has written DTscore
(Duplication, Tandem - score),
a distance-based tandem duplication tree reconstruction program. It takes as input a distance matrix between copies in a family of tandem repeats. The rows and columns need to be ordered in the same way as the copies are in the locus. DTscore can be applied to relatively large datasets (more than a hundred copies).
It is described in the paper:
Elemento, O. and O. Gascuel. 2002. A fast and accurate distance algorithm to reconstruct tandem duplication trees. Bioinformatics 18: S92-S99.
It is available as C source code, Windows executables and Linux executables. It can be downloaded from
its web page at
http://www.lirmm.fr/%7Eelemento/DTscore/ and also at
the ATGC web site
at http://www.atgc-montpellier.fr/dtscore/binaries.php
Michael Sanderson
of the Department of Ecology and Evolutionary Biology
of the University of Arizona, Tucson, Arizona
(sanderm (at) email.arizona.edu)
has written gtp
(Gene Tree Parsimony),
version 0.15, a program to reconcile gene trees with species trees using a
gene tree parsimony criterion. The program reads a NEXUS-format file
containing the species tree and a series of gene trees, which have at their
tips the names of the species. The gene trees are reconciled with the species
tree using a gene duplication count. The gene trees can either be considered
to be rooted as given, or optionally they can be considered to be unrooted, in
which case the count of duplications is made by considering the minimum over
all possible rootings of each gene tree. The methods are described in the paper:
Zmasek, C. M. and S. R. Eddy. 2001. A simple algorithm to infer gene
duplication and speciation events on a gene tree. Bioinformatics 17: 821-828.
It is available as C source code. It can be downloaded from
its web site
at http://loco.biosci.arizona.edu/gtp/gtp.html
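The duplication count can be illustrated with the mapping approach of Zmasek and Eddy: each gene-tree node is mapped to the lowest species-tree node whose species include all the species below it, and a gene-tree node counts as a duplication when it maps to the same species-tree node as one of its children. The sketch below assumes rooted trees written as nested Python tuples whose tips are species names, with every gene-tree species present in the species tree. It does not implement gtp's option of minimizing over all rootings of an unrooted gene tree, and the function names are hypothetical.

```python
def leaves(t):
    """Set of tip names below a node of a nested-tuple tree."""
    if isinstance(t, str):
        return frozenset([t])
    return frozenset().union(*(leaves(c) for c in t))


def clusters(species_tree):
    """Leaf sets (clusters) of every node in the species tree."""
    out = [leaves(species_tree)]
    if not isinstance(species_tree, str):
        for child in species_tree:
            out.extend(clusters(child))
    return out


def lca(cluster_list, names):
    """Smallest species-tree cluster containing all names, i.e. their lowest common ancestor."""
    return min((c for c in cluster_list if names <= c), key=len)


def count_duplications(gene_tree, species_tree):
    """Number of gene-tree nodes that map to the same species node as a child."""
    cl = clusters(species_tree)
    dups = 0

    def walk(node):
        nonlocal dups
        if isinstance(node, str):
            return lca(cl, frozenset([node]))
        child_maps = [walk(c) for c in node]
        m = lca(cl, frozenset().union(*child_maps))
        if any(cm == m for cm in child_maps):
            dups += 1  # a duplication rather than a speciation
        return m

    walk(gene_tree)
    return dups
```

For instance, with species tree ((human, chimp), mouse), the gene tree ((human, chimp), (human, mouse)) requires one duplication at its root, while a gene tree with the same shape as the species tree requires none.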
Notices added in compliance with University of Washington
requirements for web sites hosted at the University:
Privacy
Terms