Phylogeny Programs (continued)

Tatsuya Ota, most recently at the Hayama Information Network Center at the Graduate University of Advanced Studies, Hayama, Japan (ota (at) soken.ac.jp) has written a package, DISPAN (Genetic Distance and Phylogenetic Analysis), which computes for gene frequency data the heterozygosity, gene diversity, Nei's standard genetic distance or the DA distance, and their standard errors. It also constructs phylogenies using the neighbor-joining (NJ) method or the UPGMA method. These trees can also be bootstrapped. A tree editor allows the user to rearrange the tree and print it out. The package consists of two programs, GNKDST and TREEVIEW. The first is a rewrite of a program by A. K. Roychoudhury, Y. Tateno, D. Graur, N. Saitou, and R. Schwartz; the second was written by Koichiro Tamura. DISPAN is distributed as DOS executables (which can run under Windows in a Command tool window). The package and its Readme file are available at the IUBIO software server at http://iubio.bio.indiana.edu/soft/molbio/ibmpc/ and it is described on a web page at http://www.bio.psu.edu/People/Faculty/Nei/Lab/dispan2.htm among the software pages of Masatoshi Nei's laboratory at Molecular Evolution and Phylogenetics at Pennsylvania State University.
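
As a rough illustration of the kind of quantity computed from gene frequency data, Nei's standard genetic distance can be written as D = -ln(I), where I is the normalized identity of alleles between two populations. The sketch below is not DISPAN's code: the function name and the input layout (a list of loci, each a list of allele frequencies in the same allele order) are assumptions made for illustration, and it omits the sample-size bias correction and the standard errors that the package reports.

```python
import math


def nei_standard_distance(freqs_x, freqs_y):
    """Nei's standard genetic distance D = -ln(Jxy / sqrt(Jx * Jy)).

    freqs_x, freqs_y: one entry per locus, each entry a list of allele
    frequencies for that locus (hypothetical input layout).
    No sample-size bias correction is applied.
    """
    jx = jy = jxy = 0.0
    for locus_x, locus_y in zip(freqs_x, freqs_y):
        jx += sum(p * p for p in locus_x)                       # identity within X
        jy += sum(q * q for q in locus_y)                       # identity within Y
        jxy += sum(p * q for p, q in zip(locus_x, locus_y))     # identity between X and Y
    # Dividing each sum by the number of loci would cancel in the ratio.
    identity = jxy / math.sqrt(jx * jy)
    return -math.log(identity)
```

Two populations with identical allele frequencies at every locus give a distance of zero, and the distance grows as the populations share fewer alleles.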

Kevin Howe, Alex Bateman, and Richard Durbin of the Wellcome Trust Sanger Institute in Hinxton, U.K. (klh (at) sanger.ac.uk, agb (at) sanger.ac.uk, and rd (at) sanger.ac.uk) have released QuickTree, a program for rapid calculation of Neighbor-Joining trees. The algorithms used are O(n³) like most other implementations of that method, but have been optimized for speed. The program is described in the paper: Howe, K., A. Bateman, and R. Durbin. 2002. QuickTree: building huge Neighbour-Joining trees of protein sequences. Bioinformatics 18: 1546-1547. QuickTree is distributed as C source code from its web site at http://www.sanger.ac.uk/resources/software/quicktree/
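
To make the O(n³) remark concrete, here is a minimal sketch of the pair-selection step of the canonical neighbor-joining algorithm. It is not QuickTree's code; the function name is hypothetical. Each call scans all O(n²) pairs, and since one pair is joined per step for roughly n steps, the overall cost is O(n³).

```python
def nj_pick_pair(dist):
    """Return ((i, j), q) for the pair minimizing the NJ criterion
    Q(i, j) = (n - 2) * d(i, j) - sum_k d(i, k) - sum_k d(j, k).

    dist: symmetric n x n distance matrix as a list of lists.
    """
    n = len(dist)
    row_sums = [sum(row) for row in dist]
    best_pair, best_q = None, float("inf")
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
            if q < best_q:
                best_pair, best_q = (i, j), q
    return best_pair, best_q
```

A full implementation would join the selected pair into a new node, recompute distances from that node to the remaining taxa, and repeat until three nodes remain.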

Travis Wheeler of the Janelia Farm Research Campus of the Howard Hughes Medical Institute (travis (at) nimbletwist.com) has released NINJA version 1.0.4, software for inferring large-scale neighbor-joining phylogenies from distance matrices. NINJA is argued by Wheeler to be the fastest available tool for computing correct neighbor-joining phylogenies for inputs of more than 10,000 sequences, and to be more than 10x faster than the fastest implementation of the canonical neighbor-joining algorithm (QuickTree). It is described in the paper: Wheeler, T. J. 2009. Large-scale neighbor-joining with NINJA. pp. 375-389 in Salzberg, S. L. and T. Warnow (eds.), Proceedings of the 9th Workshop on Algorithms in Bioinformatics, WABI 2009. Springer, Berlin. It is available as Java source code and Java executables. It can be downloaded from its web site at http://nimbletwist.com/software/ninja/ It is also available as part of the Mesquite package of Java programs.

Sudhir Kumar (S.Kumar (at) asu.edu), of the Center for Evolutionary Functional Genomics at Arizona State University, Tempe, Arizona has written PHYLTEST, version 2.0. It is a DOS executable program for testing phylogenetic hypotheses about four clusters of DNA sequences. It implements comparison of three alternative phylogenetic trees for four monophyletic clusters of sequences, the four-cluster analysis: Rzhetsky, A., S. Kumar, and M. Nei. 1995. Four-cluster analysis: a simple method to test phylogenetic hypotheses. Molecular Biology and Evolution 12: 163-167. It can also carry out the interior branch test, which tests whether an interior branch length is significantly longer than zero (Rzhetsky, A. and M. Nei. 1992. A simple method for estimating and testing minimum-evolution trees. Molecular Biology and Evolution 9: 945-967), as well as the estimation of average pairwise distances (and standard errors) within and between clusters of sequences, relative rate tests, and the computation of the time of divergence. PHYLTEST is distributed from the IUBIO software server at http://iubio.bio.indiana.edu/soft/molbio/ibmpc/. The "readme" file for it is distributed there and is also available at Masatoshi Nei's lab software web page at Pennsylvania State University at http://www.bio.psu.edu/People/Faculty/Nei/Lab/phyltest2.htm. It is distributed as a self-extracting archive, containing the executables and examples, with a Readme file. The program can be run under DOS or in the Command tool of Windows.

[TREECON icon here] TREECON version 1.3b is a software package developed by Yves Van de Peer of the Bioinformatics and Evolutionary Genomics group at the Department of Plant Systems Biology, University of Ghent, Belgium (yves.vandepeer (at) psb.ugent.be) for the construction and drawing of phylogenetic trees based on distance data. Several equations are included to convert dissimilarity into evolutionary distance and several methods (such as neighbor-joining) are included for inferring the tree topology. It also includes bootstrap analysis and has good facilities for rerooting and drawing trees. The program is available for free for academic use; for other use you are asked to contact its author. It runs on PCs under Windows. It is described in several papers:

Van de Peer, Y. and R. De Wachter. 1993. TREECON: a software package for the construction and drawing of evolutionary trees. Computer Applications in the Biosciences (CABIOS) 9: 177-182.
Van de Peer, Y. and R. De Wachter. 1994. TREECON for Windows: a software package for the construction and drawing of evolutionary trees for the Microsoft Windows environment. Computer Applications in the Biosciences (CABIOS) 10: 569-570.
Van de Peer, Y. and R. De Wachter. 1997. Construction of evolutionary distance trees with TREECON for Windows: accounting for variation in nucleotide substitution rate among sites. Computer Applications in the Biosciences (CABIOS) 13: 227-230.
Van de Peer, Y. 2003. Analysis of nucleotide sequences using TREECON. pp. 236-255 in The Phylogenetic Handbook, edited by M. Salemi and A.-M. Vandamme. Cambridge University Press, Cambridge, U.K.

It is described on its web site at http://bioinformatics.psb.ugent.be/software_details.php?id=3, and it can be downloaded from there, and an online manual is also viewable there.

Andrey Rzhetsky (andrey.rzhetsky (at) dbmi.columbia.edu) of the Department of Biomedical Informatics at Columbia University, New York and Masatoshi Nei of the Institute of Molecular and Evolutionary Genetics at Pennsylvania State University have produced METREE version 1.2, a program for carrying out the minimum-evolution distance matrix method. METREE runs on DOS systems and on Windows (under a Command Tool window). It computes minimum evolution distance matrix trees from DNA and amino acid sequence data and tests the statistical significance of topological differences and of the branch lengths. Different distance matrix measures may be used. The package is menu driven, and the TREEVIEW program written by Koichiro Tamura for visualizing and printing out the final tree is also included. The method is described in the paper by A. Rzhetsky and M. Nei. 1992. A simple method for estimating and testing minimum-evolution trees. Molecular Biology and Evolution 9: 945-967, and the program is described in a paper by A. Rzhetsky and M. Nei. 1994. METREE: a program package for inferring and testing minimum-evolution trees. Computer Applications in the Biosciences (CABIOS) 10: 409-412. METREE is distributed from the IUBIO server at http://iubio.bio.indiana.edu/soft/molbio/ibmpc/. A Readme file is also available at its listing on the software page of Masatoshi Nei's laboratory at http://www.bio.psu.edu/People/Faculty/Nei/Lab/software.htm

Richard Desper, most recently of Ziheng Yang's lab at the Department of Biology, University College, London, U.K., and Olivier Gascuel of the LIRMM (Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier), Montpellier, France (gascuel (at) lirmm.fr) have written FastME, a fast program for the minimum evolution distance matrix method. It is described as faster than neighbor-joining methods, more accurate than them, and as accurate as least squares methods. It can analyze multiple data sets as part of bootstrapping analyses. Its methods are described in two papers:

Desper, R., and O. Gascuel. 2002. Fast and accurate phylogeny reconstruction algorithms based on the minimum-evolution principle. pp. 357-374 in Proceedings of the Second International Workshop on Algorithms in Bioinformatics, WABI 2002, Rome, Italy, September 17-21, 2002. ed. R. Guigó and D. Gusfield. Lecture Notes in Computer Science, no. 2452. Springer-Verlag, Berlin.
Desper, R. and O. Gascuel. 2002. Fast and accurate phylogeny reconstruction algorithms based on the minimum-evolution principle. Journal of Computational Biology 9: 687-705.

FastME is distributed as Windows, Linux, and Mac OS X executables and C source code from its web site at http://www.atgc-montpellier.fr/fastme/binaries.php.

Olivier Gascuel (gascuel (at) lirmm.fr) of the Laboratoire d'Informatique, de Robotique et de Micro-Electronique de Montpellier (LIRMM) of the Universite de Montpellier II, France has written BIONJ, an improved version of Neighbor-Joining based on a simple model of sequence data. It follows the same agglomerative scheme as NJ but uses a simple, first-order model of the variances and covariances of evolutionary distance estimates. This model is appropriate when these estimates are obtained from aligned sequences. It retains the speed advantages of Neighbor-Joining while using a slightly different criterion to select pairs of taxa to join, one which will perform better when distances between taxa are large. It is described in the paper: Gascuel, O. 1997. BIONJ: An improved version of the NJ algorithm based on a simple model of sequence data. Molecular Biology and Evolution 14: 685-695. C source code and Windows, Linux, and Mac OS X executables of BIONJ are available at its web page at http://www.atgc-montpellier.fr/bionj/binaries.php. It is also available as a web server.

William J. Bruno of the Los Alamos National Laboratory (billb (at) lanl.gov) has released nneighbor, a modification of the PHYLIP Neighbor-Joining distance matrix program that avoids negative branch lengths (its name means Non-Negative Neighbor). The program is available as generic C code. It is available at one of Bruno's web pages at http://www.t10.lanl.gov/billb/related_links.html.

William J. Bruno, Nicholas D. Socci, and Aaron L. Halpern of the Los Alamos National Laboratory (billb (at) lanl.gov) have produced weighbor (Weighted nEIGHBOR-joining or perhaps WEIGHted neighBOR-joining), version 1.2.1, a distance matrix program for performing a weighted version of the Neighbor-Joining method. The weighting used is for nucleotide sequences and more correctly reflects the uncertainty of the longer distances in the tree than does ordinary Neighbor-Joining. It is thus closer to approximating maximum likelihood and will be more accurate than Neighbor-Joining on large trees. It is described in a paper: Bruno, W. J., N. D. Socci, and A. L. Halpern. 2000. Weighted neighbor joining: a likelihood-based approach to distance-based phylogeny reconstruction. Molecular Biology and Evolution 17: 189-197. Weighbor is available as C source code and as Windows, IRIX, Solaris, and Linux executables (plus some older executables for DOS and Mac OS) from its web site at http://www.t10.lanl.gov/billb/weighbor/index.html. It is also available as a web server at the Institut Pasteur in Paris.

Paul Lewis (plewis (at) uconnvm.uconn.edu), of the Department of Ecology and Evolutionary Biology, University of Connecticut, and Dmitri Zaykin, then of North Carolina State University, have written GDA version 1.1, a set of programs to carry out many of the statistical methods for analyzing gene frequencies and sequence data that are described in Bruce Weir's book Genetic Data Analysis II (Sinauer Associates, Sunderland, Massachusetts, 1996). The programs run under Windows and include the calculation of UPGMA and Neighbor-Joining phylogenies. The program is described in a Web site maintained by Paul Lewis at http://hydrodictyon.eeb.uconn.edu/people/plewis/software.php There is also a link there to a command-line-only version of GDA by Chris Basten that runs under Mac OS X. The relevant feature for the purposes of this listing is the ability of the programs to compute a number of distances.

[TFPGA icon here] Mark Miller (MarkPerryMiller (at) gmail.com) of the Forest and Rangeland Ecosystem Science Center of the U.S. Geological Survey has written TFPGA (Tools For Population Genetics Analyses), a Windows program for the analysis of allozyme and molecular population genetic data. It can calculate genetic distances. In addition, this program calculates descriptive statistics and F-statistics, and performs tests for Hardy-Weinberg equilibrium, exact tests for genetic differentiation, Mantel tests, and UPGMA cluster analyses. Additional features include the ability to analyze hierarchical data sets as well as data from either codominant markers such as allozymes or dominant markers such as AFLPs or RAPDs. It is available from his software web page at http://www.marksgeneticsoftware.net/ as a Windows executable.

[Genetix icon here] François Bonhomme of the Institut des Sciences de l'Evolution of the Université de Montpellier, France, along with K. Belkhir, P. Borsa, N. Raufaste and L. Chikhi (the program support email address is genetix (at) univ-montp2.fr) has released Genetix version 4.05. This is a Windows executable program that does a wide variety of population genetic procedures. The part relevant to the present list is that it computes the Nei and the Cavalli-Sforza genetic distances, both with and without bias correction. It also calculates F statistics and linkage disequilibrium, and performs permutation tests on the results. One advantage (or limitation, depending on your perspective) is that the interface is in French. Genetix is available from its web site (in French) at http://www.univ-montp2.fr/~genetix/genetix/genetix.htm.

Steven Kalinowski of the Department of Ecology of Montana State University, Bozeman, Montana (skalinowski (at) montana.edu) has written TreeFit version of 17 Dec 2007, a program to compare the fit of UPGMA and Neighbor-Joining trees to the same data. TreeFit creates neighbor-joining and UPGMA trees from a genetic distance matrix, and then compares the observed genetic distance between populations with the genetic distance in the tree. The similarity between these distances is expressed as R-squared, the correlation measure used in regression analyses. TreeFit can take as input either a distance matrix or genotypes in GenePop format; in the latter case it computes Fst genetic distances itself. It is available as Windows executables. It requires some pieces of the Microsoft .NET framework to run. It can be downloaded from its web site at http://www.montana.edu/kalinowski/Software/TreeFit.htm
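
The comparison of observed distances with the distances implied by a tree can be sketched as follows. This is only an illustration under stated assumptions: the function name is hypothetical, the tree distances are taken as an already computed matrix of path lengths, and the formula used is the usual regression-style R-squared (one minus the ratio of residual to total sum of squares), which may differ in detail from what TreeFit itself computes.

```python
def tree_fit_r_squared(observed, fitted):
    """Regression-style R-squared between observed and tree distances.

    observed, fitted: symmetric n x n matrices (lists of lists);
    only the pairs with i < j are used.
    """
    n = len(observed)
    obs = [observed[i][j] for i in range(n) for j in range(i + 1, n)]
    fit = [fitted[i][j] for i in range(n) for j in range(i + 1, n)]
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - f) ** 2 for o, f in zip(obs, fit))   # misfit of the tree
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)         # spread of the data
    return 1.0 - ss_res / ss_tot
```

A tree that reproduces every observed distance exactly gives a value of 1; poorer fits give lower values.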

Immanuel Yap, now of the Department of Plant Breeding and Genetics at Cornell University, Ithaca, New York (noelyap (at) ascus.plbr.cornell.edu) and Rebecca Nelson, now of the Department of Plant Breeding and Genetics and the Department of Plant Pathology and Plant-Microbe Biology, Cornell University (rjn7 (at) cornell.edu), when they were at the International Rice Research Institute in Manila, Philippines, released Winboot, a package of two programs for calculating UPGMA trees for binary (0/1) data and computing bootstrap support for groups on those trees. The programs can read 0/1 data in either PHYLIP format or a tabbed Excel-like format. They can compute a large variety of simple similarity coefficients and carry out bootstrapping on the input file before doing so. The Winboot program uses this to compute a bootstrap consensus tree. The Windist program is similar, but instead writes out the bootstrap sampled distance matrices to an output file in PHYLIP or NTSYS format. The package contains some code from PHYLIP, by agreement. Winboot is available as Windows executables from its web site at http://archive.irri.org/science/software/winboot.asp.
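
Bootstrapping of binary data of this kind can be sketched as resampling the characters (columns) with replacement and recomputing a similarity matrix for each replicate. The code below is an illustration only, not Winboot's or Windist's code; the function names are hypothetical, and the simple matching coefficient stands in for whichever similarity coefficient the user would choose.

```python
import random


def simple_matching(a, b):
    """Fraction of characters at which two 0/1 profiles agree."""
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)


def bootstrap_similarity_matrices(data, replicates, seed=0):
    """Return one similarity matrix per bootstrap replicate.

    data: list of rows (one per taxon), each a list of 0/1 characters.
    Characters are resampled with replacement, keeping the original count.
    """
    rng = random.Random(seed)
    n_chars = len(data[0])
    results = []
    for _ in range(replicates):
        cols = [rng.randrange(n_chars) for _ in range(n_chars)]
        resampled = [[row[c] for c in cols] for row in data]
        matrix = [[simple_matching(r1, r2) for r2 in resampled]
                  for r1 in resampled]
        results.append(matrix)
    return results
```

Each replicate matrix would then be clustered (for example by UPGMA), and the support for a group is the fraction of replicate trees that contain it.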

María Jesús Martín and Joaquín Dopazo, then of the R&D Department of TDI (TDI-EMBNet), Spain (Dopazo is now at the Bioinformatics Department at the Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain: jdopazo (at) cipf.es, formerly dopazo (at) tdi.es) have developed OSA (Optimal Sequence Analysis), version 2.0. It finds, within large sequences, those regions with an information content similar to that of the whole sequence, and it selects, among them, the shortest ones. This program was formerly called ORF. The algorithm used is based on comparing pairwise genetic distances, calculated for windows of variable size and position, to the distance matrix obtained for the whole sequence. Either uncorrected genetic distances or Jukes-Cantor distances can be used. Two methods are used to set cutoff levels: simulation-based significance values or bootstrapping. A variety of options for search among possible windows are available. The method has been described in a paper: M. J. Martín, F. Gonzalez-Candelas, F. Sobrino and J. Dopazo. 1995. A method for determining the position and size of optimal sequence regions for phylogenetic analysis. Journal of Molecular Evolution 41: 1128-1138. OSA uses aligned sequences in a number of common formats as input. It runs on UNIX-based machines. It is available as Gnu Pascal source code and as executables for the Solaris and IRIX operating systems. The program can analyze up to 50 sequences of a maximum length of 10,000 bp. It can be obtained by ftp from ftp.ebi.ac.uk in directory pub/software/unix/osa, where the source code, a documentation file, and the Solaris and Irix executables are available.
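
The two distance measures mentioned, and their calculation on windows along an alignment, can be sketched as below. This is not OSA's code: the function names are hypothetical, windows here have a fixed size and step (OSA varies both), and gaps or ambiguity codes are simply compared as ordinary characters. The Jukes-Cantor correction is d = -(3/4) ln(1 - 4p/3), where p is the uncorrected proportion of differing sites.

```python
import math


def p_distance(seq_a, seq_b):
    """Uncorrected distance: proportion of aligned sites that differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return diffs / len(seq_a)


def jukes_cantor(p):
    """Jukes-Cantor distance for a proportion p of differing sites."""
    if p >= 0.75:
        raise ValueError("Jukes-Cantor distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def windowed_distances(seq_a, seq_b, window, step):
    """Jukes-Cantor distance in each window, as (start, distance) pairs."""
    return [
        (start, jukes_cantor(p_distance(seq_a[start:start + window],
                                        seq_b[start:start + window])))
        for start in range(0, len(seq_a) - window + 1, step)
    ]
```

Repeating this for every pair of sequences gives one distance matrix per window, which can then be compared with the matrix computed from the whole alignment.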

Johannes Schaefer and Michael Schoeniger, then of the Lehrstuhl für Theoretische Chemie of the Technische Universität München, have written DISTREE. It computes pairwise distances of aligned nucleotide sequences utilizing various models of base substitution. It also provides the user with information on the goodness of fit of the models to the given set of sequence data. Each of the models is implemented in two variants, assuming identical or gamma distributed substitution rates across sequence sites. It is available as a DOS executable with C source code, or as source code for Unix systems. DISTREE is distributed through the EBI software site archive at http://mirror.pscigrid.gov.ph/ebi-software/software/dos/distree/.

Mikael Thollesson (lddist (at) artedi.ebc.uu.se), of the Department of Molecular Evolution, Evolutionary Biology Centre, Uppsala University, Sweden has written LDDist version 1.3.2, which calculates LogDet distances from DNA and protein sequences. It accommodates rate variation from site to site as well, by excluding invariant sites or by allowing different rates for different sites to be preassigned. LDDist is described in a paper: Thollesson, M. 2004. LDDist: a Perl module for calculating LogDet pair-wise distances for protein and nucleotide sequences. Bioinformatics 20: 416-418. LDDist is, as this says, written in Perl and C++. With it is distributed PLD.pl, a companion script that serves as a front-end and example of how to use LDDist. They are distributed in source code from its web site at http://artedi.ebc.uu.se/molev/software/LDDist.html

William J. Bruno and Lars Arvestad (billb (at) t10.lanl.gov) of the Theoretical Biology and Biophysics Group at Los Alamos National Laboratory, have released DISTANCE, version 1.0. It estimates the most general reversible substitution matrix corresponding to a given collection of aligned DNA sequences. This matrix can then be used to calculate evolutionary distances between pairs of sequences. The method is described in a paper: Arvestad, L. and W. J. Bruno. 1997. Estimation of reversible substitution matrices from multiple pairs of sequences. Journal of Molecular Evolution 45: 696-703. The program is written in C, and distributed from its web site at http://www.t10.lanl.gov/evolution/, along with Sun SPARC binaries.

Joyce Miller Hersh (msmead (at) doctorbeer.com), formerly of the Whitehead Institute at MIT (and more recently a high-tech patent attorney) wrote RESTSITE, version 1.2, a package of DOS programs for computing distances between species based on restriction sites or restriction fragments. The programs also include NJTREE and UPGMA which can infer phylogenies by the Neighbor-Joining and UPGMA distance matrix methods. The programs are written in Microsoft C: source code is available too. The programs, documentation, and source code are distributed by its Web site, http://www-genome.wi.mit.edu/~jmiller/restsite.htm. The programs and their methods were described in two papers:

Miller, J. C. 1991. RESTSITE: A phylogenetic program that sorts raw restriction data. Journal of Heredity 82: 262-263.
Nei, M., and J. C. Miller. 1990. A simple method for estimating average number of nucleotide substitutions within and between populations from restriction data. Genetics 125: 873-879.

Doug McElroy (Doug.McElroy (at) wku.edu) of Western Kentucky University distributes REAP, the Restriction Enzyme Analysis Package, written by him, Paul Moran, Eldredge Bermingham, and Irv Kornfield. REAP can calculate distances from restriction sites, from restriction fragments data, and from nucleotide sequences (the Kimura 2-parameter distance). REAP is a package of DOS executables available from McElroy's web site at http://bioweb.wku.edu/faculty/mcelroy/. It is described in the paper: McElroy, D., P. Moran, E. Bermingham, and I. Kornfield. 1992. REAP: An integrated environment for the manipulation and phylogenetic analysis of restriction data. Journal of Heredity 83: 157-158.

Peter Rice, Alan Bleasby, and Jon Ison of the European Bioinformatics Institute in Hinxton, England (emboss (at) emboss.open-bio.org) have produced EMBOSS (European Molecular Biology Open Software Suite), version 6.0.1, a package of programs for general sequence analysis with phylogeny and alignment programs. EMBOSS, which has had many developers, is a full-featured suite intended to provide the same functionality as GCG. In addition to its own programs, it also has a suite of other programs, EMBASSY, that are configured to work with EMBOSS. These include ClustalW and most PHYLIP programs. EMBOSS and EMBASSY are a properly constructed toolkit for creating robust bioinformatics applications or workflows, with a comprehensive set of sequence analysis programs. All sequence and many alignment and structural formats are handled. There is an extensive programming library for common sequence analysis tasks. Several different GUI interfaces for it are available. It is described in the paper: Rice, P., I. Longden, and A. Bleasby. 2000. EMBOSS: The European Molecular Biology Open Software Suite. Trends in Genetics 16 (6): 276-277. It is available as C++ source code and C source code. It can be compiled on all known Unix and Linux systems. It can be downloaded from its web site at http://emboss.sourceforge.net/what/

MacVector, Inc., PMB 150, PO Box 582, 1939 High House Rd., Cary, NC 27519 and PO Box 582, Cambridge, U.K. CB1 0FH (info (at) macvector.com) sells MacVector version 10.0.2, a sequence analysis program for Mac OS and Mac OS X systems. The features that are relevant for this listing are its ability to do alignment and produce a guide tree using ClustalW, and either UPGMA or Neighbor-Joining distance matrix methods. It has many other features including sequence search, gene finding, motif searching, protein secondary structure and hydrophobicity prediction, and prediction of restriction digests and primer sites. Version 7.2 onwards can run natively on Mac OS X systems. It can be ordered through its web page at http://www.macvector.com. Its price for academic use was formerly $2,500, and for commercial use $5,000. Currently they do not give prices on their web page, but they have said to me that the above is slightly more expensive than what they charge now.

Soll Technologies, Inc., (sales (at) solltechnologies.com) 321 Lexington Ave., Iowa City, Iowa 52246, USA distributes DENDRON, a computer-assisted system for Windows for analyzing DNA fingerprinting gels. It reads and compares gel images. One feature is an average-linkage clustering algorithm that can produce trees from the gel images. For information and pricing, contact Soll Technologies. The DENDRON web page is at http://www.solltechnologies.com/products.html.

Philipp Schlüter of the Institut für Systematische Botanik of the University of Zürich, Switzerland (philipp.schlueter (at) systbot.uzh.ch) distributes FAMD (Fingerprint Analysis with Missing Data), version 1.108, a program for the analysis of dominant fingerprint data (AFLP, RAPD, etc.). It has a graphical user interface and functions for calculating distances among individuals (Jaccard, Dice, SMC, Nei and Li, Euclidean) or populations (CSE chord distance, pairwise PhiST), as well as tree building (UPGMA, NJ), consensus tree, and principal coordinate analysis routines including a flexible 3D viewer. FAMD handles missing data and allows the user to investigate the impact of missing data on the analyses. It contains routines for data matrix management (e.g. data filtering, easy management of groups/populations) and export filters to other programs' file formats (e.g., Nexus, Arlequin, Hindex, Hickory, GenePop/BAPS, NTSYSpc, Structure, Dfdist). It can calculate Shannon's index, estimate its variance by bootstrapping, perform AMOVA and Bayesian estimation of null allele frequencies. The methods are described in the paper: Schlüter, P. M. and S. A. Harris. 2006. Analysis of multilocus fingerprinting data sets containing missing data. Molecular Ecology Notes 6: 569-572. It is available as Windows executables. It can be downloaded from its web site at http://www.famd.me.uk/famd.html
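
Several of the individual-level coefficients named above are simple functions of band counts, as the following sketch shows. It is not FAMD's code: the function names are hypothetical, and skipping any locus where either individual has a missing value (coded here as None) is just one simple convention, not necessarily the one FAMD uses. The Dice coefficient shown is the same quantity as the Nei and Li similarity.

```python
def band_counts(x, y):
    """Count loci with band in both (a), only x (b), only y (c), neither (d).

    x, y: lists of 1 (band present), 0 (absent) or None (missing).
    Loci with a missing value in either profile are skipped.
    """
    a = b = c = d = 0
    for xi, yi in zip(x, y):
        if xi is None or yi is None:
            continue
        if xi and yi:
            a += 1
        elif xi:
            b += 1
        elif yi:
            c += 1
        else:
            d += 1
    return a, b, c, d


def jaccard(x, y):
    a, b, c, _ = band_counts(x, y)
    return a / (a + b + c)  # undefined if no bands are present in either


def dice(x, y):
    a, b, c, _ = band_counts(x, y)
    return 2 * a / (2 * a + b + c)  # same as the Nei and Li similarity


def simple_matching(x, y):
    a, b, c, d = band_counts(x, y)
    return (a + d) / (a + b + c + d)
```

These are similarities; a distance is commonly taken as one minus the similarity.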

James McInerney of the Department of Biology of the National University of Ireland, Maynooth, County Kildare, Ireland (james.o.mcinerney (at) may.ie) has written GCUA (General Codon Usage Analysis). It does codon usage and amino acid usage statistics, and also performs correspondence analysis/principal components analysis on both codon usage and amino acid usage statistics. Its relevance to the present list is that it also produces a distance matrix, based on Relative Synonymous Codon Usage (RSCU) statistics, whose format is PHYLIP/PAUP*4.0-compatible. Although McInerney cautions that this matrix should not be used for phylogenetic inference, I wonder whether this distance does not have some phylogenetic information. The program is described in the paper: McInerney, J. O. 1998. GCUA (General Codon Usage Analysis). Bioinformatics 14 (4): 372-373. It is available as Mac OS X, Mac OS, Windows, IBM AIX, Digital Unix, and Linux binaries. The code isn't available, he says "because it is so embarassingly poor". It is available at his software downloads site at http://bioinf.nuim.ie/downloads.html. Earlier binaries, version 1.1 for Digital Unix, SunOS, Mac OS and Irix and version 1.2 for Linux, Digital Unix, Mac OS and SunOS, can be retrieved via anonymous ftp from ftp.nhm.ac.uk in directory pub/gcua

[Swaap icon] David T. Pride (dpride (at) partners.org), formerly of Vanderbilt University (currently an internal medicine specialist in Berkeley, California), has written Swaap version 1.02. Swaap performs sliding window analyses on nucleotide sequences, computing a large variety of statistics on the sequences. The relevant feature for this listing is the ability to compute four different distance measures between sequences, either on full sequences or on sliding windows. Swaap is distributed as a Windows executable from the Swaap and Swaap PH web site at http://www.bacteriamuseum.org/SWAAP/SwaapPage.htm#Swaap

[Swaap PH icon] David T. Pride (dpride (at) partners.org), formerly of Vanderbilt University (currently an internal medicine specialist in Berkeley, California), has written Swaap PH version 1.02. Swaap PH computes many different kinds of statistics on nucleotide frequencies and oligonucleotide frequencies in sliding windows along nucleotide sequences. It can compute distances based on these frequencies. Swaap PH is a Windows executable available from the Swaap and Swaap PH web site at http://www.bacteriamuseum.org/SWAAP/SwaapPage.htm#Swaap

Mathieu Blanchette of the McGill University Centre for Bioinformatics (blanchem (at) mcb.mcgill.ca) and David Sankoff of the Department of Mathematics and Statistics of the University of Ottawa, Canada have produced DERANGE2, a program to reconstruct the history of two gene maps using weighted inversions, transpositions and inverted transpositions. It can thus construct a set of distances based on the gene orders (not the sequences of the genes themselves). It is available as standard C source code and can readily be compiled on Unix systems. It is available by anonymous ftp from ftp.ebi.ac.uk in directory pub/software/unix.

Laurent Excoffier of the Computational and Population Genetics Lab of the Institute of Zoology, University of Bern, Switzerland (laurent.excoffier (at) zoo.unibe.ch) has produced MINSPNET, a program that produces a minimum spanning tree and network from a distance matrix. It is available as a Windows executable. It can be obtained from a web page which lists software from that lab at http://cmpg.unibe.ch/software.htm.
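
A minimum spanning tree can be obtained from a distance matrix with Prim's algorithm, sketched below. This is not MINSPNET's code, and the function name is hypothetical; it returns a single tree and does not build the network of alternative, equally short connections.

```python
def minimum_spanning_tree(dist):
    """Prim's algorithm on a symmetric distance matrix.

    Returns a list of edges (i, j, length) in the order they are added,
    starting from node 0.
    """
    n = len(dist)
    in_tree = [False] * n
    best_cost = [float("inf")] * n   # cheapest known link into the tree
    best_from = [None] * n           # tree node providing that link
    best_cost[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the node outside the tree that is cheapest to attach.
        u = min((i for i in range(n) if not in_tree[i]),
                key=lambda i: best_cost[i])
        in_tree[u] = True
        if best_from[u] is not None:
            edges.append((best_from[u], u, dist[best_from[u]][u]))
        # Update the cheapest links of the remaining nodes.
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best_cost[v]:
                best_cost[v] = dist[u][v]
                best_from[v] = u
    return edges
```

For n items the result has n - 1 edges, and the sum of their lengths is the smallest possible for any tree connecting all the items.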

[POPGENE icon here] Francis Yeh (francis.yeh (at) ualberta.ca) of the Department of Renewable Resources at the University of Alberta, Canada, has released POPGENE version 1.32, a free program for the analysis of genetic variation among and within populations using co-dominant and dominant markers. The feature that is relevant to the present list is that it can compute a number of genetic distances for gene frequencies. It is distributed as a Windows executable from its home page at http://www.ualberta.ca/~fyeh/index.htm.

F. James Rohlf has written NTSYSpc (Numerical Taxonomy System, Version 2.2), a clustering program that includes calculation of various kinds of distance measures, hierarchical clustering methods such as UPGMA, as well as Neighbor-Joining and consensus trees. It can also do a variety of other things including ordination, scatter diagrams, and elliptic Fourier transforms (for shape analysis). NTSYSpc 2.1 is a Windows95 executable which will also run on Windows NT. It is available for $350 ($250 for educational and government institutions). 10-user site licenses are also available. It is distributed by Exeter Software (the biological software company, not the warehouse-inventory-software house of the same name). Their e-mail address is sales (at) exetersoftware.com. Their toll-free telephone number is 800-842-5892, their not-so-free phone number is +1-631-689-7838, and their fax number is +1-631-689-0103. Their mailing address is 47 Route 25A, Suite 2, Setauket, NY 11733-2870 USA. Further information is available on their Web page at http://www.exetersoftware.com/cat/ntsyspc/ntsyspc.html.

Warren Kovach of Kovach Computing Services, Anglesey, Wales (info (at) kovcomp.co.uk) has produced MVSP, a comprehensive multivariate statistical package for the PC platform. It can do many kinds of analyses (principal components, clustering, etc.) but the features relevant to this listing are clustering with a variety of methods and a variety of distance measures, including Li and Nei's restriction sites distance. MVSP may be ordered from Kovach Software through its web site at http://www.kovcomp.com/mvsp/. MVSP 3.1 for Windows costs UK £85 or US$ 150 for an academic license. A version on CD with a printed manual is £20 ($35) more. Commercial licenses are £115 ($185). Version 2.2 for DOS costs UK £65 or US$ 100. Free evaluation versions which work for a limited period can be downloaded from the Kovach Computing download web page at http://www.kovcomp.co.uk/downl2.html#mvsp. An evaluation version of version 2.2 for DOS is also available for downloading by ftp from garbo.uwasa.fi in directory pc/stat/. MVSP is also distributed by Exeter Software at its web site at http://www.ExeterSoftware.com/cat/kovach/mvsp.html, where version 3.1 costs $185 for an academic license and $265 for a commercial license. There are discounts for multi-user licenses. Other vendors include Rockware and GeoMem.

János Podani of the Department of Plant Taxonomy and Ecology, Eötvös Loránd University, Budapest, Hungary (podani (at) ludens.elte.hu) has developed SYN-TAX 2000, a general package for clustering. It can calculate a wide variety of distance coefficients from numerical data, and can perform hierarchical clustering, nonhierarchical clustering, and ordination. This includes, in addition to many clustering methods, minimum spanning trees and additive trees by Neighbor-Joining. SYN-TAX 2000 is available as commercial software from Exeter Software at its web site at http://www.ExeterSoftware.com/cat/syntax/syntax.html. It costs $350 for an educational license, $450 for a commercial license. Podani also maintains his own SYN-TAX web site at http://ramet.elte.hu/~podani/SYN2000.html where there are descriptions, screen shots, some free upgrades of certain program components, and also an older DOS executable version, 5.1, and a Macintosh version, SYN-TAX 5.02. There is a demo version available for the DOS version, and both the DOS and Mac versions are sold, each for $150 (for educational use $200), and both together for $300. Over the years various versions of SYN-TAX have been described in papers. The most recent description in a journal is: Podani, J. 1993. SYN-TAX 5.0: Computer programs for multivariate data analysis in ecology and systematics. Abstracta Botanica 17: 289-302.

John Archer and David Robertson of the Faculty of Life Science of the University of Manchester, Manchester, UK (john.archer (at) postgrad.manchester.ac.uk) have produced CTree version 1.02, tree-viewing software with an emphasis on quantifying the clusters present in tree topologies. CTree has been designed for the quantification of clusters within viral phylogenetic tree topologies. (It is not to be confused with the tree alignment program Ctree.) Clusters are stored as individual data structures from which statistical data, such as the Subtype Diversity Ratio (SDR), Subtype Diversity Variance (SDV) and pairwise distances can be extracted. Clusters can be selected manually or via a novel heuristic algorithm. Random trees can also be generated and used to produce control distributions of the various statistics. Tree viewing features are also included along with output to PDF format. It is described in the paper: Archer, J., and D. L. Robertson. 2007. CTree: comparison of clusters between phylogenetic trees made easy. Bioinformatics 23(21): 2952-2953. It is available as Java executables. It can be downloaded from its web site at http://www.manchester.ac.uk/bioinformatics/ctree

B. McCune and M. J. Mefford of MjM Software (mjm (at) centurytel.net) have released PC-ORD (PC Ordination), version 5.10, a package for clustering and ordination. It can do a variety of kinds of clustering and ordination methods, including nonmetric multidimensional scaling, principal coordinates analysis, and indicator species analysis. For the purposes of this listing what is relevant is that it can make dendrograms by clustering. This includes two-way clustering analysis that clusters individuals as well as variables. These work directly from a spreadsheet of data. I do not know what measures of distance it uses for this. The package is primarily intended for multivariate analysis of ecological data. It is available as Windows executables. It can be purchased and downloaded from its web site at http://home.centurytel.net/~mjm/pcordwin.htm. It is available at a price of $299 for a regular user, or $199 for a student license. A license for each additional simultaneous user is $100 (or $50).

Simon Goodman, then of the Institute of Cell, Animal, and Population Biology of the University of Edinburgh produced RSTCALC, version 2.2. It is primarily intended to perform analyses of population structure, genetic differentiation and gene flow using microsatellite data. It calculates estimates of the Rst measure of differentiation among a number of populations, but in addition you can also use RSTCALC to obtain estimates of the delta-mu^2 distance measure. Its calculations are described in a paper: Goodman, S. J. 1997. Rst Calc: a collection of computer programs for calculating estimates of genetic differentiation from microsatellite data and determining their significance. Molecular Ecology 6: 881-885. The program runs on Windows and is available from its web site http://www.biology.ed.ac.uk/research/institutes/evolution/software/rst/rst.html as a Windows executable.
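To make the delta-mu^2 measure concrete, here is a minimal sketch in Python. It assumes the commonly used definition, the squared difference in mean allele size (repeat count) between two populations, averaged over loci. The function name and the data layout are hypothetical, and RSTCALC's own calculations (for example any sample-size corrections) may differ.

```python
from statistics import mean

def delta_mu_squared(pop_a, pop_b):
    """Average over shared loci of (mean allele size in A - mean allele size in B)^2.

    pop_a, pop_b: dict mapping locus name -> list of allele sizes (repeat counts).
    This layout is an assumption made for illustration only.
    """
    loci = sorted(set(pop_a) & set(pop_b))
    if not loci:
        raise ValueError("the two populations share no loci")
    per_locus = [(mean(pop_a[locus]) - mean(pop_b[locus])) ** 2 for locus in loci]
    return mean(per_locus)

# Two loci: mean sizes 11 vs 14 and 20 vs 22, so the result is (9 + 4) / 2
pop_a = {"L1": [10, 12], "L2": [20, 20]}
pop_b = {"L1": [14, 14], "L2": [21, 23]}
print(delta_mu_squared(pop_a, pop_b))
```

Because the measure depends only on mean allele sizes, it can be computed from allele frequency tables as well as from raw genotypes.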

Daniel Montagnon (Daniel.Montagnon (at) wanadoo.fr) of the Institut d'Embryologie, Faculté de Médecine, Strasbourg, France has written YCDMA (Y Chromosome Data MAnagement), version 1.2. This is a data management program for microsatellite data. It can do a wide variety of management tasks, maintaining and manipulating databases of genotypes, calculating gene frequencies, and converting file formats. For the purposes of this listing, its relevant feature is the calculation of a variety of gene frequency genetic distances between populations, and a squared copy number microsatellite genetic distance. YCDMA is written in Microsoft Visual Basic. It is available as a Windows executable from its web site at http://perso.wanadoo.fr/daniel.montagnon/YCDMAAng.htm.

Giorgio Bertorelle, of the Population Genetics and Genetic Epidemiology Group of the University of Ferrara, Italy (ggb (at) unife.it) has released DIVAGE and DIVAGE_C. These programs estimate the time of divergence of two populations based on the frequencies of rare alleles in the two populations, where it is assumed that these have each originated in a single mutational event before the divergence of the populations. The two programs make different assumptions about what is conditioned on what. The methods are described in a paper: Bertorelle, G. and B. Rannala. 1998. Using rare mutations to estimate population divergence times: a maximum likelihood approach. Proceedings of the National Academy of Sciences, USA 95: 15452-15457. The programs are DOS executables that will run in a Command Tool window on Windows systems. They can be downloaded from Bertorelle's software web site at http://web.unife.it/progetti/genetica/Giorgio/giorgio_soft.html

Stephane Guindon and Olivier Gascuel of the LIRMM (Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier) of the Université de Montpellier II, Montpellier, France (guindon (at) lirmm.fr and gascuel (at) lirmm.fr) have released GAME (GAMma Estimation), version 1.0, a program to estimate the gamma parameter and use it to calculate distances and trees. GAME computes the optimal value of the gamma shape parameter (α). It uses a fast distance method based on BIONJ or FastME, which allows very large data sets (up to 1000 taxa) to be dealt with using a standard PC. It can use DNA or protein models and can analyze multiple data sets, such as result from bootstrapping. The value of α can either be inferred for all data sets separately, or just from the first data set. It is described in the paper: Guindon, S. and Gascuel, O. 2002. Efficient biased estimation of evolutionary distances when substitution rates vary across sites. Molecular Biology and Evolution 19: 534-543. It is available as C source code, Windows executables, Linux executables and Powermac Mac OS X executables. It can be downloaded from its web site at http://www.lirmm.fr/~guindon/gamma.html. It is also available as a web server.

Gaston Gonnet and Chantal Korostensky of the Computational Biochemistry Research Group at ETH in Zürich, Switzerland, have made available Darwin, Data Analysis and Retrieval With Indexed Nucleotide/peptide sequences, version 2.1. It is an environment which enables the user to carry out a variety of kinds of analysis with sequences, including phylogeny methods. These seem to include distance matrix, split decomposition, and a form of likelihood method. Darwin is available as executables for Solaris, Intel-compatible Linux, Irix, and HP/Compaq/Digital Alpha machines. These are available free if the user registers by filling out a form at the download page at the Darwin web page. The executables can then be transferred to the user by ftp or by e-mail of encoded files. It is described in the paper: Gonnet, G. H., M. T. Hallett, C. Korostensky, and L. Bernardin. 2000. Darwin v. 2.0: an interpreted computer language for the biosciences. Bioinformatics 16: 101-103. Details and distribution policies are explained further at Darwin's web page at http://cbrg.inf.ethz.ch/darwin. Darwin is also made available as a server.

Vladimir Makarenkov (makarenkov.vladimir (at) uqam.ca) of the Département d'Informatique of the Université du Québec à Montréal and the Département de Sciences Biologiques of the Université de Montréal, and Philippe Casgrain (casgrain (at) magellan.umontreal.ca) of the Département de Sciences Biologiques of the Université de Montréal have released T-REX (Tree and Reticulogram rEconstruXion), version 4.0a1. This program performs several methods of fitting an additive distance (distance in a nonclocklike tree) to a given dissimilarity. The methods available include Sattath and Tversky's ADDTREE method, Nei and Saitou's Neighbor-Joining method, Gascuel's UNJ Unweighted Neighbor-Joining method, his BIONJ method, the Circular order reconstruction method of Makarenkov and Leclerc (1997) and Yushmanov (1984), and the MW weighted least-squares method by Makarenkov (1997) and Makarenkov and Leclerc (1998). A number of methods for fitting trees to distance matrices that have missing values are also available. Nucleotide sequence distances can be computed from sequences using many of the widely-used distance measures. The program can also carry out bootstrap and jackknife resampling to assess strength of support for features of the trees. It also allows construction and plotting of "reticulograms" that show departures from treelike structure, and interactive manipulation of the tree and reticulogram diagrams. It is described in the paper: Makarenkov, V. 2001. T-Rex: reconstructing and visualizing phylogenetic trees and reticulation networks. Bioinformatics 17: 664-668. Executables for Windows (the 4.0a1 version) and for Macintosh (the version 1.2a4 executable for PowerMacs) and an executable for a 32-bit DOS version are available at the T-REX web site at http://www.labunix.uqam.ca/~makarenv/trex.html. C++ source code is also available there. A web server for T-REX with more tree construction and manipulation methods is also available.

Philippe Casgrain (casgrain (at) magellan.umontreal.ca) and Pierre Legendre (Pierre.Legendre (at) umontreal.ca), of the Département des Sciences Biologiques at the Université de Montréal have produced Permute! version 3.4 alpha 9, a program to do permutation tests of regression of variables. It can permute distance matrices in a number of ways, one of which is to do so according to an ultrametric (clocklike) tree provided by the user. The ultrametric permutation method is described in these papers:

Lapointe, F.-J., and P. Legendre. 1991. The generation of random ultrametric matrices representing dendrograms. Journal of Classification 8: 177-200.
Lapointe, F.-J. and P. Legendre. 1992. A statistical framework to test the consensus among additive trees (cladograms). Systematic Biology 41: 158-171.

The program is an Apple Mac OS binary that can be run on System 7 and later. It is distributed from its web site at http://www.bio.umontreal.ca/casgrain/en/labo/permute/index.html

Jérôme Goudet, of the Department of Ecology and Evolution of the University of Lausanne, Switzerland (jerome.goudet (at) unil.ch) has written FSTAT, version 2.9.3.2, a program to estimate and test gene diversity statistics from codominant markers. For our purposes, the important feature is its ability to calculate the Nei and Cockerham/Weir families of distance measures. It can convert data in its own format to and from the format of Genepop. Version 2.9.3.2 is a Windows executable; an earlier version, 1.2, which is a DOS executable, is also available. Both can be downloaded from its web site at http://www2.unil.ch/popgen/softwares/fstat.htm.

Michel Raymond and François Rousset of the Equipe Génétique et Environnement of the Institut des Sciences de l'Evolution at the University of Montpellier II, France (Raymond (at) isem.univ-montp2.fr and Rousset (at) isem.univ-montp2.fr) have written and distributed Genepop version 4.0, a program to carry out a variety of population genetics tests. It can test assumptions of Hardy-Weinberg and linkage equilibrium, run a log-likelihood (G-based) test of differentiation between populations, use Slatkin's rare allele method to estimate the number of migrants per generation, and calculate allele frequencies. For our purposes the relevant feature is its ability to calculate Fst and Rst measures of population differentiation, which are genetic distances. It is described in a paper: Raymond, M. and F. Rousset. 1995. GENEPOP (version 1.2) population genetic software for exact tests and ecumenicism. Journal of Heredity 86: 248-249. Genepop is a DOS executable that can run under Windows in a Command Tool window. It can be downloaded from its web page at http://kimura.univ-montp2.fr/~rousset/Genepop.htm. An older version, 3.4, can be downloaded by ftp from the University of Montpellier at ftp://ftp.cefe.cnrs.fr/PC/MSDOS/GENEPOP/. A web server for Genepop 3.4 is also available in Australia at the John Curtin University of Technology.

Laurent Excoffier of the Computational and Population Genetics Lab of the Institute of Zoology, University of Bern, Switzerland (laurent.excoffier (at) zoo.unibe.ch), Stephan Schneider, and David Roessli have released Arlequin version 3.5.1, a program for population genetics analysis. It can perform many kinds of population genetic tasks including estimation of gene frequencies, testing of linkage disequilibrium, and analysis of diversity between populations. For the purposes of this list, the relevant feature is its ability to compute a variety of genetic distance measures including that of Jukes and Cantor, the Kimura 2-parameter distance, and the Tamura-Nei distance, each of these with or without correction for gamma-distributed rates of evolution. It can also compute a Minimum Spanning Tree network. It is available as binaries for Windows, for either 32- or 64-bit processors. A special version to compute some summary statistics is also included. An archive including the binaries and a PDF documentation file is available at its web site at http://cmpg.unibe.ch/software/arlequin3/.

Naoko Takezaki of the Life Science Research Center of Kagawa University, Japan (takezaki (at) med.kagawa-u.ac.jp), Masatoshi Nei of the Institute of Molecular and Evolutionary Genetics of the Department of Biology, Pennsylvania State University, University Park, Pennsylvania, and Koichiro Tamura of Tokyo Metropolitan University, Tokyo, Japan have released POPTREE2, which computes various genetic distance measures and constructs trees of populations or closely related species from gene frequency data by using the Neighbor-Joining method and UPGMA. POPTREE2 can compute Nei's genetic distance and his Da genetic distance, as well as Latter's Fst* distance and the (Δμ)² and Dsw measures of microsatellite genetic distance. It can also perform bootstrapping, and compute heterozygosity and Gst measures of the extent of genetic variation in a population and genetic differentiation among subdivided populations. The program uses a Windows graphical user interface, and trees can be displayed in a publishable form and changed by the user. POPTREE2 is described in a paper: Takezaki, N., M. Nei, and K. Tamura. 2010. POPTREE2: Software for constructing population trees from allele frequency data and computing other population statistics with Windows interface. Molecular Biology and Evolution 27: 747-752. It is available from its web site at http://www.med.kagawa-u.ac.jp/~genomelb/takezaki/poptree2/index.html. A source code Unix version (POPTREE version 1) and an executable DOS version (which is called poptrfdos) are also available there. It is also available from the IUBIO archive at http://iubio.bio.indiana.edu/soft/molbio/evolve. POPTREE was also formerly called njbafd, and under that name its earlier version is also available at the same IUBIO site.
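As an illustration of what a gene-frequency distance of this kind computes, here is a minimal Python sketch of Nei's standard genetic distance, D = -ln(Jxy / sqrt(Jx Jy)), where Jx, Jy and Jxy are sums over loci of the products of allele frequencies within and between the two populations. The function name and input layout are hypothetical, the sketch omits the sample-size bias correction, and it is not taken from POPTREE2 or any other program in this listing.

```python
import math

def nei_standard_distance(freqs_x, freqs_y):
    """Nei's standard genetic distance from per-locus allele frequencies.

    freqs_x, freqs_y: lists with one dict per locus, mapping allele -> frequency.
    (Hypothetical layout; no correction for finite sample size is applied.)
    """
    if len(freqs_x) != len(freqs_y):
        raise ValueError("both populations must be scored at the same loci")
    jx = jy = jxy = 0.0
    for fx, fy in zip(freqs_x, freqs_y):
        jx += sum(p * p for p in fx.values())        # homozygosity in X
        jy += sum(q * q for q in fy.values())        # homozygosity in Y
        jxy += sum(p * fy.get(allele, 0.0) for allele, p in fx.items())
    if jxy == 0.0:
        raise ValueError("no shared alleles: the distance is infinite")
    return -math.log(jxy / math.sqrt(jx * jy))

# One locus: X is fixed for allele A, Y has A and B at equal frequency
x = [{"A": 1.0}]
y = [{"A": 0.5, "B": 0.5}]
print(round(nei_standard_distance(x, y), 4))
```

Sums over loci are used in place of averages; since both populations are scored at the same loci, the ratio inside the logarithm is unchanged.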

Olivier Hardy and Xavier Vekemans of the Service d'Eco-Ethologie Evolutive at the Université Libre de Bruxelles, Brussels, Belgium (ohardy (at) ulb.ac.be) distribute SPAGeDi (Spatial Pattern Analysis of Genetic Diversity), version 1.3, a program to compute a variety of different genetic diversity and genetic distance statistics. It can compute various statistics describing relatedness or differentiation between individuals or populations by pairwise comparisons, and analyze how these values are related to geographical distances in a way similar to a spatial autocorrelation analysis or by linear regression. The statistics computed include F_st, R_st, D_s (Nei's standard genetic distance), and the microsatellite distance (delta mu)² for analyses at the population level and, for analyses at the individual level, pairwise kinship, relatedness and fraternity coefficients (with different estimators for each) as well as Rousset's distance between individuals and a kinship analogue based on allele size. It can also do bootstraps and permutation analyses on the distances and geographic locations. It is available as Windows executables. It is described in the paper: Hardy, O. J. and X. Vekemans. 2002. SPAGeDi: a versatile computer program to analyse spatial genetic structure at the individual or population levels. Molecular Ecology Notes 2: 618-620. It can be downloaded from its web site at http://ebe.ulb.ac.be/ebe/Software.html

Julio Rozas, J. C. Sánchez-DelBarrio, X. Messeguer and Ricardo Rozas of the Departament de Genètica, Universitat de Barcelona, Spain (jrozas (at) ub.edu) have released DnaSP version 5.10.00, a software package for the analysis of nucleotide polymorphism from aligned DNA sequence data. DnaSP can estimate several measures of DNA sequence variation within and between populations (in noncoding, synonymous or nonsynonymous sites), as well as linkage disequilibrium, recombination, gene flow and gene conversion parameters. It can also carry out several tests of neutrality. Additionally, it can estimate the confidence intervals of some test statistics by the coalescent. The results of the analyses are displayed in tabular and graphic form. For the purposes of this web site, the relevant features are the calculation of measures of population divergence, which include the Jukes-Cantor measure, which can be used as a distance in phylogeny reconstruction. DnaSP is described in the papers:

Rozas, J. and R. Rozas. 1995. DnaSP, DNA sequence polymorphism: an interactive program for estimating Population Genetics parameters from DNA sequence data. Computer Applications in the Biosciences (CABIOS) 11: 621-625.

Rozas, J. and R. Rozas. 1997. DnaSP version 2.0: a novel software package for extensive molecular population genetics analysis. Computer Applications in the Biosciences (CABIOS) 13: 307-311.

Rozas, J. and R. Rozas. 1999. DnaSP version 3: an integrated program for molecular population genetics and molecular evolution analysis. Bioinformatics 15: 174-175.

It is distributed as a Windows executable from its web site at http://www.ub.es/dnasp/.

Jianzhi George Zhang, now of the Laboratory of Genomic and Molecular Evolution in the Department of Ecology and Evolutionary Biology of the University of Michigan, Ann Arbor, Michigan (jianzhi (at) umich.edu) wrote Bn-Bs, a program to estimate branch lengths in terms of synonymous and nonsynonymous substitutions per site, when the tree topology is given. The program uses the modified Nei-Gojobori method to estimate pairwise synonymous and nonsynonymous distances among present-day sequences and then estimates branch lengths and their variances by using the ordinary least-squares method. The method is described in the paper: Zhang J., H. F. Rosenberg, and M. Nei. 1998. Positive Darwinian selection after gene duplication in primate ribonuclease genes. Proceedings of the National Academy of Sciences, USA 95: 3708-3713. It is available as C source code and as DOS executables from the software web site of Masatoshi Nei's lab in which the work was done. A zip archive of the files can be downloaded from the link there. A documentation file can also be read there.

Jianzhi George Zhang, now of the Laboratory of Genomic and Molecular Evolution in the Department of Ecology and Evolutionary Biology of the University of Michigan, Ann Arbor, Michigan (jianzhi (at) umich.edu) released HON-new, a program to compute the amounts of conservative and radical amino acid substitution between pairs of DNA sequences of coding region exons. The program uses a classification of amino acids into categories. Three types of amino acid classifications (by charge, by polarity, and that of Miyata and Yasunaga) are provided. One can also define the conservative and radical amino acid categories oneself. The method is modified from the original method of Hughes, Ota, and Nei (1990) by taking into account transition bias. It is described in the paper: Zhang J. 2000. Rates of conservative and radical nonsynonymous nucleotide substitutions in mammalian nuclear genes. Journal of Molecular Evolution 50: 56-68. It is available as C source code and as a Windows executable at the Nei laboratory software web site at http://homes.bio.psu.edu/people/faculty/nei/software.htm

Kevin Thornton of the Department of Ecology and Evolutionary Biology of the University of California, Irvine, California (krthornton (at) uci.edu) has written analysis version 0.7.3, a package of programs to compute various population-genetic tests and statistics on samples of DNA sequences within populations. For the purposes of this listing, the relevant feature is its ability to compute the Kimura 2-parameter (K2P) distance between sequences. It requires Thornton's libsequence C++ class library (available at the same site) and the GNU Scientific Library. The libsequence library is described in the paper: Thornton, K. 2003. libsequence: a C++ class library for evolutionary genetic analysis. Bioinformatics 19(17): 2325-2327. It is available as C++ source code. It can be downloaded from its web site at http://molpopgen.org/software/lseqsoftware.html
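For readers unfamiliar with the Kimura 2-parameter distance, the following Python sketch shows the standard formula, d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the observed proportions of transitions and transversions. The function name is hypothetical and the code does not use libsequence; it is an illustration of the formula, not the implementation in the analysis package.

```python
import math

# Purine<->purine and pyrimidine<->pyrimidine changes are transitions
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}

def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Sites where either sequence has a gap or an ambiguity code are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    n = ts = tv = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        n += 1
        if a == b:
            continue
        if (a, b) in TRANSITIONS:
            ts += 1
        else:
            tv += 1
    if n == 0:
        raise ValueError("no comparable sites")
    p, q = ts / n, tv / n
    a1, a2 = 1.0 - 2.0 * p - q, 1.0 - 2.0 * q
    if a1 <= 0.0 or a2 <= 0.0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(a1) - 0.25 * math.log(a2)

print(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"))  # one transition in ten sites
```

When the sequences are so different that either logarithm's argument is not positive, the distance is undefined, which is why the sketch raises an error instead of returning a number.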

Daniel Montagnon (Daniel.Montagnon (at) wanadoo.fr) of the Institut d'Embryologie, Faculté de Médecine, Strasbourg, France has written NSA (Nucleotide Sequences Analyzer), version 3.3. It is a general program for reading in sequences and writing them out in a variety of data formats, with the ability to select particular sets of sites and sequences. For our purposes, the relevant feature is the ability to calculate a number of different nucleotide sequence distances, as well as some simple protein sequence distances. These include the Jukes-Cantor, Kimura, and Tamura-Nei distances, as well as a simple protein distance based on the fraction of similar amino acids. These can also have a correction for a gamma distribution of rates across sites. The program is written in Visual Basic, and is available as a Windows executable from its web site at http://perso.wanadoo.fr/daniel.montagnon/NSAAng.htm
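The effect of a gamma correction on a sequence distance can be seen in the simplest case, the Jukes-Cantor distance. The Python sketch below uses the standard textbook forms: d = -(3/4) ln(1 - 4p/3) without rate variation, and d = (3/4) α [(1 - 4p/3)^(-1/α) - 1] with gamma-distributed rates of shape α. The function name is hypothetical and nothing here is taken from NSA itself.

```python
import math

def jukes_cantor(p, alpha=None):
    """Jukes-Cantor distance from the proportion p of differing sites.

    alpha: shape parameter of the gamma distribution of rates across sites,
    or None for equal rates at all sites.
    """
    x = 1.0 - 4.0 * p / 3.0
    if x <= 0.0:
        raise ValueError("p is too large for the Jukes-Cantor correction")
    if alpha is None:
        return -0.75 * math.log(x)
    return 0.75 * alpha * (x ** (-1.0 / alpha) - 1.0)

p = 0.3
print(jukes_cantor(p))             # equal rates
print(jukes_cantor(p, alpha=1.0))  # strong rate variation gives a larger distance
```

As α grows large the gamma-corrected value approaches the uncorrected one, since rate variation across sites then becomes negligible.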

John Brzustowski, of the Department of Biological Sciences of the University of Alberta, Canada (jbrzusto (at) ualberta.ca), wrote qclust, a program to carry out a number of clustering methods including Neighbor-Joining. The neighbor-joining method has been improved over our own Neighbor program, so as to be able to handle large numbers of taxa much more quickly. The program is available along with another program, calcdist, which calculates distances from 0/1 data. The programs are available as C source and as DOS executables from its web page at http://www.biology.ualberta.ca/jbrzusto/dosclust.html. A more interactive version of the program is also available as Java from a web page at http://www2.biology.ualberta.ca/jbrzusto/cluster.php. (Brzustowski has declared that both of these programs are unsupported software, and he will not answer questions about them.)
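Since Neighbor-Joining recurs throughout this listing, a compact Python sketch of the canonical O(n³) algorithm may be useful for orientation. It is a plain textbook version and not the faster variant used in qclust or any other program listed here. The function name, input layout and Newick formatting are choices made for this illustration.

```python
from itertools import combinations

def neighbor_joining(names, dist):
    """Canonical neighbor-joining; returns an unrooted tree as a Newick string.

    names: list of taxon labels; dist: symmetric matrix (list of lists).
    """
    n = len(names)
    if n < 3:
        raise ValueError("need at least three taxa")
    # d[i][j]: current distance between active nodes i and j
    d = {i: {j: float(dist[i][j]) for j in range(n) if j != i} for i in range(n)}
    newick = {i: names[i] for i in range(n)}
    next_id = n
    while len(d) > 3:
        m = len(d)
        r = {i: sum(d[i].values()) for i in d}  # row sums
        # choose the pair minimising Q(i,j) = (m-2) d(i,j) - r(i) - r(j)
        i, j = min(combinations(sorted(d), 2),
                   key=lambda pr: (m - 2) * d[pr[0]][pr[1]] - r[pr[0]] - r[pr[1]])
        dij = d[i][j]
        li = 0.5 * dij + (r[i] - r[j]) / (2.0 * (m - 2))  # branch from i to new node
        lj = dij - li                                      # branch from j to new node
        u = next_id
        next_id += 1
        d[u] = {}
        for k in list(d):
            if k in (i, j, u):
                continue
            duk = 0.5 * (d[i][k] + d[j][k] - dij)
            d[u][k] = duk
            d[k][u] = duk
        newick[u] = "(%s:%.4f,%s:%.4f)" % (newick[i], li, newick[j], lj)
        for x in (i, j):  # retire the two joined nodes
            del d[x]
            for k in d:
                d[k].pop(x, None)
    a, b, c = sorted(d)  # resolve the final three-way split
    la = 0.5 * (d[a][b] + d[a][c] - d[b][c])
    lb = 0.5 * (d[a][b] + d[b][c] - d[a][c])
    lc = 0.5 * (d[a][c] + d[b][c] - d[a][b])
    return "(%s:%.4f,%s:%.4f,%s:%.4f);" % (newick[a], la, newick[b], lb, newick[c], lc)

# Additive distances from a tree with branches A=1, B=2, C=3, D=4, internal=5
matrix = [[0, 3, 9, 10], [3, 0, 10, 11], [9, 10, 0, 7], [10, 11, 7, 0]]
print(neighbor_joining(["A", "B", "C", "D"], matrix))
```

On additive distances such as the example matrix, the method recovers the generating tree and its branch lengths exactly. The faster implementations mentioned in this listing speed up the search for the minimising pair, which dominates the running time here.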

Keith Jolley, of the Epidemiology Group of the Department of Zoology, University of Oxford, Oxford, U.K. (keith.jolley (at) medawar.ox.ac.uk) has released S.T.A.R.T.2 (Sequence Type Analysis and Recombinational Tests), version 2. This is a set of tools for summarizing data, assigning lineages, and testing for recombination and selection. These are used with Multilocus Sequence Typing (MLST) data, which starts with DNA sequences at multiple loci, assigns alleles to each sample at a number of loci and clusters strains on the basis of the presence and absence of alleles. S.T.A.R.T. can calculate distances between strains based on these presences and absences, and cluster strains by the UPGMA method. It can also assign strains to lineages, carry out several different tests for recombination among strains, and do pairwise dN/dS testing for selection. S.T.A.R.T. is described in a paper: Jolley K. A., E. J. Feil, M. S. Chan, and M. C. Maiden. 2001. Sequence type analysis and recombinational tests (START). Bioinformatics 17: 1230-1231. It is available as a Windows executable from its web site at http://pubmlst.org/software/analysis/start2/. A previous version, S.T.A.R.T., is available at another page at the same web site.
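The clustering step described here can be sketched in Python as follows. The sketch assumes that the distance between two strains is the fraction of loci at which their allele numbers differ. That is an assumption made for illustration, as are the function names, and it should not be read as S.T.A.R.T.'s actual definition. The UPGMA part is the standard algorithm.

```python
from itertools import combinations

def allele_mismatch(profile_a, profile_b):
    """Fraction of loci at which two allelic profiles differ (assumed distance)."""
    return sum(a != b for a, b in zip(profile_a, profile_b)) / len(profile_a)

def upgma(names, dist):
    """Standard UPGMA clustering; returns a rooted, clocklike Newick tree."""
    n = len(names)
    # each active cluster: (Newick text, number of leaves, height of its root)
    info = {i: (names[i], 1, 0.0) for i in range(n)}
    d = {(i, j): float(dist[i][j]) for i, j in combinations(range(n), 2)}
    next_id = n
    while len(info) > 1:
        (i, j), dij = min(d.items(), key=lambda kv: kv[1])  # closest pair
        (ti, ni, hi), (tj, nj, hj) = info.pop(i), info.pop(j)
        h = dij / 2.0
        text = "(%s:%.4f,%s:%.4f)" % (ti, h - hi, tj, h - hj)
        del d[(i, j)]
        for k in info:
            dik = d.pop((min(i, k), max(i, k)))
            djk = d.pop((min(j, k), max(j, k)))
            # size-weighted average of the distances to the two merged clusters
            d[(k, next_id)] = (ni * dik + nj * djk) / (ni + nj)
        info[next_id] = (text, ni + nj, h)
        next_id += 1
    return info.popitem()[1][0] + ";"

profiles = {"ST1": (1, 1, 1, 1), "ST2": (1, 1, 1, 2), "ST3": (2, 2, 2, 2)}
names = list(profiles)
matrix = [[allele_mismatch(profiles[a], profiles[b]) for b in names] for a in names]
print(upgma(names, matrix))
```

In this example ST1 and ST2 differ at one locus out of four and are joined first; ST3 then joins the pair at the average of its distances to the two.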

Joachim Friedrich, Thomas Dandekar, Matthias Wolf, and Tobias Müller, of the Department of Bioinformatics, Biocenter, University of Würzburg, Germany (ProfDist (at) biozentrum.uni-wuerzburg.de) have written ProfDist, version 0.9.8, a program for constructing large trees from profile distances of nucleotide sequences. Given a number of clades and DNA or RNA sequences, it constructs a profile for each clade, computes profile distances among these, and uses Profile Neighbor Joining (PNJ) to compute a tree. ProfDist is described in the paper: Müller, T., S. Rahmann, T. Dandekar, and M. Wolf. 2004. Accurate and robust phylogeny estimation based on profile distances: a study of the Chlorophyceae (Chlorophyta). BMC Evolutionary Biology 4: 20. ProfDist is available as a Windows executable, or as source code for Windows, Linux, or Mac OS X. It is distributed from its web site at http://profdist.bioapps.biozentrum.uni-wuerzburg.de/.

Olivier Langella (Olivier.Langella (at) pge.cnrs-gif.fr) of the Laboratoire PGE, CNRS UPR9034, Gif sur Yvette, France, distributes Populations, version 1.2.30. It can calculate a wide variety of distances from multiple-allele diploid or haploid genotypes and from microsatellite data, and can also infer phylogenies by distance methods including Neighbor-Joining and UPGMA. It can bootstrap the data across loci and/or across individuals when constructing phylogenies. The trees can be trees of populations or trees of individuals. Populations is available as a free download from its web site at http://bioinformatics.org/~tryphon/populations/, as source code and as executables for Windows.

Patrick Meirmans of the Department of Ecology and Evolution of the University of Lausanne, Switzerland (patrick.meirmans (at) unil.ch) has written GenoDive (Genotypic Diversity), version 2b9, a program for population genetic analyses. It can do Analysis Of Molecular VAriance, estimation of standardised coefficients of population differentiation, k-means clustering of populations using a simulated annealing approach, assigning genotypic identity (clones) to individuals, testing for clonal reproduction, testing Hardy-Weinberg equilibrium, calculation of the hybrid index for individuals, and different types of Mantel tests. The distances that it can compute are the features relevant to this listing. It can handle genetic data as well as distance matrices and ecological data, enabling you to combine data from several different sources into a single analysis. It is described in the paper: Meirmans, P. G. and P. H. Van Tienderen. 2004. GENOTYPE and GENODIVE: two programs for the analysis of genetic diversity of asexual organisms. Molecular Ecology Notes 4: 792-794. It is available as Mac OS X universal executables. It can be downloaded from its web site at http://www.bentleydrummer.nl/software

Allen Rodrigo, Alexei Drummond, and Matthew Goode of the Computational and Evolutionary Biology Laboratory, School of Biological Sciences, University of Auckland, New Zealand (a.rodrigo (at) auckland.ac.nz and m.goode (at) auckland.ac.nz) have released Pebble, version 1.0 (Phylogenetics, Evolutionary Biology, and Bioinformatics in a moduLar Environment). This is a graphical user interface around a functional programming language for evolutionary inferences. The system is written in Java using the PAL project classes as its components. This alpha release provides the basic user interface and some component packages. The following analyses and tools are available in vCEBL 0.3a:

Construction of serial sample phylogenies using sUPGMA or sWPGMA with sampling times known exactly or ordinally.
Construction of Neighbor-Joining, UPGMA and WPGMA phylogenies.
Estimation of pairwise distance matrices using user-specified rate matrices (but not yet allowing variation of rates between sites).
Estimation of population parameters including substitution/mutation rates using pairwise distances with or without parametric bootstrap confidence intervals.
Maximum-likelihood branch-length optimization of user-specified tree, including serial sampled clocklike trees.
ML estimation of divergence between serial samples assuming constant or varying mutation rates.
Simulation of genealogies and sequences under a constant-sized population model with or without serial sampling.

This package supersedes this laboratory's separate release of sUPGMA, which has therefore been withdrawn. Self-installing versions of PEBBLE for Macintosh or Windows can be obtained from its web page at http://www.cebl.auckland.ac.nz/software2.php. It requires Java VM 1.1.1 or higher. It can also be obtained there as an applet for your browser, with some features lacking.

Le Sy Vinh (Vinh (at) cs.uni-duesseldorf.de) of the Bioinformatics Institute of the University of Düsseldorf, Germany and Arndt von Haeseler (arndt.von.haeseler (at) univie.ac.at) of the Centre for Integrative Bioinformatics Vienna (CIBIV) have released STC (Shortest Triplet Clustering). This method constructs k-representative sets from triplets of species. The resulting clustering method is O(n²) in speed and can handle thousands of species with good accuracy. It is described in a paper: Vinh, L. S. and A. von Haeseler. 2005. Shortest triplet clustering: reconstructing large phylogenies using representative sets. BMC Bioinformatics 6: 92. The program is available as Linux and as Windows executables at its web site at http://www.cibiv.at/software/stc/

Naoko Takezaki of the Life Science Research Center of Kagawa University, Japan (takezaki (at) med.kagawa-u.ac.jp) has written sendbs. It computes average nucleotide substitutions within and between populations. The method is described in the paper by M. Nei and L. Jin (1989, Molecular Biology and Evolution 6: 290-300). However, sendbs differs from their method by using a bootstrap across sites to obtain standard errors of the distances. It also constructs a tree of populations using a neighbor-joining method. It is distributed as source code for Unix, and also as a DOS executable, by ftp from the Indiana ftp server and through the software page of Masatoshi Nei's lab at Pennsylvania State University at http://www.bio.psu.edu/People/Faculty/Nei/Lab/software.htm.

Applied Maths BVBA of Keistraat 120, 9830 Sint-Martens-Latem, Belgium has released GelCompar II, a comprehensive 1-d gel analysis program. It includes capabilities of clustering data taken from the gels. These are described as "including phylogenetic and dimensioning algorithms". The phylogeny algorithms include a number of distance-matrix clustering methods. It is also said to be able to carry out generalized parsimony. GelCompar II is a Windows program. It is described at its web site at http://www.applied-maths.com/gc/gc.htm. A detailed brochure is available for downloading there. GelCompar II is commercial software. For price and ordering information contact them by phone at +32 9 2222 100, fax them at +32 9 2222 102, e-mail them at info (at) applied-maths.com, or use the information request form at their web pages. Their U.S. Sales Office is at Applied Maths Inc., 512 East 11th Street, Suite 207, Austin, Texas 78701, phone +1 512-482-9700, fax +1 512-482-9708 (email is info-us (at) applied-maths.com). (One company vending GelCompar II sells the whole package for $20,000, though if only the basic module and the cluster analysis module are ordered the price is $5,400).

Andrey Rzhetsky, now of the Department of Human Genetics at the University of Chicago (arzhetsk (at) medicine.bsd.uchicago.edu) has written Statio, a program for testing stationarity of nucleotide composition or amino acid composition in pairs of sequences. The program reads a pair of sequences and then tests stationarity under a number of possible models of DNA evolution or protein evolution. The method is described in the paper: Rzhetsky, A. and M. Nei. 1995. Tests of applicability of several substitution models for DNA sequence data. Molecular Biology and Evolution 12(1): 131-151. It can be downloaded as a set of MSDOS executables from the Nei lab software web site at https://homes.bio.psu.edu/people/faculty/nei/software.htm.

Probal Chaudhuri of the Theoretical Statistics and Mathematics Division of the Indian Statistical Institute, Calcutta, India (probal (at) isical.ac.in) has released SWORDS (Statistical analysis of WORDS in DNA sequences), a package for analyzing the frequencies and distribution of DNA words (subsequences) in multiple species. It has programs for drawing frequency curves and star plots of DNA words of a given size. It can calculate distance matrices and infer phylogenies by UPGMA clustering. It also estimates bootstrap values for the phylogenetic trees. It can also analyze the position of the sequences and carry out other descriptive statistics and tests. As it does not need aligned sequences, SWORDS can handle large genome sizes. The methods and the program are described in the papers:

Chaudhuri, P. and S. Das. 2001. Statistical analysis of large DNA sequences using distribution of DNA words. Current Science 80: 1161-1166.
Chaudhuri, P. and S. Das. 2002. SWORDS: a statistical tool for analyzing large DNA sequences. Journal of Biosciences 27: 1-6.

A trial Windows version is available from its web site at http://www.isical.ac.in/~probal/main.htm. It is described as available as C++ source code, Windows executables, Linux executables and Mac OS X universal executables.
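
An alignment-free comparison based on word frequencies can be sketched in a few lines of Python. This is only an illustration of the general approach, not the SWORDS implementation: the function names are hypothetical, and the Euclidean distance between frequency vectors is an assumption chosen for simplicity, since the distance measure actually used by SWORDS is not stated here.

```python
from collections import Counter
from math import sqrt

def word_frequencies(seq, k):
    """Relative frequencies of all overlapping words of length k in seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

def word_distance(seq_a, seq_b, k):
    """Euclidean distance between word-frequency vectors (an assumed measure)."""
    freq_a = word_frequencies(seq_a, k)
    freq_b = word_frequencies(seq_b, k)
    # Words missing from one sequence count as frequency zero there.
    words = set(freq_a) | set(freq_b)
    return sqrt(sum((freq_a.get(w, 0.0) - freq_b.get(w, 0.0)) ** 2 for w in words))

def distance_matrix(seqs, k):
    """Symmetric matrix of pairwise word distances; sequences need not be aligned."""
    return [[word_distance(a, b, k) for b in seqs] for a in seqs]

matrix = distance_matrix(["ACGTACGTAC", "ACGTTCGTAA", "GGGCCCGGGCCCAA"], k=2)
```

Because only word counts are compared, the sequences may differ in length, and a matrix like this could be passed to any distance-based clustering method such as UPGMA.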

Mike Sanderson (sanderm (at) email.arizona.edu) of the Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, Arizona has written r8s, version 1.71, a program that infers divergence times on a phylogeny by smoothing rates of evolution so that they approximate a molecular clock (allowing a "relaxed" clock). The program takes a tree with branch lengths as input, smooths the rates along it, and infers divergence times. Sanderson's main approaches to smoothing divergence times are described in his papers:

Sanderson, M. J. 1997. A nonparametric approach to estimating divergence times in the absence of rate constancy. Molecular Biology and Evolution 14: 1218-1231.
Sanderson, M. J. 2002. Estimating absolute rates of molecular evolution and divergence times: a penalized likelihood approach. Molecular Biology and Evolution 19: 101-109.

The program can also take a file of trees estimated from different bootstrap replicates and show profile distributions of divergence times for given nodes in the trees. It is available as an x86 Linux executable or as a Mac OS X 10.2 executable from its web site at http://loco.biosci.arizona.edu/r8s/index.html.

Torsten Eriksson of the Bergius Botanical Garden, Stockholm, Sweden (torsten (at) bergianska.se) has released the r8s bootstrap kit. This is a set of Perl scripts and three general command blocks for PAUP* and r8s which enable bootstrapping analyses with r8s. It is available from his software web site at http://www.bergianska.se/index_forskning_soft.html.

Kai Chan (kaichan (at) stanford.edu) of the Department of Biological Sciences, Stanford University, Stanford, California, and Brian Moore (brian.moore (at) yale.edu) of the Department of Ecology and Evolutionary Biology, Yale University, New Haven, Connecticut have released SymmeTREE version 1.1. It is a program to test whether branches of a tree have diversified at different rates, and along which branches the significant shifts of diversity have occurred. This is evaluated using the species diversity of different parts of the tree. The program is described in a paper: Chan, K. M. A. and B. R. Moore. 2004. SYMMETREE: whole-tree analysis of differential diversification rates. Bioinformatics Advance Access publication November 30, 2004. The program is available as executables for Windows, Mac OS X, and Linux and as source code for other flavors of Unix. It is distributed from its web site at http://www.phylodiversity.net/bmoore/software.html.
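
A common building block for tests of this kind is the probability of a split under an equal-rates Markov model, in which every lineage is equally likely to speciate. The Python sketch below is offered only as an illustration of that idea and is not taken from SymmeTREE; the function name is hypothetical, and the whole-tree statistics that the program actually computes are described in the paper. Under this model, a node with N descendant species is equally likely to divide into 1 and N-1, 2 and N-2, and so on, so the chance of a split at least as unequal as the observed one is easy to compute.

```python
from fractions import Fraction

def erm_split_probability(left, right):
    """Probability, under an equal-rates Markov model, of a split at least as
    unequal as the observed one at a node with left + right descendant species."""
    n = left + right
    small = min(left, right)
    if left == right:
        # A perfectly even split is the least unequal outcome possible.
        return Fraction(1)
    # Each ordered split (i, n - i) has probability 1 / (n - 1); the smaller side
    # can be of size 1..small on either side of the node.
    return Fraction(2 * small, n - 1)

p_unbalanced = erm_split_probability(1, 39)   # one species versus 39
p_balanced = erm_split_probability(20, 20)
```

A small probability at a node, as in the first call, suggests that the two sister groups have diversified at different rates.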

Galina Glazko, now of the Department of Biomedical Informatics of the University of Arkansas Medical School, Little Rock, Arkansas (GVGlazko (at) uams.edu) and Masatoshi Nei of the Institute of Molecular Evolutionary Genetics at Pennsylvania State University, University Park, Pennsylvania have released TIMER, which estimates divergence times using a linearized tree approach. It can use DNA or protein sequences at multiple loci. It constructs a phylogeny using the Neighbor-Joining method, and then estimates branch lengths and divergence times for the individual loci as well as for the full set of loci. It can carry out the Two-Cluster Test for constancy of rate of divergence at an individual node in the tree. The methods are explained in the paper: Nei, M., P. Xu, and G. Glazko. 2001. Estimation of divergence times from multiprotein sequences for a few mammalian species and several distantly related organisms. Proceedings of the National Academy of Sciences 98: 2497-2502. TIMER is available as a Windows executable at the Nei lab software web site at https://homes.bio.psu.edu/people/faculty/nei/software.htm
