Phylogeny – Background

Taxonomy and Phylogenetic Trees

Taxonomy is the science of classifying organisms. Biologists use similarities to group organisms into species, genera, families, orders, classes, phyla, kingdoms, and domains. Humans have practiced taxonomy for thousands of years by grouping organisms with similar traits together. Today, biologists use phylogeny, the evolutionary history of organisms, to inform taxonomy. In other words, we want to group together organisms that share a common ancestor.

A cartoon showing the taxonomic ranks used in biology. From largest to smallest they are: life, domain, kingdom, phylum, class, order, family, genus, species.
Taxonomic ranks used to classify organisms. Notice that smaller ranks nest within larger ranks. (Figure is in the public domain).

A phylogeny is a hypothesis of the evolutionary history of a group of organisms. We cannot directly observe events that happened in the past. However, we can observe traits that are shared between extant groups and use these to form and test hypotheses of relatedness. Biologists use morphology, fossils, and molecular similarity to classify organisms. Most phylogenetic reconstructions today rely on molecular data.

Phylogenetic trees are illustrations of phylogenies. There are several methods for drawing a phylogenetic tree. The most commonly used method today is cladistics, which tries to represent the best hypothesis of the evolutionary history of a clade, or group of organisms derived from a common ancestor.

Reading Phylogenetic Trees

Phylogenetic trees can take a number of forms that convey the same information. Consider, for instance, a tree depicting the relationship between the three domains of life – Bacteria, Archaea, and Eukarya. Molecular evidence suggests that Archaea and Eukarya are more closely related to each other than either group is to Bacteria. Six of the many ways to represent this information are shown in the figure below. Note that each of these trees is identical in terms of the information it conveys.

This figure shows six different ways to draw a phylogenetic tree with the groups bacteria, archaea, and eukarya. Two trees have bracket-style branches and are oriented in different directions. Two trees have diagonal branches and are oriented in different directions. Two trees have the three branches radiating out from a central point. In each tree, archaea and eukarya are shown diverging from a single point, indicating that they are each other's closest relatives.
Six ways to represent the same phylogeny. Note that, while the shapes and orientations are different, each phylogenetic tree conveys the same information – that Archaea and Eukarya are more closely related to each other than to Bacteria, because they share a more recent common ancestor. (Figure by Melissa Hardy is in the public domain).

Trees can have proportional branch lengths, meaning that the length of the branch represents a particular length of time or a certain amount of genetic change. Many trees, however, do not have proportional branch length, and represent relationships without making any claim about time or genetic distance.

Trees can be rooted or unrooted. An unrooted tree shows the relationships between organisms without indicating the common ancestor. A rooted tree shows the common ancestor of all of the taxa shown in the tree. The common ancestor may be theoretical – that is, if the hypothesis shown in the phylogenetic tree is correct, it must have existed, even if it is not represented in the fossil record. In the rooted tree below, the root would be LUCA, the last universal common ancestor.

Tree 1: This tree has a horizontal orientation and is unrooted. The base has two branches, A and B. Branch A splits into two branches, A.1 and A.2. Node A.1 reads Archaea, node A.2 reads Eukarya, and node B reads Bacteria. Tree 2: This tree has a horizontal orientation and is identical to Tree 1 with the exception that it is rooted. The root has two branches, A and B. Branch A splits into two branches, A.1 and A.2. Node A.1 reads Archaea, node A.2 reads Eukarya, and node B reads Bacteria.
Phylogenetic trees may be either unrooted (left) or rooted (right). The root represents the common ancestor of all of the organisms on the tree. (Figure by Melissa Hardy is in the public domain).

Often, but not always, the most distant past is shown at the bottom or left-hand side of the phylogeny. Each node represents a divergence. Usually, if the tree depicts taxa that are different species or a higher order taxonomic group, each node represents a speciation event. In the tree below, node 1 is the earliest speciation event, and node 4 is the most recent.

This tree is in a V shape with a root at the point in the V. Node 1 begins at the point. The node branches up the left side of the V to node A. Continuing up the right side at intervals are nodes 2-4. Node 2 branches to node B. Node 3 branches to node C. Node 4 branches to node D. Finally node 1 branches up the right side of the V to node E.

Note that branches can be rotated around a node and show the same information. For instance, the two trees below show the same phylogeny, even though the branches of taxa C, D, and E have been rotated around nodes 3 and 4.

Tree 1: This tree is in a V shape with a root at the point in the V. Node 1 begins at the point. The node branches up the left side of the V to node A. Continuing up the right side at intervals are nodes 2-4. Node 2 branches to node B. Node 3 branches to node C. Node 4 branches to node D. Finally node 1 branches up the right side of the V to node E. Tree 2: This tree is also in a V shape with a root at the point in the V. Node 1 begins at the point. The node branches up the left side of the V to node A. Continuing up the right side at intervals are nodes 2-3. Node 2 branches to node B. Node 3 branches to node E and splits in the middle to node 4 which branches right to node D. Finally node 1 branches up the right side of the V to node C.
We can rotate branches around any branch point and the information in the tree remains the same. These two trees show the same hypothesis.
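One way to convince yourself that rotation does not change a tree is to write the tree down without any left-to-right order. The sketch below is only an illustration: the nested-tuple notation, the function name `canonical`, and the three example trees are assumptions made here, with taxa A through E arranged as in the figure description above.

```python
def canonical(tree):
    """Return an order-independent form of a nested-tuple tree."""
    if isinstance(tree, str):  # a tip (taxon name)
        return tree
    # A node is treated as the unordered set of its child subtrees,
    # so swapping branches around a node cannot change the result.
    return frozenset(canonical(child) for child in tree)


# Node 1 = (A, node 2), node 2 = (B, node 3), node 3 = (C, node 4), node 4 = (D, E).
tree_1 = ("A", ("B", ("C", ("D", "E"))))
# The same hypothesis after rotating the branches around nodes 3 and 4.
tree_2 = ("A", ("B", (("E", "D"), "C")))
# A different hypothesis: here C and D are sister taxa.
tree_3 = ("A", ("B", (("C", "D"), "E")))

same_after_rotation = canonical(tree_1) == canonical(tree_2)
same_as_other_hypothesis = canonical(tree_1) == canonical(tree_3)
print(same_after_rotation, same_as_other_hypothesis)
```

The first comparison is true because only the drawing order changed. The second is false because the groupings themselves differ.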

Sometimes, it is not possible to resolve evolutionary relationships into dichotomously branching trees. We represent this with a polytomy, or a node with three or more branches, as in this phylogenetic tree showing bacterial genera. A polytomy can also indicate that three or more groups radiated from the same ancestral population, but polytomies are more commonly used to represent uncertainty regarding the best tree topology.

 

This tree starts with a root that splits into two branches. Branch A splits into three branches, A.1, A.2, and A.3. A.1 reads Chlamydia, A.2 reads Treponema, and A.3 reads Leptospira.
This tree of bacterial genera shows a polytomy, indicated by the arrow.
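If a tree is written as nested groups, a polytomy is simply a node with three or more children. The short sketch below assumes a nested-tuple notation and the function names `tips` and `polytomies`, none of which come from the text. The figure description does not name the second branch from the root, so the label used for it here is a placeholder.

```python
def tips(tree):
    """Return the set of tip names below a node."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(tips(child) for child in tree))


def polytomies(tree):
    """Return the sorted tip names under every node with three or more branches."""
    if isinstance(tree, str):
        return []
    found = [sorted(tips(tree))] if len(tree) >= 3 else []
    for child in tree:
        found.extend(polytomies(child))
    return found


# "unnamed sister branch" is a placeholder, not a taxon from the figure.
bacteria = ("unnamed sister branch", ("Chlamydia", "Treponema", "Leptospira"))
unresolved = polytomies(bacteria)
print(unresolved)
```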

Using Phylogeny to Inform Taxonomy

A guiding principle of cladistics is that a valid clade must be monophyletic, meaning that it includes a common ancestor and all of its descendants. The common ancestor might be known, or it may be theoretical. A clade can be at any level of taxonomy – species, genus, family, etc. – and it can consist of living or extinct organisms, or both.

The figure shows a phylogenetic tree with sixteen branch tips. Three groups are shaded. The groups shaded red and blue contain a common ancestor and all of its descendants. The group shaded green contains a common ancestor and some, but not all, of its descendants.
This figure shows a rooted phylogenetic tree with two valid clades shaded in red and blue. The group shaded in green is not a clade because it does not include all of the descendants of the common ancestor. (Figure is in the public domain).

The group shown in green is a paraphyletic group, which includes a common ancestor and some, but not all, of its descendants. Taxonomists today try to avoid naming groups that are paraphyletic. This has resulted in revisions of many taxa that were previously widely accepted. For example, reptiles must include the birds to be a valid clade. Excluding the birds by distinguishing between Class Reptilia and Class Aves results in paraphyly. Polyphyletic groups should also be avoided. A polyphyletic group includes taxa that do not share an immediate common ancestor. Grouping all plants with C4 photosynthesis, for instance, would result in a polyphyletic group. Likewise, grouping mammals and birds together due to the shared trait of homeothermy (“warm-bloodedness”) results in a polyphyletic group.

Monophyly (yellow): Testudines, Lepidosauria, Crocodylia, Aves, Reptilia, Diapsida, and Archosauria. Paraphyly (teal): Testudines, Lepidosauria, Crocodylia, Reptilia, Diapsida, and Archosauria. Polyphyly (red): Mammalia and Aves. This tree is in a V shape with a root at the point in the V. Node Vertebrata begins at the point. The node branches up the left side of the V to node Pisces. Continuing up the right side at intervals are nodes Tetrapoda, Amniota, Reptilia, Diapsida, and finally Archosauria. Node Tetrapoda branches to the left parallel to the left side of the V to node Amphibia. Node Amniota branches to the left to node Mammalia. Node Reptilia branches to the left to node Testudines. Node Diapsida branches to the left to node Lepidosauria. Node Archosauria branches to the left to node Crocodylia. Finally node Aves is at the point of the right side of the V.
This figure shows three groupings, only one of which is a valid clade. Grouping mammals and birds together (red) on the basis of “warm-bloodedness” results in a polyphyletic group. Grouping reptiles together but excluding the birds (blue) results in a paraphyletic group. Class Reptilia must contain the birds to be considered a valid clade (yellow), or monophyletic group. (Figure is in the public domain).
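The test for monophyly can be stated mechanically: a group is a valid clade only if it is exactly the set of tips descending from some node. The sketch below applies that test to the vertebrate tree described above. The nested-tuple notation and the function names are assumptions made for this illustration, and the check only separates monophyletic groups from all others. It does not distinguish paraphyly from polyphyly.

```python
def tips(tree):
    """Return the set of tip names below a node."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(tips(child) for child in tree))


def clades(tree):
    """Yield the tip set of every clade (every node and every tip) in the tree."""
    yield frozenset(tips(tree))
    if not isinstance(tree, str):
        for child in tree:
            yield from clades(child)


def is_monophyletic(tree, group):
    """A group is monophyletic if it is exactly the tip set of some clade."""
    return frozenset(group) in set(clades(tree))


# Tree topology taken from the figure description above.
vertebrates = ("Pisces", ("Amphibia", ("Mammalia", ("Testudines",
               ("Lepidosauria", ("Crocodylia", "Aves"))))))

groups = {
    "reptiles including birds": {"Testudines", "Lepidosauria", "Crocodylia", "Aves"},
    "reptiles excluding birds": {"Testudines", "Lepidosauria", "Crocodylia"},
    "warm-blooded animals": {"Mammalia", "Aves"},
}
results = {name: is_monophyletic(vertebrates, taxa) for name, taxa in groups.items()}
for name, valid in results.items():
    print(f"{name}: {'clade' if valid else 'not a clade'}")
```

Only the first group passes, which matches the yellow grouping in the figure.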

Reconstructing Phylogenies

Ideally, phylogenies are reconstructed using homologous traits, which are similar between organisms due to shared ancestry. We refer to traits or characteristics as characters when discussing phylogeny. Morphological characters can be homologous. For example, seeds are a trait shared by some plants, as indicated on the phylogenetic tree. Seeds are a homology shared by magnolias and conifers, but not by ferns or mosses.

 

The tree starts with a root that branches into two. Branch A is vascular tissue and Branch B is moss. Branch A splits into two with Branch A1 as seeds and A2 ferns. Branch A1 splits into two with Branch A1.1 as magnolia and A1.2 as conifer.
Seeds are an example of a homologous trait in conifers and magnolias. Both of these groups share the trait because they inherited it from a recent common ancestor.

We can refer to characters as ancestral or derived. An ancestral character, or plesiomorphy, was present in a common ancestor, while a derived character, or apomorphy, differs from the ancestral form. In this case, seeds are a derived character, and lack of seeds is an ancestral character. How do we know if a character is ancestral or derived? Sometimes the fossil record can provide this information. Otherwise, we rely on outgroup analysis. An outgroup is a group of organisms that is relatively closely related to the taxa you are studying (the ingroup) but is not part of that group.

The tree starts with a root that branches into two. Branch A is the outgroup. Branch B splits into two branches, one labeled B. The other branch splits into two branches, labeled C and D. A is shaded yellow and marked as the outgroup. The group including B and C and D is shaded blue and marked as the ingroup.
The outgroup is closely related to the organisms being studied, but is not part of the group. Comparing traits between the outgroup and the ingroup can help us determine synapomorphies, or shared derived characters. (Figure by Ngilbert202 is used under a Creative Commons Attribution-ShareAlike license).

By comparing characters between the ingroup and outgroup, we can often determine whether a character is ancestral or derived.
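The comparison can be sketched in a few lines of code. The example below uses the seed character from the plant tree shown earlier and treats the moss as the outgroup. That choice, the function name `polarize`, and the 1/0 coding are assumptions made for illustration only.

```python
def polarize(states, outgroup):
    """Label each ingroup taxon's state as ancestral or derived.

    The state seen in the outgroup is assumed to be the ancestral one.
    """
    ancestral_state = states[outgroup]
    return {
        taxon: "ancestral" if state == ancestral_state else "derived"
        for taxon, state in states.items()
        if taxon != outgroup
    }


# 1 = seeds present, 0 = seeds absent.
seeds = {"moss": 0, "fern": 0, "conifer": 1, "magnolia": 1}
polarity = polarize(seeds, outgroup="moss")
shares_derived_state = sorted(t for t, label in polarity.items() if label == "derived")
print(polarity)
print(shares_derived_state)
```

Under these assumptions, the conifer and the magnolia share the derived state, while the fern retains the ancestral state seen in the moss.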

Phylogenetic trees can be inferred by grouping organisms with shared derived characters together. Shared derived characters, or synapomorphies, distinguish a group from other taxa. Monophyletic groups are thus defined by their synapomorphies.

 

The tree starts with a root that branches into two. Branch A is vascular tissue and Branch B is moss. Branch A splits into two with Branch A1 as seeds and A2 ferns. Branch A1 splits into two with Branch A1.1 as magnolia and A1.2 as conifer.
Seeds are a synapomorphy of seed plants, which include the conifers and flowering plants such as magnolias. Vascular tissue is a synapomorphy of vascular plants, which include the fern, conifer, and magnolia, but not the moss.

However, not all shared characters are synapomorphies. A character which is similar in two groups but was not inherited from a common ancestor is called a homoplasy. This is often due to convergent evolution. An example of convergent evolution is the camera-like eyes of vertebrates and cephalopods. Despite the similarity in eye structure, camera-like eyes evolved independently in these two groups, which are not closely related. Another example is C4 photosynthesis, which has evolved independently at least 62 times in flowering plants.

Additionally, it is important to note that traits can be lost. In the phylogeny shown below, motile sperm is an ancestral character that is shared by liverworts, mosses, and ferns. It is a symplesiomorphy, a shared ancestral character. Using this trait to define a taxonomic group of ferns, mosses, and liverworts would result in a paraphyletic grouping, which is not acceptable in modern taxonomy.

The tree starts with a root with a label of motile sperm. The root splits into two. Branch A is not titled and Branch B is titled liverwort. Branch A splits into two with an untitled Branch A1 that splits into two and A2 titled moss. Branch A1 splits into two with Branch A1.1 as non-motile sperm and A1.2 titled as fern. A1.1 splits into two branches A1.1.1, titled magnolia, and A1.1.2 titled conifer.
Motile sperm is a character that is ancestral to land plants. Most seed plants, however, have evolved to have non-motile sperm.

How to Draw a Phylogenetic Tree

Let’s examine land plants and a few of their close relatives to demonstrate reconstruction of phylogenies. For this tree, we will use parsimony to reconstruct the best phylogenetic tree. Parsimony simply means that we choose the simplest explanation that fits the evidence. In the context of phylogenetic trees, we draw the tree that requires the fewest evolutionary changes. The most parsimonious tree is not always the correct tree, particularly when analyzing molecular data, but it is a good place to start. Keep in mind that traits can be lost as well as gained. For instance, ratites, such as ostriches, emus, and rheas, are birds that have lost the ability to fly.
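To make "fewest evolutionary changes" concrete, we can count the changes that a given tree requires. The text does not name a counting method, so the sketch below uses Fitch's algorithm, a standard way to find the minimum number of changes for one character on a bifurcating tree. The plant taxa and characters come from the earlier figures, while the two candidate trees, the function names, and the 1/0 coding are assumptions made for illustration.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum changes) for one character.

    tree   -- nested tuples of tip names, strictly bifurcating
    states -- dict mapping tip name -> observed state
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:  # the two branches can agree, so no extra change is needed
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def tree_length(tree, matrix):
    """Total number of changes a tree needs to explain every character."""
    characters = next(iter(matrix.values())).keys()
    return sum(
        fitch(tree, {taxon: row[name] for taxon, row in matrix.items()})[1]
        for name in characters
    )


matrix = {
    "moss":     {"vascular tissue": 0, "seeds": 0},
    "fern":     {"vascular tissue": 1, "seeds": 0},
    "conifer":  {"vascular tissue": 1, "seeds": 1},
    "magnolia": {"vascular tissue": 1, "seeds": 1},
}
tree_a = ("moss", ("fern", ("conifer", "magnolia")))
tree_b = (("moss", "conifer"), ("fern", "magnolia"))  # hypothetical alternative

length_a = tree_length(tree_a, matrix)
length_b = tree_length(tree_b, matrix)
print(length_a, length_b)
```

Here `tree_a` needs two changes (one gain of vascular tissue and one gain of seeds), while `tree_b` needs three, so parsimony prefers `tree_a`.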

The first step in phylogenetic analysis is choosing the taxa to study. The second step is choosing characters that are, to the best of your knowledge, synapomorphies. For this exercise, the taxa and characters are presented in the table. In this table, 1=presence of character; 0=absence. This could also be indicated by +/-, or by a description of the character.

Table of characters of charophytes and relatives[1]

| Taxon | glycolate oxidase | flagella on vegetative cells | branching | plasmodesmata | alternation of generations |
|---|---|---|---|---|---|
| Land plants | 1 | 0 | 1 | 1 | 1 |
| Coleochaete | 1 | 0 | 1 | 1 | 0 |
| Spirogyra | 1 | 0 | 1 | 0 | 0 |
| Closterium | 1 | 0 | 1 | 0 | 0 |
| Chlorokybus | 1 | 0 | 0 | 0 | 0 |
| Mesostigma | 1 | 1 | 0 | 0 | 0 |
| Volvox | 0 | 1 | 0 | 0 | 0 |

  • Begin by drawing a tree that separates the outgroup from all the taxa in the ingroup. Add any shared characters of the ingroup to the tree in the correct position.

A rooted phylogenetic tree. From the root, there is a branch point with two branches. One branch has volvox at the tip. The other branch is labeled glycolate oxidase and has a polytomy that has six branches originating from it. These branches are labeled land plants, coleochaete, spirogyra, closterium, chlorokybus, and mesostigma.

  • Now find the derived character shared by the greatest number of taxa in the ingroup. In this case, it is the loss of flagella on vegetative cells, which is shared by every ingroup taxon except Mesostigma.

A rooted phylogenetic tree. From the root, it diverges into two branches. One branch is labeled volvox at the tip. The other is marked with a character, glycolate oxidase. This branch then splits into two branches. One is labeled mesostigma at the tip. The other is marked with a character, lack of flagella. After the character mark, there is a polytomy with five branches diverging from it. These branches are labeled land plants, coleochaete, spirogyra, closterium, and chlorokybus.

  • Continue this process with each character. If two taxa have the same characters, they should be placed as sister taxa in your tree.

Two rooted phylogenetic trees. The first tree diverges into two branches from the root. One branch is labeled volvox at the tip. The other is marked with a character, glycolate oxidase. This branch then splits into two branches. One is labeled mesostigma at the tip. The other is marked with a character, lack of flagella. After this character mark, there is another branch point. One branch is labeled chlorokybus at the tip. The other branch has a character mark, labeled branching. After this character mark, there is a polytomy with four branches diverging from it. These branches are labeled land plants, coleochaete, spirogyra, and closterium. The second tree diverges into two branches from the root. One branch is labeled volvox at the tip. The other is marked with a character, glycolate oxidase. This branch then splits into two branches. One is labeled mesostigma at the tip. The other is marked with a character, lack of flagella. After this character mark, there is another branch point. One branch is labeled chlorokybus at the tip. The other branch has a character mark, labeled branching. After this character mark, there is another branch point. One branch splits again into two branches, which are labeled spirogyra and closterium at the tips. The other branch is marked with a character, alternation of generations. After the character mark, there is another branch point that splits into two branches, labeled land plants and coleochaete at the tips.

Note that this character matrix has no conflicts, and there is only one best tree. In real life, there are often characters that are homoplasies, which can give conflicting information and make tree reconstruction significantly more difficult.
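We can check the claim that the matrix has no conflicts directly. The sketch below takes the 0/1 values from the table, uses Volvox as the outgroup, and collects the taxa that share the derived state of each character. It then confirms that every pair of these groups is either nested or non-overlapping, which is what allows them all to fit on a single tree. The function names and the column order are assumptions made for this illustration.

```python
# Character names and 0/1 states as read from the table above.
CHARACTERS = ("glycolate oxidase", "flagella on vegetative cells",
              "branching", "plasmodesmata", "alternation of generations")
MATRIX = {
    "Land plants": (1, 0, 1, 1, 1),
    "Coleochaete": (1, 0, 1, 1, 0),
    "Spirogyra":   (1, 0, 1, 0, 0),
    "Closterium":  (1, 0, 1, 0, 0),
    "Chlorokybus": (1, 0, 0, 0, 0),
    "Mesostigma":  (1, 1, 0, 0, 0),
    "Volvox":      (0, 1, 0, 0, 0),
}
OUTGROUP = "Volvox"


def derived_groups(matrix, characters, outgroup):
    """For each character, collect the ingroup taxa whose state differs from the outgroup's."""
    groups = {}
    for column, name in enumerate(characters):
        ancestral = matrix[outgroup][column]
        groups[name] = frozenset(
            taxon for taxon, row in matrix.items()
            if taxon != outgroup and row[column] != ancestral
        )
    return groups


def is_conflict_free(groups):
    """True if every pair of groups is nested or disjoint, so all fit on one tree."""
    sets = list(groups.values())
    return all(a <= b or b <= a or a.isdisjoint(b) for a in sets for b in sets)


groups = derived_groups(MATRIX, CHARACTERS, OUTGROUP)
# Largest group first: the order in which characters are added to the tree.
for name, taxa in sorted(groups.items(), key=lambda item: -len(item[1])):
    print(f"{name}: {sorted(taxa)}")
print("conflict-free:", is_conflict_free(groups))
```

For flagella, the derived state is absence, because the outgroup has flagella. A homoplasy would show up here as two groups that overlap without one containing the other.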

For extant species, the preferred way to reconstruct phylogenies is to use molecular data. DNA and protein sequences can be compared between organisms, and the differences between sequences allow evolutionary biologists to determine relatedness with much greater precision than can be achieved using morphological or behavioral characters. One reason is that sequences contain far more data than morphology does: every nucleotide or amino acid is a separate character, and with modern genomics we can compare hundreds of thousands of nucleotides or amino acids between species. Molecular data are also more reliable, because morphology and behavior can exhibit convergent evolution, meaning that similar traits can arise independently in distantly related groups. Convergence is much less of a problem with molecular data.
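As a minimal illustration of treating every nucleotide as a character, the sketch below counts the proportion of positions that differ between aligned sequences. The three short sequences and species names are invented for this example, and real analyses use much longer alignments and more sophisticated models of sequence change.

```python
def p_distance(seq_a, seq_b):
    """Proportion of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(a != b for a, b in zip(seq_a, seq_b))
    return differences / len(seq_a)


# Invented, pre-aligned 12-nucleotide sequences (not real data).
sequences = {
    "species_1": "ATGCCGTAACGT",
    "species_2": "ATGCCGTTACGT",
    "species_3": "ATGACGTTACGA",
}

d_12 = p_distance(sequences["species_1"], sequences["species_2"])
d_13 = p_distance(sequences["species_1"], sequences["species_3"])
d_23 = p_distance(sequences["species_2"], sequences["species_3"])
print(round(d_12, 3), round(d_13, 3), round(d_23, 3))
```

In this toy data set, species 1 and 2 differ at the fewest positions, which would suggest that they are each other's closest relatives.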

[1] Adapted from: Cédric Finet, Ruth E. Timme, Charles F. Delwiche, and Ferdinand Marlétaz (2010). Multigene Phylogeny of the Green Lineage Reveals the Origin and Diversification of Land Plants. Current Biology 20(24): 2217-2222.

License

College Biology II Laboratory Copyright © by Melissa Hardy and William Tanner is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License, except where otherwise noted.