How to read a phylogenetic tree

Nowadays even the media seem quite happy to occasionally put up a phylogenetic tree as part of their scientific coverage, and trees are proliferating on websites and blogs, in addition to research papers, books and magazines. However, while it is hardly difficult to get the gist of a tree, a certain amount of skill and knowledge is needed to pull all of the information out of one correctly. It is easy to make mistakes about what a tree actually tells you, so hopefully I can clear up a few misconceptions about tree creation and how trees should be read.

I want to point out before we get seriously started that I am here trying to deal only with how the tree looks, and not how we get there (i.e. cladistics). I do have posts in preparation covering that, and of course the variation in methodology and analysis type, the use of consensus trees, OTUs, outgroups and more can all have a profound impact on the shape of the tree (before we even get started on supertrees). (Note: you can ignore all the jargon in that sentence really; it will make sense to at least a couple of readers.) So don’t worry or ask about *how* we get there; this should just be a guide to looking at individual trees or comparing trees. It may sound backwards (and it kind of is) but we’ll do the ‘how’ later. Let’s deal with the ‘what’.

First off though, some terminology. I have knocked up this little tree (1) to cover the basics. I apologise for the poor quality; as I have said before, I don’t have any image editing software, so these were done in PowerPoint. They’ll do even if they aren’t pretty. There are a few different names for such things (trees, cladograms, phylogenetic trees and others) and while they do have subtly different meanings (which I won’t go into here), they all look much the same and are treated the same way in terms of basic interpretation, as the differences lie in their construction rather than in how they look or are used afterwards.

The tips of the tree, sometimes called leaves, are basically where the actual taxa are. (Another bit of jargon, but a very useful one: a taxon is basically a biological unit, be it a species, genus, family, or even a kingdom.) The branches of the tree are the lines that connect the leaves and represent the pathway that traces the proposed evolutionary history of the lineages. The points at which branches separate are called nodes and represent hypothetical ancestors of the descendant clades of that node (more technical language: a clade is a group of taxa made up of an ancestor and all of its descendants). This will become relevant later, and I also have a whole post in preparation on it. Pairs of taxa or clades are said to be sisters (so A and B are sisters, as are E and F; C is the sister taxon to the clade A-B, D to the clade A-C, and so on). Sister taxa of course share a recent common ancestor at the node that joins them together. Often the base of the tree (typically a single taxon, but sometimes more) is called the outgroup. This is the taxon used to act as a basis for the analysis that produced the tree and is quite important; everything else is part of the ingroup. Typically branches bifurcate (i.e. two branches come off each node), but when relationships are unclear you are left with a polytomy, with three or even more branches coming off a single node.
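For those who like to poke at these things on a computer, here is a rough sketch of the terminology above in Python. The nested-tuple layout is my own assumption, pieced together from the relationships described in the text rather than copied from the figure, and the helper names `leaves` and `sisters` are made up purely for illustration.

```python
# Hypothetical tree: each tuple is a node, each string is a leaf (a taxon).
# The root has three children, so G and H sit in a polytomy at the base.
TREE = ((((("A", "B"), "C"), "D"), ("E", "F")), "G", "H")


def leaves(node):
    """Return the set of taxa found at the tips below a node."""
    if isinstance(node, str):
        return {node}
    return set().union(*(leaves(child) for child in node))


def sisters(node, target):
    """Return the sister group(s) of the clade `target` (a set of taxa).

    Returns None if no branch in the tree carries exactly those taxa.
    """
    if isinstance(node, str):
        return None
    child_sets = [leaves(child) for child in node]
    if target in child_sets:
        # Everything else hanging off the same node is a sister.
        return [taxa for taxa in child_sets if taxa != target]
    for child in node:
        found = sisters(child, target)
        if found is not None:
            return found
    return None


print(sisters(TREE, {"A"}) == [{"B"}])            # True: A and B are sisters
print(sisters(TREE, {"A", "B"}) == [{"C"}])       # True: C is sister to A-B
print(sisters(TREE, {"A", "B", "C"}) == [{"D"}])  # True: D is sister to A-C
print(len(sisters(TREE, {"G"})))                  # 2: a polytomy gives G two sisters
```

Note that sisterhood is read off the node, not off the order in which the names happen to be written.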

So with those under our belts, how do you read a tree? The idea is that one can trace relationships using the tree to see how things are related to each other, but there are catches. The lineages arising at the bottom were the first to branch off, but that does not mean those at the top necessarily came last (as we will see below), merely that the ancestors of the bottom taxa branched off first. The position in the tree is of course a function of what taxa are included and how the tree is arranged. The tree on the right (2) is identical to the one above, but a few branches have been rotated around the nodes. It *looks* different, but it is not; all the relationships are the same. The branch supporting E and F still splits off before the branches within A-D, and C still appears before A and B, but now E and F appear to be the most advanced. Try tracing some branches to see how to move from one taxon to another in the two trees and you will see that they are the same. The tree is a little like the famous London Underground map: each line shows you the order in which the stations appear, but their positions compared to the real world are only approximate. Small gaps between stations can represent several miles, but the order of the stations is still right and does not change. Later we will deal with trees that do take ‘real distances’ into account, but for now treat them (and indeed most trees) simply as a guide.
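The point about rotating branches can be checked mechanically. The sketch below (again with a tree layout I have assumed from the text, and function names of my own invention) boils each tree down to its set of clades. Two rooted trees show the same relationships exactly when those sets match, however the branches are flipped on the page.

```python
def clades(node):
    """Return (taxa below this node, set of every clade at or below it)."""
    if isinstance(node, str):
        tips = frozenset([node])
        return tips, {tips}
    tips, found = frozenset(), set()
    for child in node:
        child_tips, child_found = clades(child)
        tips |= child_tips
        found |= child_found
    found.add(tips)
    return tips, found


def same_relationships(tree_a, tree_b):
    """True if both trees contain exactly the same clades."""
    return clades(tree_a)[1] == clades(tree_b)[1]


# Assumed layout of the first tree, and the same tree with branches rotated.
TREE_1 = ((((("A", "B"), "C"), "D"), ("E", "F")), "G", "H")
TREE_2 = ("H", "G", (("F", "E"), ("D", ("C", ("B", "A")))))
# Here B and C have genuinely swapped places, which is a real difference.
TREE_X = ((((("A", "C"), "B"), "D"), ("E", "F")), "G", "H")

print(same_relationships(TREE_1, TREE_2))  # True: only the drawing changed
print(same_relationships(TREE_1, TREE_X))  # False: A-B is no longer a clade
```

Comparing clade sets is only one way to do this, but it makes the Underground map analogy concrete: the order of names down the page is thrown away and only the groupings are kept.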

Even though A and B still appear last in all of this and can be considered the most derived animals, they are not necessarily that special. Now let’s pretend we have added some extra taxa to our analysis and rerun it, and see what we get (3). Now E and F appear to be more derived than A and B, since we have two major clades splitting off early on (A-D, and E-F plus J-L), but again their relationships have not actually changed; we have simply got a larger data set. E and F are not now more derived than A and B, just in an apparently more derived position because of how the tree is drawn.

You can also change apparent positions based on what taxa are included in the analysis. If we are interested in the A-D clade, we would not need all the other taxa for our analysis, so we would include just a few representatives (4). Now it appears that E and J are sister taxa when before they were not. This is because F, K and L are not in the analysis. The relationships are effectively the same, but with just two taxa of that clade present, they end up as sister taxa since they basically have no choice.

We can do an extreme version of this by removing almost all the taxa (5). Again, the apparent relationship of A to F is merely a function of the fact that, of the available taxa in the analysis, they are the closest relatives and so come out together. Nothing has effectively changed in their relationships, merely the way in which they have been presented in the analysis.
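Dropping taxa is easy to mimic in code too. This is a minimal sketch, assuming a larger tree laid out as the text describes (A-D on one side, E-F plus J-L on the other); the exact arrangement inside J-L is my guess, and `prune` is a hypothetical helper, not a function from any particular package.

```python
def prune(node, keep):
    """Return the tree restricted to the taxa in `keep` (None if none survive)."""
    if isinstance(node, str):
        return node if node in keep else None
    kept = [p for p in (prune(child, keep) for child in node) if p is not None]
    if not kept:
        return None
    if len(kept) == 1:
        # A node left with a single branch is no longer a split, so it collapses.
        return kept[0]
    return tuple(kept)


# Assumed layout of the larger tree.
BIG = (((("A", "B"), "C"), "D"), (("E", "F"), ("J", ("K", "L"))))

# Keep the A-D clade plus two representatives of the other clade.
print(prune(BIG, {"A", "B", "C", "D", "E", "J"}))
# (((('A', 'B'), 'C'), 'D'), ('E', 'J'))

# The extreme version: with almost everything removed, A and F come out together.
print(prune(BIG, {"A", "F"}))
# ('A', 'F')
```

E and J end up as sisters in the pruned tree simply because F, K and L are no longer there to sit between them. Nothing about the underlying relationships has been altered.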

This also works if you start replacing taxa. Now let’s take out a few and replace them with close relatives and see what happens (6). Again the tree looks different because of the new taxa and the lack of the familiar ones, but you will see that the fundamental relationships between D, G, A, and E are the same. In fact I have simply swapped the names around, but even if the shape of the tree changes a bit (i.e. there *are* some differences in the relationships between some of these new taxa, say between two different studies) like this (7), the relationships between these four are the same. There is actually little difference in the overall structure of the tree even if a few minor relationships are slightly different.

Another trap to avoid is confusing when things appeared in the fossil record with where they are in the tree, or even where they appear to be (8). So far all the trees have been artificial constructs that make the branches sit in a nice order so it’s easy to read the names and spot the patterns in the tree (and this is the most common type), but we can also make the branch *length* relate to time (or, less often, a measure of difference such as character support). Now the point at which the branch terminates relates to where the taxon first appears in the fossil record, with the left being the oldest and the right the youngest. You can see that the pattern of branch length roughly matches that of branch splitting; in other words, the most basal taxa (at the bottom of the tree) are also the oldest taxa (closest to the left). However, since the fossil record is not perfect, we also see exceptions: both G and F appear much later than we might expect, and D is a little out. This is nothing to worry about (unless the branches are wildly different, and even then there can be a good reason for it) and is quite normal. Note that again F has not moved. It is still the sister taxon to E, and they in turn are the sister clade to the clade A-D, but we can now see that it split off from E some time ago and lasted a long time.

This leads to another point, namely that the actual taxa are real data points (i.e. fossils or living taxa) and the lines between them are simply inferred from the data (a reconstruction of hypothetical relationships based on how the taxa appear). There were, however, real animals along those branches (F did not magically appear in the fossil record but had ancestors); we have simply not found them (or cannot recognise them). If we put V from tree number 6 into the equation, things don’t look so awkward any more (9). Again, it’s at least in part a function of what taxa we include. Now F is simply a late-surviving species of a lineage that split off from V, a lineage which itself split off from E some time ago.

Finally, the issue over branch lengths and temporal displacement explains why systematists use the words ‘basal’ and ‘derived’ as opposed to ‘primitive’ and ‘advanced’. The former pair refer only to the original branching position of the taxon on the tree, whereas the latter of course imply something about the evolutionary status of the taxon. Sharks are basal vertebrates since they branched off early in the history of the clade, but to call modern sharks primitive is not correct. True, they have some features that could be considered primitive (such as the lack of a swim bladder), but can you really consider a lineage that has been around for hundreds of millions of years ‘primitive’ as a whole?

Well, that is about it for now. There are probably a couple more things I have missed, but if nothing else that should go a long way to covering the basics, and frankly I’m bored of drawing trees in PowerPoint. As I mentioned above, there are posts coming on cladistics and ancestors in palaeontology and a few more besides, but you might have to wait a while. In the meantime you can start looking for trees and comparing and evaluating them with your new-found tree-reading skills. And who says science is dull, eh? Can’t be when ‘tree reading’ is available to all.

21 Responses to “How to read a phylogenetic tree”


  1. 1 Martin Brazeau 11/11/2009 at 7:39 pm

    “Sharks are basal vertebrates since they branched off early in the history of the clade”

    What does this mean? This use of the term “basal” has become one of my pet peeves, even though I fully admit to having abused the term this way myself. Sharks “branched off early”? Relative to what? The shark branch diverged from the osteichthyan branch at precisely the same time. Neither sharks nor osteichthyans are “basal”. The only thing

    Chondrichthyans are the sister group of osteichthyans. They are not “basal”.

    “basal taxa (at the bottom of the tree)”

    Consider the clade in the tree delimited by taxa A and F. It has seven terminal taxa. Suppose now, for instance, that taxon H is in fact a composite of terminals, also numbering seven. That is, suppose taxon H is in fact comprised of seven taxa and we re-draw the tree to reflect this. Which terminal is now “basal”?

    In reality, I think this use of the term “basal” actually confuses people about how to read trees, because it contradicts our earlier instruction not to read across the tips, but to read the grouping information demonstrated by the nodes, and that nodes are freely rotatable.

    Our choice of the term “basal” in this instance is really just a re-branding of our mistaken use of “primitive” (as “basal” and “primitive” actually mean the same thing: “towards the bottom of the tree”). It tends to be employed for less species-rich branches in a cladogram that form the sister group of a more species-rich one.

    For more on this, please see:
    Krell, F.-T. & Cranston, P.S. 2004. Which side of the tree is more basal? Systematic Entomology 29: 279–281

    • 2 David Hone 11/11/2009 at 8:49 pm

      Hi Martin,

      I’m not sure what to say here. I know where you are coming from, but do remember that this was designed as an entrée, so to speak, for non-systematists, and as a result the terms can get a little fudged (despite my best efforts to avoid it).

      I know what you mean about the term ‘basal’, but then I am not sure what we should use instead. Sure, what we often say are ‘basal’ taxa are a result of how the tree is drawn and how we look at it (if we add more things or redraw the tree in an alternate arrangement the position can apparently move), but I’d argue that the term is still meaningful and, more importantly, useful. It means (to me anyway) that in the given context of a given tree this is something that branched off the stem lineage before another taxon.

      I disagree that basal and primitive are the same thing. Sharks are ‘basal’ as they branch off early, but they retain some ‘primitive’ characteristics (like the lack of a swim bladder) and also have their own independently ‘derived’ characteristics. The characters are primitive and derived, but the taxa themselves are not.

      • 3 robertsloan2 18/10/2011 at 8:17 pm

        Does this generally mean that “basal” means it’s an old lineage and “more derived” a newer one? That definition would make sense to me anyway.

        I’m not a scientist but an interested layman with a fairly large vocabulary. “Basal” and “Primitive” do seem to mean something different to me in tone. Basal meaning “it’s been around for a long time” and derived meaning “it’s relatively new and out toward the branchy side of the tree.”

      • 4 David Hone 19/10/2011 at 8:10 am

        That’s more or less it, yeah, it’s about relative positions.

  2. 5 Martin Brazeau 11/11/2009 at 9:16 pm

    Hi Dave,

    Thanks for your reply.

    “I’m not sure what to say here. I know where you are coming from, but do remember that this was designed as an entrée, so to speak, for non-systematists, and as a result the terms can get a little fudged (despite my best efforts to avoid it).”

    As a fellow blogger, I appreciate the difficulties in communicating this. However, I think our use of the term ‘basal’ actually undermines our effort to present a correct view of ‘tree-thinking’, because the “base” of a tree is its root. Thus, when we read across the tips and refer to terminal taxa as “basal” we actually contradict ourselves.

    “I know what you mean about the term ‘basal’, but then I am not sure what we should use instead.”

    We should use the term “sister group of _____” (as explained in the editorial by Krell & Cranston).

    “It means (to me anyway) that in the given context of a given tree this is something that branched off the stem lineage before another taxon.”

    “Branched off the stem lineage”? So it only refers to fossil taxa? Then how can sharks be “basal”? And what is this business of “branching off before” something else?

    “I disagree that basal and primitive are the same thing. Sharks are ‘basal’ as they branch off early, but they retain some ‘primitive’ characteristics (like the lack of a swim bladder) and also have their own independently ‘derived’ characteristics. The characters are primitive and derived, but the taxa themselves are not.”

    But this is predicated on a misuse of the term “basal”. You keep saying that “sharks branch off early”, but you have not explained what you mean by this. Early relative to what? Are humans basal? After all, humans and sharks ‘branched off’ at exactly the same point in time: the crown gnathostome split. What’s the difference? Why aren’t humans basal while sharks are?

    I hope my queries make sense and we can come to an understanding on these points.

    • 6 David Hone 12/11/2009 at 9:03 am

      Hi Martin,

      I think we are getting there though of course I’m sure you (and other readers) will be aware that this is *just* the kind of discussion that needs a pub, a drink and bits of papers on which we can scribble trees as examples to make specific points, especially as we seem to have several interlinked, but also somewhat separate points on the go here! Right, I’ll do my best but this is tangled – ah the joy of phylogenetic discussions.

      “I think our use of the term ‘basal’ actually undermines our effort to present a correct view of ‘tree-thinking’, because the “base” of a tree is its root. Thus, when we read across the tips and refer to terminal taxa as “basal” we actually contradict ourselves.”
      -I’m still not sure this is true. While I agree that the term can vary with the tree, again when referring to a specific tree I don’t see the problem. As long as it is clear then there is no problem (though I concede here that you *do* read it differently and thus it is not necessarily clear, but this is not a problem I have ever encountered before). I’d also add that while we do read the tips to some degree, the tip is not the only thing – the branch is important too and I’m trying to refer to the position of the branch with respect to the rest of the tree.

      “We should use the term “sister group of _____” ”
      -I agree that’s clear and unambiguous, though I’ve not read that paper yet, sorry (probably bad while conducting this discussion).

      “And what is this business of “branching off before” something else? ”
      -Well again I think *in context* this is not a problem. If you are talking about mammals on a tree, then I don’t see the confusion of sharks ‘branching off’ before mammals, either phylogenetically or temporally, on a tree of vertebrates.

      “Are humans basal? After all, humans and sharks ‘branched off’ at exactly the same point in time: the crown gnathostome split. What’s the difference? Why aren’t humans basal while sharks are?”
      -I hope this is clearer now: in the context of humans, sharks are indeed basal, but of course they are not when compared to all other non-shark vertebrates; then they are indeed the sister taxon. I think it’s a context thing and I’d argue that the term is pretty clear and useful. However, I would agree that “sister taxon to” is probably less ambiguous.

      I hope that is clearer. In short, I’m not sure what I am advocating is problematic, but I do agree that your alternatives are probably *better*. But I’ll need to think about this, and a few drinks and some paper would help!

  3. 7 Tj 01/03/2010 at 7:57 am

    Hey this is an awesome tutorial, thanks so much-it made everything much clearer and easier to understand! I am currently looking at trees for a plant based fructosyltransferase and this will help me understand and describe what I see there better! Thank you.

  4. 8 Tracy 29/07/2010 at 12:28 pm

    So on picture 1, which one was the first to diverge from other lineages?

  5. 9 David Hone 01/08/2010 at 9:45 pm

    Tracy, it’s either G or H. But as they are in a polytomy, it’s not clear which might have come first (or it could, in theory, have happened simultaneously).

  6. 10 prashanti 11/08/2011 at 2:24 am

    you are really good at explaining this. Thank you very much.

  7. 11 robertsloan2 18/10/2011 at 8:20 pm

    Thank you for this excellent article. You’ve helped make these diagrams more understandable to me, especially when I see different ones that share more or less the same data. It was easy to grasp why the diagrams can look radically different and yet still share the same set of data.

    It’s that no one is going to draw the insane diagram that would chart all of the relationships of all the organisms known in a big category like, say, dinosaurs. That wouldn’t fit on the screen and even then different scientists’ ideas on the relationships would come into it with different evidence supporting one pattern over another.

    With the way you did this, it will be a lot easier for me to read these charts in future. I’m looking forward to your article on cladistics.

    • 12 David Hone 19/10/2011 at 8:05 am

      “It’s that no one is going to draw the insane diagram that would chart all of the relationships of all the organisms known in a big category like, say, dinosaurs.”

      You mean like this: https://archosaurmusings.wordpress.com/2008/07/23/dinosaur-diversity-and-a-super-supertree/

      OK it’s not all of them, but it is a very good chunk and there are bigger trees out there. There are issues with doing something this size, but in theory at least you can plus multiple trees together to produce very large ones and there’s no reason why this can’t ultimately include all of life.

  8. 13 Liza Adamat 24/11/2011 at 6:38 am

    Thank you very much for a very informative discussion.

  9. 14 Chandra 12/06/2012 at 7:20 pm

    nice work done.. thanks a million…

  10. 15 Janette 20/05/2013 at 12:16 pm

    Truly useful….look forth to visiting again.

  11. 16 Abdul hussain 01/06/2013 at 5:13 pm

    thanks very much for very useful information many student could be benefited

