Nowadays even the media seem quite happy to put up the occasional phylogenetic tree as part of their science coverage, and trees are proliferating online on websites, in research papers and on blogs, as well as in books and magazines. However, while it is hardly difficult to get the gist of a tree, it takes a certain amount of skill and knowledge to pull all of the information out of one correctly. It is easy to make mistakes about what a tree actually tells you, so hopefully I can clear up a few misconceptions about tree creation and how trees should be read.
Before we get seriously started, I want to point out that here I am trying to deal only with how the tree looks, and not how we get there (i.e. cladistics). I do have posts in preparation covering that. Of course, the variation in methodology and analysis type, the use of consensus trees, OTUs, outgroups and more (note: you can safely ignore the jargon in that sentence; it will make sense to at least a couple of readers) can all have a profound impact on the shape of the tree, before we even get started on supertrees. So don’t worry or ask about *how* we get there; this is just meant as a guide to looking at individual trees or comparing trees. It may sound backwards (and it kind of is), but we’ll do the ‘how’ later. For now, let’s deal with the ‘what’.
First off, though, some terminology. I have knocked up this little tree (1) to cover the basics (apologies for the poor quality: as I have said before, I don’t have any image editing software, so these were done in PowerPoint. They’ll do, even if they aren’t pretty). Such diagrams go by a few different names: trees, cladograms, phylogenetic trees and others. These terms do have subtly different meanings (which I won’t go into here), but the differences lie in how the diagrams are constructed rather than in how they look or are used afterwards, so for basic interpretation they all look much the same and are read the same way.
The tips of the tree, sometimes called leaves, are where the actual taxa sit. (Another bit of jargon, but a very useful one: a taxon is simply a biological group at any rank, be it a species, genus, family or even a kingdom.) The branches of the tree are the lines that connect the leaves, and they trace the proposed evolutionary history of the lineages. The points at which branches separate are called nodes, and they represent the hypothetical ancestors of the clades descended from that node (more technical language: a clade is a group made up of an ancestor and all of its descendants). This will become relevant later, and I also have a whole post in preparation on it. Pairs of taxa or clades that branch from the same node are said to be sisters, so A and B are sisters, as are E and F; C is the sister taxon to the clade A-B, D is the sister to the clade A-C, and so on. Sister taxa, of course, share their most recent common ancestor at the node that joins them. The base of the tree (typically a single taxon, but sometimes more) is often called the outgroup. This is the taxon used as a basis for the analysis that produced the tree and is quite important; everything else is part of the ingroup. Typically branches bifurcate (i.e. two branches come off each node), but when relationships are unclear you are left with a polytomy, with three or more branches coming off a single node.
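For anyone who thinks in code, this terminology maps neatly onto a nested data structure. Below is a minimal sketch in Python, assuming a tree written as nested tuples (each tuple is a node, each string a tip) and using only the relationships described above, ((((A,B),C),D),(E,F)); the outgroup is left out because the figure is not reproduced here, and the names `TREE`, `tips`, `clades` and `are_sisters` are purely illustrative.

```python
# A tree as nested tuples: every tuple is a node (a hypothetical ancestor),
# every string is a tip (a taxon). Relationships follow the text; the outgroup
# is omitted because the original figure is not reproduced here.
TREE = (((("A", "B"), "C"), "D"), ("E", "F"))


def tips(node):
    """Return the set of taxa descended from a node (a tip is its own set)."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(tips(child) for child in node))


def clades(node):
    """Return every clade in the tree: one set of tips per internal node."""
    if isinstance(node, str):
        return set()
    found = {tips(node)}
    for child in node:
        found |= clades(child)
    return found


def are_sisters(node, x, y):
    """True if groups x and y (sets of taxa) branch from the same node."""
    if isinstance(node, str):
        return False
    children = [tips(child) for child in node]
    if frozenset(x) in children and frozenset(y) in children:
        return True
    return any(are_sisters(child, x, y) for child in node)


print(are_sisters(TREE, {"A"}, {"B"}))       # True
print(are_sisters(TREE, {"C"}, {"A", "B"}))  # True: C is sister to A-B
print(are_sisters(TREE, {"A"}, {"C"}))       # False
print(len(clades(TREE)))                     # 5: one clade per node
```

Because a node is just a tuple, a polytomy is simply a tuple with three or more children, such as ("X", "Y", "Z").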
So, with those under our belts, how do you read a tree? The idea is that you can use the tree to trace how things are related to each other, but there are catches. The lineages arising at the bottom branched off first, but that does not mean those at the top necessarily came last (as we will see below), merely that the ancestors of the lower taxa branched off earlier. A taxon’s position in the tree is, of course, a function of which taxa are included and how the tree is arranged. The tree on the right (2) is identical to the one above, but a few branches have been rotated around their nodes. It *looks* different, but it is not: all the relationships are the same. The branch leading to E and F still splits off before the A-D clade diversifies, and C still branches off before A and B, yet E and F now appear to be the most advanced. Try tracing some branches to see how to move from one taxon to another in the two trees and you will see that they are the same. A tree is a little like the famous London Underground map: each line shows you the order in which the stations appear, but their positions relative to the real world are only approximate. Small gaps between stations can represent several miles, but the order of the stations is still right and does not change. Later we will deal with trees that do take ‘real distances’ into account, but for now treat these (and indeed most trees) simply as a guide.
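One way to convince yourself that rotation changes nothing is to describe a tree by its clades (the set of taxa below each node) rather than by its drawing. Here is a minimal sketch, again assuming the nested-tuple representation and illustrative names; comparing clade sets is a simple way of asking whether two drawings show the same relationships.

```python
# Rotating branches around a node only swaps the order of a tuple's children.
# Tree (1) as described in the text, and a version with every node rotated.
ORIGINAL = (((("A", "B"), "C"), "D"), ("E", "F"))
ROTATED = (("F", "E"), ("D", ("C", ("B", "A"))))
# A genuinely different tree for contrast: here A is sister to C, not B.
DIFFERENT = (((("A", "C"), "B"), "D"), ("E", "F"))


def tips(node):
    """Return the set of taxa descended from a node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(tips(child) for child in node))


def clades(node):
    """Return one set of tips per internal node; child order is ignored."""
    if isinstance(node, str):
        return set()
    found = {tips(node)}
    for child in node:
        found |= clades(child)
    return found


print(clades(ORIGINAL) == clades(ROTATED))    # True: same relationships
print(clades(ORIGINAL) == clades(DIFFERENT))  # False: a real change
```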
Even though A and B appear last in all of this and can be considered the most derived animals, they are not necessarily that special. Let’s pretend we have added some extra taxa to our analysis and rerun it, and see what we get (3). Now E and F appear to be more derived than A and B, since we now have two major clades splitting off early on (A-D, and E-F plus J-L). But again, their relationships have not actually changed; we simply have a larger data set. E and F are not now more derived than A and B, they are just in an apparently more derived position because of how the tree appears.
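How ‘derived’ a taxon looks is really just how many nodes sit between it and the root, and that count depends on what else is in the tree. A minimal sketch, assuming nested tuples as before; the text only says that A-D and E-F plus J-L form the two major clades, so the arrangement of J, K and L below is a hypothetical one chosen to nest E and F deeply, and `nodes_to_tip` is an illustrative helper.

```python
# Tree (1) and a hypothetical enlarged tree in the spirit of (3). Only the
# split into A-D versus E-F plus J-L comes from the text; the order of J, K
# and L is assumed for illustration.
SMALL = (((("A", "B"), "C"), "D"), ("E", "F"))
LARGE = (((("A", "B"), "C"), "D"), ("J", ("K", ("L", ("E", "F")))))


def nodes_to_tip(node, taxon, depth=0):
    """Count the nodes passed between the root and a taxon (None if absent)."""
    if isinstance(node, str):
        return depth if node == taxon else None
    for child in node:
        found = nodes_to_tip(child, taxon, depth + 1)
        if found is not None:
            return found
    return None


def tips(node):
    """Return the set of taxa descended from a node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(tips(child) for child in node))


def clades(node):
    """Return one set of tips per internal node."""
    if isinstance(node, str):
        return set()
    found = {tips(node)}
    for child in node:
        found |= clades(child)
    return found


print(nodes_to_tip(SMALL, "A"), nodes_to_tip(SMALL, "E"))  # 4 2
print(nodes_to_tip(LARGE, "A"), nodes_to_tip(LARGE, "E"))  # 4 5

# E now looks 'more derived' than A, yet the E-F and A-D clades are untouched.
print(frozenset({"E", "F"}) in clades(LARGE))       # True
print(frozenset({"A", "B", "C", "D"}) in clades(LARGE))  # True
```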
You can also change apparent positions based on which taxa are included in the analysis. If we are interested in the A-D clade, we would not need all the other taxa, so we would include just a few representatives (4). Now it appears that E and J are sister taxa when before they were not. This is because F, K and L are not in the analysis. The relationships are effectively the same, but with just two taxa from that clade present, E and J end up as sister taxa because they basically have no choice.
We can do an extreme version of this by removing almost all the taxa (5). Again, the apparent relationship of A to F is merely a function of the fact that, of the taxa available in the analysis, they are the closest relatives, and so they come out together. Nothing has actually changed in their relationships, merely the way they have been presented in the analysis.
This also works if you start replacing taxa. Let’s take out a few, replace them with close relatives and see what happens (6). Again, the tree looks different because of the new taxa and the absence of the familiar ones, but the fundamental relationships between D, G, A and E are the same. In fact I have simply swapped the names around, but even if the shape of the tree changes a bit (i.e. there *are* some differences in the relationships between some of these new taxa, say between two different studies), as here (7), the relationships between these four are the same. There is little difference in the overall structure of the tree, even if a few minor relationships are slightly different.
Another trap to avoid is confusing when things appeared in the fossil record with where they sit in the tree, or even where they appear to sit (8). So far all the trees have been artificial constructs that put the branches in a nice order so it’s easy to read the names and spot the patterns (and this is the most common type), but we can also make the branch *length* relate to time (or, less often, to a measure of difference such as the amount of character change). Now the point at which each branch terminates marks where the taxon first appears in the fossil record, with the left being the oldest and the right the youngest. You can see that the pattern of branch lengths roughly matches that of branch splitting: in other words, the most basal taxa (at the bottom of the tree) are also the oldest (closest to the left). However, since the fossil record is not perfect, we also see exceptions: both G and F appear much later than we might expect, and D is a little out. This is quite normal and nothing to worry about (unless the mismatch is extreme, and even then there can be a good reason for it). Note that, again, F has not moved: it is still the sister taxon to E, and they in turn are the sister clade to A-D, but we can now see that F split off from E some time ago and its lineage lasted a long time.
This leads to another point: the taxa themselves are real data points (i.e. fossils or living taxa), while the lines between them are simply inferred from the data (a reconstruction of hypothetical relationships based on how the taxa appear). But there were real animals along those branches (F did not magically appear in the fossil record; it had ancestors); we have simply not found them, or cannot recognise them. If we put V from tree 6 into the equation, things don’t look so awkward any more (9); again, it’s at least in part a function of which taxa we include. Now F is simply a late-surviving species of a lineage that split off from V, a lineage which itself split off from E some time ago.
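The ‘missing’ stretch of a lineage on a time-scaled tree can be estimated with a little arithmetic: two sister lineages must have split no later than the first appearance of the older of the two, so the younger one implies a span with no fossils (palaeontologists often call this a ghost lineage). A minimal sketch, assuming nested tuples as before and entirely made-up first-appearance ages in millions of years; V’s position as F’s sister follows the description above, and everything else, including the helper names, is illustrative.

```python
# Made-up first-appearance ages in millions of years ago (larger = older).
# These numbers are purely illustrative, not data from any real taxa.
FIRST_SEEN = {"A": 80, "B": 85, "C": 100, "D": 110, "E": 150, "F": 70, "V": 120}

WITHOUT_V = (((("A", "B"), "C"), "D"), ("E", "F"))
WITH_V = (((("A", "B"), "C"), "D"), ("E", ("V", "F")))


def min_node_age(node, first_seen):
    """A split can be no younger than the oldest fossil descended from it."""
    if isinstance(node, str):
        return first_seen[node]
    return max(min_node_age(child, first_seen) for child in node)


def tip_gaps(node, first_seen, gaps=None):
    """For each taxon, the minimum time (Myr) between the split it branches
    from and its first fossil: the part of its tip branch with no record."""
    if gaps is None:
        gaps = {}
    if isinstance(node, str):
        return gaps
    split_age = min_node_age(node, first_seen)
    for child in node:
        if isinstance(child, str):
            gaps[child] = split_age - first_seen[child]
        else:
            tip_gaps(child, first_seen, gaps)
    return gaps


print(tip_gaps(WITHOUT_V, FIRST_SEEN)["F"])  # 80: F's lineage reaches back to the E-F split
print(tip_gaps(WITH_V, FIRST_SEEN)["F"])     # 50: V fills in part of that gap
```

These are minimum figures: the real split could be older still, and the sketch only measures tip branches, not unsampled stretches on internal branches.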
Finally, this issue of branch lengths and temporal displacement explains why systematists use the words ‘basal’ and ‘derived’ rather than ‘primitive’ and ‘advanced’. The former pair refer only to where a taxon branches off the tree, whereas the latter imply something about the evolutionary status of the taxon. Sharks are basal jawed vertebrates since they branched off early in the history of that clade, but it is not correct to call modern sharks primitive. True, they have some features that could be considered primitive (such as the lack of a swim bladder), but can you really consider a lineage that has been around for hundreds of millions of years ‘primitive’ as a whole?
Well, that is about it for now. There are probably a couple more things I have missed, but if nothing else this should go a long way towards covering the basics, and frankly I’m bored of drawing trees in PowerPoint. As I mentioned above, there are posts coming on cladistics, ancestors in palaeontology and a few more besides, but you might have to wait a while. In the meantime, you can start looking for trees, comparing them and evaluating them with your new-found tree-reading skills. And who says science is dull, eh? It can’t be when ‘tree reading’ is available to all.