Today it is the turn of Graeme Lloyd to entertain us with tails of dinosaurs. You might know him from his blog This Life’s A Fiction, or as the lead author on an old paper of mine on dinosaur supertrees. Of course he should be best known for our most important contribution as PhD students where we instigated the arrival of Dr Leonard P. Annectens to the University of Bristol, Geology Department. Here Graeme talks about his most recent work revisiting issues of dinosaur diversity and their potential decline before the famous KT extinction.
Despite my publication record being largely on fossil vertebrates I am essentially a group-less palaeobiologist. (In my current post I work on planktic forams and coccolithophores!) Instead, my primary interests are more methodological and usually involve large databases compiled from the primary literature. A good example is the dinosaur supertree project that Dave and I were a part of. However, my current post is non-phylogenetic so I have been getting into some more standard richness metrics that led me to write the paper for which this blog post was commissioned.
But first, some history. Palaeontologists have been seriously interested in changes in richness (or taxonomic diversity) through geological time since the 1960s. However, the earliest analyses were pretty rudimentary, using geological periods as time bins and orders or families instead of genera or species as the taxonomic unit of interest. Debate largely raged over whether there was an exponential rise in richness towards the present or whether this was some artefact, either a “Pull of the Recent” effect, whereby taxonomic ranges of living taxa are extended far back into the fossil record, or a dramatic increase in the amount of rock available for study.
Unfortunately much of this debate was more arm-waving than quantitative – what was really required were new data and new methods. This fact was best identified in a hugely influential paper by Dave Raup. He brought new data (a rock record curve) to bear on the issue – finding that the amount of rock closely matched the number of taxa in each time bin (his Figure 1). He then went on to suggest two ways in which we can help correct for this (seemingly obvious) sampling bias. The first of these is the use of subsampling methods. However, these required new data on individual occurrences of fossils, and not the simple firsts and lasts that had been recorded up to that point. This dream wouldn’t be realised until sometime later with the creation of the Paleobiology Database. However, the second, which was using a model of the available rock to “correct” richness curves, has taken even longer.
The first serious attempt to use modelling as a means of countering potential sampling biases was by Andrew Smith and Al McGowan in their 2007 paper on the rock record of western Europe. They came up with an idea that goes something like this: if observed richness is purely a product of sampling then, i) true richness can be considered to be level (essentially the same null hypothesis of subsampling) and, ii) the bin with the highest richness corresponds to the bin with the largest sample size (or amount of rock), the second highest with the second largest and so on. This latter observation is the key to their method – if we then line up our rock record and diversity measures against each other from smallest to largest we can fit a simple linear model to them. This model can be expressed in the form of an equation where we input our actual rock record measure and it outputs a predicted richness value.
In reality, of course, sampling and richness are not perfectly in rank order and thus sometimes our prediction is going to be lower or higher than what we observe. Thus if we subtract our predicted values from our observed values we have a new set of richness values that are either negative (lower than predicted) or positive (higher than predicted). The theory goes that these values are excursions in richness that cannot be explained by our sampling proxy (rock record measure) and thus potentially represent the true underlying biological signal.
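For anyone who prefers code to prose, the procedure described above can be sketched roughly as follows. This is only an illustration in plain Python, not the code used in any of the papers mentioned: the function names and the five made-up time bins are my own assumptions, and the model is an ordinary least-squares straight line.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sampling_residuals(rock, richness):
    """Return (predicted, residuals) per time bin, in the original bin order."""
    # Sort each series independently so the largest rock value is paired
    # with the largest richness value, the second largest with the second
    # largest, and so on. The line is fitted to these rank-ordered pairs.
    intercept, slope = fit_line(sorted(rock), sorted(richness))
    # Feed the actual (unsorted) rock measure into the fitted equation.
    predicted = [intercept + slope * r for r in rock]
    # Observed minus predicted: positive means richer than sampling predicts.
    residuals = [obs - pred for obs, pred in zip(richness, predicted)]
    return predicted, residuals


# Hypothetical values for five time bins (illustrative only, not real data).
rock = [10.0, 30.0, 20.0, 50.0, 40.0]
richness = [4.0, 9.0, 12.0, 15.0, 10.0]
predicted, residuals = sampling_residuals(rock, richness)
print(residuals)
```

In this toy example the third bin has more taxa than its amount of rock would predict, so it comes out with a positive residual, while the bins that fall below the fitted line come out negative.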
There are, however, a couple of problems here. First of all there is no actual significance test. Models have error and values within that error are still potentially explained by the model – not all of our excursions are necessarily significant. The second is that the data isn’t ideal for this analysis. Specifically Smith and McGowan used Sepkoski’s diversity data which is a range-through curve (based on firsts and lasts only). The problem here is that it doesn’t tell us how many taxa were actually sampled in that bin, thus potentially violating a key assumption of the model (that the bin with the most taxa is also the bin with the most rock). In practice we should only ever use a sampled-in-bin curve with this approach.
Smith and McGowan’s method has gained popularity in its short life so far, particularly amongst vertebrate workers. This is largely down to the fact that subsampling has never really been a viable option for the smaller sample sizes vertebrate workers have to deal with. Indeed, when we rarefied our dinosaur supertree data we found a flat line. (See the red line in the middle figure here.) Consequently, it is unsurprising that the first attempt at a significance test for the Smith and McGowan method should come from a vertebrate paper. Paul Barrett and colleagues (including Al McGowan) decided to use the standard deviation of the residuals once the predicted values had been subtracted from the observed values in their paper on dinosaur diversity. Unfortunately this approach has the consequence that roughly the same limited number of points will appear significant, regardless of how well or how poorly the model fits.
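The weakness of a threshold based on the residuals' own standard deviation can be seen in a small sketch. Everything here is assumed for illustration: the function name, the made-up residuals, and in particular the 1.96 multiplier, which is my choice and not something stated above.

```python
import statistics


def flag_excursions(residuals, z=1.96):
    """Indices of bins whose residual exceeds z standard deviations,
    where the standard deviation is taken from the residuals themselves."""
    sd = statistics.stdev(residuals)
    return [i for i, r in enumerate(residuals) if abs(r) > z * sd]


# Hypothetical residuals from a model that fits fairly well...
tight = [-1.0, -1.0, 4.5, 0.0, -2.5, 0.5, -0.5, 1.0, -1.0, 0.0]
# ...and the same pattern with every misfit ten times larger.
loose = [10 * r for r in tight]

print(flag_excursions(tight))
print(flag_excursions(loose))
```

Because the threshold stretches along with the residuals, the same bin is flagged in both cases, even though the second model fits ten times worse.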
It’s at this point that I enter the fray. Early on in my current project I had been using the Smith and McGowan method to try and subtract rock record signal from my diversity curves of coccolithophores and planktic forams. However, I wasn’t satisfied with how it worked as the relationship between richness and sampling wasn’t a linear one for my data – my residuals kept describing a curve. My first change was thus to fit a bunch of different models that fitted straight, curved or even wavy lines. But which is best? I solved this by using the Akaike Information Criterion. This is a neat – and easy to calculate – measure that weighs the likelihood of each model based on two optimality criteria. First is its fit to the data (a good fit being better than a bad fit). Second is its complexity (a simple model is better than a complex model). The model with the lowest AIC was selected and its equation used to give the predicted richness values.
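As a rough sketch of that kind of model comparison, the code below fits polynomials of degree one to three by least squares and scores each with the least-squares form of the AIC, n ln(RSS/n) + 2k. The polynomials are only a stand-in for the straight, curved and wavy models mentioned above, the data are invented, and all function names are hypothetical.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_polynomial(x, y, degree):
    """Least-squares polynomial coefficients, lowest power first."""
    size = degree + 1
    a = [[sum(xi ** (i + j) for xi in x) for j in range(size)] for i in range(size)]
    b = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(size)]
    return solve(a, b)


def aic(x, y, coeffs):
    """Least-squares AIC: rewards good fit, penalises extra parameters."""
    n = len(x)
    rss = sum((yi - sum(c * xi ** p for p, c in enumerate(coeffs))) ** 2
              for xi, yi in zip(x, y))
    k = len(coeffs) + 1  # model coefficients plus the error variance
    return n * math.log(rss / n) + 2 * k


# Hypothetical rank-ordered data in which richness levels off as rock increases.
rock = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
richness = [2.0, 5.1, 6.9, 8.2, 8.8, 9.5, 9.7, 10.1]

scores = {d: aic(rock, richness, fit_polynomial(rock, richness, d)) for d in (1, 2, 3)}
best = min(scores, key=scores.get)  # lowest AIC wins
print(scores, best)
```

With data that flatten out like this, the straight line gets a clearly worse (higher) score than the curved alternatives, which is the situation described above.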
The second problem, that of the significance test, was harder to solve. Getting error bars from complex models and translating them back into the predicted values was tricky. In the end I opted for the simple solution of again using standard deviations (as Barrett et al. had), but this time these are calculated at the model-fitting stage and not after the predicted values were subtracted. This worked much better and now it was clear with the Barrett et al. data that sauropodomorphs really were a poor fit to their model, with multiple time bins showing significant excursions instead of none (the dash-dot line is the confidence interval of the model):
There was one final change to make. Andrew Smith asked me if we could do something like the hinge regression line he and Al had done in their paper. (In their residuals they could see three major rises and falls.) However, these had been identified by eye alone rather than being an objective quantitative result – anathema to me! My challenge was to find a method that could do this (I was sure one would exist as it seems like the kind of thing many people would use). After much searching I came across just the thing: Multivariate Adaptive Regression Splines, or MARS for short. Having found an R library with this function I wrote a bit of extra code to apply it to my specific data. The result for sauropodomorphs is below:
It’s pretty clear from this that the overarching trend in the latter part of sauropodomorph history is a declining one and I found the same pattern for ornithischians. Theropods were different, however, as a linear model was deemed more likely than a MARS model. This is because their richness shows more short-term fluctuations rather than a longer-term trend, but it seems interesting to me that they are different, being the only surviving clade of the three. However, I stop short of suggesting what may cause this.
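For readers who have not met MARS before, its fitted line is built from mirrored "hinge" terms of the form max(0, x - k) and max(0, k - x), where k is a knot at which the slope is allowed to change. The sketch below is not MARS itself and is not the R code used for the paper: it fits just one hinge pair, tries every interior time point as the knot, and keeps the knot with the smallest residual sum of squares. Real MARS adds and prunes many such terms automatically. The data and function names are invented for illustration.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_hinge(x, y, knot):
    """Least-squares fit of y = c0 + c1*max(0, x - knot) + c2*max(0, knot - x)."""
    cols = [
        [1.0] * len(x),
        [max(0.0, xi - knot) for xi in x],
        [max(0.0, knot - xi) for xi in x],
    ]
    # Normal equations for the three basis columns.
    a = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    b = [sum(p * yi for p, yi in zip(ci, y)) for ci in cols]
    coeffs = solve(a, b)
    fitted = [sum(c * col[i] for c, col in zip(coeffs, cols)) for i in range(len(x))]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return coeffs, rss


def best_single_hinge(x, y):
    """Try each interior x value as the knot; return (knot, coeffs, rss)."""
    candidates = sorted(set(x))[1:-1]
    return min(((k,) + fit_hinge(x, y, k) for k in candidates), key=lambda r: r[2])


# Hypothetical series: a rise over the first four steps, then a steady decline.
time = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
values = [1.0, 3.0, 5.0, 7.0, 9.0, 8.0, 7.0, 6.0, 5.0]

knot, coeffs, rss = best_single_hinge(time, values)
print(knot, coeffs, rss)
```

The point of doing it this way is that the position of the turning point comes out of the fit rather than being picked by eye.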
I should finish by pointing out that I have written a little tutorial for anybody wanting to use this method with their own data here.
Lloyd, G. T., in press. A refined modelling approach to assess the influence of sampling on palaeobiodiversity curves: new support for declining Cretaceous dinosaur richness. Biology Letters.
“…I found the same pattern for ornithischians.”
But not until the late Maastrichtian. Also significant decline may be limited to America, if Godefroit is right about Amur localities being coeval with the Hell Creek. I have doubts about that, though.
Actually if you look at the last spline of the MARS model the ornithischian decline starts in the Late Jurassic. Of course it would be nicer to do this with more up-to-date data and use several different rock record proxies to see how these patterns hold up.
“…the ornithischian decline starts in the Late Jurassic.”
Because of the apparent absence of stegosaurs after early Tithonian in America and Europe? And just one genus in the EK of Asia, compared with about three around Callovian?
It’s not because of any single time bin. It’s a trendline.
Smith and McGowan’s work certainly was path-breaking, but we shouldn’t discount the work Shanan Peters has done. It is a very different approach, but his 2005 paper predates that of Smith & McGowan by two years, and is trying to address some of the same questions.
Agreed. That’s why I put in the “seemingly obvious” caveat. Shanan’s stuff is in my opinion a level ahead of this. Unfortunately though, for the moment we only have macrostratigraphic data for North America.