My colleagues Mike Taylor and Andy Farke, among others, have done an admirable job of promoting the concept of open access in palaeontology, both for data and for the actual research papers that academics produce. However, while this is on the whole a very good thing, it has, I believe (in conjunction with other phenomena), produced problems for the frontline scientists whom it is supposed to help.
While what I am about to write may be seen as a complaint, it should not be – it is an observation. It is for me currently problematic, but that does not mean that I do not support open access (I do) or that this is a huge issue (it isn’t) or that on balance open access is a bad thing (not true either). With change come problems, some foreseen and others not, and most if not all are ultimately overcome or sidestepped to the general satisfaction of most, so this is not something I expect to be a long-term issue. Here I simply want to illustrate a couple of problems that I have not seen commented on or discussed before. So with this in mind, what’s the problem?
The issue is one of the critical mass of literature that is now landing on the desks (or these days, hard-drives) of palaeontologists. Now I admit that since I dabble in quite a range of areas (theropods, sauropods, pterosaurs, body size, flight, evolution, ecology and so on) I’m probably at the wrong end of this, and if I just stuck to reading, say, only the pterosaur and cladistics literature to keep up with systematics and new pterosaurs, it’d be fine. However, I don’t, and in any case there are still subjects just as ‘narrow’ as pterosaur phylogenies that produce far more papers (we’ve quite possibly had more tyrannosaurs alone this year than pterosaurs, certainly more phylogenies) so it’s not necessarily a fair comparison.
When I was finishing up my BSc, which was only just 10 years ago, I would go along to the library every week or two and flick through the issues of a couple of dozen journals that the library had that were relevant to my interests and the courses I had taken. It took a few minutes (and in the Biology department at Bristol where I was based we had a bigger selection than many others I knew of in other universities). If I was starting to read research on a new field then I could get to a pretty advanced (for its time) online catalogue system and search the catalogues of both my own university and those of a number of others in the area and try to identify specific papers (from the title and keywords alone, and of papers only going back to around 1980) that might be worth a look, and then try to get them via an expensive and not especially fast loan.
In other words, if I wanted, say, to see what literature was out there on lion behaviour I might be lucky and find a few papers in the library and then be able to loan out a couple more that may or may not have been useful. It could take days to get them all and I might be left with half a dozen papers. It would not be great, but it might well be sufficient for the purpose intended. If I wanted more than this, then I could read what I had and start trying to track down the records of those papers cited in the ones I had that looked important. I could also trace some keywords and titles in vast volumes that listed them and find new papers from there. Even an expert with a big collection of papers would probably struggle to have an especially complete collection on so specific a subject, especially rare and historical ones, or those published outside of western Europe and North America.
Now I can go straight to Google Scholar and access thousands, if not tens of thousands, of articles of interest almost instantly. Sure, you have to hunt through those you don’t want or need and download a bunch of rubbish or find critical papers are not accessible, but the time saved (all that photocopying!) and range of material is enormous. Even things you can’t access at once are easy enough to ask for. You can track down almost any researcher in minutes and e-mail them asking for their papers when you might have had to send a transatlantic letter to a researcher (you hoped was not in the field and had not moved) and wait for a response. I’d guess that from scratch I could get 100 papers on tyrannosaurs today if I tried, and a fair few more if I e-mailed around and asked, when it might have taken me weeks or months before and cost a fair bit of money in loans and paper to get 20. I can also get historical manuscripts, whole books, extra notes and commentary, translated articles from non-English journals, whole masters and doctoral theses and more. Tons of foreign language papers (and even English language ones) I never knew even existed 10 years ago are now available online and with them, thousands of new articles. All of this is, of course, good. But it’s hard to look at this wealth of information and not see a few problems with it.
First of all, for the uninitiated or inexperienced, it’s far harder to find the really good stuff. It takes time to work out which are good papers and which are not, good and bad journals and good and bad authors. If you only had access to 20 papers, it was easy enough to skim them and pick the half dozen that should form the basis of your next generation of reading and research – when you start with 200 that instantly becomes much, much harder. Similarly it’s easy to get blinded into thinking that what you have is enough – you might be able to download and read 100 papers, but without the half dozen key ones that really stand as vital in the field, all that superficial stuff may just not be good enough. Now of course this need not be too bad – the experienced researcher knows a good paper when he sees one, even for a field he has never looked at before, and journals like Nature and TREE have always been great reads, and you should soon spot an obvious gap in your collection if people keep referring to a paper you don’t have. However, I do think this is still an issue for some, whereby quantity is confused with quality or quality can be masked by quantity.
Secondly and more importantly, this availability seems increasingly to be viewed as a necessity and not a bonus. I see authors trying to cram a reference to every damned paper on a subject into a manuscript in either some unnecessary one-upmanship contest with the rest of the world, or desperately trying to give themselves credibility for simply having read 25 papers on a subject. Perhaps there are other reasons, but I can’t figure them out. Worse, this is sometimes used as a stick by some referees and editors (and others) to beat authors with. You can cite four or five papers to support an argument and then get criticised for leaving out one 1964 Brazilian article on the subject, or accused of plagiarising an idea because you had not read / cited that 1964 Brazilian paper.
This is silly.
Yes, I well appreciate that as a researcher you have a duty to know the field in which you are writing. You must read and understand and take in a significant fraction of the most critical papers on a subject and to do otherwise is a disservice to your own work and that of others. But that does not mean you should try to track down every reference that has ever related to a field and read it and cite it. There has to be a balance. I simply cannot stop to read *every* paper on theropod morphology on the off chance that I have been pre-empted on a point I wish to make, or to note the two exceptions that occur, or document every time someone else has said it before. It could take literally months in places to read the literature on a subject that might refer to one otherwise minor point in a paper. No one would ever get any work done, or everyone would be forced to specialise so enormously that they had just one incredibly minor area to work on and could keep on top of the new papers.
In short, we need to strike some kind of balance, or perhaps rather come to accept a new status quo. There is an absolute shedload of data out there that is now accessible and did not use to be, and this is a very good thing indeed. However, it is beyond the practical means of researchers to be expected to read every single paper that has ever been published in their field, or even to read everything published each year (depending of course, on quite what your field is). Things will be missed and mistakes will be made, just as they always have been, but I think the sheer volume of material now available and increased communication between researchers makes it appear much worse than it is. It used to be seen as a faux pas if you missed a 1970s Nature paper on something; now it’s seen as problematic if you miss a foreign language paper published 70 years ago, and the two are not the same. Good researchers and good referees will, I’m sure, help make this situation easier, but in the meantime I suspect things will be testing for some.