Using Archive.org to Research Your Novel

The Internet Archive has to be one of the most valuable resources for any historical novelist. It holds everything from archived versions of web pages to audio and visual material, including films. But for me the most valuable resource is its collection of scans of old, out-of-copyright books. In particular, the number of 19th-century printed editions of historical documents such as chronicles, registers and parliamentary documents is simply staggering. Many of these books were scanned from the collections of various libraries, in particular those of large US universities, so if you want material from non-English-speaking countries other resources might be better. Nor has every book you might come across been scanned. For my research on the Pontvallain campaign I found other repositories useful as well, but by far the largest source has been the Internet Archive.

You might say, “Hasn’t Google Books scanned a lot of out-of-copyright books?” Yes, you’d be right, as has Microsoft. But often the best place to find these scans is the Internet Archive: for whatever reason Google Books often doesn’t display the full version of these scans, and the Internet Archive is easier to use.

So how do you get started?

I am assuming that you already have your bibliography together. If you still need to find titles, a better starting point is probably a general history of the period with good footnotes and a bibliography of primary sources.

For this example I am going to search for and download the Issue Roll of Thomas de Brantingham, bishop of Exeter, Lord High Treasurer of England…, A.D. 1370, ed. F. Devon (1835).

1. Searching

I would suggest you search by the title of the document rather than the author’s name. The search box is pretty straightforward, but if you search for the editor, Devon in this case, you get the following:

[Screenshot: search results for “devon”]

But using the title you get:

[Screenshot: search results for the title]
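If you would rather run the same title search from a script, here is a minimal sketch. It assumes the Archive's advanced search endpoint (advancedsearch.php) and its title, mediatype and downloads fields; the function name and the choice of fields to return are mine, for illustration only. It only builds the URL, so you can paste the result into a browser to see what comes back.

```python
from urllib.parse import urlencode


def build_title_search_url(title, rows=20):
    """Build an advanced-search URL that looks for scanned texts by title.

    Assumption: the query uses the title: and mediatype: fields and asks
    for JSON output. The function name is hypothetical.
    """
    params = [
        ("q", f'title:("{title}") AND mediatype:(texts)'),
        ("fl[]", "identifier"),   # the item's unique ID
        ("fl[]", "title"),
        ("fl[]", "downloads"),    # handy for comparing copies later
        ("rows", str(rows)),
        ("output", "json"),
    ]
    return "https://archive.org/advancedsearch.php?" + urlencode(params)


url = build_title_search_url("Issue Roll of Thomas de Brantingham")
print(url)
```

Searching on part of the title keeps the query short; you can always narrow it down if too many results come back.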

2. Which Title

There are likely to be different copies of each text, presumably because scans have been provided by different libraries. I would generally choose the one with the most downloads, as it’s likely that other users have found it to be in the best condition. Some scans can be messed up, with blurred images or bent pages!
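The "most downloads" rule of thumb is easy to sketch in code. The example below assumes each search result is a dictionary with an identifier and an optional downloads count; the identifiers and numbers are made up for illustration and are not real Archive items.

```python
def pick_most_downloaded(docs):
    """Return the search result with the highest download count."""
    # A copy with no "downloads" field is treated as having zero downloads
    return max(docs, key=lambda doc: int(doc.get("downloads", 0)))


# Made-up results for illustration only
sample_docs = [
    {"identifier": "example-copy-a", "downloads": 120},
    {"identifier": "example-copy-b", "downloads": 2350},
    {"identifier": "example-copy-c"},
]

best = pick_most_downloaded(sample_docs)
print(best["identifier"])
```

A high download count is only a hint, so it is still worth flicking through the scan before relying on it.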

3. Book summary page

This is where you see the metadata for each book. Key things you might want to check are the publication date, copyright information and language. On the left you will see a list of file types. Ignore this list! Go straight to the link for All Files. If you click straight through to PDF here, for example, you might be redirected to Google Books and find that you can’t get the PDF, even though it is available via All Files.

[Screenshot: book summary page]

4. All Files list

I would always select the file ending in .pdf, as this will be the best version. Sometimes you will have the option to choose colour or black and white. The colour version looks pretty but takes longer to download.

[Screenshot: All Files list]
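Picking out the PDFs from a file list can be sketched as follows. It assumes a list of file entries, each with a name and a size in bytes, similar to what the Archive's metadata endpoint (https://archive.org/metadata/ followed by the item identifier) returns, and it assumes direct links take the form https://archive.org/download/ followed by the identifier and file name. The identifier and file names below are invented for illustration.

```python
from urllib.parse import quote


def pdf_download_urls(identifier, files):
    """Return (url, size_in_bytes) for each PDF, smallest file first."""
    pdfs = [f for f in files if f.get("name", "").lower().endswith(".pdf")]
    # Sizes may arrive as strings, so convert before sorting
    pdfs.sort(key=lambda f: int(f.get("size", 0)))
    return [
        (
            f"https://archive.org/download/{identifier}/{quote(f['name'])}",
            int(f.get("size", 0)),
        )
        for f in pdfs
    ]


# Invented file list for illustration only
sample_files = [
    {"name": "examplebook.pdf", "size": "52428800"},
    {"name": "examplebook_bw.pdf", "size": "20971520"},
    {"name": "examplebook_djvu.txt", "size": "1048576"},
]

for url, size in pdf_download_urls("examplebook", sample_files):
    print(f"{size / 1_048_576:.0f} MB  {url}")
```

Listing the smallest PDF first makes the trade-off between the colour and black-and-white versions easy to see at a glance.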

5. Download!

Be warned, this can take some time. Each PDF might well be 50 MB or more in size, so be patient.
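Because the files are large, a scripted download is best done in chunks rather than by reading the whole PDF into memory. This is a minimal sketch using only the Python standard library; the function name, the chunk size and the example URL in the comment are my own assumptions, not anything the Archive prescribes.

```python
import shutil
import urllib.request


def download_file(url, dest_path, chunk_size=1024 * 1024):
    """Stream the file at url to dest_path, one chunk at a time."""
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        # copyfileobj reads and writes chunk_size bytes at a time,
        # so a 50 MB PDF never has to fit in memory all at once
        shutil.copyfileobj(response, out, chunk_size)
    return dest_path


# Hypothetical usage (the identifier and file name are placeholders):
# download_file(
#     "https://archive.org/download/examplebook/examplebook.pdf",
#     "examplebook.pdf",
# )
```

Nothing is printed while it runs, so the advice about patience still applies.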


6. What about Kindle, ePub and text versions?

Unfortunately, because these are scans that produce images of the pages, the text derived from them is not particularly well rendered, so you may well get nonsense. Some of the text comes out fine, but some will be rubbish. See the example below:

[Screenshot: example of garbled text]

This is from the text file, but the same text is used to make the ePub and Kindle formats as well, so you will have the same problem.

PDF is the best option.

Check out the Archive

I hope this guide has been useful. The Internet Archive really is a great resource for any historical novelist or anyone with an interest in history and in particular primary sources.

