The best search engine for finding full text of public domain books online

Following yesterday’s post about Google Books’ failure to present links to the full online text of Le Vingtième Siècle, a futuristic novel by Albert Robida published in 1883, I was thinking about just how difficult it had been to find those links. Lots of clicking and human natural language processing were needed, which suggests that search engines still have a way to go before they can provide good results for such queries.

Eric Rumsey made a comment that got me thinking it might be interesting to compare different search engines to see which, if any, is best for finding online books. Since Le Vingtième Siècle was rather difficult to find, it seemed like a good test case.

To my knowledge there are two places where you can find the full text of this book online: the primary source at Gallica, which has image and plain text versions, and this page on Gloubik, which also links back to Gallica. The question is: will any of the Internet search engines find them?

Here are the results:

The search terms I used for all searches were: Le Vingtième Siècle Albert Robida

Test with Google Book Search:

Test with Google Web Search:

Test with Google.fr Web Search (since the work in question is in French):

Test with Bing:

Test with Yahoo.fr:

Test with ask.com:

Just for fun,

Test with DuckDuckGo:

Test with Hakia:

  • No links to full text in first 10 results.

Test with Yebol:

  • #8 Albert Robida (link to Gloubik page with pdf links and link to Gallica)
  • And this tweet from Mike Cane which links to Le Vingtième Siècle-La Vie Électrique (and which will probably disappear from the search results shortly as it is already 4 days old):

@doctorlaura BING!! –> Albert Robida : Le vingtième siècle – La vie électrique http://t.co/uVzmsBwFri Nov 05 13:41:18 via Tweet Button

Finally, here’s a special case, Evri. It’s special because while it did not return any results with a link to the book, it seems to be the only search engine in addition to Google Books that knew Le Vingtième Siècle is a book. In fact it showed me before I even finished typing in the query.

While Evri didn’t show me any full text links, and 9 out of the top 10 links were to Wikipedia pages that mention Le Vingtième Siècle, I think Evri has a lot of potential. I’ve been planning to do a post about it, but perhaps now someone else will do it so I won’t have to.

So what’s the bottom line? Of the ten web searches, two listed the one page that has links to all the available online PDFs: Yahoo.fr and Yebol. None of the searches presented any direct links within the first dozen results to what should be the most authoritative and reliable source, the Bibliothèque Nationale de France’s digital library site Gallica.
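For anyone who wants to repeat this kind of comparison, the tally could be automated roughly as follows. This is only a minimal sketch and not how the test above was run: it assumes you have already collected each engine's top result URLs by hand, the function names are made up for illustration, `gloubik.example` is a placeholder rather than Gloubik's real address, and the sample lists are invented.

```python
from urllib.parse import urlparse

# Hosts assumed to carry the full text. gallica.bnf.fr is Gallica's domain;
# "gloubik.example" is a placeholder, not the real Gloubik address.
FULL_TEXT_HOSTS = {"gallica.bnf.fr", "gloubik.example"}


def first_full_text_rank(result_urls, hosts=FULL_TEXT_HOSTS, top_n=10):
    """Return the 1-based rank of the first result on a full-text host, or None."""
    for rank, url in enumerate(result_urls[:top_n], start=1):
        host = (urlparse(url).hostname or "").lower()
        # Match the host itself or any subdomain of it.
        if host in hosts or any(host.endswith("." + h) for h in hosts):
            return rank
    return None


def compare_engines(results_by_engine, hosts=FULL_TEXT_HOSTS, top_n=10):
    """Map each engine name to the rank of its first full-text hit (or None)."""
    return {
        engine: first_full_text_rank(urls, hosts, top_n)
        for engine, urls in results_by_engine.items()
    }


# Invented result lists, for illustration only (not the results of the test above).
sample = {
    "engine_a": [
        "https://en.wikipedia.org/wiki/Albert_Robida",
        "https://gloubik.example/robida",
    ],
    "engine_b": ["https://en.wikipedia.org/wiki/Albert_Robida"],
}
print(compare_engines(sample))  # {'engine_a': 2, 'engine_b': None}
```

Note that a check like this only counts results hosted on a known full-text site. A page that merely links onward to the text would still need the human clicking described above.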

Of course, it’s not possible to draw any general conclusions from a single test, but if you’re looking for full text online, it might be interesting to try different search engines and compare the results. If you have done similar experiments, I’d love to hear about them.

This entry was posted in Digital Books, Google, Search Engines.

6 Responses to The best search engine for finding full text of public domain books online

  1. Mike Cane says:

    What a frikkin mess. What all of the search engines should have done was ask first, Do you mean the book? This cries out for semantic web and solid metadata and all of that.

    Also, did you try the Wikipedia entry on Robida? Sometimes those have the best direct links!

    • laura says:

      Exactly, and that’s what Evri does, except that it doesn’t have all the indexing and solid metadata. Yet.

      I did check the Wikipedia entries, but unless I overlooked something, I did not see any links to the online text.

  2. Eric Rumsey says:

    Very interesting, thank you!

    A few specific observations first:

    *** I searched in Google Books for a specific phrase in the book (p 6: “du bureau et les communications furent”) and GBS does find it in snippet view, apparently in a 1981 republication.

    *** In the Yebol search result that you cite — the #8 link does not make it easy to see that it includes a link to “Le Vingtième Siècle” — It seems likely that most people would miss it.

    *** For the Yahoo.fr search — I repeatedly get an error message for this:
    #6 [PDF] Albert Robida – Le Vingtième Siècle (pdf of part 3)

    General comments:

    Within the (admittedly provincial) American Google world, this book seems pretty obscure (is it not so in France?) — I hope you’ll repeat your tests with something more in Google’s usual scope.

    As I mentioned before I think, Gallica and Gloubik are relatively unknown to the world of Google and Wikipedia. So I hope you’ll write more about them.

    • laura says:

      Thanks for pointing out the error in the yahoo.fr search result link. I must’ve mixed up the links when I tested them, and I thought it was referring to a valid Gloubik page. I’ve corrected the post.

      Interesting too your search for a specific phrase in Google Books. I had tried the same thing, with something very specific to the book: I searched for “Colobry,” the last name of one of the characters. I’ve just repeated this search, and oddly it does not find the 1981 edition. It does list the English translation, but I can’t actually preview the text because the page isn’t available.

      Here’s a Google Translation of the #8 search result from Yebol. Some of it is fairly mangled, but the first sentence starts “The copy of the twentieth century put online by the National Library…”, which seems fairly clear, as do the links to “Full Text,” “Part I,” “Part II” and “third party,” each of which lists the file size. Even in French, it’s not easy to miss that.

      Finally, I do admit that this book is rather obscure (unfairly so!), but I think that makes it all the better as a test. In any case, it does make things more tractable as there are not too many versions online to be found. At the moment, unless something new comes up, I’m not planning to do any more experiments like this.

  3. Rachel says:

    I have found Buzzdock realtime search to be really helpful when I’m looking for text from books. Buzzdock has Amazon as one of its sites that it searches when it gathers results so it automatically picks up results from Amazon that might contain the keywords. It hasn’t failed me yet, so I really like it.

    • laura says:

      As far as I can tell, Buzzdock seems to be yet another browser plug-in that does something similar to Google Custom Search, letting you search across multiple sites, including Twitter, within a simple results window located under the search box on your results page. So, it’s not a search engine, but rather a search aggregator and results presentation engine.

      That being said, the subject of this article is finding full digital versions of public domain books online. For this purpose, the only two Buzzdock search applications that are likely of interest are Amazon and Evri, which is arguably doing something similar to Buzzdock, but adding value through semantic results processing. You are correct to point out Amazon as a potential source for public domain works, and it does provide search within a number of books. However, Amazon’s public domain collection is quite limited; Amazon is more interested in selling books than giving you a free one. In the majority of cases, and especially if the book is popular, Amazon does not provide a free version of public domain books that are also for sale in Kindle editions. So unless you are planning on buying a Kindle version of a public domain book, Amazon doesn’t seem to be a very likely place to look for it.

      In the present case, Amazon does not have a free version of Le Vingtième Siècle, although you can happily search within the for-sale English translation.
