A comment on my post about France’s new legislation regarding unavailable books from the 20th century left me considering one of the core issues surrounding the digitization of these works, and I began wondering if projected revenues from their licensing could go far towards reimbursing the digitization costs. Of course that depends on how high the cost is, and if the plan is to create an artificial demand by holding libraries as captive customers, then the question is moot.
It struck me as odd that the cost of digitizing all these books could be an obstacle for federally funded institutions in an age when a large percentage of private citizens have their own personal scanners and are known to make scanned books available online in versions of higher quality than those produced by mega-corporations with comparatively colossal resources, like Google. So I decided to see if I could find out how much the French National Library estimated it would cost to digitize these 500,000 books. I found the answer quickly via Google.
According to an article in Le Figaro, which may also refresh a few memories concerning the context in which the new unavailable books legislation was passed, the cost is estimated at €50 million. €50 million?!! That’s right; according to official figures, digitization will cost €100 per book.
I understand that digitizing books without destroying them is extremely time-consuming or requires special equipment (or both), but I was so surprised by the size of this request that I decided to check the going prices for equipment and services designed to digitize printed content.
I made a quick-and-dirty, unscientific survey and found a wide range of prices, from $1 to $100 per book (or more). There are literally hundreds of companies offering these services. The price depends mostly on the length of the book, the desired format and whether or not the scanning process is destructive. I used a 300-page book as a reference and checked prices corresponding to “quality” formats at eight different sites. For a detailed summary of the costs, considerations and trade-offs of digitizing for library projects, Digital History by Daniel J. Cohen and Roy Rosenzweig has a good description.
Now, I can understand that the National Library in France wants a high-quality copy, so $60 (about €45 at current exchange rates) seems reasonable to me. That’s also the cost quoted by Cohen and Rosenzweig, and it’s less than half the cost estimated by the French Minister of Culture.
So what’s happening here? Why does it cost over twice as much for the library to digitize a book as it does for an individual to pay someone to do it? Remember, those digitization services are making a profit off that €45 too, so the real cost depends on the margin. If they make a 15% profit, the cost is only about €38. One answer might be the quality of the digitization. According to Cohen and Rosenzweig, if OCR is desired for features like full-text search and 99.9% accuracy is needed, the digitization cost could go from 20 US cents a page to eight or ten times as much. Note that the library already has the metadata, so they shouldn’t have to create that, just link the digitized file to it when they’re done.
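For anyone who wants to check the arithmetic, here is a rough sketch in Python using only the figures quoted above. The exchange rate is an assumption implied by "$60 (about €45)", the margin is read as a share of the sale price, and the function names are mine, purely for illustration.

```python
# Back-of-the-envelope check of the per-book figures quoted in this post.
# Assumption: 1 USD = 0.75 EUR, as implied by "$60 (about EUR 45)".
USD_TO_EUR = 0.75

PAGES = 300                 # reference book used in the survey
USD_PER_PAGE = 0.20         # per-page price quoted from Cohen and Rosenzweig
OFFICIAL_EUR_PER_BOOK = 50_000_000 / 500_000  # official budget / number of books


def price_eur(pages, usd_per_page, multiplier=1.0):
    """Price of digitizing one book, converted to euros."""
    return pages * usd_per_page * multiplier * USD_TO_EUR


def cost_after_margin(price, margin):
    """Vendor's underlying cost, reading `margin` as a share of the sale price."""
    return price * (1 - margin)


base = price_eur(PAGES, USD_PER_PAGE)            # plain scan
vendor_cost = cost_after_margin(base, 0.15)      # with a 15% profit removed
ocr_low = price_eur(PAGES, USD_PER_PAGE, 8)      # high-accuracy OCR, 8x
ocr_high = price_eur(PAGES, USD_PER_PAGE, 10)    # high-accuracy OCR, 10x

print(f"official estimate per book: EUR {OFFICIAL_EUR_PER_BOOK:.2f}")
print(f"plain scan, retail price:   EUR {base:.2f}")
print(f"plain scan, vendor cost:    EUR {vendor_cost:.2f}")
print(f"99.9% OCR, price range:     EUR {ocr_low:.2f} to {ocr_high:.2f}")
```

Under these assumptions, the plain scan stays well under half of the official figure, while the eight-to-ten-fold OCR multiplier would push the price well above it. So the official €100 only makes sense if it is buying something much closer to the high-accuracy end.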
Now, I don’t know the formats being used by the French National Library, but if they’re not getting near 95% accurate OCR from those scans, either they’re being charged too much for digitization or someone is making off with over half of the funds. Either way, taxpayers are getting royally ripped off, because under the new law they’re not even guaranteed library access to 20th century scanned works in electronic form, even after 10 years.
Under those conditions, the idea of crowdfunding the digitization isn’t likely to go anywhere. It’s no wonder that only 105 books seem to have been adopted by sponsors so far. If not crowdfunding, then why not crowdsourcing? After all, lots of people have books and scanners, and I haven’t even mentioned high-quality, low-cost DIY Bookscanners. What’s keeping them from using the Internet to get together and share their books? Well, nothing. Take a look Inside Europe’s Largest Text Pirate Site. Publishers may call it piracy, but it sounds a lot like a citizen-created library to me.
According to Le Nouvel Observateur Rue89, the French National Library digitizes 50,000-60,000 books per year. That’s barely twice the number of French comic books that have been digitized by teams of organized pirates.
Yesterday, I saw a story in The Times that residents in London’s Friern Barnet started a People’s Library of the print variety when their library was shut down by the council on short notice.
On the one hand, it’s noble and wonderful to see individuals self-organizing toward the objective of sharing the sum of human knowledge, but on the other hand, it’s sad to see that governments and private interests have failed libraries, writers and citizens. People have more faith in each other than in the public institutions whose mission it is to fulfill this objective. They understand that they get wider, more reliable and more convenient access when they organize their resources themselves than when publicly funded institutions do it for them. After all, isn’t that what those institutions are for?
There’s another argument for the citizen’s library too: censorship (link in French), but that’s a subject for another post.