Google just announced that they’re making a million+ public-domain books downloadable in EPUB format. This is an improvement over the old situation, where you could download PDFs (sans OCRed text) of those books or read them in plain text online (one physical page at a time), but not download a small, well-OCRed text copy.
I’d be delighted if they went all the way to true plain text downloads. (And then let me download all the public-domain stuff in bulk. And gave me a pony.) But this is a nice improvement. In other news, I’d also be delighted if my Kindle supported EPUB natively.
3 thoughts on “Google and EPUBs”
This is an improvement, for sure, but like you I’m impatient to be able to download full text. I’m not sure why this isn’t already possible on Google’s end. Also, are there any web scraping programs that you could feed the URL of the first page of a book’s plain text, and that would then step through the pages automatically, downloading each one? Is this a pipe dream or a real possibility? Hopefully Google will make it irrelevant by opening up full text downloads for the entire book.
I don’t understand why a plain text download is needed. EPUB is an open standard. It allows for formatting, a hot-linked table of contents, and reflow depending on the screen size. I think your wish should be to have the Kindle firmware updated to accept EPUBs. My guess is they can’t do that because it would mean Kindlers would be able to borrow from libraries, as opposed to buying everything from one e-tailer, which is not in keeping with their business model.
I’d like plain text for easy data mining, not for reading individual works. It’s not that it’s terribly hard to extract the text from an EPUB, just that it’s another step in my text-analysis workflow.
As for the Kindle, yes, I imagine Amazon’s reluctant to use an open standard for purely business reasons.
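For the curious, the extra extraction step mentioned above can be sketched with nothing but Python's standard library. This is a rough illustration, not a description of any particular tool: it assumes an unencrypted EPUB whose content documents are UTF-8 XHTML, and the names `epub_to_text` and `_TextCollector` are made up for the example. An EPUB is a ZIP archive; `META-INF/container.xml` points to a package file whose spine lists the content documents in reading order.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from urllib.parse import unquote

# XML namespaces used by the EPUB container file and package (OPF) file.
NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
}


class _TextCollector(HTMLParser):
    """Hypothetical helper: gathers visible text from one XHTML document."""

    SKIP = {"head", "script", "style"}  # contents are never reading text
    BLOCK = {"p", "div", "br", "li", "tr", "blockquote",
             "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif not self._skip_depth and tag in self.BLOCK:
            # Keep paragraph-level breaks so words do not run together.
            self.parts.append("\n")

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def epub_to_text(path):
    """Return the text of an EPUB's spine documents, in reading order."""
    with zipfile.ZipFile(path) as zf:
        # 1. The container file names the package (OPF) document.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        opf_path = container.find(".//c:rootfile", NS).attrib["full-path"]
        package = ET.fromstring(zf.read(opf_path))
        base = posixpath.dirname(opf_path)

        # 2. The manifest maps item ids to file paths (relative to the OPF).
        hrefs = {item.attrib["id"]: item.attrib["href"]
                 for item in package.findall("opf:manifest/opf:item", NS)}

        # 3. The spine gives the reading order as a list of manifest ids.
        chunks = []
        for ref in package.findall("opf:spine/opf:itemref", NS):
            href = unquote(hrefs[ref.attrib["idref"]])
            member = posixpath.normpath(posixpath.join(base, href))
            collector = _TextCollector()
            # Assumption: content documents are UTF-8 encoded.
            collector.feed(zf.read(member).decode("utf-8", errors="replace"))
            collector.close()
            chunks.append("".join(collector.parts).strip())
    return "\n\n".join(chunks)
```

Calling `epub_to_text("book.epub")` (with `book.epub` standing in for any downloaded file) returns one string that can be fed straight into a text-analysis pipeline. It is not hard, but it is the kind of per-file step that a plain text download would make unnecessary.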