Why do ebooks have so many errors?
If you buy and read ebooks, you will have noticed that, overall, they tend to have more errors than print books. In some cases, the level of errors goes beyond annoying and into aggravating or even (if the book was pricey) infuriating.
Errors in books come in two kinds: typos and formatting errors. Typos can include using the wrong word (e.g., their for there, your for you’re, etc.) or just misspelling a word, but misspelling is much less common because spell checkers generally catch it. You occasionally find typos in print books, but print books are usually proofed carefully, so mistakes are rare enough not to bother the reader. Formatting errors in traditionally published print books are almost unknown because composition staff are paid to lay out pages, so they check their work carefully, and then the proofer checks it, too.
That careful proofing step doesn’t always happen in ebooks, and you will sometimes see paragraphs that don’t indent properly or that run together, words that run together, and garbage text (letters with diacritical marks appearing as gibberish, words in the wrong place, etc.). A particularly annoying problem is unnecessary hyphenation: hyphens and spaces in the middle of single words, so that a word appears split in two in the middle of a line.
The source of typos is obvious. An editor and/or a proofreader missed the mistake (or in some self-published books, the author didn’t have an editor or proofreader!). The source of formatting errors in traditionally published ebooks can mostly be traced to one thing: workflow.
Many book publishers are still focused more on print than on digital. Their workflow is driven by the need to get a manuscript into printed pages quickly. Generally, the composition staff of the publishing house pulls the word processing file the author provided into page layout software such as Adobe InDesign, which is then used to lay out the pages and apply headers and footers; basically, to make the book look as it should in print. Sometimes, if a word is not hyphenating at the right place, they will manually insert a hyphen and a line break to force it to hyphenate where it should for the printed page (where the word appears at the end of the line). But of course, if the same file that was used for a print book is converted to produce the ebook, the excess hyphen and line break are still there, and the broken word may well be in the middle of the line, depending on the font size the reader selects. Extra spaces, non-breaking spaces, and other things that were added to make the pages look right can also cause problems. If no one proofs the output of the conversion, these things are in the ebook when it is downloaded.
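To make the problem concrete, here is a minimal Python sketch of the kind of cleanup pass a conversion step could run on text exported from a print layout. It is an illustration under stated assumptions, not any publisher's actual process: the function name clean_print_artifacts is hypothetical, and it assumes the forced breaks show up in the exported text as a hyphen followed by a newline.

```python
import re

SOFT_HYPHEN = "\u00ad"  # invisible "discretionary" hyphen character
NBSP = "\u00a0"         # non-breaking space

def clean_print_artifacts(text: str) -> str:
    """Remove common print-only tweaks from exported text (hypothetical helper)."""
    # Drop soft hyphens, which only matter for print line breaking.
    text = text.replace(SOFT_HYPHEN, "")
    # Rejoin words split by a forced hyphen plus line break: "conver-\nsion".
    text = re.sub(r"(?<=[a-z])-\n(?=[a-z])", "", text)
    # Turn non-breaking spaces into ordinary spaces.
    text = text.replace(NBSP, " ")
    # Collapse runs of spaces that were added to adjust the page.
    text = re.sub(r" {2,}", " ", text)
    return text

sample = "the conver-\nsion of\u00a0the  file"
print(clean_print_artifacts(sample))
```

A pass like this cannot tell a forced break from a genuinely hyphenated compound that happened to fall at the end of a line (well-known would come out as wellknown), which is one reason a human proof of the converted file is still needed.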
Alternatively, some publishing houses will take the word processing document the author submitted and convert that to ebook format. That file doesn’t have the page-based tweaks, but it also doesn’t have the last-minute corrections of typos and other edits that were made in the final product. The editors have to use their marked-up paper copy to correct it all over again, or let it go out as is. Interestingly, the same problem is faced by writers who traditionally publish a book, get the rights back after a few years, and then want to self-publish. The only digital format they have of their book is often the word processing file they sent their editor; the finished book is available to them only in hard copy. If they want to publish an ebook that matches what was in the print version, they have to go through the file and make all the same edits their publisher did, or scan the print pages (more on that in a bit).
One solution for publishers is to refine the typesetting-to-ebook conversion process so that it accounts for every single thing that the typesetter could do to the file, and also to proof and correct the ebook file. But of course, a complication is that there are two main ebook formats: epub and Kindle (which is based on Mobipocket). Publishers need to produce a file in both formats, although many simply provide epub to Amazon and rely on Amazon’s conversion to create the Kindle format.
Another solution is one that information providers implemented a decade ago: convert the manuscript from word processing into a structured format like SGML (Standard Generalized Markup Language) or XML (Extensible Markup Language). Instead of marking up the text for a typesetting system according to how it should look (e.g., 14-point Bodoni bold, 11-point Helvetica italic), SGML and XML require that text be marked up to identify what it is (e.g., chapter, chapter title, paragraph, etc.). Once you have data in this kind of neutral format, you can more reliably convert it to whatever output format is needed: print, web, or ebook. With special software, you can impose rules to make the data valid (e.g., you can require that every chapter has to have a chapter title). If you make the SGML or XML text the source for all outputs, you can proof and correct it once, and the correction carries through to every output format.
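As a rough illustration of the "every chapter has to have a chapter title" rule, here is a small Python sketch using the standard library's XML parser. The element names (book, chapter, title, para) and the function chapters_missing_titles are made up for this example; a real workflow would more likely express the rule in a DTD or schema and check it with validating software.

```python
import xml.etree.ElementTree as ET

# Hypothetical structural markup: the tags say what the text is,
# not how it should look.
BOOK_XML = """<book>
  <chapter><title>One</title><para>Text.</para></chapter>
  <chapter><para>No title here.</para></chapter>
</book>"""

def chapters_missing_titles(xml_text: str) -> list[int]:
    """Return the 1-based positions of chapters with no usable title."""
    root = ET.fromstring(xml_text)
    missing = []
    for index, chapter in enumerate(root.findall("chapter"), start=1):
        title = chapter.find("title")
        # A missing or empty <title> element breaks the rule.
        if title is None or not (title.text or "").strip():
            missing.append(index)
    return missing

print(chapters_missing_titles(BOOK_XML))
```

Because the check runs on the single source file, a missing title is caught once, before the print, web, and ebook outputs are generated.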
Of course, that option isn’t practical for self-published authors. They usually either rely on tools like Calibre or Scrivener to convert their word processing document, or they pay a conversion house such as 52 Novels to do a custom conversion. Custom conversions cost more, but usually produce a better-looking ebook. Even with that, authors still need to proof every word, as word processing files can have hidden characters that don’t show up until you convert the file.
In the case of backlist books, many of which were previously out of print, publishers have opted for scanning of print pages, which has resulted in some truly terrible errors in ebooks (wrong characters, extraneous text, etc.). Publishers (and authors) who rely on OCR (optical character recognition) scanning need to realize how far it is from perfect and include a rigorous proof-and-correct stage in their workflow. Meanwhile, the free sample feature on most ebook platforms is very useful for spotting books with these kinds of errors. If you're buying an old book, even if it's one you've read before and know you want to read again, I recommend you download the free sample just to check out the formatting.
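One small piece of a proof-and-correct stage could look something like the Python sketch below, which flags tokens that mix letters and digits (a typical OCR slip, such as a 1 read in place of an i) so a person can review them. The function name flag_suspect_tokens and the choice of pattern are assumptions for illustration only; this catches just one class of error and is no substitute for a full proofread.

```python
import re

# Tokens containing at least one letter and at least one digit, e.g. "t1mes".
MIXED = re.compile(
    r"\b(?=[A-Za-z0-9]*[A-Za-z])(?=[A-Za-z0-9]*[0-9])[A-Za-z0-9]+\b"
)

def flag_suspect_tokens(text: str) -> list[tuple[int, str]]:
    """Return (line number, token) pairs for a human proofer to check."""
    flagged = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in MIXED.finditer(line):
            flagged.append((line_no, match.group()))
    return flagged

scanned = "It was the best of t1mes,\nit was 1859."
print(flag_suspect_tokens(scanned))
```

Legitimate tokens such as 2nd would be flagged as well, so the output is a list of places to look, not a list of confirmed errors.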
Right now publishers are scrambling to adjust to the digital world. They haven't yet figured out that they can take advantage of the fact that ebooks can be corrected much more easily than print books. It's getting better, but ebook workflow still has a way to go. Books are now data as well as art, and publishers and authors need to know that and act accordingly.
~ Carmen Webster Buxton
About today's Guest Post Author:
Carmen Webster Buxton was born in Honolulu, and experienced a childhood on the move, as her father was in the US Navy. She has been a librarian, a teacher, a project manager, a wife, and a mother, although not in that order. She now lives in Maryland with her husband, her daughter, and a cat with the unlikely name of Carbomb. She writes science fiction, mostly set in the far future, and the occasional fantasy, and has five ebooks for sale on all major platforms, with two more to follow soon. She often blogs about ebooks and ereaders as well as her own books.