Electronic Books Lecture 6 Ing. Miloslav Nič Ph.D. summer semester 2010-2011 BI-XML Miloslav Nič, 2011 European Social Fund Prague & EU: We invest in your future
E-book Wikipedia: An electronic book (also e-book, ebook, digital book) is a text and image-based publication in digital form produced on, published by, and readable on computers or other digital devices.
E-book formats TXT HTML collection PDF Kindle (based on Mobipocket) EPUB... and many more based on similar principles
EPUB vs. PDF http://www.adobe.com/content/dam/adobe/en/devnet/digitalpublishing/pdfs/epub_datasheet.pdf PDF: a fixed page - the publisher in complete control over page layout and presentation EPUB: text reflow according to screen size
International Digital Publishing Forum (IDPF) http://idpf.org/ a global trade and standards organization develops and maintains the EPUB content publication standard
EPUB a distribution and interchange format standard for digital publications and documents latest stable version EPUB 2.0.1 EPUB 2 initially standardized in 2007 EPUB 3 in the process of being standardized (2011?)
Google and EPUB
Project Gutenberg
EPUB readers see e.g. http://www.jedisaber.com/ebooks/readers.asp Some examples: Bookworm Calibre FBReader Mobipocket Stanza...
EPUB and Kindle no direct support at the moment; several converters are available
Bookworm.oreilly.com
FBReader my favourite reader (both Linux and Android in my case; installers for other versions, e.g. Windows and Mac, also exist) http://www.fbreader.org/
EPUB Standards Open Publication Structure (OPS) book content in XHTML or DTBook Open Packaging Format (OPF) book structure and metadata Open Container Format (OCF) book file structure and compression to a single file
Open Publication Structure (OPS) XML files Namespaces: XHTML: http://www.w3.org/1999/xhtml DAISY: http://www.daisy.org/z3986/2005/dtbook/ OPS: http://www.idpf.org/2007/ops
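Since the two content vocabularies are told apart by namespace, a reader can decide how to handle a file by looking at the namespace of its root element. The following is a minimal sketch of that idea, not part of any EPUB toolkit; the function name `content_vocabulary` and the sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace URIs of the two OPS content vocabularies listed above.
VOCABULARIES = {
    "http://www.w3.org/1999/xhtml": "XHTML",
    "http://www.daisy.org/z3986/2005/dtbook/": "DTBook",
}

def content_vocabulary(xml_text):
    """Return 'XHTML', 'DTBook' or None, based on the root element's namespace."""
    root = ET.fromstring(xml_text)
    # ElementTree spells qualified names as '{namespace-uri}local-name'.
    if root.tag.startswith("{"):
        uri = root.tag[1:].split("}", 1)[0]
        return VOCABULARIES.get(uri)
    return None

# Hypothetical content document used only for this demonstration.
sample = ('<html xmlns="http://www.w3.org/1999/xhtml">'
          '<head><title>a book</title></head><body/></html>')
print(content_vocabulary(sample))
```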
XHTML XHTML 1.1; only some modules are included a selection of supported elements: html, head, title, body abbr, acronym, address, blockquote, br, cite, code, dfn, div, em, h1, h2, h3, h4, h5, h6, kbd, p, pre, q, samp, span, strong, var, dl, dt, dd, ol, ul, li, sub, sup a, img, caption, col, colgroup, table, tbody, td, tfoot, th, thead, tr
CSS a subset of CSS 2 supported must be supplied with the book (not via web) E-Book readers are very variable (screen size, graphic capabilities) CSS stylesheets very useful
Images @alt of <img> required core media types, support for which is required: image/gif image/jpeg image/png image/svg+xml
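The @alt requirement is easy to check mechanically. The sketch below lists the images in an XHTML content document that lack the attribute; the helper name `images_missing_alt` and the sample markup are assumptions made for the example, and an empty `alt=""` is treated as present.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

def images_missing_alt(xhtml_text):
    """Return the @src of every <img> that has no @alt attribute."""
    root = ET.fromstring(xhtml_text)
    return [img.get("src")
            for img in root.iter("{%s}img" % XHTML_NS)
            if img.get("alt") is None]

# Hypothetical page: the first image is fine, the second lacks @alt.
page = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>a book</title></head>
  <body>
    <p><img src="cover.png" alt="Cover"/></p>
    <p><img src="map.svg"/></p>
  </body>
</html>"""

print(images_missing_alt(page))
```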
DTBook (Digital Talking Book) an XML vocabulary defined in ANSI/NISO Z39.86-2005 Standard (http://www.niso.org/workrooms/daisy/z39-86-2005.html) recommended for more advanced applications (e.g. educational books) supports footnotes, sidebars, annotations, page numbers, etc.
DTBook features hierarchical navigation sequential reading with choices (e.g. skip footnotes) specific reading methods for different components (e.g. tables) time synchronization via SMIL
Navigation Control File (NCX) http://www.niso.org/workrooms/daisy/z39-86-2005.html#ncx exposes the hierarchical structure of a book
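To make the "hierarchical structure" concrete, here is a rough sketch that walks the nested `<navPoint>` elements of an NCX file and returns an indented outline. The element names come from the Z39.86-2005 NCX vocabulary; the sample file, its labels and the function name `outline` are invented for the example.

```python
import xml.etree.ElementTree as ET

NS = {"ncx": "http://www.daisy.org/z3986/2005/ncx/"}

# Hypothetical NCX with one nested section.
SAMPLE_NCX = """<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <head/>
  <docTitle><text>a book</text></docTitle>
  <navMap>
    <navPoint id="ch1" playOrder="1">
      <navLabel><text>Chapter 1</text></navLabel>
      <content src="ch1.xhtml"/>
      <navPoint id="ch1-1" playOrder="2">
        <navLabel><text>Section 1.1</text></navLabel>
        <content src="ch1.xhtml#s1"/>
      </navPoint>
    </navPoint>
    <navPoint id="ch2" playOrder="3">
      <navLabel><text>Chapter 2</text></navLabel>
      <content src="ch2.xhtml"/>
    </navPoint>
  </navMap>
</ncx>"""

def outline(ncx_text):
    """Return (depth, label, target) tuples in document order."""
    entries = []

    def walk(parent, depth):
        # findall with a plain child step matches direct children only,
        # so nested navPoints are reached through the recursion.
        for point in parent.findall("ncx:navPoint", NS):
            label = point.find("ncx:navLabel/ncx:text", NS).text
            target = point.find("ncx:content", NS).get("src")
            entries.append((depth, label, target))
            walk(point, depth + 1)

    walk(ET.fromstring(ncx_text).find("ncx:navMap", NS), 0)
    return entries

for depth, label, target in outline(SAMPLE_NCX):
    print("  " * depth + label, "->", target)
```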
Open Packaging Format (OPF) describes and references all components of the electronic publication (e.g. markup files, images, navigation structures) provides publication-level metadata specifies the linear reading-order of the publication provides fallback information to use when unsupported extensions to OPS are employed provides a mechanism to specify a declarative global navigation structure (the NCX)
OPF File Structure Package: Metadata Manifest Spine Guide
<package> root element of OPF package Attributes: xmlns= http://www.idpf.org/2007/opf version = 2.0 unique-identifier = an-unique-id primary book identifier selected from a collection of Dublin core identifier elements in <metadata> if not world-wide unique it may cause problems in libraries and catalogues
<metadata> a required child of <package> its children are elements from the Dublin Core namespace and/or <meta> elements with the same syntax as in XHTML
<dc:elements> Dublin core: http://dublincore.org/documents/dces/ Elements: contributor, coverage, creator, date, description, format, identifier, language, publisher, relation, rights, source, subject, title, type e.g.: <dc:title>a book</dc:title> <dc:identifier>uhf-232-dsds</dc:identifier>
<dc:identifier> at least one <identifier> with attribute @id must be present inside <metadata> the value of an @id attribute must be equal to the @unique-identifier of <package> element content of the <identifier> element with such @id is used to uniquely identify the book in libraries and catalogues
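The link between @unique-identifier and the matching <dc:identifier> can be sketched as a small lookup. This is an illustration under stated assumptions: the Dublin Core namespace URI `http://purl.org/dc/elements/1.1/` is not given on the slides, and the sample package, the second identifier and the function name `primary_identifier` are made up.

```python
import xml.etree.ElementTree as ET

NS = {
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",  # Dublin Core namespace (assumed, not on the slides)
}

# Hypothetical package with two identifiers; only 'bookid' is the primary one.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0"
         unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>a book</dc:title>
    <dc:identifier id="other">hypothetical-second-id</dc:identifier>
    <dc:identifier id="bookid">uhf-232-dsds</dc:identifier>
  </metadata>
  <manifest/>
  <spine/>
</package>"""

def primary_identifier(opf_text):
    """Return the text of the <dc:identifier> named by package/@unique-identifier."""
    package = ET.fromstring(opf_text)
    wanted = package.get("unique-identifier")
    if wanted is None:
        raise ValueError("<package> has no unique-identifier attribute")
    for ident in package.findall("opf:metadata/dc:identifier", NS):
        if ident.get("id") == wanted:
            return (ident.text or "").strip()
    raise ValueError("no <dc:identifier> with id=%r" % wanted)

print(primary_identifier(SAMPLE_OPF))
```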
<manifest> the next required child of <package> provides a list of all the files that are part of the publication (xhtml, css, images, ...) each file listed in a child <item> each file must be given precisely once but the order of files is not significant
<item> child of <manifest> Attributes, all required: @id @href relative paths interpreted relative to the location of OPF file containing the <manifest> @media-type Optional attribute: @fallback provides an @id of another item to be used if this item's @media-type is not supported
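Two rules above lend themselves to a short sketch: @href is resolved against the location of the OPF file, and @fallback is followed until an item with a supported media type is found. Everything named here (the `SUPPORTED` set of a hypothetical reader, the file names, the helper functions) is an assumption for illustration; percent-encoded hrefs are not handled.

```python
import posixpath
import xml.etree.ElementTree as ET

NS = {"opf": "http://www.idpf.org/2007/opf"}

# Media types a hypothetical reader can display.
SUPPORTED = {"application/xhtml+xml", "text/css",
             "image/gif", "image/jpeg", "image/png", "image/svg+xml"}

SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0"
         unique-identifier="bookid">
  <metadata/>
  <manifest>
    <item id="ch1" href="text/ch1.xhtml" media-type="application/xhtml+xml"/>
    <item id="chart" href="images/chart.tiff" media-type="image/tiff"
          fallback="chart-png"/>
    <item id="chart-png" href="images/chart.png" media-type="image/png"/>
  </manifest>
  <spine/>
</package>"""

def load_manifest(opf_text, opf_path):
    """Map item @id to its archive path, media type and fallback id."""
    base = posixpath.dirname(opf_path)  # hrefs are relative to the OPF file
    manifest = {}
    for item in ET.fromstring(opf_text).findall("opf:manifest/opf:item", NS):
        manifest[item.get("id")] = {
            "path": posixpath.normpath(posixpath.join(base, item.get("href"))),
            "media_type": item.get("media-type"),
            "fallback": item.get("fallback"),
        }
    return manifest

def usable_item(item_id, manifest, supported=SUPPORTED):
    """Follow @fallback links until a supported item is found; None if there is none."""
    seen = set()  # guards against fallback cycles
    while item_id in manifest and item_id not in seen:
        seen.add(item_id)
        if manifest[item_id]["media_type"] in supported:
            return item_id
        item_id = manifest[item_id]["fallback"]
    return None

manifest = load_manifest(SAMPLE_OPF, "OEBPS/content.opf")
print(manifest["chart"]["path"], "->", usable_item("chart", manifest))
```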
<spine> the next required element; collects main ebook pages; contains one or more <itemref> elements: <itemref idref='anid'>, where anid is @id of a <manifest>/<item>; @toc of <spine> contains a value of @id of an <item> which provides a table of contents for the ebook, usually in NCX format
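The way <spine> refers back to <manifest> can be sketched as follows: each @idref is looked up among the manifest items, and the order of the <itemref> elements (not the order of the manifest) gives the reading order. The file names and the function name `reading_order` are invented; the NCX media type `application/x-dtbncx+xml` is taken from the OPF specification rather than from the slides.

```python
import xml.etree.ElementTree as ET

NS = {"opf": "http://www.idpf.org/2007/opf"}

# Hypothetical package: the manifest lists ch2 first on purpose.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0"
         unique-identifier="bookid">
  <metadata/>
  <manifest>
    <item id="ch2" href="ch2.xhtml" media-type="application/xhtml+xml"/>
    <item id="ch1" href="ch1.xhtml" media-type="application/xhtml+xml"/>
    <item id="style" href="book.css" media-type="text/css"/>
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
  </manifest>
  <spine toc="ncx">
    <itemref idref="ch1"/>
    <itemref idref="ch2"/>
  </spine>
</package>"""

def reading_order(opf_text):
    """Return (list of page hrefs in spine order, href of the table of contents)."""
    package = ET.fromstring(opf_text)
    hrefs = {item.get("id"): item.get("href")
             for item in package.findall("opf:manifest/opf:item", NS)}
    spine = package.find("opf:spine", NS)
    pages = [hrefs[ref.get("idref")] for ref in spine.findall("opf:itemref", NS)]
    toc = hrefs.get(spine.get("toc"))  # None when @toc is absent
    return pages, toc

print(reading_order(SAMPLE_OPF))
```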
Open Container Format (OCF) a general-purpose container technology collects a related set of files into a single-file container the required format for a file containing an EPUB book a ZIP archive
OCF file structure File mimetype Directory META-INF with files: container.xml (required) manifest.xml metadata.xml signatures.xml encryption.xml rights.xml Directory OEBPS with EPUB files (which may be in subdirectories) Other directories, e.g. PDF for alternative book versions
file: mimetype in the root of ZIP archive it must be the first file in the archive must contain text: application/epub+zip make sure there is no whitespace around this text simplifies automatic recognition of the archive
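A minimal sketch of writing such an archive with Python's standard `zipfile` module is shown below. It adds `mimetype` first and stores it uncompressed (the OCF specification requires this in addition to the rules listed above, so that the text sits at a fixed position in the file); all other entries are deflated. The function name `write_epub` and the placeholder file contents are assumptions, and the result is not a complete, valid book.

```python
import io
import zipfile

def write_epub(target, files):
    """Write an EPUB-style ZIP archive; 'files' maps archive paths to contents."""
    with zipfile.ZipFile(target, "w") as archive:
        # Must be the first entry, with no surrounding whitespace, stored uncompressed.
        archive.writestr("mimetype", "application/epub+zip",
                         compress_type=zipfile.ZIP_STORED)
        for name, data in files.items():
            archive.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)

# Placeholder contents, for demonstration only.
buffer = io.BytesIO()
write_epub(buffer, {
    "META-INF/container.xml": "<container/>",
    "OEBPS/content.opf": "<package/>",
})

with zipfile.ZipFile(buffer) as archive:
    first = archive.infolist()[0]
    print(first.filename, archive.read("mimetype"))
```

Passing a file name instead of the `io.BytesIO` buffer writes the archive to disk.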
container.xml in directory META-INF format: <container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container"> <rootfiles> <rootfile full-path="oebps/an_opf_file.opf" media-type="application/oebps-package+xml" /> </rootfiles> </container>
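A reading system starts from container.xml to locate the OPF file. The lookup can be sketched like this; the function name `find_opf_path` is hypothetical and the sample reuses the path from the slide.

```python
import xml.etree.ElementTree as ET

NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_MEDIA_TYPE = "application/oebps-package+xml"

SAMPLE_CONTAINER = """<container version="1.0"
    xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="oebps/an_opf_file.opf"
              media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""

def find_opf_path(container_xml):
    """Return @full-path of the first rootfile that is an OPF package, or None."""
    root = ET.fromstring(container_xml)
    for rootfile in root.findall("c:rootfiles/c:rootfile", NS):
        if rootfile.get("media-type") == OPF_MEDIA_TYPE:
            return rootfile.get("full-path")
    return None

print(find_opf_path(SAMPLE_CONTAINER))
```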
EPUB 3.0 http://idpf.org/epub/30/spec/epub30-overview.html 4 specifications: EPUB Publications 3.0 EPUB Content Documents 3.0 EPUB Open Container Format (OCF) 3.0 EPUB Media Overlays 3.0 in draft stage
Some changes from v.2 http://idpf.org/epub/30/spec/epub30-changes.html HTML5 syntax (DTBook no longer an alternative syntax to XHTML) NCX superseded by EPUB Navigation Document (uses <nav> from HTML5) text-to-speech facilities multimedia support (via HTML5 <audio> and <video>)
EPUB Media Overlays 3.0 defines a usage of SMIL a simplified subset of SMIL 3.0 that allows sequencing of clips <par> + <seq> @clipBegin, @clipEnd
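As a rough illustration of how <par> and the clip attributes fit together, the sketch below reads a small overlay and pairs each text fragment with the length of its audio clip. It assumes the attributes are spelled `clipBegin` and `clipEnd` and that the SMIL namespace is `http://www.w3.org/ns/SMIL`, as in the Media Overlays draft; the sample file is invented, and the parser only understands clock values written in seconds (such as `2.5s`), which is a simplification.

```python
import xml.etree.ElementTree as ET

NS = {"smil": "http://www.w3.org/ns/SMIL"}

# Hypothetical overlay: two sentences synchronized with one audio file.
SAMPLE_SMIL = """<smil xmlns="http://www.w3.org/ns/SMIL" version="3.0">
  <body>
    <seq>
      <par id="p1">
        <text src="ch1.xhtml#s1"/>
        <audio src="ch1.mp3" clipBegin="0s" clipEnd="2.5s"/>
      </par>
      <par id="p2">
        <text src="ch1.xhtml#s2"/>
        <audio src="ch1.mp3" clipBegin="2.5s" clipEnd="6s"/>
      </par>
    </seq>
  </body>
</smil>"""

def parse_seconds(value):
    """Parse a clock value of the form '<number>s'; other SMIL forms are not handled."""
    if not value.endswith("s") or value.endswith("ms"):
        raise ValueError("unsupported clock value: %r" % value)
    return float(value[:-1])

def clip_lengths(smil_text):
    """Return (text target, clip length in seconds) for every <par>, in document order."""
    result = []
    for par in ET.fromstring(smil_text).iter("{%s}par" % NS["smil"]):
        text = par.find("smil:text", NS).get("src")
        audio = par.find("smil:audio", NS)
        length = parse_seconds(audio.get("clipEnd")) - parse_seconds(audio.get("clipBegin"))
        result.append((text, length))
    return result

print(clip_lengths(SAMPLE_SMIL))
```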