Assigning metadata to EPUB3 digital ebooks. Part 1: EPUB3 specifications for metadata


 



Miquel Centelles Velilla

Lecturer at the Department of Library and Information Science
Universitat de Barcelona

Mireia Ribera Turró

Lecturer at the Department of Applied Mathematics and Analysis
Universitat de Barcelona

 

Abstract

Objectives: This paper describes how to use internal and descriptive metadata in EPUB 3 according to current standards and the good practices disseminated by the publishing industry. It also offers a series of recommendations which will be especially useful for small publishers and book authors wishing to publish their own works.

Methodology: The paper is descriptive and instructive, using an explanation of the internal structure of a book in EPUB3 format to consider different metadata elements and corresponding standards, their structure and possible values. The authors use a case study to exemplify their recommendations.

Results: The EPUB mechanisms that describe the context, content and structure of books essentially emulate the tools that are specifically designed for cataloguing, an indication of the consolidation of the metadata model in its origin. The authors especially recommend the inclusion of five mandatory metadata elements: identifier, title and language, as drawn from the Dublin Core standard, and two further elements, selected according to the user’s own scheme, to describe the date of the last modification and the duration of the audiovisual media included.

Resum

Objectius: descriure els mecanismes i la funció de les metadades internes i descriptives en el format EPUB3 segons els estàndards vigents i les bones pràctiques difoses per la indústria editorial, i identificar un conjunt de recomanacions especialment orientades a petits editors o autors que s'autoediten les obres.

Metodologia: l'article és descriptiu i instructiu, ja que a partir de l'explicació de l'estructura interna d'un llibre en format EPUB3 s'expliquen els diferents elements de metadades, els estàndards corresponents, la seva estructura i els possibles valors. El conjunt de propostes s'exemplifica a partir d'un cas d'estudi.

Resultats: el format EPUB disposa de mecanismes de descripció del context, el contingut i l’estructura dels llibres que emulen, en origen, els instruments específicament orientats a la catalogació, una mostra de la consolidació del model de metadades en origen. Els autors recomanen especialment la inclusió dels cinc elements de metadades obligatoris, tres dels quals (identificador, títol i idioma) dins l'esquema Dublin Core i dos segons un esquema propi, per descriure la data de la darrera modificació i la durada dels mitjans audiovisuals inclosos.

Resumen

Objetivos: describir los mecanismos y la función de los metadatos internos y descriptivos en el formato EPUB3 según los estándares vigentes y las buenas prácticas difundidas por la industria editorial, e identificar un conjunto de recomendaciones especialmente orientadas a pequeños editores o autores que se autoeditan las obras.

Metodología: el artículo es descriptivo e instructivo, ya que a partir de la explicación de la estructura interna de un libro en formato EPUB3 se explican los diferentes elementos de metadatos, los estándares correspondientes, su estructura y los posibles valores. El conjunto de propuestas se ejemplifica a partir de un caso de estudio.

Resultados: el formato EPUB dispone de mecanismos de descripción del contexto, el contenido y la estructura de los libros que emulan, en origen, los instrumentos específicamente orientados a la catalogación, una muestra de la consolidación del modelo de metadatos en origen. Los autores recomiendan especialmente la inclusión de los cinco elementos de metadatos obligatorios, tres de los cuales (identificador, título e idioma) dentro del esquema Dublin Core y dos según un esquema propio, para describir el dato de la última modificación y la duración de los medios audiovisuales incluidos.

 

1 Introduction

This article has two aims. First, it shows how metadata is organized in the EPUB digital format. Second, it critically analyzes current trends in the use of this metadata for organizing and retrieving digital books on the most important distribution platforms and on popular reading devices.

The EPUB format is gaining momentum in the publishing industry. This article focuses on EPUB version 3 (from now on, EPUB3), which was published in November 2014 with the official title ISO/IEC TS 30135: Information Technology: Digital publishing: EPUB3.

Together, these two aims will allow us to identify a set of recommendations on metadata assignment in the authoring and publishing of EPUB3 books. The recommendations are aimed particularly at small publishers (including self-publishers) who want to distribute their production through different channels, or who simply wish to incorporate their books into a collection on their personal reading devices.

Two important aspects determine our sample and our approach to EPUB3 metadata. The first is the definition of "metadata" that we take as our starting point. In the context of this article, we define metadata as structured data about published works which describes aspects of their creation context (for example, who created them), their content (for example, what they are about) or their structure (for example, which collection they belong to). Metadata embedded in a published work in its own publishing format (internal metadata) promotes the adoption of specific tools and procedures. In this article, we focus particularly on the mechanisms related to the organization and retrieval of digital books within mainstream distribution platforms and within widespread e-book reading devices and applications. The second aspect is the selection of the standards analysed, which is limited to the only two existing specifications that deal with metadata assignment in EPUB books1. These are EPUB Publications 3.0.1 (2014), which defines the semantics and overall conformance requirements for EPUB books, and EPUB Media Overlays 3.0.1 (2014), which defines a format and a model for synchronizing text and audio content. In the context of EPUB books, a media overlay is an XML document that associates an XHTML content document with prerecorded audio, in order to offer the reader a synchronized multimedia reading experience. In later sections, we discuss how to describe this type of resource with metadata inside an EPUB book.

Our approach to assigning metadata in EPUB3 books is linked to the project "Creació de continguts multimodals II",2 carried out by the Adaptabit team,3 in collaboration with a team of professors from the Law Faculty of the Universitat de Barcelona. Within this project we redesigned some of the teaching resources of the Master in Family Legislation and Childhood in order to offer them in several formats that are more accessible to students with low vision or other print disabilities. We chose EPUB version 3 as the preferred format because of its importance in the publishing industry and because of its ability, among other advanced features, to incorporate a voice track synchronized with the textual content. One of the tasks in the redesign was to assign descriptive metadata to the EPUB version, following the specifications of the format. The examples illustrating this article come from one of the books in the project, specifically the document "Módulo: Infancia, protección de la persona y adopción", which belongs to the XVI Master in Family Legislation and Childhood.

Keywords:

  • Metadata
  • Electronic books
  • EPUB
 

2 EPUB3 Metadata

EPUB 3 introduces a set of predefined metadata elements and several mechanisms for defining additional elements, depending on the specific needs of the creators of EPUB books. The reference metadata schema for the predefined elements is the Dublin Core Metadata Element Set (2014),4 from now on DCMET.

As previously stated, we will synthesize and apply the guidelines offered by the EPUB Publications 3.0.1 (2014) and EPUB Media Overlays 3.0.1 (2014) specifications. It is worth mentioning that the EPUB format uses XML syntax throughout, and that it uses a variant of HTML5 specifically for content and structure.

Several articles emphasize the power of EPUB3 as a rich format for including metadata (Barker; Campbell, 2015) and the possibility of associating different types of content (Sigarchian et al., 2014). However, to the authors' knowledge, at the time of writing the literature contains no other in-depth description of metadata within EPUB and of its use.

Before going further, it is important to differentiate the three entities that EPUB metadata can describe: the EPUB publication, the rendition and the resource.

An EPUB publication is a set of one or more renditions, packed in an EPUB Container (a zip file), which in general corresponds to a single creative work. Each rendition contains the interrelated resources needed to build the EPUB publication. For example, the same EPUB publication could offer a visual presentation and an auditory presentation, and even other modalities.

A resource can include creative content or programming code contributing to the logic and specific presentation of an EPUB rendition. As examples of resources, we can mention the Package document belonging to a rendition, the EPUB Content Documents, the CSS documents, the audio files, any video file, the images, the embedded fonts or a script.

An EPUB publication includes a specific number of resources, organized in one or several renditions. Every resource must belong at least to one of the book renditions, if the book offers more than one.

Each rendition of an EPUB publication includes exactly one "Package" document, an XML document which specifies all the resources needed to render that rendition, defines the reading order of the resources and associates information related to navigation and metadata. In this XML document, the root element is <package>, which has three compulsory children, in the following order:

  • <metadata> element, which encapsulates all the descriptive information about the corresponding rendition.

  • <manifest> element, which provides an exhaustive list of the resources included in the corresponding rendition.

  • <spine> element, which defines the reading order of the resources associated to the corresponding rendition.

Additionally, the <package> element must carry two attributes: the version attribute, which specifies the version of the EPUB standard used in the described rendition5, and the unique-identifier attribute, which (as we will see in section 3.1) points to the main identifier of the publication described in the metadata.

In this article, we will focus our attention on the <metadata> element. You can find a detailed description of other compulsory and optional elements in <package> at EPUB 3 Overview.6 Moreover, it is beyond the scope of this writing to describe the mechanisms to include metadata within the content of EPUB documents encoded as XHTML, in order to semantically enrich them and to provide increased processing and accessibility possibilities. Examples of these mechanisms could be RDFa (XHTML+RDFa 1.1., 2013) and microdata (HTML Microdata, 2013).

The minimal composition of <metadata> element is illustrated in the next code sample:

Code snippet 1. Minimal composition of <metadata> element within an EPUB document

The xmlns attribute (1) is used to declare the DCMET namespace, i.e. http://purl.org/dc/elements/1.1/ (2). This namespace is associated with the dc prefix (3), which identifies all elements taken from this schema.

The <metadata> element contains a child element for each metadata item describing the publication (4, 5, 6, 7). The applicable elements are summarized in Table 1, together with their status as mandatory (one or more instances are required) or optional (zero or more instances are allowed). There is no predefined order for these elements, and all of them can be repeated.

Metadata elements | Mandatory/Optional | Cardinality
<dc:identifier> | Mandatory | 1 or more
<dc:title> | Mandatory | 1 or more
<dc:language> | Mandatory | 1 or more
Optional DCMET metadata elements: <dc:contributor>, <dc:coverage>, <dc:creator>, <dc:date>, <dc:description>, <dc:format>, <dc:publisher>, <dc:relation>, <dc:rights>, <dc:source>, <dc:subject>, <dc:type> | Optional | 0 or more
<meta> | A <meta> element with the dcterms:modified property, declaring the last modification date of the rendition, is required. A <meta> element with the media:duration property is required for the whole presentation and for each media overlay incorporated. All other <meta> elements are optional. | 2 or more
<link> | Optional | 0 or more

Table 1. Metadata elements in EPUB 3

Each of these elements admits several types of attributes, which are stated in the EPUB Publications 3.0.1 (2014) standard. Some of the accepted attributes come from the main XML Schema, in particular id, xml:lang and dir.

The specific meaning and application conditions of every metadata element within the EPUB3 format are described in sections 3 to 7.

As Table 1 shows, EPUB3 incorporates a set of metadata elements from the DCMET schema. Three of them are compulsory in every "Package" document: <dc:identifier>, <dc:title> and <dc:language>. All the elements are repeatable, which means that a "Package" document can include more than one instance of any of them.
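
As a rough illustration of the minimal composition described in this section, the following Python sketch parses a hypothetical Package document and reports which mandatory metadata elements are absent. The identifier, title and the empty <manifest> and <spine> placeholders are assumptions made for the example, not values taken from the case study; the media:duration check is left out because the sample declares no media overlays.

```python
import xml.etree.ElementTree as ET

NS = {"opf": "http://www.idpf.org/2007/opf",
      "dc": "http://purl.org/dc/elements/1.1/"}

# Hypothetical minimal Package document; all values are illustrative.
PACKAGE = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0"
         unique-identifier="pub-id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="pub-id">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
    <dc:title>Sample title</dc:title>
    <dc:language>en</dc:language>
    <meta property="dcterms:modified">2014-03-17T12:10:04Z</meta>
  </metadata>
  <!-- manifest and spine are left empty here for brevity -->
  <manifest/>
  <spine/>
</package>"""


def missing_required(package_xml):
    """Return the names of mandatory metadata elements that are absent."""
    metadata = ET.fromstring(package_xml).find("opf:metadata", NS)
    missing = [name for name in ("identifier", "title", "language")
               if metadata.find(f"dc:{name}", NS) is None]
    # Exactly one non-refining dcterms:modified <meta> is expected.
    modified = [m for m in metadata.findall("opf:meta", NS)
                if m.get("property") == "dcterms:modified"
                and m.get("refines") is None]
    if len(modified) != 1:
        missing.append("meta[dcterms:modified]")
    return missing


print(missing_required(PACKAGE))
```

A real conformance check would be better delegated to a dedicated validator; this sketch only mirrors the rows of Table 1.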

 

3 DCMET compulsory elements

3.1 <dc:identifier> element

Every instance of this element contains a unique identification code associated with the corresponding rendition of the EPUB publication. These codes can follow an ad hoc logic or can be based on a standardized identification system such as UUID, DOI, ISBN or ISSN. The element accepts the optional id attribute, whose value identifies the particular instance of the element.

Since <dc:identifier> can be repeated, a rendition can include several identifiers for the described publication, corresponding to different identification systems. However, every <metadata> section must contain at least one instance of the <dc:identifier> element whose content is the main, unambiguous identifier of the publication: the Unique Identifier. This identifier remains unchanged when minor revisions are made to the packaging or content of the publication, and it preserves the identity of the publication in any distribution or identification process.

To specify which identifier is the unique one, the value of the id attribute of the <dc:identifier> instance containing it must be the same as the value of the unique-identifier attribute of the <package> element. This procedure is shown in the next code snippet:


Code snippet 2. Identifier value repeated in <package> and <metadata>
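
The matching rule between unique-identifier and id can be sketched in Python as follows. The fragment is a hypothetical reconstruction: the first identifier value is the one quoted later in this section, while the second identifier and both id values are assumptions added to show that only the referenced instance is returned.

```python
import xml.etree.ElementTree as ET

NS = {"opf": "http://www.idpf.org/2007/opf",
      "dc": "http://purl.org/dc/elements/1.1/"}

# Hypothetical fragment: two identifiers, one of them referenced by <package>.
PACKAGE = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0"
         unique-identifier="pub-id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="isbn">urn:isbn:0000000000000</dc:identifier>
    <dc:identifier id="pub-id">ca-20140317123428</dc:identifier>
  </metadata>
</package>"""


def unique_identifier(package_xml):
    """Return the text of the <dc:identifier> named by unique-identifier."""
    root = ET.fromstring(package_xml)
    wanted = root.get("unique-identifier")
    for ident in root.findall("opf:metadata/dc:identifier", NS):
        if ident.get("id") == wanted:
            return ident.text.strip()
    return None  # the attribute points to no identifier


print(unique_identifier(PACKAGE))
```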

The EPUB Publications 3.0.1 (2014) standard proposes some best practices for managing publication identifiers.

  • Although the unique identifier of an EPUB publication is not a static value, it is not recommended to create a new identifier when the metadata is updated, minor errors are corrected or other minor changes are made to the EPUB publication.

  • To make it possible to identify minor modifications and new releases or editions, the EPUB3 specification defines another identifier, the Release Identifier. It is not itself a property expressed in the <metadata> element; instead, the recommendation is to form it by combining the Unique Identifier with the value of <meta property="dcterms:modified">, which holds the date and time of the last modification of a rendition (see section 5.3). In this way, e-book holders can distinguish and sequentially order all the releases of an EPUB publication that share the same unique identifier. In the next code snippet, a unique identifier and a modification date are combined to form a release identifier.

Code snippet 3. Unique identifier and modification date elements as release identifier

For this "Package" document, the release identifier will be: ca-20140317123428@2014-03-17T12:10:04Z
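
The combination can be sketched in Python as below. The Package fragment is a hypothetical reconstruction built from the two values quoted above; the "@" separator is the one visible in the resulting identifier, and the id value pub-id is an assumption.

```python
import xml.etree.ElementTree as ET

NS = {"opf": "http://www.idpf.org/2007/opf",
      "dc": "http://purl.org/dc/elements/1.1/"}

# Hypothetical fragment using the values quoted in the text.
PACKAGE = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0"
         unique-identifier="pub-id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="pub-id">ca-20140317123428</dc:identifier>
    <meta property="dcterms:modified">2014-03-17T12:10:04Z</meta>
  </metadata>
</package>"""


def release_identifier(package_xml):
    """Join the Unique Identifier and the last modification date with '@'."""
    root = ET.fromstring(package_xml)
    metadata = root.find("opf:metadata", NS)
    uid_ref = root.get("unique-identifier")
    uid = next(i.text.strip() for i in metadata.findall("dc:identifier", NS)
               if i.get("id") == uid_ref)
    # Only the non-refining dcterms:modified describes the rendition itself.
    modified = next(m.text.strip() for m in metadata.findall("opf:meta", NS)
                    if m.get("property") == "dcterms:modified"
                    and m.get("refines") is None)
    return f"{uid}@{modified}"


print(release_identifier(PACKAGE))
```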

3.2 <dc:title> element

The instances of this element contain a name given to the EPUB publication. Different kinds of titles can be assigned to a publication, as we will see in section 5. This element only accepts three optional attributes: id, xml:lang and dir.


Code snippet 4. Element <dc:title> with all three of its attributes: id, xml:lang and dir

3.3 <dc:language> element

The instances of this element specify the language used in the content of a rendition of the EPUB publication. This element only accepts the optional attribute id.


Code snippet 5. <dc:language> element
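
The following Python sketch shows how the attributes of <dc:title> and <dc:language> might be read from a <metadata> fragment. The title is the one of the case-study document; the id values, the language code and the ltr direction are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

NS = {"opf": "http://www.idpf.org/2007/opf",
      "dc": "http://purl.org/dc/elements/1.1/"}
# ElementTree exposes xml:lang under the fixed XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical fragment; attribute values are illustrative.
METADATA = """<metadata xmlns="http://www.idpf.org/2007/opf"
          xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title id="title" xml:lang="es" dir="ltr">Módulo: Infancia, protección de la persona y adopción</dc:title>
  <dc:language id="lang">es</dc:language>
</metadata>"""

metadata = ET.fromstring(METADATA)
title = metadata.find("dc:title", NS)
language = metadata.find("dc:language", NS)

title_info = {
    "id": title.get("id"),
    "lang": title.get(XML_LANG),  # language of the title text itself
    "dir": title.get("dir"),      # text direction: ltr or rtl
    "text": title.text,
}
print(title_info)
print(language.get("id"), language.text)
```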

 

4 Optional elements in DCMET

Additionally, it is possible to incorporate the following optional elements coming from the DCMET schema: <dc:contributor>, <dc:coverage>, <dc:creator>, <dc:date>, <dc:description>, <dc:format>, <dc:publisher>, <dc:relation>, <dc:rights>, <dc:source>, <dc:subject> and <dc:type>, all of them repeatable.

The definition and basic description of each one can be found in the DCMET schema specification. However, the EPUB Publications 3.0.1 (2014) specification adds some particularities to the use of five of these elements, which take priority over the base schema. The affected elements are <dc:contributor>, <dc:creator>, <dc:date>, <dc:source> and <dc:type>.

We briefly define and describe all 12 optional elements below, starting with the five elements to which the EPUB Publications 3.0.1 (2014) specification adds particularities.

  • <dc:creator> is used to represent the name of a person or organization responsible for the intellectual creation of the content of the EPUB publication. <meta> elements provide a mechanism for refining this information with details such as the role of the creator or alternative forms of the name.

  • <dc:contributor> is used to represent the name of a person or organization with a secondary role in the creation of the content of the EPUB publication.

  • <dc:date> is used to define the publication date of the EPUB publication. It is recommended that the value of this element follow the W3CDTF profile of the international standard ISO 8601:2004 (2004). An example could be 2016-03-17T12:10:04Z.

  • <dc:source> is used to identify related publications from which the EPUB publication is derived. This element could be applied, for instance, to a printed publication that preceded the EPUB digital publication.

  • <dc:type> is used to indicate that an EPUB publication is of a specific type, in terms of its nature and genre. The EPUB Publications 3.0.1 (2014) specification refers authors to a registry of publication types for filling in the <dc:type> element (EPUB Registry of Publication Types, 2014). It should be noted that, at the time of writing, many values in this registry are still in draft.

  • <dc:coverage> identifies the geographical or temporal scope of the resource, or the jurisdiction in which it applies. It is recommended to assign values from controlled vocabularies.

  • <dc:description> includes a synthesis of the content of the resource. The description can consist of, but is not restricted to, a summary, a table of contents, a graphical representation or an abstract of the resource.

  • <dc:format> identifies the file format, the physical medium and the dimensions of the resource (physical size and duration). It is recommended to assign values from controlled vocabularies.

  • <dc:publisher> identifies the person, organization or service responsible for making the resource available.

  • <dc:relation> identifies a related resource, such as a version in another format or a translation.

  • <dc:rights> includes information about the rights that apply to a resource, such as copyright.

  • <dc:subject> identifies the topic of the resource. It is recommended to assign values from controlled vocabularies.

All these optional metadata elements admit the id attribute, which uniquely identifies the element and serves as a link to associated <meta> elements, as we will see in section 5. Additionally, the elements <dc:contributor>, <dc:coverage>, <dc:creator>, <dc:description>, <dc:publisher>, <dc:relation>, <dc:rights> and <dc:subject> also admit the xml:lang and dir attributes.

 

5 Elements <meta>

The <meta> element is a mechanism for including additional metadata elements not foreseen by the DCMET schema,7 or for refining any other element.8 By refining we mean adding further details to a particular instance of a metadata element and to its value. Refining is a fundamental mechanism for enriching the metadata elements provided by the EPUB3 specifications.

The elements to be refined can be compulsory or optional elements from the DCMET schema (reviewed in sections 3 and 4), as well as additional elements included in the "Package" document as <meta> elements. For instance, we can refine an instance of the <dc:creator> element to give information about the creator's role as author, publisher, illustrator, etc.

When we include <meta> elements, we must take special care with some conditions, as illustrated in the following code snippet.


Code snippet 6. Element <meta> indicating where to take special care

  • When we want to include a new metadata element (1), the <meta> element must include at least a property attribute (2), whose value indicates which element it is (3).

  • When we want to refine an element that is already included (4), we must identify the element to be refined in the value of the refines attribute (5), and we must identify the meaning of the refinement in the value of the property attribute (6).

  • In addition to the required attributes, the EPUB Publications 3.0.1 (2014) specification accepts four further attributes: id, scheme, xml:lang and dir.
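
The two conditions above can be checked with a short Python sketch such as the following. It is only an approximation under stated assumptions: the <metadata> fragment is hypothetical (the creator name and the role value are borrowed from the examples in section 5.2), and the check covers only the presence of property and the resolution of refines to an existing id.

```python
import xml.etree.ElementTree as ET

NS = {"opf": "http://www.idpf.org/2007/opf",
      "dc": "http://purl.org/dc/elements/1.1/"}

# Hypothetical fragment: one new element and one refinement.
GOOD = """<metadata xmlns="http://www.idpf.org/2007/opf"
          xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:creator id="creator">Isaac Ravetllat</dc:creator>
  <meta refines="#creator" property="role" scheme="marc:relators">aut</meta>
  <meta property="dcterms:modified">2014-03-17T12:10:04Z</meta>
</metadata>"""


def check_meta(metadata_xml):
    """Return a list of problems found in the <meta> elements."""
    metadata = ET.fromstring(metadata_xml)
    ids = {el.get("id") for el in metadata.iter() if el.get("id")}
    problems = []
    for meta in metadata.findall("opf:meta", NS):
        if not meta.get("property"):
            problems.append("meta without a property attribute")
        refines = meta.get("refines")
        # A refinement must point to an element declared in the metadata.
        if refines and refines.lstrip("#") not in ids:
            problems.append(f"refines points to unknown id: {refines}")
    return problems


print(check_meta(GOOD))
print(check_meta(GOOD.replace('refines="#creator"', 'refines="#nobody"')))
```

Note that refines may also point to manifest items, as section 5.4 shows; a fuller check would collect ids from the whole Package document.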

Below, we detail some aspects of using <meta> elements for these two purposes.

 

5.1 <meta> elements used to include additional metadata elements

It is possible to include metadata elements from any schema that may be of interest for describing our publications. In general, we need to declare the corresponding namespace in the <package> element. Suppose we want to include the primaryTopic element from the Friend of a Friend (FOAF) schema (FOAF Vocabulary Specification, 2014) to describe the main topic of a publication. The following code snippet illustrates how it could be expressed in a "Package" document.


Code snippet 7. Element <package> with elements from other metadata schemas

The prefix attribute (1) declares the FOAF schema namespace. Within the <meta> element, primaryTopic is declared as the value of the property attribute (2), preceded by the foaf prefix (3), which links it to the namespace.

Although the preceding example shows the most common way to include a new element, some metadata schemas can be used through the <meta> mechanism without a previous namespace declaration, because they belong to the reserved prefixes of the EPUB specification. Some relevant examples are shown in Table 2.

Scheme | Internationalized Resource Identifier (IRI) | Prefix
DCMI Metadata Terms | http://purl.org/dc/terms/ | dcterms
Media Overlays Metadata Vocabulary | http://www.idpf.org/epub/vocab/overlays/# | media
ONIX for Books: Codelists Issue 26 | http://www.editeur.org/ONIX/book/codelists/current.html# | onix

Table 2. Examples of reserved namespace prefixes within EPUB 3

A complete and updated list of these schemas can be consulted in the normative document EPUB Publications Reserved Prefixes (2013).
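
A minimal Python sketch of how declared and reserved prefixes might be resolved is given below. The Package fragment is hypothetical: the FOAF namespace IRI is the published one, but the topic value "Family law" is an assumption, and the reserved table is limited to the three prefixes of Table 2.

```python
import xml.etree.ElementTree as ET

NS = {"opf": "http://www.idpf.org/2007/opf"}

# Reserved prefixes taken from Table 2 (not the complete list).
RESERVED = {
    "dcterms": "http://purl.org/dc/terms/",
    "media": "http://www.idpf.org/epub/vocab/overlays/#",
    "onix": "http://www.editeur.org/ONIX/book/codelists/current.html#",
}

# Hypothetical fragment; the topic value is illustrative.
PACKAGE = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0"
         unique-identifier="pub-id"
         prefix="foaf: http://xmlns.com/foaf/0.1/">
  <metadata>
    <meta property="foaf:primaryTopic">Family law</meta>
  </metadata>
</package>"""


def declared_prefixes(package):
    """Merge the reserved prefixes with those in the prefix attribute."""
    tokens = (package.get("prefix") or "").split()
    # The attribute is a whitespace-separated list of "name: IRI" pairs.
    declared = {name.rstrip(":"): iri
                for name, iri in zip(tokens[0::2], tokens[1::2])}
    return {**RESERVED, **declared}


def expand(prop, prefixes):
    """Turn a prefixed property such as foaf:primaryTopic into a full IRI."""
    prefix, _, name = prop.partition(":")
    return prefixes[prefix] + name


package = ET.fromstring(PACKAGE)
prefixes = declared_prefixes(package)
meta = package.find("opf:metadata/opf:meta", NS)
print(expand(meta.get("property"), prefixes))
```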

 

5.2 <meta> elements used to refine metadata elements

Section "4.3.2 Metadata meta Properties" of the EPUB Publications 3.0.1 (2014) specification includes a vocabulary of accepted values for the property attribute, which state the meaning of a particular refinement, i.e. its kind. Among other things, the vocabulary states whether a kind of refinement can be applied only to particular elements of the DCMET base schema or to any element.

For the purposes of this article, it is more relevant to categorize refinements by the purpose for which they were created. On this criterion we can identify two categories: those that detail the nature of the refined element, and so give more information about the meaning of the value assigned to a particular instance, and those that are used to declare variant forms of an element's value.

Within the first category, we can include refinement types such as identifier-type, title-type and role. Each of them is applicable to a limited number of elements in the DCMET schema.

  • identifier-type can only be applied to the <dc:identifier> and <dc:source> elements, to indicate the form or nature of an identifier. In this case we can use the scheme attribute to specify the controlled vocabulary, belonging to a publication identification system, from which the value is taken. In the following code snippet:

Code snippet 8. identifier-type property applied to <dc:identifier>

the value onix:codelist5 indicates that the refinement value (01) is the code corresponding to a proprietary system within the controlled vocabulary List 5: Product identifier type code (2014) of ONIX for Books (2015).

  • title-type can only be applied to the <dc:title> element, to indicate its form or nature. As in the previous case, a controlled vocabulary can be specified with the scheme attribute. In the following code snippet:

Code snippet 9. title-type property used to declare edition number, main title and collection

the title-type refinements allow us to declare the actual title (main), the edition number (edition) and the title of the collection linked to the publication (collection).

  • role can only be applied to the <dc:contributor> and <dc:creator> elements, to describe the nature of the contribution (for instance, that a person is the author or the publisher of a work). Here again we can use the scheme attribute to specify the controlled vocabulary from which the value is taken. In the following code snippet:

Code snippet 10. role property indicating the type of responsibility of the author

the marc:relators value indicates that the refinement value (aut) is the code corresponding to the author role within the controlled vocabulary MARC code list for relators (Library of Congress. Network Development and MARC Standards Office, 2014).
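
The three refinement types of this first category can be sketched together in Python. The fragment below is a hypothetical reconstruction: the scheme and code values (onix:codelist5 with 01, main, marc:relators with aut) are those discussed above, while the id values and the pairing with this particular identifier, title and creator are assumptions.

```python
import xml.etree.ElementTree as ET

NS = {"opf": "http://www.idpf.org/2007/opf",
      "dc": "http://purl.org/dc/elements/1.1/"}

# Hypothetical fragment combining the three refinement types.
METADATA = """<metadata xmlns="http://www.idpf.org/2007/opf"
          xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:identifier id="pub-id">ca-20140317123428</dc:identifier>
  <meta refines="#pub-id" property="identifier-type" scheme="onix:codelist5">01</meta>
  <dc:title id="t1">Módulo: Infancia, protección de la persona y adopción</dc:title>
  <meta refines="#t1" property="title-type">main</meta>
  <dc:creator id="creator">Isaac Ravetllat</dc:creator>
  <meta refines="#creator" property="role" scheme="marc:relators">aut</meta>
</metadata>"""


def refinements(metadata_xml):
    """Map each refined id to {property: (value, scheme)}.

    A repeated property on the same target overwrites the earlier one,
    which is acceptable for this sketch.
    """
    result = {}
    for meta in ET.fromstring(metadata_xml).findall("opf:meta", NS):
        target = meta.get("refines")
        if target:
            entry = result.setdefault(target.lstrip("#"), {})
            entry[meta.get("property")] = (meta.text.strip(), meta.get("scheme"))
    return result


for target, props in refinements(METADATA).items():
    print(target, props)
```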

In the second category, refinements that identify variations on the designation of an element, we can include alternative-script, file-as and display-seq, which can be applied to any element.

  • alternative-script allows us to include an alternative expression of the refined element in another language or, if applicable, in another writing system. It is usually applied to the <dc:creator> and <dc:title> elements.

  • file-as allows us to include a normalized form of the value of a specific metadata element, in order to obtain a good alphabetical order. For example, suppose we have an instance of the <dc:creator> element such as <dc:creator id="creator">Isaac Ravetllat</dc:creator>; in this case, we can include in the <metadata> element the inverted form of the author's name (i.e. "Ravetllat, Isaac") within a file-as refinement: <meta refines="#creator" property="file-as">Ravetllat, Isaac</meta>. Similarly, we can give alternative versions of a <dc:title> element that starts with an article, a demonstrative, etc., to generate an optimal form for alphabetical ordering.

  • display-seq indicates the position of the current instance relative to other instances of the same element. For example, in a publication whose content is expressed in more than one language, we can use this refinement to indicate the display priority of the languages. The following code snippet shows an example of the display-seq property applied in a "Package" document.

Code snippet 11. display-seq property applied to a multilingual document

The display-seq refinement allows us to indicate that English (code en) is the main language, while Spanish (code es) is a second language in this publication.
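
A reading system or cataloguing tool could use these two refinements roughly as in the following Python sketch. The fragment is hypothetical: the languages and the file-as value follow the examples above, while the id values and the document order (Spanish declared first) are assumptions chosen to make the reordering visible.

```python
import xml.etree.ElementTree as ET

NS = {"opf": "http://www.idpf.org/2007/opf",
      "dc": "http://purl.org/dc/elements/1.1/"}

# Hypothetical fragment; Spanish is declared first on purpose.
METADATA = """<metadata xmlns="http://www.idpf.org/2007/opf"
          xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:language id="lang-es">es</dc:language>
  <dc:language id="lang-en">en</dc:language>
  <meta refines="#lang-en" property="display-seq">1</meta>
  <meta refines="#lang-es" property="display-seq">2</meta>
  <dc:creator id="creator">Isaac Ravetllat</dc:creator>
  <meta refines="#creator" property="file-as">Ravetllat, Isaac</meta>
</metadata>"""


def refinement(metadata, element, prop):
    """Return the value of refinement `prop` attached to `element`, or None."""
    target = "#" + (element.get("id") or "")
    for meta in metadata.findall("opf:meta", NS):
        if meta.get("refines") == target and meta.get("property") == prop:
            return meta.text.strip()
    return None


metadata = ET.fromstring(METADATA)

# Order the languages by display-seq rather than by document order.
languages = sorted(metadata.findall("dc:language", NS),
                   key=lambda el: int(refinement(metadata, el, "display-seq") or 0))
language_order = [el.text for el in languages]

# Prefer the file-as form when sorting creators alphabetically.
creator = metadata.find("dc:creator", NS)
sort_name = refinement(metadata, creator, "file-as") or creator.text

print(language_order)
print(sort_name)
```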

The vocabulary for the values of the property attribute includes other kinds of refinement beyond these two categories, such as belongs-to-collection, collection-type, group-position, meta-auth and source-of.

Both categories of refinement can benefit the publication when e-book search or organization tools are used. However, we now focus on two <meta> elements that are compulsory in every "Package" document, as shown in Table 1: the element with the dcterms:modified property and the element with the media:duration property.

 

5.3 <meta> element with dcterms:modified property

Every rendition must incorporate an instance of the <meta> element declaring the date of the last modification. This element must contain a property attribute with the value dcterms:modified, linked to the DCMI Metadata Terms (2014) scheme (from now on, DCTERMS). This element must not be confused with the optional <dc:date> element, which is used to define the publication date of the EPUB publication, as we saw in section 4.

The value of the <meta> element with the dcterms:modified property must conform to the dateTime type of XML Schema (XML Schema Part 2: Datatypes Second Edition) and follow the format CCYY-MM-DDThh:mm:ssZ, i.e. the complete date plus hours, minutes and seconds.9

An example of an instance of the <meta> element with the dcterms:modified property is <meta property="dcterms:modified">2014-03-17T12:10:04Z</meta>.

Additional <meta> elements with modification properties can be included, but their purpose cannot be to declare the last modification date of the rendition; instead, they must carry a refines attribute referring to another element or resource.
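
The CCYY-MM-DDThh:mm:ssZ format can be checked with a few lines of Python. This is a sketch rather than a full XML Schema dateTime validator: it accepts only the exact form required here, and the function name is hypothetical.

```python
import re
from datetime import datetime

# Exact shape required for dcterms:modified: CCYY-MM-DDThh:mm:ssZ.
_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z")


def is_valid_modified(value):
    """Return True if value has the required shape and is a real date."""
    if not _PATTERN.fullmatch(value):
        return False
    try:
        # strptime rejects impossible dates such as 30 February.
        datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    except ValueError:
        return False
    return True


print(is_valid_modified("2014-03-17T12:10:04Z"))
print(is_valid_modified("2014-03-17"))
```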

 

5.4 <meta> elements to describe the duration of the whole set of presentations and the duration of media overlays

The EPUB Media Overlays 3.0.1 (2014) specification states that at least the duration of the whole presentation must be described, together with the duration of each media overlay incorporated in the EPUB publication. In every case, the element must contain a property attribute with the value media:duration, linked to the "Media Overlays Metadata Vocabulary" (2014) schema. As indicated above, this is a reserved vocabulary and, consequently, its namespace does not need to be declared with the prefix attribute in the <package> element.

The value of the <meta> element with the media:duration property must be expressed as a clock value, using a subset of the clock values defined in the Synchronized Multimedia Integration Language (SMIL) (SMIL 3.0 Timing and Synchronization, 2008). The specified durations represent the audio clips as authored at editing time, so they do not take into account live transmission, external resources or playback times.

An example of a description of the duration time of a complete rendition is shown in Code snippet 12.


Code snippet 12. Indication of duration of a multimedia content with the property media:duration

This resource lasts one hour, fifty-three minutes, twenty-five seconds and sixty-five hundredths of a second.
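
A minimal sketch consistent with this description, assuming the duration is written as a SMIL full clock value (hours:minutes:seconds.fraction), might read:

```xml
<!-- Duration of the complete rendition: 1 h 53 min 25.65 s -->
<meta property="media:duration">1:53:25.65</meta>
```

Since no refines attribute is present, the value applies to the rendition as a whole.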

Incorporating <meta> elements to describe the duration of media overlays requires these media overlays to be declared and identified beforehand in the <manifest> section. In this section, each component must be declared with an <item> element. The following is an example of a media overlay declaration.


Code snippet 13. media-overlay declared within <manifest>

An <item> element must include a compulsory id attribute to identify it (1). In this example, the <item> identified as item_1 refers to an EPUB content document. The same element includes an optional media-overlay attribute (2), which refers to the identifier of the media overlay, described in another <item> element identified by the id item_44.

The <item> element with the identifier item_44 (3) describes the media overlay in detail. It includes a compulsory media-type attribute (4), whose value is, as expected for media overlays, application/smil+xml ("Core Media Types", 2014).
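
The declaration described above can be sketched as follows. The identifiers item_1 and item_44 come from the text; the file names in the href attributes are hypothetical and serve only as illustration.

```xml
<manifest>
  <!-- (1) id identifies the item; (2) media-overlay points to the overlay's id -->
  <item id="item_1"
        href="chapter_001.xhtml"
        media-type="application/xhtml+xml"
        media-overlay="item_44"/>
  <!-- (3) the media overlay document; (4) its compulsory media type -->
  <item id="item_44"
        href="chapter_001_overlay.smil"
        media-type="application/smil+xml"/>
</manifest>
```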

Once a media overlay is declared and identified in the <manifest> section, its duration can be described with a <meta> element that carries the media:duration property and a refines attribute. The following code shows an example that refines the media overlay declared in the previous example.


Code snippet 14. Indication of duration of a previously declared media-overlay
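
As a hedged illustration of this kind of refinement, a <meta> element pointing at the media overlay identified as item_44 might look like the sketch below. The duration value is an assumption chosen for the example.

```xml
<!-- refines targets the id of the media overlay item declared in the manifest -->
<meta property="media:duration" refines="#item_44">0:32:29</meta>
```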

The reference vocabulary "Media Overlays Metadata Vocabulary" (2014) offers other properties to describe further aspects of the whole rendition and of a particular media overlay. The narrator property applies in both cases and identifies the name of the speaker.

For the whole rendition, there are two additional properties:

  • active-class: the CSS class name defined by the author to apply to the element of the EPUB content document that is currently playing.

  • playback-active-class: the CSS class name defined by the author to apply to the root element of the EPUB content document while playback is active.
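
These properties could be declared along the following lines. This is only a sketch: the narrator's name and the two class names are hypothetical, and the class names would also need to be defined in the publication's style sheets.

```xml
<metadata>
  <!-- Hypothetical narrator name -->
  <meta property="media:narrator">Jane Reader</meta>
  <!-- Hypothetical author-defined CSS class names -->
  <meta property="media:active-class">-epub-media-overlay-active</meta>
  <meta property="media:playback-active-class">-epub-media-overlay-playing</meta>
</metadata>
```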
 

6 <link> element

The <link> element provides a mechanism to associate a full metadata record with an EPUB publication, among other possibilities. As we have already mentioned in section 1, a metadata record is a set of elements which fully describe a publication, following a predefined metadata schema and expressed in a specific data interchange format.


Figure 1. Example of a publication’s metadata record from Dipòsit Digital de la UB (2008- )

When applying the <link> element, two fundamental considerations must be taken into account:

  • This mechanism allows metadata records from any schema to be associated with the publication.
  • The associated metadata records can be located in the EPUB container or outside the publication (in a publisher’s catalog, in a library’s catalog, in a repository, etc., available on the web).

The <link> element accepts five attributes, two of which are compulsory: rel and href. The following example illustrates the application of these attributes (see note 10).


Code snippet 15. Examples of use of the rel and href attributes within a <link> element

  • The rel (1) attribute indicates that the associated object is a metadata record and identifies the schema under which it was created. The available values for this attribute are defined by the EPUB Link Relationships Vocabulary (2013). Some schemas have specific values assigned to them; for example, the onix-record value indicates that the metadata record follows the ONIX for Books (2015) schema. Beyond these well-known schemas, the generic value to associate a metadata record is record.

  • The href (2) attribute indicates the location of the metadata record, be it inside the EPUB publication container or outside it.
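
The two attributes can be sketched as follows, with one record stored inside the container and another available on the web. Both locations are hypothetical and are given only for illustration.

```xml
<metadata>
  <!-- (1) rel: the record follows ONIX for Books; (2) href: a file inside the container -->
  <link rel="onix-record" href="meta/onix.xml"/>
  <!-- (1) rel: a MARC21 XML record; (2) href: a location outside the publication -->
  <link rel="marc21xml-record" href="http://example.org/catalog/12345?format=xml"/>
</metadata>
```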

The procedure to declare the namespaces of the schemas applied in the <link> element is exactly the same as the procedure for <meta> elements.

 

7 <cover-image> element

A special case of the metadata applicable to EPUB publications is the one related to the publication cover, cover-image. The EPUB Publications 3.0.1 (2014) specification establishes that the cover image is one of the few publication components that can be identified explicitly in the <manifest> section. The following code shows an example of an <item> declaration describing the cover image.


Code snippet 16. Declaration of the cover image within a <manifest> element

The value of the id attribute in the <item> element links this component of the publication to the metadata element related to the cover image.
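
A declaration of this kind might be sketched as follows, assuming a JPEG cover. The id value and the file path are hypothetical; the value cover-image in the properties attribute is what flags the item as the cover.

```xml
<manifest>
  <!-- properties="cover-image" identifies this item as the publication cover -->
  <item id="cover-img"
        href="images/cover.jpg"
        media-type="image/jpeg"
        properties="cover-image"/>
</manifest>
```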

Cover images are essential for the commercial identification of electronic books along the distribution chain. Reading devices supporting EPUB3 are able to show and unambiguously identify this image through the corresponding <item> element within the <manifest> section. For the sake of backward compatibility with EPUB2 (Open Packaging Format (OPF) 2.0.1 v1.0., 2010), EPUB Publications 3.0.1 (2014), in its section "3.4.8 The meta Element (OPF2)", accepts the incorporation of a <meta> element to describe the cover image, as required by many digital ebook distribution platforms. However, the specification also instructs reading devices supporting EPUB3 to ignore it.


Figure 2. Example of cover-image expressed as <item> and described with a <meta> element.
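
The combination can be sketched as follows, assuming the EPUB2-style convention in which the <meta> element carries name="cover" and its content attribute repeats the id of the cover item. The id value and the file path are hypothetical.

```xml
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="pub-id">
  <metadata>
    <!-- Other required metadata omitted for brevity -->
    <!-- OPF2-style element kept for backward compatibility; content repeats the item's id -->
    <meta name="cover" content="cover-img"/>
  </metadata>
  <manifest>
    <!-- EPUB3 declaration of the cover image -->
    <item id="cover-img" href="images/cover.jpg"
          media-type="image/jpeg" properties="cover-image"/>
  </manifest>
  <!-- spine omitted for brevity -->
</package>
```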

 

This element improves the appearance of the book in the digital collections of the main distribution platforms, and also helps users identify the book at first sight in their personal libraries on reading devices.

 

8 Conclusions and recommendations

EPUB3 specifications make it possible to include metadata elements related to the context, content and structure of a publication within a fundamental component, the "Package" document. This document specifies and describes the resources necessary for a specific rendition of an EPUB publication. The reference metadata schema is the Dublin Core Metadata Element Set (DCMET), although there are also mechanisms to incorporate elements from other metadata schemas (the <meta> element) and to link publications with metadata records located inside or outside the EPUB container (the <link> element).

Among all the metadata elements that can be included in a "Package" document, five are compulsory and therefore affect the validity of EPUB publications. Three of them belong to the reference Dublin Core schema (<dc:identifier>, <dc:title> and <dc:language>). The remaining two (the last modification date of a rendition, and the duration of a complete rendition or of each of its media overlays declared in the <manifest> element) must be declared as <meta> elements.
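
As a summary sketch, the five elements could appear together as shown below. The identifier, title and language values are hypothetical placeholders, and the media:duration entry is relevant when the publication includes media overlays.

```xml
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="pub-id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <!-- Dublin Core elements (placeholder values) -->
    <dc:identifier id="pub-id">urn:uuid:a1b2c3d4-0000-4000-8000-123456789abc</dc:identifier>
    <dc:title>Example title</dc:title>
    <dc:language>en</dc:language>
    <!-- Last modification date of the rendition -->
    <meta property="dcterms:modified">2014-03-17T12:10:04Z</meta>
    <!-- Duration of the complete rendition -->
    <meta property="media:duration">1:53:25.65</meta>
  </metadata>
  <!-- manifest and spine omitted for brevity -->
</package>
```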

The declaration of metadata elements can be enriched through mechanisms that clarify the meaning of the values assigned to elements (for example, the role played by a person credited with intellectual authorship in a <dc:creator> element) or that declare variants of a property (for example, a main title and a variant title in distinct <dc:title> elements).

To sum up, we can conclude that the mechanisms available in EPUB publications to describe context, content and structure could fully replace, at source, the tools specifically created for cataloging, and that they give the actors at that stage full capacity over tasks traditionally located further down the book value chain.

The way the EPUB3 format deals with internal metadata points, a priori, to the consolidation of publishers and producers in a cataloging role for their digital books, and to the merging of this role with others such as content creation or design. All indicators seem to emphasize the increasing importance of the EPUB3 format in the publishing world. As a consequence, there are mechanisms and processes to include metadata authorship at the creation stage (cataloging at source), as a way to facilitate metadata reuse and exploitation at later stages, without losing necessary values such as the identity of the publication across subsequent changes, and without sacrificing the possibility of enriching these data later.

The benefits of bringing the creation of publication metadata forward to the first stages collide with two contexts of use that are fundamental to the user experience: digital book distribution platforms and ebook reading devices or applications.

 

Bibliography

Barker, P.; Campbell, L. M. (2015). "LRMI, Learning Resource Metadata on the Web". In: WWW'15 Companion: proceedings of the 24th International Conference on World Wide Web. Geneva, Switzerland: International World Wide Web Conferences Steering Committee.

"Core Media Types" (2014). In: EPUB Publications 3.0.1: proposed specification 28 February 2014. <http://www.idpf.org/epub/301/spec/epub-publications.html#sec-core-media-types>. [Retrieved: 20/03/2014].

DCMI Metadata Terms (2014). <http://dublincore.org/documents/dcmi-terms/>. [Retrieved: 05/08/2014].

Dipòsit Digital de la UB (2008-). Barcelona: Universitat de Barcelona. Centre de Recursos per a l'Aprenentatge i la Investigació. <http://diposit.ub.edu/dspace/>. [Retrieved: 25/01/2015].

Dublin Core Metadata Element Set, Version 1.1. (2014). <http://dublincore.org/documents/dces/>. [Retrieved: 25/07/2014].

EPUB Content Documents 3.0.1 (2014). <http://www.idpf.org/epub/301/spec/epub-contentdocs.html>. [Retrieved: 04/08/2014].

EPUB Link Relationships Vocabulary (2013). <http://www.idpf.org/epub/vocab/package/link/>. [Retrieved: 30/07/2014].

EPUB Media Overlays 3.0.1 (2014). <http://www.idpf.org/epub/301/spec/epub-mediaoverlays.html>. [Retrieved: 04/08/2014].

EPUB Open Container Format (OCF) 3.0.1 (2014). <http://www.idpf.org/epub/301/spec/epub-ocf.html>. [Retrieved: 04/08/2014].

EPUB Publications 3.0.1: proposed specification 28 February 2014 (2014). <http://www.idpf.org/epub/301/spec/epub-publications.html>. [Retrieved: 20/03/2014].

EPUB Publications Reserved Prefixes (2013). <http://www.idpf.org/epub/vocab/package/pfx/reserved-20131108.html>. [Retrieved: 30/07/2014].

EPUB Registry of Publication Types (2014). <http://www.idpf.org/epub/vocab/package/types/>. [Retrieved: 01/08/2014].

FOAF Vocabulary Specification (2014). <http://xmlns.com/foaf/spec/>. [Retrieved: 25/01/2015].

HTML Microdata: W3C Working Group Note 29 October 2013. (2013). <http://www.w3.org/TR/microdata/>. [Retrieved: 25/01/2015].

ISO 8601:2004: data elements and interchange formats: information interchange: representation of dates and times (2004). Geneva: International Organization for Standardization.

ISO/IEC TS 30135: Information Technology: Digital Publishing: EPUB3 (2014). Geneva: International Organization for Standardization.

Library of Congress. Network Development and MARC Standards Office (2014). MARC code list for relators. <http://www.loc.gov/marc/relators/relaterm.html>. [Retrieved: 01/08/2014].

"Media Overlays Metadata Vocabulary" (2014). In: EPUB Media Overlays 3.0.1. <http://www.idpf.org/epub/301/spec/epub-mediaoverlays.html>. [Retrieved: 04/08/2014].

ONIX for Books: Codelists Issue 26 (2014). <http://www.editeur.org/files/ONIX%20for%20books%20-%20code%20lists/ONIX_BookProduct_Codelists_Issue_26.html>. [Retrieved: 04/08/2014].

Open Packaging Format (OPF) 2.0.1 v1.0. (2010). <http://www.idpf.org/epub/20/spec/OPF_2.0.1_draft.htm>. [Retrieved: 30/07/2014].

Register, Renée (2012). Development, use, and modification of book product metadata. New York: Book Industry Study Group.

Sigarchian, Hajar Ghaem et al. (2014). "EPUB 3 for integrated and customizable representation of a scientific publication and its associated resources". CEUR workshop proceedings, vol. 1282. <http://ceur-ws.org/Vol-1282/lisc2014_submission_3.pdf>. [Retrieved: 18/09/2015].

SMIL 3.0 Timing and Synchronization (2008). <http://www.w3.org/TR/SMIL/smil-timing.html#q22>. [Retrieved: 16/01/2015].

Wolf, Misha; Wicksteed, Charles (1997). Date and Time Formats. W3C. <http://www.w3.org/TR/NOTE-datetime>. [Retrieved: 31/07/2014].

XHTML+RDFa 1.1. (2013). Second Edition. W3C. <http://www.w3.org/TR/xhtml-rdfa/>. [Retrieved: 25/01/2015].

 

Notes

1 The remaining EPUB standards are EPUB Content Documents 3.0.1 (2014), which defines XHTML, SVG and CSS profiles in the context of EPUB publications, and EPUB Open Container Format (OCF) 3.0.1 (2014), which defines a file format and a processing model to encapsulate a set of resources in an EPUB container, consisting of a single (ZIP) file.

2 This article has been funded by the 2014PID_UB/015 project of the Universitat de Barcelona.

3 Adaptabit is a Consolidated Innovation group dedicated to digital accessibility in teaching, research and teaching innovation, within the Mathematics and Computer Science Department of the Universitat de Barcelona, with members from the Library and Information Science Department.

4 Dublin Core Metadata Element Set (2014) includes fifteen metadata elements (named properties) to describe information resources. It is part of a more general schema created and maintained by the Dublin Core Metadata Initiative, named DCMI Metadata Terms (2014).

5 The "3.0" value on this attribute indicates the conformity of a specific rendition to EPUB Publications 3.0.1 (2014).

6 The XML schema for "Package" documents is available at <http://www.idpf.org/epub/301/schema/package-30.nvdl>.

7 The EPUB Publications 3.0.1 (2014) specification designates the use of this meta element as "primary expression".

8 The EPUB Publications 3.0.1 (2014) specification designates the use of this meta element as "subexpression".

9 This expression format belongs to W3CDTF profile of ISO 8601 international standard, mentioned in Wolf; Wicksteed, 1997.

10 This example is taken from EPUB Publications 3.0.1 (2014).

Recommended citation

Centelles Velilla, Miquel (2015). "Assigning metadata to digital books in EPUB3. Part one: metadata in EPUB3 specifications". BiD: textos universitaris de biblioteconomia i documentació, no. 35 (December). <http://bid.ub.edu/en/35/centelles.htm>. [Retrieved: 19-02-2020].

 
