
Information and Communication Technology Department

UNIT 1

INTRODUCTION TO MARKUP LANGUAGES

Markup Languages and Information Management Systems



What is a markup language?

It is a system for annotating a document in a way that is syntactically distinguishable from the text.


Examples of graphical annotations in documents

… but these are not semantic annotations


Example of annotations with a markup language

[Figure: a document, and the same document with annotations added as tags, which are syntactically distinguishable from the text]


Writing documents
We write documents using different types of characters.

• Numerical digits
• Punctuation marks
• Capital and lowercase letters
• Control characters


… but computers only work with electrical signals.

Human characters must therefore be translated into electrical pulses.


ASCII
(American Standard Code for Information Interchange)

Each character is represented by 7 bits.


How many characters can be represented in this code?

Code      Types of characters    Examples
0-31      Control characters     Bell, carriage return, horizontal tab
32-126    Printable characters   A ? - : 9
127       Control character      Delete (DEL)

ASCII table
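With 7 bits there are 2^7 = 128 possible codes (0-127). The short Python sketch below classifies ASCII codes with the built-in `ord()` and `chr()`; the helper name `ascii_category` is hypothetical and only used for this illustration.

```python
def ascii_category(code: int) -> str:
    """Classify a 7-bit ASCII code (hypothetical helper for illustration)."""
    if not 0 <= code <= 127:
        raise ValueError("ASCII only defines codes 0-127")
    if code <= 31 or code == 127:
        return "control"      # e.g. 7 = bell, 9 = horizontal tab, 13 = carriage return
    return "printable"        # 32-126: space, letters, digits, punctuation marks

print(2 ** 7)                                # 128 different codes with 7 bits
print(ord("A"), ascii_category(ord("A")))    # 65 printable
print(repr(chr(9)), ascii_category(9))       # '\t' control

# Count the printable characters (codes 32-126)
printable = [chr(c) for c in range(128) if ascii_category(c) == "printable"]
print(len(printable))                        # 95

# "ñ" is not part of ASCII, so it needs a wider encoding
print("ñ".isascii())                         # False
```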


Extended ASCII

Each character is represented by 8 bits.


How many characters can be represented in this code?

Extended ASCII encodings keep the ASCII codes unchanged, so ASCII and Extended ASCII represent the same characters with codes 32-126.

There are many Extended ASCII encodings. Some of them are: ISO 8859-1, ISO 8859-2, ISO 8859-5, ISO 8859-15, etc.

Extended ASCII table
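With 8 bits there are 2^8 = 256 codes. Because each Extended ASCII encoding assigns its own characters to codes 128-255, the same byte can mean different things. A minimal Python sketch using the standard codecs for the ISO 8859 encodings listed above (`latin_1` is Python's name for ISO 8859-1):

```python
print(2 ** 8)   # 256 different codes with 8 bits

# The same byte (0xF1) decoded with different ISO 8859 encodings
byte_f1 = bytes([0xF1])
for codec in ("latin_1", "iso8859_2", "iso8859_5", "iso8859_15"):
    print(codec, byte_f1.decode(codec))
# latin_1 ñ, iso8859_2 ń, iso8859_5 ё, iso8859_15 ñ

# ISO 8859-15 replaced a few ISO 8859-1 symbols, e.g. byte 0xA4
print(bytes([0xA4]).decode("latin_1"))      # ¤ (currency sign)
print(bytes([0xA4]).decode("iso8859_15"))   # € (euro sign)

# Codes 32-126 decode to the same printable ASCII characters in all of them
ascii_part = bytes(range(32, 127))
assert all(ascii_part.decode(c) == ascii_part.decode("ascii")
           for c in ("latin_1", "iso8859_2", "iso8859_5", "iso8859_15"))
```

A text file is only readable if it is opened with the same encoding it was written with.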


Unicode
It contains more than 128,000 characters, covering 135 modern and historic scripts, as well as multiple symbol sets (these figures grow with each new version of the standard).

Unicode defines three encoding forms:

Encoding   Each character is represented by…
UTF-8      1 to 4 bytes
UTF-16     2 or 4 bytes
UTF-32     4 bytes
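The table above can be checked with Python's built-in `str.encode()`. This sketch encodes a few sample characters and prints how many bytes each encoding form uses; the little-endian variants (`utf-16-le`, `utf-32-le`) are chosen so that no byte order mark (BOM) is added to the count.

```python
# Sample characters: U+0041, U+00F1, U+20AC, U+1F600 (an emoji)
samples = ["A", "\u00f1", "\u20ac", "\U0001F600"]

for ch in samples:
    utf8 = len(ch.encode("utf-8"))        # 1 to 4 bytes
    utf16 = len(ch.encode("utf-16-le"))   # 2 or 4 bytes
    utf32 = len(ch.encode("utf-32-le"))   # always 4 bytes
    print(f"U+{ord(ch):04X}  UTF-8: {utf8}  UTF-16: {utf16}  UTF-32: {utf32}")

# U+0041   UTF-8: 1  UTF-16: 2  UTF-32: 4
# U+00F1   UTF-8: 2  UTF-16: 2  UTF-32: 4
# U+20AC   UTF-8: 3  UTF-16: 2  UTF-32: 4
# U+1F600  UTF-8: 4  UTF-16: 4  UTF-32: 4
```

UTF-8 encodes ASCII characters in a single byte, which is one reason it is the usual encoding for markup documents.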


Binary files vs text files

[Figure: a binary file and a text file opened with a text editor]

• Binary file: bits (metadata) embedded within the characters.
• Text file: just characters.
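A rough way to tell the two kinds of file apart is to check whether the bytes decode cleanly as text. The sketch below assumes a simple heuristic (no NUL bytes, and valid UTF-8); it is not a standard test, and the function name `looks_like_text` is hypothetical.

```python
def looks_like_text(data: bytes, encoding: str = "utf-8") -> bool:
    """Heuristic (an assumption, not a standard): a text file contains only
    encoded characters, so it has no NUL bytes and decodes without errors."""
    if b"\x00" in data:
        return False
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True

print(looks_like_text("Hello, ñandú!\n".encode("utf-8")))   # True
# The first 8 bytes of every PNG image (its file signature)
png_signature = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])
print(looks_like_text(png_signature))                       # False
```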


Word processors vs text editors


[Figure: a file written with a word processor and a file written with a text editor, which contains plain text]


Kinds of markup

• Procedural markup: tags provide instructions for the programs that are going to process the document. They are related to the style and appearance of the text.
• Structural markup: tags are used to label different parts of the document. They are related to the structure of the text.

Could you match each tag with its type?


Evolution of markup languages

GML (Generalized Markup Language)

In the late 1960s, IBM created this language in order to store large amounts of information in an orderly way.

Example of GML document (tags are preceded by a colon character):

:h1.Chapter 1: Introduction
:p.GML supported hierarchical containers, such as
:ol.
:li.Ordered lists (like this one),
:li.Unordered lists, and
:li.Definition lists
:eol.
as well as simple structures.
:p.Markup minimization (later generalized and formalized in SGML), allowed the end-tags to be omitted for the "h1" and "p" elements.


Evolution of markup languages

SGML (Standard Generalized Markup Language)

It descended from GML and was originally designed to share documents from large projects in government, law and industry.
It was also widely used for technical reference documentation in the military and aerospace industries.

Example of SGML Document


Evolution of markup languages

HTML (HyperText Markup Language)

It descended from SGML to allow the creation of hypertext in the World Wide Web. Several versions have been released.

What is the meaning of “hypertext”?


Evolution of markup languages

HTML (HyperText Markup Language)

A few questions about the World Wide Web:

• What is it?
• When was it created?
• Who created it?
• What was it created for?
Tim Berners-Lee


Evolution of markup languages

HTML (HyperText Markup Language)

Answer this question: What is the physical location of the web pages on the Internet?

Hypertext is the foundation of data communication in the WWW, as it allows us to jump from one server to another by clicking on links.


Evolution of markup languages

HTML (HyperText Markup Language)

At the beginning, HTML included procedural and structural markup, but the latest versions focus only on the logical structure of documents.

The W3C has published several HTML recommendations, such as:


Evolution of markup languages

What is the meaning of “W3C”?

Search on the Internet to answer these questions:

1. When was the W3C created? Who founded it and who leads it?
2. What is the W3C’s mission? What design principles guide its work?
3. What is the Web Accessibility Initiative (WAI)?
4. Explain the different web accessibility perspectives, the people who depend on them, and their benefits.
5. What is the mission of the W3C i18n activity?
6. What is the W3C’s vision of the web?


Evolution of markup languages

XML (eXtensible Markup Language)

The word “eXtensible” refers to the fact that XML has no fixed set of tags and no limit on the number of tags included in a document.

We can use as many tags as we need in the documents we write.


Evolution of markup languages

XML (eXtensible Markup Language)

It was derived from SGML as a simpler subset that is easier to work with.

In 1998, the W3C released its first recommendation (version 1.0).


Evolution of markup languages

XML (eXtensible Markup Language)

XML does not define any tags; it only defines the syntactic rules for writing documents.

It is a metalanguage, that is, a language used to create other languages by defining the name, location and meaning of their tags.


Evolution of markup languages

XML (eXtensible Markup Language)

It was created as a system to define, validate and share document formats on the web.

Nowadays, it is used to share information between different platforms and applications, for example, databases, spreadsheets, editors, etc.


Evolution of markup languages

XML (eXtensible Markup Language)

Let’s see an example of an XML document:

Imagine that we would like to write this information in a well-structured document:

• “Visual C#”, “Fco. Javier Ceballos”, “Ra-Ma”, “936”, “52.75”
• “Programación en C”, “Luis Joyanes Aguilar”, “McGraw-Hill”, “735”, “45.25”


Evolution of markup languages

XML (eXtensible Markup Language)

<library>
  <book>
    <title>Visual C#</title>
    <author>Fco. Javier Ceballos</author>
    <publisher>Ra-Ma</publisher>
    <pages>936</pages>
    <price>52.75</price>
  </book>
  <book>
    <title>Programación en C</title>
    <author>Luis Joyanes Aguilar</author>
    <publisher>McGraw-Hill</publisher>
    <pages>735</pages>
    <price>45.25</price>
  </book>
</library>


Evolution of markup languages

XML (eXtensible Markup Language)

Data describes itself by using tags that make up a vocabulary.

We can see the structure of the data, because some elements are nested inside others.
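The nesting of elements forms a tree. This sketch walks a trimmed version of the library document with `xml.etree.ElementTree` and prints each element indented by its depth; the function name `outline` is hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = """<library>
  <book>
    <title>Visual C#</title>
    <price>52.75</price>
  </book>
</library>"""

def outline(element, depth=0, lines=None):
    """Return one line per element, indented by its nesting depth."""
    if lines is None:
        lines = []
    text = (element.text or "").strip()     # ignore whitespace-only text
    lines.append("  " * depth + element.tag + (f" = {text}" if text else ""))
    for child in element:                   # iterating an Element yields its children
        outline(child, depth + 1, lines)
    return lines

print("\n".join(outline(ET.fromstring(DOC))))
# library
#   book
#     title = Visual C#
#     price = 52.75
```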


Evolution of markup languages

XML (eXtensible Markup Language)

Common applications:

• XML for websites.
• XML for communication between applications.
• XML for program configurations.
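As an illustration of XML for program configuration, the sketch below reads settings from a configuration file. The file layout, element names and attribute names are hypothetical, invented for this example; real programs define their own vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration file for an imaginary application
CONFIG_XML = """<config>
  <database host="localhost" port="5432"/>
  <logging level="INFO" file="app.log"/>
  <features>
    <feature name="export" enabled="true"/>
    <feature name="beta-ui" enabled="false"/>
  </features>
</config>"""

root = ET.fromstring(CONFIG_XML)
db = root.find("database")
host = db.get("host")                 # attribute values are always strings
port = int(db.get("port"))            # so numbers must be converted
log_level = root.find("logging").get("level")
# root.iter() visits every <feature> element at any depth
enabled = [f.get("name") for f in root.iter("feature") if f.get("enabled") == "true"]

print(host, port, log_level, enabled)   # localhost 5432 INFO ['export']
```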


Evolution of markup languages

XHTML (eXtensible HyperText Markup Language)

It is a reformulation of HTML 4.01 as an XML application, in order to create well-formed documents for the web.

It keeps almost all HTML 4.01 tags, but follows XML syntax rules.
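Because XHTML follows XML rules, a plain XML parser can read it, while typical HTML 4.01 markup with unclosed tags is rejected. This sketch only checks well-formedness with `xml.etree.ElementTree` (it does not validate against the XHTML DTD); the helper name `is_well_formed` is hypothetical.

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup: str) -> bool:
    """Return True if an XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

# HTML 4.01 style: <P> and <br> left unclosed, uppercase tag name
html_style = "<html><body><P>First line<br>Second line</body></html>"
# XHTML style: lowercase names, every element closed, empty element as <br />
xhtml_style = ('<html xmlns="http://www.w3.org/1999/xhtml"><body>'
               "<p>First line<br />Second line</p></body></html>")

print(is_well_formed(html_style))    # False
print(is_well_formed(xhtml_style))   # True
```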


Evolution of markup languages

XHTML (eXtensible HyperText Markup Language)

Although XHTML is very similar to HTML 4.01, it is a direct descendant of XML.


Evolution of markup languages

XHTML (eXtensible HyperText Markup Language)

XHTML makes it possible to separate content from presentation.


Technologies related to XML

eXtensible Linking Language (XLL)

• XML Linking Language (XLink) describes how to define links in an XML document in order to connect it with other resources.
• XML Base (XBase) describes how to specify a base URI, so that relative URIs in an XML document can be written easily.
• XML Pointer Language (XPointer) describes how to address a specific part of an XML document.
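The sketch below shows what XLink and XML Base attributes look like in a document and how a program can read them. The catalogue document and its URLs are hypothetical. Python's `xml.etree.ElementTree` does not implement XLink or XML Base processing; it only exposes the namespaced attributes, so the relative link is resolved by hand with `urllib.parse.urljoin`.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XLINK = "http://www.w3.org/1999/xlink"            # XLink namespace URI
XML_NS = "http://www.w3.org/XML/1998/namespace"   # namespace of the xml: prefix

# Hypothetical catalogue whose <book> elements link to external resources
DOC = f"""<catalogue xmlns:xlink="{XLINK}" xml:base="https://example.org/books/">
  <book xlink:type="simple" xlink:href="visual-csharp.html">Visual C#</book>
  <book xlink:type="simple" xlink:href="programacion-c.html">Programación en C</book>
</catalogue>"""

root = ET.fromstring(DOC)
base = root.get(f"{{{XML_NS}}}base")          # value of xml:base
for book in root.findall("book"):
    href = book.get(f"{{{XLINK}}}href")       # value of xlink:href
    # ElementTree does not apply xml:base itself, so resolve the URI manually
    print(book.text, "->", urljoin(base, href))
# Visual C# -> https://example.org/books/visual-csharp.html
# Programación en C -> https://example.org/books/programacion-c.html
```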

More information at https://www.w3schools.com/xml/xml_xlink.asp


Technologies related to XML

eXtensible Stylesheet Language (XSL)

• XSL Transformations (XSLT). It describes how to transform an XML document into another document.
• XML Path Language (XPath). It is used by XSLT to access different parts of an XML document.
• XSL Formatting Objects (XSL-FO). It is used to specify the visual format of an XML document, commonly to produce PDF output.
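To get a feel for XPath expressions, this sketch uses the limited XPath subset supported by Python's `xml.etree.ElementTree` (it is not a full XPath implementation, and the standard library has no XSLT processor). The document is a shortened version of the library example.

```python
import xml.etree.ElementTree as ET

LIBRARY_XML = """<library>
  <book><title>Visual C#</title><pages>936</pages></book>
  <book><title>Programación en C</title><pages>735</pages></book>
</library>"""

root = ET.fromstring(LIBRARY_XML)

# All <title> elements at any depth (XPath: //title)
titles = [t.text for t in root.findall(".//title")]
# Title of the second <book> (XPath positions start at 1)
second = root.find("book[2]/title").text
# Books whose <title> child has exactly this text
match = root.findall("book[title='Visual C#']")

print(titles)       # ['Visual C#', 'Programación en C']
print(second)       # Programación en C
print(len(match))   # 1
```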


Technologies related to XML

• XML document cryptography
• Document Type Definition
• XML Schema
• XML Query Language
• Digital Signature for XML
• Simple API* for XML
• API* for editing XML documents
• Embed XML documents in HTML
• Automatic creation of HTML from XML
• XML format for collecting inputs from web forms

*API, Application Programming Interface


Other markup languages

TeX and LaTeX

TeX is a typesetting system designed by Donald Knuth in 1978 to create high-quality scientific documents.

LaTeX is built on top of TeX, but it is easier to use because it provides higher-level commands.
Donald Knuth


Other markup languages

TeX and LaTeX


Other markup languages

Rich Text Format (RTF)

RTF is a proprietary document file format with a published specification, developed by Microsoft Corporation for document interchange.

Most word processors are able to read and write some versions of RTF.


Other markup languages

PostScript

It was created at Adobe Systems by John Warnock and others as a page description language for electronic publishing and desktop publishing.

John Warnock


JSON (JavaScript Object Notation)

It is a language-independent data format. It derives from JavaScript, but it is currently used by many programming languages.

It is not considered a markup language because data and metadata are written in a similar way.


JSON (JavaScript Object Notation)

Possible description of a person in a JSON document
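Since the slide's figure is not available, here is a hypothetical description of a person (all field names and values are invented for illustration), read and written with Python's standard `json` module. Note that the keys (metadata) and the values (data) are written with the same quoted-string syntax, unlike the tags of a markup language.

```python
import json

# Hypothetical description of a person (names and values are invented)
person_json = """{
  "firstName": "Ana",
  "lastName": "García",
  "age": 34,
  "isStudent": false,
  "languages": ["Spanish", "English"],
  "address": {"city": "Madrid", "postalCode": "28001"}
}"""

person = json.loads(person_json)      # JSON object -> Python dict
print(person["address"]["city"])      # Madrid
print(type(person["age"]).__name__)   # int
print(person["isStudent"])            # False (JSON false -> Python False)

# Serialize back to JSON text, keeping non-ASCII characters readable
print(json.dumps(person, ensure_ascii=False, indent=2))
```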


Specialized markup languages based on XML

What are these languages used for?

• RSS
• Atom
• ePUB
• DITA
• MathML
• ODF
• OSDF
• OOXML
• RDF
• SMIL
• SOAP
• SVG
• VoiceXML
• WSDL


Specialized markup languages based on XML

And many more markup languages…

• Accounting: XFRML, SMBXML
• Education: TML, SCORM, LMML
• Computers: SML, TDML
• Energy: PetroXML, ProductionML, GeophysicsML
• Entertainment: SMDL, ChessGML, BGML, BRML
• Software: OSD, PML
• Multimedia: MML, X3D
• Relationship with clients: CIML, NAML
• Mathematics: OpenMath
• Manufacturing: SML
