Introduction to XML

Chapter 1

Chapter Objectives -1
 Discuss markup language
 List and explain drawbacks of HTML
 Discuss the architecture of XML documents
 List the benefits of XML
 Discuss Parser

Core XML / Chapter 1


Chapter Objectives -2
 Build a complete XML document:
   Character Data
   Comments
   Processing Instructions
   Entities
     General Entities
     Parameter Entities
   The DOCTYPE Declarations



History of Markup
[Figure: documents recorded using paper and pen; typesetters formatting documents; tools used by typesetters to format a document]



Markup Language
 A markup language defines rules that add meaning to the content and structure of documents.
 Markup is classified as:
   Stylistic markup – determines how the document is presented
   Structural markup – defines the structure of the document
   Semantic markup – describes the meaning of the document's content



SGML
 Generalized Markup Language (GML) is a system for formatting documents.
 GML was refined and came to be known as Standard Generalized Markup Language (SGML).
 SGML is the parent of many later markup languages, including HTML and XML.

Features of SGML
 It is a language for describing markup languages, which lets authors create their own tags that relate to their content.
 It needs a separate file, the Document Type Definition (DTD), that contains the rules of the language so that documents can be interpreted.
 An SGML application is a markup language derived from SGML.

HTML
 HTML is the best-known markup language derived from SGML.
 It was created to mark up technical papers so that the scientific community could exchange them across different platforms.
 It is now also used by non-scientific users who care about how their documents are presented.


Drawbacks of HTML
 Fixed tag set
 Presentation technology does not relate to the contents
 It is flat
 Clogging
 HTML is not international
 Data interchange is impossible
 Does not have a robust linking mechanism
 HTML is not reusable



HTML and XML code Examples
HTML code:
<UL>
  <LI> TOM CRUISE
  <UL>
    <LI> CLIENT ID : 100
    <LI> COMPANY : XYZ Corp.
    <LI> Email : tom@usa.net
    <LI> Phone : 3336767
    <LI> Street Address: 25th St.
    <LI> City : Toronto
    <LI> State : Toronto
    <LI> Zip : 20056
  </UL>
</UL>

XML code:
<Details>
  <CONTACT>
    <PERSON_NAME>TOM CRUISE</PERSON_NAME>
    <ID> 100 </ID>
    <Company>XYZ Corp. </Company>
    <Email> tom@usa.net</Email>
    <Phone> 3336767 </Phone>
    <Street> 25th St. </Street>
    <City> Toronto </City>
    <State> Toronto </State>
    <ZIP> 20056 </ZIP>
  </CONTACT>
</Details>

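Because the XML version names each field, a program can look values up by tag rather than by position in a list. The following is a minimal sketch of that idea, assuming Python's standard-library xml.etree.ElementTree parser; the helper name contact_fields is hypothetical and not part of the slides.

```python
import xml.etree.ElementTree as ET

# The XML contact record from the slide, reproduced as a string.
CONTACT_XML = """
<Details>
  <CONTACT>
    <PERSON_NAME>TOM CRUISE</PERSON_NAME>
    <ID> 100 </ID>
    <Company>XYZ Corp. </Company>
    <Email> tom@usa.net</Email>
    <Phone> 3336767 </Phone>
    <Street> 25th St. </Street>
    <City> Toronto </City>
    <State> Toronto </State>
    <ZIP> 20056 </ZIP>
  </CONTACT>
</Details>
"""


def contact_fields(xml_text):
    """Return a dict mapping each child tag of <CONTACT> to its trimmed text."""
    root = ET.fromstring(xml_text.strip())
    contact = root.find("CONTACT")
    return {child.tag: (child.text or "").strip() for child in contact}


fields = contact_fields(CONTACT_XML)
print(fields["Email"])  # looked up by tag name, not by list position
print(fields["ZIP"])
```

The same lookup against the HTML version would have to split strings such as "Email : tom@usa.net", since the LI tags say nothing about what each item means.
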
XML -1
 XML stands for Extensible Markup Language.
 It was designed to overcome the drawbacks of HTML.
 It lets users define their own set of tags in a way that others (people or programs) can understand.
 It is more flexible than HTML.
 It inherits features of SGML and combines them with features of HTML.
 It is a simplified subset of SGML.

XML -2
 XML is a metalanguage: it is used to describe other languages.
 The data contained in an XML file can be displayed in different ways.
 It can also be passed to other applications for further processing.
 Style sheets help transform structured data into different HTML views, so the same data can be displayed in different browsers.

XML Architecture - 1
 XML supports a three-tier architecture for handling and manipulating data.
 XML can be generated from existing databases using a scalable three-tier model.
 XML tags represent the logical structure of data, which different applications can interpret and use in various ways.
 The middle tier is used to access multiple databases and translate data into XML.
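
The middle-tier step of translating database rows into XML can be sketched as follows. This is only an illustration under stated assumptions: the in-memory SQLite database, the clients table, and the function name rows_to_xml are all hypothetical, and the table name is interpolated into the query, so it must come from trusted code.

```python
import sqlite3
import xml.etree.ElementTree as ET


def rows_to_xml(connection, table, root_tag, row_tag):
    """Middle-tier step: read every row of `table` and return it as an XML string."""
    cursor = connection.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    root = ET.Element(root_tag)
    for row in cursor:
        record = ET.SubElement(root, row_tag)
        for column, value in zip(columns, row):
            # Each column becomes a child element named after the column.
            ET.SubElement(record, column).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical in-memory database standing in for the data tier.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE clients (ID INTEGER, Company TEXT)")
db.execute("INSERT INTO clients VALUES (100, 'XYZ Corp.')")

xml_text = rows_to_xml(db, "clients", "Details", "CONTACT")
print(xml_text)
```

The client tier then receives tagged data and can decide for itself how to present it.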

XML Architecture -2
[Figure: XML architecture diagram]

XML – A Universal data format
 HTML is a single markup language, but XML is a family of markup languages.
 Almost any type of data can be described in XML.
 XML is popular because it supports a wide range of applications and is easy to use.
 XML has a structured data format, which allows it to store complex data.

Benefits of XML
 The three-tier architecture has easier
scalability and better security.
 The benefits of XML are classified into the
following:
 Business benefits
 Technological benefits

Business Benefits
 Information sharing:
 Allows businesses to define data formats in XML
 Provides tools to read, write and transform data between
XML and other formats
 XML inside a single application:
 Powerful, flexible and extensible language
 Content Delivery:
 Supports different users and channels, like digital TV,
phone, web and multimedia kiosks

Technological Benefits
 Separation of data and presentation
 Semantic information
 Extensibility
 Re-use of data

XML Document Structure
 An XML document is composed of sets of “entities” identified by unique names.
 All documents begin with a root or document entity.
 Entities act as aliases for longer or more complex content.
 Documents are logically composed of declarations, elements, comments, character references, and processing instructions.

Well formed and Valid Documents
 An XML document is well formed if it satisfies the minimum set of requirements defined in the XML 1.0 specification.
 These requirements ensure that the correct language terms are used in the right manner.
 A valid XML document is a well-formed XML document that also conforms to the rules of a Document Type Definition (DTD).
 A DTD defines the rules that the markup in an XML document must follow.

Parsers - 1
 Parsers help the computer interpret an XML file.
[Figure: an XML document written in an editor is read by the parser, and the parsed document is viewed in the browser]
 There are two types of parsers:
   Non-validating parser: checks only that the document is well formed
   Validating parser: also checks the document against its DTD

Parsers - 2
 Parsers load the XML file and other related files (like the DTD file) to check whether the XML document is well formed and valid.
[Figure: the XML file and other related files go into the parser, which produces a data tree]
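
The data tree a parser produces can be inspected directly. The sketch below assumes Python's standard-library xml.etree.ElementTree, which is a non-validating parser (it checks well-formedness but not DTD rules); the BOOK document and the helper name tree_outline are hypothetical and used only to show the tree shape.

```python
import xml.etree.ElementTree as ET


def tree_outline(element, depth=0):
    """Return one indented line per element, mirroring the parsed data tree."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(tree_outline(child, depth + 1))
    return lines


# Hypothetical document used only to show the tree shape.
document = (
    "<BOOK><TITLE>Core XML</TITLE>"
    "<CHAPTER><NAME>Introduction to XML</NAME></CHAPTER></BOOK>"
)
outline = tree_outline(ET.fromstring(document))
print("\n".join(outline))
```

Each level of indentation in the printed outline corresponds to one level of nesting in the document.
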
Data versus Markup
<NAME> Tom Cruise </NAME>
 Markup: the tags <NAME> and </NAME>
 Data: the text Tom Cruise

Creating an XML Document
 To create an XML document:
 State an XML declaration
 Create a root element

 Create the XML code

 Verify the document

Stating an XML Declaration
 Syntax
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
 The 'standalone' and 'encoding' attributes are optional; only the version number is mandatory
 When both are present, 'encoding' must come before 'standalone'
 'standalone' states whether the document relies on external declarations
 'encoding' specifies the character encoding used by the author
 A document with no XML declaration is treated as XML 1.0
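
The XML 1.0 grammar expects the declaration's attributes in the order version, encoding, standalone, with only version required. The following sketch checks this with Python's standard-library parser (an assumption; other parsers may word their errors differently). The helper name parses and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET


def parses(document_bytes):
    """Return True if the standard-library parser accepts the document."""
    try:
        ET.fromstring(document_bytes)
        return True
    except ET.ParseError:
        return False


in_order = b'<?xml version="1.0" encoding="UTF-8" standalone="no"?><BOOK/>'
version_only = b'<?xml version="1.0"?><BOOK/>'
out_of_order = b'<?xml version="1.0" standalone="no" encoding="UTF-8"?><BOOK/>'
no_version = b'<?xml encoding="UTF-8"?><BOOK/>'

for label, document in [
    ("in order", in_order),
    ("version only", version_only),
    ("out of order", out_of_order),
    ("no version", no_version),
]:
    print(label, parses(document))
```
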
Creating a Root Element
 There can be only one root element
 It describes the function of the document
 Every XML document must have a root element
Example
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<BOOK>
</BOOK>

Creating the XML Code -1
 It is the process of creating our own elements and attributes as required by our application.
 Elements are the basic units of XML content.
 Tags tell the user agent to do something to the content enclosed between the start tag and the end tag.
Parts of an element
<TITLE> Aptech Ltd </TITLE>
 Opening tag: <TITLE>
 Content: Aptech Ltd
 Closing tag: </TITLE>
 Element: the opening tag, content and closing tag together
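
The parts of an element can also be seen by building one in code. This is a minimal sketch assuming Python's standard-library xml.etree.ElementTree; it rebuilds the TITLE element from the slide and then reads its name and content back.

```python
import xml.etree.ElementTree as ET

# Build the element from the slide programmatically.
title = ET.Element("TITLE")
title.text = " Aptech Ltd "

# Serializing writes the opening tag, the content and the closing tag.
serialized = ET.tostring(title, encoding="unicode")
print(serialized)

# Parsing it back separates the element name from its content.
parsed = ET.fromstring(serialized)
print(parsed.tag)
print(parsed.text.strip())
```

Building elements through a library means the closing tag is always written to match the opening tag.
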
Creating the XML Code -2
 Rules that govern elements:
   At least one element is required
   XML tags are case sensitive
   End the tags correctly
   Nest tags properly
   Use legal tags
   Length of markup names
   Define valid attributes
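
Several of the rules above can be checked mechanically, because a parser rejects any document that breaks them. The sketch below assumes Python's standard-library xml.etree.ElementTree; the helper name well_formedness_error and every sample document are hypothetical, and the exact error wording depends on the parser.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(xml_text):
    """Return None if xml_text is well formed, else the parser's error message."""
    try:
        ET.fromstring(xml_text)
        return None
    except ET.ParseError as error:
        return str(error)


samples = {
    "well formed": '<BOOK><TITLE lang="en">XML</TITLE></BOOK>',
    "no element": "",
    "case mismatch": "<BOOK><TITLE>XML</title></BOOK>",
    "tag not closed": "<BOOK><TITLE>XML</BOOK>",
    "bad nesting": "<BOOK><B><I>XML</B></I></BOOK>",
    "illegal tag name": "<1BOOK>XML</1BOOK>",
    "unquoted attribute": "<BOOK><TITLE lang=en>XML</TITLE></BOOK>",
}

for label, text in samples.items():
    error = well_formedness_error(text)
    print(f"{label}: {'ok' if error is None else error}")
```

Only the first sample passes; each of the others breaks one rule from the list.
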
Verify the document
 The document should follow the
XML rules; otherwise it will not be
read by the browser or by any other
XML reader

Comments
 A comment is information for the understanding of the user, and is ignored by the processor.
 Syntax
<!-- Write the comment here -->
Example
<!-- don't show these
<NAME>KATE WINSLET</NAME>
<NAME>NICOLE KIDMAN</NAME>
-->
<NAME>ARNOLD</NAME>
<NAME>TOM CRUISE</NAME>
The example given will display only the names outside the comment (ARNOLD and TOM CRUISE); the others are treated as comments.
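
A parser's treatment of comments can be observed directly. The sketch below assumes Python's standard-library xml.etree.ElementTree, which discards comments by default. It uses the snippet as laid out above, with the comment closing after NICOLE KIDMAN, wrapped in a hypothetical CAST root element because a document needs exactly one root.

```python
import xml.etree.ElementTree as ET

# The slide's snippet wrapped in a hypothetical <CAST> root element.
document = """<CAST>
<!-- don't show these
<NAME>KATE WINSLET</NAME>
<NAME>NICOLE KIDMAN</NAME>
-->
<NAME>ARNOLD</NAME>
<NAME>TOM CRUISE</NAME>
</CAST>"""

root = ET.fromstring(document)
# Only NAME elements outside the comment are part of the parsed tree.
visible_names = [name.text for name in root.findall("NAME")]
print(visible_names)
```
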
Processing Instruction
 A processing instruction is a bit of information meant for the application using the XML document.
 The parser passes these instructions directly to the application.
 The XML declaration uses the same syntax as a processing instruction.
<?xml-stylesheet type="text/xsl"?>
 Name of application: xml-stylesheet
 Instruction information: type="text/xsl"
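
How a parser hands a processing instruction to the application can be sketched with Python's standard-library xml.dom.minidom (an assumption; any DOM parser exposes similar fields). The example uses the standard xml-stylesheet target; the href value style.xsl and the BOOK root are hypothetical.

```python
from xml.dom import Node, minidom

# href="style.xsl" is a hypothetical stylesheet name added for illustration.
document = '<?xml-stylesheet type="text/xsl" href="style.xsl"?><BOOK/>'

dom = minidom.parseString(document)
# Each processing instruction exposes the application name (target)
# and the instruction information (data) as separate fields.
instructions = [
    (node.target, node.data)
    for node in dom.childNodes
    if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
]
print(instructions)
```

The parser does not interpret the data part; it is passed through as plain text for the application to read.
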
Character Data
 The text between the start and end tags is defined as ‘character data’.
 Character data may be any legal (Unicode) character.
 Character data is classified into:
   PCDATA
   CDATA

PCDATA
 It stands for parsed character data.
 PCDATA is text that will be parsed by a parser.
 Tags inside the text will be treated as markup and entities will be expanded.

Predefined entities
Entity Name    Character
&lt;           <
&gt;           >
&amp;          &
&quot;         "
&apos;         '
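
The expansion of the predefined entities can be seen in both directions. This sketch assumes Python's standard library: xml.etree.ElementTree for parsing and xml.sax.saxutils.escape for writing. The EXPR element is hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Parsing: the five predefined entity references become characters.
element = ET.fromstring(
    "<EXPR>a &lt; b &amp;&amp; c &gt; d &quot;x&quot; &apos;y&apos;</EXPR>"
)
print(element.text)

# Writing: escape() replaces &, < and > so the text is safe to use as PCDATA.
escaped = escape("a < b && c > d")
print(escaped)
```
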
CDATA
 It means character data.
 It will not be parsed by the parser.
 CDATA sections make it convenient to include large blocks of special characters.
 The character string ]]> is not allowed within a CDATA block because it signals the end of the block.
Example
<SAMPLE>
<![CDATA[<DOCUMENT>
<NAME>TOM CRUISE</NAME>
<EMAIL>tom@usa.com</EMAIL>
</DOCUMENT>]]>
</SAMPLE>
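
Parsing the SAMPLE example shows the effect of a CDATA section: the tags inside it arrive as plain text rather than as child elements. A minimal sketch, assuming Python's standard-library xml.etree.ElementTree:

```python
import xml.etree.ElementTree as ET

document = """<SAMPLE>
<![CDATA[<DOCUMENT>
<NAME>TOM CRUISE</NAME>
<EMAIL>tom@usa.com</EMAIL>
</DOCUMENT>]]>
</SAMPLE>"""

sample = ET.fromstring(document)
print(len(sample))   # number of child elements under <SAMPLE>
print(sample.text)   # the CDATA content, kept as literal text
```
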
Entities
 Entities are used to avoid typing long pieces of text repeatedly within a document.
 There are two categories of entities:
   General entities
    Syntax
    <!ENTITY ADDRESS "text that is to be represented by an entity">
   Parameter entities
    Syntax
    <!ENTITY % ADDRESS "text that is to be represented by an entity">

Examples of Entities
An example of a General entity
<!ENTITY full_address " My Address 12 Tenth Ave. Suite 12 Paris, France">
 Entity declaration
 Syntax
&ENTITY_NAME;
 Example
&address;

An example of Parameter entities
< CLIENT = "&APTECH;" PRODUCT = "&PRODUCT_ID;" QUANTITY = "15">
 Entity declaration
 Syntax
%PARAMETER_ENTITY_NAME;
 Example
%address;

The DOCTYPE declarations
 The <!DOCTYPE [..]> declaration follows the XML declaration in an XML document.
 Syntax
<?xml version="1.0"?>
<!DOCTYPE myDoc [
...declare the entities here....
]>
<myDoc>
...body of the document....
</myDoc>

Example
<!DOCTYPE CUSTOMERS [
<!ENTITY firstFloor "15 Downing St Floor 1">
<!ENTITY secondFloor "15 Downing St Floor 2">
<!ENTITY thirdFloor "15 Downing St Floor 3">
]>
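
General entities declared in the DOCTYPE are expanded wherever the document body refers to them. The sketch below assumes Python's standard-library xml.etree.ElementTree, which expands entities declared in the internal subset. The DOCTYPE is the one from the example; the CUSTOMERS body with its ADDRESS elements is hypothetical.

```python
import xml.etree.ElementTree as ET

# The DOCTYPE from the slide, followed by a hypothetical document body
# (the <CUSTOMERS> and <ADDRESS> content is assumed for illustration).
document = """<?xml version="1.0"?>
<!DOCTYPE CUSTOMERS [
<!ENTITY firstFloor "15 Downing St Floor 1">
<!ENTITY secondFloor "15 Downing St Floor 2">
<!ENTITY thirdFloor "15 Downing St Floor 3">
]>
<CUSTOMERS>
<ADDRESS>&firstFloor;</ADDRESS>
<ADDRESS>&thirdFloor;</ADDRESS>
</CUSTOMERS>"""

root = ET.fromstring(document)
# Each &name; reference has been replaced by the declared text.
addresses = [address.text for address in root.findall("ADDRESS")]
print(addresses)
```
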
Attributes
 An attribute gives information about an
element.
 Attributes are embedded in the element start
tag.
 An attribute consists of an attribute name and
attribute value.
Example
<TV count="8">SONY</TV>
<LAPTOP count="10">IBM</LAPTOP>
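
Attribute values are read from the start tag separately from the element's content. A minimal sketch, assuming Python's standard-library xml.etree.ElementTree; the STOCK root element is hypothetical and only wraps the two elements from the example.

```python
import xml.etree.ElementTree as ET

# <STOCK> is a hypothetical root element wrapping the slide's two elements.
document = '<STOCK><TV count="8">SONY</TV><LAPTOP count="10">IBM</LAPTOP></STOCK>'

# Attribute values are always strings, so count is converted to int here.
stock = {item.tag: int(item.get("count")) for item in ET.fromstring(document)}
print(stock)
```
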

Summary-1
 A markup language defines a set of rules that adds meaning to the content
and structure of documents
 XML is extensible, which means that we can define our own set of tags,
and make it possible for other parties (people or programs) to know and
understand these tags. This makes XML much more flexible than HTML
 XML inherits features from SGML and includes the features of HTML.
XML can be generated from existing databases using a scalable three-tier
model. XML-based data does not contain information about how data
should be displayed
 An XML document is composed of a set of “entities” identified by unique
names

Summary-2
 A well-formed document is one that conforms to the basic rules of XML;
a valid document is a well-formed document that conforms to the rules of
a DTD (Document Type Definition)
 The parser helps the computer to interpret an XML file
 Steps involved in the building of an XML document are:
 Stating an XML declaration

 Creating a root element

 Creating the XML code

 Verifying the document

 Character data is classified into PCDATA and CDATA

Summary-3
 Entities are used to avoid typing long pieces of text repeatedly in a document. The two types of entities are:
   General entities
   Parameter entities
 The <!DOCTYPE […]> declaration follows the XML declaration in an XML document.
 An attribute gives information about an element