Amran University

Faculty of Engineering and Information Technology

Fundamentals of Web Technology

For second-level students
Computer Science - Information Systems - Cybersecurity

Part One

Course Instructor
Mr. Maged Saleh Ahmed Al-Haj

EMAIL : magedalhaj4@gmail.com
PHONE : 770091917
1444 AH - 2022 CE
Motivation
 Web-based Knowledge & Data Management
 A huge amount of Web data
 How to organize and retrieve it, how to discover interesting
patterns, and how to make recommendations from it
 Web Search Engine
 Uber Taxi and Didi Chuxing
 Amazon, Alibaba, Tencent, JD.com
 Web Blog Analysis
 Spam Email Detection
 Online Electronic Medical Data Analysis
 Electronic Health Care and eHealth
 Social Network Analysis
Amazon Business Model

Examples of Web Search Engines

Examples of Web
Introduction to Client-Server Systems,
WWW and Web Technology

Week 1
Outline
 The Internet
 The Web
 What makes the Web work?
 HTTP
 URL
 HTML
 CGI
 Example of a Web page
 Summary
The Internet
[Figure: a message travels from a sender ("From") to a receiver ("To") identified by the IP address 123.21.12.131]
The Internet

Worldwide collection of interconnected networks.

 Began in the late 1960s as ARPANET, a US project
investigating how to build networks that could
withstand partial outages.
 Starting with a few nodes, the Internet was estimated to
have over 100 million users in 1997, and over 270
million users in over 100 countries in 1998, with
one million new users joining each month.
Historical View: Internet
 1969 - Telnet
 1970 - 4 computers
 Stanford, UCLA, UC Santa Barbara, U Utah
 1971 - FTP
 1983 - 562 computers on the internet
 1993 - 1.2 million computers on the internet
 1999 - ssh, sftp, ……
 2010 - Amazon, Alibaba, ……
 2020 - Smart-based devices, …….
The Web
 World-Wide Web (Web, WWW)
 networked information system that provides a simple
way of browsing different types (text, pictures, video,
audio, etc.) of information on the Internet using
hyperlinks.
 Web pages
 electronic documents that typically contain several
types of information accessible via the World Wide Web
 Web sites
 a collection of related Web pages of a certain individual,
group, or organization.
 The Web uses a client/server model
Client-Server Model
Browser - software to interact with internet data at the client
Server - machine that services internet requests
Client - machine that initiates internet requests

Client/Server Interaction

[Figure: the browser requests a file from the server, the server sends the file, and the browser displays it]
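The request, send, and display steps can be tried on one machine. The following is a minimal sketch, not a production server: it uses Python's standard `http.server` and `http.client` modules, and the file name `hello.html` and the temporary directory are made up for illustration.

```python
import http.client
import http.server
import pathlib
import tempfile
import threading
from functools import partial

# Hypothetical document directory holding one page for the server to send.
doc_root = pathlib.Path(tempfile.mkdtemp())
(doc_root / "hello.html").write_text("<html><body>Hello</body></html>")

# Server: listens for requests and sends files found under doc_root.
handler = partial(http.server.SimpleHTTPRequestHandler, directory=str(doc_root))
server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)  # port 0: pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

# Client: connect, request the file, read the response, close.
connection = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
connection.request("GET", "/hello.html")
response = connection.getresponse()
status = response.status
page = response.read().decode()
connection.close()

server.shutdown()
server.server_close()

# A real browser would render the HTML; here it is only printed.
print(status, page)
```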
What is a Web Server?

Web server
 computer running application software that listens and
responds to a client computer’s request made through a
web browser
 machine that hosts web pages and other web
documents
 provides web documents and other online services
using HTTP
What is a Web Browser?

Web browser
 application software that is used to locate and issue a
request for the page on the web server that hosts the
document
 It also interprets the page sent back by the web server
and displays it on the monitor of the client computer
 computer program that lets you view and explore
information on the World Wide Web
Web Browsers

 Microsoft Internet Explorer – browser integrated with
the Windows operating system. Mac versions are
available.
 Netscape Navigator - available for Windows, Mac, and
Unix platforms.
 Opera – one of the alternatives to the two most popular
browsers mentioned above
 Mozilla – open source web browser software
 Lynx - popular Unix text-based browser
 Google Chrome - a Google browser that combines a
minimal design with sophisticated technology.
What Makes the Web Work?
The Web relies on these mechanisms:
 Protocols - set of standards used to access resources
via the Web
 Uniform Resource Locator (URL) - uniform
naming scheme for Internet resources
 HTML - Document formatting language used to
design most Web pages
 CGI - Common Gateway Interface
 Servlet - Application run by a server connected to the
WWW. It is one of the most popular avenues for Java
development today.
Protocols
 Standard set of rules that governs how computers
communicate with each other, e.g. HTTP, FTP and
SMTP.
 HTTP (HyperText Transfer Protocol) is the underlying
protocol used to transmit information over the Web.
 HTTP is based on request-response paradigm:
 Connection: Client establishes connection with Web server.
 Request: Client sends request to Web server.
 Response: Web server sends response (HTML document)
to client.
 Close: Connection closed by Web server.
HTTP
Connection
 1. Client
 makes a TCP/IP connection
 makes an HTTP request for a web page
 2. Server accepts request
 sends page as HTTP response
 3. Client downloads page
 4. Server breaks the connection
Uniform Resource Locators (URLs)
 Identifies the file to request
 Specifies server and file
 Defaults used for missing values

protocol://host computer/directory path/file name

Note: Not all URLs will have the directory and filename
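To make the parts of a URL concrete, here is a small sketch that splits a URL into the four parts named above using Python's standard `urllib.parse.urlsplit`. The helper name `url_parts` and the choice to report a missing directory as `/` are assumptions made for illustration.

```python
from urllib.parse import urlsplit


def url_parts(url):
    """Split a URL into protocol, host computer, directory path and file name."""
    parts = urlsplit(url)
    # Everything before the last "/" is the directory; the rest is the file name.
    directory, _, file_name = parts.path.rpartition("/")
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        "directory": directory or "/",  # assumption: report the root when no directory is given
        "file": file_name,              # empty when the URL names no file
    }


full = url_parts("http://www.yorku.ca/jhuang/xml/04.adhoc.topics.xml")
bare = url_parts("http://www.w3.org")
print(full)
print(bare)
```

The second URL has neither a directory nor a file name, so the server falls back on its defaults (typically an index page).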
HyperText Markup Language (HTML)
 Hypertext
 presents and relates information as hyperlinked
documents that point to other documents or resources.
 HTML
 A standard markup language that defines a hypertext
document.
 A simple, powerful, platform-independent document
language.
 Specifies what displays should look like
 Browser interprets HTML
 Same HTML file often looks different across browsers
 HTML files are the source files of Web pages
HTML File Structure
<HTML>
<HEAD>
<TITLE>Page Title</TITLE>
</HEAD>
<BODY>
Stuff
</BODY>
</HTML>
What About Graphics?
 An HTML file can refer to an image file

Here is a nice picture:


<IMG SRC="stars.gif">
What About Hyperlinks?
 An HTML file can refer to another HTML file

<h2>Teaching</h2>
<p><a href="http://ai.uwaterloo.ca/3421.html">
COSC 3421 Fall 2002</a></p>
<p><a href="http://ai.uwaterloo.ca/3221.html">
COSC 3221 Winter 2003</a></p>
Simple Formatting
<H1><FONT COLOR="#b80000">
Heading level 1</FONT></H1>
<H2><FONT COLOR="#ff0000">
Heading level 2 </FONT> </H2>
<P>Paragraph with <B>bold</B> and
<I>italic</I> text.</P>
<HR>
Creating HTML Files

 Text editor (Notepad, Pico)

 HTML Editor (FrontPage, Netscape Gold and HoTMetaL)
Moving Files to Servers

[Figure: the author creates files and sends them to the server; the user views the files in a browser]
Client-Server Systems Architecture

[Figure: a Web client (browser) retrieves web pages over the Internet, using the HTTP protocol, from a Web server that hosts them; a Web authoring system (with scanner, video capture, and sound card) creates and publishes the web pages]
Web page: a document written in HTML, JSP, or ASP.
Internet Client-Server Systems
Internet Banking
Internet Client-Server Systems

Uber is an app and taxi service that connects riders and drivers with the tap
of a button by using their phone’s GPS capabilities

China’s E-commerce Empire: Alibaba Group


WeChat Functions

WeChat Business Model

Amazon Business Model
Static and Dynamic Web Pages

 A static Web page is ready before it is accessed.

 The content of a dynamic Web page is generated each time it is accessed.
Common Gateway Interface (CGI)

 CGI programming techniques were introduced
to provide dynamic Web pages via server-side
interaction.

 A standard method to extend the functionality
of the web server.

 Any programming language can be used.
Common ones include: Perl, C++, Visual Basic.
CGI-based Web Application
[Figure: Web Browser, Web Server, CGI Scripts/Applications, and Database, connected by the numbered steps below]
1. HTTP Request
2. HTML forms invoke CGI scripts
3. Get data
4. Return data
5. Output (HTML)
6. HTTP Document
How Web Page Works

Sample web page and its source.


 The source contains the
instructions that define the
contents, layout, and structure of a
web page.
 The instructions are written in
HTML or another web authoring
tool used in creating the page.
 The browser uses these
instructions to interpret and
display the web page on the screen.
How Web Page Works
[Figure: annotated web page showing the URL, navigational tools, graphics, and hyperlinks]
Cookies
 A piece of information generated by the web-server
and stored in the client side ready for future access.
 Cookies can make CGI scripts more interactive.
 Cookies are text files stored on Web client.
 A CGI script creates a cookie and has the Web server send
it to the client’s browser to store on hard disk.
 Later, when client revisits Web site and uses a CGI
script that requests this cookie, client’s browser
sends information stored in the cookie.
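The round trip described above can be sketched with Python's standard `http.cookies` module. This is only an illustration of the header contents: the cookie name `session` is an assumption, the value `XYZ` is a placeholder, and no network traffic is involved.

```python
from http.cookies import SimpleCookie

# Server side (first visit): create the cookie and build the Set-Cookie header value.
outgoing = SimpleCookie()
outgoing["session"] = "XYZ"  # hypothetical cookie name and value
set_cookie_header = outgoing["session"].OutputString()

# Client side: the browser stores the cookie it received...
stored = SimpleCookie()
stored.load(set_cookie_header)
# ...and sends it back in the Cookie header when the site is revisited.
cookie_header = "; ".join(f"{name}={morsel.value}" for name, morsel in stored.items())

# Server side (later visit): the script reads the value back from the request.
incoming = SimpleCookie()
incoming.load(cookie_header)
value = incoming["session"].value

print("Set-Cookie:", set_cookie_header)
print("Cookie:", cookie_header)
```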

Cookies
 How do cookies work?
1. The client sends a request to origin Server A.
2. Server A sends a response containing "Set-Cookie: XYZ".
3. The client sends its next request to Server A with "Cookie: XYZ".

 Where are cookies used?


 Shopping applications
 Storing login information
 Tracking pages visited by a user
Summary
 The Web is a networked information system that
contains a huge collection of files
 The Web relies on clients and servers
 HTML and other files are sent from servers to
clients
 Files are identified by URLs
 Servers send files to browsers
 Browsers interpret HTML
 A cookie is a piece of information generated by
the web server and stored on the client side.
Internet Client-Server Systems
[Figure: Internet banking, with a client connecting to a web server]
[Figure: Uber, with several clients connecting to a web server]
Outline of Today’s Class
 Web Servers
 Static and Dynamic Web Pages
 CGI Programming
 What makes the CGI work?
 FORM
 GET and POST Methods
 QUERY_STRING and CONTENT_LENGTH

 SGML, HTML and XHTML


 XML and DTD
 XML Examples
Web Servers
 How does a web server work?
You contact the web server and request a file. The
server returns the file.

[Figure: PC-1 sends "GET foo.html" and the web server returns foo.html; PC-2 sends "GET index.html" and the web server returns index.html. Files on the web server: /myDir/index.html, /myDir/foo.html, /myDir/bar.html]
Web Servers
 Most web servers are very simple. They
just return files to the PC that requests them.

 The web browser does the hard work of
translating a file into pretty pictures.

 See “View->Source” for the file that is
returned by the server.
Web Servers
 It would be a Bad Thing if anyone on
the internet could retrieve any file on
the web server.

 The files are kept in a special directory;
requests for files are relative to that
directory.
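One way a server could enforce this rule is sketched below, assuming a hypothetical helper `resolve_request` and Python's standard `pathlib`. Real web servers have their own configuration and checks; this only shows the idea of resolving a request relative to the special directory and refusing anything that escapes it.

```python
from pathlib import Path


def resolve_request(document_root, request_path):
    """Map a requested path onto a file inside document_root, or refuse."""
    root = Path(document_root).resolve()
    # Treat the request as relative to the document root, then normalise any ".." parts.
    target = (root / request_path.lstrip("/")).resolve()
    if target != root and root not in target.parents:
        raise PermissionError(f"{request_path} is outside the document root")
    return target
```

With a document root of `/srv/www` (an illustrative path), a request for `/myDir/index.html` maps to `/srv/www/myDir/index.html`, while a request for `/../../etc/passwd` is refused.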
Static Web Pages

Request file

Retrieve file

Send file
Dynamic Web Pages

Request service

Do Computation

Generate HTML
page with results
of computation

Return dynamically
generated HTML file
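The dynamic case can be sketched as a function that does a small computation and builds the HTML page at request time. The function name `dynamic_page` and the page content are made up for illustration; a static page, by contrast, would simply be read from a file that already exists.

```python
from datetime import datetime
from html import escape


def dynamic_page(visitor_name, now=None):
    """Generate an HTML page whose content depends on when and for whom it is requested."""
    now = now or datetime.now()  # the "computation": look up the current time
    return (
        "<html><head><title>Greeting</title></head>"
        f"<body><p>Hello, {escape(visitor_name)}.</p>"  # escape user input before embedding it
        f"<p>Generated at {now:%Y-%m-%d %H:%M:%S}.</p></body></html>"
    )


print(dynamic_page("Lee"))
```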
CGI and Web Forms
 How to write the HTML that sends data to
the server?

 What does the server have to do to process
this information?

 The most common method to handle this
is CGI (Common Gateway Interface)
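CGI Programming
[Figure: the client talks to the HTTP server, which runs the CGI program]

CGI Programming
[Figure: the HTTP server passes the request to the CGI program through environment variables and stdin, and receives the program's output through stdout]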
Important CGI
Environment Variables
REQUEST_METHOD

QUERY_STRING

CONTENT_LENGTH
Request Method: GET

 GET requests can include a query string
as part of the URL:

GET /cgi-bin/finger?hollingd HTTP/1.0

Request method: GET
Resource name: /cgi-bin/finger
Delimiter: ?
Query string: hollingd
CGI URLs
 There is a mapping between URLs and CGI
programs provided by a web server. The
exact mapping is not standardized (web
server admin can set it up)

 Typically:
 requests that start with /CGI-BIN/ , /cgi-bin/
or /cgi/, etc. refer to CGI programs (not to
static documents).
CGI Programs
 When the user hits the “submit” button
the data is sent to the web server
 The CGI program that handles it on the
web server is specified in the HTML
Form tag

<FORM method=post action="http://unix.aml.yorku.ca/cgi-bin/formProcessor.pl">


CGI Programs

 Anything special about the program?
 The web server has to have permissions set to
allow the program to be executed. Typically
this is only turned on in a few directories, e.g.
/cgi-bin
 Has to comply with the usual security requirements
for that system.
CGI Programs
 What kind of program does it need to be?
 Can be written in any language: C++, C,
Perl, etc. It just has to be able to process the
attribute-value pairs.

 Perl is excellent for its pattern matching
and text processing capabilities.
CGI Programs
The data is sent to the CGI program in a specific
format of attribute-value pairs. The attribute is the
name of the field in the HTML tag, the values are
what the user inputs

First name: <input type="text" name="firstName">


Middle name: <input type="text" name="middleName"><br>
Last name: <input type="text" name="lastName"><br>

firstName=lee
middleName=harvey
lastName=oswald
CGI Programs
 Strengths:
 A simple method to send data to the server.
 Dynamically generates HTML pages.

 Weaknesses
 All the processing happens on the server.
 Takes time to launch the CGI process on the
server.
 Uses a separate process for each request, instead of a thread.
Web Forms
 Overview of Web forms
 HTML form components
 GET & POST methods
 Server-side processing with forms
Form Interaction with CGI
[Figure: sequence of messages between the Web Browser, the Web Server, and the CGI Program]
1. User requests form
2. Server returns form to client
3. User submits form
4. Server forwards to CGI program
5. CGI program returns results to server
6. Server returns results to client
Forms
 Forms work in a different and slightly more
complex way than standard HTML pages.
 Forms consist of a number of separate data entry
components such as menus and text areas.
 The user can select different options from the menus
and enter text in the text entry fields.
 A single form can contain many text entry fields
and/or many menus.
 To differentiate the menus and text areas from each
other each one is given a unique name, selected by
the Web form designer.
HTML Forms

 Each form includes a METHOD that
determines what HTTP method is used
to submit the request.

 Each form includes an ACTION that
determines where the request is
made.
HTML Forms
HTML includes elements or tags for creating forms on Web pages.
There are three stages to creating a form:
 define the form data [a set of variables]
 design the form itself
 define the method for processing the form’s data on the
server-side
When the Web page containing the form is loaded, the user can:
 enter data into the form
 then submit that data to the Web server

[usually by clicking a submit button on the form]


HTML Form Variables
A variable has:
 a name
 a value
A form contains one or more variables. When the user
fills in the form, values are assigned to these variables.

When the user clicks the submit button, the set of variable
names & corresponding values is sent to the Web server
in an HTTP request.

The Web server can extract the set of variables & values
from the HTTP request, and can do something with them...
Example for HTML Form
Note that this form contains two variables, name & id.
<html>
<head>
<title>Query Form</title>
</head>

<body>
<h2>Query Form</h2>
<form method="GET" action="doquery.php">
<p>Your name: <input name="name" type="text" size=30></p>
<p>Your ID: <input name="id" type="text" size=15></p>
<p><input type="submit" value="Submit your query"></p>
<p><input type="reset" value="Clear your query"></p>
</form>
</body>
</html>
Example for HTML Form

<input name="name" type="text" size=30>


<input name="id" type="text" size=15>

<input type="submit" value="Submit your query">


<input type="reset" value="Clear your query">
Forms
Each form must begin and end with form tags.
The method attribute specifies how the form’s data is sent to the Web server. The post method appends form data to the browser request.
The value of the action attribute specifies the URL of a script on the Web server.
Input elements are used to send data to the script that processes the form.
A hidden value for the type attribute sends data that is not entered by the user.
The label element describes the data the user needs to enter in the text box.
The size attribute gives the number of characters visible in the text box.
The maxlength attribute gives the maximum number of characters the user can input.
The value attribute displays a name on the buttons created.

<?xml version = "1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns = "http://www.w3.org/1999/xhtml">
<head>
<title>Web Engineering - Feedback Form</title>
</head>
<body><h1>Feedback Form</h1>
<p>Any comments please.</p>
<form method = "post" action = "/cgi-bin/feedbackform"><p>
<input type = "hidden" name = "recipient" value = "webeng@xhtmllecture.com" />
<input type = "hidden" name = "subject" value = "Feedback Form" />
<input type = "hidden" name = "redirect" value = "main.html" /> </p>
<p><label>Name:
<input name = "name" type = "text" size = "25" maxlength = "30" />
</label></p>
<p>
<input type = "submit" value = "Submit comments" />
<input type = "reset" value = "Clear comments" />
</p>
</form></body></html>
Forms

Text box created using input element.

Submit button created using input element.

Reset button created using input element.
Table & Form

<TABLE FRAME = none>


<TR><TD ALIGN = right>
Name:<BR>
Card number:<BR>
Expires:<BR>
Telephone:<BR>
<TD ALIGN=left><BR>
<FORM method="POST" action="/cgi-bin/myscript.cgi">
<INPUT NAME="name" SIZE=18><BR>
<INPUT NAME="cardnum" SIZE=18><BR>
<INPUT NAME="expires-month" SIZE=2>/
<INPUT NAME="expires-year" SIZE=2><BR>
<INPUT NAME="phone" SIZE=18>
</FORM>
</TABLE>
Form Methods
 The method attribute on the form tag specifies how the Web
Browser should send the data to the Web server.

 Two options:
 GET: pass the data in an HTTP GET request
 POST: pass the data in an HTTP POST request

 In an HTTP GET request, the browser appends the form data to a
URL. For example:

 http://www.yorku.ca/jhuang/doquery.cgi?name=joe+bloggs&id=1234

 Note how the variable names & values are appended to the URL.
Any spaces in a value are converted to +.
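The encoding the browser performs can be reproduced with Python's standard `urllib.parse.urlencode`. This sketch rebuilds the example URL above from the two form variables; the variable names `form_data` and `action` are chosen here for illustration.

```python
from urllib.parse import urlencode

# The form's variables and the values the user typed in.
form_data = {"name": "joe bloggs", "id": "1234"}
action = "http://www.yorku.ca/jhuang/doquery.cgi"

# name=value pairs are joined with "&"; spaces in values become "+".
url = action + "?" + urlencode(form_data)
print(url)
```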
Form Actions
 The action attribute on the form tag specifies what the
Web server should do with the form data.

 Common options:
 email the data to someone [the mailto action]
 pass the data to a script or program

 The script is passed the variables & values, and
can then process them.

 For example, the CGI script could use the name & id to
look up student info in a database.
Form Actions
 <form method="GET" action="mailto:jhuang@yorku.ca">

 Until you can actually use scripts on the server, use the
mailto action. It operates in the same way as the mailto that
you have used in the HTML document.

 When used in a form, the mailto action will send an email
to the email address of the person specified. The mailto
action is of limited use for complicated forms but works
adequately for simple forms.

 The email received contains all of the names and values in
one long list.
What a CGI will get

 The query (from the environment
variable QUERY_STRING) will be a
URL-encoded string containing the
name, value pairs of all form fields.

 The CGI must decode the query and
separate the individual fields.
GET vs. POST
 The GET method delivers data (query) as
part of the URL

 When using forms, it’s generally better to
use POST:
 there are limits on the maximum size of a GET
query string (environment variable)
 a POST query string doesn’t show up in the
browser as part of the current URL
CGI reading POST

 If REQUEST_METHOD is POST, the
query arrives on STDIN.

 The environment variable
CONTENT_LENGTH tells us how much
data to read.
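Both cases can be handled by one small routine. The following is a sketch of how a CGI program written in Python might read its form data, assuming a hypothetical helper `read_form`; the environment and input stream are passed in as parameters so that the logic can be tried without a web server.

```python
import io
import os
import sys
from urllib.parse import parse_qs


def read_form(environ=os.environ, stdin=None):
    """Return the form fields as a dict, for either a GET or a POST request."""
    method = environ.get("REQUEST_METHOD", "GET").upper()
    if method == "POST":
        # POST: read exactly CONTENT_LENGTH bytes of query from standard input.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        stream = stdin if stdin is not None else sys.stdin.buffer
        query = stream.read(length).decode("utf-8")
    else:
        # GET: the query is already in the QUERY_STRING environment variable.
        query = environ.get("QUERY_STRING", "")
    # Decode the URL-encoded string and separate the fields (first value of each).
    return {name: values[0] for name, values in parse_qs(query).items()}


# Simulated requests: the dicts stand in for the variables set by the web server.
get_fields = read_form({"REQUEST_METHOD": "GET", "QUERY_STRING": "name=joe+bloggs&id=1234"})
body = b"lastName=oswald"
post_fields = read_form(
    {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": str(len(body))}, io.BytesIO(body)
)
print(get_fields, post_fields)
```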
CGI Method Summary
 GET:
 REQUEST_METHOD is “GET”
 QUERY_STRING is the query

 POST:
 REQUEST_METHOD is “POST”
 CONTENT_LENGTH is the size of the query
(in bytes)
 query can be read from STDIN
HTTP Form Processing
1. User fills in form & clicks submit
2. Browser sends GET
http://www.yorku.ca/jhuang/doquery.cgi?name=joe+bloggs&id=1234
3. Server runs the script doquery.cgi, passing form data to it
4. Server sends script results to Browser
5. Browser displays the script results*

*The script results will usually be HTML text


A More Complex Form Example

Text field

Password field

Radio buttons

Drop-down list

Check boxes

Text area

Buttons
Form Processing & Results
 The easiest way to deal with form data is to simply email it to an
email address using a mailto form action:

 <form method="POST" action="mailto:name@where.com">

 More often, we want to process the data on the server-side, using
a program or script.

 The old way is to use a so-called CGI Script, usually with a URL
something like:

 <form method="POST" action="/cgi-bin/myscript.cgi">

 The newer way is to use a server-side technology
such as Servlets, JSP, or ASP. We’ll look at how to use Servlets
later in the course...
Alternatives for Generating
Dynamic Pages
Can pages be generated dynamically in other ways?
 Java Servlets

 Java Server Pages

 Active Server Pages (ASP)


Dynamic Web Pages

[Figure: techniques for dynamic Web pages. Server side (WWW server): SSI, CGI program, server API, script embedded in HTML, Java servlet, other program (application). Client side (WWW/HTTP client): Java applet]


SGML

Standard Generalized Markup Language
Some History

 SGML

 HTML

 XML and XHTML


SGML

 Standard Generalized Markup Language
 Developed by a committee!
 Led by Charles Goldfarb, 1978-1986
 A grammar to define the structure of documents
 Rules define the construct or structure
 Terminals are <tags> and strings
HTML & XML
 HTML is an application of SGML with a single shared
DTD

 HTMLDOC::=(<html> HEAD BODY </html>)

 XML is a subset of SGML with many DTDs
allowed
XML
Uses tags to identify semantics of data
 looks like HTML, but isn’t
<slide><title>Introduction</title>
<author><first>Jimmy</first>
<last>Huang</last>
</author>
<content>XML this and that</content>
</slide>
 is license free, platform-independent and
well-supported
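Because the tags describe the meaning of the data, a program can pick out individual fields by name. Here is a minimal sketch using Python's standard `xml.etree.ElementTree` on the slide example above; the variable names are chosen for illustration.

```python
import xml.etree.ElementTree as ET

slide_xml = """<slide><title>Introduction</title>
<author><first>Jimmy</first>
<last>Huang</last>
</author>
<content>XML this and that</content>
</slide>"""

# Parse the document into a tree of elements.
slide = ET.fromstring(slide_xml)

# Look up values by tag name (paths are relative to the <slide> element).
title = slide.findtext("title")
author = f'{slide.findtext("author/first")} {slide.findtext("author/last")}'
print(title, "by", author)
```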
HTML

 Hypertext Markup Language
 Presents documents via WWW browsers
 Specifies document layout and hyperlinks
 Predefines a set of tags (i.e. a common DTD)
HTML: An Example
<HTML>
<TITLE>Statistics Canada</TITLE>
<BODY>
<H3>Welcome to Stats Canada</H3>
Statistics Canada ……. . <p> We like numbers…..
<img src="mapleleaf.gif">
<ul>What we do
<li><a href="census.html">Census</a>
<li><a href="special.html">Special surveys</a>
<li><a href="online.html">Online data</a>
</ul>
</BODY>
</HTML>
HTML
HTML - Advantages
 Simple - fixed set of tags
 Portable - used with all browsers
 Linking - within and to external documents

HTML - Disadvantages
 Limited tag set
 Can’t separate the presentation from content
 Can’t define structure of contents
XHTML

EXtensible HyperText Markup Language
XHTML Basics
 Very few real changes from HTML
 But more strict

 All tags are in lowercase


 All tags must be closed
 Empty tags
 Paired tags
XHTML Document Structure
Overlap versus Nesting
XHTML tags
 Start tags and end tags
 Start tags - delimited by < and >
 End tags - delimited by </ and >
 <h1>This is a Large Heading</h1>
 <br />This text starts on a new line.

 Some start tags also include attributes which
further define information about the element.
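The "all tags must be closed" and "nesting, not overlap" rules are the same rules an XML parser enforces, so they can be checked mechanically. The sketch below uses Python's standard `xml.etree.ElementTree`; the helper `is_well_formed` and the wrapping `<div>` are assumptions for illustration, and the check covers XML well-formedness only, not the XHTML DTD (it does not, for example, reject uppercase tag names).

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Return True if the markup fragment closes and nests its tags correctly."""
    try:
        # Wrap the fragment in one root element so the parser sees a single document.
        ET.fromstring(f"<div>{fragment}</div>")
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<br />This text starts on a new line."))  # empty tag, closed
print(is_well_formed("<br>This text starts on a new line."))    # empty tag, never closed
print(is_well_formed("<b><i>nested</i></b>"))                   # properly nested
print(is_well_formed("<b><i>overlapping</b></i>"))              # tags overlap
```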
!DOCTYPE
 HTML 3.2
 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Draft//EN">
 Netscape's HTML standard
 <!DOCTYPE HTML PUBLIC "-//WebTechs//DTD Mozilla HTML 2.0//EN">
 Not strictly necessary for HTML, but highly recommended
 Future browsers can still attempt to display your older documents
(written to previous HTML standards) in the way that was
originally intended, even though the HTML language may have
evolved
 XHTML
 <?xml version = "1.0"?>
 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
!DOCTYPE
The example below shows, in order: the !DOCTYPE, a comment, the title tags, and the body tags.

<?xml version = "1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<!-- Comments: name_of_webpage.html -->
<html xmlns = "http://www.w3.org/1999/xhtml">
<head>
<title> Web Engineering: XHTML I </title>
</head>

<body>
<p>Welcome to XHTML!</p>
</body>
</html>
Images
The value of the src attribute of the image element is the location of the image file.
The height and width attributes of the image element give the height and width of the image.
The value of the alt attribute gives a description of the image. This description
is displayed if the image cannot be displayed.

<?xml version = "1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<!-- Pictures with XHTML -->
<html xmlns = "http://www.w3.org/1999/xhtml">
<head>
<title>Web Engineering - pictures</title>
</head>
<body>
<p><img src = "angelheart.jpg" height = "251" width = "367"
alt = "An angel" />
<img src = "grail.jpg" height = "180" width = "130"
alt = "A chalice" /></p>
</body>
</html>
Colours
 <BODY TEXT="aqua">

aqua black blue fuchsia
gray green lime maroon
navy olive purple red
silver teal white yellow
 <BODY TEXT="#00FF00">
 <FONT COLOR="#rrggbb" | "colour name">
text</FONT>

000000 BLACK
00FF00 BRIGHT-GREEN
FFFFFF WHITE
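In the #rrggbb notation, each pair of hexadecimal digits gives the amount of red, green, and blue on a scale from 00 to FF (0 to 255). A small sketch, assuming a hypothetical helper `hex_to_rgb`, decodes the three examples above:

```python
def hex_to_rgb(colour):
    """Convert a colour such as "#00FF00" into (red, green, blue) values from 0 to 255."""
    digits = colour.lstrip("#")
    if len(digits) != 6:
        raise ValueError(f"expected six hexadecimal digits, got {colour!r}")
    # Read the string two digits at a time: rr, gg, bb.
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


for code in ("000000", "00FF00", "FFFFFF"):
    print(code, hex_to_rgb(code))
```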
Inline Styles
<h1 style="color:blue; font-style: italic">First
Stylesheet Example</h1>

<p>The first example of stylesheets uses an inline
style.</p>

<h1>Second Stylesheet Example</h1>

<p>The second example of stylesheets uses a document-level
style.</p>

<h1>Third Stylesheet Example</h1>

<p>The third example of stylesheets uses an external
stylesheet.</p>
Demonstration:
inline_css.html
XML

EXtensible Markup Language

XML Introduction
 The Extensible Markup Language (XML) is a document
processing standard proposed by the World Wide Web
Consortium (W3C), which is related to Standard
Generalized Markup Language (SGML).
 Possible to search, sort, manipulate and render XML
using Extensible Stylesheet Language (XSL).

 Highly portable
 Files end in the .xml extension.
XML & W3C
• XML's development can be traced back to the 1960s through its parent,
SGML (Standard Generalized Markup Language), which is also the parent of
HTML

• XML is a streamlined version of SGML designed for transmission of structured
data over the Web by a working group in the World Wide Web Consortium
(W3C) in 1996

• Passed as a W3C standard in Feb 1998

- www.w3.org/xml
- www.xml.com/axml/axml.html (annotated version)
XML-related Technologies
 DTD (Document Type Definition) and XML Schemas are
used to define legal XML tags and their attributes for
particular purposes

 CSS (Cascading Style Sheets) describe how to display
HTML or XML in a browser

 XSLT (eXtensible Stylesheet Language Transformations)
and XPath are used to translate from one form of XML to
another

 DOM (Document Object Model), SAX (Simple API for
XML), and JAXP (Java API for XML Processing) are all
APIs for XML parsing
From HTML to XML..
•HTML major drawback – information loses its
structure when translated into HTML
•HTML is a presentation-oriented markup language,
so information embodied in it is difficult to process
•Information and knowledge servers are overloaded
since we have to search information and perform
format processing
•Servers often answer the same request many times
if users request several views on the same data
From HTML to XML..
• HTML:
-Lacks extensibility – can’t create tags or attributes
to parameterise or semantically qualify data
-Lacks structure – does not support the
specification of deep structures needed to represent
database schemas or object-oriented hierarchies
-Lacks validation – does not support language
specification that lets applications check imported
data’s structural validity
XML Goals
As a portable, platform-independent data storage format, XML should:

• support a wide variety of applications,
• be easy to use across the Internet,
• be compatible with SGML,
• make it easy to create programs that process XML,
• be clear and legible (self-describing),
• XML documents should be easy to create
• XML designs should be quickly prepared, formal & concise etc.
XML..
• XML is not for displaying information but for managing
information.
• A working group of the World Wide Web Consortium (W3C) created
XML as a standard for creating markup languages.
• Designed for distributing structured documents over the web
• A kind of “light” SGML (Standard Generalized Markup Language)
simplified to meet Web requirements
• Unlike HTML, XML lets users:
 Extract data from a document
 Define their own tags and attributes
 Define data structures and nest document structures to any
complexity level
 Make applications that validate a document's structure. Any XML
document can contain an optional description of its grammar for use by
applications that perform structural validation
XML..
 The problem that XML helps us to solve is how to transfer data
between servers, or between the client and the server.
 It is a markup language for describing structured data; content is
separated from presentation.
 XML documents contain only data
 Applications decide how to display the data
 Language for creating markup languages
 Can create new tags
 XML documents contain only data, not formatting instructions, so
applications that process XML documents must decide how to display
the document's data.
 For example, a PDA (personal digital assistant) may render an XML
document differently than a wireless phone or desktop computer would
render that document.
HTML and XML
XML stands for eXtensible Markup Language

HTML | XML
HTML is used to mark up text so it can be displayed to users | XML is used to mark up data so it can be processed by computers
HTML describes both structure (e.g. <p>, <h2>, <em>) and appearance (e.g. <br>, <font>, <i>) | XML describes only content, or “meaning”
HTML uses a fixed, unchangeable set of tags | In XML, you make up your own tags
XML..
 XML is a meta-language
 With HTML, existing markup is static: <HEAD> and <BODY>,
for example, are tightly integrated into the HTML standard and
cannot be changed, and are extremely difficult to extend.
 XML, on the other hand, allows you to create your own markup
tags and configure each to your liking: for example
 <WebEngHeading>
 <WebEngSummary>
 <WebEngReallyWildFont>
 Each of these elements can be defined through user-defined
document type definitions (DTDs), and stylesheets can be applied to
one or more XML documents.
 There are no ‘correct’ tags for an XML document, except those
defined by the author
Some Code
 Schema
 Entity: Passport Details
 SubEntities: Last Name, First Name, Address
 Entity: Address
 SubEntities: Street, City, Town, State, Province, ……..
DTD
<!ELEMENT passport_details (last_name,first_name+,address)>
<!ELEMENT last_name (#PCDATA)>
<!ELEMENT first_name (#PCDATA)>
<!ELEMENT address
(street,(city|town),(state|province),(ZIP|postal_code),country,contact_no?,email*)>
<!ELEMENT street (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT town (#PCDATA)>
<!ELEMENT state (#PCDATA)>
<!ELEMENT province (#PCDATA)>
<!ELEMENT ZIP (#PCDATA)>
<!ELEMENT postal_code (#PCDATA)>
<!ELEMENT country (#PCDATA)>
<!ELEMENT phone_home (#PCDATA)>
<!ELEMENT email (#PCDATA)>
Internal DTD and
Instance
<?xml version='1.0'?>
<!DOCTYPE passport_details [
<!ELEMENT passport_details <passport_details>
(last_name,first_name+,address)> <last_name>Smith</last_name>
<!ELEMENT last_name (#PCDATA)> <first_name>Jo</first_name>
<!ELEMENT first_name (#PCDATA)>
<first_name>Stephen</first_name>
<!ELEMENT address
(street,(city|town),(state|province) <address>
,(ZIP|postal_code),country,contact_no?,email*)> <street>1 Great Street</street>
<!ELEMENT street (#PCDATA)> <city>GreatCity</city>
<!ELEMENT city (#PCDATA)> <state>GreatState</state>
<!ELEMENT town (#PCDATA)>
<postal_code>1234</postal_code>
<!ELEMENT state (#PCDATA)>
<country>GreatLand</country>
<!ELEMENT province (#PCDATA)>
<!ELEMENT ZIP (#PCDATA)> <email>jhuang@yorku.ca</email>
<!ELEMENT postal_code (#PCDATA)> </address>
<!ELEMENT country (#PCDATA)> </passport_details>
<!ELEMENT phone_home (#PCDATA)>
<!ELEMENT email (#PCDATA)>
]>
Shared
DTD specifies the DTD
XML Document
<?xml version='1.0'?>

<!DOCTYPE passport_details SYSTEM "PassportExt.dtd">

<passport_details>
<last_name>Smith</last_name>
<first_name>Jo</first_name>
<first_name>Stephen</first_name>
<address>
<street>1 Great Street</street>
<city>GreatCity</city>
<state>GreatState</state>
<postal_code>1234</postal_code>
<country>GreatLand</country>
<email>jo@theworldaccordingtojo.com</email>
</address>
</passport_details>
XML Examples
 XML Source File
 http://www.yorku.ca/jhuang/xml/04.adhoc.topics.xml

 XML Style language


 http://www.yorku.ca/jhuang/xml/04.adhoc.topics.xsl

 Parsing and rendering XMLwith IE5+


 http://www.yorku.ca/jhuang/xml/04.adhoc.topics_xsl.xml
XML Applications
 XML permits document authors to create markup for
virtually any type of information.
 Authors can create
describing markup of
entirely newtypes
specific languages
data, for
including mathematical
formulas, chemical molecular structures,
music, recipes etc.
- XHTML
- VoiceXML (for speech)
- MathML (for mathematics)
- SMIL (the Synchronous Multimedia Integration Language, for
multimedia presentations)
- CML (Chemical Markup Language, for chemistry)
- XBRL (Extensible Business Reporting Language, for financial
XML Parsers
 Processing an XML document requires a software program
called an XML parser (or processer). These are available at
no charge in many languages (Java, Python, C++ etc.).

http://www.xml.com/programming/

 Parsers check an XML documents syntax and enable software


programs to process marked-up data. XML parsers can
support the Document Object Model (DOM) or the Simple
API for XML (SAX).
 DOM: Build a tree structure containing the XML
document’s data
 SAX: Process the document and generate events
XML-related Vocabulary
 SGML: Standard Generalized Markup Language
 XML : Extensible Markup Language
 DTD: Document Type Definition
 element: a start and end tag, along with their contents
 attribute: a value given in the start tag of an element
 entity: a representation of a particular character or string
 PI: a Processing Instruction, to possibly be used by a program
that processes this XML
 namespace: a unique string that references a DTD
 well-formed XML: XML that follows the basic syntax rules
 valid XML: well-formed XML that conforms to a DTD
Outline of Today’s Class
 SGML, HTML and XHTML

 XML and DTD

 XML Examples

 The Framework of WWW


SGML
• Standard Generalized Markup Language
• Developed by a committee!
• Led by Charles Goldfarb, 1978-1986
• A grammar to define the structure of documents
• Rules define the construct or structure
• Terminals are <tags> and strings
HTML & XML
• HTML is an application of SGML with a single shared DTD
• HTMLDOC ::= (<html> HEAD BODY </html>)
• XML is a subset of SGML that allows many DTDs
XML
Uses tags to identify semantics of data
 looks like HTML, but isn’t
<slide><title>Introduction</title>
<author><first>Jimmy</first>
<last>Huang</last>
</author>
<content>XML this and that</content>
</slide>
 is license free, platform-independent and
well-supported
HTML
• Hypertext Markup Language
• Presents documents via WWW browsers
• Specifies document layout and hyperlinks
• Predefines a set of tags (i.e. a common DTD)
HTML
HTML - Advantages
 Simple - fixed set of tags
 Portable - used with all browsers
 Linking - within and to external documents

HTML - Disadvantages
 Limited tag set
 Can’t separate the presentation from content
 Can’t define structure of contents
XHTML

EXtensible HyperText Markup Language
XHTML Basics
 Very few real changes from HTML
 But more strict

 All tags are in lowercase


 All tags must be closed
 Empty tags
 Paired tags
XHTML Document Structure
Overlap versus Nesting
XHTML tags
 Start tags and end tags
 Start tags - delimited by < and >
 End tags - delimited by </ and >
 <h1>This is a Large Heading</h1>
 <br>This text starts on a new line.

 Some start tags also include attributes which


further define information about the element.
!DOCTYPE
• HTML 3.2
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Draft//EN">
• Netscape's HTML standard
<!DOCTYPE HTML PUBLIC "-//WebTechs//DTD Mozilla HTML 2.0//EN">
• Not strictly necessary for HTML, but highly recommended
- Future browsers can still attempt to display your older documents
(written to previous HTML standards) in the way that was
originally intended, even though the HTML language may have
evolved
• XHTML
<?xml version = "1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
!DOCTYPE, title tags and body tags
<?xml version = "1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<!-- Comments: name_of_webpage.html -->
<html xmlns = "http://www.w3.org/1999/xhtml">
<head>
<title> Web Engineering: XHTML I </title>
</head>
<body>
<p>Welcome to XHTML!</p>
</body>
</html>
Images
<?xml version = "1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<!-- Pictures with XHTML -->
<html xmlns = "http://www.w3.org/1999/xhtml">
<head>
<title>Web Engineering - pictures</title>
</head>
<body>
<p><img src = "angelheart.jpg" height = "251" width = "367"
alt = "An angel" />
<img src = "grail.jpg" height = "180" width = "130"
alt = "A chalice" /></p>
</body>
</html>
• The value of the src attribute of the image element is the
location of the image file.
• The height and width attributes of the image element give the
height and width of the image.
• The value of the alt attribute gives a description of the image.
This description is displayed if the image cannot be displayed.
Colours
• <BODY TEXT="aqua">
Colour names: aqua, black, blue, fuchsia, gray, green, lime, maroon,
navy, olive, purple, red, silver, teal, white, yellow
• <BODY TEXT="#00FF00">
• <FONT COLOR = "#rrggbb" | "colour name">text</FONT>
000000 = BLACK, 00FF00 = BRIGHT-GREEN, FFFFFF = WHITE
Inline Styles
<h1 style="color:blue; font-style: italic">First Stylesheet Example</h1>
<p>The first example of stylesheets uses an inline style.</p>

<h1>Second Stylesheet Example</h1>
<p>The second example of stylesheets uses a document-level style.</p>

<h1>Third Stylesheet Example</h1>
<p>The third example of stylesheets uses an external stylesheet.</p>

Demonstration: inline_css.html
XML

EXtensible Markup Language
XML Introduction
 The Extensible Markup Language (XML) is a document
processing standard proposed by the World Wide Web
Consortium (W3C), which is related to Standard
Generalised Markup Language (SGML).
 Possible to search, sort, manipulate and render XML
using the Extensible Stylesheet Language (XSL).

 Highly portable
 Files end in the .xml extension.
XML & W3C
• XML's roots go back to the 1960s through its parent, SGML (Standard
Generalized Markup Language), which is also the parent of HTML
• XML is a streamlined version of SGML, designed for transmission of
structured data over the Web by a working group in the World Wide Web
Consortium (W3C) in 1996
• Passed as a W3C standard in Feb 1998
- www.w3.org/xml
- www.xml.com/axml/axml.html (annotated version)
XML-related Technologies
 DTD (Document Type Definition) and XML Schemas are
used to define legal XML tags and their attributes for
particular purposes

 CSS (Cascading Style Sheets) describe how to display


HTML or XML in a browser

 XSLT (eXtensible Stylesheet Language Transformations)


and XPath are used to translate from one form of XML to
another

 DOM (Document Object Model), SAX (Simple API for


XML), and JAXP (Java API for XML Processing) are all
APIs for XML parsing
From HTML to XML..
•HTML major drawback – information loses its
structure when translated into HTML
•HTML is a presentation-oriented markup language,
so information embodied in it is difficult to process
•Information and knowledge servers are overloaded
since we have to search information and perform
format processing
•Servers often answer the same request many times
if users request several views on the same data
From HTML to XML..
• HTML:
-Lacks extensibility – can’t create tags or attributes
to parameterise or semantically qualify data
-Lacks structure – does not support the
specification of deep structures needed to represent
database schemas or object-oriented hierarchies
-Lacks validation – does not support language
specification that lets applications check imported
data’s structural validity
XML Goals
As a portable, platform-independent data storage format, XML should be:
• able to support a wide variety of applications,
• easy to use across the Internet,
• compatible with SGML,
• easy to write processing programs for,
• clear and legible (self-describing).
• XML documents should be easy to create
• XML designs should be quickly prepared, formal & concise etc.
XML
• XML is not for displaying information but for managing
information.
•Working group of World Wide Web Consortium (W3C) created
XML as a standard for creating markup languages.
• Designed it for distributing structured documents over the web
• A kind of "light" SGML (Standard Generalized Markup Language)
simplified to meet Web requirements
• Unlike HTML, XML lets users:
 Extract data from a document
 Define their own tags and attributes
 Define data structures and nest document structures to any
complexity level
 Make applications that validate a documents structure. Any XML
document can contain an optional description of its grammar for use by
applications that perform structural validation
XML
 The problem that XML helps us to solve is how to transfer data
between servers, or between the client and the server.
 It is a Markup language for describing structured data – content is
separated from presentation.
 XML documents contain only data
 Applications decide how to display the data
 Language for creating markup languages
 Can create new tags
 XML documents contain only data, not formatting instructions, so
applications that process XML documents must decide how to display
the document's data.
 For example a PDA (personal digital assistant) may render an XML
document differently than a wireless phone or desktop computer would
render that document.
HTML and XML
XML stands for eXtensible Markup Language

HTML | XML
HTML is used to mark up text so it can be displayed to users | XML is used to mark up data so it can be processed by computers
HTML describes both structure (e.g. <p>, <h2>, <em>) and appearance (e.g. <br>, <font>, <i>) | XML describes only content, or "meaning"
HTML uses a fixed, unchangeable set of tags | In XML, you make up your own tags
XML
• XML is a meta-language
• With HTML, existing markup is static: <HEAD> and <BODY>,
for example, are tightly integrated into the HTML standard and
cannot be changed, and the tag set is extremely difficult to extend.
• XML, on the other hand, allows you to create your own markup
tags and configure each to your liking: for example
- <WebEngHeading>
- <WebEngSummary>
- <WebEngReallyWildFont>
• Each of these elements can be defined through user-defined
document type definitions (DTDs), and stylesheets can be applied to
one or more XML documents.
• There are no 'correct' tags for an XML document, except those
defined by the author
Some Code
• Schema
• Entity: Passport Details
- SubEntities: Last Name, First Name, Address
• Entity: Address
- SubEntities: Street, City, Town, State, Province, ……..
DTD
<!ELEMENT passport_details (last_name,first_name+,address)>
<!ELEMENT last_name (#PCDATA)>
<!ELEMENT first_name (#PCDATA)>
<!ELEMENT address
(street,(city|town),(state|province),(ZIP|postal_code),country,contact_no?,email*)>
<!ELEMENT street (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT town (#PCDATA)>
<!ELEMENT state (#PCDATA)>
<!ELEMENT province (#PCDATA)>
<!ELEMENT ZIP (#PCDATA)>
<!ELEMENT postal_code (#PCDATA)>
<!ELEMENT country (#PCDATA)>
<!ELEMENT contact_no (#PCDATA)>
<!ELEMENT email (#PCDATA)>
Internal DTD and Instance

<?xml version='1.0'?>
<!DOCTYPE passport_details [
<!ELEMENT passport_details (last_name,first_name+,address)>
<!ELEMENT last_name (#PCDATA)>
<!ELEMENT first_name (#PCDATA)>
<!ELEMENT address
(street,(city|town),(state|province),(ZIP|postal_code),country,contact_no?,email*)>
<!ELEMENT street (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT town (#PCDATA)>
<!ELEMENT state (#PCDATA)>
<!ELEMENT province (#PCDATA)>
<!ELEMENT ZIP (#PCDATA)>
<!ELEMENT postal_code (#PCDATA)>
<!ELEMENT country (#PCDATA)>
<!ELEMENT contact_no (#PCDATA)>
<!ELEMENT email (#PCDATA)>
]>
<passport_details>
<last_name>Smith</last_name>
<first_name>Jo</first_name>
<first_name>Stephen</first_name>
<address>
<street>1 Great Street</street>
<city>GreatCity</city>
<state>GreatState</state>
<postal_code>1234</postal_code>
<country>GreatLand</country>
<email>jhuang@yorku.ca</email>
</address>
</passport_details>
Shared DTD
XML Document specifies the DTD
<?xml version='1.0'?>

<!DOCTYPE passport_details SYSTEM "PassportExt.dtd">

<passport_details>
<last_name>Smith</last_name>
<first_name>Jo</first_name>
<first_name>Stephen</first_name>
<address>
<street>1 Great Street</street>
<city>GreatCity</city>
<state>GreatState</state>
<postal_code>1234</postal_code>
<country>GreatLand</country>
<email>jo@theworldaccordingtojo.com</email>
</address>
</passport_details>
XML Examples
• XML source file
http://www.yorku.ca/jhuang/xml/04.adhoc.topics.xml
• XML style language
http://www.yorku.ca/jhuang/xml/04.adhoc.topics.xsl
• Parsing and rendering XML with IE5+
http://www.yorku.ca/jhuang/xml/04.adhoc.topics_xsl.xml
XML Applications
 XML permits document authors to create markup for
virtually any type of information.
 Authors can create entirely new markup languages for
describing specific types of data, including mathematical
formulas, chemical molecular structures, music, recipes etc.
- XHTML
- VoiceXML (for speech)
- MathML (for mathematics)
- SMIL (the Synchronized Multimedia Integration Language, for
multimedia presentations)
- CML (Chemical Markup Language, for chemistry)
- XBRL (Extensible Business Reporting Language, for financial
data exchange)
XML Parsers
• Processing an XML document requires a software program
called an XML parser (or processor). These are available at
no charge in many languages (Java, Python, C++ etc.).
https://www.w3schools.com/xml/xml_parser.asp
• Parsers check an XML document's syntax and enable software
programs to process marked-up data. XML parsers can
support the Document Object Model (DOM) or the Simple
API for XML (SAX).
• DOM: builds a tree structure containing the XML
document's data
• SAX: processes the document and generates events
In Brief .. XML is for Data Exchange
• Very frequently companies need to exchange data among dissimilar
systems, locations, software, hardware, data formats etc.
• Data is stored in different formats. Data that is not stored in databases
(unstructured data) is difficult to exchange and often requires custom software
• Data can be interchanged in various ways
- agree on a totally custom format
- agree on a proprietary system
- use a standard data format
• XML provides a standardised format for data and techniques for
generating, validating, formatting, transforming and extracting it
When Do You Use It?

• XML is good for exchanging data between dissimilar systems
• If data exchange only occurs between similar systems,
XML may not be the right choice!

B2B
• XML is frequently used in B2B applications
- B2B means that two companies are exchanging data
- also one company exchanging data between different locations
- agreement on the format (through DTD, XML Schema) of messages

B2C
• Business-to-Consumer involves sending XML directly to the client
• Data sent directly to the client needs a style (XSL) applied
• Applying style is best accomplished on the server side
Document Structure
• Three distinct parts
- Prolog, e.g. <?xml version="1.0" encoding="UTF-8"?>
- Root Element
- Miscellaneous Section
• Prolog contains instructions that apply to the entire document
(such as the XML declaration, DTD)
• Root element is a single element that encloses all of the data
• Miscellaneous is not recommended but still included in the
standard
XML Element Structure
[Figure: tree diagram of an XML document. The root element contains
child elements, each of which may contain further child elements.]
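XML Elements
- have the same overall structure
- can contain sub-elements

<Student Sex = "Male"> SomeData </Student>

• ELEMENT: the whole construct, from start tag to end tag
• START TAG: <Student Sex = "Male">, with NAME Student and ATTRIBUTE Sex = "Male"
• CONTENTS: SomeData, which is PCDATA (Parsed Character Data)
• END TAG: </Student>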
Element vs. Attribute based XML
1. Element based:
<student>
<id> 9906789 </id>
<name>Adam</name>
<email>adam@unl.ac.uk</email>
</student>

2. Mixed (id as an attribute):
<student id = "9906789">
<name>Adam</name>
<email>adam@unl.ac.uk</email>
</student>

3. Attribute based:
<student id = "9906789" name="Adam" email="adam@yorku.ca"> </student>

Which is better? NO RIGHT ANSWER!

Some issues to consider
- elements can have substructures, but attributes cannot
- ID attributes can be easily located and processed
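The trade-off above can be tried out with a short sketch. It assumes Python's standard `xml.etree.ElementTree` module; the helper name `read_student` is hypothetical and is only there to show that a program can read the same fields from either layout.

```python
import xml.etree.ElementTree as ET

# Layout 1: every field is a child element.
element_based = """
<student>
  <id> 9906789 </id>
  <name>Adam</name>
  <email>adam@unl.ac.uk</email>
</student>
"""

# Layout 3: every field is an attribute of <student>.
attribute_based = '<student id="9906789" name="Adam" email="adam@unl.ac.uk"/>'


def read_student(xml_text):
    """Return (id, name, email), accepting either layout (hypothetical helper)."""
    student = ET.fromstring(xml_text)

    def field(key):
        # Prefer an attribute; fall back to the text of a child element.
        value = student.get(key)
        if value is None:
            value = student.findtext(key)
        return value.strip() if value is not None else None

    return tuple(field(key) for key in ("id", "name", "email"))


print(read_student(element_based))
print(read_student(attribute_based))
```

Attributes are read with `get`, child elements with `findtext`; only the element form could later grow substructure (for example a `<name>` split into first and last name).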
XML Document
(another sample .xml file)

<?xml version = "1.0"?>
<!-- article.xml -->
<books>
<author>
<title> Introduction to Computer Graphics </title>
<date>1995</date>
<fname>James</fname>
<lname>Foley</lname>
</author>
<author>
<title> Principles of Database Systems </title>
<date month="February">2000</date>
<fname>Greg</fname>
<lname>Riccardi</lname>
</author>
</books>
<!-- This is a list of students -->

• Prolog: the <?xml ...?> declaration; root element: <books>;
attribute: month="February"; Miscellaneous: the trailing comment.
• The document structures data with the 'books' element
as the root node.
• The root node contains elements (e.g. author).
• Each element further contains child nodes that describe data.
• <books>, <author>, <title> etc. are customised tags
describing data.
XML Syntax
• XML elements must be enclosed within start and end tags
<title> Introduction to Computer Graphics </title>
If there is no data inside the element, the tag can end with '/>'
<title/> which is the same as <title></title>

• Attribute values must be enclosed within quotes (double or single):
<date month="February">2000</date>

• Element tags are case sensitive: <author> Adam</Author> is incorrect

• XML tags must be nested in the correct order:
<books> <author> …</books> </author> Bad
<books> <author> …</author> </books> Good

XML is therefore very rigid in enforcing syntax compared to HTML (which
is very forgiving)

• A "well-formed" document follows all these rules
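The rules above can be checked mechanically, since any XML parser rejects a document that breaks them. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the function name `is_well_formed` and the sample labels are illustrative assumptions, and the snippets are taken from this slide.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


samples = {
    "closed element": "<title> Introduction to Computer Graphics </title>",
    "empty element": "<title/>",
    "quoted attribute": '<date month="February">2000</date>',
    "unquoted attribute": "<date month=February>2000</date>",
    "case mismatch": "<author> Adam</Author>",
    "bad nesting": "<books> <author> ...</books> </author>",
    "good nesting": "<books> <author> ...</author> </books>",
}

for label, text in samples.items():
    print(f"{label:20s} well-formed: {is_well_formed(text)}")
```

Note that this only tests well-formedness. Checking a document against a DTD (validity) is a separate step.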


DTD: Document Type Definition
• The XML sample document shown earlier follows syntax rules only. It is
therefore called a well-formed document
• It can also be made to follow strict grammar rules for enforcing the
structure
• A DTD specifies grammar rules for an XML document
- several XML documents prepared from various sources can be
validated using a single set of grammar rules
• An XML document that adheres to a DTD is called valid. A valid
document has stronger structure than a well-formed document
• A DTD specifies rules for elements and how they can be
expanded into sub-elements (child nodes)
• A DTD consists of element declarations, attribute lists, data types etc.
• DTDs are based on SGML; difficult to create!
DTD Nested Elements
Define the list
<!ELEMENT author (date, title, fname, lname)>

• The author grammar indicates that it is made up of four elements, defined as below:
<!ELEMENT date (#PCDATA)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT fname (#PCDATA)>
<!ELEMENT lname (#PCDATA)>

• Each element may have attributes that contain information about its content
e.g. <date month="February">2000</date>
• An element's attribute list can be defined using the ATTLIST tag:
syntax: <!ATTLIST element_name attribute_name type default_value>

<!ATTLIST date month CDATA #IMPLIED>

Specifies the month attribute of the element date. CDATA means that it
is a character string (non-parsed character data). #IMPLIED means the
attribute is optional; if it is not specified, no default value is supplied.
Other options:
#REQUIRED: the XML author must provide the attribute value
#FIXED: the attribute value is fixed and cannot be modified by the user
External DTD in XML document
• Any external DTD specification can be used by several XML documents
Example: <!DOCTYPE books SYSTEM "author.dtd">
books is the root element of the document. SYSTEM specifies the
DTD file.
• External subset (specified in XML using the SYSTEM or PUBLIC keywords)

author.dtd
<!-- DTD for books: author.dtd -->
<!ELEMENT books (author+)>
<!ELEMENT author (date, title, fname, lname)>
<!ELEMENT date (#PCDATA)>
<!ATTLIST date month CDATA #IMPLIED>
<!ELEMENT title (#PCDATA)>
<!ELEMENT fname (#PCDATA)>
<!ELEMENT lname (#PCDATA)>

authorDtd.xml (using a DTD)
<!DOCTYPE books SYSTEM "author.dtd">
<books>
<author>
<date>1995</date>
<title> Introduction to Computer Graphics </title>
<fname>James</fname>
<lname>Foley</lname>
</author>
…..
</books>
Including DTD in XML document - Internal/Inline
• Introduced into XML using the document type declaration (DOCTYPE)
• Internal subset: the declarations between [ and ]

inLineDtdExample.xml
<!DOCTYPE books [
<!ELEMENT books (author+)>
<!ELEMENT author (date, title, fname, lname)>
<!ELEMENT date (#PCDATA)>
<!ATTLIST date month CDATA #IMPLIED>
<!ELEMENT title (#PCDATA)>
<!ELEMENT fname (#PCDATA)>
<!ELEMENT lname (#PCDATA)>
]>

<books>
<author>
<date>1995</date>
<title> Introduction to Computer Graphics </title>
<fname>James</fname>
<lname>Foley</lname>
</author>
…..
</books>
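To make the idea of a content model concrete, here is a hand-rolled sketch of what a validating parser does with declarations such as `(date, title, fname, lname)`. It is not a real DTD validator: Python's standard library does not validate against DTDs, so the sketch assumes the content models are copied by hand into a dictionary, and it ignores attributes, #PCDATA and mixed content. The names `CONTENT_MODELS`, `model_to_regex` and `validate` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Content models copied by hand from the DTD for books (assumption: only
# elements with child elements are listed; #PCDATA elements are not checked).
CONTENT_MODELS = {
    "books": "(author+)",
    "author": "(date, title, fname, lname)",
}


def model_to_regex(model):
    """Translate a DTD content model into a regex over 'name;' tokens."""
    # Each element name becomes a group matching that name plus a ';' terminator.
    pattern = re.sub(
        r"[A-Za-z_][\w.\-]*",
        lambda m: "(?:" + re.escape(m.group(0)) + ";)",
        model,
    )
    # The DTD comma means "followed by", which is plain concatenation in a
    # regex; the operators ( ) | ? * + keep their meaning.
    return re.compile(re.sub(r"[\s,]", "", pattern))


def validate(element, errors=None):
    """Collect one message per element whose children break its content model."""
    errors = [] if errors is None else errors
    model = CONTENT_MODELS.get(element.tag)
    if model is not None:
        children = "".join(child.tag + ";" for child in element)
        if not model_to_regex(model).fullmatch(children):
            errors.append(f"<{element.tag}> children do not match {model}")
    for child in element:
        validate(child, errors)
    return errors


valid_doc = ET.fromstring(
    "<books><author><date>1995</date><title>Introduction to Computer Graphics</title>"
    "<fname>James</fname><lname>Foley</lname></author></books>"
)
# Same data, but <title> comes before <date>, which the DTD does not allow.
invalid_doc = ET.fromstring(
    "<books><author><title>Introduction to Computer Graphics</title><date>1995</date>"
    "<fname>James</fname><lname>Foley</lname></author></books>"
)

print(validate(valid_doc))
print(validate(invalid_doc))
```

The second document is well-formed but not valid, which is exactly the distinction the DTD slides draw.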
DTDs - Disadvantages
• Notoriously hard to read
• Difficult to create (written in non-XML syntax; uses EBNF - Extended Backus-Naur
Form - grammar)
• No support for namespaces etc.
• Limited data types (PCDATA, CDATA)
(Also study ANY, EMPTY, Mixed Content)

Alternative to DTDs - XML Schemas
(If time permits, covered towards the end)
• also referred to as XSchema
• Easy to create and read (well-formed XML syntax)
• can be edited using XML tools
• Support for namespaces
• More data types (byte, float, long; time, date; binary ..)
• User-defined data types (facets are properties used to
specify a data type, setting limits and boundaries on data
values)
Developing XML data
[Figure: a program that processes XML documents]

• First, create an XML document that contains the content as character data
marked up with XML tags.
• Second, build a Document Type Definition (DTD). The DTD specifies rules
such as ordering of elements, default values, and so on.
• Third, use an XML parser that checks the XML document against the DTD and
then splits the document up into markup regions and character-data regions.
• After processing with the XML parser, the data is in a structured format
and can be processed by any XML application.
XML Parsers (or Processors)
• one of the most important layers of an XML-aware application (e.g. Firefox, IE 5+)
• input - raw XML document
• parses to ensure that the document is well-formed and/or valid (if a DTD exists),
reports errors and allows programmatic access to the document contents
• output - a data structure (the XML document is transformed)

[Figure: XML Document + DTD (optional) -> XML parser -> Tree Structure]

<books>
<author>
<date>1995</date>
<title> Web IR </title>
<fname>Jimmy</fname>
<lname>Huang</lname>
</author>
</books>

Resulting tree: books -> author -> 1995, Web IR, Jimmy, Huang
Parsing XML Documents
• Parsers can support the Document Object Model (DOM) and Simple API
for XML (SAX) for accessing a document's content programmatically using
languages such as Java, C, C++, Python etc.

• A DOM based parser builds a tree structure containing the XML
document's data in memory.
(used to create and modify XML documents)

• A SAX based parser processes the document and generates events (i.e.
notifications to the application) when tags, comments etc. are
encountered. These events return data from the XML document.
(used to read XML documents only;
SAX is attractive for handling large documents because it is not required
to load the entire document)
DOM (Document Object Model)
• A DOM-based parser exposes a programmatic library called the DOM
API that allows data in an XML document to be accessed and modified by
manipulating the nodes in a DOM tree. The DOM API is available in many
languages e.g. JavaScript.
• Data can be accessed quickly as all the document's data is in memory.
• The DOM interfaces for creating and manipulating XML documents are
platform and language independent. DOM parsers exist for Java, C, C++,
Python and Perl.
•JDOM provides a higher-level API than the W3C DOM for working with
XML documents in Java. See www.jdom.org
- provides full tree representation of the XML document
- allows random access to any node
- provides a variety of output formats
- less memory intensive than DOM API
• In order to use DOM API, programming experience is required.
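As a small illustration of the DOM style of access, the sketch below uses `xml.dom.minidom` from Python's standard library (one DOM implementation among many; the slide's remarks about JDOM apply to Java, not to this module). The sample document reuses the books example from the parser slide.

```python
from xml.dom import minidom

# The whole document is parsed into an in-memory tree first.
doc = minidom.parseString(
    "<books><author><date>1995</date><title>Web IR</title>"
    "<fname>Jimmy</fname><lname>Huang</lname></author></books>"
)

# Random access: jump straight to any node of the tree.
title = doc.getElementsByTagName("title")[0]
print(title.firstChild.data)

# The tree can also be modified: change a text node and add an attribute.
date = doc.getElementsByTagName("date")[0]
date.firstChild.data = "1996"
date.setAttribute("month", "February")

# Serialise the modified tree back to XML text.
print(doc.documentElement.toxml())
```

Because the full tree lives in memory, reads and edits are simple, at the cost of memory for large documents.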
SAX (Simple API for XML)
• Developed by the members of the XML-DEV mailing list
• Released in May 1998
•SAX and DOM are totally different APIs for accessing information in
XML documents.
•SAX based parsers invoke methods when markup (e.g. a start tag,
end tag etc.) is encountered. With this event based model, no tree
structure is created to store data. Instead, data is passed to the
application from the XML document as it is found.
=> greater performance and less memory overhead than with DOM
•Many DOM parsers use a SAX parser to retrieve data for building the
DOM tree.
•SAX parsers are typically used for reading documents that will not be
modified.
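The event-based model can be sketched with Python's standard `xml.sax` module. The handler class `TitleCollector` is a hypothetical example: the parser calls its methods as start tags, character data and end tags are encountered, and no tree is built.

```python
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """Collect the text of every <title> element as the parser streams by."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._buffer = None  # a list of text chunks while inside a <title>

    def startElement(self, name, attrs):
        if name == "title":
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer).strip())
            self._buffer = None


xml_bytes = b"""<books>
  <author><title> Introduction to Computer Graphics </title><date>1995</date></author>
  <author><title> Principles of Database Systems </title><date>2000</date></author>
</books>"""

handler = TitleCollector()
xml.sax.parseString(xml_bytes, handler)
print(handler.titles)
```

The application keeps only what it needs (here, the titles), which is why SAX suits large, read-only documents.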
Parsing (msxml) and rendering
XML with IE
• XML document contains data, NOT formatting information.
•When XML document is loaded into IE5+, the document is
parsed by msxml.
•If the document is well-formed, the parser makes the
document’s data available to the application (I.e. IE5).
•The application can format and render the data and also
perform other processing.
•IE5 renders data by applying a stylesheet that formats and
colours the markup identically to the original document.
•Notice the - sign. It indicates that child elements are visible.
When clicked, it becomes + hiding the children.
•This behaviour is similar to viewing disk directory structure
using a program such as Windows Explorer.
Using XML: How does a browser read XML?
• XML parser: a tool for reading XML documents.
• To manipulate an XML document, you need an XML
parser. The parser loads the document into your
computer's memory. Once the document is loaded,
its data can be manipulated using the DOM. The
DOM treats the XML document as a tree.
• Once you have installed Internet Explorer 5.0, the
Microsoft XML parser is available.
• http://www.w3schools.com/xml/xml_parser.asp
• https://developer.mozilla.org/en-US/docs/Archive/Mozilla/XML_in_Mozilla
(XML in Mozilla)
Using XML: Presenting Data
• Need to convert XML tags into appropriate
HTML tags for use in a browser!!
• <lastname>Smith</lastname>
• <b>Smith</b>, which the browser displays as a bold "Smith"
Extensible Stylesheet Language (XSL)
• XML is just data - no presentation information
• To present the data on screen, paper or any other medium, apply an appropriate style
• Style sheets contain rules that instruct the processor how to present elements
• Two style languages: CSS (Cascading Style Sheets) and XSL
• XSL is more powerful than CSS and an excellent solution for controlling the
presentation of data, but it is
- resource intensive: memory and processing power
- complex to write
• XSL transforms and translates XML data from one format into another,
e.g. when the same document needs to be displayed in HTML, PDF and PostScript form
CSS and XSL
• CSS - Cascading Style Sheets
- can predefine HTML display (font etc.)
- these are shared and reused
• XSL - XML style language
- predefines display characteristics for XML entities
- transforms into CSS for browsers to use
Cascading Style Sheets
CSS
last_name
{
font-family: verdana, arial;
font-size: 15pt;
font-weight:bold;
display: block;
margin-bottom: 5pt;
}
first_name
{
font-family: verdana, arial;
font-size: 15pt;
font-weight:bold;
display: block;
margin-bottom: 5pt;
}
street, city, town, state, province,
ZIP, postal_code {
font-family: verdana, arial;
font-size: 12pt;
font-weight:bold;
color:green;
display:block;
margin-bottom: 20pt;
margin-top: 40pt;
}
email {
font-family: verdana, arial;
font-size: 12pt;
font-weight:bold;
color:blue;
display:block;
margin-top: 5pt;
}
Extensible Stylesheet Language (XSL)
• XSL provides a complete separation of data or content and
presentation, and provides a method to translate data into a PDF or
HTML document.

• XSL is a combination of two languages:
* XSLT (Extensible Stylesheet Language Transformations): defines rules for
transforming an XML document into another format
* XSL-FO (XSL Formatting Objects): specific XSL instructions that describe
how content should be rendered; a sophisticated version of CSS; formatting of
<h1>, <table> tags can be set

• Details can be found at http://www.w3.org/Style/XSL
XSLT
• XSLT is a declarative language for transforming XML documents into other
formats
• once an XML document is parsed, it is transformed through XSLT;
that is, a textual XSL stylesheet and a textual XML document are
merged together to produce data formatted according to the
stylesheet.

[Figure: XML document + XML stylesheet (XSLT) -> XSLT processor ->
XML document (another text-based format)]

• XSLT processors - selection criteria: speed of transformation and conformity to
the XSL and XSLT specifications
• some widely used processors:
* Apache Xalan, Oracle XSL Processor, Lotus XSL Processor, James
Clark's XT, Keith Visco's XSL:P, Michael Kay's SAXON, Microsoft XSL
processor (built into IE 5)
XSL (Style Language)
<?xml version='1.0'?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/TR/WD-xsl"
xmlns="http://www.w3.org/TR/REC-html40"
result-ns="">
<xsl:template><xsl:apply-templates/></xsl:template>
<xsl:template match="/">
<html>
<head>
<title><xsl:value-of select="/passport/last_name"/></title>
</head>
<body>
<H1><xsl:value-of select="/passport/last_name"/>, <xsl:value-of select="/passport/first_name"/></H1>
<H2>Address</H2>
<BLOCKQUOTE>
<xsl:apply-templates select="/passport/address"/>
</BLOCKQUOTE>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
XSL: Examples
<xsl:template match="EmployeeRecord/Name">
<Bold>
<xsl:apply-templates/>
</Bold>
</xsl:template>
All the children of the "Name" element contained in "EmployeeRecord" are
processed with the template.

<xsl:template match="EmployeeRecord/Name">
<Bold>
<xsl:apply-templates select="FirstName"/>
</Bold>
</xsl:template>
The template is applied only to the "FirstName" element of the "Name"
element contained in "EmployeeRecord".
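The effect of the two template rules can be imitated in a few lines of Python. This is only a sketch of the idea, not an XSLT processor: Python's standard library has no XSLT support, so the hypothetical function `apply_name_template` hard-codes the match on `EmployeeRecord/Name`, and the sample record (Jo Smith) is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical input document for the two template rules.
record = ET.fromstring(
    "<EmployeeRecord><Name><FirstName>Jo</FirstName>"
    "<LastName>Smith</LastName></Name></EmployeeRecord>"
)


def apply_name_template(employee_record, select=None):
    """Imitate <xsl:template match="EmployeeRecord/Name"> wrapping output in <Bold>."""
    results = []
    for name in employee_record.findall("Name"):
        bold = ET.Element("Bold")
        # Without 'select' every child is processed; with it, only the chosen ones.
        children = name.findall(select) if select else list(name)
        # With no further rules, only the text of each processed child is output.
        bold.text = "".join(child.text or "" for child in children)
        results.append(ET.tostring(bold, encoding="unicode"))
    return results


print(apply_name_template(record))
print(apply_name_template(record, select="FirstName"))
```

The first call corresponds to `<xsl:apply-templates/>`, the second to `<xsl:apply-templates select="FirstName"/>`.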
Options for Displaying XML
[Figure: options for displaying XML. Labels: XML Document, CSS Stylesheet,
XSL Stylesheet, XSL Transformation, XSL Transformation spec, HTML Document,
Web Browser, XML-enabled Web Browser, XML Display Engine; two paths are
marked example1 and example2.]
An Example: Boeing
• Boeing places a DTD on its site
• part purchasers use this DTD
• Boeing can use multiple XSL stylesheets

Example: Boeing (cont'd)
• when a customer creates an order document,
they can verify the validity of that
document against the DTD.
• this ensures they are transmitting only
type-valid orders.
• in turn, Boeing can ensure they are
receiving only type-valid documents.
Summary
XML - Advantages
 Platform and system independent
 User-defined tags
 Doesn’t require an explicit DTD
 Display format and content are separate
XML - Disadvantages
 Requires a processing application
 “More difficult” than HTML
 Must be converted to HTML to view in
browser
Importance of XML
 Coordinating Heterogeneous Databases
 Separation of Structure / Content / Display
 Document Validity Checking
 Potential Use in Standards
HTML Document
(good for formatting)
<html><body>
<h2>Student List</h2>

<ul>
<li> 9906789 </li>
<li>Adam</li>
<li>adam@unl.ac.uk</li>
<li>yes - final </li>
</ul>
<ul>
<li> 9806791 </li>
<li>Adrian</li>
<li>adrian@unl.ac.uk</li>
<li>no</li>
</ul>
</body></html>

• Data and presentation logic are mixed
• What is “yes”? What is “no”? The markup does not say what these values mean.
XML Document
(good for describing data)
<?xml version="1.0"?>

<student_list>
<student>
<id> 9906789 </id>
<name>Adam</name>
<email>adam@unl.ac.uk</email>
<bsc level="final">yes</bsc>
</student>
<student>
<id> 9806791 </id>
<name>Adrian</name>
<email>adrian@unl.ac.uk</email>
<bsc>no</bsc>
</student>
</student_list>

• Only data
• Data is self-describing
• Custom tags describe content (define your own tags)
• Easy to locate data (e.g. all BSc students)
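The claim that self-describing data is easy to locate can be illustrated with a short sketch using Python's standard `xml.etree.ElementTree` module. The XML is the student list from the slide; the function name `bsc_students` is an assumption introduced here for illustration.

```python
import xml.etree.ElementTree as ET

STUDENTS_XML = """<?xml version="1.0"?>
<student_list>
  <student>
    <id> 9906789 </id>
    <name>Adam</name>
    <email>adam@unl.ac.uk</email>
    <bsc level="final">yes</bsc>
  </student>
  <student>
    <id> 9806791 </id>
    <name>Adrian</name>
    <email>adrian@unl.ac.uk</email>
    <bsc>no</bsc>
  </student>
</student_list>
"""


def bsc_students(xml_text):
    """Return (id, name, level) for every student whose <bsc> value is 'yes'."""
    root = ET.fromstring(xml_text)
    result = []
    for student in root.findall("student"):
        bsc = student.find("bsc")
        if bsc is not None and (bsc.text or "").strip() == "yes":
            result.append((
                student.findtext("id", default="").strip(),
                student.findtext("name", default="").strip(),
                bsc.get("level"),  # None when the attribute is absent
            ))
    return result


if __name__ == "__main__":
    print(bsc_students(STUDENTS_XML))
```

Doing the same with the HTML version would mean guessing which `<li>` holds which field, since the list items carry no names.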
The Framework of WWW
[Figure: the framework of the WWW. The Web designer and publisher produces HTML with authoring tools/editors. The Web programmer builds external applications and non-HTTP objects (Java Servlet, CGI (Perl), ASP & ASP.NET, Java Server Pages, Java Applet, JavaScript). The end user (client) runs a Web browser and reaches the Web server, maintained by the Web master, over the Internet (global reach, broad range).]
Why Build Pages Dynamically?
 The Web page is based on data submitted by the user
 E.g., results page from search engines and order-
confirmation pages at on-line stores
 The Web page is derived from data that changes
frequently
 E.g., a weather report or news headlines page
 The Web page uses information from databases or
other server-side sources
 E.g., an e-commerce site could use a servlet to build a
Web page that lists the current price and availability of
each item that is for sale
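The e-commerce case can be sketched as follows: a minimal example, in Python rather than a servlet, of building a page from data at request time. The item list stands in for a database query, and the function name `build_catalog_page` and the field names `name`, `price`, and `in_stock` are assumptions made for illustration.

```python
from html import escape


def build_catalog_page(items):
    """Build an HTML page listing price and availability for each item."""
    rows = []
    for item in items:
        status = "In stock" if item["in_stock"] > 0 else "Sold out"
        rows.append(
            "<tr><td>{}</td><td>{:.2f}</td><td>{}</td></tr>".format(
                escape(item["name"]),  # escape user-visible text for HTML
                item["price"],
                status,
            )
        )
    return (
        "<html><body>\n"
        "<h2>Catalog</h2>\n"
        "<table>\n"
        "<tr><th>Item</th><th>Price</th><th>Availability</th></tr>\n"
        + "\n".join(rows)
        + "\n</table>\n</body></html>"
    )


if __name__ == "__main__":
    # Hypothetical data; a real site would read this from a database.
    catalog = [
        {"name": "Tea & Cake", "price": 3.5, "in_stock": 0},
        {"name": "Coffee", "price": 2.0, "in_stock": 12},
    ]
    print(build_catalog_page(catalog))
```

Because the page is generated on each request, a change in price or stock shows up immediately without anyone editing an HTML file.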
