Fundamental XML for Developers

Dr. Timothy M. Chester Texas A&M University

Timothy M. Chester is. . .
• Senior IT Manager, Texas A&M University
– Application Development, Systems Integration, Developer Tools & Training

• Lecturer, Texas A&M College of Business
– Courses on Business Programming Fundamentals (VB.NET, C#), XML & Advanced Web Development.

• Author
– Visual Studio Magazine, Dr. Dobb's Journal, IT Professional

• Consultant
– President & Principal, eInternet Studios

• Contact Information
– E-mail: – Web:

You Are. . .
• Software Developers
– New to XML, Object Oriented Development
– Require 'basics' of XML course

• IT Managers
– Need familiarity with XML basics and terminology
– Interested in how XML can affect both software development and legacy system integration

This session . . .
• Assumes you know nothing about XML or XML-based technologies
• Provides a basic introduction to XML-based technologies
• Demonstrates some of the basics of working with the DOM, XSLT, Schema, WSDL, and SOAP

Agenda XML • Document Object Model (DOM) • XPATH • XSLT • Schema • WSDL • SOAP • Questions .

Underlying Technologies: XML Is the Glue
[Figure: three layers labeled Connectivity, Presentation, and Connecting Applications, with the captions Connect the Web, Browse the Web, and Program the Web]

Evolution of the Web
• Generation 1: Static HTML
• Generation 2: Web Applications
• Generation 3: Web Services

Web Services Overview: Application Model
[Figure: End Users and Other Applications connect over Internet + XML to an Application made up of a Business Logic Tier and a Data Access and Storage Tier; Partner Web Services and Other Web Services are also shown]

Introducing XML
• XML stands for Extensible Markup Language. A markup language specifies the structure and content of a document.
• Because it is extensible, XML can be used to create a wide variety of document types.

Introducing XML
• XML is a subset of the Standard Generalized Markup Language (SGML), which was introduced in the 1980s. SGML is very complex and can be costly.
• These reasons led to the creation of Hypertext Markup Language (HTML), a more easily used markup language. XML can be seen as sitting between SGML and HTML: easier to learn than SGML, but more robust than HTML.

The Limits of HTML
• HTML was designed for formatting text on a Web page. It was not designed for dealing with the content of a Web page.
• Additional features have been added to HTML, but they do not solve data description or cataloging issues in an HTML document.
• Because HTML is not extensible, it cannot be modified to meet specific needs. Browser developers have added features making HTML more robust, but this has resulted in a confusing mix of different HTML standards.

Introducing XML
• HTML cannot be applied consistently. Different browsers require different standards, making the final document appear differently on one browser compared with another.

Introduction to XML Markup
• XML document (intro.xml)
– Marks up message as XML
– Commonly stored in text files
  • Extension .xml

<?xml version = "1.0"?>

<!-- Fig. 5.1 : intro.xml -->
<!-- Simple introduction to XML markup -->

<myMessage>
   <message>Welcome to XML!</message>
</myMessage>

• Document begins with declaration that specifies XML version 1.0
• Element message is child element of root element myMessage

Introduction to XML Markup (cont.)
• XML documents
– Must contain exactly one root element
  • Attempting to create more than one root element is erroneous
– Elements must be nested properly
  • Incorrect: <x><y>hello</x></y>
  • Correct: <x><y>hello</y></x>
– Must be well-formed
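The well-formedness rules above can be checked with any conforming parser. The following is a minimal sketch, assuming Python's standard-library xml.etree.ElementTree as the parser rather than the MSXML and .NET parsers this session uses; the helper name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # The parser is strict: any violation aborts parsing.
        return False


samples = {
    "proper nesting": "<x><y>hello</y></x>",
    "improper nesting": "<x><y>hello</x></y>",
    "two root elements": "<x>hello</x><y>world</y>",
}

results = {name: is_well_formed(text) for name, text in samples.items()}
for name, ok in results.items():
    print(name, "->", "well-formed" if ok else "rejected")
```

Only the properly nested sample with a single root element is accepted; the other two are rejected outright instead of being repaired.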

XML Parsers
• An XML processor (also called XML parser) evaluates the document to make sure it conforms to all XML specifications for structure and syntax.
• XML parsers are strict. It is this rigidity built into XML that ensures XML code accepted by the parser will work the same everywhere.

XML Parsers
• Microsoft's parser is called MSXML and is built directly into IE versions 5.0 and above.
• Netscape developed its own parser, called Mozilla, which is built into version 6.0 and above.

Parsers and Well-formed XML Documents (cont.)
• XML parsers support
– Document Object Model (DOM)
  • Builds tree structure containing document data in memory
– Simple API for XML (SAX)
  • Generates events when tags, comments, etc. are encountered
    – (Events are notifications to the application)
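The difference between the two parsing models can be sketched in a few lines. This example assumes Python's standard-library xml.sax and xml.dom.minidom modules as stand-ins for the parsers discussed in the session; the class name EventLogger is hypothetical.

```python
import xml.sax
from xml.dom import minidom

DOCUMENT = "<myMessage><message>Welcome to XML!</message></myMessage>"


class EventLogger(xml.sax.ContentHandler):
    """Records each SAX notification in the order the parser sends it."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def characters(self, content):
        self.events.append(("text", content))

    def endElement(self, name):
        self.events.append(("end", name))


# SAX: the parser pushes events to the handler and keeps no tree.
logger = EventLogger()
xml.sax.parseString(DOCUMENT.encode("utf-8"), logger)

# DOM: the parser builds the whole tree in memory, then the
# application navigates it.
dom = minidom.parseString(DOCUMENT)
root_name = dom.documentElement.tagName
message_text = dom.getElementsByTagName("message")[0].firstChild.data

print(logger.events)
print(root_name, message_text)
```

SAX suits large documents that are read once from start to finish, while DOM suits documents that need random access or modification.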

Parsing an XML Document with MSXML
• XML document
– Contains data
– Does not contain formatting information
– Load XML document into Internet Explorer 5.0
  • Document is parsed by msxml
  • Places plus (+) or minus (-) signs next to container elements
    – Plus sign indicates that all child elements are hidden
    – Clicking plus sign expands container element
      » Displays children
    – Minus sign indicates that all child elements are visible
    – Clicking minus sign collapses container element
      » Hides children
  • Error generated if document is not well formed

[Figure: XML document shown in IE6]

Character Set
• XML documents may contain
– Carriage returns
– Line feeds
– Unicode characters
  • Enables computers to process characters for several languages

Characters vs. Markup
• XML must differentiate between
– Markup text
  • Enclosed in angle brackets (< and >)
    – e.g. child elements
– Character data
  • Text between start tag and end tag
    – Welcome to XML!
– Elements versus Attributes

White Space, Entity References and Built-in Entities
• Whitespace characters
– Spaces, tabs, line feeds and carriage returns
  • Significant (preserved by application)
  • Insignificant (not preserved by application)
    – Normalization
      » Whitespace collapsed into single whitespace character
      » Sometimes whitespace removed entirely

<markup>This   is   character   data</markup>
after normalization, becomes
<markup>This is character data</markup>
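Normalization is something the application decides to do with the character data it receives. A minimal sketch, assuming Python's xml.etree.ElementTree and a sample string with extra spaces, a tab and a line feed added for illustration:

```python
import xml.etree.ElementTree as ET

# Sample element whose character data contains runs of whitespace
# (spaces, a tab and a line feed are assumptions for illustration).
source = "<markup>This   is \t character\n   data</markup>"
element = ET.fromstring(source)

# The parser hands the character data to the application unchanged...
raw_text = element.text

# ...and the application may collapse each run of whitespace
# into a single space.
normalized = " ".join(raw_text.split())

print(repr(raw_text))
print(repr(normalized))
```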

White Space, Entity References and Built-in Entities (cont.)
• XML-reserved characters
– Ampersand (&)
– Left-angle bracket (<)
– Right-angle bracket (>)
– Apostrophe (')
– Double quote (")
• Entity references
– Allow use of XML-reserved characters
  • Begin with ampersand (&) and end with semicolon (;)
– Prevent character data from being misinterpreted as markup

White Space, Entity References and Built-in Entities (cont.)
• Built-in entities
– Ampersand (&amp;)
– Left-angle bracket (&lt;)
– Right-angle bracket (&gt;)
– Apostrophe (&apos;)
– Quotation mark (&quot;)
– Mark up characters "<>&" in element message
<message>&lt;&gt;&amp;</message>
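In practice the built-in entities are usually produced by a library call, not typed by hand. A short sketch, assuming Python's standard-library xml.sax.saxutils helpers; the variable names are illustrative only:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

raw = "<>&"

# escape() replaces &, < and > with their entity references.
escaped = escape(raw)
document = "<message>" + escaped + "</message>"

# A parser turns the entity references back into character data.
parsed_text = ET.fromstring(document).text

# Quotes are not escaped by default; pass an explicit mapping
# when the text is destined for an attribute value.
attr_safe = escape('say "hi"', {'"': "&quot;", "'": "&apos;"})

restored = unescape(escaped)

print(document)
print(parsed_text, attr_safe, restored)
```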

Agenda XML Document Object Model (DOM) • XPATH • XSLT • Schema • WSDL • SOAP • Questions .

Introduction
• XML Document Object Model (DOM)
– Build tree structure in memory for XML documents
– DOM-based parsers parse these structures
  • Exist in several languages (Java, C, C++, Python, Perl, VB, VB.NET, C#, etc.)

Introduction
• DOM tree
– Each node represents an element, attribute, etc.

<?xml version = "1.0"?>
<message from = "Paul" to = "Tem">
   <body>Hi, Tim!</body>
</message>

• Node created for element message
– Element message has child node for body element
– Element body has child node for text "Hi, Tim!"
– Attributes from and to also have nodes in tree

DOM Implementations
• DOM-based parsers
– Microsoft's msxml
– Microsoft .NET System.Xml namespace
– Sun Microsystems' JAXP

Creating Nodes
• Create XML document at run time
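Creating a document at run time follows the same pattern in most DOM implementations: ask the document object for new nodes, then attach them to the tree. A minimal sketch, assuming Python's xml.dom.minidom in place of msxml or System.Xml; the lang attribute is a hypothetical addition used only to show attribute creation.

```python
from xml.dom import minidom

# Start with an empty document and attach a root element.
doc = minidom.Document()
root = doc.createElement("myMessage")
doc.appendChild(root)

# Create a child element that holds a text node.
message = doc.createElement("message")
message.appendChild(doc.createTextNode("Welcome to XML!"))
root.appendChild(message)

# Attributes are set on the element that owns them (hypothetical attribute).
root.setAttribute("lang", "en")

xml_text = root.toxml()
print(xml_text)
```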

Traversing the DOM
• Use DOM to traverse XML document
– Output element nodes
– Output attribute nodes
– Output text nodes
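A traversal of this kind is usually written as a short recursive function. The sketch below assumes Python's xml.dom.minidom and reuses the message document from the DOM introduction; the function name walk is hypothetical.

```python
from xml.dom import minidom

SOURCE = '<message from="Paul" to="Tem"><body>Hi, Tim!</body></message>'


def walk(node, depth=0, out=None):
    """Collect one line per element, attribute and text node."""
    out = [] if out is None else out
    indent = "  " * depth
    if node.nodeType == node.ELEMENT_NODE:
        out.append(f"{indent}element: {node.tagName}")
        attrs = node.attributes
        for i in range(attrs.length):
            attr = attrs.item(i)
            out.append(f"{indent}  attribute: {attr.name} = {attr.value}")
    elif node.nodeType == node.TEXT_NODE:
        out.append(f"{indent}text: {node.data}")
    # Attributes are not children, so only childNodes are recursed into.
    for child in node.childNodes:
        walk(child, depth + 1, out)
    return out


lines = walk(minidom.parseString(SOURCE).documentElement)
print("\n".join(lines))
```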

DOM Components
• Manipulate XML document

Agenda XML Document Object Model (DOM) XPATH • XSLT • Schema • WSDL • SOAP • Questions .

Introduction
• XML Path Language (XPath)
– Syntax for locating information in XML document
  • e.g. attribute values
– String-based language of expressions
  • Not structural language like XML
– Used by other XML technologies
  • XSLT

Nodes
• XML document
– Tree structure with nodes
– Each node represents part of XML document
  • Seven types
    – Root
    – Element
    – Attribute
    – Text
    – Comment
    – Processing instruction
    – Namespace
  • Attributes and namespaces are not children of their parent node
    – They describe their parent node

XPath node types
• root
– string-value: Determined by concatenating the string-values of all text-node descendants in document order.
– expanded-name: None.
– Description: Represents the root of an XML document. This node exists only at the top of the tree and may contain element, comment or processing-instruction children.
• element
– string-value: Determined by concatenating the string-values of all text-node descendants in document order.
– expanded-name: The element tag, including the namespace prefix (if applicable).
– Description: Represents an XML element and may contain element, text, comment or processing-instruction children.
• attribute
– string-value: The normalized value of the attribute.
– expanded-name: The name of the attribute, including the namespace prefix (if applicable).
– Description: Represents an attribute of an element.
• text
– string-value: The character data contained in the text node.
– expanded-name: None.
– Description: Represents the character data content of an element.
• comment
– string-value: The content of the comment (not including <!-- and -->).
– expanded-name: None.
– Description: Represents an XML comment.
• processing instruction
– string-value: The part of the processing instruction that follows the target and any whitespace.
– expanded-name: The target of the processing instruction.
– Description: Represents an XML processing instruction.
• namespace
– string-value: The URI of the namespace.
– expanded-name: The namespace prefix.
– Description: Represents an XML namespace.

Location Paths
• Location path
– Expression specifying how to navigate XPath tree
– Composed of location steps
  • Each location step composed of
    – Axis
    – Node test
    – Predicate

Axes
• XPath searches are made relative to context node
• Axis
– Indicates which nodes are included in search
  • Relative to context node
– Dictates node ordering in set
  • Forward axes select nodes that follow context node
  • Reverse axes select nodes that precede context node

Node Tests
• Node tests
– Refine set of nodes selected by axis
  • Rely upon axis' principal node type
    – Corresponds to type of node axis can select

Node-set Operators and Functions (cont.)
• Location-path expressions
– Combine node-set operators and functions
  • Select all head and body children element nodes
      head | body
  • Select last title element node in head element node
      head/title[ last() ]
  • Select third book element
      book[ position() = 3 ]
    – Or alternatively
      book[ 3 ]
  • Return total number of element-node children
      count( * )
  • Select all book element nodes in document
      //book
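Several of these expressions can be tried directly from code. The sketch below assumes Python's xml.etree.ElementTree, which implements only a subset of XPath: positional predicates, last() and descendant searches work, but the | operator and count() do not, so the child count is taken with len(). The library document is a hypothetical sample.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document used only for illustration.
LIBRARY = """
<library>
  <book><title>First</title></book>
  <shelf>
    <book><title>Second</title></book>
  </shelf>
  <book><title>Third</title></book>
  <book><title>Fourth</title><title>Subtitle</title></book>
</library>
""".strip()

root = ET.fromstring(LIBRARY)  # context node: the library element

# //book : every book element, at any depth, in document order
all_books = root.findall(".//book")

# book[3] : third book child of the context node
third_child_book = root.find("book[3]")

# title[last()] : last title child of that book
last_title = third_child_book.find("title[last()]")

# count(*) : number of element children of the context node
child_count = len(root.findall("*"))

print([b.find("title").text for b in all_books])
print(third_child_book.find("title").text, last_title.text, child_count)
```

Note that book[3] counts only the book children of the context node, so the book nested inside shelf is skipped.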

Agenda XML Document Object Model (DOM) XPATH XSLT • Schema • WSDL • SOAP • Questions .

Introduction
• Extensible Stylesheet Language (XSL)
– Used to format XML documents
– Consists of two parts
  • XSL Transformation Language (XSLT)
    – Transform XML document from one form to another
    – Use XPath to match nodes
  • XSL formatting objects
    – Alternative to CSS

Setup
• XSLT processor
– Microsoft Internet Explorer 6
– Java 2 Standard Edition
– Microsoft .NET System.Xml namespace

Templates
• XSLT document
– XML document with root element stylesheet
– template element
  • Matches specific XML document nodes
  • Uses XPath expression in attribute match

Templates (cont.)
• XSLT
– Two trees of nodes
  • Source tree corresponds to original XML document
  • Result tree contains nodes produced by transformation
– Transforms intro.xml into HTML document

Iteration and Sorting
• XSLT allows
– Iteration through node set
  • Element for-each
– Sorting node set
  • Element sort
    – Attribute order set to ascending (i.e. A-Z)
    – Attribute order set to descending (i.e. Z-A)

Conditional Processing
• Perform conditional processing
– Such as if statement
– Use element choose
  • Allows alternate conditional statements
  • Similar to switch statement
  • Has child elements when and otherwise
    – when element content used if condition is met
    – otherwise element content used if no when condition is met

XSLT and XPath
• XPath expression
– Locates elements, attributes and text in XML document

Agenda XML Document Object Model (DOM) XPATH XSLT Schema • WSDL • SOAP • Questions .

Working with Namespaces
• Name collision occurs when elements from two or more documents share the same name.
• Name collision isn't a problem if you are not concerned with validation. The document content only needs to be well-formed.
• However, name collision will keep a document from being validated.

Name Collision
[Figure: two documents, each with a Name element]

Using Namespaces to Avoid Name Collision
[Figure: how to use a namespace to avoid collision]

Declaring a Namespace
• A namespace is a defined collection of element and attribute names.
• Names that belong to the same namespace must be unique. Elements can share the same name if they reside in different namespaces.
• Namespaces must be declared before they can be used.

Declaring a Namespace
• A namespace can be declared in the prolog or as an element attribute. The syntax to declare a namespace in the prolog is:
<?xml:namespace ns="URI" prefix="prefix"?>
• Where URI is a Uniform Resource Identifier that assigns a unique name to the namespace, and prefix is a string of letters that associates each element or attribute in the document with the declared namespace.
• Only the element attribute form (xmlns:prefix="URI") is defined by the W3C Namespaces in XML recommendation; the prolog form dates from an early working draft.

Declaring a Namespace
• For example,
<?xml:namespace ns="http://uhosp/patients/ns" prefix="pat"?>
• Declares a namespace with the prefix "pat" and the URI http://uhosp/patients/ns.
• The URI is not a Web address. A URI identifies a physical or an abstract resource.

<?xml version = "1.0"?>

<!-- Fig. 5.8 : namespace.xml -->
<!-- Namespaces -->

<directory xmlns:text = "urn:deitel:textInfo"
   xmlns:image = "urn:deitel:imageInfo">

   <text:file filename = "book.xml">
      <text:description>A book list</text:description>
   </text:file>

   <image:file filename = "funny.jpg">
      <image:description>A funny picture</image:description>
      <image:size width = "200" height = "100"/>
   </image:file>

</directory>

<?xml version = "1.0"?>

<!-- Fig. 5.9 : defaultnamespace.xml -->
<!-- Using Default Namespaces -->

<directory xmlns = "urn:deitel:textInfo"
   xmlns:image = "urn:deitel:imageInfo">

   <file filename = "book.xml">
      <description>A book list</description>
   </file>

   <image:file filename = "funny.jpg">
      <image:description>A funny picture</image:description>
      <image:size width = "200" height = "100"/>
   </image:file>

</directory>
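A namespace-aware parser resolves each prefix to its URI, which is how the two file elements in the namespace.xml listing are told apart. A minimal sketch, assuming Python's xml.etree.ElementTree; the short prefixes t and i in the lookup table are arbitrary choices made by the calling code and need not match the prefixes in the document.

```python
import xml.etree.ElementTree as ET

DIRECTORY = """<directory xmlns:text="urn:deitel:textInfo"
           xmlns:image="urn:deitel:imageInfo">
  <text:file filename="book.xml">
    <text:description>A book list</text:description>
  </text:file>
  <image:file filename="funny.jpg">
    <image:description>A funny picture</image:description>
    <image:size width="200" height="100"/>
  </image:file>
</directory>"""

root = ET.fromstring(DIRECTORY)

# The parser expands every prefixed name to {namespace-uri}local-name.
expanded_tags = [child.tag for child in root]

# Lookups map caller-chosen prefixes to the same URIs.
ns = {"t": "urn:deitel:textInfo", "i": "urn:deitel:imageInfo"}
text_file = root.find("t:file", ns)
image_file = root.find("i:file", ns)
image_description = image_file.find("i:description", ns).text

print(expanded_tags)
print(text_file.get("filename"), image_file.get("filename"), image_description)
```

Because matching is done on the URI and not the prefix, the same lookups would also find the unprefixed file element in the default-namespace version of the document.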

Schemas
• A schema is an XML document that defines the content and structure of one or more XML documents.
• To avoid confusion, the XML document containing the content is called the instance document.
• It represents a specific instance of the structure defined in the schema.

Comparing Schemas and DTDs
[Figure: comparison of schemas and DTDs]

Schema Dialects
• There is no single schema form.
• Several schema "dialects" have been developed in the XML language.
• Support for a particular schema depends on the XML parser being used for validation.

Starting a Schema File
• A schema is always placed in a separate XML document that is referenced by the instance document.

Schema Types
• XML Schema recognizes two categories of element types: complex and simple.
• A simple type element contains only character data and has no attributes.
• A complex type element has one or more attributes, or is the parent to one or more child elements.

Schema Types
[Figure: types of elements]

Understanding Data Types
• XML Schema supports two data types: built-in and user-derived.
• A built-in data type is part of the XML Schema specifications and is available to all XML Schema authors.
• A user-derived data type is created by the XML Schema author for specific data values in the instance document.

Understanding Data Types
• A primitive data type, also called a base type, is one of 19 fundamental data types not defined in terms of other types.
• A derived data type is one of 25 data types that the XML Schema developers created based on the 19 primitive types.

Agenda XML Document Object Model (DOM) XPATH XSLT Schema WSDL • SOAP • Questions .

WSDL
• Think "TypeLib for SOAP"
• WSDL = Web Services Description Language
• Uniform representation for services
– Transport protocol neutral
– Access protocol neutral (not only SOAP)
• Describes:
– Schema for Data Types
– Call Signatures (Message)
– Interfaces (Port Types)
– Endpoint Mappings (Bindings)
– Endpoints (Services)

UDDI
• Think "Yahoo!" for Web services
• Universal Description, Discovery and Integration
• Web-service-programmable "Yellow Pages"
• Advertise Sites and Services
• May point to DISCO resources
• Initiative driven by Microsoft, IBM, Ariba

Agenda XML Document Object Model (DOM) XPATH XSLT Schema WSDL SOAP • Questions .

SOAP Overview
• A lightweight protocol for exchanging information in a distributed, heterogeneous environment
– It enables cross-platform interoperability
• Interoperable
– OS, object model, programming language neutral
– Hardware independent
– Protocol independent
• Works over existing Internet infrastructure

SOAP Overview
• Guiding principle: "Invent no new technology"
• Builds on key Internet standards
– SOAP ≈ HTTP + XML
– Submitted to W3C
• The SOAP specification defines:
– The SOAP message format
– How to send messages
– How to receive responses
– Data encoding

SOAP Is Not…
• Objects-by-reference
– Distributed garbage collection
– Bi-directional HTTP
• Activation
• Complicated
– Doesn't try to solve every problem in distributed computing
– Can be easily implemented

SOAP: The HTTP Aspect
• SOAP requests are HTTP POST requests

POST /WebCalculator/Calculator.asmx HTTP/1.1
Content-Type: text/xml
SOAPAction: "http://tempuri.org/Add"
Content-Length: 386

<?xml version="1.0"?>
<soap:Envelope ...>
   ...
</soap:Envelope>

SOAP Message Structure
• SOAP Message: the complete SOAP message
– Headers: protocol binding headers
– SOAP Envelope: <Envelope> encloses payload
  • SOAP Header: <Header> encloses headers
    – Headers: individual headers
  • SOAP Body: <Body> contains SOAP message name
    – Message Name & Data: XML-encoded SOAP message name & data

SOAP Message Format
• An XML document using the SOAP schema:

<?xml version="1.0"?>
<soap:Envelope ...>
   <soap:Header ...>
      ...
   </soap:Header>
   <soap:Body>
      <Add xmlns="http://tempuri.org/">
         <n1>12</n1>
         <n2>10</n2>
      </Add>
   </soap:Body>
</soap:Envelope>
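A request body of this shape can be assembled with any XML library. The sketch below assumes Python's xml.etree.ElementTree and assumes the SOAP 1.1 envelope namespace, which the slide elides; the function name build_add_request is hypothetical, and the optional Header element is omitted.

```python
import xml.etree.ElementTree as ET

# Assumption: the SOAP 1.1 envelope namespace (elided on the slide).
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
# Service namespace, as used by the Add example in this section.
SERVICE_NS = "http://tempuri.org/"

ET.register_namespace("soap", SOAP_NS)


def build_add_request(n1, n2):
    """Return the XML text of an Add request wrapped in a SOAP envelope."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    add = ET.SubElement(body, f"{{{SERVICE_NS}}}Add")
    ET.SubElement(add, f"{{{SERVICE_NS}}}n1").text = str(n1)
    ET.SubElement(add, f"{{{SERVICE_NS}}}n2").text = str(n2)
    return ET.tostring(envelope, encoding="unicode")


request_xml = build_add_request(12, 10)
print(request_xml)
```

The serializer picks its own prefix for the service namespace, which is equivalent to the default-namespace form shown on the slide because only the namespace URI is significant.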

SOAP Server Responses
• Server replies with a "result" message:

HTTP/1.1 200 OK
...
Content-Type: text/xml
Content-Length: 391

<?xml version="1.0"?>
<soap:Envelope ...>
   <soap:Body>
      <AddResult xmlns="http://tempuri.org/">
         <result>28.6</result>
      </AddResult>
   </soap:Body>
</soap:Envelope>
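On the client side, the result is read by navigating the envelope with namespace-qualified names. A minimal sketch, assuming Python's xml.etree.ElementTree and again assuming the SOAP 1.1 envelope namespace in place of the elided attributes; the response text is copied from the example above.

```python
import xml.etree.ElementTree as ET

# Response body from the example above; the envelope namespace is an
# assumption (SOAP 1.1), since the slide elides the envelope attributes.
RESPONSE = """<?xml version="1.0"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <AddResult xmlns="http://tempuri.org/">
      <result>28.6</result>
    </AddResult>
  </soap:Body>
</soap:Envelope>"""

ns = {
    "soap": "http://schemas.xmlsoap.org/soap/envelope/",
    "svc": "http://tempuri.org/",
}

envelope = ET.fromstring(RESPONSE)

# The default namespace on AddResult also applies to its result child.
result_node = envelope.find("soap:Body/svc:AddResult/svc:result", ns)
result = float(result_node.text)

print(result)
```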

SOAP Industry Support
• DevelopMentor Inc.
• Secret Labs AB
• UserLand Software Inc.
• Zveno Pty. Ltd.
• Scriptics Corp.
• SAP
• Compaq
• Microsoft
• Rogue Wave Software Inc.
• Rockwell Software Inc.
• Digital Creations
• IONA Technologies PLC
• Jetform
• ObjectSpace Inc.
• IBM
• Hewlett Packard
• Intel

Agenda XML Document Object Model (DOM) XPATH XSLT Schema WSDL SOAP Questions .

Questions

Bibliography
• Harvey Deitel's "XML How to Program"
• Prentice Hall XML Reference
• Microsoft Academic Resource Kit
