
UNIVERSITY OF TECHNOLOGY

XML, JSON, Redis


DATA ENGINEERING
Dr. Tran Minh Quang

GROUP 7
Doan Thanh Khang
Nguyen Huu Nhan

Outline
1. XML
2. JSON
3. REDIS

1. XML

XML: Extensible Markup Language

Figure 1.1 A graphical depiction of a very simple XML document. [1]

• Extensible Markup Language (XML) is a markup language that defines a
set of rules for encoding documents in a format that is both
human-readable and machine-readable. [1]

What is XML?
• XML has emerged as the standard for structuring and exchanging data
over the Web in text files.
• XML can be used to provide information about the structure and
meaning of the data in the Web pages rather than just specifying how
the Web pages are formatted for display on the screen.
• Both XML and JSON documents provide descriptive information, such
as attribute names, as well as the values of these attributes, in a text
file; hence, they are known as self-describing documents.

Structured Data
• The information stored in relational databases is known as structured
data because it is represented in a strict format.
• For example, each record in a relational database table—such as each
of the tables in the COMPANY database in Figure 5.6—follows the
same format as the other records.

Figure 1.2 The DEPARTMENT TABLE in Figure 5.6 [2]


Semi-structured Data
• Data may have a certain structure, but not all the information
collected will have the identical structure.
• Some attributes may be shared among the various entities, but other
attributes may exist only in a few entities.
• Additional attributes can be introduced in some of the newer data
items at any time, and there is no predefined schema.
• Examples: XML, JSON [3]

Unstructured Data
• Unstructured data (or unstructured information) is information that
either does not have a pre-defined data model or is not organized in a
pre-defined manner.
• Examples: books, journals, documents. [4]

XML Hierarchical (Tree) Data Model
• Two main structuring concepts: elements and attributes.
• The term attribute in XML is not used in the same manner as is customary in
database terminology.
• Attributes in XML provide additional information that describes elements.
• Complex elements are constructed from other elements hierarchically.
• Simple elements contain data values.
• Example:
<TR>
  <TD width="50%">
    <FONT size="2" face="Arial">John Smith:</FONT>
  </TD>
  <TD>7.5 hours per week</TD>
</TR>
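The tree model can be explored with a few lines of Python. This is a minimal sketch using the standard library's xml.etree.ElementTree; the helper name describe is made up for illustration. It walks the fragment above and reports each element's tag, attributes, and text.

```python
import xml.etree.ElementTree as ET

# The fragment from the slide, with plain ASCII quotes.
DOC = """<TR>
  <TD width="50%">
    <FONT size="2" face="Arial">John Smith:</FONT>
  </TD>
  <TD>7.5 hours per week</TD>
</TR>"""


def describe(element, depth=0):
    """Return one line per element: tag, attributes, and text (if any)."""
    text = (element.text or "").strip()
    lines = [f"{'  ' * depth}{element.tag} attrs={element.attrib} text={text!r}"]
    for child in element:  # complex elements contain child elements
        lines.extend(describe(child, depth + 1))
    return lines


root = ET.fromstring(DOC)
tree_lines = describe(root)
print("\n".join(tree_lines))
```

TR and the first TD are complex elements (they contain other elements), while FONT and the second TD are simple elements that hold data values.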
Three main types of XML documents
• Data-centric XML documents. These documents have many small
data items that follow a specific structure and hence may be
extracted from a structured database.
• Document-centric XML documents. These are documents with large
amounts of text, such as news articles or books. There are few or no
structured data elements in these documents.
• Hybrid XML documents. These documents may have parts that
contain structured data and other parts that are predominantly
textual or unstructured. They may or may not have a predefined
schema.

Three main types of XML documents
• EX:

Figure 1.3 Example for Data-centric XML [5]

Figure 1.4 Example for Document-centric XML documents [5]


XML Documents, DTD, and XML Schema
• Database schemas constrain what information can be stored, and the
data types of stored values.
• XML documents are not required to have an associated schema.
• However, schemas are very important for XML data exchange.
  • Otherwise, a site cannot automatically interpret data received
  from another site.
• Two mechanisms for specifying an XML schema:
  • Document Type Definition (DTD): widely used.
  • XML Schema: newer, increasing use.

Why DTDs?
• XML documents are designed to be processed by computer programs
• If you can put just any tags in an XML document, it’s very hard to write a
program that knows how to process the tags.
• A DTD specifies what tags may occur, when they may occur, and what
attributes they may (or must) have.
• A DTD allows the XML document to be verified (shown to be legal).
• A DTD that is shared across groups allows the groups to produce
consistent XML documents.

Document Type Definition (DTD)
• The type of an XML document can be specified using a DTD.
• A DTD constrains the structure of XML data:
  • What elements can occur
  • What attributes an element can/must have
  • What subelements can/must occur inside each element, and how many
  times
• A DTD does not constrain data types:
  • All values are represented as strings in XML
• DTD syntax:
  • <!ELEMENT element (subelements-specification) >
  • <!ATTLIST element (attributes) >

ELEMENT descriptions
• When specifying elements, the following notation is used:
  • A * following the element name means the element can be repeated zero
  or more times.
  • A + following the element name means the element can be repeated one
  or more times.
  • A ? following the element name means the element can appear zero or
  one times (it is optional).
  • An element appearing without any of the preceding three symbols must
  appear exactly once in the document.
• #PCDATA stands for parsed character data.
• The list of attributes that can appear within an element can also be specified
via the keyword !ATTLIST.
• Parentheses can be nested when specifying elements.
• A bar symbol ( e1 | e2 ) specifies that either e1 or e2 can appear in the
document.

ELEMENT descriptions
• Suffixes:
    ?     optional          foreword?
    +     one or more       chapter+
    *     zero or more      appendix*
• Separators:
    ,     both, in order    foreword?, chapter+
    |     or                section|chapter
• Grouping:
    ()    grouping          (section|chapter)+
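The suffixes, separators, and grouping above behave much like a regular expression over the sequence of child elements. The following sketch, assuming content models that contain only element names (no #PCDATA, EMPTY, or ANY), translates a content model into a Python regular expression and checks a parent element's children against it. The helper names content_model_to_regex and children_match are hypothetical, and this is not a DTD validator.

```python
import re
import xml.etree.ElementTree as ET

NAME = re.compile(r"[A-Za-z_][\w.-]*")


def content_model_to_regex(model):
    """Translate a simple DTD content model into a regex over child names."""
    # Each element name becomes a group that matches "name;".
    pattern = NAME.sub(lambda m: f"(?:{re.escape(m.group(0))};)", model)
    # Sequence commas and whitespace are implicit in the regex form;
    # ?, +, *, | and parentheses keep their meaning unchanged.
    pattern = re.sub(r"[\s,]+", "", pattern)
    return re.compile(pattern)


def children_match(parent, regex):
    """Check the ordered child tags of parent against the compiled model."""
    child_names = "".join(child.tag + ";" for child in parent)
    return regex.fullmatch(child_names) is not None


book_model = content_model_to_regex("foreword?, chapter+, appendix*")
valid = ET.fromstring("<book><chapter/><chapter/><appendix/></book>")
invalid = ET.fromstring("<book><appendix/><chapter/></book>")
print(children_match(valid, book_model))    # chapters first, then appendix
print(children_match(invalid, book_model))  # appendix before chapter
```

The ";" terminator keeps adjacent names from running together once the commas are removed.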

DTD Example [6]

Figure 1.3 Example DTD [6]


<?xml version="1.0" ?>
<!DOCTYPE family SYSTEM "family.dtd">
<family>
<person name="Joe Miller" gender="male"
type="father" id="123.456.789"/>
<person name="Josette Miller" gender="female"
type="girl" id="123.456.987"/>
</family>
Limitations of DTDs
• DTDs are a very weak specification language.
  • You can’t put any restrictions on element contents.
  • It’s difficult to specify that:
    • All the children must occur, but may be in any order
    • This element must occur a certain number of times
  • There are only ten data types for attribute values. What about integer,
  float, date, etc.?
  • IDs are not typed: no two elements can have the same ID, even if they
  have different types (e.g., book vs. section).

XML Schema
• A more powerful way of defining the structure and constraining the
contents of XML documents
• An XML Schema definition is itself an XML document
• Typically stored as a standalone .xsd file
• XML (data) documents refer to external .xsd files
• W3C recommendation
• Unlike DTD, XML Schema is separate from the XML specification

XML Schema example

Figure 1.6 XML Schema example. [9]


Schema descriptions and XML namespaces:
• http://www.w3.org/2001/XMLSchema is a commonly used standard
for XML schema commands.
• Each such definition is called an XML namespace because it defines
the set of commands (names) that can be used.
• Example:
<?xml version="1.0" encoding="UTF-8" ?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

Annotations, documentation, and language used
• xsd:annotation and xsd:documentation are used for providing
comments and other descriptions in the XML document.
• The xml:lang attribute of the xsd:documentation element specifies the
language being used.
<xsd:annotation>
  <xsd:documentation xml:lang="en">Company Schema (Element
  Approach) - Prepared by Babak Hojabri</xsd:documentation>
</xsd:annotation>

Elements and types
• The name attribute of the xsd:element tag specifies the element name,
which is called company for the root element in our example.
• The structure of the company root element can then be specified, which
in our example is xsd:complexType. This is further specified to be a
sequence of departments, employees, and projects using the xsd:sequence
structure of XML schema.
• First-level elements in the COMPANY database:
<xsd:element name="company">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="department" type="Department"
        minOccurs="0" maxOccurs="unbounded" />
      <xsd:element name="employee" type="Employee" minOccurs="0"
        maxOccurs="unbounded">
• Specifying element type and minimum and maximum occurrences.
  • The type, minOccurs, and maxOccurs attributes in the xsd:element tag
  specify the type and multiplicity of each element in any document that
  conforms to the schema specifications.
  • minOccurs and maxOccurs serve a similar role to the *, +, and ? symbols
  of XML DTD.
• Specifying keys.
  • In XML schema, it is possible to specify constraints that correspond to
  unique and primary key constraints in a relational database.
  • For specifying primary keys, the tag xsd:key is used instead of xsd:unique.
  • For specifying foreign keys, the tag xsd:keyref is used.

Storing and Extracting XML Documents from Databases
• Using a file system or a DBMS to store the documents as text.
• Using a DBMS to store the document contents as data elements.
• Designing a specialized system for storing native XML data.
• Creating or publishing customized XML documents from preexisting
relational databases.

XPath
• XPath: Specifying Path Expressions in XML.
• An XPath expression generally returns a sequence of items that satisfy
a certain pattern as specified by the expression.
• For example, if the COMPANY XML document is stored at the location
www.company.com/info.XML, then the first XPath expression in Figure
13.6 can be written as doc(www.company.com/info.XML)/company

XPath
• Example path expressions:
  1. /company
  2. /company/department
  3. //employee [employeeSalary gt 70000]/employeeName
  4. /company/employee [employeeSalary gt 70000]/employeeName
  5. /company/project/projectWorker [hours ge 20.0]
• The // prefix is convenient to use if we do not know the full path name we are
searching for, but we do know the name of some tags of interest within the XML
document.
• It is also possible to use the wildcard symbol *, which stands for any element.
• gt: greater than
• ge: greater than or equal to
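The path expressions above can be approximated in Python. This is a sketch under stated assumptions: the standard library's xml.etree.ElementTree supports only a subset of XPath (no gt or ge comparisons), so the numeric conditions are applied in Python after the path step. The sample document and its values are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; only the element names come from the slides.
COMPANY = """<company>
  <department><departmentName>Research</departmentName></department>
  <employee>
    <employeeName>Smith</employeeName>
    <employeeSalary>30000</employeeSalary>
  </employee>
  <employee>
    <employeeName>Wong</employeeName>
    <employeeSalary>80000</employeeSalary>
  </employee>
</company>"""

root = ET.fromstring(COMPANY)  # root corresponds to /company

# /company/department: direct children of the root named department
departments = root.findall("department")

# //employee[employeeSalary gt 70000]/employeeName
# ".//employee" finds employee elements at any depth below the root;
# the salary comparison is done in Python because ElementTree lacks gt.
high_paid = [
    e.findtext("employeeName")
    for e in root.findall(".//employee")
    if float(e.findtext("employeeSalary")) > 70000
]

# The wildcard * stands for any element: all direct children of the root.
child_tags = [child.tag for child in root.findall("*")]

print(len(departments), high_paid, child_tags)
```

A full XPath engine would evaluate the predicate inside the expression itself; the list comprehension here only mimics that behaviour.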

XPath
• XPath cannot:
  • Restructure results
  • Reorder results
  • Create new elements

XQuery: Specifying Queries in XML
• FOR <variable bindings to individual nodes (elements)>
• LET <variable bindings to collections of nodes (elements)>
• WHERE <qualifier conditions>
• ORDER BY <ordering specifications>
• RETURN <query result specification>
• Example:
LET $d := doc(www.company.com/info.xml)
FOR $x IN $d/company/project[projectNumber = 5]/projectWorker,
    $y IN $d/company/employee
WHERE $x/hours gt 20.0 AND $y/ssn = $x/ssn
ORDER BY $x/hours
RETURN <res> $y/employeeName/firstName, $y/employeeName/lastName, $x/hours </res>
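To make the FOR / WHERE / ORDER BY / RETURN flow concrete, the same query can be written as plain Python loops. This is only an illustration of the semantics, not an XQuery engine; the sample document, names, and hours are invented for the sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical data shaped like the COMPANY document used in the query.
DOC = """<company>
  <employee><ssn>111</ssn>
    <employeeName><firstName>Ann</firstName><lastName>Lee</lastName></employeeName>
  </employee>
  <employee><ssn>222</ssn>
    <employeeName><firstName>Tom</firstName><lastName>Ray</lastName></employeeName>
  </employee>
  <employee><ssn>333</ssn>
    <employeeName><firstName>Bob</firstName><lastName>Ng</lastName></employeeName>
  </employee>
  <project><projectNumber>5</projectNumber>
    <projectWorker><ssn>111</ssn><hours>32.5</hours></projectWorker>
    <projectWorker><ssn>222</ssn><hours>10.0</hours></projectWorker>
    <projectWorker><ssn>333</ssn><hours>25.0</hours></projectWorker>
  </project>
  <project><projectNumber>7</projectNumber>
    <projectWorker><ssn>222</ssn><hours>40.0</hours></projectWorker>
  </project>
</company>"""

d = ET.fromstring(DOC)  # LET $d := doc(...)

results = []
# FOR $x IN project[projectNumber = 5]/projectWorker, $y IN employee
for x in d.findall("project[projectNumber='5']/projectWorker"):
    for y in d.findall("employee"):
        hours = float(x.findtext("hours"))
        # WHERE $x/hours gt 20.0 AND $y/ssn = $x/ssn
        if hours > 20.0 and y.findtext("ssn") == x.findtext("ssn"):
            results.append((
                hours,
                y.findtext("employeeName/firstName"),
                y.findtext("employeeName/lastName"),
            ))

results.sort(key=lambda r: r[0])  # ORDER BY $x/hours
print(results)                    # RETURN: one tuple per <res> element
```

The nested loops correspond to the two FOR bindings, which is why the query behaves like a join between project workers and employees.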

Extracting XML Documents from Relational Databases

Figure 1.4 Hierarchical (tree) view with COURSE as the root. [2]

XML/SQL: SQL Functions for Creating XML Data
• XMLELEMENT: This is used to specify a tag (element) name that will appear in the
XML result. It can specify a tag name for a complex element or for an individual
column.
• XMLFOREST: If several tags (elements) are needed in the XML result, this function
can create multiple element names in a simpler manner than XMLELEMENT. The
column names can be listed directly, separated by commas, with or without
renaming. If a column name is not renamed, it will be used as the element (tag)
name.
• XMLAGG: This can group together (or aggregate) several elements so they can be
placed under a parent element as a collection of subelements.
• XMLROOT: This allows the selected elements to be formatted as an XML
document with a single root element.
• XMLATTRIBUTES: This allows the creation of attributes for the elements of the
XML result.

Example
X1: SELECT XMLELEMENT (NAME "lastname", E.LName)
    FROM EMPLOYEE E
    WHERE E.Ssn = '123456789';

<lastname>Smith</lastname>

Example
X2: SELECT XMLELEMENT (NAME "employee",
           XMLFOREST (
             E.Lname AS "ln",
             E.Fname AS "fn",
             E.Salary AS "sal" ) )
    FROM EMPLOYEE AS E
    WHERE E.Ssn = '123456789';

<employee>
<ln>Smith</ln>
<fn>John</fn>
<sal>30000</sal>
</employee>
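Not every DBMS ships the SQL/XML functions, so the effect of query X2 can also be imitated in application code. The sketch below assumes an in-memory SQLite table with the same columns as EMPLOYEE and builds the XML with xml.etree.ElementTree; the helper xml_forest is a made-up name that mimics XMLFOREST, not a real database function.

```python
import sqlite3
import xml.etree.ElementTree as ET

# In-memory stand-in for the EMPLOYEE table (assumed columns).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEE (Ssn TEXT PRIMARY KEY, Fname TEXT, Lname TEXT, Salary INTEGER)"
)
conn.execute("INSERT INTO EMPLOYEE VALUES ('123456789', 'John', 'Smith', 30000)")


def xml_forest(parent, **columns):
    """Add one child element per keyword argument, like XMLFOREST ... AS."""
    for tag, value in columns.items():
        ET.SubElement(parent, tag).text = str(value)


row = conn.execute(
    "SELECT Lname, Fname, Salary FROM EMPLOYEE WHERE Ssn = ?", ("123456789",)
).fetchone()

employee = ET.Element("employee")  # plays the role of XMLELEMENT(NAME "employee", ...)
xml_forest(employee, ln=row[0], fn=row[1], sal=row[2])

xml_text = ET.tostring(employee, encoding="unicode")
print(xml_text)
```

The result has the same structure as the output of X2, serialized on a single line.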

XML versus relational data
Relational data:
• Schema is always fixed in advance and difficult to change.
• Simple, flat table structures.
• Ordering of rows and columns is unimportant.
• Exchange is problematic.
• “Native” support in all serious commercial DBMS.
XML data:
• Well-formed XML does not require a predefined, fixed schema.
• Ordering forced by document format; may or may not be important.
• Designed for easy exchange.
• Often implemented as an “add-on” on top of relations.

2. JSON

JSON
• JSON = JavaScript Object Notation.
  • It’s really language independent.
  • Most programming languages can easily read it and instantiate objects or
  some other data structure.
• JSON is a lightweight alternative to XML for data interchange.
• Started gaining traction ~2006 and is now widely used.
• http://json.org/ has more information.

JSON Data – A name and a value
• A name/value pair consists of a field name (in double quotes), followed by a colon,
followed by a value.
• A JSON object is an unordered set of name/value pairs:
  • Begins with { (left brace)
  • Ends with } (right brace)
  • Each name is followed by : (colon)
  • Name/value pairs are separated by , (comma)

{
"employee_id": 1234567,
"name": "Jeff Fox",
"hire_date": "1/1/2013",
"location": "Norwalk, CT",
"consultant": false
}

JSON Data – A name and a value
• In JSON, values must be one of the following data types:
• a string
• a number
• an object (JSON object)
• an array
• a boolean
• null

JSON Data – A name and a value
• Strings in JSON must be written in double quotes.
{ "name":"John" }

• Numbers in JSON must be an integer or a floating point.


{ "age":30 }

• Values in JSON can be objects.


{
"employee":{ "name":"John", "age":30, "city":"New York" }
}

• Values in JSON can be arrays.


{
"employees":[ "John", "Anna", "Peter" ]
}
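The value types listed above map directly onto native types in most languages. As a small sketch in Python using the standard json module, the employee record from the earlier slide is parsed and each value's Python type is inspected. The "skills" and "manager" fields are hypothetical additions so that an array and a null also appear.

```python
import json

text = """{
  "employee_id": 1234567,
  "name": "Jeff Fox",
  "hire_date": "1/1/2013",
  "location": "Norwalk, CT",
  "consultant": false,
  "skills": ["XML", "JSON"],
  "manager": null
}"""

record = json.loads(text)  # parse JSON text into Python objects

# Map each field name to the name of the Python type it was parsed into.
types = {name: type(value).__name__ for name, value in record.items()}
print(types)

# Serializing and parsing again yields an equal structure.
round_trip = json.loads(json.dumps(record))
print(round_trip == record)
```

Note that the date stays a plain string: JSON has no date type, so the application has to interpret "1/1/2013" itself.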

Fundamental difference between XML Schema and JSON Schema
• XML Schema: specifies closed content unless deliberate measures are
taken to make it open (e.g., sprinkle the <any> element liberally
throughout the schema).
• JSON Schema: specifies open content unless deliberate measures are
taken to make it closed (e.g., sprinkle "additionalProperties": false
liberally throughout the schema).

Sr.No.  Keyword: Description

1. $schema: States that this schema is written according to the draft v4 specification.
2. title: You will use this to give a title to your schema.
3. description: A little description of the schema.
4. type: Defines the first constraint on our JSON data: it has to be a JSON Object.
5. properties: Defines various keys and their value types, and the minimum and maximum values to be used in the JSON file.
6. required: Keeps a list of required properties.
7. minimum: A constraint on the value; represents the minimum acceptable value.
8. exclusiveMinimum: If "exclusiveMinimum" is present and has boolean value true, the instance is valid if it is strictly greater than the value of "minimum".
9. maximum: A constraint on the value; represents the maximum acceptable value.
10. exclusiveMaximum: If "exclusiveMaximum" is present and has boolean value true, the instance is valid if it is strictly lower than the value of "maximum".
11. multipleOf: A numeric instance is valid against "multipleOf" if the result of the division of the instance by this keyword's value is an integer.
12. maxLength: A string instance is valid if its length (number of characters) is less than or equal to this value.
13. minLength: A string instance is valid if its length (number of characters) is greater than or equal to this value.
14. pattern: A string instance is considered valid if the regular expression matches the instance successfully.

{
"$schema": "http://json-schema.org/draft-04/schema#",
"title": "Product",
"description": "A product from Acme's catalog",
"type": "object",
"properties": {
"id": {
"description": "The unique identifier for a product",
"type": "integer" },
"name": {
"description": "Name of the product",
"type": "string" },
"price": {
"type": "number",
"minimum": 0,
"exclusiveMinimum": true } },
"required": ["id", "name", "price"]
}
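To show how such a schema is applied, here is a deliberately tiny validator written for illustration. It is a sketch that handles only the keywords used in the Product schema above (type, properties, required, minimum, and the draft v4 boolean form of exclusiveMinimum); it is not a complete JSON Schema implementation, and the function name validate is an assumption of this example.

```python
SCHEMA = {
    "$schema": "http://json-schema.org/draft-04/schema#",
    "title": "Product",
    "type": "object",
    "properties": {
        "id": {"type": "integer"},
        "name": {"type": "string"},
        "price": {"type": "number", "minimum": 0, "exclusiveMinimum": True},
    },
    "required": ["id", "name", "price"],
}

# bool is excluded explicitly because Python treats True/False as ints.
TYPE_CHECKS = {
    "object": lambda v: isinstance(v, dict),
    "string": lambda v: isinstance(v, str),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
}


def validate(instance, schema, path="$"):
    """Return a list of error messages; an empty list means valid."""
    expected = schema.get("type")
    if expected and not TYPE_CHECKS[expected](instance):
        return [f"{path}: expected {expected}"]
    errors = []
    if "minimum" in schema and TYPE_CHECKS["number"](instance):
        low = schema["minimum"]
        exclusive = schema.get("exclusiveMinimum", False)  # draft v4 boolean
        if instance < low or (exclusive and instance == low):
            errors.append(f"{path}: must be {'>' if exclusive else '>='} {low}")
    if isinstance(instance, dict):
        for name in schema.get("required", []):
            if name not in instance:
                errors.append(f"{path}: missing required property {name!r}")
        for name, subschema in schema.get("properties", {}).items():
            if name in instance:
                errors.extend(validate(instance[name], subschema, f"{path}.{name}"))
    return errors


print(validate({"id": 1, "name": "Widget", "price": 12.5}, SCHEMA))
print(validate({"id": 1, "name": "Widget", "price": 0}, SCHEMA))
```

Because nothing here rejects unknown properties, an extra field such as "color" passes validation, which matches the open-content default of JSON Schema described earlier.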
JSONPath

Figure 1.6 Comparison of JSONPath and XPath. [7]

Figure 1.7 Comparison of JSONPath and XPath. [7]

Figure 1.8 JSONPath example. [8]

Figure 1.9 JSONPath example. [8]

Figure 1.10 JSONPath example. [8]

Figure 1.11 JSONPath example. [8]

Figure 1.12 JSONPath example. [8]

JSON vs XML
• JSON: A JSON object has a type.
  XML: XML data is typeless.
• JSON: Types are string, number, array, Boolean.
  XML: All XML data should be string.
• JSON: Data is readily accessible as JSON objects.
  XML: XML data needs to be parsed.
• JSON: Supported by most browsers.
  XML: Cross-browser XML parsing can be tricky.
• JSON: Has no display capabilities.
  XML: Offers the capability to display data because it is a markup language.
• JSON: Supports only text and number data types.
  XML: Supports various data types such as number, text, images, charts,
  graphs, etc. It also provides options for transferring the structure or
  format of the data with the actual data.
• JSON: Retrieving a value is easy.
  XML: Retrieving a value is difficult.
• JSON: Supported by many Ajax toolkits.
  XML: Not fully supported by Ajax toolkits.
• JSON: A fully automated way of deserializing/serializing JavaScript.
  XML: Developers have to write JavaScript code to serialize/deserialize
  from XML.
• JSON: Native support for objects.
  XML: Objects have to be expressed by convention, mostly through a mixed
  use of attributes and elements.
• JSON: Supports only UTF-8 encoding.
  XML: Supports various encodings.
• JSON: Doesn't support comments.
  XML: Supports comments.
• JSON: Files are easy to read compared to XML.
  XML: Documents are relatively more difficult to read and interpret.
• JSON: Does not provide any support for namespaces.
  XML: Supports namespaces.
• JSON: Is less secure.
  XML: Is more secure than JSON.

XML vs JSON
• JSON is like XML because:
  • Both JSON and XML are "self describing" (human readable)
  • Both JSON and XML are hierarchical (values within values)
  • Both JSON and XML can be parsed and used by lots of programming languages
• JSON is unlike XML because:
  • JSON doesn't use end tags
  • JSON is shorter
  • JSON is quicker to read and write
  • JSON can use arrays
  • JSON has a better fit for OO systems than XML
• The biggest difference is:
  • XML has to be parsed with an XML parser. JSON can be parsed by a standard JavaScript function.

3. Redis

Introduction
- Redis stands for REmote DIctionary Server.
- It is an open source, in-memory data structure store, used as a database,
cache and message broker.
- Like other NoSQL databases, such as Cassandra and MongoDB, Redis allows
the user to store vast amounts of data without the limits of a relational
database.

Specification
- Supports various data structures: strings, hashes, sets, lists, sorted sets,
bitmaps, hyperloglogs, geospatial indexes with radius queries, and
streams.
- Built-in replication, Lua scripting, LRU eviction, transactions and
different levels of on-disk persistence.
- Provides high availability via Redis Sentinel and automatic
partitioning with Redis Cluster.

Compared to other databases and software (1)

Compared to other databases and software (2)

Redis Keys
Redis keys are binary safe: this means that you can use any binary sequence as a key. The
empty string is also a valid key.
A few other rules about keys:
• Very long keys are not a good idea. For instance a key of 1024 bytes is a bad idea not only
memory-wise, but also because the lookup of the key in the dataset may require several
costly key-comparisons.
• Very short keys are often not a good idea. There is little point in writing "u1000flw" as a
key if you can instead write "user:1000:followers". The latter is more readable and the
added space is minor compared to the space used by the key object itself and the value
object. While short keys will obviously consume a bit less memory, your job is to find the
right balance.
• Try to stick with a schema. For instance "object-type:id" is a good idea, as in "user:1000".
Dots or dashes are often used for multi-word fields, as in "comment:1234:reply.to" or
"comment:1234:reply-to".
• The maximum allowed key size is 512 MB.
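The "object-type:id" convention is easy to enforce with a small helper. This is a sketch in plain Python; make_key is a hypothetical function for illustration and is not part of Redis or any client library.

```python
MAX_KEY_BYTES = 512 * 1024 * 1024  # the 512 MB limit quoted above


def make_key(object_type, object_id, *fields):
    """Build a key such as "user:1000:followers" from its parts."""
    parts = [object_type, str(object_id), *fields]
    # Keep ":" reserved as the separator so keys stay easy to split.
    if any(":" in part or not part for part in parts):
        raise ValueError("key parts must be non-empty and must not contain ':'")
    key = ":".join(parts)
    if len(key.encode("utf-8")) > MAX_KEY_BYTES:
        raise ValueError("key exceeds the maximum allowed key size")
    return key


print(make_key("user", 1000, "followers"))
print(make_key("comment", 1234, "reply.to"))
```

Building every key through one function keeps the schema consistent across the code base, which is the point of the "stick with a schema" rule.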

Data Types

String

List

Set

Hash

Sorted set

Example – Voting on articles

Publish/subscribe

Persistence options
There are two different ways of persisting data to disk.
- Snapshotting: takes the data as it exists at one moment in time and
writes it to disk.
- AOF, or append-only file: works by copying incoming write
commands to disk as they happen.
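The difference between the two options can be illustrated with a toy in-memory model. This sketch does not use Redis or its real file formats: the class ToyStore, its command tuples, and the JSON snapshot are all assumptions made for the example.

```python
import json


class ToyStore:
    """Toy key-value store that models snapshotting and an append-only log."""

    def __init__(self):
        self.data = {}
        self.aof = []  # stand-in for the append-only file

    def set(self, key, value):
        self.aof.append(("SET", key, value))  # log the write as it happens
        self.data[key] = value

    def delete(self, key):
        self.aof.append(("DEL", key))
        self.data.pop(key, None)

    def snapshot(self):
        """Capture the data as it exists at this moment."""
        return json.dumps(self.data)

    @classmethod
    def from_snapshot(cls, blob):
        store = cls()
        store.data = json.loads(blob)
        return store

    @classmethod
    def from_aof(cls, commands):
        """Rebuild the state by replaying every logged write command."""
        store = cls()
        for name, *args in commands:
            if name == "SET":
                store.set(*args)
            elif name == "DEL":
                store.delete(*args)
        return store


store = ToyStore()
store.set("a", "1")
saved = store.snapshot()   # snapshot taken here
store.set("b", "2")        # later writes are missing from the snapshot
store.delete("a")

print(ToyStore.from_snapshot(saved).data)  # state at snapshot time
print(ToyStore.from_aof(store.aof).data)   # state after replaying all writes
```

Restoring from the snapshot loses the writes made after it was taken, while replaying the log recovers the latest state at the cost of a longer file.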

Replication

Handling system failures
- redis-check-aof and redis-check-dump are command-line tools for checking
the status of an append-only file and a snapshot.
- If we provide --fix as an argument to redis-check-aof, the command will fix
the file. Its method to fix an append-only file is simple: it scans through the
provided AOF, looking for an incomplete or incorrect command.

Replacing a failed master
Redis Sentinel is a relatively recent addition to the collection of tools
available with Redis. Generally, Redis Sentinel pays attention to
Redis masters and the slaves of the masters and automatically handles
failover if the master goes down.

Transaction
- Transactions in Redis are different from transactions that exist in more
traditional relational databases. In a relational database, we can tell the
database server BEGIN, at which point we can perform a variety of
read and write operations that will be consistent with respect to each
other, after which we can run either COMMIT to make our changes
permanent or ROLLBACK to discard our changes.
- Within Redis, there’s a simple method for handling a sequence of
reads and writes that will be consistent with each other. We begin our
transaction by calling the special command MULTI, pass our series
of commands, and finish with EXEC.
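The MULTI / EXEC flow can be modelled without a server. The class below is a toy illustration only: ToyRedis and its methods are invented for this sketch and support just SET and INCR. It mirrors two behaviours of real Redis: commands issued after MULTI are answered with QUEUED, and EXEC runs them in order and returns one reply per command.

```python
class ToyRedis:
    """Toy model of command queueing between MULTI and EXEC."""

    def __init__(self):
        self.data = {}
        self._queue = None  # None means no transaction is open

    def multi(self):
        self._queue = []
        return "OK"

    def _run(self, name, *args):
        if name == "SET":
            key, value = args
            self.data[key] = value
            return "OK"
        if name == "INCR":
            (key,) = args
            self.data[key] = int(self.data.get(key, 0)) + 1
            return self.data[key]
        raise ValueError(f"unsupported command: {name}")

    def command(self, name, *args):
        if self._queue is not None:
            self._queue.append((name, args))  # defer until EXEC
            return "QUEUED"
        return self._run(name, *args)

    def execute(self):
        """EXEC: run every queued command in order, return all replies."""
        queued, self._queue = self._queue, None
        return [self._run(name, *args) for name, args in queued]


r = ToyRedis()
r.multi()
first = r.command("SET", "votes", 10)
second = r.command("INCR", "votes")
state_before = dict(r.data)  # nothing has been applied yet
replies = r.execute()
print(first, second, state_before, replies, r.data)
```

Unlike a relational transaction, there is no ROLLBACK step in this model; the queued commands are simply run together when EXEC is called.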

Redis Cluster
- Redis Cluster provides a way to run a Redis installation where data
is automatically sharded across multiple Redis nodes.
- Redis Cluster also provides some degree of availability during partitions,
that is, in practical terms, the ability to continue operations when some
nodes fail or are not able to communicate. However, the cluster stops
operating in the event of larger failures (for example, when the majority of
masters are unavailable).
- So in practical terms, what do you get with Redis Cluster?
  • The ability to automatically split your dataset among multiple nodes.
  • The ability to continue operations when a subset of the nodes are
  experiencing failures or are unable to communicate with the rest of the
  cluster.
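Automatic sharding rests on mapping every key to one of 16384 hash slots using CRC16 of the key modulo 16384. The sketch below reproduces that calculation with Python's binascii.crc_hqx, which computes the same CRC16 variant (XMODEM) when started from 0. It ignores hash tags (the {...} syntax), and node_for assumes, purely for illustration, that slots are divided into equal contiguous ranges; real clusters store an explicit slot-to-node table.

```python
import binascii

HASH_SLOTS = 16384  # number of hash slots in Redis Cluster


def key_slot(key):
    """Return the hash slot for a key: CRC16(key) mod 16384 (no hash tags)."""
    return binascii.crc_hqx(key.encode("utf-8"), 0) % HASH_SLOTS


def node_for(key, nodes):
    """Pick a node, assuming equal contiguous slot ranges per node."""
    return nodes[key_slot(key) * len(nodes) // HASH_SLOTS]


nodes = ["node-a", "node-b", "node-c"]  # hypothetical node names
for key in ["user:1000", "user:1001", "comment:1234:reply.to"]:
    print(key, key_slot(key), node_for(key, nodes))
```

Because the slot depends only on the key, any client can compute which node owns a key without asking a central coordinator.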

THANK YOU

References
• [1] - https://en.wikipedia.org/wiki/XML
• [2] - Fundamental of Database System 7th edition solution
• [3] - https://en.wikipedia.org/wiki/Semi-structured_data
• [4] - https://en.wikipedia.org/wiki/Unstructured_data
• [5] - XML and Databases Copyright 1999-2003 by Ronald Bourret Last updated January, 2003
• [6] - http://edutechwiki.unige.ch/en/DTD_tutorial
• [7] - https://goessner.net/articles/JsonPath/
• [8] - Jsonb roadmap - Oleg Bartunov Postgres Professional
• [9] - XML, DTD, and XML Schema - Introduction to Databases CompSci 316 Fall 2014 – Duke computer science
• [10] - XML and JSON - Sampath Jayarathna, Cal Poly Pomona
• [11] - JSON and JSON-Schema for XML Developers - Approved for Public Release; Distribution Unlimited. Case Number 14-3179
• [12] - https://redis.io/documentation
• [13] - https://redislabs.com/community/ebook/
