You are on page 1of 13

Semantic Web

Instructors

Dr Prakash Chandra Sharma


2

Design Decisions for the Semantic Web

1. Make structured and semi structured data available in standardized formats on the web.

2. Make not just the datasets, but also the individual data elements and their relations accessible on the
web.

3. Describe the intended semantics of such data in a formalism, so that this intended semantics can be
processed by machines.
3

Basic Technology for the Semantic Web

1. Use labelled graph as the data model for objects and their relations, with objects as nodes in the
graph and the edges in the graph describing the relations between these objects. RDF is used to
represent such graphs.

2. Use web identifiers (Uniform Resource Identifiers-URI) to identify the individual data items and
their relations that appears in the datasets. This is reflected in the design of RDF.

3. Use ontologies as the data model to represent the intended semantics of the data. RDF schema
and Web Ontology Language (OWL) are used for this purpose.
4

From Data to Knowledge in Semantic Web

• To capture the intended semantics of the data, RDF schema and OWL are used which are not
just data description languages, but they are lightweight knowledge representation languages
(KRL).
• RDF schema is a very low expressivity logic that allows some very simple inferences.
• OWL is somewhat richer logic that allows additional inferences such as equality and
inequality, number restrictions, existence of objects and others.
• KRL allow the inference of additional information from the explicitly stated information.
• Data is processed and transformed into knowledge by some set of inferencing rules called
logics.
5

Web Architecture of the Semantic Web


• A key aspect of the traditional web is the fact
that content is distributed, both in location and
in ownership.
• A crucial contributor to the growth of the web
is the fact that “anybody can say anything
about anything”, or more precisely: any body
can refer to anybody’s web page without
having to negotiate 1st about permissions or
inquire about the right address or identifier to
use.

• A similar mechanism is at work in the


Semantic Web.
6

How to Get There from Here

1. We must agree on standard syntax to represent data and metadata.

2. We must have sufficient agreement on the metadata vocabularies in order to share intended
semantics of data.

3. We must publish large volumes of data in the formats of step 1, using the vocabularies of step 2.
7

A Layered Approach: Downward/Upward Compatibility


Principle
1. Downward Compatibility

2. Upward Partial Compatibility/Understanding


8

Features of “unstructured” data

Does not reside in traditional databases and data warehouses

May have an internal structure, but does not fit a relational data model

Generated by both humans and machines


 Textual and multimedia content
 Machine-to-machine communication

Examples include
 Personal messaging – email, instant messages, tweets, chat
 Business documents – business reports, presentations, survey responses
 Web content – web pages, blogs, wikis, audio files, photos, videos
 Sensor output – satellite imagery, geolocation data, scanner transactions
What is unstructured data?
a generic label for describing any corporate information that is
not in a database

information that either does not have a pre-defined data model


or is not organized in a pre-defined manner

unstructured data : Data that does not reside in fixed locations

any data that has no identifiable structure


Why is unstructured data important?
Unstructured data doubles every three months
7 million web pages are added every day

80% of business is conducted on unstructured


information
85% of all data stored is held in an unstructured format
11

But the scale is mind-boggling

1 ZB = 1021 bytes = 1024 Exabytes

About 85% is unstructured data


12

Some other significant challenges


Validity of statistical inference
 Sample biases
 Model biases

Privacy and public trust


 Disclosure threat due to mosaic effect

Data integrity
 Missing, inconsistent and inaccurate data
 Volatile sources

Data ownership and access


 Public good versus commercial advantage
 Value of private sector data
Technology
Business Intelligence
Sharepoint PowerPivot Power View
Power Query Power Map BI Semantic Model

Unstructured Semi- Structured


Excel structured SQL Server 2012
File shares/Folders Sharepoint Analysis Services
Internet Data Excel Services Integration Services
Service Manager

You might also like