
Information Technology

Internet
⚫ History of the Internet
⚫ Protocols
⚫ Computer Networks
⚫ DNS
⚫ Web Servers, E-mail etc.
The “Victorian” Internet
⚫ Invented in the 1840s.
⚫ Signals sent over wires that
were established over vast
distances
⚫ Used extensively by the U.S.
Government during the
American Civil War, 1861 -
1865
⚫ Morse Code was dots and
dashes, or short signals and
long signals
⚫ The electronic signal standard
of +/- 15 v. is still used in
network interface cards today.
What is the Internet?
⚫ A large computer network that joins together many
organizations
⚫ It provides the infrastructure for e-mail, file archives,
hypertext documents, databases etc
⚫ The vast collection of computer networks which form
and act as a single huge network for transport of
data and messages across distances which can be
anywhere from the same office to anywhere in the
world.
(Written by William F. Slater, III, 1996, President of the
Chicago Chapter of the Internet Society)
A Short History…
⚫ 1968 - DARPA (Defense Advanced Research Projects Agency)
contracts with BBN (Bolt, Beranek & Newman) to create
ARPAnet
⚫ 1970 - First five nodes:
⚫ UCLA
⚫ Stanford
⚫ UC Santa Barbara
⚫ U of Utah, and
⚫ BBN
⚫ 1974 - TCP specification by Vint Cerf
⚫ 1984 – On January 1, the Internet with its 1000 hosts
converts en masse to using TCP/IP for its messaging
A Brief Summary of the
Evolution of the Internet
⚫ 1945 - Memex Conceived
⚫ 1948 - A Mathematical Theory of Communication
⚫ 1958 - Silicon Chip
⚫ 1962 - First Vast Computer Network Envisioned
⚫ 1964 - Packet Switching Invented
⚫ 1965 - Hypertext Invented
⚫ 1969 - ARPANET
⚫ 1972 - TCP/IP Created
⚫ 1984 - Internet Named and Goes TCP/IP
⚫ 1989 - WWW Created
⚫ 1993 - Mosaic Created
⚫ 1995 - Age of eCommerce Begins

Copyright 2002, William F. Slater, III, Chicago, IL, USA


From Simple, But Significant Ideas Bigger Ones
Grow
⚫ Ideas from the 1940s to 1969
⚫ 1970s to 1995:
⚫ We need a protocol for efficient and reliable transmission of packets over a WAN: TCP/IP
⚫ The ARPANET needs to convert to a standard protocol and be renamed to the Internet
⚫ Computers connected via the Internet can be used more easily if hypertext links are enabled using HTML and URLs: it’s called the World Wide Web
⚫ The World Wide Web is easier to use if we have a browser to browse web pages, running in a graphical user interface context
⚫ Great efficiencies can be accomplished if we use the Internet and the World Wide Web to conduct business
Main documents for the
Internet
⚫ RFC (Request For Comments)
⚫ http://www.ietf.org/rfc/rfcXXXX.txt
⚫ RFC 1287 - Towards the Future Internet Architecture
⚫ RFC 2101 - IPv4 Address Behaviour Today
⚫ RFC 2775 - Internet Transparency
⚫ RFC 3234 - Middleboxes: Taxonomy and Issues (NAT,
Sockets, Proxies, Caches etc.)
⚫ The Internet Engineering Task Force (IETF) adopts
some of the proposals published in RFCs as Internet
standards.
Communication model: TCP/IP

Protocol: a convention or standard that controls or enables the
connection, communication, and data transfer between two
computing endpoints.
TCP/IP
⚫ Network layer – enables hosts to send
packets into any network. The packets travel
independently to the destination. This layer defines
the Internet Protocol (IP)
⚫ Transport layer – enables
conversations between peer entities on the
source and destination hosts.
⚫ Defines the Transmission Control Protocol (TCP)
⚫ Defines the User Datagram Protocol (UDP)
TCP/IP
⚫ Application layer – high-level protocols:
Telnet, FTP, SMTP, DNS, HTTP
⚫ Network access layer – corresponds to the physical and data link
layers of the OSI model. Examples: ARPANET, SATNET, packet
radio, LAN, etc.
Various sub-protocols from
TCP/IP
Communication between
levels
General Encapsulation for
protocols

⚫ Each layer adds its own header for identification
⚫ Flexibility: higher layers don’t need to know the
technologies used by the lower layers
TCP Header
TCP/IP Connections
⚫ Connection opening
⚫ Destination computer answers
⚫ Source computer establishes a connection
between the 2 computers
⚫ Data transfer begins
⚫ Socket = IP address + TCP port
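The connection steps above can be tried on a single machine. The following is a minimal sketch, assuming the loopback address 127.0.0.1 is available; the function names `run_echo_once` and `loopback_demo` are hypothetical and chosen only for this illustration. It shows that each end of a TCP connection is identified by a socket, that is, an (IP address, TCP port) pair.

```python
import socket
import threading


def run_echo_once(server_sock: socket.socket) -> None:
    """Destination side: accept one connection and echo the data back."""
    conn, _client_addr = server_sock.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data)


def loopback_demo(message: bytes = b"hello"):
    """Open a TCP connection to a local server and exchange one message."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.settimeout(5.0)
        # Port 0 asks the operating system to pick any free TCP port.
        server.bind(("127.0.0.1", 0))
        server.listen(1)
        server_addr = server.getsockname()  # (IP address, TCP port)

        worker = threading.Thread(target=run_echo_once, args=(server,))
        worker.start()

        # Source side: connection opening, then data transfer.
        with socket.create_connection(server_addr, timeout=5.0) as client:
            client_addr = client.getsockname()
            client.sendall(message)
            reply = client.recv(1024)
        worker.join()
    return server_addr, client_addr, reply


if __name__ == "__main__":
    server_addr, client_addr, reply = loopback_demo()
    print("server socket:", server_addr)
    print("client socket:", client_addr)
    print("reply:", reply)
```

Both ends share the same IP address here, so only the port numbers tell the two sockets apart.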
Computer networks
⚫ Local Area Networks
⚫ Campus Area Network
⚫ Metropolitan Area Networks
⚫ Wide Area Networks
Local Area Network (LAN)
⚫ Local Area Network
⚫ Size – small
⚫ Transmission technologies
⚫ Topology
Campus Area Network
⚫ interconnection of local area networks (LANs)
within a limited geographical area
⚫ it can be considered one form of a
metropolitan area network, specific to an
academic setting.
Metropolitan Network
• Large computer networks, usually spanning a city.
• They typically use wireless infrastructure or
optical fiber connections to link their sites.
WANs
⚫ Country, continent
⚫ Hosts connected through sub-networks
⚫ Used to connect LANs and other types of
networks together, so that users and
computers in one location can communicate
with users and computers in other locations
WANs – routers and switches
Options for WAN connectivity
⚫ Leased line
– Description: point-to-point connection between two computers or Local Area Networks (LANs)
– Advantages: most secure
– Disadvantages: expensive
– Sample protocols used: PPP, HDLC, SDLC, HNAS
⚫ Circuit switching
– Description: a dedicated circuit path is created between end points. Best example is dialup connections
– Bandwidth range: 28 Kb/s - 144 Kb/s
– Advantages: less expensive
– Disadvantages: call setup
– Sample protocols used: PPP, ISDN
⚫ Packet switching
– Description: devices transport packets via a shared single point-to-point or point-to-multipoint link across a carrier internetwork. Variable-length packets are transmitted over Permanent Virtual Circuits (PVC) or Switched Virtual Circuits (SVC)
– Disadvantages: shared media across link
– Sample protocols used: X.25, Frame Relay
⚫ Cell relay
– Description: similar to packet switching, but uses fixed-length cells instead of variable-length packets. Data is divided into fixed-length cells and then transported across virtual circuits
– Advantages: best for simultaneous use of voice and data
– Disadvantages: overhead can be considerable
– Sample protocols used: ATM
Network topologies
⚫ Bus
⚫ Ring, token ring
⚫ Star
⚫ Switch
⚫ Mesh
⚫ Hierarchical
⚫ Complex
Bus
Ring, token ring

Provides a collision-free and redundant
networking environment.
• A frame travels along the circle,
stopping at each node.
• If that node wants to transmit data, it
adds destination address and data
information to the frame.
• The frame then travels around the
ring, searching for the destination
node.
• When it’s found, the data is taken out
of the frame and the cycle continues.
Switch

One of the most popular topologies for
Ethernet LANs is the star and extended
star topology. It is easy to set up,
relatively cheap, and creates more
redundancy than the bus topology.
Star topology

The Hierarchical Topology is much like the
Star Topology, except that it doesn’t use a
central node

This type of topology suffers from the same centralization flaw as the
Star Topology. If the device at the top of the chain fails, the
entire network goes down. This makes it impractical, and it is not used a great
deal in real applications.
Mesh
⚫ The Full-Mesh Topology connects
every node to every other node. This
creates the most redundant and
reliable network, but the number of links
grows quickly for large networks
⚫ Partial-Mesh Topology – implements
only a few alternate routes
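How quickly a full mesh grows can be checked with a short calculation. This is a minimal sketch, assuming one point-to-point link per pair of nodes; the function name `full_mesh_links` is hypothetical.

```python
def full_mesh_links(nodes: int) -> int:
    """Number of links needed to connect every pair of nodes directly."""
    if nodes < 0:
        raise ValueError("the number of nodes cannot be negative")
    # Each of the n nodes links to the other n - 1; every link is shared by two nodes.
    return nodes * (nodes - 1) // 2


if __name__ == "__main__":
    for n in (4, 10, 50):
        print(n, "nodes ->", full_mesh_links(n), "links")
```

With 4 nodes a full mesh needs 6 links, with 10 nodes 45, and with 50 nodes 1225, which is why larger networks usually settle for a partial mesh.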
Internet Applications

DNS, HTTP, e-mail


URL
⚫ Uniform/Universal Resource Locator – a
“universal” means of identifying resources on the
Internet
⚫ Sometimes used interchangeably with URI - see http://en.wikipedia.org/wiki/URI_scheme for more
⚫ General syntax: protocol://server.ext/path/…/file.ext
⚫ Examples:
⚫ http://econ.unitbv.ro/studenti/8000/note_TI.htm
⚫ ftp://ftp.unitbv.ro/antivirusi/NavUpdate.exe
⚫ http://193.230.54.50/studenti/8000/note_TI.htm
URI – IANA schemes (local doc)
⚫ <scheme name> : <hierarchical part> [ ? <query> ] [ # <fragment> ]
⚫ scheme name consists of a letter followed by any combination of letters, digits, and the plus ("+"),
period ("."), or hyphen ("-") characters; and is terminated by a colon (":")
⚫ hierarchical part of the URI is intended to hold identification information hierarchical in nature.
Usually this part begins with a double forward slash ("//"), followed by an authority part and an
optional path
⚫ query - an optional part separated with a question mark, which contains additional identification
information which is not hierarchical in nature. Its syntax is not generically defined, but is
commonly organized as a sequence of <key>=<value> pairs separated by an ampersand, e. g.
key1=value1&key2=value2&key3=value3.
⚫ fragment is an optional part separated from the front parts by a hash ("#"). It holds additional
identifying information which allows indirect identification of a secondary resource
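The parts described above can be pulled out of a URI with Python's standard `urllib.parse` module. This is a small sketch; the example URI is made up for illustration and is not taken from the slides.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical URI containing every part of the general syntax.
uri = "http://www.example.net:8080/path/file.html?key1=value1&key2=value2#section2"

parts = urlsplit(uri)
query_pairs = parse_qs(parts.query)  # each key maps to a list of values

if __name__ == "__main__":
    print("scheme:   ", parts.scheme)
    print("authority:", parts.netloc)
    print("host:     ", parts.hostname)
    print("port:     ", parts.port)
    print("path:     ", parts.path)
    print("query:    ", query_pairs)
    print("fragment: ", parts.fragment)
```

The authority and path together form the hierarchical part; the query and fragment are optional and come back as empty strings when absent.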
IP addresses and Domain
Names
⚫ IP Address
⚫ Binary address used for identifying computers
within networks
⚫ General form: x.x.x.x, where x ∈ [0, 255]
⚫ Domain names
⚫ Identifies a computer or computers on the Internet
⚫ Identifies an organization on the Internet
⚫ host.domain.TLD (Top Level Domain)
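The dotted form x.x.x.x is only a readable spelling of a 32-bit binary number. The sketch below converts one into the other by hand and compares the result with Python's standard `ipaddress` module; the helper name `dotted_to_int` is hypothetical, and the sample address is the one used in the URL examples earlier.

```python
import ipaddress


def dotted_to_int(address: str) -> int:
    """Convert a dotted IPv4 address into its 32-bit integer value."""
    octets = [int(part) for part in address.split(".")]
    if len(octets) != 4 or any(not 0 <= octet <= 255 for octet in octets):
        raise ValueError(f"not a valid IPv4 address: {address!r}")
    value = 0
    for octet in octets:
        # Shift the bits collected so far by one byte and append the next octet.
        value = (value << 8) | octet
    return value


if __name__ == "__main__":
    addr = "193.230.54.50"
    number = dotted_to_int(addr)
    print(addr, "=", format(number, "032b"))
    print("matches ipaddress module:", number == int(ipaddress.IPv4Address(addr)))
```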
TLD
Domain Names -
categorization
⚫ TLD – Organization type
⚫ com – Commercial
⚫ edu – Education
⚫ gov – Government (www.odci.gov = CIA)
⚫ mil – Military (www.army.mil = US Army)
⚫ org – Not-for-profit organizations
Domain Names -
categorization
⚫ Top Level Domain – Country name
⚫ ro – Romania
⚫ de – Germany
⚫ it – Italy
⚫ ca – Canada
⚫ us – United States
⚫ uk – United Kingdom (www.ox.ac.uk)
⚫ jp – Japan
Vanity ccTLDs
TLDs which are used for various purposes outside their home countries, because of their name:

ad is a ccTLD for Andorra, but has recently been increasingly used by advertising agencies.
ag is a ccTLD for Antigua and Barbuda and is sometimes used for agricultural sites. In Germany, AG (short for Aktiengesellschaft) is
appended to the name of a stock-based company, similar to Inc. in USA.
am is a ccTLD for Armenia, but is often used for AM radio stations. → http://aprs.fi/
as is a ccTLD for American Samoa. In Denmark and Norway, AS is appended to the name of a stock-based company, similar to Inc.
in USA.
be is a ccTLD for Belgium. Widely used by small Bulgarian websites because it's cheaper than a bg ccTLD.
cc is a ccTLD for Cocos (Keeling) Islands but is used for a wide variety of sites.
cd is a ccTLD for Democratic Republic of Congo but is used for CD merchants and file sharing sites.
dj is a ccTLD for Djibouti but is used for CD merchants and disc jockeys.
fm is a ccTLD for the Federated States of Micronesia but it is often used for FM radio stations.
gg is a ccTLD for Guernsey but it is often used by the gaming and gambling industry (with "gg" being the abbreviation for "good
game"), particularly in relation to horse racing gee-gee.
im is a ccTLD for the Isle of Man but is often used by instant messaging programs and services.
in is a ccTLD for India but is widely used in the internet industry.
je is a ccTLD for Jersey but is often used as a diminutive in Dutch (e.g. "huis.je"), as "you" ("zoek.je" = "search ye!"), or as "I" in
French (e.g. "moi.je")
la is a ccTLD for Laos but is marketed as the TLD for Los Angeles.
li is a ccTLD for Liechtenstein but is marketed as the TLD for Long Island.
md is a ccTLD for Moldova, but is marketed exclusively to the medical industry (as in "medical domain" or "medical doctor").
mu is a ccTLD for Mauritius, but is used within the music industry.
nu is a ccTLD for Niue but marketed as resembling "new" in English and "now" in Nordic/Dutch. Also meaning "nude" in
French/Portuguese.
sc is a ccTLD for Seychelles but is often used as .Source
to is a ccTLD for Tonga but is often used as the English word "to", like "go.to"
tv is a ccTLD for Tuvalu but it is used for the television ("tv")/entertainment industry purposes.
ws is a ccTLD for Samoa (earlier Western Samoa) but is marketed as .Website
vu is a ccTLD for Vanuatu but means "seen" in French.
How DNS works
⚫ DNS queries:
⚫ Recursive (the DNS server gives the full answer);
⚫ Non-recursive (the DNS server gives a partial answer).
⚫ Finding the IP address for www.unitbv.ro:
⚫ the client connects to its first configured DNS
server → IP of a server responsible for the .ro TLD →
IP of one of the DNS servers holding
unitbv.ro information → www.unitbv.ro =
193.254.231.8 → connection to that IP;
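The chain of partial answers can be imitated without any network access. The following is a toy sketch only: the server names (`root`, `tld-ro`, `ns-unitbv`) and the `SERVERS` table are hypothetical stand-ins for real name servers, and the final address is the one quoted above.

```python
# Hypothetical zone data: each "server" maps a name suffix either to a
# referral (the next server to ask) or to a final address.
SERVERS = {
    "root": {"ro": ("referral", "tld-ro")},
    "tld-ro": {"unitbv.ro": ("referral", "ns-unitbv")},
    "ns-unitbv": {"www.unitbv.ro": ("address", "193.254.231.8")},
}


def resolve(hostname: str, start: str = "root"):
    """Follow referrals from server to server until an address is returned."""
    labels = hostname.split(".")
    # Candidate suffixes, longest first: www.unitbv.ro, unitbv.ro, ro
    candidates = [".".join(labels[i:]) for i in range(len(labels))]
    server = start
    trace = []
    while True:
        zone = SERVERS[server]
        match = next((name for name in candidates if name in zone), None)
        if match is None:
            raise LookupError(f"{server} knows nothing about {hostname}")
        kind, value = zone[match]
        trace.append((server, match, kind))
        if kind == "address":
            return value, trace
        server = value  # partial answer: go and ask the referred server


if __name__ == "__main__":
    address, trace = resolve("www.unitbv.ro")
    for step in trace:
        print(step)
    print("address:", address)
```

A recursive server would run this whole loop on the client's behalf and hand back only the final address.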
List of DNS record types
⚫ A record (1): address record, maps a hostname to a 32-bit IPv4 address.
⚫ AAAA record (28): IPv6 address record maps a hostname to a 128-bit
IPv6 address.
⚫ CNAME record (5): canonical name record is an alias of one name to
another. The A record to which the alias points can be either local or remote
- on a foreign name server. This is useful when running multiple services
(like an FTP and a webserver) from a single IP address. Each service can
then have its own entry in DNS (like ftp.example.com. and
www.example.com.). It is also used when running multiple HTTP servers,
with different names, on the same physical host.
⚫ MX record (15): mail exchange record maps a domain name to a list of
mail exchange servers for that domain.
⚫ etc.
URL vs. Domain
⚫ URL: http://www.example.net/index.html
⚫ For finding resources in a computer
⚫ Domain name: www.example.net
⚫ For finding a host/alias in a domain
⚫ Registered domain name: example.net
⚫ For finding an organization on the Internet
Web/HTTP Server
⚫ The Web has a client/server architecture
⚫ The application that serves pages over the
World Wide Web is called an HTTP server
⚫ The default port for an HTTP server is 80
⚫ Pages are retrieved through a
URL of the following type:
http://server.ext:port/path/.../file.ext
Web/HTTP Server
⚫ Default pages
⚫ They are configured when the web
server is installed
⚫ Examples:
⚫ Default.htm, default.html,
⚫ Index.htm, index.html
⚫ Default.aspx, index.aspx;
⚫ Default.php, index.php4
⚫ etc.
How HTTP works
1. The browser connects to the remote computer at
the IP address and port given in the URL;
2. It sends (through HTTP) GET
/students/8000/note_IT.htm
3. The server sends back:
1. Status code (200 = OK, 404 = not found, etc.);
2. File type indicator (HTML, image, etc.);
3. File content
How HTTP works
⚫ If the file contains HTML, the browser:
⚫ Parses the file
⚫ For each URL found in the file, it connects back
to the server to get the resource
⚫ The procedure repeats
HTTP answer example
HTTP/1.0 200 OK
Server: Microsoft-IIS/4.0
Connection: keep-alive
Date: Fri, 09 Feb 2001 22:41:10 GMT
Content-Type: text/html
Accept-Ranges: bytes
Last-Modified: Fri, 09 Feb 2001 03:50:15 GMT
Content-Length: 5574
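A response head like the one above is plain text: a status line followed by "Name: value" header lines. The sketch below parses a shortened copy of that example; the function name `parse_response_head` is hypothetical, and real clients would normally rely on a library such as Python's `http.client` instead.

```python
# Shortened copy of the example answer; lines end with CR LF as HTTP requires.
RAW_RESPONSE = (
    "HTTP/1.0 200 OK\r\n"
    "Server: Microsoft-IIS/4.0\r\n"
    "Content-Type: text/html\r\n"
    "Content-Length: 5574\r\n"
    "\r\n"
)


def parse_response_head(raw: str):
    """Split an HTTP response head into version, status code, reason, headers."""
    head, _separator, _body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    version, code, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _colon, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers


if __name__ == "__main__":
    version, code, reason, headers = parse_response_head(RAW_RESPONSE)
    print(version, code, reason)
    print("file type:", headers["content-type"])
    print("length:   ", headers["content-length"])
```

The status code and the Content-Type header correspond to the "status code" and "file type indicator" that the server sends back before the file content.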
Folders in a Web Server
⚫ There is a root folder for the web server
(c:\Inetpub\wwwroot);
⚫ For a request
GET /students/8000/note_TI.html
⚫ the server sends back to the client
c:\Inetpub\wwwroot\students\8000\note_TI.html
Virtual Folders
⚫ E.g.:
⚫ Virtual folder: News
⚫ Physical folder: E:\LocalNews
⚫ Web server root: c:\inetpub\wwwroot
⚫ For a request
⚫ GET /news/default.html
⚫ The Server sends back
⚫ E:\LocalNews\default.html
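The mapping from a requested path to a file on disk, including one virtual folder, can be sketched as follows. This is an illustration under assumptions, not how any particular web server implements it: the names `WEB_ROOT`, `VIRTUAL_FOLDERS`, and `map_request_path` are hypothetical, and the folders are the ones from the examples above.

```python
from pathlib import PureWindowsPath

WEB_ROOT = PureWindowsPath(r"c:\Inetpub\wwwroot")
# Virtual folder name (lower case) -> physical folder outside the web root.
VIRTUAL_FOLDERS = {"news": PureWindowsPath(r"E:\LocalNews")}


def map_request_path(url_path: str) -> PureWindowsPath:
    """Translate the path of a GET request into a physical file path."""
    segments = [segment for segment in url_path.split("/") if segment]
    if ".." in segments:
        # Refuse attempts to climb out of the published folders.
        raise ValueError("parent-directory segments are not allowed")
    if segments and segments[0].lower() in VIRTUAL_FOLDERS:
        base = VIRTUAL_FOLDERS[segments[0].lower()]
        segments = segments[1:]
    else:
        base = WEB_ROOT
    return base.joinpath(*segments)


if __name__ == "__main__":
    print(map_request_path("/students/8000/note_TI.html"))
    print(map_request_path("/news/default.html"))
```

Using `PureWindowsPath` keeps the example runnable on any operating system, because it manipulates Windows-style paths without touching the file system.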
Virtual Web Servers
⚫ On ONE computer, a server may answer to
requests for multiple IP addresses
⚫ On ONE server/computer:
⚫ www.server.ro
⚫ www.server.com, etc.
⚫ Each with its own IP address
⚫ Or according to Host Header.
Logging visits
⚫ In log files that are generated:
⚫ Every hour;
⚫ daily;
⚫ weekly;
⚫ monthly, etc.
⚫ Log files may keep:
⚫ Client IP address;
⚫ Resource status;
Logging visits
⚫ Log files may keep :
⚫ Referrer;
⚫ server IP / web farms;
⚫ Server port;
⚫ Accessed URL;
⚫ Agent name (browser or…)
⚫ Date and time of access, etc.
Applications for log Analysis
(Web Log Analyzer)
⚫ Traffic analysis from web and FTP server
logs;
⚫ Various reports;

⚫ ….
E-mail
⚫ Message exchange by means of
telecommunication;
⚫ Messages may be encoded in:
⚫ ASCII text (American Standard Code for
Information Interchange)
⚫ HTML
⚫ XML (eXtensible Markup Language)
E-mail
⚫ Protocols:
⚫ SMTP – Simple Mail Transfer Protocol
⚫ POP3 – Post Office Protocol
⚫ IMAP – Internet Message Access Protocol
E-mail - SMTP
⚫ It controls the transport to a destination
server; it is used for sending and receiving
messages between servers;
⚫ It is used for sending messages from the
client to the server;
E-mail – POP3
⚫ Standard protocol for retrieving,
downloading, and moving e-mail messages;
⚫ It controls the connection between a POP3
client and the server that stores the
messages;
⚫ It has 3 states:
⚫ Authentication state – the POP3 client connected to the
server must be authenticated before the user can
download messages.
E-mail – POP3
⚫ Transaction state – the client sends POP3
commands and the server receives and
executes them according to the POP3
protocol
⚫ Update state – closes the connection
between the client and the server; it is entered after the last
command sent by the client.
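The three states can be modelled as a small state machine. This is a simplified sketch with no networking: the class `Pop3Session` is hypothetical, only a handful of real POP3 commands (USER, PASS, STAT, LIST, RETR, DELE, QUIT) are recognised, and details such as quitting before logging in are left out.

```python
from enum import Enum


class State(Enum):
    AUTHENTICATION = 1
    TRANSACTION = 2
    UPDATE = 3


class Pop3Session:
    """Toy model of a server-side POP3 session and its three states."""

    def __init__(self, password: str) -> None:
        self._password = password
        self.state = State.AUTHENTICATION

    def command(self, name: str, arg: str = "") -> str:
        if self.state is State.AUTHENTICATION:
            if name == "USER":
                return "+OK"
            if name == "PASS" and arg == self._password:
                self.state = State.TRANSACTION  # the client is now authenticated
                return "+OK"
            return "-ERR"
        if self.state is State.TRANSACTION:
            if name == "QUIT":
                self.state = State.UPDATE  # last command: the connection closes
                return "+OK"
            if name in ("STAT", "LIST", "RETR", "DELE"):
                return "+OK"
            return "-ERR"
        return "-ERR"  # nothing is accepted once the update state is reached


if __name__ == "__main__":
    session = Pop3Session(password="secret")
    for name, arg in [("STAT", ""), ("USER", "ana"), ("PASS", "secret"),
                      ("RETR", "1"), ("QUIT", "")]:
        print(name, "->", session.command(name, arg), session.state.name)
```

The first STAT is rejected because message commands are only valid in the transaction state, which is the point of requiring authentication first.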
SMTP and POP3
E-mail Operations
POP3 System Components
⚫ POP3 client – an application used for
reading, composing and managing e-mail
messages;
⚫ SMTP – transfers the messages between the
client and the server;
⚫ POP3 protocol – email protocol used for
controlling the connection between the client
and the server.
Organization level e-mail
services
⚫ E-mail servers – computers that run services for
SMTP, POP3 or IMAP; users may connect to them
over a network to download, send and manage messages
⚫ E-mail domains –
⚫ Domain name
⚫ Mail eXchanger (MX) registration in DNS
⚫ Mailboxes – a mailbox is used by one user who is a
member of an e-mail domain
IMAP
⚫ A method used for accessing messages
stored on an e-mail server;
⚫ Support for operations like creating, deleting
and renaming folders, checking for new
messages, setting flags for messages,
searching for messages ON the server.
Proxy
Secured HTTP (financial transactions)
Useful addresses
⚫ http://www.whatis.com – dictionary of
computer-related terms
⚫ http://www.webopedia.com
⚫ http://www.howstuffworks.com
⚫ http://www.wikipedia.com
⚫ www.warriorsofthe.net
Personalization
Overview

• Personalization is the process of matching the data that you
collect about a user to the metadata with which you tag
your content
• The matching rules that you devise retrieve the most
relevant content for that user
Types of Personalization

• Profile-based personalization
• Behavior-based personalization
• Campaign-based personalization
– Campaigns have defined start and finish dates and a defined
audience
Personalization and the
Audience
• Without some notion of who a person is, personalizing for
that person is not possible
• By using an audience analysis, you can come up with the
set of traits and trait values that each audience type must
exhibit
• Each cluster of these traits is a user profile that you can
collect, store, and access to decide what kind of person you
are dealing with
Personalization and the
Audience (Cont.)
• How to collect audience traits
– You can ask
– You can infer/deduce
– You can buy
• If you decide to ask for information from your audience
members, keep the following points in mind
– Trust
– Value
– Time and effort
– Context – make sure that you explain why you need the answer to
the question you ask, or position the questions in a context where an
explanation isn’t necessary
Personalization and Components

• Profile information alone cannot create personalization
without corresponding work on your component set
• You can target content components directly or indirectly
– Direct targeting – a specific element targets the content
• AgeRange element for Song component
• Simple, but inflexible and time-consuming
– You may need many ADDITIONAL tags
– Indirect targeting – use nonspecific elements to target the content
• Genre element for Song component ➔ Disney is for ages 0-10
– Match an age to the genres that the age is likely to prefer
• No additional tagging, but not always accurate
• You may need a few elements to infer
Personalization and Rules

• Given a set of user profiles and the elements that target
content, you need a set of rules to tie the two together
• A rule is a set of steps or IF…THEN statements
– Take the user’s age and search for components of the content type
Song that have an AgeRange element that contains the user’s age.
Display the components that you find.
– If the user is between 0 and 10 years old, search for components of
the content type Song with a value of Disney in their Genre element.
Display the components that you find.
– If the user is between 6 and 10 years old, search for components of
the content type Song with a value of Disney in their Genre element.
Add to those the components that have either Back Street Boys or
Britney Spears in the Artist element. Display the components that you
find.
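The three rules above translate almost directly into code. The sketch below is illustrative only: the `Song` class, the sample songs, and the function names are hypothetical, while the element names (AgeRange, Genre, Artist) and the age limits follow the rules as stated.

```python
from dataclasses import dataclass


@dataclass
class Song:
    title: str
    genre: str        # Genre element (indirect targeting)
    artist: str       # Artist element
    age_range: range  # AgeRange element (direct targeting)


# Hypothetical sample components.
SONGS = [
    Song("Song A", "Disney", "Various Artists", range(0, 11)),
    Song("Song B", "Pop", "Britney Spears", range(6, 16)),
    Song("Song C", "Rock", "Some Band", range(16, 100)),
]


def rule_age_range(age, songs):
    """Rule 1: direct targeting through the AgeRange element."""
    return [song for song in songs if age in song.age_range]


def rule_genre(age, songs):
    """Rule 2: IF the user is 0 to 10 years old THEN show Disney songs."""
    if 0 <= age <= 10:
        return [song for song in songs if song.genre == "Disney"]
    return []


def rule_genre_plus_artists(age, songs):
    """Rule 3: IF the user is 6 to 10 THEN show Disney songs plus two artists."""
    if 6 <= age <= 10:
        artists = {"Back Street Boys", "Britney Spears"}
        return [song for song in songs
                if song.genre == "Disney" or song.artist in artists]
    return []


if __name__ == "__main__":
    for rule in (rule_age_range, rule_genre, rule_genre_plus_artists):
        print(rule.__name__, [song.title for song in rule(8, SONGS)])
```

Rule 1 needs an AgeRange tag on every song, whereas rules 2 and 3 reuse tags that already exist, which is the trade-off between direct and indirect targeting.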
Personalization and Rules (Cont.)

• Depending on the kind of personalization that you’re trying
to accomplish, the rules may include other inputs
– Browser headers
– Past behavior
– Collaborative filter
– Push functions
• In addition to any other Song components that are returned, always
return the component with the Id element 1234
– Extra data sources
• Give a list of genres by searching the Genre table and matching
genre based on the user’s age…
Personalization and Rules (Cont.)

• Expect your rules not to work very well
– Craft your personalization rules to prioritize and not exclude content
– Be prepared to refine your rules
Customization VS.
Personalization
• Customization is something that the user does to change
the sort of content and user interface that one sees
– User interface, Layout, Style, Content, Preferences
• Customization is the core of the “My” phenomenon
• Customization is something that the user does for herself,
but personalization is something that you do for the user
• Personalization comes before customization
– Personalization can set the overall context within which a person
may customize
– But customization cuts across audiences and trumps personalization
Customization VS.
Personalization (Cont.)
• Customization and personalization
– Each choice that a user makes to configure your publication is
another piece of data for that person’s user profile
– The way that users differentiate themselves in customizations
(especially content customization) can help you see audience
distinctions
– You may migrate customization into personalization ➔ if some
audience consistently chooses the simple layout, why not make that
layout the default for that audience
Dynamic and Static Personalization

• A completely static site may provide just as much
personalization as a dynamic site
• Think about how to provide “members only” content on the Web
• Whether you create all the pages beforehand or create each
as someone requests it is a choice that you can make to
create the best overall system
Analyzing Personalization

• Think through what personalization you might need. Decide
– How much personalization you need (and cost vs. benefit)
– How you will get the information to provide personalization
– How you should segment your audience
– What components and elements you want to deliver to each
audience segment
Analyzing Personalization (Cont.)

• Create a personalization plan that includes
– Audience segment profiles and what you’re delivering to each
segment
– A list of Web site behaviors that trigger personalization
– Which push campaigns you will implement
– How your content must be tagged to support personalization
– How the segments, content, behaviors, campaigns, and tagging work
together in a set of rules that describe which content displays under
which conditions
Analyzing Personalization (Cont.)

• Integrate the personalization with other parts of the logical
design, considering audiences as segmented in the design:
– Whether your design is capable of collecting all the information in the
user profile you’ve identified
– Whether the segmentation and behavioral assumptions you’ve made
can be tested and validated
– Whether the additional work and complexity involved in creating
components and templates is worth the personalization that you’ll
achieve
Search engines

Definitions

Search
Examining a file in a computer, disk,
database, or network to find certain
information
Engine
Something that gives force or energy for moving a system

Search engine
An application that looks for certain keywords and returns a list of
documents where these words were found. Especially commercial
services that scan documents from the Internet.
Using a search engine…

How S.E. work
[Figure: a crawler fetches pages (URL1 to URL4) from the Web; an indexer fills the search engine database; the browser’s query “Eggs?” returns ranked results: Eggs - 90%, Eggo - 81%, Ego - 40%, Huh? - 10%]
How S.E. work (2)

• Crawlers, spiders: find & download the
content
– Searching for new sites and the ones that were
modified
– Periodically
• S.E. do not work in real time
– Some S.E. use their own database, others don’t
• The content can be bought from companies like
Inktomi
– The crawlers do not cover the entire Web but
just a fraction
– “Invisible web”
How S.E. work (3)
• Content organization: labeling and sorting
– indexing for search – automatically
• Title, key words, description
• Sorting according to the URL popularity (PageRank from
Google)
– classification in directories
• Human experts
• As a result of Content organization we have 2
types of S.E.:
• S.E. – the input is a query whose answer is to be found
and displayed
• directory – classified content.
Now directories have S.E. features and S.E. have directory
features.
How S.E. work (4)
• Database, cache: storing content
– Homogeneous files, usually distributed among multiple machines
• Query processor: finds, gets, displays results
– Gets the query as input
– Displays the sorted output (content as links)
• At the other end of the tunnel there is your
browser
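The indexing and query steps can be illustrated with a tiny inverted index: a table from each word to the set of pages that contain it. This is a minimal sketch with made-up page contents; the names `PAGES`, `build_index`, and `search` are hypothetical, and real engines add ranking, stemming, and much more.

```python
from collections import defaultdict

# Hypothetical content that a crawler might have downloaded.
PAGES = {
    "URL1": "fresh eggs for sale",
    "URL2": "eggs and bacon recipes",
    "URL3": "the ego and the id",
}


def build_index(pages):
    """Indexer: map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Query processor: return the URLs that contain all words of the query."""
    words = query.lower().split()
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)


if __name__ == "__main__":
    index = build_index(PAGES)
    print(search(index, "eggs"))
    print(search(index, "eggs bacon"))
```

Because the index is built ahead of time, a query only looks up words in the table, which is why search engines answer from their database rather than from the live Web.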

How S.E. work (5)

• The search processes and methods are based
on algorithms and they differ from S.E. to S.E.
– Most algorithms are proprietary but they are
based on known information classification and
retrieval principles
– Google might be an exception – they published
their search methodology
• They cite the initial support of the NSF Digital Library
Initiative
• They describe PageRank
– “We chose our system name, Google, because it
is a common spelling of googol, or ten to the
hundredth power” (10^100)
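The idea behind PageRank, that a page is important when important pages link to it, can be sketched with a few lines of iteration. This is a simplified, normalized variant written for illustration, not Google's implementation: the four-page link graph `LINKS`, the damping value 0.85, and the iteration count are assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively estimate page ranks for a dict {page: [pages it links to]}."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small base share regardless of its links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page without links spreads its rank evenly over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: A links to B and C, B and D link to C, C links to A.
LINKS = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}

if __name__ == "__main__":
    for page, score in sorted(pagerank(LINKS).items(), key=lambda item: -item[1]):
        print(page, round(score, 3))
```

In this graph page C ends up with the highest score because three pages link to it, while D, which nobody links to, keeps only the base share.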
WWW coverage

• No S.E. covers more than a fraction of the
WWW
– It is impossible to compare the coverage
• Further:
– There are many S.E. with only a national coverage
orientation
– There are many specialized or domain S.E.
• Coverage oriented to a subject of interest
WWW coverage (2)
• According to a 2001 study, there were massively more than 550 billion
documents on the Web, mostly in the invisible Web, or deep Web.[27]
• A 2002 survey of 2,024 million Web pages[28] determined that by far the
most Web content was in English: 56.4%; next were pages in German
(7.7%), French (5.6%), and Japanese (4.9%).
• A more recent study, which used Web searches in 75 different languages
to sample the Web, determined that there were over 11.5 billion Web
pages in the publicly indexable Web as of the end of January 2005.[29]
• As of March 2009, the indexable web contains at least 26 billion
pages.[38]
• On July 25, 2008, Google software engineers Jesse Alpert and Nissan Hajaj
announced that Google Search had discovered one trillion unique URLs.[31]
• Over 100.1 million websites operated as of March 2008.[14] Of these 74%
were commercial or other sites operating in the .com generic top-level
domain.[14]

The size of the World Wide Web

GYWA = Sorted on Google, Yahoo!, Windows Live Search (Msn Search) and Ask
YGWA = Sorted on Yahoo!, Google, Windows Live Search (Msn Search) and Ask

“From the sum of these estimations, an estimated overlap between these search engines is
subtracted. The overlap is an overestimation; hence, the total estimated size of the indexed World
Wide Web is an underestimation.”
Estimated size of Google's index

Differences between S.E.
• Size
• Search options
• Speed
• Update frequency
• Relevance of results
• Ease of use

• http://www.searchengineshowdown.com/features/
• http://www.searchengineshowdown.com/features/byfeature.shtml
Business Models
• Public good – independent budget
– PubMed (http://en.wikipedia.org/wiki/PubMed) – biomedical
research
– Librarians’ Index to Internet - http://lii.org/
• Revenue from provisioning of information
– All commercial S.E.
• Using S.E. for promoting one’s own activities
– Phone directories
Sponsored links
• “Sponsored links are links to websites that pay for
placement next to Google Maps search results. These
advertisements are always clearly labeled as 'Sponsored
Links', and are targeted to the topic and location of a
search. For example, search results for 'hotels near LAX'
will show sponsored links from certain hotels in this area.
Sponsored links are simply another way to find websites
that contain the information that you're searching for.”
• “Sponsored links that appear in AOL Search results or on
AOL channels are listings that have been purchased by
companies to have their businesses or Web sites appear
for specific search terms related to their services”
• Payment for crawling/indexing a site more often
Limits
• Each S.E. has limitations regarding
– Coverage
– Search features
– Finding quality info
• Some S.E. combine search with economics,
becoming more than advertisers
• S.E. may become victims of spamdexing
– It affects the included content and its
classification
Spamdexing
• Content spam
– Keyword stuffing
– Hidden or invisible unrelated text
– Meta tag stuffing
– "Gateway" or doorway pages
• Link spam
– Link farms
– Hidden links
– "Sybil attack"
– Spam blogs
– Page hijacking
• Using world-writable pages
– Spam in blogs
– Comment spam
• http://en.wikipedia.org/wiki/Spamdexing
Meta search engines
• meta engines search multiple engines
– getting combined results from a variety of
engines
• do not have their own databases
– but have their own business models affecting
results
• a number of techniques used
– interesting ones: clustering, statistical analyses

Examples of meta engines
- with organized results
Dogpile
results from a number of leading search engines; gives source,
so overlap can be compared; has SearchSpy -listing searches
that were performed
Surfwax
gives text sources & linking to sources; for some terms gives
related terms to focus
Turbo10
provides results in clusters; engines searched can be edited
Clusty
results grouped by topics or clusters for further sources

Examples of meta engines
- with organized results (2)
• Large directory
– Complete Planet
directory of over 70,000 databases & specialty engines; classified
• Results with graphical displays
Kartoo
results in display by topics of query
• New kid on the block (not a meta engine, but a search engine)
Cuil
Claim: “Cuil searches more pages on the Web than anyone else—three
times as many as Google and ten times as many as Microsoft”. Well … I
do not know if it holds.
Cuil claim: “Search 124,426,951,803 web pages” (20.03.2009)
© Tefko Saracevic
How to find a search engine?

• Resources that list or categorize engines
Search Engine Guide
engines categorized by topic; other engine information
Search Engine Colossus
– international directory of search engines by country, topic from 351
countries and territories; engines in many languages
Phil Bradley’s country based search engines
“currently a total of 4,017 search engines and 222 countries, territories,
islands and regions”
Information about S.E.
Search Engine Watch
• ratings, news, statistics, charts, explanations, tutorials
Search Engine Showdown
• “The users’ guide to web searching” - run by a librarian, news links, ratings
Virtual Chase
a site about “Teaching Legal Professionals How To Do Research”; this section has
very good tips and links for consideration of quality on the web
SiteLines
a blog, written by Rita Vine, a professional librarian, & web search
trainer; many evaluations in archive
ResourceShelf
“Resources and News for Information Professionals,” edited by
Gary Price, a librarian & author of Invisible Web – has extensive
archive
WebsearchAbout
not evaluative, but provides news, capabilities, sources, articles
about web searching

A few tips for Web searching, including
the invisible kind

• Advanced web searc Univ. of California, Berkeley


• Four NETS for better searching Bernie Dodge
• Web search tutorial Searchenginez
• Finding information: search engines Phil Bradley
• Google Guide Nancy Blachman

A SELECTION OF A FEW (OF GREAT
MANY) SOURCES FOR
INVISIBLE WEB

Characteristics
• Many oriented toward scholarly, research &
professional, technical & related information
– include sources mostly not covered by general
search engines
• majority of these are trustworthy
• quality much higher, some carefully selected, some
edited
– origins vary widely
• from commercial to voluntary to government
sponsored
• Popular in many disciplines
Large scholarly search engines &
directories - sample
• Infomine - a comprehensive virtual library and reference tool for
academic and scholarly Internet resources, including Web sites, databases
– covers a wide range of scholarly resources by fields

• Scirus – “it allows researchers to search for not only journal content but
also scientists' homepages, courseware, pre-print server material, patents and
institutional repository and website information. “
– by Elsevier, run in conjunction with Scopus and Science Direct, but this one free

• Google Scholar “Stand on the shoulders of giants” (but Newton and
John of Salisbury said it better)
– searches for scholarly articles & resources, but sources not disclosed (no idea on
what it covers )

Large edited sites
Open Directory Project
• large edited catalog of the web – global, run by
volunteers
BUBL LINK
• selected Internet resources covering all academic
subject areas; organized by Dewey Decimal System –
from UK

Science, scholarship engines, not free – a
sample

• In addition to freely accessible engines many
provide search free but access to full text paid
– by subscription or per item
– RUL provides access to these & many more
• General
ScienceDirect
Elsevier: “world's largest electronic collection of science, technology and medicine full
text and bibliographic information” [available at RUL]

• In a specific domain
ACM Portal
Assoc. for Computing Machinery: access to ACM Digital Library & Guide to Computing
[available at RUL]
Domain engines
• Cover specific subjects & topics
– from sciences, arts, humanities, to various media &
interests – you name it
• Important tool for subject searches
– particularly for subject specialist
– valued by professional searchers

• Selection mostly hand-picked rather than by
crawlers, following inclusion criteria
– often not readily discernable
– but content more trustworthy

• Usually well organized


in health & related fields …
• PubMed – Nat Library of Medicine
• biomedical literature from MEDLINE & health journals
• Psychcrawler - Amer. Psychological Association
– web index for psychology
• WebMDHealth
– news, medical information
• Rxlist
– The Internet Drug Index
• Mayo Clinic HealthOasis
– health advice
• Kidshealth
– sites for parents, kids, teens

in science …
Ocean Planet NASA
presentation of earth & its vast oceans
ArXiv Cornell U, National Science Foundation
e-print service in the fields of physics, mathematics, computer
science, and quantitative biology
large, non-reviewed contribution by authors, comments later
Athena Earth Sciences Resources
not a search engine but a large well organized directory

in education …
Intute
“Intute is a free online service providing you with a
database of hand selected Web resources for education
and research.”
Think Quest – Oracle Education Foundation
education resources, programs; web sites created by
students
Resource Discovery Network – UK
“UK's free national gateway to Internet resources for the
learning, teaching and research community”

in images, movies, video …
Internet Movie Database
• treasure trove of movies
Picsearch
picture searching

Blinkx
claims to be the world's largest search engine for videos; it has indexed over 32
million hours worth of video footage, made searchable by automatically
transcribing the speech content.

Moving Images Collections


“MIC documents moving image collections around the world.” Part
particularly oriented toward science educators. Now at Library of
Congress, but developed at Rutgers.

in humanities …
• Shakespeare & Internet Search Tools & Resources
– great fun to navigate
• KIRKE - Katalog der Internetressourcen für die Klassische Philologie aus
Erlangen
– German; a variety of resources for classics
• Perseus Digital Library Tufts University
– covers antiquity to renaissance; one of the best subject sites on
the web; affected the whole field
• Sch of Slavonic & East European Studies, University College
London
– includes country resources, e.g. Croatia

Diotima
Materials for study of women and gender in the Ancient World
in music …
Musipedia
Not everything is text. This is “a searchable, editable, and expandable collection
of tunes, melodies, and musical themes.” Great fun!

All Music Guide


• resource about musicians, albums, and songs

governments …
• U Mich Document Center
– official documents from all over the world
• US State Department
– about the U.S. & other countries
• FirstGov
– the US government official web portal
“Whatever you want or need from the U.S. government”

Enterprise search
• used to describe the application of search technology to information within an organization
• major challenge faced by Enterprise search is the need to index documents from a variety of
sources such as: file systems, intranets, document management systems, e-mail, and
databases and present a consolidated list of relevance ranked documents from these various
sources
• many applications require the integration of structured data as part of the search criteria and
when presenting results back to the users
• Differences from web search
– Adapters to index content from a variety of repositories, such as databases and content
management systems
– Federated search
• transforming a query and broadcasting it to a group of disparate databases with the appropriate syntax,
• merging the results collected from the databases,
• presenting them in a succinct and unified format with minimal duplication,
• providing a means, performed either automatically or by the portal user, to sort the merged result set.
– Entity extraction that seeks to locate and classify elements in text into predefined
categories such as the names of persons, organizations, locations, expressions of times,
quantities, monetary values, percentages, etc.
– Faceted search, a technique for accessing a collection of information represented using
a faceted classification, allowing users to explore by filtering available information.
– Access control, usually in the form of an Access control list (ACL), is often required to
restrict access to documents based on individual user identities.
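
The federated search steps listed above (translate the query per source, broadcast it, merge, de-duplicate, sort) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two in-memory "repositories", the `Hit` record, and the scoring rules are hypothetical stand-ins for real adapters to databases, mail stores, or content management systems.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    doc_id: str
    score: float
    source: str


# Hypothetical in-memory repositories standing in for real back ends.
INTRANET = {"hr-001": "travel expense policy", "it-007": "vpn setup guide"}
MAIL = {"hr-001": "fwd: travel expense policy", "m-042": "expense report due"}


def search_intranet(query: str) -> list[Hit]:
    # This source expects one lowercase phrase (its own "syntax").
    phrase = query.lower()
    return [Hit(d, 1.0, "intranet") for d, text in INTRANET.items() if phrase in text]


def search_mail(query: str) -> list[Hit]:
    # This source expects keywords; score = fraction of keywords matched.
    words = query.lower().split()
    hits = []
    for d, text in MAIL.items():
        matched = sum(w in text for w in words)
        if matched:
            hits.append(Hit(d, matched / len(words), "mail"))
    return hits


def federated_search(query: str, backends) -> list[Hit]:
    best: dict[str, Hit] = {}
    for backend in backends:          # broadcast the query to every source
        for hit in backend(query):    # collect that source's results
            # Keep a single copy of each document: the best-scoring one.
            if hit.doc_id not in best or hit.score > best[hit.doc_id].score:
                best[hit.doc_id] = hit
    # Present one merged list, ranked by relevance score.
    return sorted(best.values(), key=lambda h: (-h.score, h.doc_id))


results = federated_search("expense policy", [search_intranet, search_mail])
for hit in results:
    print(hit.doc_id, hit.score, hit.source)
```

A production system would also filter each hit against the user's access control list before showing it; that step is left out here to keep the sketch short.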
Magic Quadrant for Information
Access Technology (2008)
• http://mediaproducts.gartner.com/reprints/microsoft/vol6/articl
e4/article4.html
• Included in Information Access Technology:
– document management, Web content management and relational
database management systems to provide users with insight into their
contents
– expected to include results from enterprise applications, such as
customer relationship management (CRM) and legacy systems
• it increasingly looks outside enterprises as well, to premium
sources of information, Web sites and elements of the social
Web
• Portal, ECM, business application and other vendors
frequently include enterprise search as part of their products
• most mature information access
technology is search engine technology
(from 1994 +), applied to unstructured
data in document repositories
• added to this category: auto
categorization, creative visualization,
content analytics and taxonomy support
technologies
• “Total software revenue in the world's
enterprise search market in 2007 was
$860.3 million. We forecast it to grow to
$1.5 billion by 2012, for a compound
annual growth rate of 11.4%
(see "Dataquest Insight: Technology and
Vendor Consolidation Will Drive the
Enterprise Search Market Through 2012").
“, Gartner
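
The figures in the quote can be sanity-checked with the standard compound annual growth rate formula, rate = (end / start)^(1 / years) - 1. The short sketch below assumes a five-year span (2007 to 2012) and uses the rounded $1.5 billion endpoint, so the implied rate comes out slightly different from the quoted 11.4%; the function names are illustrative only.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values."""
    return (end / start) ** (1 / years) - 1


def project(start: float, rate: float, years: int) -> float:
    """Value after compounding `rate` for `years` periods."""
    return start * (1 + rate) ** years


# Revenue in millions of US dollars, taken from the quote above.
implied_rate = cagr(860.3, 1500.0, 5)
projected_2012 = project(860.3, 0.114, 5)

print(f"implied rate from rounded endpoints: {implied_rate:.1%}")
print(f"2012 revenue at 11.4% per year: ${projected_2012:,.0f} million")
```

Compounding 11.4% for five years lands a little under $1.5 billion, which is consistent with the forecast figure having been rounded.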
Mind map
Web 2.0
“Definition”
• Technology trend that enables collaboration
and sharing between users
Web 1.0 vs Web 2.0
• DoubleClick --> Google AdSense
• Ofoto --> Flickr
• Akamai --> BitTorrent
• mp3.com --> Napster
• Britannica Online --> Wikipedia
• personal websites --> blogging
• evite --> upcoming.org and EVDB
• domain name speculation --> search engine optimization
• page views --> cost per click
• screen scraping --> web services
• publishing --> participation
• content management systems --> wikis
• directories (taxonomy) --> tagging ("folksonomy")
• stickiness --> syndication
Web 1.0 vs Web 2.0
• Document Collaboration

Document attached in E-mail. Documents in Google Docs.


Web 1.0 vs Web 2.0
• Web Browsing

Direct Domain Name Search Engine


Web 1.0 vs Web 2.0
• Overall Organization and Categorization

Category/Directory Listing Tagging


Web 1.0 vs Web 2.0
• Information and Reference

Encyclopedia Online Wikipedia


Web 1.0 vs Web 2.0
• Communication

Mailing Lists Forums


Web 1.0 vs Web 2.0
• Personal Homepages

Geocities personal homepages. Myspace and Facebook homepages.
Web 1.0 vs Web 2.0
• Personal Blogs

AngelFire Wordpress.com or Wordpress.org


Web 1.0 vs Web 2.0
• Peer to Peer File Sharing

Napster Bittorrent
Web 1.0 vs Web 2.0
• Music and Entertainment

Go to the music store. Just download it.


Web 1.0 vs Web 2.0
• Image and Multimedia Sharing

Ofoto (Kodak) or Shutterfly Flickr (Yahoo)


Web 1.0 vs Web 2.0
• ”One-Click” File Hosting

RapidShare MediaFire
MediaFire does not enforce waiting
times for downloads, require
CAPTCHAs, limit simultaneous
downloading, or set bandwidth
limits.
Web 1.0 vs Web 2.0
• Video Sharing

Video file attached to email and sent. YouTube makes it easier.
Web 2.0 Technologies
• CSS for content separation from presentation
• Folksonomies (collective tagging, social classification, social
indexing, social tagging)
• REST (Representational State Transfer), XML
• Rich interfaces based on AJAX
• Semantically correct XHTML and HTML
• Syndication, aggregation and notification via RSS/Atom
• Mash-ups – for content aggregation (e.g. Google Maps + real
estate)
• Blogs for content publishing
• Wikis and forums for content generated in common by
users
Folksonomy
• is a system of classification derived
from the practice and method of
collaboratively creating and managing
tags to annotate and categorize content
(collaborative tagging, social
classification, social indexing, and
social tagging)
• popular on the Web around 2004 as
part of social software applications
such as social bookmarking and
photograph annotation.
• one of the defining characteristics of
Web 2.0 services, allowing users to
collectively classify and find
information
• tag clouds as a way to visualize tags in a
folksonomy
Tag clouds
• Visual depiction of user-
generated tags, or simply the
word content of a site,
typically used to describe the
content of web sites.
• Tags are usually single words
and are normally listed
alphabetically, and the
importance of a tag is shown
with font size or color.
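
The two rules above (alphabetical listing, importance shown by font size) can be sketched as follows. The point-size range and the linear scaling are assumptions chosen for illustration; real tag clouds often use logarithmic scaling or colour instead.

```python
from collections import Counter


def tag_cloud(tags, min_pt=10, max_pt=32):
    """Return (tag, font size) pairs, alphabetical, sized by frequency."""
    counts = Counter(t.lower() for t in tags)
    if not counts:
        return []
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid dividing by zero when all counts are equal
    cloud = []
    for tag in sorted(counts):  # tags are listed alphabetically
        # Linear interpolation between the smallest and largest font size.
        size = min_pt + (counts[tag] - lo) * (max_pt - min_pt) / span
        cloud.append((tag, round(size)))
    return cloud


cloud = tag_cloud(["web", "search", "web", "wiki", "web", "search"])
for tag, size in cloud:
    print(f"{tag}: {size}pt")
```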
Rich Internet application
• web applications that have many of the characteristics of desktop
applications, typically delivered either by way of a site-specific browser,
via a browser plug-in, or independently via sandboxes or virtual
machines.
• Adobe Flash, Java and Microsoft Silverlight are currently the three top
frameworks, with penetration rates around 95%, 80% and 45%
respectively.
• Apple has consistently not allowed the use of the top three RIAs in its
iPhone, iPod or iPad devices and has instead delivered the applications
through its own software and is now promoting HTML5.
• Users generally need to install a software framework using the
computer's operating system before launching the application, which
typically downloads, updates, verifies and executes the RIA
• AJAX -->>Gartner treats them as similar but separate technologies (for
delivering RIA)
Rich Internet application (2)
• Searchability: RIAs present indexing challenges to search engines, however
Adobe Flash content is now at least partially indexable
• Advanced communications with supporting servers can improve the user
experience, for example by using optimised network protocols,
asynchronous I/O and pre-fetching data (as in Google Maps, for example)
• Consistency of user-interface and -experience becomes controllable across
operating systems
• Installation and maintenance of plug-ins, sandboxes or virtual machines,
though required, make applications smaller
• Performance can improve - depending on the application and network
characteristics.
Syndication – RSS/Atom
• RSS - family of web feed formats used to publish frequently updated
works—such as blog entries, news headlines, audio, and video—in a
standardized format.
• They benefit readers who want to subscribe to timely updates from
favored websites or to aggregate feeds from many sites into one place.
• RSS feeds can be read using software called an "RSS reader", "feed
reader", or "aggregator", which can be web-based, desktop-based, or
mobile-device-based.

• Atom applies to a pair of related standards. The Atom Syndication
Format is an XML language used for web feeds, while the Atom
Publishing Protocol (AtomPub or APP) is a simple HTTP-based protocol
for creating and updating web resources.
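
What a feed reader does with an RSS document can be shown with the Python standard library. The feed below is a made-up RSS 2.0 sample (the titles and example.com links are placeholders); a real reader would fetch the XML over HTTP and would also need to handle Atom, dates, and malformed feeds.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed, inlined so the example needs no network access.
FEED = """<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>http://example.com/</link>
    <description>Sample feed</description>
    <item>
      <title>First post</title>
      <link>http://example.com/1</link>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/2</link>
    </item>
  </channel>
</rss>"""


def read_items(xml_text):
    """Return (channel title, [(item title, item link), ...])."""
    root = ET.fromstring(xml_text)
    channel = root.find("channel")
    items = [
        (item.findtext("title"), item.findtext("link"))
        for item in channel.findall("item")
    ]
    return channel.findtext("title"), items


feed_title, items = read_items(FEED)
print(feed_title)
for title, link in items:
    print(" -", title, link)
```

An aggregator is essentially this function applied to many feeds, with the resulting items merged into one list.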
Mashups
• is a web page or application that uses or combines data or functionality
from two or many more external sources to create a new service.
• easy, fast integration, frequently using open APIs and data sources to
produce enriching results that were not necessarily the original reason for
producing the raw source data.
• Types of mashups
– Data mashups combine similar types of media and information from multiple
sources into a single representation.
– Consumer mashups, opposite to the data mashup, combine different data
types, generally visual elements and data from multiple sources.
– Business mashups generally define applications that combine their own
resources, application and data, with other external web services. They focus
data into a single presentation and allow for collaborative action among
businesses and developers.
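
The "maps + real estate" mashup mentioned earlier boils down to joining two independent sources on a shared key. The sketch below uses two hard-coded dictionaries as hypothetical stand-ins for a listings service and a geocoding service; no real API is called.

```python
# Source 1 (hypothetical): property listings from a real-estate site.
LISTINGS = [
    {"address": "1 Main St", "price": 250000},
    {"address": "9 Oak Ave", "price": 410000},
    {"address": "5 Elm Rd", "price": 300000},
]

# Source 2 (hypothetical): address -> (latitude, longitude) from a map service.
GEOCODES = {
    "1 Main St": (40.71, -74.00),
    "9 Oak Ave": (40.73, -73.99),
}


def mash_up(listings, geocodes):
    """Combine both sources into map pins; skip addresses with no coordinates."""
    pins = []
    for listing in listings:
        coords = geocodes.get(listing["address"])
        if coords is None:
            continue
        lat, lon = coords
        label = f'{listing["address"]}: ${listing["price"]:,}'
        pins.append({"lat": lat, "lon": lon, "label": label})
    return pins


pins = mash_up(LISTINGS, GEOCODES)
for pin in pins:
    print(pin)
```

The result is a new service (priced pins on a map) that neither source provides on its own.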
BLOG
• type of website, usually maintained by an individual with regular entries of
commentary, descriptions of events, or other material such as graphics or
video. Entries are commonly displayed in reverse-chronological order.
• Many blogs provide commentary or news on a particular subject; others
function as more personal online diaries. A typical blog combines text,
images, and links to other blogs, Web pages, and other media related to
its topic. The ability of readers to leave comments in an interactive format
is an important part of many blogs.
• Types
– Personal
– Corporate and organizational blogs
– By genre (political, travel, fashion …)
– By media type (video log, link log, photo log)
Wiki
• website that allows the easy creation and editing of any
number of interlinked web pages via a web browser using a
simplified markup language or a WYSIWYG text editor.
• Wikis are typically powered by wiki software and are often
used to create collaborative websites, to power community
websites, for personal note taking, in corporate intranets,
and in knowledge management systems.
• companies use wikis as their only collaborative software
and as a replacement for static intranets, and some schools
and universities use wikis to enhance group learning.
– A wiki invites all users to edit any page or to create new pages within the wiki
Web site, using only a plain-vanilla Web browser without any extra add-ons.
– Wiki promotes meaningful topic associations between different pages by
making page link creation almost intuitively easy and showing whether an
intended target page exists or not.
– A wiki is not a carefully crafted site for casual visitors. Instead, it seeks to involve
the visitor in an ongoing process of creation and collaboration that constantly
changes the Web site landscape.
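
The second point, easy link creation that shows whether the target page exists, can be sketched as below. The `[[PageName]]` link syntax, the CSS class names, and the `/wiki/` URL prefix are assumptions for illustration; actual wiki engines differ in all three.

```python
import re

# Hypothetical set of pages that already exist in the wiki.
PAGES = {"HomePage", "ProjectPlan"}

# Assumed markup: a page name wrapped in double square brackets.
LINK = re.compile(r"\[\[([^\]]+)\]\]")


def render(text, pages):
    """Turn [[PageName]] into HTML links, flagging pages that do not exist yet."""
    def replace(match):
        name = match.group(1)
        css = "exists" if name in pages else "missing"
        return f'<a class="{css}" href="/wiki/{name}">{name}</a>'
    return LINK.sub(replace, text)


html = render("See [[ProjectPlan]] and [[Budget]].", PAGES)
print(html)
```

Styling the "missing" class differently (many wikis use red) invites the reader to create the page by following the link.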
Wiki (2)
• Controlling changes – revision history
• Searching – full text search
• Security – vandalism
• Allow users to glue information via quick-and-easy-to-create
pages containing links to other corporate information
systems, like people directories, CMS, applications, and
thus build up knowledge bases.
Wiki (3) - Enterprise
• Avoiding e-mail overload. Wikis allow all relevant
information to be shared by people working on a given
project. It is also very useful for the project manager to
have all the communication stored in one place, which
allows them to link the responsibility for every action
taken to a particular team member.
• Organizing information. Wikis allow users to structure
new and existing information. As with content, the
structure of data is sometimes also editable by users; see
structured wiki. On the other hand wiki is not strictly
hierarchical which might be a disadvantage in corporate
context.
• Building consensus. Wikis provide a framework for
collaborative writing. Particularly, they allow the
structured expression of views disagreed upon by authors
on the same page.
• Access rights, roles. Users can be forbidden from viewing
and/or editing given pages, depending on their
department or role within the organization.
User-created content
• also known as consumer-generated media (CGM) or user-created content (UCC),
refers to various kinds of media content, publicly available, that are produced by
end-users.
• Its use for a wide range of applications including problem processing, news, gossip
and research reflects the expansion of media production through new
technologies that are accessible and affordable to the general public. All digital
media technologies are included, such as question-answer databases, digital video,
blogging, podcasting, mobile phone photography and wikis.
• Types:
– Discussion boards, Blogs
– Wikis, Social networking sites
– News Sites, Trip planners
– Memories, Mobile Photos & Videos
– Customer review sites, Experience or photo sharing sites
– Audio, Video games
– Maps and location systems
Social networking
• focuses on building and reflecting of social
networks or social relations among people,
e.g., who share interests and/or activities.
A social network service essentially consists
of a representation of each user (often a
profile), his/her social links, and a variety of
additional services.
• Social networking sites allow users to share
ideas, activities, events, and interests
within their individual networks.
• Types:
– contain category places (such as former
school-year or classmates),
– contain means to connect with friends
(usually with self-description pages)
– contain a recommendation system
linked to trust.
Social Networking - Business
model
• Few social networks currently charge money for membership.
• MySpace and Facebook sell online advertising on their site. Hence, they
are seeking large memberships, and charging for membership would be
counterproductive. Some believe that the deeper information that the
sites have on each user will allow much better targeted advertising than
any other site can currently provide.
• Social networks operate under an autonomous business model, in which a
social network's members serve dual roles as both the suppliers and the
consumers of content. This is in contrast to a traditional business model,
where the suppliers and consumers are distinct agents.
• Revenue is typically gained in the autonomous business model via
advertisements, but subscription-based revenue is possible when
membership and content levels are sufficiently high.
SN for Business
• Organizations
– act as a customer relationship management tool for companies selling
products and services
– Companies can also use social networks for advertising in the form of banners
and text ads
• Major uses for businesses and social media:
– to create brand awareness,
– as an online reputation management tool,
– for recruiting,
– to learn about new technologies and competitors,
– as a lead gen tool to intercept potential prospects.
• These companies are able to drive traffic to their own online sites
while encouraging their consumers and clients to have discussions
on how to improve or change products or services.
Social Networking Impact
• Identity
• Privacy - users giving out too much personal information
• E-learning - National School Boards Association (USA) reports that almost
60 percent of students who use social networking talk about education
topics online and, surprisingly, more than 50 percent talk specifically
about schoolwork.
• Young People
• Use of SN in official investigations -
http://en.wikipedia.org/wiki/Use_of_social_network_websites_in_investi
gations
Web 2.0 Business Model
• A marketing tactic combined with a subscription business model

• The free marketing tactic creates distribution


• The premium subscription service makes the money
• To reach profitability freemium requires a large user base that's
inexpensive to support and converts well to a killer set of premium
features
• Usually just 3% to 5% of all users upgrade to paid services but the scale is
such that it can be a profitable business
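
The freemium arithmetic can be made concrete with a small model. All numbers below (user count, price, support cost per user) are hypothetical; only the 3% to 5% conversion range comes from the slide.

```python
def monthly_profit(users, conversion, price, cost_per_user):
    """Premium revenue minus the cost of supporting every user, free or paid."""
    revenue = users * conversion * price
    cost = users * cost_per_user
    return revenue - cost


def break_even_conversion(price, cost_per_user):
    """Share of users that must pay for revenue to cover support costs."""
    return cost_per_user / price


# Hypothetical service: 1,000,000 users, $10/month premium plan,
# $0.25/month to support each user, 4% of users upgrading.
profit = monthly_profit(1_000_000, 0.04, 10.0, 0.25)
needed = break_even_conversion(10.0, 0.25)

print(f"monthly profit: ${profit:,.0f}")
print(f"break-even conversion rate: {needed:.1%}")
```

With these assumed figures the service breaks even at a 2.5% upgrade rate, which is why a user base that is "inexpensive to support" matters as much as the conversion rate itself.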
Freemium Web 2.0 Applications
• Nettuts (developers)
• Pandora (radio, US only now)
• Dropbox (online backup)
• Freshbooks (Online invoicing, time tracking
and expense service)
• Most iPhone apps
• Wordpress.com (blogs)
• Flickr (images sharing)
• Vimeo (video sharing)
Free or Freemium Alternatives to
Popular Paid/Commercial Software
Programs
• Microsoft Office Alternatives
– Google Docs (online office suite, word processing,
spreadsheet functionality and presentations, as
well as stat-tracking forms)
– Open Office
– Neo Office (Mac - open source, and has a few of
the same integration and learning curve issues as
Open Office)
– Zoho (accounting, CRM, wiki, meeting and other
solutions above and beyond an office suite)
Adobe Creative Suite Alternatives
• Seashore - open source alternative specific to
Adobe Photoshop. It doesn’t touch on any of
the vector graphics or web content offerings
that the full Creative Suite handles
• GIMP - offers nearly as many options for
editing as Photoshop does
• InkScape – alternative to Adobe Illustrator
(vector)
Aviary – online picture editor and
more…
• KompoZer is a complete web authoring
system that combines web file management
and easy-to-use WYSIWYG web page editing
• Nvu (Free Web Authoring System)

• Google Sites
Adobe Acrobat Alternatives
• PDFreDirect - view, edit, create and protect
PDFs. The free version offers almost all of the
same functionalities as Adobe Acrobat, and
the paid version is a nice complete substitute
• PDFescape - read, fill out and print PDFs
online directly in the browser + PDFtypewriter
for creating and editing the PDF files using
Java
• PDFedit - edit, read and script PDF files online
Also see
• Open Source Alternative to Commercial
Software - http://www.osalt.com/
• http://alternativeto.net/
Q/A
• Questions, please?
Nupur Choudhury / (IJCSIT) International Journal of Computer Science and Information Technologies, Vol. 5 (6) , 2014, 8096-8100

World Wide Web and Its Journey from Web 1.0 to


Web 4.0
Nupur Choudhury
Department of Computer Science and Engineering.
Sikkim Manipal Institute of Technology.

Abstract- The rapid development of the Web is regarded as an outright
phenomenon in today’s society, incorporating modern innovative technology and
redefining the way of organizing, communicating and collaborating with
individuals, which in turn has led to a mixture of spectacular successes and
failures. The purpose of this paper is to understand and conceptualize the
evolution of the Web from scratch to the upcoming trends in the field of
Web Technology.

Keywords: Web 1.0, Web 2.0, Web 3.0, Web 4.0, characteristics, Limitation,
Architecture.

I. INTRODUCTION
In today’s era, Web Technology can be described by users in many
different ways. But as a matter of fact, many users do not know where the
WWW originated. As this paper describes the evolution of the Web, it is
important to begin the story where it first started.
The Web was introduced by Tim Berners-Lee in late 1989 [9][10]. His view
of the capabilities of the World Wide Web was expressed by three
innovations, typically associated with three phases: namely, the Web of
documents (Web 1.0), the Web of people (Web 2.0) and the Web of data (the
still-to-be-realised Web 3.0) [11]. Through its life cycle, the World Wide
Web has been through various phases of development. Going by the trend of
constant evolution, the Web is now slowly but surely transitioning to a more
data-centric phase in the context of Web version 3.0 [7].
This paper is structured to classify the nature of Web 1.0 and to project
the prospective characteristics of Web 2.0, with the added dimensions of
the Web 3.0 semantic frameworks, whilst its scope is directed to explore a
stronger appreciation of the architectural foundations of the next
generation of Web applications, Web 4.0. This paper attempts to build a
user-centric view of the composition of features that would be expected to
be incorporated in future generations of Web technology. In sum, the paper
presents a holistic view of the World Wide Web.

II. WORLD WIDE WEB
The World Wide Web is a system of interlinked hypertext documents
accessed via the Internet [21]. With a web browser, one can view web pages
that may contain text, images, videos, and other multimedia and navigate
between them via hyperlinks. On March 12, 1989, Tim Berners-Lee, a British
computer scientist and former CERN employee, wrote a proposal for what would
eventually become the World Wide Web [1]. The 1989 proposal was meant for a
more effective CERN communication system, but Berners-Lee eventually realised
the concept could be implemented throughout the world.
Berners-Lee and Belgian computer scientist Robert Cailliau proposed in
1990 to use hypertext “to link and access information of various kinds as a
web of nodes in which the user can browse at will” [22]. In this way the
first web service was designed and tested and later coined as the World Wide
Web.

III. WEB 1.0
Web 1.0 was the first implementation of the web and it lasted from 1989
to 2005. It was defined as a web of information connections. Tim
Berners-Lee, the inventor of the World Wide Web, considers it the
“read-only” Web [1]. It provided very little interaction: consumers could
exchange information, but it was not possible to interact with the website.
The role of the web was very passive in nature.
Web 1.0 is referred to as the first generation of the World Wide Web,
which was basically defined as
“It is an information space in which the items of interest, referred to as
resources, are identified by global identifiers called Uniform Resource
Identifiers (URIs)”.
The first-generation Web was an era of static pages, serving content
delivery purposes only. In other words, the early web allowed us to search
for information and read it. There was very little in the way of user
interaction or content contribution.
A. CHARACTERISTICS
Web 1.0 technologies include the core web protocols: HTML, HTTP and URI.
The major characteristics of Web 1.0 are as follows:
• They have read-only content.
• Establish an online presence and make their information available to
anyone at any time.
• It includes static web pages and uses basic Hypertext Mark-up Language.
B. LIMITATION
The major limitations of Web 1.0 are as follows:
• The Web 1.0 pages can only be understood by humans (web readers); they do
not have machine-compatible content.
• The web master is solely responsible for updating users and managing the
content of the website.

• Lack of dynamic representation, i.e., only static information could be
acquired; no web console was available for performing dynamic events.

IV. WEB 2.0
Web 2.0 is the second generation of the web. It was defined by Dale
Dougherty in 2004 as a read-write web [1]. The concept began with a
conference brainstorming session between O’Reilly and MediaLive
International. The technologies of Web 2.0 allow assembling and managing
large global crowds with common interests in social interactions.
Tim O’Reilly defines Web 2.0 on his website as follows [8]:
“Web 2.0 is the business revolution in the computer industry caused by the
move to the internet as platform, and an attempt to understand the rules for
success on that new platform. Chief among those rules is this: Build
applications that harness network effects to get better the more people use
them.”
Web 2.0 facilitates participatory, collaborative, and distributed
practices, which enable formal and informal spheres of daily activities
going on on the web. In other terms, the major distinct characteristics of
Web 2.0 include “relationship” technologies, participatory media and social
digital technology, which in turn can also be defined as the wisdom web. It
is a people-centric and participative web, which facilitates reading and
writing on the web and makes web transactions bi-directional.
Web 2.0 is a web as a platform where users can leave many of the controls
they have used in Web 1.0. In other words, the user of Web 2.0 has more
interaction with less control. Web 2.0 is not only a new version of Web 1.0;
it also implies flexible web design, creative reuse, updates, and
collaborative content creation and modification. One of the outstanding
features of Web 2.0 is its support for collaboration and for gathering
collective intelligence, in contrast to Web 1.0.

Fig. 1 Comparison Web1.0 & Web 2.0 [28]

A. CHARACTERISTICS
Web 2.0 is instead a label coined by Tim O’Reilly and associates to
reference the transition of the World Wide Web to a new phase of use and
service development [17]. The categorization can be used to elaborate on the
understanding of Web 2.0 achieved through varied definitions.
• Technology Centric Definition:
The Web has become a platform with software above the level of a single
device. Technology that is associated with blogs, wikis, podcasts, RSS feeds
etc.
• Business Centric Definition:
A way of architecting software and businesses. The business revolution in
the computer industry caused by the move to the internet as platform and an
attempt to understand the rules for success on that new platform.
• User Centric Definition:
The Social Web is often used to characterize sites that consist of
communities. It is all about content management and new ways of
communication and interaction between users. Web applications that
facilitate collective knowledge production, social networking and increased
user-to-user information exchange.
B. LIMITATION
Even when a new technology meets the expectations of users at large, it may
still face many consequences from the external environment that suppress or
limit it, and this may degrade the performance of the technology as a whole.
• Constant iteration cycle of change and updates to services [11].
• Ethical issues concerning the build and usage of Web 2.0 [11].
• Interconnectivity and knowledge sharing between platforms across community
boundaries are still limited [12] [15].

V. WEB 3.0
Web 3.0 is one of the modern and evolutionary topics associated with the
initiatives following Web 2.0. The term Web 3.0 was first coined by John
Markoff of the New York Times, who suggested it as the third generation of
the web in 2006 [18]. Web 3.0 can also be described as the “executable Web”.
The basic idea of Web 3.0 is to define structured data and link them for
more effective discovery, automation, integration, and reuse across various
applications [6]. It is able to improve data management, support
accessibility of mobile internet, stimulate creativity and innovation,
encourage the globalization phenomenon, enhance customers’ satisfaction and
help to organize collaboration in the social web.

Nupur Choudhury / (IJCSIT) International Journal of Computer Science and Information Technologies, Vol. 5 (6) , 2014, 8096-8100

Web 3.0 is also known as the semantic web. The semantic web was thought up by Tim Berners-Lee,
inventor of the World Wide Web [1]. A dedicated team at the World Wide Web Consortium (W3C) is
working to improve, extend and standardize the system, and languages, publications and tools have
already been developed [3]. Web 3.0 is a web where the concept of website or webpage disappears,
where data isn’t owned but instead shared, and where services show different views of the same web
or the same data. Those services can be applications (like browsers, virtual worlds or anything
else), devices or other things; they have to be focused on context and personalization, and both
will be reached by using vertical search [13].
Web 3.0 supports a world wide database and a web-oriented architecture, whereas the web in its
earlier stage was described as a web of documents. The web of documents deals mainly with static
HTML documents, but dynamically rendered pages and alternative formats should follow the same
conceptual layout standards whenever possible. It was designed for human consumption: its primary
objects are documents, and links are between documents (or parts of them). The semantics of
content and links are implicit, and the degree of structure between objects is fairly low [19].
Figure 2 shows the structure of the web of documents in simplified form [19].

Fig. 2 Web of Document [20].

The proponents of the Web of Data envision much of the world's data being interrelated and openly
accessible to the general public. This vision is analogous in many ways to the Web of Documents of
common knowledge, but instead of making documents and media openly accessible, the focus is on
making data openly accessible. The Web of Data hosts a variety of data sets that include
encyclopaedic facts, drug and protein data, metadata on music, books and scholarly articles,
social network representations, geospatial information, and many other types of information. In
some ways it is like a global database that includes most of its features: the semantics of
content and links are explicit, and the degree of structure between objects is high, based on the
RDF model. Fig. 3 shows the structure of the web of data in simplified form [14].

Fig. 3 Web of Data [20].

A. SEMANTIC WEB
The Semantic Web is a collaborative movement led by the international standards body, the World
Wide Web Consortium. According to the W3C [4], “The Semantic Web provides a common framework that
allows data to be shared and reused across application, enterprise, and community boundaries”.

The main purpose of the Semantic Web is driving the evolution of the current Web by enabling users
to find, share and combine information more easily. The Semantic Web, as originally envisioned, is
a system that enables machines to “understand” and respond to complex human requests based on
their meaning. Such an “understanding” requires that the relevant information sources be
semantically structured.
Tim Berners-Lee originally expressed the Semantic Web as follows [2]:
“If HTML and the Web made all the online documents look like one huge book, RDF, schema, and
inference languages will make all the data in the world look like one huge database”.
Tim Berners-Lee proposed a layered architecture for the semantic web that is often represented
using a diagram, with many variations since.

Fig. 4 Semantic Web layered architecture [5]

The development of the Semantic Web proceeds in steps, each step building a layer on top of
another. Figure 4 shows the “layer cake” of the Semantic Web, which describes the main layers of
the Semantic Web design and vision [5].
• Unicode and URI: Unicode is used to represent any character uniquely, whatever language it is
written in, and Uniform Resource Identifiers (URIs) are unique identifiers for resources of all
kinds. The functionality of Unicode and URI can be described as the provision of a unique
identification mechanism within the language stack for the semantic web [20].
• XML: It is a language that lets one write structured Web documents with a user-defined
vocabulary. XML is particularly suitable for sending documents across the Web. XML has no built-in
mechanism to convey the meaning of the user’s new tags to other users.
• RDF: Resource Description Framework is a basic data model, like the entity-relationship model,
for writing simple statements about Web objects (resources). A scheme for defining

information on the Web. RDF provides the technology for expressing the meaning of terms and
concepts in a form that computers can readily process.
• RDF Schema: It provides a predefined, basic type system for RDF models. RDF Schema provides
modeling primitives for organizing Web objects into hierarchies. Key primitives are classes and
properties, subclass and subproperty relationships, and domain and range restrictions.
• Ontology: The ontology layer describes properties and the relations between them. An ontology
can be defined as a collection of terms used to describe a specific domain with the ability of
inference.
• Logic layer: It is used to enhance the ontology language further and to allow the writing of
application-specific declarative knowledge.
• Proof layer: It involves the actual deductive process as well as the representation of proofs in
Web languages (from lower levels) and proof validation.
• Trust layer: It will emerge through the use of digital signatures and other kinds of knowledge
based on recommendations by trusted agents or on rating and certification agencies and consumer
bodies.
The semantic web is not limited to publishing data on the web. It is about making links to connect
related data. In 2007 Berners-Lee introduced a set of rules, which have become known as the Linked
Data principles, for publishing and connecting data on the web [16]:
• Use URIs as names for things
• Use HTTP URIs to look up those names
• Provide useful information, using the standards (RDF), when someone looks up a URI
• Include links to other URIs to discover more things
Data providers can add their data to a single global data space by publishing data on the web
according to the Linked Data principles.

B. CHARACTERISTICS
The major characteristics of Web 3.0 as marked by Nova Spivack are [18]:
• SaaS Business Model.
• Open Source Software Platform.
• Distributed Database, or what is called “The World Wide Database”.
• Web Personalization.
• Resource Pooling.
• Intelligent Web.

C. CHALLENGES
The Semantic Web faces several challenging issues, such as:
• Vastness: The World Wide Web contains many billions of pages. Data may be redundant, because it
has not yet been possible to eliminate all semantically duplicated terms.
• Vagueness: This arises from the vagueness of user queries, of concepts represented by content
providers, of matching query terms to provider terms and of trying to combine different knowledge
bases with overlapping but subtly different concepts.
• Inconsistency: These are logical contradictions which will inevitably arise during the
development of large ontologies, and when ontologies from separate sources are combined.
• Deceit: This is when the producer of the information is intentionally misleading the consumer of
the information.

VI. COMPARISON
The main difference between Web 1.0, Web 2.0 and Web 3.0 is that Web 1.0, considered the
read-only web, targets the content creativity of producers; Web 2.0 targets the content creativity
of users and producers; and Web 3.0 targets linked data sets. A few comparative differences
between Web 1.0, Web 2.0 and Web 3.0 are given below:

WEB 1.0                   | WEB 2.0                       | WEB 3.0
1996 – 2004               | 2004 – 2016                   | 2016+
The Hypertext Web         | The Social Web                | The Semantic Web
Tim Berners Lee           | Tim O’Reilly, Dale Dougherty  | Tim Berners Lee
Read Only                 | Read and Write Web            | Executable Web
Millions of Users         | Billions of Users             | Trillions+ of Users
Echo System               | Participation and Interaction | Understanding self
One Directional           | Bi-Directional                | Multi-user Virtual environment
Companies Publish Content | People Publish Content        | People build applications through which people interact and publish content.
Static content.           | Dynamic content.              | Web 3.0 is curiously undefined. AI and 3D, The web learning
Personal Websites         | Blog and Social Profile       | SemiBlog, Haystack.
Message Board             | Community portals             | Semantic Forums
Buddy List, Address Book  | Online Social networks.       | Semantic Social Information

Table 1. Comparison of Web 1.0, Web 2.0 and Web 3.0

VII. WEB 4.0 AND FUTURE WEB
Web 4.0 can be considered as an Ultra-Intelligent Electronic Agent, symbiotic web and Ubiquitous
web [25]. Interaction between humans and machines in

symbiosis was the motive behind the symbiotic web. Web 4.0 is expected to be as powerful as the
human brain, building on progress in the development of telecommunications, advances in
nanotechnology and controlled interfaces. In simple words, machines would be clever at reading the
contents of the web, and would react by executing and deciding what to execute first in order to
load websites fast with superior quality and performance and to build more commanding interfaces
[24].
Web 4.0 will be a read-write concurrency web [23]. It ensures global transparency, governance,
distribution, participation and collaboration in key communities such as industrial, political,
social and other communities. WebOS will be a kind of middleware that starts functioning like an
operating system [26]. WebOS will be parallel to the human brain and implies a massive web of
highly intelligent interactions [27].

VIII. CONCLUSION
This paper provided an overview of the evolution of the web. Web 1.0, Web 2.0, Web 3.0 and Web 4.0
were described as four generations of the web. The characteristics of the generations were
introduced and compared. It is concluded that the web as an information space has made much
progress since 1989 and is moving toward using artificial intelligence techniques to become a
massive web of highly intelligent interactions in the near future.

REFERENCES
[1] Tim Berners-Lee, “The World Wide Web: A very short personal history”,
http://www.w3.org/People/Berners-Lee/ShortHistory.html, 1998.
[2] Berners-Lee, Tim; Fischetti, Mark, “Weaving the Web”, Harper San Francisco, chapter 12,
ISBN 978-0-06-251587-2, 1999.
[3] Sean B. Palmer, “The Semantic Web: An Introduction”, http://infomesh.net/2001/swintro/, 2001.
[4] W3C Semantic Web Activity, http://www.w3.org/2001/sw/, World Wide Web Consortium, 2001.
[5] Jane Greenberg, Stuart Sutton & D. Grant Campbell, “Metadata: A Fundamental Component of the
Semantic Web”, Bulletin of the American Society for Information Science and Technology, Volume 29,
Issue 4, pages 16–18, 2003.
[6] Ossi Nykänen, “Semantic Web: Definition”,
http://www.w3c.tut.fi/talks/2003/0331umediaon/slide6-0.html, 2003.
[7] Motta, E., & Sabou, M., “Next Generation Semantic Web Applications”, Heidelberg,
Springer-Verlag Berlin, pp. 24-29, 2006.
[8] O’Reilly, Definition of Web 2.0,
http://radar.oreilly.com/archives/2006/12/web-20-compact-definition-tryi.html, 2006.
[9] Brian Getting, “Basic Definitions: Web 1.0, Web 2.0, Web 3.0”, 2007,
http://www.practicalecommerce.com/articles/464-Basic-Definitions-Web-1-0-Web-2-0-Web-3-0
[10] Maged N. Kamel Boulos & Steve Wheeler, “The emerging Web 2.0 social software: an enabling
suite of sociable technologies in health and health care education”, Health Information and
Libraries Journal, pp. 2-23, 2007.
[11] Anderson, P., “‘All That Glisters Is Not Gold’ -- Web 2.0 And The Librarian”, Journal of
Librarianship and Information Science, 39 (4), pp. 195–198, 2007.
[12] Abel, F., Frank, M., Henze, N., Krause, D., Plappert, D., & Siehndel, P., “Group Me! - Where
Semantic Web meets Web 2.0”, 2007.
[13] Mind Booster Noori, “What is Web 3.0?”,
http://mindboosternoori.blogspot.com/2007/08/what-is-web-30.html, 2007.
[14] Tim Berners-Lee, Christian Bizer, Tom Heath & Kingsley Idehen, “Linked Data on the Web”,
17th International World Wide Web Conference, 2008.
[15] Chan, C. K., Lee, Y. C., & Lin, V., “Harnessing Web 2.0 for Collaborative Learning”,
Springerlink, 2009.
[16] Christian Bizer, Tom Heath & Tim Berners-Lee, “Linked Data - The Story So Far”, Journal
Semantic Web and Information Systems, 2009.
[17] Harrison, T. M., & Barthel, B., “Wielding new media in Web 2.0: exploring the history of
engagement with the collaborative construction of media products”, New Media & Society, 11(1&2),
pp. 155–178, 2009.
[18] Nova Spivack, “Web 3.0: The Third Generation Web is Coming”, http://lifeboat.com/ex/web.3.0,
2011.
[19] Sareh Aghaei, Mohammad Ali Nematbakhsh and Hadi Khosravi Farsani, “Evolution of the World Wide
Web: From Web 1.0 to Web 4.0”, Computer Engineering Department, University of Isfahan, Isfahan,
Iran, International Journal of Web & Semantic Technology (IJWesT) Vol.3, No.1, pp. 1-10, 2012.
[20] Patel et al., International Journal of Advanced Research in Computer Science and Software
Engineering 3(10), pp. 410-417, 2013.
[21] W3C, “World Wide Web Consortium”, http://www.w3.org.
[22] World Wide Web: Proposal for a HyperText Project (http://www.w3.org/Proposal.html).
[23] “Web 4.0 - A New Web Technology”,
http://website-quality.blogspot.com/2010/01/web-40-new-webtechnology.html/, Hemnath (2010).
[24] On2broker: Lessons Learned from Applying AI to the Web, Dieter Fensel, Jürgen Angele, Stefan
Decker, Michael Erdmann, Hans-Peter Schnurr, Rudi Studer and Andreas Witt, Institute AIFB,
University of Karlsruhe, D-76128 Karlsruhe, Germany, dfe@aifb.uni-karlsruhe.de,
http://www.aifb.uni-karlsruhe.de/~dfe.
[25] Jonathan Fowler and Elizabeth Rodd (2013), Web 4.0: The Ultra-Intelligent Electronic Agent is
http://bigthink.com/big-think-tv/web-40-the-ultra-intelligent-electronic-agent-is-coming.
[26] Ron Callari, “Web 4.0, Trip Down the Rabbit Hole or Brave New World?”,
http://www.zmogo.com/web/web-40trip-down-the-rabbit-hole-or-brave-new-world/.
[27] Dan Farber (2007), “From semantic Web (3.0) to the WebOS (4.0)”,
http://www.zdnet.com/blog/btl/from-semantic-web-30-to-the-webos-40/4499/.
[28] Flat World Business, “Web 1.0 vs Web 2.0 vs Web 3.0 vs Web 4.0 – A bird’s eye on the
evolution and definition”,
http://flatworldbusiness.wordpress.com/flat-education/previously/web-1-0-vs-web-2-0-vs-web-3-0-a-bird-eye-on-the-definition/
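To make the RDF layer and the Linked Data principles described in the Semantic Web section of the paper above more concrete, the following Python sketch stores statements as subject-predicate-object triples and follows links from one URI to others. It is a minimal illustration under stated assumptions, not an RDF library: the `Triple` type, the `match` and `follow_links` helpers, and the example.org URIs are all hypothetical names invented for this example.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical example.org URIs, used only for illustration.
EX = "http://example.org/"
TRIPLES = [
    Triple(EX + "TimBL", EX + "invented", EX + "WorldWideWeb"),
    Triple(EX + "TimBL", EX + "name", "Tim Berners-Lee"),
    Triple(EX + "WorldWideWeb", EX + "standardizedBy", EX + "W3C"),
]


def match(triples, subject: Optional[str] = None,
          predicate: Optional[str] = None, obj: Optional[str] = None):
    """Return the triples that agree with every field that is not None."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


def follow_links(triples, start: str):
    """Collect every URI reachable from `start` by following object links."""
    seen, frontier = set(), [start]
    while frontier:
        uri = frontier.pop()
        if uri in seen:
            continue
        seen.add(uri)
        for t in match(triples, subject=uri):
            # Only objects that are themselves URIs can be looked up further;
            # plain literals such as a name are the end of the path.
            if t.obj.startswith("http://"):
                frontier.append(t.obj)
    return seen
```

Using URIs as names is what lets `follow_links` discover more things: each object that is itself a URI can be looked up in turn, which is the fourth Linked Data principle in miniature.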
Intranet, Extranet, Portal
Intranet
• An intranet is a network within an
organization that links together users by
means of Internet technologies.
• An intranet limits the Internet territory by
establishing access-controlled zones where
users may communicate and interact freely.
• These networks are based on WWW technologies, so users
may communicate in real time across
platforms.
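One simple way to picture an access-controlled zone is a check on the client's network address. The sketch below uses Python's standard `ipaddress` module; the two private address ranges and the `in_intranet` helper are assumptions chosen for illustration, and a real intranet would normally combine such a check with user authentication.

```python
import ipaddress

# Assumed private address ranges that make up the intranet zone.
INTRANET_ZONES = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def in_intranet(client_ip: str) -> bool:
    """Return True if the client address falls inside any intranet zone."""
    address = ipaddress.ip_address(client_ip)
    return any(address in zone for zone in INTRANET_ZONES)
```

A web server could call `in_intranet` before serving internal pages and refuse requests that come from outside the zone.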
Intranet (cont.)
• An intranet is useful for organizations that:
– Are geographically dispersed;
– Share common business objectives;
– Have common information needs;
– Value collaboration.
• An Intranet may have 3 functionality levels:
– Displaying general, static information;
– Sharing data – used for managing dynamic data within an
organization;
– Interactive communications – real-time collaboration and
creating a secured platform for interactive communication
within an organization.
An Intranet may be used for:
• Displaying the goal and scope of the organization;
• Publishing online manuals and procedures;
• Creating internal forums and bulletin boards;
• Displaying a digital phone book and a personnel catalogue;
• Event calendars for the events within the organization;
• Search engine for documents;
• Displaying news from within the organization and from the
outside world;
• Lists of articles written by partners;
• List of clients and contact information database;
• Marketing and price information for products together with
their catalogue.
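One of the uses listed above is a search engine for documents. A common way to build one is an inverted index, which maps each word to the documents that contain it. The sketch below is a minimal version under simplifying assumptions (whitespace tokenization, exact word matches only); the function names and the two sample documents are hypothetical.

```python
from collections import defaultdict


def build_index(documents: dict[str, str]) -> dict[str, set[str]]:
    """Map each lower-cased word to the set of document names containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, text in documents.items():
        for word in text.lower().split():
            index[word.strip(".,;:!?")].add(name)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the documents that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = [index.get(word, set()) for word in words]
    return set.intersection(*results)


# Hypothetical intranet documents.
DOCS = {
    "travel-policy": "Travel expenses must be approved by a manager.",
    "hr-manual": "The manual describes leave, benefits and expenses.",
}
INDEX = build_index(DOCS)
```

For example, `search(INDEX, "travel expenses")` narrows the result to the one document that contains both words.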
Advantages for using an Intranet
• Workforce productivity: Intranets can help employees to quickly find and view information
and applications relevant to their roles and responsibilities. Via a simple-to-use web browser
interface, users can access data held in any database the organization wants to make
available, anytime and - subject to security provisions - from anywhere, increasing
employees' ability to perform their jobs faster, more accurately, and with confidence that
they have the right information.
• Time: With intranets, organizations can make more information available to employees on a
"pull" basis (i.e., employees can link to relevant information at a time which suits them) rather
than deluging them indiscriminately with emails.
• Communication: Intranets can serve as powerful tools for communication within an
organization, vertically and horizontally.
• Web publishing: Allows 'cumbersome' corporate knowledge to be maintained and easily
accessed throughout the company using hypermedia and Web technologies. Examples
include employee manuals, benefits documents, company policies, business standards,
newsfeeds, and even training, all of which can be accessed using common Internet standards (Acrobat
files, Flash files, CGI applications). Because each business unit can update the online copy of a
document, the most recent version is always available to employees using the intranet.
• Business operations and management: Intranets are also being used as a platform for
developing and deploying applications to support business operations and decisions across
the internetworked enterprise.
Disadvantages
• Publication of information must be controlled
to ensure only correct and appropriate
information is provided in the intranet; this can
be combined with workflow solutions to
address the problem.
• Appropriate security permissions must be in
place to ensure there are no concerns over
who accesses the intranet or abuse of the
intranet by users.
Examples
• KPMG moved all of its information assets to an intranet called
KWorld.
• “The success of Cisco Systems has been largely attributed to its
innovative corporate intranet”
• Ford Motor Co has more than 175,000 employees in 950 locations
worldwide, each of whom had access to the company’s intranet,
called Myford.com. The intranet gave employees information about
benefits, demographics, salary history, general company news and
human resources forms.
• ShoreBank's branch, affiliate, and consulting service employees
around the world communicate and collaborate using SIREN. SIREN
is an intranet, extranet, and knowledge management solution
implemented in 2006 using Intranet DASHBOARD.
• The Australian National University uses an Intranet called
Claromentis to maintain one of its external sites.
Links
• http://www.intranetjournal.com
• http://www.intranetblog.com
• http://www.steptwo.com.au/columntwo/
• http://www.eaber.org
• http://b-r-ent.com
• http://www.sorce.biz/whitepaperindex.asp
• http://www.s-development.net/blogs/
• http://www.intranetmaturity.com/
• http://www.theworkplaceblog.com/
Extranet
• It is a Web site with controlled access whose visitors may come
from outside the organization
• Examples:
– Sales extranets allow the owners to publish special content for
important clients or for those that prospect the market.
– B2B/e-commerce/virtual stores extranets for selected clients.
– Extranets for project management or collaborative extranets allow
sharing documents, plans and electronic goods among partners.
– In 2003 in the United Kingdom, several of the leading vendors formed
the Network of Construction Collaboration Technology Providers, or
NCCTP, to promote the technologies and to establish data exchange
standards between the different systems
• An extranet uses the features and the goals of an intranet,
extending them beyond the borders of an organization
Extranet usage
• Sharing up-to-date documents, files and images with
suppliers, partners and clients from different locations;
• Working in collaboration for editing, revision, updating,
versioning and storing documents and digital goods;
• Managing projects among partners from a single location;
• Sharing updated versions of frequently updated
documents: sales reports, stock summaries, product
specifications, design documents, production planning etc.
• Access for partners to back-office functions such as stock
management, warranty information, dates for new
products, shared sales etc.
Advantages
• Extranets can improve organization productivity by automating
processes that were previously done manually (e.g.: reordering of
inventory from suppliers). Automation can also reduce the margin
of error of these processes.
• Extranets allow organization or project information to be viewed at
times convenient for business partners, customers, employees,
suppliers and other stake-holders. This cuts down on meeting times
and is an advantage when doing business with partners in different
time zones.
• Information on an extranet can be updated, edited and changed
instantly. All authorised users therefore have immediate access to
the most up-to-date information.
• Extranets can improve relationships with key customers, providing
them with accurate and updated information
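The first advantage above mentions automating the reordering of inventory from suppliers. A minimal sketch of that idea follows, assuming each stock item carries a reorder point and a fixed reorder quantity; the `StockItem` fields and the `build_reorders` helper are hypothetical names, and a real extranet would send the resulting orders to the supplier's system.

```python
from dataclasses import dataclass


@dataclass
class StockItem:
    sku: str
    on_hand: int
    reorder_point: int     # reorder when stock falls to or below this level
    reorder_quantity: int  # how many units to order each time


def build_reorders(stock: list[StockItem]) -> dict[str, int]:
    """Return {sku: quantity} for every item at or below its reorder point."""
    return {
        item.sku: item.reorder_quantity
        for item in stock
        if item.on_hand <= item.reorder_point
    }
```

Because the rule is applied the same way every time, the margin of error of a manual stock check is removed.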
Disadvantages
• Extranets can be expensive to implement and maintain
within an organisation (e.g.: hardware, software, employee
training costs) — if hosted internally instead of via an
Application Service Provider;
• Security of extranets can be a big concern when dealing
with valuable information. System access needs to be
carefully controlled to avoid sensitive information falling
into the wrong hands.
• Extranets can reduce personal contact (face-to-face
meetings) with customers and business partners. This could
cause a lack of connections made between people and a
company, which hurts the business when it comes to
loyalty of its business partners and customers.
Portals
• Definition: A portal is a single point of access to information which:
– comes from various logically linked Internet-based applications and
– is of interest to various types of users
• Site on the World Wide Web that typically provides personalized
capabilities to its visitors, providing a pathway to other content.
– It is designed to use distributed applications,
– different numbers and types of middleware and hardware to provide
services from a number of different sources.
– In addition, business portals are designed to support sharing and collaboration in
workplaces.
– A further business-driven requirement of portals is that the content be
able to work on multiple platforms such as personal computers,
personal digital assistants (PDAs), and cell phones.
Advantages
• advantages of using portals:
– intelligent integration and access to enterprise
content, applications and processes
– improved communication and collaboration
among customers, partners, and employees
– unified, real-time access to information held in
disparate systems
– personalized user interactions
– rapid, easy modification and maintenance of the
website presentation
Types of Portals
• International portals (Yahoo!)
• Regional portals [(MswPower.Com), China (Sina.com), Italy (Webplace.it)]
– local information such as weather forecasts, street maps and local business
information
• Government portals:
– USA.gov,
– DisabilityInfo.gov
– Directgov (UK) – for citizens
– businesslink.gov.uk (UK) – for legal entities (businesses)
• Corporate/Enterprise portals
• Hosted web portals
– as corporate portals gained popularity, a number of companies began offering
them as a hosted service
– Hyperoffice.com, OFFICEHQ.com
• Domain specific portals – for all types of business
Other types of portals
• Entertainment Portals: Often all members of an entertainment portal are responsible for its
content and direct the type of entertainment that is available to visitors to the site. An
example of one such portal is the South African Music and Entertainment portal Overtone.
These can be an essential part of community based networking and collaboration
• Environmental Portals: In recent years, many environmental portals have been developed in
order to raise awareness about environmental indicators. One such example is EUSOILS.
• Investment Portals: These are an excellent resource when researching global and industry
specific markets.
• B2B and B2C Portals: B2B or Business to Business Portals have become a very important
resource for Global business. They provide buyer and seller details for different commodities
and products and help in connecting businesses across the globe. A B2B portal that
specializes in a single industry is called a Vertical B2B Portal or a Vortal. B2C or Business to
Consumer portals are used to directly sell products to consumers
• Mini Portals: Some localized portals are based on local interests, and edited and maintained
by individuals. While they do not provide the same levels of services as major portals, they
are a good place for people with common interests to collaborate on ideas.
• Voice Portals: In addition to standard web sites accessed through web browsers, people can
also access "voice sites" through "voice browsers". Destinations accessed in this way by
standard telephones are often called Voice Portals.
Enterprise portal
• Definition:
– Enterprise Information Portals are one of the most
popular ways in which enterprises can allow their
employees and customers to search and access
corporate information.
– It is a single gateway for users, such as employees,
customers and the company’s partners, to log into and
retrieve corporate information, company history
and other services or resources.
– It is a web portal for use within an organization.
Example of an EIP Architecture
Example of an EIP Architecture 2
Portal features - 1
1. Web interface;
2. Presentation services (user interface
management);
3. External data access mechanisms;
4. Data access management;
5. Security, authentication and personalization;
6. Tools for portal development;
7. Portal administrative and management tools.
Portal features - 2
• Content and document management — services that support the full life cycle of content and document
creation and provide mechanisms for authoring, approval, version control and scheduled publishing. Some
portal solutions providers aim to remove the need for a third-party content management system.
• Collaboration — portal members can communicate synchronously (through chat or messaging) or
asynchronously through threaded discussion and email digests (forums) and blogs.
• Search & Navigation — Content is meant to be read, so on the usage side of the equation, being able to
find and retrieve targeted content is the essential task. As more content is added to repositories, the more
valuable those repositories become. Unfortunately, retrieving useful information becomes more difficult
as the volume of information grows unless effective search and navigation methods are employed.
• Personalization — the ability for portal members to subscribe to specific types of content and services.
Users can customize the look and feel of their environment. Customers who are using EIPs can edit and
design their own web sites to reflect their own personality and style; they can also choose the
specific content and services they prefer. Examples include My Yahoo and MSN.
• Entitlement / Security — the ability for portal administrators to limit the specific types of content and
services users have access to. For example, a company's proprietary information can be entitled for
company employee access only.
• Integration — the connection of functions and data from multiple systems into new components/portlets.
• Single sign-on (SSO) — many enterprise portals provide single sign-on capabilities between their users and
various other systems. This requires a user to authenticate only once. Access control lists manage the
mapping between portal content and services over the portal user base.
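The entitlement and single sign-on features above can be sketched together: the user authenticates once and receives a token, and every portal service then checks that token against an access control list. This is a simplified illustration only. The `ACL` contents, the in-memory `SESSIONS` store and both helper functions are assumptions, and the credential check itself is omitted.

```python
import secrets

# Hypothetical access control list: content item -> roles allowed to see it.
ACL = {
    "company-news": {"employee", "partner", "customer"},
    "salary-history": {"employee"},
}

SESSIONS: dict[str, str] = {}  # token -> role, filled in at sign-on


def sign_on(role: str) -> str:
    """Authenticate once (credential check omitted) and return a token."""
    token = secrets.token_hex(16)
    SESSIONS[token] = role
    return token


def can_access(token: str, content: str) -> bool:
    """Any portal service can reuse the same token to check entitlement."""
    role = SESSIONS.get(token)
    return role is not None and role in ACL.get(content, set())
```

Keeping the role-to-content mapping in one list means an administrator changes entitlements in a single place rather than in every connected system.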
EIP Advantages
• Centralization: EIPs provide a centralized system that may contain a wide
range of a company’s corporate information and access to online
applications. This centralized information system enables customers or
employees to easily access information such as reports, application forms
or policy documents. Furthermore, it is easy for the individuals within the
company to update or edit content.
• Increase productivity and profit: Information and time are money. A
centralized and well-organized information system provided by an EIP can
help employees get quick responses and information, which increases
their productivity. In addition, it can offer customers easy access to
resources, which may increase the company’s sources of customers.
• Provide a secure area: One significant feature of an EIP is that it provides a
secure area for a team or a specific partner to access, which means
only authorized users can access restricted information.
EIP Disadvantages
• High cost: Maintaining several Web and portal sites for employees, customers and
partners is an expensive process; companies can spend a huge amount of money
on an EIP system in the hope that it will provide a stable portal for them.
• Conflict: Keep the current infrastructure or introduce a more advanced system?
Many businesses require a single, integrated Web environment that covers all the
information and applications and makes it easy for employees, partners and customers to
view and find information. However, no company wants to bear the huge cost of
replacing its existing infrastructure.
• Outmoded platform: Many companies have an outmoded development
platform and pay uneven attention to their information systems. On the external side
in particular, customers or employees cannot find enough of the information or
applications they want. As a result, the company may lose many chances to
attract potential customers.
• Ignoring the importance of information systems: An information repository is one of
the basic requirements for a company to keep providing information to employees,
partners or customers, which they wish to view as web pages over their intranet. By
contrast, a portal that does not contain all pertinent information resources can
decrease a company’s market share and competitive advantage.
Data presentation techniques within
portals
• Portlets: pluggable user interface components that are managed and
displayed in a web portal. Portlets produce fragments of markup code that
are aggregated into a portal page. Typically, a portal page is displayed as a
collection of non-overlapping portlet windows, where each portlet
window displays a portlet. Hence a portlet (or collection of portlets)
resembles a web-based application that is hosted in a portal. Portlet
applications include email, weather reports, discussion forums, and news.
• Web-parts: Web Part is an add-on ASP.NET technology to Windows
SharePoint Services. Web Parts are an integrated set of controls for
creating Web sites that enable end users to modify the content,
appearance, and behavior of Web pages directly from a browser
• Digital dashboards: a digital dashboard, also known as an enterprise dashboard or executive
dashboard, is a business management tool used to visually ascertain the
status (or "health") of a business enterprise via key business indicators.
Digital dashboards use visual, at-a-glance displays of data pulled from
disparate business systems to provide warnings, action notices, next steps,
and summaries of business conditions.
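The portlet idea in the first item above, fragments of markup aggregated into one portal page, can be sketched in a few lines. This is not the Java portlet API or any real portal framework: the two sample portlets, their hard-coded data and the `render_page` helper are assumptions made purely for illustration.

```python
from html import escape
from typing import Callable

Portlet = Callable[[], str]  # a portlet returns a fragment of markup


def weather_portlet() -> str:
    return "<p>Sunny, 21 &deg;C</p>"  # hard-coded sample data


def news_portlet() -> str:
    headline = "Q3 results & outlook"  # sample text that needs escaping
    return f"<p>{escape(headline)}</p>"


def render_page(portlets: dict[str, Portlet]) -> str:
    """Wrap each fragment in its own window and join them into one page."""
    windows = [
        f'<section class="portlet"><h2>{escape(title)}</h2>{render()}</section>'
        for title, render in portlets.items()
    ]
    return "<html><body>" + "".join(windows) + "</body></html>"


PAGE = render_page({"Weather": weather_portlet, "News": news_portlet})
```

Each portlet only knows how to produce its own fragment; the portal decides which windows appear on the page and in what order.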
Digital Dashboard

http://www.businessweek.com/magazine/content/06_07/b3971083.htm

A "dashboard" pulls up everything the CEO needs to run the show


Personalization
• Creating a product (electronic or not) for a certain user.
• What can be personalized (among others)
– Web pages – personalized according to the user preferences
• Customization – something that the user does to change the sort of content
and user interface that one sees
• Personalization in an EIP– accomplished according to the attributes of a user:
department, functional area, role in that organization.
– Search engine / Google - http://searchengineland.com/070202-224617.php
• Available for authenticated users
• Large-scale personalization taking into account: user history, bookmarks,
community behavior etc.
– Magazines
• Address
• Targeted ads according to the geographic area or interests
• Others – mugs, shirts, …
Why Personalization?
• Save time: Eliminate repetitive tasks; remember transactional details; recognize
habits and shorten the path to engage in such habits (example: frequently called
numbers on a phone should automatically go into the phone’s memory).
• Save money: Prevent redundant work (example: make it easier for employees and
suppliers to know someone else has already solved the problem that they are
currently facing); eliminate service components unnecessary to a customer;
identify lower-cost solutions that meet all other specifications.
• Better information: Provide training; filter out information not relevant to a
person; provide more specific information that is increasingly relevant to a
person’s interests; increase the reliability of information; replace “average”
information with information specific to that person’s environment.
• Address ongoing needs, challenges, or opportunities: Provide one-stop services;
allow flexibility in work hours, job responsibilities, and benefits; accommodate
unique personal preferences (example: allow employees to customize their office
space, within certain boundaries); recognize and reward achievement with special
treatment.
Personalization Disadvantages
• Anonymity preferred. There are many reasons why people might not want to be identified, from the innocent - it's a
birthday present they don't want their spouse to discover in advance on their credit card statement - to the unethical or
illegal. Some people are simply private, and prefer to mind their own business and let others mind theirs. Others recognize
the growing infringements on private space and choose to take the cautious route.
– A. Michael Froomkin, associate professor at the University of Miami School of Law, wrote, "Anonymity may be the
primary tool available to citizens to combat the compilation and analysis of personal profile data, although data
protection laws also may have some effect."
• Lack of relevance. People do not want a relationship with companies that have no relevance to them. Computer
programmers have no interest in getting to know an executive recruiter who only places sales executives. Homeowners who
only buy the finest products for their home will not be interested in a cut-rate furniture store.
– If you've never been to Arkansas, never plan to go there, and don't know anyone there, you don't want to be on the
mailing list of the Arkansas Tourism Board. On the Web, companies constantly ignore this factor and ask individuals
for information before demonstrating to the person's satisfaction that their services are relevant. The prime example
is companies that insist people fill out a lengthy form before they can gain access to a demo or to additional
information. If a company asks people for information before it has demonstrated relevance, between 30 and 50
percent (depending on which statistics you believe) will lie to avoid revealing personal information.
• Lack of credibility. If you don't trust a company, it becomes a relationship of last resort. Unless you have no choice, you
don't want to deal with it. People don't need proof that a company deserves to be in this category. Often, a small suggestion
that this might be the case is enough to justify caution.
• Lack of security. Good intentions aren't enough. If a company fails to protect its assets, and those of its stakeholders, then
people will not be willing to share anything of value with the firm. Security is like sausage making: the more you know about
it, the less likely you are to be comfortable. People have real reasons to fear that today's centralized networks are not
secure, because they frequently are not.
Personalization Disadvantages (2)
• Impossible. Sometimes, people just aren't able to take advantage of attractive offers. If a
company, local government, spouse, or neighborhood forbids a person from moving forward,
that's life. Likewise, if people lack the ability to accept personalization (perhaps they lack a
sophisticated enough cell phone, or a fast enough Web connection), it won't happen.
• Infrequent contact. People will have little interest in establishing a relationship with a cab
driver in a city they rarely visit, or with the company that installs their new septic system (a
once-in-twenty-five-years event.) Companies get around this limitation by broadening their
services to increase the frequency of contacts. Hewlett-Packard's printer division used to
focus on selling printers; now the firm realizes it can make more money selling printer
cartridges, as well as paper, and in the process increase the frequency of its interactions with
customers.
• Little value placed on potential benefits. People may not recognize the value in offered
personalization, such as when firms offer to customize product offers. Many people don't
want to receive any such offers, period. Employees who are offered personalized training
may not value it if they were unimpressed with their previous experiences with the training
unit, and thus believe that even personalization won't make the time invested worthwhile.
Mass Personalization
• Adapting the products of a company according to the preferences and the tastes of
their users/customers.
• Mass Customization – the customers may create and choose the products according
to certain specifications and limits (Haag et al., Management Information Systems for
the information age: Third Ed., 2006, p.331)
– http://en.wikipedia.org/wiki/Configuration_system
• Advantages:
– Adapting the products (e.g. baseball jerseys may be customized based on size,
colour, team and logo; however, there is a finite number of choices for each of
these variables. To personalize a jersey, a name or number can be
added to it, along with custom fitting) -
http://shop.mlb.com/category/index.jsp?categoryId=1745986&cp=1960173
– E-commerce: clothes, CDs, music, web design for sites
– DELL – personalized computers -
http://www1.ca.dell.com/content/products/category.aspx/desktops?c=ca&cs=cadhs1&l=en&s=dhs
Profile/Rule-based personalization
• The result of combining the user profile (geographical location,
access device, access mechanism, groups, interests and
preferences) with the profile of the content ( attributes –
description, structural and administrative metadata – different for
each content type), all based on logical rules that reflect certain
relevance criteria. The result is the quality of the personalized
experience.
• The user profile is used for displaying the right content according to
the context. The control of the personalization is given to both the
user and the system.
• It is important for exposing the relevant content/ news about new
articles, links or events, matching the user's interest or role, instead
of waiting for the user to accidentally find it.
• user rules and attributes need to be updated for reflecting the user
needs.
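A minimal sketch of the profile/rule combination described above, in Python. The class names, attribute names and the three rules are illustrative assumptions, not part of any particular portal product; a real system would load profiles and content metadata from its own stores.

```python
from dataclasses import dataclass, field


@dataclass
class UserProfile:
    role: str
    location: str
    interests: set = field(default_factory=set)


@dataclass
class ContentItem:
    title: str
    topics: set                                  # descriptive metadata
    audience_roles: set                          # administrative metadata
    regions: set = field(default_factory=set)    # empty set means "all regions"


# Each rule is a hypothetical relevance criterion: (user, item) -> bool.
def matches_role(user, item):
    return user.role in item.audience_roles


def matches_region(user, item):
    return not item.regions or user.location in item.regions


def matches_interest(user, item):
    return bool(user.interests & item.topics)


RULES = [matches_role, matches_region, matches_interest]


def personalize(user, items, rules=RULES):
    """Return the titles of the items that satisfy every rule for this user."""
    return [item.title for item in items
            if all(rule(user, item) for rule in rules)]


user = UserProfile(role="sales", location="CA", interests={"pricing"})
items = [
    ContentItem("Q3 price list", {"pricing"}, {"sales"}),
    ContentItem("Compiler notes", {"compilers"}, {"engineering"}),
    ContentItem("EU pricing memo", {"pricing"}, {"sales"}, {"EU"}),
]
print(personalize(user, items))
```

Because the rules are plain functions held in a list, they can be added, removed or updated as user needs change without touching the matching loop.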
Behavior-based personalization
• Personalization according to the user behavior in the
system: purchases, search terms, clicks (Chia, 2002)
• The system creates a dynamic user model based on the
user profile taking also into account the behavior of other
users in the system.
• The model is used for
– filtering,
– recommending and
– Obtaining information and services for the user
• The control of the personalization belongs to the system –
the quality of the personalization is given by the amount of
interaction between the user and the system
The characteristics of the different
types of personalization

http://informationr.net/ir/9-3/paper181.html

http://www.clickz.com/showPage.html?page=910251
Rules versus collaborative filtering
• When complex filtering is required, a rule-based system may work better
than collaborative filtering, and vice versa. The following table details
examples where one type of personalization is better than the other.
Scenario | Which filtering type to use | Reason
If the number of items offered and users who purchase them are rather low. | Rules | Very little room to compute user similarity necessary for collaborative filtering.
If price points are high or purchasing frequency is low. | Rules | Finite, limited arenas - collaborative filtering fails because of the inherent lack of diversity.
If there is a pre-existing dependency between items. Example: Disability policy required for homeowner | Rules | Recommending a disability policy just because collaborative filtering says many others "like this user" also bought a policy is incorrect; one must have the homeowner policy first.
If number of items offered and users who purchase them are rather high. | Collaborative | Cannot write rules covering all items.
If price points are low, all quite dissimilar, or the products offered have a wide range of user appeal. | Collaborative | The wide variance fits the collaborative filtering approach. Collaborative filtering also lowers the risk of making "bad" recommendations.
When not much information is gathered about the user, but the user can be identified, possibly by a login or cookie. | Collaborative | In this case, user attributes on which to base rules may be lacking. Collaborative filtering can compare the user's experiences on the site to other users.
Customization VS. Personalization

• Customization is something that the user does to change the
sort of content and user interface that one sees
– User interface, Layout, Style, Content, Preferences
• Customization is the core of “My” phenomenon
• Customization is something that the user does for herself, but
personalization is something that you do for the user
• Personalization comes before customization
– Personalization can set the overall context within which a
person may customize
– But customization cuts across audiences and trumps
personalization
Customization VS. Personalization (Cont.)

• Customization and personalization
– Each choice that a user makes to configure your
publication is another piece of data for that
person’s user profile
– The way that users differentiate themselves in
customizations (especially content customization)
can help you see audience distinctions
– You may migrate customization into
personalization ➔ if some audience consistently
chooses the simple layout, why not make that
layout the default for that audience?
Dynamic and Static Personalization

• A completely static site may provide just as
much personalization as a dynamic site
• Think about how to provide “members only”
content on the Web
• Whether you create all the pages beforehand
or create each as someone requests, it is a
choice that you can make to create the best
overall system
Analyzing Personalization

• Think through what personalization you might
need. Decide
– How much personalization you need (and cost vs.
benefit)
– How you will get the information to provide
personalization
– How you should segment your audience
– What components and elements you want to
deliver to each audience segment
Analyzing Personalization (Cont.)

• Create a personalization plan that includes
– Audience segment profiles and what you’re delivering to
each segment
– A list of Web site behaviors that trigger personalization
– Which push campaign you will implement
– How your content must be tagged to support
personalization
– How the segments, content, behaviors, campaigns, and
tagging work together in a set of rules that describe which
content displays under which conditions
Analyzing Personalization (Cont.)

• Integrate the personalization with other parts of the logical
design, considering audiences as segmented in the design:
– Whether your design is capable of collecting all the
information in the user profile you’ve identified
– Whether the segmentation and behavioral assumptions
you’ve made can be tested and validated
– Whether the additional work and complexity involved in
creating components and templates is worth the
personalization that you’ll achieve
Collaborative filtering
• Collaborative filtering (CF) is the process of filtering for information or patterns using
techniques involving collaboration among multiple agents, viewpoints, data sources, etc.
• Applications of collaborative filtering typically involve very large data sets.
• Collaborative filtering methods have been applied to many different kinds of data including
sensing and monitoring data - such as in mineral exploration, environmental sensing over
large areas or multiple sensors; financial data - such as financial service institutions that
integrate many financial sources; or in electronic commerce and web 2.0 applications where
the focus is on user data, etc.
• For example, a collaborative filtering or recommendation system for music tastes could make
predictions about which music a user should like given a partial list of that user's tastes (likes
or dislikes). Note that these predictions are specific to the user, but use information gleaned
from many users. This differs from the simpler approach of giving an average (non-specific)
score for each item of interest, for example based on its number of votes.
• Types:
– Active filtering
– Passive filtering
– Item based filtering
• Active Filtering - it uses a peer-to-peer approach . This means that it is a system where peers,
coworkers, and people with similar interests rate products, reports, and other material
objects, also sharing this information over the web for other people to see. It is a system
based on the fact that people want to share consumer information with the other peers. The
users of active filtering use lists of commonly used links to send the information over the web
where others can view it and use the ratings of the products to make their own decisions.
• Passive filtering - collects information implicitly. A web browser is used to record a user’s
preferences by following and measuring their actions. These implicit filters are then used to
determine what else the user will like and recommend potential items of interest. Implicit
filtering relies on the actions of users to determine a value rating for specific content, such
as: Purchasing an item, Repeatedly using, saving, printing an item, Referring or linking to a
site, Number of times queried
• Item based filtering - Item based filtering is another method of collaborative filtering in
which items are rated and used as parameters instead of users. This type of filtering uses the
ratings to group various items together in groups so consumers can compare them as well as
a rating scale that is available to manufacturers so they can locate where their product stands
in the market in a consumer based rating scale.
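As a rough illustration of item-based filtering, the sketch below compares items through the ratings users gave them, using cosine similarity. The rating data, the function names and the choice of cosine similarity are assumptions made for this example; the slides do not prescribe a particular measure.

```python
from math import sqrt

# Hypothetical explicit ratings: user -> {item: score on a 1-5 scale}.
RATINGS = {
    "ann": {"A": 5, "B": 4, "C": 1},
    "bob": {"A": 4, "B": 5},
    "cat": {"A": 1, "C": 5},
}


def item_vectors(ratings):
    """Invert user -> item ratings into item -> {user: score}."""
    vectors = {}
    for user, items in ratings.items():
        for item, score in items.items():
            vectors.setdefault(item, {})[user] = score
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse rating vectors (dicts)."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[u] * b[u] for u in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def most_similar(item, ratings):
    """Return the other item whose rating pattern is closest to `item`."""
    vectors = item_vectors(ratings)
    scored = [(cosine(vectors[item], vectors[other]), other)
              for other in vectors if other != item]
    return max(scored)[1]


print(most_similar("A", RATINGS))
```

Items rated alike by the same users end up grouped together, which is the basis for "people who liked this item also liked" suggestions.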
Innovations in Collaborative
Filtering
• New algorithms have been developed for CF as a result of the Netflix Prize (an open
competition, concluded in 2009, for the best collaborative filtering algorithm that predicts
user ratings for films, based on previous ratings).
• Cross-System Collaborative Filtering where user profiles across multiple recommender
systems are combined in a privacy preserving manner.
• Robust Collaborative Filtering, where recommendation is stable towards efforts of
manipulation
Recommender system
• Recommender systems form a specific type of information filtering (IF) technique that
attempts to present information items (movies, music, books, news, images, web pages, etc.)
that are likely of interest to the user.
• A recommender system compares the user's profile to some reference characteristics, and
seeks to predict the 'rating' that a user would give to an item they had not yet considered.
These characteristics may be from the information item (the content-based approach) or the
user's social environment (the collaborative filtering approach).
• When building the user's profile a distinction is made between explicit and implicit forms
of data collection.
• Examples of explicit data collection include the following: Asking a user to rate an item on a
sliding scale, Asking a user to rank a collection of items from favorite to least favorite,
Presenting two items to a user and asking him/her to choose the best one, Asking a user to
create a list of items that he/she likes.
• Examples of implicit data collection include the following: Observing the items that a user
views in an online store, Analyzing item/user viewing times, Keeping a record of the items
that a user purchases online, Obtaining a list of items that a user has listened to or watched
on his/her computer, Analyzing the user's social network and discovering similar likes and
dislikes
Recommender system
• The recommender system compares the collected data to similar data
collected from others and calculates a list of recommended items for the
user.
• Recommender systems are a useful alternative to search algorithms since
they help users discover items they might not have found by themselves.
Interestingly enough, recommender systems are often implemented using
search engines indexing non-traditional data.
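The comparison step described above can be sketched as follows, assuming implicit data (purchase histories) and Jaccard similarity between users. The data, the names and the similarity measure are illustrative assumptions, not the method of any service listed in these slides.

```python
# Hypothetical implicit data: user -> set of items purchased.
PURCHASES = {
    "ann": {"book", "cd", "dvd"},
    "bob": {"book", "cd", "game"},
    "cat": {"mug"},
}


def jaccard(a, b):
    """Share of items two users have in common, between 0.0 and 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user, purchases, top_n=3):
    """Rank items the user does not own by the similarity of their owners."""
    mine = purchases[user]
    scores = {}
    for other, theirs in purchases.items():
        if other == user:
            continue
        weight = jaccard(mine, theirs)
        for item in theirs - mine:
            scores[item] = scores.get(item, 0.0) + weight
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return [item for item in ranked if scores[item] > 0][:top_n]


print(recommend("ann", PURCHASES))
```

A user with no overlap with anyone else (here "cat") gets an empty list, which is the usual "cold start" limitation of this approach.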
Recommender system - examples
• Amazon.com (online retailer, includes product recommendations)
• Amie Street (music service)
• Baynote (recommendation web service)
• ChoiceStream (product recommendation system)
• Collarity (media recommendation platform)
• Daily Me (news recommendation system (hypothetical))
• Genius (music service that is part of the iTunes Store)
• Heeii (browser plugin web content recommender based on implicit feedback)
• inSuggest (recommendation engine)
• iLike (music service)
• Last.fm (music service)
• Loomia (content recommendation engine)
• Strands (developer of social recommendation technologies)
• Netflix (DVD rental service)
• Pandora (music service)
• Reddit (news recommendation system)
• Slacker (music service)
• StumbleUpon (web discovery service)
• StyleFeeder (personalized shopping search)
Personalized Ads Attract Big
Spenders, Frequent Shoppers
• Overall, 78% of consumers are interested in receiving personalized content, which
is consistent with last year’s response.
• The types of content consumers want personalized are relatively consistent with
the previous survey findings, with music, books and DVDs being the most popular
categories.
• Consumers continue to recognize the value of personalization in social networking
with 71% believing that personalization would improve their experience by
introducing them to other members with similar interests and preferences.
• Interest in personalized ads is strongest online and on television. A large majority
of consumers are interested in personalized advertising distributed through their
television (72%) or online (73%). The number of consumers interested in
personalization on their mobile device is relatively low (35%).
• 45% of consumers reported receiving personalized recommendations that were a
poor match based on their tastes and interests in 2008 (vs. 46% in 2007).
• The most often cited reasons for why recommendations were considered to be
poor were that they were inappropriate (such as evening bags for men (51%)), or
that they didn’t match their preferences (48%).
More on personalization
• http://en.wikipedia.org/wiki/Collaborative_filtering
• http://en.wikipedia.org/wiki/Personalized_marketing
• http://www.andreas-ittner.de/index_rs.html
• http://www.cylogy.com/library/glossary.html
• http://www.deitel.com/Default.aspx?tabid=1229
• http://en.wikipedia.org/wiki/Recommender_system
• http://en.wikipedia.org/wiki/Collective_intelligence
• http://www.marketingcharts.com/television/personalized-ads-attract-big-spenders-frequent-shoppers-7613/
Content Management
• Content management is a set of processes and technologies that support
the evolutionary life cycle of digital information
• Digital content may take the form of:
• text, such as documents,
• multimedia files, such as audio or video files,
• or any other file type which follows a content lifecycle which requires
management
• The digital content life cycle consists of 6 primary phases:
• create,
• update,
• publish,
• translate,
• archive and
• retrieve.
CM – collaborative process
• Content management is an inherently collaborative process. It often
consists of the following basic roles and responsibilities:
– Creator - responsible for creating and editing content.
– Editor - responsible for tuning the content message and the style of
delivery, including translation and localization.
– Publisher - responsible for releasing the content for use.
– Administrator - responsible for managing access permissions to folders
and files, usually accomplished by assigning access rights to user
groups or roles. Admins may also assist and support users in various
ways.
– Consumer, viewer or guest- the person who reads or otherwise takes
in content after it is published or shared.
• A critical aspect of content management is the ability to manage
versions of content as it evolves. Authors and editors often need to
restore older versions of edited products due to a process failure or
an undesirable series of edits.
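A minimal sketch of version management with restore, in Python. The class and method names are hypothetical; the sketch keeps every version in memory and treats a restore as a new version so that no history is lost, which is one possible design among several.

```python
class VersionedContent:
    """Keeps every saved version of a piece of content."""

    def __init__(self, body, author):
        self._versions = [(author, body)]

    @property
    def current(self):
        return self._versions[-1][1]

    def edit(self, body, author):
        """Save a new version and return its 1-based version number."""
        self._versions.append((author, body))
        return len(self._versions)

    def restore(self, version, author):
        """Re-save an older version as the newest one, keeping full history."""
        if not 1 <= version <= len(self._versions):
            raise ValueError(f"no such version: {version}")
        return self.edit(self._versions[version - 1][1], author)

    def history(self):
        return [(number, author)
                for number, (author, _) in enumerate(self._versions, start=1)]


doc = VersionedContent("Draft", "creator")
doc.edit("Edited draft", "editor")
doc.edit("Undesirable edit", "editor")
doc.restore(2, "editor")
print(doc.current)
```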
CMS - Content management system
• A CMS is a computer application used to create, edit, manage, search and publish
various kinds of digital media and electronic text.
• frequently used for
• storing,
• controlling,
• versioning, and
• publishing industry-specific documentation such as news articles, operators'
manuals, technical manuals, sales guides, and marketing brochures
• The content managed may include computer files, image media, audio
files, video files, electronic documents, and Web content.
• Variations on the same theme:
• Web Content Management,
• Digital Asset Management,
• Digital Records Management,
• Electronic Content Management
CMS features
• identification of all key users and their content management roles;
• the ability to assign roles and responsibilities to different content categories or
types;
• definition of workflow tasks for collaborative creation, often coupled with event
messaging so that content managers are alerted to changes in content (For
example, a content creator submits a story, which is published only after the copy
editor revises it and the editor-in-chief approves it.);
• the ability to track and manage multiple versions of a single instance of content;
• the ability to capture content (e.g. scanning);
• the ability to publish the content to a repository to support access to the content
(Increasingly, the repository is an inherent part of the system, and incorporates
enterprise search and retrieval.);
• separation of content's semantic layer from its layout (For example, the CMS may
automatically set the color, fonts, or emphasis of text.).
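The story example in the workflow feature above can be sketched as a small state machine. State names, action names and role names are assumptions chosen for this illustration, and the notification list is only a stand-in for real event messaging.

```python
# (current state, action) -> (next state, role allowed to perform the action)
TRANSITIONS = {
    ("draft", "submit"): ("in_copy_edit", "creator"),
    ("in_copy_edit", "revise"): ("awaiting_approval", "copy_editor"),
    ("awaiting_approval", "approve"): ("published", "editor_in_chief"),
    ("awaiting_approval", "reject"): ("draft", "editor_in_chief"),
}


class Story:
    def __init__(self, title):
        self.title = title
        self.state = "draft"
        self.notifications = []  # stand-in for event messaging

    def apply(self, action, role):
        """Perform a workflow action if the state and the role allow it."""
        key = (self.state, action)
        if key not in TRANSITIONS:
            raise ValueError(f"cannot {action} a story in state {self.state}")
        next_state, required_role = TRANSITIONS[key]
        if role != required_role:
            raise PermissionError(f"{action} requires role {required_role}")
        self.state = next_state
        self.notifications.append(f"{self.title}: {action} by {role}")
        return self.state


story = Story("Product launch")
story.apply("submit", "creator")
story.apply("revise", "copy_editor")
story.apply("approve", "editor_in_chief")
print(story.state)
```

Keeping the transitions in a table rather than in code means the workflow can be changed by editing data, which is roughly what a CMS administrator does when defining workflow tasks.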
Web Content Management Systems
• A WCMS is content management system software, usually implemented as a Web
application, for creating and managing HTML content.
• It is used to manage and control a large, dynamic collection of Web
material (HTML documents and their associated images).
• A WCMS facilitates content creation, content control, editing, and many
essential Web maintenance functions.
• The software provides authoring (and other) tools designed to allow users
with little or no knowledge of programming languages or markup
languages to create and manage content with relative ease of use
• Most systems use a database to store content, metadata, and/or artifacts
that might be needed by the system
• A presentation layer displays the content to regular Web-site visitors
based on a set of templates.
• Administration is typically done through browser-based interfaces, but
some systems require the use of a fat client.
WCMS
• Automated templates
– Create standard output templates (usually HTML and XML) that can be automatically
applied to new and existing content, allowing the appearance of all content to be
changed from one central place.
• Easily editable content
– Once content is separated from the visual presentation of a site, it usually becomes
much easier and quicker to edit and manipulate. Most WCMS software includes
WYSIWYG editing tools allowing non-technical individuals to create and edit content.
• Scalable feature sets
– Most WCMS software includes plug-ins or modules that can be easily installed to extend
an existing site's functionality
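To illustrate the separation of content from presentation behind automated templates, here is a small Python sketch using the standard library's string.Template. The template markup and field names are assumptions for the example; real WCMS products use their own template languages.

```python
from html import escape
from string import Template

# Two interchangeable output templates for the same content fields.
ARTICLE_TEMPLATE = Template("<article><h1>$title</h1><p>$body</p></article>")
TEASER_TEMPLATE = Template("<div class='teaser'><h2>$title</h2></div>")


def render(content, template=ARTICLE_TEMPLATE):
    """Apply a template to stored content, escaping HTML special characters."""
    safe = {key: escape(value) for key, value in content.items()}
    return template.substitute(safe)


story = {"title": "News", "body": "Sales & marketing update"}
print(render(story))
print(render(story, TEASER_TEMPLATE))
```

The stored content never changes; swapping the template changes the appearance of every page that uses it, from one central place.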
WCMS
• Workflow management
– Workflow is the process of creating cycles of sequential and parallel tasks that must be
accomplished in the CMS. For example, a content creator can submit a story, but it is not
published until the copy editor cleans it up and the editor-in-chief approves it.
• Delegation
– Some CMS software allows for various user groups to have limited privileges over
specific content on the website, spreading out the responsibility of content
management.
• Document management
– CMS software may provide a means of managing the life cycle of a document from initial
creation time, through revisions, publication, archive, and document destruction.
• Content virtualization
– CMS software may provide a means of allowing each user to work within a virtual copy
of the entire Web site, document set, and/or code base. This enables changes to
multiple interdependent resources to be viewed and/or executed in-context prior to
submission.
WCMS Types
• Offline processing
– These systems pre-process all content, applying templates
before publication to generate Web pages. Vignette CMS and
Bricolage are examples of this type of system.
• Online processing
– These systems apply templates on-demand. HTML may be
generated when a user visits the page, or pulled from a cache.
(Mambo, Joomla!, Drupal, TYPO3, Zikula and Plone.)
• Hybrid Systems
– combine the offline and online approaches (Blosxom)
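The difference between offline and online processing can be sketched as follows. The content store, template and class names are hypothetical; the sketch only shows when the template is applied, not how any of the named products work internally.

```python
from string import Template

TEMPLATE = Template("<h1>$title</h1>")
CONTENT = {"home": {"title": "Home"}, "about": {"title": "About"}}


def publish_offline(content):
    """Offline processing: render every page before publication."""
    return {slug: TEMPLATE.substitute(fields)
            for slug, fields in content.items()}


class OnlineRenderer:
    """Online processing: render on first request, then serve from a cache."""

    def __init__(self, content):
        self.content = content
        self.cache = {}
        self.render_count = 0

    def get(self, slug):
        if slug not in self.cache:
            self.cache[slug] = TEMPLATE.substitute(self.content[slug])
            self.render_count += 1
        return self.cache[slug]


static_site = publish_offline(CONTENT)
renderer = OnlineRenderer(CONTENT)
renderer.get("home")
renderer.get("home")
print(static_site["about"], renderer.render_count)
```

A hybrid system would pre-render some pages as in publish_offline and serve the rest through something like OnlineRenderer.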
Document Management Systems
• A document management system (DMS) is a computer system used to track and store electronic
documents and/or images of paper documents.
• The term has some overlap with the concepts of content
management systems and is often viewed as a
component of enterprise content management (ECM)
systems and related to digital asset management,
document imaging, workflow systems and records
management systems.
DMS
Most methods for managing documents address the following areas:
• Location
– Where will documents be stored? Where will people need to go to access documents?
Physical journeys to filing cabinets and file rooms are analogous to the onscreen
navigation required to use a document management system.
• Filing
– How will documents be filed? What methods will be used to organize or index the
documents to assist in later retrieval? Document management systems will typically use
a database to store filing information.
• Retrieval
– How will documents be found? Typically, retrieval encompasses both browsing through
documents and searching for specific information.
• Security
– How will documents be kept secure? How will unauthorized personnel be prevented
from reading, modifying or destroying documents?
DMS
• Disaster recovery
– How can documents be recovered in case of destruction from fires, floods or natural
disasters?
• Retention period
– How long should documents be kept, i.e. retained? As organizations grow and regulations
increase, informal guidelines for keeping various types of documents give way to more formal
records management practices.
• Archiving
– How can documents be preserved for future readability?
• Distribution
– How can documents be available to the people that need them?
• Workflow
– If documents need to pass from one person to another, what are the rules for how their work
should flow?
• Creation
– How are documents created? This question becomes important when multiple people need to
collaborate, and the logistics of version control and authoring arise.
• Authentication
– Is there a way to vouch for the authenticity of a document ?
• Traceability
– When, where and by whom are documents created, modified, published and stored?
DMS Components
• Metadata
– stored for each document. Metadata may, for example, include the date the
document was stored and the identity of the user storing it
– Some systems also use optical character recognition on scanned images, or
perform text extraction on electronic documents.
– The resulting extracted text can be used to assist users in locating documents
by identifying probable keywords or providing for full text search capability, or
can be used on its own. Extracted text can also be stored as a component of
metadata, stored with the image, or separately as a source for searching
document collections.
• Integration
– Many document management systems attempt to integrate document
management directly into other applications, so that users may retrieve
existing documents directly from the document management system
repository, make changes, and save the changed document back to the
repository as a new version, all without leaving the application.
– Such integration is commonly available for office suites and e-mail or
collaboration/groupware software.
DMS Components
• Capture
– Capturing images of paper documents using scanners or multifunction printers.
– Optical character recognition (OCR) software is often used, whether integrated into the
hardware or as stand-alone software, in order to convert digital images into machine
readable text.
• Indexing
– Indexing may be as simple as keeping track of unique document identifiers;
– or it may take a more complex form, providing classification through the documents' metadata
or even through word indexes extracted from the documents' contents.
– Indexing exists mainly to support retrieval.
• Storage
– Storage of the documents often includes management of those same documents;
– where they are stored,
– for how long,
– migration of the documents from one storage media to another
– and eventual document destruction
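The word indexes mentioned under Indexing can be sketched as an inverted index that maps each word to the documents containing it. The document identifiers and texts below are made up for the example, and the tokenizer is deliberately simple (lowercase ASCII letters and digits only).

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase words made of ASCII letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of document identifiers containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return identifiers of documents containing every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


documents = {
    "INV-001": "Invoice for printer cartridges",
    "MAN-007": "Printer operator manual",
}
index = build_index(documents)
print(sorted(search(index, "printer")))
```

The text fed to build_index could equally come from OCR output or from text extracted from electronic documents.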
DMS Components
• Retrieval
– Simple retrieval of individual documents can be supported by allowing the user to
specify the unique document identifier, and having the system use the basic index (or a
non-indexed query on its data store) to retrieve the document.
– More flexible retrieval allows the user to specify partial search terms involving the
document identifier and/or parts of the expected metadata.
– This would typically return a list of documents which match the user's search terms
• Distribution
– A published document for distribution has to be in a format that can not be easily
altered.
• Security
– Some document management systems have a rights management module that allows an
administrator to give access to documents based on type to only certain people or
groups of people
• Workflow
– Manual workflow requires a user to view the document and decide who to send it to
– Rules-based workflow allows an administrator to create a rule that dictates the flow of
the document through an organization: for instance, an invoice passes through an
approval process and then is routed to the accounts payable department
(A simple example would be to enter an invoice amount and if the amount is lower than a
certain set amount, it follows different routes through the organization)
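The invoice example of a rules-based workflow might look like the sketch below. The threshold of 1000, the step names and the rule format are assumptions for illustration; in a real system an administrator would define them through the product's own rule editor.

```python
# Ordered rules: the first condition that holds decides the route.
# The 1000 threshold and the step names are hypothetical.
RULES = [
    (lambda invoice: invoice["amount"] < 1000, ["accounts_payable"]),
    (lambda invoice: True, ["manager_approval", "accounts_payable"]),
]


def route(invoice, rules=RULES):
    """Return the list of workflow steps the invoice must pass through."""
    for condition, steps in rules:
        if condition(invoice):
            return list(steps)
    raise LookupError("no routing rule matched")


print(route({"id": "INV-001", "amount": 250}))
print(route({"id": "INV-002", "amount": 4800}))
```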
DMS Components
• Collaboration
– Collaboration should be inherent in an EDMS. Documents
should be capable of being retrieved by an authorized user and
worked on
• Versioning
– is a process by which documents are checked in or out of the
document management system, allowing users to retrieve
previous versions and to continue work from a selected point
• Publishing
– Publishing a document is sometimes tedious and involves the
procedures of proofreading, peer or public reviewing,
authorizing, printing and approving
Enterprise Content Management
• Any of the strategies and technologies used to
manage the capture, storage, security, revision
control, retrieval, distribution, preservation and
destruction of documents at the organizational level
• Manages structured and unstructured content
• Recent trends in business and government indicate
that ECM is becoming a core investment for
organizations of all sizes, more immediately tied to
organizational goals than in the past: increasingly more
central to what an enterprise does, and how it
accomplishes its mission (AIIM Industry Watch: State of the ECM Industry, "The
ECM Association Moving from Why? To How?: The Maturing of ECM Users". AIIM Association for
Information and Image Management international, Silver Springs, 2006.)
Traditional Components
• Document Management
• Collaboration / groupware
• Web Content Management
• Records Management
• Workflow / Business Process Management -
http://en.wikipedia.org/wiki/Workflow
