
INTERNET AND WEB TECHNOLOGY

BCA SEE411
Unit-I: Introduction to Internet and WWW

Department of Computer Science and Engineering


School of Engineering Sciences and Technology
Jamia Hamdard University, New Delhi-62
mdaslamparwez@jamiahamdard.ac.in

Outline of Unit I

Introduction to Internet and WWW


Introduction to Internet:
History of World Wide Web;
Protocols governing the web;
Understanding the Internet:
syntax of URLs,
web page and browsers,
search engine;

Introduction to Cyber Laws in India.

Internet
Internet is the network formed by the co-operative
interconnection of a large number of computer networks.
It is a global web of computers connected to each other by communication links
(in the early days, mostly telephone lines).
If you look at a map of big cities, smaller towns, and scattered
houses, you see that they are all connected by roads, railways, etc. The
Internet is similar, except that its links connect computers instead of
places. This is why the Internet is often called the information superhighway.

Internet ...

It is also known as a network of networks.


There is no single owner of the Internet: no single person or organization
owns, administers, or manages the whole network.
Every person or organization that connects to the Internet becomes, in a
sense, part-owner of it.

Internet...
If you look into the history of computer networks, you will find that they
started in the late 1960s.
They were mainly clusters of computers in different laboratories and
organizations, connected to achieve goals such as exchanging messages and
sharing information.
These early networks connected computers of the same type, usually from the
same vendor. For example, a network consisted of only IBM computers, only DEC
computers, or only HP computers, and so on.
A network of IBM computers was a totally unknown entity to an HP computer;
the HP computer did not understand how the IBM network worked.
So there had to be a common binding force, a common standard, that would allow
all the computers across these networks to communicate with each other.
That standard is TCP/IP. TCP stands for Transmission Control Protocol and IP
stands for Internet Protocol.
Any computer or network that wants to connect to the Internet must
understand the language of TCP/IP.
History of Internet ...
Evolution of Internet:
It started with the US defense organization ARPA (Advanced Research Projects
Agency), founded in 1958, which in the late 1960s began to network a number of
computers.
A few computers located in different parts of the country were connected so
that they could communicate with each other.
This network became known as the Advanced Research Projects Agency Network
(ARPANET). In the 1970s, ARPANET produced a standard that was the predecessor
of the TCP standard we have today.
The protocol used at that time was not exactly TCP, but it was a step in the
right direction: a preliminary protocol that, through subsequent refinement
and modification, finally became TCP as we see it today.
By 1971, more universities had been added to the network, mainly because much
of the defense-funded research took place in universities, and ARPANET wanted
them to be part of the network.
Internet ...
Evolution of Internet: ...
Some basic Internet services such as Telnet and FTP were also made
available.
Using Telnet, you can start a remote session on another computer while
sitting at your own computer.
Using FTP (File Transfer Protocol), you can transfer a file or a group of
files between two machines.
In 1972, the first version of electronic mail came into being, and the first
email messages were sent around that time.
In 1973, ARPANET spread beyond the US, connecting to sites in England and
Norway. So from 1973, ARPANET started to spread across continents.
In 1974, TCP was proposed as the standard for communication across a
system of networks. So you can say that in 1974 people actually started to
talk about interconnecting a number of networks.
Internet ...
Evolution of Internet: ...
In 1982, the US Department of Defense started building its own Defense
Data Network based on the same technology that had been developed in
ARPANET.
As ARPANET evolved, it brought along with it a number of technologies,
protocols, and standards that people used to communicate.
In 1983, ARPANET was split into ARPANET and a new military network,
MILNET, which had additional security requirements.
Also in 1983, TCP/IP was adopted as the standard, and the network began to
look very much like the Internet we see today.
In 1986, the National Science Foundation started another network,
NSFNET, with the objective of creating a very strong backbone network to
connect the regional networks.

Internet...
Evolution of Internet: ...
NSFNET was a system of regional networks connected over a backbone
network.
Networking devices called routers connected the regional networks to one
another, and these interconnections constituted the backbone network.
Thus, the main purpose of NSFNET was to create a very powerful backbone
network that would support future generations of communication systems.
In 1990-91, new applications such as Archie and Gopher were released.
At that time, applications such as FTP (File Transfer Protocol) had become
very popular. People started to keep large numbers of resources on
different FTP servers; using FTP, you could connect to a server and
download whatever material you wanted. In a way, this was a forerunner of
the World Wide Web we see today.

Internet ...
Evolution of Internet: ...
Most of you have used the Internet through the World Wide Web in a
browser, and you know what it looks like. At that time, however, there
was no graphical interface; you had to type commands such as get to fetch a
file. So how would you know where a particular subject or document was
located?
Some large FTP catalogs were published; if you knew the address, you could
look in the catalog to find out on which FTP server a document was located.
Archie was developed as a search engine for FTP. Many of you are familiar
with Google and Yahoo, the search engines people use today. In the same
way, Archie was the search engine of its time: given a topic, it returned a
list of FTP sites where you could possibly find material on that topic.
Gopher went a step further than Archie: it presented documents organized
into categories and subcategories.

Internet...
Evolution of Internet: ...
In 1992, the Internet linked more than 17,000 networks, with about
3 million hosts.
In 1993, the World Wide Web took off as graphical browsers such as Mosaic
made it easy to use. Later, people accessed the Web with browsers such as
Internet Explorer, Mozilla, and Konqueror.
In 1995, the concept of network service providers came into being, and
these providers started to offer service. Earlier, you had to build your
own network, and it was your responsibility to connect it to the Internet
backbone.
Now there are service providers; in India, for example, BSNL, Satyam,
Reliance, and many more.
You can approach one of them, they will provide you a connection, and it
is their responsibility to connect your network to the Internet backbone.
By 1995, there were about 30 million users.
Internet...

Over time the Internet grew exponentially, and the number of nodes
increased dramatically. This exponential growth continues, and today more
than a billion hosts are connected to the Internet.
With the passage of time, a number of Internet applications were
developed. Some of the important ones are: Telnet, File Transfer
Protocol (FTP), Electronic Mail (email), Internet Relay Chat
(IRC), Usenet News, and the World Wide Web.
Telnet allows a user to log into a remote computer and start a remote
session. I sit at my computer, run a program, and view a file, but
everything actually happens on the other computer. The program's output
comes straight to my screen, giving me the illusion that I am sitting at
that other computer rather than my own.

Internet...

Similarly, FTP (File Transfer Protocol) transfers files between machines.
Electronic mail is the single largest application used today; practically
every person who connects to the Internet uses email.
Gopher: today we do not see many Gopher applications, but in the early to
mid 1990s Gopher was very popular for browsing through categories and
subcategories of resources and documents.
Internet Relay Chat (IRC) is another widely used application. Through IRC,
a number of people can communicate with each other: if I type a message, it
can be seen by all members of the group taking part in that chat.
Usenet news is also an important application. It works like a discussion
group, newsgroup, or discussion forum: for example, I can start a
discussion in a forum and others can reply to it.
Internet and world wide web...

The World Wide Web is the most important application we use today.
In fact, the World Wide Web is now an umbrella under which other
services such as electronic mail, FTP, and newsgroups can be used.
So the World Wide Web is itself an application, and it also integrates
other applications.
Today we have a single common interface: we start a browser, and
through it we can access everything.
There is no central agency or central administration of the Internet.
Instead, there are Internet societies, non-profit organizations created to
coordinate the different activities on the Internet, for example the
Internet Architecture Board (IAB), the Internet Engineering Task Force
(IETF), and the Internet Engineering Steering Group (IESG).
Internet and world wide web ...
These groups have well-defined goals.
Within the IETF, working groups develop new protocols and propose them for
standardization.
The IESG approves the publication of a draft as an RFC (Request for
Comments) if the feedback and comments received are favorable.
History of World wide web:
The World Wide Web (WWW), commonly known as the Web, is an
information system where documents and other web resources are identified
by Uniform Resource Locators (URLs, such as
https://www.example.com/), which may be interlinked by hypertext, and
are accessible over the Internet.
The World Wide Web allows computer users to locate and view
multimedia-based documents and other web resources on almost any subject
over the Internet.
Users access the resources of the WWW through a software application
called a web browser.

Internet and world wide web ...
History of World wide web:
English scientist Tim Berners-Lee invented the World Wide Web in 1989.
He wrote the first web browser, called "WorldWideWeb", in 1990 while
employed at the European Organization for Nuclear Research (CERN) near
Geneva, Switzerland. He later renamed the browser Nexus to avoid confusion
with the World Wide Web itself, which is the abstract information space.
The browser was released outside CERN in 1991: first to other research
institutions starting in January 1991, and then to the general public in
August 1991.
Web resources may be any type of downloaded media, but web pages are
hypertext media formatted in Hypertext Markup Language (HTML).
HTML is the standard markup language for documents designed to be
displayed in a web browser.
In addition to text, web pages may contain images, video, audio, and
software components that are rendered in the user's web browser as
coherent pages of multimedia content.

History of World wide web:...
The terms Internet and World Wide Web do not mean the same thing.
The Internet is a global system of interconnected computer networks.
In contrast, the World Wide Web is a global collection of documents and
other resources, linked by hyperlinks and URIs.
Tim Berners-Lee also wrote the Hypertext Transfer Protocol (HTTP), the
communications protocol used to send information over the web.
In 1993, the web exploded with the availability of the Mosaic browser, which
featured a user-friendly graphical interface.
Mosaic was the first browser to allow images embedded in text, making it
"the world's first popular browser".
Mosaic was created at the National Center for Supercomputing Applications
(NCSA) at the University of Illinois Urbana-Champaign by computer scientist
Marc Andreessen.
In October 1994, Tim Berners-Lee founded an organization called the
World Wide Web Consortium (W3C), devoted to developing nonproprietary,
interoperable technologies for the World Wide Web.
W3C's primary goal is to make the web universally accessible, regardless
of ability, language, or culture.
World Wide Web
Web 2.0
Web 1.0 (the state of the web through the 1990s and early 2000s) was focused on
a relatively small number of companies and advertisers producing content for users
to access (some people called it the “brochure web”).
In 2003 there was a noticeable shift in how people and businesses were using the
web and developing web-based applications.
The term Web 2.0 was coined by Dale Dougherty of O'Reilly Media in 2003 to describe this trend.
Generally, Web 2.0 companies use the web as a platform to create collaborative,
community-based sites (e.g., social networking sites, blogs, wikis, etc.).
Web 2.0 provides new opportunities and connects people and content in
unique ways.
A Web 2.0 website allows users to interact and collaborate with each other through
social media dialogue as creators of user-generated content in a virtual community.
Examples of Web 2.0 features include social networking sites or social media sites
(e.g., Facebook), blogs, wikis, folksonomies (”tagging” keywords on websites and
links), video sharing sites (e.g., YouTube), image sharing sites (e.g., Flickr),
hosted services, Web applications (”apps”), collaborative consumption
platforms, and mashup applications.

Browser
Browser: An application that provides a way to look at and interact with the
information on the World Wide Web.
It retrieves, presents, and traverses information resources, which include
web pages, images, video, and other multimedia content.
Browser History:
1993 – Mosaic was the first browser to allow images embedded in text, making
it "the world's first popular browser".
1994 – Netscape Navigator arrived as a noticeable improvement on Mosaic.
1995 – Internet Explorer made its debut as Microsoft's first web browser.
1996 – Opera, which started as a research project in 1994, went public
two years later.
2003 – Apple's Safari browser was released specifically for Macintosh
computers.
2004 – Mozilla launched Firefox as Netscape Navigator faded out.
2008 – Google Chrome appeared and soon took over the browser market.
2011 – Opera Mini was released to focus on the fast-growing mobile browser
market.
2015 – Microsoft Edge was launched to compete with Google Chrome.

Protocols governing web
Protocols:
The Internet relies on a number of protocols in order to function properly.
A protocol is simply a standard that enables connection, communication,
and data transfer between two places on a network.
A web browser uses these protocols to request information from a web server,
which is then displayed on the browser screen in the form of text and images.
The degree to which users can interact with that information depends on the
protocol.
Some important protocols are:
Telnet (Terminal emulation protocol): It is one of the oldest protocols.
Telnet enables a user to communicate with a remote device. To access a
remote device, a network administrator enters the IP address or host name
of the remote device, after which they are presented with a virtual
terminal that can interact with the host.
Advantages: Compatible with multiple operating systems; saves a lot
of time thanks to its quick connectivity with remote devices.
Disadvantages: Telnet lacks encryption and sends critical information,
including passwords, in clear text, making it easy for malicious actors to
intercept. Sessions can also feel slow, because each keystroke travels to
the remote host and back.
Protocols ...
Protocols...
HyperText Transfer Protocol (HTTP): It is the most widely used web
communications protocol.
If you look in the address field of your web browser right now, you will
probably see "http://" or "https://" at the front.
HTTP is a "client-server" protocol. A user clicks a link in their web
browser (the client), and the browser sends a request over the Internet
to the web server that hosts the requested site. The server sends back
the content of the site, such as text and images, which is displayed in
the user's web browser.
HTTP is an insecure communications protocol because the data it sends
back and forth between a browser and a server is unencrypted and can be
intercepted by third parties.
HTTP also adds overhead to communication: every request and response
carries headers in addition to the actual content.
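The request/response exchange described above can be sketched with Python's standard library. This is a minimal, self-contained sketch, assuming a local test server: the `HelloHandler` class, the path `/index.html`, and the HTML it returns are made up for illustration and stand in for a real web server hosting a site. Both client and server run on the same machine (127.0.0.1), so no Internet connection is needed.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler standing in for a web server that hosts a site."""

    def do_GET(self):
        body = f"<h1>You requested {self.path}</h1>".encode("utf-8")
        self.send_response(200)                      # status line: 200 OK
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)                       # the "content of the site"

    def log_message(self, format, *args):
        pass  # keep the console quiet


def fetch(path="/index.html"):
    # Port 0 asks the operating system for any free port on this machine.
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        conn.request("GET", path)      # client -> server: the request
        resp = conn.getresponse()      # server -> client: the response
        status, body = resp.status, resp.read().decode("utf-8")
        conn.close()
        return status, body
    finally:
        server.shutdown()
        server.server_close()


status, body = fetch("/index.html")
print(status, body)
```

Because this is plain HTTP, the request line and the HTML reply travel as readable text; anyone who can observe the connection can read them, which is the weakness the slide points out.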

Internet and world wide web ...
Protocols...
File Transfer Protocol (FTP): It is primarily used to transfer files,
such as documents, images, and music, between remote computers.
Users log on to an FTP server either through a command-line
interface or through one of the many graphical FTP client programs
available. Once logged on, users can navigate through the remote
server's file structure, moving, renaming, deleting, and copying files as
if it were their own computer.
Advantages: Enables sharing large files and multiple directories at the
same time, resuming a transfer if it was interrupted, recovering lost
data, and scheduling file transfers.
Disadvantages: FTP lacks security and encryption. Data, usernames,
and passwords are transferred in plain text, making them vulnerable to
malicious actors.

Internet and world wide web ...
Protocols:...
Hypertext Transfer Protocol Secure (HTTPS): HTTPS is similar to HTTP, but it
combines HTTP with a security protocol called SSL/TLS to provide secure
client-server communication over insecure networks such as the Internet.
You are most likely to see HTTPS on e-commerce websites that ask for
personal financial information such as credit card numbers. You know a website
is using HTTPS when you see "https://" in the web address displayed in your
browser's address field.
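A client can ask for HTTPS instead of HTTP with Python's standard library. A minimal sketch, assuming www.example.com as a stand-in host: it only prepares the connection and shows the TLS settings that make HTTPS secure (certificate verification and host-name checking) together with the default HTTPS port, 443. Nothing is sent over the network.

```python
import http.client
import ssl


def make_https_connection(host: str) -> http.client.HTTPSConnection:
    """Prepare (but do not open) an HTTPS connection to `host`."""
    # The default context requires a valid certificate and checks that it
    # matches the host name, which protects against impostor servers.
    context = ssl.create_default_context()
    return http.client.HTTPSConnection(host, http.client.HTTPS_PORT,
                                       timeout=10, context=context)


conn = make_https_connection("www.example.com")
# The TLS handshake would happen on the first request,
# e.g. conn.request("GET", "/") followed by conn.getresponse().
print(conn.host, conn.port)  # www.example.com 443
```

The same HTTP request and response then travel inside the encrypted TLS channel, so a third party on the path sees only encrypted bytes.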
Simple Mail Transfer Protocol (SMTP): SMTP is a protocol designed to transfer
electronic mail reliably and efficiently. SMTP is a push protocol and is used to send
email. SMTP transfers email between systems; retrieving mail from a mailbox is
handled by other protocols such as POP.
Using SMTP, a client can transfer an email to another client on the same network
or on another network through a relay or gateway accessible to both networks.
Advantages: Easy to install, connects to any system without
restriction, and doesn't need any development on your side.
Disadvantages: Back-and-forth conversations between servers can delay
sending a message and also increase the chance of the message not being
delivered. Certain firewalls can block the ports used by SMTP.
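The push side of SMTP can be sketched with Python's standard `email` and `smtplib` modules. A minimal sketch: the addresses and the relay host smtp.example.com are hypothetical, and `send` is defined but not called, because delivering mail needs a real SMTP server (normally listening on port 25, or 587 for authenticated submission).

```python
import smtplib
from email.message import EmailMessage


def build_message() -> EmailMessage:
    msg = EmailMessage()
    msg["From"] = "alice@example.com"      # hypothetical sender
    msg["To"] = "bob@example.org"          # hypothetical recipient
    msg["Subject"] = "Hello over SMTP"
    msg.set_content("This message is pushed from the sender toward the recipient's server.")
    return msg


def send(msg: EmailMessage, relay_host: str = "smtp.example.com", port: int = 25) -> None:
    # Hypothetical relay host. smtplib issues the SMTP commands for us
    # (EHLO or HELO, MAIL FROM, RCPT TO, DATA) and QUIT when the block exits.
    with smtplib.SMTP(relay_host, port, timeout=10) as smtp:
        smtp.send_message(msg)


msg = build_message()
print(msg.as_string())
```

The sender's side pushes the message; Bob later pulls it from his mailbox with a different protocol such as POP.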

Internet and world wide web ...
Protocols...
Transmission Control Protocol (TCP): TCP separates data into packets
that can be shared over a network. These packets can then be sent by devices
like switches and routers to the designated targets.
TCP is a connection-oriented protocol, as it requires a connection to be
established between applications before data transfer.
Through flow control and acknowledgement of data, TCP provides extensive
error checking.
TCP ensures sequencing of data, meaning the data packets arrive in order at
the receiving end. Retransmission of lost data packets is also feasible.
User Datagram Protocol (UDP): UDP works in a similar way to TCP, sending
packets of data over the network. The key difference is that TCP
establishes a connection between the two communicating applications before
sending data, but UDP does not.
UDP is a connection-less transport layer protocol that provides a simple but
unreliable message service.
Unlike TCP, UDP adds no reliability, flow control, or error recovery functions.
UDP is useful in situations where the reliability mechanisms of TCP are not
necessary.
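The difference can be seen on a single machine with Python's `socket` module. A minimal sketch, assuming the loopback address 127.0.0.1 and an upper-casing echo server invented for illustration: the TCP side must `listen` and `accept` a connection before any data flows, while the UDP side simply addresses a datagram and sends it. On loopback the datagram will normally arrive, but UDP itself does not promise that.

```python
import socket
import threading


def tcp_round_trip(message: bytes) -> bytes:
    """Send `message` over a TCP connection; the server echoes it in upper case."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))          # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    def serve() -> None:
        conn, _ = server.accept()          # a connection must exist before data flows
        with conn:
            chunks = []
            while True:                    # TCP is a byte stream: read until the client is done
                chunk = conn.recv(1024)
                if not chunk:
                    break
                chunks.append(chunk)
            conn.sendall(b"".join(chunks).upper())

    worker = threading.Thread(target=serve)
    worker.start()
    with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
        client.sendall(message)
        client.shutdown(socket.SHUT_WR)    # tell the server we have finished sending
        reply = b""
        while True:
            chunk = client.recv(1024)
            if not chunk:
                break
            reply += chunk
    worker.join()
    server.close()
    return reply


def udp_round_trip(message: bytes) -> bytes:
    """Send one UDP datagram to a receiver on this machine; no connection is set up."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))
    receiver.settimeout(5)                 # no delivery guarantee, so don't wait forever
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender.sendto(message, receiver.getsockname())  # just address it and send it
    data, _ = receiver.recvfrom(1024)
    sender.close()
    receiver.close()
    return data


print(tcp_round_trip(b"hello over tcp"))   # b'HELLO OVER TCP'
print(udp_round_trip(b"hello over udp"))   # b'hello over udp'
```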
Internet and world wide web ...
Protocols
Internet Protocol (IP): The Internet Protocol (IP) is a network-layer
protocol that contains addressing information and some control information
to enable packets to be routed in a network.
Along with the Transmission Control Protocol (TCP), IP represents the heart
of the Internet protocols.
IP has two primary responsibilities: providing connectionless, best-effort
delivery of datagrams (packets) through a network; and providing
fragmentation and reassembly of datagrams to support data links with
different maximum-transmission unit (MTU) sizes.
When you send or receive data (for example, an e-mail note or a Web page),
the message gets divided into little chunks called packets. Each of these
packets contains both the sender’s Internet address and the receiver’s address.
The Internet Protocol just delivers them.
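The split-and-deliver idea can be modelled in a few lines of Python. This is a deliberately simplified sketch, not the real IP header format: the `Packet` fields, the 8-byte payload limit, and the documentation addresses 192.0.2.1 and 198.51.100.7 are assumptions for illustration. Because packets may arrive out of order, the receiver puts them back together using the offset each one carries, much as IP reassembles fragments.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src: str       # sender's Internet address
    dst: str       # receiver's Internet address
    offset: int    # where this chunk starts in the original message
    payload: bytes


def split_into_packets(message: bytes, src: str, dst: str, max_payload: int = 8):
    """Cut the message into chunks of at most `max_payload` bytes."""
    return [Packet(src, dst, i, message[i:i + max_payload])
            for i in range(0, len(message), max_payload)]


def reassemble(packets):
    """Put the chunks back in order using their offsets."""
    return b"".join(p.payload for p in sorted(packets, key=lambda p: p.offset))


msg = b"Hello from the sender to the receiver!"
packets = split_into_packets(msg, "192.0.2.1", "198.51.100.7")
random.shuffle(packets)   # packets may take different routes and arrive out of order
print(len(packets), "packets:", reassemble(packets).decode())
```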
There are two IP versions in use, IPv4 and IPv6. An IPv4 address is 32 bits
long, whereas an IPv6 address is 128 bits long.
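Python's standard `ipaddress` module makes the size difference concrete. A minimal sketch using the two documentation-range addresses that also appear in the sample URLs of this unit: `max_prefixlen` gives the address length in bits, `packed` the raw bytes, and `exploded` the fully written-out form.

```python
import ipaddress

for text in ["192.0.2.16", "2001:db8::7"]:
    addr = ipaddress.ip_address(text)
    print(f"IPv{addr.version}: {addr.max_prefixlen} bits, {len(addr.packed)} bytes, "
          f"full form {addr.exploded}")
# IPv4: 32 bits, 4 bytes, full form 192.0.2.16
# IPv6: 128 bits, 16 bytes, full form 2001:0db8:0000:0000:0000:0000:0000:0007
```

The extra 96 bits are why IPv6 offers 2^128 addresses compared with IPv4's 2^32.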
There are thousands of protocols with different functions, e.g., POP,
ICMP, DHCP, ARP, and SNMP, for data communication, management, and
security.
URL
Uniform Resource Locator (URL) is a type (subset) of Uniform Resource
Identifier (URI) that specifies where an identified resource is available and the
mechanism for retrieving it.
In addition to identifying a resource, a URL provides a means of locating the
resource by describing its primary access mechanism (e.g., its network
"location").
Most web browsers display the URL of a web page above the page in an
address bar.
Example: the addresses of web pages on the World Wide Web, such as
http://www.example.com/.
Some other examples:
ftp://ftp.is.co.za/rfc/rfc1808.txt
http://www.ietf.org/rfc/rfc2396.txt
ldap://[2001:db8::7]/c=GB?objectClass?one
telnet://192.0.2.16:80/
The format is based on Unix file path syntax, where forward slashes separate
directory (folder) names and file or resource names.
Server names can be prepended to complete file paths, preceded by a
double slash (//).
URL ...
URL Syntax:
scheme://host:port/path?query-string#fragment-id
(Figure: complete URI syntax diagram)

Every URL consists of some of the following:


Scheme name (commonly called the protocol), followed by a colon, is the URL's
first component. The scheme name defines the namespace, purpose, and
syntax of the remaining part of the URL.
It represents the protocol the browser must use to request the
resource. Commonly used schemes are http, https, ftp, file,
etc.
Authority: It includes two sub-components, the domain name and the port,
separated by a colon (:).
Domain name: depending on the scheme, the domain name, host
address, or IP address gives the destination location for the URL,
for example 'google.com' or 'facebook.com'.
URL ...
Authority...
An optional port number; if omitted, the default for the scheme is used. A
port number specifies the type of service that is requested by the client since
servers often deliver multiple services. Some default port numbers include 80
for HTTP and 443 for HTTPS servers.
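The scheme and authority parts, including the default-port rule, can be pulled out with Python's `urllib.parse.urlsplit`. A minimal sketch: the `DEFAULT_PORTS` table is hand-written for this example (Python does not fill in default ports for you; `.port` is simply `None` when the URL omits it), and the URLs are sample URLs listed earlier in this unit.

```python
from urllib.parse import urlsplit

# Hand-written table of well-known default ports (an assumption of this sketch).
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21, "telnet": 23, "ldap": 389}


def scheme_host_port(url: str):
    """Return (scheme, host, port), falling back to the scheme's default port."""
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    return parts.scheme, parts.hostname, port


for url in ["https://www.example.com/",
            "telnet://192.0.2.16:80/",
            "ldap://[2001:db8::7]/c=GB?objectClass?one"]:
    print(scheme_host_port(url))
# ('https', 'www.example.com', 443)
# ('telnet', '192.0.2.16', 80)
# ('ldap', '2001:db8::7', 389)
```

Note the second URL: an explicit port (here 80) overrides the scheme's default, and an IPv6 host is written in square brackets so its colons are not confused with the port separator.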
Path: A path component, consisting of a sequence of path segments separated by a
slash (/).
It represents the path of the resource to be fetched or the program to be run.
It may consist of one or more directory or folder names.
Query string: an optional component for scripts, preceded by a question mark
(?). The query string contains non-hierarchical data to be passed to software
running on the server.
It may contain name/value pairs separated by ampersands, for example
?first_name=John&last_name=Doe.
Fragment id: An optional fragment identifier that specifies a part or a position
within the overall resource or document.
The fragment component is preceded by a hash (#). When used with HTTP, it
usually specifies a section or location within the page, and the browser may scroll to
display that part of the page.
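The path, query, and fragment components can be separated the same way. A minimal sketch using Python's `urllib.parse`; the URL below is hypothetical and was chosen only so that it contains every component described above, with `parse_qs` turning the query string into name/value pairs.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical URL containing every component described above.
url = "http://www.example.com:8080/people/search.php?first_name=John&last_name=Doe#results"
parts = urlsplit(url)

print(parts.scheme)            # http
print(parts.netloc)            # www.example.com:8080
print(parts.path)              # /people/search.php
print(parse_qs(parts.query))   # {'first_name': ['John'], 'last_name': ['Doe']}
print(parts.fragment)          # results
```

The browser keeps the fragment for itself; it is not sent to the server as part of the HTTP request.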

Web Page
A web page is a simple document displayable by a web browser such as
Firefox, Google Chrome, Opera, Microsoft Internet Explorer or Edge, or
Apple’s Safari. These are also often called just ”pages.”
Such documents are commonly written in the HTML language. A web page
can embed a variety of different types of resources such as:
style information — controlling a page’s look-and-feel
scripts — which add interactivity to the page
media — images, sounds, and videos.
Website: A collection of web pages which are grouped together and usually
connected together in various ways. Often called a ”web site” or a ”site.”
A website refers to a central location containing more than one web page.
A web page can be accessed by entering a URL address into a browser’s address bar.
A web page may contain text, graphics, and hyperlinks to other web pages and
files.
The web page is usually the last part of the URL.
For example:
"http://www.jamiahamdard.ac.in/LibrayInformationSystem/project.html".
In the above URL, "project.html" is a web page.

Web Page and Web browser
Web page...
A web page may have a file extension such as .htm, .html, .php, .asp, .cgi,
or .pl.
For URLs that do not end with a file name, the server loads a default web
page for that directory, typically index.htm or index.html (the exact name
depends on the server configuration).
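Picking out the web page from a URL can be sketched in Python. A minimal sketch under stated assumptions: the default name `index.html` is an assumption (each server is configured with its own default), and the helper `page_name` is hypothetical. A URL ending in a directory without a trailing slash (for example `/docs`) would be reported as `docs`, even though a server may treat it as a directory.

```python
from posixpath import basename
from urllib.parse import urlsplit


def page_name(url: str, default: str = "index.html") -> str:
    """Return the web page named by a URL, or the assumed directory default."""
    path = urlsplit(url).path
    name = basename(path)     # text after the last "/" of the path
    return name or default    # "" means the URL ended at a directory


print(page_name("http://www.jamiahamdard.ac.in/LibrayInformationSystem/project.html"))  # project.html
print(page_name("http://www.example.com/docs/"))  # index.html
```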
Web browser:
A web browser (commonly referred to as a browser) is application
software for accessing the World Wide Web.
When a user requests a web page from a particular website, the web
browser retrieves the necessary content from a web server and then
displays the page on the user’s device.
Important browser software includes: Google Chrome, Internet
Explorer, Mozilla Firefox, Safari, Opera, Konqueror, Lynx.
Assignment: Write a short note on the 5 most widely used browsers
along with their features, pros and cons.
Search Engine
A search engine is a software system that is designed to carry out web searches.
Search engine searches the World Wide Web in a systematic way for particular
information specified in a textual web search query.
The search results are generally presented as a list, and the results are often
called hits. The information may consist of web pages, images, and other
types of files.
It provides a list of results that best match the user query.
Search engines keep their information up to date by continuously running a
web crawler.
Some search engines also mine data available in databases or open directories.
Some search only a particular website.
A website search engine shows results only from that website and not from the
entire Internet.
Today, there are many different search engines available on the Internet, each with
its own abilities and features.
For users, a search engine is accessed through a browser on their computer,
smartphone, tablet, or another device.

Search Engine ...
Most Popular Search Engines In The World:

Google, Microsoft Bing, Yahoo, Baidu, Yandex, DuckDuckGo, Ask.com, Ecosia,
Aol.com, Internet Archive

Search Engine..
Search engines come in a number of configurations that reflect the
applications they are designed for.
Web search engines, such as Google and Yahoo!, are able to capture, or
crawl, many terabytes of data and then provide subsecond response times to
millions of queries submitted every day from around the world.
Enterprise search engines are able to process the large variety of
information sources in a company and use company-specific knowledge as
part of search and related tasks such as data mining, e.g., the search
engines used by e-commerce sites such as Amazon and Flipkart.
Desktop search engines, such as the Microsoft Windows search feature,
are able to rapidly incorporate new documents, web pages, and email as the
person creates or looks at them.
All search engines perform three basic tasks:
Search the Internet, or select parts of the Internet based on important words,
Keep an index of the words they find, and where they find them, and
Allow users to look for words or combination of words found in that index.
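The three tasks above (collect words, keep an index of where they occur, answer word queries) can be sketched with an inverted index. A minimal sketch, assuming three made-up one-line documents and a simple regular-expression tokenizer: the index maps each word to the documents and word positions where it occurs, and a query returns the documents that contain all of its words.

```python
import re
from collections import defaultdict


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each word to {doc_id: [positions where it occurs]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            index[word][doc_id].append(pos)
    return index


def search(index, query):
    """Return ids of documents containing every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], {}))
    for word in words[1:]:
        result &= set(index.get(word, {}))
    return result


docs = {
    "d1": "The Internet is a network of networks.",
    "d2": "The World Wide Web runs on the Internet.",
    "d3": "A web browser displays web pages.",
}
index = build_index(docs)
print(sorted(search(index, "internet web")))   # ['d2']
```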

Search Engine..
Kinds of Web Search: Depending on the user's intention, web searches can be of
three types:
Navigational search: a keyword search in which the searcher wishes to go to a
specific website, or a web page on a specific site. The searcher uses a web search
engine to navigate to (go to) a website.
For example, if you wish to go to the website of the ‘President of India’, just type
the query ‘President of India’ into a search engine (say Google) and search the web.
The search results contain a link to the President of India website along with other
links. Just click the link to reach the website.
Informational search: the intent is to acquire some information, assuming it is
available on the Internet. The search is conducted for study, research, or any other
purpose where scholarly information is required.
For example, to find information on the topic ‘Career in library and information
science’, the query is put to the Google search engine, which provides a list of search
results containing references to 42,400 hits from across the web.
Transactional search: the intent is to reach a website for further interaction or some
other activity. Such queries could be for shopping, downloading various types of files
such as images, songs, and movies, and various web-mediated services such as gaming.
For example, buying tickets online for an airplane, train, bus, or movie.
Search Engine ...
Search engine components support two major functions:
Indexing process: It builds the structures that enable searching.
The major components of the indexing process are text acquisition, text
transformation, and index creation.
Text acquisition: The task of the text acquisition component is to
identify and make available the documents that will be searched. This
requires building a collection by crawling or scanning the Web, a
corporate intranet, a desktop, or other sources of information.
Text transformation: The text transformation component transforms
documents into index terms or features. Index terms are the parts of a
document that are stored in the index and used in searching. Examples
of index terms or features are words, phrases, names of people,
dates, and links in a web page.
Index creation: The index creation component takes the output of the
text transformation component and creates the indexes or data
structures that enable fast searching.
Query process: It uses those structures and a person's query to produce a
ranked list of documents. The major components are user interaction,
ranking, and evaluation.
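The text transformation step described above can be sketched as follows. A minimal sketch under stated assumptions: the regular-expression tokenizer, the tiny stop-word list, and treating adjacent non-stop-words as phrases are illustrative choices, not the method of any particular search engine; other features such as names, dates, and links are left out.

```python
import re

# A tiny, illustrative stop-word list; real engines use larger, tuned lists.
STOPWORDS = {"a", "an", "and", "in", "is", "of", "on", "the", "to"}


def to_index_terms(text: str):
    """Turn raw text into single-word terms and two-word phrase terms."""
    words = re.findall(r"[a-z0-9]+", text.lower())        # tokenize and lowercase
    terms = [w for w in words if w not in STOPWORDS]      # drop very common words
    phrases = [f"{a} {b}" for a, b in zip(words, words[1:])
               if a not in STOPWORDS and b not in STOPWORDS]  # adjacent content words
    return terms, phrases


terms, phrases = to_index_terms("The History of the World Wide Web")
print(terms)    # ['history', 'world', 'wide', 'web']
print(phrases)  # ['world wide', 'wide web']
```

The index creation component would then store these terms (for example in an inverted index) so that queries can be answered quickly.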

Search Engine ...
The user interaction component provides the interface between the person doing
the searching and the search engine. One task for this component is accepting the
user’s query and transforming it into index terms. Another task is to take the
ranked list of documents from the search engine and organize it into the results
shown to the user.
The ranking component is the core of the search engine. It takes the transformed
query from the user interaction component and generates a ranked list of
documents using scores based on a retrieval model.
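As an illustration of a retrieval model, the sketch below scores documents with TF-IDF (term frequency multiplied by inverse document frequency). TF-IDF is one classic model assumed here for illustration; the slide does not say which model a given engine uses, and `rank` is a hypothetical helper.

```python
import math
from collections import Counter

def rank(query_terms, docs):
    """Return (doc_id, score) pairs sorted from most to least relevant."""
    n = len(docs)
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    # Document frequency: in how many documents each query term appears.
    df = {q: sum(1 for terms in tokenized.values() if q in terms) for q in query_terms}
    scores = {}
    for doc_id, terms in tokenized.items():
        tf = Counter(terms)
        score = 0.0
        for q in query_terms:
            if df[q] > 0:
                # term frequency x inverse document frequency
                score += tf[q] * math.log(n / df[q])
        scores[doc_id] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

docs = {
    "d1": "library science library",
    "d2": "computer science",
    "d3": "library web",
}
print(rank(["library", "science"], docs)[0][0])  # d1
```

Document d1 ranks first because it contains both query terms and repeats "library"; terms that occur in fewer documents receive a larger weight.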
The task of the evaluation component is to measure and monitor effectiveness and
efficiency. An important part of that is to record and analyze user behavior using log
data. The results of evaluation are used to tune and improve the ranking
component.
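Effectiveness is commonly measured with precision (the fraction of returned documents that are relevant) and recall (the fraction of relevant documents that are returned). The sketch below computes both for the top k results; the ranked list and the set of relevant documents are made-up data, and in practice relevance judgments might come from human assessors or be inferred from log data.

```python
def precision_recall_at_k(ranked, relevant, k):
    """Precision and recall computed over the top-k ranked documents."""
    top_k = ranked[:k]
    hits = sum(1 for doc in top_k if doc in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical output of the ranking component and hypothetical relevance judgments.
ranked = ["d3", "d1", "d7", "d2", "d9"]
relevant = {"d1", "d2", "d5"}
p, r = precision_recall_at_k(ranked, relevant, k=4)
print(p)  # 0.5
```

Two of the top four results are relevant, so precision is 0.5, and two of the three relevant documents were found, so recall is about 0.67.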

37 / 62
Search Engine ...
[Figure: Basic Search Engine Architecture]
The structure of the Web is somewhat like a “bow-tie” [11]. That is, about 28% of the pages constitute a
strongly connected core (the center of the bow tie). About 22% form one of the tie’s loops: these are
pages that can be reached from the core but not vice versa. The other loop consists of 22% of the pages
that can reach the core, but cannot be reached from it. (The remaining nodes can neither reach the
core nor be reached from the core.)
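The bow-tie description above can be made concrete with graph reachability. Assuming we already know one page (`seed`) that lies in the strongly connected core, the core is the set of pages that are both reachable from the seed and able to reach it; the two loops are the pages reachable in only one direction. The link graph below is hypothetical.

```python
from collections import deque

def reachable(graph, start):
    """Breadth-first search: all nodes reachable from start (including start)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def bow_tie(links, seed):
    """Split pages into core / in-loop / out-loop / other, given a seed in the core."""
    reverse = {}
    for src, targets in links.items():
        for dst in targets:
            reverse.setdefault(dst, []).append(src)
    forward = reachable(links, seed)     # pages the seed can reach
    backward = reachable(reverse, seed)  # pages that can reach the seed
    core = forward & backward
    nodes = set(links) | {dst for targets in links.values() for dst in targets}
    return {
        "core": core,
        "in": backward - core,    # can reach the core, not reachable from it
        "out": forward - core,    # reachable from the core, cannot reach it
        "other": nodes - forward - backward,
    }

# Hypothetical link graph: page -> pages it links to.
links = {
    "a": ["b"], "b": ["c"], "c": ["a", "out1"],  # a, b, c form the core
    "in1": ["a"],
    "out1": [],
    "x": ["y"],
}
parts = bow_tie(links, seed="a")
print(sorted(parts["core"]))  # ['a', 'b', 'c']
```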

38 / 62
Search Engine...
Crawler or Spider
Every search engine relies on a crawler module to provide the useful material for its
operation.
Crawlers are small programs that ‘browse’ the Web on the search engine’s behalf,
similarly to how a human user would follow links to reach different pages.
The programs are given a starting set of URLs, whose pages they retrieve from the
Web.
The crawlers extract URLs appearing in the retrieved pages and give this
information to the crawl control module. This module determines which links to
visit next and feeds the links to visit back to the crawlers.
The crawl control module is responsible for directing the crawling operation.
The crawlers also pass the retrieved pages into a page repository.
Crawlers continue visiting the Web until local resources, such as storage, are
exhausted.
Crawl control may also use feedback from usage patterns to guide the crawling
process.
The crawler module retrieves pages from the Web for later analysis by the indexing
module.
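The crawl loop described above can be sketched as follows. To keep the example self-contained and runnable, an in-memory dictionary (`WEB`) stands in for the real Web and links are extracted with a simple regular expression; a real crawler would fetch pages over HTTP and parse HTML properly. The crawl control policy is assumed to be plain breadth-first order, and the hypothetical `max_pages` limit plays the role of exhausted local resources.

```python
import re
from collections import deque

# Hypothetical in-memory "Web": URL -> HTML.
WEB = {
    "http://example.com/": '<a href="http://example.com/a">A</a> <a href="http://example.com/b">B</a>',
    "http://example.com/a": '<a href="http://example.com/b">B</a>',
    "http://example.com/b": '<a href="http://example.com/">Home</a> <a href="http://example.com/c">C</a>',
    "http://example.com/c": "no links here",
}

LINK_PATTERN = re.compile(r'href="([^"]+)"')

def crawl(seeds, max_pages):
    frontier = deque(seeds)   # crawl control: URLs waiting to be visited (FIFO = breadth-first)
    visited = set()
    repository = {}           # page repository: URL -> retrieved page
    while frontier and len(repository) < max_pages:
        url = frontier.popleft()
        if url in visited:
            continue
        visited.add(url)
        html = WEB.get(url)   # stands in for an HTTP fetch
        if html is None:
            continue
        repository[url] = html
        # Extract URLs from the page and feed them back to the frontier.
        for link in LINK_PATTERN.findall(html):
            if link not in visited:
                frontier.append(link)
    return repository

pages = crawl(["http://example.com/"], max_pages=3)
print(list(pages))
```

With `max_pages=3`, the crawl stops before reaching page `c`, mimicking a crawler that has run out of storage.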

39 / 62
Search Engine ...
Indexer: The indexer module builds two basic indexes: a text (or content)
index and a structure (or link) index.
The indexer module extracts all the words from each page and records the URL
where each word occurred.
The result of the indexer is generally a very large “lookup table” that can provide all
the URLs that point to pages where a given word occurs.
The table is, of course, limited to the pages that were covered in the crawling process.
The indexing module may also create a structure index, which reflects the links
between pages.
The collection analysis module is responsible for creating a variety of other indexes,
such as the utility index. The collection analysis module may use the text and
structure indexes when creating utility indexes.
The utility indexes may provide access to pages of a given length, pages of a
certain “importance,” or pages with some number of images in them.
Sometimes search engines maintain a cache of the pages they have visited
beyond the time required to build the index. This cache allows them to serve out
result pages very quickly, in addition to providing basic search facilities.
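The indexer's outputs can be sketched in a few lines. The repository contents below are hypothetical; the text index maps each word to the URLs where it occurs (the "lookup table"), the structure index records which pages link to each page, and a page-length index stands in for one possible utility index. `build_indexes` is an illustrative name.

```python
from collections import defaultdict

# Hypothetical page repository: URL -> (page text, list of outgoing links).
REPOSITORY = {
    "u1": ("cyber law in india", ["u2", "u3"]),
    "u2": ("information technology act", ["u3"]),
    "u3": ("cyber crime cases in india", []),
}

def build_indexes(repository):
    text_index = defaultdict(set)  # word -> URLs where the word occurs
    in_links = defaultdict(set)    # structure index: URL -> pages that link to it
    by_length = defaultdict(set)   # utility index: page length in words -> URLs
    for url, (text, out_links) in repository.items():
        words = text.split()
        for word in words:
            text_index[word].add(url)
        for target in out_links:
            in_links[target].add(url)
        by_length[len(words)].add(url)
    return text_index, in_links, by_length

text_index, in_links, by_length = build_indexes(REPOSITORY)
print(sorted(text_index["cyber"]))  # ['u1', 'u3']
print(sorted(in_links["u3"]))       # ['u1', 'u2']
```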

40 / 62
Search Engine ...
Query Engine
The Query Engine module is responsible for receiving and fulfilling search requests
from users. That is, the query engine collects search terms from the user and retrieves
pages that are likely to be relevant.
The engine relies heavily on the indexes, and sometimes on the page repository.
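A query engine can answer a simple multi-word query by intersecting the URL sets that the text index stores for each term, so that only pages containing every term are returned. The small index below is hypothetical, and the AND-only semantics is an assumption; real engines support richer query languages and then rank the matches.

```python
def answer_query(query, text_index):
    """Return the URLs of pages containing every term of the query (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return set()
    results = set(text_index.get(terms[0], set()))
    for term in terms[1:]:
        results &= text_index.get(term, set())
    return results

# Hypothetical text index: word -> URLs of pages containing it.
index = {
    "cyber": {"u1", "u3"},
    "india": {"u1", "u3"},
    "law": {"u1"},
}
print(answer_query("cyber law", index))  # {'u1'}
```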
Ranking
The ranking module has the task of sorting the results such that results near the
top are the most likely ones to be what the user is looking for.
The link structure of the Web contains important implied information, and can help
in filtering or ranking Web pages.
Two well-known link-based ranking techniques are PageRank and HITS.
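PageRank can be computed by power iteration: each page repeatedly shares its current score among the pages it links to, plus a small uniform share controlled by a damping factor. The sketch below assumes the commonly used damping factor of 0.85, a fixed iteration count and a hypothetical three-page graph; pages without out-links spread their score over all pages. HITS is not shown.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank over a dict: page -> list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its score evenly over all pages.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link graph.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
scores = pagerank(links)
print(max(scores, key=scores.get))  # C
```

Page C ranks highest because it receives links from both A and B; the scores always sum to 1.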
Page repository
The page repository is a scalable storage system for managing large collections of
Web pages.
The page repository needs to perform two basic functions:
It must provide an interface for the crawler to store pages.
It must provide an efficient access API that the indexer and collection
analysis modules can use to retrieve the pages.
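The two required functions of the page repository suggest a small interface: one method for the crawler to store pages and an access API for the indexer and collection analysis modules. The class below is an in-memory sketch with hypothetical method names (`store`, `get`, `scan`); a production repository would be a scalable, distributed storage system.

```python
from typing import Dict, Iterator, Optional, Tuple

class PageRepository:
    """Minimal in-memory page repository (illustration only)."""

    def __init__(self) -> None:
        self._pages: Dict[str, str] = {}

    # Interface used by the crawler.
    def store(self, url: str, content: str) -> None:
        self._pages[url] = content  # a re-crawled page replaces the older copy

    # Access API used by the indexer and collection analysis modules.
    def get(self, url: str) -> Optional[str]:
        return self._pages.get(url)

    def scan(self) -> Iterator[Tuple[str, str]]:
        """Stream every stored (url, content) pair, e.g. for a full re-index."""
        yield from self._pages.items()

    def __len__(self) -> int:
        return len(self._pages)

repo = PageRepository()
repo.store("http://example.com/", "<html>version 1</html>")
repo.store("http://example.com/", "<html>version 2</html>")
print(repo.get("http://example.com/"))  # <html>version 2</html>
```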
ASSIGNMENT: Write the features, pros, and cons of the top five search engines.
41 / 62
Cyber laws in India
Cyber laws
“Cyber” is a prefix used to describe a person, thing, or idea as part of the
computer and information age.
Taken from kybernetes, the Greek word for “steersman” or “governor,” it was
first used in cybernetics, a word coined by Norbert Wiener and his colleagues.
In the late 1940s, cybernetics arose as the study of control systems and
communications between people and machines.
The virtual world of the internet is known as cyberspace, and the laws governing
this area are known as cyber laws.
All the netizens of this space come under the ambit of these laws, as they carry
a kind of universal jurisdiction.
Cyber law can also be described as the branch of law that deals with legal
issues related to the use of inter-networked information technology.
In short, cyber law is the law governing computers and the internet.
Cyber law is the law governing cyberspace.
Cyberspace includes computers, networks, data storage devices (such as
hard disks and USB disks), the internet, websites, emails and even cell
phones and ATMs.
42 / 62
Cyber laws...

“The modern thief can steal more with a computer than with a gun.
Tomorrow’s extremist may do more damage with a keyboard than with a
bomb.”
Cyberspace creates moral, civil and criminal wrongs. It has given a new
way to express criminal tendencies.
Information technology now encompasses all walks of life all over the world.
The internet has dramatically changed the way we think, the way we govern, the
way we do commerce and the way we perceive ourselves.
Cyberspace is open to participation by all.
It has brought a transition from a paper-based to a paperless world.

43 / 62
Need for Cyber Law
In today’s highly digitalized world, almost everyone is affected by cyber law.
For example:
Almost all transactions in shares are in demat (electronic) form.
Almost all companies extensively depend upon their computer networks and keep
their valuable data in electronic form.
Government forms including income tax returns, company law forms etc. are now
filled in electronic form.
Consumers are increasingly using credit cards for shopping.
Most people are using email, cell phones and SMS messages for communication.
Even in “non-cyber crime” cases, important evidence is found in computers and cell
phones, e.g. in cases of divorce, murder, kidnapping, tax evasion, organized crime,
terrorist operations, counterfeit currency etc.
Cyber crime cases such as online banking frauds, online share trading fraud, source
code theft, credit card fraud, tax evasion, virus attacks, cyber sabotage, phishing
attacks, email hijacking, denial of service, hacking, pornography etc. are becoming
common.
Digital signatures and e-contracts are fast replacing conventional methods of
transacting business.

44 / 62
Cyber Law in India ...
Cyber law
As per the cyber crime data maintained by the National Crime Records Bureau
(NCRB), a total of 27,248, 44,735 and 50,035 cyber crime cases were registered
across India in 2018, 2019 and 2020, respectively.
Cyber law deals with
Cyber crimes
Electronic or digital signatures
Intellectual properties
Data protection & privacy
Who commits cyber crimes?
Insiders - Disgruntled employees and ex-employees, spouses, lovers
Hackers - Crack into networks with malicious intent
Virus Writers - Pose serious threats to networks and systems worldwide
Foreign Intelligence - Use cyber tools as part of their espionage
activities and can pose the biggest threat to the security of another country
Terrorists - Use cyberspace to formulate plans, raise funds and spread propaganda
Categories of cyber crime
Cyber crimes against persons
Cyber crimes against property
Cyber crimes against government
45 / 62
Cyber law in India ...
Cyber crime against persons: Crimes that are committed by the cyber
criminals against an individual or a person. A few cyber crime against
individuals are:
Cyber stalking: It involves online harassment where the user is subjected
to a plethora of online messages and emails. Cyberstalkers use social media,
websites and search engines to intimidate a user and instill fear. Usually, the
cyberstalker knows their victim and makes the person feel afraid or concerned
for their safety. Cyberstalkers create a sense of physical threat and instill fear
using computer technology such as the internet, e-mail, phones, text messages,
webcams, websites or videos.
Impersonation: A form of identity theft in which someone pretends to be someone
else by assuming that person’s identity in order to commit fraud or cheating,
access resources, or obtain credit and other benefits in that person’s name. The
criminal gains access to a user’s personal information to steal funds, access
confidential information, or participate in tax or health insurance fraud.
Phishing: This type of attack involves hackers sending malicious email
attachments or URLs to users to gain access to their accounts or computer.
46 / 62
Cyber law ...
Crime Against Person
Hacking: It means gaining unauthorized control of or access to a computer system. An act
of hacking can destroy data as well as computer programs. Hackers also
target telecommunication and mobile networks.
Dissemination/Transmission of obscene material: It includes indecent exposure and
pornography (especially child pornography), and hosting websites containing such
prohibited material. Such obscene material may harm the minds of adolescents
and tend to deprave or corrupt them.
Harassment using computers: for example, through letters and file or folder
attachments sent via e-mail. Such harassment is becoming more common as the use of
social sites such as Facebook, Twitter and Orkut increases day by day.
Spoofing: The act of disguising a communication from an unknown source as being
from a known, trusted (genuine or actual) source.
Spoofing can apply to emails, phone calls, and websites, or can be more technical,
such as a computer spoofing an IP address, Address Resolution Protocol (ARP), or
Domain Name System (DNS) server.
Spoofing can be used to gain access to a target’s personal information, spread
malware through infected links or attachments, bypass network access controls, or
redistribute traffic to conduct a denial-of-service attack.
47 / 62
Cyber law ...
Crime Against Property: Cybercrimes against all forms of property, these
types of crimes includes:
Unauthorized computer trespassing: Accessing any protected computer without proper
authorization and obtaining financial information or information belonging to a
department or agency.
Computer vandalism: Damage or destruction that takes place in digital form, i.e.
deliberately damaging the property of another by destroying or damaging the data
or information stored on a computer, or by stopping or disrupting a network service.
Transmission of harmful programs/viruses: Viruses are programs written by
programmers that attach themselves to a computer or a file and then circulate
themselves to other files and to other computers on a network, with the intent of
altering or deleting data.
Stealing secret information and data.
Intellectual Property Crimes: Any unlawful act by which the owner is deprived
completely or partially of his rights to intellectual property (copyrights, patents,
trademarks). The most common IPR crimes are piracy, infringement of copyright,
trademark, patents, designs and service mark violation, theft of computer source
code, etc.

48 / 62
Cyber law ...
Crime Against Government: Cyberspace is being used by individuals and groups
to threaten international governments or the citizens of a country.
Cyber terrorism: When an individual “cracks” into a government or military
maintained website. Cyber terrorism is a domestic as well as a global
concern. Terrorist attacks on the Internet include distributed denial-of-service
attacks (i.e. multiple systems flood the bandwidth or resources of a targeted
system), hate websites and hate e-mails, attacks on sensitive computer networks,
etc. Cyber terrorism activities endanger the sovereignty and integrity of the nation.
Hacking government websites
Cyber extortion
Computer viruses
Email bombing: It is a type of net abuse where huge numbers of emails are sent
to an email address in order to overflow the mailbox or to flood
the server where the email address is hosted.
Cyber Crime Against Society: An unlawful act done with the intention of causing
harm to cyberspace, which will affect a large number of people, associations and
society. This includes child pornography, online gambling, forgery, financial crimes, etc.
49 / 62
Cyber law in India
Need for cyberlaw in India
The internet led to the emergence of numerous legal issues and problems,
which necessitated the enactment of cyber laws.
The existing laws of India could not be interpreted in the light of the
emerging cyberspace to include all aspects relating to different
activities in cyberspace.
None of the existing laws gave any legal validity or sanction to
activities in cyberspace. For example, emails had no validity or sanction
under the existing law.
The internet requires an enabling and supportive legal infrastructure, as
traditional laws failed to provide the legal infrastructure needed for
e-commerce, one of the most important future uses of the internet.
The main purpose of the IT Act, 2000 is to provide legal recognition to
electronic commerce and to facilitate filing of electronic records with
the Government.

50 / 62
Cyber law in India ...
Information Technology Act, 2000 (“IT Act-2000”)
In India, cyber laws are contained in the Information Technology Act, 2000
(“IT Act-2000”), which came into force on October 17, 2000.
The primary purpose of the Act is to provide legal recognition to electronic
commerce and to facilitate filing of electronic records with the
Government.
IT Act-2000 objectives:
To provide legal recognition for transactions carried out by
means of electronic data interchange and other means of electronic
communication, commonly referred to as “electronic commerce”,
which involve the use of alternatives to paper-based methods of
communication and storage of information.
To facilitate electronic filing of documents with
Government agencies.
To amend the Indian Penal Code, the Indian Evidence Act, 1872, the Bankers’ Books
Evidence Act, 1891 and the Reserve Bank of India Act, 1934.
To provide a legal framework for all electronic records.

51 / 62
Cyber law in India ...

IT Act -2000
IT Act-2000 consisted of 94 sections divided into 13 chapters.
This Act was amended by the Information Technology (Amendment) Bill, 2008,
which was passed by the Lok Sabha and Rajya Sabha in 2008 as the IT (Amendment)
Act, 2008 and made effective from 27 October 2009.
The IT Act of 2000 was developed to promote the IT industry, regulate
e-commerce, facilitate e-governance and prevent cybercrime.
The Act also sought to foster security practices within India that would serve
the country in a global context.

52 / 62
IT Act 2000...
IT Act 2000: snapshot of important cyber law provisions in India
Section 43 - Damage to computer, computer system, etc.
Section 65 - Tampering with computer source documents
Section 66 - Hacking with computer system, data alteration
Section 67 - Publishing or transmitting obscene material/information in electronic
form
Section 70 - Unauthorized access to protected system
Section 72 - Breach of confidentiality and privacy
Section 73 - Publishing false digital signature certificates
Sections 503 & 499 IPC - Sending threatening and defamatory messages by email
Section 463 IPC - Forgery of electronic records
Section 420 IPC - Bogus websites, cyber frauds
Sections 463 & 500 IPC - Email spoofing and abuse
Section 383 IPC - Web jacking
NDPS Act - Online sale of drugs
Arms Act - Online sale of arms

53 / 62
IT Act -2000...
Digital signature and Electronic signature
Digital signature: Digital signature means authentication of any electronic record
by a subscriber by means of an electronic method or procedure.
It provides a viable solution for creating legally enforceable electronic records,
closing the gap in going fully paperless by eliminating the need to
print documents for signing.
Instead of using pen and paper, a digital signature uses digital keys
(public-key cryptography).
A digital signature attaches the identity of the signer to the document and
records a binding commitment to the document.
Unlike a written signature, a digital signature is considered computationally
infeasible to forge. In addition, the digital signature ensures that any change
made to the signed data cannot go undetected.
Digital signatures are created in two steps. First, the electronic record is converted
into a message digest using a hash function, which digitally freezes the
electronic record, thus ensuring the integrity of the content.
Second, the identity of the person affixing the digital signature is
authenticated by transforming the message digest with that person’s private key;
the resulting signature can be verified by anybody who has the corresponding public key.
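The two steps above can be sketched with textbook RSA, one public-key scheme among several. The key pair is built from two known Mersenne primes purely for illustration and no padding is used, so this is not secure; real digital signatures rely on large random keys, standard padding schemes and certified key pairs. `message_digest`, `sign` and `verify` are hypothetical names.

```python
import hashlib

# Toy key pair (illustration only): two known primes, no padding.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q
E = 65537                          # public exponent: (E, N) is the public key
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)

def message_digest(record: bytes) -> int:
    """Step 1: hash the electronic record into a fixed-size message digest."""
    return int.from_bytes(hashlib.sha256(record).digest(), "big") % N

def sign(record: bytes) -> int:
    """Step 2: transform the digest with the signer's private key (D, N)."""
    return pow(message_digest(record), D, N)

def verify(record: bytes, signature: int) -> bool:
    """Anyone holding the public key (E, N) can check the signature."""
    return pow(signature, E, N) == message_digest(record)

record = b"Agreement: pay Rs. 500"
signature = sign(record)
print(verify(record, signature))                     # True
print(verify(b"Agreement: pay Rs. 5000", signature)) # False: the record was altered
```

Changing even one character of the record changes its digest, so the old signature no longer verifies; this is how a digital signature makes changes to signed data detectable.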
54 / 62
IT Act -2000 ...
Digital signature and Electronic signature
Digital Signature is not like our handwritten signature. It is a jumble of
letters and digits. It looks something like this.
-----BEGIN SIGNATURE-----
Uz5xHz7DxFwvBAh24zPAQCmOYhT47gvuvzO0YbDA5txg5bN1Ni3hgPgnRz8FwxGU
oDnj7awl7BwSBeW4MSG7/3NS7oZyD/AWO1Uy2ydYD4UQt/w3d6D2Ilv3L8EO
iHiH+r5K8Gpe5zK5CLV+zBKwGY47n6Bpi9JCYXz5YwXj4JxTT+y8=gy5N
-----END SIGNATURE-----
Electronic signature: An electronic signature is a digital form of a wet ink
signature which is legally binding and secure, but it does not incorporate any
encoding or standards. It can be a symbol, image or process attached to a
message or document to identify the signer and to indicate consent to it.
An electronic signature is used when we only need to verify a document.
An electronic signature is easier to use than a digital signature, but it is less
secure and less authentic.

55 / 62
Cyber law in India ...
Offences and penalties
Section 65 - Tampering with computer source documents.
Whoever intentionally or knowingly destroys, conceals or alters any computer
source code used for a computer, computer program, computer
system or computer network.
Punishment:
Any person involved in such crimes could be sentenced to up to 3 years
imprisonment, or a fine of up to Rs. 2 lakhs, or both.
Section 66 - Hacking with computer system, data alteration etc.
Whoever, with the purpose or intention of causing any loss or damage, destroys,
deletes or alters any information that resides in a public or any
person’s computer, or diminishes its utility or value, or affects it injuriously by
any means, commits hacking.
Punishment:
Any person involved in such crimes could be sentenced to up to 3 years
imprisonment, or a fine that may extend up to 2 lakh rupees, or both.

56 / 62
Cyber crimes ...
Cyber crimes found in India are:
Cyber pornography: It includes pornographic websites, pornographic magazines
produced using computers (to publish and print the material) and the Internet (to
download and transmit pornographic pictures, photos, writings etc.).
Sale of illegal articles: This would include sale of narcotics, weapons and wildlife
etc., by posting information on websites, auction websites, and bulletin boards
or simply by using email communication. E.g. many of the auction sites even in
India are believed to be selling cocaine in the name of ‘honey’.
Online gambling: There are a large number of websites, many hosted on servers
abroad, that offer online gambling. In fact, it is believed that many of these
websites are actually fronts for money laundering.
Intellectual Property crimes: These include software piracy, copyright infringement,
trademark violations, theft of computer source code etc. A related offence is
cyber squatting, i.e. registering domain names deceptively similar to existing names
or trademarks. Satyam vs. Siffy is the most widely known case.
Bharti Cellular Ltd. filed a case in the Delhi High Court alleging that some cyber squatters
had registered domain names such as barticellular.com and bhartimobile.com with
Network Solutions under different fictitious names. Yahoo had sued one Akash Arora
for use of the domain name ‘Yahooindia.Com’, deceptively similar to its ‘Yahoo.com’.
57 / 62
Cyber laws in India...
Email spoofing: A spoofed email is one that appears to originate from one source
but actually has been sent from another source. E.g. Gauri has an e-mail address
gauri@indiaforensic.com. Her enemy, Prasad spoofs her e-mail and sends obscene
messages to all her acquaintances (friends). Since the e-mails appear to have
originated from Gauri, her friends could take offence and relationships could be
spoiled for life.
Email spoofing can also cause monetary damage through spreading misinformation
about bank or shares by spoofed emails.
Forgery: Counterfeit currency notes, postage and revenue stamps, mark sheets etc.
can be forged using sophisticated computers, printers and scanners. Outside many
colleges across India, one finds touts soliciting the sale of fake mark sheets or even
certificates.
Cyber Defamation: It occurs when defamation takes place with the help of
computers and / or the Internet. E.g. someone publishes defamatory matter about
someone on a website or sends e-mails containing defamatory information to all of
that person’s friends. India’s first case of cyber defamation was reported when a
company’s employee started sending derogatory, defamatory and obscene e-mails
about its Managing Director.

58 / 62
Cyber laws ...
Cyber stalking: Cyber stalking involves following a person’s movements across the
Internet by posting messages (sometimes threatening) on the bulletin boards
frequented by the victim, entering the chat-rooms frequented by the victim,
constantly bombarding the victim with emails etc.
Email bombing: It refers to sending a large number of emails to the victim
resulting in the victim’s email account (in case of an individual) or mail servers (in
case of a company or an email service provider) crashing. In one case, a foreigner
who had been residing in Simla, India for almost thirty years wanted to avail of a
scheme introduced by the Simla Housing Board to buy land at lower rates. When he
made an application it was rejected on the grounds that the scheme was available
only for citizens of India. He decided to take his revenge. Consequently he sent
thousands of mails to the Simla Housing Board and repeatedly kept sending e-mails
till their servers crashed.
Salami attacks: These attacks are used for the commission of financial crimes. The
key here is to make the alteration so insignificant that in a single case it would go
completely unnoticed. E.g. a bank employee inserts a program into the bank’s
servers that deducts a small amount of money (say Rs. 5 a month) from the
account of every customer. No account holder will probably notice this unauthorized
debit, but the bank employee will make a sizeable amount of money every month.
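The arithmetic behind the salami attack example can be made explicit. Only the Rs. 5 monthly deduction comes from the example above; the customer count is a hypothetical figure chosen for illustration, and `salami_gain` is an illustrative name.

```python
def salami_gain(customers, deduction_per_month, months=1):
    """Total amount (in rupees) collected by a tiny per-account deduction."""
    return customers * deduction_per_month * months

# Hypothetical bank with 1,00,000 customers and the Rs. 5/month deduction from the example.
print(salami_gain(100_000, 5))             # 500000, i.e. Rs. 5 lakh per month
print(salami_gain(100_000, 5, months=12))  # 6000000, i.e. Rs. 60 lakh per year
```

Each victim loses only Rs. 5 a month, yet the total grows linearly with the number of accounts, which is why such attacks are both hard to notice and profitable.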

59 / 62
Cyber law in India ...

Denial of Service attack: This involves flooding a computer resource with
more requests than it can handle. This causes the resource (e.g. a web
server) to crash, thereby denying authorized users the service offered by the
resource. A variation of a typical denial of service attack is known as a
Distributed Denial of Service (DDoS) attack, wherein the perpetrators are
many and geographically widespread. It is very difficult to control such
attacks. The attack is initiated by sending excessive demands to the victim’s
computer(s), exceeding the limit that the victim’s servers can support and
making the servers crash. Denial-of-service attacks have, in the past, brought
down websites such as Amazon, CNN, Yahoo and eBay.
Some other attacks include: virus/worm attacks, logic bombs, Trojan
attacks, Internet time theft, web jacking, theft of computer systems and
physically damaging a computer system.

60 / 62
Cyber law...

61 / 62
Thank You !

62 / 62
