
Chapter 5

Accessing Internet Information


1. Use of Search Engines
2. Use of New Media
 Introduction
 Searching for information: Internet Search Tools
 Functions of Search Engines
 Types of Search Engines
 Search by Directories
 Directory of Web 2.0
 Social Networking
 Invisible Web/ Deep Web
 Steps in Searching
 A worldwide network of computers, interconnected to allow users to access and transfer information residing in distantly located computers.
 The “backbone” of these connections is referred to as the “information superhighway”.
 Information is exchanged
electronically over the network.
How did the Internet first start?
 Started in the 1960s as a means for government
researchers to share information.
Purpose today?
 Source of information
 You can get information for a range of specific purposes, from following the news to learning how to cook.
 Because events in the world change minute by minute, it is now widely used to get current information: news, sports, and entertainment.
The hardware requirements of the Internet:

 Computers
 System of wires,
 Fibre-optic cables,
 Routers and circuits
 Internet service provider (ISP)
 Wi-Fi
 Digital Natives: a term from Marc Prensky (2001) that refers to a generation of youth who have grown up with technology and so have a cultural fluency with information and communication technologies similar to that of a native language.
 Digital Natives today have research
opportunities that were not possible a
generation ago.
 The body of information that you need is not floating around in “cyberspace”.
 Information resides in individual computers, accessed through websites using search engines.
 Search engines are online software programs that search through text on Web pages. Programs called “spiders” crawl the “Web” to collect information and store it in a database.
 URLs-Go directly to address
 Search Engines- single search engines e.g. Google
 Meta Search Engines- one search run through several search engines, e.g. Mamma
 Directories- Organised into hierarchy of categories
 Subject guides- Specialised subject e.g. Legal
information
 Libraries on the net- WWW Virtual library
 Databases- Subscribed and free databases
 Alternative Search engines- White pages, Yellow
pages
 Deep Web/ Invisible web
Search Engines

4 functions of Search Engines:

1. Crawling
2. Building an index
3. Calculating relevancy & rankings
4. Displaying results
1. Crawling

 A search engine (e.g. Google) first collects websites using a computer program called a wanderer, crawler, robot, worm, or spider (a web crawler).
 ‘Spiders’ travel very fast
through the hyperlinks
that connect websites.
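
To make the crawling step concrete, here is a minimal sketch of a spider following hyperlinks. It assumes a tiny made-up "web" held in memory (the page names and the `TOY_WEB` dictionary are hypothetical); a real crawler would fetch pages over the network and is far more elaborate.

```python
from collections import deque

# Hypothetical in-memory "web": each page maps to the pages it links to.
TOY_WEB = {
    "home.example": ["news.example", "sports.example"],
    "news.example": ["home.example", "weather.example"],
    "sports.example": ["news.example"],
    "weather.example": [],
    "orphan.example": ["home.example"],  # no other page links TO this one
}

def crawl(start, web):
    """Follow hyperlinks breadth-first from `start`; return pages in visit order."""
    visited = [start]
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for link in web.get(page, []):
            if link not in visited:
                visited.append(link)
                queue.append(link)
    return visited

found = crawl("home.example", TOY_WEB)
print(found)
```

In this toy web the spider never reaches `orphan.example`, because it can only discover pages that some already-visited page links to.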
2. Indexing

 When search engines retrieve information, they maintain a large index covering a huge number of Internet sites by retrieving each individual web page.
◦ Google claimed to have indexed 8,058,044,651 web pages as of 22nd June 2005.
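
One common way to build such an index is an "inverted index": a table from each word to the pages that contain it. The sketch below is only an illustration under simple assumptions (the `PAGES` texts are invented, and words are split on spaces); it is not how any particular search engine stores its index.

```python
from collections import defaultdict

# Hypothetical page texts, standing in for pages a crawler has fetched.
PAGES = {
    "page1": "clean drinking water in Asia",
    "page2": "water sports and travel",
    "page3": "travel news from Asia",
}

def build_index(pages):
    """Map each lowercase word to the set of pages that contain it."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in text.lower().split():
            index[word].add(name)
    return index

index = build_index(PAGES)
print(sorted(index["water"]))  # pages containing "water"
print(sorted(index["asia"]))   # pages containing "asia"
```

Looking a word up in the index is then a single dictionary access, instead of re-reading every page for every search.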

Types of search engines:

 a) Horizontal, general search


◦ all types of information;

➢ Examples:
◦ Google: http://www.google.com
◦ AltaVista: http://www.altavista.com
◦ Yahoo!: http://www.yahoo.com
b) Vertical search

 Searches only within certain topics.
 Uses a focused crawler.
 Aims to index only web pages that are relevant to its topic (e.g. automotive industry, legal information, medical information, scholarly literature, and travel).
c) Meta search is like “one-stop shopping” for the Internet. You enter one search, and a meta search service searches other search engines and directories simultaneously, so that you get the results from all of them in one place.
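
The idea of a meta search can be sketched in a few lines. The two "engines" below are hypothetical stand-ins that return fixed lists, and the sketch queries them one after another rather than simultaneously; it only illustrates merging results into one place.

```python
def meta_search(query, engines):
    """Send one query to every engine and merge the results, dropping duplicates."""
    merged = []
    for search in engines:
        for url in search(query):
            if url not in merged:
                merged.append(url)
    return merged

# Hypothetical stand-ins for real search engines: each returns a fixed list.
def engine_a(query):
    return ["a.example/water", "shared.example/water"]

def engine_b(query):
    return ["shared.example/water", "b.example/water"]

results = meta_search("water", [engine_a, engine_b])
print(results)
```

A result found by both engines appears only once in the merged list.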
 Typical search engines like Google, Yahoo, or Bing actually access only a tiny fraction (0.03%) of the Internet.
 Search engines help users locate relevant websites, typically the more highly trafficked ones.
 Entering a keyword into a search engine returns a list of websites indexed by that search engine; sites it has not indexed are excluded.
3. Ranking

 Search engines do not understand your information need; they simply find information according to the words that you have entered.
 Search engines should rank their results by the content of each site.
 They use mathematical equations (or algorithms) to rank the results, and the formula may not reflect a site's legitimacy or value to you.
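
As a rough illustration of why a ranking formula may not reflect a site's value, the sketch below ranks pages simply by counting how often the query words occur. This counting rule and the `PAGES` texts are assumptions made for the example; real search engines use much more complex formulas.

```python
def rank(query, pages):
    """Score each page by how often the query words occur in it; best first."""
    words = query.lower().split()
    scores = {}
    for name, text in pages.items():
        tokens = text.lower().split()
        score = sum(tokens.count(w) for w in words)
        if score > 0:
            scores[name] = score
    # Highest score first; ties are broken alphabetically by page name.
    return sorted(scores, key=lambda name: (-scores[name], name))

# Hypothetical pages: one useful report and one page stuffed with a keyword.
PAGES = {
    "trusted-report": "drinking water quality report",
    "keyword-stuffed": "water water water buy water now",
    "unrelated": "football results",
}

ranking = rank("water quality", PAGES)
print(ranking)
```

Under this simple formula the keyword-stuffed page outranks the report, even though the report is the more useful source.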
4. Display

 Web pages are sometimes created and customized so that they appear near the top of a search engine’s results list (‘hits’), regardless of their credibility or usefulness.
 The relevancy ranking of results may not be a reliable measure of how relevant they are.
 As a result, not all of your search results will be relevant or trustworthy.
 Web page: a hypertext document available on the World Wide Web.
 Website: a collection of web pages, like an entire book or an entire newspaper made up of its pages.
 The search engine collects websites that contain the information using a computer program called a ‘crawler’.
 World Wide Web (WWW): a collection of resources available on the Internet, accessed using a web browser.
 The Web is the visual display of the information being accessed. The Internet is the underlying network that carries it; the two are not the same thing.

 Web pages are collections of files and documents stored on computers around the world, formatted in a markup language called HTML (HyperText Markup Language) that permits users to move between them by clicking on hyperlinks, or links.
 These “links” can come in the form of words, phrases, icons or graphics, and create interconnectedness between files and documents, giving the World Wide Web its character as a “web”.
 Hyperlinks sometimes allow you to jump from one website to another website.
 These links allow you to “turn the page” and move around on the Internet.
 Search engines
◦ Use a computer program (called a web spider) to navigate through the web and collect information about web pages.
 Subject directories / web directories
◦ Manual entry and classification
 Invisible web / deep web
◦ Includes dynamic electronic databases that are not searchable through search engines.

 Directories have the human element of hand-selected
sites compiled and organized into categorical tree
structures.

 2 categories of Web directories:

▪ scholarly (assembled, edited and annotated by experts


and professionals).
▪ commercial (based on site traffic and advertising to
operate).
 To find subject directories, simply add the term “directory” to your search query to get a page of selected sources on the topic you're searching.
The WWW Virtual Library is the
oldest directory on the Web
 WorldCat
 Infoplease
 Library of Congress
 Refdesk.com
 Britannica
 World Digital Library
 Digital Public Library of America
 Internet Archive
 UniversitySpot.com
Go2web20 is a directory of Web 2.0 sites.

These sites emphasize interaction and collaboration between their users.
Social networking is
at the core of the
Web 2.0 movement,
where users have
some level of
interaction or
involvement with
the content on the
websites they visit.
 Online social networks are a digital way of connecting with others.
 Facebook, MySpace, and Twitter are examples of online social networks, among many similar sites growing on the Web.
 Whether it is for fun, business, romance, or any other reason, more and more people are interacting over the Net. Contacts and the number of connections can grow exponentially.
 The key safety tip to follow for online social
networking would be to:
◦ Adjust your privacy settings to keep some of your
information out of public view.

 Most social networking sites have privacy settings that control who can view your personal profile, photos, etc.
 No. Search engines cannot index the pages in the invisible web / deep web:
◦ Pages which are not linked to by other pages.
◦ Dynamic Web pages based on responses to database queries. (More on dynamic web pages later in this module.)
◦ Sites that require registration or otherwise limit access to their pages.

 Content on the World Wide Web that isn't linked from any other page cannot be found by spiders. Like library material tucked away in the stacks or the back rooms, it is referred to as part of the "deep web" or "invisible web".
 The vast majority of the Internet is said to lie in the Deep Web / Invisible Web.
 The actual size of the Deep Web is impossible to measure, but experts estimate it is about 500 times the size of the web as we know it.
 To find resources on the invisible Web, see
“The Invisible Web” and “Web Directories”.

 Examples:
o White pages, electronic books, online journals, image files, newspaper archives, dictionary definitions and patents are examples of the file types found in databases. Frequently updated or changing information, like ticket prices and job listings, is also part of the deep Web.
 A U.S. Naval research group that developed intelligence-gathering tools created The Onion Router (Tor) project.
 Onion routing wraps Internet communications in layers of encryption; each relay along the route removes one layer, similar to peeling back the layers of an onion.
 Deep Web pages operate but their existence
is invisible to Web crawlers.
 News stories such as the bust of the infamous Silk Road drug-dealing site and Edward Snowden’s NSA disclosures have spotlighted the Deep Web’s existence.
 Step 1: Where to search: Internet?
 Step 2: Try several search engines
 Step 3: Dig deep for best results; Check quality
of content- Google & other search engines
 Step 4: Think before you search; Define your task. Put it into words. (information searching strategies)*
 Use Boolean connectors and quotation marks.
 Step 5: Make search engine work for you; Add new
keywords; Keep notes and records

Please see this video for Boolean operators:
http://video.about.com/websearch/What-Is-Boolean-Search-.htm
 Step 6: Don’t believe everything you read;
Assess the credibility
 Step 7: What? Find primary sources; they are more authentic
 Step 8: Who? The author/ publisher of article
 Step 9: Why? The reason for publishing the
article
 Step 10: When? Currency of Article

 http://www.findingdulcinea.com/guides/Technology/Internet/Dulcineas-Guide-to-Searching-on-the-Web.pg_01.html#01
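
The Boolean connectors and quotation marks mentioned under Step 4 can be tried out on a small scale. The sketch below is an illustration only: the three documents in `DOCS` are invented, and Python set operations stand in for the AND, OR and NOT connectors of a real search engine.

```python
# Hypothetical mini-collection of documents (all lowercase for simplicity).
DOCS = {
    "doc1": "quality of drinking water in south east asia",
    "doc2": "bottled water prices in europe",
    "doc3": "air quality in south asia",
}

def matching(term, docs):
    """Return the documents containing `term` as whole words.

    A multi-word term behaves like a search phrase in quotation marks.
    """
    return {name for name, text in docs.items()
            if f" {term.lower()} " in f" {text} "}

water = matching("water", DOCS)
quality = matching("quality", DOCS)

print(sorted(water & quality))   # water AND quality
print(sorted(water | quality))   # water OR quality
print(sorted(quality - water))   # quality NOT water
print(sorted(matching("drinking water", DOCS)))  # "drinking water" as a phrase
```

AND narrows a search, OR widens it, NOT excludes unwanted results, and quotation marks keep the words together in the given order.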
o Criteria used for traditional resources may also be used
for Internet Resources.
o Authority of person or organization behind the website.

❖ Author
• Is there an author of the document? (personal or
organisational)
• Is the author easily identifiable? If difficult to determine
authorship, be cautious.
• Check author's credentials on the particular subject.
• Can the author be contacted for clarification?

❖ Publisher - What can the URL tell you?
• The type of domain gives an indication of authority (.edu = educational, .gov = government, .org = non-government or nonprofit, .com = commercial, etc.).
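
Reading the domain type off a URL can also be automated. This is a minimal sketch using Python's standard `urllib.parse` module; the `DOMAIN_TYPES` table and the `domain_type` helper are made up for the example and cover only a few common endings.

```python
from urllib.parse import urlparse

# Common meanings of a few top-level domains; illustrative, not exhaustive.
DOMAIN_TYPES = {
    "edu": "educational",
    "gov": "government",
    "org": "non-government / nonprofit",
    "com": "commercial",
}

def domain_type(url):
    """Return the kind of publisher suggested by the last label of the host name."""
    host = urlparse(url).hostname or ""
    suffix = host.rsplit(".", 1)[-1]
    return DOMAIN_TYPES.get(suffix, "unknown")

print(domain_type("http://www.google.com"))
print(domain_type("https://www.example.edu/library"))
```

The domain type is only a hint. Country-specific addresses (for example, ones ending in a two-letter country code) come back as "unknown" in this sketch, and the domain alone never proves that a site is reliable.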
Accuracy of information in website is difficult to assess.

 Almost anyone can publish on the Web, and facts are often not verified by editors or anyone else.
 Authority: Know the distinction between author and Webmaster. Is an email or postal address given? Check the organisation from the URL.
 Source: Don't rely on one source or one study. Check and cross-check other reliable sources.
 Primary source material: more trustworthy, e.g. census report from Government Census body or newspaper.
 Evidence: Look for evidence. Don’t accept claims at face value; test them by asking a few questions.
 Check and cross-check whether claims and arguments are supported by facts and evidence.
 What is the goal and aim of the persons/
organisations or groups providing the
information?
 Any bias or lack of objectivity evident in the
site?
 What is the information intended for?
◦ inform,
◦ give facts/data,
◦ explain,
◦ persuade,
◦ share information,
◦ market/sell, entice?
 Is the information comprehensive enough?
 Check the breadth and depth of the information, or the level of detail it covers.
 What items are included in the resource?
◦ time, period
◦ geographical coverage
◦ language
◦ formats / types of material
 Intended audience? News? General? For
academic research?
 Does the site contain original information
or simply links?
 Dates may not always be included on Web
pages.
 Undated factual or statistical information
should never be used.
 Is the information up to date? When was the
information created?
 What is the last revised date, and how regular are the updates?
 Is the date of copyright included?
 Is the information still useful for your topic?
 How up-to-date are the links (if any)?

 Search for 3 websites on the ‘Quality of
drinking water in South East Asia’ as a topic.
 Give reasons why you chose the websites, based on the evaluation criteria.
