Reaction Paper: A Scale for Crawler Effectiveness on the Client-Side Hidden Web

A reaction paper submitted as part of the requirements of the University of Redlands MBA with an emphasis on Information Systems, Information Systems Strategy Capstone Course (ISYS683W).

Published by Benj Arriola on Jul 08, 2012
Copyright: Attribution Non-commercial

A Scale for Crawler Effectiveness on the Client-Side Hidden Web

By: Benj Arriola
Article Report
July 8, 2012
MBA ISYS683W
University of Redlands
Information Systems Strategy Capstone
Prof. Mark Gruber

7/8/2012
Text Search Information Retrieval
Article Report, ISYS683W
MBA
University of Redlands
A Scale for Crawler Effectiveness on the Client-Side Hidden Web
This report is a review of the academic paper of the same title, A Scale for Crawler Effectiveness on the Client-Side Hidden Web. The research was conducted by professors of the Communications and Information Technologies Department at the University of A Coruña in Spain and was published in 2012 in the journal Computer Science and Information Systems.[1]
The paper compares crawling technologies, mainly free and commercial web crawler platforms, to test how effectively they crawl the client-side hidden web. The paper is academic in nature: like many science journal articles, it does not discuss the practical or business application of the research, and it is written for an academic audience that is assumed to already know how these technologies are applied.

To give a better understanding of the paper, this report first defines what a crawler is and what the client-side hidden web is. It then discusses the business application that the paper does not address, as well as the paper's limitations, which may give the average reader false notions. Below is an outline of the flow of this report:
 
• Definitions
    o Web Crawlers
    o Client-Side Hidden Web
• Business Application Significance of this Study
• The Academic Paper by the Professors of University of A Coruña
    o The Conducted Experiments
    o The Results of the Research Paper
    o Conclusions of the Research Paper
• Possible Wrong Deductions by Readers of the Paper
    o Inferior Crawlers are Inferior for a Reason
        ▪ Not Crawling AJAX and Flash links
    o Lack of Research of Crawling Technologies
    o Crawling and Information Retrieval are Two Different Things
    o Lack of Knowledge of Google, Bing and Yahoo
        ▪ Search Engine Robots & IP Addresses
        ▪ Redirection Handling
• Report Conclusion
[1] Prieto, V. M., Alvarez, M., Lopez-Garcia, R., Cacheda, F. (University of A Coruña). A Scale for Crawler Effectiveness on the Client-Side Hidden Web. Computer Science and Information Systems, Vol. 9, No. 2, 561-583 (2012). ComSIS Consortium.
Definitions
What is a Crawler
Crawlers are software programs that visit pages through their URLs, search within each page for other URLs, and then crawl and analyze those pages in turn until all reachable pages have been exhaustively crawled. Some crawlers are limited to crawling HTML pages alone, while others also crawl other page assets such as images, videos, CSS files, JavaScript files, and more. Crawlers are also called spiders, robots, or simply bots.
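The visit, extract, and repeat loop described above can be sketched as a breadth-first traversal. This is a minimal illustration, not the design of any crawler evaluated in the paper: the `crawl` and `extract_links` names, the caller-supplied `fetch` function, and the `max_pages` limit are assumptions made for the example, and an in-memory dictionary stands in for real HTTP requests so the sketch runs offline.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url, html):
    """Return absolute URLs (with #fragments stripped) for every link in html."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return [urldefrag(urljoin(base_url, href))[0] for href in parser.links]


def crawl(seed_url, fetch, max_pages=100):
    """Breadth-first crawl starting at seed_url.

    fetch(url) is a hypothetical caller-supplied function that returns the
    page's HTML, or None if the page cannot be retrieved.
    Returns the successfully crawled URLs in visit order.
    """
    frontier = deque([seed_url])  # URLs waiting to be visited (FIFO = breadth-first)
    visited = set()               # URLs already fetched or attempted
    order = []                    # pages successfully crawled, in visit order
    while frontier and len(order) < max_pages:
        url = frontier.popleft()
        if url in visited:
            continue
        visited.add(url)
        html = fetch(url)
        if html is None:
            continue
        order.append(url)
        for link in extract_links(url, html):
            if link not in visited:
                frontier.append(link)
    return order


if __name__ == "__main__":
    # A tiny hypothetical site: URL -> HTML source.
    SITE = {
        "http://example.com/": '<a href="/a">A</a> <a href="/b">B</a>',
        "http://example.com/a": '<a href="/b">B</a> <a href="/">Home</a>',
        "http://example.com/b": "<p>No links here</p>",
    }
    print(crawl("http://example.com/", SITE.get))
    # ['http://example.com/', 'http://example.com/a', 'http://example.com/b']
```

The `visited` set is what keeps the loop from revisiting pages that link back to each other, and the `max_pages` cap is a practical stand-in for "exhaustive" crawling on sites that are too large to traverse completely.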
What is the Client-Side Hidden Web
For every URL loaded in a web browser, a page can be generated in real time on the server; this is done by server-side technologies. Conversely, a URL loaded in a browser can also load elements that change the appearance or content of the page within the browser itself; these are client-side technologies. Because of the number of technologies that make up a webpage, not all information is readily "crawlable." Crawlers are not necessarily client devices or web browsers; they are software scripts that try to decipher code, mainly HTML. Current web technologies such as JavaScript, Adobe Flash, Adobe Shockwave, Apple QuickTime, RealNetworks RealPlayer, AJAX, XML, and several other less popular client-side technologies make it difficult for crawlers to gather all the data a webpage may offer.
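To illustrate why client-side content stays hidden, the sketch below reads a page the way a crawler that parses HTML without executing scripts might. The page markup, the URLs, and the `static_links` helper are hypothetical; Python's standard `html.parser` only tokenizes markup and never runs JavaScript.

```python
from html.parser import HTMLParser

# Hypothetical page: one plain link plus two links that exist only client-side.
PAGE = """
<html><body>
  <a href="/static-page">Static link</a>
  <span onclick="window.location='/onclick-page'">Menu</span>
  <script>
    // This link only exists after a browser runs the script.
    var link = document.createElement("a");
    link.href = "/" + "js-page";
    document.body.appendChild(link);
  </script>
</body></html>
"""


class HrefCollector(HTMLParser):
    """Records href values of <a> tags as they appear in the raw HTML source."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(value for name, value in attrs if name == "href" and value)


def static_links(html):
    """Links visible to a crawler that parses HTML but never executes scripts."""
    collector = HrefCollector()
    collector.feed(html)
    collector.close()
    return collector.hrefs


print(static_links(PAGE))  # ['/static-page']
```

Only the plain `<a href>` link is found. The `/onclick-page` and `/js-page` destinations appear only after a browser runs the page's JavaScript, so a crawler would need to execute the scripts, as a browser does, to discover them.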
Business Application Significance of this Study
In the information age, more and more information is shared online, either publicly or privately. Common business software applications are increasingly moving to the Internet as web-based applications built on cloud technologies, where the basic interface is a web browser that can run on many kinds of devices, such as computers, mobile phones, and tablets. Other applications also use the cloud, but not through a web browser: a custom application accesses the data in the cloud, which frees the application from the browser's restrictions on which Internet protocols and port numbers it can use.

With greater use of the cloud, more data is stored on the Internet, and most of it is web-based. This makes it more important for the data to be searchable. To search data on the web properly, information must be appropriately saved and indexed, which can be achieved by crawling the pages. Using crawlers that are better at crawling the hidden web reduces the restrictions on the format or method used to create content. The better content is crawled, the more complete the indexed and searchable content becomes, which helps improve work efficiency in a cloud environment.
