
Web Data Mining: Types, Techniques, Tools and Applications in Current Scenario - A Survey

N. Hari Priya, Research Scholar, Department of Computer Science, Sree Saraswathi Thyagaraja College, Pollachi, Tamil Nadu, India
Dr. V. Anuratha, Associate Professor, PG Department of Computer Science, Sree Saraswathi Thyagaraja College, Pollachi, Tamil Nadu, India
____________________________________________________________________________________
ABSTRACT:
Web data mining is an emerging technique used to crawl web resources and gather relevant information in order to foster the growth of commerce and business activities on the internet. The World Wide Web is considered the key source of data that promotes online transaction trends and hence the WWW is a suitable field for data mining. Web Mining deals with extracting useful knowledge and discovering user behaviour patterns from e-commerce websites. Based on the data to be mined, web mining is categorized into three types: Web Content Mining, Web Structure Mining and Web Usage Mining. These types use different techniques, tools, approaches and algorithms for finding interesting knowledge and patterns across the web. This paper outlines the key features of web data mining with respect to its techniques, tools and applications.
Keywords: Web data mining, World Wide Web, Web Content Mining, Web Structure Mining, Web
Usage Mining
____________________________________________________________________________________

I. INTRODUCTION

Carly Fiorina is an American businesswoman and political figure, known primarily for her tenure as CEO of Hewlett-Packard (HP); she subsequently served as chair of a philanthropic organization. She says:
“The goal is to turn data into information, and information into insight.”
The main goal of Web mining is to find useful patterns in Web data by applying data mining techniques in order to gain insights. It is an iterative process of discovering knowledge and is proving to be a valuable strategy for understanding consumer and business activity on the Web. The World Wide Web is growing at an astonishing pace, and the volume of data it holds is growing exponentially. It has become a rich source of information that we can retrieve and use to generate actionable intelligence. We live in a world defined by e-commerce, e-governance, e-market, e-finance, e-learning, and e-banking. Web mining is the application of data mining techniques to extract knowledge from web data. This web data may be web documents, hyperlinks between documents, and/or usage logs of websites. Marketing and sales strategies are aligned with the results that web mining produces. The WWW is a popular and interactive medium for propagating information today; because the web is massive, varied, and dynamic, it raises issues of scalability, multimedia data, and temporal data [1]. Various web mining technologies are shown in Fig 1.

Fig 1. Web Data Mining Technology

II. WEB MINING CATALOGUE

“Data is the kind of ubiquitous resource that we can shape to provide new innovations and new insights, and it’s all around us, and it can be mined very easily.”
Web mining is broadly categorized into three distinct types based on the data to be mined. The data mined from the Web may be a collection of facts in the form of text, structured data such as lists and tables, and even images, audio and video [1], [5]. Web Mining is divided into three basic types, as shown in Fig 3:
1. Web Content Mining
2. Web Structure Mining
3. Web Usage Mining

1. Web Content Mining
Web content mining has seen rapid growth, primarily because the content available on the web has grown so quickly. A web page holds plenty of data such as text, images, audio, video, or structured records such as lists and tables. Web content mining is about extracting useful information from the data that the web page is made of. It applies the principles and techniques of data mining and the knowledge discovery process.
Web Content Mining Techniques [4]:
• Unstructured
• Structured
• Semi-Structured
• Multimedia
2. Web Structure Mining
Web structure mining is the process of using graph theory to analyze the node and connection structure of a website. It focuses on creating a structural summary of web pages and websites, which is generated from the hyperlinks and the document structure. It is particularly useful in improving marketing strategies by discovering the relationships and link hierarchy between web pages.
Web Structure Mining Techniques:
• Page Rank
• Keyword Analysis
• CLEVER
3. Web Usage Mining
Web usage mining is also known as Web Log mining. Web logs include text files, web server logs, customer logs, program logs, and application server logs stored on web servers. Users' navigational patterns are analyzed with the help of this technique. It helps organizations determine


the lifetime value of clients, design cross-marketing strategies across products and services, evaluate the efficacy of promotional campaigns, optimize the functionality of web-based applications and provide more personalized content to visitors. The overall process of web usage mining is shown in Fig 2.

Fig 2. Web Usage Mining Process

Process of Web Usage Mining:
The Web Usage Mining process has three phases [2]:
• Data Preparation
• Pattern Discovery
• Pattern Analysis
Data Preparation
Data preprocessing is a data mining technique that transforms raw data into an understandable format. Real-world data is often incomplete, inconsistent, and lacking in certain behaviors or trends, and is likely to contain many errors. Data preprocessing is a proven method of resolving such issues. It is used in database-driven applications such as customer relationship management and in rule-based applications.
Pattern Discovery
Pattern Discovery is the process of detecting patterns in massive data sets. It aims to find inherent regularities in a data set and lays the foundation for many essential data mining tasks. Sequential pattern mining methods have been used to analyze the data and identify patterns. Such patterns have been used to implement efficient systems, make predictions, improve the usability of systems, detect events, and make strategic product decisions.
Several data mining tasks are:
• Association, Correlation
• Classification, Clustering
• Causality Analysis
• Sequential Mining, Structural patterns
• Pattern analysis in spatio-temporal, multimedia, time-series and stream data
Pattern Analysis
Pattern Analysis is the final phase of Web Usage Mining, involving validation and interpretation of the mined patterns. It draws on advanced methods including statistical techniques, neural networks, genetic algorithms, fuzzy pattern recognition, machine learning and hardware implementation, which are pertinent to the development of this approach.

Fig 3. Web Mining Catalogue
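
The three phases of Web Usage Mining described in this section can be illustrated with a minimal sketch. It is only an illustration under stated assumptions, not a method taken from the surveyed works: the log lines are invented samples in Common Log Format, the function names `prepare`, `discover` and `analyse` are hypothetical, and "pattern discovery" is reduced to counting which pairs of pages are viewed by the same visitor (a simple association-style measure).

```python
import re
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical sample in Common Log Format; a real log would be read from a file.
SAMPLE_LOG = """\
10.0.0.1 - - [10/Oct/2019:13:55:36 +0000] "GET /home HTTP/1.1" 200 2326
10.0.0.1 - - [10/Oct/2019:13:56:01 +0000] "GET /products HTTP/1.1" 200 1500
10.0.0.1 - - [10/Oct/2019:13:56:05 +0000] "GET /logo.png HTTP/1.1" 200 900
10.0.0.2 - - [10/Oct/2019:14:01:12 +0000] "GET /home HTTP/1.1" 200 2326
10.0.0.2 - - [10/Oct/2019:14:02:40 +0000] "GET /products HTTP/1.1" 200 1500
10.0.0.2 - - [10/Oct/2019:14:03:02 +0000] "GET /cart HTTP/1.1" 200 700
10.0.0.3 - - [10/Oct/2019:14:05:00 +0000] "GET /missing HTTP/1.1" 404 0
10.0.0.3 - - [10/Oct/2019:14:05:09 +0000] "GET /home HTTP/1.1" 200 2326
"""

# host, identity, user, [timestamp], "METHOD path protocol", status, size
LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(\S+) (\S+) [^"]*" (\d{3}) \S+$')
STATIC_SUFFIXES = (".png", ".jpg", ".gif", ".css", ".js")


def prepare(log_text):
    """Data preparation: parse lines, drop failed requests and static assets."""
    visits = defaultdict(list)
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # skip malformed entries
        host, _method, path, status = match.groups()
        if status != "200" or path.endswith(STATIC_SUFFIXES):
            continue
        visits[host].append(path)
    return dict(visits)


def discover(visits):
    """Pattern discovery: count how many visitors viewed each pair of pages."""
    pair_counts = Counter()
    for pages in visits.values():
        for pair in combinations(sorted(set(pages)), 2):
            pair_counts[pair] += 1
    return pair_counts


def analyse(pair_counts, n_visitors, min_support=0.5):
    """Pattern analysis: keep only pairs whose support meets the threshold."""
    return {
        pair: count / n_visitors
        for pair, count in pair_counts.items()
        if count / n_visitors >= min_support
    }


visits = prepare(SAMPLE_LOG)
patterns = analyse(discover(visits), len(visits))
print(patterns)
```

In this sample only the pair (/home, /products) survives the support threshold, since two of the three visitors viewed both pages. A production pipeline would also split each visitor's requests into sessions by timestamp, which this sketch omits for brevity.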

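Of the Web Structure Mining techniques listed in this section, Page Rank lends itself to a compact illustration of how graph theory is applied to hyperlinks. The following is a minimal sketch, assuming a damping factor of 0.85, a fixed number of power iterations, and an invented four-page site; the function name `pagerank` and the even redistribution of rank from pages without outgoing links are illustrative choices, not details drawn from the surveyed works.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives the same small "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank equally among its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Page with no outgoing links: spread its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical site structure: each key links to the pages in its list.
web = {
    "home": ["products", "about"],
    "products": ["home"],
    "about": ["home"],
    "blog": ["home", "products"],
}
ranks = pagerank(web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.4f}")
```

In this example "home" ends up with the highest score because every other page links to it, while "blog", which no page links to, keeps only the baseline share. This is the kind of link hierarchy that structure mining aims to expose.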
III. TOOLS

Web mining is an effective solution for information retrieval and data analysis. With the growing importance of web mining, web mining tools have also emerged rapidly. These tools are used to extract, clean and analyze data so that valuable insights can be arrived at with the help of data visualization. Business intelligence can be derived by discovering correlations and networks of patterns in order to assess future trends based on past data. This helps to shape business strategy. There are several tools available to work out business insights and intelligence.
A few of the tools that support Web Mining are:
1. Data Miner
2. Web Scraper
3. Google Analytics
4. Oracle Data Mining
5. Tableau
6. Weka
7. Majestic
8. Bixo
1. Data Miner (Web Content Mining Tool)
Data Miner is a well-known data mining tool that extracts data from web pages. It provides the extracted data as a CSV (Comma Separated Values) file or in Excel spreadsheet format. It has more than 40,000 public recipes, with which structured data can be obtained.
Features
• Extract Tables & Lists
• 1 click scraping
• Scrape paginated results
• Scrape pages behind login / firewall
• Scrape dynamic Ajax content
• Automatically fill forms
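
The kind of table-to-CSV extraction that web content mining tools perform can be sketched with Python's standard library alone. This is an illustrative sketch and does not use or reproduce Data Miner itself; the class name `TableExtractor`, the helper `table_to_csv` and the sample HTML are assumptions made for the example.

```python
import csv
import io
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collects the text of every table row as a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside <tr>
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


# Hypothetical page fragment standing in for a downloaded web page.
SAMPLE_HTML = """
<table>
  <tr><th>Product</th><th>Price</th></tr>
  <tr><td>Laptop</td><td>55,000</td></tr>
  <tr><td>Phone</td><td>20,000</td></tr>
</table>
"""


def table_to_csv(html):
    """Parse the HTML and return the table rows as CSV text."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(parser.rows)
    return buffer.getvalue()


csv_text = table_to_csv(SAMPLE_HTML)
print(csv_text)
```

Cells containing a comma, such as the prices here, are quoted by the `csv` writer so that the output remains valid CSV. Pages that build their tables with Ajax would need a browser-driven approach, which is outside the scope of this sketch.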
2. Web Scraper (Web Content Mining Tool)
Web Scraper is one of the most suitable tools for scraping web data. With the help of this tool, you can work out a sitemap, or a plan for navigating a website. Once this is done, the Web Scraper Chrome extension follows the given navigation and extracts the data.
Features
• Tree / Navigation
• Pagination
• Load More button
• Cloud Scraper
• Run Multiple Scrapers at once
• Schedule Scraper
• Download data in CSV and CouchDB
• Data Export to DropBox
3. Google Analytics (Web Usage Mining Tool)
Google Analytics is one of the best business analytics tools. It can track and report website traffic. Web usage mining is carried out effectively here, since the tool is reported to be used for website analysis by more than 50% of people worldwide. It is an important tool because it helps to evaluate a company's online marketing strategy. With the help of this tool, effective data analysis can be performed to glean insights for the business. It also aids in understanding and improving the performance of the website.
Features
• Advertising and Campaign performance analysis
• Analysis and testing of website
• Audience Characteristic and Behavior analysis
• Easy integration with Google products such as AdSense, AdWords, Google Display Network, Google Tag Manager, etc.
• Sales and conversion tool
4. Oracle Data Mining (Web Usage Mining Tool)
Oracle Data Mining (ODM) is designed by Oracle. As data mining software, it offers a large number of data mining algorithms that help to glean insights, work out predictions and make effective use of Oracle data and investment. With the help of ODM, it is possible to build predictive models within the Oracle database so that customer behavior can be predicted easily. It can focus on a specific set of customers and develop their profiles. Using SQL data mining functions, it is possible to mine data tables and views, star schema data including transactional data, aggregations, unstructured data, i.e. the CLOB data type (using Oracle Text to extract tokens), and spatial data.
Features
• Regression
• Attribute Importance
• Anomaly Detection
• Association, Classification, Clustering
• Feature Selection and Extraction
• Text Mining, Spatial Mining
• Online Analytical Processing
5. Tableau (Web Usage Mining Tool)
Tableau is one of the most efficient and fastest-growing data visualization tools employed in the business intelligence industry. It is widely used because it simplifies raw data into an accessible format. Data visualizations can be acquired in the form of dashboards and
worksheets. Employees working in various divisions of an organization can interpret the data with the help of Tableau. Even a non-technical user can work with a customized dashboard.
The Tableau Product Suite consists of
• Tableau Desktop
• Tableau Public
• Tableau Online
• Tableau Server
• Tableau Reader
Features
Tableau has many features that make it popular. Some key features of Tableau are:
• Data Driven Alerts
• Additional Connectors
• Translate queries to visualizations
• Import all ranges and sizes of data
• Create interactive dashboards
• Server REST API
6. Weka (Web Usage Mining Tool)
Weka is a collection of machine learning algorithms for data mining tasks. It contains tools for data preparation, classification, regression, clustering, association rule mining, and visualization. It is open source software issued under the GNU General Public License. It was originally designed as a tool for analyzing data from agricultural domains; the more recent Java-based version (Weka 3), for which development started in 1997, is now used in many different application areas, particularly for educational purposes and research.
Features
• Data pre-processing
• Clustering
• Classification
• Regression
• Visualization
• Feature selection
7. Majestic (Web Structure Mining Tool)
Majestic is a business analytics tool that provides services for Search Engine Optimization strategies, marketing firms, website developers and media analysts. With the help of this tool, reliable and up-to-date data can be obtained so that you can analyze the performance of your websites. The data you get from this tool can help you categorize every page and domain by link analysis or link mining. It provides access to the world's biggest Link Index Database.
Features
• Campaigns
• Site explorer
• URL submitter
• Clique hunter
• Backlink history
8. Bixo (Web Structure Mining Tool)
Bixo is an open source web mining tool that runs a series of Cascading pipes on top of Hadoop. By building a customized Cascading pipe assembly, one can quickly work out specialized web mining applications that are optimized for a particular use case.
Features
• Fetch Subassembly
• Parse Subassembly

III. PRODUCTIVE APPLICATION ZONES OF WEB MINING

Jeff Weiner, an American businessman and the chief executive officer (CEO) of LinkedIn, a
business-related social networking website, says:
“Data really powers everything that we do.”
Web mining is the most promising and prevalent area in web research. Its applications are growing rapidly on the World Wide Web. It plays a vital role in providing better services for both users and customers [3].
Here is the list of areas where Web Mining is extensively used.
A. E-Business
B. Customer Relationship Management
C. E-Learning
D. M-Commerce
E. Web Search Engine - Google
A. E-Business
E-Business gives rise to the analysis of click-stream data; that is, web mining uncovers real-time e-business opportunities across the world. It provides the means for targeting the right customers, understanding their needs and thereby customizing services and strategies. It also helps in improving the effectiveness of a web site as a marketing channel by quantifying user behavior.
B. Customer Relationship Management (CRM)
In customer relationship management, Web mining is the incorporation of information gathered by traditional data mining methodologies and techniques with information gathered over the World Wide Web. Organizations place a premium on understanding, adopting and managing this content and converting it into appropriate knowledge for serving their customers, thereby improving operations and quickening the delivery of products to markets. The World Wide Web is a productive area for Web Mining, and it provides techniques, methods, and algorithms that are useful in various real-world applications with respect to the critical e-CRM function.
Key Features:
• Understand customer behavior
• Evaluate the effectiveness of a particular website
• Quantify the success of a marketing campaign
C. E-Learning
E-Learning provides intrinsic knowledge of the teaching and learning process for effective education planning by applying various technologies and tools. It is an efficient way of delivering courses online. Owing to its convenience and flexibility, the resources are available from anywhere and at any time. Technological development and the internet have changed people's lives on different scales. Here, electronic technologies are used to access an educational curriculum outside of a traditional classroom.
D. M-Commerce
M-Commerce is the buying and selling of goods and services through wireless handheld devices such as cellular telephones and personal digital assistants (PDAs). It enables users to access the internet without any additional plug-in. The main objective is to analyze mobile users' movements to new locations instead of considering only their recurrent locations. Location acquisition technology such as the Global Positioning System (GPS) makes it easy to obtain a moving trajectory, which records the user's movement history. Pattern
mining and prediction techniques discover the correlation between the moving behaviors and purchasing transactions of mobile users in order to identify potential M-Commerce features.
E. Web Search Engine - Google
One of the most popular and widespread search engines is Google. It gives users access to information from more than 2 billion web pages that it has indexed on its servers. The speed and quality of its search facility make it the most popular search engine. Google offers a service known as ‘Google News’, which aggregates news from newspapers and categorizes it to make it easier for users to read. Google effectively uses the data in the Web content and the Web graph to enrich its search capabilities and offer the best results to users. Search engines rank their search results in response to users' queries to make search navigation easier. Page Rank, HITS and Weighted Page Content Rank are the link analysis algorithms used for this ranking [2].

IV. CONCLUSION
As the Web and its usage continue to grow, it is important for organizations to analyze web data to extract useful insights. The Web is an immense sphere that comprises data in numerous forms, such as text data and multimedia data. The size of the web is constantly growing, so it becomes a challenging task to find interesting patterns in the existing data. This paper outlines web mining techniques, some prominent tools and their applications, which indicate the areas where web mining is predominantly used.

V. REFERENCES
[1] D. Sridevi and Dr. A. Pandurangan, “Survey on Latest Trends in Web Mining”, International Journal of Research in Advent Technology, Vol. 2, No. 3, March 2014, E-ISSN: 2321-9637.

[2] Anurag Kumar, Ravi Kumar Singh, “Web Mining Overview, Techniques, Tools and Applications: A Survey”, International Research Journal of Engineering and Technology (IRJET), Vol. 03, Issue 12, Dec 2016, E-ISSN: 2395-0056, P-ISSN: 2395-0072.

[3] S. Vidya, K. Banumathy, “Web Mining - Concepts and Applications”, International Journal of Computer Science and Information Technologies, Vol. 6(4), 2015, ISSN: 0975-9646.

[4] Muhammd Jawad, Hamid Mughal, “Data Mining: Web Data Mining Techniques, Tools and Algorithm: An Overview”, International Journal of Advanced Computer Science and Applications, Vol. 9, No. 6, 2018.

[5] Ashish Gupta, Anil Khandekar, “The Study of Web Mining - A Survey”, International Journal of Science, Engineering and Technology Research (IJSETR), Vol. 2, Issue 12, Dec 2013, ISSN: 2278-7798.

[6] Dr. A. C. Mondal and Sourav Maitra, “A Study of Web Mining Research - Last Few Years and the Road Ahead”, published in ICCS, Burdwan University, 2010.

[7] Srivastava J., Desikan P. and Kumar V., “Web Mining - Accomplishment & Future Direction”, in 2004 Conference.

[8] Srivastava, J., Cooley, R., Deshpande, M., and Tan, P-N. (2000), “Web usage mining: Discovery and applications of usage patterns from web data”, SIGKDD Explorations, 1(2), 12-23.

[9] Sravan Kumar, D. and Naveena Devi, B., “Learner’s Centric Approach for Web Mining”, (IJCSIT) International Journal of Computer Science and Information Technologies, Vol. 1(2), 2010.

[10] Wangbin Hu, Junpeng Yuan and Yuantao Song, “The Research of a web mining method in Research Areas”, published in Sixth Wahon International centre on E-Business, e-Business Track.
