Data Mining Notes

Explain various types of Web mining in detail. [10 m]

Web mining is the process of applying data mining techniques to automatically discover
and extract information from Web documents and services. The main purpose of web
mining is to discover useful information from the World Wide Web and its usage
patterns.

Web mining is the application of data mining techniques to discover patterns from the
World Wide Web. It uses automated methods to extract both structured and unstructured
data from web pages, server logs and link structures.
There are three main sub-categories of web mining. Web content mining extracts
information from within a page. Web structure mining discovers the structure of the
hyperlinks between documents, categorizing sets of web pages and measuring the
similarity and relationship between different sites. Web usage mining finds patterns
of usage of web pages.

Applications of Web Mining:
1. Web mining helps to improve the power of web search engines by classifying web
documents and identifying web pages.
2. It is used for web searching (e.g. Google, Yahoo) and vertical searching (e.g.
FatLens, Become).
3. Web mining is used to predict user behaviour.
4. Web mining is very useful for a particular website and e-service, e.g. landing
page optimization.

Web mining can be broadly divided into three different types of mining techniques:
1. Web Content Mining: Web content mining is the extraction of useful information
from the content of web documents. Web content consists of several types of data:
text, image, audio, video, etc. Content data is the collection of facts that a web
page is designed to convey. It can provide effective and interesting patterns about
user needs. Mining text documents draws on text mining, machine learning and natural
language processing, so web content mining is also known as text mining. This type of
mining scans and mines the text, images and groups of web pages according to the
content of the input.
2. Web Structure Mining: Web structure mining is the discovery of structure
information from the web. The web graph consists of web pages as nodes and hyperlinks
as edges connecting related pages. Structure mining gives a structured summary of a
particular website. It identifies relationships between web pages linked by
information or by a direct link connection. Web structure mining can be very useful
for determining the connection between two commercial websites.
3. Web Usage Mining: Web usage mining is the identification or discovery of
interesting usage patterns from large data sets. These patterns help you understand
user behaviour. In web usage mining, users' access data on the web is collected in
the form of logs, so web usage mining is also called log mining.

Advantages and Disadvantages of Data Mining

Data mining is the process of analysing enormous amounts of information and datasets,
extracting (or "mining") useful intelligence to help organizations solve problems,
predict trends, mitigate risks, and find new opportunities. Data mining is like actual
mining because, in both cases, the miners are sifting through mountains of material to
find valuable resources and elements.
Data mining also includes establishing relationships and finding patterns, anomalies,
and correlations to tackle issues, creating actionable information in the process.

Advantages of Data Mining
Helps gather reliable information - Data mining allows companies, organisations, and
governments to gather reliable information.
Helps businesses make operational adjustments - Data mining helps businesses make
profitable production and operational adjustments. It can be used to find correlations
between products, consumers, suppliers and other aspects of the business.
Helps to make informed decisions - It is often used for business purposes to improve
decision making. As more data is collected, the accuracy of data mining generally
improves.
Helps detect risks and fraud - Data mining can help identify risks and fraud that may
not be detectable through traditional means of data analysis.
Helps to analyse very large quantities of data quickly - Data mining can be used to
analyse data that was previously too difficult to understand due to the sheer volume
or type of information.
Helps to understand behaviours, trends and discover hidden patterns - Data mining can
be used to find patterns and trends in user behaviour. It does this by looking for
anything that is repeated in the data, such as instances of buying specific items.

It helps companies gather reliable information.
It's an efficient, cost-effective solution compared to other data applications.
It helps businesses make profitable production and operational adjustments.
Data mining uses both new and legacy systems.
It helps businesses make informed decisions.
It helps detect credit risks and fraud.
It helps data scientists easily analyse enormous amounts of data quickly.
Data scientists can use the information to detect fraud, build risk models, and
improve product safety.

Disadvantages of Data Mining
Many data analytics tools are complex and challenging to use. Data scientists need the
right training to use the tools effectively.
Different tools work with different types of data mining, depending on the algorithms
they employ. Thus, data analysts must be sure to choose the correct tools.
Data mining techniques are not infallible, so there's always the risk that the
information isn't entirely accurate. This obstacle is especially relevant if there's a
lack of diversity in the dataset.
Companies can potentially sell the customer data they have gleaned to other businesses
and organizations, raising privacy concerns.
Data mining requires large databases, making the process hard to manage.

Data mining tools are complex and require training to use - Data analytics is a
complicated process and often requires people with training to use the tools.
Data mining techniques are not infallible - Data mining doesn't always provide
accurate information.
Rising privacy concerns - One of the major disadvantages of data mining is data and
privacy concerns.
Data mining requires large databases - Data mining is one of the most powerful tools
in a marketer's toolbox, but it does have its drawbacks. One such drawback is that
data mining requires large databases to be effective.
Expensive - Data mining can be a very expensive process. For example, companies have
to hire additional employees and technology specialists to ensure that the data mining
is done correctly.

Issues and Challenges of Data Mining
1. Security and Social Challenges
Decision-making techniques rely on collecting and sharing data, so they require strong
security. Private information about individuals and other sensitive information is
gathered to build customer profiles and to understand patterns of customer behaviour.
2. Noisy and Incomplete Data
Data mining is the process of obtaining information from huge volumes of data.
Real-world data is noisy, incomplete, and heterogeneous.
3. Distributed Data
Real-world data is usually stored on various platforms in distributed computing
environments. It may be on the internet, on individual systems, or in databases.
4. Complex Data
Real-world data is heterogeneous. It may be multimedia data, including natural
language text, time series, spatial data, temporal data, complex data, audio or video,
images, etc.
5. Performance
The performance of a data mining system depends mainly on the efficiency of the
techniques and algorithms used. If the techniques and algorithms are not well
designed, the performance of the data mining process suffers.
6. Scalability and Efficiency of the Algorithms
Data mining algorithms should be scalable and efficient in order to extract
information from huge amounts of data in the database.
7. Improvement of Mining Algorithms
Factors such as the complexity of data mining approaches, the enormous size of the
database, and the entire data flow motivate the development of parallel and
distributed data mining algorithms.
8. Incorporation of Background Knowledge
If background knowledge can be incorporated, more accurate and reliable data mining
solutions and predictions can be found.
9. Data Visualization
Data visualization is a vital step in data mining because it is the main way the
output is presented to the user. The extracted information should convey the exact
meaning of what it is intended to convey.
10. Data Privacy and Security
Data mining typically raises significant issues regarding governance, privacy, and
data security.
11. User Interface
The knowledge discovered using data mining tools is useful only if it is interesting
and, above all, understandable by the user.
12. Mining Based on Level of Abstraction
The data mining process should be collaborative, because this allows users to focus
on pattern finding and on presenting and optimizing data mining requests based on the
results returned.
13. Integration of Background Knowledge
Background knowledge may be used to express the discovered patterns and to guide the
exploration process.
14. Mining Methodology Challenges
These challenges relate to data mining methods and their limitations. They include
the control and handling of noise in data, the dimensionality of the domain, the
diversity of the data available, the versatility of the mining method, and so on.
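
Noisy and incomplete data appears in the challenges above without an example. The
sketch below shows one simple way such data might be prepared before mining: missing
values are filled with the median, and extreme values are clipped to a band around the
median. The function name clean_column, the cut-off k = 3 and the sample readings are
assumptions made for illustration; the notes do not prescribe any particular cleaning
method.

```python
from statistics import median

def clean_column(values, k=3.0):
    """Fill missing entries (None) with the median, then clip outliers.

    Outliers are values further than k * MAD from the median, where MAD is
    the median absolute deviation. Both k and this rule are illustrative
    assumptions, not a method taken from the notes. If MAD is 0, every
    value is clipped to the median.
    """
    present = [v for v in values if v is not None]
    if not present:
        return []
    mid = median(present)
    mad = median(abs(v - mid) for v in present)
    low, high = mid - k * mad, mid + k * mad
    cleaned = []
    for v in values:
        if v is None:          # incomplete record: fill with the median
            v = mid
        cleaned.append(min(max(v, low), high))  # noisy record: clip to the band
    return cleaned

# Hypothetical sensor readings with one missing value and one extreme value.
raw = [10, 12, None, 11, 13, 500, 12]
cleaned = clean_column(raw)
print(cleaned)
```

Median-based statistics are used here because a single extreme value such as 500 would
distort a mean and a standard deviation.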
DM & DBMS

DBMS | Data Mining
--- | ---
A DBMS holds well-organised data with limited scope. | Data mining analyses data from different sources to discover useful knowledge.
A DBMS is a set of digital databases that store data in tabular form. | Data mining works on a collection of raw data from which the required data is selected.
A DBMS supports the SQL query language to find data. | Data mining supports automatic search techniques to find patterns in data.
A DBMS is a subset of data mining, as it manages limited data. | Data mining is a superset of DBMS that handles unlimited data.
Data is limited to the scope of the organization, as it is used by employees of a specific organization. | The scope of data mining is global; it can be used by any type of business manager for analysis.
A DBMS is used to store data through SQL. | Data mining is used for pattern searching with algorithms and modern data techniques.
A DBMS contains well-structured tabular data stored in a database on the organization's server. | Data mining works with unstructured raw data stored in a data warehouse on a remote internet server.
A DBMS returns exact answers to SQL queries; its results are 100% accurate. | Data mining returns answers that are close to accurate, but not as accurate as those of a DBMS.
A DBMS supports relationships among tables using the concepts of primary and foreign keys. | Data mining contains data objects and data cubes that are related by concept.
A DBMS is simple and efficient, and the collection of data is controlled by a DBA. | Data mining deals with a collection of more complex object data that is uncontrolled.
The major task of a DBMS is to get the required data as per an SQL query. | Data mining is used to get knowledge from hidden data patterns, a process called KDD (Knowledge Discovery in Databases).
A DBMS employs deduction on data. | Data mining employs induction on data.

What is Classification in Data Mining?
Classification in data mining is a common technique that separates data points into
different classes. It allows you to organize data sets of all sorts, including complex
and large datasets as well as small and simple ones.
It primarily involves using algorithms that you can easily modify to improve the data
quality. This is a big reason why supervised learning is particularly common with
classification techniques in data mining. The primary goal of classification is to
connect a variable of interest with the required variables. The variable of interest
should be of qualitative type.

Types of Classification Techniques in Data Mining
Before we discuss the various classification algorithms in data mining, let's first
look at the types of classification techniques available.
Primarily, we can divide the classification algorithms into two categories:
1. Generative
2. Discriminative
Here's a brief explanation of these two categories:
Generative
A generative classification algorithm models the distribution of individual classes.
It tries to learn the model that creates the data through estimation of the
distributions and assumptions of the model. You can use generative algorithms to
predict unseen data. A prominent generative algorithm is the Naive Bayes Classifier.
Discriminative
A discriminative classification algorithm is a more basic approach that determines a
class for a row of data. It builds its model directly from the observed data and
depends on the data quality rather than on its distributions.
Logistic regression is a well-known discriminative classifier.

What are Clusters and their types?
A cluster is a group of objects that belong to the same class. In other words, similar
objects are grouped in one cluster and dissimilar objects are grouped in another
cluster. A cluster of data objects can be treated as one group.
Cluster types:
1. Partitioning cluster -
Suppose we are given a database of n objects and the partitioning method constructs k
partitions of the data. Each partition will represent a cluster, and k <= n. This
means that it will classify the data into k groups which satisfy the following
requirements -
- Each group contains at least one object.
- Each object must belong to exactly one group.
- For a given number of partitions, say k, the partitioning method will create an
initial partitioning.
- Then it uses the iterative relocation technique to improve the partitioning by
moving objects from one group to another.
2. Hierarchical methods -
This method creates a hierarchical decomposition of the given set of data objects. We
can classify hierarchical methods on the basis of how the hierarchical decomposition
is formed. There are two approaches here -
i) Agglomerative approach - this approach is also known as the bottom-up approach. We
start with each object forming a separate group. The method keeps merging the objects
or groups that are close to one another until all of the groups are merged into one
or until the termination condition holds.
ii) Divisive approach - this approach is also known as the top-down approach. We
start with all of the objects in the same cluster. In each iteration, a cluster is
split into smaller clusters. This is done until each object is in its own cluster or
the termination condition holds.
Hierarchical methods are rigid: once a merge or split is done, it can never be undone.
3. Similarity or distance measure -
These are core components used by distance-based clustering algorithms to place
similar data points in the same cluster and dissimilar or distant data points in
different clusters.
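
The partitioning method and the distance measure described above can be sketched
together. The example below uses k-means with Euclidean distance as one concrete
instance of "initial partitioning followed by iterative relocation". The notes do not
name k-means, so treat that choice, the function name kmeans, the rule of using the
first k points as starting centres and the sample points as assumptions.

```python
import math

def kmeans(points, k, max_iter=100):
    """Partition points into k groups by iterative relocation (k-means sketch)."""
    if not 0 < k <= len(points):
        raise ValueError("need 0 < k <= n")
    # Initial partitioning: use the first k points as centres (an assumption
    # made to keep the sketch deterministic; real code usually picks randomly).
    centroids = [tuple(p) for p in points[:k]]
    assignment = None
    for _ in range(max_iter):
        # Assign each object to exactly one group: the nearest centre.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:   # no object moved: stop
            break
        assignment = new_assignment
        # Relocation step: move each centre to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return assignment, centroids

# Hypothetical 2-D objects forming two visibly separate groups.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9), (8, 9.5)]
assignment, centroids = kmeans(points, 2)
print(assignment)  # [0, 0, 0, 1, 1, 1]
```

Swapping math.dist for another distance function changes which objects count as
"similar", which is why the distance measure is a core component.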
What is Association rule learning?

Association rule learning is an unsupervised learning technique that checks for the
dependence of one data item on another and maps them accordingly so that the result
can be used more profitably. It tries to discover interesting relations or
associations between the variables of a dataset, using various rules to find them.
Association rule learning is one of the important approaches in machine learning, and
it is employed in market basket analysis, web usage mining, continuous production,
etc. Market basket analysis is an approach used by several big retailers to find
relations between items.
Web mining can be viewed as the application of adapted data mining methods to the
internet, whereas data mining is defined as the application of algorithms to discover
patterns in mostly structured data as part of a knowledge discovery process.
Web mining has the distinctive property of supporting a collection of multiple data
types. The web has several aspects that yield multiple approaches to the mining
process: web pages contain text, web pages are connected via hyperlinks, and user
activity can be monitored via web server logs.
In market basket analysis, customer buying habits are analyzed by finding associations
between the different items that customers place in their shopping baskets. By
discovering such associations, retailers develop marketing strategies based on which
items are frequently purchased together. These associations can lead to increased
sales by helping retailers do selective marketing and plan their shelf space.
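
The strength of an association between items is usually expressed with three standard
measures: support, confidence and lift. The notes do not define them, so the sketch
below uses their common textbook definitions on a small set of made-up baskets. The
function names and the data are hypothetical.

```python
# Hypothetical shopping baskets, one set of items per transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """Fraction of all transactions that contain the itemset."""
    return count(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the baskets containing the antecedent, the share that also contain the consequent."""
    return count(antecedent | consequent, transactions) / count(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """Confidence relative to how common the consequent is overall (> 1 suggests a positive association)."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)

# Rule under test: customers who buy bread also buy butter.
print(support({"bread", "butter"}, transactions))       # 0.6
print(confidence({"bread"}, {"butter"}, transactions))  # 0.75
print(lift({"bread"}, {"butter"}, transactions))
```

In this toy data the lift is about 1.25, so butter turns up in bread baskets more often
than it does in baskets overall.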
Types of Association Rule Learning
Apriori Algorithm − This algorithm uses frequent itemsets to produce association
rules. It is designed to work on databases that contain transactions. It uses a
breadth-first search and a hash tree to count itemsets efficiently. It is generally
used for market basket analysis and helps to identify products that can be purchased
together. It can also be used in healthcare to discover drug reactions in patients.
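
A minimal sketch of the level-wise (breadth-first) search behind Apriori follows. It
is a simplification under stated assumptions: the threshold is an absolute count
called min_count, candidates are counted by scanning the transactions directly instead
of using a hash tree, and only the frequent itemsets are produced, not the rules. The
baskets are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {itemset: count} for every itemset found in at least min_count transactions."""
    def count(itemset):
        return sum(1 for t in transactions if itemset <= t)

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    current = {s for s in current if count(s) >= min_count}

    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = count(s)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        joined = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        pruned = {
            c for c in joined
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Count step: keep only the candidates that meet the threshold.
        current = {c for c in pruned if count(c) >= min_count}
    return frequent

# Hypothetical shopping baskets.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]
frequent = apriori(transactions, min_count=3)
for itemset, n in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

The prune step is what makes the search practical: an itemset cannot be frequent if
any of its subsets is infrequent, so most candidates are discarded without counting.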
Eclat Algorithm − Eclat stands for Equivalence Class Transformation. This algorithm
uses a depth-first search to discover frequent itemsets in a transaction database. It
typically executes faster than the Apriori Algorithm.
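
The depth-first search used by Eclat can be sketched as below. The database is first
turned into a "vertical" layout that maps each item to the set of transaction ids
containing it; the count of a larger itemset is then the size of an intersection of
those sets. The function names, the absolute threshold min_count and the baskets are
assumptions made for illustration.

```python
def eclat(transactions, min_count):
    """Return {itemset: count} using depth-first search over transaction-id sets."""
    # Vertical layout: item -> ids of the transactions that contain it.
    tidsets = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tidsets.setdefault(item, set()).add(tid)

    frequent = {}

    def extend(prefix, candidates):
        # candidates: (item, tidset) pairs that are frequent together with prefix.
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            frequent[itemset] = len(tids)
            # Intersect with the remaining items to form the next, deeper class.
            deeper = []
            for other, other_tids in candidates[i + 1:]:
                shared = tids & other_tids
                if len(shared) >= min_count:
                    deeper.append((other, shared))
            if deeper:
                extend(itemset, deeper)   # go deeper before moving sideways

    start = sorted(
        ((item, tids) for item, tids in tidsets.items() if len(tids) >= min_count),
        key=lambda pair: pair[0],
    )
    extend(frozenset(), start)
    return frequent

# Hypothetical shopping baskets.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]
frequent = eclat(transactions, min_count=3)
print({tuple(sorted(s)): n for s, n in frequent.items()})
```

Because counts come from set intersections, the transactions are scanned only once,
when the vertical layout is built. This is one reason Eclat is often faster than
Apriori.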
F-P Growth Algorithm − F-P stands for Frequent Pattern. FP-Growth is an improved
version of the Apriori Algorithm. It represents the database in the form of a tree
structure known as a frequent pattern tree (FP-tree). The purpose of this tree is to
extract the most frequent patterns.
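
The sketch below covers only the first half of FP-Growth: building the FP-tree. The
pattern-extraction step that walks the tree is left out to keep the example short.
The class and function names, the absolute threshold min_count and the baskets are
hypothetical.

```python
from collections import Counter

class FPNode:
    """One node of the tree: an item, how many baskets pass through it, and its children."""
    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}

def build_fp_tree(transactions, min_count):
    """Build an FP-tree from the frequent items of each transaction."""
    # First pass: count each item and drop the infrequent ones.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {item: c for item, c in counts.items() if c >= min_count}

    root = FPNode(None)
    # Second pass: insert each basket as a path, most frequent item first,
    # so that baskets with common items share a prefix of the tree.
    for t in transactions:
        ordered = sorted(
            (item for item in set(t) if item in frequent),
            key=lambda item: (-frequent[item], item),  # ties broken by name
        )
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1
            node = child
    return root, frequent

def show(node, depth=0):
    """Print the tree with indentation, one node per line."""
    for child in sorted(node.children.values(), key=lambda n: n.item):
        print("  " * depth + f"{child.item}:{child.count}")
        show(child, depth + 1)

# Hypothetical shopping baskets.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]
root, frequent_items = build_fp_tree(transactions, min_count=3)
show(root)
```

Four of the five baskets share the single bread node, which shows how the tree
compresses the database. Unlike Apriori, no candidate itemsets are generated.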
