JOHN VICTOR GICHONGE

P101/1514G/15
DATA MINING ASSIGNMENT
MR. NJOROGE
Apriori algorithm. Apriori is an algorithm for frequent itemset mining and association rule
learning over transactional databases. It first identifies the frequent individual items in the
database and then extends them to larger and larger itemsets, as long as those itemsets appear
sufficiently often in the database (that is, they meet a minimum support threshold). The frequent
itemsets found by Apriori can be used to derive association rules that highlight general trends
in the database. This has applications in domains such as market basket analysis.
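The level-by-level search described above can be sketched in a few lines of Python. This is a minimal illustration, not a tuned implementation: the function name `apriori`, the sample `baskets` and the absolute support threshold of 3 are all assumptions made for the example.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count the individual items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into candidate k-itemsets.
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        # Count step: one pass over the database per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent

# Hypothetical market-basket data, for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
frequent = apriori(baskets, min_support=3)

# Confidence of the rule {beer} -> {diapers}: support(both) / support(beer).
confidence = frequent[frozenset({"beer", "diapers"})] / frequent[frozenset({"beer"})]
```

With this data, every basket that contains beer also contains diapers, so the rule {beer} -> {diapers} has confidence 1.0.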

Frequent pattern growth (FP-growth) algorithm. FP-growth is an improvement on the Apriori
algorithm. It finds frequent itemsets in a transaction database without candidate generation,
by representing the frequent items in a compact frequent pattern tree (FP-tree) and mining
that tree instead of rescanning the database.
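As a rough sketch of how mining works without candidate generation, the code below builds an FP-tree in two scans and then mines it recursively through conditional pattern bases. The names `Node`, `build_tree` and `fp_growth`, the sample `baskets` and the support threshold of 3 are assumptions for illustration.

```python
from collections import defaultdict

class Node:
    """One FP-tree node: an item, its count, and links to parent and children."""
    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}

def build_tree(weighted_transactions, min_support):
    """Build an FP-tree from (items, count) pairs; return (header table, supports)."""
    # First scan: support of each individual item.
    support = defaultdict(int)
    for items, count in weighted_transactions:
        for item in set(items):
            support[item] += count
    frequent = {i: s for i, s in support.items() if s >= min_support}
    root = Node(None, None)
    header = defaultdict(list)  # item -> every tree node that holds it
    # Second scan: insert frequent items in descending support order,
    # so that transactions with common items share a prefix path.
    for items, count in weighted_transactions:
        ordered = sorted((i for i in set(items) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return header, frequent

def fp_growth(weighted_transactions, min_support, suffix=frozenset()):
    """Return {itemset: support count} by recursively mining conditional trees."""
    header, frequent = build_tree(weighted_transactions, min_support)
    result = {}
    for item, sup in frequent.items():
        itemset = suffix | {item}
        result[itemset] = sup
        # Conditional pattern base: the prefix path above each node holding `item`.
        base = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent is not None and parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        result.update(fp_growth(base, min_support, itemset))
    return result

# Hypothetical market-basket data, for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
patterns = fp_growth([(b, 1) for b in baskets], min_support=3)
```

The database is read only while the first tree is built. Every later step works on prefix paths taken from a tree, which is why no candidate itemsets need to be generated and tested.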

Multilevel associations. Association rules generated from mining data at multiple levels of
abstraction are called multiple-level or multilevel association rules. Multilevel association rules
can be mined efficiently using concept hierarchies under a support-confidence framework. Rules
at high concept levels may only express common-sense knowledge, while rules at low concept
levels are not always useful.
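A small sketch can show how a concept hierarchy changes the support of the same transactions at different levels of abstraction. The hierarchy `parent`, the sample `transactions` and the helper `itemset_counts` are hypothetical and serve only to illustrate the idea.

```python
from collections import Counter
from itertools import combinations

# Hypothetical two-level concept hierarchy: specific product -> category.
parent = {
    "skim milk": "milk",
    "2% milk": "milk",
    "wheat bread": "bread",
    "white bread": "bread",
}

# Hypothetical transactions recorded at the lowest level.
transactions = [
    {"skim milk", "wheat bread"},
    {"2% milk", "white bread"},
    {"skim milk", "white bread"},
    {"2% milk"},
]

def itemset_counts(transactions, size, generalize=None):
    """Count itemsets of a given size, optionally after climbing the hierarchy."""
    counts = Counter()
    for t in transactions:
        items = {generalize.get(i, i) for i in t} if generalize else set(t)
        for combo in combinations(sorted(items), size):
            counts[combo] += 1
    return counts

high_level_pairs = itemset_counts(transactions, 2, generalize=parent)
low_level_pairs = itemset_counts(transactions, 2)
```

Here the pair (bread, milk) appears in 3 of the 4 transactions at the category level, while no pair of specific products appears more than once. This is one reason multilevel mining often uses a lower minimum support at lower levels of the hierarchy.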

Constraint-based mining. This is the research area that studies the development of data mining
algorithms that search through a pattern or model space restricted by constraints. The term is
usually used to refer to algorithms that search for patterns only. The most well-known instance of
constraint-based mining is the mining of frequent patterns. Constraints are needed in pattern
mining algorithms to increase the efficiency of the search and to reduce the number of patterns
that are presented to the user, thus making knowledge discovery more effective and useful.
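To illustrate how a constraint can both speed up the search and reduce the number of reported patterns, the sketch below mines frequent itemsets under a hypothetical budget constraint (total price of the itemset at most `max_total_price`). The function name, the `prices` table and the thresholds are assumptions. The pruning is only valid because prices are non-negative, which makes the constraint anti-monotone.

```python
def mine_with_constraint(transactions, min_support, prices, max_total_price):
    """Depth-first itemset search that prunes on support and on a price constraint."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    results = {}

    def satisfies(itemset):
        return sum(prices[i] for i in itemset) <= max_total_price

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    def extend(itemset, start):
        for idx in range(start, len(items)):
            candidate = itemset | {items[idx]}
            # Both checks are anti-monotone: once a candidate fails,
            # every superset fails too, so the branch is not explored.
            if not satisfies(candidate):
                continue
            count = support(candidate)
            if count < min_support:
                continue
            results[candidate] = count
            extend(candidate, idx + 1)

    extend(frozenset(), 0)
    return results

# Hypothetical data and prices, for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
prices = {"bread": 2, "milk": 1, "diapers": 8, "beer": 6, "cola": 2, "eggs": 3}
cheap_patterns = mine_with_constraint(baskets, min_support=3,
                                      prices=prices, max_total_price=10)
```

The itemset {diapers, beer} is frequent in this data but costs 14, so it is pruned during the search rather than filtered out afterwards.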

Analytical characterization. In data mining, analytical characterization is class
characterization that includes an analysis of attribute relevance. Before the data is
generalized, each attribute is scored with a relevance measure such as information gain, and
attributes that are irrelevant or only weakly relevant to the target class are removed, so that
the resulting class description is built from the most informative attributes.

Illustrate the basic algorithm for attribute-oriented induction. Attribute-oriented induction
(AOI) is a data mining method. Its input is a relational data table together with concept
hierarchies for the attributes. Its output is a set of general features induced from the related
data. Although the traditional AOI method is useful for finding general features, it can only
mine features from single-valued attributes. If the data contains multi-valued attributes,
the traditional AOI method cannot find general knowledge in the data. In addition, the
AOI algorithm depends on the concept hierarchies used for induction: different principles of
classification or different category values produce different concept trees and therefore
affect the inductive conclusion. To address this, a modified AOI algorithm combined with a
simplified Boolean-bit Karnaugh map has been proposed. It does not need a concept tree, it
can handle multi-valued data, and it finds the general features implied within the attributes.
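The traditional AOI procedure (generalize each attribute by climbing its concept hierarchy, remove attributes that cannot be generalized, then merge identical tuples) can be sketched as follows. This is an illustrative outline of the traditional method only, not of the Karnaugh-map variant. The function `aoi`, the threshold of 2 distinct values and the student records are assumptions.

```python
from collections import Counter

def aoi(rows, hierarchies, threshold):
    """Basic attribute-oriented induction.

    rows: list of dicts (the relational table)
    hierarchies: {attribute: {value: parent value}}
    threshold: maximum number of distinct values allowed per attribute
    Returns (generalized tuples with counts, names of the kept attributes).
    """
    attributes = list(rows[0])
    rows = [dict(r) for r in rows]  # work on a copy
    for attr in attributes:
        while len({r[attr] for r in rows}) > threshold:
            mapping = hierarchies.get(attr)
            if mapping is None or not any(r[attr] in mapping for r in rows):
                # Attribute removal: too many values and no higher-level concept.
                for r in rows:
                    del r[attr]
                break
            # Attribute generalization: climb one level in the hierarchy.
            for r in rows:
                r[attr] = mapping.get(r[attr], r[attr])
    kept = [a for a in attributes if a in rows[0]]
    # Merge identical generalized tuples and keep their counts.
    return Counter(tuple(r[a] for a in kept) for r in rows), kept

# Hypothetical student table and concept hierarchies, for illustration only.
students = [
    {"name": "Ann", "major": "CS", "city": "Nairobi"},
    {"name": "Ben", "major": "Physics", "city": "Mombasa"},
    {"name": "Cat", "major": "History", "city": "Kampala"},
    {"name": "Dan", "major": "Music", "city": "Nairobi"},
]
hierarchies = {
    "major": {"CS": "Science", "Physics": "Science",
              "History": "Arts", "Music": "Arts"},
    "city": {"Nairobi": "Kenya", "Mombasa": "Kenya", "Kampala": "Uganda"},
}
generalized, kept = aoi(students, hierarchies, threshold=2)
```

In this example the `name` attribute is removed because it has no hierarchy, `major` is generalized to Science or Arts, and `city` is generalized to a country. The four records collapse into three generalized tuples with counts.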
Illustrate, with a diagram, data warehousing architecture.

A data warehouse is made up of the following layers:

• Data Source Layer: This is where the original data, collected from a variety of internal and
external sources, resides in the relational database. It stores different data types, such as
operational data, social media data and third-party data.
• Data Staging Layer: This is where data is extracted from the different internal and external
data sources. It is made up of the following components: a landing database, a staging area
and data integration tools.
• Data Storage Layer: This is where the data that was cleansed in the staging area is stored in
a single central repository.
• Data Presentation Layer: This is where users interact with the cleansed and organized data.
This layer of the data warehouse architecture provides users with the ability to query the data
for product or service insights, analyze the information to conduct hypothetical business
scenarios, and develop automated or ad-hoc reports.
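The flow of data through the four layers can be illustrated with a toy pipeline. This is only a sketch under stated assumptions: the in-memory SQLite database stands in for the central repository, and the `sales` table, the sample rows and the function names `stage`, `load` and `report` are invented for the example.

```python
import sqlite3

# Data source layer: raw records as they arrive (hypothetical sample).
source_rows = [
    {"customer": " Alice ", "amount": "120.50", "channel": "web"},
    {"customer": "Bob", "amount": "n/a", "channel": "store"},
    {"customer": "alice", "amount": "30", "channel": "store"},
]

def stage(rows):
    """Data staging layer: extract, clean and standardize the raw records."""
    cleaned = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # drop records whose amount cannot be parsed
        cleaned.append((row["customer"].strip().title(), amount, row["channel"]))
    return cleaned

def load(conn, rows):
    """Data storage layer: write the cleansed rows to the central repository."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL, channel TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

def report(conn):
    """Data presentation layer: a query that users or reports would run."""
    return conn.execute(
        "SELECT customer, SUM(amount) FROM sales GROUP BY customer ORDER BY customer"
    ).fetchall()

conn = sqlite3.connect(":memory:")
load(conn, stage(source_rows))
totals = report(conn)
```

The record with the unparseable amount is dropped in staging, and the two spellings of the same customer are standardized before loading, so the presentation query sees one consistent row per customer.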
