You are on page 1of 57

lOMoARcPSD|21669462

Dhiren DM LAB

B Tech (Parul University)

Studocu is not sponsored or endorsed by any college or university


Downloaded by Mriyank Das (mriyankdas428@gmail.com)
lOMoARcPSD|21669462

Faculty of Engineering & Technology


Subject Name: Data Mining
Subject Code: 203105432
B. Tech IT Year Semester 7th

Annexure No:

PRACTICAL - 1
AIM: Design and create cube by identifying measures and dimensions for star schema, snowflake
schema and fact constellation schema.

Software Used: Analysis services- SQL Server-2008.

(1) Creating a New Analysis Services Project


To create a new Analysis Services project, Use the New Project dialog box in BIDS to create a new
Analysis Services project, follow these steps:

1.Select Microsoft SQL Server 2008 ⇒SQL Server Business Intelligence. Development Studio from the
Programs menu to launch Business Intelligence Development Studio.

2.Select File ⇒ New Project. ⇒

3.In the New Project dialog box, select the Business Intelligence Projects project type.

4.Select the Analysis Services Project template.

5.Name the new project Practical1 and select a convenient location to save it.

6.Click OK to create the new project.

ENROLL NO - 200303108901 1

Downloaded by Mriyank Das (mriyankdas428@gmail.com)


lOMoARcPSD|21669462

(2) Defining a Data Source

To define a data source, use the Data Source Wizard. Launch this wizard by right-clicking on
the Data Sources folder in new Analysis Services project
To define a data source for the new cube, follow these steps:
1. Right-click on the Data Sources folder in Solution Explorer and select New Data
Source.
2. Read the first page of the Data Source Wizard and click Next.
3. You can base a data source on a new or an existing connection. Because you don’t
have any existing connections, click New.
4. In the Connection Manager dialog box, select the server containing your analysis
services sample database from the Server Name combo box.
5. Fill in your authentication information.
6. Select the Native OLE DB\SQL Native Client provider (this is the default provider).
7. Select the AdventureWorksDW database. Figure shows the filled-in Connection
Manager dialog box.
8. Click OK to dismiss the Connection Manager dialog box.
9. Click Next.

Downloaded by Mriyank Das (mriyankdas428@gmail.com)


lOMoARcPSD|21669462

Downloaded by Mriyank Das (mriyankdas428@gmail.com)


lOMoARcPSD|21669462

Downloaded by Mriyank Das (mriyankdas428@gmail.com)


lOMoARcPSD|21669462

P R PS Total

Downloaded by Mriyank Das (mriyankdas428@gmail.com)


lOMoARcPSD|21669462

PRACTICAL 2
AIM: Make an OLAP cube and perform Roll Up and Drill Down operations on it. Show the Apex
and Base cuboid for the same. Draw Star-net query model for the cube.

Description:

ROLL UP:
The roll-up operation (also known as drill-up or aggregation operation) performs aggregation on a data cube, by climbing
down concept hierarchies, i.e., dimension reduction. Roll-up is like zooming-out on the data cubes. Figure shows the
result of roll-up operations performed on the dimension location. The hierarchy for the location is defined as the Order
Street, city, province, or state, country. The roll-up operation aggregates the data by ascending the location hierarchy from
the level of the city to the level of the country.

Downloaded by Mriyank Das (mriyankdas428@gmail.com)


lOMoARcPSD|21669462

DRILL DOWN:

The drill-down operation (also called roll-down) is the reverse operation of roll-up. Drill-down is like
zooming-in on the data cube. It navigates from less detailed record to more detailed data. Drill-down can
be performed by either stepping down a concept hierarchy for a dimension or adding additional
dimensions.

Figure shows a drill-down operation performed on the dimension time by stepping down a concept
hierarchy which is defined as day, month, quarter, and year. Drill-down appears by descending the time
hierarchy from the level of the quarter to a more detailed level of the month. Because a drill-down adds
more details to the given data, it can also be performed by adding a new dimension to a cube. For example,
a drill-down on the central cubes of the figure can occur by introducing an additional dimension, such as a
customer group.

Downloaded by Mriyank Das (mriyankdas428@gmail.com)


lOMoARcPSD|21669462

SLICE

A slice is a subset of the cubes corresponding to a single value for one or more members of the dimension.
For example, a slice operation is executed when the customer wants a selection on one dimension of a
three-
dimensional cube resulting in a two-dimensional site. So, the Slice operations perform a selection on one
dimension of the given cube, thus resulting in a subcube.

Downloaded by Mriyank Das (mriyankdas428@gmail.com)


lOMoARcPSD|21669462

DICE

Dice selects two or more dimensions from a given cube and provides a new sub-cube. Consider the
following diagram that shows the dice operation. The dice operation on the cube based on the following
selection criteria involves three dimensions.

Downloaded by Mriyank Das (mriyankdas428@gmail.com)


lOMoARcPSD|21669462

PIVOT

Broadly, this term refers to a new view of data available within a Slice of a multidimensional OLAP Cube.
As an example: a financial analyst might want to view or “pivot” data in various ways, such as displaying
all the cities down the page and all the products across a page.

Downloaded by Mriyank Das (mriyankdas428@gmail.com)


lOMoARcPSD|21669462

QUERY IN MS SERVER

1) Student Database

Downloaded by Mriyank Das (mriyankdas428@gmail.com)


lOMoARcPSD|21669462

2) Cube

Downloaded by Mriyank Das (mriyankdas428@gmail.com)


lOMoARcPSD|21669462

3) Roll Up

Downloaded by Mriyank Das (mriyankdas428@gmail.com)


lOMoARcPSD|21669462

4) Cube and Rollup

P R PS Total

Downloaded by Mriyank Das (mriyankdas428@gmail.com)


lOMoARcPSD|21669462

PRACTICAL 3

AIM: Create calculated members using arithmetic operators and member property of
dimension members.

Follow the Steps to create new calculated measure:

1. Open Cube Designer for the Analysis Services Tutorial cube, and then click the
Calculations tab.

2.On the toolbar of the Calculations tab, click New Calculated Member.
A new form appears in the Calculation Expressions pane within which you define the
properties of this new calculated member.

The new member also appears in the Script Organizer pane.

Downloaded by Mriyank Das (mriyankdas428@gmail.com)


lOMoARcPSD|21669462

3. In the Name box, change the name of the calculated measure to [Parikshit].

Parikshit

[Parikshit]

Downloaded by Mriyank Das (mriyankdas428@gmail.com)


lOMoARcPSD|21669462

4. In the Expression box, type a plus sign ﴾+﴿ after [Measures].


[Data Source Count]. In the Format string list, select " Currency "
.In the Non‐empty behavior list, select the checkboxes for Fact Cache Policy Count-
Configuration Info Count, and then click OK.

5.On the toolbar of the Calculations tab, click Script View, and then review the
calculation script in the Calculation Expressions pane.

Downloaded by Mriyank Das (mriyankdas428@gmail.com)


lOMoARcPSD|21669462

Downloaded by Mriyank Das (mriyankdas428@gmail.com)


lOMoARcPSD|21669462

Downloaded by Mriyank Das (mriyankdas428@gmail.com)


lOMoARcPSD|21669462

Downloaded by Mriyank Das (mriyankdas428@gmail.com)


lOMoARcPSD|21669462

P R PS Total

Downloaded by Mriyank Das (mriyankdas428@gmail.com)


lOMoARcPSD|21669462

PRACTICAL 4

Aim: Design and Create cube by identifying measures and dimensions for Design storage
using storage mode MOLAP, ROLAP and HOLAP.
Description:
Relational OLAP(ROLAP): ROLAP is an extended RDBMS along with
multidimensional data mapping to perform the standard relational operation.

Multidimensional OLAP (MOLAP) : MOLAP Implemented operation in


multidimensional data.

Hybrid Online Analytical Processing (HOLAP) : In HOLAP approach the aggregated


totals are stored in a multidimensional database while the detailed data is stored in the
relational database. This offers both data efficiency of the ROLAP model and the
performance of the MOLAP model.

Advantages of OLAP

● OLAP is a platform for all types of business including planning, budgeting,


reporting, and analysis.
● Information and calculations are consistent in an OLAP cube. This is a crucial benefit.
● Quickly create and analyze "What if" scenarios
● Easily search OLAP database for broad or specific terms.
● OLAP provides the building blocks for business modeling tools, Data mining tools,
performance reporting tools.
● Allows users to slice and dice cube data all by various dimensions, measures, and
filters.
● It is good for analyzing time series.
● Finding some clusters and outliers is easy with OLAP.
● It is a powerful visualization online analytical process system which provides faster
response times

Downloaded by Mriyank Das (mriyankdas428@gmail.com)


lOMoARcPSD|21669462

Cube View:

Downloaded by Mriyank Das (mriyankdas428@gmail.com)


lOMoARcPSD|21669462

STORAGE SETTINGS :

1. MOLAP : By default MOLAP uses array-based multidimensional storage engines to


display multidimensional views of data. Basically, they use an OLAP cube.

Downloaded by Mriyank Das (mriyankdas428@gmail.com)


lOMoARcPSD|21669462

2. ROLAP : It works with data that exists in a relational database. Facts and dimension tables
are stored as relational tables.It also allows multidimensional analysis of data and is the fastest
growing OLAP.

Downloaded by Mriyank Das (mriyankdas428@gmail.com)


lOMoARcPSD|21669462

3.HOLAP : Hybrid OLAP is a mixture of both ROLAP and MOLAP. It offers fast
computation of MOLAP and higher scalability of ROLAP.Aggregated or computed data is
stored in a multidimensional OLAP cube and Detailed information is stored in a relational
database.

Downloaded by Mriyank Das (mriyankdas428@gmail.com)


lOMoARcPSD|21669462

Custom Settings :

Downloaded by Mriyank Das (mriyankdas428@gmail.com)


lOMoARcPSD|21669462

P R PS Total

Downloaded by Mriyank Das (mriyankdas428@gmail.com)


lOMoARcPSD|21669462

PRACTICAL 5

Aim: Perform Pre-processing on a dataset. Apply various Filters and discuss the
effect of each filter applied.

Description :
1. WEKA : Waikato Environment for Knowledge analysis fully implemented in
JAVA we will be working and exploring on explorer.
2. Data preprocessing is a data mining technique that involves
transforming raw data into an understandable format.
3. Real-world data is often incomplete, inconsistent, lacking in certain
behaviors or trends, and is likely to contain many errors.
4. Data preprocessing is an important step in the data mining process.
5. Data pre-processing includes cleaning, normalization, transformation, feature
extraction and selection, etc.
6. The product of data pre-processing is the final training

Prerequisite :

1. Weka Tool Installation


2. Data set

Downloaded by Mriyank Das (mriyankdas428@gmail.com)


lOMoARcPSD|21669462

 STEPS :

1. Home page of weka

2. Click on Explorer and find the tab of Preprocess.

Downloaded by Mriyank Das (mriyankdas428@gmail.com)


lOMoARcPSD|21669462

3.Go to Open file ->C Drive ->Program Files ->Weka 3-8-4 -> data and
choose any data set given for this practical I have chosen iris.arff data set

4. After selecting database :

Downloaded by Mriyank Das (mriyankdas428@gmail.com)


lOMoARcPSD|21669462

5. Select the filters and start exploring each of them and observe the changes .

→Filters: -

Downloaded by Mriyank Das (mriyankdas428@gmail.com)


lOMoARcPSD|21669462

1. ADD

Downloaded by Mriyank Das (mriyankdas428@gmail.com)


lOMoARcPSD|21669462

2. NORMALIZE

Downloaded by Mriyank Das (mriyankdas428@gmail.com)


lOMoARcPSD|21669462

3. REMOVE

Downloaded by Mriyank Das (mriyankdas428@gmail.com)


lOMoARcPSD|21669462

4. REMOVERANGE

Downloaded by Mriyank Das (mriyankdas428@gmail.com)


lOMoARcPSD|21669462

5. REPLACEWITHMISSINGVALUE

Downloaded by Mriyank Das (mriyankdas428@gmail.com)


lOMoARcPSD|21669462

6. REMOVEDFREQUENTVALUES

Downloaded by Mriyank Das (mriyankdas428@gmail.com)


lOMoARcPSD|21669462

7. REMOVEPERCENTAGE

Downloaded by Mriyank Das (mriyankdas428@gmail.com)


lOMoARcPSD|21669462

8. SUBSETBYEXPRESSION

Downloaded by Mriyank Das (mriyankdas428@gmail.com)


lOMoARcPSD|21669462

9. RESAMPLE

Downloaded by Mriyank Das (mriyankdas428@gmail.com)


lOMoARcPSD|21669462

10. ADDID

Downloaded by Mriyank Das (mriyankdas428@gmail.com)


lOMoARcPSD|21669462

11. DISCRETIZE

Downloaded by Mriyank Das (mriyankdas428@gmail.com)


lOMoARcPSD|21669462

12. MERGENOMINALVALUES

P R PS Total

Downloaded by Mriyank Das (mriyankdas428@gmail.com)


lOMoARcPSD|21669462

Practical 6

Aim : Perform different binning methods for smooth out noise.

● Data binning, bucketing is a data pre-processing method used to minimize


the effects of small observation errors.

● The original data values are divided into small intervals known as bins and
then they are replaced by a general value calculated for that bin.

● This has a smoothing effect on the input data and may also reduce the
chances of overfitting in case of small datasets

● There are 2 methods of dividing data into bins ”


1. Equal Frequency Binning : bins have equal frequency.

2. Equal Width Binning : bins have equal width with a range of each bin are
defined as [min + w], [min + 2w] …. [min + nw] where w = (max – min) /
(no of bins).

Source code :

#equal frequency
def equifreq(arr1, m):

a = len(arr1)
n = int(a / m)
for i in range(0, m):
arr = []
for j in range(i * n, (i + 1) * n):
if j >= a:
break
arr = arr + [arr1[j]]

Downloaded by Mriyank Das (mriyankdas428@gmail.com)


lOMoARcPSD|21669462

print(arr)

#equal width
def equiwidth(arr1, m):
a = len(arr1)
w = int((max(arr1) - min(arr1)) / m)
min1 = min(arr1)
arr = []
for i in range(0, m + 1):
arr = arr + [min1 + w * i]
arri=[]

for i in range(0, m):


temp = []
for j in arr1:
if j >= arr[i] and j <= arr[i+1]:
temp += [j]
arri += [temp]
print(arri)

#data to be binned
data = [5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215]
#no of bins
m=3

print("equal frequency binning")


equifreq(data, m)

print("\n\nequal width binning")


equiwidth(data, 3)

Downloaded by Mriyank Das (mriyankdas428@gmail.com)


lOMoARcPSD|21669462

OUTPUT :

P R PS Total

Downloaded by Mriyank Das (mriyankdas428@gmail.com)


lOMoARcPSD|21669462

Practical 7
Aim: Perform Association Rules Mining using weka tool.
Theory : -
It is an important data mining model studied extensively by the database
and data mining community.
Assume
all data are categorical.
No good
algorithm for numeric data.
Initially
used for Market Basket Analysis to find how items purchased by customers
are
related.
Support count: The support count of an itemset X, denoted by X.count, in a
data set T is
the number of transactions in T that contain X. Assume T
has n transactions.
Then,

Downloaded by Mriyank Das (mriyankdas428@gmail.com)


lOMoARcPSD|21669462

Process: -

1. Choose a Random dataset:

2. Select random attributes from the given database.

Downloaded by Mriyank Das (mriyankdas428@gmail.com)


lOMoARcPSD|21669462

3. Go to associate section and by default Apriori assosciator will be


there so just click on start :

P R PS Total

Downloaded by Mriyank Das (mriyankdas428@gmail.com)


lOMoARcPSD|21669462

Practical 8

Aim: Perform classification with WEKA tool.

Theory : -

Classification : -
Classification is a data mining function that assigns items in a
collection to target categories or classes. The goal of classification is to
accurately predict the target class for each case in the data. For
example, a classification model could be used to identify loan
applicants as low, medium, or high credit risks.

Types of classifiers:
There are different types of classifiers, a classifier is an algorithm that
maps the input data to a specific category. Now, let us take a look at
the different types of classifiers:
1. Perceptron
2. Naive Bayes
3. Decision Tree
4. Logistic Regression
5. K-Nearest Neighbor
6. Artificial Neural Networks/Deep Learning
7. Support Vector Machine

Downloaded by Mriyank Das (mriyankdas428@gmail.com)


lOMoARcPSD|21669462

Steps: -
1. Open weka tool:

2. Select explorer option and choose the dataset from the pre-upload
data :

Downloaded by Mriyank Das (mriyankdas428@gmail.com)


lOMoARcPSD|21669462

3. Go to classify tab → choose j48 and then start :

4. Visualization the j48 with decision tree:

Downloaded by Mriyank Das (mriyankdas428@gmail.com)


lOMoARcPSD|21669462

5. Go to classify tab → choose NaiveBayes and then start :

Downloaded by Mriyank Das (mriyankdas428@gmail.com)


lOMoARcPSD|21669462

6. Visualizing NaiveBayes

P R PS Total

Downloaded by Mriyank Das (mriyankdas428@gmail.com)


lOMoARcPSD|21669462

Downloaded by Mriyank Das (mriyankdas428@gmail.com)

You might also like