
PRACTICAL NO: 1

AIM: Import legacy data from different sources such as Excel,
SQL Server, Oracle, etc., and load it into the target system.

STEP 1: Open Excel, go to the Data tab and click on “From Other
Sources”. From the options that appear, select “From SQL Server”.
A connection wizard window will then open.

STEP 2: Now, in the server name field, type or paste the server name
and click on Next.
STEP 3: In this window, select the database that contains your tables.
I am selecting rohit_TYITPRACT1_901.

STEP 4: After selecting the database, select the tables whose data you
need to import.
STEP 5: A wizard will appear so you can verify that the selected data
is the data you want to import. Click on Finish.
STEP 6: In this window, select the format in which you want the data.
In our practical we need it in table format, so select the Table option
from the four available options. Click on “OK”.

NOTE: Perform the same steps for all the tables that we are going to import and use.

The data has been imported and is ready to use.


PRACTICAL NO: 2

AIM: Data Transformation and Visualization in SQL Server.

Create a Destination Database first.


Create a new project in SSDT.

In the designer interface, drag and drop a Data Flow Task.

Click New -> New to create a connection manager.


Give the server name and also select or enter the database name below.
Preview the database before clicking OK.

DESTINATION: Follow the same steps for the destination, but give the
destination database instead.
Click on New to add a new table that represents the destination.
Similarly, perform the above steps for all the tables: Employee, Manager, Product, SalesFact and Store.
Destination Database: Check the database for the output; you'll see that the
respective tables have been created in the destination database.
PRACTICAL NO: 03

AIM: Perform the Extraction, Transformation and Loading (ETL) process to construct a database in SQL Server.

Step1: Create a .csv file that contains data of different hospitals.

Step2: Create a destination database in Microsoft SQL Server Management Studio.
Step3: Open SSDT and create a new project with your RollNo. Now
we need to create a Data Flow Task: drag and drop the Data Flow
Task.
Step4: We have a .csv file which we need as the source. Go to the
Data Flow tab by double-clicking the Data Flow Task on the Control Flow
tab. Then go to Other Sources, click on the Flat File Source option and
drag and drop it.
Step5: Create a Derived Column by dragging and dropping it from the
Common tab.

After that, double-click on the Derived Column to open the editor. Give the
column name that you want to see changed in the destination table and also give
it the respective function as the expression (for example, something like
UPPER(name) to convert a column to uppercase; the exact expression depends
on your data).
Step6: For the destination, we need to take the ADO NET Destination from the
destinations section of the toolbox.

Now we need to give the destination a connection manager. Double-click on the
ADO NET Destination to open the editor, then click on New.
Click on New again.
Give the Server name and select or enter the respective Database.

Once you've given the server name and database name, click on OK and you'll see
the connection is now visible.
Create a new destination table by clicking on New, then preview it. When you
are done, click on OK.

Be sure to map both the source and the destination tables.


OUTPUT:
The following is the output of the practical, where you can see that the
data from the .csv file has been transferred into the database in the form
of a table.
PRACTICAL NO: 04

AIM: Creating a cube in SQL Server.

Step1: First open SQL Server Management Studio and check for the
database created in the first practical. We will use the SalesFact table for this.

Step2: Now open SSDT and create a new project using the Analysis
Services Multidimensional and Data Mining Project template.
Step3: Right-click on Data Sources and click on New Data Source to
add a new data source. Delete all the existing data sources for
convenience. To create a new data source, click on New.

Step4: Give the appropriate Server name and select the respective
Database in which the SalesFact table is present; also test the connection.
Here you can see your database present in the data source.

Step5: Right-click on Data Source View to open its wizard. Select the
database and click Next.
Click on Next, keeping the default values selected.

Select the SalesFact table and click the ‘>’ button to add it to the included
objects.
To bring in the tables related to SalesFact, select SalesFact on the right
side and click on Add Related Tables.

As you can see, the related tables are added. Click on Next, then click on Finish.

This is the output that you'll see, but it isn't the final output; to make
the cube we go to the next step.
Step6: Right-click on Cubes and select New Cube to create a cube. Keep
the default options selected and click Next.

Click on the Suggest button to show the Measure Group Tables automatically.
Here the tables are selected automatically and shown. Click on Next.

Click on Next.
Click on Next.

Click on Finish to get the final output.


Here you can see that the tables' colors have changed, showing that a cube
has been generated. This is the final output that we were expecting.

PRACTICAL NO: 5

A. AIM: Import the data warehouse data into Microsoft Excel and create a pivot
table and pivot chart.
B. AIM: Import the cube into Microsoft Excel and create a pivot table and chart to
perform data analysis.
PRACTICAL NO: 5(A)
Step1: Since my database was not working, I imported an Excel file and opened it
in a pivot table.
Select the whole table, then insert a Pivot Chart and select the chart type you need.

PRACTICAL NO: 5(B)

Step1: Import the data tables one by one and paste each as a picture (three data tables in total).
Step2: Select the first picture, right-click and click on Format Picture.
Then select 3-D Rotation, apply the rotations and use the preset that
makes a cube. Do this step for all three pictures.
PRACTICAL NO: 06

AIM: Perform What-If Analysis.

Step1: Create a data entry for a book store having 100 books in storage. You sell a
certain % of the books at the highest price of $50 and the rest at the lowest price of $20.
Consider selling 60% of the books at the highest price and calculate the total
amount as required. Similarly do this for the other scenarios, such as 70%, 80%, etc.
(a quick arithmetic check follows below).
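
As a sanity check on the scenario values, here is a small sketch in R, assuming the 100 books and the $50/$20 prices stated above (the cell layout in your sheet may differ):

# Total revenue when a given percentage of the 100 books sells at
# the high price ($50) and the remainder at the low price ($20).
total_revenue <- function(pct_high) {
  high_books <- 100 * pct_high / 100   # books sold at $50
  low_books  <- 100 - high_books       # books sold at $20
  high_books * 50 + low_books * 20
}

# The scenarios used in the practical: 60%, 70% and 80%.
sapply(c(60, 70, 80), total_revenue)   # 3800 4100 4400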

Step2: Now, to create a scenario, we need to add the data to the Scenario Manager.
Click on the What-If Analysis button, then click on Scenario Manager and the
interface will be visible.
Step3: Now add the first scenario of 60%; give the changing cell as the cell with
the total value that you got in the data, and then press OK.

Step4: Now set the values of the changing cells, then hit OK.

Step5: In this step, select the result cells that should show the result, then click
on OK.
Step6: Do this for all the tables or values that you want in your
scenario.

Step7: In the final step you will see the Scenario Summary as shown below.
PRACTICAL NO: 7

AIM: Perform data classification using a classification algorithm.

Step1: Feed some input data.

Step2: Create a .arff file for Weka using Notepad.

Step3: Data schema

Weather(temperature, outlook, humidity, cloudy, play)
To create a table we use @relation -- example: @relation weather
To create a column we use @attribute -- example: @attribute temperature
To create options for a column we use curly brackets {}. These are
case sensitive.
To add data, i.e. to create a row, we use @data. To terminate a row
we just press the Enter key.
Finally, save the file with the .arff extension. A sample file built from
this schema is shown below.
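
For reference, here is what a minimal Weather.arff following this schema could look like. The attribute types and the data rows are assumptions for illustration; your actual file may differ.

@relation weather

@attribute temperature numeric
@attribute outlook {sunny, overcast, rainy}
@attribute humidity numeric
@attribute cloudy {yes, no}
@attribute play {yes, no}

@data
85,sunny,85,no,no
83,overcast,86,yes,yes
70,rainy,96,yes,yes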
Step4: Open Weka and feed the Weather.arff file to it. You can
preprocess and visualize the data here.
Step5: Go to the Classify tab and choose the classifier you want from
the Rules section under Classifier. Choose the number of folds for
cross-validation. Select the column that you want to classify and click
on Start.
Step6: Visualize the results in different ways by right-clicking on any result set.
The available visualizations include: Margin Curve, Threshold Curve, Cost Curve,
Cost/Benefit Analysis and Tree Visualizer.
Practical No: 8

Clustering
Step1: Use the previous .arff file, go to the Cluster tab and select a
clusterer. Here we select SimpleKMeans.

Step2: Click on Start to run the clusterer. You'll get the result
as follows. You can also select a percentage split; I have kept the default.
Step3: Right-click on a result and select Visualize cluster assignments.

Step4: The following is the output for the cluster. Here you can
adjust the jitter slider according to your need.
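
As a side note, the same k-means idea can be sketched with base R's kmeans() function. This is not Weka's SimpleKMeans, and the data below is assumed toy data, purely for illustration:

# Toy weather-like data, assumed for illustration only.
set.seed(42)                      # make the result reproducible
temperature <- c(85, 80, 83, 70, 68, 65, 64, 72, 69, 75)
humidity    <- c(85, 90, 86, 96, 80, 70, 65, 95, 70, 80)
weather <- data.frame(temperature, humidity)

# Ask for 2 clusters, matching SimpleKMeans' default in Weka.
clusters <- kmeans(weather, centers = 2)
print(clusters$centers)           # cluster centroids
print(clusters$cluster)           # cluster assignment for each row

# Plot the points colored by cluster, analogous to Weka's
# "Visualize cluster assignments" view.
plot(weather, col = clusters$cluster, pch = 19)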
Practical 9
Aim: Prediction Using Linear Regression
In linear regression, two variables are related through an equation
where the exponent (power) of both variables is 1. Mathematically, a
linear relationship represents a straight line when plotted as a graph. A
non-linear relationship, where the exponent of any variable is not equal to
1, creates a curve.

y = ax + b is the equation for linear regression, where y is the response
variable, x is the predictor variable, and a and b are constants called the
coefficients.
A simple example of regression is predicting the weight of a person when his
height is known. To do this we need the relationship between the
height and weight of a person.
The steps to create the relationship are:
• Carry out the experiment of gathering a sample of observed values of
height and corresponding weight.
• Create a relationship model using the lm() function in R.
• Find the coefficients from the model created and create the
mathematical equation using these.
• Get a summary of the relationship model to know the average error in
prediction, also called residuals.
The basic syntax for the lm() function in linear regression is:
lm(formula, data)
Following is the description of the parameters used:
• formula is a symbol presenting the relation between x and y.
• data is the vector on which the formula will be applied.
To predict the weight of new persons, use the predict() function in R.

The remaining steps are to get the summary of the relationship, predict
the weight of new persons, and visualize the regression graphically; an
end-to-end sketch of these steps follows.
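
A minimal end-to-end sketch of these steps in R. The height and weight values below are sample data assumed for illustration; the values used in the actual practical may differ.

# Sample observed values of height (cm) and weight (kg),
# assumed here for illustration.
height <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
weight <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)

# Create the relationship model using lm().
relation <- lm(weight ~ height)

# Get the summary of the relationship (coefficients and residuals).
print(summary(relation))

# Predict the weight of a new person who is 170 cm tall.
new.person <- data.frame(height = 170)
print(predict(relation, new.person))

# Visualize the regression graphically and save it to a file.
png(file = "linearregression.png")
plot(height, weight, main = "Height vs Weight Regression")
abline(relation)
dev.off()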


Practical 10
Aim: Data Analysis using Time Series Analysis

A time series is a series of data points in which each data point is
associated with a timestamp. A simple example is the price of a stock in
the stock market at different points of time on a given day. Another
example is the amount of rainfall in a region in different months of the
year. The R language uses many functions to create, manipulate and plot
time series data. The data for a time series is stored in an R object
called a time-series object. It is also an R data object, like a vector or
data frame.
The time series object is created by using the ts() function.
Syntax
The basic syntax of the ts() function in time series analysis is:
timeseries.object.name <- ts(data, start, end, frequency)
Following is the description of the parameters used:
• data is a vector or matrix containing the values used in the time series.
• start specifies the start time for the first observation in the time series.
• end specifies the end time for the last observation in the time series.
• frequency specifies the number of observations per unit time.
Except for the "data" parameter, all other parameters are optional.
Example
Consider the annual rainfall details at a place starting from January 2012.
We create an R time series object for a period of 12 months and plot it.

# Get the data points in the form of an R vector.
rainfall <- c(799, 1174.8, 865.1, 1334.6, 635.4, 918.5, 685.5, 998.6, 784.2, 985, 882.8, 1071)

# Convert it to a time series object.
rainfall.timeseries <- ts(rainfall, start = c(2012, 1), frequency = 12)

# Print the time series data.
print(rainfall.timeseries)

# Give the chart file a name.
png(file = "rainfall.png")

# Plot a graph of the time series.
plot(rainfall.timeseries)

# Save the file.
dev.off()
