
INTRODUCTION TO DATA SCIENCE
BY MICROSOFT

WEBSITE : https://www.edx.org/
DURATION: 6 Weeks
AMOUNT PAID: Yes

Name: Pamposh Gurkha


Class : CSE-B
Registration No: RA1611003011034
CERTIFICATE
MODULE 1: EXPLORING DATA IN MICROSOFT EXCEL ONLINE

• Viewing a Table of Data in Excel
• Open the Workbook in Excel Online
• Filter and Sort the Data
• Formulae to Explore Data in Excel
• Conditional Formatting to Explore Data
Viewing a Table of Data in Excel
In this exercise, you upload the Excel workbook containing the data to the OneDrive cloud storage account
associated with your Microsoft account, and then explore the data in Microsoft Excel Online.
Filter and Sort the Data
• Select cell A1, and then on the Insert tab of the ribbon above the worksheet, click Table. Verify that Excel has
automatically detected the data in the range A1:G366, and that the My table has headers checkbox is selected,
and then click OK.
• Click any cell to deselect the table, and then click the drop-down button for the Day column, and click Filter.
• In the Filter dialog box, clear the (Select All) checkbox, and then select only the Saturday and Sunday
checkboxes as shown here before clicking OK. The table of data is filtered to show only the records for
weekend days (Saturday and Sunday).
• Click the drop-down arrow for the Rainfall column and click Sort Descending. The table of data is sorted in
descending order of rainfall, so the first row contains the data for the weekend day with the most rain. This was
a Sunday on which there was 2.50 cm of rain.
• Click the drop-down arrow for the Day column again and then click Clear Filter from ‘Day’. The table now
shows all the data.
• Click the drop-down arrow for Date and click Sort Ascending to re-order the data into chronological order.
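The filter and sort steps above can also be sketched in code. The following is only an illustration in Python: the sample rows, their dates, and the variable names are assumptions made for the example, not the contents of the course workbook.

```python
from datetime import date

# Hypothetical sample rows; the real worksheet has one row per day.
rows = [
    {"Date": date(2017, 7, 1), "Day": "Saturday", "Rainfall": 0.47},
    {"Date": date(2017, 7, 2), "Day": "Sunday", "Rainfall": 2.50},
    {"Date": date(2017, 7, 3), "Day": "Monday", "Rainfall": 1.00},
    {"Date": date(2017, 7, 8), "Day": "Saturday", "Rainfall": 0.83},
]

# Filter: keep only weekend days, like selecting Saturday and Sunday in the Filter dialog.
weekend = [r for r in rows if r["Day"] in ("Saturday", "Sunday")]

# Sort Descending on Rainfall: the wettest weekend day comes first.
wettest_first = sorted(weekend, key=lambda r: r["Rainfall"], reverse=True)

# Clear the filter, then Sort Ascending on Date to restore chronological order.
chronological = sorted(rows, key=lambda r: r["Date"])

print(wettest_first[0]["Day"], wettest_first[0]["Rainfall"])
```

Filtering and sorting here build new lists and leave `rows` untouched, much as the Excel filter hides rows without deleting them.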
Using Formulae to Explore Data in Excel
• Click the B column header to select the entire B column. Then on the Home tab of the ribbon, in the Insert
drop-down menu, click Insert Sheet Columns. This inserts a new Column1 column between the Date and Day
columns.
• In cell B1, rename Column1 to Month. Then in cell B2, enter the following formula: =TEXT(A2, "mmmm").
After you enter the formula, it should be copied automatically to all the other Month cells in the
table, and the name of the month for each record should be displayed.
• In cell I1, enter the text Revenue to add a new Revenue column to the table. Then in cell I2, enter the
following formula: =G2*H2. The formula is again automatically copied to the remaining rows in the table,
and the revenue (calculated as Price multiplied by Sales) is displayed.
• Click the I column header to select the entire column, and then on the Home tab of the ribbon, in the Number
section, in the Accounting Number Format ($) drop-down list, select $ English (United States). This formats
the revenue data as US dollars.
• Select cell I367 (under the Revenue column). Then on the Home tab of the ribbon, in the Editing section, in the
AutoSum (Σ) drop-down menu, click Σ Sum. Enter the following formula: =SUBTOTAL(109,[Revenue]).
• Filter the Month column to show only the records for July, and then look at the subtotal at the bottom of the
Revenue column. It now shows the total revenue for July.
• Clear the filter on Month to show all the data.
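As a rough Python equivalent of the formulas above, the sketch below derives a month name from each date, computes revenue as price times sales, and totals only the rows that pass a month filter. The sample values and the helper name `visible_total` are hypothetical and chosen only for illustration.

```python
from datetime import date

# Hypothetical rows: (date, price, sales). Values are made up for the example.
rows = [
    (date(2017, 6, 30), 0.25, 40),
    (date(2017, 7, 1), 0.50, 43),
    (date(2017, 7, 2), 0.50, 38),
]

table = []
for d, price, sales in rows:
    table.append({
        "Date": d,
        "Month": d.strftime("%B"),  # full month name, like =TEXT(A2, "mmmm"); uses the current locale
        "Revenue": price * sales,   # like =G2*H2
    })

def visible_total(records, month=None):
    """Sum Revenue over the rows that pass the month filter (all rows if month is None)."""
    return sum(r["Revenue"] for r in records if month is None or r["Month"] == month)

print(visible_total(table))          # total over all rows
print(visible_total(table, "July"))  # total over the July rows only
```

Passing a month plays the role of the filter on the Month column: like SUBTOTAL with function number 109, the total covers only the rows that remain visible.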
Using Conditional Formatting to Explore Data

• Select cell D2 and then hold the Shift and Ctrl keys and press the Down-Arrow key to select all the values in the
Temperature column.
• On the Home tab of the ribbon, in the Conditional Formatting drop-down list, point to Color Scales, and select
the Red-White Color Scale (with red at the top and white at the bottom). The Temperature cells are reformatted
so that the hottest days are colored an intense red, and the coolest days are much lighter in color intensity.
Scrolling through the data now, it is easier to find days that are particularly hot or cool.
• Select all the values in the Rainfall column, and then in the Conditional Formatting drop-down list, point to
Data Bars, and select the Light Blue Data Bar gradient fill. The cells are formatted with a visual indication of
the comparative level of rainfall for each day.
• Select all the values in the Sales column, and then in the Conditional Formatting drop-down list, point to
Top/Bottom Rules, and select Top 10%. Then in the Top 10% dialog box, select Green Fill with Dark Green
Text and click OK. The cells containing sales values in the top 10% are highlighted in green (you may need to
scroll to see them).
• Reselect the values in the Sales column if you deselected them, and then in the Conditional Formatting drop-
down list, point to Top/Bottom Rules, and select Bottom 10%. Then in the Bottom 10% dialog box, select Red
Fill with Dark Red Text and click OK. The cells containing sales values in the bottom 10% are highlighted in
red.
MODULE 2: DATA ANALYSIS FUNDAMENTALS

• Aggregating Data
• Grouping and Summarizing Data
• Visualizing Data
• Analyzing Data in Excel Online
Analyzing Data with a PivotTable

• PivotTables are one of Excel's most powerful features. A PivotTable lets us
extract the significance from a large, detailed data set.
• PivotTables are an excellent way to “slice and dice” data, summarizing
numeric measures by one or more dimensions.
• A PivotTable saves a lot of time by letting us quickly summarize large
amounts of data into a meaningful report.
• It lets us reorganize and summarize selected columns and rows of data in
a spreadsheet or database table to obtain a desired report. A PivotTable doesn't
actually change the spreadsheet or database itself.
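The core idea of a PivotTable, summarizing a numeric measure by a dimension, can be sketched in a few lines of Python. The rows and the function name `pivot_sum` below are assumptions made for the example; a real PivotTable offers many more aggregations and layouts.

```python
from collections import defaultdict

# Hypothetical records; the column names mirror the worksheet, the values are made up.
rows = [
    {"Month": "June", "Day": "Friday", "Sales": 40},
    {"Month": "July", "Day": "Saturday", "Sales": 43},
    {"Month": "July", "Day": "Sunday", "Sales": 38},
    {"Month": "June", "Day": "Saturday", "Sales": 35},
]

def pivot_sum(records, dimension, measure):
    """Total the measure for each distinct value of the dimension."""
    totals = defaultdict(int)
    for r in records:
        totals[r[dimension]] += r[measure]
    return dict(totals)

sales_by_month = pivot_sum(rows, "Month", "Sales")
sales_by_day = pivot_sum(rows, "Day", "Sales")
print(sales_by_month)
print(sales_by_day)
```

The same records are summarized twice by different dimensions, and the source list is never modified, which matches the point that a PivotTable does not change the underlying data.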
Visualizing Data with Charts

A chart is a graphical representation of data, in which the data is represented by
symbols such as bars, lines, or slices. A chart can represent tabular numeric data,
functions, or some kinds of qualitative structure, and different chart types convey
different information.
It is often easier to identify trends and relationships in data by creating data
visualizations. Some examples of data charts are:
• Bar Charts
• Histograms
• Pie Charts
• Scatter Plot Charts
MODULE 3: GETTING STARTED WITH STATISTICS

• Using Descriptive Statistics
• Visualize the Distribution of Data
• Working with Samples
• Inferential Statistics and Hypothesis Testing
Using Descriptive Statistics:

Descriptive statistics help you understand the “shape” or distribution of your data; for
example, by finding measures of central tendency (the most common “typical” values) and
measures of variance (how much difference there is between the most common values and
other values that are higher or lower).

EXAMPLES:

MEDIAN(H2:H366): finds the median of cells H2 to H366
MODE.SNGL(H2:H366): finds the mode of cells H2 to H366
AVERAGE(H2:H366): finds the average (mean) of cells H2 to H366
VAR.P(H2:H366): finds the population variance of cells H2 to H366
STDEV.P(H2:H366): finds the population standard deviation of cells H2 to H366
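For comparison, the same five measures can be computed with Python's standard `statistics` module. This is a small sketch; the list `sales` is a hypothetical stand-in for the values in H2:H366.

```python
import statistics

# Hypothetical stand-in for the values in H2:H366.
sales = [10, 12, 12, 15, 21]

summary = {
    "median": statistics.median(sales),       # like MEDIAN
    "mode": statistics.mode(sales),           # like MODE.SNGL
    "mean": statistics.mean(sales),           # like AVERAGE
    "variance": statistics.pvariance(sales),  # population variance, like VAR.P
    "stdev": statistics.pstdev(sales),        # population standard deviation, like STDEV.P
}

for name, value in summary.items():
    print(name, value)
```

`pvariance` and `pstdev` divide by the number of values, which corresponds to the ".P" (population) variants of the Excel functions; `variance` and `stdev` would be the sample versions.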
Visualize the Distribution of data:

Data visualization is a general term that describes any effort to help people
understand the significance of data by placing it in a visual context.
Different types of charts provided by Excel:
• Histograms
• Pie Chart
• Column Chart
• Line Chart
• Bar Chart
• Area Chart
• Scatter Chart
Working with Samples:

Until now, we have worked with the full population of data. From here on, we work
with a sample drawn from the population rather than with the full population.
Steps to create a random sample:
• Add a new column within the spreadsheet and name it Random_number
• In the first cell underneath your heading row, type “= RAND()”
• Press “Enter,” and a random number will appear in the cell
• Copy and paste the first cell into the other cells in this column
• Once each row contains a random number, sort the records by
Random_number column
• Choose the first 500 rows. If the sheet holds 3000 emails, for example, these are a random 500 of the 3000.
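The steps above amount to attaching a random number to every record, sorting on it, and keeping the first rows. A minimal Python sketch of that idea follows; the function name `random_sample`, the `seed` parameter, and the generated addresses are assumptions made for the example.

```python
import random

def random_sample(records, k, seed=None):
    """Attach a random number to each record, sort on it, and keep the first k."""
    rng = random.Random(seed)
    # The random number plays the role of the Random_number column.
    # The index breaks ties so that two records are never compared directly.
    keyed = [(rng.random(), i, rec) for i, rec in enumerate(records)]
    keyed.sort()
    return [rec for _, _, rec in keyed[:k]]

# Hypothetical population of 3000 email addresses.
population = [f"user{i}@example.com" for i in range(3000)]
sample = random_sample(population, 500, seed=42)
print(len(sample), sample[:3])
```

Fixing `seed` makes the sample reproducible. In everyday Python code, `random.sample(population, 500)` gives a random sample without replacement more directly.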
Inferential Statistics and Hypothesis Testing:

Inferential statistics, as the name suggests, are used to make inferences, or
predictions, from data based on statistical relationships between fields (or features)
of the data.
Correlation:
Correlation is a statistical measurement of the strength of an apparent relationship
between two numeric variables. In Excel it can be calculated with the CORREL function,
for example:
CORREL(A2:A366,B2:B366)
Correlation is measured as a value between -1 and 1. A value close to 1 indicates a
positive correlation; in other words, high values for one variable seem to
correspond with high values for the other variable. A value close to -1 on the other
hand indicates a negative correlation, in which high values for one variable
correspond to low values for the other variable. A value close to 0 indicates the
lack of any discernible relationship between the variables.
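To make the three cases concrete, here is a small Python sketch of the Pearson correlation coefficient, which is the measure that CORREL reports. The function name `pearson` and the sample lists are chosen only for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    # Raises ZeroDivisionError if either sequence is constant.
    return cov / sqrt(ss_x * ss_y)

print(pearson([1, 2, 3], [2, 4, 6]))              # high values go with high values
print(pearson([1, 2, 3], [6, 4, 2]))              # high values go with low values
print(pearson([1, 2, 3, 4, 5], [2, 1, 0, 1, 2]))  # no linear relationship
```

The last pair is a reminder that a value near 0 only rules out a linear relationship: the second list clearly depends on the first, just not along a straight line.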
Z-test steps: the z-test is used to perform inference and hypothesis testing on
data

• To select the z-test tool, click the Data tab’s Data Analysis command button.
• When Excel displays the Data Analysis dialog box, select the z-Test: Two
Sample for Means tool and then click OK.
• In the Variable 1 Range and Variable 2 Range text boxes, identify the sample
values by telling Excel in what worksheet ranges you’ve stored the two
samples.
• Use the Hypothesized Mean Difference text box to indicate whether you
hypothesize that the means are equal.
• Use the Variable 1 Variance (Known) and Variable 2 Variance (Known) text
boxes to provide the population variance for the first and second samples.
• In the Alpha text box, state the significance level for your z-test calculation (for example, 0.05 for a 95% confidence level).
• In the Output Options section, indicate where the z-test tool results should be
stored.
• Click OK.
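The calculation behind a two-sample z-test for means can be sketched in Python as follows. This is a simplified illustration under the same inputs the dialog asks for (two samples, two known variances, a hypothesized mean difference, and alpha); the function name, the dictionary keys, and the sample values are assumptions, and the sketch does not reproduce the tool's full output table.

```python
from math import sqrt
from statistics import NormalDist, mean

def two_sample_z_test(sample1, sample2, var1, var2, hypothesized_diff=0.0, alpha=0.05):
    """Two-sample z-test for means with known population variances."""
    # Standard error of the difference between the two sample means.
    std_error = sqrt(var1 / len(sample1) + var2 / len(sample2))
    z = (mean(sample1) - mean(sample2) - hypothesized_diff) / std_error
    standard_normal = NormalDist()
    p_two_tail = 2 * (1 - standard_normal.cdf(abs(z)))
    z_critical = standard_normal.inv_cdf(1 - alpha / 2)
    return {
        "z": z,
        "p_two_tail": p_two_tail,
        "z_critical_two_tail": z_critical,
        "reject_null": abs(z) > z_critical,
    }

# Hypothetical samples with an assumed known variance of 4 for each population.
result = two_sample_z_test([10, 12, 14], [20, 22, 24], var1=4, var2=4)
print(result)
```

With a hypothesized difference of 0, the null hypothesis is that the two population means are equal; it is rejected when the absolute z value exceeds the two-tail critical value for the chosen alpha.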
MODULE 4: INTRODUCTION TO MACHINE LEARNING

• Machine Learning Model
• Regression Model
• Classification Model
• Clustering Model
Creating a Machine Learning Model
• Machine Learning is a term used to describe the development of predictive
models based on historic data.
• In this exercise, we create an experiment and explore the data by visualizing it in
different formats.
• Data can be explored by using Jupyter Notebooks. They provide an
interactive browser-based environment in which you can add notes and run
code to manipulate and visualize data.
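As an illustration of a predictive model built from historic data, the sketch below fits a straight line by ordinary least squares and uses it to predict a new value. It is not the method used in the course experiment; the function names and the temperature and sales figures are hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    """Predict a numeric label for feature value x."""
    slope, intercept = model
    return slope * x + intercept

# Hypothetical historic data: daily temperature (feature) and sales (label).
temperatures = [20, 25, 30, 35]
sales = [30, 40, 50, 60]

model = fit_line(temperatures, sales)
print(model)
print(predict(model, 40))
```

Training here means estimating the slope and intercept from past observations; prediction applies them to a temperature that was not in the training data.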
Publishing and Using a Machine Learning Model
• In this exercise, a predictive experiment is created that encapsulates your model and the
data preparation steps you have defined, and which defines the input and
output interfaces through which features are passed into the model and
predicted labels are returned.
• The regression model is then deployed as a web service and consumed.
Training a Classification Model
• Classification is another kind of supervised learning in which instead
of predicting a numeric value, the model is trained to predict the
category or class of an observation.
• In this exercise, an experiment is copied from the Gallery to the Workspace.
• A classification model is then trained and the results are viewed.
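To show what predicting a category looks like in code, here is a deliberately simple nearest-centroid classifier in Python. It is a stand-in chosen for brevity, not the algorithm used in the Gallery experiment; the single temperature feature and the "low"/"high" labels are hypothetical.

```python
def train_centroids(features, labels):
    """Compute the mean feature value (centroid) of each class."""
    sums = {}
    counts = {}
    for x, label in zip(features, labels):
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def classify(centroids, x):
    """Predict the class whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))

# Hypothetical labeled observations: temperature and a known sales category.
temperatures = [10, 12, 14, 30, 32, 34]
categories = ["low", "low", "low", "high", "high", "high"]

centroids = train_centroids(temperatures, categories)
print(centroids)
print(classify(centroids, 15), classify(centroids, 29))
```

The known labels are what make this supervised learning: the model is trained on observations whose class is already known, and the output is a class name rather than a number.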

Training a Clustering Model


• Clustering is an example of unsupervised learning; in other words,
training a predictive model with no known labels.
• In this exercise, an experiment is copied from the Gallery to the Workspace.
• A clustering model is then trained and the results are viewed.
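A minimal k-means sketch in Python shows how observations can be grouped without labels. It works on a single numeric feature and takes the starting centers as an argument; the function name, the values, and the starting centers are assumptions for the example, and this is not necessarily the algorithm used in the Gallery experiment.

```python
def k_means_1d(values, centers, iterations=10):
    """Cluster numbers by repeatedly assigning each to its nearest center."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster.
        # An empty cluster keeps its previous center.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical unlabeled observations and two assumed starting centers.
values = [1, 2, 3, 10, 11, 12]
centers, clusters = k_means_1d(values, centers=[1, 12])
print(centers)
print(clusters)
```

No labels are supplied at any point; the two groups emerge only from how close the values are to each other, which is what distinguishes clustering from classification.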
