You are on page 1of 43

ISOM 2600 Business Analytics

TOPIC 1: LIST, ARRAY AND


GRAPHING
X U H U WA N
HKUST
OCT 18, 2022

Tuesday, October 18, 2022 XUHU WAN, HKUST 1


1. Business analytics itself is a combination
of statistics, and computer science and
domain knowledge.
2. To gain insights of statistical models or
theory, it is necessary to implement and
compare different statistical models on
the real data.
Learning 3. In getting a broader perspective, we
Business should not only know how to implement
the models but understand how they
Analytics connect and are related to the deeper
with Real logic behind them.
Data 4. Python is a great tool to get the data,
manage the data, build models. Python is
good for deep customization of statistical
algorithm, application of machine
learning models, and application of deep
learning models. R is good for statistical
models.

Tuesday, October 18,


2022 XUHU WAN, HKUST 2
Data Analysis is a process of cleaning,
transforming, visualizing and modeling data
with the goal of discovering useful pattern.
The process of business analytics some time is
bottom-up: Find qualitative pattern of data in
business practice first and then explore data
How to do to build models and check if this pattern is
consistent and check the bias and variance of
data this pattern.
analysis 1. Transform Raw Data in a Desired Format
2. Clean the data and make changes on data
3. Prepare a model and check whether the
model is over/under fitting
4. Use the model to make business decision.

Tuesday, October 18,


2022 XUHU WAN, HKUST 3
Contents
1. Why Python ?
2. How to use python
3. How to use packages
4. Review of basic data types
5. List and List comprehension
6. Numpy and scipy
7. Matplotlib and Statistical Graphing

Tuesday, October 18, 2022 XUHU WAN, HKUST 4


Tuesday, October 18, 2022 XUHU WAN, HKUST 5
Which Programming Language?

Tuesday, October 18, 2022 XUHU WAN, HKUST 6


Tuesday, October 18, 2022 XUHU WAN, HKUST 7
Jupyter Notebook/Lab
• Instead of running python or ipython in the command or editor (pycharm,
spyder), Jupyter notebook/jupyter lab are more useful data analysis.
• This will open another Python interface in a web browser. it does not actually
need any Internet connection to run.

Fire lab/notebook in navigator

Tuesday, October 18, 2022 XUHU WAN, HKUST 8


Google Colab- Jupyter notebook in the cloud
Colab, or "Colaboratory", allows you to write and execute Python in your
browser, with
1. Zero configuration required
2. Access to GPUs free of charge
3. Easy sharing: share the most updated lecture notes instantly, share
your in-class practice.
Whether you're a student, a data scientist or an AI researcher, Colab can
make your work easier.

Tuesday, October 18, 2022 XUHU WAN, HKUST 9


Tuesday, October 18, 2022 XUHU WAN, HKUST 10
Install Packages
If you use Anaconda it includes many packages , such as numpy, scipy, pandas, etc.
However, we still need to install new packages that do not come with the installed
Anaconda, such as plotly. You only install them once with the following lines in noteboo
and can delete(or comment) it after installation.

If you run your notebook using local laptop, only need to install new package once.
However, you need to install the package every time you start the notebook if you
use colab.

Tuesday, October 18, 2022 XUHU WAN, HKUST 11


Importing Libraries
Libraries provide additional functionality in an organized and packaged way.
Basic python includes many functionality. But there are many methods and
attributes we need to import from external library. There libraries are still
python based but provide tools for many application.

- Numpy : array and matrix, random number generators

- Scipy: Numerical routines for optimization, linear algebra and statistics

- Matplotlib: a comprehensive library mainly for creating static visualization.

Tuesday, October 18, 2022 XUHU WAN, HKUST 12


Libraries we will use
NumPy
NumPy is a package for scientific computing.
We mainly use it to generate random variables

Scipy
SciPy provides algorithms for optimization,
integration, interpolation, eigenvalue
problems, algebraic equations, differential
equations, statistics and many other classes of
problems. We mainly use it to computer
cumulative distribution function.

Tuesday, October 18, 2022 XUHU WAN, HKUST 13


Libraries we will use
Matplotlib
Matplotlib is a comprehensive library for
creating static, animated, and interactive
visualizations in Python.

Pandas
Pandas offers data structures and operations
for manipulating numerical tables and time
series. This library is our main focus

Tuesday, October 18, 2022 XUHU WAN, HKUST 14


Tuesday, October 18, 2022 XUHU WAN, HKUST 15
Basic Data Types in Python

DATA
CODE -Data attributes
-Data Methods

Tuesday, October 18, 2022 XUHU WAN, HKUST 16


Bool
A Boolean value is either true or false. It is named after the British
mathematician, George Boole— some rules for reasoning about and
combining these values. This is the basis of all modern computer logic.

In Python, the two Boolean values are True and False (the capitalization must
be exactly as shown), and the Python type is bool. True is counted as 1 and
Falses is counted as 0 .

Tuesday, October 18, 2022 XUHU WAN, HKUST 17


String

- Index - Concatenate

Tuesday, October 18, 2022 XUHU WAN, HKUST 18


Practice:
1) Input Your last name AAA
2) Print Mr/Ms AAA

Tuesday, October 18, 2022 XUHU WAN, HKUST 19


Tuesday, October 18, 2022 XUHU WAN, HKUST 20
List: A universal data container

Lists are used to store


heterogeneous data, and are
created with a pair of square
brackets, [].
Lists are one of 4 built-in data
types in Python used to store
collections of data, (Tuple, Set,
and Dictionary)

Tuesday, October 18, 2022 XUHU WAN, HKUST 21


List Comprehension
List comprehension is often used when doing data preprocessing.
There are two different kinds of comprehension

 Plain comprehension

 Conditional comprehension

 Filtered comprehension

Tuesday, October 18, 2022 XUHU WAN, HKUST 22


Tuesday, October 18, 2022 XUHU WAN, HKUST 23
Numpy ndarray
The numpy library gives Python the ability to work with matrices and arrays. It
provides a high-performance scientific computation tools working with data
1D or 2D,3D arrays. For statistics, it also provides a lot of fast generators of
random variables. It can be defined from the list

However it has its own convenience that the list does not have

The numpy array can do element-wise computation

Tuesday, October 18, 2022 XUHU WAN, HKUST 24


Multidimensional array

Tuesday, October 18, 2022 XUHU WAN, HKUST 25


Built- in Function to Create Arrays

Tuesday, October 18, 2022 XUHU WAN, HKUST 26


Random Number Generators
Standard normal random
variables

Uniform [0,1] random


variables

Normal Random Variable

Tuesday, October 18, 2022 XUHU WAN, HKUST 27


Array index

Tuesday, October 18, 2022 XUHU WAN, HKUST 28


Mathematics with Array
- Element-wise computation

Tuesday, October 18, 2022 XUHU WAN, HKUST 29


Statistics in Scipy
Generate normal random variable

Compute cumulative probability:

Compute probability density

Compute critical value


i.e. P(X<?)=0.05

Tuesday, October 18, 2022 XUHU WAN, HKUST 30


Practice
1) Find 95% VaR : where Wealth X is normal with
mean=1M and std=0.5M
2) In upper tail hypothesis testing, find p-value

Tuesday, October 18, 2022 XUHU WAN, HKUST 31


Tuesday, October 18, 2022 XUHU WAN, HKUST 32
Visualizing Data
Data visualization is as much a part of the data processing step as the data
presentation step.
It is much easier to compare values when they are plotted than numeric
values.
 By visualizing data we are able to get a better intuitive sense of the data than
would be possible by looking at tables of values alone.

 Visualizations can bring to light hidden patterns in data, that you, the analyst,
can exploit for model selection

Tuesday, October 18, 2022 XUHU WAN, HKUST 33


Line Plot

Tuesday, October 18, 2022 XUHU WAN, HKUST 34


Histogram (Profile)

Dynamic changes of profiles across time are important in identifying the structural
change of pattern.

Tuesday, October 18, 2022 XUHU WAN, HKUST 35


Dynamic changes of profiles across time are important in identifying the structural
change of pattern.

Tuesday, October 18, 2022 XUHU WAN, HKUST 36


Figure object (Optional)

Tuesday, October 18, 2022 XUHU WAN, HKUST 37


Graph of Bivariate Variables

Tuesday, October 18, 2022 XUHU WAN, HKUST 38


Practice :
a) We will use random number generator to generate daily changes of stock
price(252 days ). For simplicity, we assume that the daily change follows a
standard normal distribution.
b) Apply cumulative sum method of numpy array to compute accumulative
sum of daily change, which is used to mimic stock price.
c) Plot stock price.

Tuesday, October 18, 2022 XUHU WAN, HKUST 39


Appendix:
Installation of
Anaconda

Tuesday, October 18, 2022 XUHU WAN, HKUST 40


Official Website:
https://www.anaconda.com/products/individual

Tuesday, October 18, 2022 XUHU WAN, HKUST 41


Anaconda Navigator:

Tuesday, October 18, 2022 XUHU WAN, HKUST 42


Tuesday, October 18, 2022 XUHU WAN, HKUST 43

You might also like