
Name : Talitha Syahda Aguslin

Student ID : 20037061

Course : Data Analysis (A) Assignment 1

Importing the Data

import pandas as pd
Data_BreastCancer = "/content/Breast_Cancer.xlsx"
Data = pd.read_excel(Data_BreastCancer)
df = pd.DataFrame(Data)
print(df)

Age Race Marital Status T Stage N Stage 6th Stage \


0 68 White Married T1 N1 IIA
1 50 White Married T2 N2 IIIA
2 58 White Divorced T3 N3 IIIC
3 58 White Married T1 N1 IIA
4 47 White Married T2 N1 IIB
... ... ... ... ... ... ...
4019 62 Other Married T1 N1 IIA
4020 56 White Divorced T2 N2 IIIA
4021 68 White Married T2 N1 IIB
4022 58 Black Divorced T2 N1 IIB
4023 46 White Married T2 N1 IIB

differentiate Grade A Stage Tumor Size Estrogen Status \


0 Poorly differentiated 3 Regional 4 Positive
1 Moderately differentiated 2 Regional 35 Positive
2 Moderately differentiated 2 Regional 63 Positive
3 Poorly differentiated 3 Regional 18 Positive
4 Poorly differentiated 3 Regional 41 Positive
... ... ... ... ... ...
4019 Moderately differentiated 2 Regional 9 Positive
4020 Moderately differentiated 2 Regional 46 Positive
4021 Moderately differentiated 2 Regional 22 Positive
4022 Moderately differentiated 2 Regional 44 Positive
4023 Moderately differentiated 2 Regional 30 Positive

Progesterone Status Regional Node Examined Reginol Node Positive \


0 Positive 24 1
1 Positive 14 5
2 Positive 14 7
3 Positive 2 1
4 Positive 3 1
... ... ... ...
4019 Positive 1 1
4020 Positive 14 8
4021 Negative 11 3
4022 Positive 11 1
4023 Positive 7 2

Survival Months Status


0 60 Alive
1 62 Alive
2 75 Alive
3 84 Alive
4 50 Alive
... ... ...
4019 49 Alive
4020 69 Alive
4021 69 Alive
4022 72 Alive
4023 100 Alive

[4024 rows x 16 columns]


Grouping the Data by Data Type

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4024 entries, 0 to 4023
Data columns (total 16 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Age 4024 non-null int64
1 Race 4024 non-null object
2 Marital Status 4024 non-null object
3 T Stage 4024 non-null object
4 N Stage 4024 non-null object
5 6th Stage 4024 non-null object
6 differentiate 4024 non-null object
7 Grade 4024 non-null object
8 A Stage 4024 non-null object
9 Tumor Size 4024 non-null int64
10 Estrogen Status 4024 non-null object
11 Progesterone Status 4024 non-null object
12 Regional Node Examined 4024 non-null int64
13 Reginol Node Positive 4024 non-null int64
14 Survival Months 4024 non-null int64
15 Status 4024 non-null object
dtypes: int64(5), object(11)
memory usage: 503.1+ KB

Grouping the Data by Integer Data Type

# The dataset has no float columns; all five numeric columns are int64
Data_Float = df.select_dtypes(include="number")
print(Data_Float)

Age Tumor Size Regional Node Examined Reginol Node Positive \


0 68 4 24 1
1 50 35 14 5
2 58 63 14 7
3 58 18 2 1
4 47 41 3 1
... ... ... ... ...
4019 62 9 1 1
4020 56 46 14 8
4021 68 22 11 3
4022 58 44 11 1
4023 46 30 7 2

Survival Months
0 60
1 62
2 75
3 84
4 50
... ...
4019 49
4020 69
4021 69
4022 72
4023 100

[4024 rows x 5 columns]

Grouping the Data by String (Object) Data Type


Data_String = df.select_dtypes(include=[object])
print(Data_String)

Race Marital Status T Stage N Stage 6th Stage \


0 White Married T1 N1 IIA
1 White Married T2 N2 IIIA
2 White Divorced T3 N3 IIIC
3 White Married T1 N1 IIA
4 White Married T2 N1 IIB
... ... ... ... ... ...
4019 Other Married T1 N1 IIA
4020 White Divorced T2 N2 IIIA
4021 White Married T2 N1 IIB
4022 Black Divorced T2 N1 IIB
4023 White Married T2 N1 IIB

differentiate Grade A Stage Estrogen Status \


0 Poorly differentiated 3 Regional Positive
1 Moderately differentiated 2 Regional Positive
2 Moderately differentiated 2 Regional Positive
3 Poorly differentiated 3 Regional Positive
4 Poorly differentiated 3 Regional Positive
... ... ... ... ...
4019 Moderately differentiated 2 Regional Positive
4020 Moderately differentiated 2 Regional Positive
4021 Moderately differentiated 2 Regional Positive
4022 Moderately differentiated 2 Regional Positive
4023 Moderately differentiated 2 Regional Positive

Progesterone Status Status


0 Positive Alive
1 Positive Alive
2 Positive Alive
3 Positive Alive
4 Positive Alive
... ... ...
4019 Positive Alive
4020 Positive Alive
4021 Negative Alive
4022 Positive Alive
4023 Positive Alive

[4024 rows x 11 columns]
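
The split by data type can be tried on a small frame first. The sketch below is only an illustration: it assumes a hypothetical three-row sample (not read from Breast_Cancer.xlsx) whose column names merely mirror the dataset. It uses `include="number"` and `exclude="number"` rather than `int` and `object`, so the result should not depend on the platform's integer width.

```python
import pandas as pd

# Hypothetical three-row sample; the values are illustrative only.
sample = pd.DataFrame({
    "Age": [68, 50, 58],
    "Race": ["White", "White", "White"],
    "Tumor Size": [4, 35, 63],
    "Status": ["Alive", "Alive", "Alive"],
})

numeric_part = sample.select_dtypes(include="number")  # integer and float columns
text_part = sample.select_dtypes(exclude="number")     # every other column

print(list(numeric_part.columns))
print(list(text_part.columns))
```

Each column lands in exactly one of the two parts, so the column counts of both parts add up to the column count of the original frame.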

Building Data Types

Building a List

datalist = ["Age","Race","Marital Stage","T Stage","N Stage","6th Stage","Grade","A Stage","Tumor Size","Estrogen Status","Progesteron Status
print(datalist)
type(datalist)

['Age', 'Race', 'Marital Stage', 'T Stage', 'N Stage', '6th Stage', 'Grade', 'A Stage', 'Tumor Size', 'Estrogen Status', 'Progesteron S
list

 

Building a Set

dataset = {"Age","68","Face","White","Age","50"}
print(dataset)
type(dataset)

{'68', '50', 'Face', 'White', 'Age'}


set
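
The printed set holds five items although six were written, because a set keeps each value only once ("Age" was given twice). A minimal sketch of that behaviour, using the same six strings; the names `values` and `unique_values` are introduced here for illustration only:

```python
values = ["Age", "68", "Face", "White", "Age", "50"]  # the same six items as above
unique_values = set(values)                           # duplicates collapse to one entry

print(len(values), len(unique_values))
print(sorted(unique_values))  # sorted() gives a stable order; a set itself is unordered
```

Because a set is unordered, the order shown by `print(dataset)` may differ from run to run.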

Building a String


datastrings = ("Marital")
print(datastrings + " Stage")
type(datastrings)

Marital Stage
str

Building a Tuple

datatuples = ("A Stage","Regional",4,35,63,18,41)


print(datatuples)
type(datatuples)

('A Stage', 'Regional', 4, 35, 63, 18, 41)


tuple

Building a Dictionary

a = {
"Age" : 65,
"Face" : "White",
"Marital Stage" : "Married",
"Tumor Size" : 4,
"Status" : "Alive"
}
print(a)
type(a)

{'Age': 65, 'Face': 'White', 'Marital Stage': 'Married', 'Tumor Size': 4, 'Status': 'Alive'}
dict

Building a Deque

import collections
datacoll = collections.deque(["Married","Married","Divorced","Married","Married","Married"])
datacoll.append("Single")          # add an item at the right end
print(datacoll)

datacoll.appendleft("separated")   # add an item at the left end
print(datacoll)

datacoll.pop()                     # remove the item at the right end
print(datacoll)

datacoll.popleft()                 # remove the item at the left end
print(datacoll)
type(datacoll)

deque(['Married', 'Married', 'Divorced', 'Married', 'Married', 'Married', 'Single'])


deque(['separated', 'Married', 'Married', 'Divorced', 'Married', 'Married', 'Married', 'Single'])
deque(['separated', 'Married', 'Married', 'Divorced', 'Married', 'Married', 'Married'])
deque(['Married', 'Married', 'Divorced', 'Married', 'Married', 'Married'])
collections.deque

Building a Heap

import heapq
T = [4,35,63,18,41,20,8,30,103]
heapq.heapify (T)
heapq.heapreplace (T,2)
print(T)
type(T)

[2, 18, 8, 30, 41, 20, 63, 35, 103]


list
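
The result is still a plain list; `heapq` only rearranges it so that every parent is less than or equal to its children. The sketch below repeats the same calls and checks that property. The helper `is_min_heap` is a hypothetical name written for this illustration and is not part of `heapq`.

```python
import heapq

def is_min_heap(items):
    """Return True if every parent is <= each of its children."""
    n = len(items)
    return all(
        items[i] <= items[child]
        for i in range(n)
        for child in (2 * i + 1, 2 * i + 2)
        if child < n
    )

T = [4, 35, 63, 18, 41, 20, 8, 30, 103]
heapq.heapify(T)                     # rearranges T in place; T[0] becomes the minimum
smallest = heapq.heapreplace(T, 2)   # pops the minimum (4) and then pushes 2

print(smallest, T[0], is_min_heap(T))
```

`heapreplace` returns the item it removed, so the value 4 is not lost if the return value is stored.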

Data Analysis Assignment 2

1. Creating a DataFrame from Python lists

list1 = [68,50,58,58,47]
list2 = ["White","White","White","White","White"]
list3 = ["Married","Married","Divorced","Married","Married"]
import pandas as pd
list1 = list1 + [48]
list2 = list2 + ["Black"]
list3 = list3 + ["Divorced"]
dataframe_list = pd.DataFrame(list(zip(list1,list2,list3)),columns = ['Age',"Race","Marital Stage"], index = [1,2,3,4,5,6])
dataframe_list

Age Race Marital Stage

1 68 White Married

2 50 White Married

3 58 White Divorced

4 58 White Married

5 47 White Married

6 48 Black Divorced
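
The call `list(zip(list1, list2, list3))` turns three column lists into a list of row tuples, which is the shape `pd.DataFrame` reads row by row. A minimal sketch of that step alone, using the first three values of each list; the variable names here are illustrative:

```python
ages = [68, 50, 58]
races = ["White", "White", "White"]
marital = ["Married", "Married", "Divorced"]

rows = list(zip(ages, races, marital))   # one tuple per row
print(rows)

# zip stops at the shortest input, so unequal lists silently lose rows
short = list(zip(ages, races[:2]))
print(len(short))
```

This is why all three lists were extended by one item each before the DataFrame was built: lists of unequal length would have produced fewer rows without raising an error.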

2. Creating a DataFrame from Python tuples

tuple1 = ('Regional','Alive')
tuple2 = ('Distant','Dead')
dataframe_tuple = pd.DataFrame(tuple((tuple1,tuple2)), columns = ['a stage','status'])
dataframe_tuple

a stage status

0 Regional Alive

1 Distant Dead

3. Creating a DataFrame from an Excel file

xlsx_file = pd.read_excel("/content/Breast_Cancer.xlsx")
xlsx_file.head()

   Age   Race  Marital Status  T Stage  N Stage  6th Stage              differentiate  Grade   A Stage  Tumor Size  Estrogen Status  Progesterone Status  Regional Node Examined  Reginol Node Positive
0   68  White         Married       T1       N1        IIA      Poorly differentiated      3  Regional           4         Positive             Positive                      24                      1
1   50  White         Married       T2       N2       IIIA  Moderately differentiated      2  Regional          35         Positive             Positive                      14                      5
2   58  White        Divorced       T3       N3       IIIC  Moderately differentiated      2  Regional          63         Positive             Positive                      14                      7
3   58  White         Married       T1       N1        IIA      Poorly differentiated      3  Regional          18         Positive             Positive                       2                      1
4   47  White         Married       T2       N1        IIB      Poorly differentiated      3  Regional          41         Positive             Positive                       3                      1

4. Creating a DataFrame from a URL

import pandas as pd
download_url=("https://storage.googleapis.com/kagglesdsdata/datasets/2396275/4045493/Breast_Cancer.csv?X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Go
df = pd.read_csv(download_url)
type(df)
df

      Age   Race  Marital Status  T Stage  N Stage  6th Stage              differentiate  Grade   A Stage  Tumor Size  Estrogen Status  Progesterone Status  Regional Node Examined
   0   68  White         Married       T1       N1        IIA      Poorly differentiated      3  Regional           4         Positive             Positive                      24
   1   50  White         Married       T2       N2       IIIA  Moderately differentiated      2  Regional          35         Positive             Positive                      14
   2   58  White        Divorced       T3       N3       IIIC  Moderately differentiated      2  Regional          63         Positive             Positive                      14
   3   58  White         Married       T1       N1        IIA      Poorly differentiated      3  Regional          18         Positive             Positive                       2
   4   47  White         Married       T2       N1        IIB      Poorly differentiated      3  Regional          41         Positive             Positive                       3
 ...  ...    ...             ...      ...      ...        ...                        ...    ...       ...         ...              ...                  ...                     ...
4019   62  Other         Married       T1       N1        IIA  Moderately differentiated      2  Regional           9         Positive             Positive                       1
4020   56  White        Divorced       T2       N2       IIIA  Moderately differentiated      2  Regional          46         Positive             Positive                      14
4021   68  White         Married       T2       N1        IIB  Moderately differentiated      2  Regional          22         Positive             Negative                      11
4022   58  Black        Divorced       T2       N1        IIB  Moderately differentiated      2  Regional          44         Positive             Positive                      11
4023   46  White         Married       T2       N1        IIB  Moderately differentiated      2  Regional          30         Positive             Positive                       7

4024 rows × 16 columns

5. Creating a DataFrame from a CSV file

csv_file = pd.read_csv("/content/Breast_Cancer.csv")
csv_file.head()

   Age   Race  Marital Status  T Stage  N Stage  6th Stage              differentiate  Grade   A Stage  Tumor Size  Estrogen Status  Progesterone Status  Regional Node Examined  Reginol Node Positive
0   68  White         Married       T1       N1        IIA      Poorly differentiated      3  Regional           4         Positive             Positive                      24                      1
1   50  White         Married       T2       N2       IIIA  Moderately differentiated      2  Regional          35         Positive             Positive                      14                      5
2   58  White        Divorced       T3       N3       IIIC  Moderately differentiated      2  Regional          63         Positive             Positive                      14                      7
3   58  White         Married       T1       N1        IIA      Poorly differentiated      3  Regional          18         Positive             Positive                       2                      1
4   47  White         Married       T2       N1        IIB      Poorly differentiated      3  Regional          41         Positive             Positive                       3                      1

6. Creating a DataFrame from a NumPy array

import numpy as np
array = np.array([['White',68,4],['White',50,35],['White',58,63]])
dataframe_numpy = pd.DataFrame(array,columns = ['Race','Age','Tumor Size'])
dataframe_numpy

Race Age Tumor Size

0 White 68 4

1 White 50 35

2 White 58 63

7. Creating a DataFrame from pandas Series
series1 = pd.Series(['Married','Married','Discovered'])
series2 = pd.Series([3,2,2])
dataframe_series = pd.DataFrame({'Marital Status':series1, 'grade':series2})
dataframe_series

Marital Status grade

0 Married 3

1 Married 2

2 Discovered 2

Course : Data Analysis (Assignment 3)

1. Importing the seaborn library and the dataset

import seaborn as sns


titanic = sns.load_dataset('titanic')
titanic.head()

survived pclass sex age sibsp parch fare embarked class who adult_male deck embark_town alive alone

0 0 3 male 22.0 1 0 7.2500 S Third man True NaN Southampton no False

1 1 1 female 38.0 1 0 71.2833 C First woman False C Cherbourg yes False

2 1 3 female 26.0 0 0 7.9250 S Third woman False NaN Southampton yes True

3 1 1 female 35.0 1 0 53.1000 S First woman False C Southampton yes False

4 0 3 male 35.0 0 0 8.0500 S Third man True NaN Southampton no True

Description of the Titanic data

import numpy as np
titanic.describe()

survived pclass age sibsp parch fare

count 891.000000 891.000000 714.000000 891.000000 891.000000 891.000000

mean 0.383838 2.308642 29.699118 0.523008 0.381594 32.204208

std 0.486592 0.836071 14.526497 1.102743 0.806057 49.693429

min 0.000000 1.000000 0.420000 0.000000 0.000000 0.000000

25% 0.000000 2.000000 20.125000 0.000000 0.000000 7.910400

50% 0.000000 3.000000 28.000000 0.000000 0.000000 14.454200

75% 1.000000 3.000000 38.000000 1.000000 0.000000 31.000000

max 1.000000 3.000000 80.000000 8.000000 6.000000 512.329200

titanic.describe(exclude = np.number)

sex embarked class who adult_male deck embark_town alive alone

count 891 889 891 891 891 203 889 891 891

unique 2 3 3 3 2 7 3 2 2

top male S Third man True C Southampton no True

freq 577 644 491 537 537 59 644 549 537

Data Exploration
a. Viewing the first 10 rows and the last 5 rows

titanic.tail()

survived pclass sex age sibsp parch fare embarked class who adult_male deck embark_town alive alone
886 0 2 male 27.0 0 0 13.00 S Second man True NaN Southampton no True
887 1 1 female 19.0 0 0 30.00 S First woman False B Southampton yes True
888 0 3 female NaN 1 2 23.45 S Third woman False NaN Southampton no False
889 1 1 male 26.0 0 0 30.00 C First man True C Cherbourg yes True
890 0 3 male 32.0 0 0 7.75 Q Third man True NaN Queenstown no True

titanic.head(10)

survived pclass sex age sibsp parch fare embarked class who adult_male deck embark_town alive alone
0 0 3 male 22.0 1 0 7.2500 S Third man True NaN Southampton no False
1 1 1 female 38.0 1 0 71.2833 C First woman False C Cherbourg yes False
2 1 3 female 26.0 0 0 7.9250 S Third woman False NaN Southampton yes True
3 1 1 female 35.0 1 0 53.1000 S First woman False C Southampton yes False
4 0 3 male 35.0 0 0 8.0500 S Third man True NaN Southampton no True
5 0 3 male NaN 0 0 8.4583 Q Third man True NaN Queenstown no True
6 0 1 male 54.0 0 0 51.8625 S First man True E Southampton no True
7 0 3 male 2.0 3 1 21.0750 S Third child False NaN Southampton no False
8 1 3 female 27.0 0 2 11.1333 S Third woman False NaN Southampton yes False
9 1 2 female 14.0 1 0 30.0708 C Second child False NaN Cherbourg yes False

titanic.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 15 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 survived 891 non-null int64
1 pclass 891 non-null int64
2 sex 891 non-null object
3 age 714 non-null float64
4 sibsp 891 non-null int64
5 parch 891 non-null int64
6 fare 891 non-null float64
7 embarked 889 non-null object
8 class 891 non-null category
9 who 891 non-null object
10 adult_male 891 non-null bool
11 deck 203 non-null category
12 embark_town 889 non-null object
13 alive 891 non-null object
14 alone 891 non-null bool
dtypes: bool(2), category(2), float64(2), int64(4), object(5)
memory usage: 80.7+ KB

# Drop every column that contains missing values (the output below has 11 columns)
titanic = titanic.drop(columns = ['age','embarked','deck','embark_town'])

titanic

survived pclass sex sibsp parch fare class who adult_male alive alone
0 0 3 male 1 0 7.2500 Third man True no False
1 1 1 female 1 0 71.2833 First woman False yes False
2 1 3 female 0 0 7.9250 Third woman False yes True
3 1 1 female 1 0 53.1000 First woman False yes False
4 0 3 male 0 0 8.0500 Third man True no True
... ... ... ... ... ... ... ... ... ... ... ...
886 0 2 male 0 0 13.0000 Second man True no True
887 1 1 female 0 0 30.0000 First woman False yes True
888 0 3 female 1 2 23.4500 Third woman False no False
889 1 1 male 0 0 30.0000 First man True yes True
890 0 3 male 0 0 7.7500 Third man True no True

891 rows × 11 columns

b. Describing the data

titanic.describe()

survived pclass sibsp parch fare
count 891.000000 891.000000 891.000000 891.000000 891.000000
mean 0.383838 2.308642 0.523008 0.381594 32.204208
std 0.486592 0.836071 1.102743 0.806057 49.693429
min 0.000000 1.000000 0.000000 0.000000 0.000000
25% 0.000000 2.000000 0.000000 0.000000 7.910400
50% 0.000000 3.000000 0.000000 0.000000 14.454200
75% 1.000000 3.000000 1.000000 0.000000 31.000000
max 1.000000 3.000000 8.000000 6.000000 512.329200

titanic.describe().T

count mean std min 25% 50% 75% max

survived 891.0 0.383838 0.486592 0.0 0.0000 0.0000 1.0 1.0000

pclass 891.0 2.308642 0.836071 1.0 2.0000 3.0000 3.0 3.0000

sibsp 891.0 0.523008 1.102743 0.0 0.0000 0.0000 1.0 8.0000

parch 891.0 0.381594 0.806057 0.0 0.0000 0.0000 0.0 6.0000

fare 891.0 32.204208 49.693429 0.0 7.9104 14.4542 31.0 512.3292

c. Determining the size and shape of the data

titanic.size

9801

titanic.shape

(891, 11)

titanic.who.value_counts()

man 537
woman 271
child 83
Name: who, dtype: int64

d. Variance-covariance matrix

# numeric_only=True (pandas >= 1.5) skips the text and categorical columns
titanic.cov(numeric_only=True).style.background_gradient(cmap='coolwarm')

survived pclass sibsp parch fare adult_male alone

survived 0.236772 -0.137703 -0.018954 0.032017 6.221787 -0.132720 -0.048451

pclass -0.137703 0.699015 0.076599 0.012429 -22.830196 0.038494 0.055347

sibsp -0.018954 0.076599 1.216043 0.368739 8.748734 -0.136916 -0.315568

parch 0.032017 0.012429 0.368739 0.649728 8.661052 -0.138108 -0.230242

fare 6.221787 -22.830196 8.748734 8.661052 2469.436846 -4.428757 -6.613861

adult_male -0.132720 0.038494 -0.136916 -0.138108 -4.428757 0.239723 0.097026

alone -0.048451 0.055347 -0.315568 -0.230242 -6.613861 0.097026 0.239723


e. Correlation matrix

# numeric_only=True (pandas >= 1.5) skips the text and categorical columns
titanic.corr(numeric_only=True).style.background_gradient(cmap='coolwarm')

survived pclass sibsp parch fare adult_male alone

survived 1.000000 -0.338481 -0.035322 0.081629 0.257307 -0.557080 -0.203367

pclass -0.338481 1.000000 0.083081 0.018443 -0.549500 0.094035 0.135207

sibsp -0.035322 0.083081 1.000000 0.414838 0.159651 -0.253586 -0.584471

parch 0.081629 0.018443 0.414838 1.000000 0.216225 -0.349943 -0.583398

fare 0.257307 -0.549500 0.159651 0.216225 1.000000 -0.182024 -0.271832

adult_male -0.557080 0.094035 -0.253586 -0.349943 -0.182024 1.000000 0.404744

alone -0.203367 0.135207 -0.584471 -0.583398 -0.271832 0.404744 1.000000
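
The two matrices are linked: each correlation is the matching covariance divided by the product of the two standard deviations, and the variances sit on the diagonal of the covariance matrix. A minimal check of one entry, using the rounded values printed above for survived and pclass; the variable names are illustrative:

```python
import math

# Values copied from the covariance matrix in section d.
cov_survived_pclass = -0.137703
var_survived = 0.236772   # diagonal entry: variance of survived
var_pclass = 0.699015     # diagonal entry: variance of pclass

# Pearson correlation = covariance / (std_x * std_y)
corr = cov_survived_pclass / math.sqrt(var_survived * var_pclass)
print(round(corr, 4))
```

Up to rounding, the result agrees with the entry for survived and pclass in the correlation matrix (-0.338481).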

f. Percentage of missing values in the cleaned DataFrame

import pandas as pd
persentase_data_kosong = titanic.isna().sum()*100/len(titanic)
nilaikosong_titanic = pd.DataFrame({'Persentase Data Kosong' : persentase_data_kosong})
nilaikosong_titanic

Persentase Data Kosong

survived 0.0

pclass 0.0

sex 0.0

sibsp 0.0

parch 0.0

fare 0.0

class 0.0

who 0.0

adult_male 0.0

alive 0.0

alone 0.0
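
Every percentage is 0.0 here because the columns with missing values were dropped before this step. For comparison, the sketch below applies the same formula to the non-null counts that `titanic.info()` reported for the original 891-row dataset; the dictionary is typed in by hand from that output.

```python
total_rows = 891
# Non-null counts as reported by titanic.info() for the original dataset.
non_null = {"age": 714, "embarked": 889, "deck": 203, "embark_town": 889}

# missing percentage = (rows - non-null count) * 100 / rows
missing_pct = {
    col: round((total_rows - count) * 100 / total_rows, 2)
    for col, count in non_null.items()
}
print(missing_pct)
```

On these counts, deck is missing in roughly three quarters of the rows and age in about one fifth, while embarked and embark_town lack only two values each.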

g. DataFrame after row cleaning (rows with missing values dropped)

# Reload the full dataset first so that the 15-column output below is reproduced
titanic = sns.load_dataset('titanic').dropna()
titanic

survived pclass sex age sibsp parch fare embarked class who adult_male deck embark_town alive alone
1 1 1 female 38.0 1 0 71.2833 C First woman False C Cherbourg yes False
3 1 1 female 35.0 1 0 53.1000 S First woman False C Southampton yes False
6 0 1 male 54.0 0 0 51.8625 S First man True E Southampton no True
10 1 3 female 4.0 1 1 16.7000 S Third child False G Southampton yes False
11 1 1 female 58.0 0 0 26.5500 S First woman False C Southampton yes True
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
871 1 1 female 47.0 1 1 52.5542 S First woman False D Southampton yes False
872 0 1 male 33.0 0 0 5.0000 S First man True B Southampton no True
879 1 1 female 56.0 0 1 83.1583 C First woman False C Cherbourg yes False
887 1 1 female 19.0 0 0 30.0000 S First woman False B Southampton yes True
889 1 1 male 26.0 0 0 30.0000 C First man True C Cherbourg yes True

182 rows × 15 columns

h. Visualizing the standard deviation and variance

import numpy as np # pip install numpy
import scipy.stats # pip install scipy
import matplotlib.pyplot as plt # pip install matplotlib
mean = 10
std = 1
var = np.square(std)
plt.figure(figsize = (15, 8))
x = np.linspace(mean - 3*std, mean + 3*std, 100)
plt.plot(x, scipy.stats.norm.pdf(x, mean, std))
plt.axvline(x = mean - std, c = 'blue')
plt.axvline(x = mean + std, c = 'blue')
plt.axvline(x = mean - 2*std, c = 'red')
plt.axvline(x = mean + 2*std, c = 'red')
plt.axvline(x = mean - 3*std, c = 'black')
plt.axvline(x = mean + 3*std, c = 'black')

<matplotlib.lines.Line2D at 0x7f5e95ab93a0>
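
The blue, red and black lines mark one, two and three standard deviations on either side of the mean. For a normal curve, the share of the area between each pair of lines can be computed with the error function from the standard library. This is a minimal sketch and the helper name `mass_within` is introduced only for illustration.

```python
import math

def mass_within(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(mass_within(k) * 100, 2))
```

The shares do not depend on the chosen mean or standard deviation, so the same proportions hold for any normal curve drawn this way.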

i. Quartile visualization

plt.figure(figsize = (10, 5))

age_stats = titanic['age'].describe()   # compute the summary statistics once
IQR = age_stats['75%'] - age_stats['25%']
sns.boxplot(x = titanic['age'])         # keyword argument avoids seaborn's FutureWarning
plt.axvline(age_stats['25%'], color = 'red', label = 'Q1')
plt.axvline(age_stats['50%'], color = 'yellow', label = 'Q2')
plt.axvline(age_stats['75%'], color = 'blue', label = 'Q3')
plt.annotate('Outlier', (age_stats['max'], 0.1), xytext = (age_stats['max'], 0.3),
             arrowprops = dict(facecolor = 'blue'), fontsize = 13)
plt.annotate('Upper Bound', (age_stats['75%'] + 1.5*IQR, 0.2),
             xytext = (age_stats['75%'] + 1.5*IQR, 0.4),
             arrowprops = dict(facecolor = 'blue'), fontsize = 13)
plt.annotate('Lower Bound', (age_stats['min'], 0.2),
             xytext = (age_stats['min'], 0.4),
             arrowprops = dict(facecolor = 'blue'), fontsize = 13)
plt.legend()

<matplotlib.legend.Legend at 0x7f5e93193eb0>
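
The upper bound in the plot is Q3 + 1.5 × IQR; the matching lower bound would be Q1 - 1.5 × IQR. The arithmetic can be checked by hand with the age quartiles that `titanic.describe()` reported for the full 891-row dataset (the plot itself may be drawn from the reduced frame, whose quartiles can differ). The variable names below are illustrative.

```python
# Age statistics as reported by titanic.describe() for the full dataset.
q1, q3 = 20.125, 38.0
age_min, age_max = 0.42, 80.0

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

print(iqr, lower_fence, upper_fence)
print(age_max > upper_fence)   # True means the maximum is flagged as an outlier
print(age_min < lower_fence)   # False means nothing is flagged on the low side
```

With these figures the lower fence is negative, so the lower whisker simply ends at the smallest observed age, which is consistent with annotating the lower bound at the minimum.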

Course : Data Analysis (Assignment 4)

Data Visualization with Pandas

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

titanic = sns.load_dataset('titanic')

titanic.head(15)

survived pclass sex age sibsp parch fare embarked class who adult_male deck embark_town alive alone
0 0 3 male 22.0 1 0 7.2500 S Third man True NaN Southampton no False
1 1 1 female 38.0 1 0 71.2833 C First woman False C Cherbourg yes False
2 1 3 female 26.0 0 0 7.9250 S Third woman False NaN Southampton yes True
3 1 1 female 35.0 1 0 53.1000 S First woman False C Southampton yes False
4 0 3 male 35.0 0 0 8.0500 S Third man True NaN Southampton no True
5 0 3 male NaN 0 0 8.4583 Q Third man True NaN Queenstown no True
6 0 1 male 54.0 0 0 51.8625 S First man True E Southampton no True
7 0 3 male 2.0 3 1 21.0750 S Third child False NaN Southampton no False
8 1 3 female 27.0 0 2 11.1333 S Third woman False NaN Southampton yes False
9 1 2 female 14.0 1 0 30.0708 C Second child False NaN Cherbourg yes False
10 1 3 female 4.0 1 1 16.7000 S Third child False G Southampton yes False
11 1 1 female 58.0 0 0 26.5500 S First woman False C Southampton yes True
12 0 3 male 20.0 0 0 8.0500 S Third man True NaN Southampton no True
13 0 3 male 39.0 1 5 31.2750 S Third man True NaN Southampton no False
14 0 3 female 14.0 0 0 7.8542 S Third child False NaN Southampton no True

Data Visualization

1. Scatter Plot

scatter_plot = titanic.plot.scatter(x = 'age', y = 'fare', c = 'green', title = 'Titanic Dataset', s = 30, figsize=(5,5))

Changing the Background Color

scatter_plot = titanic.plot.scatter(x = 'age', y = 'fare', c = 'green', title = 'Titanic Dataset', s = 30, figsize=(5,5))
scatter_plot.set_facecolor('plum')

2. Line Chart

titanic_plot_line = titanic['age'].plot.line(color = ('pink'), title = 'Titanic Dataset', subplots =True, figsize=(8,8), layout=(2,2))
titanic_plot_line

array([[<AxesSubplot:>, <AxesSubplot:>],
[<AxesSubplot:>, <AxesSubplot:>]], dtype=object)

3. Histogram

titanic['age'].plot.hist(color='maroon', bins=20, alpha=0.8, title = 'Titanic Dataset')


# interval (bin) boundaries

<AxesSubplot:title={'center':'Titanic Dataset'}, ylabel='Frequency'>


horizontal histogram

titanic['age'].plot.hist(color='maroon', bins=20, alpha=0.8, title = 'Titanic Dataset', orientation='horizontal')

<AxesSubplot:title={'center':'Titanic Dataset'}, xlabel='Frequency'>

titanic['age'].plot.hist(subplots=True, layout=(4,4), figsize=(20,20), bins=20)

array([[<AxesSubplot:ylabel='Frequency'>,
<AxesSubplot:ylabel='Frequency'>,
<AxesSubplot:ylabel='Frequency'>,
<AxesSubplot:ylabel='Frequency'>],
[<AxesSubplot:ylabel='Frequency'>,
<AxesSubplot:ylabel='Frequency'>,
<AxesSubplot:ylabel='Frequency'>,
<AxesSubplot:ylabel='Frequency'>],
[<AxesSubplot:ylabel='Frequency'>,
<AxesSubplot:ylabel='Frequency'>,
<AxesSubplot:ylabel='Frequency'>,
<AxesSubplot:ylabel='Frequency'>],
[<AxesSubplot:ylabel='Frequency'>,
<AxesSubplot:ylabel='Frequency'>,
<AxesSubplot:ylabel='Frequency'>,
<AxesSubplot:ylabel='Frequency'>]], dtype=object)

4. Bar Chart (Categorical Data)

titanic['sex'].value_counts().sort_index().plot.bar(color='olive', alpha=0.5, rot=45)

<AxesSubplot:>

5. Area Plot (Numeric Data)

titanic['fare'].plot.area(color = 'darkorange',subplots=True, figsize=(8,8))

array([<AxesSubplot:>], dtype=object)

6. Box and Box Plot

titanic['fare'].plot.box()

<AxesSubplot:>

titanic.boxplot(fontsize=8)

<AxesSubplot:>

7. Kernel Density Estimate Plot

titanic['age'].plot.kde(color = 'deeppink', bw_method=5)


<AxesSubplot:ylabel='Density'>

8. Pie Chart

titanic['who'].value_counts().plot.pie(explode=[0.2,0.1,0.1] ,autopct='%1.1f%%',shadow=True,figsize = (5,5))

<AxesSubplot:ylabel='who'>
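
The percentages printed on the slices come from each count divided by the total, formatted with the `autopct` pattern. A minimal sketch of that arithmetic, using the counts shown earlier by `titanic.who.value_counts()`; the dictionary is typed in by hand for illustration:

```python
counts = {"man": 537, "woman": 271, "child": 83}   # from titanic.who.value_counts()
total = sum(counts.values())

# '%1.1f%%' formats a percentage with one decimal place followed by a percent sign
labels = {who: "%1.1f%%" % (n * 100 / total) for who, n in counts.items()}
print(total, labels)
```

The three `explode` offsets follow the same order as the counts, so the first and largest slice (man) is the one pulled out furthest.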

Data Visualization with Matplotlib

1. Scatter Plot

import matplotlib.pyplot as plt

x1 = [2,3,4]
y1 = [5,5,5]

x2 =[1,2,3,4,5]
y2 =[2,3,2,3,4]
y3 =[6,8,7,8,7]

plt.scatter(x1,y1)
plt.scatter(x2,y2,marker='v',color='r')
plt.scatter(x2,y3,marker='^',color='m')
plt.show()
2. Line Chart

x =[1,2,3,4,5,6,7,8,9]
y1=[1,3,5,3,1,3,5,3,1]
y2=[2,4,6,4,2,4,6,4,2]
plt.plot(x,y1,label='line L')
plt.plot(x,y2,label='line H')
plt.plot()

plt.xlabel('x axis')
plt.ylabel('y axis')
plt.title('Line Graph Example')
plt.legend()
plt.show()

3. Histogram

import matplotlib.pyplot as plt


import numpy as np
import pandas as pd
iris = pd.read_csv("https://gist.githubusercontent.com/curran/a08a1080b88344b0c8a7/raw/0e7a9b0a5d22642a06d3d5b9bcbad9890c8ee534/iris.csv")
print(iris.head())
n = 5 + np.random.randn(1000)
m = [m for m in range(len(n))]
plt.bar(m,n)
plt.title("raw Data")
plt.show()
plt.hist(n, bins=20)
plt.title("Histogram")
plt.show()
plt.hist(n, cumulative=True, bins=20)
plt.title('Cumulative Histogram')
plt.show()
sepal_length sepal_width petal_length petal_width species
0 5.1 3.5 1.4 0.2 setosa
1 4.9 3.0 1.4 0.2 setosa
2 4.7 3.2 1.3 0.2 setosa
3 4.6 3.1 1.5 0.2 setosa
4 5.0 3.6 1.4 0.2 setosa

4. Bar Chart

import matplotlib.pyplot as plt


x1 = [1, 3, 4, 5, 6, 7, 9]
y1 = [4, 7, 2, 4, 7, 8, 3]
x2 = [2, 4, 6, 8, 10]
y2 = [5, 6, 2, 6, 2]
plt.bar(x1, y1, label='Yellow Bar', color='y')
plt.bar(x2, y2, label='Red Bar', color='r')
plt.plot()
plt.xlabel('bar number')
plt.ylabel('bar height')
plt.title(' Bar Chart Example')
plt.legend()
plt.show()

5. Stack Plot

import matplotlib.pyplot as plt


idxes = [1,2,3,4, 5, 6,7,8,9]
arr1 =[23,40,28, 43, 8, 44, 43, 18,17]
arr2 = [17,30,22,14,17,17,29,22,30]
arr3 = [ 15,31,18,22,18,19,13,32,39]
plt.plot([],[],color='m', label = 'D 1')
plt.plot([],[],color='k', label = 'D 2')
plt.plot([],[],color='c', label = 'D 3')
plt.stackplot(idxes,arr1,arr2,arr3, colors=['m','k','c'])
plt.title('stack plot ')
plt.legend()
plt.show()
6. Pie Chart

labels = 'S1','S2','S3'
sections = [56,66,24]
colors=['c','y','m']

plt.pie(sections, labels=labels, colors=colors, startangle = 90,


explode = (0,0.1,0),
autopct = '%1.2f%%')

plt.axis('equal')
plt.title('Pie Chart Example')
plt.show()

7. 3D Scatter Plot

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
X = titanic['age']
Y = titanic['fare']
ax.scatter(X,Y, c='r', marker='<')
ax.set_xlabel('x axis')
ax.set_ylabel('y axis')
plt.show()

8. 3D Bar Plot

fig = plt.figure()
ax = fig.add_subplot(111, projection = '3d')

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = np.random.randint(10, size=10)
z = np.zeros(10)

dx = np.ones(10)
dy = np.ones(10)
dz = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

ax.bar3d(x, y, z, dx, dy, dz, color='r')

ax.set_xlabel('x axis')
ax.set_ylabel('y axis')
ax.set_zlabel('z axis')
plt.title('3D Bar Chart Example')
plt.tight_layout()
plt.show()

9. Wireframe Plot

from mpl_toolkits.mplot3d import axes3d


fig = plt.figure()
ax = fig.add_subplot(111, projection = '3d')

x,y,z = axes3d.get_test_data()

ax.plot_wireframe(x,y,z, rstride = 2, cstride = 2)

plt.title("Wireframe plot")
plt.tight_layout()
plt.show()

10. Bubble Plot

sns.scatterplot(data=titanic, x="age", y="fare",size="pclass", legend=False, sizes=(20, 2000))


plt.show()
11. scatter plot spinning 3D

fig = plt.figure(figsize = (8,8))


ax = plt.axes(projection = '3d')

z = np.linspace(0, 50, 1000)


x = np.sin(z)
y = np.cos(z)
ax.plot3D(x, y, z, 'teal')

ax.view_init(-140, 60)
plt.show()

Data Visualization with Seaborn

1. Displot

import seaborn as sns


titanic = sns.load_dataset('titanic')
sns.displot(titanic['age'], color='rosybrown', kde=True)

<seaborn.axisgrid.FacetGrid at 0x7f13149d1880>

sns.displot(titanic['fare'], color='slateblue', kde=True)


<seaborn.axisgrid.FacetGrid at 0x7fa422849e20>

2. Scatter Plot

df = sns.load_dataset('titanic')
sns.regplot(x=df['age'], y=df['fare'], color='indigo', marker='x')

<AxesSubplot:xlabel='age', ylabel='fare'>

3. Box Plot

# Passing the data as x= avoids seaborn's FutureWarning about positional arguments
sns.boxplot(x=titanic['age'], color='forestgreen')

<AxesSubplot:xlabel='age'>

4. Correlation Matrix

# numeric_only=True (pandas >= 1.5) skips the text and categorical columns
sns.heatmap(titanic.corr(numeric_only=True), cmap='Blues', annot = True)


<AxesSubplot:>

5. Joint Plot ( 2 Data Numerik )

sns.jointplot(x='age', y='fare', data=titanic, kind ='reg')

<seaborn.axisgrid.JointGrid at 0x7f13140233d0>

6. Violin Plot

import seaborn as sns


df = sns.load_dataset('titanic')
sns.violinplot(x=df['who'], y=df['age'])

<AxesSubplot:xlabel='who', ylabel='age'>

7. Pair Plot

iris = sns.load_dataset('iris')
sns.pairplot(hue='species',data=iris)
<seaborn.axisgrid.PairGrid at 0x7fa426dcad00>
