Assignments 1-4, Data Analysis: Talitha Syahda Aguslin (20037061)

Name: Talitha Syahda Aguslin
Student ID (NIM): 20037061
Course: Data Analysis (A), Assignment 1
Importing the Data
import pandas as pd
Data_BreastCancer = "/content/Breast_Cancer.xlsx"
Data = pd.read_excel(Data_BreastCancer)
df = pd.DataFrame(Data)
print(df)
Age Race Marital Status T Stage N Stage 6th Stage \

0 68 White Married T1 N1 IIA
1 50 White Married T2 N2 IIIA
2 58 White Divorced T3 N3 IIIC
3 58 White Married T1 N1 IIA
4 47 White Married T2 N1 IIB
... ... ... ... ... ... ...
4019 62 Other Married T1 N1 IIA
4020 56 White Divorced T2 N2 IIIA
4022 58 Black Divorced T2 N1 IIB
differentiate Grade A Stage Tumor Size Estrogen Status \

0 Poorly differentiated 3 Regional 4 Positive
1 Moderately differentiated 2 Regional 35 Positive
... ... ... ... ... ...
Progesterone Status Regional Node Examined Reginol Node Positive \

0 Positive 24 1
1 Positive 14 5
2 Positive 14 7
3 Positive 2 1
4 Positive 3 1
... ... ... ...
4019 Positive 1 1
4020 Positive 14 8
4021 Negative 11 3
4022 Positive 11 1
4023 Positive 7 2
Survival Months Status

0 60 Alive
1 62 Alive
2 75 Alive
3 84 Alive
4 50 Alive
... ... ...
4019 49 Alive
4020 69 Alive
4021 69 Alive
4022 72 Alive
4023 100 Alive
[4024 rows x 16 columns]

Grouping Data by Data Type
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4024 entries, 0 to 4023
Data columns (total 16 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Age 4024 non-null int64
1 Race 4024 non-null object
2 Marital Status 4024 non-null object
3 T Stage 4024 non-null object
4 N Stage 4024 non-null object
5 6th Stage 4024 non-null object
6 differentiate 4024 non-null object
7 Grade 4024 non-null object
8 A Stage 4024 non-null object
9 Tumor Size 4024 non-null int64
10 Estrogen Status 4024 non-null object
11 Progesterone Status 4024 non-null object
12 Regional Node Examined 4024 non-null int64
13 Reginol Node Positive 4024 non-null int64
14 Survival Months 4024 non-null int64
15 Status 4024 non-null object
dtypes: int64(5), object(11)
memory usage: 503.1+ KB
Grouping Data by Numeric Type (all numeric columns are int64; the dataset has no float columns)
Data_Float = df.select_dtypes(include=[int])
print(Data_Float)
Age Tumor Size Regional Node Examined Reginol Node Positive \

0 68 4 24 1
1 50 35 14 5
2 58 63 14 7
3 58 18 2 1
4 47 41 3 1
... ... ... ... ...
4019 62 9 1 1
4020 56 46 14 8
4021 68 22 11 3
4022 58 44 11 1
4023 46 30 7 2
Survival Months
0 60
1 62
2 75
3 84
4 50
... ...
4019 49
4020 69
4021 69
4022 72
4023 100
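The type-based selection above can also be written with the generic "number" selector, which picks up integer and float columns alike. The snippet below is a minimal sketch on a hypothetical three-row sample (the values are copied from the first rows shown above; the name `sample` is an assumption and not part of the original notebook).

```python
import pandas as pd

# Hypothetical three-row sample reusing column names from the dataset above.
sample = pd.DataFrame({
    "Age": [68, 50, 58],
    "Race": ["White", "White", "White"],
    "Tumor Size": [4, 35, 63],
    "Status": ["Alive", "Alive", "Alive"],
})

# include="number" selects every numeric dtype (int64, float64, ...).
numeric_cols = sample.select_dtypes(include="number")
# exclude="number" keeps the remaining (text) columns.
text_cols = sample.select_dtypes(exclude="number")

print(list(numeric_cols.columns))
print(list(text_cols.columns))
```

Using "number" avoids having to know in advance whether a column was read as int64 or float64.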
Grouping Data by String (object) Type

Data_String = df.select_dtypes(include=[object])
print(Data_String)
Race Marital Status T Stage N Stage 6th Stage \

0 White Married T1 N1 IIA
1 White Married T2 N2 IIIA
2 White Divorced T3 N3 IIIC
3 White Married T1 N1 IIA
4 White Married T2 N1 IIB
... ... ... ... ... ...
4019 Other Married T1 N1 IIA
4020 White Divorced T2 N2 IIIA
4022 Black Divorced T2 N1 IIB
differentiate Grade A Stage Estrogen Status \

0 Poorly differentiated 3 Regional Positive
1 Moderately differentiated 2 Regional Positive
... ... ... ... ...
Progesterone Status Status

0 Positive Alive
1 Positive Alive
2 Positive Alive
3 Positive Alive
4 Positive Alive
... ... ...
4019 Positive Alive
4020 Positive Alive
4021 Negative Alive
4022 Positive Alive
4023 Positive Alive
Building Data Types
Building a list
datalist = ["Age","Race","Marital Stage","T Stage","N Stage","6th Stage","Grade","A Stage","Tumor Size","Estrogen Status","Progesteron Status
print(datalist)
type(datalist)
['Age', 'Race', 'Marital Stage', 'T Stage', 'N Stage', '6th Stage', 'Grade', 'A Stage', 'Tumor Size', 'Estrogen Status', 'Progesteron S
list
 
Building a set
dataset = {"Age","68","Face","White","Age","50"}
print(dataset)
type(dataset)
{'68', '50', 'Face', 'White', 'Age'}

set
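The printed set has five elements even though six literals were written, because a set keeps each value only once and "Age" appears twice. A minimal sketch of that behaviour, using the same six strings (the variable names are illustrative only):

```python
# The same six strings used in the set literal above; "Age" occurs twice.
items = ["Age", "68", "Face", "White", "Age", "50"]

# Converting to a set discards the duplicate; element order is not guaranteed.
unique_items = set(items)

print(len(items))         # number of literals written
print(len(unique_items))  # number of distinct values kept
```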
Building a string

datastrings = ("Marital")
print(datastrings + " Stage")
type(datastrings)
Marital Stage
str
Building a tuple
datatuples = ("A Stage","Regional",4,35,63,18,41)

print(datatuples)
type(datatuples)
('A Stage', 'Regional', 4, 35, 63, 18, 41)

tuple
Building a dictionary
a = {
"Age" : 65,
"Face" : "White",
"Marital Stage" : "Married",
"Tumor Size" : 4,
"Status" : "Alive"
}
print(a)
type(a)
{'Age': 65, 'Face': 'White', 'Marital Stage': 'Married', 'Tumor Size': 4, 'Status': 'Alive'}
dict
Building a deque
import collections
datacoll = collections.deque (["Married","Married","Divorced","Married","Married","Married"])
datacoll.append ("Single")
print (datacoll)
datacoll.appendleft ("separated")
print (datacoll)
datacoll.pop ()
print(datacoll)
datacoll.popleft()
print(datacoll)
type(datacoll)
deque(['Married', 'Married', 'Divorced', 'Married', 'Married', 'Married', 'Single'])

deque(['separated', 'Married', 'Married', 'Divorced', 'Married', 'Married', 'Married', 'Single'])
deque(['separated', 'Married', 'Married', 'Divorced', 'Married', 'Married', 'Married'])
deque(['Married', 'Married', 'Divorced', 'Married', 'Married', 'Married'])
collections.deque
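Besides appending and popping at both ends, a deque can be given a maximum length, in which case old items fall off the left side automatically. The following is a small sketch of that option; the list of marital statuses and the name `recent` are made up for illustration.

```python
from collections import deque

# A bounded deque: once three items are stored, appending on the right
# silently drops the oldest item on the left.
recent = deque(maxlen=3)
for status in ["Married", "Married", "Divorced", "Married", "Single"]:
    recent.append(status)

print(recent)
```

This is a convenient way to keep only the most recent few observations without calling popleft() by hand.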
Building a heap
import heapq
T = [4,35,63,18,41,20,8,30,103]
heapq.heapify (T)
heapq.heapreplace (T,2)
print(T)
type(T)
[2, 18, 8, 30, 41, 20, 63, 35, 103]

list
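heapq.heapreplace, used above, removes the smallest item and inserts the new one in a single step, which is why 4 disappears and 2 becomes the root. The sketch below performs the same two steps separately on the same numbers; the names `heap`, `smallest` and `three_smallest` are illustrative.

```python
import heapq

# Same tumor-size values as in the cell above.
heap = [4, 35, 63, 18, 41, 20, 8, 30, 103]
heapq.heapify(heap)              # rearrange the list into a min-heap in place

smallest = heapq.heappop(heap)   # remove and return the smallest item
heapq.heappush(heap, 2)          # insert a new item, keeping the heap property

# The three smallest items currently stored, in ascending order.
three_smallest = heapq.nsmallest(3, heap)

print(smallest, heap[0], three_smallest)
```

A heap is still an ordinary Python list, which is why type(T) reports list; only the ordering of its elements follows the heap invariant.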
Student ID (NIM): 20037061
Data Analysis, Assignment 2
1. Creating a DataFrame from Python lists
list1 = [68,50,58,58,47]
list2 = ["White","White","White","White","White"]
list3 = ["Married","Married","Divorced","Married","Married"]
import pandas as pd
list1 = list1 + [48]
list2 = list2 + ["Black"]
list3 = list3 + ["Divorced"]
dataframe_list = pd.DataFrame(list(zip(list1,list2,list3)),columns = ['Age',"Race","Marital Stage"], index = [1,2,3,4,5,6])
dataframe_list
Age Race Marital Stage
1 68 White Married
2 50 White Married
3 58 White Divorced
4 58 White Married
5 47 White Married
6 48 Black Divorced
2. Creating a DataFrame from Python tuples
tuple1 = ('Regional','Alive')
tuple2 = ('Distant','Dead')
dataframe_tuple = pd.DataFrame(tuple((tuple1,tuple2)), columns = ['a stage','status'])
dataframe_tuple
a stage status
0 Regional Alive
1 Distant Dead
3. Creating a DataFrame from an Excel file
xlsx_file = pd.read_excel("/content/Breast_Cancer.xlsx")
xlsx_file.head()
Age Race Marital Status T Stage N Stage 6th Stage differentiate Grade A Stage Tumor Size Estrogen Status Progesterone Status Regional Node Examined Reginol Node Positive
0 68 White Married T1 N1 IIA Poorly differentiated 3 Regional 4 Positive Positive 24 1
1 50 White Married T2 N2 IIIA Moderately differentiated 2 Regional 35 Positive Positive 14 5
2 58 White Divorced T3 N3 IIIC Moderately differentiated 2 Regional 63 Positive Positive 14 7
3 58 White Married T1 N1 IIA Poorly differentiated 3 Regional 18 Positive Positive 2 1
4 47 White Married T2 N1 IIB Poorly differentiated 3 Regional 41 Positive Positive 3 1
4. Creating a DataFrame from a URL
import pandas as pd
download_url=("https://storage.googleapis.com/kagglesdsdata/datasets/2396275/4045493/Breast_Cancer.csv?X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Go
df = pd.read_csv(download_url)
type(df)
df
Age Race Marital Status T Stage N Stage 6th Stage differentiate Grade A Stage Tumor Size Estrogen Status Progesterone Status Regional Node Examined Reginol Node Positive
0 68 White Married T1 N1 IIA Poorly differentiated 3 Regional 4 Positive Positive 24 1
1 50 White Married T2 N2 IIIA Moderately differentiated 2 Regional 35 Positive Positive 14 5
2 58 White Divorced T3 N3 IIIC Moderately differentiated 2 Regional 63 Positive Positive 14 7
3 58 White Married T1 N1 IIA Poorly differentiated 3 Regional 18 Positive Positive 2 1
4 47 White Married T2 N1 IIB Poorly differentiated 3 Regional 41 Positive Positive 3 1
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
4019 62 Other Married T1 N1 IIA Moderately differentiated 2 Regional 9 Positive Positive 1 1
4020 56 White Divorced T2 N2 IIIA Moderately differentiated 2 Regional 46 Positive Positive 14 8
4021 68 White Married T2 N1 IIB Moderately differentiated 2 Regional 22 Positive Negative 11 3
4022 58 Black Divorced T2 N1 IIB Moderately differentiated 2 Regional 44 Positive Positive 11 1
4023 46 White Married T2 N1 IIB Moderately differentiated 2 Regional 30 Positive Positive 7 2
4024 rows × 16 columns
5. Creating a DataFrame from a CSV file
csv_file = pd.read_csv("/content/Breast_Cancer.csv")
csv_file.head()
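Age Race Marital Status T Stage N Stage 6th Stage differentiate Grade A Stage Tumor Size Estrogen Status Progesterone Status Regional Node Examined Reginol Node Positive
0 68 White Married T1 N1 IIA Poorly differentiated 3 Regional 4 Positive Positive 24 1
1 50 White Married T2 N2 IIIA Moderately differentiated 2 Regional 35 Positive Positive 14 5
2 58 White Divorced T3 N3 IIIC Moderately differentiated 2 Regional 63 Positive Positive 14 7
3 58 White Married T1 N1 IIA Poorly differentiated 3 Regional 18 Positive Positive 2 1
4 47 White Married T2 N1 IIB Poorly differentiated 3 Regional 41 Positive Positive 3 1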
6. Creating a DataFrame from a NumPy array
import numpy as np
array = np.array([['White',68,4],['White',50,35],['White',58,63]])
dataframe_numpy = pd.DataFrame(array,columns = ['Race','Age','Tumor Size'])
dataframe_numpy
Race Age Tumor Size
0 White 68 4
1 White 50 35
2 White 58 63
7. Creating a DataFrame from pandas Series
series1 = pd.Series(['Married','Married','Discovered'])
series2 = pd.Series([3,2,2])
dataframe_series = pd.DataFrame({'Marital Status':series1, 'grade':series2})
dataframe_series
Marital Status grade
0 Married 3
1 Married 2
2 Discovered 2
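One difference between sections 6 and 7 is worth noting: a NumPy array holds a single dtype, so mixing strings and integers in np.array turns every value into a string, whereas building the frame from separate Series keeps each column's own dtype. The sketch below illustrates this with the same three rows as section 6; the names `mixed`, `from_array` and `from_series` are illustrative.

```python
import numpy as np
import pandas as pd

# Mixed strings and integers: NumPy coerces the whole array to a string dtype.
mixed = np.array([['White', 68, 4], ['White', 50, 35], ['White', 58, 63]])
from_array = pd.DataFrame(mixed, columns=['Race', 'Age', 'Tumor Size'])

# One Series per column: each column keeps its own dtype.
from_series = pd.DataFrame({
    'Race': pd.Series(['White', 'White', 'White']),
    'Age': pd.Series([68, 50, 58]),
    'Tumor Size': pd.Series([4, 35, 63]),
})

print(mixed.dtype.kind)             # 'U' means a Unicode string array
print(repr(from_array['Age'].iloc[0]))
print(from_series.dtypes)
```

If the array route is used, numeric columns would need an explicit conversion (for example with astype) before any arithmetic.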
Student ID (NIM): 20037061
Course: Data Analysis (Assignment 3)
1. Importing the dataset and the seaborn library
import seaborn as sns

titanic = sns.load_dataset('titanic')
titanic.head()
survived pclass sex age sibsp parch fare embarked class who adult_male deck embark_town alive alone
0 0 3 male 22.0 1 0 7.2500 S Third man True NaN Southampton no False
1 1 1 female 38.0 1 0 71.2833 C First woman False C Cherbourg yes False
2 1 3 female 26.0 0 0 7.9250 S Third woman False NaN Southampton yes True
3 1 1 female 35.0 1 0 53.1000 S First woman False C Southampton yes False
4 0 3 male 35.0 0 0 8.0500 S Third man True NaN Southampton no True
Describing the Titanic data
import numpy as np
titanic.describe()
survived pclass age sibsp parch fare
count 891.000000 891.000000 714.000000 891.000000 891.000000 891.000000
mean 0.383838 2.308642 29.699118 0.523008 0.381594 32.204208
std 0.486592 0.836071 14.526497 1.102743 0.806057 49.693429
min 0.000000 1.000000 0.420000 0.000000 0.000000 0.000000
25% 0.000000 2.000000 20.125000 0.000000 0.000000 7.910400
50% 0.000000 3.000000 28.000000 0.000000 0.000000 14.454200
75% 1.000000 3.000000 38.000000 1.000000 0.000000 31.000000
max 1.000000 3.000000 80.000000 8.000000 6.000000 512.329200
titanic.describe(exclude = np.number)
sex embarked class who adult_male deck embark_town alive alone
count 891 889 891 891 891 203 889 891 891
unique 2 3 3 3 2 7 3 2 2
top male S Third man True C Southampton no True
freq 577 644 491 537 537 59 644 549 537
Data Exploration
a. Viewing the first 10 rows and the last 5 rows
titanic.tail()
survived pclass sex age sibsp parch fare embarked class who adult_male deck embark_town alive alone
886 0 2 male 27.0 0 0 13.00 S Second man True NaN Southampton no True
887 1 1 female 19.0 0 0 30.00 S First woman False B Southampton yes True
888 0 3 female NaN 1 2 23.45 S Third woman False NaN Southampton no False
889 1 1 male 26.0 0 0 30.00 C First man True C Cherbourg yes True
890 0 3 male 32.0 0 0 7.75 Q Third man True NaN Queenstown no True
titanic.head(10)
survived pclass sex age sibsp parch fare embarked class who adult_male deck embark_town alive alone
0 0 3 male 22.0 1 0 7.2500 S Third man True NaN Southampton no False
1 1 1 female 38.0 1 0 71.2833 C First woman False C Cherbourg yes False
2 1 3 female 26.0 0 0 7.9250 S Third woman False NaN Southampton yes True
3 1 1 female 35.0 1 0 53.1000 S First woman False C Southampton yes False
4 0 3 male 35.0 0 0 8.0500 S Third man True NaN Southampton no True
5 0 3 male NaN 0 0 8.4583 Q Third man True NaN Queenstown no True
6 0 1 male 54.0 0 0 51.8625 S First man True E Southampton no True
7 0 3 male 2.0 3 1 21.0750 S Third child False NaN Southampton no False
8 1 3 female 27.0 0 2 11.1333 S Third woman False NaN Southampton yes False
9 1 2 female 14.0 1 0 30.0708 C Second child False NaN Cherbourg yes False
titanic.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 15 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 survived 891 non-null int64
1 pclass 891 non-null int64
2 sex 891 non-null object
3 age 714 non-null float64
4 sibsp 891 non-null int64
5 parch 891 non-null int64
6 fare 891 non-null float64
7 embarked 889 non-null object
8 class 891 non-null category
9 who 891 non-null object
10 adult_male 891 non-null bool
11 deck 203 non-null category
12 embark_town 889 non-null object
13 alive 891 non-null object
14 alone 891 non-null bool
dtypes: bool(2), category(2), float64(2), int64(4), object(5)
memory usage: 80.7+ KB
titanic = titanic.drop(columns = ['age','embarked','deck','embark_town'])

titanic
survived pclass sex sibsp parch fare class who adult_male alive alone
0 0 3 male 1 0 7.2500 Third man True no False
1 1 1 female 1 0 71.2833 First woman False yes False
2 1 3 female 0 0 7.9250 Third woman False yes True
3 1 1 female 1 0 53.1000 First woman False yes False
4 0 3 male 0 0 8.0500 Third man True no True
... ... ... ... ... ... ... ... ... ... ... ...
886 0 2 male 0 0 13.0000 Second man True no True
887 1 1 female 0 0 30.0000 First woman False yes True
888 0 3 female 1 2 23.4500 Third woman False no False
889 1 1 male 0 0 30.0000 First man True yes True
890 0 3 male 0 0 7.7500 Third man True no True
891 rows × 11 columns

b. Describing the data
titanic.describe()
survived pclass sibsp parch fare
count 891.000000 891.000000 891.000000 891.000000 891.000000
mean 0.383838 2.308642 0.523008 0.381594 32.204208
std 0.486592 0.836071 1.102743 0.806057 49.693429
min 0.000000 1.000000 0.000000 0.000000 0.000000
25% 0.000000 2.000000 0.000000 0.000000 7.910400
50% 0.000000 3.000000 0.000000 0.000000 14.454200
75% 1.000000 3.000000 1.000000 0.000000 31.000000
max 1.000000 3.000000 8.000000 6.000000 512.329200
titanic.describe().T
count mean std min 25% 50% 75% max
survived 891.0 0.383838 0.486592 0.0 0.0000 0.0000 1.0 1.0000
pclass 891.0 2.308642 0.836071 1.0 2.0000 3.0000 3.0 3.0000
sibsp 891.0 0.523008 1.102743 0.0 0.0000 0.0000 1.0 8.0000
parch 891.0 0.381594 0.806057 0.0 0.0000 0.0000 0.0 6.0000
fare 891.0 32.204208 49.693429 0.0 7.9104 14.4542 31.0 512.3292
c. Determining the size and shape of the data
titanic.size
9801
titanic.shape
(891, 11)
titanic.who.value_counts()
man 537
woman 271
child 83
Name: who, dtype: int64
d. Variance-covariance matrix
titanic.cov().style.background_gradient(cmap='coolwarm')
survived pclass sibsp parch fare adult_male alone
survived 0.236772 -0.137703 -0.018954 0.032017 6.221787 -0.132720 -0.048451
pclass -0.137703 0.699015 0.076599 0.012429 -22.830196 0.038494 0.055347
sibsp -0.018954 0.076599 1.216043 0.368739 8.748734 -0.136916 -0.315568
parch 0.032017 0.012429 0.368739 0.649728 8.661052 -0.138108 -0.230242
fare 6.221787 -22.830196 8.748734 8.661052 2469.436846 -4.428757 -6.613861
adult_male -0.132720 0.038494 -0.136916 -0.138108 -4.428757 0.239723 0.097026
alone -0.048451 0.055347 -0.315568 -0.230242 -6.613861 0.097026 0.239723

e. Correlation matrix
titanic.corr().style.background_gradient(cmap='coolwarm')
survived pclass sibsp parch fare adult_male alone
survived 1.000000 -0.338481 -0.035322 0.081629 0.257307 -0.557080 -0.203367
pclass -0.338481 1.000000 0.083081 0.018443 -0.549500 0.094035 0.135207
sibsp -0.035322 0.083081 1.000000 0.414838 0.159651 -0.253586 -0.584471
parch 0.081629 0.018443 0.414838 1.000000 0.216225 -0.349943 -0.583398
fare 0.257307 -0.549500 0.159651 0.216225 1.000000 -0.182024 -0.271832
adult_male -0.557080 0.094035 -0.253586 -0.349943 -0.182024 1.000000 0.404744
alone -0.203367 0.135207 -0.584471 -0.583398 -0.271832 0.404744 1.000000
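The correlation matrix is the covariance matrix rescaled by the standard deviations: corr(a, b) = cov(a, b) / sqrt(var(a) * var(b)). As a check, the sketch below recomputes two correlation entries from the covariance values printed in section d; the dictionary `cov` and the helper `corr_from_cov` are illustrative names, not part of the notebook.

```python
import math

# Entries copied from the variance-covariance matrix printed in section d.
cov = {
    ("survived", "survived"): 0.236772,
    ("pclass", "pclass"): 0.699015,
    ("fare", "fare"): 2469.436846,
    ("survived", "pclass"): -0.137703,
    ("pclass", "fare"): -22.830196,
}

def corr_from_cov(a, b):
    """Pearson correlation from a covariance and the two variances."""
    return cov[(a, b)] / math.sqrt(cov[(a, a)] * cov[(b, b)])

print(round(corr_from_cov("survived", "pclass"), 4))
print(round(corr_from_cov("pclass", "fare"), 4))
```

Both results agree with the corresponding cells of the correlation matrix above (about -0.3385 and -0.5495), which shows that the two tables describe the same relationships on different scales.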
f. Percentage of missing data in the cleaned DataFrame
persentase_data_kosong = titanic.isna().sum()*100/len(titanic)
nilaikosong_titanic = pd.DataFrame({'Persentase Data Kosong' : persentase_data_kosong})
nilaikosong_titanic
Persentase Data Kosong
survived 0.0
pclass 0.0
sex 0.0
sibsp 0.0
parch 0.0
fare 0.0
class 0.0
who 0.0
adult_male 0.0
alive 0.0
alone 0.0
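Every column shows 0.0 here because the columns with gaps are no longer in the frame. Applying the same formula, missing * 100 / total rows, to the non-null counts reported by titanic.info() earlier gives the percentages for the original columns. This is a sketch using only those printed counts; the names `non_null` and `missing_pct` are illustrative.

```python
# Non-null counts copied from the titanic.info() output (891 rows in total).
total_rows = 891
non_null = {"age": 714, "embarked": 889, "deck": 203, "embark_town": 889}

# Same formula as in the cell above: missing values * 100 / number of rows.
missing_pct = {
    column: (total_rows - count) * 100 / total_rows
    for column, count in non_null.items()
}

for column, pct in missing_pct.items():
    print(f"{column}: {pct:.2f}%")
```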
g. DataFrame after dropping rows with missing values
titanic=titanic.dropna()
titanic
survived pclass sex age sibsp parch fare embarked class who adult_male deck embark_town alive alone
1 1 1 female 38.0 1 0 71.2833 C First woman False C Cherbourg yes False
3 1 1 female 35.0 1 0 53.1000 S First woman False C Southampton yes False
6 0 1 male 54.0 0 0 51.8625 S First man True E Southampton no True
10 1 3 female 4.0 1 1 16.7000 S Third child False G Southampton yes False
11 1 1 female 58.0 0 0 26.5500 S First woman False C Southampton yes True
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
871 1 1 female 47.0 1 1 52.5542 S First woman False D Southampton yes False
872 0 1 male 33.0 0 0 5.0000 S First man True B Southampton no True
879 1 1 female 56.0 0 1 83.1583 C First woman False C Cherbourg yes False
887 1 1 female 19.0 0 0 30.0000 S First woman False B Southampton yes True
889 1 1 male 26.0 0 0 30.0000 C First man True C Cherbourg yes True
182 rows × 15 columns
h. Visualizing the standard deviation and variance
import numpy as np # pip install numpy
import scipy.stats # pip install scipy
import matplotlib.pyplot as plt # pip install matplotlib
mean = 10
std = 1
var = np.square(std)
plt.figure(figsize = (15, 8))
x = np.linspace(mean - 3*std, mean + 3*std, 100)
plt.plot(x, scipy.stats.norm.pdf(x, mean, std))
plt.axvline(x = mean - std, c = 'blue')
plt.axvline(x = mean + std, c = 'blue')
plt.axvline(x = mean - 2*std, c = 'red')
plt.axvline(x = mean + 2*std, c = 'red')
plt.axvline(x = mean - 3*std, c = 'black')
plt.axvline(x = mean + 3*std, c = 'black')
<matplotlib.lines.Line2D at 0x7f5e95ab93a0>
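The vertical lines in the plot mark one, two and three standard deviations on either side of the mean. For a normal distribution, the share of values inside mean ± k·std is erf(k / sqrt(2)), which gives the familiar 68-95-99.7 rule. A minimal sketch using only the standard library (the helper name `coverage` is illustrative):

```python
import math

def coverage(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} std: {coverage(k):.4f}")
```

The result does not depend on the particular mean and standard deviation chosen in the plot (10 and 1), only on the number of standard deviations k.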
i. Visualizing the quartiles
plt.figure(figsize = (10, 5))

sns.boxplot(x = titanic['age'])
plt.axvline(titanic['age'].describe()['25%'], color = 'red', label = 'Q1')
plt.axvline(titanic['age'].describe()['50%'], color = 'yellow', label = 'Q2')
plt.axvline(titanic['age'].describe()['75%'], color = 'blue', label = 'Q3')
plt.annotate('Outlier', (titanic['age'].describe()['max'],0.1), xytext = (titanic['age'].describe()['max'],0.3),
arrowprops = dict(facecolor = 'blue'), fontsize = 13 )
IQR = titanic['age'].describe()['75%'] - titanic['age'].describe()['25%']
# 'Batas Atas' = upper fence, 'Batas Bawah' = lower fence
plt.annotate('Batas Atas', (titanic['age'].describe()['75%'] + 1.5*IQR, 0.2),
xytext = (titanic['age'].describe()['75%'] + 1.5*IQR, 0.4),
arrowprops = dict(facecolor = 'blue'), fontsize = 13 )
plt.annotate('Batas Bawah', (titanic['age'].describe()['min'], 0.2),
xytext = (titanic['age'].describe()['min'], 0.4),
arrowprops = dict(facecolor = 'blue'), fontsize = 13 )
plt.legend()
<matplotlib.legend.Legend at 0x7f5e93193eb0>
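The box-plot fences follow from the quartiles: IQR = Q3 - Q1, upper fence = Q3 + 1.5·IQR, lower fence = Q1 - 1.5·IQR. As a worked example, the sketch below uses the age quartiles printed by titanic.describe() for the full 891-row dataset (Q1 = 20.125, Q3 = 38.0, min = 0.42, max = 80.0); after dropna() the quartiles of the reduced frame may differ, so these numbers are an assumption about which version of the data is plotted.

```python
# Age statistics copied from the titanic.describe() output for the full dataset.
q1, q3 = 20.125, 38.0
min_age, max_age = 0.42, 80.0

iqr = q3 - q1                    # interquartile range
upper_fence = q3 + 1.5 * iqr     # values above this are flagged as outliers
lower_fence = q1 - 1.5 * iqr     # values below this are flagged as outliers

print(iqr, upper_fence, lower_fence)
print("max is an outlier:", max_age > upper_fence)
print("min is an outlier:", min_age < lower_fence)
```

With these values the lower fence is negative, so no age can fall below it; the lower whisker simply ends at the minimum age, which is where the plot places its lower annotation.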
Student ID (NIM): 20037061
Course: Data Analysis (Assignment 4)
Data Visualization with Pandas
import pandas as pd
import matplotlib.pyplot as plt
titanic.head(15)
survived pclass sex age sibsp parch fare embarked class who adult_male deck embark_town alive alone
0 0 3 male 22.0 1 0 7.2500 S Third man True NaN Southampton no False
1 1 1 female 38.0 1 0 71.2833 C First woman False C Cherbourg yes False
2 1 3 female 26.0 0 0 7.9250 S Third woman False NaN Southampton yes True
3 1 1 female 35.0 1 0 53.1000 S First woman False C Southampton yes False
4 0 3 male 35.0 0 0 8.0500 S Third man True NaN Southampton no True
5 0 3 male NaN 0 0 8.4583 Q Third man True NaN Queenstown no True
6 0 1 male 54.0 0 0 51.8625 S First man True E Southampton no True
7 0 3 male 2.0 3 1 21.0750 S Third child False NaN Southampton no False
8 1 3 female 27.0 0 2 11.1333 S Third woman False NaN Southampton yes False
9 1 2 female 14.0 1 0 30.0708 C Second child False NaN Cherbourg yes False
10 1 3 female 4.0 1 1 16.7000 S Third child False G Southampton yes False
11 1 1 female 58.0 0 0 26.5500 S First woman False C Southampton yes True
12 0 3 male 20.0 0 0 8.0500 S Third man True NaN Southampton no True
13 0 3 male 39.0 1 5 31.2750 S Third man True NaN Southampton no False
14 0 3 female 14.0 0 0 7.8542 S Third child False NaN Southampton no True
Data Visualization
1. Scatter Plot
scatter_plot = titanic.plot.scatter(x = 'age', y = 'fare', c = 'green', title = 'Titanic Dataset', s = 30, figsize=(5,5))
Changing the background color
scatter_plot = titanic.plot.scatter(x = 'age', y = 'fare', c = 'green', title = 'Titanic Dataset', s = 30, figsize=(5,5))
scatter_plot.set_facecolor('plum')
2. Line Chart
titanic_plot_line = titanic['age'].plot.line(color = ('pink'), title = 'Titanic Dataset', subplots =True, figsize=(8,8), layout=(2,2))
titanic_plot_line
array([[<AxesSubplot:>, <AxesSubplot:>],
[<AxesSubplot:>, <AxesSubplot:>]], dtype=object)
3. Histogram
titanic['age'].plot.hist(color='maroon', bins=20, alpha=0.8, title = 'Titanic Dataset')

# bin (interval) boundaries
<AxesSubplot:title={'center':'Titanic Dataset'}, ylabel='Frequency'>

horizontal histogram
titanic['age'].plot.hist(color='maroon', bins=20, alpha=0.8, title = 'Titanic Dataset', orientation='horizontal')
<AxesSubplot:title={'center':'Titanic Dataset'}, xlabel='Frequency'>
titanic['age'].plot.hist(subplots=True, layout=(4,4), figsize=(20,20), bins=20)
array([[<AxesSubplot:ylabel='Frequency'>,
<AxesSubplot:ylabel='Frequency'>,
<AxesSubplot:ylabel='Frequency'>],
[<AxesSubplot:ylabel='Frequency'>,
<AxesSubplot:ylabel='Frequency'>]], dtype=object)
4. Bar Chart (Categorical Data)
titanic['sex'].value_counts().sort_index().plot.bar(color='olive', alpha=0.5, rot=45)
<AxesSubplot:>
5. Area Plot (Numeric Data)
titanic['fare'].plot.area(color = 'darkorange',subplots=True, figsize=(8,8))
array([<AxesSubplot:>], dtype=object)
6. Box and Box Plot
titanic['fare'].plot.box()
<AxesSubplot:>
titanic.boxplot(fontsize=8)
<AxesSubplot:>
7. Kernel Density Estimate Plot
titanic['age'].plot.kde(color = 'deeppink', bw_method=5)

<AxesSubplot:ylabel='Density'>
8. Pie Chart
titanic['who'].value_counts().plot.pie(explode=[0.2,0.1,0.1] ,autopct='%1.1f%%',shadow=True,figsize = (5,5))
<AxesSubplot:ylabel='who'>
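The labels drawn on the pie come from autopct='%1.1f%%', which formats each slice's share of the total with one decimal place. The sketch below reproduces those labels from the who counts printed by value_counts() in Assignment 3 (man 537, woman 271, child 83); the names `counts` and `labels` are illustrative.

```python
# Category counts copied from titanic.who.value_counts() (they sum to 891).
counts = {"man": 537, "woman": 271, "child": 83}
total = sum(counts.values())

# Apply the same format string that is passed to autopct in the pie call.
labels = ['%1.1f%%' % (value * 100 / total) for value in counts.values()]

for name, label in zip(counts, labels):
    print(name, label)
```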
Data Visualization with Matplotlib
1. Scatter Plot
x1 = [2,3,4]
y1 = [5,5,5]
x2 =[1,2,3,4,5]
y2 =[2,3,2,3,4]
y3 =[6,8,7,8,7]
plt.scatter(x1,y1)
plt.scatter(x2,y2,marker='v',color='r')
plt.scatter(x2,y3,marker='^',color='m')
plt.show()
2. Line Chart
x =[1,2,3,4,5,6,7,8,9]
y1=[1,3,5,3,1,3,5,3,1]
y2=[2,4,6,4,2,4,6,4,2]
plt.plot(x,y1,label='line L')
plt.plot(x,y2,label='line H')
plt.plot()
plt.xlabel('x axis')
plt.ylabel('y axis')
plt.title('Line Graph Example')
plt.legend()
plt.show()
3. Histogram

import numpy as np
import pandas as pd
iris = pd.read_csv("https://gist.githubusercontent.com/curran/a08a1080b88344b0c8a7/raw/0e7a9b0a5d22642a06d3d5b9bcbad9890c8ee534/iris.csv")
print(iris.head())
n = 5 + np.random.randn(1000)
m = [m for m in range(len(n))]
plt.bar(m,n)
plt.title("raw Data")
plt.show()
plt.hist(n, bins=20)
plt.title("Histogram")
plt.show()
plt.hist(n, cumulative=True, bins=20)
plt.title('Cumulative Histogram')
plt.show()
sepal_length sepal_width petal_length petal_width species
0 5.1 3.5 1.4 0.2 setosa
1 4.9 3.0 1.4 0.2 setosa
2 4.7 3.2 1.3 0.2 setosa
3 4.6 3.1 1.5 0.2 setosa
4 5.0 3.6 1.4 0.2 setosa
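A histogram counts how many values fall into each equal-width bin, and the cumulative histogram is the running total of those counts. The sketch below computes both by hand for 1000 normally distributed values centred on 5, mirroring the data generated above; it uses the standard library's random module instead of NumPy, and the seed and variable names are assumptions made for reproducibility.

```python
import random
from itertools import accumulate

random.seed(0)  # fixed seed so the sketch is reproducible
values = [5 + random.gauss(0, 1) for _ in range(1000)]

n_bins = 20
lo, hi = min(values), max(values)
width = (hi - lo) / n_bins

# Count the values per bin; the maximum value is placed in the last bin.
counts = [0] * n_bins
for v in values:
    index = min(int((v - lo) / width), n_bins - 1)
    counts[index] += 1

# Running total of the bin counts, as drawn by hist(..., cumulative=True).
cumulative = list(accumulate(counts))

print(counts)
print(cumulative)
```

The last cumulative value always equals the number of observations, which is a quick sanity check on any cumulative histogram.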
4. Bar Chart

x1 = [1, 3, 4, 5, 6, 7, 9]
y1 = [4, 7, 2, 4, 7, 8, 3]
x2 = [2, 4, 6, 8, 10]
y2 = [5, 6, 2, 6, 2]
plt.bar(x1, y1, label='Yellow Bar', color='y')
plt.bar(x2, y2, label='Red Bar', color='r')
plt.plot()
plt.xlabel('bar number')
plt.ylabel('bar height')
plt.title(' Bar Chart Example')
plt.legend()
plt.show()
5. Stack Plot

idxes = [1,2,3,4, 5, 6,7,8,9]
arr1 =[23,40,28, 43, 8, 44, 43, 18,17]
arr2 = [17,30,22,14,17,17,29,22,30]
arr3 = [ 15,31,18,22,18,19,13,32,39]
plt.plot([],[],color='m', label = 'D 1')
plt.plot([],[],color='k', label = 'D 2')
plt.plot([],[],color='c', label = 'D 3')
plt.stackplot(idxes,arr1,arr2,arr3, colors=['m','k','c'])
plt.title('stack plot ')
plt.legend()
plt.show()
6. Pie Chart
labels = 'S1','S2','S3'
sections = [56,66,24]
colors=['c','y','m']
plt.pie(sections, labels=labels, colors=colors, startangle = 90,

explode = (0,0.1,0),
autopct = '%1.2f%%')
plt.axis('equal')
plt.title('Pie Chart Example')
plt.show()
7. 3D Scatter Plot
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
X = titanic['age']
Y = titanic['fare']
ax.scatter(X,Y, c='r', marker='<')
ax.set_xlabel('x axis')
ax.set_ylabel('y axis')
plt.show()
8. 3D Bar Plot
fig = plt.figure()
ax = fig.add_subplot(111, projection = '3d')
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = np.random.randint(10, size=10)
z = np.zeros(10)
dx = np.ones(10)
dy = np.ones(10)
dz = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ax.bar3d(x, y, z, dx, dy, dz, color='r')
ax.set_xlabel('x axis')
ax.set_ylabel('y axis')
ax.set_zlabel('z axis')
plt.title('3D Bar Chart Example')
plt.tight_layout()
plt.show()
9. Wireframe Plot
from mpl_toolkits.mplot3d import axes3d

fig = plt.figure()
ax = fig.add_subplot(111, projection = '3d')
x,y,z = axes3d.get_test_data()
ax.plot_wireframe(x,y,z, rstride = 2, cstride = 2)
plt.title("Wireframe plot")
plt.tight_layout()
plt.show()
10. Bubble Plot
sns.scatterplot(data=titanic, x="age", y="fare",size="pclass", legend=False, sizes=(20, 2000))

plt.show()
11. scatter plot spinning 3D
fig = plt.figure(figsize = (8,8))

ax = plt.axes(projection = '3d')
z = np.linspace(0, 50, 1000)

x = np.sin(z)
y = np.cos(z)
ax.plot3D(x, y, z, 'teal')
ax.view_init(-140, 60)
plt.show()
Data Visualization with Seaborn
1. Displot

sns.displot(titanic['age'], color='rosybrown', kde=True)
<seaborn.axisgrid.FacetGrid at 0x7f13149d1880>
sns.displot(titanic['fare'], color='slateblue', kde=True)

<seaborn.axisgrid.FacetGrid at 0x7fa422849e20>
2. Scatter Plot
df = sns.load_dataset('titanic')
sns.regplot(x=df['age'], y=df['fare'], color='indigo', marker='x')
<AxesSubplot:xlabel='age', ylabel='fare'>
3. Box Plot
sns.boxplot(titanic['age'], color='forestgreen')
/usr/local/lib/python3.8/dist-packages/seaborn/_decorators.py:36: FutureWarning: Pass the following variable as a keyword a

warnings.warn(
<AxesSubplot:xlabel='age'>
4. Correlation Matrix
sns.heatmap(titanic.corr(),cmap='Blues', annot = True)

<AxesSubplot:>
5. Joint Plot (Two Numeric Variables)
sns.jointplot(x='age', y='fare', data=titanic, kind ='reg')
<seaborn.axisgrid.JointGrid at 0x7f13140233d0>
6. Violin Plot

df = sns.load_dataset('titanic')
sns.violinplot(x=df['who'], y=df['age'])
<AxesSubplot:xlabel='who', ylabel='age'>
7. Pair Plot
iris = sns.load_dataset('iris')
sns.pairplot(hue='species',data=iris)
<seaborn.axisgrid.PairGrid at 0x7fa426dcad00>