~ Data Preparation ~
Faizal Mahananto

Bio
• Faizal Mahananto
• Department of Information Systems, ITS
• Academic background: ITS (bachelor's), Kumamoto Univ. (master's, doctorate)
• Research interests
  • Data analysis (time series, signal)
  • Medical, economic, engineering
• Projects
  • ICU patient monitoring and analysis
  • Outpatient rehabilitation apps
  • Predictive maintenance using ML

Training Module
• Business Intelligence (BI) is a COMPULSORY course taught in the Department of Information Systems, ITS (4 credits).
• Decision Support System is an ELECTIVE course for the Data Engineering and Business Intelligence Lab (Lab Rekayasa Data dan Intelegensi Bisnis, RDIB).
• Other courses that overlap with this training are Statistics, Data Mining, Machine Learning, and Forecasting Techniques.

Business Intelligence
Turban et al., 2010
Learning outcome: use BI to get the most out of the data the organization holds in pursuit of its goals.

Simple solution

Outline
1. Data Formatting
   1. Create data
   2. Google Colab env.
2. Basic Python
   1. Data connection
   2. Library for desc. analytics
   3. Functions
3. Auto Visualization
   1. Matplotlib
   2. Visualizing (Bar, Scatter, Line)
4. Basic Descriptive Analytics
   1. Describe function
   2. Individual measurement calculation
5. Hands on

Preparation
• Data: BI TSA.xlsx
• Access to Google Colab: https://colab.research.google.com/

Google Colab
• Cloud-based Python environment.

Working with Google Colab
• Create a notebook
• Write simple code
• Rename the notebook
• Check the Google Drive working folder

Data Formatting
• Export Excel to CSV
• CSV file to Google Drive
• Excel file to Google Drive

Basic connection
• CSV data access
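
As a minimal sketch of CSV data access, the block below writes a small sample file and reads it back with Python's standard `csv` module. The sample columns and values are hypothetical and only stand in for the CSV exported from Excel; in Colab the same file would more likely sit under the mounted Google Drive folder and be read with `pandas.read_csv`.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical sample standing in for the CSV exported from Excel.
SAMPLE = "date,sales\n2023-01-01,120\n2023-01-02,135\n2023-01-03,128\n"


def read_csv_rows(path):
    """Return the CSV rows as a list of dicts keyed by the header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


# Write the sample to a temporary file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as folder:
    csv_path = Path(folder) / "sample.csv"
    csv_path.write_text(SAMPLE, encoding="utf-8")
    rows = read_csv_rows(csv_path)

print(len(rows), "rows; first row:", rows[0])
```

Every value comes back as a string here, so numeric columns still need converting before any calculation.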
[Screenshots: the CSV file in the browser; the code that reads it]

Basic connection
• Google Sheet data access
[Screenshots: code that reads from the sheet; code that writes to it]
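
The read and write steps might look roughly like the sketch below. It assumes the notebook runs in Google Colab with the `gspread` package available; the helper names (`open_worksheet`, `read_records`, `write_row`, `rows_to_records`) are hypothetical, and the slides' own code may differ. The Colab-only imports sit inside `open_worksheet` so the rest can be tried anywhere.

```python
def rows_to_records(rows):
    """Turn [[header...], [row...], ...] into a list of dicts."""
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


def open_worksheet(sheet_title):
    """Authenticate in Colab and return the first worksheet of a sheet.

    Assumption: only works inside Google Colab with gspread installed.
    """
    from google.colab import auth
    from google.auth import default
    import gspread

    auth.authenticate_user()          # opens the Google sign-in prompt
    creds, _ = default()
    client = gspread.authorize(creds)
    return client.open(sheet_title).sheet1


def read_records(worksheet):
    """Read: fetch every cell and key each row by the header row."""
    return rows_to_records(worksheet.get_all_values())


def write_row(worksheet, values):
    """Write: append one row below the existing data."""
    worksheet.append_row(values)
```

In Colab, usage would be along the lines of `ws = open_worksheet("BI TSA")` followed by `read_records(ws)`, where the sheet title is whatever name the file has in Google Drive.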
Basic connection
• Database access

Library for desc. analytics
• Numeric libraries: Pandas and NumPy
• Access data

Python function
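
A small sketch of the two kinds of function, one that returns a value and one that does not. The function names and the visitor numbers are hypothetical and chosen only for illustration.

```python
def average(values):
    """With a return value: the caller receives the result."""
    return sum(values) / len(values)


def print_report(label, values):
    """Without a return value: the function only prints."""
    print(f"{label}: mean = {average(values):.2f}")


daily_visitors = [120, 135, 128, 142]   # hypothetical numbers

mean_visitors = average(daily_visitors)                      # usable in later code
result = print_report("Visitors per day", daily_visitors)    # result is None
```

A function without `return` still hands back `None`, which is why `result` cannot be used in further calculations.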
[Screenshots: a function with a return value; a function without a return value]

Basic Descriptive Analytics
Data analytics can be broken into four key types:
1. Descriptive, which answers the question, “What happened?”
2. Diagnostic, which answers the question, “Why did this happen?”
3. Predictive, which answers the question, “What might happen in the future?”
4. Prescriptive, which answers the question, “What should we do next?”

• Descriptive analysis answers:
  • How the business is currently performing
    • Average store visitors per day, revenue, inventory, production
  • Trends in historical data
    • Rises and falls in sales, followers, engagement rate
  • Comparisons across data and the position of the business
    • Revenue-to-expense ratio, ROI
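
The measures listed above (a daily average, a rise or fall over time, a revenue-to-expense ratio) can be sketched with Python's standard `statistics` module. All figures below are hypothetical; on a real table, the `describe()` function of pandas reports similar summary values in one call.

```python
import statistics

# Hypothetical figures, for illustration only.
daily_visitors = [120, 135, 128, 142, 150]
revenue, expenses = 5_000_000, 3_200_000

mean_visitors = statistics.mean(daily_visitors)       # current performance
median_visitors = statistics.median(daily_visitors)
spread = statistics.stdev(daily_visitors)             # sample standard deviation
change = daily_visitors[-1] - daily_visitors[0]       # rise (+) or fall (-)
ratio = revenue / expenses                            # revenue-to-expense ratio

print("mean:", mean_visitors, "median:", median_visitors)
print("std dev:", round(spread, 2), "change:", change)
print("revenue/expense ratio:", ratio)
```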
• Descriptive analysis gives a high-level view of the business and its data.
• However, it says little about what caused the current situation or what will happen next.
• Descriptive analysis workflow
  1. Decide which business measure or variable to analyze
  2. Identify the data required
  3. Prepare the data
  4. Run the analysis required
  5. Visualize the data
• Descriptive statistics
• Examples
• Data trend
  • https://www.kaggle.com/datasets/kandij/electric-production
  • kaggle datasets download -d kandij/electric-production
  • Code (dataelc is the DataFrame read from Electric_Production.csv in the Kaggle steps)
    • import numpy as np
    • from sklearn.linear_model import LinearRegression
    • x = np.arange(len(dataelc)).reshape(-1, 1)  # time index as the single feature
    • model = LinearRegression()
    • model.fit(x, dataelc['Value'])
    • r_sq = model.score(x, dataelc['Value'])
    • print('R2:', r_sq)
    • print('intercept:', model.intercept_)
    • print('slope:', model.coef_)

Access data from Kaggle
• Create a Kaggle account.
• Download your personal API token (kaggle.json)
• Install the Kaggle client
  • !pip install -q kaggle
• Import the API token
  • from google.colab import files
  • files.upload()  # choose kaggle.json
  • !mkdir -p ~/.kaggle
  • !cp kaggle.json ~/.kaggle/
  • !chmod 600 ~/.kaggle/kaggle.json
  • !kaggle datasets download -d kandij/electric-production
  • !unzip electric-production.zip
• Access with pandas
  • import pandas as pd
  • dataelc = pd.read_csv('/content/Electric_Production.csv')

Visualize
• import numpy as np
• from matplotlib import pyplot as plt
• x = np.arange(len(dataelc))
• plt.scatter(x, dataelc['Value'], color='g')  # observations
• plt.plot(x, model.predict(x.reshape(-1, 1)), color='g')  # fitted trend line
• plt.show()
• plt.scatter(x, dataelc['Value'], color='r')
• plt.plot(x, dataelc['Value'], color='r')  # raw series as a line
• plt.show()

Hands on
• Data → business variable

Access data from database
• hostname: relational.fit.cvut.cz
• port: 3306
• username: guest
• password: relational
• database: AdventureWorks2014
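
One way to use these connection details is sketched below. It assumes the server speaks MySQL (suggested by port 3306) and that the `pymysql`, `sqlalchemy` and `pandas` packages are installed in Colab. The helper names `build_mysql_url` and `run_query` are hypothetical, and the driver choice is an assumption rather than something the slides specify.

```python
from urllib.parse import quote_plus


def build_mysql_url(user, password, host, port, database, driver="pymysql"):
    """Build an SQLAlchemy-style connection URL (the driver is an assumption)."""
    return (
        f"mysql+{driver}://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


def run_query(url, query):
    """Run a query and return the result as a pandas DataFrame.

    Imports are local so the URL helper works without these packages.
    """
    import pandas as pd
    from sqlalchemy import create_engine

    engine = create_engine(url)
    return pd.read_sql(query, engine)


URL = build_mysql_url(
    "guest", "relational", "relational.fit.cvut.cz", 3306, "AdventureWorks2014"
)
print(URL)

# In Colab, after `!pip install -q pymysql sqlalchemy`:
# tables = run_query(URL, "SHOW TABLES")
```

Listing the tables first is a reasonable way to find which one holds the business variable chosen for the hands-on exercise.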