
Introduction to Data Science (CS361)

Spring semester 2023 (Tue/Thu 14:30-16:00), elective, 3:0:3

1. Instructor
Jae-Gil Lee (x 3545, jaegil@kaist.ac.kr)

2. Summary
Data science is an interdisciplinary field focused on extracting knowledge from
data sets that are typically large. This course aims to teach undergraduate students
basic data science skills. It covers the basic probability and statistics theory required
for data science; exploratory data analysis (EDA) for understanding a given
data set; and predictive analysis based on statistical or machine learning techniques.
It also discusses recent big data processing techniques and various data
science applications. Students will learn how to implement the methodologies
in Python (on Google Colab).

IMPORTANT: If you took CS492 (Introduction to Data Science) in 2021 or 2022, you
cannot enroll in this course.

3. Textbooks
• Main: Joel Grus, Data Science from Scratch: First Principles with Python, 2nd ed.,
O’Reilly, 2019.
• Auxiliary: Peter Bruce, Andrew Bruce, and Peter Gedeck, Practical Statistics for
Data Scientists: 50+ Essential Concepts Using R and Python, 2nd ed., O’Reilly, 2020.
• Auxiliary: Zhi-Hua Zhou, Machine Learning, Springer, 2021.

4. Course requirements (tentative)


• One data science competition, held courtesy of the Korea Customs Service
(대한민국 관세청): The winners of this competition will receive awards from the
Commissioner (관세청장) of the Korea Customs Service.

5. Grading policy
• Midterm exam: 35%
• Final exam: 35%
• Project (data science competition): 30%
• Class attendance: 1 point deducted after 3 absences
• Letter grades (A-F)

6. Prerequisites
• A data structures course (e.g., CS206)
• Python programming

7. Tentative schedule (subject to change)

Week  Contents
1     Introduction
      • Big data, data science, data scientist, etc.
2     Statistics and Probability Theory
      • Central tendency, dispersion, correlation, etc.
      • Bayes' theorem, normal distribution, central limit theorem, etc.
3     Hypothesis and Inference
      • Statistical hypothesis testing, confidence interval, p-value, A/B testing, etc.
4     Data Acquisition
      • Web scraping, data API (e.g., Twitter API), JSON format, etc.
5     Data Understanding
      • Exploratory data analysis
      • Data visualization (matplotlib, seaborn libraries)
6     Data Preprocessing
      • Data cleaning, data scaling, etc.
      • Dimensionality reduction, etc.
7     Machine Learning Basics
      • Modeling, training, overfitting, bias-variance tradeoff, feature extraction, etc.
8     Midterm Exam
9     Linear and Logistic Regression
      • Regression and prediction concepts
      • Gradient descent, maximum likelihood estimation, etc.
10    k-Nearest Neighbor and Naïve Bayes
      • Classification concepts
      • k-Nearest neighbor concepts and examples
      • Naïve Bayes concepts and examples
11    Decision Tree
      • Decision tree concepts and examples
      • Information theory (e.g., entropy)
12    Neural Network and Deep Learning (I)
      • Perceptron, feed-forward neural network, etc.
      • Learning theory (e.g., backpropagation)
13    Neural Network and Deep Learning (II)
      • Multi-layer perceptron (MLP)
      • Loss & optimization, activation function, softmax & cross-entropy loss, etc.
      • MNIST image classification example
14    Big Data Processing: MapReduce
      • MapReduce concepts and Hadoop/Spark
      • MapReduce algorithms
15    Relevant Topics and Applications
      • Recent trends in industry and academia
      • Recommender systems, etc.
16    Final Exam
