
Bunker Hill Community College

Computer Technology Department

CIT-137-M1
Spring 2020, Monday 6:00-9:45
Introduction to Big Data
With R and R Studio

Professor Michael Harris


Email Address: mdharris@bhcc.mass.edu
Office: (617) 228 2486
Cell (text): (617) 480-3003
COMMONWEALTH OF MASSACHUSETTS
BUNKER HILL COMMUNITY COLLEGE
CHARLESTOWN, MASSACHUSETTS

COMPUTER INFORMATION TECHNOLOGY DEPARTMENT

Introduction to Big Data with R and R Studio


COURSE OUTLINE & REQUIREMENTS

COURSE DESCRIPTION: This course provides practical foundation level training that enables immediate and
effective participation in big data and other analytics projects. It includes an introduction to big data and the
Data Analytics Lifecycle to address business challenges that leverage big data. The course provides grounding
in basic and advanced analytic methods and an introduction to big data analytics technology and tools. Labs
offer opportunities for students to understand how these methods and tools may be applied to real world
business challenges by a practicing data scientist. The course takes an "Open", or technology-neutral
approach, and includes a final lab which addresses a data science challenge by applying the concepts taught in
the course with an open source database. Prerequisite: Information Technology Problem Solving (CIT113) or
equivalent (CIT110, CIT120, CIT182 or department chair approval).

PREREQUISITES FOR THIS COURSE:

CIT-113 or equivalent (CIT-110, CIT-120, CIT-182) or department chair signature

COURSE OBJECTIVES: Students should be able to do the following after completing this course.

• Describe the role of Data Science in society, and state how data is used in a real-world environment.

• Describe various tools a data scientist uses and demonstrate how to use an open source software
package called R-Studio, a GUI (graphical user interface) for the CLI (command line interface) software
R.

• Utilize R to write functions and loops, examine and explore data, and use libraries that add functionality
for data analysis, such as: tidyverse, dplyr, ggplot2, lubridate, tidyr, stringr, reshape2.

• Utilize basic statistical parameters related to distributions and show how data can be used and analyzed
from distributions.

• Demonstrate how to turn unstructured data (messy and clean data) into structured data (tidy data).

• Demonstrate how to live-link R, Excel and Tableau to a database, and update the software as the
database updates in real time.

• Demonstrate how to search for online databases, find open data sources, and search the data for
answers to questions.

• Utilize resiliency skills, improve communication, and learn to overcome obstacles in a rapidly changing
environment while working on a complex, multistage group project.

• Show how to web scrape data, clean it, and present it to a user in a readable, often visual,
format using the tools and techniques learned throughout the course.

INSTRUCTOR: The instructor for this course is: Professor Michael Harris
Email Address: mdharris@bhcc.mass.edu
Desk Location: D123E
Telephone: 617-480-3003
Office Hours: W 4:00-5:45, Th 11:30-2:15
REQUIRED COURSE MATERIAL:
1. R and R Studio Software
2. Data Camp Online course content MOOC
3. https://sites.google.com/site/cit137sp19 Course website

SUPPLEMENTAL COURSE MATERIAL:


1. R for Data Science https://r4ds.had.co.nz/
2. https://en.wikibooks.org/wiki/Data_Science:_An_Introduction
3. http://bhcc.onthehub.com/
4. An Introduction To Data Science Textbook (note Creative Commons book, no publisher)
5. OpenIntro Statistics 3rd edition, Textbook (note Creative Commons book, no publisher)
6. OpenIntro Labs for R (labs for OpenIntro Statistics)

BLOGS AND OTHER DATA SCIENCE RESOURCES


1. http://flowingdata.com/
2. http://fivethirtyeight.com/
3. http://www.kdnuggets.com/
4. https://www.kaggle.com/

STUDENT REQUIREMENTS: To complete this course and receive a final grade and full credit, each student must:
1. Attend classes
2. Complete all homework assignments
3. Complete all required Lab Projects
4. Complete a final project and give a presentation on the project

STUDENT EVALUATION: A letter grade will be awarded at the completion of the course according to the following
weighted average:

The point-to-letter-grade equivalency is as follows:

940 - 1,000 Points   A
900 - 939 Points     A-
870 - 899 Points     B+
830 - 869 Points     B
800 - 829 Points     B-
770 - 799 Points     C+
700 - 769 Points     C
600 - 699 Points     D
Less Than 600        F
COURSE ASSIGNMENT GRID

Wk  Topic                    Datacamp Work                       Programming Assignment
1   Course Introduction
2   Introduction to RStudio  Introduction to R, Chapter 1-2
3   Subsetting               Introduction to R, Chapter 3-4
4   Loops                    Intermediate R
5   dplyr                    Intro to Dplyr
6   Functions                                                    Tourism Data Cleaning
7   Ggplot2                  Data Vis with Ggplot2 pt1
8   Ggplot2 cont                                                 Tourism Project
9   Tidy Data                Working with Data in the Tidyverse
10  EDA                      Exploratory Data Analysis in R
11  Intro to Regression      Correlation and Regression
12  Multiple Regression      Multiple & Logistic Regression      Multiple Regression Chapter 1-3
13  Machine Learning         Intro to Machine Learning
14  Data Vis
15  Tufte
16  Final Tableau Project                                        Final Project

GRADING INFORMATION AND CRITERIA:

Assignment / Project     Frequency of Assignment  Points for Each Assignment  Percentage of Total Grade
Programming Assignments  2                        100                         40
Data Camp Assignments    10                       20                          40
Tableau Assignments      2                        50                          20

ATTENDANCE POLICY: Each student is required to attend all class sessions. The Student Services Office
(617.228.2000) should be notified if a student will be absent for an extended period of time. See the Student
Handbook for more details.

MOBILE DEVICES: Cellphones are not to be used during class time. If you need to take a call, leave the
room in a manner that does not disrupt the class. Laptops and cellphones must be muted at all times.

TEACHING METHODOLOGY: This class will be taught through a problem-based learning methodology, so
your grade will be determined not by exams, but by how well you do on your homework and the class projects.

ATTENDEES: Only registered students are allowed in the classroom, and the door must be kept closed during
class time. Students who leave the room during class time must close the door behind them and
will be let back in upon their return.

STUDENT CODE OF BEHAVIOR: Students found guilty of violating the code of ethics will be subject to
the rules set out in BHCC policy. Below is a statement from the BHCC catalog:

“If it is proven that a student in any course in which he or she is enrolled has knowingly cheated or
plagiarized, this may result in a failing grade for an exam or assignment, withdrawal from the course or a
failing grade for the course. The student would also be subject to disciplinary proceedings as outlined in
the Student Handbook for violation of the Student Code of Conduct.”

POLICY FOR INDIVIDUALS WITH DISABILITIES: BHCC is committed to providing equal access to the
educational experience of all students in compliance with Section 504 of the Rehabilitation Act of 1973 and the
Americans with Disabilities Act of 1990. A student with a documented disability, who has not already done so,
should schedule an appointment at the Office for Students with Disabilities (Room D106) in order to obtain
appropriate services.

LAB ASSIGNMENTS: Documentation will be assigned for each lab. The documentation should include screen
captures of the lab and answers to the questions contained in it. If you have trouble understanding any part of a
lab, note this in the documentation as well. This class does not require a laptop or home computer. If you do not
have either, the college has resources available to you, such as:

• The College’s Computer Lab is open five (5) days per week during the summer, and its schedule is as
follows:
o Charlestown Campus, Room D111
Fall and Spring Semesters Hours:
Monday - Thursday, 7 a.m. to 9:45 p.m.
Friday, 8 a.m. to 9:45 p.m.
Saturday - Sunday, 9:00 - 3:45

• The library has computers, and its schedule is as follows:
o Monday - Friday: 8 a.m. - 8 p.m.

HOMEWORK ASSIGNMENTS: Homework assignments vary: reading, studying, preparing questions, and
writing papers. All are to be handed in on time. If a problem should arise, please use my contact information.

EXAMINATIONS: This course does not have examinations; instead, the student’s grade will be based on the
homework and lab work completed during the semester. Some of the lab work will be collaborative
project-based assignments and some will be individual assignments.

Please Note: The above schedule is subject to change.

The tools shown in this class are for educational purposes only and the instructor/BHCC is not responsible for
any of my actions. The VMs are not to be cloned from the classroom unless explicitly instructed to do so. Also,
if I choose to bring in my laptop, the instructor/BHCC is not responsible for any theft or malfunctions of the
device. Sending a reply to this email constitutes my understanding and agreement to what I have just read.
Please type “I agree with the syllabus and accept the terms therein” if you agree with these terms.

This work is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License. To
view a copy of this license, visit http://creativecommons.org/licenses/by-nc/4.0/ or send a letter to Creative
Commons, PO Box 1866, Mountain View, CA 94042, USA.
