
Data Science and Analytics Curriculum Development at Rensselaer
(and the Tetherless World Constellation)
NRC BigData Education Workshop
April 11-12, 2014, Washington DC
Peter Fox (RPI and WHOI/AOP&E) pfox@cs.rpi.edu, @taswegian
Tetherless World Constellation, http://tw.rpi.edu #twcrpi
Earth and Environmental Science, Computer Science, Cognitive Science, and
IT and Web Science
Data is a 1st class citizen

omsonreuters.com/content/press_room/science/686112
tw.rpi.edu
Research Themes
• Future Web (Hendler): Web Science, Policy, Social
• Xinformatics (Fox): Data Science, Semantic eScience, Data Frameworks
• Semantic Foundations (McGuinness): Knowledge Provenance; Ontology Engineering Environments; Inference, Trust
Application Themes
• Govt. Data (Hendler/Erickson): Open, Linked, Apps
• Env. Informatics (Fox): Ecosystems, Sea Ice, Ocean imagery, Carbon
• Health Care/Life Sciences (McGuinness): Population Science, Translational Med, Health Records
Multiple depts/schools/programs ~ 35 (Post-doc, Staff, Grad, Ugrad)
Platforms:
• Bio-nano tech center
• Exp. Media and Perf. Arts Ctr.
• Center for Comput. Innovation
• Institute for Data Exploration and Applications http://idea.rpi.edu
http://tw.rpi.edu/web/Courses
[Figure: courses (GIS4Science, Data Analytics, Data Science, Xinformatics, Semantic eScience, Web Science) placed along a continuum from Data to Information to Knowledge, with the labels Context, Experience, Creation, Gathering, Presentation, Organization, Integration, and Conversation.]
I teach and am involved:
• Data Science*, Xinformatics*, GIS for the Sciences*,
Semantic eScience*, Data Analytics*, Semantic
Technologies**
• School of Science
– ITWS and E&ES curriculum committees, SoS CC
– E&ES international student advisor
– Institute Faculty Fellow
• Institute-wide
– New Digital Humanities program
• Institute for Data Exploration and Applications
Data Science/ Xinformatics
Science has fully entered a new mode of operation. Data science is advancing the inductive conduct of science driven by the greater volumes, complexity and heterogeneity of data being made available over the Internet. Data science combines aspects of data management, library science, computer science, and physical science using supporting cyberinfrastructure and information technology. As such it is changing the way all of these disciplines do both their individual and collaborative work. Data science is helping scientists face new global problems of a magnitude, complexity and interdisciplinary nature whose progress is presently limited by lack of available tools and a fully trained and agile workforce. At present, there is a lack of formal training in the key cognitive and skill areas that would enable graduates to become key participants in e-science collaborations. The need is to teach key methodologies in application areas based on real research experience and build a skill-set. At the heart of this new way of doing science, especially experimental and observational science but also increasingly computational science, is the generation of data.

In the last 2-3 years, Informatics has attained greater visibility across a broad range of disciplines, especially in light of great successes in bio- and biomedical-informatics and significant challenges in the explosion of data and information resources. Xinformatics is intended to provide both the common informatics knowledge as well as how it is implemented in specific disciplines, e.g. X=astro, geo, chem, etc. Informatics' theoretical basis arises from information science, cognitive science, social science, library science as well as computer science. As such, it aggregates these studies and adds both the practice of information processing, and the engineering of information systems. This course will introduce informatics, each of its components and ground the material that students will learn in discipline areas by coursework and project assignments.
Modern informatics enables a new
scale-free framework approach
Mediation; generations

Borgman et al., Cyber Learning Report, NSF 2008


Data Analytics Challenge

IT and Web Science
• First IT academic program in U.S.
• First web science degree program in
U.S.
• BS in ITWS (20 concentrations) and MS
in IT (10 concentrations)
• PhD in Multi-Disciplinary Sciences

• http://itws.rpi.edu
     
Technical Tracks: Courses and Concentrations

Computer Engineering Track
• Courses:
– 1) ECSE-2610 Computer Components and Operations
– 2) ENGR-2350 Embedded Control
– 3) ECSE-2660 Computer Architecture, Networking and Operating Systems
• Concentrations: Civil Engineering; Computer Hardware; Computer Networking (hardware focus); Mechanical/Aeronautical Eng.

Computer Science Track
• Courses:
– 1) CSCI-2200 Foundations of Computer Science
– 2) CSCI-2300 Introduction to Algorithms
– 3) CSCI-2500 Computer Organization
• Concentrations: Cognitive Science; Computer Networking (software focus); Information Security; Machine and Computational Learning

Information Systems Track
• Courses:
– 1) CSCI-2200 Foundations of Computer Science
– 2) CSCI-2500 Computer Organization
– 3) Four credits from the following: CSCI-2220 Programming in Java (2 credits); CSCI-2961 Program in Python (2 credits); CSCI-2300 Introduction to Algorithms (4 credits); ITWS-49XX Web Systems Development II (4 credits)
• Concentrations: Arts; Communication; Economics; Entrepreneurship; Finance; Management Information Systems; Medicine; Pre-law; Psychology; STS

Web Science Track
• Courses:
– 1) CSCI-2200 Foundations of Computer Science
– 2) CSCI-2500 Computer Organization
– 3) One of the following: CSCI-49XX Web Systems Development II; Web/Data Course approved by ITWS Curriculum Committee
• Concentrations: Data Science; Science Informatics; Web Technologies
CHANGES TO THE MASTER’S IN INFORMATION TECHNOLOGY PROGRAM
• In Spring 2013 the MS in IT core curriculum was revised to
include Data Analytics.
• Networking core classes were replaced with Data
Analytics core classes: Data Science, Database Mining,
X-informatics, and Data Analytics (a new class offered in
Spring 2014).
• The MS in IT program also added two new concentrations:
Data Science and Analytics and Information Dominance.
• The Information Dominance concentration was developed
for a new Navy program that will be educating a select
group of 5-10 naval officers a year with the skills needed
for military cyberspace operations. Two officers started in
Fall 2013 and three began in Spring 2014.
MS in IT Required Core Courses

IT Core Area                     Course Number  Course Title                                  Term(s) Offered
Database Systems                 CSCI-4380      Database Systems                              Fall/Spring
Data Analytics                   ITWS-6350      Data Science                                  Fall
Software Design and Engineering  CSCI-4440      Software Design and Documentation             Fall
                                 ITWS-6400      X-Informatics                                 Spring
Management of Technology*        ITWS-6300      Business Issues for Engineers and Scientists  Fall/Spring
                                                (Professional Track Only)
Human Computer Interaction       COMM-6420      Foundations of HCI Usability                  Fall
                                 COMM-696X      Human Media Interaction                       Spring

* For the research track, replace ITWS-6300 Business Issues for Engineers and Scientists with one of the two semester courses ITWS-6980 Master’s Project or ITWS-6990 Master’s Thesis.

Advanced Core options for students who have previously completed a Core Course

IT Core Area                     Course Number  Course Title                                  Term(s) Offered
Database Systems                 CSCI-6390      Database Mining                               Fall
                                 ITWS-6350      Data Science                                  Fall
                                 ITWS-696X      Semantic E-Science                            Fall
Data Analytics                   CSCI-6390      Database Mining                               Fall
                                 ITWS-6400      X-Informatics                                 Spring
                                 ITWS-696X      Data Analytics                                Spring
Software Design                  CSCI-6500      Distributed Computing Over the Internet       Fall
                                 ECSE-6780      Software Engineering II                       Fall
                                 ITWS-696X      Semantic E-Science                            Fall
Management of Technology         MGMT-6080      Networks, Innovation and Value Creation       Fall
                                 MGMT-6140      Information Systems for Management            Spring
Human Computer Interaction       COMM-6620      Information Architecture                      Spring
                                 COMM-6770      User-Centered Design                          Fall
                                 COMM-696X      Interactive Media Design                      Summer
Two New MS in IT Concentrations

Data Science and Analytics

Data and Information analytics extends analysis (descriptive and predictive models to obtain knowledge from data) by using insight from analyses to recommend action or to guide and communicate decision-making. Thus, analytics is not so much concerned with individual analyses or analysis steps, but with an entire methodology. Key topics include: advanced statistical computing theory, multivariate analysis, and application of computer science courses such as data mining and machine learning and change detection by uncovering unexpected patterns in data.

Select two or three of the following courses:

Course Number           Course Name                                       Term(s) Offered
ITWS-6350               Data Science                                      Fall
ITWS-6400               X-Informatics                                     Spring
ITWS-696X               Data Analytics                                    Spring
ITWS-696X               Semantic E-Science                                Fall
ITWS-696X               Advanced Semantic Technologies*                   Spring

If only two of the above were chosen, select one more of the following courses:

Course Number           Course Name                                       Term(s) Offered
COMM-6620               Information Architecture                          Spring
CSCI-4020               Computer Algorithms                               Spring
CSCI-4150               Introduction to AI                                Fall
CSCI-6390               Database Mining                                   Fall
CSCI-4220 or CSCI-6220  Network Programming or Parallel Algorithm Design  Spring
ISYE-4220               Optimization Algorithms and Applications          Fall
ISYE-6180               Knowledge Discovery with Data Mining              Spring
MGMT-696X               Technology Foundations for Business Analytics     Fall
MGMT-696X               Predictive Analytics                              Spring

Information Dominance

The Information Dominance concentration prepares students for careers designing, building, and managing secure information systems and networks. The concentration includes advanced study in encryption and network security, formal models and policies for access control in databases and application systems, secure coding techniques, and other related information assurance topics. The combination of coursework provides comprehensive coverage of issues and solutions for utilizing high assurance systems for tactical decision-making. It prepares students for careers ranging from secure information systems analyst, to information security engineer, to field information manager and chief information officer. It is also appropriate for all IT professionals who want to enhance their knowledge of how to use pervasive information in situational awareness, operations scenarios, and decision-making.

Select two or three of the following courses:

Course Number  Course Name                                                               Term(s) Offered
ISYE-6180      Knowledge Discovery with Data Mining                                      Spring
CSCI-6960      Cryptography and Network Security I                                       Fall
ITWS-4370      Information System Security                                               Spring
CSCI-4650      Networking Laboratory I                                                   Fall/Spring
MGMT-7760      Risk Management                                                           Fall
ISYE-4310      Ethics of Modeling for Industrial Systems Engineering                     Fall

If only two of the above were chosen, select one more of the following courses:

Course Number  Course Name                                                               Term(s) Offered
CSCI-6390      Database Mining                                                           Fall
CSCI-6968      Cryptography and Network Security II                                      Spring
CSCI-4660      Networking Laboratory II                                                  Fall/Spring
ECSE-6860      Evaluation Methods for Decision Making                                    Fall
ISYE-6500      Information and Decision Technologies for Industrial and Service Systems  Fall/Spring
CSCI-496X      Computational Analysis of Social Processes                                Fall
Also at RPI
• Data Science Research Center and Data Science
Education Center (dsrc.rpi.edu, 2009)
• http://www.rpi.edu/about/inside/issue/v4n17/datacenter.html
– Over 45: research faculty, post-docs, grad students, staff,
undergraduates…
• Data is one of the Rensselaer Plan’s five thrusts
• Other key faculty
– Fran Berman (Center for Digital Society and RDA)
– Bulent Yener (DSRC Director)
– Jim Hendler (IDEA Director)
data.rpi.edu (v0.1, 2009)
Soon…
More RPI Curricula
• Environmental Science with Geoinformatics
concentration
• Bio, geo, chem, astro, materials - informatics
• GIS for Science
• Master of Science – Data Science?? (pending)
• Multi-disciplinary science program - PhD in Data
and Web Science
• DATUM: Data in Undergraduate Math! (Bennett)
• Missing – intermediate statistics
• Graphs – significant potential here – must teach!
5-6 years in…
• Science and interdisciplinary from the start!
– Not a question of: do we train scientists to be
technical/data people, or do we train technical people
to learn the science
– It’s a skill/ course level approach that is needed
• We teach methodology and principles over
technology *
• Data science must be a skill, and natural like
using instruments, writing/using codes
• Team/ collaboration aspects are key **
• Foundations and theory must be taught ***
Challenging the “Heroic”
Science Paradigm

National and international attention has been drawn to the need for a
reassessment of priorities to recognize that, in the new data era, the burden
of making data and information usable shifts from the user to the provider.
And thus … in <10 years
