MACHINE LEARNING FOR HIGH SCHOOL STUDENTS
A Project
Presented to the faculty of the Department of Computer Science
California State University, Sacramento
Submitted in partial satisfaction of
the requirements for the degree of
MASTER OF SCIENCE
in
Computer Science
by
Siddharth Chittora
FALL
2019
© 2019
Siddharth Chittora
ALL RIGHTS RESERVED

A Project
by
Siddharth Chittora
Approved by:
__________________________________, Committee Chair

Dr. Anna Baynes
__________________________________, Second Reader

Dr. Haiquan Chen
____________________________
Date
Student: Siddharth Chittora
I certify that this student has met the requirements for format contained in the
University format manual, and this project is suitable for electronic submission to the
library and credit is to be awarded for the project.
__________________________, Graduate Coordinator ___________________

Dr. Jinsong Ouyang Date
Department of Computer Science
Abstract
of
MACHINE LEARNING FOR HIGH SCHOOL STUDENTS
by
Siddharth Chittora
We live in a time that is witnessing a boom in the field of Computer Science, especially in the subfields of Machine Learning and Artificial Intelligence. Yet there is a noticeable gap in the number of qualified job applicants who can help solve the Machine Learning problems of the day. There is a current movement to introduce Computer Science education much earlier, to K-12 students. The hope is that, with an earlier introduction and better continuous training, students will be properly trained, interested in studying Computer Science in college, and able to obtain these coveted positions after they graduate.
Given this hope and movement, there need to be more applications and instructional plans to support K-12 teachers as they bring more Computer Science topics into the classroom. One such area of Computer Science is Machine Learning. Some tools and curricula are currently available for broader Computer Science areas. However, there is a lack of instructional methods and tools for teaching Machine Learning to High School Students. There is an especially strong lack of visually interactive lesson plans that help high schoolers learn basic concepts in Machine Learning.
To solve this problem, I present a visually interactive instructional tool intended to introduce basic concepts of Machine Learning to High School Students. Studies have shown that content delivered in a visual and interactive form, along with textual information, is more likely to be retained in memory [1]. The tool is interactive. It also gives High School Students a taste of the mathematical background required for Machine Learning and Artificial Intelligence. The datasets and problems are chosen so that High School Students can relate to them. For example, instead of solving problems like stock price prediction and network intrusion detection, I solve problems like Rain/No Rain Prediction and Temperature Prediction, which are likely to be more interesting for High School Students. Students are more likely to stay focused and interested if they have a feeling of achievement after completing a topic, so they start with simpler yet interesting problems and work their way up. Through this tool, a student can brush up on the basic mathematics required for Machine Learning and Artificial Intelligence. The student can also explore classical Machine Learning and Artificial Intelligence algorithms, such as K Nearest Neighbors, Basic Linear Classification, Linear Regression, and Naïve Bayes, in a more interactive way through visualizations such as graphs and charts.
_______________________, Committee Chair

Dr. Anna Baynes
_______________________
Date
ACKNOWLEDGEMENTS
I would like to thank Prof. Anna Baynes, my project advisor, for all the knowledge and guidance she has provided me during this project. She has been very supportive and has led me with care and caution throughout the project.
I would also like to thank Prof. Haiquan Chen for giving me his time and input as second reader.
I would also like to thank my parents, Rajendra and Bharti, for their constant support and love, and for making me believe that I can accomplish anything when I put my mind to it. I would like to thank my sister, Palak, for always reminding me to chase my dreams.
I would also like to thank my friends for their support, for standing by my side in the worst times, and for lifting me up.
Lastly, I would like to thank the Department of Computer Science, Dr. Jinsong
Ouyang and Dr. Nikrouz Faroughi for giving me this wonderful opportunity to learn and
grow.
TABLE OF CONTENTS
Acknowledgments
List of Tables
List of Figures
Chapter
1. INTRODUCTION
2. BACKGROUND
3. DESIGN CHOICE
4. IMPLEMENTATION AND DEVELOPMENT
5. TOOL WALKTHROUGH
6. CONCLUSION
Bibliography
LIST OF TABLES
1. SVG Path Commands and Parameters
LIST OF FIGURES
1. Types of Charts Based on Visual Encodings
2. Data Science Pipeline Diagram
3. Data Science Pipeline Diagram (After Interaction)
4. Data Distribution Chart
5. Data Distribution Chart (After Interaction)
6. Initial Classification Dashboard
7. Initial Regression Dashboard
8. Improved Classification Dashboard
9. Improved Regression Dashboard
10. Code Snippet – SVG Path
11. Code Snippet – SVG Line
12. Code Snippet – SVG Line Generator
13. Code Snippet – SVG Area Generator
14. Code Snippet – Implementing SVG Area Path
15. Code Snippet – SVG Circle
16. Code Snippet – D3 Force Simulation
17. Code Snippet – ForceX and ForceY
18. Code Snippet – Defining D3 Brush
19. Code Snippet – Defining Clip Path
20. Code Snippet – Adding Brush
21. Code Snippet – Tool Tip
22. Code Snippet – Original Data
23. Code Snippet – Normalized Data
24. Code Snippet – SKLearn Naïve Bayes
25. Code Snippet – Classification Metrics
26. Code Snippet – SKLearn Linear Regression
27. Code Snippet – Regression Metrics
28. Data Science Pipeline – Beginning – Healthcare Industry
29. Data Science Pipeline – Completed – Healthcare Industry
30. Data Distribution Chart - Initial State
31. Data Distribution Chart - After Interaction
32. Classification Dashboard - Initial State
33. Classification Dashboard - Data Loaded
34. Classification Dashboard - After Prediction
35. Regression Dashboard - Initial State
36. Regression Dashboard - After Prediction
Chapter 1: Introduction
Machine Learning is a growing subfield of Computer Science. Because it is ubiquitous and involved in many facets of our lives, it can be used to attract more students towards Computer Science. Machine Learning and Artificial Intelligence are present in many products and applications in our world and in our future technological goals. Therefore, it would be beneficial for High School Students to have an understanding of this area. We can find a wide variety of online platforms [2, 3] and courses [4, 5] for learning and experimenting with Machine Learning and Artificial Intelligence concepts. However, these resources require students to have a basic understanding of programming and other related concepts. There is no concrete learning path in Machine Learning and Artificial Intelligence that caters to High School Students [6, 7]. Such a path would let a student with no prior programming experience understand the fundamentals of Machine Learning and see the steps involved in solving a simple Machine Learning problem [8, 9].
There can be many reasons for the unavailability of learning strategies for Machine Learning and Artificial Intelligence. One of them is the lack of visually interactive content on Machine Learning and Artificial Intelligence that a High School Student can comprehend. Most online Machine Learning and Artificial Intelligence courses expect students to know how to code in a language such as Python or R. These courses also tend to focus almost exclusively on solving the Machine Learning problem itself. Other important steps of a Data Science pipeline, such as data collection, visualization, processing, and analysis, might not be incorporated into the Machine Learning lesson [6]. Most High School Students do not know how to code, and the courses available online can be overwhelming and intimidating. Students need to be made familiar with the background of the field, including the required mathematical fundamentals and the languages used, with the help of relevant examples. They need to know the importance of data and the entire process from collection to analysis. There have been a few attempts, such as a curriculum that introduces K-12 students to computational thinking, presentations [10], and video lectures. However, there is still room for improvement [11, 12].
To address this problem, I present a novel visual interactive tool that delivers basic concepts of Machine Learning and Artificial Intelligence. Studies have shown that content delivered through a visual and interactive medium, along with textual information, is more likely to be retained in memory [1]. The tool is interactive. It also gives High School Students a taste of solving a Machine Learning and Artificial Intelligence problem by explaining the end-to-end process and the steps involved in the solution. The selection of datasets plays a vital role [7] in creating a tool for school students. For example, an average High School Student is unlikely to know much about problems like stock price prediction and network intrusion detection, so I use problems on Weather Data Analysis, Rainfall Prediction, and Temperature Prediction instead. Students are more likely to stay focused and interested if they have a feeling of achievement after completing a topic, so they should start with simpler yet interesting problems and work their way up. Through this tool, a student can understand basic concepts of Machine Learning and see a Machine Learning algorithm in action by interacting with visualizations such as graphs and charts.
Chapter 2: Background
Machine Learning is a growing subfield of Computer Science (CS). Machine Learning and Artificial Intelligence are shaping the future. Our future workforce, today's High School Students, needs to understand the benefits and methods of learning this difficult topic. In this chapter, I present related work and background on attempts to make Computer Science and Machine Learning more approachable for High School Students.
2.1. Exploring Computer Science (ECS):
ECS [13] is an introductory curriculum for K-12 students designed to engage them in computational thinking and practice. ECS was originally designed for the Los Angeles Unified School District to broaden participation in computing, particularly for girls and students of color. After its initial success, it gained national prominence.
ECS was designed to familiarize High School Students with the breadth of Computer Science by exploring topics that are engaging and accessible. The curriculum does not build the entire course around a particular tool or programming language. Instead, it focuses on the concepts of Computer Science and helps students understand why a certain tool or language is used to address a particular type of problem.
The goal of ECS is to develop and strengthen the concepts of algorithms, problem solving, and programming through problems set in contexts relevant to students' lives. The curriculum was developed around Computer Science content and computational practices so that students get a feel for what computer scientists do.
Although it gives a sound understanding of Computer Science fundamentals, the ECS curriculum lacks a detailed learning path for Machine Learning. It does not offer hands-on Machine Learning projects that let students try and experiment with different Machine Learning algorithms.
2.2. Related Research:
There have been a few studies attempting to introduce Data Science to High School Students. One study [7] proposed conducting workshops in which students work on Machine Learning examples to understand how simple algorithms work. These workshops focus on collecting data, processing it, and drawing inferences from the predictions. Another study proposes setting up a Machine Learning Laboratory [14]. It primarily focuses on the K-Means algorithm, which has limited use cases.
The related work I found does not focus on creating a solid learning plan that simplifies the fundamentals of Machine Learning to the level of a High School Student with limited background in the area.
2.3. Exploratory Visualizations:
A visualization in which the designer provides interactive features so that users can explore the data and identify insights themselves is called an Exploratory Visualization. It allows users to understand the data and its possibilities through different visual representations. These visualizations are created with a high level of granularity and are used to explore the different stories hidden in the data. This form of exploratory analysis is part of the quantitative analysis of the data.
2.4. Explanatory Visualization:
A visualization in which the designer has a specific message to convey and narrates it to the viewer as a story is called an Explanatory Visualization. To create these visualizations, the designer uses data processing and data analytics to extract the information they want the viewer to notice. Complex problems are broken into smaller subproblems so that the viewer can focus more easily on the important information in the data.
2.5. Visual Encoding:
The process of creating visualizations from combinations of shapes and textures, together with color, angle, slope, and volume, is called visual encoding. A single concept can be expressed through many different visual encodings. It is therefore critical in the design process to choose encodings that convey the content accurately to the viewers. The best visual encoding requires the minimum effort from the viewer to understand and analyze the visualization.
Visual design principles provide guidelines for making design choices [15]. A simpler visualization helps users understand the content it expresses.
Chapter 3: Design Choice
Before making design choices for any tool, its fundamental requirements need to be gathered. Designing a tool involves three main steps: System Design, Visual Design, and Backend Design. Each step is executed to minimize redundancy and maximize reusability. In this chapter, I discuss the methodology used to design the interactive visualizations of the educational Machine Learning tool.
3.1. System Requirement Analysis
The goal of this system is to build an interactive teaching tool that introduces the fundamentals of Machine Learning to High School Students with the help of interactive visualizations. The tool needs to use a dataset and solve problems that a High School Student can understand. The tool is web-based and requires a web framework that can support the Python scripts that handle the Machine Learning tasks.
A novel visualization tool is required to present the data received from the Python backend in a format that a High School Student can easily understand. The visualizations must correctly represent the data for the Machine Learning problem being solved.
3.2. System Design
The tool consists of two parts. The first is a custom Flask API that performs all the Machine Learning related tasks and delivers data to the frontend as Comma Separated Values (CSV). The second is a set of interactive dashboards that demonstrate the process of solving Machine Learning problems using the data obtained from the Flask API. Because the tool caters to High School Students, the selection of the dataset and of the Machine Learning problem statements is very important. The tool uses a weather dataset [16] to demonstrate the process of solving the two types of Supervised Learning problems, namely Classification and Regression. The tool also uses an interactive diagram to explain the Data Science Pipeline. The diagram shows the end-to-end process of solving a Data Science problem, from obtaining the raw data to processing it and drawing inferences from it.
3.3. Visual Design
The aim of all the visualizations in the tool is to give High School Students a basic understanding of Machine Learning and of how it can be used to solve everyday problems. It is very important to use appropriate visual encodings so that the information in the data used to solve a problem is represented correctly. There are four main types of visualizations in the tool:
1. Data Science Pipeline Diagram
2. Data Distribution Chart
3. Line Charts
4. Area Charts
Each of these visualizations uses different visual encodings based on the information it conveys. Charts can be put into the four categories shown in Figure 1:
1. Relationship
2. Comparison
3. Distribution
4. Composition
Figure 1: Types of Charts Based on Visual Encodings [17]
The CSV files obtained from the backend are the input for the visualizations and contain data related to comparison and distribution.
The Data Science Pipeline diagram, shown in Figure 2, is an interactive visualization of the process of solving a typical Data Science problem. Unlike a typical static diagram, it lets students interact with its components and gain insight into the different steps involved in solving a Data Science problem, as shown in Figure 3.
Figure 2: Data Science Pipeline Diagram
Figure 3: Data Science Pipeline Diagram (After Interaction)
The Data Distribution Chart, shown in Figures 4 and 5, is an interactive Bubble Chart that shows how the data is distributed among the different labels in the Weather dataset. A Bubble Chart is used here because, for this dataset, it clearly represents how the data is distributed. The chart is accompanied by a legend that tells the student what the color of each circle means. The colors are chosen to represent the information the chart delivers correctly without distracting the students. The color palette consists of red, pink, light green, and green and is self-explanatory.
Figure 4: Data Distribution Chart
Figure 5: Data Distribution Chart (After Interaction)
The Line Charts and Area Charts represent the timeline of different weather conditions in the dataset and can be interacted with in different ways. These charts are part of the dashboards that demonstrate the steps involved in solving the two types of Machine Learning problems targeted by this tool. The design process of the dashboards had two stages.
The initial design of the dashboard had multiple Line Charts side by side, shown in Figure 6 and Figure 7. This allowed students to compare the data for different weather conditions easily. Students could interact with a chart by hovering over the graph to see the actual weather data in a tool tip. Beneath each Line Chart was an Area Chart representing the same data, which could be used to zoom into the graphs through brush interaction. The legend for the charts followed the Area Chart, and its color scheme was carefully chosen to represent each weather condition.
Above the charts were three buttons used to select the type of data plotted on the charts. Below the legends were sections that displayed the outcomes of the predictions made by the Machine Learning models. However, this design had some usability limitations. Placing three charts alongside each other left no space for other essential information about model performance. Also, the outcomes were displayed at the bottom of the page, where students could misinterpret them.
Figure 6: Initial Classification Dashboard
Figure 7: Initial Regression Dashboard
The main reason for redesigning the UI was space constraints, which add challenges to the design process. If the entire webpage is filled with visualizations, no space is left to receive the viewer's input or to provide essential information about the visualizations. According to visual design principles [15], a single webpage should contain the visualization as well as a user information section with clear guidelines on how the user can interact. Given these space constraints, the major visual design decisions were based on visual perception principles.
i. Pre-attentive Processing
Pre-attentive processing is the ability of the visual system to detect certain properties, such as differences in color, size, or orientation, before conscious attention is applied. It helps in designing distinct visual depictions whose differences the user recognizes instantly, without deliberate thought.
The final design has major improvements in usability, in the dashboard UI, and in how information is displayed on the dashboard, giving it a cleaner look.
Figure 8: Improved Classification Dashboard
The new Classification dashboard has three panels, shown in Figure 8. The left panel consists of three sections: Data, Graph, and Models. The Data section selects the type of data to display. The Graph section switches between the charts for Temperature, Humidity, and Rainfall. The Models section selects which Machine Learning models perform the prediction.
The center panel displays the graphs, and the charts keep the same interactions as in the previous design. The right panel displays information: the output labels and the performance metrics of the models used for prediction.
Figure 9: Improved Regression Dashboard
The new Regression Dashboard has two panels, shown in Figure 9. The left panel has two sections. The Data section selects the type of data to display, and the Metrics section shows the performance metrics of the Machine Learning model used for prediction. The right panel holds two charts side by side, which makes it easier to compare the original data with the predicted data. The graphs in the new design keep the same interactions as in the previous design.
3.4. Backend Design
The backend design involves querying the custom Flask API to fetch the data required for the chart being displayed. Every type of data and Machine Learning model that the student can select has an assigned function in the API. For every query that the frontend makes to the API, the weather dataset is accessed and the desired data processing operations are performed on it. The result is then returned as a JSON object to the part of the frontend that made the request.
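The one-function-per-selection design described above can be sketched as a lookup table that maps request names to handler functions. The sketch below is written in Python like the backend but uses only the standard library. The handler names, field names, and made-up sample rows are hypothetical stand-ins. In the real tool, a Flask route would receive the value sent by the frontend and pass it to a dispatcher like `handle_query`.

```python
import json

# Made-up sample rows standing in for the weather dataset (hypothetical values).
WEATHER = [
    {"Date": "day-1", "MinTemp": 10.0, "MaxTemp": 20.0, "Rainfall": 0.0, "RainToday": "No"},
    {"Date": "day-2", "MinTemp": 12.0, "MaxTemp": 18.0, "Rainfall": 4.0, "RainToday": "Yes"},
]

def temperature_data(records):
    """Keep only the fields a temperature chart needs."""
    return [{k: r[k] for k in ("Date", "MinTemp", "MaxTemp")} for r in records]

def rainfall_data(records):
    """Keep only the fields a rainfall chart needs."""
    return [{k: r[k] for k in ("Date", "Rainfall", "RainToday")} for r in records]

# One handler per selectable data type (hypothetical names).
HANDLERS = {
    "temperature": temperature_data,
    "rainfall": rainfall_data,
}

def handle_query(kind):
    """Dispatch a frontend request to its handler and serialize the result as JSON."""
    handler = HANDLERS.get(kind)
    if handler is None:
        return json.dumps({"error": f"unknown data type: {kind}"})
    return json.dumps(handler(WEATHER))
```

With this layout, supporting a new data type or model only requires registering one more function in `HANDLERS`.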
The resulting JSON object contains the following weather details: Date, Minimum Temperature, Maximum Temperature, Humidity at 9 am, Humidity at 3 pm, Rainfall in millimeters, Rain Today, and Rain Tomorrow. All weather details are numeric except Rain Today and Rain Tomorrow. I use these details to make predictions according to the problem being addressed.
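Rain Today and Rain Tomorrow are the only non-numeric details, so they must be turned into numbers before a model can use them. The sketch below is minimal and makes several assumptions. It assumes the raw rows arrive as strings, as they would from Python's `csv.DictReader`, that the column names are the hypothetical ones below, and that the labels are Yes/No. The tool's actual column names and encoding are not shown in this section.

```python
# Hypothetical column names for the details listed above.
NUMERIC_FIELDS = ("MinTemp", "MaxTemp", "Humidity9am", "Humidity3pm", "Rainfall")
LABEL_FIELDS = ("RainToday", "RainTomorrow")

def encode_record(raw):
    """Convert one raw weather row (all values are strings) into model-ready values."""
    encoded = {"Date": raw["Date"]}
    for field in NUMERIC_FIELDS:
        encoded[field] = float(raw[field])
    for field in LABEL_FIELDS:
        # "Yes" becomes 1 and "No" becomes 0 so the label can serve as a class.
        value = raw[field].strip().lower()
        if value not in ("yes", "no"):
            raise ValueError(f"{field} must be Yes or No, got {raw[field]!r}")
        encoded[field] = 1 if value == "yes" else 0
    return encoded
```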
The Flask API performs a variety of tasks on the data. It fetches the weather dataset. It processes and cleans the data so that the Machine Learning models use only the part they require, which reduces their processing time. It also normalizes the data, which helps the models converge and improves their performance. Finally, the API trains the Machine Learning models with the appropriate data and returns the predictions to the frontend.
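This section does not say which normalization the API applies. One common choice is min-max scaling, which maps each numeric column into [0, 1] using x' = (x - min) / (max - min). scikit-learn's `MinMaxScaler` performs the same per-column rescaling. The standard-library sketch below assumes that choice. `normalize_column` and `normalize_records` are hypothetical helpers.

```python
def normalize_column(values):
    """Min-max scale a list of numbers into [0, 1].

    A constant column has max == min, so it is mapped to zeros
    instead of dividing by zero.
    """
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]

def normalize_records(records, fields):
    """Return copies of the records with the given numeric fields min-max scaled."""
    scaled = {f: normalize_column([r[f] for r in records]) for f in fields}
    return [
        {**r, **{f: scaled[f][i] for f in fields}}
        for i, r in enumerate(records)
    ]
```

In practice, the minimum and maximum should be taken from the training data only and then reused to scale any test data.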
Because the tool is web-based, special care was taken in selecting the dataset, the Machine Learning problems to be solved, and the models used to solve them. The goal was a good learning experience with little waiting time for students. I have selected two easy problems, one for each type of supervised learning problem.
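For the regression problem, a one-feature linear regression already shows the core idea. It fits y = slope * x + intercept by ordinary least squares and judges the fit with root mean squared error (RMSE) and R². scikit-learn's `LinearRegression` minimizes the same squared-error objective, and `sklearn.metrics` provides `mean_squared_error` and `r2_score`. The standard-library sketch below is illustrative only. This section does not state which feature predicts which temperature in the tool, so the inputs are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def rmse(actual, predicted):
    """Root mean squared error: typical size of a prediction error, in the target's units."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def r2_score(actual, predicted):
    """Coefficient of determination: 1.0 means a perfect fit."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot
```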

Chapter 4: Implementation and Development
In this chapter, I discuss the coding phase and the implementation details, along with the tools and platforms used. The data visualizations, which are the main focus of the application, are built with D3.js and JavaScript. The UI is created using HTML5, CSS3, and Bootstrap. The frontend queries the custom Flask API for the data behind the interactive visualizations. The API relies on Python Data Science libraries such as scikit-learn, Pandas, and NumPy.
4.1. Data Visualizations
The data visualizations are built according to the design decisions made earlier. Three types of charts represent the data: Bubble Charts, Line Charts, and Area Charts. The Line Charts, the Area Charts, and the Data Science Pipeline diagram are created using SVG path elements generated with D3.js. The Bubble Chart uses a D3.js force simulation to animate the bubbles during transitions.
4.1.1. Path
Scalable Vector Graphics (SVG) paths, created with D3, are used to construct the Data Science Pipeline diagram and the Line Charts. In a Line Chart, line paths represent trends in weather phenomena. SVG paths can be used to create a variety of design elements, such as rectangles, circles, ellipses, polylines, polygons, straight lines, and curves.
The shape of an SVG path element is defined by a single attribute, d. The SVG path mini-language consists of a series of commands and parameters that define the d attribute. These commands and parameters, shown in Table 1, form a sequential set of instructions for drawing an SVG path.
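The helpers below assemble d strings from the commands in Table 1 to make the mini-language concrete. They are written in Python, the language of the tool's backend, only to keep all sketches in one language. In the browser, D3 would set the same strings on a path element's d attribute. The function names are hypothetical.

```python
def fmt(n):
    """Format a number compactly (10.0 -> '10')."""
    return f"{n:g}"

def polyline_path(points):
    """M (moveto) to the first point, then one L (lineto) per remaining point."""
    if not points:
        return ""
    (x0, y0), *rest = points
    return f"M{fmt(x0)},{fmt(y0)}" + "".join(f"L{fmt(x)},{fmt(y)}" for x, y in rest)

def rect_path(x, y, width, height):
    """Trace a rectangle using only H (horizontal-lineto) and V (vertical-lineto)."""
    return (f"M{fmt(x)},{fmt(y)}"
            f"H{fmt(x + width)}V{fmt(y + height)}"
            f"H{fmt(x)}V{fmt(y)}")
```

For example, `polyline_path([(0, 0), (10, 20), (30, 5)])` yields `M0,0L10,20L30,5`. A rectangle path would usually end with Z (closepath), which Table 1 does not list. The final V back to the starting point traces the same four edges.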
Table 1: SVG Path Commands and Parameters [18]
Command | Name | Parameters | Repeatable | Explanation
M (m) | moveto | x, y | Yes | Move the pen to a new location. No line is drawn. All path data must begin with a moveto command.
L (l) | lineto | x, y | Yes | Draw a line from the current point to the point (x, y).
H (h) | horizontal-lineto | x | Yes | Draw a horizontal line from the current point to x.
V (v) | vertical-lineto | y | Yes | Draw a vertical line from the current point to y.
C (c) | curveto | x1 y1 x2 y2 x y | Yes | Draw a cubic Bézier curve from the current point to the point (x, y), using (x1, y1) as the control point at the beginning of the curve and (x2, y2) as the control point at the end of the curve.
A (a) | elliptical-arc | rx ry x-axis-rotation large-arc-flag sweep-flag x y | Yes | Draw an elliptical arc from the current point to (x, y). The size and orientation of the ellipse are defined by two radii (rx, ry) and an x-axis-rotation, which indicates how the ellipse as a whole is rotated relative to the current SVG coordinate system. The center (cx, cy) of the ellipse is calculated automatically to satisfy the constraints imposed by the other parameters. The large-arc-flag and sweep-flag contribute to this calculation and help determine how the arc is drawn.
Uppercase commands use absolute coordinates; lowercase commands use coordinates relative to the current point.
The Data Science Pipeline diagram has been constructed using the Arc command of SVG paths together with a line generator, shown in Figure 10.
Figure 10: Code Snippet - SVG Path
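The pipeline's curved connectors could be produced with the A command. The sketch below illustrates the general approach under stated assumptions and is not the code in Figure 10. It places hypothetical stage centers on one horizontal line. It then joins each neighboring pair with a circular arc whose radius is half the gap, which makes each connector a semicircle.

```python
def fmt(n):
    """Format a number compactly (50.0 -> '50')."""
    return f"{n:g}"

def arc_path(x1, y1, x2, y2, radius, large_arc=0, sweep=1):
    """Move to (x1, y1), then draw an elliptical arc to (x2, y2).

    rx and ry are both set to `radius`, so the arc is circular, and the
    x-axis-rotation is 0. The two flags pick which of the candidate arcs is drawn.
    """
    return (f"M{fmt(x1)},{fmt(y1)} "
            f"A{fmt(radius)},{fmt(radius)} 0 {large_arc},{sweep} {fmt(x2)},{fmt(y2)}")

def pipeline_connectors(stage_xs, y):
    """One semicircular connector between each pair of neighboring stage centers."""
    return [
        arc_path(a, y, b, y, (b - a) / 2)  # half the gap as radius gives a semicircle
        for a, b in zip(stage_xs, stage_xs[1:])
    ]
```

Setting `sweep` to 0 draws each connector on the other side of the line joining the stages.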
The code in Figure 11 shows an alternative way to create a straight line.
Figure 11: Code Snippet - SVG Line
The Line Charts are created using D3 line generators, shown in Figure 12. Here, .datum() binds the data used to create the line. The line generator is defined with d3.line() and x and y accessor functions, and it is used to set the d attribute of the path. For every data point, the generator computes the x and y coordinates from the bound data and joins them into a single path, so the coordinates of each line segment never have to be specified by hand.
Figure 12: Code Snippet - SVG Line Generator
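What d3.line() computes can be mirrored in a few lines of code. The Python sketch below is only an analogue written for illustration. `linear_scale` plays the role of `d3.scaleLinear()`. `line_path` plays the role of `d3.line().x(...).y(...)` with D3's default linear curve, which joins the points with M and L commands. The data points and pixel sizes are hypothetical.

```python
def linear_scale(domain, range_):
    """Map a value from [d0, d1] onto [r0, r1], like d3.scaleLinear()."""
    (d0, d1), (r0, r1) = domain, range_
    def scale(value):
        t = (value - d0) / (d1 - d0)
        return r0 + t * (r1 - r0)
    return scale

def line_path(data, x, y):
    """Rough analogue of d3.line().x(x).y(y)(data): M to the first point, L to the rest."""
    points = [(x(d), y(d)) for d in data]
    if not points:
        return ""
    (hx, hy), *tail = points
    return f"M{hx:g},{hy:g}" + "".join(f"L{px:g},{py:g}" for px, py in tail)

# Usage: three (day, temperature) points drawn into a 200 x 100 pixel area.
x = linear_scale((0, 2), (0, 200))
y = linear_scale((0, 40), (100, 0))  # SVG y grows downward, so the range is flipped
path = line_path([(0, 10), (1, 20), (2, 30)], lambda d: x(d[0]), lambda d: y(d[1]))
```

Here `path` is `M0,75L100,50L200,25`. Flipping the y range is what makes higher temperatures appear higher on screen.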
4.1.2. Area
To create an Area Chart, I used the D3 area generator, shown in Figure 13. The Area Chart uses the same data as the Line Chart but represents it differently. An area is itself an SVG path: the path encloses the region under the graph, and filling it with a selected color gives the impression of an area element. The area generator produces this path, as shown in Figure 14.
Figure 13: Code Snippet - SVG Area Generator
Figure 14: Code Snippet - Implementing SVG Area Path
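The area generator's output can be sketched in the same way. The path below follows d3.area() with its default linear curve. It traces the top edge (y1) from left to right, returns along the baseline (y0) from right to left, and closes the shape with Z. Z is the closepath command, which Table 1 does not list. Filling the closed path produces the area. The function name and sample values are hypothetical, and the inputs are assumed to be in pixels already.

```python
def area_path(data, x, y0, y1):
    """Analogue of d3.area().x(x).y0(y0).y1(y1)(data).

    Traces the top edge (y1) left to right, then the baseline (y0) right to left,
    and closes the shape with Z so that it can be filled.
    """
    if not data:
        return ""
    top = [(x(d), y1(d)) for d in data]
    bottom = [(x(d), y0(d)) for d in reversed(data)]
    (fx, fy), *rest = top + bottom
    return f"M{fx:g},{fy:g}" + "".join(f"L{px:g},{py:g}" for px, py in rest) + "Z"
```

For two points with a baseline at y = 100, `area_path([(0, 75), (100, 50)], lambda d: d[0], lambda d: 100, lambda d: d[1])` returns `M0,75L100,50L100,100L0,100Z`.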

4.1.3. Circle
I have used circles in the Bubble Chart, where each circle represents an entry in the weather dataset. The color of each circle is determined by the Rain Today and Rain Tomorrow values of its record, as shown in Figure 15. The circles are animated with the D3 force simulation discussed in the next section. The simulation generates the center coordinates cx and cy of the circles automatically.
Figure 15: Code Snippet - SVG Circle

4.1.4. Force
To animate the movement of the circles in the Bubble Chart upon interaction, I have used a
D3 force simulation with the collide force, which keeps the circles from overlapping. I
manipulated the x and y coordinates of the circle centers using two positioning forces of the
D3 force simulation, forceX and forceY, shown in Figure 16 and Figure 17.
Figure 16: Code Snippet - D3 Force Simulation
Figure 17: Code Snippet - ForceX and ForceY
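The sketch below is a simplified Python model of this kind of simulation. It only shows how the pieces interact and is not D3's implementation. A positioning step in the spirit of forceX/forceY nudges each circle's velocity toward a target coordinate chosen by its class, scaled by a strength and a cooling alpha. Velocities are damped every tick, and a naive pairwise step pushes overlapping circles apart. D3's forceCollide instead uses a quadtree and adjusts velocities. The defaults loosely mirror D3's (strength 0.1, velocity decay 0.4, alpha cooling toward 0.001 over 300 ticks). The class names and target positions are hypothetical.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Node:
    x: float
    y: float
    radius: float
    group: str  # the record's class, used to pick a target position
    vx: float = 0.0
    vy: float = 0.0


def position_force(nodes, target_x, target_y, strength, alpha):
    """Nudge velocities toward each node's target, forceX/forceY-style."""
    for n in nodes:
        n.vx += (target_x[n.group] - n.x) * strength * alpha
        n.vy += (target_y - n.y) * strength * alpha


def resolve_collisions(nodes):
    """Naive O(n^2) step: move each overlapping pair apart along the line joining their centers."""
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            dx, dy = b.x - a.x, b.y - a.y
            dist = math.hypot(dx, dy)
            min_dist = a.radius + b.radius
            if 0 < dist < min_dist:
                # Each circle moves half of the overlap, so the pair's midpoint stays put.
                shift = (min_dist - dist) / dist / 2
                a.x, a.y = a.x - dx * shift, a.y - dy * shift
                b.x, b.y = b.x + dx * shift, b.y + dy * shift


def simulate(nodes, target_x, target_y, ticks=300, strength=0.1, velocity_decay=0.4):
    alpha, alpha_min = 1.0, 0.001
    alpha_decay = 1 - alpha_min ** (1 / ticks)  # alpha cools from 1 to about alpha_min
    for _ in range(ticks):
        alpha -= alpha * alpha_decay
        position_force(nodes, target_x, target_y, strength, alpha)
        for n in nodes:
            n.vx *= 1 - velocity_decay
            n.vy *= 1 - velocity_decay
            n.x += n.vx
            n.y += n.vy
        resolve_collisions(nodes)
    return nodes


# Hypothetical "after click" layout: one cluster per class.
rng = random.Random(0)
nodes = [Node(200 + rng.uniform(-20, 20), 200 + rng.uniform(-20, 20), 5, g)
         for g in ["rain"] * 10 + ["no_rain"] * 10]
simulate(nodes, {"rain": 100, "no_rain": 300}, 200)
```

Returning to the original state corresponds to running the simulation again with a single shared target, as in the double-click interaction described in Chapter 5.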

4.1.5. Brush
Users can interact with the Area Charts through the brush interaction offered by D3.js,
which is used to zoom into the Line Chart and the Area Chart. It is implemented by first
defining the brush and a clip path, shown in Figure 18 and Figure 19; the clip path makes sure
that nothing is plotted outside the defined area. Then I added the brush interaction over its
specified area, shown in Figure 20.
Figure 18: Code Snippet – Defining D3 Brush
Figure 19: Code Snippet – Defining Clip Path
Figure 20: Code Snippet – Adding Brush
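A common D3 pattern for brush-driven zooming converts the brushed pixel extent back into data values with the overview scale's invert, then uses those values as the new domain of the detailed chart's scale. The Python sketch below reproduces only that arithmetic. The `LinearScale` class is a hypothetical stand-in for d3.scaleLinear(), and the numbers are made up.

```python
class LinearScale:
    """Minimal stand-in for d3.scaleLinear(): maps a domain interval onto a range interval."""

    def __init__(self, domain, range_):
        self.domain = tuple(domain)
        self.range = tuple(range_)

    def __call__(self, value):
        (d0, d1), (r0, r1) = self.domain, self.range
        return r0 + (value - d0) / (d1 - d0) * (r1 - r0)

    def invert(self, pixel):
        (d0, d1), (r0, r1) = self.domain, self.range
        return d0 + (pixel - r0) / (r1 - r0) * (d1 - d0)


def zoom_to_brush(overview, detail, selection):
    """Set the detail scale's domain to the data interval covered by a brush selection (pixels)."""
    x0, x1 = selection
    detail.domain = (overview.invert(x0), overview.invert(x1))
    return detail.domain


# Made-up example: a year of daily data drawn 730 px wide (2 px per day).
overview = LinearScale((0, 365), (0, 730))
detail = LinearScale((0, 365), (0, 730))
print(zoom_to_brush(overview, detail, (100, 200)))  # roughly (50.0, 100.0): days 50 to 100
```

After the domain changes, the chart is redrawn with the detail scale, so days 50 to 100 now span the full 730 px. Points outside that interval map beyond the plot area, and the clip path hides them.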

4.1.6. Tool Tip
The tool tip is used to display more information about the data while a user is
interacting with the visualizations. Here I am using a tool tip for the Line Charts, which
appears when the user hovers the mouse over the chart area, shown in Figure 21.
Figure 21: Code Snippet – Tool Tip
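The core of a hover tool tip on a line chart is finding the record to describe. The mouse's x position is converted into a data value, and the nearest record is then located in the x-sorted data. D3 offers d3.bisector for this lookup; the Python sketch below does the same with the standard `bisect` module. The function names and record fields are hypothetical, and the data are assumed to be sorted by x.

```python
import bisect


def nearest_record(records, xs, x_value):
    """Return the record whose x is closest to x_value.

    records: data sorted by x; xs: the matching x values in the same order.
    """
    i = bisect.bisect_left(xs, x_value)
    if i == 0:
        return records[0]
    if i == len(xs):
        return records[-1]
    before, after = xs[i - 1], xs[i]
    # Choose whichever neighbour is closer to the mouse position (ties go left).
    return records[i] if after - x_value < x_value - before else records[i - 1]


def tooltip_text(record):
    return f"Day {record['day']}: {record['temp']}"


# Made-up records; in D3 the x value would come from x_scale.invert(mouse pixel).
records = [{"day": 1, "temp": 12.5}, {"day": 2, "temp": 14.0}, {"day": 3, "temp": 9.5}]
xs = [r["day"] for r in records]
print(tooltip_text(nearest_record(records, xs, 2.4)))  # Day 2: 14.0
```

Because bisection is logarithmic in the number of records, the lookup stays fast on every mouse move even for long time series.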
4.2. Backend Development
In this section I will discuss the backend implementation and how it is used to get
the desired results. I will discuss how the Flask API is designed and how the Sci-Kit Learn
library is used to perform the Machine Learning tasks that the frontend requests.
4.2.1. Flask API
Flask is a lightweight Web Server Gateway Interface (WSGI) web application
framework. It is classified as a microframework because it does not require particular tools
or libraries. In this tool, Flask serves as the backbone of the backend: the Flask API performs
all the data-related tasks, such as cleaning, processing and making predictions.
Every request that the frontend can make to the API has a unique URL that triggers
the associated function in the API. The function imports the data, performs the requested
task and returns the results in JSON format. The frontend reads the JSON response from the
API and uses the received data to display the requested results.
Figure 22 and Figure 23 show examples of how the Flask API is used to fetch the Original
Data and the Normalized Data, respectively, to be displayed in the charts.
Figure 22: Code Snippet – Original Data
Figure 23: Code Snippet – Normalized Data
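The Python sketch below shows this request flow in a minimal Flask application. Each URL is bound to a function that loads the data, processes it and returns JSON. The route paths, column names, made-up rows and min-max scaling are assumptions for illustration; the report does not state which normalization the tool applies.

```python
from flask import Flask, jsonify

app = Flask(__name__)

# Made-up rows standing in for the weather dataset (hypothetical column names).
ROWS = [
    {"temperature": 10.0, "humidity": 40.0, "rainfall": 0.0},
    {"temperature": 20.0, "humidity": 80.0, "rainfall": 5.0},
    {"temperature": 15.0, "humidity": 60.0, "rainfall": 2.5},
]
COLUMNS = ["temperature", "humidity", "rainfall"]


def min_max_normalize(rows, columns):
    """Rescale each column to [0, 1]; a constant column becomes all zeros."""
    result = [dict(row) for row in rows]
    for col in columns:
        values = [row[col] for row in rows]
        low, high = min(values), max(values)
        for row in result:
            row[col] = (row[col] - low) / (high - low) if high > low else 0.0
    return result


@app.route("/data/original")
def original_data():
    # Each URL triggers its own function, which returns the result as JSON.
    return jsonify(ROWS)


@app.route("/data/normalized")
def normalized_data():
    return jsonify(min_max_normalize(ROWS, COLUMNS))
```

The application could be served with the `flask run` command or any WSGI server. The frontend would then request one of these URLs, for example with d3.json("/data/normalized"), and draw the returned array.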
4.2.2. Sci-Kit Learn
Sci-Kit Learn (SKLearn) is an easy-to-use Machine Learning library for Python that
features various classification, regression and clustering algorithms, including Support
Vector Machines, Naïve Bayes, Linear Regression and Logistic Regression, all of which are
used in the tool. These Machine Learning algorithms are accessed through the Flask API. Along
with the algorithms, SKLearn also offers different metrics to assess their performance.
Classification algorithms, such as the Naïve Bayes classifier shown in Figure 24, can be
assessed with metrics like accuracy, precision, recall and the confusion matrix, of which
accuracy and precision are used in the tool, shown in Figure 25. Regression algorithms, such as
the Linear Regression shown in Figure 26, are mainly assessed with the root mean squared error
(RMSE) and the R² score, shown in Figure 27.
Figure 24: Code Snippet – SKLearn Naïve Bayes
Figure 25: Code Snippet – Classification Metrics
Figure 26: Code Snippet – SKLearn Linear Regression
Figure 27: Code Snippet – Regression Metrics
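To make these metrics concrete, the Python sketch below computes them from their definitions. Accuracy is the fraction of correct predictions. Precision is TP / (TP + FP). RMSE is the square root of the mean squared error, and R² is 1 - SS_res / SS_tot. The functions accuracy_score, precision_score, mean_squared_error and r2_score in sklearn.metrics compute the same quantities; the hand-written functions here (names hypothetical) only illustrate the arithmetic, and the sample values are made up.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true, y_pred, positive=1):
    """TP / (TP + FP): share of predicted positives that were actually positive."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fp) if tp + fp else 0.0


def rmse(y_true, y_pred):
    """Square root of the mean squared error, in the same units as the target."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot: share of the target's variance explained by the model."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Classification: 1 = rain tomorrow (made-up labels).
rain_true = [1, 0, 1, 1, 0]
rain_pred = [1, 0, 0, 1, 1]
print(accuracy(rain_true, rain_pred))   # 0.6
print(precision(rain_true, rain_pred))  # 2 of the 3 predicted rainy days were rainy

# Regression: made-up temperatures.
temp_true = [3.0, 5.0, 7.0]
temp_pred = [2.0, 5.0, 9.0]
print(r_squared(temp_true, temp_pred))  # 0.375
```

RMSE is expressed in the units of the target (for example, degrees), whereas R² is unitless. Reporting both gives students a concrete error size and a relative measure of fit.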

Chapter 5: Tool Walkthrough
5.1. Data Science Pipeline
In this section I will discuss how to interact with the Data Science Pipeline
Diagram, shown in Figure 28. The diagram has three main areas for interaction: the diagram
itself, an example list of areas where Machine Learning finds application, and the “Next”
button used to navigate the diagram. I start by selecting from the list of applications, which
gives context to the interaction flow. Then, by clicking “Next”, I can see the blocks that
represent data move into the pipeline. Each time “Next” is clicked, I am taken to the next step
in the Data Science problem-solving process. The interaction ends with the diagram showing
the insights for the application selected at the beginning, shown in Figure 29.
Figure 28: Data Science Pipeline - Beginning - Healthcare Industry

Figure 29: Data Science Pipeline - Completed - Healthcare Industry
5.2. Data Distribution Chart
In this section I will discuss how to interact with the Data Distribution Chart, shown
in Figure 30. Users interact with the Bubble Chart by clicking on the chart area. A single
click separates the bubbles according to the classes they belong to, shown in Figure 31, and
double-clicking brings the chart back to its original state. It is an explanatory
visualization, where the user can learn how the data is distributed among the classes Rain
Today and Rain Tomorrow. The chart is accompanied by a legend that explains the color coding
used in the chart.

Figure 30: Data Distribution Chart – Initial State
Figure 31: Data Distribution Chart – After Interaction

5.3. Classification Dashboard
The Classification Dashboard, shown in Figure 32, is used to explore the process
of solving the Rain or No Rain problem. Using the weather dataset and daily weather
conditions such as temperature, humidity, rainfall measure and whether or not it rained today,
I will predict whether or not it will rain tomorrow.
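This prediction task is a binary classification problem: one day's weather features go in, and the label "rain tomorrow" or "no rain tomorrow" comes out. Naïve Bayes is the simplest of the dashboard's three models to write out. The Python sketch below implements a Gaussian Naïve Bayes classifier from scratch to show what training and predicting involve. It is illustrative only; the tool uses Sci-Kit Learn's implementation, and the report does not state which variant. The Gaussian likelihood, the two feature columns, the made-up values and the small variance floor (similar in spirit to scikit-learn's var_smoothing) are assumptions.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Estimate a prior plus a per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        variances = [sum((v - m) ** 2 for v in col) / n + var_floor
                     for col, m in zip(zip(*rows), means)]
        model[label] = (n / len(X), means, variances)
    return model


def predict_gaussian_nb(model, row):
    """Pick the class with the highest log prior plus summed log Gaussian likelihoods."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        for x, m, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
        return score

    return max(model, key=log_posterior)


# Made-up features per day: [humidity, rainfall today]; label: rain tomorrow?
X = [[30, 0.0], [35, 0.2], [40, 0.0], [80, 6.0], [85, 4.5], [90, 8.0]]
y = ["no", "no", "no", "yes", "yes", "yes"]
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, [88, 5.0]))  # yes
print(predict_gaussian_nb(model, [33, 0.1]))  # no
```

The "naïve" assumption is that features are independent within a class, so each feature contributes its own log-likelihood term. This is why the model is cheap enough to train on demand when the user clicks “Predict”.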
The dashboard has three panels. The left panel is used to explore different forms
of data (original, normalized), types of graph (temperature, humidity and rainfall measure)
and the algorithm used for prediction (Support Vector Machine, Naïve Bayes, Logistic
Regression). The user can perform a prediction by selecting one of the Machine Learning
models from the Models list and then clicking “Predict” in the Data section.
The center panel is used to display the charts and legends, shown in Figure 33. The
right panel is used to display the Rain Today and Rain Tomorrow labels, the prediction given
by the Machine Learning algorithm and the performance metrics of the selected algorithm,
shown in Figure 34. The interaction begins with clicking “Loading Data” in the Data section of
the left panel. This generates the graph in the center panel, which the user can interact with.

Figure 32: Classification Dashboard – Initial State
Figure 33: Classification Dashboard – Data Loaded

Figure 34: Classification Dashboard – After Prediction
5.4. Regression Dashboard
The Regression Dashboard, shown in Figure 35, is used to explore the process of solving
the Temperature Prediction problem. Using the weather dataset and daily weather conditions
such as minimum temperature, maximum temperature, humidity and rainfall measure, I will
predict the daily temperature.
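Linear Regression fits a straight line that minimizes the squared prediction error; with several features, as in the dashboard, the line becomes a hyperplane. The Python sketch below shows the closed-form least-squares solution for a single feature: slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x). The dashboard uses several features through Sci-Kit Learn, so this one-feature version and its made-up numbers are a simplification for illustration.

```python
def fit_simple_linear_regression(xs, ys):
    """Least-squares slope and intercept for y ≈ slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sums of co-deviations and squared deviations; the 1/n factors cancel in the ratio.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    return slope, mean_y - slope * mean_x


# Made-up example: predict a day's temperature from its maximum temperature.
max_temps = [20.0, 24.0, 28.0, 32.0]
temps = [16.0, 19.0, 22.0, 25.0]
slope, intercept = fit_simple_linear_regression(max_temps, temps)
print(slope, intercept)          # 0.75 1.0
print(slope * 30.0 + intercept)  # 23.5
```

Predictions from such a model are what the RMSE and R² metrics described in Section 4.2.2 evaluate.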
The dashboard is divided into two panels. The left panel is used to select the type
of data to be displayed and shows information about the regression algorithm used for
performing prediction. The right panel is used to display the charts and legends, which the
user can interact with, as shown in Figure 36.

Figure 35: Regression Dashboard – Initial State
Figure 36: Regression Dashboard – After Prediction

Chapter 6: Conclusion
6.1. Summary
The goal of this project is to develop a concrete learning plan to introduce Machine
Learning to High School Students using an interactive visual tool. The project introduces
a novel visual and interactive tool that serves as a possible solution to the problem this
paper aims to solve. This web-based tool can be used for both learning and teaching:
students can use it to explore the area of Machine Learning, and teachers can use it as an
educational tool. The tool is accessible at http://ml4hss.herokuapp.com/static/tut.html.
Using this tool, the user can gain a consolidated understanding of basic Machine Learning
fundamentals.
6.2. Future Work
The tool is complete in itself, but there is always scope for enhancement depending
on the needs of the user. The following is a list of upgrades that could be added to the
tool:
1) Increase user interaction with the tool by implementing the following features:
a) Choose between different datasets to explore different problems.
b) Enter data for prediction other than the data in the dataset. This can be done by
providing fields that collect input from the user and using that input to produce
predictions.
2) Add sections introducing the fundamentals of Neural Networks. Because the tool is
web-based, lightweight and efficient models should be used to support Neural Networks.
3) Perform a usability test of the tool with the target users, to gather data that can be
used to improve different areas of the tool.

Bibliography
1. K. Gutierrez, “Studies Confirm Power of Visuals in eLearning,” [Online]. Available:
https://www.shiftelearning.com/blog/bid/350326/studies-confirm-the-power-of-visuals-in-elearning
[Accessed: March 2019].
2. S. Yee and T. Chu, “A Visual Introduction to Machine Learning,” [Online]. Available:
http://www.r2d3.us/visual-intro-to-machine-learning-part-1 [Accessed: January 2019].
3. S. Yee and T. Chu, “Model Tuning and the Bias-Variance Tradeoff,” [Online]. Available:
http://www.r2d3.us/visual-intro-to-machine-learning-part-2 [Accessed: January 2019].
4. K. Malone and S. Thrun, “Intro to Machine Learning,” [Online]. Available:
https://www.udacity.com/course/intro-to-machine-learning--ud120 [Accessed: October 2019].
5. K. Eremenko, “Machine Learning A-Z: Hands-on Python and R in Data Science,” [Online].
Available: https://www.udemy.com/course/machinelearning/ [Accessed: October 2019].
6. R. Gavaldà, “Machine Learning in Secondary Education?,” Universitat Politècnica de
Catalunya, 2008.
7. S. Srikant and V. Aggarwal, “Introducing Data Science to School Kids,” in Proceedings of
the 2017 ACM SIGCSE Technical Symposium on Computer Science Education, Seattle, WA,
pp. 561-566, March 2017.
8. M. Bienkowski, D. W. Rutstein, Y. Xu and K. McElhaney, “Deepening Learning in High
School Computer Science through Practices in the NGSS,” in Proceedings of the 2016 ACM
SIGCSE Technical Symposium on Computing Science Education, Memphis, TN, p. 694,
March 2016.
9. S. Vandenberg, S. Small, M. Fryling, R. Flatland and M. Egan, “A Summer Program to
Attract Potential Computer Science Majors,” in Proceedings of the 2018 ACM SIGCSE
Technical Symposium on Computer Science Education, Baltimore, MD, pp. 467-472,
February 2018.
10. J. B. Gordon, “Machine Learning for High School Students,” [Online]. Available:
http://www.cs.columbia.edu/~CS4HS/talks/ml_for_hs.pdf [Accessed: February 2019].
11. J. Ho, “AI Classroom Activity: Machine Learning,” [Online]. Available:
https://www.teachermagazine.com.au/articles/ai-classroom-activity-machine-learning
[Accessed: February 2019].
12. S. Wolfram, “Machine Learning for Middle Schoolers,” [Online]. Available:
https://blog.stephenwolfram.com/2017/05/machine-learning-for-middle-schoolers/
[Accessed: February 2019].
13. Exploringcs, “Exploring Computer Science,” [Online]. Available:
http://www.exploringcs.org/curriculum [Accessed: October 2019].
14. S. McGee, R. McGee-Tekula, J. Duck, C. McGee, L. Dettori, R. I. Greenberg, E. Snow,
D. Rutstein, D. Reed, B. Wilkerson, D. Yanek, A. M. Rasmussen and D. Brylow, “Equal
Outcomes 4 All: A Study of Student Learning in ECS,” in Proceedings of the 2018 ACM
SIGCSE Technical Symposium on Computer Science Education, Baltimore, MD, pp. 50-55,
February 2018.
15. T. Kei, “Principles and elements of visual design: A review of the literature on visual
design of instructional materials,” Educational Studies, vol. 57, International Christian
University, pp. 167-174, April 2015.
16. Z. Avagyan, “Weather Dataset,” [Online]. Available:
https://www.kaggle.com/zaraavagyan/weathercsv [Accessed: November 2019].
17. R. Orban, C. Saden and J. Dinu, “Data Visualization and D3.js,” [Online]. Available:
https://www.udacity.com/course/data-visualization-and-d3js--ud507 [Accessed: June 2019].
18. Dashingd3js, “SVG Paths and D3.js,” [Online]. Available:
https://www.dashingd3js.com/svg-paths-and-d3js [Accessed: February 2019].
