
1. What is the topic you have selected for your semester project and why? List three other topics that would be
related to your semester project topic. You do NOT need to have three actual datasets. I am looking for just the
topics so you can use them for your examples in questions 2-5 below.

My topic is choosing where scooters should be parked. Scooters are becoming more and more popular [1][2], so
choosing good places to park them is important: it can bring in more profit and is more convenient for
customers.
Three related topics are:
Which places do people tend to go to by scooter?
Which routes do people tend to choose to reach their destination?
What is the relationship between time and the number of trips?

2. Explain what makes a social network dataset different than a traditional dataset such as Excel? Provide an
example (from the 3 you listed in #1) that compares a social network dataset and a traditional dataset. Be sure to
clearly mark the key points.

A traditional dataset describes entities through their attributes: each row is one entity and each column is one
attribute.

Trip  Start place  End place  Time   Distance  Fee
01    A            B          5 min  2 miles   $1

This is an example of a traditional dataset.
A social network dataset instead describes the structure of a set of observations: the entities (nodes) and the
relationships (edges) among them.

This is an example of a social network dataset. A, B, C, D, and E are places, and the data show that there is one
trip from place A to B, three trips from A to C, and so on.
From the two examples, it is easy to see that a traditional dataset focuses on the attributes of an entity, while a
network dataset focuses on the relationships among entities. Also, network data are often not
probability samples, and the observations of individual nodes are not independent.
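
The comparison above can be sketched in code. The following is a minimal illustration, assuming hypothetical trip records (the field names `trip`, `start`, and `end` and the helper `to_edge_list` are made up for this example): it turns one-row-per-trip data into weighted edges between places.

```python
from collections import Counter

# Hypothetical trip records in the "traditional" layout: one row per trip.
# The values are chosen to match the example (one A->B trip, three A->C trips).
trips = [
    {"trip": "01", "start": "A", "end": "B"},
    {"trip": "02", "start": "A", "end": "C"},
    {"trip": "03", "start": "A", "end": "C"},
    {"trip": "04", "start": "A", "end": "C"},
]

def to_edge_list(rows):
    """Count trips per (start, end) pair, giving weighted directed edges."""
    return Counter((row["start"], row["end"]) for row in rows)

edges = to_edge_list(trips)
for (start, end), weight in sorted(edges.items()):
    print(f"{start} -> {end}: {weight} trip(s)")
```

The table view keeps every attribute of each trip, while the edge list keeps only which places are connected and how strongly.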

3. What is a matrix and how does a typical matrix differ from an adjacency matrix? Provide an example of both a
typical matrix and an adjacency matrix (from the 3 you listed in #1).

A matrix is basically a table: a way of storing numbers and other values. More precisely, a matrix is a
rectangular arrangement of a set of elements [3]. Its size is described by the number of rows and the number of
columns it contains.
In an adjacency matrix, all the nodes are listed along both the horizontal axis (columns) and the vertical axis
(rows). Values are then filled in to indicate whether there is an edge between each pair of nodes. Typically, a 0
indicates no edge and a 1 indicates an edge [4]; in a valued network, the entry can instead hold the weight of the
edge, such as a number of trips.
The difference is that in an adjacency matrix the rows and the columns represent the same set of nodes, so the
matrix is always square, while in a typical matrix the rows and the columns usually represent different things.
         Place A  Place B  Place C
Place A  0        3        5
Place B  2        0        1
Place C  3        0        0
This is an example of an adjacency matrix. It records scooter trips during one hour: each row is a starting place
and each column is a destination. There are 3 scooter trips from Place A to Place B and 2 trips from Place B to
Place A. If the network is undirected, the matrix is symmetric; if it is directed, the matrix is not necessarily
symmetric.
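
As a small sketch of how this adjacency matrix could be stored and queried, the code below uses the trip counts from the example. The helper names `trips_between` and `is_symmetric` are assumptions introduced only for illustration.

```python
# Places and trip counts taken from the adjacency matrix example.
places = ["Place A", "Place B", "Place C"]
trips = [
    [0, 3, 5],  # trips starting at Place A
    [2, 0, 1],  # trips starting at Place B
    [3, 0, 0],  # trips starting at Place C
]

def trips_between(matrix, labels, origin, destination):
    """Look up the entry in row `origin`, column `destination`."""
    return matrix[labels.index(origin)][labels.index(destination)]

def is_symmetric(matrix):
    """True when entry (i, j) equals entry (j, i) for every pair."""
    n = len(matrix)
    return all(matrix[i][j] == matrix[j][i] for i in range(n) for j in range(n))

print(trips_between(trips, places, "Place A", "Place B"))  # 3
print(trips_between(trips, places, "Place B", "Place A"))  # 2
print(is_symmetric(trips))  # False, so the network is directed
```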
      Louisville, KY
Bird  250
Lime  250
Bolt  150
Spin  150
This is an example of a typical matrix. It shows the number of scooters each company has in Louisville, KY; for
example, Bird has 250 scooters there.

4. Define what is meant by the shortest path. Explain why finding shortest path between nodes is important
for social network analysis. Provide an example (from the 3 you listed in #1) of the shortest path.

A path is a sequence of nodes that can be traversed by following the edges between them. The shortest path
between two nodes is the path with the smallest total length, measured as the number of edges or, in a weighted
network, the sum of the edge weights.
The length of a path reflects how quickly information can spread along it, so information spreads fastest along
the shortest path. In a supply chain problem, for example, a shorter path usually means less time and lower cost,
which is important to a company.
Shortest paths are also important for finding the cluster center of a social network.

This example is about finding the shortest path from place A to place G, so that people spend the least time
reaching their destination.
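
One way to find such a path is Dijkstra's algorithm. The sketch below is only an illustration: the edge list and travel times are hypothetical (the original figure is not reproduced here), and the function name `shortest_path` is an assumption.

```python
import heapq

# Hypothetical undirected edges (place, place, minutes) between places A..G.
edge_list = [
    ("A", "B", 4), ("A", "C", 2), ("B", "D", 5), ("C", "D", 8), ("C", "F", 10),
    ("D", "E", 2), ("E", "G", 3), ("F", "G", 4), ("D", "G", 6),
]

# Build a symmetric adjacency dictionary from the edge list.
graph = {}
for u, v, minutes in edge_list:
    graph.setdefault(u, {})[v] = minutes
    graph.setdefault(v, {})[u] = minutes

def shortest_path(graph, source, target):
    """Dijkstra's algorithm; returns (total_cost, path) or (inf, [])."""
    queue = [(0, source, [source])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, weight in graph[node].items():
            if neighbor not in visited:
                heapq.heappush(queue, (cost + weight, neighbor, path + [neighbor]))
    return float("inf"), []

cost, path = shortest_path(graph, "A", "G")
print(cost, path)  # 14 ['A', 'B', 'D', 'E', 'G']
```

With these made-up travel times, the route through B, D, and E takes 14 minutes, which beats the direct-looking alternatives through C or F.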

This is another example. Here the cluster center can be determined from shortest paths: if D is the center, the
sum of the shortest distances from all the other places to D is the smallest. The company should therefore park
the scooters at D.
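
The idea of choosing the place with the smallest total distance can be sketched as follows. This is a rough illustration with hypothetical places and distances, using the Floyd-Warshall all-pairs shortest path method; the function names are assumptions.

```python
INF = float("inf")

# Hypothetical undirected edges (place, place, distance); not from the figure.
places = ["A", "B", "C", "D", "E"]
edge_list = [
    ("A", "D", 2), ("B", "D", 3), ("C", "D", 1),
    ("E", "D", 2), ("A", "B", 4), ("C", "E", 5),
]

def all_pairs_shortest(places, edge_list):
    """Floyd-Warshall: shortest distance between every pair of places."""
    dist = {u: {v: (0 if u == v else INF) for v in places} for u in places}
    for u, v, w in edge_list:
        dist[u][v] = min(dist[u][v], w)
        dist[v][u] = min(dist[v][u], w)
    for k in places:
        for i in places:
            for j in places:
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist

def find_center(places, edge_list):
    """Return the place with the smallest sum of shortest distances."""
    dist = all_pairs_shortest(places, edge_list)
    totals = {p: sum(dist[p].values()) for p in places}
    return min(totals, key=totals.get), totals

center, totals = find_center(places, edge_list)
print(center, totals[center])  # D 8
```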

5. Name three types of graphs and select one use for the graph to represent social network data. Provide two graph
examples (from the 3 you listed in #1).

Graphs can be directed or undirected, and binary, signed, or valued. The examples below use directed graphs.

This is an example of a directed graph. It means that the trip starts at place A, passes through place B, and
ends at place C.
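
A directed graph like this one can be stored as an adjacency dictionary. The short sketch below (the function name `reachable` is an assumption) shows that direction matters: C can be reached from A, but A cannot be reached from C.

```python
# Directed edges for the trip A -> B -> C described above.
directed = {"A": ["B"], "B": ["C"], "C": []}

def reachable(graph, start, goal):
    """Depth-first search that follows edges only in their direction."""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node])
    return False

print(reachable(directed, "A", "C"))  # True
print(reachable(directed, "C", "A"))  # False
```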
This is another example; it shows that there are different ways to get to a destination.

6. Find one news article from October 2020 related to the elections, COVID-19, or Hurricane Zeta. Include the link
to the article you selected. Make a list of 10 data points (attributes) you think are important. Create a social
network graph (nodes and edges) from 2-3 of your data points. You do not need to use any software, refer to the
early chapters of the nodes/edges.

I. The US election is dominated by two parties: the Republicans and the Democrats.
II. The Republicans are the conservative political party.
III. The Democrats are the liberal political party.
IV. The Republican Party is also known as the GOP, or the Grand Old Party.
V. In recent years, the Republican Party has stood for lower taxes, gun rights, and tighter restrictions on
immigration.
VI. Republican presidents include George W. Bush, Ronald Reagan, and Richard Nixon.
VII. Democratic presidents include Barack Obama.
VIII. These voters decide state-level contests.
IX. To vote, you need to be a US citizen.
X. To vote, you need to be at least 18 years old. [5]

[Figure: social network graph built from the data points above, with nodes US Citizen, 18 years old, To Vote,
US Election, The Democrats, The Republicans, Liberal political party, Conservative political party, GOP,
lower taxes, gun rights, George W. Bush, Ronald Reagan, Richard Nixon, and Barack Obama.]

Part II – Practical Application (50 points total)

1. Why did you select your dataset for your semester project topic (i.e. what is the problem you are hoping
to solve from the dataset you selected)?

I would like to figure out where scooters should be parked and why, and to understand the relationship
between a parking location and its surroundings. If scooters are parked at appropriate locations, they
will attract more customers, so the company can make more profit. The number of scooters also needs to
be decided based on the number of trips, that is, on demand [2].

2. Explain your dataset in terms of basic demographics (descriptive statistics). What type of statistical
analysis do you plan to perform and what software will you use?

Each Trip ID identifies a different scooter trip, so the count of Trip IDs represents how often scooters
are used. Combining start date and end date with Trip ID shows how many trips are taken on each day, and
start time works the same way for time of day. Trip duration is how long the trip lasts, and trip
distance is the distance covered.

The longitude and latitude give the locations where a trip begins and ends. DayOfWeek is the day of the
week on which the trip happens, and HourNum is, similarly, the hour in which it happens.
I am going to use Tableau to explore the relationship between DayOfWeek or HourNum and the number of
trips. This tells us when scooters are used most, and from the result we can infer what kind of people
use scooters most frequently. The company can then design promotions that attract its main customers.
I will also use Excel to filter out records whose duration and distance are zero, because those records
are meaningless; they may result from a system error.

I will use software that can plot the latitude and longitude on a map, so that I can determine where
scooters are used most and the company can place more scooters there. The surroundings of those places
can then be examined to see how they influence usage.
I may also use Gephi to find the cluster centers of the trips, so that scooters can be parked there.
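
The filtering and counting steps described in this answer could also be sketched in code. The example below is only an illustration in plain Python rather than Tableau or Excel; the rows are made up, and the field names `TripID`, `TripDuration`, and `TripDistance` are assumed spellings of the columns described above.

```python
from collections import Counter

# Hypothetical rows; values are invented for illustration only.
trips = [
    {"TripID": 1, "DayOfWeek": 1, "HourNum": 8, "TripDuration": 5, "TripDistance": 2.0},
    {"TripID": 2, "DayOfWeek": 1, "HourNum": 8, "TripDuration": 7, "TripDistance": 1.2},
    {"TripID": 3, "DayOfWeek": 2, "HourNum": 17, "TripDuration": 12, "TripDistance": 3.1},
    {"TripID": 4, "DayOfWeek": 2, "HourNum": 12, "TripDuration": 0, "TripDistance": 0.0},
    {"TripID": 5, "DayOfWeek": 3, "HourNum": 8, "TripDuration": 4, "TripDistance": 0.9},
]

def clean(rows):
    """Drop records whose duration and distance are both zero."""
    return [r for r in rows
            if not (r["TripDuration"] == 0 and r["TripDistance"] == 0)]

def trips_per(rows, field):
    """Count trips for each value of `field`, e.g. HourNum or DayOfWeek."""
    return Counter(r[field] for r in rows)

valid = clean(trips)
by_hour = trips_per(valid, "HourNum")
print(len(valid))              # 4
print(by_hour.most_common(1))  # [(8, 3)]
```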

3. Using your dataset as a starting point, provide an example of unsupervised learning and a second
example of supervised learning. You can either begin with unsupervised or supervised learning and then
add more data to your example if needed.

This is an example of unsupervised learning. It concerns trips between different origins and
destinations. We can find the cluster center, which is the place likely to have the most customers, so
the company can place more scooters there. The edge weights can be the number of trips.
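
One common unsupervised technique for finding cluster centers is k-means clustering. The sketch below applies it to hypothetical trip start coordinates; it is an assumption about how the idea could be tried in code, not the network-based method from the figure, and the starting centers are simply the first k points so that the result is repeatable.

```python
# Hypothetical (latitude, longitude) trip start points; values are invented.
points = [
    (38.250, -85.760), (38.220, -85.700), (38.252, -85.758),
    (38.218, -85.702), (38.248, -85.762), (38.222, -85.698),
]

def kmeans(points, k, iterations=20):
    """Plain k-means: assign each point to its nearest center, then
    move each center to the mean of its assigned points."""
    centers = list(points[:k])  # deterministic start: the first k points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: (p[0] - centers[i][0]) ** 2 + (p[1] - centers[i][1]) ** 2,
            )
            clusters[nearest].append(p)
        new_centers = []
        for old, members in zip(centers, clusters):
            if members:
                lat = sum(m[0] for m in members) / len(members)
                lon = sum(m[1] for m in members) / len(members)
                new_centers.append((lat, lon))
            else:
                new_centers.append(old)  # keep an empty cluster's center
        if new_centers == centers:
            break  # assignments stopped changing
        centers = new_centers
    return centers, clusters

centers, clusters = kmeans(points, k=2)
for center, members in zip(centers, clusters):
    print(f"center {center[0]:.3f}, {center[1]:.3f} with {len(members)} trips")
```

No labels are given to the algorithm; the two groups emerge from the coordinates alone, which is what makes the example unsupervised.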

              Week 1  Week 2  Week 3  Week 4  Week 5
Sum of trips  a       b       c       d       e
This is an example of supervised learning, which uses known data to predict future data. We can analyze
the trend using the week as the X-axis and the sum of trips as the Y-axis, and then predict future
values, so the company can decide whether to add more scooters or reduce their number.
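
A simple way to make such a prediction is to fit a straight line by least squares. In the sketch below the weekly totals are hypothetical stand-ins for a through e, and the function name `fit_line` is an assumption; a real analysis might need a different model.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

weeks = [1, 2, 3, 4, 5]
trips = [100, 120, 140, 160, 180]  # hypothetical values for a, b, c, d, e

slope, intercept = fit_line(weeks, trips)
week6 = slope * 6 + intercept
print(slope, intercept, week6)  # 20.0 80.0 200.0
```

Here the weeks are the inputs and the known trip totals are the labels, which is what makes this a supervised example.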

References

[1] James, O., Swiderski, J., Hicks, J., Teoman, D., & Buehler, R. (2019). Pedestrians and E-Scooters:
An Initial Look at E-Scooter Parking and Perceptions by Riders and Non-Riders. Sustainability, 11(20),
5591. doi:10.3390/su11205591
[2] Agarwal, H. (2020, March 30). Analyzing E-Scooter Activity through Visualization and Machine
learning in Python. Retrieved November 04, 2020, from
https://towardsdatascience.com/analyzing-e-scooter-activity-through-visualization-and-machine-learning-in-python-a33585b2c29
[3] Rectangular Array. (n.d.). Retrieved November 04, 2020, from
https://www.sciencedirect.com/topics/mathematics/rectangular-array
[4] Adjacency Matrix. (n.d.). Retrieved November 04, 2020, from
https://www.sciencedirect.com/topics/computer-science/adjacency-matrix
[5] US election 2020: A really simple guide. (2020, October 28). Retrieved November 04, 2020, from
https://www.bbc.com/news/election-us-2020-53785985
