Data Visualization Final

INF4000 Data Visualization
Student Name
Institution
Department
Course
Module
Lecturer
Submission date
Data Visualization
Knowledge Building
Data visualization refers to the methods used to convey information visually by encoding
it as graphic elements (Tschandl et al., 2018). The topic selected is “Prevalence of Mental
Disorders and Substance Use Disorders”. Using data visualization, the selected dataset will be
analyzed to provide more information on how prevalent psychological disorders are in various
parts of the world. The charts created will also provide further information on the nature of
mental conditions over the last three decades. The dataset is made up of 9 columns and 6,840
rows. The nine columns are Entity, which holds the name of the country or region where the
data were collected; Year; and seven columns that hold the prevalence percentage for specific
mental illnesses: schizophrenia, alcohol use, drug use, anxiety, depression, bipolar disorder,
and eating disorders. A first glance at the dataset suggests that depression and anxiety are
the most prevalent of all the psychological illnesses.
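The quick inspection described above can be made reproducible with a few lines of code. The sketch below uses Python (the figures in this report were produced in R), assumes the data have been exported to CSV with one column per disorder, and uses simplified, hypothetical column names and made-up placeholder values rather than the real dataset. It reports the table's shape and ranks the disorders by mean prevalence; for the real file, the CSV would be read with open(path, newline="") instead of the inline string.

```python
import csv
import io
from statistics import mean

# Hypothetical CSV excerpt. The column names are simplified and the numbers
# are made-up placeholders, NOT values from the real dataset.
SAMPLE_CSV = """Entity,Year,schizophrenia,alcohol,drug use,anxiety,depression,bipolar,eating disorders
CountryA,1990,0.2,1.5,0.8,4.0,3.5,0.7,0.1
CountryA,1991,0.2,1.5,0.8,4.1,3.4,0.7,0.1
CountryB,1990,0.3,1.2,0.5,3.2,4.6,0.6,0.2
"""

# Columns that identify a record rather than hold a prevalence value.
ID_COLUMNS = {"Entity", "Year"}


def load_rows(text):
    """Parse CSV text into a list of dicts, one per (entity, year) record."""
    return list(csv.DictReader(io.StringIO(text)))


def rank_by_mean_prevalence(rows):
    """Return (disorder, mean prevalence %) pairs, most prevalent first."""
    disorders = [col for col in rows[0] if col not in ID_COLUMNS]
    means = {d: mean(float(row[d]) for row in rows) for d in disorders}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


rows = load_rows(SAMPLE_CSV)
print(f"{len(rows)} rows x {len(rows[0])} columns")
for disorder, avg in rank_by_mean_prevalence(rows):
    print(f"{disorder:>17}: {avg:.2f}%")
```

A mean across all rows treats every country-year equally, which is a simplification; a population-weighted average would be a different and arguably better summary.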
Datasets are often created for specific research or practical purposes, and can be
obtained from a variety of sources such as government agencies, research institutions, and
online databases (Vieira et al., 2018). In some cases, a researcher or organization may create
their own dataset by collecting data through surveys, experiments, or other methods. There
are many possible sources for datasets on mental disorders. Some examples include:
• Government agencies such as the Centers for Disease Control and Prevention (CDC)
or the World Health Organization (WHO) may collect data on the prevalence and
treatment of mental disorders.
• Research institutions and universities may conduct studies on mental disorders and
make their data available to the public.
• Online databases such as Kaggle or the Open Science Framework may host datasets
on mental disorders that have been shared by researchers.

My choice of this particular dataset on mental disorders was informed mainly
by the visualization topic chosen, along with other factors, including the research
question being addressed, the availability of the data, and the suitability of the data for the
intended analysis (Mirman, 2017). It is important to consider carefully the limitations and
biases of any dataset, and to cite the source of the data properly in any published work.
Several observations emerged from visualizing the dataset.
Firstly, the dataset contains records from all over the world, broken down by country,
continent, and region. To speed up visualization, the data were filtered by continent,
region, and selected countries. For instance, filtering for England shows that
bipolar disorder has been increasing over the last three decades, as seen in Figure 1 below.
Additionally, the condition appears to be associated mainly with depression and
anxiety, although the association with anxiety seems to have emerged only towards the end of
the last decade. The visualization also suggests that the depression component of bipolar
disorder in England has been decreasing over the years. Figure 2 below shows bipolar disorder
prevalence in England together with the other factors involved (anxiety and depression).
Figure 1: England’s Bipolar disorder with anxiety

Figure 2: England chart on Bipolar against years
Figure 3: England Bipolar chart with trend lines
As shown in Figure 3, the trend lines show steady growth in bipolar cases in
England. This visualization was motivated by the fact that the majority of people diagnosed
with bipolar disorder also have an anxiety disorder, such as post-traumatic stress
disorder (PTSD), generalized anxiety disorder (GAD), panic disorder, or social
phobia (Lee and Yoon, 2017). Anxiety and depression, either on their own or in conjunction
with another mental health condition, have been linked to a heightened likelihood of suicidal
behavior as well as relational problems.
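The figures were produced in RStudio, and the exact trend-line method is not stated; a common choice is an ordinary least-squares fit (for example, ggplot2's geom_smooth(method = "lm")). The Python sketch below computes such a line directly, as an assumption about the method rather than a record of it. The yearly values are hypothetical placeholders; a positive slope corresponds to the steady growth described above.

```python
def linear_trend(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared / cross deviations from the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical yearly bipolar prevalence (%), not real figures.
years = [1990, 1995, 2000, 2005, 2010]
bipolar = [0.70, 0.71, 0.72, 0.73, 0.74]
slope, intercept = linear_trend(years, bipolar)
print(f"trend: {slope * 10:+.3f} percentage points per decade")
```

Reporting the slope per decade keeps the number on the same scale as the prevalence percentages in the dataset.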

Figure 4: Depression in England is declining
From the above visualizations, it is clear that depression levels were high at the
beginning of the study period. Also notable is a slight increase in depression
immediately after 2010, which could be a consequence of the inflation and economic
meltdown of 2008.
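A rise inside an overall decline, such as the post-2010 increase noted above, can be located by checking year-over-year differences. In the Python sketch below the values are hypothetical and the helper name rising_years is illustrative. It only identifies when the series rose, so any link to the 2008 crisis remains an interpretation rather than something the calculation shows.

```python
def rising_years(series):
    """Return the years whose value is higher than the previous year's.

    `series` is a list of (year, value) pairs sorted by year.
    """
    return [
        year
        for (_, previous), (year, current) in zip(series, series[1:])
        if current > previous
    ]


# Hypothetical depression prevalence (%), not real figures.
depression = [(2008, 4.30), (2009, 4.28), (2010, 4.27),
              (2011, 4.29), (2012, 4.31), (2013, 4.26)]
print(rising_years(depression))  # [2011, 2012]
```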
Theoretical Framework
ASSERT Framework
The ASSERT model comprises six components. Ask a question to
be answered by the visualization work. Search for possible answers to the question by
looking for evidence. Structure this information so that it can answer the query.
Envision other ways to answer the question using the information that is currently
available. Represent the data in a relevant visualization for the purpose of answering
the question. Finally, Tell a meaningful story with the result (Rees and Laramee, 2019).
The diagram below shows the stages of the ASSERT framework as applied in this work.
Ask
The question asked for the above visualization is: “What is the relationship of bipolar
disorder to anxiety and depressive disorders?” The question makes clear which specific
data need to be searched for on the internet or in the dataset being analyzed.
Search
The dataset was obtained from the Our World in Data website, https://ourworldindata.org/.
This dataset matches the type of visualization envisioned for this task.
Structure
The data were filtered by country so that only the records for England were analyzed.
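This filtering step can be sketched in Python as follows. The column names are simplified assumptions and the sample rows are placeholders rather than real figures; the function returns a year-sorted series for one entity, which is the shape the England charts need.

```python
import csv
import io

# Hypothetical excerpt with simplified column names; the values are
# placeholders, not figures from the real dataset.
SAMPLE_CSV = """Entity,Year,anxiety,depression,bipolar
England,1991,3.9,4.4,0.71
England,1990,3.8,4.5,0.70
France,1990,4.2,3.9,0.78
England,1992,3.9,4.3,0.72
"""


def series_for(rows, entity, column):
    """Return [(year, value), ...] for one entity and column, sorted by year."""
    points = [
        (int(row["Year"]), float(row[column]))
        for row in rows
        if row["Entity"] == entity
    ]
    return sorted(points)


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
england = {col: series_for(rows, "England", col)
           for col in ("bipolar", "anxiety", "depression")}
print(england["bipolar"])  # [(1990, 0.7), (1991, 0.71), (1992, 0.72)]
```

Sorting after filtering matters because source rows are not guaranteed to be in year order, and line or trend calculations assume they are.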
Envision
I searched the internet for previous visualizations of mental disorders, especially in
England, to see how they were presented and what kinds of questions they answered.
Represent
RStudio was used to design scatterplots to determine the relationship between
bipolar disorder and anxiety and depression.
Tell
The visualized results were explained, and conclusions were drawn from them.
Grammar of Graphics
Data: The data for this visualization consist of a table with nine columns: “Entity”, “Year”,
“schizophrenia”, “alcohol”, “drug use”, “anxiety”, “depression”, “bipolar”, and “eating disorders”.
Each row gives the percentage of the population living with each of these psychological
disorders for one country or region in one year.
Aesthetics: The variable “Anxiety” is mapped to the color aesthetic, so different anxiety levels
are represented by different colors. The variable “Year” is mapped to the x-axis position, so
the points are placed along the x-axis according to their year. The variable “Bipolar” is
mapped to the y-axis position, so the height of each point on the plot represents the prevalence
of bipolar disorder in the selected country (England) in the corresponding year.
Geometry: The geometry used in this visualization is points, with each point representing the
prevalence of bipolar disorder in a particular year.
Scales: The y-axis uses a linear scale, with the minimum value set to 0 and the maximum
value determined by the maximum bipolar prevalence in the data. The x-axis uses an ordinal
scale, with the values corresponding to the years in the data.
Coordinate systems: The visualization uses a Cartesian coordinate system, with the x-axis
representing the years and the y-axis representing bipolar disorder prevalence.
Annotations: The title and axis labels provide additional context for the visualization.
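The layered specification above can be made concrete: each aesthetic is a scale that maps a data value to a visual property. The Python sketch below is a hypothetical, library-free illustration of that idea, not how ggplot2 is implemented. An ordinal scale spaces the years evenly, a linear scale anchored at 0 positions bipolar prevalence, and a two-color gradient encodes anxiety. The row keys, canvas size, and palette are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Mark:
    """One point geometry after all aesthetics have been scaled."""
    x: float     # horizontal pixel position (from Year)
    y: float     # vertical pixel position, 0 = top of canvas (from Bipolar)
    color: str   # hex color (from Anxiety)


def ordinal_scale(categories, lo, hi):
    """Evenly spaced positions for ordered categories such as years."""
    ordered = sorted(set(categories))
    step = (hi - lo) / max(len(ordered) - 1, 1)
    positions = {cat: lo + i * step for i, cat in enumerate(ordered)}
    return lambda cat: positions[cat]


def linear_scale(d_min, d_max, r_min, r_max):
    """Map the domain [d_min, d_max] linearly onto [r_min, r_max]."""
    if d_max == d_min:
        raise ValueError("domain must have non-zero width")
    return lambda v: r_min + (v - d_min) / (d_max - d_min) * (r_max - r_min)


def gradient_scale(d_min, d_max, start=(255, 255, 204), end=(128, 0, 38)):
    """Interpolate between two RGB colors (a hypothetical palette)."""
    def scale(v):
        t = (v - d_min) / (d_max - d_min) if d_max != d_min else 0.0
        r, g, b = (round(a + t * (z - a)) for a, z in zip(start, end))
        return f"#{r:02x}{g:02x}{b:02x}"
    return scale


def build_marks(rows, width=600, height=400):
    """Apply the aesthetic mappings and scales described above to each row."""
    x = ordinal_scale([r["Year"] for r in rows], 0, width)
    # Linear y scale anchored at 0; screen coordinates put 0 at the top.
    y = linear_scale(0, max(r["Bipolar"] for r in rows), height, 0)
    anxiety = [r["Anxiety"] for r in rows]
    color = gradient_scale(min(anxiety), max(anxiety))
    return [Mark(x(r["Year"]), y(r["Bipolar"]), color(r["Anxiety"])) for r in rows]


# Hypothetical rows, not real data.
rows = [
    {"Year": 1990, "Bipolar": 0.5, "Anxiety": 3.0},
    {"Year": 2000, "Bipolar": 1.0, "Anxiety": 4.0},
]
for mark in build_marks(rows):
    print(mark)
```

Keeping data, scales, and geometry as separate functions mirrors the grammar's separation of concerns: swapping the color scale or the geometry does not require touching the data.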
Accessibility
Accessibility in visualization refers to the design and use of visualizations in a way
that is inclusive and usable for people with a wide range of abilities and disabilities. This
includes considerations such as visual acuity, color perception, and cognitive abilities, as well
as factors such as cultural and linguistic diversity (Linderman et al., 2019). By considering
these factors, visualizations can be made more accessible and usable for a wider audience.
Accessible visualizations are designed to represent data in a way that is easy to understand
and interact with, even for users who have visual, hearing, cognitive, or motor impairments.
The figure below shows a chart designed with high-contrast colors to ensure visibility.
Figure 5: England chart with trendlines
There are a number of different approaches that can be taken when creating
accessible dataset visualizations. For example, designers can use high-contrast colors, large
font sizes, and clear labels to make the visualization easier to read for people with visual
impairments (Kraak and Ormeling, 2020). They can also provide audio descriptions of the
visualization, allowing users who are blind or have low vision to access the information.
Additionally, they can simplify the visualization by using simple shapes, patterns, and colors
to represent the data, making it easier to understand for users with cognitive impairments.
Finally, they can make the visualization touch-based, allowing users with motor impairments
to interact with the data using touch.
There are many benefits to using accessible dataset visualizations. By making data
more accessible and understandable, these visualizations can help to promote more informed
decision-making and better understanding of complex concepts. They can also help to break
down barriers to information and ensure that all users, regardless of their abilities, have equal
access to data and the insights it can provide.
Accessible visualization is therefore an important tool for making data more usable
and understandable for all users. By designing visualizations with accessibility in mind,
we can help to ensure that everyone has the opportunity to fully engage with and understand
the data that is so central to our world today.
• Visual clarity: Visualizations should be designed to be clear and easy to read, using
appropriate font sizes, colors, and layout.
• Color accessibility: Colors should be chosen and used in a way that is legible and
distinguishable for people with different types of color vision impairments.
• Alternative representations: Visualizations should provide alternative representations
of the data, such as text descriptions or data tables, to enable users with visual
impairments to access the information.
• Usability: Visualizations should be easy to use and navigate, with clear labels and
interactive elements that are easy to understand and operate.
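As an illustration of the alternative-representations point, a short text summary and a plain-text data table can be generated from the same series that drives a chart. The Python sketch below is hypothetical: the function name and wording template are illustrative, and the values are placeholders.

```python
def text_alternative(entity, disorder, series):
    """Return a one-sentence summary plus a plain-text data table.

    `series` is a list of (year, prevalence %) pairs sorted by year.
    """
    (first_year, first), (last_year, last) = series[0], series[-1]
    name = disorder.capitalize()
    if last == first:
        summary = (f"{name} prevalence in {entity} remained at {first:.2f}% "
                   f"from {first_year} to {last_year}.")
    else:
        direction = "rose" if last > first else "fell"
        summary = (f"{name} prevalence in {entity} {direction} from "
                   f"{first:.2f}% in {first_year} to {last:.2f}% in {last_year}.")
    # A simple pipe-separated table that screen readers can read row by row.
    table = ["Year | Prevalence (%)"]
    table += [f"{year} | {value:.2f}" for year, value in series]
    return "\n".join([summary, *table])


# Hypothetical values, not real figures.
print(text_alternative("England", "bipolar disorder", [(1990, 0.70), (2000, 0.72)]))
```

Generating the description from the data, rather than writing it by hand, keeps the text alternative in sync when the chart is updated.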
By designing dataset visualizations with accessibility in mind, it was possible to make
data more accessible and understandable for a wider range of users. This can be particularly
important for users who may rely on visualizations to understand and interpret data, such as
researchers, analysts, and decision-makers (Goldman et al., 2019). The visuals created
relied mainly on color to show the various trends required. I used high-contrast colors,
large font sizes, and clear labels to make the visualization easier to read for
people with visual impairments. I also simplified the visualization by using simple shapes,
patterns, and colors to represent the data, making it easier to understand for users with
cognitive impairments.
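Whether a color pair is actually high-contrast can be checked numerically rather than judged by eye. The Python sketch below implements the contrast-ratio formula defined in WCAG 2.x: compute the relative luminance of each sRGB color, then take (L1 + 0.05) / (L2 + 0.05) with L1 the lighter color. WCAG asks for at least 4.5:1 for normal text and 3:1 for graphical objects such as chart marks. The example colors are arbitrary and are not claimed to be those used in Figure 5.

```python
def relative_luminance(hex_color):
    """Relative luminance of an sRGB color as defined in WCAG 2.x."""
    hex_color = hex_color.lstrip("#")
    channels = [int(hex_color[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    # Undo the sRGB transfer curve to get linear light per channel.
    r, g, b = [c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
               for c in channels]
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(color_a, color_b):
    """WCAG contrast ratio, from 1 (no contrast) to 21 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(color_a), relative_luminance(color_b)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


# Arbitrary example colors, not necessarily those used in the figures.
for fg in ("#000000", "#1f77b4", "#ffff00"):
    ratio = contrast_ratio(fg, "#ffffff")
    print(f"{fg} on white: {ratio:.2f}:1, text AA: {ratio >= 4.5}, "
          f"graphics: {ratio >= 3}")
```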
Visualization Choice
A scatterplot is a type of data visualization that uses points to represent the values of
two different variables. It is often used to show the relationship between two variables, such
as the relationship between age and income (Murray, 2017). The choice of a scatterplot as the
visualization method can be justified based on the goal of the visualization. Since the goal is
to show the relationship between two variables, a scatterplot was an effective choice because it
allows the viewer to see the distribution of the data and how the two variables are correlated.
Scatterplots are particularly useful for identifying patterns and trends in the data, such as a
positive or negative correlation.
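The direction and strength of the correlation a scatterplot shows can be summarized by Pearson's correlation coefficient r, which ranges from -1 (perfect negative linear relationship) to 1 (perfect positive). The Python sketch below computes it from first principles; the paired values are hypothetical, and a high r describes association only, not causation.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / sqrt(sxx * syy)


# Hypothetical paired yearly values (%), not real figures.
anxiety = [3.8, 3.9, 3.9, 4.0, 4.2]
bipolar = [0.70, 0.71, 0.71, 0.72, 0.74]
print(f"r = {pearson_r(anxiety, bipolar):.3f}")
```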
Additionally, scatterplots can be enhanced with additional elements such as trend
lines or regression lines to help show the strength and direction of the relationship between
the two variables (Sullivan et al., 2017). This can be useful for making predictions or for
identifying outliers in the data. The scatterplot is a flexible and effective visualization method
for showing the relationship between two variables, making it a good choice for many data
analysis tasks.
There are several alternative visualization methods that can be used to show the
relationship between two variables, depending on the characteristics and goals of the data
analysis. Some possible alternatives to scatterplots include:
1. Line plots: Line plots show the relationship between two variables by connecting
data points with a line. They can be useful for showing trends over time or for
comparing multiple groups. However, line graphs can be difficult to compare when
there are multiple lines on the same graph, as the viewer has to mentally combine the
lines to compare the trends (Luo et al., 2018). This can be especially challenging
when the lines are closely spaced or have different scales. Additionally, line graphs do
not show the distribution of the data, so it can be difficult to see the underlying
patterns or identify outliers.
2. Bar charts: Bar charts can be used to show the relationship between two variables by
displaying one variable on the x-axis and the other on the y-axis. This can be useful
for comparing the distribution of the data or for showing the relationship between a
categorical variable and a continuous variable. The main downside of a bar chart for this
particular visualization is its limited ability to show large amounts of data.
When there are many data points, it can be difficult to fit all of the bars on a single
chart without making the chart cluttered or hard to read. Bar charts are also typically
used to compare values across the categories of a categorical variable, so they are not well
suited to showing trends over time or the relationship between two continuous variables.
3. Heat maps: Heat maps place one variable along the x-axis and the other along the
y-axis, divide the resulting plane into a grid of cells, and use color to encode a value for
each cell, such as the number of data points that fall in it. They can be useful for showing
patterns and trends in the data, particularly when there are many data points. However, heat
maps have limited ability to show individual data points: because the values within each
cell are aggregated and encoded as a color, it can be difficult to see individual data points
or to determine precise values from the map. Scatterplots, on the other hand, use individual
points to represent the data, making it easier to see and interpret the individual data points.
4. Bubble charts: Bubble charts are similar to scatterplots, but they use the size of the
points to represent a third variable. This can be useful for showing the relationship
between three variables or for adding an additional level of detail to the visualization.
Bubble charts can be difficult to interpret accurately, particularly when the data has a
wide range of values or when the bubbles are densely packed. This can make it
difficult to determine precise values from the chart. Additionally, bubble charts can be
more complex than scatterplots, as they require an additional variable to be encoded
as the size of the bubbles. This can make them harder to interpret and less suitable for
certain audiences.
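One reason bubble charts are hard to read accurately is that viewers compare bubble areas. If the third variable is mapped to the radius instead, a value twice as large is drawn four times as big. A common remedy is to make area proportional to the value, as in the Python sketch below; the maximum radius of 30 is an arbitrary assumption.

```python
from math import sqrt


def bubble_radius(value, max_value, max_radius=30.0):
    """Radius that makes bubble AREA, not radius, proportional to value."""
    if value < 0 or max_value <= 0:
        raise ValueError("values must be non-negative and max_value positive")
    # area = pi * r**2, so r must grow with the square root of the value.
    return max_radius * sqrt(value / max_value)


# A value one quarter of the maximum gets half the radius, and therefore
# a quarter of the area.
for v in (100, 50, 25):
    print(v, round(bubble_radius(v, 100), 2))
```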
The choice of visualization method will depend on the goals of the data analysis, the
characteristics of the data, and the preferences of the audience (Liu et al., 2018). It may be
helpful to experiment with different visualization methods to find the one that best
communicates the insights from the data.
Ethical Implications
The dataset used and the topic chosen both relate to health information about people
around the world (Martin, 2020). Using visualizations to represent health data can have
important ethical implications, as the way that the data is represented can significantly affect
how it is perceived and understood by the viewer (Pu and Kay, 2020). It is important to
consider the ethical implications of using visualizations in health datasets and to take steps to
ensure that the visualizations are used in a responsible and transparent manner (LaRossa and
Bennett, 2018). Some potential ethical considerations when using visualizations in health
datasets include:
1. Confidentiality: It is important to ensure that the visualizations do not reveal any
personal or identifying information about the individuals in the dataset, as this could
violate their privacy.
2. Misrepresentation: Visualizations can be misleading if they are not designed or used
correctly. It is important to ensure that the visualizations accurately and fairly
represent the data, and to avoid using techniques that might mislead the viewer.
3. Stereotypes: Visualizations can perpetuate stereotypes or biases if they are not
carefully designed and used. It is important to consider the potential impacts of the
visualizations on different groups of people and to ensure that they do not reinforce
harmful stereotypes.
4. Informed consent: It is important to obtain informed consent from individuals before
using their health data in visualizations, and to ensure that they understand how the
data will be used and shared.
Visualizations can be a powerful tool for communicating health data and insights, but
they can also be used to misinform the public or arrive at inaccurate conclusions if they are
not designed or used correctly (Nolte et al., 2018). One example is selective presentation
of data: a visualization may present only a portion of the health data, or exclude
certain data points, in order to support a particular viewpoint or conclusion. This can give a
misleading or incomplete picture of the data and lead to inaccurate conclusions about the
effectiveness of a treatment or the prevalence of a particular condition.
Some analysts could also use misleading scales in their visualization. The choice of
scale on an axis can significantly affect the appearance of the data and the conclusions that
are drawn from it (Sedrakyan et al., 2019). For example, starting an axis well above zero
(a truncated axis) can make small differences appear much larger than they are, which could
lead to overstating the importance of a particular treatment or risk factor. The way that
data is encoded in a visualization, such as the
use of color or the position of data points, can affect the conclusions that are drawn from it
(Cao, 2017). For example, using a particular color to represent a certain group or condition
can create unconscious biases in the viewer and lead to inaccurate conclusions.
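The distortion a truncated axis introduces can be quantified with Tufte's "lie factor": the size of an effect as shown in the graphic divided by its size in the data, where 1.0 means no distortion. The Python sketch below applies it to two bars; the prevalence values are hypothetical. With a full axis the lie factor is 1, whereas starting the axis at 3.8 makes a 10% relative rise look about twenty times larger than it is.

```python
def lie_factor(first, second, axis_min=0.0):
    """Tufte's lie factor for two bars drawn from an axis starting at axis_min.

    Effect size is the relative change from `first` to `second`; a value of
    1.0 means the graphic shows the change at its true size.
    """
    if not axis_min < min(first, second):
        raise ValueError("axis_min must lie below both values")
    if first == second:
        raise ValueError("values must differ for an effect to exist")
    effect_in_data = (second - first) / first
    # Drawn bar heights start at axis_min, not at zero.
    effect_shown = (second - first) / (first - axis_min)
    return effect_shown / effect_in_data


# Hypothetical prevalence values of 4.0% and 4.4% (a 10% relative rise).
print(lie_factor(4.0, 4.4))                 # full axis starting at zero
print(lie_factor(4.0, 4.4, axis_min=3.8))   # truncated axis
```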
Additionally, misleading titles or labels in visuals can lead to inaccurate conclusions.
The titles and labels on a visualization can shape the viewer's interpretation of the data (Qin
et al., 2020). Using misleading or biased titles or labels can lead to inaccurate conclusions
about the significance of the data or the implications for public health.
Proposals
• The visualizations can be improved by making them clearer and easier to read, for
example by using appropriate font sizes, colors, and layout.
• They can also be improved by adding context or background information that helps
the viewer to understand the data and its significance.
• These visualizations can be made more engaging and interactive by adding elements
such as hover-over text, filtering options, or zoom functionality.
• They can be made more accessible and inclusive by adding alternative
representations of the data, such as text descriptions or data tables.
• They can be made more accurate and transparent by ensuring that they represent the
data fairly, and by being open about the methods and sources used to create them.

References
Cao, L. (2017). Data science: a comprehensive overview. ACM Computing Surveys
(CSUR), 50(3), 1-42.
Goldman, M., Craft, B., Hastie, M., Repečka, K., McDade, F., Kamath, A., ... & Haussler, D.
(2019). The UCSC Xena platform for public and private cancer genomics data
visualization and interpretation. bioRxiv, 326470.
Kraak, M. J., & Ormeling, F. (2020). Cartography: visualization of geospatial data. CRC
Press.
LaRossa, R., & Bennett, L. A. (2018). Ethical dilemmas in qualitative family research. In The
psychosocial interior of the family (pp. 139-156). Routledge.
Lee, C. H., & Yoon, H. J. (2017). Medical big data: promise and challenges. Kidney research
and Clinical Practice, 36(1), 3.
Linderman, G. C., Rachh, M., Hoskins, J. G., Steinerberger, S., & Kluger, Y. (2019). Fast
interpolation-based t-SNE for improved visualization of single-cell RNA-seq
data. Nature Methods, 16(3), 243-245.
Liu, J., Tang, T., Wang, W., Xu, B., Kong, X., & Xia, F. (2018). A survey of scholarly data
visualization. IEEE Access, 6, 19205-19221.
Luo, Y., Qin, X., Tang, N., & Li, G. (2018, April). Deepeye: Towards automatic data
visualization. In 2018 IEEE 34th international conference on data engineering (ICDE)
(pp. 101-112). IEEE.
Martin, K. E. (2020). Ethical issues in the big data industry. In Strategic Information
Management (pp. 450-471). Routledge.
Mirman, D. (2017). Growth curve analysis and visualization using R. Chapman and
Hall/CRC.
Murray, S. (2017). Interactive data visualization for the web: an introduction to designing
with D3. O'Reilly Media.

Nolte, H., MacVicar, T. D., Tellkamp, F., & Krüger, M. (2018). Instant clue: a software suite
for interactive data visualization and analysis. Scientific Reports, 8(1), 1-8.
Pu, X., & Kay, M. (2020, April). A probabilistic grammar of graphics. In Proceedings of the
2020 CHI Conference on Human Factors in Computing Systems (pp. 1-13).
Qin, X., Luo, Y., Tang, N., & Li, G. (2020). Making data visualization more efficient and
effective: a survey. The VLDB Journal, 29(1), 93-117.
Rees, D., & Laramee, R. S. (2019, February). A survey of information visualization books. In
Computer Graphics Forum (Vol. 38, No. 1, pp. 610-646).
Sedrakyan, G., Mannens, E., & Verbert, K. (2019). Guiding the choice of learning dashboard
visualizations: Linking dashboard design and data visualization concepts. Journal of
Computer Languages, 50, 19-38.
Sullivan, B. L., Phillips, T., Dayer, A. A., Wood, C. L., Farnsworth, A., Iliff, M. J., ... &
Kelling, S. (2017). Using open access observational data for conservation action: A
case study for birds. Biological Conservation, 208, 5-14.
Tschandl, P., Rosendahl, C., & Kittler, H. (2018). The HAM10000 dataset, a large collection
of multi-source dermatoscopic images of common pigmented skin lesions. Scientific
Data, 5(1), 1-9.
Vieira, C., Parsons, P., & Byrd, V. (2018). Visual learning analytics of educational data: A
systematic literature review and research agenda. Computers & Education, 122, 119-135.
