
INF4000 Data Visualization

Student Name

Institution

Department

Course

Module

Lecturer

Submission date
Data Visualization

Knowledge Building
Data visualization refers to the methods used to communicate information visually by encoding it as graphical elements (Tschandl et al., 2018). The selected topic is “Prevalence of Mental Disorders and Substance Use Disorders”. Using data visualization, the selected dataset is analyzed to show how common psychological disorders are in various parts of the world, and the charts created also provide further information on these conditions over the last three decades. The dataset is made up of 9 columns and 6,840 rows. The nine columns are Entity, which holds the name of the country or region the data were collected for; Year; and seven columns holding the prevalence, as a percentage of the population, of specific disorders: schizophrenia, alcohol, drug use, anxiety, depression, bipolar and eating disorders. Even a first glance at the dataset suggests that depression and anxiety are the most prevalent of these psychological disorders.
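The claim that depression and anxiety are the most prevalent disorders can also be checked numerically rather than by eye. The sketch below is a minimal Python example (the essay itself used RStudio) that averages each disorder column and ranks the results. It assumes the CSV has been read into a list of dictionaries and that the disorder columns carry the short names listed above; the headers in the downloaded file may be longer and would need renaming first.

```python
import csv
from statistics import mean

# Assumed short column names, taken from the dataset description; the real
# CSV headers may differ and would need to be renamed before use.
DISORDER_COLUMNS = [
    "schizophrenia", "alcohol", "drug use", "anxiety",
    "depression", "bipolar", "eating disorders",
]


def load_rows(path):
    """Read the CSV into a list of dicts, one per (Entity, Year) row."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def rank_disorders(rows):
    """Return (disorder, mean prevalence in %) pairs, highest mean first.

    Blank or missing cells are skipped rather than treated as zero.
    """
    averages = {}
    for column in DISORDER_COLUMNS:
        values = [float(r[column]) for r in rows if r.get(column) not in (None, "")]
        if values:
            averages[column] = mean(values)
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)
```

For example, `rank_disorders(load_rows("mental_disorders.csv"))[:2]` would return the two highest-ranked disorders, where the file name is a placeholder. Because the Entity column mixes countries with continents and regions, this plain mean is only a rough check; filtering to countries first would give a cleaner comparison.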

Datasets are often created for specific research or practical purposes and can be obtained from a variety of sources, such as government agencies, research institutions and online databases (Vieira et al., 2018). In some cases, a researcher or organization may create their own dataset by collecting data through surveys, experiments or other methods. There are many possible sources of datasets on mental disorders. Some examples include:

• Government agencies and international bodies, such as the Centers for Disease Control and Prevention (CDC) or the World Health Organization (WHO), which may collect data on the prevalence and treatment of mental disorders.
• Research institutions and universities, which may conduct studies on mental disorders and make their data available to the public.
• Online databases, such as Kaggle or the Open Science Framework, which may host datasets on mental disorders shared by researchers.



My choice of this particular dataset on mental disorders was informed largely by the visualization topic chosen, along with a variety of other factors, including the research question being addressed, the availability of the data and the suitability of the data for the intended analysis (Mirman, 2017). It is important to consider carefully the limitations and biases of any dataset, and to cite the source of the data properly in any published work.

Several observations emerged from visualizing the dataset. First, the dataset contains records from all over the world, broken down by country, by continent and by region. To keep the visualizations fast and readable, the data had to be filtered to particular continents, regions and countries. For instance, filtering the data for England shows that bipolar disorder has been increasing over the last three decades, as seen in Figure 1 below.

The charts also suggest that the condition is associated mainly with depression and anxiety, although the association with anxiety appears only towards the end of the last decade. The visualization also shows that the depression component plotted alongside bipolar disorder in England is declining over the years. Figure 2 below shows bipolar disorder prevalence in England together with the other two factors, anxiety and depression.

Figure 1: England’s Bipolar disorder with anxiety



Figure 2: England chart on Bipolar against years

Figure 3: England Bipolar chart with trend lines

As shown in Figure 3, the trend lines show steady growth in bipolar cases in England. This visualization was motivated by the fact that most people diagnosed with bipolar disorder also have an anxiety disorder, such as post-traumatic stress disorder (PTSD), generalized anxiety disorder (GAD), panic disorder or social phobia (Lee and Yoon, 2017). Anxiety and depression, either on their own or in conjunction with another mental health condition, have been linked to a heightened likelihood of suicidal behavior and relationship problems.



Figure 4: Depression in England is declining

The visualizations above show that depression levels were high at the beginning of the period studied. Also notable is a slight increase in depression immediately after 2010, which could be a consequence of the economic meltdown and inflation that followed 2008.
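Whether depression really rises just after 2010 can be confirmed from the underlying numbers rather than from the chart alone. A minimal sketch, assuming the England depression values have already been collected into a dictionary mapping each year to its prevalence in percent; the helper names are hypothetical.

```python
def rising_years(series):
    """Return the years whose value is higher than that of the previous year.

    `series` is assumed to map year -> prevalence (%); years are compared in
    chronological order, so gaps in the record are simply stepped over.
    """
    years = sorted(series)
    return [year for prev, year in zip(years, years[1:]) if series[year] > series[prev]]


def change_between(series, start, end):
    """Percentage-point change in prevalence from year `start` to year `end`."""
    return series[end] - series[start]
```

A run of rising years beginning in 2011, together with a small value from `change_between`, would support the observation of a slight increase; on its own it would not show that the 2008 crisis was the cause.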

Theoretical Framework
ASSERT Framework
The ASSERT model comprises the following six components: Ask a question to be answered by the visualization work; Search for possible answers to the question by looking for evidence; Structure this information so that it can provide a response to the query; Envision other ways to respond to the question using the information currently accessible; Represent the data in a relevant visualization for the purpose of answering the question; and, finally, Tell a meaningful story with it (Rees and Laramee, 2019). The diagram below shows the stages of the ASSERT framework as applied in this work.

Ask
The question asked for the visualizations above is: “What is the relationship of bipolar disorder to anxiety and depression disorders?” The question makes clear which specific data need to be searched for on the internet or in the dataset being analyzed.

Search
The dataset was obtained from the Our World in Data website, https://ourworldindata.org/. This dataset matches the type of visualization envisioned for this task.

Structure
The data were filtered by country so that only the records for England were analyzed.
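The filtering step can be sketched in Python as follows. The "Entity" and "Year" column names come from the dataset description; the short disorder column name ("bipolar") and the helper names are assumptions made for illustration, and the rows are assumed to be dictionaries such as `csv.DictReader` produces.

```python
def filter_entity(rows, entity="England"):
    """Keep only the rows for one country or region, in year order."""
    # "Entity" and "Year" follow the dataset description.
    selected = [row for row in rows if row["Entity"] == entity]
    return sorted(selected, key=lambda row: int(row["Year"]))


def to_series(rows, column):
    """Turn filtered rows into a {year: prevalence %} dictionary for plotting."""
    # `column` is an assumed short name such as "bipolar" or "anxiety".
    return {int(row["Year"]): float(row[column]) for row in rows}
```

For example, `to_series(filter_entity(rows), "bipolar")` would give the England bipolar series in a form that can be passed straight to a plotting call.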

Envision
I searched the internet for previous visualizations related to mental disorders, especially in England, to see how they were visualized and what kinds of questions they answered.

Represent
RStudio was used to design scatterplots exploring the relationship between bipolar disorder and both anxiety and depression.

Tell
The results of the visualizations were explained and conclusions were drawn from them.

Grammar of Graphics
Data: The data for this visualization consist of a table with nine columns: “Entity”, “Year”, “schizophrenia”, “alcohol”, “drug use”, “anxiety”, “depression”, “bipolar” and “eating disorders”. Each row gives, for one country or region in one year, the percentage of the population living with each of these psychological disorders.

Aesthetics: The variable “Anxiety” is mapped to the color aesthetic, so each point is colored according to anxiety prevalence. The variable “Year” is mapped to the x-axis position, so points are placed along the x-axis according to their year. The variable “Bipolar” is mapped to the y-axis position, so the height of each point represents the bipolar disorder prevalence of the corresponding country in the corresponding year.

Geometry: The geometry used in this visualization is points, with each point representing the prevalence of bipolar disorder in a particular year.

Scales: The y-axis uses a linear scale, with the minimum value set to 0 and the maximum value determined by the maximum bipolar prevalence in the data. The x-axis uses an ordinal scale, with the values corresponding to the years in the data.

Coordinate systems: The visualization uses a Cartesian coordinate system, with the x-axis representing the years and the y-axis representing bipolar disorder prevalence.

Annotations: The title and axis labels provide additional context for the visualization.
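The layers above can be written down as a small declarative specification, which makes explicit what each scale has to compute. The Python sketch below is illustrative only: plotting libraries such as ggplot2 derive these scales automatically, and the class, field and function names are hypothetical. It builds the ordinal x scale (one evenly spaced slot per year) and the linear y scale running from 0 to the largest bipolar value, then maps a value onto the 0 to 1 range of the y-axis.

```python
from dataclasses import dataclass


@dataclass
class PlotSpec:
    """Hypothetical grammar-of-graphics description of the England chart."""
    x: str = "Year"          # position aesthetic, ordinal scale
    y: str = "bipolar"       # position aesthetic, linear scale starting at 0
    color: str = "anxiety"   # color aesthetic (continuous gradient)
    geom: str = "point"      # one point per row
    title: str = "Bipolar disorder prevalence in England"


def build_scales(rows, spec):
    """Return (x_scale, y_domain) for the rows being plotted.

    x_scale maps each distinct year to an evenly spaced slot (ordinal);
    y_domain is the (min, max) of the linear y scale, fixed to start at 0.
    """
    years = sorted({int(row[spec.x]) for row in rows})
    x_scale = {year: slot for slot, year in enumerate(years)}
    y_domain = (0.0, max(float(row[spec.y]) for row in rows))
    return x_scale, y_domain


def y_position(value, y_domain):
    """Map a data value to 0..1 along the y-axis (Cartesian, linear)."""
    low, high = y_domain
    return (value - low) / (high - low)
```

Keeping the specification separate from the drawing code mirrors the grammar-of-graphics idea that data, aesthetics, geometry and scales are independent choices that can each be changed without rewriting the others.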

Accessibility
Accessibility in visualization refers to the design and use of visualizations in a way that is inclusive and usable for people with a wide range of abilities and disabilities. This includes considerations such as visual acuity, color perception, and cognitive abilities, as well as factors such as cultural and linguistic diversity (Linderman et al., 2019). By considering these factors, visualizations can be made more accessible and usable for a wider audience.

Accessible visualizations are designed to represent data in a way that is easy to understand and interact with, even for users who have visual, hearing, cognitive or motor impairments. The figure below shows a chart designed with high-contrast colors to ensure visibility.

Figure 5: England chart with trendlines

There are a number of different approaches that can be taken when creating accessible dataset visualizations. For example, designers can use high-contrast colors, large font sizes and clear labels to make the visualization easier to read for people with visual impairments (Kraak and Ormeling, 2020). They can also provide audio descriptions of the visualization, allowing users who are blind or have low vision to access the information. Additionally, they can simplify the visualization by using simple shapes, patterns and colors to represent the data, making it easier to understand for users with cognitive impairments. Finally, they can make the visualization touch-based, allowing users with motor impairments to interact with the data using touch.
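Whether a color pairing is "high contrast" can be checked numerically rather than by eye. The sketch below implements the WCAG 2.x contrast-ratio calculation: each sRGB color is converted to a relative luminance L, and the ratio is (L_lighter + 0.05) / (L_darker + 0.05). WCAG asks for at least 4.5:1 for normal-size text and 3:1 for graphical objects such as plotted points and lines. The function names are illustrative, and the hex colors mentioned are examples, not the palette used in Figure 5.

```python
def _linear_channel(value):
    """Convert one 0-255 sRGB channel to linear light, as WCAG 2.x defines it."""
    c = value / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color):
    """Relative luminance (0 = black, 1 = white) of a color such as '#1f77b4'."""
    digits = hex_color.lstrip("#")
    r, g, b = (int(digits[i:i + 2], 16) for i in (0, 2, 4))
    return (0.2126 * _linear_channel(r)
            + 0.7152 * _linear_channel(g)
            + 0.0722 * _linear_channel(b))


def contrast_ratio(color_a, color_b):
    """WCAG contrast ratio, from 1.0 (identical colors) up to 21.0 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(color_a), relative_luminance(color_b)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)
```

For instance, `contrast_ratio("#767676", "#ffffff")` comes out just above 4.5, while the slightly lighter `#777777` on white falls just below it, which shows how narrow the margin can be for mid-grey labels.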

There are many benefits to accessible dataset visualizations. By making data more accessible and understandable, they can help to promote more informed decision-making and a better understanding of complex concepts. They can also help to break down barriers to information and ensure that all users, regardless of their abilities, have equal access to data and the insights it can provide.

Accessible visualizations are an important tool for making data more accessible and understandable for all users. By designing visualizations with accessibility in mind, we can help to ensure that everyone has the opportunity to engage fully with and understand the data that is so central to our world today. The main principles are:

• Visual clarity: Visualizations should be clear and easy to read, using appropriate font sizes, colors and layout.
• Color accessibility: Colors should be chosen and used so that they are legible and distinguishable for people with different types of color vision impairment.
• Alternative representations: Visualizations should provide alternative representations of the data, such as text descriptions or data tables, so that users with visual impairments can access the information.
• Usability: Visualizations should be easy to use and navigate, with clear labels and interactive elements that are easy to understand and operate.
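The "alternative representations" principle can be partly automated by generating a short text description from the same numbers that drive a chart. A minimal sketch; the function name and sentence template are illustrative assumptions, and a full description would also state what the axes and colors encode.

```python
def describe_trend(series, measure, place):
    """One-sentence text alternative for a time-series chart.

    `series` is assumed to map year -> value in %; only the first and last
    years are compared, so this summarizes the overall direction, not every change.
    """
    years = sorted(series)
    first, last = years[0], years[-1]
    start, end = series[first], series[last]
    if end == start:
        return f"{measure} in {place} was {start:.2f}% in both {first} and {last}."
    direction = "rose" if end > start else "fell"
    return f"{measure} in {place} {direction} from {start:.2f}% in {first} to {end:.2f}% in {last}."
```

The resulting sentence can be used as alt text for the chart image or placed under it as a caption, so that screen-reader users receive the main trend without having to interpret the plot.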

By designing the dataset visualizations with accessibility in mind, it was possible to make the data more accessible and understandable for a wider range of users. This can be particularly important for users who rely on visualizations to understand and interpret data, such as researchers, analysts and decision-makers (Goldman et al., 2019). The visuals created relied mainly on color to show the various trends required. I used high-contrast colors, large font sizes and clear labels to make the visualizations easier to read for people with visual impairments. I also simplified the visualizations by using simple shapes, patterns and colors to represent the data, making them easier to understand for users with cognitive impairments.

Visualization Choice
A scatterplot is a type of data visualization that uses points to represent the values of two different variables. It is often used to show the relationship between two variables, such as the relationship between age and income (Murray, 2017). The choice of a scatterplot as the visualization method can be justified by the goal of the visualization. Since the goal is to show the relationship between two variables, a scatterplot was an effective choice because it allows the viewer to see the distribution of the data and how the two variables are correlated. Scatterplots are particularly useful for identifying patterns and trends in the data, such as a positive or negative correlation.
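The strength and direction of the linear relationship shown in a scatterplot are usually summarized by Pearson's correlation coefficient r. A self-contained sketch follows (Python 3.10 and later also provide `statistics.correlation`, which computes the same quantity); applying it to bipolar and anxiety values for the same years is an assumed use, not a result reported in this essay.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences.

    Returns a value between -1 (perfect negative linear relationship) and
    +1 (perfect positive); values near 0 mean little linear relationship.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length, at least 2 values each")
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / (spread_x * spread_y)
```

Note that r only measures linear association: a strong r between bipolar and anxiety prevalence would support describing the two as related, but not as one causing the other.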

Additionally, scatterplots can be enhanced with additional elements such as trend lines or regression lines to help show the strength and direction of the relationship between the two variables (Sullivan et al., 2017). This can be useful for making predictions or for identifying outliers in the data. The scatterplot is a flexible and effective visualization method for showing the relationship between two variables, making it a good choice for many data analysis tasks.
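A trend line is commonly an ordinary least-squares fit. The sketch below is a minimal stand-alone version (in R, `lm()` or ggplot2's `geom_smooth(method = "lm")` serve the same purpose); the essay does not state which fitting method produced its trend lines, so treat this as an assumption. The residuals are the vertical distances from each point to the line, and unusually large ones are candidate outliers.

```python
def fit_trend_line(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length, at least 2 points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residuals(xs, ys, slope, intercept):
    """Vertical distance of each point from the fitted line."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]
```

With years on the x-axis and prevalence on the y-axis, a positive slope is the average increase in percentage points per year, which is the number behind a phrase such as "steady growth".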

There are several alternative visualization methods that can be used to show the relationship between two variables, depending on the characteristics and goals of the data analysis. Some possible alternatives to scatterplots include:

1. Line plots: Line plots show the relationship between two variables by connecting data points with a line. They can be useful for showing trends over time or for comparing multiple groups. However, line plots can be difficult to compare when there are multiple lines on the same graph, as the viewer has to mentally combine the lines to compare the trends (Luo et al., 2018). This can be especially challenging when the lines are closely spaced or have different scales. Additionally, line plots do not show the distribution of the data, so it can be difficult to see the underlying patterns or identify outliers.

2. Bar charts: Bar charts can be used to show the relationship between two variables by displaying one variable on the x-axis and the other on the y-axis. This can be useful for comparing the distribution of the data or for showing the relationship between a categorical variable and a continuous variable. The downside of a bar chart for this particular visualization is its limited ability to show large amounts of data: when there are many data points, it can be difficult to fit all of the bars on a single chart without making it cluttered or hard to read. Bar charts are also typically used to compare values across the categories of a categorical variable, so they are not well suited to showing trends over time or the relationship between two continuous variables.

3. Heat maps: Heat maps show the relationship between two variables by placing one variable on the x-axis and the other on the y-axis, dividing the plot into cells and using color to encode a value for each cell, such as the number of observations it contains. They can be useful for showing patterns and trends in the data, particularly when there are many data points. However, heat maps have limited ability to show individual data points: because color encodes a summary value for each cell, it can be difficult to see the individual data points or to determine precise values from the map. Scatterplots, on the other hand, use individual points to represent the data, making it easier to see and interpret the individual data points.

4. Bubble charts: Bubble charts are similar to scatterplots, but they use the size of the points to represent a third variable. This can be useful for showing the relationship between three variables or for adding an additional level of detail to the visualization. Bubble charts can be difficult to interpret accurately, particularly when the data has a wide range of values or when the bubbles are densely packed. This can make it difficult to determine precise values from the chart. Additionally, bubble charts can be more complex than scatterplots, as they require an additional variable to be encoded as the size of the bubbles. This can make them harder to interpret and less suitable for certain audiences.
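Part of the difficulty is that readers compare bubble areas, and area is judged less precisely than position. A further common error is to make the radius proportional to the value, which makes a value twice as large look four times as big. A minimal sketch of area-proportional sizing; the function names and the default maximum radius (an arbitrary number of pixels) are assumptions.

```python
from math import pi, sqrt


def bubble_radius(value, max_value, max_radius=20.0):
    """Radius that makes bubble *area* proportional to `value`.

    The largest value gets `max_radius` (assumed to be in pixels); a value
    half as large gets half the area, i.e. a radius scaled by sqrt(0.5).
    """
    if value < 0 or max_value <= 0:
        raise ValueError("values must be non-negative and max_value positive")
    return max_radius * sqrt(value / max_value)


def bubble_area(radius):
    """Area of a circular bubble with the given radius."""
    return pi * radius ** 2
```

Most plotting libraries size markers by area by default, but it is worth confirming this before relying on a bubble chart to compare values.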

The choice of visualization method will depend on the goals of the data analysis, the characteristics of the data, and the preferences of the audience (Liu et al., 2018). It may be helpful to experiment with different visualization methods to find the one that best communicates the insights from the data.

Ethical Implications
The dataset used and the topic chosen both relate to health information about people around the world (Martin, 2020). Using visualizations to represent health data can have important ethical implications, as the way the data are represented can significantly affect how they are perceived and understood by the viewer (Pu and Kay, 2020). It is important to consider the ethical implications of using visualizations with health datasets and to take steps to ensure that the visualizations are used in a responsible and transparent manner (LaRossa and Bennett, 2018). Some potential ethical considerations when using visualizations with health datasets include:

1. Confidentiality: It is important to ensure that the visualizations do not reveal any personal or identifying information about the individuals in the dataset, as this could violate their privacy.

2. Misrepresentation: Visualizations can be misleading if they are not designed or used correctly. It is important to ensure that the visualizations accurately and fairly represent the data, and to avoid using techniques that might mislead the viewer.

3. Stereotypes: Visualizations can perpetuate stereotypes or biases if they are not carefully designed and used. It is important to consider the potential impacts of the visualizations on different groups of people and to ensure that they do not reinforce harmful stereotypes.

4. Informed consent: It is important to obtain informed consent from individuals before using their health data in visualizations, and to ensure that they understand how the data will be used and shared.

Visualizations can be a powerful tool for communicating health data and insights, but they can also be used to misinform the public or lead to inaccurate conclusions if they are not designed or used correctly (Nolte et al., 2018). One example is the selective presentation of data: visualizations can be used to present only a portion of the health data, or to exclude certain data points, in order to support a particular viewpoint or conclusion. This can give a misleading or incomplete picture of the data and lead to inaccurate conclusions about the effectiveness of a treatment or the prevalence of a particular condition.

Some analysts may also use misleading scales in their visualizations. The choice of scale on an axis can significantly affect the appearance of the data and the conclusions drawn from it (Sedrakyan et al., 2019). For example, starting an axis well above zero, or stretching it over a narrow range, can make small differences appear much larger than they are, which could lead to overstating the importance of a particular treatment or risk factor. The way that data are encoded in a visualization, such as the use of color or the position of data points, can also affect the conclusions that are drawn (Cao, 2017). For example, using a particular color to represent a certain group or condition can create unconscious biases in the viewer and lead to inaccurate conclusions.
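The effect of a truncated axis can be quantified for bar charts, where readers compare bar lengths. The sketch below (function name hypothetical) computes how many times longer one bar looks than another when the axis starts at a given baseline instead of zero. With values of 40 and 42 the true ratio is 1.05, but with the axis starting at 39 the second bar is drawn three times as long as the first.

```python
def apparent_ratio(smaller, larger, baseline=0.0):
    """How many times longer the `larger` bar looks than the `smaller` one.

    With baseline 0 this equals the true ratio of the values; raising the
    baseline (truncating the axis) inflates the visual difference.
    """
    if smaller <= baseline or larger <= baseline:
        raise ValueError("both values must lie above the axis baseline")
    return (larger - baseline) / (smaller - baseline)
```

For dot or line charts, such as the scatterplots used here, a non-zero axis start can be legitimate because readers judge position rather than length, provided the axis range is clearly labeled.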

Misleading titles or labels can also distort interpretation. The titles and labels on a visualization shape the viewer's interpretation of the data (Qin et al., 2020), so misleading or biased wording can lead to inaccurate conclusions about the significance of the data or its implications for public health.

Proposals
• The visualizations can be improved by making them clearer and easier to read, for example by using appropriate font sizes, colors and layout.
• They can also be improved by adding context or background information that helps the viewer to understand the data and its significance.
• These visualizations can be made more engaging and interactive by adding elements such as hover-over text, filtering options or zoom functionality.
• They can be made more accessible and inclusive by adding alternative representations of the data, such as text descriptions or data tables.
• Accuracy and transparency can be ensured by making sure the visualizations represent the data accurately and fairly, and by being transparent about the methods and sources used to create them.



References
Cao, L. (2017). Data science: A comprehensive overview. ACM Computing Surveys (CSUR), 50(3), 1-42.

Goldman, M., Craft, B., Hastie, M., Repečka, K., McDade, F., Kamath, A., ... & Haussler, D. (2019). The UCSC Xena platform for public and private cancer genomics data visualization and interpretation. bioRxiv, 326470.

Kraak, M. J., & Ormeling, F. (2020). Cartography: Visualization of geospatial data. CRC Press.

LaRossa, R., & Bennett, L. A. (2018). Ethical dilemmas in qualitative family research. In The psychosocial interior of the family (pp. 139-156). Routledge.

Lee, C. H., & Yoon, H. J. (2017). Medical big data: Promise and challenges. Kidney Research and Clinical Practice, 36(1), 3.

Linderman, G. C., Rachh, M., Hoskins, J. G., Steinerberger, S., & Kluger, Y. (2019). Fast interpolation-based t-SNE for improved visualization of single-cell RNA-seq data. Nature Methods, 16(3), 243-245.

Liu, J., Tang, T., Wang, W., Xu, B., Kong, X., & Xia, F. (2018). A survey of scholarly data visualization. IEEE Access, 6, 19205-19221.

Luo, Y., Qin, X., Tang, N., & Li, G. (2018, April). DeepEye: Towards automatic data visualization. In 2018 IEEE 34th International Conference on Data Engineering (ICDE) (pp. 101-112). IEEE.

Martin, K. E. (2020). Ethical issues in the big data industry. In Strategic Information Management (pp. 450-471). Routledge.

Mirman, D. (2017). Growth curve analysis and visualization using R. Chapman and Hall/CRC.

Murray, S. (2017). Interactive data visualization for the web: An introduction to designing with D3. O'Reilly Media, Inc.

Nolte, H., MacVicar, T. D., Tellkamp, F., & Krüger, M. (2018). Instant clue: A software suite for interactive data visualization and analysis. Scientific Reports, 8(1), 1-8.

Pu, X., & Kay, M. (2020, April). A probabilistic grammar of graphics. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (pp. 1-13).

Qin, X., Luo, Y., Tang, N., & Li, G. (2020). Making data visualization more efficient and effective: A survey. The VLDB Journal, 29(1), 93-117.

Rees, D., & Laramee, R. S. (2019, February). A survey of information visualization books. In Computer Graphics Forum (Vol. 38, No. 1, pp. 610-646).

Sedrakyan, G., Mannens, E., & Verbert, K. (2019). Guiding the choice of learning dashboard visualizations: Linking dashboard design and data visualization concepts. Journal of Computer Languages, 50, 19-38.

Sullivan, B. L., Phillips, T., Dayer, A. A., Wood, C. L., Farnsworth, A., Iliff, M. J., ... & Kelling, S. (2017). Using open access observational data for conservation action: A case study for birds. Biological Conservation, 208, 5-14.

Tschandl, P., Rosendahl, C., & Kittler, H. (2018). The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. Scientific Data, 5(1), 1-9.

Vieira, C., Parsons, P., & Byrd, V. (2018). Visual learning analytics of educational data: A systematic literature review and research agenda. Computers & Education, 122, 119-135.
