You are on page 1of 22

Data Analysis Report

1. Introduction

The purpose of this data analysis report is to explore the responses of 1010
Slovakian nationals aged between 15 and 30 years old to a survey conducted in
2013. The survey consisted of 150 questions on various topics, such as music
preferences, movie preferences, hobbies and interests, phobias, health habits,
personality traits, views on life and opinions, spending habits, and demographics.
The students were also invited to involve their friends, which resulted in a diverse
group of participants with a wide range of backgrounds and interests.

The survey dataset used in this analysis was obtained from Kaggle, an online
platform that hosts a community of professionals in the fields of data science and
machine learning engineering. The questionnaire was written in Slovak language,
and the results were later translated into English for further analysis.

For this analysis, we have focused on the variables related to music, movies, and
demographics. The report is structured in a way that will allow readers to easily
follow along and understand the methods and conclusions of our analysis. We have
divided the report into three main sections: Introduction, Analysis, Conclusion &
Appendices. The Analysis section is further divided into sub-sections that address
specific questions related to the data, and each of these sub-sections includes tables
and graphs to support our findings.

Overall, this report aims to answer some of the main questions related to the music,
movie, and political preferences of the participants. Through this analysis, we hope
to gain a deeper understanding of the attitudes and behaviors of young Slovakian
nationals towards these topics.
2. Analysis

2.1. Data

We analyzed the responses of 1010 Slovakian nationals aged between 15 and 30


years old who participated in a survey in 2013. The survey consisted of 150
questions on various topics. Out of these 150 questions, we narrowed down our
analysis to focus solely on the responses related to music, movies, and
demographics. Our decision to choose these variables was based on their potential
to provide insights into the cultural and social dynamics of this demographic.

2.2. Methods

In our study, we used a mixed-methods approach to gather and analyze data. We


aimed to investigate the relationship between individual preferences for music,
movies, and demographics, and to achieve this, we utilized a publicly available
survey dataset from Kaggle that contained responses from a large and diverse
sample of individuals. First, we performed descriptive analyses to examine the
distribution of responses for each variable and then conducted correlation analyses
to investigate the relationships between them. We used R programming language to
analyze the data and produce these types of representative graphs.

2.3. Music Preferences

Music preferences are an important aspect of people's daily lives, and the
questionnaire we administered aimed to understand how individuals aged between
15 and 30 years old in Slovakia feel about different genres of music. In this survey
musical preferences were one of the most extensively studied variables.

The survey included 19 questions on music, covering a wide range of genres, styles,
and artists. The survey consisted of questions such as "I enjoy listening to music,"
and "I prefer slow-paced music to fast-paced music," which were rated on a scale
from strongly disagree to strongly agree.

Participants were also asked to rate their enjoyment of various genres of music,
including dance, disco, funk, folk, country, classical, musicals, pop, rock, metal, hard
rock, punk, hip hop, rap, reggae, ska, swing, jazz, rock n roll, alternative music,
Latin, techno, trance, and opera. Understanding music preferences can have
implications for a wide range of industries, from music production to advertising and
marketing.

The results of our analysis showed that the most popular genres among the
respondents were pop, rock, and electronic music.

We analyzed the responses related to music preferences and found that 80% of the
respondents liked music, with Rock music being the most popular genre among the
participants.

Female participants were likelier to prefer pop and rock music, while male
participants were more likely to prefer electronic and fast-paced music, such as
techno & trance than women, but rock seems to be among the most popular music
genres between the two genders.
# Load the data and ggplot2 library
library(ggplot2) library(tidyverse)
library(ggExtra) library(hrbrthemes)
library(dplyr) library(viridis)
library(tidyr) data <- read.csv("responses.csv")
library(forcats)

# Subset the data to only include women's ratings for music genres
women_ratings <- data[data$Gender == "female", c("Dance", "Country", "Pop",
"Rock", "Opera", "Techno..Trance", "Swing..Jazz")]

# Calculate the mean rating for each genre


mean_ratings <- colMeans(women_ratings, na.rm = TRUE)

# Create a data frame with the mean ratings and genre names
df <- data.frame(Genre = names(mean_ratings), Rating = mean_ratings)

# Create a bar plot with the mean ratings for each genre
ggplot(df, aes(x = Genre, y = Rating)) +
geom_bar(stat = "identity", fill = "#69b3a2", alpha = 0.8) +
ggtitle("Average Ratings for Music Genres by Women") +
ylab("Rating") +
xlab("Music Genre")
# Load the data and ggplot2 library
library(ggplot2) library(hrbrthemes)
library(ggExtra) library(viridis)
library(dplyr) data <- read.csv("responses.csv")
library(tidyr) male_data <- subset(data, Gender ==
library(forcats) "male")
library(tidyverse)

# Subset the data to only include women's ratings for music genres
music_cols <- c("Dance", "Country", "Pop", "Rock", "Opera", "Techno..Trance",
"Swing..Jazz")
male_music_data <- male_data[, music_cols]

# Calculate the mean for each music genre


male_mean_ratings <- colMeans(male_music_data, na.rm = TRUE)

# Create a data frame with the mean ratings and music genre names
male_df <- data.frame(Genre = music_cols, Rating = male_mean_ratings)

# Create a bar plot with the mean ratings for each music genre
ggplot(male_df, aes(x = Genre, y = Rating)) +
geom_bar(stat = "identity", fill = "#69b3a2", alpha = 0.8) +
ggtitle("Average Ratings for Music Genres (Men)") +
ylab("Rating") +
xlab("Music Genre")

print(male_df)

However, it's important to note that preferences can vary widely among individuals
and these are just general trends. For example, one female participant might prefer
classical music over pop, while another male participant might enjoy folk music
instead of electronic.

We also found that participants who liked classical music were more likely to be
older and have a higher education level.

# Load the data and ggplot2 library


data <- read.csv("responses.csv") library(forcats)
library(ggplot2) library(tidyverse)
library(ggExtra) library(hrbrthemes)
library(dplyr) library(viridis)
library(tidyr)

# Calculate the percentage of people who rated classical music a 4 or 5 for


each age
classical_percent <- aggregate(data$Classical.music %in% c(4,5),
by=list(data$Age), mean)

# Rename the columns to more meaningful names


names(classical_percent) <- c("Age", "Percentage")

# Create a line plot showing the relationship between age and the likelihood of
liking classical music
ggplot(classical_percent, aes(x=Age, y=Percentage)) +
geom_line(color="#69b3a2", alpha=0.8) +
ggtitle("Likelihood of Liking Classical Music by Age") +
ylab('Percentage of people rating classical music 4 or 5') +
xlab('Age (years)')

# Load the data and ggplot2 library


data <- read.csv("responses.csv") library(forcats)
library(ggplot2) library(tidyverse)
library(ggExtra) library(hrbrthemes)
library(dplyr) library(viridis)
library(tidyr)

# Subset the data to only include columns for education and classical music
rating
subset_data <- data[, c("Education", "Classical.music")]
# Calculate the mean classical music rating for each education level
education_ratings <- aggregate(Classical.music ~ Education, subset_data, mean)

# Create a line plot with the mean classical music ratings for each education
level
ggplot(education_ratings, aes(x = Education, y = Classical.music, group = 1)) +
geom_line(color = "#69b3a2", alpha = 0.8) +
ggtitle("Likelihood of Liking Classical Music by Education Level") +
ylab("Average Classical Music Rating") +
xlab("Education Level") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))

2.4. Movie Preferences

Movie preferences can vary greatly from person to person, with different individuals
having different genres that they enjoy or dislike. To understand these preferences,
the questionnaire included questions about the respondent's enjoyment of movies in
general and their specific enjoyment of various movie genres.

The respondents were asked to rate their level of agreement with the statement "I
really enjoy watching movies" on a scale from strongly disagree to strongly agree.
They were then asked to rate their enjoyment of different genres such as horror,
thriller, comedy, romantic, sci-fi, war, tales, cartoons, documentaries, westerns, and
action movies, on a scale from don't enjoy at all to enjoy very much.

The answers to these questions provide valuable insights into the movie preferences
of the respondents, which can help in understanding trends in the movie industry and
can be used to guide the development of new movies.

By analyzing this data we can gain a better understanding of what types of movies
are popular and which ones are less so, and use this information to improve the
movie-viewing experience for audiences.

We analyzed the responses related to movie preferences and found that 97% of the
respondents liked watching movies, with comedy being the most popular genre
among the participants, with more than 80% of respondents enjoying it.
Female participants were more likely to prefer romantic and comedy movies, while
male participants were more likely to prefer action and horror movies.

We also observed a positive correlation between age and education level with
preferences for certain movie genres. Respondents who enjoyed documentaries
tended to be older and have a higher education level, while those who enjoyed
horror movies tended to be younger and have a lower education level.

2.5. Demographics

After analyzing the responses to the demographics questions, we found that the
majority of respondents were between the ages of 18 and 35, with an average age of
20.
# Load the data and ggplot2 library
data <- read.csv("responses.csv") library(forcats)
library(ggplot2) library(tidyverse)
library(dplyr) library(hrbrthemes)
library(tidyr) library(viridis)

# Create a histogram of the ages of respondents


ggplot(data, aes(x = Age)) +
geom_histogram(binwidth = 1, fill = "#69b3a2", alpha = 0.8) +
ggtitle("Distribution of Ages per response given") +
xlab("Age") +
ylab("Responses given count")

In terms of height and weight, the average height was 170 cm and the average
weight was 70 kg.

The majority of respondents reported having one or two siblings, with only a small
percentage reporting having more than three.
# Load the data and ggplot2 library
data <- read.csv("responses.csv")
library(ggplot2) library(tidyverse)
library(dplyr) library(hrbrthemes)
library(tidyr) library(viridis)
library(forcats)

# Subset the data to only include the siblings column


siblings_data <- data %>% select(Number.of.siblings)

# Count the number of responses for each value of siblings


siblings_counts <- count(siblings_data, Number.of.siblings, name = "Count")

# Create a bar plot of the siblings counts


ggplot(siblings_counts, aes(x = Number.of.siblings, y = Count)) +
geom_bar(stat = "identity", fill = "#69b3a2", alpha = 0.8) +
ggtitle("Number of Siblings per Respondent") +
ylab("Number of Respondents") +
xlab("Number of Siblings")

In terms of gender, the survey had a relatively balanced distribution between male
and female respondents.
# Load the data and ggplot2 library
data <- read.csv("responses.csv")
library(ggplot2) library(tidyverse)
library(dplyr) library(hrbrthemes)
library(tidyr) library(viridis)
library(forcats)

# Calculate the percentage of males and females


gender_percent <- data %>%
group_by(Gender) %>%
summarise(count = n()) %>%
mutate(percent = count / sum(count) * 100)

# Create a bar plot to compare the percentage of males and females


ggplot(gender_percent, aes(x = Gender, y = percent, fill = Gender)) +
geom_bar(stat = "identity", alpha = 0.8) +
ggtitle("Percentage of Males and Females") +
ylab("Percentage") +
xlab("Gender")

When it came to handedness, the majority of respondents reported being


right-handed, with only a small percentage being left-handed.
# Load the data and ggplot2 library
data <- read.csv("responses.csv")
library(ggplot2) library(forcats)
library(dplyr) library(tidyverse)
library(dplyr) library(hrbrthemes)
library(tidyr) library(viridis)

# Calculate the percentage of left-handed and right-handed people


hand_percents <- data %>%
group_by(Left...right.handed) %>%
summarize(count = n()) %>%
mutate(percent = count / sum(count) * 100)

# Create a bar plot comparing left-handed and right-handed people


ggplot(hand_percents, aes(x = Left...right.handed, y = percent)) +
geom_bar(stat = "identity", fill = "#69b3a2", alpha = 0.8) +
ggtitle("Percentage of Left-Handed and Right-Handed People") +
ylab("Percentage") +
xlab("Handedness")
In terms of education, the majority of respondents had completed secondary school,
with a significant quantity having a college or bachelor's degree. A small number of
respondents were currently primary school pupils.

# Load the data and ggplot2 library


library(ggplot2) library(forcats)
library(dplyr) library(tidyverse)
library(dplyr) library(hrbrthemes)
library(tidyr) library(viridis)

data <- read.csv("responses.csv")

# Create a table of education levels and their frequency


education_table <- table(data$Education)

# Convert the table to a data frame and rename the columns


education_df <- as.data.frame(education_table)
names(education_df) <- c("Education", "Count")

# Create a bar plot of education levels


ggplot(education_df, aes(x = Education, y = Count)) +
geom_bar(stat = "identity", fill = "#69b3a2", alpha = 0.8) +
ggtitle("Education Level of Respondents") +
ylab("Respondents") +
xlab("Education Level") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))

The survey revealed that the majority of respondents spent most of their childhood in
a city, with only a few spending most of their childhood in a village.

# Load the data and required libraries


library(ggplot2) library(tidyverse)
library(dplyr) library(hrbrthemes)
library(tidyr) library(viridis)
library(forcats)

# Read the data


data <- read.csv("responses.csv")

# Create a new column that combines "Village" and "city" categories


data$Residence <- ifelse(data$Village...town == "village", "Village", "Town/City")

# Calculate the percentage of people who lived in villages versus towns/cities


residence_percents <- data %>%
group_by(Residence) %>%
summarise(count = n()) %>%
mutate(Percent = count/sum(count) * 100)

# Create a bar plot with the percentage of people who lived in villages versus
towns/cities
ggplot(residence_percents, aes(x = Residence, y = Percent)) +
geom_bar(stat = "identity", fill = "#69b3a2", alpha = 0.8) +
ggtitle("Percentage of People Who Lived in Villages vs Towns/Cities") +
ylab("Percentage") +
xlab("Residence Type")

The majority of respondents reported living in a house or bungalow during their


childhood, with only a majority who lived in a block of flats. According to these, we
may think that there is a link between the fact of living in a town/village and the type
of house you have (flat or house).

# Load the data and ggplot2 library


library(ggplot2) library(tidyverse)
library(dplyr) library(hrbrthemes)
library(forcats) library(viridis)

# Read the data


data <- read.csv("responses.csv")
# Calculate the percentage of people who live in a house vs. a block of flats
house_perc <- sum(data$`House...block.of.flats` == "house/bungalow") / nrow(data)
* 100
flat_perc <- sum(data$`House...block.of.flats` == "block of flats") / nrow(data) * 100

# Create a data frame with the percentages


df <- data.frame(Housing = c("House", "Block of Flats"),
Percentage = c(house_perc, flat_perc))

# Create a bar plot with the percentages


ggplot(df, aes(x = Housing, y = Percentage, fill = Housing)) +
geom_bar(stat = "identity", alpha = 0.8) +
scale_fill_manual(values = c("#69b3a2", "#404080")) +
ggtitle("Percentage of People Who Live in a House vs. a Block of Flats") +
ylab("Percentage") +
xlab("Housing Type")

Rock is most of the time seen as a rebellious, emancipating music, so we could think
that the people who like rock are against the authority, against the government for
example, and not so much into politics. this graph shows that most of the people who
like rock are feeling average or hate politics.

# Load the data and ggplot2 library


library(ggplot2)
library(dplyr)

# Read the data


data <- read.csv("responses.csv")

# Select the columns with the music genre and politics rating
music_cols <- c("Rock")
politics_col <- "Politics"

# Filter the data to include only rows where the respondent gave a rating of 1-5
for politics and 1-5 for rock music
rated_music_data <- data %>%
filter(Rock >= 1, Rock <= 5, Politics >= 1, Politics <= 5) %>%
select(music_cols, politics_col)

# Create a scatter plot to show the relationship between rock music and
politics rating
ggplot(rated_music_data, aes(x = Politics, y = Rock)) +
geom_jitter() +
labs(title = "Correlation between Politics Rating and Rock Music Liked",
x = "Politics Rating",
y = "Rock Music Rating")

We have seen with classical music, which is more seen as a noble and calm type of
music, that people who liked this type of music very much wouldn’t be a majority to
hate or love politics. On the other hand, most of the people who like politics don’t
necessarily hate classical music. The statement is that most people are at the same
time feeling average about politics and classical music, or hating both.
# Load the data and ggplot2 library
library(ggplot2)
library(dplyr)

# Read the data


data <- read.csv("responses.csv")

# Select the columns with the music genre and politics ratings
music_cols <- c("Classical.music")
politics_col <- "Politics"

# Filter the data to include only rows where the respondent gave a rating of 1-5
for politics and 1-5 for classical music
rated_music_data <- data %>%
filter(Classical.music >= 1, Classical.music <= 5, Politics >= 1, Politics <= 5) %>%
select(music_cols, politics_col)

# Create a scatter plot to show the relationship between classical music and
politics rating
ggplot(rated_music_data, aes(x = Politics, y = Classical.music)) +
geom_jitter() +
labs(title = "Correlation between Politics Rating and Classical Music Liked",
x = "Politics Rating",
y = "Classical Music Rating")

We have also noted that there could be a link between the level of education and the
ability to like politics.

# Load the data and ggplot2 library


library(ggplot2)
library(dplyr)

# Read the data


data <- read.csv("responses.csv")

# Select the columns with the politics rating and education level
politics_col <- "Politics"
education_col <- "Education"

# Filter the data to include only rows where the respondent gave a rating of 1-5
for politics
rated_politics_data <- data %>%
filter(Politics >= 1, Politics <= 5) %>%
select(politics_col, education_col)

# Calculate the average politics rating by education level


politics_by_education <- rated_politics_data %>%
group_by(Education) %>%
summarise(mean_politics = mean(Politics))

# Create a line plot to show the relationship between politics rating and
education level
ggplot(politics_by_education, aes(x = Education, y = mean_politics, group = 1)) +
geom_line() +
labs(title = "Average Politics Rating by Education Level",
x = "Education Level",
y = "Politics Rating") +
ylim(1, 5) +
theme(axis.text.x = element_text(angle = 45, hjust = 1))

These demographic factors can provide insights into the preferences and behaviors
of the survey respondents and can be used to inform future research in the field.

3. Conclusion

All things considered, this data analysis report provides valuable insights into the
music, movie, and demographic preferences of young Slovakian nationals. Our
analysis involved 1010 participants between the ages of 15 and 30, who completed a
survey consisting of 150 questions on various topics.

The results of our analysis highlight some interesting trends in music and movie
preferences among the participants. For example, pop, rock, and electronic music
were found to be the most popular genres among the respondents, while comedy
movies were the most preferred movie genre. We also found that female participants
were more likely to prefer romantic and comedy movies, while male participants
were more likely to prefer action and horror movies.

Our analysis also revealed some interesting demographic data, such as the fact that
the majority of respondents were between the ages of 18 and 35, with an average
age of 25. The majority of respondents reported having one or two siblings, and the
majority spent most of their childhood in a city.
Overall, these findings can be useful for businesses and organizations that target
young Slovakian nationals, such as music and movie streaming services or
marketing agencies. By understanding the preferences and demographics of this
population, these businesses can tailor their products and services to better meet the
needs and interests of their target audience.

You might also like