January 6, 2022
Introduction
Dataset Description
Objective and scope
Tables and their variables
Entity-Relationship Diagram
Modifications to Main Tables
Anthropometry Table
Comorbidity Table
Disability Table
Exercise Table
Eating Habits Frequency Table
Eating Habits Table
Members Table
Province-Region Table
Region Table
General Health Table
Sociodemographic Table
Vaccination Table
Incorporation of Additional Tables
Comorbidity Index
Disability Index
Vaccination Index
Individual Income Index
Selected Segmentations
Explanation of the Report
General Data
Glossary of Calculated Measures
Future Lines
Bibliography
Introduction
This document is the final integrative project for the Data Analytics course.
The objective is to process a dataset supplied as a database and to present the relevant
information in a report built with Power BI.
The data come from the 2nd National Survey of Nutrition and Health (ENNyS).
In the transformation process, the entire database was loaded and split into separate tables
so that the selected variables are represented in a more organized and summarized way.
Each of these tables was then reshaped into a format that can be charted efficiently and
dynamically.
A series of measures was also created to represent important information in this model.
Dataset Description
The dataset is a subset of the variables of the 2nd National Nutrition and Health Survey
(ENNyS), collected in 2018-2019 and published in 2021 by the Ministry of Health of
Argentina. For each subject, the data cover sociodemographic variables, general health,
eating habits and physical activity (other survey data were excluded from the project).
The survey is a cross-sectional study of the Argentine Republic, administered to individuals
living in private households in urban localities of 5,000 inhabitants or more.
These localities make the sample representative of the following regions:
· GBA: Autonomous City of Buenos Aires (CABA) and the 24 partidos (districts) of the
Buenos Aires conurbation.
· CENTER: Rest of the province of Buenos Aires, Córdoba, Entre Ríos and Santa Fe.
· NORTHWEST (NOA): Catamarca, Jujuy, La Rioja, Salta, Santiago del Estero and
Tucumán.
· PATAGONIA: Chubut, La Pampa, Neuquén, Río Negro, Santa Cruz and Tierra del Fuego.
In addition, to maximize the generalizability of the results and simplify their interpretation,
only subjects of legal age were selected, resulting in the following
sample:
Objective and scope
The objective of the project is to build a clear visualization of the descriptive statistics of
the Argentine population in the areas of interest. It also aims to visualize the relationship
between general health and other variables that may correlate with it, such as eating habits,
nutrition and physical activity.
The end user is anyone interested in observing, graphically and quantitatively, the
relationship between the health of Argentines and other variables, for example their habits.
As an exploratory study, the report could also be useful when planning public policies that
aim to improve health and prevent illness in the general population.
Tables and their variables
Table 1: Survey
Contains the full subset of variables selected for the project. The data in this table are not
cleaned, so it is not used in the report; the remaining tables contain the same data already
processed and optimized.
Table 3: Region
Table 4: Province-Region
Table 5: Sociodemographics
Table 7: Exercise
Table 9: Comorbidity
Entity-Relationship Diagram
The Entity-Relationship diagram originally proposed (and later modified) is built around a
single primary key (Subject_ID). The starting point is therefore the following image, which
corresponds to Table 2:
In more detail, the relationships are shown in the following list:
Modifications to Main Tables
Anthropometry Table
1. The "ID" column was renamed to "ID_Miembro" to better identify the Primary Key.
Comorbidity Table
1. Column headers were renamed to begin with a capital letter.
2. Because the disease names are needed in a single column, the Unpivot tool was
used. This produces several rows per respondent, so values in the ID column repeat.
3. The "Value" column, which served as a binary flag indicating whether the respondent
had the disease, was removed. With this step, respondents without comorbidities
were dropped from the table (they remain in the Members Table).
4. The "Attribute" column containing the diseases was renamed to "Comorbidity", and the
"ID" column to "ID_Miembro" to better identify the Primary Key.
5. Index_Comorbilidad is generated from this table; it is explained in the section
"Incorporation of Additional Tables".
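The unpivot step described above can be illustrated outside Power BI with a minimal Python sketch. The disease column names and the 0/1 encoding are hypothetical, and the sketch assumes that only rows with a positive flag are kept before the "Value" column is dropped, which is what makes respondents without comorbidities disappear from the table.

```python
# Hypothetical wide-format rows: one row per respondent, one 0/1 flag per disease.
wide_rows = [
    {"ID": 1, "diabetes": 1, "hypertension": 0},
    {"ID": 2, "diabetes": 1, "hypertension": 1},
    {"ID": 3, "diabetes": 0, "hypertension": 0},
]


def unpivot_comorbidities(rows, id_column="ID"):
    """Turn one flag column per disease into one (ID_Miembro, Comorbidity) row per disease."""
    long_rows = []
    for row in rows:
        for attribute, value in row.items():
            if attribute == id_column:
                continue
            # Assumption: only positive flags are kept, so the "Value" column
            # becomes redundant and respondents with no disease yield no rows.
            if value == 1:
                long_rows.append({"ID_Miembro": row[id_column], "Comorbidity": attribute})
    return long_rows


comorbidity_table = unpivot_comorbidities(wide_rows)
for record in comorbidity_table:
    print(record)
```

Respondent 2 appears twice in the result (once per disease) and respondent 3 does not appear at all, which matches the repeated IDs and the removal of respondents without comorbidities noted in the steps.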
Disability Table
1. Column headers were renamed to begin with a capital letter.
2. The "without _disc" column, which indicated that the respondent had no disability,
was removed. As a result, the table contains only members with some type of disability
(those without disabilities remain in the Members Table).
3. Records were sorted by the ID column.
4. Because the disability names are needed in a single column, the Unpivot tool was
used. This produces several rows per respondent, so values in the ID column repeat.
5. The "Value" column, which served as a binary flag indicating whether the respondent
had the disability, was removed.
6. The "Attribute" column containing the disabilities was renamed to "Disability", and the
"ID" column to "ID_Miembro" to better identify the Primary Key.
7. Index_Discapacidad is generated from this table; it is explained in the section
"Incorporation of Additional Tables".
Exercise Table
1. The "ID" column was renamed to "ID_Miembro" to better identify the Primary Key.
Members Table
1. The "ID" column was renamed to "ID_Miembro" to better identify the Primary Key.
Region Table
1. The table is kept as imported from the .xlsx file.
Sociodemographic Table
3. The "ID" column was renamed to "ID_Miembro" to better identify the Primary Key.
Vaccination Table
4. Because the vaccine names are needed in a single column, the Unpivot tool was
used. This produces several rows per respondent, so values in the ID column repeat.
5. The "Value" column was filtered to keep only rows equal to "Yes". With this step,
respondents who have no vaccines or did not answer the question were
removed.
6. The "Value" column, which indicated whether the respondent had the vaccine, did not
have it, or did not answer, was removed.
7. The "ID" column was renamed to "ID_Miembro" to better identify the Primary Key.
8. Index_Vacunacion is generated from this table; it is explained in the section
"Incorporation of Additional Tables".
Incorporation of Additional Tables
Comorbidity Index
Summarizes the comorbidities in a table with one record per comorbidity, each with a
unique index.
It is not an intermediate table; it keeps the model tidy by removing duplicates and lets
filters be applied more effectively when working with the data.
Disability Index
Summarizes the disabilities in a table with one record per disability, each with a
unique index.
It is not an intermediate table; it keeps the model tidy by removing duplicates and lets
filters be applied more effectively when working with the data.
Vaccination Index
Summarizes the vaccines in a table with one record per vaccine, each with a
unique index.
It is not an intermediate table; it keeps the model tidy by removing duplicates and lets
filters be applied more effectively when working with the data.
It is used to summarize the types of food in a table with one record per type, each with a
unique index.
It is used to summarize the answers about reading the nutritional information
on food and beverage labels, in a table with one record per answer, each with a
unique index.
For all the tables mentioned above to work, calculated columns were created in the
associated tables.
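As an illustration of the idea behind these index tables, the following Python sketch builds a table of distinct values with a unique index and then attaches that index to each detail row, which is roughly the role of the calculated columns. The function names, the "Index" column and the sample rows are assumptions made for the example; the report does not show how its index tables were actually built.

```python
def build_index_table(long_rows, column):
    """Return one record per distinct value of `column`, each with a unique index."""
    distinct_values = sorted({row[column] for row in long_rows})
    return [{"Index": i, column: value} for i, value in enumerate(distinct_values, start=1)]


def add_index_column(long_rows, index_table, column):
    """Calculated-column analogue: attach the matching index to every detail row."""
    lookup = {record[column]: record["Index"] for record in index_table}
    return [{**row, "Index": lookup[row[column]]} for row in long_rows]


# Hypothetical unpivoted comorbidity rows.
long_rows = [
    {"ID_Miembro": 1, "Comorbidity": "diabetes"},
    {"ID_Miembro": 2, "Comorbidity": "diabetes"},
    {"ID_Miembro": 2, "Comorbidity": "hypertension"},
]

index_table = build_index_table(long_rows, "Comorbidity")
indexed_rows = add_index_column(long_rows, index_table, "Comorbidity")
print(index_table)
print(indexed_rows)
```

Because the index table holds each value once, a filter chosen on it selects every matching detail row through the shared index.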
Selected Segmentations
Sex
This segmentation (a Power BI slicer) filters the report by sex. We expect sex to strongly
influence the other variables, so filtering by it produces noticeable changes in the charts.
It is available on all tabs via a button called "Filter Panel".
Region
This segmentation filters the report by region. We expect region to strongly influence
the other variables, so filtering by it produces noticeable changes in the
charts. It is available on all tabs via a button called "Filter Panel".
Comorbidity
This segmentation filters by disease and is available only on the
"Comorbidities" tab through a button called "Filter Panel". It was created to
observe how the selected disease or diseases relate to the other
variables shown on the tab (e.g., distribution by sex, relationship with BMI, etc.).
Disability
This segmentation filters by disability and is available only on the
"Disability" tab through a button called "Filter Panel". It was created to observe
how the selected disability or disabilities relate to the other
variables shown on the tab (e.g., age distribution).
Explanation of the Report
General Data
1. Card showing the average age of all respondents. The Average Age measure is
used.
2. Card showing the percentage of respondents who have at least one comorbidity.
The Percentage People with Comorbidity measure is
used.
3. Pie chart showing how the respondents are split by sex.
Sex is the legend and the value
is Count of ID_Miembro.
4. Bar chart showing the number of respondents by occupation. Occupation is on the
y-axis and the number of people on the x-axis; the value is
Count of ID_Miembro, limited to the top 5.
5. A second bar chart on this tab showing the number of respondents
by income. The number of people is again on the x-axis
and the income ranges are on the y-axis. The value is Count
of ID_Miembro.
6. Button that opens the Filter Panel to filter the data by age range, region and sex.
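The counts behind the pie chart and the top-5 bar chart can be sketched in Python as a grouped count followed by a ranking. The records, field names and category labels below are hypothetical and only illustrate the logic of "Count of ID_Miembro" grouped by a field; they are not taken from the survey.

```python
from collections import Counter

# Hypothetical respondent records; field names and categories are illustrative only.
respondents = [
    {"ID_Miembro": 1, "Sex": "Female", "Occupation": "Employee"},
    {"ID_Miembro": 2, "Sex": "Female", "Occupation": "Employee"},
    {"ID_Miembro": 3, "Sex": "Male", "Occupation": "Employee"},
    {"ID_Miembro": 4, "Sex": "Female", "Occupation": "Self-employed"},
    {"ID_Miembro": 5, "Sex": "Male", "Occupation": "Self-employed"},
    {"ID_Miembro": 6, "Sex": "Female", "Occupation": "Retired"},
]


def count_by(records, field):
    """Count respondents per category, like Count of ID_Miembro grouped by a field."""
    return Counter(record[field] for record in records)


def top_n(records, field, n=5):
    """Return the n most frequent categories with their counts, largest first."""
    return count_by(records, field).most_common(n)


print(count_by(respondents, "Sex"))
print(top_n(respondents, "Occupation", n=2))
```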
General Health
1. Card showing the percentage of respondents who have some health coverage.
The Percentage of People with Health
Coverage measure is used.
2. Card showing the average Body Mass Index of all respondents.
The measure that Power BI generates automatically from the
"Anthropometry" table, aggregated as Average, is used.
3. Card showing the percentage of respondents who are overweight or
obese. The Percentage of Overweight/Obese People
measure is used.
4. Bar chart showing the average number of comorbidities by age, with bar colors set by
conditional formatting based on that same average. The Average
Comorbidities over the Total Respondents measure is the value and the Age column
of the Sociodemographic table is the axis.
5. Bar chart showing the percentage of overweight/obese people by age,
with bar colors set by conditional formatting based on that same percentage. The
Percentage of Overweight/Obese People measure is the value and the Age
column of the Sociodemographic table is the axis.
6. Button that opens the Filter Panel to filter the data by age range, region and sex.
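The two BMI cards can be approximated with a short Python sketch. It assumes that "overweight/obese" means a BMI of 25 kg/m² or more, which is the usual WHO cut-off; the report itself does not state the threshold it uses, and the records below are invented for the example.

```python
from statistics import mean

# Hypothetical anthropometry records (BMI in kg/m^2).
anthropometry = [
    {"ID_Miembro": 1, "BMI": 22.0},
    {"ID_Miembro": 2, "BMI": 27.5},
    {"ID_Miembro": 3, "BMI": 31.0},
    {"ID_Miembro": 4, "BMI": 24.5},
]

# Assumed cut-off for overweight (WHO convention); not taken from the report.
OVERWEIGHT_THRESHOLD = 25.0


def average_bmi(records):
    """Average BMI over all respondents."""
    return mean(record["BMI"] for record in records)


def share_overweight_or_obese(records, threshold=OVERWEIGHT_THRESHOLD):
    """Share of respondents whose BMI is at or above the threshold."""
    flagged = sum(1 for record in records if record["BMI"] >= threshold)
    return flagged / len(records)


print(average_bmi(anthropometry))
print(share_overweight_or_obese(anthropometry))
```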
Comorbidity
1. Card showing the average number of comorbidities over the total number of
respondents. The Average Comorbidities over the Total
Respondents measure is used.
2. Card showing the percentage of respondents who perceive themselves to be in good
health. The Percentage of People
with self-perceived health as good measure is used.
3. Infographic showing, for the 5 most prevalent comorbidities, how many out of every
ten respondents have each one. The Average
Comorbidities over the Total Respondents measure is the value and the
comorbidities group is the axis.
4. 100% stacked horizontal bar chart showing the percentage of people who have
some comorbidity, split by sex. The measures
Percentage people with Female Comorbidity, Percentage people
with Male Comorbidity and Percentage comorbidity -1 are the values, together with
the Sex attribute of the Sociodemographic table.
5. Key Influencers AI visual, which seeks to determine whether Body Mass Index
influences the average number of comorbidities.
The Average Comorbidities over the Total Respondents measure is
the field to analyze, IMC_Interpretacion and the Average BMI measure of
the Anthropometry table are the explanatory fields, and Sex of the Sociodemographic table
is the expansion category.
6. Button that opens the Filter Panel to filter the data by age range, region, sex
and type of comorbidity.
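The 100% stacked bar combines three shares that together cover all respondents: women with a comorbidity, men with a comorbidity, and people with none. A minimal Python sketch of that decomposition follows; the records are hypothetical, and it assumes that sex takes only the two values "Female" and "Male" and that each share is computed over the total number of respondents, as the measure descriptions suggest.

```python
# Hypothetical respondents with their number of comorbidities.
respondents = [
    {"ID_Miembro": 1, "Sex": "Female", "Comorbidities": 2},
    {"ID_Miembro": 2, "Sex": "Female", "Comorbidities": 0},
    {"ID_Miembro": 3, "Sex": "Male", "Comorbidities": 1},
    {"ID_Miembro": 4, "Sex": "Male", "Comorbidities": 0},
    {"ID_Miembro": 5, "Sex": "Female", "Comorbidities": 1},
]


def share_with_comorbidity(records, sex):
    """Share of ALL respondents who are of the given sex and have a comorbidity."""
    matching = sum(1 for r in records if r["Sex"] == sex and r["Comorbidities"] > 0)
    return matching / len(records)


def share_without_comorbidity(records):
    """Share of all respondents with no comorbidity (the complement segment)."""
    return sum(1 for r in records if r["Comorbidities"] == 0) / len(records)


female_share = share_with_comorbidity(respondents, "Female")
male_share = share_with_comorbidity(respondents, "Male")
none_share = share_without_comorbidity(respondents)
print(female_share, male_share, none_share)
```

With only two sex categories, the three segments add up to the whole bar.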
Disability
1. Card showing the percentage of people with disabilities out of the total number of
respondents. The Percentage of People with Disabilities measure is used.
2. Card showing the average number of disabilities over the total number of
respondents. The Average Disability over Total
Respondents measure is used.
3. Infographic showing, for the 5 most prevalent disabilities, how many out of every
ten respondents have each one. The Percentage of People with
Disabilities measure is the value and the Disability group is the axis.
4. Bar chart showing the percentage of people with a disability by age, with bar colors
set by conditional formatting based on that same percentage. The
Percentage of People with Disabilities measure is the value and the Age column of
the Sociodemographic table is the axis.
5. Key Influencers AI visual, which seeks to determine whether age influences
the prevalence of disability. The Percentage of People with
Disabilities measure is the field to analyze and Age is the explanatory field.
6. Button that opens the Filter Panel to filter the data by age range, region, sex
and disability.
Exercise
1. Card showing the average number of minutes of exercise per week over the total
number of respondents.
2. Card showing the percentage of people who perform some type of physical
activity. The Percentage of People who do some activity measure is
used.
3. 100% stacked horizontal bar chart showing how exercise is distributed by the
context in which it is performed (at work, in free time or while commuting),
split by sex. The measures Number of People Who Do some
Activity Outside Work, Number of People Who Do Some Activity at
Work and Number of People Who Do Some Displacement Activity are the values,
together with the Sex attribute of the Sociodemographic table.
4. Horizontal bar chart showing the minutes of exercise per week
by sex. The average of Total is the value, the averages of Total_Intenso and
Total_Moderado are tooltips,
and the Sex attribute of the Sociodemographic table is the
legend.
5. Scatter chart of the average of Total
(x-axis) against the Average Comorbidities over the Total Respondents measure (y-axis).
The calculated measures R and R2 for the two variables are displayed on the chart.
6. Button that opens the Filter Panel to filter the data by age range, region and sex.
Feeding
1. Card showing how many meals per day respondents eat on average.
The Amount of Meals per day measure is used.
2. Card showing the percentage of respondents who read the nutritional
information. The Percentage of people
who read the nutritional information measure is used.
3. Bar chart showing how often, on average, each food is
consumed in a month. The monthly consumption frequency
is on the x-axis and the type of
food on the y-axis. The value is Frequency.
4. Scatter chart offering a first, exploratory look (without a deep or
detailed analysis) at whether there is any correlation between average BMI and the
reading of nutrition labels. Nutritional Information
Reading is on the x-axis and average BMI (Body Mass Index) on the y-axis.
5. Button that opens the Filter Panel to filter the data by age range, region and sex.
Glossary of Calculated Measures
Measure to calculate the average number of meals per day over the total number of
respondents, based on the number of weekly meals.
CALCULATE( DISTINCTCOUNT(Tabla_Miembros[ID_Miembro]),
    FILTER(Tabla_Ejercicio,
        Tabla_Ejercicio[Intense Physical Exercise ] = "Yes" ||
Measure to calculate the number of people who do some activity, whether at work or outside
it, including cardio done while commuting.
CALCULATE( DISTINCTCOUNT(Tabla_Miembros[ID_Miembro]),
    FILTER(Tabla_Ejercicio,
        Tabla_Ejercicio[Cardio in Displacement] = "Yes"))
Measure to calculate the number of people who do cardio while commuting to or from work.
Number of People Doing some Activity at Work
CALCULATE( DISTINCTCOUNT(Tabla_Miembros[ID_Miembro]),
    FILTER(Tabla_Ejercicio,
        Tabla_Ejercicio[Intense Physical Work ] = "Yes" ||
Measure to calculate the number of people who do physical activity as part of their work.
CALCULATE( DISTINCTCOUNT(Tabla_Miembros[ID_Miembro]),
    FILTER(Tabla_Ejercicio,
        Tabla_Ejercicio[Intense Physical Exercise ] = "Yes" ||
Measure to calculate the number of people who do physical activity outside of their work.
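The counting pattern used by these measures (a distinct count of members restricted to rows where at least one activity column equals "Yes") can be sketched in Python as follows. The column names and rows are hypothetical, since the full list of conditions in the original expressions is not shown.

```python
# Hypothetical exercise rows; the activity column names are illustrative only.
exercise_rows = [
    {"ID_Miembro": 1, "Intense": "Yes", "Moderate": "No", "Commute": "No"},
    {"ID_Miembro": 2, "Intense": "No", "Moderate": "No", "Commute": "Yes"},
    {"ID_Miembro": 3, "Intense": "No", "Moderate": "No", "Commute": "No"},
    {"ID_Miembro": 4, "Intense": "Yes", "Moderate": "Yes", "Commute": "No"},
]


def count_active_members(rows, columns):
    """Distinct respondents with "Yes" in at least one of the given columns."""
    active_ids = {
        row["ID_Miembro"]
        for row in rows
        if any(row[column] == "Yes" for column in columns)
    }
    return len(active_ids)


print(count_active_members(exercise_rows, ["Intense", "Moderate", "Commute"]))
print(count_active_members(exercise_rows, ["Commute"]))
```

Using a set of IDs plays the role of the distinct count: a respondent who satisfies several conditions is still counted once.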
CALCULATE(AVERAGE('Tabla_Ejercicio'[Total]))/7
Comorbidity Number
DISTINCTCOUNT(Index_Commorbidity[Comorbidity])
Measure to calculate the number of comorbidities that exist in the model (counting
the "others" category as a single unit)
Comorbidity percentage -1
Measure to calculate the percentage of people without comorbidities over the total of
respondents
Return
DIVIDE( CantCobSalud, TotalPersonas)
Measure to calculate the percentage of people with health coverage out of the total number
of respondents
Return
DIVIDE( CantSaludBuena, TotalPersonas)
Measure to calculate the percentage of people who self-perceive their own health as good
over the total of respondents
Measure to calculate the percentage of people who are obese/overweight out of the total
number of respondents
Return
DIVIDE( [Number of People Doing Some Activity], TotalPeople)
Measure to calculate the percentage of people who perform some activity over the total of
respondents
Return
DIVIDE( CantPersonasQueLeen, TotalPersonas)
Return
DIVIDE(PersonasComor,TotalPersonas)
Measure to calculate the percentage of people who have some comorbidity over the total
number of respondents
Measure to calculate the percentage of female respondents who have some comorbidity
over the total number of respondents
Measure to calculate the percentage of male respondents who have some comorbidity over the
total number of respondents
Return
DIVIDE(PersonasDisc,TotalPersonas)
Measure to calculate the percentage of people with a disability out of the total number of
respondents
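The percentage measures above all follow the same shape: a count of people with some attribute divided by the total number of respondents. DAX's DIVIDE returns a blank (or an alternate result, if one is supplied) instead of raising an error when the denominator is zero. The Python sketch below mirrors that behavior; the function names and sample IDs are hypothetical.

```python
def safe_divide(numerator, denominator, alternate=None):
    """Return numerator / denominator, or `alternate` when the denominator is zero."""
    if denominator == 0:
        return alternate
    return numerator / denominator


def percentage_with_disability(disability_ids, member_ids):
    """Distinct people with a disability over distinct respondents."""
    people_with_disability = len(set(disability_ids))
    total_people = len(set(member_ids))
    return safe_divide(people_with_disability, total_people)


# Member 1 has two disabilities but is counted once.
print(percentage_with_disability([1, 1, 3], [1, 2, 3, 4]))
# No respondents selected: the result is None rather than an error.
print(percentage_with_disability([], []))
```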
DIVIDE(
    CALCULATE(
        COUNT(Tabla_Comorbilidad[ID_Miembro]),
        ALLSELECTED(Tabla_Comorbilidad[ID_Miembro])),
    DISTINCTCOUNT(Tabla_Comorbilidad[ID_Miembro]))
Measure to calculate the average of comorbidities over the population with comorbidities
(Not used in the model)
DIVIDE(
CALCULATE(
COUNT(Tabla_Discapacity[ID_Miembro]),
ALLSELECTED(Tabla_Discapacity[ID_Miembro])),
DISTINCTCOUNT(Tabla_Discapacity[ID_Miembro]))
Measure to calculate the average of disabilities over the population with disabilities (Not
used in the model)
DIVIDE(
    CALCULATE(
        COUNT(Tabla_Comorbilidad[ID_Miembro]),
        ALLSELECTED(Tabla_Comorbilidad[ID_Miembro])),
    COUNT(Tabla_Miembros[ID_Miembro]))
Measure to calculate the average number of comorbidities over the total number of
respondents
DIVIDE(
    CALCULATE(
        COUNT(Tabla_Discapacity[ID_Miembro]),
        ALLSELECTED(Tabla_Discapacity[ID_Miembro])),
    COUNT(Tabla_Miembros[ID_Miembro]))
Measure to calculate the average number of disabilities over the total number of
respondents
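The two kinds of average defined in this part of the glossary differ only in the denominator: one divides the number of condition rows by the people who have at least one condition, the other by all respondents. A minimal Python sketch with invented data makes the difference concrete; it ignores the filter-context handling that ALLSELECTED provides in the DAX versions.

```python
# Hypothetical unpivoted rows: (ID_Miembro, comorbidity). Member 1 has two.
comorbidity_rows = [(1, "diabetes"), (1, "hypertension"), (2, "diabetes")]
# Hypothetical list of all respondents, including those with no comorbidity.
member_ids = [1, 2, 3, 4]


def average_among_affected(rows):
    """Rows per distinct member that appears in the table (COUNT / DISTINCTCOUNT)."""
    affected = {member_id for member_id, _ in rows}
    return len(rows) / len(affected) if affected else None


def average_over_all(rows, members):
    """Rows per respondent, counting respondents with no rows as zero."""
    return len(rows) / len(members) if members else None


print(average_among_affected(comorbidity_rows))
print(average_over_all(comorbidity_rows, member_ids))
```

The average over all respondents is lower whenever some respondents have no rows in the detail table, because they enter the denominator with zero conditions.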
Average Age
AVERAGE(Tabla_Sociodemography[Age])
Measure to calculate the average age (defined as a measure so that it can be formatted as an integer)
Measure to calculate the correlation coefficient between the average total minutes of
exercise and the average number of comorbidities over the total number of respondents
R2
'Measures'[R]^2
Measure to calculate the coefficient of determination between the average total
minutes of exercise and the average number of comorbidities over the total number of respondents
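For reference, the relationship between R and R2 can be sketched in Python, assuming R is the Pearson correlation coefficient (the report does not show the formula behind R). The data points below are invented and perfectly linear, so they say nothing about the survey results.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical points: weekly minutes of exercise vs. average comorbidities.
minutes = [0, 60, 120, 180]
comorbidities = [2.0, 1.5, 1.0, 0.5]

r = pearson_r(minutes, comorbidities)
r_squared = r ** 2  # coefficient of determination, as in the R2 measure
print(r, r_squared)
```

R carries the direction of the association (negative here, because the invented values fall as minutes rise), while R2 only expresses its strength.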
DISTINCTCOUNT(Tabla_Comorbilidad[ID_Miembro])
Future Lines
Future work could test experimentally, and chart, whether the correlations and
associations found are causal, for example the one between minutes of
exercise and the number of diseases. Another line of work would use the same
variables but take repeated measurements over time, that is, add a temporal dimension. This
would make it possible to observe how the variables evolve over time.
Bibliography: