Describing Data and Ethics: Case-Study: Funding For Social Services
Overview
When starting on the journey of learning statistics, one of the first things to come
to mind is the controversial ways in which statistics are used. This invites a
conversation about the ethics of statistics. Consider the following scenario:
Since you are reading these notes you are probably thinking about statistics. Just
like Maria in the example above, we need to think about how we use basic
descriptive statistics in our work and daily lives. Descriptive statistics are basic
operations performed on data. Descriptive statistics do not convey
significance, prediction, or certainty; they simply describe the data in front of you.
Before Maria can have any conversations with funding agencies or regulatory
bodies she must be familiar with her clients, staff, hours of service, and outcomes,
which she can learn from descriptive statistics. Maria would not (or should not)
attend a meeting without knowing this information. As statistical concepts
become more complex, the importance of understanding the fundamental building
blocks of descriptive statistics becomes more apparent, allowing us to correctly
select and apply more advanced inferential statistics. These advanced techniques
allow us to model the real world, based on our own local or sample data.
The first step in understanding descriptive statistics is to start thinking about some
ways in which statistics and numbers are used. Select some media content on a
study, poll, or trend, and you will see statistics. Whether political polling, the
effectiveness of a product, or an analysis of your favorite sports team, statistics is
all around us. Often, we are not even aware of some of the potential misuses of
data in our world. For example, how was the sample drawn? What exact question
was asked? What data were included and what data were left out? The
information in the accompanying materials on ethics (links: ethics in statistics, use
and misuse of numbers, and statistics ethics advice) is presented to get you
thinking about the issues around ethics.
So, as we begin our journey of learning statistics, let’s start with some of the ethics
involved in statistics along with some basic concepts of descriptive statistics.
Objectives
Upon completion of this lesson, you should be able to:
Qualitative data is typically words, but could also be images or other media; we will
refer to this data in this course as categorical. Qualitative data may be labeled with
numbers allowing this type of data to be analyzed using some of the techniques in
the course. Maria might encounter some qualitative data in her work by labeling
some of the mental health diagnoses (depression might be a “1”; anxiety a “2”).
Note how these numerical labels are arbitrary. On the other hand, quantitative data
is the focus of this course and is numerical. If Maria counts the number of patients
seen each day, this data is quantitative.
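To see why the numerical labels are arbitrary, the following minimal sketch codes the same diagnoses in two different ways. The patient list and both coding schemes are hypothetical, invented only for illustration. The average of the codes changes when the labels are swapped, while the category counts do not, which is why counts (not means) are the appropriate summary for categorical data.

```python
from collections import Counter
from statistics import mean

# Hypothetical diagnoses for four clients (illustrative data only).
patients = ["depression", "depression", "anxiety", "depression"]

# Two equally valid, arbitrary coding schemes (both are assumptions).
coding_a = {"depression": 1, "anxiety": 2}
coding_b = {"depression": 2, "anxiety": 1}

codes_a = [coding_a[d] for d in patients]
codes_b = [coding_b[d] for d in patients]

# The "average code" depends entirely on which labels were chosen.
mean_a = mean(codes_a)  # 1.25
mean_b = mean(codes_b)  # 1.75

# The counts per category are the same regardless of the coding.
counts = Counter(patients)

print(mean_a, mean_b)
print(dict(counts))
```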
Categorical variable
Names or labels (i.e., categories) with no logical order or with a logical order
but inconsistent differences between groups, also known as qualitative.
Quantitative variable
Continuous: characteristic that varies and can take on any value, including any
value between two other values
Discrete: characteristic that varies and can only take on a set number of values
Frequency tables, pie charts, and bar charts are the most appropriate graphical
displays for categorical variables. Below are a frequency table, a pie chart, and a
bar graph for data concerning Mental Health Admission numbers.
Frequency Table
A table containing the counts of how often each category occurs.
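As a minimal sketch, a frequency table can be built by counting how often each category occurs. The diagnosis labels below are hypothetical and are not the Mental Health Admission data referred to in this lesson; the helper name frequency_table is also an assumption made for this example. Reporting the relative frequency next to the count keeps the actual number of people visible alongside the percentage.

```python
from collections import Counter

# Hypothetical diagnosis labels (illustrative data only).
diagnoses = ["depression", "anxiety", "depression", "bipolar",
             "anxiety", "depression", "anxiety", "depression"]

def frequency_table(values):
    """Return {category: (count, relative frequency)}, most common first."""
    counts = Counter(values)
    total = len(values)
    return {cat: (n, n / total) for cat, n in counts.most_common()}

table = frequency_table(diagnoses)

# Print one row per category: label, count, percentage.
for cat, (n, rel) in table.items():
    print(f"{cat:<12}{n:>3}{rel:>8.1%}")
```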
Pie chart
Graphical representation for categorical data in which a circle is partitioned into “slices” on
the basis of the proportions of each category.
Pie Chart of Diagnosis
Pitfalls
One of the pitfalls of a pie chart is that if the “slices” only represent percentages, the
reader does not know how many actual people fall in each category.
Bar Chart
Graphical representation for categorical data in which vertical (or sometimes
horizontal) bars are used to depict the number of experimental units in each
category; bars are separated by space.
Note that in the bar chart, the categories of mental health diagnoses (bars) have
white spaces in between them. The spaces between the bars signify that this is a
categorical variable.
Pie charts tend to work best when there are only a few categories. If a variable has
many categories, a pie chart may be more difficult to read. In those cases, a
frequency table or bar chart may be more appropriate.
Pitfalls
While bar charts can present either counts or percentages (in which case they are
referred to as relative frequency charts), readers often assume that differences
among the heights of the bars are meaningful, even when they are not.
For now, the goal is to summarize the distribution or pattern of variation of a single
quantitative variable.
Histogram
Histograms are graphical displays that can be used with one quantitative variable.
In these plots the horizontal axis represents the values of the variable and the
height of each bar represents how many observations fall at a particular value or
within a particular class of values.
From the histogram of children’s heights below, Maria can see that about 10
children have a height equal to “60”.
Pitfalls
People frequently confuse bar charts and histograms. The first test should be to
identify what kind of data you are charting (or what kind of data was charted),
quantitative or categorical. Another hint is that the x-axis of a histogram will
contain labels that reflect a quantitative variable, whereas a bar chart will have an
x-axis that contains category labels, generally not numbers.
To draw a histogram by hand we would:
1. Divide the range of data (range is from the smallest to largest value within the data
for the variable of interest) into classes of equal width.
2. Count the number of observations in each class.
3. Draw the histogram using the horizontal axis as the range of the data values and
the vertical axis for the counts within the class.
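The three steps above can be sketched in a few lines of code. The heights below are hypothetical (not the children's heights data from this lesson), and the function name histogram_counts and the choice of four classes are assumptions made for this example. The sketch also assumes the largest value is placed in the last class rather than starting a new one.

```python
def histogram_counts(data, num_classes):
    """Split the data range into equal-width classes and count each class."""
    lo, hi = min(data), max(data)
    if hi == lo:
        raise ValueError("data must contain at least two distinct values")
    # Step 1: divide the range into classes of equal width.
    width = (hi - lo) / num_classes
    edges = [lo + i * width for i in range(num_classes + 1)]
    # Step 2: count the observations that fall in each class.
    counts = [0] * num_classes
    for x in data:
        idx = min(int((x - lo) / width), num_classes - 1)
        counts[idx] += 1
    return edges, counts

# Hypothetical heights in inches (illustrative data only).
heights = [52, 54, 55, 57, 58, 58, 60, 60, 61, 63, 64, 68]
edges, counts = histogram_counts(heights, 4)

# Step 3: draw the histogram (sideways here, one row of stars per class).
for i, n in enumerate(counts):
    print(f"{edges[i]:5.1f} to {edges[i + 1]:5.1f} | {'*' * n}")
```

Changing the number of classes changes the shape of the display, so it is worth trying more than one value before settling on a picture of the data.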
Three of the many ways to measure central tendency are the mean, median and
mode.
There are other measures, such as a trimmed mean, that we do not discuss here.
Effects of Outliers
One shortcoming of the mean is that it is easily affected by extreme values.
Measures that are not strongly affected by extreme values are called resistant.
Measures that are affected by extreme values are called sensitive. As stated,
Maria would use the median if she felt her numbers could be impacted by
outliers, because the median is resistant to outliers.
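A small sketch makes the difference between a sensitive and a resistant measure concrete. The daily patient counts below are hypothetical, invented only for illustration. Adding a single extreme day moves the mean from 15 to 25 but leaves the median at 15.

```python
from statistics import mean, median

# Hypothetical number of patients seen per day (illustrative data only).
visits = [12, 14, 15, 15, 15, 16, 18]

# The same week plus one unusually busy day (an outlier).
with_outlier = visits + [95]

print(mean(visits), median(visits))              # mean 15, median 15
print(mean(with_outlier), median(with_outlier))  # mean 25, median 15
```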
Adding and Multiplying Constants
What happens to the mean and median if we add a constant to each observation in
a data set, or multiply each observation by a constant?
Consider for example if an instructor curves an exam by adding five points to each
student’s score. What effect does this have on the mean and the median? Adding a
constant to each value shifts both the mean and the median by that
constant.
For example, consider the 9 participation rates for the South Atlantic states
discussed above. If 5 were added to each participation rate, the mean of this
new data set would be 71.11 (the original mean of 66.11 plus 5) and the new
median would be 78 (the original median of 73 plus 5).
Similarly, if each observed data value was multiplied by a constant, the new mean
and median would change by a factor of this constant. Returning to the 9
participation rates, if all of the original rates were multiplied by 1.20 (a 20 percent
increase), then the new mean and new median would be found by multiplying the
original mean and median by 1.20. As we will learn shortly, the effect is not the
same on the variance!
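Both rules can be checked numerically. The sketch below uses hypothetical exam scores rather than the participation rates, whose individual values are not reproduced here. The constants 5 and 1.20 are the ones used in the text.

```python
from math import isclose
from statistics import mean, median

# Hypothetical exam scores (illustrative data only).
scores = [60, 70, 75, 80, 90]

# Add a constant: a five-point curve.
curved = [s + 5 for s in scores]

# Multiply by a constant: a 20 percent increase.
scaled = [s * 1.20 for s in scores]

# Adding 5 shifts the mean and the median by exactly 5.
print(mean(scores), median(scores))  # 75 75
print(mean(curved), median(curved))  # 80 80

# Multiplying by 1.20 multiplies the mean and the median by 1.20.
# isclose is used because 1.20 is not stored exactly in floating point.
print(isclose(mean(scaled), 1.20 * mean(scores)))      # True
print(isclose(median(scaled), 1.20 * median(scores)))  # True
```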
The Ethical Guidelines address eight general topic areas and specify important
ethical considerations under each topic.
1. Professionalism points out the need for competence, judgment, diligence, self-
respect, and worthiness of the respect of other people.
2. Responsibilities to Funders, Clients, and Employers discusses the
practitioner's responsibility for assuring that statistical work is suitable to the needs
and resources of those who are paying for it, that funders understand the
capabilities and limitations of statistics in addressing their problem, and that the
funder's confidential information is protected.
3. Responsibilities in Publications and Testimony addresses the need to report
sufficient information to give readers, including other practitioners, a clear
understanding of the intent of the work, how and by whom it was performed, and
any limitations on its validity.
4. Responsibilities to Research Subjects describes requirements for protecting the
interests of human and animal subjects of research, not only during data collection
but also in the analysis, interpretation, and publication of the resulting findings.
5. Responsibilities to Research Team Colleagues addresses the mutual
responsibilities of professionals participating in multidisciplinary research teams.
6. Responsibilities to Other Statisticians or Statistical Practitioners notes the
interdependence of professionals doing similar work, whether in the same or
different organizations. Basically, they must contribute to the strength of their
professions overall by sharing nonproprietary data and methods, participating in
peer review, and respecting differing professional opinions.
7. Responsibilities Regarding Allegations of Misconduct addresses the
sometimes painful process of investigating potential ethical violations and treating
those involved with both justice and respect.
8. Responsibilities of Employers, Including Organizations, Individuals, Attorneys, or
Other Clients Employing Statistical Practitioners encourages employers and clients
to recognize the highly interdependent nature of statistical ethics and statistical
validity. Employers and clients must not pressure practitioners to produce a
particular "result," regardless of its statistical validity. They must avoid the potential
social harm that can result from the dissemination of false or misleading statistical
work.
So in dealing with data, not only must we be technically correct in
determining the type of data we have and matching the appropriate
descriptive statistics and graphical representations, we must also do so
in a manner that accurately represents our phenomena and does not allow
our own biases and perspectives to bend the data. Finally, as a data
consumer, you should become more aware of the possibilities for
misrepresentation of data. The material in this course will help you
learn to ask critical questions as you harness the incredible power and
influence of statistics.