Biostatistics - Data and Its Types

A BRIEF OVERVIEW OF DATA AND
ITS TYPES
NAME: PRERANA CHAKRABORTY

STREAM: MSc. MICROBIOLOGY
PAPER NAME: BIOSTATISTICS
PAPER CODE: MSMC 104
SEMESTER: 1
ROLL NO. 1
DEFINITION
Data is a collection of discrete or continuous
values that convey information, describing the
quantity, quality, fact, statistics, other basic units
of meaning, or simply sequences of symbols
that may be further interpreted formally.
TYPES OF DATA
NOMINAL DATA: Nominal data consist of named categories that have no inherent order and cannot be measured in
numerical units. Nominal data are valuable in qualitative research, where subjects' responses are recorded as
labels rather than as numbers.
ORDINAL DATA: Ordinal values are categories that have a natural ordering, although the gaps between categories are
not necessarily equal. Knowing the type of a category helps in deciding which encoding strategy can be applied to the
data. For ordinal data, label encoding, a form of integer encoding, can be applied because the integers preserve the order.
DISCRETE DATA: Numerical values that are integers or whole numbers fall under this category.
Discrete data cannot be measured; they can only be counted, because each object included in discrete data
has a fixed value. The value can be written in decimal notation, but it has to be a whole number. Discrete data are often
presented in charts, including bar charts, pie charts, and tally charts.
CONTINUOUS DATA: Values that can be fractional are considered continuous. Examples include the
operating frequency of a processor, Wi-Fi frequency, the temperature of the cores,
and so on. Continuous data can be broken down into smaller units and can take any value within a range. Continuous
data are often represented with a line graph, whose highs and lows show how the value fluctuates
over a certain period of time.
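The label encoding mentioned under ordinal data can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the category list, the function name label_encode, and the sample responses are all assumptions made for the example. Each label is replaced by its rank in an ordered list, so that larger integers mean higher categories.

```python
# Hypothetical ordered categories (lowest to highest); an assumption for illustration.
EDUCATION_ORDER = ["Elementary", "High School", "College", "Graduate", "Post-graduate"]


def label_encode(values, ordered_categories):
    """Map each ordinal label to its rank (0 = lowest) in ordered_categories."""
    rank = {category: index for index, category in enumerate(ordered_categories)}
    return [rank[value] for value in values]


# Hypothetical survey responses.
responses = ["College", "Elementary", "Post-graduate", "High School"]
encoded = label_encode(responses, EDUCATION_ORDER)
print(encoded)  # [2, 0, 4, 1]
```

The same encoding would be misleading for nominal data such as nationality, because the integers would suggest an order that does not exist.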
1. NOMINAL DATA: Nationality
This is the most common nominal data example you’ll find. Nationality is a nominal
variable whose data comes from multiple categories depicting countries. Examples could
be American, Irish, Kenyan, Australian, etc. There’s nothing that can be quantified here or
put into hierarchical order. The data just includes countries that people belong to. That’s it.
No scale or ranking can be given to that.
2. ORDINAL DATA: Economic status (poor, middle income, wealthy), Course grades (A+, A-, B+, B-, C),
Education level (Elementary, High School, College, Graduate, Post-graduate)
3. DISCRETE DATA: Number of employees
The number of employees a company has is another type of discrete data. Companies may
track their number of employees because this information is relevant to their growth
goals. Some companies also try to maintain a specific ratio of management to lower-level
employees to ensure every employee receives guidance and direction in their roles.
4. CONTINUOUS DATA: The amount of time required to complete a project, the height of children, the amount of
time it takes to sell shoes.
SIMILARITY AND DISSIMILARITY
A similarity measure is a mathematical function that quantifies the degree of similarity between two objects or
data points. It is a numerical score measuring how alike two data points are. It takes two data points as input and
produces a similarity score as output, typically ranging from 0 (completely dissimilar) to 1 (identical or perfectly
similar). A similarity measure can be based on various mathematical techniques such as cosine similarity, Jaccard
similarity, and the Pearson correlation coefficient (which ranges from -1 to 1 rather than from 0 to 1). Similarity
measures are generally used to identify duplicate records, equivalent instances, or clusters.
A dissimilarity measure is a mathematical function that quantifies the degree of dissimilarity between two
objects or data points. It is a numerical score measuring how different two data points are. It takes two data
points as input and produces a dissimilarity score as output, typically ranging from 0 (identical or perfectly similar) to 1
(completely dissimilar) when the measure is normalized. Some dissimilarity measures, such as unnormalized distances,
have no upper limit. A dissimilarity measure can be obtained by using different techniques such as Euclidean distance,
Manhattan distance, and Hamming distance. Dissimilarity measures are often used in identifying outliers, anomalies, or clusters.
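Several of the measures named above can be written directly from their standard definitions. The sketch below uses only the Python standard library; the function names and the sample points are assumptions chosen for illustration. It assumes equal-length inputs, non-zero vectors for cosine similarity, and at least one non-empty set for Jaccard similarity.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors (1 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def jaccard_similarity(set_a, set_b):
    """Size of the intersection divided by size of the union."""
    return len(set_a & set_b) / len(set_a | set_b)


def euclidean_distance(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical data points.
p = [1.0, 2.0, 3.0]
q = [4.0, 6.0, 3.0]

print(cosine_similarity(p, q))                              # close to 1 for similar directions
print(jaccard_similarity({"a", "b", "c"}, {"b", "c", "d"}))  # 0.5
print(euclidean_distance(p, q))                             # 5.0
print(manhattan_distance(p, q))                             # 7.0
print(hamming_distance("karolin", "kathrin"))               # 3
```

Note that the two distances on p and q exceed 1, which illustrates why unnormalized dissimilarity measures are not confined to the 0 to 1 range.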
STATISTICAL SIGNIFICANCE
• Statistical significance refers to the claim that a set of observed data are not
the result of chance but can instead be attributed to a specific cause.
• Statistical significance is important for academic disciplines or practitioners
that rely heavily on analyzing data and research, such as economics,
finance, investing, medicine, physics, and biology.
• A high degree of statistical significance indicates that an observed
relationship is unlikely to be due to chance. The calculation of statistical
significance is subject to a certain degree of error.
• Statistical significance can be misinterpreted when researchers do not use
language carefully in reporting their results. Several types of significance tests
are used depending on the research being conducted.
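One way to make the idea of "unlikely to be due to chance" concrete is an exact permutation test, which is only one of the several kinds of significance test. The sketch below is illustrative: the two groups of measurements are hypothetical, the function name permutation_p_value is an assumption, and the 0.05 threshold is a common convention rather than a rule. The test asks how often a difference in means at least as large as the observed one would appear if the group labels were assigned at random.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation test on the difference in group means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = 0
    total = 0
    # Enumerate every way of relabelling n_a of the pooled values as "group A".
    for indices in combinations(range(len(pooled)), n_a):
        chosen = set(indices)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        # Small tolerance guards against floating-point rounding in the comparison.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


# Hypothetical measurements for a treated and a control group.
treated = [12.1, 13.4, 14.0, 15.2]
control = [9.8, 10.5, 11.0, 11.9]

p_value = permutation_p_value(treated, control)
print(f"p = {p_value:.4f}")  # 2 of the 70 relabellings are this extreme: p = 0.0286
print("significant at 0.05" if p_value < 0.05 else "not significant at 0.05")
```

A small p-value says only that the observed difference would be rare under random labelling; it does not by itself identify the cause of the difference.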
CONCLUSION
• Data = Knowledge. Good data provides strong evidence, while anecdotal evidence, assumptions, or abstract
observation might lead to wasted resources when action is taken based on an incorrect conclusion.
• Data allows organizations to measure the effectiveness of a given strategy: When strategies are put into place to
overcome a challenge, collecting data will allow you to determine how well your solution is performing, and
whether your approach needs to be tweaked or changed over the long term.
• Data allows organizations to more effectively determine the cause of problems. Data allows organizations to
visualize relationships between what is happening in different locations, departments, and systems.
• Data is a key component of systems advocacy. Utilizing data will help present a strong argument for systems
change. Whether you are advocating for increased funding from public or private sources, or making the case for
changes in regulation, illustrating your argument through the use of data will allow you to demonstrate why
changes are needed.
• Data increases efficiency. Effective data collection and analysis will allow you to direct scarce resources where they
are most needed. If an increase in significant incidents is noted in a particular service area, this data can be
dissected further to determine whether the increase is widespread or isolated to a particular site. If the issue is
isolated, training, staffing, or other resources can be deployed precisely where they are needed, as opposed to
system-wide. Data will also help organizations determine which areas should take priority over others.
ACKNOWLEDGEMENT
I would like to express my gratitude and special thanks to my respected Professor Miss
Prithwa Ghosh for giving me the golden opportunity to work on this project, “A Brief
Overview of Data and Its Types”, and for helping me a lot to complete it on
time. I came to know about various unfamiliar fields while I was doing this project.
I would also like to thank my classmates and the other Professors who
answered my questionnaires and helped me finalize this project.
