Research methodology and medical statistics
Part A: Research methodology
CHAPTER I: Historical background
Atharvaveda classifies diseases into two types, Shapthya and Varunya, and also
mentions names of several diseases like Kasa, Harima, Kilasa, Jalodara, Gandamala,
Mutraghata, Ashmari, Arbuda, Vishamajwara, etc.
Rigveda mentions Tridoshas & Tridhatu. In addition to this, Atharvaveda clearly
mentions four types of Vayu; Pitta is regarded as Mayu, and the term Balasa is
used for Kapha. All this indicates that a research process was going on between
the period of the first Veda (Rigveda) and the last Veda (Atharvaveda).
Three types of Aushadha Dravyas i.e. Divya, Parthiva and Apya are mentioned.
Organ transplantation, Sanjivani therapy and Rasayana are specifically mentioned
in Rigveda.
Aushadha Sukta of Rigveda is immensely important. It clearly describes the
morphology, habitat, types and uses of Aushadha Dravyas.
Atharvaveda describes the digestive fire, the digestive process and the metabolism
of the seven Dhatus. The Yuktivyapashraya and Daivavyapashraya treatment
principles are mentioned in Atharvaveda as well.
Furthermore, knowledge of Visha Vigyan, Shalakya Tantra, Bhoota Vidya, Rasayana and
Vajikarana had been attained by physicians by the Atharvaveda period.
2) Sushruta Samhita
- Importance is given to surgical and anatomical information of the body.
- Surgical teaching and training methodology, establishment of a hospital and
surgical theater, etc.
- Concept of communicable diseases and different methods of disease
communication (Samsargajanya Vyadhi).
- Concept of Shat Kriya Kala with reference to different stages before onset of
the disease.
- Garbhavakranti vishayak Parishad is the only symposium found in Sushruta
Samhita.
3) Astanga Hridaya
It is a treatise regarded as a summary of Charaka and Sushruta Samhita, with
selected additions from other Ayurvedic writers such as Agnivesha, Bhela and
Harita, intended to make the knowledge of Ayurveda easier to understand and
up to date.
4) Madhava Nidana
This treatise focuses mainly on Nidana Panchaka of various diseases.
5) Bhavaprakasha Nighantu
- Many new Dravya that cannot be found in other classical texts are introduced
in Bhavaprakasha Nighantu.
- Introduction of substitute herbs (pratinidhi dravya) for originally prescribed
drugs in Brihata Traya.
6) Sharngadhara Samhita
- The main subject of this classical text is concerned with pharmaceutical
preparations.
- A systematic description of the respiration process is available.
7) Kashyapa Samhita
- This treatise has focused on Kaumarabhritya which includes topics such as
obstetrics, midwifery and pediatric diseases.
- Ayupariksha in relation to anthropometric measurements of a child was
described in depth in order to assess not only the longevity but also the luck of
future life.
- Administration of processed gold to children, known as Suvarnaprashana, is a
unique practice mentioned in Kashyapa Samhita.
The earliest narrative describing a medical trial is found in the Book of Daniel, which
says that Babylonian king Nebuchadnezzar ordered youths of royal blood to eat only red
meat and wine for three years, while another group of youths ate only beans and water.
The experiment was intended to determine if a diet of vegetables and water was
healthier than a diet of wine and red meat. At the end of the experiment, the trial
had served its purpose: the youths who ate only beans and water were noticeably
healthier.
Scientific curiosity to understand health outcomes from varying treatments has been
present for centuries, but it was not until the mid-19th century that an organizational
platform was created to support and regulate this curiosity. In 1945, Vannevar Bush said
that biomedical scientific research was "the pacemaker of technological progress", an
idea which contributed to the initiative to found the National Institutes of Health (NIH) in
1948, a historical benchmark that marked the beginning of nearly a century of substantial
investment in medical research.
CHAPTER II: Introduction to research
Research
Etymology:
Re- cerchier (to search) from old French -> recerché / recercher -> Research
The earliest recorded use of the term ‘Research’ was in 1577.
The word ‘Research’ is composed of two syllables -> “re” and “search”.
“Re” is a prefix meaning again, anew or over again.
“Search” is a verb meaning to examine closely and carefully, to test and try, or to
probe.
Definition:
- The Advanced Learner’s Dictionary of Current English defines research as
“a careful investigation or inquiry specially through search for new facts in any
branch of knowledge.”
Synonyms:
- ‘Research’ as a noun: Investigation, Experimentation, Testing, Exploration,
Analysis, Fact-finding, Examination, Scrutiny, Probing
- ‘Research’ as a verb: Investigate, conduct, study, enquire into, probe, explore,
analyze, examine, scrutinize, inspect, review, assess, read
Purpose of Research:
Purpose is the reason for which something is done or created or for which
something exists.
Research in whatever field of inquiry has four purposes; i.e. describing, explaining
& predicting phenomena and ultimately controlling events.
1-2) Describing & Explaining: It is the attempt to understand the world we live
in. Research is concerned with acquiring knowledge, establishing facts and
developing new methods.
4) Control: This follows from our knowledge and the successful verification of
hypotheses. Control represents the way in which research can be applied to
real problems and situations, thus helping us to shape our environment.
Nirukti:
अनु = to follow, along, along with, connected with
सन्धान = assembling, meeting, union, aiming at, perceiving, compounding -> in
regard to appropriate knowledge
Anusandhana means to follow appropriate knowledge which is perceived &
compounded.
Paribhasha:
कार्यकारणभावस्य द्रव्याणां गुणकर्मणोः ।
परीक्षस्य स्थापन सम्यक् अनुसन्धानम् उच्यते ॥ (PV Sharma)
Anusandhana is the study of cause and effect relationship between Dravya Guna &
Dravya Karma after several observations and through verifiable examinations to
arrive at a final conclusion.
Paryaya:
अनुसन्धान = research
शोध = validation, inquiry
संशोधन = discovery
गवेषण = literal meaning is “the search for the missing cows”. It implies the
search for the missing links between the cause and the effect.
अन्वेषण = desire to search
आधारन्वेषण = basic research
पर्येषण = to search from all the dimensions
परीष्टि = investigation or inquiry
कल्पना = idea, hypothesis, investigation
मार्गण = act of seeking or searching
विचयन = inquiry, research
CHAPTER III: Research in Ayurveda
Introduction
-> Refer to Anusandhana
The Central Council for Research in Ayurvedic Sciences (CCRAS), an autonomous body
under the Ministry of AYUSH, Govt. of India, is an apex body in India for undertaking,
coordinating, formulating, developing and promoting research on scientific lines in
Ayurvedic Sciences.
The appropriate choice in study design is essential for the successful execution of
biomedical and public health research. Research is generally divided into primary and
secondary research.
Primary research relies upon data gathered from original research expressly for that
purpose. Secondary research relies on single or multiple data sources that are not
collected for a single research purpose.
Classification of Research:
Purpose of the Study     | Elements of Enquiry      | Role of the Investigator
1) Basic / Pure research | 1) Qualitative research  | 1) Observational research
2) Applied research      | 2) Quantitative research | 2) Interventional research
                         | 3) Mixed research        |
3) Mixed research
Mixed research or the term ‘mixed methods’ refers to an emergent
methodology of research that advances the systematic integration, or mixing /
combining of qualitative and quantitative data within a single investigation.
1) Observational research
Observational research, also called epidemiological study, is research in which
the investigator does not act upon study participants, but instead observes
natural relationships between factors and outcomes.
E.g.: The researcher simply observes the answers to a survey without
influencing the outcome in any way. Another example of an observational
study would be a researcher trying to determine the effects that eating
strictly organic foods has on overall health.
2) Interventional research
Interventional research, also called experimental study or clinical trial, is
research in which the researcher intercedes as part of the study design. Such
studies are often specifically tailored to evaluate the direct impact of treatment
or preventive measures on disease.
In this type of research, participants are assigned to receive one or more
interventions (or no intervention) so that researchers can evaluate the effects
of the interventions on biomedical and health-related outcomes.
Participants may receive diagnostic, therapeutic, or other types of
interventions.
E.g.: A study in which the investigator randomly assigns the participants to
receive either aspirin or a placebo for a specific duration to determine whether
the drug has an effect on the future risk of developing cerebrovascular events.
A randomized controlled trial (RCT) is a type of scientific experiment (e.g. a clinical
trial) or intervention study that aims to reduce certain sources of bias when testing the
effectiveness of new treatments; this is accomplished by randomly allocating subjects to two
or more groups, treating them differently, and then comparing them with respect to a
measured response. One group - the experimental group - receives the intervention being
assessed, while the other - usually called the control group - receives an alternative
treatment, such as a placebo or no intervention. The groups are monitored under conditions
of the trial design to determine the effectiveness of the experimental intervention, and
efficacy is assessed in comparison to the control. There may be more than one treatment
group or more than one control group.
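The random allocation step described above can be sketched in a few lines of Python. This is only an illustration of a shuffle-and-deal allocation that keeps group sizes balanced, not a full trial-allocation procedure; the function name `allocate`, the participant IDs and the fixed seed are assumptions made for the example.

```python
import random


def allocate(participants, groups=("experimental", "control"), seed=None):
    """Randomly allocate participants to groups of (nearly) equal size."""
    rng = random.Random(seed)      # a seeded generator makes the allocation reproducible
    shuffled = list(participants)
    rng.shuffle(shuffled)          # random order, so the investigator does not choose
    allocation = {group: [] for group in groups}
    for index, participant in enumerate(shuffled):
        # Deal participants out in turn so group sizes differ by at most one.
        allocation[groups[index % len(groups)]].append(participant)
    return allocation


if __name__ == "__main__":
    ids = [f"P{n:02d}" for n in range(1, 11)]   # ten hypothetical participant IDs
    for group, members in allocate(ids, seed=42).items():
        print(group, sorted(members))
```

Passing more names in `groups` covers the case of more than one treatment or control group.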
The trial may be blinded, meaning that information which may influence the
participants is withheld until after the experiment is complete. A blind can be imposed on
any participant of an experiment, including subjects, researchers, technicians, data analysts,
and evaluators. Effective blinding may reduce or eliminate some sources of experimental
bias.
A well-blinded RCT is often considered the gold standard for clinical trials. Blinded
RCTs are commonly used to test the efficacy of medical interventions and may additionally
provide information about adverse effects, such as drug reactions. A randomized controlled
trial can provide compelling evidence that the study treatment causes an effect on human
health.
CHAPTER V: Research process
Research should be a process that converts data into information, information into
knowledge and knowledge into wisdom.
Research methods are understood as all those methods / techniques that are used for
conducting research.
Research methodology is a way to systematically solve the research problem. It is
understood as a science of studying how research is done. As part of it, various steps are
adopted by the researcher.
E = Ethical -> Amenable to a study that an institutional review board will approve.
The scientific study of the roots and paths of knowledge is Epistemology. This term
comes from the Greek “episteme”, meaning “knowledge”, and “logos”, meaning “study” or
“science of”. The study of epistemology focuses on the means for acquiring knowledge and
how we can differentiate between truth and falsehood.
The concept of Pramanas in Ayurveda can be compared with Epistemology. The word
“Pramana” comes from “Pramaa Karanam”. “Prama” means true knowledge, and “Karanam”
can be loosely translated as the “special causative factor”. So Pramana are the means of
knowledge or that by which knowledge is gained. It also stands for testimony, proof and
evidence.
These five were accepted as the sources of reliable knowledge. They are essential in every
step of the research process. Pramanas are also called Pariksha in Ayurveda, which denotes
scientific investigation. So, Pramanas can be considered scientific tools of research.
Nyaya Darshana unravels the six stages involved in the perception process. These
steps are termed as Sannikarsha (contact). This is again a scientific systematic exposition
involving the observation from the gross/superficial to the minute/in depth.
These six points of contact are:
1) Samyoga: Conjoined -> The first step that includes the contact of the substance
with the sense organ.
2) Samyukta Samavaya: Inherently joined -> The second step involving the contact of
the quality of the substance. This is the perception of shape, size, colour, etc. which
are inherently present with the substance.
3) Samyukta Samaveta Samavaya: Inherence in the inherently joined -> The third
step perceives the degree of the quality, like the intensity of redness in watermelons.
6) Visheshana Visheshya Bhava: Relation of the qualification and qualified -> The
non-existence (absence) is perceived at this stage.
Inference is an indirect method of validation that is valid for all stages of time. It is
based on pragmatic logic and reasoning, and the research plan relies on inference as a
main factor. Establishing the relation between cause and effect is the aim of the
logical reasoning known as research.
1) Svartha-anumana
Svartha-anumana Pramana is inference for oneself: knowledge inferred from one’s own
observation and previous experience.
Example: While repeatedly walking around a hill, a person often sees smoke rising
together with fire at a few specific spots. Having observed the association of smoke
and fire many times, the same person, on seeing only smoke over a different spot of
the hill, infers from prior knowledge that there is a fire at that new spot as well.
The inference rests on the conception, formed through previous experience of seeing
smoke and fire together, that “Wherever fire is present, there is smoke. Likewise,
where there is smoke, there must be a fire.”
2) Parartha-anumana
Parartha-anumana Pramana is the process of explaining one’s inference to someone
else with the help of Panchavayava Vakya to make the other person aware about it.
It is a method of explaining valid knowledge to someone else, or a tool for convincing
someone through proper reasoning and logic.
v) Conclusion (Nigama)
The final conclusion deals with the approval or the rejection of the hypothesis.
In the present day, the reliable and factual statements of the experienced persons in
their respective knowledge areas are considered to be authentic and are therefore a
valid and approved source of knowledge.
One of the qualities of a researcher is that he/she should acquire existing knowledge and
training in physical & mental skills which are necessary to do the activities implied in
research. The existing knowledge of science can be acquired through authoritative
statements. So, in the process of research, the review of literature can also be
considered as part of Aptopadesha.
CHAPTER VII: Ethics in research
Ethics is defined as moral principles that govern a person's behaviour or the conducting of an activity.
Research ethics provides guidelines for the responsible conduct of research. In addition, it
educates and monitors scientists conducting research to ensure a high ethical standard.
Bioethics is the study of the ethical issues emerging from advances in biology and medicine.
Objectivity: Strive to avoid bias in experimental design, data analysis, data interpretation,
peer review, personnel decisions, expert testimony, and other aspects of research.
Integrity: Keep your promises and agreements; act with sincerity; strive for consistency of
thought and action.
Carefulness: Avoid careless errors and negligence; carefully and critically examine your own
work and the work of your peers. Keep good records of research activities.
Openness: Share data, results, ideas, tools, resources. Be open to criticism and new ideas.
Respect for Intellectual Property: Honor patents, copyrights, and other forms of intellectual
property. Do not use unpublished data, methods, or results without permission. Give credit
where credit is due. Never plagiarize.
Responsible Mentoring: Help to educate, mentor, and advise students. Promote their
welfare and allow them to make their own decisions.
Respect for Colleagues: Respect your colleagues and treat them fairly.
Social Responsibility: Strive to promote social good and prevent or mitigate social harms
through research, public education, and advocacy.
Legality: Know and obey relevant laws and institutional and governmental policies.
Animal Care: Show proper respect and care for animals when using them in research. Do not
conduct unnecessary or poorly designed animal experiments.
Human Subjects Protection: When conducting research on human subjects, minimize harms
and risks and maximize benefits; respect human dignity, privacy, and autonomy.
The Institutional Ethics Committee (IEC) is headed by a Chairperson and supported by 12 other members including the
Member Secretary. The composition of the IEC is a mix of members from different disciplines
including public health specialist, eminent scientist, medical doctors/clinicians, pharmacologist,
microbiologist, biostatistician, epidemiologist, legal expert and lay person.
As recommended in the Indian Council of Medical Research (ICMR) guidelines, the basic
responsibility of the IEC is “to ensure a competent review of all ethical aspects of the project
proposals received by it in an objective manner”. Guided by the recommendations of the ICMR,
the IEC of IIPH-D endeavours to protect and promote the dignity, rights and wellbeing of
potential research participants; ensure that universal ethical values and international scientific
standards are adhered to and contextualized to suit the local context; advise, educate and
train the IIPH-D research community periodically; and apprise its ethics committee members of
any updates.
Ethics in Animal Research:
The ethical assessments related to the use of animals in research are wide-ranging. It is
generally thought that it may be necessary to use laboratory animals in some cases in
order to create improvements for people, animals or the environment. At the same time,
the general opinion is that animals have a moral status, and that our treatment of them
should be subject to ethical considerations.
Such views are reflected in the following positions:
i) Animals have an intrinsic value which must be respected.
ii) Animals are sentient creatures with the capacity to feel pain, and the interests of
animals must therefore be taken into consideration.
iii) Our treatment of animals, including the use of animals in research, is an
expression of our attitudes and influences us as moral actors.
5) Responsibility for minimizing the risk of suffering and improving animal welfare
Researchers are responsible for assessing the expected effect on laboratory
animals. Researchers must minimize the risk of suffering and provide good animal
welfare. Suffering includes pain, hunger, thirst, malnutrition, abnormal cold or heat,
fear, stress, injury, illness and restrictions on the ability to behave normally/naturally.
Researchers must not only consider the direct suffering that may be endured during the
experiment itself, but also the risk of suffering before and after the experiment, including
trapping, labelling, anaesthetizing, breeding, transportation, stabling and euthanizing.
The concept of evidence based medicine (EBM), defined as the “integration of best
research evidence with clinical expertise and patient values”, has been gaining popularity in
the past decade. The practice of EBM involves a process of lifelong self-directed learning in
which caring for patients creates the need for important information about clinical and other
health care issues. EBM recognizes that the research literature is constantly changing. What
the evidence points to as the best method of practice today may change next month or next
year. The task of staying current, although never easy, is made much simpler by
incorporating the tools of EBM such as the ability to track down and critically appraise
evidence, and incorporate it into everyday clinical practice.
The work of people in the field of pediatrics and child health centers on the problems
of children and their families and carers. Questions about diagnosis, prognosis, and
treatment often arise, and sometimes the answers to these questions need to be sought.
EBM allows the integration of good quality published evidence with clinical expertise and the
opinions and values of the patients and their families or caregivers. Deciding on how to treat
patients should not be based solely on the available evidence. Other factors such as personal
experience, judgement, skills, and more importantly patient values and preferences must be
considered.
The practice of EBM should therefore aim to deliver optimal patient care through the
integration of current best evidence and patient preferences, and should also incorporate
expertise in performing clinical history and physical examination.
The most important reason for practicing EBM is to improve quality of care through
the identification and promotion of practices that work, and the elimination of those that
are ineffective or harmful.
EBM promotes critical thinking. It demands that the effectiveness of clinical interventions,
the accuracy and precision of diagnostic tests, and the power of prognostic markers should
be scrutinized and their usefulness proven. It requires clinicians to be open minded and look
for and try new methods that are scientifically proven to be effective and to discard methods
shown to be ineffective or harmful.
Data Mining
Data mining is a process of extracting and discovering patterns in large data sets
involving methods at the intersection of machine learning, statistics, and database
systems. Data mining is an interdisciplinary subfield of computer science and statistics
with an overall goal to extract information (with intelligent methods) from a data set and
transform the information into a comprehensible structure for further use.
Data mining is the analysis step of the "knowledge discovery in databases" (KDD) process.
The term "data mining" is a misnomer, because the goal is the extraction of patterns and
knowledge from large amounts of data, not the extraction (mining) of data itself.
Data Mining Elements:
- Extract, transform, and load transaction data onto the data warehouse system.
- Store and manage the data in a multidimensional database system.
- Provide data access to business analysts and information technology professionals.
- Analyze the data by application software.
- Present the data in a useful format, such as a graph or table.
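The elements listed above can be illustrated with a very small pipeline. The sketch below uses only the Python standard library, with an in-memory SQLite table standing in for the data warehouse; the records, column names and function names are hypothetical and exist only for the example.

```python
import csv
import io
import sqlite3

# Hypothetical raw records; the values are placeholders, not real data.
RAW_CSV = """patient_id,diagnosis,age
1, kasa ,34
2,Jwara,51
3,KASA,29
4,jwara,62
5,Kasa,45
"""


def run_pipeline(raw_csv):
    # Extract: read the raw rows.
    rows = list(csv.DictReader(io.StringIO(raw_csv)))
    # Transform: normalise spelling and convert types.
    cleaned = [(int(r["patient_id"]), r["diagnosis"].strip().title(), int(r["age"]))
               for r in rows]
    # Load and store: an in-memory table stands in for the warehouse.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visits (patient_id INTEGER, diagnosis TEXT, age INTEGER)")
    conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", cleaned)
    # Analyze: number of cases and mean age per diagnosis.
    summary = conn.execute(
        "SELECT diagnosis, COUNT(*), AVG(age) FROM visits "
        "GROUP BY diagnosis ORDER BY diagnosis"
    ).fetchall()
    conn.close()
    return summary


def present(summary):
    # Present: a small plain-text table.
    lines = [f"{'Diagnosis':<12}{'Cases':>6}{'Mean age':>10}"]
    for diagnosis, cases, mean_age in summary:
        lines.append(f"{diagnosis:<12}{cases:>6}{mean_age:>10.1f}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(present(run_pipeline(RAW_CSV)))
```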
Data Portal & Database
Portal is a term, generally synonymous with gateway, for a World Wide Web site that is
or proposes to be a major starting site for users when they get connected to the Web or
that users tend to visit as an anchor site.
The basic definition of an open data portal is “A list of datasets with pointers to how
those datasets can be accessed.”
https://pubmed.ncbi.nlm.nih.gov/
https://www.niaid.nih.gov/research/bioinformatics-resource-centers
Research Information Management Systems (RIMs)
Research Information Management Systems (RIMs), sometimes referred to by the
earlier name CRIS (Current Research Information System), are databases or
other information systems that store and manage data about research conducted at
an institution.
Examples of RIMs:
https://www.elsevier.com/en-in
https://satn.converis.clarivate.com/converis/portal/overview?lang=en_GB
https://www.symplectic.co.uk/
https://ayushportal.nic.in/
http://www.dharaonline.org/Home
Part B: Medical statistics
CHAPTER I: Introduction
Definitions:
- Statistics is a branch of mathematics dealing with the collection, analysis,
interpretation, presentation, and organization of data.
- Epidemiology is the study and analysis of the patterns, causes, and effects of
health and disease conditions in defined populations.
1) Nature of statistics
According to Tipp “Statistics is both a science and an art.”
As a science, it studies statistics in a systematic manner.
As an art, it uses statistics to solve the problems of real life.
Statistics is not a body of substantive knowledge but a body of methods for
obtaining knowledge.
2) Functions of statistics
i) Simplification of data
ii) Presentation of facts in a definite form
iii) Provision of a technique for making comparisons
iv) Provision of guidance in the formulation of policies
v) Enlarge individual experience
vi) Forecasting of future behavior of epidemics
vii) Evaluation of projects by drawing inferences
3) Limitations of statistics
i) Statistics does not study qualitative phenomena.
ii) Statistical laws are not exact and cannot be universally applied.
iii) Statistics does not study individuals.
iv) Statistics can be misused. Statements supported by statistics are more
appealing and more commonly believed. Statistical methods used by less
expert hands will lead to inaccurate results.
Importance of Statistics:
- The word statistics is used as singular or plural. As a singular noun, statistics refers
to the various methods adopted for the collection, classification, analysis and
interpretation of data. As a plural noun, statistics refers to data or facts.
Example: A sample is the group of individuals who participate in a study, and the
population is the broader group of people to whom the results will apply.
Types of Data:
1) Qualitative Data
A) Nominal, Attribute, or Categorical Data: The assignment of numbers for
classification purpose. Categorical data represents characteristics such as a
person’s gender, marital status, hometown, or the types of movies they like.
Examples:
- Gender/Sex
- Religion (Buddhist, Islamic, Jewish, Christian, Hindu, etc.)
B) Ordinal or Ranked Data: One value is greater or less than another, but the
magnitude of the difference is unknown.
Examples:
- Muscle response (none, partial, complete)
- Visual analogue scale
- Socio-economic status
Types:
1) Numeric Variables
Numeric variables have values that describe a measurable quantity as a
number, like ‘how many’ or ‘how much’.
Therefore, numeric variables are Quantitative variables.
a) Continuous variable: Observations can take any value between a certain
set of real numbers.
Examples: Height, Time, Age, Temperature, etc.
2) Categorical Variables
Categorical variables have values that describe a quality or characteristic of a
data unit, like ‘what type’ or ‘which category’.
Therefore, categorical variables are Qualitative variables.
a) Ordinal variable: Observations can take a value that can be logically
ordered or ranked.
Examples: Academic grades, Clothing size, etc.
Collection of Data
Data collection is the process of gathering and measuring information on targeted
variables in an established systematic fashion, which then enables one to answer
relevant questions and evaluate outcomes.
The goal for all data collection methods is to capture quality evidence which allows
analysis to lead to the formulation of convincing and credible answers to the questions
that have been posed.
Presentation of Data
Data presentation is the method by which people summarize, organize and communicate
information by using a variety of tools, such as diagrams, distribution charts, histograms
and graphs.
Tabular Presentation:
Tabulation is the process of presentation of classified data in a proper order to
facilitate comparison. A statistical table is a presentation of data in vertical columns
and horizontal rows.
Parts of a Table: - 7
1) Table number
2) Title of the table
3) Captions (column heading) & stubs (row heading)
4) Body of the table
5) Prefatory or Headnote
6) Footnotes
7) Source
Classification
i) Simple or one-way table: It is the simplest table which contains data of one
characteristic only. A simple table is easy to construct and simple to follow.
ii) Two-way table: It is a table which contains data on two characteristics. In such a
case, either stub or caption is divided into two coordinate parts.
iii) Manifold table: It is a table which has more than two characteristics of data.
Manifold tables, though complex, are good in practice as they enable full
information to be incorporated and facilitate analysis of all related facts.
As a normal practice, not more than 4 characteristics should be represented to
avoid confusion.
Graphical Presentation:
Ideally, every graph should:
- Include a title below the figure providing all relevant information.
- Be referred to as figures in the text.
- Identify figure axes by the variables under analysis.
- Quote the source which provided the data, if required.
- Demonstrate the scale being used.
- Be self-explanatory.
1) Histogram is defined as a graphical representation of the mutually exclusive events.
A histogram is quite similar to the bar chart. Both are made up of rectangular bars. The
difference is that there is no gap between any two bars in the histogram. The histogram
is used to represent continuous data.
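As an illustration of how continuous data are grouped into adjacent class intervals for a histogram, the sketch below counts values per interval and prints a text version of the bars. The heights, the interval width and the function name `histogram` are assumptions chosen for the example.

```python
def histogram(values, start, width, bins):
    """Count values in adjacent intervals [start, start + width), and so on."""
    counts = [0] * bins
    for value in values:
        index = int((value - start) // width)
        if 0 <= index < bins:          # values outside the covered range are ignored
            counts[index] += 1
    return counts


# Hypothetical heights in centimetres (continuous data).
heights_cm = [151.2, 158.4, 160.0, 162.7, 163.1, 165.5, 167.9, 168.3, 171.0, 176.4]

if __name__ == "__main__":
    for i, count in enumerate(histogram(heights_cm, start=150, width=10, bins=3)):
        low = 150 + 10 * i
        # Intervals share their boundaries, so the bars touch with no gap.
        print(f"{low}-{low + 10} | " + "#" * count)
```

Because each interval begins exactly where the previous one ends, every value falls into one and only one bar.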
Types of Diagrams:
1) Bar diagram
2) Pie diagram
3) Pictogram
4) Cartogram
4) Cartogram / Cartograph is a diagram in which the information is shown in a
geographical distribution of a map. The map is abstracted in order to translate the
information of the alternative variable.
CHAPTER IV: Measures of location / central tendency
Measures of Location
A fundamental task in many statistical analyses is to estimate a location parameter for
the distribution; i.e. to find a typical value or central value that best describes the data.
According to Lawrence J. Kaplan “one of the most widely used set of summary figures is
known as measures of location, which are often referred to as averages, measures of
central tendency or central location.”
Average:
In colloquial language, an average is a middle or typical number of a list of
numbers. Different concepts of average are used in different contexts.
Often ‘average’ refers to the sum of the numbers divided by how many numbers
are being averaged.
In statistics, an average is defined as “the number that measures the central
tendency of a given set of numbers.”
Averages or Measures are statistical constants which enable us to comprehend in a
single effort the significance of the whole. They give us an idea about the
concentration of the values in the central part of the distribution.
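The most common sense of 'average', the sum of the numbers divided by how many there are, can be sketched as follows. The pulse-rate values and the function name `arithmetic_mean` are hypothetical and serve only as an illustration.

```python
def arithmetic_mean(values):
    """Sum of the observations divided by the number of observations."""
    if not values:
        raise ValueError("at least one observation is required")
    return sum(values) / len(values)


# Hypothetical pulse rates (beats per minute) of five subjects.
pulse_rates = [72, 76, 68, 80, 74]

if __name__ == "__main__":
    print(arithmetic_mean(pulse_rates))   # 370 / 5 = 74.0
```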
Percentile:
A percentile (or a centile) is a measure used in statistics indicating the value below
which a given percentage of observations in a group of observations fall.
Example: The 20th percentile is the value (or score) below which 20% of the
observations may be found.
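Several conventions exist for computing a percentile from a finite sample. The sketch below assumes the nearest-rank convention (the smallest observation with at least p% of the data at or below it); the scores and the function name `percentile` are hypothetical.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile for 0 < p <= 100 (one of several conventions)."""
    if not values or not 0 < p <= 100:
        raise ValueError("need data and 0 < p <= 100")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)   # 1-based position in the ordered data
    return ordered[rank - 1]


# Hypothetical test scores of ten students.
scores = [55, 42, 88, 35, 63, 47, 74, 51, 69, 58]

if __name__ == "__main__":
    print(percentile(scores, 20))   # second of ten ordered values: 42
```

Other conventions interpolate between neighbouring observations and can give slightly different values for small samples.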
Example:
Mode:
The mode is the most common number in a set of data. The mode is found by
collecting and organizing the data in order to count the frequency of each result.
The result with the highest occurrences is the mode of the set.
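The counting procedure described above can be sketched with a frequency table. The data and the function name `modes` are hypothetical; the function returns a list because a data set may have more than one mode.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)               # frequency of each result
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


if __name__ == "__main__":
    print(modes([2, 3, 3, 5, 7, 3, 5]))   # 3 occurs three times: [3]
    print(modes([1, 1, 2, 2]))            # two modes: [1, 2]
```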
Variability is the extent to which data points in a statistical distribution or data set
diverge from the average, or mean, value as well as the extent to which these data
points differ from each other.
Types of Variability:
1) Biological variability: The natural variability in a lab parameter due to
physiologic differences among subjects and within the same subject over time.
Example:
Problem -> The following data show the weights (in pounds) of 25 boys.
108, 104, 120, 108, 110, 125, 103, 112, 99, 115, 114, 96, 116, 100, 129, 117,
119, 121, 112, 111, 120, 111, 121, 101, 109 -> Find the range of data.
Arrange data in ascending order -> 96, 99, 100, 101, 103, 104, 108, 108, 109, 110,
111, 111, 112, 112, 114, 115, 116, 117, 119, 120, 120, 121, 121, 125, 129
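The worked problem can be checked with a short script. It takes the 25 weights exactly as listed in the problem and computes the range as the difference between the largest and the smallest value; the variable names are chosen for this illustration.

```python
# Weights (in pounds) of the 25 boys, as given in the problem.
weights_lb = [108, 104, 120, 108, 110, 125, 103, 112, 99, 115, 114, 96, 116,
              100, 129, 117, 119, 121, 112, 111, 120, 111, 121, 101, 109]

ordered = sorted(weights_lb)               # ascending order
data_range = ordered[-1] - ordered[0]      # highest value minus lowest value

if __name__ == "__main__":
    print(len(weights_lb), ordered[0], ordered[-1], data_range)   # 25 96 129 33
```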
Standard deviation (also represented by the Greek letter sigma σ or the Latin
letter S) is a measure that is used to quantify the amount of variation of a set
of data values.
SD is also known as “root-mean square deviation” as it is the square root of
the mean of the squared deviations from the arithmetic mean.
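The "root-mean square deviation" description can be followed step by step, as in this sketch. The data are hypothetical. Note that this version divides by n (the population form); the sample form, which divides by n - 1, is a common alternative not shown here.

```python
import math
from statistics import pstdev

# Hypothetical data values, for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Step 1: arithmetic mean.
mean_value = sum(data) / len(data)

# Step 2: squared deviations from the mean.
squared_deviations = [(x - mean_value) ** 2 for x in data]

# Step 3: mean of the squared deviations (the variance).
variance = sum(squared_deviations) / len(data)

# Step 4: square root of that mean gives the standard deviation.
sd = math.sqrt(variance)

print(sd, pstdev(data))
```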
3) Standard Error
A standard error (SE) is the standard deviation of the sampling distribution of a
statistic. Standard error is a statistical term that measures the accuracy with
which a sample represents a population.
In statistics, a sample mean generally deviates from the actual mean of a
population; the standard error measures the typical size of this deviation.
For the sample mean, SE = SD / √n, where n is the sample size.
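For the sample mean, the standard error is commonly estimated as the sample standard deviation divided by the square root of the sample size. The sketch below assumes that formula and uses hypothetical data.

```python
import math
from statistics import stdev

# Hypothetical sample, for illustration only.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(sample)

# Sample standard deviation (divides by n - 1).
s = stdev(sample)

# Standard error of the mean: SE = s / sqrt(n).
standard_error = s / math.sqrt(n)

print(round(standard_error, 4))
```

A larger sample gives a smaller standard error for the same standard deviation, which is why larger samples represent the population more accurately.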
The term probability is derived from Latin ‘probabilitas’, from ‘probabilis’ meaning
‘provable’, ‘credible’ – ‘something likely to be true.’
Probability may be defined as the proportion of favorable outcomes to the total number
of possibilities.
Probability is the measure of the likelihood that an event will occur. The higher the
probability of an event, the more certain it is that the event will occur.
Examples:
- When a single die is thrown, there are six possible outcomes:
1, 2, 3, 4, 5, 6
The probability of any one of them is 1/6.
- There are 5 marbles in a bag; 4 are blue, and 1 is red. What is the probability that a
blue marble gets picked?
Number of ways it can happen: 4 (there are 4 blues)
Total number of outcomes: 5 (there are 5 marbles in total)
So the probability = 4/5 = 0.8
Probability does not tell us exactly what will happen; it is just a guide.
The analysis of events governed by probability is called statistics.
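The die and marble examples above both apply the same rule: favorable outcomes divided by total outcomes. A minimal sketch, with a hypothetical helper function name, is:

```python
from fractions import Fraction

def probability(favorable, total):
    """Proportion of favorable outcomes to the total number of possibilities."""
    return Fraction(favorable, total)

# Single die: one favorable face out of six possible outcomes.
p_face = probability(1, 6)

# Bag of 5 marbles with 4 blue: four favorable outcomes out of five.
p_blue = probability(4, 5)

print(p_face, p_blue, float(p_blue))
```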
Test of Significance:
Test of significance is a statistical technique used for ascertaining the likelihood of
empirical data, and from there, for inferring a real effect, such as a correlation between
variables or the effectiveness of a new treatment.
A typical test of significance comprises two elements:
i) Calculation of the probability of the data.
ii) Assessment of the statistical significance of that probability.
In statistical hypothesis testing, a type I error is the rejection of a true null hypothesis
(also known as a "false positive" finding or conclusion; example: "an innocent person is
convicted"), while a type II error is the non-rejection of a false null hypothesis (also
known as a "false negative" finding or conclusion; example: "a guilty person is not
convicted").
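The four possible outcomes of a hypothesis test can be laid out in a small sketch. The function name and return labels are hypothetical and serve only to restate the definitions above.

```python
def classify_decision(null_is_true, null_rejected):
    """Label the outcome of a test given the truth and the decision made."""
    if null_is_true and null_rejected:
        return "type I error (false positive)"
    if not null_is_true and not null_rejected:
        return "type II error (false negative)"
    return "correct decision"

# A true null hypothesis is rejected: "an innocent person is convicted".
print(classify_decision(null_is_true=True, null_rejected=True))

# A false null hypothesis is not rejected: "a guilty person is not convicted".
print(classify_decision(null_is_true=False, null_rejected=False))
```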
Parametric test is the hypothesis test which provides generalizations for making
statements about the mean of the parent population.
Non-parametric test is defined as the hypothesis test which is not based on underlying
assumptions, i.e. it does not require the population’s distribution to be denoted by
specific parameters.
COMMONLY USED TESTS
Parametric tests: ‘Z’ test, Student’s ‘t’ test (paired and unpaired), ‘F’ test, ANOVA test
Non-parametric tests: Chi-square test, Fisher’s exact test, McNemar’s test, Wilcoxon
test, Mann-Whitney U test
Parametric Tests:
1) ‘Z’ test
It is a statistical test where normal distribution is applied and is basically used
for dealing with problems relating to large samples when n ≥ 30.
(n = sample size)
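One common form of the test is the one-sample ‘Z’ test for a mean, sketched below. It assumes the population standard deviation is known, and all the numbers are hypothetical; the two-sided p-value is taken from the standard normal distribution.

```python
import math
from statistics import NormalDist

# Hypothetical inputs, for illustration only.
sample_mean = 52.0      # mean observed in the sample
population_mean = 50.0  # mean stated by the null hypothesis
population_sd = 6.0     # assumed known population standard deviation
n = 36                  # sample size (large sample, n >= 30)

# Standard error of the mean.
standard_error = population_sd / math.sqrt(n)

# Z statistic: distance of the sample mean from the hypothesised mean,
# measured in standard errors.
z = (sample_mean - population_mean) / standard_error

# Two-sided p-value from the standard normal distribution.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(z, round(p_value, 4))
```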
3) ‘F’ test
It is a statistical test that is used to determine whether two populations having
normal distribution have the same variances or standard deviation.
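The statistic behind the ‘F’ test is the ratio of two sample variances. The sketch below computes only that ratio, for two hypothetical samples, and places the larger variance in the numerator (a common convention, assumed here). Judging significance additionally needs the F distribution with the stated degrees of freedom, which is not part of the Python standard library and is therefore left out.

```python
from statistics import variance

# Two hypothetical samples, for illustration only.
sample_a = [2, 4, 4, 4, 5, 5, 7, 9]
sample_b = [3, 4, 5, 5, 6, 7]

# Sample variances (each divides by n - 1).
var_a = variance(sample_a)
var_b = variance(sample_b)

# F statistic: larger variance over smaller variance.
f_statistic = max(var_a, var_b) / min(var_a, var_b)

# Degrees of freedom for numerator and denominator (n - 1 each).
if var_a >= var_b:
    degrees_of_freedom = (len(sample_a) - 1, len(sample_b) - 1)
else:
    degrees_of_freedom = (len(sample_b) - 1, len(sample_a) - 1)

print(round(f_statistic, 4), degrees_of_freedom)
```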
Statistical software packages are specialized computer programs for analysis in
statistics and econometrics.
The most commonly used packages for statistical analysis are:
1) MICROSOFT EXCEL
2) SPSS
3) SAS
4) STATA
5) R
6) MINITAB
1) MICROSOFT EXCEL
It is part of the Microsoft Office suite of programs. Version 1.0 was first released in 1985.
Advantages:
- Easy to use and interchanges nicely with other Microsoft products.
- Excel spreadsheets can be read by many other statistical packages.
- Includes an add-on module, which is part of Excel, for undertaking basic statistical analyses.
Disadvantages:
- Excel is designed for financial calculations, although it is possible to use it for many
other things.
- It cannot undertake more sophisticated statistical analyses without purchase of
expensive commercial add-ons.
2) SPSS
SPSS stands for Statistical Package for the Social Sciences. It was one of the earliest
statistical packages, with Version 1 being released in 1968.
Advantages:
- Easy to learn and use.
- Can use either menus or syntax files.
- Excels at descriptive statistics, basic regression analysis, analysis of variance, and some
newer techniques such as Classification and Regression Trees (CART).
Disadvantages:
- Has few of the more powerful techniques required in epidemiological analysis, such as
competing risk analysis or standardized rates.
3) SAS
SAS stands for Statistical Analysis System. It was developed at the North Carolina State
University in 1966.
Advantages:
- Can use either menus or syntax files.
- More powerful than SPSS.
- Commonly used for data management in clinical trials.
Disadvantages:
- Harder to learn and use than SPSS.
4) STATA
Stata is a more recent statistical package with Version 1 being released in 1985. Since
then, it has become increasingly popular in areas of epidemiology and economics.
Advantages:
- Can use either menus or syntax files.
- More powerful than SPSS; equivalent to SAS.
- Excels at advanced regression modelling.
- Has its own in-built structural equation modelling.
- Has a good suite of epidemiological procedures.
- Researchers around the world write their own procedures in Stata, which are then
available to all users.
Disadvantages:
- Harder to learn and use than SPSS.
- Does not yet have specialized techniques such as CART or Partial Least Squares
regression.
5) R
S-plus is a commercial implementation of the S statistical programming language,
developed in Seattle in 1988. R is a free implementation of the same language,
developed in 1996.
Advantages:
- Very powerful; easily matches or even surpasses many of the models found in SAS or
Stata.
- Researchers around the world write their own procedures in R, which are then available
to all users.
- Free of charge
Disadvantages:
- Much harder to learn and use than SAS or Stata.
6) MINITAB
Minitab is a command- and menu-driven software package for statistical analysis. It is a
statistical package developed at the Pennsylvania State University in 1972.
Advantages:
- Minitab is a versatile statistics package that is cheaper and requires less disk space than
SPSS and SAS.
- Analysis can be performed using drop-down menus or syntax, accommodating both
beginners and advanced users.
- Simplicity makes it easy to learn for beginners.
- User interface and output available in English, French, German, Japanese, Korean,
Simplified Chinese, and Spanish.
Disadvantages:
- The range of statistical analyses that Minitab can perform straight after installation is
not as wide as in other packages such as SPSS and SAS.
- Minitab is primarily a statistical analysis package, and as such is a weaker choice for
pure mathematical uses, with less ability to perform mathematical and numerical
analyses.
- Statistical software (SS) allows researchers to write their own procedures, which may
be easily shared and made available to all users, depending on the software used.
- Teaching statistics at the university level is constantly changing due to the influence of
modern technology. The use of statistical software for computations and visual
representations enables students’ active knowledge construction by “doing” and
“seeing” statistics.