Group 5 of N03B
Jon Eiro D. Andal
Eliana Joelle L. Foronda
Gwyneth Elizabeth Ann F. Galicia
Darrel Danier A. Leander
Table of Contents
Abstract
Introduction
Problem Definition
Methodology
Data Representation and Analysis
Statistical Analysis
Conclusions and Recommendations
Appendices
Bibliography
A Statistical Study of Applying Confidence Interval Estimation, Hypothesis Testing for a Single & Double Parameter,
and Simple Linear Regression to the Indian Liver Patient Dataset from the UCI Machine Learning Repository
Abstract
Liver disease has been ranked the tenth most common cause of death in India, as
per the World Health Organization. The number of individuals who suffer from
liver-related diseases has been consistently increasing due to the overconsumption
of alcohol, misuse of drugs, inhalation of toxic substances, hereditary factors, and a
sedentary lifestyle. In this context, this statistical research primarily uses the Indian
Liver Patient Dataset, which comprises 416 liver patient records and 167 non-liver patient
records collected from the North East of Andhra Pradesh, India. The study places a strong
emphasis on comparing the findings for individuals who have liver disease with those who
do not, and on examining whether the amounts of different chemical compounds in the
patients' blood have an impact on the results of their liver function blood test. This
paper constructs, implements, and tests four distinct statistical methods, namely
confidence interval estimation, single and double parameter hypothesis testing, and simple
linear regression. The variables investigated are total and conjugated bilirubin, the
amount of ALP, albumin, and the ratio of albumin to globulin.
I. Introduction
The human liver is a vital organ that also serves as a gland in the body. Inside the
human body, this crucial organ performs around 500 different functions. It is
considered the largest solid organ in the body and has the ability to regenerate. The liver
is required for the digestion of food as well as the elimination of toxins from the body
(Cleveland Clinic, n.d.). In the industrialized world, one of the most common causes of
liver disease is alcohol consumption. Moreover, the liver can be severely damaged by
excessive alcohol consumption, inhaling hazardous gases, eating contaminated food, certain
viruses, and the use of drugs (Mayo Clinic, n.d.). Liver disease can also be inherited,
especially if family members have had liver problems in the past. Liver diseases are
diagnosed through the liver function test (Gulia et al., 2014).
sample (Britannica, n.d.). Lastly, simple linear regression models the link between two
continuous variables and involves making assumptions about the data. Usually, the goal is
to use the value of an input (or predictor) variable to forecast the value of an output
(or response) variable (JMP, n.d.).
The statistical study applies the four aforementioned statistical methods to the Indian
Liver Patient Dataset (ILPD) and makes use of the following variables: total and
conjugated bilirubin, amount of ALP, albumin, and the albumin and globulin ratio. The
results of the various statistical tests are derived from computerized solutions and
statistical software and extensions, namely PHStat and Statistica.
Alanine Aminotransferase
Normally found in the cells of the liver and heart. When the liver or heart is
injured, it is released into the bloodstream.
Albumin
Protein that is soluble in water and may be coagulated with heat, such as in egg
whites, milk, and blood serum.
Alkaline Phosphatase
In the liver, bone, and other tissues, an enzyme that liberates phosphate in alkaline
circumstances.
Aspartate Aminotransferase
When your liver or muscles are harmed, this enzyme is released.
Bilirubin
An orange-yellow pigment formed in the liver by the breakdown of hemoglobin
and excreted in bile.
Epidemiology
In this discipline of medicine, illnesses and other health-related variables are
discussed in terms of their occurrence, distribution, and possible control.
Enzyme
In a living organism, a protein that speeds up the rate of a chemical reaction. An
enzyme acts as a catalyst for specific chemical reactions, converting a specific set of
reactants into specific products.
Hepatotoxicity
Liver Cirrhosis
A late stage of scarring of the liver caused by many forms of liver diseases and
conditions, such as hepatitis and chronic alcoholism.
Total Protein
The total quantity of protein in the serum is measured using a biochemical test.
Albumin and globulin are two types of protein found in the blood.
However, issues may arise if other characteristics of a person are not considered in the
interpretation of a blood test. Differences in characteristics such as gender and age might
alter the results enough to cause a misdiagnosis. A study by Guy & Peters (2013) on how
gender affects the risk of liver disease found that women are more likely to have acute
liver failure, autoimmune hepatitis, benign liver lesions, primary biliary cirrhosis, and
toxin-mediated hepatotoxicity, though they are less likely to contract malignant liver
tumors, primary sclerosing cholangitis, and viral hepatitis. Men, however, are twice as
likely to die from chronic liver disease and cirrhosis. No such difference is observed when
the main cause of liver complications is alcohol consumption.
When it comes to age, the older population is at higher risk for liver disease, as
concluded by Kim, Kisseleva, and Brenner (2015) in research focused on the effects of
aging on the liver. The reason for this is that the volume of the liver
decreases as one ages. For people who are 65 years old and older, a 35% decrease in liver
blood volume was observed when compared to those who are less than 40 years old. The
same study also found a decrease in the mass of functional liver cells. Although there
were mixed results about the effects of age on the liver, aging was observed to affect
blood components related to the organ. Serum albumin decreases with age, which makes
normal levels harder to maintain. In addition, cholesterol and fat volume in the liver
increase, the metabolism of low-density lipoprotein cholesterol decreases by 35%, and
serum γ-glutamyltransferase and alkaline phosphatase levels are elevated.
Furthermore, another study with a similar but more specific goal, conducted by Rosenthal
& Pincus (1984), found that the mean serum bilirubin concentrations in men far exceed
those found in women. The serum bilirubin levels were also at their highest at ages 19-24,
slowly declining with age. This adds substance to the claim that liver function results
differ between groups.
Preliminary Hypothesis
Drawing on the studies discussed above, the researchers hypothesize that there is a
significant difference in blood test results between liver and non-liver disease patients.
a. Doctors - the study will help them know which enzymes present in the blood help
determine whether a patient has liver disease, making it easier for them to diagnose
patients.
c. Future Researchers - this study will serve as reference data for those who are
planning to research this kind of topic and will help them gather data more easily.
Aside from that, this study will also help them conduct more promising research
that may prove beneficial in the future.
d. Students - the research paper would be beneficial for academic purposes, such as
studying organology, research, statistics, and more. It could also serve as a
reference for those who want to learn more about this topic.
e. Other medical professionals - the finished paper would contain important data
comparing the liver function blood test results of patients with and without liver
disease. These data would help medical professionals increase their efficiency in
diagnosing patients with liver disease.
V. Methodology
A. Variables
The variable handled for the Confidence Interval Estimation of the Mean is the total
amount of bilirubin in mg/dL in patients, derived from the Indian Liver Patient Dataset
taken from the UCI Machine Learning Repository. Bilirubin is examined in liver function
tests because it can indicate whether a patient shows signs of liver disease. A low level
of bilirubin in the blood is normal, but a high level might indicate liver illness (URMC,
n.d.). The researchers would like to investigate the range of values that contains the
true value of the unknown parameter, which is the mean total bilirubin in mg/dL of the
patients. Confidence interval estimation provides a lower and an upper estimate instead of
a single value for the mean. The Indian Liver Patient Dataset consists of 416 liver
patient records and 167 non-liver patient records (a total sample size of 583 patients).
Total bilirubin is the combination of direct and indirect bilirubin, and a high amount
could signify that an individual has an underlying disease. When too much bilirubin, a
chemical contained in the bile of the liver, seeps into the bloodstream, it can lead to a
variety of health issues (Cleveland Clinic, n.d.). The normal range of total bilirubin is
0.1 to 1.2 mg/dL (Mount Sinai Hospital, n.d.).
The researchers would perform this particular statistical test in order to determine
if the confidence interval estimate of the mean would fall within the range of normal
amount of total bilirubin in mg/dL. With respect to the aforementioned dataset (ILPD), the
sample size is large enough, having 583 patients (n ≥ 30), and the sigma is known;
therefore, the application of the Central Limit Theorem (CLT) is valid and the
Z-distribution should be used. Because the CLT applies, the sampling distribution of X̄
will be approximately normal with a mean equal to the population mean. In this specific
confidence interval estimation, the researchers would make use of the default confidence
level, which is 95%. Given that the confidence level is 95%, the value of alpha (α) would
be 0.05 and α/2 would be 0.025. Then, the sample mean X̄ must be solved using the formula:
x̄ = ( Σ xi ) / n
After selecting the sample statistic and confidence level, the margin of error must be
determined. The margin of error is the largest expected difference between the sample
estimate and the true population value at the chosen confidence level. The formula for
the margin of error is:
$$\text{margin of error} = Z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$$
The value derived from the margin of error formula above is the amount that must be both
subtracted from and added to the sample mean (X̄ ± margin of error). The two resulting
values are the lower and upper limits, respectively, which define the range of numbers
expected to enclose the true value of the mean. The confidence interval with 95%
confidence takes the following form:
$$\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
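The interval construction described above can be illustrated with a minimal Python sketch. It assumes, as the text does, that the sigma is treated as known and that the Z-distribution applies. The function name `z_confidence_interval` and the numeric inputs are hypothetical and chosen only for illustration; they are not values taken from the dataset or from the software used in the study.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Return (lower, upper, margin) for a z-based interval on the mean."""
    alpha = 1.0 - confidence
    # Upper-tail critical value Z_{alpha/2}; about 1.96 when confidence = 0.95.
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # Margin of error: Z_{alpha/2} * sigma / sqrt(n).
    margin = z_crit * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin, margin


# Hypothetical inputs for illustration only (not values from the dataset).
lower, upper, margin = z_confidence_interval(sample_mean=3.0, sigma=6.0, n=583)
print(f"margin = {margin:.4f}, interval = ({lower:.4f}, {upper:.4f})")
```

The same function can be reused for other confidence levels by changing the `confidence` argument, which changes only the critical value.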
A. Variables
The variable that the researchers would utilize is the alkaline phosphatase (ALP) level
in international units per liter (IU/L) taken from the health records of patients. These
ALP levels would be used to test the researchers' alternative hypothesis, which asks
whether the mean ALP level of the patients is above normal; an abnormal ALP level
contributes to a higher risk of liver disease. The data has a sample size of 583 patients
with their ALP levels, taken from the Indian Liver Patient Records data collected from the
North East of Andhra Pradesh, India. According to the Cleveland Clinic (2021), a normal
ALP level is typically 40 IU/L to 147 IU/L. Furthermore, Balingit (2013) stated that an
ALP level higher than 150 IU/L is deemed abnormal, which is consistent with the range
given by the Cleveland Clinic. The researchers would therefore use the upper limit of the
normal range, 147 IU/L, as the hypothesized mean.
The variables are measured using the data gathered from the health records of the
patients with the help of the UCI Machine Learning Repository, where all the data
regarding the level of alkaline phosphatase in each patient's blood is shown individually.
The alkaline phosphatase level of the patients is considered ratio data: there is a
measurable difference between the ALP levels of each patient, the values can be ordered
from highest to lowest, and there is a "true zero starting point" in that no ALP level
can be negative.
The researchers would use single parameter hypothesis testing to determine whether the
null hypothesis will be rejected or will fail to be rejected. The test assesses the
population of patients in the data set with respect to their alkaline phosphatase level.
It follows an 8-step procedure. Since the data has a sample size of 583 and the sigma is
known, the central limit theorem applies and the sampling distribution of the mean follows
a normal distribution curve. The 8-step procedure is as follows. The first step is
identifying the null hypothesis of the study, which is that the mean alkaline phosphatase
level of the patients is less than or equal to 147 IU/L (Ho: µ ≤ 147 IU/L). The second
step is identifying the alternative hypothesis, which is that the mean alkaline
phosphatase level of the patients is greater than 147 IU/L (Ha: µ > 147 IU/L), above the
healthy amount and indicative of a higher risk of liver disease. This means the test is an
upper tailed test, so the alpha is not cut in half. The third step is setting the level of
significance. The researchers would use the usual level of significance of α = 0.05, which
determines whether the null hypothesis is rejected or fails to be rejected. Since the
researchers follow an upper tailed test, the whole α = 0.05 is placed in the upper tail,
and the corresponding critical value from the z-table is 1.645, which marks the start of
the rejection region. The fourth step is identifying the test statistic: since the sample
size is 583 and the sigma is known, the central limit theorem (CLT) can be used and the
Z-test is appropriate. The fifth step of the hypothesis testing would be the decision rule,
which would identify if the null hypothesis will be rejected or not. The critical value
that bounds the region of rejection is 1.645, so the researchers would reject Ho if the
computed Z-score is greater than 1.645. If the computed Z is less than or equal to 1.645,
the researchers would fail to reject the null hypothesis. The sixth step would be
computing the z-score using the formula:
$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$
The seventh step of this hypothesis testing would be making the decision, that is,
confirming whether the researchers reject or fail to reject the null hypothesis based on
the computed Z-score. The final step would be stating the conclusion of the test.
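Steps 5 to 7 of the procedure above can be sketched in Python as follows. This is only an illustration under the stated assumptions (upper tailed test, sigma treated as known, α = 0.05). The function name `upper_tailed_z_test`, the sample mean, and the sigma are hypothetical; only the hypothesized mean of 147 IU/L and the sample size of 583 come from the text.

```python
from math import sqrt
from statistics import NormalDist


def upper_tailed_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Test H0: mu <= mu0 against Ha: mu > mu0 with sigma treated as known."""
    # Step 6: computed Z-score.
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # The whole alpha sits in the upper tail, so the critical value is Z_alpha.
    z_crit = NormalDist().inv_cdf(1.0 - alpha)
    p_value = 1.0 - NormalDist().cdf(z)
    # Steps 5 and 7: reject H0 only when Z exceeds the critical value.
    return z, z_crit, p_value, z > z_crit


# Hypothetical sample mean and sigma; only mu0 = 147 IU/L and n = 583 come from the text.
z, z_crit, p_value, reject = upper_tailed_z_test(
    sample_mean=160.0, mu0=147.0, sigma=100.0, n=583
)
print(f"Z = {z:.3f}, critical value = {z_crit:.3f}, reject H0: {reject}")
```

Comparing `z` with `z_crit` and comparing `p_value` with `alpha` lead to the same decision, so either form of the decision rule may be reported.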
A. Variables
For the double parameter hypothesis testing, the variable that will be used is the
conjugated or direct bilirubin level, in milligrams per decilitre (mg/dL), of the liver
and non-liver patients, with data coming from the Indian Liver Patient Dataset. The
hypothesis testing will have a sample size of 57 for liver patients and 27 for non-liver
patients. The average direct bilirubin level of liver and non-liver patients will be used
to get the standard deviation of each population. The variables are assessed using data
acquired from the patients' health records through the UCI Machine Learning Repository,
which displays all of the data on the level of direct bilirubin in each patient's blood.
The variables mentioned will be used to test the alternative hypothesis that there is a
significant difference between the two groups.
In adults, normal direct bilirubin results range from 0 to 0.2 mg/dL, or less than 0.3
mg/dL. Bilirubin may also show up in the urine if the blood test results are higher.
Normal, healthy persons do not have bilirubin in their urine (University of Rochester
Medical Center Rochester, 2022).
Double parameter hypothesis testing will be used to decide whether or not to reject the
null hypothesis. Before proceeding with the 8-step procedure for comparing the means of
the two populations, diagnostic checking should be done first. There are 5 steps in
diagnostic checking. The first step is the test for normality, in which the null
hypothesis for each of the two populations is that the population is normally distributed.
The test will be conducted using the Shapiro-Wilk test in STATISTICA. The decision rule is
that if the p-value is less than the level of significance, then the null hypothesis is
rejected. If the null hypotheses for both populations fail to be rejected, the researchers
can continue to the next steps. Step two is determining whether the two populations are
independent and do not influence each other. The next step is to make sure that at least
one of the populations has a sample size less than 30. The fourth step is confirming that
the population variances of both populations are unknown, because the t-test will be used
in the double parameter hypothesis testing. The last step is testing whether the two
population variances are equal, where the null hypothesis is that the two population
variances are equal and the alternative hypothesis is that they are not equal. The F-test
will be used for this step. The researchers will use the usual level of significance
(alpha) of 0.05. Because the test is two-tailed, the alpha will be divided by two,
yielding 0.025 in each tail. The upper and lower critical values are obtained separately
from the F table, each with its own pair of degrees of freedom. The null hypothesis is
rejected if the computed value is greater than the upper critical value or less than the
lower critical value. The formula for the F-test, degrees of freedom (d.f.), and critical
values are shown below:
Upper critical value (α/2 = 0.025): d.f.1 (numerator) = n1 − 1, d.f.2 (denominator) = n2 − 1
Lower critical value (α/2 = 0.025): d.f.2 (numerator) = n2 − 1, d.f.1 (denominator) = n1 − 1

$$F = \frac{S_1^2}{S_2^2}, \quad \text{where } S_1^2 > S_2^2$$
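The F statistic and its degrees of freedom can be computed with a short Python sketch. It follows the convention stated above that the larger sample variance goes in the numerator. The function name `f_statistic` and the two small samples are hypothetical and are not taken from the dataset; the critical values themselves would still be read from the F table, as described in the text.

```python
from statistics import variance


def f_statistic(sample_a, sample_b):
    """Return (F, numerator d.f., denominator d.f.) with the larger variance on top."""
    var_a, var_b = variance(sample_a), variance(sample_b)
    if var_a >= var_b:
        return var_a / var_b, len(sample_a) - 1, len(sample_b) - 1
    return var_b / var_a, len(sample_b) - 1, len(sample_a) - 1


# Hypothetical direct bilirubin readings (mg/dL), for illustration only.
liver = [0.1, 0.4, 1.2, 2.0, 4.1, 0.8]
non_liver = [0.1, 0.2, 0.2, 0.3, 0.1]
f_value, df_num, df_den = f_statistic(liver, non_liver)
print(f"F = {f_value:.3f} with d.f. = ({df_num}, {df_den})")
```

Because the larger variance is always placed on top, the result does not depend on the order in which the two samples are passed.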
The next procedure is carrying out the 8 steps to reach a conclusion on the initial
hypotheses. This 8-step procedure is similar to the single parameter hypothesis testing.
The first step is to state the null hypothesis, which is that there is no significant
difference between the conjugated bilirubin levels of liver patients and non-liver
patients. The next step is the alternative hypothesis, which is that there is a
significant difference between the conjugated bilirubin levels of liver patients and
non-liver patients. Given this alternative hypothesis, the test will be two-tailed. Step
3 is setting the level of significance; the researchers will use a 0.05 level of
significance (alpha). The fourth step is determining which test statistic will be used.
Two types of t-test can be used in this hypothesis testing: the t-test for separate
variances and the t-test for pooled variance. The F-test determines which one is used: if
its null hypothesis is rejected, the t-test for separate variances will be used, while if
it fails to be rejected, the t-test for pooled variance will be used. The fifth step is
obtaining the critical values for the critical region of the distribution and stating the
decision rule. Since the test is two-tailed, the alpha will be divided by two, giving
0.025 in each tail. The decision rule is that if the absolute value of the computed
t-score is greater than the critical value, then the null hypothesis is to be
rejected. Step 6 is the actual computation of the t-score. The formulas for the two
t-tests and their corresponding degrees of freedom (d.f.) are shown below:
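$$t_{\text{separate variance}} = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}, \quad \text{with d.f.} = \frac{\left(\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}\right)^2}{\dfrac{1}{n_1 - 1}\left(\dfrac{S_1^2}{n_1}\right)^2 + \dfrac{1}{n_2 - 1}\left(\dfrac{S_2^2}{n_2}\right)^2}$$

$$t_{\text{pooled variance}} = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{S_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}, \quad \text{with d.f.} = n_1 + n_2 - 2$$

$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$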
The 7th step is where the decision is drawn from the decision rule: there is either
sufficient or insufficient evidence to reject the null hypothesis. For the final step,
based on the decision from step 7, the researchers will summarize the findings of the
test with respect to the hypotheses.
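The two t statistics described in step 6 can be sketched in Python as follows. This is a minimal illustration, not the STATISTICA output used in the study. The function names `separate_variance_t` and `pooled_variance_t` and the sample values are hypothetical; the hypothesized difference defaults to zero, which matches a null hypothesis of no difference between the two means.

```python
from math import sqrt
from statistics import mean, variance


def separate_variance_t(x1, x2, hypothesized_diff=0.0):
    """Separate-variance t statistic and its approximate degrees of freedom."""
    n1, n2 = len(x1), len(x2)
    a, b = variance(x1) / n1, variance(x2) / n2  # S1^2/n1 and S2^2/n2
    t = ((mean(x1) - mean(x2)) - hypothesized_diff) / sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df


def pooled_variance_t(x1, x2, hypothesized_diff=0.0):
    """Pooled-variance t statistic with n1 + n2 - 2 degrees of freedom."""
    n1, n2 = len(x1), len(x2)
    sp2 = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2)
    t = ((mean(x1) - mean(x2)) - hypothesized_diff) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


# Hypothetical direct bilirubin readings (mg/dL), for illustration only.
liver = [0.1, 0.4, 1.2, 2.0, 4.1, 0.8]
non_liver = [0.1, 0.2, 0.2, 0.3, 0.1]
print("separate variance:", separate_variance_t(liver, non_liver))
print("pooled variance:  ", pooled_variance_t(liver, non_liver))
```

The absolute value of the chosen t statistic would then be compared with the critical value from the t table at the corresponding degrees of freedom.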
A. Variables
In this section of the study, the researchers aim to determine and understand the
relationship between albumin levels and the albumin-globulin ratio. Data will be taken
from 583 patient records, of which 416 have a form of liver disease and 167 have none
(Lichman, 2013). Albumin will be the independent variable while the albumin-globulin
ratio will be the dependent variable.
pressure and therefore prevents the leakage of fluids from blood vessels or the tissues
that surround it. Globulin is also a type of protein, but it has more types depending
on its function, such as acting as transporters or antibodies. The albumin-globulin ratio
is derived from the total protein test, which is part of the liver function test. Normal
albumin-globulin ratios are between 1.1 and 2.5, that is, between 1.1:1 and 2.5:1.
Anything that deviates from this range is a sign of potential disease or risk of disease.
Liver disease is usually more associated with higher levels of globulin and albumin.
As albumin and globulin have a normal ratio between 1.1 and 2.5, imbalances in this ratio,
such as a higher albumin count, would indicate a possibility of health complications such
as liver disease (Yazdi, 2021). For this study, the researchers aim to determine the
relationship between two variables, albumin and the albumin-globulin ratio, to see if
albumin can predict the value of the A-G ratio. This will be done by applying simple
linear regression to the independent and dependent variables taken from a sample of 579
patients from the dataset (Lichman, 2013). The analysis consists of two main parts: the
ANOVA approach, which tests whether the relationship between the variables is
significant, and the Hypothesis Testing of Significance of Linear Relationships, which
checks whether the linear relationship is significant.
The first step was to identify the null and alternative hypotheses for each test. For
the ANOVA approach, the null hypothesis indicates that there is no association between
the variables, and the alternative hypothesis says otherwise. For the Hypothesis Testing
of Significance of Linear Relationships, the null hypothesis says that there is no
significant linear relationship.
In the second step, a scatter plot was created to show the linear relationship of
albumin and the A-G ratio. From this, the regression equation used for predicting the
values of the dependent variable can be obtained. This is represented by the following
formula:
𝑦 = β0 + β1𝑥
The next step was using the statistical software STATISTICA to calculate the t-values,
correlation coefficient, and other results that are important for the interpretation of
the test. In order to properly read the results, conditions were set: if the estimated
coefficient has the value of zero (0), the result is consistent with the null hypothesis;
any other value is consistent with the alternative hypothesis. The level of significance
used is α = 0.05. However, since it is a two-tailed test, α/2 = 0.025 is used for each
tail. This is important for the decision rule, which guides the group in deciding whether
to reject or fail to reject the null hypothesis. For the ANOVA approach, the rule is to
reject Ho if F > F(α; 1, n − 2), the critical F value with 1 and n − 2 degrees of freedom.
The second test determines whether the linear relationship is significant. This will be
done by computing the absolute value of "t" with this formula:
$$t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}$$
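The regression equation and the t statistic above can be illustrated with a minimal least-squares sketch in Python. The function name `simple_linear_regression` and the paired values are hypothetical and are not taken from the dataset or from the STATISTICA output. In simple linear regression, the ANOVA F statistic equals the square of this t, so the two tests lead to the same decision.

```python
from math import sqrt
from statistics import mean


def simple_linear_regression(x, y):
    """Least-squares fit y = b0 + b1*x, plus r and t = r*sqrt(n-2)/sqrt(1-r^2)."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx              # estimated slope
    b0 = y_bar - b1 * x_bar     # estimated intercept
    r = sxy / sqrt(sxx * syy)   # correlation coefficient
    t = r * sqrt(n - 2) / sqrt(1 - r ** 2)  # undefined when |r| = 1
    return b0, b1, r, t


# Hypothetical albumin and A-G ratio pairs, for illustration only.
albumin = [2.1, 2.8, 3.0, 3.3, 3.9, 4.2]
ag_ratio = [0.5, 0.8, 0.9, 0.9, 1.2, 1.3]
b0, b1, r, t = simple_linear_regression(albumin, ag_ratio)
print(f"y = {b0:.3f} + {b1:.3f}x, r = {r:.3f}, t = {t:.3f}")
```

The absolute value of `t` would be compared with the two-tailed critical value at n − 2 degrees of freedom to decide whether the linear relationship is significant.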
I. Demographic Profile
Age classification
Gender
Male 443
Female 140
No (2) 167
Total Bilirubin
Alanine Aminotransferase Level
Aspartate Aminotransferase Level
Albumin
Total Proteins
Direct Bilirubin
Total Bilirubin
Figure 2.1 - Use of Descriptive Statistics in Statistica for Mean, Variance and Standard Deviation of Total
Bilirubin
Figure 2.3 - Summary Table of Important Values for Confidence Interval Estimation on Mean Total
Bilirubin in Patients
B. Single parameter Hypothesis testing for the mean ALP level of Patients
Figure 3.1 - Use of descriptive statistics in STATISTICA for the single parameter hypothesis testing
Figure 4.1 - Use of descriptive statistics in STATISTICA for the double parameter hypothesis testing
In view of the results gathered in the construction of a confidence interval estimate for
the mean total bilirubin (mg/dL) derived from the Indian Liver Patient Dataset, it can be
stated that the researchers are 95% confident that the population mean μ is inside the
formulated confidence interval (2.793701, 3.803898). Because the population mean μ is an
unknown parameter, the researchers formed the confidence interval based on the sampling
distribution of the designated point estimator, X̄, which is the sample mean. In essence,
the 95% is a confidence level and not the probability that μ is inside this particular
interval. This specific interval only gives a plausible range of values for the
population mean. Put differently, about 95% of all samples of 583 patients would produce
confidence intervals that contain μ, while about 5% of such samples would produce
intervals that do not contain the population mean. As per the Mount Sinai Hospital (n.d.),
the normal range of total bilirubin is from 0.1 to 1.2 mg/dL. The estimated confidence
interval of the mean total bilirubin in both liver and non-liver patients from the North
East of Andhra Pradesh, India is (2.793701, 3.803898) mg/dL at a 95% confidence level,
which lies entirely above the normal range and therefore indicates abnormal total
bilirubin. A high amount of total bilirubin in the blood can be one of the symptoms of
liver-related illnesses.
B. Statistical analysis for Single parameter hypothesis test of the mean alkaline
phosphatase level of patients.
The 8-step procedure is used to test whether the population mean exceeds the hypothesized
value. The researchers first formulated two hypotheses using a hypothesized value of
147 IU/L, the upper limit of the normal range of ALP. The null hypothesis is
Ho: µ ≤ 147 IU/L (the mean alkaline phosphatase level of the patients is in the normal
range) and the alternative hypothesis is Ha: µ > 147 IU/L (the mean alkaline phosphatase
level of the patients is abnormal). Given these hypotheses, the researchers used an
upper-tailed test at the conventional significance level of 0.05, which corresponds to a
critical value of 1.645. The sample mean ALP level of 290.5763 and the standard deviation
of 242.9380 are used to compute the Z statistic that decides whether to reject the null
hypothesis.
Based on the results of the 8-step procedure, the researchers computed Z = 14.27, which is
greater than the critical value of 1.645, with a p-value reported as 0.000, which is less
than α = 0.05. Therefore, the researchers reject the null hypothesis that the mean ALP
level of the patients is at most 147 IU/L, the upper limit of the normal range, and
conclude that there is strong evidence of concerning ALP levels among the patients. Lowe D.
and Sanvictores T (2021) stated that ALP levels above the normal range indicate that a
patient has a high chance of having liver disease. With reference to this, the computed
data show that the patients' ALP is, on average, abnormal, so the researchers can say these
patients have a high chance of having liver problems. This conclusion was also supported by
PHSTAT, where the researchers obtained a reported p-value of 0.000, meaning that the result
is highly significant and very unlikely to occur by chance alone.
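The arithmetic behind this upper-tailed Z test can be checked with a short Python sketch. It
uses only the summary values quoted in this paper (sample mean 290.5763 as written in the
appendix formula, standard deviation 242.9380, n = 583, hypothesized mean 147 IU/L). The
function name `upper_tailed_z_test` is introduced here for illustration and is not part of
the researchers' STATISTICA or PHSTAT workflow.

```python
from math import sqrt
from statistics import NormalDist


def upper_tailed_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Z test of Ho: mu <= mu0 against Ha: mu > mu0 (illustrative helper)."""
    standard_error = sigma / sqrt(n)
    z = (sample_mean - mu0) / standard_error
    z_critical = NormalDist().inv_cdf(1 - alpha)  # about 1.645 when alpha = 0.05
    p_value = 1 - NormalDist().cdf(z)             # area in the upper tail
    return z, z_critical, p_value, z > z_critical


z, z_crit, p, reject = upper_tailed_z_test(290.5763, 147, 242.9380, 583)
print(f"Z = {z:.2f}, critical value = {z_crit:.3f}, p-value = {p:.3f}, reject Ho: {reject}")
```

With these inputs the Z statistic rounds to 14.27 and the p-value is far below 0.001, which
is why software reports it as 0.000.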
Figure 6.1 - Use of Shapiro-Wilk test in STATISTICA for the test of normality for liver patients’ data
Figure 6.2 - Use of Shapiro-Wilk test in STATISTICA for the test of normality for non-liver patients’ data
For the liver patients' data, the p-value is greater than alpha, so normality is not
rejected and the data are treated as normally distributed. For the non-liver patients'
data, normality is rejected. Since that p-value is close to alpha and the acquired data
leave little choice in sample size, the diagnostic checking can still proceed to the next
steps. The second step is confirming that the two populations are independent of each
other. Steps 3 and 4 indicate that the hypothesis test will use a t-test, because sigma is
unknown and at least one of the sample sizes is less than 30. The final step is crucial in
determining which t-test to use: an F-test whose null hypothesis states that the variances
of the two populations are equal.
Since the value of F is greater than the upper critical value, the null hypothesis of equal
variances is rejected. The p-value is also less than alpha, which supports rejecting the
null hypothesis. The test statistic for the 8-step procedure is therefore the
separate-variance t-test.
After the diagnostic checking, the test can proceed to the 8-step procedure. The first and
second steps state the null and alternative hypotheses: Ho: µ1 - µ2 = 0 (there is no
significant difference between the conjugated bilirubin level of liver patients and
non-liver patients) and Ha: µ1 - µ2 ≠ 0 (there is a significant difference between the
conjugated bilirubin level of liver patients and non-liver patients). The third step sets
the conventional level of significance, α = 0.05. The fourth step identifies which t-test
to use, which was already settled by the F-test in the preceding 5-step procedure. Step 5
is the decision rule: because the alternative hypothesis is two-tailed, α/2 = 0.025 is used
for the t-table. The degrees of freedom (d.f.) for the separate-variance t-test compute to
71.3163, and since d.f. is rounded down, the final d.f. is 71. Using PHSTAT, the critical
values are ±1.9939. The computed value of the t statistic then determines the action on the
null hypothesis.
Figure 6.4 - Use of PHSTAT in Separate-Variances t Test for the Difference Between Two Means
The computed t-value is 4.2689. Its absolute value is greater than the critical value of
1.9939, so the null hypothesis is rejected. In addition, the p-value is 0.0001, which is
less than the level of significance and also supports rejecting the null hypothesis. This
means that there is a significant difference between the average conjugated bilirubin level
of liver and non-liver patients. According to Mayo Foundation for Medical Education and
Research (2020), normal results for direct bilirubin are generally 0.3 mg/dL or below. The
direct bilirubin levels of liver patients are slightly higher than those of non-liver
patients, which may account for the result of the hypothesis test.
The scatterplot of Albumin against the A-G ratio shows a linear relationship between them.
From the graph constructed in STATISTICA, the regression equation is y = 1.515 + 1.7143x.
This equation is used to predict the value of the response variable y from the predictor
variable x. As defined in the appendix, y is Albumin and x is the A-G ratio.
From the results of the test, the researchers were able to establish the following. First,
there is a linear relationship between Albumin and the Albumin-Globulin ratio. This was
shown by the first part of the simple linear regression test (ANOVA approach). The F
distribution is the basis for the decision rule, which states that the null hypothesis is
rejected if F > F(α = 0.05; df = 1, 577) = 3.858. The null hypothesis states that there is
no linear relationship between the two variables; the alternative hypothesis states the
opposite. Since F = 523.3 exceeds 3.858, the null hypothesis was rejected in favor of the
alternative.
Regression Statistics
Statistic Value
Multiple R 0.68963234189663
Multiple R² 0.475592766989831
Adjusted R² 0.474683915632794
F(1,577) 523.289934385422
df 1, 577
p 0
The second part of the statistical analysis for the relationship between Albumin and the
Albumin-Globulin ratio tests how significant that relationship is. This was done through
hypothesis testing of the significance of the linear relationship. The null hypothesis
assumes that the relationship is not significant, while the alternative hypothesis assumes
that it is significant. The result favored the alternative hypothesis because the p-value
is less than 0.05. The null hypothesis was also rejected under the decision rule, which
rejects it if the absolute value of t is greater than 1.984.
Figure 7.2 - Regression summary for the Dependent Variable computed in STATISTICA
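Both decision rules can be cross-checked from the reported correlation alone. The sketch
below is illustrative only: it assumes n = 579 observations (so that df = n - 2 = 577,
matching the reported table) and uses the Multiple R reported by STATISTICA. In simple
linear regression the ANOVA F statistic equals the square of the slope's t statistic, so
the two tests should agree. Variable names are chosen here for readability.

```python
from math import sqrt

r = 0.68963234189663  # Multiple R reported by STATISTICA
n = 579               # assumed from df = n - 2 = 577

r_squared = r ** 2
t = r * sqrt(n - 2) / sqrt(1 - r_squared)   # t statistic for the linear relationship
f = r_squared / (1 - r_squared) * (n - 2)   # ANOVA F statistic with df (1, n - 2)

print(f"R^2 = {r_squared:.4f}, t = {t:.2f}, F = {f:.2f}")
print("Reject Ho by the t rule:", abs(t) > 1.984)  # critical value quoted in the text
print("Reject Ho by the F rule:", f > 3.858)       # critical value quoted in the text
```

The F value obtained this way matches the reported F(1,577) of about 523.29, and both rules
lead to rejecting the null hypothesis.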
To repeat, it was concluded that these variables have a significant linear relationship.
For this research, this implies that Albumin is a component of the blood that should be
given significant importance when reading a patient's liver function test. This protein is
produced by the liver. It is responsible for numerous functions, such as carrying molecules
through the blood and preventing fluid in the blood from leaking into the surrounding
tissues (Mount Sinai, n.d.). Its linear relationship with the Albumin-Globulin ratio is
another indicator of its importance, since an increase in Albumin is associated with a
change in the ratio.
Four statistical methods were carried out in this research: confidence interval estimation
for the mean total bilirubin of patients, single parameter hypothesis testing of the mean
alkaline phosphatase level of patients, double parameter hypothesis testing comparing the
conjugated bilirubin of liver and non-liver patients, and simple linear regression on the
relationship between the albumin-globulin ratio and the amount of albumin. From the
collated results and the analyses of the test results, the conclusions show abnormal
amounts of specific chemical compounds in the blood of liver patients, specifically total
bilirubin, alkaline phosphatase, and conjugated bilirubin. The World Health Organization
(WHO) stated that India is considered the world's capital of liver diseases. This is
alarming, since liver-related illnesses are rampant in the country. These results imply
that these compounds in an individual's bloodstream can signal the early onset of liver
disease and can therefore support preventative measures for the citizens of India.
Ultimately, based on the gathered results and interpretations, the variables studied do
bear on an individual's liver functionality.
Based on the conclusions drawn from the findings of this study, the researchers offer the
following recommendations. Firstly, let this study be evidence of the lack of healthcare
action across the world, especially in countries with poor healthcare systems. Governments
should focus more on their healthcare systems to lessen the burden of these health
concerns, especially liver diseases, which are sometimes overlooked. In this connection,
programs focusing on liver health should be implemented where many people suffer from liver
diseases. The concern is supported by Sumeet, A. et al. (2019), who stated that liver
diseases account for approximately 2 million deaths per year worldwide. Free blood tests
should also be provided so that early signs of liver disease can be detected and acted on,
which would help lessen the burden of these diseases in the future. Natalia, O. et al.
(2017) stated that excessive alcohol consumption impacts the liver to the greatest degree,
which could lead to liver problems in the long run, and added that 35% of problem drinkers
develop liver problems. Factors that harm liver health, such as alcohol, should be reduced
so that liver diseases become less prevalent. Future researchers who plan a study on this
topic should investigate this area further and see whether there are changes within the
scope of this research, namely people who have liver diseases. Many years from now, the
number of liver disease patients may change drastically, for better or for worse, which is
why more extensive and intricate research is recommended. It is critical to identify a
liver infection early in order to reduce the intensity and frequency of the illness. Future
work should stress the importance of diagnosis at the preliminary stage to promote
prevention measures, which can in turn lessen the consistently increasing number of liver
patient cases. Other researchers who take up similar studies should focus on the variables
that significantly affect a person's liver health.
VII. Appendices
1. Upon looking at the Indian Liver Patient dataset, the sample size (n) is equal to
583 and the value of sigma is treated as known; hence, the researchers conclude that
the sample mean follows a normal distribution.
3. Using Statistica, a statistical software, X̄ has the numerical value of 3.298799,
which is the sample statistic for the mean. Because the Central Limit Theorem
applies, the value of sigma can also be taken from Statistica: σ = 6.209522.
4. With this, the standard error can be calculated with the formula:
$$SE = \frac{\sigma}{\sqrt{n}} = \frac{6.209522}{\sqrt{583}} = 0.257172$$
6. From the information given, the lower and upper limits are obtained by subtracting
the margin of error from the sample mean X̄ = 3.298799 and by adding the margin of
error to it.
For Zα/2, α/2 = 0.025; this value is important since it must be found in the body of the
Z-table to find its corresponding Z-score.
8. Now that the Z-score, standard deviation, and sample size are known, substitute the
corresponding numerical values into the margin of error formula:
$$\text{Margin of error} = Z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$$
Multiply the Z-score of 1.96 by the numerical value of the standard error, which is
SE = 0.257172.
9. Now that the margin of error has been calculated, proceed to adjusting the sample
mean, X̄ = 3.298799 mg/dL, by subtracting and then adding the value of the
margin of error = 0.5040573642.
10. From this, the researchers are 95% confident that the interval (2.793701,
3.803898) contains the true mean of the total bilirubin (mg/dL) in the patients.
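The steps above can be sketched in a few lines of Python. This is a minimal illustration
that assumes the z-based formula described in steps 4 to 9 and the summary values quoted
there (X̄ = 3.298799, σ = 6.209522, n = 583); the variable names are chosen here and do not
come from Statistica.

```python
from math import sqrt
from statistics import NormalDist

sample_mean = 3.298799  # X-bar from Statistica (mg/dL)
sigma = 6.209522        # standard deviation from Statistica
n = 583                 # sample size
confidence = 0.95

alpha = 1 - confidence
z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96
standard_error = sigma / sqrt(n)
margin = z * standard_error
lower, upper = sample_mean - margin, sample_mean + margin

print(f"SE = {standard_error:.6f}, margin of error = {margin:.6f}")
print(f"{confidence:.0%} CI: ({lower:.6f}, {upper:.6f})")
```

Computed this way, the limits come out at roughly (2.795, 3.803), marginally narrower than
the interval reported in step 10. One possible explanation, offered only as a guess, is that
the software used a t critical value with 582 degrees of freedom in place of the Z-score.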
B. 8-step procedure for Single hypothesis test of the mean alkaline phosphatase
level of patients
1. Null Hypothesis
Ho: µ ≤ 147 IU/L (The Alkaline Phosphatase level of the patient is in the normal
range)
2. Alternative hypothesis
Ha: µ > 147 IU/L (The Alkaline Phosphatase level of the patient is considered
abnormal)
3. Alpha level
α = 0.05
4. Test statistics
Figure 9.1 - Normal distribution curve for the single parameter testing of patients ALP levels.
$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$
$$Z = \frac{290.5763 - 147}{242.9380 / \sqrt{583}}$$
7. Conclusion
We have sufficient evidence to reject the null hypothesis Ho
Liver patients:
Ho: The liver patients' data is normally distributed.
Ha: The liver patients' data is not normally distributed.
α = 0.05
Non-liver patients:
Ho: The non-liver patients' data is normally distributed.
Ha: The non-liver patients' data is not normally distributed.
α = 0.05
2. Independence:
A liver patient cannot be a non-liver patient at the same time, so the two populations do
not influence each other.
3. At least one sample has n < 30
n1 = 57
n2 = 27
4. Raw data: σ1² and σ2² are unknown (CLT NOT OK)
α = 0.05
Test Statistic: $F = \dfrac{S_1^2}{S_2^2}$, where $S_1^2 > S_2^2$
Upper critical value (α/2 = 0.025):
d.f.1 (numerator) = n1 − 1 = 57 − 1 = 56
d.f.2 (denominator) = n2 − 1 = 27 − 1 = 26
Lower critical value (α/2 = 0.025):
d.f.2 (numerator) = n2 − 1 = 27 − 1 = 26
d.f.1 (denominator) = n1 − 1 = 57 − 1 = 56
Figure 10.1 - Normal distribution curve for the F Test for Differences in Two Variances.
$$F = \frac{S_1^2}{S_2^2}, \quad \text{where } S_1^2 > S_2^2$$
$$F = \frac{0.032726}{0.015328}$$
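The ratio above is simple to evaluate, as the following illustrative sketch shows. It
assumes that the larger variance 0.032726 belongs to the sample with n1 = 57 and the smaller
variance 0.015328 to the sample with n2 = 27, which is how the values are paired in the
degrees-of-freedom formula later in this appendix. The critical values still have to be read
from an F table or statistical software and are not reproduced here.

```python
s1_sq, n1 = 0.032726, 57  # larger sample variance and its sample size
s2_sq, n2 = 0.015328, 27  # smaller sample variance and its sample size

f_stat = s1_sq / s2_sq           # larger variance goes in the numerator
df_num, df_den = n1 - 1, n2 - 1  # numerator and denominator degrees of freedom

print(f"F = {f_stat:.4f} with d.f. = ({df_num}, {df_den})")
```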
8-step Procedure for Double Parameter Hypothesis Test of the Average Conjugated
Bilirubin Level of Liver and Non-Liver Patients.
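1. Null Hypothesis
Ho: µ1 - µ2 = 0 (There is no significant difference between the conjugated bilirubin level
of liver patients and non-liver patients.)
2. Alternative Hypothesis
Ha: µ1 - µ2 ≠ 0 (There is a significant difference between the conjugated bilirubin level
of liver patients and non-liver patients.)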
3. Level of Significance
α = 0.05
4. Test Statistic
Based on the results of F-test, t-test for separate variance will be used.
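$$t_{\text{separate variance}} = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$$
with
$$\text{d.f.} = \frac{\left(\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}\right)^2}{\dfrac{1}{n_1 - 1}\left(\dfrac{S_1^2}{n_1}\right)^2 + \dfrac{1}{n_2 - 1}\left(\dfrac{S_2^2}{n_2}\right)^2}$$
$$\text{d.f.} = \frac{\left(\dfrac{0.032726}{57} + \dfrac{0.015328}{27}\right)^2}{\dfrac{1}{57 - 1}\left(\dfrac{0.032726}{57}\right)^2 + \dfrac{1}{27 - 1}\left(\dfrac{0.015328}{27}\right)^2}$$
d.f. = 71.3163 ≈ 71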
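The degrees-of-freedom computation above can be reproduced with a short sketch. The helper
name `welch_df` is introduced here for illustration; the inputs are the sample variances and
sample sizes substituted in the formula above.

```python
from math import floor


def welch_df(s1_sq, n1, s2_sq, n2):
    """Degrees of freedom for the separate-variance t test (illustrative helper)."""
    a, b = s1_sq / n1, s2_sq / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))


df = welch_df(0.032726, 57, 0.015328, 27)
print(f"d.f. = {df:.4f}, rounded down to {floor(df)}")
```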
Figure 10.2 - Normal distribution curve for the Separate-Variances t Test for the Difference Between Two Means
$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$$
7. Conclusion
For this study, the researchers used simple linear regression as the statistical test for
analyzing the relationship between Albumin and the Albumin-Globulin ratio, with the program
STATISTICA as the main tool. The dependent or response variable is the Albumin level of the
population, whereas the independent or predictor variable is the Albumin-Globulin ratio.
This is represented by:
$y = \beta_0 + \beta_1 x$
x = IV = Albumin-Globulin ratio
y = DV = Albumin
ANOVA Approach
● Decision Rule:
○ Reject Ho if $F > F_{\alpha = 0.05;\ df = 1,\,577} = 3.858$
● STATISTICA results:
Regression Statistics
Statistic Value
Multiple R 0.68963234189663
Multiple R² 0.475592766989831
Adjusted R² 0.474683915632794
F(1,577) 523.289934385422
df 1, 577
p 0
Total Variation: $S_{yy} = \sum_{i=1}^{n} y_i^2 - \dfrac{\left(\sum_{i=1}^{n} y_i\right)^2}{n} = 6068.1 - \dfrac{(1817.2)^2}{579} = 364.6911226$
$SS_{Reg} = b_1^2 S_{xy} = 1.7413^2 (101.204475) = 306.8646886$
ANOVA Table
$F = \dfrac{MS_{Reg}}{MS_{Res}} = \dfrac{306.8646886}{0.100219123} = 3061.937477$
F > 3.858
$t = \dfrac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}$
● Decision Rule:
○ Reject Ho if $|t| > t_{\alpha/2 = 0.025;\ df = 577} = 1.984$
● Computation:
Bibliography
Indian liver patient records. (n.d.). Kaggle: Your Machine Learning and Data Science
Community. https://www.kaggle.com/uciml/indian-liver-patient-records
Liver problems - Symptoms and causes. (2020, February 21). Mayo Clinic.
https://www.mayoclinic.org/diseases-conditions/liver-problems/symptoms-causes/syc-20374502
Liver: What it does, disorders & symptoms, staying healthy. (n.d.). Cleveland Clinic.
https://my.clevelandclinic.org/health/articles/21481-liver
Confidence intervals - SAGE research methods. (2007). SAGE Research Methods: Find
https://methods.sagepub.com/reference/encyclopedia-of-measurement-and-statistics/n102.xml
https://www.jmp.com/en_ph/statistics-knowledge-portal/what-is-regression.html
https://www.britannica.com/science/statistics/Hypothesis-testing
A study on the temporal trends in the etiology of cirrhosis of liver in coastal eastern
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7376596/
https://my.clevelandclinic.org/health/diagnostics/17845-bilirubin
https://www.urmc.rochester.edu/encyclopedia/content.aspx?contenttypeid=167&contentid=total_bilirubin_blood
https://www.mountsinai.org/health-library/tests/bilirubin-blood-test#:~:text=Normal%20Results,1.71%20to%2020.5%20µmol%2FL)
Rosenthal, Philip & Pincus, M & Fink, D. (1984). Sex- and age-related differences in
bilirubin concentrations in serum. Clinical chemistry. 30. 1380-2.
10.1093/clinchem/30.8.1380.
Guy, J., & Peters, M. G. (2013). Liver disease in women: the influence of gender on
epidemiology, natural history, and patient outcomes. Gastroenterology & hepatology,
9(10), 633–639.
Kim, I. H., Kisseleva, T., & Brenner, D. A. (2015). Aging and liver disease. Current
opinion in gastroenterology, 31(3), 184–191.
https://doi.org/10.1097/MOG.0000000000000176
Chernoff R. Protein and older adults. J Am Coll Nutr. 2004 Dec;23(6 Suppl):627S-630S.
doi: 10.1080/07315724.2004.10719434. PMID: 15640517.
Wiwanitkit, V. (2001). High serum alkaline phosphatase levels, a study in 181 Thai adult
hospitalized patients. BMC family practice, 2(1), 1-4.