Journal of Survey in Fisheries Sciences 10(1S) 2030-2039 2023
A Novel Ranking Approach to Improved Health Insurance Cost Prediction by Comparing Linear Regression to Random Forest
G. Satya Mounika Kalyani¹, Rama Parvathy L.²*
¹Research Scholar, Saveetha School of Engineering, Saveetha Institute of Medical and Technical Sciences, Saveetha University, Tamil Nadu, India, PinCode: 602105.
²*Project Guide, Corresponding Author, Saveetha School of Engineering, Saveetha Institute of Medical and Technical Sciences, Saveetha University, Tamil Nadu, India, PinCode: 602105.
ABSTRACT
Aim: The aim of this research is to predict health insurance costs using a novel ranking approach with machine learning algorithms. Materials and Methods: Prediction was performed with a sample size of n = 10 for Linear Regression and n = 10 for Random Forest. The algorithms were iterated 20 times for efficient and accurate analysis on labeled data, with a G power of 80%, a threshold of 0.05, and a 95% CI for the mean and standard deviation. Results: The analysis shows that Linear Regression has a higher accuracy (92.53%) than the Random Forest algorithm (91.68%). There is a statistically significant difference between the study groups (p<0.05). Conclusion: In health insurance cost prediction, Linear Regression appears to give better accuracy than the Random Forest algorithm.
Keywords: Health Care, Machine Learning, Random Forest Regression, Novel Ranking, Linear Regression.
INTRODUCTION
The purpose of this research is to improve the accuracy of cost prediction in health insurance using novel ranking with machine learning algorithms (Rajczi 2019). It estimates the sequence of predicted health insurance costs using Linear Regression over Random Forest. This is important in today’s world, since health insurance is more convenient for patients in various scenarios (Quadagno 2006). The work also touches on a widespread statistical phenomenon with potentially serious implications for health care: it can result in wrongly concluding that an effect is due to treatment when it is due to chance. Ignorance of the problem will lead to errors in decision making (Muremyi et al. 2020; Baum et al. 2021; Pauly 2015). Doctors observe the views of different patients to make decisions. If health care costs could be predicted for each patient with high certainty, problems such as accountability could be solved, enabling control over all parties involved in patients’ care. Such predictions could also be used for other applications, such as risk assessment in the health insurance business, allowing competitive premium charges, or for the application of new policies by governments to improve public health (Rajczi 2019; Quadagno 2006).
In distinguishing and forecasting cost prediction in health care using novel ranking by comparing machine learning algorithms, 880 journal papers from the IEEE Xplore digital library, 480 articles from ScienceDirect, 1430 articles from Google Scholar and 498 further articles were identified. Linear Regression and Random Forest are two closely related approaches to the prediction of cost. This project aims to create a model based on statistically significant factors (independent variables) that affect the premium charges (dependent variable) set by an insurance company (Qudsi 2015). Among all the articles and journals, the most cited paper is Kharrazi et al. (2021), with nearly 1340 citations, and it is useful for future research. An application is also built that can be linked to the Linear Regression and Random Forest models to view the result in a UI based on the input parameters (Burns 2015). The health care law in any country has some rules for companies to follow in determining premiums. Given the value of insurance in the lives of individuals, it becomes important for insurance companies to be sufficiently precise in quantifying the amount covered by a policy and the insurance charges which must be paid for it. The model is trained on insurance data from the past (Nakamura et al. 2021; Rushing 1986). The factors required to measure the payments can then be defined as the model inputs, and the model can then correctly anticipate insurance costs (Shiny Irene et al. 2021). Multiple linear regression is used in this analysis since there are many independent variables used to calculate the dependent (target) variable. For this study, the health insurance cost dataset is used after preprocessing (Cebul et al. 2008).

(Bhavikatti et al. 2021; Karobari et al. 2021; Shanmugam et al. 2021; Sawant et al. 2021; Muthukrishnan 2021; Preethi et al. 2021; Karthigadevi et al. 2021; Bhanu Teja et al. 2021; Veerasimman et al. 2021; Baskar et al. 2021)

The methods used before have lower accuracy and detection rates in predicting health insurance costs. To determine premiums, it is essential to know the most important factors and how much statistical importance they hold. In sequencing the methods and techniques in this research study, Linear Regression generally fares better than Random Forest (Rajczi 2019). It also takes more time to train a Random Forest on health insurance datasets, and the training time increases with the size of the dataset. Its estimates and calculations are probably done using a characteristic technique. Linear Regression and Random Forest models are applied to a medical insurance dataset to predict future insurance costs for individuals (Willemse-Duijmelinck et al. 2017). Machine learning is a method of data analysis in which instructions (programmable code) are given to computers so that they can learn from data (Cabrera 2011). Then, based on the learned data, they provide the predicted results or patterns. The aim of the research is to predict health insurance costs using machine learning, comparing Linear Regression with Random Forest.

MATERIALS AND METHODS
The research work was performed in the Image Processing Lab, Department of
Computer Science and Engineering, Saveetha School of Engineering, Saveetha Institute of Medical and Technical Sciences, Chennai. Two groups are considered, namely the Linear Regression and Random Forest algorithms, which are used to predict health insurance costs. Group 1 is Linear Regression with a sample size of 10 and Group 2 is Random Forest with a sample size of 10. They are compared on accuracy and precision scores to choose the best algorithm. The pre-test analysis was prepared with a G power of 80%, a threshold of 0.05, and a 95% CI for the mean and standard deviation (Muremyi et al. 2020). The sample size was calculated as 10 samples per group, 20 samples in total, with a mean accuracy of 95.2% for Linear Regression and 93.3% for Random Forest. The standard deviation of precision is 0.20359 for Linear Regression and 0.24832 for Random Forest.
A minimum of 4 GB of RAM is required to install and use all the necessary applications, and at least an i3 processor to run all the applications and processes simultaneously. A minimum of 50 GB of hard disk space is required to install the required software and files, and an internet connection is required to download and install the necessary software and development environment to run this novel framework. The Python programming language is used for the application. The version of Python used is 3.9, and IDLE is used to run and execute the application.

Linear Regression
Multiple Linear Regression is a basic machine learning regression model in which there is one dependent variable and multiple independent variables. The value of the dependent variable is calculated from the independent variables. The dependent variable is medical charges and the independent variables are age, gender, smoker, BMI, children and region. Multiple Linear Regression uses the ordinary least squares method to find the best-fitting line, which involves multiple independent variables.

Pseudocode for Linear Regression
# Assumption: `data` is a pandas DataFrame with the Table 1 columns, already
# numerically encoded; "charges" is the dependent variable.
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

X = data.drop(columns=["charges"])   # independent variables
y = data["charges"]                  # dependent variable
x_train, x_test, y_train, y_test = train_test_split(X, y)
lrg = LinearRegression()             # ordinary least squares fit
lrg.fit(x_train, y_train)
prediction_lrg = lrg.predict(x_test)
# accuracy_score applies to class labels only, so R^2 is assumed as the score.
print(r2_score(y_test, prediction_lrg))

Random Forest
Random forest is a machine learning algorithm used for classification and regression problems. It is a predictive analysis algorithm and, based on the concept of probability, it will make an exact sequential
identification using a characteristic technique, as shown below.

Pseudocode for Random Forest
# Assumption: `data` is a pandas DataFrame with the Table 1 columns, already
# numerically encoded; "charges" is the dependent variable.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

X = data.drop(columns=["charges"])   # independent variables
y = data["charges"]                  # dependent variable
X_train, X_test, y_train, y_test = train_test_split(X, y)
# A regressor is used because charges is continuous; a classifier would
# reject a continuous target.
rm = RandomForestRegressor()
rm.fit(X_train, y_train)
prediction_rm = rm.predict(X_test)
# accuracy_score applies to class labels only, so R^2 is assumed as the score.
print(r2_score(y_test, prediction_rm))

Statistical Analysis
The analysis was done using IBM SPSS version 21, a statistical software tool used for data analysis. For both the proposed and existing algorithms, 10 iterations were done with a maximum of 15 samples, and for each iteration the predicted accuracy was noted for analysis. There is a statistically significant difference between the study groups (p<0.05). On the values obtained from the iterations, an Independent Sample T-test was performed. The independent variables are groups, accuracy and details. The dependent variables are patient data and cost. A detailed analysis has been done on these values for predicting health insurance costs with machine learning.

RESULTS
Random samples are selected from the given health insurance dataset. The data was collected over a period of 10 days. The group statistics of Linear Regression and Random Forest, obtained by grouping the iterations with sample size 10, give for Linear Regression Mean = 95.2000, Standard Deviation = 1.50259 and Standard Error Mean = 0.47516. The model analyzes health insurance efficiently. The mean difference, standard error difference and significance of the difference between Linear Regression based cost prediction and Random Forest based cost prediction are tabulated in Table 3, which shows there is a significant difference between the two groups since p<0.001 with an independent sample T-Test. Table 1 shows the health insurance dataset with seven attributes, from which random samples are selected. Table 2 reports the group statistics for the independent and dependent variables. Fig. 1 represents the comparison of Linear Regression and Random Forest in terms of mean accuracy. The dependent variable in health insurance cost prediction is predicted with the help of the independent variables. The statistical analysis of the two independent groups shows that Linear Regression has a higher mean accuracy (95.2%) than Random Forest (93.3%).
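The procedure described in Materials and Methods (two groups, 10 iterations each, followed by group statistics) can be sketched as below. This is a minimal illustration under stated assumptions, not the study's implementation: the data are synthetic (the generating rule and the names `run_iterations`, `lr_scores` and `rf_scores` are hypothetical), and R^2 is assumed as the per-iteration score because the paper does not define how accuracy was computed.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the insurance data (NOT the study's dataset).
rng = np.random.default_rng(0)
n = 500
age = rng.integers(18, 65, size=n)
bmi = rng.normal(30.0, 6.0, size=n)
children = rng.integers(0, 5, size=n)
smoker = rng.integers(0, 2, size=n)
# Assumed generating rule, chosen only so the models have signal to fit.
charges = (250.0 * age + 300.0 * bmi + 500.0 * children
           + 20000.0 * smoker + rng.normal(0.0, 3000.0, size=n))
X = np.column_stack([age, bmi, children, smoker])


def run_iterations(make_model, n_iter=10):
    """Fit on n_iter different train/test splits; return one R^2 per split."""
    scores = []
    for seed in range(n_iter):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, charges, test_size=0.2, random_state=seed)
        model = make_model().fit(X_tr, y_tr)
        scores.append(r2_score(y_te, model.predict(X_te)))
    return np.array(scores)


lr_scores = run_iterations(LinearRegression)
rf_scores = run_iterations(
    lambda: RandomForestRegressor(n_estimators=100, random_state=0))

# Group statistics in the layout of Table 2 (N, mean, SD, SE of the mean).
for name, s in [("LR", lr_scores), ("RF", rf_scores)]:
    sd = s.std(ddof=1)  # sample standard deviation
    print(f"{name}: N={len(s)} mean={s.mean():.4f} "
          f"sd={sd:.4f} se={sd / np.sqrt(len(s)):.4f}")
```

Replacing the synthetic arrays with the encoded insurance attributes would give per-iteration scores that can be passed to an independent-samples t-test.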
DISCUSSION
In this research clustering has proven to be a highly effective and versatile approach for prediction (Kharrazi et al. 2021; Muremyi et al. 2020). The applied unsupervised learning with R explains clustering methods, distribution analysis, data encoders, and features of R that enable us to understand the given data better and get answers to our prediction problems. For predicting health insurance costs, Linear Regression is empirically shown to be more effective than the Random Forest classifier. It was observed that the important factors and the concept of health insurance (Muremyi et al. 2020; Baum et al. 2021) and of cost prediction have been analyzed based on the matrix for health insurance (Willemse-Duijmelinck et al. 2017). The discussion offers insights into the reasons for the development of the practical approach, a concrete methodology for its implementation, and strategic and tactical applications of the cost prediction (Maharani et al. 2019). Linear Regression is also used for predicting the output of the given data, and it gives the more accurate results (Daniel 2015). Various factors which affect the predicted amount were examined. It was observed that a person's age and smoking status affect the prediction most in every algorithm applied. Attributes which had no effect on the prediction were removed from the features. The effect of the various independent variables on the premium amount was also checked, and the attributes were checked in combination for better accuracy. Premium amount prediction focuses on a person's own health rather than other companies' insurance terms and conditions. The models can be applied to the data collected in coming years to predict the premium. This can help not only people but also insurance companies to work in tandem for better and more health-centric insurance amounts (Rice and Thorpe 1993).

Hence the study results show clear performance in both the experimental and the statistical analysis, but the proposed work has some limitations, such as threshold and precision. The accuracy of cost prediction in health care using novel techniques can still be improved by implementing artificial intelligence techniques to predict and analyze results, in comparison with existing machine learning algorithms. In the future, a larger health insurance dataset can be considered to validate our proposed model with respect to recent scenarios (Daniel 2015).

CONCLUSION
The current study focused on the machine learning algorithms Linear Regression and Random Forest for higher accuracy in cost prediction. It can be slightly improved based on Random Forest dataset analysis in the future. The outcome of the study shows that Linear Regression has a higher accuracy (95.2%) than Random Forest (93.3%).

DECLARATIONS
Conflict of Interests
No conflict of interests in this manuscript.
Authors Contribution
Author GSMK was involved in data collection, data analysis, manuscript
writing. Author DV was involved in the Action process, Data verification and validation, and Critical review of manuscript.
Acknowledgement
The authors would like to express their gratitude towards Saveetha School of Engineering, Saveetha Institute of Medical and Technical Sciences (formerly known as Saveetha University) for providing the necessary infrastructure to carry out this research work successfully.
Funding
We thank the following organizations for providing financial support that enabled us to complete the study.
1. SNEW.AI Technologies, Chennai.
2. Saveetha University
3. Saveetha Institute of Medical and Technical Sciences
4. Saveetha School of Engineering

REFERENCES
1. Baskar, M., R. Renuka Devi, J. Ramkumar, P. Kalyanasundaram, M. Suchithra, and B. Amutha. 2021. “Region Centric Minutiae Propagation Measure Orient Forgery Detection with Finger Print Analysis in Health Care Systems.” Neural Processing Letters, January. https://doi.org/10.1007/s11063-020-10407-4.
2. Baum, Griffin R., Geoffrey Stricsek, Mathu A. Kumarasamy, Vineeth Thirunavu, Gregory J. Esper, Scott D. Boden, and Daniel Refai. 2021. “Current Procedural Terminology-Based Procedure Categorization Enhances Cost Prediction of Medicare Severity Diagnosis Related Group in Spine Surgery.” Spine 46 (6): 391–400.
3. Bhanu Teja, N., Yuvarajan Devarajan, Ruby Mishra, S. Sivasaravanan, and D. Thanikaivel Murugan. 2021. “Detailed Analysis on Sterculia Foetida Kernel Oil as Renewable Fuel in Compression Ignition Engine.” Biomass Conversion and Biorefinery, February. https://doi.org/10.1007/s13399-021-01328-w.
4. Bhavikatti, Shaeesta Khaleelahmed, Mohmed Isaqali Karobari, Siti Lailatul Akmar Zainuddin, Anand Marya, Sameer J. Nadaf, Vijay J. Sawant, Sandeep B. Patil, Adith Venugopal, Pietro Messina, and Giuseppe Alessandro Scardina. 2021. “Investigating the Antioxidant and Cytocompatibility of Mimusops Elengi Linn Extract over Human Gingival Fibroblast Cells.” International Journal of Environmental Research and Public Health 18 (13). https://doi.org/10.3390/ijerph18137162.
5. Burns, Sarah K. 2015. “Health Care Analysis for the MCRMC Insurance Cost Model.” https://doi.org/10.21236/ada618075.
6. Cabrera, Delia. 2011. “A Review of Algorithms for Segmentation of Retinal Image Data Using Optical Coherence Tomography.” Image Segmentation. https://doi.org/10.5772/15833.
7. Cebul, Randall, James Rebitzer, Lowell Taylor, and Mark Votruba. 2008. “Unhealthy Insurance Markets: Search Frictions and the Cost and Quality of Health Insurance.” https://doi.org/10.3386/w14455.
8. Daniel, Ines. 2015. “MARKET SEGMENTATION USING COLOR INFORMATION OF IMAGES.” International Journal of Electronic Commerce Studies.
https://doi.org/10.7903/ijecs.1400.
9. Karobari, Mohmed Isaqali, Syed Nahid Basheer, Fazlur Rahman Sayed, Sufiyan Shaikh, Muhammad Atif Saleem Agwan, Anand Marya, Pietro Messina, and Giuseppe Alessandro Scardina. 2021. “An In Vitro Stereomicroscopic Evaluation of Bioactivity between Neo MTA Plus, Pro Root MTA, BIODENTINE & Glass Ionomer Cement Using Dye Penetration Method.” Materials 14 (12). https://doi.org/10.3390/ma14123159.
10. Karthigadevi, Guruviah, Sivasubramanian Manikandan, Natchimuthu Karmegam, Ramasamy Subbaiya, Sivasankaran Chozhavendhan, Balasubramani Ravindran, Soon Woong Chang, and Mukesh Kumar Awasthi. 2021. “Chemico-Nanotreatment Methods for the Removal of Persistent Organic Pollutants and Xenobiotics in Water - A Review.” Bioresource Technology 324 (March): 124678.
11. Kharrazi, Hadi, Xiaomeng Ma, Hsien-Yen Chang, Thomas M. Richards, and Changmi Jung. 2021. “Comparing the Predictive Effects of Patient Medication Adherence Indices in Electronic Health Record and Claims-Based Risk Stratification Models.” Population Health Management, February. https://doi.org/10.1089/pop.2020.0306.
12. Maharani, Dian, Hendri Murfi, and Yudi Satria. 2019. “Performance of Deep Neural Network for Tabular Data – A Case Study of Loss Cost Prediction in Fire Insurance.” International Journal of Machine Learning and Computing. https://doi.org/10.18178/ijmlc.2019.9.6.866.
13. Muremyi, Roger, Dominique Haughton, Ignace Kabano, and François Niragire. 2020. “Prediction of out-of-Pocket Health Expenditures in Rwanda Using Machine Learning Techniques.” The Pan African Medical Journal 37 (December): 357.
14. Muthukrishnan, Lakshmipathy. 2021. “Nanotechnology for Cleaner Leather Production: A Review.” Environmental Chemistry Letters 19 (3): 2527–49.
15. Nakamura, Haruyo, Floriano Amimo, Siyan Yi, Sovannary Tuot, Tomoya Yoshida, Makoto Tobe, Md Mizanur Rahman, Daisuke Yoneoka, Aya Ishizuka, and Shuhei Nomura. 2021. “Developing and Validating Regression Models for Predicting Household Consumption to Introduce an Equitable and Sustainable Health Insurance System in Cambodia.” The International Journal of Health Planning and Management, July. https://doi.org/10.1002/hpm.3269.
16. Pauly, Mark. 2015. “Cost-Effectiveness Analysis and Insurance Coverage: Solving a Puzzle.” Health Economics. https://doi.org/10.1002/hec.3044.
17. Preethi, K. Auxzilia, Ganesh Lakshmanan, and Durairaj Sekar. 2021. “Antagomir Technology in the Treatment of Different Types of Cancer.” Epigenomics. https://doi.org/10.2217/epi-2020-0439.
18. Quadagno, Jill. 2006. “Cost Containment versus National Health Insurance.” One Nation, Uninsured: Why the U.S. Has No National Health Insurance.
https://doi.org/10.1093/acprof:oso/9780195160390.003.0006.
19. Qudsi, Dini Hidayatul. 2015. Predictive Data Mining of Chronic Diseases Using Decision Tree: A Case Study of Health Insurance Company in Indonesia.
20. Rajczi, Alex. 2019. “Personal Cost.” The Ethics of Universal Health Insurance. https://doi.org/10.1093/oso/9780190946838.003.0004.
21. Rice, Thomas, and Kenneth E. Thorpe. 1993. “Income-Related Cost Sharing in Health Insurance.” Health Affairs. https://doi.org/10.1377/hlthaff.12.1.21.
22. Rushing, William A. 1986. “Health Insurance, Cost Containment, and Social Conflict: A Future Perspective.” Social Functions and Economic Aspects of Health Insurance. https://doi.org/10.1007/978-94-009-4231-8_9.
23. Sawant, Kashmira, Ajinkya M. Pawar, Kulvinder Singh Banga, Ricardo Machado, Mohmed Isaqali Karobari, Anand Marya, Pietro Messina, and Giuseppe Alessandro Scardina. 2021. “Dentinal Microcracks after Root Canal Instrumentation Using Instruments Manufactured with Different NiTi Alloys and the SAF System: A Systematic Review.” NATO Advanced Science Institutes Series E: Applied Sciences 11 (11): 4984.
24. Shanmugam, Vigneshwaran, Rhoda Afriyie Mensah, Michael Försth, Gabriel Sas, Ágoston Restás, Cyrus Addy, Qiang Xu, et al. 2021. “Circular Economy in Biocomposite Development: State-of-the-Art, Challenges and Emerging Trends.” Composites Part C: Open Access 5 (July): 100138.
25. Shiny Irene, D., V. Surya, D. Kavitha, R. Shankar, and S. John Justin Thangaraj. 2021. “An Intellectual Methodology for Secure Health Record Mining and Risk Forecasting Using Clustering and Graph-Based Classification.” Journal of Circuits Systems and Computers 30 (08): 2150135.
26. Veerasimman, Arumugaprabu, Vigneshwaran Shanmugam, Sundarakannan Rajendran, Deepak Joel Johnson, Ajith Subbiah, John Koilpichai, and Uthayakumar Marimuthu. 2021. “Thermal Properties of Natural Fiber Sisal Based Hybrid Composites – A Brief Review.” Journal of Natural Fibers, January, 1–11.
27. Willemse-Duijmelinck, Daniëlle M. I. D., Wynand P. M. M. van de Ven, and Ilaria Mosca. 2017. “Supplementary Insurance as a Switching Cost for Basic Health Insurance: Empirical Results from the Netherlands.” Health Policy. https://doi.org/10.1016/j.healthpol.2017.08.003.

TABLES AND FIGURES

Table 1. Health insurance dataset with seven attributes, from which random samples are selected. Linear Regression and Random Forest are implemented for each data point.

Age  Sex     BMI     Children  Smoker  Region     Charges
19   FEMALE  27.9    0         YES     SOUTHWEST  16884.924
18   MALE    33.77   1         NO      SOUTHEAST  1725.5523
28   MALE    33      3         NO      SOUTHEAST  4449.462
33   MALE    22.705  0         NO      NORTHWEST  21984.471
32   MALE    28.88   0         NO      NORTHWEST  3866.8552
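
Before either model can be fitted, the categorical attributes in Table 1 (Sex, Smoker, Region) have to be turned into numbers. The sketch below shows one common way to do this, using 0/1 flags and one-hot columns. The helper name `encode` and the region list are assumptions made for illustration (NORTHEAST does not appear in the sample rows), and the paper does not state which encoding the authors used.

```python
# Sample rows copied from Table 1.
rows = [
    (19, "FEMALE", 27.9, 0, "YES", "SOUTHWEST", 16884.924),
    (18, "MALE", 33.77, 1, "NO", "SOUTHEAST", 1725.5523),
    (28, "MALE", 33.0, 3, "NO", "SOUTHEAST", 4449.462),
    (33, "MALE", 22.705, 0, "NO", "NORTHWEST", 21984.471),
    (32, "MALE", 28.88, 0, "NO", "NORTHWEST", 3866.8552),
]

# Assumed list of region categories; only three of them occur in the sample.
REGIONS = ["NORTHEAST", "NORTHWEST", "SOUTHEAST", "SOUTHWEST"]


def encode(row):
    """Return (feature vector, target) for one Table 1 row."""
    age, sex, bmi, children, smoker, region, charges = row
    features = [
        float(age),
        1.0 if sex == "MALE" else 0.0,    # sex as a 0/1 flag
        float(bmi),
        float(children),
        1.0 if smoker == "YES" else 0.0,  # smoker as a 0/1 flag
    ]
    # One-hot columns for region, in the order given by REGIONS.
    features += [1.0 if region == name else 0.0 for name in REGIONS]
    return features, charges


X = [encode(r)[0] for r in rows]  # independent variables
y = [encode(r)[1] for r in rows]  # dependent variable (charges)
print(X[0], y[0])
```

For ordinary least squares with an intercept, one of the region columns would usually be dropped to avoid perfectly collinear features.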
Table 2. Group statistics of Linear Regression and Random Forest, obtained by grouping the iterations with sample size 10. For Linear Regression accuracy, Mean = 95.2000, Standard Deviation = 1.50259 and Standard Error Mean = 0.47516. A descriptive Independent Sample Test of accuracy and precision is applied to the dataset in SPSS, giving the T-test scores of the two groups, each with a sample size of 10, both with and without assuming equal variances.

           Group  N   Mean     Std. Deviation  Std. Error Mean
Accuracy   LR     10  95.2000  1.50259         0.47516
           RF     10  93.3000  2.49132         0.78782
Precision  LR     10  1.2640   0.20359         0.06438
           RF     10  2.4620   0.24832         0.07853
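
The Std. Error Mean column follows from the other two columns through the standard relation SE = SD / sqrt(N). The short check below applies that relation to the N and Std. Deviation values of Table 2. The names `groups` and `standard_error` are illustrative only.

```python
import math

# (N, standard deviation) per measure and group, taken from Table 2.
groups = {
    ("Accuracy", "LR"): (10, 1.50259),
    ("Accuracy", "RF"): (10, 2.49132),
    ("Precision", "LR"): (10, 0.20359),
    ("Precision", "RF"): (10, 0.24832),
}


def standard_error(sd, n):
    """Standard error of the mean for a sample of size n."""
    return sd / math.sqrt(n)


for (measure, group), (n, sd) in groups.items():
    print(f"{measure:9s} {group}: SE = {standard_error(sd, n):.5f}")
```

With N = 10 this gives 0.47516 and 0.78782 for accuracy, and 0.06438 and 0.07853 for precision.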
Table 3. Independent Sample Test of accuracy and precision (p-value < 0.001, significance value = 0.107, mean difference = 1.9, standard error difference = 0.92 for accuracy and 0.10 for precision). Linear Regression and Random Forest are significantly different from each other.

                                        F      Sig.   t        df     Sig.        Mean        Std. error  Lower  Upper
                                                                      (2-tailed)  difference  difference
Accuracy   Equal variances assumed      2.885  0.107  2.065    18     <.001       1.9         0.92        -0.03  3.83
           Equal variances not assumed                2.065    14.78  <.001       1.9         0.92        -0.06  3.86
Precision  Equal variances assumed      0.289  0.598  -11.798  18     <.001       -1.198      0.10        -1.41  -0.98
           Equal variances not assumed                -11.798  17.3   <.001       -1.198      0.10        -1.41  -0.98
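
The t, df and Std. error difference columns of an independent-samples test can be recomputed from group summaries alone. The sketch below does this for the means and standard deviations reported in Table 2. It is an illustrative check, not the SPSS procedure itself, and `t_from_summary` is a hypothetical helper. Because both groups have N = 10, the pooled-variance and unequal-variance (Welch) tests share the same t value and differ only in their degrees of freedom.

```python
import math


def t_from_summary(m1, s1, n1, m2, s2, n2):
    """Return (t, standard error of the difference, Welch df)."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2  # squared standard errors
    se_diff = math.sqrt(v1 + v2)
    t = (m1 - m2) / se_diff
    # Welch-Satterthwaite approximation for "equal variances not assumed".
    df_welch = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, se_diff, df_welch


# Group summaries (mean, SD, N) for LR and RF, taken from Table 2.
t_acc, se_acc, df_acc = t_from_summary(95.2, 1.50259, 10, 93.3, 2.49132, 10)
t_prec, se_prec, df_prec = t_from_summary(1.264, 0.20359, 10, 2.462, 0.24832, 10)

print(f"Accuracy:  t = {t_acc:.3f}, SE diff = {se_acc:.2f}, Welch df = {df_acc:.2f}")
print(f"Precision: t = {t_prec:.3f}, SE diff = {se_prec:.2f}, Welch df = {df_prec:.2f}")
```

The degrees of freedom for the equal-variance rows are n1 + n2 - 2 = 18.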
Fig. 1. Comparison of Linear Regression and Random Forest in terms of mean accuracy. The mean accuracy of Linear Regression is slightly better than that of Random Forest, and its standard deviation is moderately lower. The bar graph is plotted with the groups (Linear Regression vs Random Forest) on the X-axis and the mean accuracy on the Y-axis, with error bars of +/-1 SD.