To cite this article: Xuchen Lin, Xiaolong Li & Zhong Zheng (2016): Evaluating borrower’s default
risk in peer-to-peer lending: evidence from a lending platform in China, Applied Economics, DOI:
10.1080/00036846.2016.1262526
ABSTRACT
Recent years have witnessed the popularity of online peer-to-peer lending, which allows individuals to borrow from and lend to each other on an Internet-based platform. Using data from a large P2P platform in China, this article explores the factors that determine the default risk based on the demographic characteristics of borrowers. Moreover, we propose a credit risk evaluation model, which can quantify the default risk of each P2P loan. Empirical results reveal that gender, age, marital status, educational level, working years, company size, monthly payment, loan amount, debt to income ratio and delinquency history play a significant role in loan defaults. Finally, we analyse the relationship between default risk and these contributory variables, and the possible causes are also discussed in this study.

KEYWORDS
Default risk; credit risk assessment; peer-to-peer lending; demographic characteristic

JEL CLASSIFICATION
D12; G20
are influence default in loan repayment significantly (Oni, Oladele, and Oyewole 2006). The study of Iyer et al. (2009) illustrated that lenders can use both hard and soft data about borrowers to assess one-third of a borrower's credit risk (Iyer et al. 2009). Lin, Prabhala, and Viswanathan (2013) analysed the role of social relations in the assessment of default risk, and found that having a strong social network is a significant factor in lowering the default risk of loans (Lin, Prabhala, and Viswanathan 2013). Using data from the National Sample Survey 54th round, Chaudhuri et al. analysed the approval of loans by financial institutions based on the sample selectivity model. The results revealed that in rural India, credit rationing does exist in the credit market and village-level infrastructure plays a crucial role in determining the behaviour of credit rationing (Chaudhuri and Cherical 2012). The study by Duarte, Siegel, and Young (2012) further pointed out that borrowers who seem more reliable get higher credit scores, are more likely to have their loans funded and have lower default rates (Duarte, Siegel, and Young 2012). Emekter et al. (2014) found that FICO score, credit grade, revolving line utilization and debt-to-income ratio play a significant role in loan defaults (Emekter et al. 2014). To help a local government estimate the interest rate in accordance with its credit risk premium, the study by Navarro-Galera et al. proposed a loan pricing model and found that the probability of default is affected by population, socioeconomic and financial factors (Navarro-Galera et al. 2015).

In the last 2 years, a number of studies have examined credit risk in the P2P lending marketplaces. The study of Guo et al. (2016) proposed an instance-based credit risk assessment model to quantitatively assess the risk of loans in the P2P lending market (Guo et al. 2016). Using a dataset from Lending Club, Eid et al. studied the impact of income rounding on loan outcomes; they found that borrowers with a rounding tendency have a higher probability of default and are less likely to prepay than borrowers with more accurate income reporting (Eid, Maltby and Talavera 2016). The study by Agarwal et al. (2015) investigated the role of a personal guarantee in P2P marketplaces; the results indicated that loans with guarantees are associated with a higher probability of being funded and a shorter time interval between posting and closing, but such loans show a higher default rate on average (Agarwal et al. 2015). The study of Ma and Wang (2016) analysed the factors that influence the credit risk in P2P lending from three aspects, namely borrowers, the P2P lending platform and the environment, and used interpretative structural modelling to explore the internal relationships between these factors (Ma and Wang 2016). Serrano-Cinca et al. proposed a profit scoring system which can be used to predict the expected profitability of investing in P2P loans; they analysed the factors that determine loan profitability and found that these factors differ from the factors that affect the default risk (Serrano-Cinca and Gutiérrez-Nieto 2016). Chen and Han compared P2P lending practices in China and the USA, and found that 'soft' and 'hard' credit information both have profound impacts on lending outcomes in these two countries, but lenders in China rely more on 'soft' credit information (Chen and Han 2015). The study of Chen, Zhou, and Wan (2016) explored the relationship between people's group social capital and their lending outcomes in the P2P lending marketplace (Chen, Zhou, and Wan 2016).

The majority of the existing empirical studies of credit evaluation used data from P2P lending platforms in the USA (such as Prosper and Lending Club), and most of these studies were based on the FICO score model. In China, however, there is no unified credit scoring system even in conventional commercial banks and other financial institutions, most lending platforms in China do not make their credit risk assessment methods known to the public, and the traditional FICO credit risk assessment model may not be applicable to the P2P lending market in China.

Based on this, the goal of our article is to evaluate the default risk of borrowers using loan data from a P2P lending platform in China, which will help borrowers and lending platforms make more effective decisions to reduce investment risk. From the perspective of default rate, we propose a credit risk assessment model to analyse the borrowers' credit risk based on their demographic characteristics and corresponding loan information. Specifically, combining qualitative and quantitative methods and using binary logistic regression to establish the model, we study the relationship between borrowers' credit risk and their personal characteristics, characteristics of
loans and other contributory factors, to provide a more effective credit risk assessment method for lending platforms and lenders.

This article contributes to the literature on the emerging P2P lending market. While few existing studies have discussed borrowers' credit screening problems in P2P lending, our study differs from prior studies in several aspects. First, this study utilizes a new dataset from a large P2P lending platform in China, which broadens the study of credit risk analysis. In contrast, most previous research utilized data from P2P lending platforms in the USA (such as Prosper and Lending Club). Second, this article analyses the default rate of loans based on the borrowers' demographic factors, which will help lenders form an optimal investment strategy.

This study proceeds as follows. In Section II, we describe and summarize the descriptive statistics of the data used in our study. Section III describes the methodology and presents the empirical results for measuring the default risk of borrowers. Finally, Section IV discusses the implications of the empirical results and concludes the article.

II. Data

In this section, we describe and summarize the descriptive statistics of the data used in our study, including the loan status and the characteristics of loan applicants. This article uses 52,017 loan applications from 1 January 2015 to 31 May 2015 obtained from www.yooli.com, which is a large P2P lending platform in China. Through data preprocessing and data screening, we excluded the loan cases which have missing values or outliers (for instance, loan cases in which the age of the borrower is over 100, or the borrower's monthly income is more than 50,000 RMB), and eventually obtained 48,784 valid loan cases with a total amount of approximately 169 million RMB. Table 1 shows the loan status for all the loans requested from 1 January 2015 to 31 May 2015.

Table 1. Loan distribution by the loan status.
All loans          Number of loans   Per cent    Amount        Per cent
Late in payment               4854     9.9500    ¥17211079      10.1608
Current                      42576    87.2745    ¥147612512     87.1449
Fully paid                    1354     2.7755    ¥4563909        2.6944
Total                        48784   100.0000    ¥169387500    100.0000

As shown in Table 1, 9.95% of all the loans, with a total amount of 17.2 million RMB, may be lost, because the borrowers of these loans are not paying the regular instalments and have not paid back in time (either fully or partially). 2.78% of the loans have been fully paid, which constitutes about 4.6 million RMB. Loans in current status account for 87.27% of all loans, about 148 million RMB. From the perspective of lenders, the most essential concern is whether borrowers may default in the future. If some of the borrower's personal characteristics can help to predict the default likelihood of the borrower, lenders will benefit a lot. Based on the descriptions of loan status, during the study period the platform provided a total of 48,784 loans, of which 4854 are late in payment. Although these default loans can be converted into a loan-default rate of 9.95%, this rate can be biased downward because the default rate gradually rises with maturity (Emekter et al. 2014).

According to the available data, the variables of each loan case include the borrower's area (the area where the borrower lives), gender, marital status, children status (with or without children), educational level, monthly income (in units of one thousand RMB), the scale of the working company, working years, age, loan amount (in units of one thousand RMB), repayment periods (the number of months in the payment period), monthly payment (in units of one hundred RMB), debt to income ratio, delinquency history (whether there was a 7-day overdue payment previously) and default status (whether the payment is overdue now), among others. In order to simplify the model, increase its stability and reduce the computational complexity, we preprocess these candidate indicators and divide some of them into groups. Among these variables, borrower's area, gender, marital status, children status, educational level and the scale of the working company are categorical variables. Specifically, borrower's area is divided into Western China, Central China and Eastern China¹; marital status contains unmarried, divorced and married; educational level is divided into below high
school, high school and undergraduate or postgraduate; the scale of the working company is divided into mini-company (below 10 staff), small company (10 to 100 staff) and medium or large company (more than 100 staff). These classifications for educational level and company scale are used because the borrowers in this dataset mainly have lower incomes and lower education levels. The descriptive statistics of the loan data used in this study are shown in Table 2, including the general characteristics of borrowers and loans. As shown in Table 2, of our sample of 48,784 loan applicants, 30,916 are male, accounting for 63.4%, and the remaining 17,868 are female, accounting for 36.6%. Most borrowers come from Western China. Borrowers are aged 16–55, with a mean age of 25 years, and the average working time is about 1.8 years. 66.9% of borrowers are unmarried, 28.5% are married and the remaining 4.6% are divorced. About three-fourths of borrowers have no children and one-fourth have children. As for borrowers' education level, 12,108 borrowers have an education below high school, accounting for 24.8%; 22,811 borrowers have a high school education, accounting for 46.8%; and 13,865 borrowers hold a Bachelor's degree or above, accounting for 28.4% of all borrowers. On average, a borrower earns 3923 RMB every month. 34.7% of borrowers work in a mini-company, 40% work in a small company, and the remaining 25.3% work in a medium or large company. 85.8% of borrowers have no default history, and only 14.2% have defaulted in the past. The average loan amount is 3472 RMB and a borrower needs to pay 309.7 RMB per month on average. The average repayment period is 12.8 months, and the average debt-to-income ratio is 0.093.

¹ There are serious differences in social and economic development between inland and coastal cities in China. According to the level of development, China can be divided into three economic belts: Eastern China, Central China and Western China. The economic development of Eastern China is the best of these three regions, Central China is the medium region for economic development, and the economic development of Western China is relatively slow.

III. Empirical results

In this section, we explore the factors that influence the possibility of loan default. First, we carry out nonparametric tests to explore whether there is a significant difference in the variables between good status loans and defaulted loans. Second, by employing binary logistic regression, the default risk of a loan applicant is modelled.

The results of the nonparametric tests are reported in Table 3, which summarizes the differences between good loans and defaulted loans. Good status loans are loans that are fully paid or on the payment schedule, and defaulted loans are loans which are late in payment and likely to be lost. As we can see from Table 3, the loan and borrower characteristics of the two groups are significantly dissimilar. By examining the chi-square statistic values of the Kruskal–Wallis test, we find that the variables including gender, age, marital status, children status, educational level, working years, monthly income, loan amount, debt to income ratio and delinquency history are statistically different between these two groups at the 1% level. Specifically, the results show that the amount borrowed on a defaulted loan is higher and the debt to income ratio is lower. The borrowers of defaulted loans are more likely to be male, older, single, well-educated staff of a large company. In addition, defaulted
loan borrowers are staff with less seniority, and they tend to have children and a delinquency history.

Table 3. Nonparametric test of differences between defaulted loans and good loans.
    Variables               Defaulted loans   Good loans
1   Gender                  Lower             Higher
2   Age                     Higher            Lower
3   Marital status          Lower             Higher
4   Children status         Lower             Higher
5   Educational level       Lower             Higher
6   Working years           Lower             Higher
7   Loan amount             Higher            Lower
8   Debt-to-income ratio    Lower             Higher
9   Delinquency history     Higher            Lower

In order to further explore the exact effect of each variable on the default rate of loans, we use a binary logistic regression which contains all the variables to be investigated. We employ the binary logistic regression model because ordinary linear regression is not designed for a binary dependent variable. In our study, the explained variable is divided into normal loan and default loan, so this is a binary classification problem for which logistic regression is a good solution. At the same time, we expect to obtain the default possibility of a certain borrower, namely, the probability of the dependent variable taking value 1, so the results of the regression model have intuitive meaning. In the binary logistic regression, the dependent variable represents the likelihood that the event happens; in this case, it corresponds to default. Let us suppose that z is an unobserved continuous variable that reflects the likelihood of a default, so that a higher value of z indicates a higher default probability. In order to transform this continuous number into a number ranging from 0 to 1, we utilize the following transformation:

$$p = \frac{1}{1 + e^{-z}} \qquad (1)$$

where p represents the probability of a default. It can be further assumed that in the binary logistic regression, k explanatory variables are linearly related to z, so the model can be described as follows:

$$z = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \cdots + \beta_k x_k + u, \qquad (2)$$

where x is the explanatory variable and k is the number of explanatory variables (Hosmer and Lemeshow 2000).

Table 4 presents the empirical results of the logistic regression described above. We first use the forward stepwise iterative maximum likelihood method to estimate the regression model, and then estimate it again with the backward stepwise iterative maximum likelihood method. Both methods achieve similar estimated results. Among the 14 variables included in the regression model, 10 variables (gender, age, marital status, educational level, working years, company size, monthly payment, loan amount, debt to income ratio, delinquency history) influence the loan outcome significantly. All the estimated coefficients in the results are significant at the 1% level, excluding monthly payment and educational level 1 (below high school), which are significant at the 5% level, and marital status 1 (unmarried) and company size 2 (small company), which are not significant. Hosmer and Lemeshow's test indicates that the model is adequate in explaining the outcome of a loan, with a chi-square value of 12.470 and a significance of 0.131 (Hosmer and Lemeshow 2000). This model does not suffer from a multi-collinearity problem since all SEs of estimated coefficients are much smaller than two. Cox & Snell R² and Nagelkerke R² for this regression model are 18.2% and 38.2%, respectively.

Table 4. Binary logistic regression results.
Variables               β            SE       Wald        Sig.     Exp(β)
Gender                  −0.357***    0.042    71.715      0.000    0.700
Marital status                                8.328       0.016
Marital status 1        0.035        0.051    0.475       0.490    1.036
Marital status 2        0.240***     0.083    8.327       0.004    1.271
Educational level                             13.491      0.001
Educational level 1     0.121**      0.053    5.162       0.023    1.128
Educational level 2     0.169***     0.046    13.490      0.000    1.184
Company size                                  12.190      0.002
Company size 1          0.127***     0.047    7.172       0.007    1.135
Company size 2          −0.013       0.046    0.077       0.781    0.987
Working years           −0.027***    0.007    13.610      0.000    0.973
Age                     0.012***     0.004    8.370       0.004    1.012
Loan amount             0.082***     0.020    16.299      0.000    1.086
Monthly payment         0.046**      0.018    6.219       0.013    1.047
Debt-to-income ratio    −0.013***    0.005    7.536       0.006    0.987
Delinquency history     3.278***     0.037    7982.197    0.000    26.529
Constant                −3.616***    0.167    468.557     0.000    0.027
Hosmer and Lemeshow's Test: Chi-square = 12.470, Significance = 0.131
The base value of the model for gender is male; the base value for marital status is married, marital status 1 represents unmarried, marital status 2 represents divorced; the base value for educational level is the highest education degree, and educational level 1 and 2 represent below high school and high school degree, respectively. The base value for company size is the largest company size, company size 1 is mini-company and company size 2 is small company. ** represents significance at the 5% level, and *** represents significance at the 1% level.

From Table 4, we can draw several implications: (1) With this data sample, the area where a borrower lives, whether the borrower has children, personal monthly income and the number of months in the payment period
have no significant influence on the default rate of a certain loan. (2) Female borrowers are less likely to default than male borrowers: their odds of default are about 70% of men's. (3) The odds of default for a divorced borrower are significantly higher (1.271 times) than for a married borrower. For unmarried borrowers, the difference from married borrowers is not significant. (4) The education level is significantly associated with the default risk: the default risk of borrowers with a low education degree is significantly higher than that of borrowers with a high education degree. The odds of default for borrowers who only have a high school degree are 1.184 times those of borrowers who have at least a Bachelor's degree. (5) The size of the company where a borrower works is significantly related to the default risk: the default risk of a borrower working in a mini-company is significantly higher than that of a borrower working in a medium or large company, the odds of the former being 1.135 times those of the latter. (6) Borrowers with long working years are associated with low default risk: as the working time of a borrower increases by 1 year, the odds of default fall to 0.973 of the original. (7) The older a borrower gets, the higher the default risk will be: the odds of default increase by 1.2% for every additional year of age. (8) Loans with a high loan amount are related to high default risk: the odds of default increase by 8.6% for every additional 1000 RMB of loan amount. (9) The more a borrower needs to repay every month, the higher the default risk will be: the odds of default rise by 4.7% for every additional 100 RMB of monthly repayment. (10) The debt-to-income ratio of a borrower is significantly related to the default risk: as the debt-to-income ratio increases by 1 percentage point, the odds of default fall to 98.7% of the original. (11) If a borrower has defaulted in the past, the default risk is greatly increased.

On the basis of the regression results, we can obtain the probability of a default for a certain loan using the following model with the estimated coefficients as reported in Table 4.

$$
\begin{aligned}
z = {} & \beta_0 + \beta_1(\text{gender}) + \beta_2(\text{marital status}) + \beta_3(\text{educational background}) \\
& + \beta_4(\text{company size}) + \beta_5(\text{working years}) + \beta_6(\text{age}) + \beta_7(\text{total funded}) \\
& + \beta_8(\text{monthly payment}) + \beta_9(\text{debt to income ratio}) + \beta_{10}(\text{delinquency history}) + u \qquad (3)
\end{aligned}
$$

To illustrate, consider a male, married borrower aged 30, with an educational level of middle school, 10 working years, a loan amount of 3000 RMB, a monthly payment of 300 RMB and a debt-to-income ratio of 10%, who works in a micro-enterprise and has defaulted in the past. The default probability of such a borrower is 57.5%:

$$z = -3.616 + 0.169 + 0.127 - 10 \times 0.027 + 30 \times 0.012 + 3 \times 0.082 + 3 \times 0.046 - 10 \times 0.013 + 3.278 = 0.302,$$

$$p = \frac{1}{1 + e^{-0.302}} = 0.575.$$

Assuming that the other variables remain the same, for a relatively safer borrower without delinquency history, the probability of default would only be 4.9%.

IV. Conclusion and discussion

For the P2P lending marketplaces, credit risk is an important concern. In this article, we explored the factors that determine the default risk and proposed a comprehensive credit risk evaluation model, which can quantify the default risk of each P2P loan. Our findings indicate that the characteristics of borrowers with low default risk are female gender, young age, long working time, stable marital status, high educational level, working in a large company, low monthly payment, low loan amount, low debt to income ratio and no default history. This result signifies that apart from the friendship and social ties of loan applicants as discussed by Freedman and Jin (Freedman and Jin 2014) and Lin et al. (Lin, Prabhala, and Viswanathan 2013), the 10 factors reported above are also significant in determining the default risk of loans.

Our findings suggest that, in contrast to men, women have a relatively lower default rate, which is consistent with the result obtained by Chen, Li, and Lai (2014). A previous study also indicated that in financial decision-making, women display a common trait of less risk-seeking behaviour than men (Powell and Ansic 1997). Borrowers who have a normal and healthy marriage have lower default risk, which accords with common intuition. Divorced borrowers need to take care of their family alone, so they are in a relatively poor financial situation and are likely to be overdue in the future. The default rate of highly educated borrowers, especially those who have a Bachelor's degree or above, is significantly lower than that of borrowers with a low education degree; this suggests that an increase in the educational level (that is, more formal education
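The scoring model of Equation (3) and the worked example in Section III can be reproduced with a short script. The following is a minimal sketch rather than the authors' implementation: the coefficients are copied from Table 4, while the dictionary keys and the function name `default_probability` are hypothetical names chosen for illustration. It assumes that the gender coefficient applies to female borrowers (the base value is male), that loan amount is in thousands of RMB and monthly payment in hundreds of RMB as stated in Section II, and that the debt-to-income ratio is entered in per cent as in the worked example.

```python
import math

# Coefficients copied from Table 4. Base categories (coefficient 0): male,
# married, Bachelor's degree or above, medium or large company.
# The key names are illustrative assumptions, not taken from the article.
COEF = {
    "constant": -3.616,
    "female": -0.357,              # gender (base: male)
    "unmarried": 0.035,            # marital status 1
    "divorced": 0.240,             # marital status 2
    "below_high_school": 0.121,    # educational level 1
    "high_school": 0.169,          # educational level 2
    "mini_company": 0.127,         # company size 1
    "small_company": -0.013,       # company size 2
    "working_years": -0.027,
    "age": 0.012,
    "loan_amount_k": 0.082,        # loan amount, thousand RMB
    "monthly_payment_100": 0.046,  # monthly payment, hundred RMB
    "dti_percent": -0.013,         # debt-to-income ratio, per cent
    "delinquency": 3.278,          # 1 if the borrower defaulted before
}


def default_probability(features):
    """Return (z, p): the linear score and the logistic default probability."""
    z = COEF["constant"] + sum(COEF[name] * value for name, value in features.items())
    p = 1.0 / (1.0 + math.exp(-z))  # logistic transformation of Equation (1)
    return z, p


# Borrower from the worked example: male, married, aged 30, 10 working years,
# 3000 RMB loan, 300 RMB monthly payment, 10% debt-to-income ratio,
# mini-company, past default. The example uses the 0.169 education coefficient.
borrower = {
    "high_school": 1,
    "mini_company": 1,
    "working_years": 10,
    "age": 30,
    "loan_amount_k": 3,
    "monthly_payment_100": 3,
    "dti_percent": 10,
    "delinquency": 1,
}

# Same borrower without any delinquency history.
safer = dict(borrower, delinquency=0)

z, p = default_probability(borrower)
z_safe, p_safe = default_probability(safer)
print(f"z = {z:.3f}, p = {p:.3f}")
print(f"without delinquency history: p = {p_safe:.3f}")
```

Under these assumptions the script gives z = 0.302 and p = 0.575 for the example borrower, and p = 0.049 once the delinquency flag is set to 0, matching the 57.5% and 4.9% reported in the text. Features left out of the dictionary are treated as being at their base category or zero.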
Gomez, R., and E. Santor. 2003. “Do Peer Group Members Outperform Individual Borrowers? A Test of Peer Group Lending Using Canadian Micro-Credit Data.” General Information.
Gross, D. B., and N. S. Souleles. 2002. “An Empirical Analysis of Personal Bankruptcy and Delinquency.” Review of Financial Studies 15 (1): 319–347. doi:10.1093/rfs/15.1.319.
Guo, Y., W. Zhou, C. Luo, C. Liu, and H. Xiong. 2016. “Instance-Based Credit Risk Assessment for Investment Decisions in P2P Lending.” European Journal of Operational Research 249 (2): 417–426. doi:10.1016/j.ejor.2015.05.050.
Hosmer, D. W., and S. Lemeshow. 2000. “Introduction to the Logistic Regression Model.” In Applied Logistic Regression, 2nd edition, 1–30. John Wiley & Sons.
Iyer, R., A. I. Khwaja, E. F. P. Luttmer, and K. Shue. 2009. “Screening in New Credit Markets: Can Individual Lenders Infer Borrower Creditworthiness in Peer-to-Peer Lending?” AFA 2011 Denver meetings paper.
Lin, M., N. R. Prabhala, and S. Viswanathan. 2013. “Judging Borrowers by the Company They Keep: Friendship Networks and Information Asymmetry in Online Peer-to-Peer Lending.” Management Science 59 (1): 17–35. doi:10.1287/mnsc.1120.1560.
Ma, H.-Z., and X.-R. Wang. 2016. “Influencing Factor Analysis of Credit Risk in P2P Lending Based on Interpretative Structural Modeling.” Journal of Discrete Mathematical Sciences and Cryptography 19 (3): 777–786. doi:10.1080/09720529.2016.1178935.
Milne, A., and P. Parboteeah. 2016. The Business Models and Economics of Peer-to-Peer Lending. Social Science Electronic Publishing.
Navarro-Galera, A., S. Rayo-Cantón, J. Lara-Rubio, and D. Buendía-Carrillo. 2015. “Loan Price Modelling for Local Governments Using Risk Premium Analysis.” Applied Economics 47 (58): 6257–6276. doi:10.1080/00036846.2015.1068924.
Oni, O. A., O. I. Oladele, and I. K. Oyewole. 2006. “Analysis of Factors Influencing Loan Default among Poultry Farmers in Ogun State Nigeria.” Journal of Central European Agriculture 6 (4): 619–624.
Powell, M., and D. Ansic. 1997. “Gender Differences in Risk Behaviour in Financial Decision-Making: An Experimental Analysis.” Journal of Economic Psychology 18 (6): 605–628. doi:10.1016/S0167-4870(97)00026-3.
Serrano-Cinca, C., and B. Gutiérrez-Nieto. 2016. “The Use of Profit Scoring as an Alternative to Credit Scoring Systems in Peer-to-Peer (P2P) Lending.” Decision Support Systems 89: 113–122. doi:10.1016/j.dss.2016.06.014.
Stiglitz, J. E., and A. Weiss. 1981. “Credit Rationing in Markets with Imperfect Information.” The American Economic Review 71 (3): 393–410.
Wang, H., M. Greiner, and J. E. Aronson. 2009. “People-To-People Lending: The Emerging E-Commerce Transformation of a Financial Market.” In Value Creation in e-Business Management, 182–195. Berlin: Springer.