Submitted by:
RAKSHITH P S
MB197709
What is Hive?
Hive is a data warehouse infrastructure tool to process structured data in Hadoop. It resides on top
of Hadoop to summarize Big Data, and makes querying and analysing easy.
Features of Hive
What is Hue?
Hue (Hadoop User Experience) is an open-source web interface for Hadoop components.
Users access Hue directly from the browser, which improves the productivity of
Hadoop developers.
Topic
Loading Database
1. Select all the observations from the database
Command
SELECT * FROM churn48;
Output
Interpretation:
This command selects all observations in the churn48 table. There are 500
entries in the table, which holds all the information related to the bank's
customers.
2. Select all the customers who have exited the bank
Command:
SELECT churn48.exited FROM churn48 WHERE exited = 'Yes';
Output
Interpretation:
Out of 500 customers, 101 (about 20%) have exited the bank. Customers are
leaving the bank for various reasons.
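The figure of 101 is read off the number of rows returned. A minimal sketch of obtaining the count directly, assuming the exited column stores 'Yes' for customers who left (as the later queries in this report do); the alias exited_customers is only illustrative:

```sql
-- Sketch: count exited customers instead of listing one row per customer.
-- Assumes exited = 'Yes' marks a customer who has left the bank.
SELECT count(*) AS exited_customers
FROM churn48
WHERE exited = 'Yes';
```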
3. To observe the location of the customers who have exited.
Command:
SELECT geography FROM churn48 WHERE churn48.exited= "Yes";
Output:
Interpretation:
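The query above returns one location per exited customer, which is hard to interpret as a raw list. One possible way to summarize it is to group by location. This is a sketch only; the alias exited_customers is illustrative and not part of the original lab:

```sql
-- Sketch: number of exited customers per location.
-- Assumes exited = 'Yes' marks a customer who has left the bank.
SELECT geography, count(*) AS exited_customers
FROM churn48
WHERE exited = 'Yes'
GROUP BY geography;
```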
Command:
Output
Interpretation
The average balance of active customers who exited the bank is $82,310.
That means the bank loses a deposit of $82,310, on average, for each such
customer who leaves.
5. To see the average loan balance of the active customers
who have exited the bank.
Command:
SELECT avg(balance) FROM churn48 WHERE churn48.exited = "Yes" and
isactivemember = 'Yes';
Output
Interpretation:
The average loan balance of active customers who exited the bank is $84,830. That means
the bank loses a loan amount of $84,830, on average, for each active customer who leaves.
6. To count the number of male and female active customers
Output:
Interpretation:
When we group the active members who left the bank by gender, we get:
• 15 Female
• 14 Male
As the numbers of males and females are almost equal, we cannot draw any
gender-specific conclusions.
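The command for this step is not shown in the report. A query along the following lines would produce such a breakdown. It is a sketch under two assumptions: that the table has a column named gender, which does not appear in any query in this report, and that 'Yes' marks both exited and active customers, as in step 5:

```sql
-- Sketch: active customers who exited, counted by gender.
-- The column name gender is an assumption; the alias customers is illustrative.
SELECT gender, count(*) AS customers
FROM churn48
WHERE exited = 'Yes'
  AND isactivemember = 'Yes'
GROUP BY gender;
```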
7. Short term loans:
Command:
SELECT customerid FROM churn48 WHERE tenure <= 5;
Output:
Interpretation:
The bank has given 266 short-term loans to its customers.
ii. How many exited out of customers who have taken short term loan
Command
SELECT count(customerid) FROM churn48 WHERE tenure <= 5
AND exited = 'Yes';
Output:
Interpretation:
57 of the 266 customers who have taken short-term
loans exited the bank. The bank's default rate for
short-term loans is therefore about 21% (57/266).
8. Long term loans:
Command
SELECT count(customerid) FROM churn48 WHERE tenure >5;
Output:
Interpretation:
The bank has given 235 long-term loans to its customers.
ii. How many exited out of customers who have taken long term loan
Command
SELECT count(customerid) FROM churn48 WHERE tenure >5 and
exited = 'Yes';
Output
Interpretation:
45 customers out of 235 who have taken long term loans defaulted. The long-term default
rate is 19%.
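The short-term and long-term rates above are each worked out by hand from two separate counts. As a sketch, both can be computed in one pass by bucketing customers on tenure. The labels 'short' and 'long' and the aliases loan_term, customers, exited_customers and exit_rate_pct are illustrative, and the report's reading of tenure as loan term is taken as given:

```sql
-- Sketch: customers, exits and exit rate per loan-term bucket.
-- Rows with a NULL tenure would fall into the 'long' bucket here.
SELECT loan_term,
       count(*) AS customers,
       sum(CASE WHEN exited = 'Yes' THEN 1 ELSE 0 END) AS exited_customers,
       round(100.0 * sum(CASE WHEN exited = 'Yes' THEN 1 ELSE 0 END) / count(*), 1) AS exit_rate_pct
FROM (
  SELECT CASE WHEN tenure <= 5 THEN 'short' ELSE 'long' END AS loan_term,
         exited
  FROM churn48
) t
GROUP BY loan_term;
```

The subquery keeps the CASE expression in one place, so the outer GROUP BY can refer to it by name.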
To check balance and credit score of customers who have exited
i. To check balance of customers who have taken long term loan and exited
Command
SELECT avg(balance),min(balance),max(balance) FROM churn48 WHERE tenure >5 and
exited = 'Yes';
Output
Interpretation:
The average loan balance of customers who took long-term loans and exited the bank is
$95,525. The maximum loan balance of a customer who left the bank is $182,123, and the
minimum balance is zero. That tells us that even loyal customers who had cleared all
their dues left the bank after their loan period.
ii. To check credit score of customers who have taken long term loan and
exited
Command:
SELECT avg(creditscore),min(creditscore),max(creditscore) FROM
churn48 WHERE tenure >5 and exited = 'Yes';
Output:
Interpretation:
The average credit score of these customers is 622, which is below the
level considered 'good'. A credit score of 700 or more is considered
'good', and scores below that are not dependable. The bank has
sanctioned long-term loans to customers with credit scores as low as
431. This is one of the reasons why the bank is losing its loan
money.
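The average, minimum and maximum do not show how many of these customers fall below the 700 threshold mentioned above. A possible follow-up query is sketched below. The aliases exited_customers and below_700 are illustrative, and no result is claimed for it here:

```sql
-- Sketch: how many long-term customers who exited had a credit score under 700.
-- Assumes exited = 'Yes' marks a customer who has left the bank.
SELECT count(*) AS exited_customers,
       sum(CASE WHEN creditscore < 700 THEN 1 ELSE 0 END) AS below_700
FROM churn48
WHERE tenure > 5
  AND exited = 'Yes';
```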
Conclusion:
The main purpose of the lab was to understand churn modelling using banking data.
We used Hive Query Language to query and generate insights from the dataset. We
used Hue for running the HQL code. After going through the data, we came up with
the following insights.
Insights
The data consisted of 500 customers, of whom 101 have exited the bank.
These customers belong to various locations like France, Germany, and Spain.
29 of the customers who exited the bank were active. Among these active customers,
males (14) and females (15) are almost equally represented. The average
loan balance of active customers who exited the bank is $84,830, so the bank is
losing $84,830, on average, on each active customer defaulting.
The bank has given 266 short-term loans. Of those customers, 57 have exited the bank,
i.e., the bank's default rate for short-term loans is 21%. The average loan balance
of these customers who exited the bank is $84,103. The maximum loan balance of a
customer who left the bank is $211,196, which is higher than the maximum
long-term loan balance. The bank has given loans to people with a credit score as low as
374, which is not considered good. The average credit score is 647.
The bank has given 234 long-term loans. Of those customers, 46 have exited the bank,
i.e., the bank's default rate for long-term loans is 19%. The average long-term loan
balance defaulted is $95,525 per customer. The average credit score of these customers is
622, which is below the score considered 'good'. The bank has sanctioned
long-term loans to customers with a credit score as low as 431.
Suggestion
The bank is losing a huge amount of money due to its poor loan sanctioning
policy. It is lending money to people who have very low credit scores. This is one
of the major reasons for its collective NPA. On the other hand, loyal customers who
have repaid all their dues are exiting the bank. This shows there is some service issue in
the bank: it is not providing services that can compete with other banks. This
is the main reason why customers are switching to other banks after repayment of
their loan amount.
To control loan defaults, the bank has to make sure it lends money only to those who
have credit scores high enough to prove their creditworthiness. The bank should also provide
services that benefit customers, attract them, and make them stick to
the banking services provided.