
PART B

Big Data Analytics using HQL

Submitted by:

RAKSHITH P S

MB197709
What is Hive?
Hive is a data warehouse infrastructure tool to process structured data in Hadoop. It resides on top
of Hadoop to summarize Big Data, and makes querying and analysing easy.

Features of Hive

• It stores the schema in a database and the processed data in HDFS.


• It is designed for OLAP.
• It provides SQL type language for querying called HiveQL or HQL.
• It is familiar, fast, scalable, and extensible.

What is Hue?

Hue is an open-source web user interface for Hadoop components. Users can access
Hue directly from the browser, and it improves the productivity of Hadoop
developers.
Topic

Bank Customer Churn Modelling using HQL (Hive QL)


Data Description:

Column Heading        Description

Row Number            Serial number of the row in the data
Customer ID           Unique ID given to each customer by the bank
Surname               Surname of the customer
Credit Score          A number that indicates the customer’s creditworthiness;
                      the higher the number, the higher the creditworthiness
                      (greater repaying capacity)
Geography             Customer’s location
Gender                Customer’s gender
Age                   Customer’s age
Tenure                How long the customer has taken the loan for
Balance               Outstanding loan balance
Number of products    Number of loans taken by the customer
HasCrCard             Whether the customer has a credit card
IsActive              Whether the customer is active
Exited                Whether the customer has exited the bank
The Hue environment consists of:
1. Database: the collection of databases that users are working on
2. Console: the place to write queries in languages such as MySQL, Hive,
Spark SQL, etc.
3. Output: shows the output of the code written
4. Tables: shows all the datasets uploaded to the site
Methodology
Import Data
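The data was uploaded through Hue. For readers working in the Hive editor instead, the table can also be created and loaded with HQL. The following is a minimal, hypothetical sketch: it assumes the data is a comma-separated file with one header row, already copied to HDFS at an illustrative path, and that the yes/no columns hold the text values 'Yes'/'No' that the later queries filter on. Column names not used in this report's queries (rownumber, surname, age, numofproducts, hascrcard) and all data types are assumptions.

```sql
-- Hypothetical DDL: all data types and the column names rownumber, surname,
-- age, numofproducts and hascrcard are assumptions.
-- hascrcard, isactivemember and exited are assumed to hold 'Yes'/'No' text,
-- matching the filters used in the queries of this report.
CREATE TABLE IF NOT EXISTS churn48 (
  rownumber      INT,
  customerid     BIGINT,
  surname        STRING,
  creditscore    INT,
  geography      STRING,
  gender         STRING,
  age            INT,
  tenure         INT,
  balance        DOUBLE,
  numofproducts  INT,
  hascrcard      STRING,
  isactivemember STRING,
  exited         STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
TBLPROPERTIES ('skip.header.line.count' = '1');

-- Hypothetical HDFS path of the uploaded CSV file
LOAD DATA INPATH '/user/hue/churn48.csv' INTO TABLE churn48;
```

LOAD DATA INPATH moves the file from its HDFS location into the table's directory; LOAD DATA LOCAL INPATH would copy it from the local file system instead.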

Loading Database
1. Select all the observations from the database
Command
SELECT * FROM churn48;

Output

Command
SELECT churn48.exited FROM churn48 WHERE exited = '1';
Output

Interpretation:
This command selects all observations in the database. There are 500 entries
in the database, and they include all the information related to the bank’s
customers.
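Rather than reading the number of entries off the result grid, the count can be checked with an aggregate query. A minimal sketch against the same churn48 table (the alias is illustrative):

```sql
-- Return a single row with the number of customers instead of every row
SELECT count(*) AS total_customers
FROM churn48;
```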
2. Select all the customers who have exited the bank

Command:

SELECT churn48.exited FROM churn48 WHERE exited='Yes';

Output

Interpretation:
Out of 500 customers, 101 (about 20%) have exited the bank. The following
steps look at who these customers are and why they may be leaving.
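The query in this step lists one row per exited customer, so the total has to be read from the row count. A minimal sketch that returns the count and the share directly, assuming exited holds 'Yes'/'No' as in the query above (aliases are illustrative):

```sql
-- Count exited customers and express them as a percentage of all customers.
-- avg() over a 1/0 flag gives the fraction of rows where the flag is 1.
SELECT sum(CASE WHEN exited = 'Yes' THEN 1 ELSE 0 END)                 AS exited_customers,
       count(*)                                                        AS total_customers,
       round(100 * avg(CASE WHEN exited = 'Yes' THEN 1 ELSE 0 END), 1) AS exit_rate_pct
FROM churn48;
```

With the figures reported here (101 of 500), exit_rate_pct would be 20.2.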
3. To observe the location of the customers who have exited.
Command:
SELECT geography FROM churn48 WHERE churn48.exited = 'Yes';

Output:

Interpretation:

The output shows that, of the customers who have exited:

• 36 are from France
• 40 are from Germany
• 28 are from Spain
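Grouping by geography lets Hive do the counting per country instead of listing every exited customer's location. A minimal sketch, with the same assumption that exited holds 'Yes'/'No' (the alias is illustrative):

```sql
-- One row per country with the number of exited customers
SELECT geography, count(*) AS exited_customers
FROM churn48
WHERE exited = 'Yes'
GROUP BY geography
ORDER BY exited_customers DESC;
```

The per-country counts should add up to the total number of exited customers from step 2, which makes this a quick consistency check on the results.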
4. To see the average balance of the active customers who
have exited the bank.

Command:

SELECT avg(balance) FROM churn18 WHERE churn18.exited = "Yes" and
isactivemember = 'Yes';

Output

Interpretation
The average balance of active customers who exited the bank is $82,310.
That means the bank is losing a deposit of about $82,310 on each active
customer leaving the bank.
5. To see the average loan balance of the active customers
who have exited the bank.
Command:
SELECT avg(balance) FROM churn48 WHERE churn48.exited = "Yes" and
isactivemember = 'Yes';

Output

Interpretation:
The average loan balance of active customers who left the bank is $84,830. That means
the bank is down a loan amount of about $84,830 on each active customer leaving the bank.
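The per-customer figure is easier to judge alongside the number of customers and the total balance involved. A minimal sketch with the same filters as the query above (aliases are illustrative):

```sql
-- Number, average balance and total balance of active customers who exited
SELECT count(*)               AS active_exited,
       round(avg(balance), 2) AS avg_balance,
       sum(balance)           AS total_balance
FROM churn48
WHERE exited = 'Yes'
  AND isactivemember = 'Yes';
```

The total is the count multiplied by the average; with the 29 active customers counted in step 6 and an average of $84,830, it would come to roughly $2.46 million.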
6. To count the number of male and female active customers
who have exited the bank.


Command
SELECT gender, count(customerid) FROM churn48 WHERE churn48.exited = "Yes" and
isactivemember = 'Yes' GROUP BY gender;

Output:

Interpretation:
When we group the active members who left the bank by gender, we get:

• 15 Female
• 14 Male

As the numbers of male and female customers are almost equal, we cannot draw any
gender-specific conclusions.
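Equal raw counts only rule out a gender effect if there are similar numbers of active male and female customers to begin with. A minimal sketch that compares the exit rate of active customers by gender, under the same 'Yes'/'No' assumption (aliases are illustrative):

```sql
-- For each gender: active customers, how many of them exited, and the exit rate
SELECT gender,
       count(*)                                                        AS active_customers,
       sum(CASE WHEN exited = 'Yes' THEN 1 ELSE 0 END)                 AS exited_customers,
       round(100 * avg(CASE WHEN exited = 'Yes' THEN 1 ELSE 0 END), 1) AS exit_rate_pct
FROM churn48
WHERE isactivemember = 'Yes'
GROUP BY gender;
```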
7. Short-term loans:

i. To check the number of short-term loans given

Command:
SELECT count(customerid) FROM churn48 WHERE tenure <= 5;

Output:

Interpretation:
The bank has given 266 short-term loans to its customers.

ii. How many of the customers who have taken short-term loans exited

Command
SELECT count(customerid) FROM churn48 WHERE tenure <= 5
AND exited = 'Yes';
Output:

Interpretation:
57 of the 266 customers who have taken short-term
loans exited the bank. The bank therefore has a default rate
of about 21% (57/266 ≈ 21.4%) for short-term loans.
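The rate can also be computed inside the query, which avoids dividing the two counts by hand. A minimal sketch using the report's definition of a short-term loan (tenure <= 5) and the 'Yes'/'No' encoding of exited (aliases are illustrative):

```sql
-- Short-term loans: total, exited, and default rate in one pass
SELECT count(*)                                                        AS short_term_loans,
       sum(CASE WHEN exited = 'Yes' THEN 1 ELSE 0 END)                 AS exited_customers,
       round(100 * avg(CASE WHEN exited = 'Yes' THEN 1 ELSE 0 END), 1) AS default_rate_pct
FROM churn48
WHERE tenure <= 5;
```

Averaging a 1/0 flag gives the fraction of rows where the flag is 1, so multiplying by 100 turns it into a percentage.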

8. Long-term loans:

i. To check the number of long-term loans given

Command
SELECT count(customerid) FROM churn48 WHERE tenure >5;

Output:
Interpretation:
The bank has given 235 long-term loans to its customers.

ii. How many of the customers who have taken long-term loans exited

Command
SELECT count(customerid) FROM churn48 WHERE tenure >5 and
exited = 'Yes';

Output

Interpretation:
45 of the 235 customers who have taken long-term loans exited the bank. The long-term
default rate is therefore about 19% (45/235 ≈ 19.1%).
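Both loan types can be compared in one result by grouping on a CASE expression. A minimal sketch under the same assumptions as the queries above (aliases and labels are illustrative); the CASE expression is repeated in GROUP BY rather than relying on the select-list alias:

```sql
-- Short-term (tenure <= 5) versus long-term (tenure > 5) loans side by side.
-- Note: rows with a NULL tenure would fall into 'long-term' here.
SELECT CASE WHEN tenure <= 5 THEN 'short-term' ELSE 'long-term' END    AS loan_type,
       count(*)                                                        AS loans,
       sum(CASE WHEN exited = 'Yes' THEN 1 ELSE 0 END)                 AS exited_customers,
       round(100 * avg(CASE WHEN exited = 'Yes' THEN 1 ELSE 0 END), 1) AS default_rate_pct
FROM churn48
GROUP BY CASE WHEN tenure <= 5 THEN 'short-term' ELSE 'long-term' END;
```

The two loan counts should add up to the total number of customers from step 1.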
To check the balance and credit score of customers who have exited

i. To check the balance of customers who have taken long-term loans and exited

Command
SELECT avg(balance),min(balance),max(balance) FROM churn48 WHERE tenure >5 and
exited = 'Yes';

Output

Interpretation:
The average loan balance of customers who had taken long-term loans and exited the
bank is $95,525. The maximum loan balance of a customer who left the bank is $182,123,
and the minimum balance is zero. The zero minimum tells us that even some loyal
customers who had cleared all their dues left the bank after their loan period.
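The minimum of zero shows that at least one such customer had cleared their dues; counting them shows how common this is. A minimal sketch with the same filters as the query above, assuming a cleared loan is stored as a balance of exactly 0 (aliases are illustrative):

```sql
-- Long-term borrowers who exited: how many had a zero balance, out of all of them
SELECT sum(CASE WHEN balance = 0 THEN 1 ELSE 0 END) AS exited_with_zero_balance,
       count(*)                                     AS exited_long_term
FROM churn48
WHERE tenure > 5
  AND exited = 'Yes';
```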

ii. To check the credit score of customers who have taken long-term loans and
exited

Command:
SELECT avg(creditscore),min(creditscore),max(creditscore) FROM
churn48 WHERE tenure >5 and exited = 'Yes';
Output:

Interpretation:
The average credit score of these customers is 622, which is below the level
considered ‘good’: a credit score of 700 or above is considered a ‘good’ credit
score, and scores below that are less dependable. The bank has sanctioned
long-term loans to customers with credit scores as low as 431. This is one of
the reasons why the bank is losing its loan money.
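To tie the credit score argument to the leavers directly, the exited long-term borrowers can be split at the 700 threshold mentioned above. A minimal sketch under the same assumptions as the earlier queries (band labels and aliases are illustrative):

```sql
-- Exited long-term borrowers split into credit score bands
SELECT CASE WHEN creditscore >= 700 THEN '700 and above' ELSE 'below 700' END AS score_band,
       count(*)               AS exited_customers,
       round(avg(balance), 2) AS avg_balance
FROM churn48
WHERE tenure > 5
  AND exited = 'Yes'
GROUP BY CASE WHEN creditscore >= 700 THEN '700 and above' ELSE 'below 700' END;
```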
Conclusion:

The main purpose of the lab was to understand churn modelling using banking data.
We used Hive Query Language to query and generate insights from the dataset. We
used Hue for running the HQL code. After going through the data, we came up with
the following insights.

Insights

The data consisted of 500 customers, of whom 101 have exited the bank. These
customers are located in France, Germany, and Spain. Of the customers who exited the
bank, 29 were active. Among the active customers who left, males and females are in
an almost equal ratio. The average loan balance of active customers who exited the
bank is $84,830, so the bank is losing about $84,830 on each active customer who
defaults.

Short term loans:

The bank has given 266 short-term loans. Of those customers, 57 have exited the bank,
i.e., the bank’s default rate for short-term loans is 21%. The average loan balance
of these customers who exited the bank is $84,103. The maximum loan balance of a
customer who left the bank is $211,196, which is higher than the maximum long-term
loan balance. The bank has given loans to people with credit scores as low as
374, which is not considered a good score, and the average credit score of these
customers is 647.

Long term loans:

The bank has given 234 long-term loans. Of those customers, 46 have exited the bank,
i.e., the bank’s default rate for long-term loans is 19%. The average long-term loan
balance defaulted is $95,525 per customer. The average credit score of these customers
is 622, which is below the score considered ‘good’. The bank has sanctioned long-term
loans to customers with credit scores as low as 431.
Suggestion

The bank is losing a large amount of money because of its poor loan-sanctioning
policy. It is lending money to people with very low credit scores, which is one
of the major reasons for its accumulated NPAs (non-performing assets). On the other
hand, loyal customers who have repaid all their dues are exiting the bank. This shows
there is a service issue in the bank: it is not providing services that can compete
with those of other banks. This is the main reason why customers are switching to
other banks after repaying their loans.
To control loan defaults, the bank has to make sure it lends money only to those
whose credit scores are high enough to prove their creditworthiness. The bank should
also provide customer-friendly services that attract customers and keep them with
the bank.
