
Big Bank, Big Problem

A leading banking and credit card services provider is trying to use Hadoop technologies to handle
and analyze large amounts of data.

Currently, the organization has its data in an RDBMS but wants to use the Hadoop ecosystem for the
storage, archival and analysis of large amounts of data.

Data Ingestion
Bring data from the RDBMS to HDFS. This import must be incremental and must run every 2
minutes.

The following tables have to be imported:

Table              Incremental Column

loan_info          loan_id
credit_card_info   cc_number
shares_info        gmt_timestamp

All of this data must be encrypted at rest in HDFS.

The data should be compressed in HDFS to reduce the storage volume.

The Sqoop database password must also be encrypted rather than passed in plain text.
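A minimal sketch of how this ingestion could be wired up, using a saved incremental Sqoop job, Snappy
compression, an HDFS encryption zone and a JCEKS credential store for the password. The key name,
JDBC URL, user names, directories and log paths below are assumptions; only loan_info is shown, and
the other tables follow the same pattern with their own check columns.

    # 1) One-time setup: HDFS encryption zone (assumes a Hadoop KMS is available)
    hadoop key create bank_key
    hdfs dfs -mkdir -p /data/bank
    hdfs crypto -createZone -keyName bank_key -path /data/bank

    # 2) One-time setup: store the database password in an encrypted JCEKS keystore
    #    (alias name and keystore path are assumptions)
    hadoop credential create bank.db.password \
        -provider jceks://hdfs/user/sqoop/bank.jceks

    # 3) Saved Sqoop job for an incremental append import of loan_info, compressed
    #    with Snappy; the credential provider path must be visible to Sqoop
    #    (e.g. hadoop.security.credential.provider.path set in core-site.xml)
    sqoop job --create loan_info_import -- import \
        --connect jdbc:mysql://dbhost/bank \
        --username bankuser \
        --password-alias bank.db.password \
        --table loan_info \
        --incremental append \
        --check-column loan_id \
        --last-value 0 \
        --target-dir /data/bank/loan_info \
        --compress \
        --compression-codec org.apache.hadoop.io.compress.SnappyCodec \
        -m 1

    # 4) Crontab entry to run the job every 2 minutes; Sqoop's saved-job metastore
    #    keeps track of the last imported value between runs
    */2 * * * * sqoop job --exec loan_info_import >> /var/log/sqoop/loan_info.log 2>&1

Because the target directory sits inside the /data/bank encryption zone, the imported files are
encrypted transparently. credit_card_info can be imported the same way with --check-column cc_number,
and shares_info with --check-column gmt_timestamp, since append mode only requires the check column
values to keep increasing.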

Table Details
loan_info
This table has the following columns:

loan_id               Loan identifier
user_id               User identifier
last_payment_date     Date on which the last payment was made
payment_installation  Amount payable per installment
date_payable          Day of the month on which the payment is made by the user

credit_card_info
This table has the following columns:

cc_number            Credit card number
user_id              User identifier
maximum_credit       Maximum credit allowed for the user
outstanding_balance  Current outstanding balance of the user
due_date             Due date for the credit card payment
shares_info
This table has the following columns:

share_id       Share identifier
company_name   Name of the organisation
gmt_timestamp  Timestamp of the share price
share_price    Value of the share

Analysis
1. Find the list of users who have at least 2 loan installments pending.

2. Find the list of users who have a healthy credit card but an outstanding loan account. A healthy
credit card means no outstanding balance.

3. For every share and for every date, find the maximum profit one could have made on the share
(query sketches for all three questions follow this list). Bear in mind that a share must be purchased
before it is sold; if share prices fall throughout the day, the maximum possible profit may be negative.
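One possible way to answer these questions, assuming the imported data is exposed as Hive tables named
loan_info, credit_card_info and shares_info, and interpreting a "pending installment" as one whole
month elapsed since last_payment_date; these are sketches, not a definitive solution.

    # 1) Users with at least 2 pending loan installments
    #    (assumes one installment falls due per month after last_payment_date)
    hive -e '
      SELECT user_id, loan_id,
             FLOOR(MONTHS_BETWEEN(CURRENT_DATE, last_payment_date)) AS pending_installments
      FROM loan_info
      WHERE FLOOR(MONTHS_BETWEEN(CURRENT_DATE, last_payment_date)) >= 2;
    '

    # 2) Users with a healthy credit card (no outstanding balance) but an
    #    outstanding loan (at least one installment pending, same assumption)
    hive -e '
      SELECT DISTINCT c.user_id
      FROM credit_card_info c
      JOIN loan_info l ON c.user_id = l.user_id
      WHERE c.outstanding_balance = 0
        AND MONTHS_BETWEEN(CURRENT_DATE, l.last_payment_date) >= 1;
    '

    # 3) Maximum intraday profit per share per date: compare every price tick with
    #    the cheapest earlier tick of the same day
    hive -e '
      SELECT share_id, trade_date, MAX(share_price - min_price_before) AS max_profit
      FROM (
        SELECT share_id,
               TO_DATE(gmt_timestamp) AS trade_date,
               share_price,
               MIN(share_price) OVER (
                 PARTITION BY share_id, TO_DATE(gmt_timestamp)
                 ORDER BY gmt_timestamp
                 ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
               ) AS min_price_before
        FROM shares_info
      ) ticks
      WHERE min_price_before IS NOT NULL
      GROUP BY share_id, trade_date;
    '

The first tick of a day is excluded because nothing could have been bought before it; if prices only
fall during a day, every difference is negative and so is the reported maximum profit, as required.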

Archival
The organisation has lot of survey data scattered across different files in a directory in local file
system. Provide a mechanism to effectively store the small files in Hadoop. It is expected to pack
small files together before actually storing them in HDFS.
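One way to pack the small survey files, assuming hypothetical local and HDFS paths, is a Hadoop
Archive (HAR); packing them into a SequenceFile keyed by file name would serve the same purpose.

    # Stage the local survey files into HDFS (paths are assumptions)
    hdfs dfs -mkdir -p /staging/surveys
    hdfs dfs -put /data/local/surveys/* /staging/surveys/

    # Pack the small files into a single Hadoop Archive to reduce NameNode pressure
    hadoop archive -archiveName surveys.har -p /staging surveys /archive

    # The packed files remain readable through the har:// scheme, e.g. from Hive or MapReduce
    hdfs dfs -ls har:///archive/surveys.har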

Survey files have the below structure:

survey_date      Date on which the survey was conducted
survey_question  Question asked in the survey
rating           Rating received (1 - 5)
user_id          ID of the user who responded
survey_id        Survey identifier

The following analysis is expected from the survey files (query sketches follow the list):

1. How many surveys got an average rating of less than 3, provided that at least 10 distinct users
gave a rating?

2. Find the details of the survey which received the minimum rating. The condition is that the survey
must have been rated by at least 20 users.
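Possible Hive queries for the two survey questions, assuming an external table survey_data defined
over the packed files (the table name is an assumption) and interpreting "minimum rating" as the
lowest average rating:

    # 1) Number of surveys with an average rating below 3, counting only
    #    surveys rated by at least 10 distinct users
    hive -e '
      SELECT COUNT(*) AS low_rated_surveys
      FROM (
        SELECT survey_id
        FROM survey_data
        GROUP BY survey_id
        HAVING COUNT(DISTINCT user_id) >= 10 AND AVG(rating) < 3
      ) low;
    '

    # 2) Survey with the lowest average rating among surveys rated by at least 20 users
    hive -e '
      SELECT survey_id, AVG(rating) AS avg_rating, COUNT(DISTINCT user_id) AS raters
      FROM survey_data
      GROUP BY survey_id
      HAVING COUNT(DISTINCT user_id) >= 20
      ORDER BY avg_rating ASC
      LIMIT 1;
    '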

The organisation also has lots of emails stored in small files.

The metadata about the emails is present in an XML file, email_metadata.xml.

Read the XML file for the email structure and pack all the email files in HDFS.

The following analysis is expected from the email data (sketches based on an assumed metadata layout
follow the list):

1. Which is the longest-running email?

2. Find out the list of emails which were unanswered.
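A heavily hedged sketch for the email questions: the actual layout of email_metadata.xml is not given,
so the Hive table email_metadata and its columns message_id, in_reply_to, thread_id and sent_timestamp
are assumptions, and "longest running" is read as the thread with the largest gap between its first
and last message.

    # 1) Longest-running email: the thread with the largest time span
    hive -e '
      SELECT thread_id,
             UNIX_TIMESTAMP(MAX(sent_timestamp)) - UNIX_TIMESTAMP(MIN(sent_timestamp)) AS duration_seconds
      FROM email_metadata
      GROUP BY thread_id
      ORDER BY duration_seconds DESC
      LIMIT 1;
    '

    # 2) Unanswered emails: messages whose message_id is never referenced
    #    by any other message's in_reply_to field
    hive -e '
      SELECT e.message_id
      FROM email_metadata e
      LEFT JOIN email_metadata r ON r.in_reply_to = e.message_id
      WHERE r.message_id IS NULL;
    '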

Policy Setting
A policy file is a CSV file that contains the following fields:

HDFS location  HDFS directory
HOT date       Date till which the HDFS directory will be in the HOT policy
WARM date      Date till which the HDFS directory will be in the WARM policy
COLD date      Date till which the HDFS directory will be in the COLD policy

Run a job every day that moves each HDFS location to the new storage policy once the end date of its
current policy has passed (a sketch of such a job follows).
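A minimal sketch of such a daily job, assuming the CSV columns appear in the order
location,hot_date,warm_date,cold_date with dates in YYYY-MM-DD format and no header row; the script
path and schedule are hypothetical.

    #!/bin/bash
    # apply_storage_policy.sh - run once a day from cron, e.g.:
    #   0 1 * * * /opt/jobs/apply_storage_policy.sh /opt/jobs/policy.csv
    # The CSV layout (location,hot_date,warm_date,cold_date) is an assumption.

    POLICY_FILE="$1"
    TODAY=$(date +%Y-%m-%d)

    while IFS=',' read -r location hot_date warm_date cold_date; do
        # Pick the policy whose window the current date still falls into;
        # plain string comparison works because the dates are in YYYY-MM-DD format
        if [[ "$TODAY" < "$hot_date" || "$TODAY" == "$hot_date" ]]; then
            policy="HOT"
        elif [[ "$TODAY" < "$warm_date" || "$TODAY" == "$warm_date" ]]; then
            policy="WARM"
        else
            policy="COLD"
        fi

        hdfs storagepolicies -setStoragePolicy -path "$location" -policy "$policy"
        # Moving already-written blocks to the new tier additionally requires the HDFS mover:
        #   hdfs mover -p "$location"
    done < "$POLICY_FILE"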
