Bank of America Campus Challenge 2013

Intelligent Model for Database Management and Analysis
Authors - Kunal Bakshi, Neeraj Sharma, Nikhil Jain, Parimal Deshmukh, Ravinandan Venkatesh

2013
Value Proposition
• Database modeling using Artificial Intelligence
• Assistance in analyzing patterns and trends among customers and clients which are generally not visible with traditional models
• Use of relationships to bring out the right facets of the customer-client, client-bank and customer-bank relationships
• A one-stop model for all data analytics solutions, to cut to the chase

UNCC JUGGERNAUT

Contents
EXECUTIVE SUMMARY
   Introduction
   Case in discussion
   Solution proposed
Section 1: Collection of Data
   1.1 What has to be collected?
   1.2 How to get and store this data
Section 2: Building the Model
   Layer 1: Graph Model Layer
   Layer 2: Self-Organizing Map Model
   Layer 3: Service Layer
   Layer 4: Business Layer
Section 3: Integration and Interconnection between layers
Section 4: Impact of exogenous effect or external changes to the data
   Graph Model (Layer 1)
   SOFM (Layer 2)
   Service and Business Layer (Layer 3 and 4)
Section 5: How can it be used?
   Internal Use
      A. Predictable Groups
      B. Unpredicted Patterns
   External Use
Concluding Discussion
Five Great Minds (Our Team)

EXECUTIVE SUMMARY

Introduction
Bank of America (BofA) is an American multinational banking and financial services corporation headquartered in Charlotte, North Carolina. BofA is the second largest bank holding company in the United States by assets. The key operations of BofA are Consumer banking, Corporate banking, Investment Management, and International Operations.

Case in discussion
Bank of America does business with a variety of customers and carries out a variety of operations day in and day out. It therefore generates an exhaustive amount of data every day. If this data is effectively organized and analyzed, it can help us identify interrelationships between BofA and its customers, and between BofA's customers and its clients. Efficient identification of these relationships can help BofA identify new business opportunities and unforeseen risks, and improve services to its customers and clients. The data generated is "Big Data". To use this "Big Data" effectively, the data first has to be efficiently captured, stored and organized, and can then be used for analysis.

Solution proposed
The challenge here is to analyze highly unstructured data efficiently and effectively. Many intelligent algorithms on the market can analyze this data, but our algorithm focuses on the strength of decision making through clustering. This is achieved with a very efficient clustering algorithm of Artificial Neural Networks, i.e. Self Organizing Feature Maps. The distinctive feature of this algorithm is its ability to cluster the input data without requiring the user to specify rules or targets during training. The solution we are proposing is an intelligent four-layered model, as shown below.

Figure 1: 4 layer structure

Starting from the bottom, the first layer is the graph database. All the data captured from internal and external sources will be stored in this layer in the form of a graph database. The data will be stored as nodes and relationships between nodes on the graph, which can be used for further analysis.

The Self-Organizing Feature Map (SOFM) layer is the layer above the graph database. This is the intelligent layer of the proposed model. SOFM uses its neural network algorithm to cluster together the variables fed to it as inputs from the graph database. The data points are clustered together based on their similarities in behavior and characteristics. The SOFM algorithm forms clusters without the user having to supply the conditions for clustering. The computing power of the machine is thereby used to find patterns in the data set that a human could not otherwise perceive.

The third layer is the service layer. It provides functions to add data, such as new customer/client information, and to retrieve data, such as clusters from the SOFM layer. The fourth and final layer is the Business layer, where the analyzed data is represented in graph form on a user interface and can be used to make business decisions.

BofA can use the interrelationships thus obtained, between BofA and its customers and between its customers and clients, in many ways. This model can help identify possible future defaulters, profitable customers, better service opportunities, and new business opportunities.
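The bottom (graph) layer described above can be illustrated with a minimal sketch. It is an assumption-laden stand-in, not the proposed implementation:

- Plain Python dictionaries replace Neo4J, and Neo4J's own API is not used.
- The names `Graph`, `populate` and `related`, and the row fields (`id`, `age`, `zip`, `customer`, `client`, `amount`), are hypothetical.

The sketch shows four things:

- one record per individual;
- relationships that carry their own attributes;
- a missing relationship read back as a null value (`None`);
- a simple traversal to the nodes related to a given node.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Hypothetical in-memory stand-in for the graph layer (not Neo4J's API)."""
    nodes: dict = field(default_factory=dict)          # node id -> attributes
    relationships: dict = field(default_factory=dict)  # (source, target) -> attributes

    def add_node(self, node_id, **attributes):
        # One record per individual: repeated loads merge into the same node
        self.nodes.setdefault(node_id, {}).update(attributes)

    def add_relationship(self, source, target, rel_type, **attributes):
        # Relationships carry their own searchable attributes
        self.relationships[(source, target)] = {"type": rel_type, **attributes}

    def relationship(self, source, target):
        # A relationship that does not exist is read back as None (the null value)
        return self.relationships.get((source, target))

    def related(self, node_id):
        # Traversal: ids of nodes directly connected to node_id, in either direction
        return sorted({t for s, t in self.relationships if s == node_id}
                      | {s for s, t in self.relationships if t == node_id})


def populate(graph, customer_rows, transaction_rows):
    """Load rows pulled from existing data stores (hypothetical row format)."""
    for row in customer_rows:
        graph.add_node(row["id"], kind="customer", age=row["age"], zip=row["zip"])
    for row in transaction_rows:
        graph.add_node(row["client"], kind="client")
        rel = graph.relationship(row["customer"], row["client"])
        if rel is None:
            graph.add_relationship(row["customer"], row["client"], "PAID",
                                   count=1, total=row["amount"])
        else:
            # Repeated transactions update the existing path instead of adding nodes
            rel["count"] += 1
            rel["total"] += row["amount"]
    return graph


if __name__ == "__main__":
    g = populate(Graph(),
                 [{"id": "c1", "age": 34, "zip": "28223"}],
                 [{"customer": "c1", "client": "taco_shop", "amount": 12.5},
                  {"customer": "c1", "client": "taco_shop", "amount": 7.5}])
    print(g.relationship("c1", "taco_shop"))  # {'type': 'PAID', 'count': 2, 'total': 20.0}
    print(g.relationship("c1", "bank"))       # None
    print(g.related("taco_shop"))             # ['c1']
```

In the proposed model, Neo4J itself would serve these operations. The sketch only mirrors the data layout the report describes.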

Section 1: Collection of Data

1.1 What has to be collected?
1. What is the profile of a customer (including personal information), and what type of products is he likely to buy? (to cross sell)
   a. Demographic
   b. Age
   c. Marital status
   d. Number of family members
   e. Address and zip code of customer
   f. Average balance
   g. Type of credit
   h. Services used in the past
   i. Deals used in the past
   j. Response to previous marketing attempts
2. Which bank products are often availed of together by which groups of customers? (to cross sell and do target marketing)
   a. 401 K
   b. Social security
3. What patterns in credit transactions lead to default and fraud? (to detect and deter fraud)
   a. Transactional data: number of transactions, amount of transactions, and the rate of increase or decrease in the number and amount of transactions (over 12 months)
   b. Ratio of credit to debit card use
   c. How many credit payments were missed vs. how many were made on time
   d. History of loan/mortgage payments
   e. Number of times credit card transactions were disputed (if above the national average)
   f. Logging on to a computer from an unknown location / purchases from an unknown location
   g. Purchasing unusual/irregular items (like a male purchasing female makeup items)
   h. All the above data for previous defaulters
4. What services and benefits would current customers likely desire? The bank can ask every customer (perhaps it already does) to list the top 5 services she would like Bank of America to offer her. People who may default or are likely to go bankrupt may express the same desires for services as previous defaulters, since they might be facing the same conditions. The bank can therefore check which desires were expressed by people who went bankrupt or defaulted in the past.

1.2 How to get and store this data
1. Set up the system to capture the data on a regular schedule (weekly, monthly). Pull the data when the load on the system is minimal.
2. From internal sources (for existing customers/clients): run scripts that pull data from existing BofA data stores and populate the graph.
3. From external sources (for existing and prospective customers/clients): use external data access that a third party has granted to BofA (e.g. Adobe, Huawei). For users, sources include third party feedback sites for restaurants and Glassdoor. Data can also be obtained from social sites such as Facebook through text mining, or through existing customers.
4. Storage of data: The data will be stored in an open source graph database called Neo4J. Each individual should be saved in the database as one record.
5. Organizing the data

The data thus gathered will be clustered into various groups using the model discussed in Section 2.

Section 2: Building the Model
The proposed model is a multi-layer model comprising four layers, listed as follows:
1. Graph Model Layer
2. Self Organizing Feature Map (SOFM) layer
3. Service layer
4. Business Layer
With the use of an artificial intelligence tool (SOFM), our model will also be able to analyze unstructured data with high efficiency. Each layer is described below.

Figure 2: 4 Layer Diagram

Layer 1: Graph Model Layer
At the very bottom of our network we will have a graph model of the data. This is the first layer of the model. It will hold the data collected in Section 1, arranged in a graph of nodes and relationships (the implementation itself is a graph database). Each noun in our domain (the bank, customers, clients, third party content providers, etc.) will be a node. The relationships between them will be the paths that connect the nodes. Relationships are treated as dimensions of these nodes, and they too contain searchable attributes. This makes it convenient to identify important relationships. Every node will essentially have a relationship with every other node in the graph. Where no relationship exists between two nodes, the relationship will be given a null value.

Figure 3: Graph Data sample

Layer 2: Self-Organizing Map Model
This is the second (intelligent) layer of the model, and it takes the data from layer 1 as its input. This layer is based on Self Organizing Feature Maps (Kohonen Maps), arguably one of the most efficient Artificial Neural Network algorithms, with a close resemblance to the learning pattern of the human mind. Each node in this map will be a data point, and its relationships with all the other nodes will be the dimensions of that data point. Additional dimensions (beyond the relationships already defined) can also be added to every data point. If certain dimensions/relationships are judged to carry more or less importance than others, a weight can be attached to that dimension/relationship.

Together with this test data, the network will be fed a comprehensive set of control variables. These are records of past customers whose outcomes are known, such as customers with bad debts or loan defaulters. The control variables are in the same format as the test data (i.e. with the same number of dimensions). The SOM algorithm will be run on this combined set to cluster the data into related groups. Each group/cluster will be characterized by the control variable it hosts.

In line with SOM theory, all the data points (nodes) in a particular cluster are expected to exhibit similar behavior, i.e. the behavior of the control variable. The output of the SOM therefore has the potential to predict the behavior of data nodes (current subjects) from the behavior of control variables (past, already known subjects). There is also always room to identify new and previously unknown patterns, i.e. patterns with no logical backing, or with logical backing that is unknown or insufficient to the user. Once these patterns are known, they can be used for making business decisions. The SOM algorithm can be made recurring or static, as the user requires. The same applies to the network learning rate.

Figure 4: SOFM Sample

Layer 3: Service Layer
This layer will be connected to all three other layers. Its function is to run queries and retrieve data and relationships from Layer 1, or cluster information from Layer 2. Data in Layer 1 can also be added or edited through this layer. This can be implemented in any client-server technology, such as Java or C#.NET.

Layer 4: Business Layer
This layer provides the functions and visualizations that meet user requirements. It presents the outputs as meaningful visuals that help decision makers evaluate the scenario. Client side browser technologies can provide an excellent framework for this. Three.js is a JavaScript framework suited to creating visual graphs, charts and reports.

Section 3: Integration and Interconnection between layers
The Neo4J graph database forms the lowermost layer of the model and provides the input to the SOM. It exposes its features through REST calls and also through a Java API. The REST structure frees the model from depending on any single implementation for the layers above. The Self-Organizing Map layer can be implemented in Java for better performance, using the Java API of Neo4J. The SOM receives input and instructions from the service layer, and after a map is computed, it is stored in the graph layer. The service layer provides services such as adding new customers, formulating queries, and retrieving clusters or patterns. The business/visual layer provides functions and visual indicators to decision makers.
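Layer 2, as described above, can be sketched in plain Python. The sketch covers four points from the text: relationships as dimensions, a null relationship read as 0.0, optional dimension weights, and control variables that label the clusters.

It is a textbook Kohonen map built on our own assumptions, which the report does not fix:

- relationship strengths are numbers;
- distance is squared Euclidean;
- the neighbourhood is Gaussian;
- the learning rate and radius decay linearly.

All function names are hypothetical.

```python
import math
import random


def to_vector(node, others, strength, weights=None):
    """One dimension per other node; a missing (null) relationship counts as 0.0.
    Optional weights scale dimensions judged more or less important."""
    weights = weights or {}
    return [weights.get(o, 1.0) * strength.get((node, o), 0.0)
            for o in others if o != node]


def best_unit(units, x):
    """Grid position of the map unit whose weight vector is closest to x."""
    return min(units, key=lambda p: sum((w - v) ** 2 for w, v in zip(units[p], x)))


def train_som(data, rows, cols, epochs=300, lr0=0.5, seed=0):
    """Train a rows x cols Kohonen map on equal-length vectors (no targets given)."""
    rng = random.Random(seed)
    radius0 = max(rows, cols) / 2
    # Initialise every unit's weights from a randomly chosen data point
    units = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    for t in range(epochs):
        frac = t / epochs
        lr = lr0 * (1 - frac)                    # learning rate decays linearly
        radius = max(radius0 * (1 - frac), 0.5)  # neighbourhood shrinks, floor 0.5
        x = rng.choice(data)
        win = best_unit(units, x)
        for pos, w in units.items():
            d2 = (pos[0] - win[0]) ** 2 + (pos[1] - win[1]) ** 2
            h = math.exp(-d2 / (2 * radius ** 2))  # Gaussian neighbourhood
            for i in range(len(w)):
                w[i] += lr * h * (x[i] - w[i])
    return units


def label_units(units, controls):
    """Record which known outcomes (control variables) each unit hosts."""
    labels = {}
    for vec, outcome in controls:
        labels.setdefault(best_unit(units, vec), []).append(outcome)
    return labels


def predict(units, labels, x):
    """Most common outcome among controls sharing x's unit, or None if it hosts none."""
    hosted = labels.get(best_unit(units, x))
    if not hosted:
        return None
    return max(sorted(set(hosted)), key=hosted.count)


if __name__ == "__main__":
    # Control records: past customers with known outcomes (toy numbers)
    controls = [([0.9, 0.1, 0.8], "default"), ([0.1, 0.9, 0.2], "repaid")]
    # Current customers (test data) in the same three dimensions
    current = [[0.85, 0.15, 0.9], [0.15, 0.8, 0.1],
               [0.95, 0.05, 0.85], [0.05, 0.95, 0.15]]
    units = train_som(current + [v for v, _ in controls], rows=1, cols=2, seed=1)
    labels = label_units(units, controls)
    print(predict(units, labels, current[0]))
```

A few notes on the design choices:

- The toy example uses a 1 x 2 map only to stay small. A real deployment would use a larger grid and many more dimensions.
- Calling `train_som` again on refreshed data is one way to read the "recurring" option mentioned for Layer 2.
- Keeping the first map corresponds to the "static" option.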

Section 4: Impact of exogenous effect or external changes to the data

Graph Model (Layer 1)
When relationships are altered, only the paths connecting the affected nodes change. Since these paths are independent of the nodes themselves and of other paths, no other entity in the model should be affected.

SOFM (Layer 2)
If the network is recurrent, some changes in clustering can be expected due to external changes. Whenever a data point is fired at this network, it causes the dimensions of the winner node to adapt (given that the network is recurrent), but the change is limited to the winner node only. If the winner node happens to lie on a cluster's border, some change in cluster formation can also be expected. Behavioral changes or other exogenous effects will therefore affect the map.

Service and Business Layer (Layer 3 and 4)
The output of these layers depends on the input from layers 1 and 2. Therefore any changes in layers 1 and 2 will also cause some changes in layers 3 and 4.

Section 5: How can it be used?
This model can help BofA in numerous ways. Its applications are as flexible as the model itself. They range from finding the behavior patterns of defaulters or profitable customers to developing and targeting future marketing plans.

Internal Use

A. Predictable Groups
In some cases, the profile of a target audience is already known. It is then useful to bypass the self-organizing layer and directly retrieve a group similar to the target profile. The model handles these queries efficiently through the node-relationship traversals that are inherent to graphs.

B. Unpredicted Patterns
Because we use a self-organizing map, clusters of similar nodes (customers) will form whose relationships could not have been predicted by manual logic. The characteristics of the nodes in these clusters are either known or can be predicted. These clusters can be used to classify new customers who have entered the system, and to identify groups to be targeted for future business. The model can identify relationships that were not previously known to BofA. This can help develop new business opportunities or identify unforeseen risks.

Examples:
To identify possible future defaulters: the attributes of past defaulters are plugged in together with those of existing customers or clients, and the model is used to try to identify possible future defaulters. Potential future defaulters will form a cluster around the node of past defaulters and can therefore be identified.
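The defaulter example and the winner-only adaptation described in Section 4 can be sketched against a map that has already been trained. The assumptions behind the sketch:

- The two-unit map below is hand-written for illustration, and its values are hypothetical.
- The dimension meanings are hypothetical.
- The function names are hypothetical.
- The winner-only update follows the report's wording. A standard Kohonen update would also move the winner's neighbours.

```python
def nearest(units, x):
    """Grid position of the unit whose weights are closest to x (squared Euclidean)."""
    return min(units, key=lambda p: sum((w - v) ** 2 for w, v in zip(units[p], x)))


def flag_possible_defaulters(units, past_defaulters, customers):
    """Ids of current customers whose winning unit also hosts a past defaulter."""
    risky_units = {nearest(units, vec) for vec in past_defaulters}
    return sorted(cid for cid, vec in customers.items()
                  if nearest(units, vec) in risky_units)


def update_winner_only(units, x, lr=0.1):
    """Fire one new data point at a recurrent map: only the winning unit adapts."""
    win = nearest(units, x)
    units[win] = [w + lr * (v - w) for w, v in zip(units[win], x)]
    return win


if __name__ == "__main__":
    # Hypothetical two-unit map: dimension 0 ~ missed payments, 1 ~ on-time payments
    units = {(0, 0): [0.9, 0.1], (0, 1): [0.1, 0.9]}
    customers = {"c1": [0.8, 0.2], "c2": [0.2, 0.7], "c3": [0.85, 0.1]}
    print(flag_possible_defaulters(units, [[0.95, 0.05]], customers))  # ['c1', 'c3']
    print(update_winner_only(units, [1.0, 0.0], lr=0.5))               # (0, 0)
    print(units[(0, 1)])                                               # [0.1, 0.9]
```

Only `units[win]` is reassigned, so every other unit is left untouched, matching the Section 4 description. If the winner sits on a cluster border, its move can change which customers it captures.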

To identify the most profitable customers: BofA knows the behavior patterns of its most profitable customers. The attribute values of these customers are plugged in together with those of current customers/clients, and the model is used to try to identify future profitable customers. Both of these methods, and similar ones, can be used to identify risk-prone ventures, to find illegal transactions, and to offer incentives to a particular group.

External Use
The service layer of this framework can be provided to external parties as a product. If outsiders are given access, the system must be flexible enough to handle whatever query is fired at it. Our system achieves this through its graph structure. In a traditional setup, providing access to outsiders is difficult and costly because the data is unstructured. A traditional database imposes a rigid structure, which makes it difficult to find patterns the system is not already programmed to find.

For example, Ramkatori owns a taco shop in a shopping area. Recently, her sales have been declining. She wants to find out why but does not know where to start looking. She contacts BofA and obtains some limited access to its framework in exchange for something in return. She queries the framework for usage patterns in her shopping area and finds out that customers are flocking to other taco shops that offer incentives. Using the framework again, she formulates a marketing drive aimed at her target audience and successfully revives her business. This is possible because a graph makes it easy to identify Ramkatori's node and find the nodes of customers related to her. In a traditional setup, such a query would have been very difficult.

Concluding Discussion
This report responds to the BofA campus challenge program. It discusses a novel approach to database management and utilization based on a clustering algorithm from artificial intelligence, i.e. Self Organizing Feature Maps. The report describes the structure and integration of a four-layer model that performs analytics from database storage through to visual business presentation of that data. The proposed approach can handle unstructured data. It can be used for a wide variety of decision-making problems, ranging from credit risk to marketing and sales analysis.

Five Great Minds (Our Team):
1. Kunal Bakshi: Kunal Bakshi received the ME degree in electrical engineering from Lamar University, Beaumont, TX in 2007. He is presently pursuing his MBA in business analytics at Belk College of Business (UNC at Charlotte). His areas of interest are Business Intelligence and Analytics, Engineering Management and Electrical Engineering.
2. Neeraj Sharma: Neeraj Sharma received the B.Tech degree in electrical engineering from Malaviya National Institute of Technology-Jaipur, India in 2010. He has worked in the petroleum industry with the Govt. of India in marketing and operations, and has also authored many technical papers in the field of Artificial Intelligence. He is presently pursuing his MBA in business analytics at Belk College of Business (UNC at Charlotte). His areas of interest are Business Analytics, Artificial Intelligence and Energy Management.
3. Nikhil Jain: Nikhil Jain completed a Bachelor degree in electronics engineering, followed by a post graduate diploma in insurance management, and has 3 years of professional experience in the relevant field. He is pursuing a Masters in Mathematical Finance at the University of North Carolina at Charlotte. His areas of interest are Quantitative Analysis, Risk Management and Business Intelligence.
4. Parimal Deshmukh: Parimal Deshmukh is a graduate student in the College of Computing and Informatics, UNC Charlotte. He received his bachelor's degree in Computer Science from the University of Pune. His areas of interest are software engineering, Database Management, programming languages, and game development & gamification.
5. Ravinandan Venkatesh: Ravinandan Venkatesh is a final semester MBA student at UNC Charlotte. He has an undergraduate degree in Mechanical Engineering from The National Institute of Engineering, Mysore. His areas of interest are Business Analytics, Financial Management, and Supply Chain Management.