
BlackRock

Data Science

Team’s Work
[Diagram: NAV as reported by the Custodian and by Aladdin, each broken down into Expenses, Reclaims, Assets, and Cash]

Problem Statement
• When the NAV or any part of its breakdown does not match, an Exception is raised.
• Exceptions are shown on the Exception Monitor.
• For 1 year, a person classified Exceptions manually.
• Exceptions are generated every day and are not classified automatically.

• Problem Statement – Classify Exceptions
• Data – Portfolio NAV and its breakdown [1 year of data]

Data Extraction
• Data from table port_group
• Portfolio and its aggregated sleeves [portfolio_code]

• Data from table port_Nav
• Portfolio NAV and its breakdown

• Data from Exception Monitor
• Java code for combining all data
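The combining step might look roughly like the sketch below. The slides do not show the actual Java code, so the class name, the method name, and the idea of holding each source as a map keyed by portfolio_code are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: join rows from port_group and port_Nav on portfolio_code. */
final class PortfolioJoin {

    /**
     * Inner-joins two maps keyed by portfolio_code.
     * Each value holds the remaining columns of that row as a CSV fragment.
     */
    static Map<String, String> joinByPortfolioCode(Map<String, String> groupRows,
                                                   Map<String, String> navRows) {
        Map<String, String> combined = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : groupRows.entrySet()) {
            String navColumns = navRows.get(entry.getKey());
            if (navColumns != null) {
                // Keep only portfolios that appear in both sources.
                combined.put(entry.getKey(), entry.getValue() + "," + navColumns);
            }
        }
        return combined;
    }
}
```

Rows from the Exception Monitor could be attached the same way, again keyed by portfolio_code.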

Data Cleaning
• Removed outliers
• Removed ambiguity [Accured Expenses, Accrued Expenses, Accrued Expense: all are the same]
• Levels of Comments: 97
• Reduced to 30 levels
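One way to collapse spelling variants of a comment into a single class is a lookup table over a normalized key. The sketch below is illustrative only: the class name is hypothetical, just the three variants named on the slide are listed, and the canonical spelling chosen for them is an assumption.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch: map spelling variants of a comment to one class label. */
final class CommentNormalizer {

    // Variant -> canonical label. The canonical spelling is an assumption.
    private static final Map<String, String> CANONICAL = new HashMap<>();
    static {
        CANONICAL.put("accured expenses", "Accrued Expenses");
        CANONICAL.put("accrued expenses", "Accrued Expenses");
        CANONICAL.put("accrued expense", "Accrued Expenses");
    }

    static String normalize(String rawComment) {
        // Trim, collapse repeated whitespace, and lower-case to build the lookup key.
        String key = rawComment.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
        // Labels that are not in the table pass through, trimmed but otherwise unchanged.
        return CANONICAL.getOrDefault(key, rawComment.trim());
    }
}
```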

Data
• Portfolio_name
• Portfolio_code
• Portfolio_group
• Company Name
• P. NAV [continuous]
• Q. NAV [continuous]
• P. Exclaims [continuous]
• P. Reclaims [continuous]

• P. Asset [continuous]
• P. Cash [continuous]
• Q. Exclaims [continuous]
• Q. Reclaims [continuous]
• Q. Asset [continuous]
• Q. Cash [continuous]
• Start time
• End time
• Status [Class]

Features Selected
• P. NAV [continuous]
• Q. NAV [continuous]
• P. Exclaims [continuous]
• P. Reclaims [continuous]
• P. Asset [continuous]
• P. Cash [continuous]
• Q. Exclaims [continuous]
• Q. Reclaims [continuous]
• Q. Asset [continuous]
• Q. Cash [continuous]

Features Extracted

• f1 - NAV Diff (BPS): [(NAV1 - NAV2) / NAV1] * 1000
• Continuous
• f2 – f5 Expense Diff (BPS): [(Expense1 - Expense2) / NAV1] * 1000
• Continuous
• f6 – f7 Diff(Asset - Cash), Diff(Exclaim - Reclaim)
• Continuous
• f8 – f11 Is Expense/Rec/Asset/Cash from Aladdin?
• Binary
• f12 – f15 Is Expense/Rec/Asset/Cash from Custodian?
• Binary
• f16 – f19 % Contribution of Expense in Total NAV
• Continuous
• f20 – f24 Impact of Expense?
• Binary
• f25 - NAV diff > 10?
• f26 - NAV diff > 20?
• Normalization on f1 – f7
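A few of these features can be sketched as follows. All names are hypothetical. The slide multiplies by 1000, whereas a basis point is conventionally 1/10,000 of the base, so the factor is kept as a named constant. The slide also does not say which normalization was applied or whether the thresholds use the absolute difference; min-max scaling and a signed comparison are assumptions here.

```java
/** Hypothetical sketch of a few of the extracted features. */
final class ExceptionFeatures {

    // Factor taken from the slide; change to 10000.0 for conventional basis points.
    static final double SCALE = 1000.0;

    /** f1: NAV difference relative to NAV1, scaled. */
    static double navDiff(double nav1, double nav2) {
        return (nav1 - nav2) / nav1 * SCALE;
    }

    /** f2-f5 pattern: difference of one breakdown component relative to NAV1, scaled. */
    static double componentDiff(double component1, double component2, double nav1) {
        return (component1 - component2) / nav1 * SCALE;
    }

    /** f25/f26 pattern: 1 if the NAV difference exceeds the threshold, else 0. */
    static int exceeds(double navDiff, double threshold) {
        return navDiff > threshold ? 1 : 0;
    }

    /** Min-max scaling to [0, 1] (assumed; the slide only says "Normalization"). */
    static double[] minMax(double[] values) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double[] scaled = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            // A constant column maps to 0 to avoid dividing by zero.
            scaled[i] = max == min ? 0.0 : (values[i] - min) / (max - min);
        }
        return scaled;
    }
}
```

For example, `ExceptionFeatures.exceeds(ExceptionFeatures.navDiff(nav1, nav2), 10)` would give f25 for one row.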

Model Used
• Random Forest
• Accuracy - 74% on testing data

BlackRock
Software Development

Problem Statement
• Given 2 data sets
• Compute the difference between them.
• Generate a report (.csv)
• Useful for business people
• Difference between 2 .csv files

• Make a shell script for regression testing in the local environment
• Useful for tech people
• See what a code change breaks before it goes into the tst [test] environment

Product features
• Diffs a data set of 40 lakh (4 million) entries in 5 minutes
• Can define Ignore columns
• Ignore the time column [it will differ in data returned by a SQL query]

• Can define a Match ruleset
• If src_a [Net_Asset] – src_b [Net_Asset] < 0.01
• Treat it as a Match

• Can rename columns
• Match col1 from src_a to col2 from src_b
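A match rule of this kind can be sketched as a small tolerance check. The class below is hypothetical and is not the tool's actual code. It assumes the rule is applied to the absolute difference, which the slide does not state.

```java
/** Hypothetical sketch of a tolerance-based match rule on a numeric column. */
final class ToleranceRule {

    private final double tolerance;

    ToleranceRule(double tolerance) {
        this.tolerance = tolerance;
    }

    /** Two values match when their absolute difference is below the tolerance. */
    boolean matches(String valueA, String valueB) {
        double a = Double.parseDouble(valueA.trim());
        double b = Double.parseDouble(valueB.trim());
        return Math.abs(a - b) < tolerance;
    }
}
```

With `new ToleranceRule(0.01)`, a Net_Asset of 100.005 in src_a and 100.00 in src_b would be reported as a match rather than a difference.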

Code Input
• Path of Src_a
• Path of Src_b
• Key_column : Uniquely identifies the row [ex: portfolio_code, transactional_id]
• Ignore_column : creation time
• Integer_column
• Match set : src_a [column_name] operator src_b [column_name] operator value
• Rename_column : src_a column and src_b column
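To show how Key_column and Ignore_column could drive the comparison, here is a minimal in-memory sketch. It is not the tool's implementation: the class name, the nested-map representation (key value, then column name to cell value), and the report line format are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: key-based diff of two data sets held in memory. */
final class KeyedDiff {

    /**
     * Each data set maps a key value to (column name -> cell value).
     * Returns one report line per difference, in sorted key order.
     */
    static List<String> diff(Map<String, Map<String, String>> srcA,
                             Map<String, Map<String, String>> srcB,
                             Set<String> ignoreColumns) {
        List<String> report = new ArrayList<>();
        Set<String> keys = new TreeSet<>(srcA.keySet());
        keys.addAll(srcB.keySet());
        for (String key : keys) {
            Map<String, String> rowA = srcA.get(key);
            Map<String, String> rowB = srcB.get(key);
            if (rowA == null) {
                report.add(key + ",ONLY_IN_SRC_B");
                continue;
            }
            if (rowB == null) {
                report.add(key + ",ONLY_IN_SRC_A");
                continue;
            }
            Set<String> columns = new TreeSet<>(rowA.keySet());
            columns.addAll(rowB.keySet());
            for (String column : columns) {
                if (ignoreColumns.contains(column)) {
                    continue; // e.g. a creation-time column that always differs
                }
                String a = rowA.get(column);
                String b = rowB.get(column);
                if (!Objects.equals(a, b)) {
                    report.add(key + "," + column + "," + a + "," + b);
                }
            }
        }
        return report;
    }
}
```

Keying rows by the Key_column value means row order in the two files does not matter. A tool that has to handle millions of rows would likely stream the files rather than hold both in memory.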

Code Output
[Figure: sample output report, annotated with: Sources, Hyperlink, Difference integer]

Regression Testing
[Diagram: Shell Script. Run Code [Before] and Run Code [After]; each run saves the DB (.csv); DiffTool (.java) compares the two files and produces the Report.]
• Code : Make changes in DB
• Report : Expected Changes, Actual Difference
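One possible reading of the report step is a comparison between the changes a developer expected and the differences the diff tool actually found. The sketch below assumes each change is identified by a string such as "key,column"; the class and method names are hypothetical.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: compare expected changes with the actual differences. */
final class RegressionCheck {

    /** Differences that were found but not expected (possible regressions). */
    static Set<String> unexpected(Set<String> expected, Set<String> actual) {
        Set<String> result = new TreeSet<>(actual);
        result.removeAll(expected);
        return result;
    }

    /** Expected changes that did not show up in the actual differences. */
    static Set<String> missing(Set<String> expected, Set<String> actual) {
        Set<String> result = new TreeSet<>(expected);
        result.removeAll(actual);
        return result;
    }
}
```

If both sets come back empty, the code change did exactly what was expected and nothing else.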