DATA
Presented by: Daniya Boges [Data Scientist]
Disconnected Systems of Heterogeneous Nature
MINERETS tech
SAGE
MICROSYSTEMS
CDK Global Autoline Drive DMS
HOW TO REACH OUR DATA
As MANY and As ONE [1]
Table Linking and Federation [2] | Pros: easy, low cost | Cons: performance degradation
[2] L. M. Haas, E. T. Lin and M. A. Roth, "Data integration through database federation," in IBM Systems Journal, vol. 41, no. 4, pp. 578-596, 2002, doi: 10.1147/sj.414.0578.
[3] N. Biswas, A. Sarkar and K. C. Mondal, "Efficient incremental loading in ETL processing for real-time data integration," in Innovations in Systems and Software Engineering, vol. 16, pp. 53-61, 2020, doi: 10.1007/s11334-019-00344-
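As a rough illustration of table linking, the sketch below lets one SQLite connection query two separately attached databases. This is an assumption-laden toy, not the setup used in this project: the schema names `sage` and `minerets`, the `customers` table, its columns, and the sample rows are all hypothetical, and real federation across vendor systems would go through a dedicated federation layer.

```python
import sqlite3


def federated_customers():
    """Query two attached databases through one connection (toy federation)."""
    conn = sqlite3.connect(":memory:")
    # Each ATTACH creates a separate in-memory database that stands in
    # for one source system (hypothetical names).
    conn.execute("ATTACH DATABASE ':memory:' AS sage")
    conn.execute("ATTACH DATABASE ':memory:' AS minerets")
    for schema in ("sage", "minerets"):
        conn.execute(f"CREATE TABLE {schema}.customers (mobile TEXT, name TEXT)")

    # Hypothetical sample rows; not taken from the real data.
    conn.executemany(
        "INSERT INTO sage.customers VALUES (?, ?)",
        [("966500000001", "Customer A"), ("966500000002", "Customer B")],
    )
    conn.executemany(
        "INSERT INTO minerets.customers VALUES (?, ?)",
        [("966500000001", "Customer A")],
    )

    # A single query spans both sources; the data stays where it lives.
    rows = conn.execute(
        """
        SELECT 'sage' AS source, mobile, name FROM sage.customers
        UNION ALL
        SELECT 'minerets' AS source, mobile, name FROM minerets.customers
        ORDER BY source, mobile
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in federated_customers():
        print(row)
```

Every query is resolved against the sources at read time, which is why this approach is cheap to set up but tends to slow down as the sources grow.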
SNIPPETS FROM WITHIN
Demonstrating redundancy in customer data:
? SAMEER(S) FOUND!
FIND THIS CUSTOMER: SAMEER M NAWAR
Provided his phone number is: +966505673025
INCONSISTENCIES
SYNTAX: CusstomerID
ADDED SPACES: | <space>CustomerID<space> |
SEMANTICS: ..
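The syntax and spacing inconsistencies above could be handled along these lines. This is a minimal sketch, assuming a known list of canonical field names (`CANONICAL_FIELDS` is hypothetical) and using fuzzy matching from Python's standard `difflib`; it is not the cleansing procedure used in the project, and semantic inconsistencies are out of its reach.

```python
import difflib

# Hypothetical canonical field names; the real schema is not shown in the slides.
CANONICAL_FIELDS = ["CustomerID", "VIN", "Mobile"]


def normalize_column(name, canonical=CANONICAL_FIELDS, cutoff=0.8):
    """Strip padding and map near-miss spellings onto a canonical field name."""
    stripped = name.strip()  # handles "<space>CustomerID<space>"
    matches = difflib.get_close_matches(stripped, canonical, n=1, cutoff=cutoff)
    # Fall back to the stripped name when nothing is similar enough.
    return matches[0] if matches else stripped


if __name__ == "__main__":
    for raw in ["CusstomerID", " CustomerID ", "Unrelated"]:
        print(repr(raw), "->", repr(normalize_column(raw)))
```

The `cutoff` value is a judgment call: set too low it merges distinct fields, set too high it misses misspellings.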
OPERATION CLEANSING
CUSTOMERS, VEHICLES, INVOICES
MINERETS tech, SAGE, MICROSYSTEMS
Customer | Vehicle | Invoice
Date of export in raw form: 2021-01-14
INVOICES, CUSTOMERS, VEHICLES
MINERET (Minerets + SAGE): 2020-07-11 to 2021-01-14 (6 months, 3 days)
SAGE: 2013-01-22 to 2020-07-11 (7 years, 5 months, 19 days)
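The coverage spans can be checked directly from the boundary dates. The helper below is a small standard-library sketch (the function name `calendar_span` is illustrative) that counts whole months first and then the leftover days.

```python
import calendar
from datetime import date


def calendar_span(start, end):
    """Return (years, months, days) between two dates, assuming start <= end."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the last month is not complete yet

    # Move `start` forward by the whole months, clamping the day to the
    # length of the target month (e.g. Jan 31 + 1 month -> Feb 28/29).
    year_offset, month_index = divmod(start.month - 1 + months, 12)
    anchor_year = start.year + year_offset
    anchor_month = month_index + 1
    last_day = calendar.monthrange(anchor_year, anchor_month)[1]
    anchor = date(anchor_year, anchor_month, min(start.day, last_day))

    days = (end - anchor).days
    return months // 12, months % 12, days


if __name__ == "__main__":
    print(calendar_span(date(2013, 1, 22), date(2020, 7, 11)))  # (7, 5, 19)
    print(calendar_span(date(2020, 7, 11), date(2021, 1, 14)))  # (0, 6, 3)
```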
BIRD’S-EYE VIEW: 170 fields
SAGE
MICROSYSTEMS
MINERETS tech
CDK Global Autoline Drive DMS
BIRD’S-EYE VIEW: 40 fields (shrunk)
OPERATION CLEANSING
Cross-platform KEYS:
1. Mobile numbers
2. Vehicle Identification Number (VIN)
4,165,299
#3,012,907
PS: Duplicate customers have been stripped from the final results, so every remaining customer is unique.
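One way the two cross-platform keys might be normalised before matching and de-duplication is sketched below. The assumptions are loud ones: mobile numbers are treated as Saudi numbers (country code 966, as in the example number shown earlier), the record layout with a `mobile` field is hypothetical, and the sample values are made up. The VIN check relies only on the standard rule that a VIN has 17 characters and never contains I, O, or Q.

```python
import re


def normalize_mobile(raw, country_code="966"):
    """Reduce a mobile number to digits with a leading country code."""
    digits = re.sub(r"\D", "", raw)  # drop "+", spaces, dashes
    if digits.startswith("00"):  # international prefix written as 00
        digits = digits[2:]
    if digits.startswith(country_code):
        return digits
    # Assumed local format: strip the trunk zero, prepend the country code.
    return country_code + digits.lstrip("0")


def normalize_vin(raw):
    """Return an upper-cased 17-character VIN, or None if it is malformed."""
    vin = re.sub(r"[\s-]", "", raw).upper()
    return vin if re.fullmatch(r"[A-HJ-NPR-Z0-9]{17}", vin) else None


def deduplicate(records):
    """Keep the first record seen for each normalised mobile number."""
    unique = {}
    for record in records:
        unique.setdefault(normalize_mobile(record["mobile"]), record)
    return list(unique.values())


if __name__ == "__main__":
    # Hypothetical records: the first two are the same customer.
    sample = [
        {"name": "Customer A", "mobile": "+966 50 000 0001"},
        {"name": "CUSTOMER A", "mobile": "0500000001"},
        {"name": "Customer B", "mobile": "00966500000002"},
    ]
    print(len(deduplicate(sample)))
    print(normalize_vin("1hgcm-82633 a004352"))
```

Keeping the first record per key is the simplest policy; a real pipeline would more likely merge fields from all matching records.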
EVALUATION PER ATTRIBUTE
FIELD
WRAPPING UP
IN CONCLUSION