
Prathima

Senior Data Analyst


Email: Prathimak107@gmail.com
Phone: 913-703-3092
Background Summary:
 Over 8 years of technical IT experience in all phases of the Software Development Life Cycle (SDLC),
with skills in data analysis, design, development, testing, and deployment of software systems.
 Extensive knowledge of Business Intelligence and Data Warehousing concepts, with emphasis
on ETL and the Software Development Life Cycle (SDLC).
 Excellent working knowledge of Data Analysis, Data Validation, Data Cleansing, Data Verification, and
identifying data mismatches.
 Extensive experience in relational data modeling, dimensional data modeling, logical/physical
design, ER diagrams, and OLTP/OLAP system study and analysis.
 Proficient in multiple databases including MongoDB, Cassandra, MySQL, Oracle, and MS SQL Server.
Worked with different file formats such as delimited files, Avro, JSON, and Parquet. Docker container
orchestration using ECS, ALB, and Lambda.
 Utilized all Tableau tools including Tableau Desktop, Tableau Server, Tableau Reader, and Tableau
Public.
 Experience in creating Power BI Dashboards (Power View, Power Query, Power Pivot, Power
Maps).
 Utilized analytical applications like R, SPSS, Rattle and Python to identify trends and relationships
between different pieces of data, draw appropriate conclusions and translate analytical findings into
risk management and marketing strategies that drive value.
 Strong understanding of the principles of data warehousing, fact tables, dimension tables, and Star
and Snowflake schema modeling.
 Experienced working with Excel Pivot and VBA macros for various business scenarios.
 Strong experience in Data Analysis, Data Migration, Data Cleansing, Transformation, Integration,
Data Import, and Data Export using multiple ETL tools such as Ab Initio and Informatica
PowerCenter. Experience in testing and writing SQL and PL/SQL statements: stored procedures,
functions, triggers, and packages.
 Created Snowflake schemas by normalizing the dimension tables as appropriate and creating a
sub-dimension named Demographic as a subset of the Customer dimension.
 Hands-on experience in test-driven development (TDD), behavior-driven development (BDD), and
acceptance-test-driven development (ATDD) approaches.
 Experience in developing data pipelines using AWS services including EC2, S3, Redshift, Glue,
Lambda, Step Functions, CloudWatch, SNS, DynamoDB, and SQS.
 Extensive experience in text analytics, generating data visualizations using R and Python, and creating
dashboards using tools like Tableau and Power BI.
 Involved in Data Migration projects from DB2 and Oracle to Teradata. Created automated scripts to
do the migration using UNIX shell scripting, Oracle/TD SQL.
 Proficient with data warehouse models like the Star schema and Snowflake schema.
 Expertise in Java programming with a good understanding of OOP, I/O, Collections, Exception
Handling, Lambda Expressions, and Annotations.
 Worked in the Microsoft Azure environment (Blob Storage, Data Lake, AzCopy) using Hive as the
extraction language.
 Experienced in building automated regression scripts for validation of ETL processes between
multiple databases such as Oracle, SQL Server, Hive, and MongoDB using Python (see the sketch after this summary).
 Extensively worked with the Teradata utilities FastExport and MultiLoad to export and load data
to/from different source systems, including flat files.
 Utilized Azure Data Factory to transform and move data between virtual machines, Data Factory,
Blob Storage, and SQL Server.
 Experience in Tableau Desktop and Power BI for data visualization, reporting, and analysis: crosstabs,
scatter plots, geographic maps, pie charts, bar charts, page trails, and density charts.
 Worked with the Informatica Data Quality (IDQ) toolkit: analysis, data cleansing, data matching, data
conversion, exception handling, and the reporting and monitoring capabilities of IDQ.
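
As a hedged illustration of the Python-based ETL validation scripts mentioned above, the following is a minimal sketch comparing row counts and a column total between a source and a target table. The connection URLs, table names, and column names are placeholders, not values from any actual project.

```python
# Hypothetical sketch: compare row counts and a column checksum between a
# source (Oracle) table and a target (SQL Server) table after an ETL load.
# Connection URLs, table names, and columns are placeholders only.
from sqlalchemy import create_engine, text

SOURCE_URL = "oracle+cx_oracle://user:pwd@source-host:1521/?service_name=ORCL"
TARGET_URL = "mssql+pyodbc://user:pwd@target-dsn"

def fetch_metrics(engine, table, amount_col):
    """Return (row_count, total_amount) for the given table."""
    with engine.connect() as conn:
        rows = conn.execute(text(f"SELECT COUNT(*) FROM {table}")).scalar()
        total = conn.execute(text(f"SELECT SUM({amount_col}) FROM {table}")).scalar()
    return rows, total

def validate(source_table="STG_PAYMENTS", target_table="dbo.payments",
             amount_col="amount"):
    src = fetch_metrics(create_engine(SOURCE_URL), source_table, amount_col)
    tgt = fetch_metrics(create_engine(TARGET_URL), target_table, amount_col)
    mismatches = []
    if src[0] != tgt[0]:
        mismatches.append(f"row count: source={src[0]} target={tgt[0]}")
    if src[1] != tgt[1]:
        mismatches.append(f"sum({amount_col}): source={src[1]} target={tgt[1]}")
    return mismatches

if __name__ == "__main__":
    issues = validate()
    print("PASS" if not issues else "FAIL: " + "; ".join(issues))
```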

Technical Skills:
 Programming Languages: R, Python, MATLAB, VB, Java, C, C++, SQL, MySQL, PL/SQL
 ETL Tools: Informatica Power Center 9.1/8.6 (Designer, Workflow Manager/ Monitor, Repository), Ab
Initio
 Testing Tools: Jira, HP ALM, IBM ClearQuest, IBM RQM, MTM, SDLC
 Database Tools: Oracle SQL Developer, Toad, Oracle 10g/11g/12c, MS SQL Server, SSIS, SSRS, Data Grid
 BI and Analytics Tools: OBIEE, Oracle Reports Builder, Spotfire, Tableau 10.5, Pandas, Seaborn,
Matplotlib, Cognos, Excel, SAS, SAS Enterprise Miner
 Operating System/Framework: Windows, Linux, Macintosh, UNIX
 Cloud Technologies: AWS (Amazon Web Services), Microsoft Azure
 Data Modeling: Regression Modeling, Time Series Modeling, PDE Modeling, Star-Schema Modeling,
Snowflake-Schema Modeling, Fact and Dimension tables
 Statistical Modelling/Algorithms: Regression Modelling, Time Series Analysis, ACT, Random Forest

Project Experience:

Senior Data Analyst | Jan 2021 – Present


Client: Freddie Mac, VA

 Created data dictionary, Data mapping for ETL and application support, DFD, ERD, mapping
documents, metadata, DDL and DML as required.
 Analyzed functional and non-functional categorized data elements for data profiling and mapping
from the source to the target data environment. Developed working documents to support findings and
assign specific tasks.
 Analyzed DB discrepancies and synchronized the Staging, Development, UAT and Production DB
environments with data models.
 Worked on data profiling & various data quality rules development using Informatica Data Quality.
 Performed data analysis and data profiling using complex SQL on various source systems, including
Oracle and Teradata.
 Constructed AWS data pipelines using VPC, EC2, S3, Auto Scaling Groups (ASG), EBS, Snowflake,
IAM, CloudFormation, Route 53, CloudWatch, CloudFront, and CloudTrail.
 Wrote AWS Lambda code in Python for converting, comparing, and sorting nested JSON files (see the sketch after this list).
 Experienced in ETL concepts, building ETL solutions and Data modeling
 Implemented Apache Airflow for authoring, scheduling, and monitoring Data Pipelines
 Architected, designed, and developed business applications and data marts for reporting. Involved in
different phases of the development life cycle including Analysis, Design, Coding, Unit Testing, Integration
Testing, Review, and Release as per the business requirements.
 Designed several DAGs (Directed Acyclic Graphs) for automating ETL pipelines.
 Performed data extraction, transformation, loading, and integration in data warehouse, operational
data stores and master data management
 Performed Extraction, Transformation and Loading (ETL) using Informatica PowerCenter.
 Exported the analyzed data to relational databases using Sqoop for visualization and to generate
reports for the BI team using Tableau.
 Wrote UNIX shell scripts to automate jobs and scheduled them as cron jobs using crontab.
 Connected to AWS EC2 using SSH and ran spark-submit jobs.
 Designed and implemented a configurable data delivery pipeline, built with Python, for scheduled
updates to customer-facing data stores.
 Designed and constructed AWS data pipelines using various AWS resources, including AWS API
Gateway to receive responses from AWS Lambda, and Lambda functions to retrieve data from Snowflake
and convert the responses into JSON format, backed by Snowflake, DynamoDB, AWS Lambda, and
AWS S3.
 Proficient in machine learning techniques (Decision Trees, Linear/Logistic Regression) and statistical
modeling.
 Optimized the TensorFlow model for efficiency.
 Analyzed the system for new enhancements/functionalities and performed impact analysis of the
application for implementing ETL changes.
 Wrote SQL, PL/SQL, stored procedures for implementing business rules and transformations.
 Developed mappings using transformations such as Expression, Filter, Joiner, and Lookup for better
data massaging and to migrate clean and consistent data.
 Worked with Confluence and Jira.
 Extensively used ERwin to design logical and physical data models and relational databases, and to
perform forward/reverse engineering.
 Built performant, scalable ETL processes to load, cleanse and validate data.
 Developed various Mappings with the collection of all Sources, Targets, and Transformations using
Informatica Designer
 Prepared associated documentation for specifications, requirements, and testing.
 Identified and recorded defects with the information required for the development team to
reproduce issues.
 Designed and developed database models for the operational data store, data warehouse, and
federated databases to support client enterprise Information Management Strategy.
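
The sketch below illustrates the kind of Lambda handler referenced in this list: a Python function that flattens, sorts, and rewrites nested JSON objects arriving via S3 events. The bucket names, keys, and field names are hypothetical assumptions, not the actual Freddie Mac implementation.

```python
# Hypothetical sketch of an AWS Lambda handler that reads a nested JSON file
# from S3, flattens it, sorts the records, and writes the result back.
# Bucket names, keys, and field names are illustrative only.
import json
import boto3

s3 = boto3.client("s3")

def flatten(record, parent_key="", sep="."):
    """Flatten a nested dict into dotted keys, e.g. {'a': {'b': 1}} -> {'a.b': 1}."""
    items = {}
    for key, value in record.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            items.update(flatten(value, new_key, sep))
        else:
            items[new_key] = value
    return items

def lambda_handler(event, context):
    # Triggered by an S3 put event; process each uploaded object.
    for rec in event.get("Records", []):
        bucket = rec["s3"]["bucket"]["name"]
        key = rec["s3"]["object"]["key"]
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        records = json.loads(body)                          # expects a JSON array
        flat = [flatten(r) for r in records]
        flat.sort(key=lambda r: str(r.get("loan.id", "")))  # hypothetical sort key
        s3.put_object(
            Bucket=bucket,
            Key=f"processed/{key}",
            Body=json.dumps(flat).encode("utf-8"),
        )
    return {"statusCode": 200}
```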

Environment: PySpark, Apache Beam, Erwin, AWS (EC2, Lambda, S3, VPC, Snowflake, CloudTrail,
CloudWatch, Auto Scaling, IAM, DynamoDB), CloudShell, Tableau, Cloud SQL, MySQL, Postgres, SQL
Server, Python, Scala, Spark, Informatica, Spark SQL, NoSQL, MongoDB, TensorFlow, Jira, GitLab.

Senior Data Analyst | Sep 2019 – Dec 2020


Client: Experian, CA

 Analyzed functional and non-functional categorized data elements for data profiling and mapping
from the source to the target data environment. Developed working documents to support findings and
assign specific tasks.
 Involved with data profiling for multiple sources and answered complex business questions by
providing data to business users.
 Worked with data investigation, discovery, and mapping tools to scan every single data record from
many sources
 Created pipelines in ADF using Linked Services, Datasets, and Pipelines to extract, transform, and load
data between different sources such as Azure SQL, Blob Storage, Azure SQL Data Warehouse, and a
write-back tool, in both directions.
 Involved in all steps and the scope of the project's reference data approach to MDM; created a
data dictionary and source-to-target mapping in the MDM data model.
 Wrote UNIX shell scripts to automate jobs and scheduled them as cron jobs using crontab.
 Utilized Power BI to create various analytical dashboards that help business users get quick
insights into the data.
 Wrote research reports describing the experiments conducted, results, and findings, and made
strategic recommendations to technology, product, and senior management. Worked closely with
regulatory teams.
 Prepared an ETL technical document maintaining the naming standards.
 Wrote production-level machine learning classification models and ensemble classification models
from scratch using Python and PySpark to predict binary values for certain attributes within a given
time frame (see the sketch after this list).
 Made Power BI reports more interactive using storytelling features such as bookmarks,
selection panes, and drill-through filters; also created custom visualizations using the R
programming language.
 Developed JSON scripts for deploying pipelines in Azure Data Factory (ADF) that process the data
using the SQL activity. Built an ETL job that runs a Spark JAR which executes the business
analytical model.
 Worked on data that was a combination of unstructured and structured data from multiple sources and
automated the cleaning using Python scripts.
 Prepared dashboards using calculated fields, parameters, calculations, groups, sets, and hierarchies
in Tableau.
 Prepared technical specification to load data into various tables in Data Marts.
 Created deployment groups in one environment for the Workflows, Worklets, Sessions, Mappings,
Source Definitions, Target definitions and imported them to other environments.
 Used SQL Server Integration Services (SSIS) for extracting, transforming, and loading data into the
target system from multiple sources.
 Involved in Unit Testing the code and provided the feedback to the developers. Performed Unit
Testing of the application by using NUnit.
 Experience managing Azure Data Lake Storage (ADLS) and Data Lake Analytics and an understanding of
how to integrate them with other Azure services. Knowledge of U-SQL.
 Automated existing manual processes.
 Worked with the reporting team using Tableau.
 Published and maintained workspaces in Power BI Service, scheduled data refreshes, and
maintained the apps and workbooks.
 Created and maintained SQL Server scheduled jobs, executing stored procedures for the purpose of
extracting data from Oracle into SQL Server. Extensively used Tableau for customer marketing data
visualization.
 Performed all necessary day-to-day GIT support for different projects, Responsible for design and
maintenance of the GIT Repositories, and the access control strategies.
 Involved in extensive data validation by writing several complex SQL queries, performing back-end
testing, and working through data quality issues.
 Developed regression test scripts for the application, gathered and analyzed metrics, reported
results to the concerned teams, and tested the test programs.
 Identified and recorded defects with the information required for the development team to reproduce issues.
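
A minimal sketch of the kind of PySpark binary classification pipeline referenced in this list; the input path, feature columns, and label column are assumptions for illustration rather than the actual Experian attributes or model.

```python
# Hypothetical sketch: train a binary classifier in PySpark ML. The input path,
# feature columns, and label column are placeholders only.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.evaluation import BinaryClassificationEvaluator

spark = SparkSession.builder.appName("binary-attribute-model").getOrCreate()

# Assumed feature store location and column names.
df = spark.read.parquet("abfss://container@account.dfs.core.windows.net/features/")
feature_cols = ["utilization", "tradeline_count", "days_past_due"]

assembler = VectorAssembler(inputCols=feature_cols, outputCol="features")
rf = RandomForestClassifier(featuresCol="features", labelCol="label", numTrees=100)
pipeline = Pipeline(stages=[assembler, rf])

train, test = df.randomSplit([0.8, 0.2], seed=42)
model = pipeline.fit(train)

# Evaluate on the held-out split.
auc = BinaryClassificationEvaluator(labelCol="label").evaluate(model.transform(test))
print(f"Test AUC: {auc:.3f}")
```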

Environment: Power BI, NoSQL, Data Lake, Zookeeper, Python, Tableau, Azure, ADF, UNIX/Linux Shell
Scripting, PyCharm, Informatica PowerCenter.

Data Analyst | Jan 2017 – Aug 2019


Client: Broadridge, NY

 Identified the business-critical measures by working closely with the SMEs.


 Involved in Data mapping specifications to create and execute detailed system test plans. The data
mapping specifies what data will be extracted from an internal data warehouse, transformed, and
sent to an external entity.
 Analyzed business requirements, system requirements, data mapping requirement specifications,
and responsible for documenting functional requirements and supplementary requirements in
Quality Center.
 Setting up of environments to be used for testing and the range of functionalities to be tested as per
technical specifications.
 Tested Complex ETL Mappings and Sessions based on business user requirements and business rules
to load data from source flat files and RDBMS tables to target tables.
 Evaluated and enhanced current data models to reflect business requirements.
 Used PowerExchange to create and maintain data maps for each file and to connect PowerCenter
to the mainframe.
 Developed mappings using Source Qualifier, Expression, Filter, Lookup, Update Strategy, Sorter,
Joiner, Normalizer, and Router transformations.
 Integrated DataStage metadata with Informatica metadata and created ETL mappings and workflows.
 Involved in writing, testing, and implementing triggers, stored procedures and functions at Database
level using PL/SQL.
 Migrated repository objects, services and scripts from development to production environment
 Created UNIX scripts and environment files to run batch jobs
 Worked with DBAs on performance tuning and to obtain privileges on different tables in different
environments.
 Developed scripts using both DataFrames/Spark SQL and RDDs in PySpark (Spark with Python) 1.x/2.x
for data aggregation (see the sketch after this list).
 Unit tested the code to check if the target is loaded properly
 Involved in writing parsers using Python
 Responsible for creating test cases to ensure that data originating from the source lands in the
target correctly and in the right format.
 Tested several stored procedures and wrote complex SQL using CASE, HAVING, CONNECT BY, etc.
 Involved in Teradata SQL development, unit testing, and performance tuning, ensuring testing
issues were resolved based on defect reports.
 Tested the ETL process both before and after data validation. Tested the messages published by the
ETL tool and the data loaded into various databases.
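
A brief sketch of the DataFrame and RDD aggregation patterns referenced in this list, performing the same daily roll-up two ways. The input path and column names are assumed for illustration only.

```python
# Hypothetical sketch of one aggregation done two ways in PySpark:
# with the DataFrame API / Spark SQL functions and with the RDD API.
# Paths and column names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("trade-aggregation").getOrCreate()
trades = spark.read.parquet("/data/trades/")  # columns: account_id, trade_date, amount

# DataFrame API: total and average trade amount per account per day.
daily_df = (
    trades.groupBy("account_id", "trade_date")
          .agg(F.sum("amount").alias("total_amount"),
               F.avg("amount").alias("avg_amount"))
)

# Equivalent RDD-based aggregation using reduceByKey.
daily_rdd = (
    trades.rdd
          .map(lambda r: ((r["account_id"], r["trade_date"]), (r["amount"], 1)))
          .reduceByKey(lambda a, b: (a[0] + b[0], a[1] + b[1]))
          .mapValues(lambda s: (s[0], s[0] / s[1]))  # (total, average)
)

daily_df.write.mode("overwrite").parquet("/data/trades_daily/")
```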

Environment: Python, PL/SQL, Metadata, Cloudera, Java, PySpark, Scala, UNIX, Tableau.

Data Analyst | Mar 2015 – Dec 2016


Client: Mr Cooper, TX

 Designed and built multi-terabyte, full end-to-end data warehouse infrastructure from the
ground up on Amazon Redshift, handling millions of records every day.
 Migrated the on-premises database structure to the Amazon Redshift data warehouse.
 Was responsible for ETL and data validation using SQL Server Integration Services.
 Defined and deployed monitoring, metrics, and logging systems on AWS.
 Defined facts, dimensions and designed the data marts using the Ralph Kimball's Dimensional Data
Mart modeling methodology using Erwin.
 Evaluated and enhanced current data models to reflect business requirements.
 Worked on big data using AWS cloud services, i.e., EC2, S3, EMR, and DynamoDB.
 Created Entity Relationship Diagrams (ERD), Functional diagrams, Data flow diagrams and enforced
referential integrity constraints and created logical and physical models using Erwin.
 Created ad hoc queries and reports to support business decisions using SQL Server Reporting Services
(SSRS).
 Analyzed existing application programs and tuned SQL queries using execution plans, Query
Analyzer, SQL Profiler, and the Database Engine Tuning Advisor to enhance performance.
 Involved in forward engineering of the logical models to generate the physical models using
ERwin, with subsequent deployment to the Enterprise Data Warehouse.
 Delivered a POC of Flume to handle real-time log processing for attribution reports.
 Wrote and ran SQL scripts to implement DB changes, including table updates, addition or update
of indexes, and creation of views and stored procedures.
 Implemented and managed ETL solutions and automated operational processes.
 Wrote various data normalization jobs for new data ingested into Redshift.
 Created various complex SSIS/ETL packages to Extract, Transform and Load data
 Implemented Workload Management (WLM) in Redshift to prioritize basic dashboard queries over
more complex, longer-running ad hoc queries. This allowed for a more reliable and faster reporting
interface, giving sub-second query response for basic queries (see the sketch after this list).
 Used Hive SQL, Presto SQL, and Spark SQL for ETL jobs, choosing the right technology to get the
job done.
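
A hedged sketch of how a Redshift WLM configuration like the one described above could be applied programmatically with boto3; the parameter group name, query groups, and concurrency/memory numbers are assumptions, not the actual production values.

```python
# Hypothetical sketch: apply a Redshift WLM configuration that gives dashboard
# queries their own high-priority queue ahead of long-running ad hoc queries.
# Parameter group name, query groups, and numbers are illustrative only.
import json
import boto3

redshift = boto3.client("redshift", region_name="us-east-1")

wlm_config = [
    {   # fast queue for sessions that run SET query_group TO 'dashboard'
        "query_group": ["dashboard"],
        "query_concurrency": 10,
        "memory_percent_to_use": 40,
        "max_execution_time": 30000,   # ms; cut off anything slow in this queue
    },
    {   # default queue for longer-running ad hoc queries
        "query_concurrency": 5,
        "memory_percent_to_use": 60,
    },
]

redshift.modify_cluster_parameter_group(
    ParameterGroupName="analytics-wlm",          # assumed parameter group name
    Parameters=[{
        "ParameterName": "wlm_json_configuration",
        "ParameterValue": json.dumps(wlm_config),
        "ApplyType": "static",                   # WLM changes need a cluster reboot
    }],
)
```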

Environment: Tableau, AWS, EC2, S3, SQL Server, Erwin, Oracle, Redshift, Informatica, SQL, NoSQL,
Snowflake Schema, GitHub.

ETL Developer | Apr 2013 – Nov 2013


Client: Hudda Infotech Private Limited, Hyderabad, India

 Extensively worked on views, stored procedures, triggers, and SQL queries for loading data into
staging, and to enhance and maintain existing functionality.
 Responsible for the development, support, and maintenance of the ETL (Extract, Transform and Load)
processes using Oracle and Informatica PowerCenter.
 Designed new database tables to meet business information needs. Designed the mapping document,
which serves as a guideline for ETL coding.
 Analyzed the sources, requirements, and existing OLTP system, and identified the required
dimensions and facts from the database.
 Designed the dimensional model of the data warehouse; confirmed source data layouts and
needs.
 Extensively used Oracle ETL process for address data cleansing.
 Developed and tuned all the affiliations received from data sources using Oracle and Informatica, and
tested them with high volumes of data.
 Developed Logical and Physical data models that capture current state/future state data elements
and data flows using Erwin.
 Extracted Data from various sources like Data Files, different customized tools like Meridian and
Oracle.
 Used ETL to extract files for the external vendors and coordinated that effort.
 Created various Documents such as Source-to-Target Data Mapping Document, and Unit Test Cases
Document.

Environment: Informatica PowerCenter, SQL, Oracle, MS Office, MS Excel, Windows.
