Technical Skills:
Programming Languages: R, Python, MATLAB, VB, Java, C, C++, SQL, MySQL, PL/SQL
ETL Tools: Informatica PowerCenter 9.1/8.6 (Designer, Workflow Manager/Monitor, Repository), Ab Initio
Testing Tools: Jira, HP ALM, IBM ClearQuest, IBM RQM, MTM, SDLC
Database Tools: Oracle SQL Developer, Toad, Oracle 10g/11g/12c, MS SQL Server, SSIS, SSRS, Data Grid
BI and Analytics Tools: OBIEE, Oracle Reports Builder, Spotfire, Tableau 10.5, Pandas, Seaborn, Matplotlib, Cognos, Excel, SAS, SAS Enterprise Miner
Operating System/Framework: Windows, Linux, Macintosh, UNIX
Cloud Technologies: AWS (Amazon Web Services), Microsoft Azure
Data Modeling: Regression Modeling, Time Series Modeling, PDE Modeling, Star-Schema Modeling,
Snowflake-Schema Modeling, FACT and Dimension tables
Statistical Modeling/Algorithms: Regression Modeling, Time Series Analysis, ACT, Random Forest
Project Experience:
Created data dictionaries, data mappings for ETL and application support, DFDs, ERDs, mapping documents, metadata, and DDL/DML as required.
Analyzed functional and non-functional categorized data elements for data profiling and mapping from source to target data environments; developed working documents to support findings and assign specific tasks.
Analyzed DB discrepancies and synchronized the Staging, Development, UAT and Production DB
environments with data models.
Worked on data profiling and developed various data quality rules using Informatica Data Quality.
Performed data analysis and data profiling using complex SQL on various source systems, including Oracle and Teradata (profiling sketch below).
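
A minimal sketch of the kind of SQL-driven profiling described above, assuming the python-oracledb driver; the connection details, table, and column names are illustrative placeholders, not the actual sources:

import oracledb  # python-oracledb driver (assumed available)

PROFILE_SQL = """
    SELECT COUNT(*)                    AS total_rows,
           COUNT(customer_id)          AS non_null_ids,
           COUNT(DISTINCT customer_id) AS distinct_ids
    FROM   stg_customers
"""

with oracledb.connect(user="etl_user", password="***", dsn="dbhost/orclpdb") as conn:
    with conn.cursor() as cur:
        cur.execute(PROFILE_SQL)
        total, non_null, distinct = cur.fetchone()
        # Basic profile: row count, null ratio, and cardinality of the key column
        print(f"rows={total}  null_ratio={1 - non_null / total:.2%}  distinct={distinct}")
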
Constructed AWS data pipelines using VPC, EC2, S3, Auto Scaling Groups (ASG), EBS, Snowflake, IAM, CloudFormation, Route 53, CloudWatch, CloudFront, and CloudTrail.
Wrote AWS Lambda code in Python to process nested JSON files (converting, comparing, sorting, etc.); a sketch follows.
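
A minimal sketch of such a handler, assuming a simple event shape; the "payload" field name and the flattening behavior are illustrative assumptions, not the production code:

import json

def flatten(obj, prefix=""):
    # Recursively flatten nested dicts into dotted keys: {"a": {"b": 1}} -> {"a.b": 1}
    out = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            out.update(flatten(value, path))
        else:
            out[path] = value
    return out

def lambda_handler(event, context):
    # "payload" is an assumed field name; the real event shape depends on the trigger
    record = json.loads(event["payload"])
    flat = flatten(record)
    # Sort keys so downstream comparisons between files are deterministic
    return {key: flat[key] for key in sorted(flat)}
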
Experienced in ETL concepts, building ETL solutions, and data modeling.
Implemented Apache Airflow for authoring, scheduling, and monitoring data pipelines.
Architected, designed, and developed business applications and data marts for reporting. Involved in all phases of the development life cycle, including analysis, design, coding, unit testing, integration testing, review, and release, per business requirements.
Designed several DAGs (Directed Acyclic Graphs) for automating ETL pipelines (minimal example below).
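
A minimal Airflow 2.x DAG sketch in that spirit; the DAG id, schedule, and task bodies are illustrative placeholders:

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pull from the source system")      # placeholder task body

def transform():
    print("clean and conform the data")       # placeholder task body

def load():
    print("write to the warehouse")           # placeholder task body

with DAG(
    dag_id="example_etl",                     # assumed name
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> transform_task >> load_task   # linear dependency chain
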
Performed data extraction, transformation, loading, and integration in data warehouses, operational data stores, and master data management.
Performed Extraction, Transformation, and Loading (ETL) using Informatica PowerCenter.
Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team using Tableau.
Wrote UNIX shell scripts to automate jobs and scheduled them as cron jobs with crontab.
Connected to AWS EC2 using SSH and ran spark-submit jobs.
Designed and implemented a configurable data delivery pipeline, built with Python, for scheduled updates to customer-facing data stores (configuration sketch below).
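
One way such a pipeline can be made configurable is to drive it from a declarative job list; this sketch assumes a generic DB-API connection and illustrative table names, not the actual delivery code:

import json

# Assumed job configuration; in practice this would live in a file or table
CONFIG = json.loads("""
[
  {"source": "orders_clean",  "target": "customer_orders",  "mode": "replace"},
  {"source": "returns_clean", "target": "customer_returns", "mode": "append"}
]
""")

def deliver(conn, job):
    # Copy a cleansed staging table into its customer-facing target
    with conn.cursor() as cur:
        if job["mode"] == "replace":
            cur.execute(f"TRUNCATE TABLE {job['target']}")
        cur.execute(f"INSERT INTO {job['target']} SELECT * FROM {job['source']}")
    conn.commit()

def run(conn):
    for job in CONFIG:
        deliver(conn, job)
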
Designed and constructed AWS data pipelines using various AWS resources: API Gateway receives requests and invokes an AWS Lambda function, which retrieves data from Snowflake and converts the response into JSON format, with Snowflake, DynamoDB, and AWS S3 serving as the data stores (handler sketch below).
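
A sketch of the Lambda side of such a flow, assuming the snowflake-connector-python package is bundled with the function and API Gateway uses Lambda proxy integration; the credentials, query, and object names are placeholders:

import json

import snowflake.connector

def lambda_handler(event, context):
    conn = snowflake.connector.connect(
        user="app_user", password="***",              # placeholder credentials
        account="my_account", warehouse="ETL_WH",
        database="ANALYTICS", schema="PUBLIC",
    )
    try:
        cur = conn.cursor()
        cur.execute("SELECT id, status FROM orders LIMIT 10")   # illustrative query
        rows = [{"id": r[0], "status": r[1]} for r in cur.fetchall()]
    finally:
        conn.close()
    # Lambda proxy integration expects a statusCode/body envelope
    return {"statusCode": 200, "body": json.dumps(rows)}
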
Proficient in machine learning techniques (decision trees, linear/logistic regression) and statistical modeling (short example below).
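
A small supervised-learning sketch with scikit-learn illustrating both model families; the synthetic data stands in for real features, which the source does not describe:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

for model in (DecisionTreeClassifier(max_depth=5), LogisticRegression(max_iter=500)):
    model.fit(X_train, y_train)
    print(type(model).__name__, f"accuracy={model.score(X_test, y_test):.3f}")
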
Optimized the TensorFlow model for efficiency.
Analyzed the system for new enhancements/functionalities and performed impact analysis of the application for implementing ETL changes.
Wrote SQL, PL/SQL, and stored procedures to implement business rules and transformations (invocation sketch below).
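
A sketch of invoking such a stored procedure from Python via python-oracledb; the procedure name and arguments are assumptions for illustration only:

import oracledb

with oracledb.connect(user="etl_user", password="***", dsn="dbhost/orclpdb") as conn:
    with conn.cursor() as cur:
        # Equivalent to: BEGIN apply_business_rules(:1, :2); END;
        cur.callproc("apply_business_rules", ["2023-01-01", "FULL"])
    conn.commit()
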
Developed mappings using transformations such as Expression, Filter, Joiner, and Lookup for better data massaging and to migrate clean and consistent data.
Worked on Confluence and Jira.
Extensively used Erwin to design logical and physical data models and relational databases, and to perform forward/reverse engineering.
Built performant, scalable ETL processes to load, cleanse and validate data.
Developed various mappings comprising all sources, targets, and transformations using Informatica Designer.
Prepared associated documentation for specifications, requirements, and testing.
Identified and recorded defects with the information required for the development team to reproduce the issues.
Designed and developed database models for the operational data store, data warehouse, and
federated databases to support client enterprise Information Management Strategy.
Environment: PySpark, Apache Beam, Erwin, AWS (EC2, Lambda, S3, VPC, Snowflake, CloudTrail, CloudWatch, Auto Scaling, IAM, DynamoDB), CloudShell, Tableau, Cloud SQL, MySQL, PostgreSQL, SQL Server, Python, Scala, Spark, Informatica, Spark SQL, NoSQL, MongoDB, TensorFlow, Jira, GitLab.
Environment: Power BI, NoSQL, Data Lake, ZooKeeper, Python, Tableau, Azure, ADF, Unix/Linux Shell Scripting, PyCharm, Informatica PowerCenter.
Environment: Python, PL/SQL, Metadata, Cloudera, Java, PySpark, Scala, UNIX, Tableau.
Designed and built multi-terabyte, full end-to-end data warehouse infrastructure from the ground up on Confidential Redshift for large-scale data, handling millions of records every day.
Migrated the on-premises database structure to the Confidential Redshift data warehouse.
Responsible for ETL and data validation using SQL Server Integration Services (SSIS).
Defined and deployed monitoring, metrics, and logging systems on AWS.
Defined facts and dimensions and designed the data marts using Ralph Kimball's dimensional data mart modeling methodology in Erwin.
Evaluated and enhanced current data models to reflect business requirements.
Worked with big data on AWS cloud services such as EC2, S3, EMR, and DynamoDB.
Created Entity Relationship Diagrams (ERDs), functional diagrams, and data flow diagrams; enforced referential integrity constraints; and created logical and physical models using Erwin.
Created ad hoc queries and reports using SQL Server Reporting Services (SSRS) to support business decisions.
Analyzed existing application programs and tuned SQL queries using execution plans, Query Analyzer, SQL Profiler, and Database Engine Tuning Advisor to enhance performance.
Involved in forward engineering of the logical models to generate the physical models using Erwin, and in their subsequent deployment to the Enterprise Data Warehouse.
Delivered a POC of Flume to handle real-time log processing for attribution reports.
Wrote and ran SQL scripts to implement DB changes, including table updates, the addition or update of indexes, and the creation of views and stored procedures.
Implemented and managed ETL solutions and automated operational processes.
Wrote various data normalization jobs for new data ingested into Redshift.
Created various complex SSIS/ETL packages to extract, transform, and load data.
Implemented Workload Management (WLM) in Redshift to prioritize basic dashboard queries over more complex, longer-running ad hoc queries. This allowed for a more reliable and faster reporting interface, giving sub-second query response for basic queries (routing sketch below).
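
WLM queue assignment can be driven by query groups; this sketch assumes a WLM configuration that maps the 'dashboard' group to a high-priority queue, with placeholder connection details and an illustrative query:

import psycopg2  # Redshift speaks the PostgreSQL wire protocol

conn = psycopg2.connect(
    host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",  # placeholder
    port=5439, dbname="analytics", user="bi_user", password="***",
)
with conn.cursor() as cur:
    cur.execute("SET query_group TO 'dashboard';")   # route to the fast WLM queue
    cur.execute("SELECT COUNT(*) FROM page_views;")  # illustrative dashboard query
    print(cur.fetchone()[0])
    cur.execute("RESET query_group;")
conn.close()
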
Used Hive SQL, Presto SQL, and Spark SQL for ETL jobs, choosing the right technology for each job (see the Spark SQL sketch below).
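
A minimal Spark SQL ETL step in PySpark; the S3 paths, table, and column names are illustrative assumptions:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("etl_example").getOrCreate()

# Hypothetical input location; register it as a temp view for SQL access
orders = spark.read.parquet("s3://bucket/raw/orders/")
orders.createOrReplaceTempView("orders")

daily = spark.sql("""
    SELECT order_date,
           COUNT(*)    AS order_count,
           SUM(amount) AS revenue
    FROM   orders
    GROUP  BY order_date
""")
daily.write.mode("overwrite").parquet("s3://bucket/curated/daily_orders/")
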
Environment: Tableau, AWS, EC2, S3, SQL Server, Erwin, Oracle, Redshift, Informatica, SQL, NoSQL, Snowflake Schema, GitHub.
Extensively worked on views, stored procedures, triggers, and SQL queries for loading data into staging and to enhance and maintain existing functionality.
Responsible for developing, supporting, and maintaining the ETL (Extract, Transform, and Load) processes using Oracle and Informatica PowerCenter.
Designed new database tables to meet business information needs and designed the mapping document that serves as a guideline for ETL coding.
Analyzed sources, requirements, and the existing OLTP system, and identified the required dimensions and facts from the database.
Designed the dimensional model of the data warehouse and confirmed source data layouts and needs.
Extensively used Oracle ETL process for address data cleansing.
Developed and tuned all the affiliation data received from sources using Oracle and Informatica, and tested with high volumes of data.
Developed Logical and Physical data models that capture current state/future state data elements
and data flows using Erwin.
Extracted data from various sources, such as data files, customized tools like Meridian, and Oracle.
Used ETL to extract files for the external vendors and coordinated that effort.
Created various documents, such as the Source-to-Target Data Mapping document and the Unit Test Cases document.