
N Jaya Mani

Professional Summary:
 Over 8 years of experience in Information Technology, including hands-on development of ETL and data warehousing solutions, database administration, and implementation of various applications in both Big Data and relational database environments.
 Expertise in using major components of the Hadoop ecosystem, including HDFS, YARN, MapReduce, Hive, Impala, Pig, Sqoop, HBase, Spark, Spark SQL, Kafka, Spark Streaming, Oozie, ZooKeeper, and Hue.
 Created Snowflake schemas by normalizing the dimension tables as appropriate and creating a sub-dimension named Demographic as a subset of the Customer dimension.
 Good understanding of Azure Big Data technologies such as Azure Data Lake Analytics, Azure Data Lake Store, and Azure Data Factory; created a POC for moving data from flat files and SQL Server using U-SQL jobs.
 Experience with Azure transformation projects and Azure architecture decision-making; implemented ETL and data movement solutions using Azure Data Factory (ADF).
 Worked on creating data pipelines with Copy Activity, moving and transforming data with custom Azure Data Factory pipeline activities for on-cloud ETL processing.
 Recreated existing application logic and functionality in the Azure Data Factory (ADF), Azure SQL Database, and Azure SQL Data Warehouse environment.
 Loaded data from SFTP/on-premises sources through Azure Data Factory to Azure SQL Database and automated pipeline schedules using event-based scheduling in Azure Data Factory (ADF).
 Customized Power BI reports, dashboards, and Azure Data Factory (ADF) pipelines when source data was updated.
 Hands-on experience with Azure Databricks, Azure Data Factory, Azure SQL, and PySpark.
 Used Amazon Web Services Elastic Compute Cloud (AWS EC2) to launch cloud instances.
 Hands-on experience working with Amazon Web Services (AWS) using Elastic MapReduce (EMR), Redshift, and EC2 for data processing.
 Hands-on experience with Amazon EC2, Amazon S3, Amazon RDS, VPC, IAM, Amazon Elastic
Load Balancing, Auto Scaling, DynamoDB, Lambda, Cloud Front, CloudWatch, SNS, SES,
SQS and other services of the AWS family.
 Experience in Python programming using Pandas and NumPy.
 Experienced in migrating ETL logic using Pig Latin scripts, including transformations and join operations.
 Good understanding of MPP databases such as HP Vertica and Impala.
 Extensive experience in text analytics, generating data visualizations using R and Python, and creating dashboards using tools like Tableau and Power BI.
 Automated workflows and scheduled jobs using Oozie and Airflow.
 Good knowledge of Proofs of Concept (PoCs) and gap analysis; gathered necessary data for analysis from different sources and prepared data for exploration using data munging.
 Skilled in performing data parsing, data manipulation, and data preparation with methods including describing data contents, computing descriptive statistics, regex, split and combine, remap, merge, subset, reindex, melt, and reshape (a brief pandas sketch follows this summary). Good understanding of web design based on HTML5, CSS3, and JavaScript.
 Ability to work effectively in cross-functional team environments, excellent communication, and
interpersonal skills.
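A minimal pandas sketch of the data-preparation steps listed above (describing contents, descriptive statistics, regex parsing, remap, subset, melt, merge); it assumes a hypothetical customers.csv file, and every column name is illustrative rather than taken from a real project.

```python
import pandas as pd

# Hypothetical input file and columns; used only to illustrate the preparation steps.
df = pd.read_csv("customers.csv")
print(df.describe())                                   # descriptive statistics of numeric columns

# Regex-based parsing: split a combined "city, ST" field into two columns
df[["city", "state"]] = df["location"].str.extract(r"^(.*),\s*(\w{2})$")

# Remap coded values and subset the frame
df["segment"] = df["segment_code"].map({1: "retail", 2: "wholesale"})
recent = df[df["year"] >= 2020]

# Reshape wide-to-long with melt, then merge onto a small reference table
long_df = recent.melt(id_vars=["customer_id"],
                      value_vars=["q1_sales", "q2_sales"],
                      var_name="quarter", value_name="sales")
regions = pd.DataFrame({"customer_id": [1, 2], "region": ["TX", "CA"]})
merged = long_df.merge(regions, on="customer_id", how="left")
print(merged.head())
```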
Technical Skills:
 Big Data Tools: Hadoop Ecosystem, MapReduce, Spark, Airflow, NiFi, HBase, Hive, Pig, Sqoop, Kafka, Oozie, Snowflake, Databricks.
 Methodologies: RAD, JAD, System Development Life Cycle (SDLC), Agile
 Cloud Platform: AWS (Amazon Web Services), Microsoft Azure
 Data Modeling Tools: Erwin Data Modeler, ER Studio v17
 Programming Languages: SQL, PL/SQL, Python, Scala and UNIX.
 OLAP Tools: Tableau, SSAS, Business Objects.
 Databases: Oracle 12c/11g, MS SQL Server, MySQL, Teradata R15/R14.
 ETL/Data warehouse Tools: Informatica 9.6/9.1, and Tableau.
 Operating System: Windows, Unix, Sun Solaris
 Development Methods: Agile/Scrum, Waterfall

Professional Experience:

AT&T – Plano, TX Aug 2023 – Feb 2024
Role: Sr. Azure Databricks Engineer
Responsibilities:
 Worked on customer data such as addresses, cross-referenced it against 5 other billing systems to determine the match rate, and ingested the data from various sources into Azure Databricks.
 Analyzed and extracted data from databases using PySpark queries to perform data transformations, aggregations, and manipulations, resulting in reduced data processing time and an increased match rate.
 Implemented data quality checks, reducing data errors and ensuring data integrity.
 Applied fuzzy matching techniques such as FuzzyWuzzy within Azure Databricks to derive predictive models and improve decision-making around the match rate (a minimal PySpark sketch follows this list).
 Devised data model validations and schemas to represent business processes, systems, and business intelligence analytics.
 Managed data pipelines and workflows and ingested 10M records for processing, improving data accuracy.
 Documented the pipeline enhancements and drafted reports to keep track of the deployments.
 Worked on migrating a SQL database to Azure Databricks and conducted a thorough assessment of the existing SQL Server environment to understand the schema, data types, indexes, stored procedures, triggers, and any dependencies.
 Before migration, performed data profiling and cleansing to ensure data integrity and quality, and identified any anomalies or inconsistencies in the data that needed to be addressed during the migration process.
 Mapped the schema from the SQL Server database to Databricks SQL, taking into account any
differences in data types, constraints, and formatting requirements. Developed transformation
scripts or pipelines to convert and migrate the data.
 Optimized performance during the migration process by parallelizing data transfer, optimizing
queries, and leveraging Databricks capabilities such as caching and cluster scaling.
 Set up monitoring tools and alerts to track the migration progress, identify any bottlenecks or
issues, and troubleshoot them in a timely manner.
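A minimal PySpark sketch, assuming hypothetical customer_addresses and billing_addresses tables and an illustrative 85-point threshold, of how FuzzyWuzzy scoring can be applied inside Databricks to estimate a match rate; it sketches the general technique rather than the production pipeline.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf, col
from pyspark.sql.types import IntegerType
from fuzzywuzzy import fuzz

spark = SparkSession.builder.appName("address-match").getOrCreate()

customers = spark.table("customer_addresses")   # hypothetical source table
billing = spark.table("billing_addresses")      # one of the billing systems

@udf(returnType=IntegerType())
def match_score(a, b):
    # token_sort_ratio tolerates word-order and minor spelling differences
    return fuzz.token_sort_ratio(a or "", b or "")

# Join candidates on a blocking key (ZIP code) to avoid a full cross join,
# then score each candidate pair with the fuzzy-matching UDF.
pairs = (customers.alias("c")
         .join(billing.alias("b"), col("c.zip") == col("b.zip"))
         .withColumn("score", match_score(col("c.address"), col("b.address"))))

matched = pairs.filter(col("score") >= 85).select("c.customer_id").distinct()
match_rate = matched.count() / customers.count()
print(f"match rate: {match_rate:.2%}")
```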
Environment: Spark, Spark SQL, Python, PySpark, Azure Data Factory, Azure SQL, Azure Databricks, Azure DW, Scala, MySQL, Snowflake

PepsiCo – Plano, TX July 2022 – July 2023
Role: Sr. Azure Data Engineer
Responsibilities:
 Worked on supply chain data such as package location and tracking, package types, service centers, PO vendors, transaction data, NMFC data, item class, and product types.
 Created CRM order processing and social data landing zones (Azure Blob Storage, Azure Data Lake Storage) on Snowflake on Azure with integrated dashboards (Power BI, Snowflake Web UI).
 Extracted data from various sources, transformed it, and loaded it into the target SQL Server database. Implemented Copy Activity and custom Azure Data Factory pipeline activities for on-cloud ETL processing.
 Worked on migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlled and granted database access; and migrated on-premises databases to Azure Data Lake Store using Azure Data Factory.
 Automated refreshes of Power BI reports, dashboards, and Azure Data Factory (ADF) pipelines when source data was updated.
 Created pipelines and loaded data into Azure SQL Data Warehouse through Data Lake and ADF activities. Extracted, transformed, and loaded data from source systems to Azure Data Storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL Azure Data Lake Analytics.
 Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
 Involved in developing the end-to-end ELT pipelines for migrating the on-premises Oracle
database to Azure SQL Datawarehouse.
 Implemented Spark using Python and Spark SQL for faster testing and processing of data.
 Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala.
 Responsible for ingesting data from various source systems (RDBMS, flat files, Big Data) into Azure Blob Storage using a framework model.
 Developed Talend jobs to populate claims data into the data warehouse using star, snowflake, and hybrid schemas.
 Used Hadoop scripts for HDFS (Hadoop Distributed File System) data loading and manipulation.
 Involved in application design and data architecture using cloud and Big Data solutions on Microsoft Azure.
 Led the effort to migrate a legacy system to a Microsoft Azure cloud-based solution, redesigning the legacy application with minimal changes to run on the cloud platform.
 Worked on building data pipelines using Azure services such as Data Factory to load data from the legacy SQL Server to Azure databases using data factories, API gateway services, SSIS packages, Talend jobs, and custom .NET and Python code.
 Followed Databricks platform best practices for securing network access to cloud applications.
 Hands-on experience creating Delta Lake tables (a brief sketch follows this list).
 Extensive knowledge of performance tuning of streaming jobs.
 Expert in performance tuning of Spark jobs.
 Performed data validation comparing record-wise counts between the source and destination.
 Worked on the data support team handling bug fixes, schedule changes, memory tuning, schema changes, and loading of historic data.
 Implemented checkpoints such as Hive count checks, Sqoop record checks, done-file creation checks, done-file checks, and touch-file lookups.
 Worked with both Agile and Kanban methodologies.
 Implemented Custom Azure Data Factory (ADF) pipeline Activities and SCOPE scripts.
 Primarily responsible for creating new Azure subscriptions, data factories, virtual machines, SQL Azure instances, SQL Azure DW instances, and HDInsight clusters, and for installing DMGs on VMs to connect to on-premises servers.
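A minimal Databricks-style sketch of landing raw files from Azure storage into a Delta Lake table; the storage path, database, table, and partition column are assumptions made for illustration, not details of the actual pipelines.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import current_timestamp

spark = SparkSession.builder.appName("blob-to-delta").getOrCreate()

# Hypothetical ADLS Gen2 landing path
raw_path = "abfss://landing@examplestorage.dfs.core.windows.net/orders/"

orders = (spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv(raw_path)
          .withColumn("ingested_at", current_timestamp()))

# Write the raw feed as a partitioned, managed Delta table
spark.sql("CREATE DATABASE IF NOT EXISTS supply_chain")
(orders.write
 .format("delta")
 .mode("append")
 .partitionBy("order_date")          # illustrative partition column
 .saveAsTable("supply_chain.orders_raw"))
```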

Environment: Hadoop, Hive, Oozie, Spark, Spark SQL, Python, Pyspark, Azure Data Factory, Azure
SQL, Azure Databricks, Azure DW, BLOB storage, Scala, AWS, Linux, Maven, Oracle 11g/10g,
Zookeeper, MySQL, Snowflake.

Charles Schwab – Westlake, TX Oct 2021 – June 2022
Role: Sr. Data Engineer
Responsibilities:
 Designed AWS architecture, cloud migration, AWS EMR, DynamoDB, Redshift, and event processing using Lambda functions.
 Implemented Amazon EMR for processing Big Data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3).
 Utilized AWS services with a focus on big data analytics, enterprise data warehouse, and business intelligence solutions to ensure optimal architecture, scalability, and flexibility.
 Experienced in Importing and exporting data into HDFS and Hive using Sqoop.
 Participated in development/implementation of Cloudera Hadoop environment.
 Experienced in running queries using Impala and used BI tools to run ad-hoc queries directly on Hadoop.
 Used Bash and Python, including Boto3, to supplement automation provided by Ansible and Terraform for tasks such as encrypting EBS volumes backing AMIs.
 Used Terraform to migrate legacy and monolithic systems to Amazon Web Services.
 Wrote Lambda function code and set CloudWatch Events as triggers with cron expressions.
 Validated Sqoop jobs and shell scripts, and performed data validation to check that data was loaded correctly without any discrepancy. Performed migration and testing of static data and transaction data from one core system to another.
 Worked on creating and running Docker images with multiple microservices and Docker container orchestration using ECS, ALB, and Lambda.
 Developed Spark scripts by writing custom RDDs in Scala for data transformations and performing actions on RDDs.
 Created Metric tables, End user views in Snowflake to feed data for Tableau refresh.
 Generated custom SQL to verify dependencies for daily, weekly, and monthly jobs.
 Implemented Kafka producers to create custom partitions, configured brokers, and implemented high-level consumers to build the data platform.
 Developed best practices, processes, and standards for effectively carrying out data migration activities. Worked across multiple functional projects to understand data usage and implications for data migration.
 Prepared data migration plans including migration risk, milestones, quality and business sign-off
details.
 Performed advanced procedures like text analytics and processing, using the in-memory computing
capabilities of Spark using Scala.
 Worked on migrating MapReduce programs into Spark transformations using Scala.
 Developed Spark code and Spark SQL/Streaming for faster testing and processing of data.
 Wrote Python modules to extract data from the MySQL source database.
 Worked on Cloudera distribution and deployed on AWS EC2 Instances.
 Deployed the project on Amazon EMR with S3 connectivity for setting a backup storage.
 Created Jenkins jobs for CI/CD using Git, Maven, and Bash scripting.
 Built regression test suite in CI/CD pipeline with Data setup, test case execution and tear down using
Cucumber- Gherkin, Java, Spring DAO, PostgreSQL.
 Extracted and loaded CSV and JSON file data from AWS S3 to the Snowflake cloud data warehouse (a minimal sketch follows this list).
 Moved data from S3 bucket to Snowflake data warehouse for generating the reports.
 Managed Roles and Grant management on Snowflake.
 Hands-on experience running ETL data pipelines on Snowflake.
 Connected Redshift to Tableau for creating a dynamic dashboard for the analytics team.
 Conducted ETL Data Integration, Cleansing, and Transformations using AWS glue Spark script.
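A minimal sketch, assuming a hypothetical external stage and target table, of loading CSV files staged in S3 into Snowflake with the Snowflake Python connector; the connection details are placeholders.

```python
import snowflake.connector

# Placeholder credentials; in practice these come from a secrets manager.
conn = snowflake.connector.connect(
    account="example_account",
    user="etl_user",
    password="********",
    warehouse="LOAD_WH",
    database="ANALYTICS",
    schema="STAGING",
)

copy_sql = """
    COPY INTO STAGING.TRADES_RAW
    FROM @S3_LANDING_STAGE/trades/          -- external stage pointing at the S3 bucket
    FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
    ON_ERROR = 'ABORT_STATEMENT'
"""

try:
    cur = conn.cursor()
    cur.execute(copy_sql)
    for row in cur:
        print(row)   # per-file load status returned by COPY INTO
finally:
    conn.close()
```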

Environment: Redshift, DynamoDB, Pyspark, Snowflake, EC2, EMR, Glue, S3, Java, Kafka, IAM,
PostgreSQL, Jenkins, Maven, AWSCLI, Shell Scripting, Git.

Microsoft – Bangalore, India Jul 2019 – June 2021
Role: Data Engineer
Responsibilities:
 Experience in Big Data analytics and design in the Hadoop ecosystem using MapReduce programming, Spark, Hive, Pig, Sqoop, HBase, Oozie, Impala, and Kafka.
 Performed Hive tuning techniques such as partitioning, bucketing, and memory optimization.
 Worked on different file formats such as Parquet, ORC, JSON, and text files.
 Worked on migrating MapReduce programs into Spark transformations using Spark and Scala, initially done using Python (PySpark).
 Used Spark SQL to load data and created schema RDDs on top of it that load into Hive tables, and handled structured data using Spark SQL.
 Worked on analyzing the Hadoop cluster using different big data analytic tools including Flume, Pig, Hive, HBase, Oozie, Zookeeper, Sqoop, Spark, and Kafka.
 As a Big Data developer, implemented solutions for ingesting data from various sources and processing data-at-rest utilizing Big Data technologies such as Hadoop, MapReduce frameworks, MongoDB, Hive, Oozie, Flume, Sqoop, and Talend.
 Worked with Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs, YARN, and PySpark.
 Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig, and Sqoop.
 Architected and implemented medium- to large-scale BI solutions on Azure using Azure Data Platform services (Azure Data Lake, Data Factory, Data Lake Analytics, Stream Analytics, Azure SQL DW, HDInsight/Databricks, NoSQL DB).
 Worked in Azure environment for development and deployment of Custom Hadoop Applications.
 Created and provisioned different Databricks clusters needed for batch and continuous streaming data processing and installed the required libraries on the clusters.
 Developed a process using Python scripting to connect to an Azure Blob container, retrieve the latest .bai and .xml files from the container, and load them into the SQL database (a minimal sketch follows this list).
 Designed and implemented database solutions in Azure SQL Data Warehouse and Azure SQL.
 Performed tuning on PostgreSQL databases using VACUUM and ANALYZE.
 Extracted and loaded data into Data Lake environment (MS Azure) by using Sqoop which was
accessed by business users.
 Extracted, transformed, and loaded data from source systems to Azure Data Storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL Azure Data Lake Analytics.
 Followed Databricks platform best practices for securing network access to cloud applications.
 Performed data validation comparing record-wise counts between the source and destination.
 Worked on the data support team handling bug fixes, schedule changes, memory tuning, schema changes, and loading of historic data.
 Implemented checkpoints such as Hive count checks, Sqoop record checks, done-file creation checks, done-file checks, and touch-file lookups.
 Worked with both Agile and Kanban methodologies.
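A minimal sketch of the blob-retrieval step described above, using the azure-storage-blob SDK; the connection string, container name, and local staging paths are assumptions for illustration.

```python
from azure.storage.blob import ContainerClient

conn_str = "<storage-account-connection-string>"   # placeholder
container = ContainerClient.from_connection_string(conn_str, container_name="bank-files")

# Keep only the most recently modified .bai and .xml blob of each type.
latest = {}
for blob in container.list_blobs():
    ext = blob.name.rsplit(".", 1)[-1].lower()
    if ext in ("bai", "xml"):
        if ext not in latest or blob.last_modified > latest[ext].last_modified:
            latest[ext] = blob

for ext, blob in latest.items():
    data = container.download_blob(blob.name).readall()
    with open(f"latest.{ext}", "wb") as f:          # staged locally before the SQL load
        f.write(data)
    print(f"downloaded {blob.name} ({len(data)} bytes)")
```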

Environment: Hadoop 3.0, Microservices, Java 8, MapReduce, Agile, HBase 1.2, JSON, Spark 2.4, Kafka, JDBC, Hive 2.3, Pig 0.17, SQL Server, NoSQL, PostgreSQL, Oozie

Cadence – Hyderabad, India Dec 2017 – June 2019
Role: Data Engineer
Responsibilities:
 Experience in developing Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
 Responsible for estimating the cluster size, monitoring, and troubleshooting of the Spark Databricks cluster.
 Used Sqoop to import and export data between Oracle, PostgreSQL, and HDFS for use in analysis.
 Migrated Existing MapReduce programs to Spark Models using Python.
 Migrating the data from Data Lake (hive) into S3 Bucket.
 Done data validation between data present in Data Lake and S3 bucket.
 Used Spark Data Frame API over Cloudera platform to perform analytics on hive data.
 Designed batch processing jobs using Apache Spark to increase processing speed roughly ten-fold compared to MapReduce jobs.
 Used Kafka for real time data ingestion.
 Created different topics for reading data in Kafka.
 Read data from different Kafka topics and ran Spark Structured Streaming jobs on an AWS EMR cluster (a minimal sketch follows this list).
 Created database objects such as stored procedures, UDFs, triggers, indexes, and views using T-SQL in both OLTP and relational data warehouses in support of ETL.
 Developed complex ETL Packages using SQL Server 2008 Integration Services to load data from
various sources like Oracle/SQL Server/DB2 to Staging Database and then to Data Warehouse.
 Created report models from cubes as well as the relational data warehouse to create ad-hoc reports and chart reports.
 Written Hive queries for data analysis to meet the business requirements.
 Migrated an existing on-premises application to AWS.
 Developed PIG Latin scripts to extract the data from the web server output files and to load into
HDFS.
 Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
 Created many Spark UDFs and UDAFs for functions that were not pre-existing in Hive and Spark SQL.
 Loaded data into Amazon Redshift and used AWS CloudWatch to collect and monitor AWS RDS instances within Confidential.
 Used the Python API to develop Kafka producers and consumers for writing Avro schemas.
 Developed and executed a migration strategy to move Data Warehouse from an Oracle platform to
AWS Redshift.
 Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
 Implemented different performance optimization techniques such as using distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
 Good knowledge of Spark platform parameters such as memory, cores, and executors.
 Used the ZooKeeper implementation in the cluster to provide concurrent access to Hive tables with shared and exclusive locking.
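A minimal Spark Structured Streaming sketch of reading a Kafka topic and landing the parsed events in S3, roughly as described above; the broker addresses, topic, schema, and paths are hypothetical, and the spark-sql-kafka connector package is assumed to be available on the EMR cluster.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("kafka-stream").getOrCreate()

# Illustrative event schema
schema = StructType([
    StructField("event_id", StringType()),
    StructField("amount", DoubleType()),
])

events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")   # hypothetical brokers
          .option("subscribe", "transactions")                 # hypothetical topic
          .option("startingOffsets", "latest")
          .load()
          .select(from_json(col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

query = (events.writeStream
         .format("parquet")
         .option("path", "s3://example-bucket/streaming/transactions/")
         .option("checkpointLocation", "s3://example-bucket/checkpoints/transactions/")
         .outputMode("append")
         .start())

query.awaitTermination()
```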

Environment: Linux, Apache Hadoop Framework, HDFS, YARN, HIVE, HBASE, AWS (S3, EMR),
Scala, Spark, SQOOP, MS SQL Server 2014, Teradata, ETL, Tableau (Desktop 9.x/Server 9.x), Python
3.x(Scikit-Learn/SciPy/NumPy/Pandas), AWS Redshift, Spark (Pyspark, MLlib, Spark SQL).

S&P Global – Hyderabad, India May 2016 – Nov 2017
Role: Data Engineer
Responsibilities:
 Responsible for the systems study, analysis, understanding of systems design, and database design.
 Created SQL Databases, tables, indexes, and views based on user requirements.
 Worked with the application developers and provided necessary SQL scripts using T-SQL.
 Created User Defined Functions, Stored Procedures, and Triggers.
 Experienced in creating small packages.
 Created views to facilitate easy user interface implementation and triggers to enforce consistent data entry into the database.
 Performed data mapping using business rules and data transformation logic for ETL purposes.
 Involved in migrating the data model from one database to an Oracle database and prepared an Oracle staging model.
 Designed star schemas and bridge tables to control slowly changing dimensions.
 Involved in the process design documentation of the Data Warehouse Dimensional Upgrades.
 Assisted in the design of report layouts to maximize usability and business relevance.
 Created Complex ETL Packages using SSIS to extract data from staging tables to partitioned tables
with the incremental load.
 Developed, deployed, and monitored SSIS Packages.
 Created SSIS packages using SSIS Designer for heterogeneous data export from OLE DB sources (Oracle) and Excel spreadsheets to SQL Server 2010.
 Performed operations like Data reconciliation, validation, and error handling after Extracting data into
SQL Server.
 Worked on SSIS packages and DTS Import/Export for transferring data from Oracle and text-format data to SQL Server.
 Created SSRS reports using Report Parameters, Drop-Down Parameters, Multi-Valued Parameters.
 Generated and formatted reports using global variables, expressions, and functions.
 Created reports with Analysis Services Cube as the data source using SSRS.
 Created performance dashboards in Tableau, Excel, and PowerPoint for key stakeholders.
 Developed functional specifications and testing requirements using Test Director to conform to user
needs.

Environment: SQL Server 2010, SQL Query Analyzer, MS Access, MS Excel, Visual Studio 2010.

Education
 Bachelor's in IT from Anna University - 2016
 Master's in CS from Texas A&M University - 2022
