
ABHINAY VARMA PINNAMARAJU

Data Engineer
E-mail: pinnamaraju0116@gmail.com | Contact: 8574965480

Professional Summary:
8+ years of IT experience as a Data Engineer/Analyst, with analytical programming using SQL, Python, Snowflake, and AWS.
Good working knowledge of multi-tiered distributed environments and a solid understanding of Software Development Lifecycle (SDLC) methodologies, including Agile and Waterfall.
Ability to develop reliable, maintainable, efficient code in SQL, Hive, and Python.
 Experience in importing and exporting data using Sqoop from HDFS to Relational Database Management
Systems (RDBMS) and from RDBMS to HDFS.
 Strong knowledge of various data warehousing methodologies and data modeling concepts.
Experience in the development of ETL processes and frameworks, such as Matillion, for large-scale, complex datasets.
Experience with application development on Linux using Python, RDBMS, NoSQL, and ETL solutions.
Experience designing and operating very large data warehouses.
Experience working with Google Cloud Platform technologies such as BigQuery, Dataflow, Dataproc, Pub/Sub, and Airflow.
 Good knowledge in BI and data visualization tools such as Tableau
 Expertise in writing Hadoop Jobs to analyze data using MapReduce, Hive, Pig.
Knowledge of writing AWS CloudFormation templates and deploying AWS resources.
Experience executing data projects with a focus on data warehouses, data lakes, and data marts.
Designed and built data processing pipelines using a range of tools and frameworks.
Understanding of data transformation and translation requirements and of which tools to leverage for each task.
Understanding of data pipelines and modern approaches to automating them using cloud-based techniques.
Good understanding of and hands-on experience in setting up and maintaining NoSQL databases such as MongoDB and HBase.
Familiarity with new advances in the data engineering space such as EMR and NoSQL technologies like DynamoDB.
Proficient with Python packages such as Pandas, NumPy, Matplotlib, Beautiful Soup, and PySpark.
Hands-on experience working with Continuous Integration and Deployment (CI/CD).
Defined user stories and drove the Agile board in JIRA during project execution; participated in sprint demos and retrospectives.
 Experience in using various version control systems like Git.
 Strong analytical skills with the ability to collect, organize and analyze large amounts of information with
attention to detail and accuracy.
Possess good interpersonal, analytical, and presentation skills, with the ability to work in both self-managed and team environments.

Technical Skills:

Programming Languages: Python, SQL, PHP, C++, Shell Scripting
SDLC Methodologies: Agile/SCRUM, Waterfall
Operating Systems: Windows, Linux, Unix
Python Libraries: PySpark, Pandas, Beautiful Soup, Jinja2, NumPy, SciPy, Matplotlib, unittest
Big Data Tools: Hadoop 3.3, Hive 3.2.1, Kafka 2.8, Scala, MapReduce, Sqoop
Cloud Tools: Azure and AWS (S3, RDS, DynamoDB, EMR, Redshift, Glue)
Databases: MS SQL Server, MySQL, PostgreSQL, Oracle, MongoDB, SQLite, DynamoDB
Version Controls: Git, GitHub, Bitbucket
Other Tools: Snowflake, Databricks, Hive, Spark, Matillion, ETL, JIRA, Docker, MS Excel, Data Pipelines, Data Modeling, Data Lakes

Education Details:
Bachelor of Technology in Computer Science, JNTU Hyderabad, 2014

Professional Experience:

Client: AT&T Apr 2021 – Present


Role: Big Data Engineer
Responsibilities:

Analyzed data requirements for the Project Marketing Data Hub to build data load jobs.
Led development of high-volume, low-latency data solutions to support a variety of analytics use cases in an efficient and scalable manner.
Expertise in big data architectures such as Hadoop distributions (Azure, Hortonworks, Cloudera), MongoDB, and NoSQL.
Shared technical ownership of the data platform roadmap and its successful delivery, including sources, flows, capabilities, and performance.
Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics).
 Experience with Data Lake Infrastructure, Data Warehousing, and Data Analytics tools
 Proven track record of building and delivering large, highly available, enterprise-grade data systems and
solutions.
Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure SQL DW) and processed the data in Azure Databricks.
 Used a broad range of AWS technologies (e.g. EC2, EMR, S3, Lake Formation, Redshift, VPC, Glacier, IAM,
CloudWatch, SQS, Lambda, CloudTrail, Systems Manager, KMS, Kinesis Streams)
 Created and maintained optimal data pipeline architecture, incorporating data wrangling and Extract-
Transform-Load (ETL) flows.
 Assembled large, complex data sets to meet analytical requirements - analytics tables, feature-engineering etc.
Built the infrastructure required for optimal, automated extraction, transformation, and loading of data from a wide variety of data sources using SQL and other 'big data' technologies such as Databricks.
Built automated analytics tools that utilize the data pipeline to derive actionable insights.
Implemented data pipelines for big data processing using Spark transformations and the Python API on clusters in AWS.
Created pipelines in Azure Data Factory (ADF) using linked services, datasets, and pipelines to extract, transform, and load data to and from sources such as Azure SQL, Blob Storage, Azure SQL DW, and a write-back tool.

 Identified, designed, and implemented internal process improvements: automating manual processes,
optimizing data delivery, re-designing infrastructure for greater scalability, etc.
 Worked closely with team members, stakeholders, and solution architects.
Proven experience designing, building, and operating enterprise-grade data streaming use cases leveraging Kafka and Spark Streaming.
 Analyzed data as it relates to business needs and use cases for applying predictive algorithms and machine
learning.
Used the data lake to combine customer data from a CRM platform with social media analytics and a marketing platform that includes buying history.
Used the data lake to help R&D teams test hypotheses, refine assumptions, and assess results.
Collaborated with team members and the tech lead to define, review, and finalize designs.
Built data analysis dashboards and reports based on business use cases.
Utilized REST APIs with Python to ingest data into BigQuery; staged PySpark jobs using gsutil and executed them on a Dataproc cluster (a sketch of the REST-to-BigQuery load appears after the Environment line for this role).
Supported existing dashboards built by the business and assisted business users in building their own dashboards.
Provided operational support to assist in data setup for stores and incident management.
 Participated in all SCRUM ceremonies and team discussions
Developed on database platforms including Oracle, Cassandra, and Hadoop.
Used Apache Spark-based frameworks for complex data processing.
Helped build an understanding of client business processes, objectives, and solution requirements; participated in project work groups with subject matter experts and stakeholders to understand data-specific needs.
Developed and implemented dependable automated build and deploy processes in the CI/CD environment.
Managed data by extracting and transforming 100+ tables from Snowflake on AWS; conducted ad-hoc analyses using SQL and Python; maintained Python code with Git.
Mounted Azure Data Lake containers to Databricks and created service principals, access keys, and tokens to access the Azure Data Lake Gen2 storage account (see the sketch after this list).
Built strong and lasting relationships with colleagues and business stakeholders.
Loaded data to AWS using Databricks clusters and processed it as per business requirements.
Created models on data loaded to the cloud.
Proposed architecture patterns, designs, and industry best practices for data on the cloud.
Hands-on experience with cloud tools and technologies, especially AWS, Snowflake, and Databricks.
Exposure to AWS technologies like Lambda, S3, etc.
Worked with groups to understand business needs, the future roadmap, and the technology landscape, and to arrive at a cost-optimal data-on-cloud solution.
Imported raw data such as CSV and JSON files into Azure Data Lake Gen2, performing data ingestion by writing PySpark to extract flat files.
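
A minimal sketch of the Databricks mount step described above, assuming a hypothetical storage account, container, secret scope, and service principal (all names below are placeholders, not the actual project values); it runs inside a Databricks notebook where dbutils is available:

    # Hedged sketch: mount an ADLS Gen2 container in Databricks via a service principal.
    # Storage account, container, tenant/client IDs, and secret scope are placeholders.
    configs = {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id": "<application-client-id>",
        "fs.azure.account.oauth2.client.secret": dbutils.secrets.get("kv-scope", "sp-secret"),
        "fs.azure.account.oauth2.client.endpoint":
            "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
    }
    dbutils.fs.mount(
        source="abfss://raw@<storageaccount>.dfs.core.windows.net/",
        mount_point="/mnt/raw",
        extra_configs=configs,
    )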

Environment: Hadoop 3.0, MapReduce, Hive 3.0, Azure Data Lake, Azure Storage, Azure SQL, Azure DW, Power BI, Agile, HBase 1.2, PySpark, NoSQL, AWS, Kafka, Pig 0.17, HDFS, Java 8, Hortonworks, Spark, PL/SQL, Python
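
A hedged sketch of the REST-to-BigQuery ingestion referenced in the bullets above; the endpoint URL, project, dataset, and table names are illustrative placeholders, not the actual sources:

    # Hedged sketch: pull records from a REST API and append them to a BigQuery table.
    import requests
    from google.cloud import bigquery

    def load_api_rows(url: str, table_id: str) -> None:
        rows = requests.get(url, timeout=30).json()   # expects a JSON array of records
        client = bigquery.Client()
        job = client.load_table_from_json(
            rows,
            table_id,                                  # e.g. "my-project.marketing.events"
            job_config=bigquery.LoadJobConfig(write_disposition="WRITE_APPEND"),
        )
        job.result()                                   # block until the load job finishes

    load_api_rows("https://api.example.com/v1/events", "my-project.marketing.events")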

Client: Goldman Sachs May 2019 – March 2021


Role: Data Engineer
Responsibilities:

Worked as a Data Engineer with the analysis and management teams and supported them based on their requirements.
Constructed data transformations by writing PySpark in Databricks to rename, drop, clean, validate, and reformat data into Parquet files and load them into an Azure Blob Storage container (see the sketch after this list).
Worked on Agile and testing methodologies, resource management, and scheduling of tasks.
 Worked on all phases of data warehouse development lifecycle, from gathering requirements to testing,
implementation, and support.
 Worked with Amazon Web Services (AWS) for improved efficiency of storage and fast access.
 Worked on Data migration project from Teradata to Snowflake.
Developed Azure linked services to construct connections between on-premises Oracle Database, SQL Server, and Apache Hive and Azure datasets in the cloud.
Extracted and analyzed 800k+ records using SQL queries from Snowflake on Azure and Azure SQL DB.
 Developed data pipeline using Sqoop to ingest cargo data and customer histories into HDFS for analysis.
 Worked on AWS Redshift and RDS for implementing models and data on RDS and Redshift.
Specified nodes and performed data analysis queries on Amazon Redshift clusters on AWS.
Involved in developing ETL pipelines in and out of the data warehouse using a combination of Python and Snowflake's SnowSQL (sketched after the Environment line for this role).
 Constructed connections with Azure Blob Storage, built end-to-end data ingestion to process raw files in
Databricks and Azure Data Factory.
 Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in
partitioned tables in the EDW.
 Designed the Exploratory Data Analysis in Python using Seaborn and Matplotlib to evaluate data qualities.
Added support for Amazon S3 and RDS to host static/media files and the database in the Amazon cloud.
 Used Oozie to automate data loading into the HDFS and PIG to pre-process the data.
 Called Data Rules and Transform Rules functions using Informatica Stored Procedure Transformation.
 Implemented data models, database designs, data access, table maintenance and code changes together with our
development team.
Implemented Spark using Python and Spark SQL for faster testing and processing of data; responsible for managing data from different sources.
Joined various tables in Cassandra using Spark and Scala and ran analytics on top of them.
Analyzed data from multiple data sources and developed a process to integrate the data into a single, consistent view.
 Performed feature engineering and data analysis using Pandas, NumPy, Seaborn, Matplotlib in Python and
built data pipelines and various machine learning algorithms.
Performed data analytics on the data lake using PySpark on the Databricks platform.
Day-to-day responsibilities included developing ETL pipelines in and out of the data warehouse and developing reports using advanced SQL queries in Snowflake.
Processed and loaded bounded and unbounded data from Google Pub/Sub topics to BigQuery using Cloud Dataflow with Python.
Involved in interactive BI dashboards and Tableau Publisher.
 Created multiple dashboards in tableau for multiple business needs.
Used Excel VLOOKUPs to examine customer data and created pivot tables to easily access and validate data.
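
A minimal PySpark sketch of the rename/drop/clean/reformat-to-Parquet flow described in the bullets above; the input path, column names, and output container are assumptions for illustration only:

    # Hedged sketch: clean raw records with PySpark and write Parquet to a mounted Blob container.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("clean_raw").getOrCreate()

    df = (
        spark.read.option("header", True).csv("/mnt/raw/customers/")
        .withColumnRenamed("cust_id", "customer_id")      # rename
        .drop("unused_col")                               # drop
        .filter(F.col("customer_id").isNotNull())         # validate
        .withColumn("load_date", F.current_date())        # reformat / enrich
    )

    df.write.mode("overwrite").parquet("/mnt/curated/customers/")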

Environment: Agile, Snowflake, AWS, Spark, HDFS, Sqoop, Oozie, Scala, Cassandra, Pig, HBase, Tableau, Excel, PySpark, Databricks, Informatica.
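
A hedged sketch of the Python-driven Snowflake ETL step mentioned above, using the Snowflake Python connector; the credentials, warehouse, stage, and table names are placeholders:

    # Hedged sketch: run a staged load and a transform step in Snowflake from Python.
    import snowflake.connector

    conn = snowflake.connector.connect(
        user="<user>", password="<password>", account="<account>",
        warehouse="ETL_WH", database="ANALYTICS", schema="STAGING",
    )
    try:
        cur = conn.cursor()
        # Load raw files from a stage into a staging table.
        cur.execute("COPY INTO STAGING.ORDERS FROM @RAW_STAGE/orders/ FILE_FORMAT = (TYPE = CSV)")
        # Push transformed rows into the reporting table.
        cur.execute("INSERT INTO ANALYTICS.REPORTING.ORDERS SELECT * FROM STAGING.ORDERS")
    finally:
        conn.close()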

Client: Health iPASS Inc. - Hyderabad, IND Aug 2016 – Apr 2019
Role: Big Data Engineer
Responsibilities:

• Wrote Hive queries for data analysis to meet business requirements.
• Migrated an existing on-premises application to AWS.
• Developed PIG Latin scripts to extract the data from the web server output files and to load into HDFS
• Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting
• Created many Spark UDFs and UDAFs in Hive for functions that were not pre-existing in Hive and Spark SQL (see the sketch after this list).
• Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
• Implemented performance optimization techniques such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
• Good knowledge of Spark platform parameters such as memory, cores, and executors.
• Provided concurrent access to Hive tables with shared and exclusive locking by using the ZooKeeper implementation in the cluster.
• Used Sqoop to import and export data between Oracle/MySQL and HDFS for analysis.
• Migrated Existing MapReduce programs to Spark Models using Python.
• Migrated data from the data lake (Hive) into an S3 bucket.
• Performed data validation between the data present in the data lake and the S3 bucket.
• Used the Spark DataFrame API on the Cloudera platform to perform analytics on Hive data.
• Designed batch processing jobs using Apache Spark to increase speeds ten-fold compared to MapReduce jobs.
• Used Kafka for real-time data ingestion.
• Created different Kafka topics and read data from them.
• Involved in converting HQLs into Spark transformations using Spark RDDs with Python and Scala.
• Moved data from the S3 bucket to the Snowflake data warehouse for generating reports.
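
A minimal sketch of the UDF-plus-Spark-SQL pattern described above; the function, table, and column names are illustrative assumptions rather than the actual project schema:

    # Hedged sketch: register a Python UDF and run a former Hive query through Spark SQL.
    from pyspark.sql import SparkSession
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.appName("hive_to_spark").enableHiveSupport().getOrCreate()

    # UDF standing in for a function that was not available out of the box in Hive/Spark SQL.
    spark.udf.register("normalize_zip", lambda z: (z or "").strip()[:5], StringType())

    result = spark.sql("""
        SELECT customer_id, normalize_zip(zip_code) AS zip5, COUNT(*) AS shipments
        FROM warehouse.shipments
        GROUP BY customer_id, normalize_zip(zip_code)
    """)
    result.show()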

Client: Prominent IT Solutions - Hyderabad, IND Jun 2014 –July 2016


Role: Data Analyst
Responsibilities:
 Performed Data Analysis using SQL queries on source systems to identify data discrepancies and determine
data quality.
 Performed extensive Data Validation, Data Verification against Data Warehouse and performed debugging of
the SQL-Statements and stored procedures for business scenarios.
 Designed and developed Tableau dashboards using stack bars, bar graphs, scattered plots, and Gantt charts.
Familiar with DBMS table design, loading, and data modeling, with experience in SQL.
 Handled performance requirements for databases in OLTP and OLAP models.
Analyzed the data consuming the most resources and made changes to the back-end code using PL/SQL stored procedures and triggers.
Created actions, parameters, filters (local and global), and calculated sets for preparing dashboards and worksheets in Tableau.
Effectively used the data blending, quick filter, action, and hierarchy features in Tableau.
Imported and exported data files to and from SAS using PROC IMPORT and PROC EXPORT from Excel and various delimited text-based data files and converted them for analysis.
Supported the technical design process by participating in the analysis of technical application requirements.
 Created SQL queries to extract data and plot graphically to build Tableau Visualizations.

Developed business requirements and data analysis to build graphically rich dashboards and communicated the results in the form of a story.
Created ad hoc reports using Tableau and performed data modeling using R as well.
Performed data completeness, correctness, data transformation, and data quality testing using SQL (see the sketch after this list).
 Involved in designing Business Objects universes and creating reports.
Conducted design walkthrough sessions with the Business Intelligence team to ensure that reporting requirements were met for the business.
 Prepared complex T-SQL queries, views and stored procedures to load data into staging area.
 Wrote UNIX shell scripts to invoke all the stored procedures, parse the data and load into flat files.
Created reports analyzing a large-scale database utilizing Microsoft Excel analytics within a legacy system.
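
A hedged sketch of the kind of completeness/correctness checks described above, using Python with pandas and SQLAlchemy; the connection string and the staging/warehouse table names are placeholders only:

    # Hedged sketch: basic completeness and correctness checks between staging and warehouse tables.
    import pandas as pd
    from sqlalchemy import create_engine

    engine = create_engine("mssql+pyodbc://user:password@warehouse_dsn")  # placeholder DSN

    staged = pd.read_sql("SELECT COUNT(*) AS n FROM staging.orders", engine).iloc[0, 0]
    loaded = pd.read_sql("SELECT COUNT(*) AS n FROM dw.fact_orders", engine).iloc[0, 0]
    print("completeness ok" if staged == loaded else f"row-count mismatch: {staged} vs {loaded}")

    # Spot-check that a key measure reconciles after transformation.
    delta = pd.read_sql(
        "SELECT ABS((SELECT SUM(amount) FROM staging.orders) - "
        "(SELECT SUM(amount) FROM dw.fact_orders)) AS delta",
        engine,
    ).iloc[0, 0]
    print("amounts reconcile" if delta == 0 else f"amount delta: {delta}")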

Environment: SQL, PL/SQL, OLAP, OLTP, UNIX, MS Excel, T-SQL, Tableau.
