
AKASH VARMA

Data Engineer
Email: akashvarma7439@gmail.com
LinkedIn: linkedin.com/in/akash-varma7
Phone: (513) 760-1004
Professional Summary:
 Accomplished data engineer with 8 years of experience, adept at leveraging a comprehensive
suite of tools and technologies to design, develop, and maintain sophisticated data solutions.
 Proficient in ETL (Extract, Transform, Load) processes, utilizing tools such as Informatica PowerCenter, Talend, or SSIS to ensure efficient data integration across diverse platforms.
 Skilled in working with big data frameworks and technologies, including Hadoop, Spark, and
tools like Apache Hive, Apache Pig, or Apache Kafka, to enable robust data processing and
analysis.
 Experienced in cloud-based data engineering, leveraging tools such as AWS Glue, Azure Data
Factory, Dataflow to build scalable and resilient data pipelines.
 Strong understanding of data warehousing concepts, utilizing tools like Amazon Redshift and Snowflake to design and optimize data storage and retrieval systems.
 Strong understanding of Azure Stream Analytics, allowing real-time data processing and
analytics on streaming data sources.
 Proficient in SQL-based querying and scripting languages, including Oracle, SQL Server, or
PostgreSQL, to extract, manipulate, and analyze data from relational databases.
 Skilled in data modeling and schema design using tools like ERwin, ER/Studio, or Lucidchart
to ensure data consistency, integrity, and optimal performance.
 Expertise in data visualization tools such as Tableau, Power BI, or QlikView, enabling the
creation of interactive dashboards and reports to deliver actionable insights.
 Experienced in working with distributed data processing frameworks like Apache Spark or Apache Flink, harnessing their capabilities for large-scale data processing and real-time analytics.
 Skilled in stream processing tools like Apache Kafka and AWS Kinesis, enabling real-time data ingestion, transformation, and event-driven architectures.
 Proficient in developing data pipelines and ETL processes on Azure, ensuring efficient and
reliable extraction, transformation, and loading of data from diverse sources
 Strong understanding of data governance principles and practices, employing tools like Collibra, Alation, or Informatica Axon to ensure data quality, lineage, and regulatory compliance.
 Proficient in data cataloging tools such as Alation, Collibra Catalog, or Informatica Enterprise
Data Catalog, facilitating data discovery and fostering collaboration across teams.
 Experienced in data quality management tools like Talend Data Quality, Informatica Data
Quality, or Trifacta Wrangler, ensuring data accuracy, consistency, and completeness.
 Experienced in data integration and hybrid cloud solutions, employing Azure Stack and Azure
Data Gateway for seamless data movement between on-premises and cloud environments.
 Skilled in data preparation and wrangling tools such as Trifacta Wrangler, Alteryx, or Dataiku DSS, simplifying the process of transforming raw data into analysis-ready formats.
 Proficient in data orchestration and workflow management tools like Apache Airflow, Luigi, or Oozie, automating and scheduling complex data processing tasks and dependencies.
 Experienced in data replication and synchronization tools like Attunity, GoldenGate, or AWS
Database Migration Service, enabling real-time data replication across heterogeneous systems.
 Skilled in data exploration and discovery tools like Apache Zeppelin, Jupyter Notebook, or Databricks, facilitating interactive data analysis and collaboration among data scientists and analysts.
 Skilled in data compression and optimization techniques using tools like Parquet, ORC, or
Avro, reducing storage costs and improving query performance in big data environments.
 Experienced in implementing data security and compliance measures on Azure, including
encryption, access controls, and audit trails, to protect sensitive data and meet regulatory
requirements.
 Experienced in data masking and anonymization tools such as Informatica Persistent Data
Masking, Delphix, or IBM Infosphere Optim, safeguarding sensitive data during development
and testing.
 Skilled in data profiling and data quality assessment tools like Talend Data Stewardship,
Trillium, or IBM InfoSphere QualityStage, identifying data anomalies and improving data
integrity.
 Proficient in data versioning and lineage tools like Apache Atlas, Collibra Data Lineage, or Informatica Enterprise Data Catalog, providing visibility into data changes and dependencies.
 Experienced in data integration and synchronization tools like Oracle Data Integrator,
Informatica Power Exchange, or Talend Data Integration, facilitating seamless data movement
between systems.
 Skilled in real-time analytics platforms like Apache Druid and Amazon Athena, enabling ad-hoc querying and exploration of large datasets with minimal latency.
 Proven track record of successfully implementing security and compliance measures on Azure,
ensuring data protection and adherence to regulatory requirements.
 Experienced in data archival and lifecycle management tools like Informatica Data Archive,
IBM InfoSphere Optim, or AWS Glacier, optimizing storage costs and compliance requirements.
 Skilled in data profiling and metadata management tools like Informatica Metadata Manager, Collibra Data Governance, or Talend Data Catalog, facilitating data understanding and lineage documentation.
TECHNICAL SKILLS:
Hadoop Components / Big Data: HDFS, MapReduce, Spark, Airflow, YARN, HBase, Hive, Pig, Flume, Sqoop, Kafka, Oozie, Zookeeper, Spark SQL
Languages: SQL, PL/SQL, Python, Java, Scala, C, HTML, Unix/Linux shell scripting
Cloud Platforms: AWS (Amazon Web Services), Microsoft Azure
ETL Tools: Informatica PowerCenter, SSIS, Talend
Reporting Tools: Power BI, SSRS, Tableau
Tracking Tool: JIRA
MS Office Package: Microsoft Office (Word, Excel, PowerPoint, Visio, Project)
Databases: MySQL, Oracle, Redshift, PostgreSQL
Operating Systems: Windows, Linux, Unix
Version Control: Bitbucket, GitHub

PROFESSIONAL EXPERIENCE:

Client: DTCC Telecommunications, New York, NY. Dec 2020 - Present


Role: Senior Data Engineer
Responsibilities:
 Involved in testing databases using complex SQL scripts and handling performance issues effectively.
 Designed, developed, and maintained ETL pipelines in Apache Airflow to extract, transform, and load data from various sources into a centralized data warehouse for a large-scale data warehousing project (see the DAG sketch after this section).
 Led the implementation of Azure Stack for multiple clients, including infrastructure design,
configuration, and migration of workloads to the hybrid cloud platform.
 Developed a RESTful API using Scala for tracking open-source projects in GitHub and computing in-process metrics for those projects.
 Analyzed existing data flows and created high-level/low-level technical design documents for business stakeholders, confirming that the technical design aligns with business requirements.
 Worked on SQL tools like TOAD and SQL Developer to run SQL Queries and validate the data.
 Involved in onsite and offshore coordination to ensure timely deliverables.
 Developed and maintained data models using Power BI's data modeling tools, such as Power Query and DAX, to ensure the accuracy and consistency of data used in dashboards and reports.
 Created Azure Resource group and provided role-based access control to users using Azure
Management APIs.
 Extensively worked on data modeling, including dimensional data modeling, star/snowflake schemas, fact and dimension tables, and physical and logical data modeling.
 Designed the front end and back end of the application using Python on the Django web framework; developed customer-facing features and applications using Python and Django with test-driven development and pair programming.
 Involved in designing the snowflake schema for the data warehouse and ODS architecture using tools such as Erwin Data Modeler.
 Created and configured the Virtual Networks and subnets using Azure CLI.
 Developed data models and data migration strategies utilizing concepts of snowflake schema.
 Extensively worked on Informatica PowerCenter mappings, mapping parameters, workflows, variables, and session parameters.
 Leveraged data visualization tools such as Tableau and Power BI to create insightful dashboards and reports for the telecommunication client, facilitating data-driven decision-making and business intelligence.
 Implemented data quality and governance processes for the telecommunication client using tools
like Collibra or Informatica Data Quality, ensuring data accuracy, consistency, and compliance
with regulatory standards in the telecommunications industry.
 Created and deployed Spark jobs in different environments, loading data into Cassandra (NoSQL), Hive, and HDFS, and secured the data by implementing encryption.
 Scheduled Informatica Jobs through Autosys scheduling tool.
 Imported data from Excel documents into SAS datasets using the Dynamic Data Exchange (DDE) utility.
 Proficient in writing PySpark scripts in Python, and in integrating PySpark applications with other
Python libraries and frameworks such as NumPy, Pandas, and Flask.
Environment: SQL, Apache Airflow, Tableau, Collibra, Azure, Scala, Power BI, Erwin Data Modeler, Python, T-SQL, Spark SQL, Informatica, SAS, PySpark, NumPy, Pandas, Flask, NoSQL, Hive, HDFS.
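
The Airflow ETL work above can be illustrated with a minimal DAG sketch. This is only an illustration under assumptions: the DAG id, task names, and the extract/transform/load callables are hypothetical placeholders, not the client's actual pipeline.

# Minimal sketch of an Airflow ETL DAG (hypothetical dag_id, tasks, and callables).
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract(**context):
    """Pull raw records from a source system (placeholder logic)."""
    return [{"id": 1, "amount": 100.0}]


def transform(ti, **context):
    """Apply a simple transformation to the extracted records."""
    rows = ti.xcom_pull(task_ids="extract")
    return [{**row, "amount_usd": row["amount"]} for row in rows]


def load(ti, **context):
    """Load transformed records into the warehouse (placeholder logic)."""
    rows = ti.xcom_pull(task_ids="transform")
    print(f"Loading {len(rows)} rows into the warehouse")


with DAG(
    dag_id="daily_warehouse_load",      # hypothetical name
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    load_task = PythonOperator(task_id="load", python_callable=load)

    extract_task >> transform_task >> load_task

In a real pipeline each callable would be replaced by connections to the actual source systems and warehouse; the linear extract >> transform >> load dependency shown here is the simplest shape such a DAG takes.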

Client: Neilson Dairy, Georgetown, ON. Dec 2018 - Nov 2020


Role: Senior Data Engineer
Responsibilities:
 Worked as an agile team member to develop, support, maintain, and implement a complex billing project module.
 Involved in building data pipelines to ingest and transform data using Spark and load the output into multiple targets such as Hive and HBase.
 Worked with complex data types (array, map, and struct) in Hive.
 Conducted performance tuning and optimization of AWS services to enhance data processing
speed, scalability, and cost efficiency for financial data workloads.
 Performed data validation and reconciliation on raw data in the data lake using Spark.
 Leveraged R and Python libraries, such as pandas, NumPy, or scikit-learn, to perform advanced
data analysis, modeling, and predictive analytics for financial insights and decision-making.
 Collaborated with AWS Solution Architects and account teams to design and implement
solutions that align with the financial client's strategic goals and requirements.
 Performed tasks such as writing scripts, calling APIs, and writing SQL queries.
 Scheduled Spark jobs on the cluster using Airflow.
 Involved in performance tuning of Spark applications, setting the right level of parallelism and tuning memory.
 Developed and maintained monitoring and alerting systems on AWS using services like Amazon
CloudWatch or AWS CloudTrail, ensuring proactive identification and resolution of issues.
 Implemented data extraction processes from various sources, including relational databases,
APIs, and flat files, using SQL, Python, or ETL tools like Informatica or Talend.
 Involved in loading data from UNIX file system to HDFS.
 Used Spark optimization techniques such as data serialization and broadcasting (see the sketch after this section).
 Used Spark's in-memory computing capabilities with Scala to perform advanced procedures such as text analytics and processing.
 Collaborated with cross-functional teams, including data scientists and business analysts, to
understand data requirements and translate them into efficient data engineering solutions using
R, Python, SQL, and ETL tools.
 Integrated AWS services with third-party tools and APIs to facilitate data integration and
automation processes for the financial client.
 Responsible for creating Jenkins pipeline for deployment using Ansible.
 Identified opportunities to improve the quality of data, through process and system
improvements.

Environment: Hive, HBase, Spark, UNIX, SQL Server, Ansible, MapReduce, RESTful services, Maven, Git, JIRA, ETL, AWS, R, Python, Scala, NumPy, Informatica, Talend, APIs.
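
The Spark serialization and broadcasting optimizations mentioned above can be sketched as follows. This is a minimal sketch, not the client's actual job: the paths, table name, and join column are hypothetical, and Kryo is shown only as one common serialization choice.

# Sketch of Spark tuning with Kryo serialization and a broadcast join
# (hypothetical paths, table, and column names).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("billing_enrichment")
    # Kryo gives faster, more compact serialization for shuffled/cached data.
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .enableHiveSupport()
    .getOrCreate()
)

# Large fact data and a small reference table (placeholder locations).
usage = spark.read.parquet("/datalake/raw/usage")
plans = spark.read.parquet("/datalake/reference/plans")

# Broadcasting the small table to every executor avoids shuffling the large one.
enriched = usage.join(F.broadcast(plans), on="plan_id", how="left")

enriched.write.mode("overwrite").saveAsTable("billing.usage_enriched")

Broadcasting is worthwhile only when the reference table comfortably fits in executor memory; otherwise a regular shuffle join is the safer default.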

Client: Mayo Clinic, India. Oct 2016 - Aug 2018


Role: Senior Data Engineer
Responsibilities:
 Involved in building data pipelines that extract, classify, merge and deliver new insights on the
data.
 Advanced knowledge in performance troubleshooting and tuning of Azure services such as HDInsight clusters, ADF, Databricks, and networking.
 Designed and implemented ad-hoc data pulls from on-premises sources into Azure SQL Data Warehouse using Azure Data Factory.
 Developed Spark jobs using PySpark and Spark SQL for data extraction, transformation, and aggregation across multiple file formats (see the sketch after this section).
 Optimized Hive queries using best practices and appropriate parameters, leveraging Hadoop, Python, and PySpark.
 Implemented data governance policies and procedures to ensure data privacy, security, and
compliance with healthcare regulations, such as HIPAA or GDPR, on Azure.
 Developed a framework for converting existing PowerCenter mappings into PySpark (Python and Spark) jobs.
 Worked on moving data from Blob storage containers to Azure Synapse using ADF.
 Designed and built ETL pipelines to automate ingestion of structured and unstructured data.
 Worked with enterprise data support teams to install Hadoop updates, patches, and version upgrades as required, and fixed problems that arose after the upgrades.
 Knowledge in performance troubleshooting and tuning Hadoop clusters.
 Queried and analyzed data from Cosmos DB for quick searching, sorting, and grouping through CQL.
 Worked on Python and shell scripting to automate and schedule workflows to run on Azure.
 Hands-on experience in setting up Apache Hadoop and Cloudera CDH clusters on Linux distributions.
 Responsible for implementing monitoring solutions in Ansible, Terraform, Docker, and
Jenkins.
 Developed reporting applications for generating day-to-day business reports.
Environment: Spark, Azure, Databricks, SQL, Java, Python, Scala, Linux, Maven, Cassandra, Docker, MapReduce, PySpark, Ansible, Terraform, JIRA, ETL, HBase, Git, Hive.
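
A minimal sketch of the kind of PySpark/Spark SQL job described above, reading more than one file format and aggregating the combined result. The ADLS container paths and column names are hypothetical, used only to illustrate the pattern.

# Sketch of a PySpark job combining CSV and JSON inputs and aggregating with Spark SQL
# (hypothetical ADLS paths and column names).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("claims_aggregation").getOrCreate()

csv_df = spark.read.option("header", True).csv("abfss://raw@storageacct.dfs.core.windows.net/claims_csv/")
json_df = spark.read.json("abfss://raw@storageacct.dfs.core.windows.net/claims_json/")

# Align both sources on a common schema before unioning them.
common_cols = ["claim_id", "provider_id", "amount"]
claims = (
    csv_df.select(*common_cols).withColumn("amount", F.col("amount").cast("double"))
    .unionByName(json_df.select(*common_cols).withColumn("amount", F.col("amount").cast("double")))
)
claims.createOrReplaceTempView("claims")

# Aggregate with Spark SQL over the unified view.
summary = spark.sql("""
    SELECT provider_id, COUNT(*) AS claim_count, SUM(amount) AS total_amount
    FROM claims
    GROUP BY provider_id
""")

summary.write.mode("overwrite").parquet("abfss://curated@storageacct.dfs.core.windows.net/claims_summary/")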

Client: Thomson Reuters, India. Jun 2015 - Sep 2016


Role: Senior Data Engineer
Responsibilities:
 Imported and exported data into HDFS and Hive using Sqoop.
 Experienced in running Hadoop streaming jobs to process terabytes of XML data using MapReduce programs.
 Developed and maintained data backup and disaster recovery mechanisms on AWS, ensuring
data availability and business continuity.
 Involved in loading data from the UNIX file system into the Hadoop Distributed File System (HDFS).
 Installed and configured Hive and wrote Hive UDFs.
 Automated all jobs for pulling data from the FTP server and loading it into Hive tables using Oozie workflows.
 Wrote data to both non-partitioned and partitioned Parquet tables, adding data dynamically to partitioned tables using Spark (see the sketch after this section).
 Monitored and troubleshot data processing workflows and infrastructure on AWS, identifying and resolving issues to minimize downtime and maximize data availability.
 Wrote user-defined functions (UDFs) for specialized functionality in Spark.
 Used Sqoop export functionality and scheduled the jobs on a daily basis with shell scripting in Oozie.
 Stayed updated with the latest AWS technologies and best practices through self-learning, online
courses, and participation in AWS user groups or forums.
 Worked with Sqoop jobs to import data from RDBMS sources and applied various optimization techniques to tune Hive and Sqoop.
 Used Sqoop import functionality to load historical data from relational database systems into the Hadoop Distributed File System (HDFS).
Environment: Hadoop, MapReduce, HDFS, Hive, AWS, Java, R, Sqoop, Spark.
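
The partitioned Parquet loading mentioned above can be sketched as follows. The staging path, partition column, and output path are hypothetical, and dynamic partition overwrite is shown as one common way to add new data to an existing partitioned table without rewriting it entirely.

# Sketch of writing a batch into a partitioned Parquet table with dynamic partition overwrite
# (hypothetical paths and partition column).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("parquet_partition_load").getOrCreate()

# Overwrite only the partitions present in this batch instead of the whole table.
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

batch = spark.read.json("/staging/events/")

(batch
 .write
 .mode("overwrite")
 .partitionBy("event_date")
 .parquet("/warehouse/events_parquet/"))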

Education: Bachelor of Technology in Computer Science, JNTUH College of Engineering, Hyderabad, India (Aug 2012 - Apr 2016).
