
Bhavith

Sr. Data Engineer

PROFESSIONAL SUMMARY:

● Overall 7+ years of experience in the IT industry, with expertise in Big Data/Hadoop development and in analysis, design, development, testing, documentation, deployment and integration using SQL, Python, Spark and Big Data technologies.
● Solid experience in Big Data Analytics using HDFS, Hive, Impala, Kafka, Pig, Sqoop, MapReduce, HBase,
Spark, Spark SQL, YARN, Spark Streaming, Zookeeper, Hue, Flume, Oozie.
● Develop dataset processes for data modeling and data mining; recommend ways to improve data reliability, efficiency and quality.
● Experience importing and exporting data with Sqoop between HDFS and relational database systems, and loading it into partitioned Hive tables.
● Hands-on use of Spark to compare its performance with Hive and SQL, and of Spark SQL to manipulate DataFrames.
● Experience in the design and development of an ingestion framework from multiple sources to Hadoop using the Spark framework with PySpark.
● Experience in developing MapReduce programs using Apache Hadoop to analyze big data as per requirements.
● Experience in installing, configuring and administering Hadoop clusters of major Hadoop distributions.
● Knowledge of the Amazon Web Services (AWS) Cloud Platform, including Glue, EC2, S3, VPC, ELB, IAM, DynamoDB, CloudFront, CloudWatch, Route 53, Elastic Beanstalk, Auto Scaling, Security Groups, EC2 Container Service (ECS), CodeCommit, CodePipeline, CodeBuild, CodeDeploy, Redshift, CloudFormation, CloudTrail, OpsWorks, Kinesis, SQS, SNS and SES.
● Experience working with Microsoft Azure: Spark, Azure Data Lake Gen2, Azure Databricks, Azure Data Factory, Spark Streaming, Event Hubs, IoT, Apache Druid, REST APIs and Azure SQL Data Warehouse.
● Experience with Snowflake cloning and Time Travel; in-depth knowledge of Snowflake database schemas and table structures.
● Experience in Data Analysis, Data Profiling, Data Integration, Migration, Data governance and Metadata
Management, Master Data Management and Configuration Management.
● Experience in developing data pipelines through the Kafka-Spark API (a brief sketch follows this summary).
● Extensively worked with the Teradata utilities FastExport and MultiLoad to export and load data to/from different source systems, including flat files.
● Proficiency in SQL across several dialects, including MySQL, PostgreSQL, Redshift, SQL Server, Oracle and Snowflake.
● Excellent understanding of Snowflake Internals and integration of Snowflake with other data
processing and reporting technologies.
● Proficient in data processing: collecting, aggregating and moving data from various sources using Apache Flume and Kafka.
● Expertise in Python, including user-defined functions (UDFs) for Hive written in Python.
● Experienced in developing and supporting Oracle SQL, PL/SQL and T-SQL queries.
● Good knowledge of data marts, OLAP and dimensional data modeling with the Ralph Kimball methodology (star schema and snowflake modeling for fact and dimension tables) using Analysis Services.
● Creative in developing elegant solutions to pipeline engineering challenges.
● Strong analytical skills with the ability to quickly understand a client's business needs; involved in business
meetings to gather requirements from business clients.
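
A minimal PySpark sketch of the Kafka-to-Spark pipeline pattern referenced above. It is a sketch only: the broker address, topic name, schema and paths are illustrative assumptions, not values from any specific project.

    # Kafka -> Spark Structured Streaming sketch (illustrative names and paths).
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import from_json, col
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType

    spark = SparkSession.builder.appName("kafka-ingest-sketch").getOrCreate()

    # Hypothetical event schema; adjust to the actual payload.
    schema = StructType([
        StructField("event_id", StringType()),
        StructField("amount", DoubleType()),
    ])

    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker:9092")  # assumed broker
           .option("subscribe", "events")                     # assumed topic
           .load())

    # Kafka delivers bytes; cast the value and parse the JSON payload.
    events = (raw.select(from_json(col("value").cast("string"), schema).alias("e"))
                 .select("e.*"))

    (events.writeStream
     .format("parquet")
     .option("path", "/data/events")               # assumed output path
     .option("checkpointLocation", "/chk/events")  # assumed checkpoint path
     .start())
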
TECHNICAL SKILLS:

Big Data Technologies: HDFS, YARN, Hive, MapReduce, Pig, HCatalog, Sqoop, Zookeeper, Kafka, Oozie, HBase, Flume, Airflow, CDP 7.x, NiFi

Databases: Oracle, MySQL, MS SQL Server, MongoDB, Snowflake

Streaming Technologies: Spark Streaming

Data Formats: JSON, Avro, ORC, CSV, Parquet and XML

Scripting Languages: Python, Windows PowerShell, Unix Shell Scripts

Operating Systems: Linux (RHEL/Ubuntu), Windows (XP/7/8/10), UNIX, macOS

Web Servers: WebLogic, WebSphere, Apache Tomcat

Web Services: SOAP, RESTful APIs

WORK EXPERIENCE:

Sparklight - Phoenix, AZ Jan 2020 – Present


Sr. Data Engineer

Responsibilities:
● Worked on building efficient ETL processes from various data sources to Azure data storage services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and on processing the data in Azure Databricks.
● Extensively worked on creating pipeline jobs, scheduling triggers and Mapping Data Flows using Azure Data Factory (V2), with Key Vault used to store credentials.
● Worked on Spark and Spark Streaming and created the DataFrames handled in Spark with PySpark.
● Implemented data quality checks in the ETL tool Talend; good knowledge of data warehousing.
● Worked with various file formats such as Delta, Avro, INI, YAML and JSON for serializing and deserializing data.
● Developed Spark applications by using Databricks for data processing from various streaming sources.
● Worked with different data storage options, including Azure Blob Storage and ADLS Gen1/Gen2.
● Used Azure Data Factory V2 to pause and resume Azure SQL Data Warehouse, and implemented Copy activities and custom Azure Data Factory pipeline activities.
● Worked on developing critical transformations using PySpark in Azure Databricks.
● Implemented various Spark optimization techniques such as data partitioning, cache/persist, broadcast joins, parallelism tuning (repartition and coalesce), API selection (Dataset, DataFrame, RDD), file format choice and ByKey operations (see the PySpark sketch after this list).
● Evaluated client needs and translated their business requirements into functional specifications, thereby onboarding them onto the Azure ecosystem.
● Implemented Spark scripts using Python and Spark SQL to read Hive tables into Spark for faster data processing.
● Migrated MapReduce jobs to Spark jobs to achieve better performance.
● Responsible for developing data pipelines using Azure Data Factory, Azure Databricks and PySpark, with Apache Kafka used to ingest data.
● Created pipelines to migrate data from SQL Server and Azure Data Lake to Snowflake.
● Knowledge of Spark architecture and components; efficient in working with Spark Core and Spark SQL. Designed and developed RDD seeds and Cascading, and streamed data into Spark Streaming using Kafka.
● Scheduled jobs in Azure Data Factory using triggers on a daily, weekly and monthly basis.
● Built and met project timelines and managed delivery commitments with proper communication to management.
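
A short PySpark sketch of the optimization patterns listed above (Spark SQL access to a Hive table, broadcast join, cache, coalesce). Table, column and path names are assumptions for illustration only, not project values.

    # Illustrative Spark optimization patterns (assumed table/column/path names).
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import broadcast

    spark = (SparkSession.builder
             .appName("spark-opt-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Read one partition of a (hypothetical) partitioned Hive table via Spark SQL.
    orders = spark.sql(
        "SELECT order_id, customer_id, amount FROM sales.orders WHERE ds = '2020-01-01'")

    # Small dimension table: broadcast it to avoid a shuffle join.
    customers = spark.table("sales.customers")
    enriched = orders.join(broadcast(customers), "customer_id")

    # Cache a DataFrame that several downstream aggregations reuse.
    enriched.cache()
    daily = enriched.groupBy("customer_id").sum("amount")

    # Coalesce to fewer partitions (no full shuffle) before writing out.
    daily.coalesce(8).write.mode("overwrite").parquet("/curated/daily_spend")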

Environment: Azure, Databricks, PySpark, Hive, MapReduce, Apache Kafka, Cassandra, Oozie, Spark, Spark SQL, Maven, Python, SQL, Linux, IntelliJ, YARN, Agile Methodology.

Young Living - Remote (Phoenix, AZ) Apr 2018 – Dec 2019


Data Engineer

Responsibilities:
● Designed and set up an enterprise data lake to support various use cases, including analytics, processing, storage and reporting of voluminous, rapidly changing data.
● Responsible for maintaining quality reference data in the source by performing operations such as cleaning and transformation and by ensuring integrity in a relational environment, working closely with the stakeholders and the solution architect.
● Performed end-to-end architecture and implementation assessments of various AWS services such as Amazon EMR, Redshift and S3.
● Transformed the data using AWS Glue dynamic frames with PySpark, cataloged the transformed data using crawlers, and scheduled the job and crawler using the Glue workflow feature (see the Glue sketch after this list).
● Allotted permissions, policies and roles to users and groups using AWS Identity and Access
Management (IAM).
● Used AWS EMR to transform and move large amounts of data into and out of other AWS data stores and databases, such as Amazon Simple Storage Service (Amazon S3) and Amazon DynamoDB.
● Used Spark SQL through its Scala and Python interfaces, which automatically convert RDDs of case classes into schema RDDs.
● Imported data from different sources such as HDFS and HBase into Spark RDDs and performed computations using PySpark to generate the output response.
● Created Lambda functions with Boto3 to deregister unused AMIs in all application regions to reduce the cost of EC2 resources (see the Boto3 sketch after this list).
● Imported and exported databases using SQL Server Integration Services (SSIS) and Data Transformation Services (DTS) packages.
● Developed a reusable framework, to be leveraged for future migrations, that automates extract and load (EL) from RDBMS systems to the data lake using Spark data sources and Hive data objects.
● Conducted data blending and data preparation using Alteryx and SQL for Tableau consumption, and published data sources to Tableau Server.
● Developed Kibana dashboards based on Logstash data and integrated different source and target systems into Elasticsearch for near-real-time log analysis and monitoring of end-to-end transactions.
● Implemented AWS Step Functions to automate and orchestrate Amazon SageMaker tasks such as publishing data to S3, training the ML model and deploying it for prediction.
● Integrated Apache Airflow with AWS to monitor multi-stage ML workflows with the tasks running on
Amazon SageMaker.
● Wrote Spark SQL scripts to optimize query performance.
● Responsible for handling different data formats such as Avro, Parquet and ORC.
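
A minimal AWS Glue job sketch in PySpark for the dynamic-frame flow described above; the catalog database, table name, column mapping and S3 path are assumptions for illustration.

    # Minimal AWS Glue job sketch (assumed catalog names and S3 path).
    import sys
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Read from the Glue Data Catalog (table populated by a crawler).
    dyf = glue_context.create_dynamic_frame.from_catalog(database="raw_db", table_name="orders")

    # Rename/cast columns with a mapping, then write Parquet back to S3.
    mapped = dyf.apply_mapping([("order_id", "string", "order_id", "string"),
                                ("amount", "string", "amount", "double")])
    glue_context.write_dynamic_frame.from_options(
        frame=mapped,
        connection_type="s3",
        connection_options={"path": "s3://curated-bucket/orders/"},
        format="parquet",
    )
    job.commit()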
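
A hedged Boto3 sketch of the AMI-cleanup Lambda idea; using an "unused=true" tag as the cleanup criterion and the region list are illustrative assumptions, not the actual selection logic.

    # Lambda handler sketch: deregister AMIs tagged as unused (illustrative criteria).
    import boto3

    REGIONS = ["us-east-1", "us-west-2"]  # assumed application regions

    def lambda_handler(event, context):
        deregistered = []
        for region in REGIONS:
            ec2 = boto3.client("ec2", region_name=region)
            # Owned images carrying a hypothetical "unused=true" tag.
            images = ec2.describe_images(
                Owners=["self"],
                Filters=[{"Name": "tag:unused", "Values": ["true"]}],
            )["Images"]
            for image in images:
                ec2.deregister_image(ImageId=image["ImageId"])
                deregistered.append(image["ImageId"])
        return {"deregistered": deregistered}
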
Environment: AWS EMR, S3, RDS, Redshift, Lambda, Boto3, DynamoDB, Amazon SageMaker, Apache Spark, HBase, Apache Kafka, Hive, Sqoop, MapReduce, Snowflake, Apache Pig, Python, SSRS, Tableau

American Airlines, Tempe, Arizona July 2015 – Mar 2018


Data Engineer

Responsibilities:
● Experience in creating Hive tables with partitioning and bucketing (see the PySpark sketch after this list).
● Created pipelines in ADF using linked services, datasets and pipelines to extract, transform and load data from different sources such as Azure SQL, Blob Storage and Azure SQL Data Warehouse, and to write data back.
● Developed Sqoop scripts to migrate data from Oracle to the big data environment.
● Involved in Analysis, Design and Implementation/translation of Business User requirements.
● Worked on the collection of large data sets using Python scripting, PySpark and Spark SQL.
● Worked on large sets of Structured and Unstructured data.
● Actively involved in designing and developing data ingestion, aggregation, and integration in the
Hadoop environment.
● Developed Sqoop scripts to import and export data from relational sources, and handled incremental loading of customer and transaction data by date.
● Extensively worked with Avro and Parquet files and converted data between the two formats. Parsed semi-structured JSON data and converted it to Parquet using DataFrames in Spark (see the sketch after this list).
● Performed data analysis and data profiling using complex SQL queries on various source systems
including Oracle 10g/11g and SQL Server 2012.
● Identified inconsistencies in data collected from different sources.
● Participated in requirement gathering and worked closely with the architect in designing and modeling.
● Designed object model, data model, tables, constraints, necessary stored procedures, functions,
triggers, and packages for Oracle Database.
● Wrote Spark applications for Data validation, cleansing, transformations and custom aggregations.
● Imported data from various sources into Spark RDD for processing.
● Developed custom aggregate functions using Spark SQL and performed interactive querying.
● Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode high availability, capacity planning and slot configuration.
● Developed Spark applications for the entire batch processing by using PySpark.
● Extracted, transformed and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL and U-SQL (Azure Data Lake Analytics); ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed it in Azure Databricks.
● Visualized the results using Tableau dashboards; the Python Seaborn library was used for data interpretation in deployment.
● Worked with business owners/stakeholders to assess Risk impact, provided solutions to business
owners.
● Experienced in determining trends and significant data relationships through analysis with advanced statistical methods.
● Took personal responsibility for meeting deadlines and delivering high-quality work, and created POCs to demonstrate new technologies including Jupyter Notebooks and PySpark.
● Strived to continually improve existing methodologies, processes, and deliverable templates.
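
A short PySpark sketch of the JSON-to-Parquet conversion and the partitioned, bucketed Hive table pattern mentioned above; the paths, database, table and column names are assumptions for illustration.

    # JSON -> Parquet with a partitioned, bucketed Hive table (assumed names/paths).
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("json-to-parquet-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Parse semi-structured JSON into a DataFrame (schema inferred here for brevity).
    df = spark.read.json("/landing/transactions/*.json")

    # Persist as a Parquet-backed Hive table, partitioned by date and bucketed by customer.
    (df.write
     .mode("overwrite")
     .partitionBy("txn_date")
     .bucketBy(8, "customer_id")
     .sortBy("customer_id")
     .format("parquet")
     .saveAsTable("analytics.transactions"))
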
Environment: Python, SQL server, Oracle, HDFS, HBase, Azure, MapReduce, Hive, Impala, Pig, Sqoop,
NoSQL, Tableau, RNN, LSTM, Unix/Linux, Spark, PySpark, Notebooks.

Acuvate, Hyderabad, India Jun 2013 – Jan 2014


Hadoop Developer

Responsibilities:
● Created and modified shell scripts for scheduling various data cleansing scripts and ETL load processes.
● Developed testing scripts in Python, prepared test procedures, analyzed test result data and suggested improvements to the system and software.
● Responsible for importing data to HDFS using Sqoop from different RDBMS servers and exporting data
using Sqoop to the RDBMS servers after aggregations for other ETL operations.
● Involved in functional testing, integration testing, regression testing, smoke testing and performance testing. Tested Hadoop MapReduce jobs developed in Python, Pig and Hive.
● Written and executed Test Cases and reviewed with Business & Development Teams.
● Worked on debugging, performance tuning and analyzing data using the Hadoop components Hive and Pig.
● Implemented a defect tracking process using the JIRA tool by assigning bugs to the development team.
● Automated regression testing with the Qute tool, reducing manual effort and increasing team productivity.

Environment: Hadoop, MapReduce, HDFS, Pig, HiveQL, MySQL, UNIX Shell Scripting, Java, SSIS, JSON, Hive, Sqoop.
