
Yamini B

Email: srikanth.kavuru@denkensolutions.com Phone: +1(949)889-3575


--------------------------------------------------------------------------------------------------------------------------------------------
PROFESSIONAL SUMMARY:
Having 7+ years of professional experience with Hadoop and 4+ years in AWS/Azure Data Engineering, Data Science, and Big Data implementation, utilizing PySpark for ingestion, storage, querying, processing, and analysis of big data.

 Experience building data lakes in Databricks, running jobs, and setting up end-to-end pipelines in the Databricks environment.

 Experience in setting up job clusters and separating environment-based config files for each job deployment in Databricks.

 Experience in setting up tags for different job clusters so they can be used for cost tracking and to eliminate dependencies on other teams.

 Expert in Delta Lake architecture, including establishing different zones in Databricks.

 Experience with the Azure data platform stack: Azure Data Lake, Data Factory, and Databricks.

 Practical experience with AWS technologies such as EC2, Lambda, EBS, EKS, ELB, VPC, IAM, Route53, Auto Scaling, Load Balancing, GuardDuty, AWS Shield, AWS Web Application Firewall (WAF), Network Access Control Lists (NACLs), S3, SES, SQS, SNS, AWS Glue, QuickSight, SageMaker, Kinesis, Redshift, RDS, DynamoDB, Datadog, and ElastiCache (Memcached & Redis).

 Extracted metadata from Amazon Redshift and the Elasticsearch engine on AWS using SQL queries to create reports.

 Designed and developed components/applications using Snowflake, Microsoft Azure, AWS Databricks, ADF, ADLS, Hive, Python, Spark SQL, and PySpark.

 Developed and Implemented Data Solutions utilizing Azure Services like Event Hub, Azure Data Factory, ADLS, Databricks, Azure web apps, Azure SQL DB instances. 

 Experience in creating Databricks workspaces and applying ACLs on Delta tables for the respective departments.

 Experience in optimizing Databricks clusters, setting up autoscaling policies, and measuring metrics using the Spark UI and Ganglia.

 Implemented AWS Lambda functions to drive real-time monitoring dashboards from system logs.

 Expertise in AWS Databricks.

 Experienced in running Spark jobs on AWS EMR, using EMR clusters and various EC2 instance types based on requirements.

 Developed PySpark scripts interacting with data sources such as AWS RDS, S3, and Kinesis and columnar/serialized file formats such as ORC, Parquet, and Avro (see the brief sketch after this list).

 Experience with AWS Multi-Factor Authentication (MFA) for RDP/SSO logon, working with teams to lock down security groups and build specific IAM profiles per group using recently released APIs for restricting resources within AWS depending on group or user.

 Configured Jenkins to build CI/CD pipelines that trigger automated builds, promote builds from one environment to another, run code analysis, and auto-version artifacts for various projects.

 Worked in a highly collaborative operations team to streamline the process of implementing security in the Confidential Azure cloud environment and introduced best practices for remediation.

 Hands on experience with Azure Data Lake, Azure Data Factory, Azure Blob and Azure Storage Explorer.

 Experience in using various Amazon Web Services (AWS) components such as EC2 for virtual servers; S3 and Glacier for object storage; and EBS, CloudFront, ElastiCache, and DynamoDB for storing data.

 Good experience with use-case development and software methodologies such as Agile and Waterfall.
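
Illustrative sketch for the PySpark bullet above; a minimal example of reading Parquet from S3 and writing ORC back, with hypothetical bucket names, paths, and columns (not tied to any specific project):

    # Minimal PySpark sketch: read Parquet from S3, aggregate, and write ORC (hypothetical names).
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("s3-format-conversion").getOrCreate()

    # Read a Parquet dataset from S3 (the cluster needs the hadoop-aws package and IAM credentials).
    orders = spark.read.parquet("s3a://example-raw-bucket/orders/")

    # Light aggregation before persisting in a different columnar format.
    daily = (orders
             .withColumn("order_date", F.to_date("order_ts"))
             .groupBy("order_date")
             .agg(F.count("*").alias("order_count")))

    # Write back to S3 as ORC, partitioned by date.
    (daily.write
     .mode("overwrite")
     .partitionBy("order_date")
     .orc("s3a://example-curated-bucket/daily_orders/"))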

TECHNICAL SKILLS:
Cloud : AWS, Azure (Azure Databricks)
Testing methods : Selenium 2.0, HP QTP 11.0, SOAP UI.
RDBMS : Oracle, SQL Server, DB2, MySQL, PGADMIN, RedShift, Cosmos DB
Languages : Apache Spark, Python, SQL, PL/SQL, HTML, DHTML, UML
Version tools : SVN, GIT
Automation Tools : Jenkins, Azure DevOps, Code Pipeline, Airflow
Scripting languages : Java, Python, Shell scripting, PowerShell scripting, YAML, JSON, Scala, SQL
Agile Tool : JIRA
Infrastructure as Code : CloudFormation, Terraform
PROFESSIONAL EXPERIENCE:
Client: Capital One August 2021 to Present

Role: Sr Data Engineer

Responsibilities:

 Created Databricks workspaces and applied ACLs on Delta tables for the respective departments.

 Worked on optimizing Databricks clusters, setting up autoscaling policies, and measuring metrics using the Spark UI and Ganglia.

 Created and set up EMR clusters for running data engineering workloads and supporting data scientists.

 Hands-on work with Spark, Databricks, Scala, PySpark, and Snowflake.

 Built Delta tables in AWS Databricks and utilized them for ETL processes.

 Interfaced Kafka topics from AWS with Databricks Delta tables and submitted jobs on job clusters.

 Experienced in setting up Databricks SQL, Unity Catalog, and cluster pools, and building feature stores in Databricks.

 Worked on setting up job clusters and separating environment-based config files for each job deployment in Databricks.

 Worked on setting up tags for different job clusters so they can be used for cost tracking and to eliminate dependencies on other teams (see the job-cluster sketch at the end of this section).

 Utilized Spark SQL API in PySpark to extract and load data and perform SQL queries.

 Utilized Azure Synapse and Azure Databricks to create data pipelines in Azure.

 Developed and Implemented Data Solutions utilizing Azure Services like Event Hub, Azure Data Factory, ADLS, Databricks, Azure web apps, Azure SQL DB instances. 

 Involved in setting up automated jobs and deploying machine learning model using Azure DevOps pipelines.
 Involved in the design and deployment of a multitude of cloud services on the AWS stack, such as Route53, S3, RDS, DynamoDB, SNS, SQS, IAM, EC2, EMR, and Redshift, while focusing on high availability, fault tolerance, and auto-scaling with AWS CloudFormation.

 Worked with Athena, AWS Glue, and QuickSight for querying and visualization purposes.

 Created data pipelines using Data Factory, Apache Airflow, and Databricks for ETL processing.

 Retrieved data from DBFS into Spark DataFrames for running predictive analytics on the data.

 Used HiveContext, which provides a superset of the functionality of SQLContext, and preferred the HiveQL parser for writing queries to read data from Hive tables.

 Modeled Hive partitions extensively for data separation and faster processing and followed Hive best practices for tuning.

 Hands-on data engineering experience with Scala, Hadoop, EMR, Spark, and Kafka.

 Knowledge of AWS, Kubernetes, and production support/troubleshooting.

 Cached RDDs for better performance and performed actions on each RDD.

 Developed highly complex Python code that is maintainable, easy to use, and satisfies application requirements for data processing and analytics using built-in libraries.

 Involved in designing and optimizing Spark SQL queries and DataFrames: importing data from data sources, performing transformations, performing read/write operations, and saving the results to an output directory in HDFS.

 Worked with the Kafka REST API to collect and load data onto the Hadoop file system and used Sqoop to load data from relational databases.

 Extracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs, processed the data as DataFrames, and saved it in Parquet format in HDFS (see the streaming sketch at the end of this section).

Environment: PySpark, Hive, Sqoop, Kafka, Airflow, Python, Scala, Spark Streaming, DBFS, SQLContext, Spark RDD, REST API, Spark SQL, Hadoop, Parquet files, Oracle, SQL Server.
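
Illustrative sketch of the tagged, autoscaling job cluster referenced above, assuming the Databricks Jobs API 2.1; the workspace host, token, notebook path, node type, and tag values are hypothetical placeholders:

    # Sketch: create a Databricks job whose job cluster carries cost-tracking tags and autoscaling.
    import requests

    job_spec = {
        "name": "nightly-ingest",
        "tasks": [{
            "task_key": "ingest",
            "notebook_task": {"notebook_path": "/Repos/data-eng/nightly_ingest"},
            "new_cluster": {
                "spark_version": "11.3.x-scala2.12",
                "node_type_id": "i3.xlarge",
                "autoscale": {"min_workers": 2, "max_workers": 8},
                # Custom tags propagate to the underlying cloud resources for cost tracking.
                "custom_tags": {"team": "data-eng", "cost_center": "1234", "env": "prod"},
            },
        }],
    }

    resp = requests.post(
        "https://<workspace-host>/api/2.1/jobs/create",
        headers={"Authorization": "Bearer <token>"},
        json=job_spec,
    )
    resp.raise_for_status()
    print(resp.json()["job_id"])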
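
Illustrative sketch of the Kafka-to-Parquet flow referenced above, written with the Structured Streaming API rather than the DStream/RDD API; broker, topic, schema, and paths are hypothetical:

    # Sketch: Kafka topic -> parsed DataFrame -> Parquet on HDFS (needs the spark-sql-kafka package).
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

    spark = SparkSession.builder.appName("kafka-to-parquet").getOrCreate()

    event_schema = StructType([
        StructField("event_id", StringType()),
        StructField("amount", DoubleType()),
        StructField("event_ts", TimestampType()),
    ])

    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker1:9092")
           .option("subscribe", "transactions")
           .load())

    # Kafka delivers the payload as bytes in the `value` column; parse it as JSON.
    events = (raw.selectExpr("CAST(value AS STRING) AS json")
              .select(F.from_json("json", event_schema).alias("e"))
              .select("e.*"))

    query = (events.writeStream
             .format("parquet")
             .option("path", "hdfs:///data/transactions/")
             .option("checkpointLocation", "hdfs:///checkpoints/transactions/")
             .outputMode("append")
             .start())
    query.awaitTermination()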

Client: Anthem                                                                                                          May 2019 to July 2021

Role: Data Engineer


Responsibilities:

 Utilized Spark SQL API in PySpark to extract and load data and perform SQL queries.

 Developed PySpark scripts to encrypt raw data by applying hashing algorithms to client-specified columns (see the hashing sketch at the end of this section).

 Responsible for Design, Development, and testing of the database and Developed Stored Procedures and Views.

 Developed Python-based API (RESTful Web Service) to track revenue and perform revenue analysis.

 Compiled and validated data from all departments and presented it to the Director of Operations.

 Built a KPI (Key Performance Indicator) calculator sheet and maintained it within SharePoint.

 Created reports with complex calculations, designed dashboards for analyzing POS data, developed visualizations, and worked on ad-hoc reporting using Tableau.

 Created a data model that correlates all the metrics and provides valuable output.

 Designed Spark-based real-time data ingestion and real-time analytics; created a Kafka producer to synthesize alarms using Python (see the producer sketch at the end of this section); used Spark SQL to load JSON data, create a SchemaRDD, and load it into Hive tables; and handled structured data using Spark SQL.

 Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python.

 Developed data pipelines using Spark, Hive, Pig, and Python to ingest customer data.

 Loaded and transformed large sets of structured, semi-structured, and unstructured data using Hadoop/Big Data concepts; developed Hive and MapReduce tools to design and manage HDFS data blocks and data distribution methods.

 Worked on AWS Data Pipeline to configure data loads from S3 into Redshift.

 Responsible for building scalable distributed data solutions using Amazon EMR cluster environments.

Environment: SparkSQL, PySpark, SQL, RESTful Web Service, Tableau, Kafka, JSON, Hive, Pig, Hadoop, HDFS, MapReduce, S3, Redshift, AWS Data pipeline, Amazon EMR.
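
Illustrative sketch of the column-hashing step referenced above, using PySpark's built-in sha2 function; column names and paths are hypothetical:

    # Sketch: hash client-specified columns with SHA-256 before landing the data.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("pii-hashing").getOrCreate()

    sensitive_cols = ["ssn", "email", "phone"]   # supplied by the client
    raw = spark.read.parquet("s3a://example-raw/members/")

    hashed = raw
    for col_name in sensitive_cols:
        # sha2 returns a hex digest; 256 selects SHA-256.
        hashed = hashed.withColumn(col_name, F.sha2(F.col(col_name).cast("string"), 256))

    hashed.write.mode("overwrite").parquet("s3a://example-curated/members_hashed/")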
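
Illustrative sketch of a Python Kafka producer that synthesizes alarm events, assuming the kafka-python client; the broker address, topic name, and payload fields are hypothetical:

    # Sketch: synthesize alarm events and publish them as JSON to a Kafka topic.
    import json
    import random
    import time
    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers="broker1:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    for _ in range(100):
        alarm = {
            "alarm_id": random.randint(1, 10000),
            "severity": random.choice(["INFO", "WARN", "CRITICAL"]),
            "ts": time.time(),
        }
        producer.send("alarms", value=alarm)
        time.sleep(1)

    producer.flush()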

Client: ZENOPSYS TECHNOLOGIES Pvt Ltd, Hyderabad, IN May 2015 to December 2018

Role: Data Engineer

Responsibilities:

 Defined data contracts, and specifications including REST APIs.

 Worked on relational database modelling concepts in SQL, performed query performance tuning.

 Worked on Hive Metastore backups and on partitioning and bucketing techniques in Hive to improve performance, and tuned Spark jobs (see the table-layout sketch at the end of this section).

 Responsible for building and running resilient data pipelines in production and implemented ETL/ELT to load a multi-terabyte enterprise data warehouse.

 Worked closely with the data science team to clearly understand requirements and created Hive tables on HDFS.

 Developed Spark scripts using Python as per requirements.

 Solved performance issues in Spark with an understanding of grouping, joins, and aggregations.

 Scheduled Spark jobs on the Hadoop cluster and generated detailed design documentation for the source-to-target transformations.
 Experience in using the EMR cluster and various EC2 instance types based on requirements.

 Responsible for loading data from UNIX file systems to HDFS; installed and configured Hive and wrote Hive UDFs.

 Created on-demand tables on S3 files using Lambda functions written in Python and PySpark (see the Lambda sketch at the end of this section).

 Designed and developed MapReduce programs to analyze and evaluate multiple solutions, considering multiple cost factors across the business as well as the operational impact on flight historical data.

 Created end-to-end ETL pipelines in PySpark to process data for business dashboards.

 Developed Spark programs using Python APIs to compare the performance of Spark with Hive and SQL, and generated reports on a daily and monthly basis.

 Developed dataflows and processes for data processing using SQL (Spark SQL and DataFrames).

 Understood business requirements and prepared design documents, then carried out coding, testing, and go-live in the production environment.

 Implemented analytics applications using multiple database technologies, such as relational, multidimensional (OLAP), key-value, document, or graph.

 Built cloud-native applications using supporting technologies and practices including AWS, Docker, CI/CD, and microservices.

 Involved in planning process of iterations under the Agile Scrum methodology.

Environment: Hive, PySpark, HDFS, Python, EMR, EC2, UNIX, S3 files, SQL, MapReduce, ETL/ELT, Docker, REST API, Agile Scrum, OLAP (Online Analytical Processing).
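
Illustrative sketch of the on-demand-table Lambda referenced above, assuming Athena is used to register the S3 files as an external table; the bucket, database, and schema are hypothetical:

    # Sketch: Lambda handler that registers an external table over S3 files via Athena.
    import boto3

    athena = boto3.client("athena")

    DDL = """
    CREATE EXTERNAL TABLE IF NOT EXISTS analytics.daily_events (
        event_id  string,
        amount    double,
        event_ts  timestamp
    )
    STORED AS PARQUET
    LOCATION 's3://example-curated-bucket/daily_events/'
    """

    def handler(event, context):
        # Athena runs the DDL asynchronously; query metadata lands in the output location.
        resp = athena.start_query_execution(
            QueryString=DDL,
            QueryExecutionContext={"Database": "analytics"},
            ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
        )
        return {"query_execution_id": resp["QueryExecutionId"]}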
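
Illustrative sketch of the partitioned, bucketed Hive table layout referenced above, created through Spark with Hive support; database, table, and column names are hypothetical:

    # Sketch: partitioned, bucketed Hive table created via Spark SQL with Hive support.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-table-layout")
             .enableHiveSupport()
             .getOrCreate())

    # Partitioning prunes whole directories at query time; bucketing clusters rows by a
    # high-cardinality key so joins and sampling on that key read fewer files.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS warehouse.orders (
            order_id    STRING,
            customer_id STRING,
            amount      DOUBLE
        )
        PARTITIONED BY (order_date STRING)
        CLUSTERED BY (customer_id) INTO 16 BUCKETS
        STORED AS ORC
    """)

    # Inspect the partition layout after loads.
    spark.sql("SHOW PARTITIONS warehouse.orders").show(truncate=False)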

Education:

Bachelor’s in Computer Science – IIIT – Basara – 2015 (Rajiv Gandhi University)
