Email: akhilreddy0760@gmail.com
Phone: 415-347-1504
Sr Big Data Engineer / GCP / Hadoop Cloud Developer
As a data engineer, I was responsible for assessing and documenting data and client-specific
requirements to develop user-friendly BI solutions: reports, dashboards, and decision aids. Worked on data
warehousing, data engineering, feature engineering, big data, ETL/ELT, and business intelligence, specializing in
AWS frameworks, the GCP/Hadoop ecosystem, Excel, Snowflake, relational databases, tools such as Tableau, Power BI,
Python, and DataOps frameworks/Azure DevOps pipelines.
PROFESSIONAL SUMMARY
8+ years of professional experience in information technology, with expertise in
big data, Hadoop, Spark, Hive, Impala, Sqoop, Flume, Kafka, SQL tuning, ETL
development, report development, database development, and data modeling, and strong knowledge
of Oracle database architecture.
Hands-on experience in GCP: BigQuery, GCS buckets, Cloud Functions, Cloud Dataflow, Pub/Sub,
Cloud Shell, gsutil, bq command-line utilities, Dataproc, and Stackdriver.
Strong knowledge of and experience in the Cloudera ecosystem (HDFS, YARN, Hive, Sqoop, Flume,
HBase, Oozie, Kafka, Pig), data pipelines, and data analysis and processing with Hive SQL, Impala,
Spark, and Spark SQL.
Architected and implemented MLOps continuous-delivery and automation pipelines for cloud-native
machine learning (AWS Step Functions, CodeBuild, Lambda; Azure DevOps, Azure ML; GCP
ML with Kubeflow Pipelines, TensorFlow, Dataflow, Cloud Storage, AI Hub, Cloud Build).
Used Flume, Kafka, and Spark Streaming to ingest real-time or near-real-time data into HDFS.
Analyzed data and provided insights with R and Python pandas.
Hands-on experience architecting ETL transformation layers and writing Spark jobs for the
processing.
Good programming experience with Python and Scala.
Hands-on experience with NoSQL databases such as HBase and Cassandra.
Evaluated approaches for migrating Oracle databases to Redshift.
Experience with scripting languages like PowerShell, Perl, Shell, etc.
Extensive experience writing MS SQL and T-SQL procedures and Oracle functions and queries using TOAD.
Effective team member, collaborative and comfortable working independently
Proficient in achieving Oracle SQL plan stability, maintaining SQL plan baselines, and using ASH, AWR,
ADDM, and SQL Tuning Advisor for proactive follow-up and SQL rewrites.
Experience with shell scripting to automate various activities.
Application development with Oracle Forms and Reports, OBIEE, Discoverer, Report Builder,
and ETL development.
Created shell scripts to fine-tune the ETL flow of Informatica workflows.
Experience using Python machine-learning libraries such as pandas, NumPy, matplotlib, scikit-learn,
and SciPy to load, summarize, and visualize datasets, evaluate algorithms, and make
predictions.
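The load-summarize-predict workflow above can be sketched as follows. This is an illustrative example only: the column names and values are made up, and `numpy.polyfit` stands in for a scikit-learn estimator to keep the sketch small.

```python
import numpy as np
import pandas as pd

# Hypothetical dataset illustrating the load -> summarize -> predict
# workflow; the column names and values are invented for this sketch.
df = pd.DataFrame({
    "feature": [1.0, 2.0, 3.0, 4.0, 5.0],
    "target": [2.1, 3.9, 6.2, 8.1, 9.8],
})

# Summarize the dataset (count, mean, std, quartiles per column)
summary = df.describe()

# Fit a least-squares line with numpy.polyfit as a lightweight stand-in
# for a scikit-learn estimator, then predict for an unseen value.
slope, intercept = np.polyfit(df["feature"], df["target"], 1)
prediction = slope * 6.0 + intercept
print(round(prediction, 2))  # 11.9
```

In a real pipeline the DataFrame would come from `pd.read_csv` or a warehouse query rather than an inline literal.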
Expertise in the Amazon Web Services (AWS) cloud platform, including EC2, S3, VPC, ELB, IAM,
DynamoDB, CloudFront, CloudWatch, Route 53, Elastic Beanstalk, EBS, Auto Scaling, Security
Groups, EC2 Container Service (ECS), CodeCommit, CodePipeline, CodeBuild, CodeDeploy, Redshift,
CloudFormation, CloudTrail, OpsWorks, Kinesis, SQS, SNS, and SES.
Experience developing RESTful APIs using the Django REST Framework.
Strong knowledge of data preparation, data modeling, and data visualization using Power BI, and
experience developing various reports and dashboards using various visualizations in
Tableau.
Examine and evaluate reporting requirements for various business units.
Can work in both GCP and Azure clouds in parallel.
Performance optimization: experience tuning and optimizing Apache Druid, including
data-processing workflows, data storage configuration, and query performance.
Strong expertise in Apache Druid: data ingestion, storage, indexing, querying, and analysis
using Druid's query APIs and query languages such as Druid SQL.
Imported data from Hive into GCS buckets and later ingested it with Druid.
DevOps role converting existing AWS infrastructure to serverless architecture (AWS Lambda,
Kinesis) deployed via Terraform templates.
Strong background in DevOps practices and tools, utilizing technologies like Docker, Kubernetes,
Jenkins, or GitLab CI/CD for building and deploying applications and data pipelines in a cloud
environment.
TECHNICAL SKILLS:
PROFESSIONAL EXPERIENCE:
Client: Expedia, San Francisco, CA. Sep 2021 – Present
Worked on Confluence and Jira; skilled in data visualization with the Matplotlib and seaborn libraries.
Responsibilities:
Migrated an entire Oracle database to BigQuery and used Power BI for reporting.
Built data pipelines with Airflow on GCP for ETL jobs using different Airflow operators.
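A minimal sketch of the extract-transform-load callables that PythonOperator-style Airflow tasks typically wrap. The function names, the in-memory "source," and the derived column are hypothetical stand-ins; the real pipeline would read from and write to GCS/BigQuery, and the Airflow DAG wiring itself is omitted here.

```python
# Hypothetical ETL callables of the kind an Airflow DAG would chain as
# PythonOperator tasks: extract >> transform >> load.

def extract():
    # Pull raw rows from the source system (stubbed with inline data).
    return [{"id": 1, "amount": "10.5"}, {"id": 2, "amount": "4.0"}]

def transform(rows):
    # Cast types and add a derived column, as a Spark/Beam step might.
    return [
        {**r, "amount": float(r["amount"]),
         "bucket": "high" if float(r["amount"]) > 5 else "low"}
        for r in rows
    ]

def load(rows):
    # Write to the warehouse (stubbed: just return the row count).
    return len(rows)

loaded = load(transform(extract()))
print(loaded)  # 2
```

In Airflow, each callable would be passed as the `python_callable` of a task, with scheduling and retries handled by the DAG definition.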
Experience with GCP Dataproc, GCS, Cloud Functions, and BigQuery.
Experience moving data between GCP and Azure using Azure Data Factory.
Experience building Power BI reports on Azure Analysis Services for better performance.
Used the Cloud Shell SDK in GCP to configure services such as Dataproc, Storage, and BigQuery.
Coordinated with the team and developed a framework to generate daily ad-hoc reports and extracts from
enterprise data in BigQuery.
Designed and coordinated with the Data Science team to implement advanced analytical models on the
Hadoop cluster over large datasets.
Wrote Hive SQL scripts to create complex tables with high-performance techniques such as partitioning,
clustering, and skewing.
Downloaded BigQuery data into pandas and Spark data frames for advanced ETL
capabilities.
Skilled in data modeling, ETL (Extract, Transform, Load) processes, and data integration techniques,
ensuring efficient data ingestion, transformation, and integration across various data sources and
platforms.
Worked with Google Data Catalog and other Google Cloud APIs for monitoring, query, and billing
analysis of BigQuery usage.
Created a POC utilizing ML models and Cloud ML for table quality analysis in the
batch process.
Involved in designing and developing enhancements of CSG using AWS APIs.
Implemented large Lambda architectures using Azure Data platform capabilities like Azure Data Lake,
Azure Data Factory, HDInsight, Azure SQL Server, Azure ML and Power BI.
Designed end to end scalable architecture to solve business problems using various Azure Components
like HDInsight, Data Factory, Data Lake, Storage and Machine Learning Studio.
Experience implementing Cloud based Linux OS in AWS to Develop Scalable Applications with Python.
Developed JSON scripts for deploying pipelines in Azure Data Factory (ADF) that process data
using the SQL activity.
Developed and deployed the outcome using Spark code on a Hadoop cluster running on GCP.
Worked on creating various big data pipelines as part of the migration from on-prem servers to AWS.
Knowledge of Cloud Dataflow and Apache Beam.
Good knowledge of using Cloud Shell for various tasks and deploying services.
Created BigQuery authorized views for row-level security and for exposing data to other teams.
Expertise in designing and deploying Hadoop clusters and different big data analytic tools, including
Pig, Hive, Sqoop, and Apache Spark, with the Cloudera distribution.
Environment: GCP, BigQuery, GCS buckets, Cloud Functions, Apache Beam, Cloud Dataflow, Cloud Shell, gsutil,
Dataproc, VM instances, Cloud SQL, MySQL, Postgres, SQL Server, Python, Scala, Spark, Hive, Spark SQL.
Responsibilities:
Worked on developing ETL processes (Data Stage Open Studio) to load data from multiple data sources
into HDFS using Flume and Sqoop, and performed structural modifications using MapReduce and Hive.
Worked collaboratively to manage build outs of large data clusters and real time streaming with Spark.
Developed ETL data pipelines using Spark, Spark streaming and Scala.
Responsible for loading Data pipelines from web servers using Sqoop, Kafka and Spark Streaming API.
Experience working with the Snowflake data warehouse.
Created Databricks notebooks using SQL and Python, and automated the notebooks using jobs.
Used Spark for interactive queries, processing of streaming data, and integration with popular NoSQL
databases for huge volumes of data.
Hands-on experience with Feast, the Feature Management API, for feature store implementation and
management. Proficient in designing and maintaining feature stores to support data-driven applications
and machine learning models.
Developed and designed an API (RESTful Web Service) for the company’s website.
Developed and designed e-mail marketing campaigns using HTML and CSS.
Extensive experience with Amazon Web Services (AWS) cloud services such as EC2, VPC, S3, IAM, EBS,
RDS, ELB, Route 53, OpsWorks, DynamoDB, Auto Scaling, CloudFront, CloudTrail, CloudWatch,
CloudFormation, Elastic Beanstalk, AWS SNS, AWS SQS, AWS SES, AWS SWF, and AWS Direct Connect.
Implemented Spark using Scala and SparkSQL for faster testing and processing of data.
Developed various UDFs in MapReduce and Python for Pig and Hive.
Defined job flows and developed simple to complex MapReduce jobs as per requirements.
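The classic MapReduce job shape can be sketched in pure Python. This is a hedged, self-contained illustration of word count: in a real Hadoop Streaming job the mapper and reducer would be separate scripts reading stdin, submitted via the streaming JAR; here they are in-process functions so the flow is visible end to end.

```python
from itertools import groupby

# Illustrative MapReduce word count: the mapper emits (word, 1) pairs,
# and the reducer sums counts per key after a sort by key, mirroring
# Hadoop's shuffle-and-sort phase.

def mapper(line):
    for word in line.strip().lower().split():
        yield (word, 1)

def reducer(pairs):
    # Hadoop sorts mapper output by key before the reduce phase;
    # sorted() + groupby() reproduces that here.
    for key, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield (key, sum(count for _, count in group))

lines = ["big data big pipelines", "data lake"]
pairs = [kv for line in lines for kv in mapper(line)]
counts = dict(reducer(pairs))
print(counts)  # {'big': 2, 'data': 2, 'lake': 1, 'pipelines': 1}
```

The same mapper/reducer pair, wrapped to read from `sys.stdin` and print tab-separated pairs, would run unchanged under Hadoop Streaming.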
Developed PIG UDFs for manipulating the data according to Business Requirements and worked on
developing custom PIG Loaders.
Installed Oozie workflow engine to run multiple Hive and Pig Jobs.
Designed and developed Apache NiFi jobs to move files from transaction systems into the data lake raw
zone.
Developed Pig Latin scripts for the analysis of semi-structured data.
Experienced with the Databricks platform, which follows best practices for securing network access to
cloud applications.
Implemented large Lambda architectures using Azure data platform capabilities such as Azure Data Lake,
Azure Data Factory, Azure Data Catalog, HDInsight, Azure SQL Server, Azure ML, and Power BI.
Using Azure Databricks, created Spark clusters and configured high concurrency clusters to speed up
the preparation of high-quality data.
Used Azure Databricks for fast, easy, and collaborative spark-based platform on Azure.
Used Hive and created Hive tables and involved in data loading and writing Hive UDFs.
Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
Analysed the SQL scripts and redesigned them using PySpark SQL for faster performance.
Used Azure Data Factory with the SQL and MongoDB APIs, and integrated data from MongoDB, MS SQL, and
cloud stores (Blob, Azure SQL DB, Cosmos DB).
Environment: Spark, Spark Streaming, Apache Kafka, Apache NiFi, Hive, Tez, Azure, Azure Databricks, Azure
Data Grid, Azure Synapse Analytics, Azure Data Catalog, ETL, Pig, PySpark, UNIX, Linux, Tableau, Teradata,
Snowflake, Sqoop, Hue, Oozie, Java, Scala, Python, Git, GitHub
Client: Four Soft Private Limited, Hyderabad, India. Sep 2015 to Dec 2017
Data Modeler/Cloud Engineer
This project is an e-commerce web application that allows customers to view all the
products in the store and buy them online. The application mainly deals with online payments and billing. It
provides a search feature, using regular-expression pattern matching through a user interface called the
search box, through which consumers can access and view the latest and top-selling products and offers. It
uses a secure payment gateway solution for customers to make payments.
I worked closely with data/application architects and other stakeholders to design and maintain
advanced data pipelines, and was responsible for developing and supporting advanced reports and data
pipelines that provide accurate and timely data for internal and external clients.
Deliverables:
o Responsible for building an Enterprise Data Lake to bring ML ecosystem capabilities to production and
make it readily consumable for data scientists and business users.
o Processed and transformed the data using AWS G to assist the Data Science team per business
requirements.
o Implemented RESTful services using Spring MVC.
o Used Spring JDBC for interacting with the database.
o Involved in writing JUnit test cases.
o Performed code reviews using SonarQube.
o Deployed Spring Boot applications to Google Cloud App Engine.
o Developed Spark applications for cleaning and validating the data ingested into the AWS cloud.
o Worked on fine-tuning Spark applications to improve overall processing time for the pipelines.
o Implemented simple to complex transformations on streaming data and datasets.
o Worked on analysing the Hadoop cluster and different big data analytic tools including Hive, Spark, Python,
Sqoop, Flume, and Oozie.
Responsibilities:
Used Spark Streaming to stream data from external sources via the Kafka service; responsible for
migrating the code base from the Cloudera platform to Amazon EMR and evaluated Amazon ecosystem
components such as Redshift and DynamoDB.
Perform configuration, deployment, and support of cloud services in Amazon Web Services (AWS).
Designing and building multi-terabyte, full end-to-end Data Warehouse infrastructure from the ground
up on Confidential Redshift.
Designed, developed, and tested ETL processes in AWS Glue to migrate campaign data from external sources
such as S3 (ORC/Parquet/text files) into AWS Redshift.
Created stacks using CloudFormation to create datasets in the AWS Glue catalog.
Designed and developed an ETL process in AWS Glue to migrate flight-aware usage data from the S3 data
source to Redshift.
Coded an ingestion pipeline using Step Functions, Lambdas, SQS queues, SNS notifications, Glue ETL,
crawlers, and Athena.
Assisted the team in developing various Step Functions, Lambdas, CFTs, SNS topics, SQS queues, and Glue
crawlers for various data flows, including streaming and non-streaming, in CSV and JSON formats.
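An SQS-triggered Lambda step in such an ingestion pipeline might look like the sketch below. The event shape follows the standard SQS-to-Lambda event (`Records` with a JSON `body`); the routing by file format is a hypothetical illustration of the CSV/JSON split mentioned above, not the actual handler.

```python
import json

# Hedged sketch of an SQS-triggered Lambda handler: parse each SQS
# record's JSON body and tally messages by file format. A real handler
# would start a Glue job or write to S3 instead of just counting.

def handler(event, context=None):
    processed = {"csv": 0, "json": 0}
    for record in event.get("Records", []):
        body = json.loads(record["body"])
        fmt = body.get("format", "csv")
        processed[fmt] += 1
    return processed

# Example invocation with a synthetic SQS event
event = {"Records": [
    {"body": json.dumps({"key": "a.csv", "format": "csv"})},
    {"body": json.dumps({"key": "b.json", "format": "json"})},
]}
result = handler(event)
print(result)  # {'csv': 1, 'json': 1}
```

Deployed behind an SQS trigger, this handler would be invoked in batches, with failed messages returned to the queue per the function's redrive policy.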