
SHARATH KUMAR REDDY PUTTAPATI

AWS SNOWFLAKE DATA ENGINEER


Mobile: 6822313054
Email: sputtapati@gmail.com
PROFESSIONAL SUMMARY
● Seeking a challenging Data Engineer role, leveraging 9 years of experience across multiple technology methodologies, including big data analysis, design, and development using Hadoop, AWS, Python, Data Lake, Scala, and PySpark, as well as database and data warehousing development using MySQL and Oracle. With broad experience and technical expertise, I am committed to designing and implementing high-performance data solutions that deliver business value and enable data-driven decision-making.
● Hands-on experience with Hadoop distribution platforms, namely IBM BigInsights, Hortonworks, and Cloudera, and with the cloud platforms GCP and AWS.
● Expertise in big data technologies and Hadoop ecosystem components such as PySpark, Spark-Scala, HDFS, GPFS, Hive, Sqoop, Pig, Spark SQL, Kafka, Hue, YARN, Trifacta, and Epic data sources.
● Good knowledge of AWS services such as EMR and EC2, which provide fast and efficient processing for big data and machine learning workloads.
● Hands-on experience building data pipelines and data marts using the Hadoop stack.

● Hands-on experience in Apache Spark creating RDDs and DataFrames, applying transformations and actions, and converting RDDs to DataFrames.
● Experienced in data processing tasks such as collecting, aggregating, and moving data from various sources using Apache Flume and Kafka.

● Experience in writing REST APIs in Python for large-scale applications.

● Extensive experience working with AWS Cloud services and AWS SDKs, including AWS API Gateway, Lambda, S3, IAM, and EC2.
● Developed a data pipeline using Kafka and Spark Streaming to store data in HDFS and performed real-time analytics on the incoming data.
● In-depth understanding of Apache Spark job execution components such as the DAG, executors, task scheduler, stages, and Spark Streaming.
● Experience in creating and executing data pipelines on the GCP and AWS platforms.

● Hands-on experience with GCP services including BigQuery, Cloud Functions, and Dataproc.

● Strong experience with the Control-M job scheduler, Apache Airflow, ESP, and D-Series; monitored jobs on an on-call basis to close incident tickets. Implemented real-time, NetFlow-powered data processing pipelines using technologies like Apache Kafka and Apache Flink for timely insights and decision-making.
● Orchestrated and managed robust NetFlow-driven big data infrastructure, utilizing cloud-based services for scalable and
fault-tolerant data systems.
● Spearheaded data quality initiatives and implemented NetFlow-based governance frameworks, ensuring accurate and compliant data practices across the organization.
● Hands-on experience with Amazon EC2, S3, RDS, IAM, Auto Scaling, CloudWatch, SNS, Athena, Glue, Kinesis, Lambda,
EMR, Redshift, DynamoDB and other services of the AWS family.
● Expertise in using CI/CD Jenkins pipelines to deploy code into production.

● Expertise in managing real-time data solutions with AWS RDS. Proficient in automating backups, implementing disaster recovery strategies, and utilizing snapshots to ensure high availability and data integrity.
● Designed and developed the program paradigm to support data collection and filtering processes in the data warehouse and Hadoop data mart.
● Strong Hadoop and platform support experience with the entire suite of tools and services in major Hadoop distributions: Cloudera, Amazon EMR, and Hortonworks.
● Hands-on experience working with globally distributed teams in Europe, Mexico, and India, and with the Agile implementation methodology.
● Hands-on experience working in an Agile environment and following release management golden rules.

● Experience with version control tools such as Git and IBM UrbanCode Deploy (UCD).

TECHNICAL SKILLS
AWS Services: S3, Redshift, EMR, SNS, SQS, Athena, Glue, CloudWatch, Kinesis, Route 53, IAM
Big Data Technologies: HDFS, Sqoop, PySpark, Hive, MapReduce, Spark, Spark Streaming, HBase, GPFS
Hadoop Distributions: Cloudera, Hortonworks, IBM BigInsights
Languages: Java, SQL, PL/SQL, Python, HiveQL, Scala, Shell Scripting
Operating Systems: Windows (XP/7/8/10), UNIX, Linux, Ubuntu, CentOS
Databases: Teradata, Oracle, SQL Server, Hive, Netezza
Scheduling: IBM Tivoli, Control-M, Oozie, Apache Airflow
Version Control: Git, GitHub, VSS
Methodology: Agile, Scrum, Jira
IDE & Build Tools: Eclipse, Visual Studio, IntelliJ IDEA, PyCharm
Cloud Computing Tools: GCP, AWS, Snowflake
Others: MS Office, RTC, ServiceNow, Optim, WinSCP, MS Visio

EDUCATION
● Bachelor of Technology in Civil Engineering from Malla Reddy Engineering College & Management Sciences, India.

● Master's in Computer Information Systems from Rivier University, Nashua, New Hampshire.

WORK EXPERIENCE

Client: Home Depot, Atlanta | Aug 2022 to Present


Role: AWS Snowflake Data Engineer
● Designed and implemented Snowflake stages to efficiently load data from various sources into Snowflake tables.

● Created and managed different types of tables in Snowflake, such as transient, temporary, and persistent tables.

● Optimized Snowflake warehouses by selecting appropriate sizes and configurations to achieve optimal performance
and cost efficiency.
● Developed complex SnowSQL queries to extract, transform, and load data from various sources into Snowflake.

● Implemented partitioning techniques in Snowflake to improve query performance and data retrieval.

● Configured and managed multi-cluster warehouses in Snowflake to handle high-concurrency workloads effectively.

● Defined roles and access privileges in Snowflake to ensure proper data security and governance.

● Implemented Snowflake caching mechanisms to improve query performance and reduce data transfer costs.

● Utilized Snowpipe for real-time data ingestion into Snowflake, ensuring continuous data availability and automated data loading processes (see the Snowpipe sketch after this role's bullet list).
● Worked on various data modeling concepts, such as star schema and snowflake schema, in the project.

● Leveraged Snowflake's time travel features to track and restore historical data for auditing and analysis purposes.

● Implemented regular expressions in Snowflake for pattern matching and data extraction tasks.

● Developed Snowflake scripting solutions to automate data pipelines, ETL processes, and data transformations.

● AWS Cloud Data Engineering:

● Designed and implemented data ingestion and storage solutions using AWS S3, Redshift, and Glue.

● Proficiency in using EMR to process large-scale datasets using Apache Hadoop, Apache Spark, and other open-
source tools.
● Performed data transformation operations, such as filtering, aggregating, joining, and sorting datasets, using
Spark's powerful APIs like RDDs (Resilient Distributed Datasets) or Data Frames.
● Optimized Spark jobs for performance and scalability, including tuning memory settings, partitioning, and leveraging
caching mechanisms.
● Worked with distributed computing environments like Apache Hadoop, YARN, and Kubernetes to deploy and manage Spark applications.
● Integrated Spark with various data sources, such as HDFS, S3, databases, and streaming platforms, to ingest and
process data.
● Identified and resolved performance bottlenecks, data skew issues, and other challenges related to Spark jobs.

● Developed ETL workflows using AWS Glue to extract, transform, and load data from various sources into Redshift (see the Glue-to-Redshift sketch after this role's bullet list).

● Integrated AWS SNS and SQS for real-time event processing and messaging.

● Implemented AWS Athena for ad-hoc data analysis and querying on S3 data.

● Familiarity with AWS Serverless Application Model (SAM) for Lambda deployment and management.

● Utilized AWS CloudWatch for monitoring and managing resources, setting up alarms, and collecting metrics.

● Designed and implemented data streaming solutions using AWS Kinesis for real-time data processing.

● Worked on FGA for new RDS Oracle databases and created IAM policies for users.

● Developed a shell script for Audit enable/disable on RDS Oracle Tables in multiple databases.

● Involved in designing and developing Amazon EC2, Amazon S3, Amazon RDS, Amazon Elastic Load Balancing,
Amazon SWF, Amazon SQS, and other services of the AWS infrastructure.
● Created and configured Oracle DB instance in AWS RDS and created Roles for databases.

● Created and maintained reports in Tableau to display the status and performance of deployed models and algorithms.

● Involved in various phases of development; analyzed and developed the system following the Agile Scrum methodology.

● Managed DNS configurations and routing using AWS Route53 for efficient application and service deployment.

● Developed data processing pipelines using Hadoop, including HDFS, Sqoop, Hive, MapReduce, and Spark.

● Implemented Spark Streaming for real-time data processing and analytics (see the Kafka-to-HDFS streaming sketch after this role's bullet list).

● Imported weblogs and unstructured data using Apache Flume and stored the data in a Flume channel.

● Implemented scheduling and job automation using IBM Tivoli, Control-M, Oozie, and Airflow.

● Designed and configured workflows for data processing and ETL pipelines.

● Designed, developed, and deployed Extract, Transform, Load (ETL) packages using SQL Server Integration Services (SSIS) to move and transform data between different data sources and destinations.
● Extracted data from various sources, such as databases, flat files, APIs, and web services, using SSIS components like
Source Adapters.
● Experience in Birst BI Data Modeling (Admin), Live Access, Reporting (Designer/Visualizer/Dashboards), and space management.
● Published RDS capabilities and limitations documentation into GitHub.

● Implemented data transformation tasks, including data cleansing, validation, aggregation, and enrichment, using
SSIS transformations like Derived Columns, Lookup, and Conditional Split.
● Deployed SSIS packages to production environments and scheduled their execution using SQL Server Agent or other scheduling tools.
● Integrated SSIS with data warehousing solutions like SQL Server Analysis Services (SSAS) and SQL Server
Reporting Services (SSRS) to support business intelligence and reporting requirements.
● Designed and developed database solutions using Teradata, Oracle, and SQL Server.
● Utilized Git and GitLab for version control and collaborative development.
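
The Snowpipe-based continuous loading described above is sketched below with the Snowflake Python connector; the account credentials, stage, pipe, table, S3 bucket, and storage integration names are all hypothetical placeholders.

```python
import snowflake.connector  # pip install snowflake-connector-python

# Connection parameters are placeholders; real values would come from a secrets store.
conn = snowflake.connector.connect(
    account="my_account", user="etl_user", password="***",
    warehouse="LOAD_WH", database="RAW_DB", schema="EVENTS",
)
cur = conn.cursor()

# External stage over the S3 landing bucket (bucket and integration names are hypothetical).
cur.execute("""
    CREATE STAGE IF NOT EXISTS events_stage
    URL = 's3://example-landing-bucket/events/'
    STORAGE_INTEGRATION = s3_int
    FILE_FORMAT = (TYPE = JSON)
""")

# Snowpipe with AUTO_INGEST so S3 event notifications trigger continuous loading.
cur.execute("""
    CREATE PIPE IF NOT EXISTS events_pipe AUTO_INGEST = TRUE AS
    COPY INTO raw_events
    FROM @events_stage
    FILE_FORMAT = (TYPE = JSON)
""")
```
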
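A sketch of the Glue-to-Redshift ETL workflow referenced above; the catalog database, table, column mappings, and the Redshift catalog connection are hypothetical.

```python
import sys
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

args = getResolvedOptions(sys.argv, ["JOB_NAME", "TempDir"])
glueContext = GlueContext(SparkContext())
job = Job(glueContext)
job.init(args["JOB_NAME"], args)

# Read the source table from the Glue Data Catalog (names are placeholders).
orders = glueContext.create_dynamic_frame.from_catalog(
    database="raw_db", table_name="orders"
)

# Rename/cast columns before loading into Redshift.
mapped = ApplyMapping.apply(
    frame=orders,
    mappings=[("order_id", "string", "order_id", "string"),
              ("amount", "double", "order_amount", "double")],
)

# Load into Redshift through a catalog JDBC connection, staging through S3.
glueContext.write_dynamic_frame.from_jdbc_conf(
    frame=mapped,
    catalog_connection="redshift-conn",
    connection_options={"dbtable": "analytics.orders", "database": "dw"},
    redshift_tmp_dir=args["TempDir"],
)
job.commit()
```
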
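A sketch of the Kafka-to-HDFS streaming path using PySpark Structured Streaming, as referenced in the Spark Streaming bullet above; the broker address, topic, event schema, and HDFS paths are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StringType, DoubleType

spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()

# Hypothetical event schema; real fields would follow the topic's data contract.
schema = StructType().add("event_id", StringType()).add("amount", DoubleType())

events = (spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")  # placeholder brokers
          .option("subscribe", "clickstream")                  # placeholder topic
          .load()
          .select(from_json(col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

# Land the parsed stream in HDFS as Parquet; the checkpoint makes the job restartable.
query = (events.writeStream.format("parquet")
         .option("path", "hdfs:///data/clickstream/")
         .option("checkpointLocation", "hdfs:///checkpoints/clickstream/")
         .start())
query.awaitTermination()
```
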
Environment: AWS, AWS S3, Redshift, EMR, SNS, SQS, Athena, Glue, CloudWatch, Kinesis, Route 53, IAM, Sqoop, MySQL, HDFS, Apache Spark, Hive, Cloudera, Kafka, Zookeeper, Terraform, Oozie, PySpark, Ambari, JIRA, IBM Tivoli, Control-M, Airflow, Teradata, Oracle, SQL

College Board, New York | Mar 2021 to Jul 2022


Snowflake Engineer
● Created and managed various types of Snowflake tables, including transient, temporary, and persistent tables, to
cater to specific data storage and processing needs.
● Implemented advanced partitioning techniques in Snowflake to significantly enhance query performance and
expedite data retrieval.
● Defined robust roles and access privileges within Snowflake to enforce strict data security and governance protocols (see the role-grant sketch after this role's bullet list).
● Implemented regular expressions in Snowflake for seamless pattern matching and data extraction tasks.

● Developed and implemented Snowflake scripting solutions to automate critical data pipelines, ETL processes, and
data transformations.
● Developed and optimized ETL workflows using AWS Glue to extract, transform, and load data from diverse sources
into Redshift for efficient data processing.
● Configured and fine-tuned Redshift clusters to achieve high-performance data processing and streamlined
querying.
● Set up and managed multiple RDS instances for various databases, including MySQL and PostgreSQL

● Integrated AWS SNS and SQS to enable real-time event processing and efficient messaging.

● Understanding of Lambda monitoring and logging using CloudWatch logs and metrics.

● Implemented AWS Athena for ad-hoc data analysis and querying on data stored in AWS S3.

● Designed and implemented data streaming solutions using AWS Kinesis, enabling real-time data processing and
analysis.
● Effectively managed DNS configurations and routing using AWS Route53, ensuring efficient deployment of
applications and services.
● Implemented robust IAM policies and roles to ensure secure user access and permissions for AWS resources.

● Understanding of EMR integration with other AWS services, such as Amazon S3, Amazon Redshift, and Amazon
DynamoDB

● Developed and optimized data processing pipelines using Hadoop ecosystem technologies such as HDFS, Sqoop,
Hive, MapReduce, and Spark.
● Implemented Spark Streaming for real-time data processing and advanced analytics.

● Demonstrated expertise in scheduling and job automation using IBM Tivoli, Control-M, Oozie, and Airflow, for
execution of data processing and ETL pipelines.
● Designed and developed database solutions using Teradata, Oracle, and SQL Server, including schema design and
optimization, stored procedures, triggers, and cursors.
● Proficient in utilizing version control systems such as Git, GitLab, and VSS for efficient code repository management
and collaborative development processes.
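
A sketch of the role-based access model referenced above, issued through the Snowflake Python connector; the role, warehouse, database, schema, and user names are hypothetical.

```python
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="admin_user", password="***", role="SECURITYADMIN"
)
cur = conn.cursor()

# Create a read-only analyst role and grant it only the privileges it needs.
for stmt in [
    "CREATE ROLE IF NOT EXISTS analyst_ro",
    "GRANT USAGE ON WAREHOUSE query_wh TO ROLE analyst_ro",
    "GRANT USAGE ON DATABASE analytics TO ROLE analyst_ro",
    "GRANT USAGE ON SCHEMA analytics.reporting TO ROLE analyst_ro",
    "GRANT SELECT ON ALL TABLES IN SCHEMA analytics.reporting TO ROLE analyst_ro",
    "GRANT ROLE analyst_ro TO USER jane_doe",
]:
    cur.execute(stmt)
```
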
Environment: AWS, AWS S3, Redshift, EMR, SNS, SQS, Athena, Glue, CloudWatch, Kinesis, Route 53, IAM, Sqoop, MySQL, HDFS, Apache Spark, Hive, Cloudera, Kafka, Zookeeper, Oozie, PySpark, Ambari, JIRA, IBM Tivoli, Control-M, Airflow, Teradata, Oracle, SQL

USPTO, Alexandria VA | Oct 2019 to Feb 2021


Big Data Developer
● Imported data from MySQL to HDFS on a regular basis using Sqoop for efficient data loading.
● Performed aggregations on large volumes of data using Apache Spark and Scala and stored the results in the Hive data warehouse for further analysis (a PySpark sketch of this pattern follows this role's bullet list).
● Worked extensively with Data Lakes and big data ecosystems, including Hadoop, Spark, Hortonworks, and
Cloudera.
● Used Node.js to write custom UDFs in BigQuery and used them in the data pipeline.

● Used Python for scripting, leveraging a wide range of technologies across the pipeline.
● Developed applications and deployed them on Google Cloud Platform using Dataproc, Dataflow, Composer, BigQuery, Bigtable, Cloud Storage (GCS), and various operators in DAGs.
● Migrated existing Hive data pipelines to the GCP platform.

● Knowledge of Athena query performance optimization techniques, such as partitioning and data compression

● Loaded and transformed structured, semi-structured, and unstructured data sets efficiently.

● Developed Hive queries to analyse data and meet specific business requirements.

● Leveraged HBASE integration with Hive to build HBASE tables in the Analytics Zone.

● Utilized Kafka and Spark Streaming to process streaming data for specific use cases.

● Developed data pipelines using Flume and Sqoop to ingest customer behavioural data into HDFS for analysis.

● Utilized various big data analytic tools, such as Hive and MapReduce, to analyse Hadoop clusters.

● Implemented a data pipeline using Kafka, Spark, and Hive for ingestion, transformation, and analysis of data.

● Wrote Hive queries and used HiveQL to simulate MapReduce functionality for data analysis and processing.

● Migrated data from RDBMS (Oracle) to Hadoop using Sqoop for efficient data processing.

● Developed custom scripts and tools using Oracle's PL/SQL language to automate data validation, cleansing, and
transformation processes.
● Built a system for analyzing column names across all tables in on-premises databases and identifying personal-information columns ahead of the data migration to GCP (a sketch follows this role's bullet list).
● Implemented CI/CD pipelines for building and deploying projects in the Hadoop environment.

● Utilized JIRA for issue and project workflow management.

● Utilized PySpark and Spark SQL for faster testing and processing of data in Spark.

● Used Spark Streaming to process streaming data in batches for efficient batch processing.

● Leveraged Zookeeper to coordinate, synchronize, and serialize servers within clusters.

● Utilized the Oozie workflow engine for job scheduling in Hadoop.

● Utilized PySpark in Spark SQL for data analysis and processing.

● Used Git as a version control tool to maintain the code repository.
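
The Spark/Scala aggregation-into-Hive pattern described above is sketched here in PySpark for illustration; the application, table, and column names are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("daily-aggregates")
         .enableHiveSupport()
         .getOrCreate())

# Aggregate raw transactions and persist the result as a Hive table for analysis.
txns = spark.table("raw_db.transactions")
daily = (txns.groupBy("txn_date", "region")
             .agg(F.sum("amount").alias("total_amount"),
                  F.countDistinct("customer_id").alias("unique_customers")))
daily.write.mode("overwrite").saveAsTable("dw.daily_sales_agg")
```
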
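A minimal sketch of the personal-information column scan mentioned above, written against a generic DB-API 2.0 connection; the pattern list, parameter style, and schema names are assumptions, not the production rule set.

```python
import re

# Hypothetical name patterns; real rules would come from the data governance team.
PII_PATTERNS = [r"ssn", r"birth|dob", r"email", r"phone", r"addr", r"first_?name|last_?name"]

def find_pii_columns(conn, schema_name):
    """Return (table, column) pairs whose column names look like personal information.

    `conn` is any DB-API 2.0 connection to an on-premises source database
    (the %s paramstyle below assumes a MySQL/PostgreSQL driver).
    """
    cur = conn.cursor()
    cur.execute(
        "SELECT table_name, column_name "
        "FROM information_schema.columns WHERE table_schema = %s",
        (schema_name,),
    )
    hits = []
    for table_name, column_name in cur.fetchall():
        if any(re.search(p, column_name.lower()) for p in PII_PATTERNS):
            hits.append((table_name, column_name))
    return hits
```
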


Environment: Sqoop, MySQL, GCP, HDFS, Apache Spark (Scala), Hive, Hadoop, Cloudera, Kafka, MapReduce, Zookeeper, Oozie, Data Pipelines, RDBMS, Python, PySpark, Ambari, JIRA.

Assets for Child Support, OTDA, NY | May 2018 to Sep 2019


Hadoop Developer
● Developed ETL jobs using Spark-Scala to migrate data from Oracle to new MySQL tables.

● Rigorously used Spark-Scala (RDDs, DataFrames, Spark SQL) and the Spark-Cassandra Connector APIs for various tasks (data migration, business report generation, etc.).
● Developed Spark Streaming application for real time sales analytics.

● Prepared an ETL framework with the help of Sqoop, Pig, and Hive to frequently bring in data from the source and make it available for consumption.
● Processed HDFS data and created external tables using Hive, and developed reusable scripts to ingest and repair tables across the project (a sketch follows this role's bullet list).
● Analysed the source data and handled it efficiently by modifying data types; used Excel sheets, flat files, and CSV files to generate Power BI ad-hoc reports.
● Analysed the SQL scripts and designed the solution to implement using PySpark

● Extracted the data from other data sources into HDFS using Sqoop.

● Handled importing of data from various data sources, performed transformations using Hive, MapReduce, loaded
data into HDFS.
● Proficiency in writing SQL queries to analyse data stored in Amazon S3 using Athena.

● Extracted the data from MySQL into HDFS using Sqoop.

● Implemented automation for deployments by using YAML scripts for massive builds and releases.

● Worked with Apache Hive, Apache Pig, HBase, Apache Spark, Zookeeper, Flume, Kafka, and Sqoop.

● Implemented Data classification algorithms using MapReduce design patterns.

● Extensively worked on creating combiners, Partitioning, distributed cache to improve the performance of
MapReduce jobs.
● Worked on GIT to maintain source code in Git and GitHub repositories.
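
A sketch of the external-table creation and repair scripts referenced above, issued through Spark SQL with Hive support so they can be reused across jobs; the database, table, columns, and HDFS paths are hypothetical.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-external-tables")
         .enableHiveSupport()
         .getOrCreate())

# External table over an HDFS landing directory (path and columns are placeholders).
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS staging.payments (
        payment_id STRING,
        amount     DOUBLE
    )
    PARTITIONED BY (load_date STRING)
    STORED AS PARQUET
    LOCATION 'hdfs:///data/landing/payments/'
""")

# Register any partitions written directly to HDFS by upstream ingestion jobs.
spark.sql("MSCK REPAIR TABLE staging.payments")
```
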
Environment: Hadoop, Hive, Spark, PySpark, Sqoop, Spark SQL, Cassandra, YAML, ETL.

Canadian Tire Financial Services, Pittsburgh, PA | Mar 2016 to Apr 2018


Data Warehouse Developer
● Creating jobs, SQL Mail Agent, alerts, and scheduling DTS/SSIS packages for automated processes.

● Managing and updating Erwin models for logical/physical data modelling of Consolidated Data Store (CDS),
Actuarial Data Mart (ADM), and Reference DB to meet user requirements.
● Utilizing TFS for source controlling and tracking environment-specific script deployments.

● Exporting current data models from Erwin to PDF format and publishing them on SharePoint for user access.

● Developing, administering, and managing databases such as Consolidated Data Store, Reference Database, and
Actuarial Data Mart.
● Writing triggers, stored procedures, and functions using Transact-SQL (T-SQL) and maintaining physical database
structures.
● Ability to integrate Glue with other AWS services, such as S3, Athena, and Redshift

● Deploying scripts in different environments based on Configuration Management and Playbook requirements.

● Creating and managing files and file groups, establishing table/index associations, and optimizing query and
performance tuning.
● Tracking and closing defects using Quality Center for effective issue management.

● Maintaining users, roles, and permissions within the SQL Server environment.
Environment: SQL Server 2008/2012 Enterprise Edition, SSRS, SSIS, T-SQL, Windows Server 2003, PerformancePoint
Server 2007, Oracle 10g, visual Studio 2010.

American Express, Phoenix, AZ | Feb 2014 to Feb 2016


Data Warehouse Developer
● Created, manipulated, and supported SQL Server databases.

● Involved in the Data modelling, Physical and Logical Design of Database

● Helped in integration of the front end with the SQL Server backend.

● Developed robust ETL pipelines using Python, leveraging frameworks such as Apache Airflow and Apache Spark, to efficiently extract, transform, and load large volumes of data into various data warehousing systems (an Airflow sketch with a data-quality gate follows this role's bullet list).
● Implemented data quality checks and monitoring solutions to proactively identify and resolve data inconsistencies,
ensuring high data accuracy and reliability for business analytics and reporting purposes.
● Created Stored Procedures, Triggers, Indexes, User defined Functions, Constraints etc on various database objects to
obtain the required results.
● Experience using Glue to perform batch and stream processing of data

● Imported and exported data between servers using tools like Data Transformation Services (DTS).

● Wrote T-SQL statements for retrieval of data and involved in performance tuning of T-SQL queries.

● Transferred data from various data sources and business systems, including MS Excel, MS Access, and flat files, to SQL Server using SSIS/DTS with features like data conversion. In addition, created derived columns from the existing columns as required.
● Supported the team in resolving SQL Reporting Services and T-SQL issues; proficient in creating and formatting different types of reports such as cross-tab, conditional, drill-down, Top N, summary, form, OLAP, and sub-reports.
● Provided application support over the phone. Developed and tested Windows command files and SQL Server queries for production database monitoring in 24/7 support.
● Created logging for ETL load at package level and task level to log number of records processed by each package and
each task in a package using SSIS.
● Developed, monitored and deployed SSIS packages.
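
A minimal sketch of a Python/Airflow ETL pipeline with a data-quality gate, in the spirit of the bullets above; the DAG id, task commands, script names, thresholds, and row counts are hypothetical.

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator

def check_row_counts():
    """Hypothetical data-quality gate: fail the run if the load looks incomplete."""
    expected, loaded = 100_000, 99_950  # in practice these come from source/target queries
    if loaded < 0.99 * expected:
        raise ValueError(f"Row-count check failed: {loaded}/{expected}")

with DAG(
    dag_id="warehouse_nightly_etl",          # hypothetical DAG name
    start_date=datetime(2023, 1, 1),
    schedule_interval="0 2 * * *",           # nightly at 02:00
    catchup=False,
) as dag:
    extract = BashOperator(task_id="extract", bash_command="python extract.py")
    transform = BashOperator(task_id="transform", bash_command="python transform.py")
    quality_gate = PythonOperator(task_id="quality_gate", python_callable=check_row_counts)
    load = BashOperator(task_id="load", bash_command="python load_warehouse.py")
    extract >> transform >> quality_gate >> load
```
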
Environment: IBM WebSphere DataStage EE/7.0/6.0 (Manager, Designer, Director, Administrator), Ascential Profile Stage
6.0, Ascential QualityStage 6.0, Erwin, TOAD, Autosys, Oracle 9i, PL/SQL, SQL, UNIX Shell Scripts, Sun Solaris, Windows
2000.
