Phone: +1 8143778335
Email: pavansriharsha4290@gmail.com
PROFESSIONAL SUMMARY:
● 7+ years of professional experience in data engineering, including developing, implementing, and configuring
Hadoop ecosystem components in Linux environments, developing and maintaining applications in Python, and
designing strategies for deploying Big Data technologies to efficiently meet Big Data processing requirements
● Good knowledge of the Python programming language
● Good experience with Azure Databricks, Azure Data Lake, and Azure Data Factory
● Experience with Hadoop ecosystem components such as HDFS, MapReduce, YARN, Pig, Hive, and Sqoop
● Excellent programming skills at a high level of abstraction using Python and Spark
● Good understanding of processing real-time data using Spark (a Structured Streaming sketch follows this summary)
● Hands-on experience importing and exporting data between HDFS and databases such as MySQL, Oracle, and Teradata
using Sqoop
● Extensively experienced Big Data/Hadoop developer with varying levels of expertise across Big Data/Hadoop
ecosystem projects, including HDFS, MapReduce, Hive, and Sqoop
● Strong experience in writing scripts using Python API, PySpark API and Spark API for analyzing the data
● Maintained BigQuery, PySpark, and Hive code by fixing bugs and delivering the enhancements required by
business users
● Hands-on use of the Spark and Python APIs to compare the performance of Spark with Hive and SQL, and of Spark
SQL to manipulate DataFrames in Python (a PySpark DataFrame/Spark SQL sketch follows this summary)
● Expertise in writing user-defined functions (UDFs) for Hive and Pig in Python (a Hive TRANSFORM sketch follows this summary)
● Experience developing MapReduce programs using Apache Hadoop to analyze big data per business
requirements
● Proficiency in SQL across several dialects, most commonly MySQL, PostgreSQL, SQL Server, and Oracle
● Experience in managing and reviewing Hadoop Log files
● Experienced using Sqoop to import data into HDFS from RDBMS and vice-versa
● Practical experience with and a good understanding of Spark
● Hands-on experience extracting data from log files and copying it into HDFS using Flume
● Experienced in the use of agile approaches, including Extreme Programming (XP), Test-Driven Development
(TDD), and Scrum
● Designed various ingestion and processing patterns in Delta Lake based on use cases
● Experience in managing and storing confidential credentials in Azure Key Vault
● Built complex data ingestion/processing frameworks using Azure Databricks, Python, and PySpark
● Orchestrated end-to-end data integration pipelines using Azure Data Factory
● Strong working experience with ingestion, storage, querying, processing and analysis of Big Data
● Hands-on experience in Python programming and Spark components such as Spark Core and Spark SQL
● Created RDDs and DataFrames from the required input data and performed data transformations using Spark Core
● Hands-on experience with Apache Hadoop components such as HDFS, MapReduce, and Hive
● Hands-on experience with CI/CD configurations on the production side
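Illustrative only: a minimal Spark Structured Streaming sketch of the real-time processing mentioned above; the broker address, topic name, and checkpoint path are assumptions, not details from any project below.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("stream-demo").getOrCreate()

# Read a Kafka topic as a streaming DataFrame
# (requires the spark-sql-kafka connector on the cluster)
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
          .option("subscribe", "events")                     # placeholder topic
          .load())

# Kafka delivers values as bytes; cast to string before processing
parsed = events.select(col("value").cast("string").alias("payload"))

# Write the stream to the console for inspection
query = (parsed.writeStream
         .outputMode("append")
         .format("console")
         .option("checkpointLocation", "/tmp/stream-demo-chk")  # placeholder path
         .start())
query.awaitTermination()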
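Illustrative only: a minimal PySpark sketch of creating an RDD, converting it to a DataFrame, and manipulating it with Spark SQL; the sample data and column names are hypothetical.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("df-transformations").getOrCreate()

# Build an RDD from an in-memory collection, then convert it to a DataFrame
rdd = spark.sparkContext.parallelize([("alice", 34), ("bob", 29)])
df = rdd.toDF(["name", "age"])

# Register the DataFrame as a temp view and manipulate it with Spark SQL
df.createOrReplaceTempView("people")
adults = spark.sql("SELECT name, age FROM people WHERE age >= 30")

# The equivalent transformation through the DataFrame API
adults_df = df.filter(df.age >= 30).select("name", "age")

adults.show()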
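Illustrative only: one common way to run Python UDF logic in Hive is a streaming script invoked through TRANSFORM; the table and column names here are hypothetical.

#!/usr/bin/env python
# clean_names.py - reads tab-separated rows from stdin, writes transformed rows to stdout.
# Invoked from Hive roughly as:
#   ADD FILE clean_names.py;
#   SELECT TRANSFORM (name, age) USING 'python clean_names.py' AS (name, age) FROM people;
import sys

for line in sys.stdin:
    name, age = line.rstrip("\n").split("\t")
    # Example transformation: normalize the name column
    print("\t".join([name.strip().title(), age]))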
TECHNICAL SKILLS:
Big Data/Hadoop Technologies: MapReduce, Spark, Spark SQL, Azure Data Factory, Databricks, Kafka, PySpark, Hive, YARN, Oozie
Languages: Python, Shell Scripting, SQL
Web Design Tools: HTML, XML
Development Tools: Microsoft SQL Studio, IntelliJ, Azure Databricks, Eclipse
Public Cloud: Microsoft Azure
Development Methodologies: Agile/Scrum, Waterfall
Build Tools: Control-M, Oozie, Jenkins, Toad, SQL Loader, PostgreSQL, Talend, Maven, Ant, RTC, RSA, Hue, SOAP UI
Reporting Tools: MS Office (Word/Excel/PowerPoint/Visio/Outlook), Crystal Reports XI, SSRS, Cognos
Databases: Microsoft SQL Server 2008/2010/2012, MySQL 4.x/5.x, Oracle 11g/12c, DB2, Teradata, Netezza
Operating Systems: Windows (all versions), UNIX, Linux
PROJECTS PROFILE:
Responsibilities:
● Primarily involved in data migration using SQL, SQL Azure, Azure Data Lake, and Azure Data Factory
● Built analytic tools that utilize the data pipeline to provide actionable insights into customer acquisition,
operational efficiency, and other key business performance metrics
● Worked with the source team to extract data and load it into ADLS
● Created linked services for source and target connectivity based on the requirement
● Once created, pipelines and datasets are triggered based on the LOAD (history/delta) operation
● Depending on the size of the source data, loaded files are processed in Azure Databricks by applying Spark SQL
operations, deployed through Azure Data Factory pipelines (a condensed PySpark sketch follows this list)
● Involved in deploying the solutions to QA, DEV and PROD
● Involved in setting up the environments for QA, DEV and PROD using VSTS
● Proficient in creating data warehouses, designing extraction and data-loading functions, testing designs, and
data modeling, and in ensuring the smooth running of applications
● Responsible for extracting data from OLTP and OLAP systems to the Data Lake using Azure Data Factory and Databricks
● Used Azure Databricks notebooks to extract data from the Data Lake and load it into Azure and on-prem SQL
databases
● Worked with large data sets and high-capacity big data processing platforms on SQL and data warehouse
projects
● Developed pipelines in Databricks that extract data from various sources and merge it into single-source datasets
in the Data Lake
● Encrypted business-sensitive data going into the Data Lake using a cipher algorithm (a hedged encryption sketch follows this list)
● Decrypted the sensitive data with keys to produce refined datasets for analytics, providing end users access
● Created connections from different on-prem and cloud sources into a single source for Power BI reports
● End-to-end delivery experience on complex, high-volume, high-velocity projects with good exposure to Big Data
architectures; framework-building experience on Hadoop; a very good understanding of the Big Data ecosystem; and
experience sizing and estimating large-scale big data projects
● Coordinate with external teams to reduce data flow issues and unblock team members
● Actively participate in all four Scrum ceremonies: sprint planning, daily scrum, sprint review,
and sprint retrospective
● Passion for product quality, customer satisfaction and a proven track record for delivering quality
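Illustrative only: a condensed PySpark sketch of the notebook flow described above, reading a raw file from ADLS, transforming it with Spark SQL, and writing to an Azure SQL database over JDBC; all paths, names, and credentials are placeholders.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("adls-to-sql").getOrCreate()

# Read a raw file landed in ADLS by the ADF copy pipeline (placeholder path)
raw = spark.read.option("header", "true").csv(
    "abfss://raw@mystorageaccount.dfs.core.windows.net/sales/")

# Apply the transformation logic in Spark SQL
raw.createOrReplaceTempView("sales_raw")
curated = spark.sql("""
    SELECT order_id, CAST(amount AS DOUBLE) AS amount, order_date
    FROM sales_raw
    WHERE amount IS NOT NULL
""")

# Write the curated result to an Azure SQL database over JDBC
# (placeholder server, database, table, and credentials; in Databricks the
# password would come from a Key Vault-backed secret scope)
(curated.write
    .format("jdbc")
    .option("url", "jdbc:sqlserver://myserver.database.windows.net:1433;database=mydb")
    .option("dbtable", "dbo.sales_curated")
    .option("user", "etl_user")
    .option("password", "<from-key-vault>")
    .mode("overwrite")
    .save())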
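Illustrative only: the project does not name the cipher, so this sketch assumes Fernet (AES-based symmetric encryption from the cryptography package) applied to a sensitive column through PySpark UDFs; the column and key handling are placeholders.

from cryptography.fernet import Fernet
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("encrypt-demo").getOrCreate()

# Generated here for the demo; in production the key would live in Azure Key Vault
key = Fernet.generate_key()

def encrypt(value):
    # Encrypt a single string value; returns a base64-encoded token
    return Fernet(key).encrypt(value.encode("utf-8")).decode("utf-8") if value else None

def decrypt(token):
    # Reverse operation for refined datasets consumed by analytics users
    return Fernet(key).decrypt(token.encode("utf-8")).decode("utf-8") if token else None

encrypt_udf = udf(encrypt, StringType())
decrypt_udf = udf(decrypt, StringType())

df = spark.createDataFrame([("alice", "123-45-6789")], ["name", "ssn"])
protected = df.withColumn("ssn", encrypt_udf("ssn"))        # encrypt before landing in the lake
restored = protected.withColumn("ssn", decrypt_udf("ssn"))  # decrypt for authorized users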
Environment: Hadoop, PySpark, Python, Azure Databricks, and Azure Data Factory
Responsibilities:
● Created pipelines and datasets deployed in ADF (non-restricted)
● Created indexes and indexed views observing business rules, and created effective functions and appropriate
triggers to support efficient data manipulation and data consistency
● Performed data extraction, transformation, and loading (ETL) using tools such as SQL Server Integration
Services (SSIS) and Data Transformation Services (DTS)
● Created dynamic packages for incremental loads and data cleaning in the data warehouse using SSIS (the
incremental-load pattern is sketched after this list)
● Imported/exported data between different sources such as Oracle, Access, and Excel using the SSIS/DTS
utility
● Extracted, transformed, and loaded data from various heterogeneous data sources and
destinations like Access, Excel, CSV, Oracle, and flat files using connectors, tasks, and transformations
provided by SSIS.
● Involved in creating jobs, SQL Mail Agent, alerts, and scheduling SSIS packages
● High Availability and Disaster Recovery Planning
● Implemented CI/CD for Azure Data Factory pipelines with GitHub integration; worked with linked services and
datasets, triggers (schedule, tumbling window with dependencies, and event triggers), integration runtimes,
parameterized datasets and linked services, unions, and pipelines
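Illustrative only: the incremental loads above were built in SSIS; this Python/pyodbc sketch shows the same watermark-driven pattern rather than the actual packages, and the connection strings, tables, and watermark column are hypothetical.

import pyodbc

# Placeholder connection strings
src = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};SERVER=src;DATABASE=oltp;Trusted_Connection=yes")
dwh = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};SERVER=dwh;DATABASE=warehouse;Trusted_Connection=yes")

# 1. Read the last successful watermark from a control table in the warehouse
last_wm = dwh.cursor().execute(
    "SELECT last_modified FROM etl.watermark WHERE table_name = ?", "orders").fetchval()

# 2. Pull only the rows changed since the watermark (the incremental/delta set)
rows = src.cursor().execute(
    "SELECT order_id, amount, modified_at FROM dbo.orders WHERE modified_at > ?",
    last_wm).fetchall()

# 3. Upsert the delta into the warehouse, then advance the watermark
cur = dwh.cursor()
for order_id, amount, modified_at in rows:
    cur.execute(
        """MERGE dbo.orders_dw AS t
           USING (SELECT ? AS order_id, ? AS amount) AS s
           ON t.order_id = s.order_id
           WHEN MATCHED THEN UPDATE SET amount = s.amount
           WHEN NOT MATCHED THEN INSERT (order_id, amount) VALUES (s.order_id, s.amount);""",
        order_id, amount)
if rows:
    cur.execute("UPDATE etl.watermark SET last_modified = ? WHERE table_name = ?",
                max(r.modified_at for r in rows), "orders")
dwh.commit()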
Environment: Cloudera Manager (CDH5), Hadoop, HDFS, Pig, Hive, Kafka, Scrum, Git, Sqoop, Oozie, PySpark,
Informatica, Tableau, OLTP, OLAP, HBase, Cassandra, SQL Server, Python, Shell Scripting, XML, Unix
Education: