Jarupula Praveen
PROFILE SUMMARY:
• Overall, 12 years of experience across multiple technologies, including Azure services and Big Data:
analysis, design, and development of Big Data solutions using Hadoop, Python, Data Lake, Scala, and
PySpark, as well as database and data warehousing development using MySQL and Oracle.
• 5 years of experience as an Azure Cloud Data Engineer in Microsoft Azure Cloud technologies, including Azure
Data Factory (ADF), Azure Data Lake Storage (ADLS), Azure Synapse Analytics (SQL Data Warehouse), Azure
SQL Database, Azure Analysis Services, Azure Cosmos DB (NoSQL), Azure Key Vault, Azure DevOps, and Azure
HDInsight Big Data technologies such as Hadoop, Apache Spark, and Azure Databricks.
• 4 years of experience as a Data warehouse developer handling Microsoft Business Intelligence Tools.
• Experience in developing pipelines in Spark using Scala and PySpark.
• Managed and administered Hadoop clusters, ensuring high availability, scalability, and optimal
performance.
• Experience building ETL data pipelines in Azure Databricks leveraging PySpark and Spark SQL.
• Extensively worked on Azure Databricks.
• Proficient in Azure Data Factory to perform Incremental Loads from Azure SQL DB to Azure Synapse.
• Hands-on experience in Azure Cloud Services (PaaS & IaaS), Azure Synapse Analytics, SQL Azure, Data
Factory, Azure Analysis Services, Application Insights, Azure Monitoring, Key Vault, and Azure Data Lake.
• Experience in building the Orchestration on Azure Data Factory for scheduling purposes.
• Hands-on experience in the Azure cloud; worked on App Services, Azure SQL Database, Azure Blob Storage,
Azure Functions, Virtual Machines, Azure AD, Azure Data Factory, Event Hub, and Event Queue.
• Accomplished lead data engineer with 1+ years of experience in designing and implementing scalable and
efficient data solutions.
• Designed and optimized CDC pipelines to efficiently capture and replicate data changes from various source
systems such as relational databases (e.g., MySQL, PostgreSQL, Oracle) and NoSQL databases (e.g.,
MongoDB, Cassandra).
• Proven track record of leading high-performing teams in complex data engineering projects.
• Integrated Hadoop with various components of the big data ecosystem, including HBase, Hive, Spark, and
Pig, to support diverse data processing and analytics requirements.
• Experience with Azure Logic Apps using different triggers.
• Implemented Unity catalog features starting from sharing metadata, access control, and lineage.
• Implemented indexing and partitioning strategies for large datasets to enhance query performance.
• Orchestrated data integration pipelines in ADF using various activities such as Get Metadata, Lookup, ForEach,
Wait, Execute Pipeline, Set Variable, Filter, and Until.
• Strong experience in migrating other databases to Snowflake.
• Experience with Snowflake Multi-Cluster Warehouses.
• Conducted A/B testing and data analysis to identify key factors influencing customer behavior.
• Collaborated with data scientists to identify opportunities for automation and process improvement.
• Experience with MS Azure (Databricks, Data Factory, Data Lake, Azure SQL, Event Hub, etc.)
• Experience in using Snowflake Clone and Time Travel.
• Hands-on experience developing large-scale data pipelines using Spark and Hive.
• Experience working with ARM templates to deploy in production using Azure DevOps.
• Experience in developing very complex mappings, reusable transformations, sessions, and workflows using
the Informatica ETL tool to extract data from various sources and load it into targets.
• Used T-SQL stored procedures to transfer data from OLTP databases to the staging area and finally into
data marts, including handling XML data.
• Implemented production scheduling jobs using Control-M and Airflow.
• Used various file formats like Avro, Parquet, Sequence, JSON, ORC, and text for loading data, parsing,
gathering, and performing transformations.
• Extensive experience with T-SQL in constructing Triggers and tables, implementing stored Procedures,
Functions, Views, User Profiles, Data Dictionaries, and Data Integrity.
• Hands-on experience with Confluent Kafka to load data from StreamSets directly into ADLS.
• Strong experience building data pipelines and performing large-scale data transformations.
• In-depth knowledge in working with Distributed Computing Systems and parallel processing techniques to
efficiently deal with Big Data.
• Designed and Implemented Hive external tables using a shared meta-store with Static & Dynamic
partitioning, bucketing, and indexing.
• Experience in handling, configuration, and administration of databases like MySQL and NoSQL databases
like MongoDB and Cassandra.
• Utilized Service Bus Explorer and Azure Portal for monitoring and managing Service Bus entities.
• Optimized Service Bus configurations for performance and scalability requirements.
• Extensive hands-on experience tuning Spark jobs and optimizing Spark performance.
• Experienced in working with structured data using HiveQL, and optimizing Hive queries.
• Solid capabilities in exploratory data analysis, statistical analysis, and visualization using R,
Python, SQL, and Tableau.
• Ran and scheduled workflows using Oozie and ZooKeeper, identifying failures and integrating,
coordinating, and scheduling jobs.
• Knowledge of Database Architecture for OLAP and OLTP Applications, Database designing, Data Migration,
and Data Warehousing Concepts, emphasizing ETL.
• Experience in Data Modeling & Analysis using Dimensional and Relational Data Modeling.
• Experience in using Star Schema and Snowflake schema for Modeling and using FACT & Dimensions tables,
Physical & Logical Data Modeling.
• Defining user stories and driving the agile board in JIRA during project execution, participating in sprint
demos and retrospectives.
• Maintained and administered GIT source code repository and GitHub Enterprise.
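The incremental-load pattern mentioned above (Azure SQL DB to Azure Synapse via ADF) can be sketched in miniature. The following is a plain-Python illustration of the high-watermark technique only, using in-memory SQLite in place of the actual Azure services; the `orders` table and `updated_at` column are hypothetical names, not from any real project.

```python
import sqlite3

def incremental_load(src: sqlite3.Connection, dst: sqlite3.Connection) -> int:
    """Copy only rows newer than the destination's high-watermark."""
    # Read the last loaded watermark from the destination (0 if empty).
    (watermark,) = dst.execute(
        "SELECT COALESCE(MAX(updated_at), 0) FROM orders"
    ).fetchone()
    # Pull only the delta from the source.
    rows = src.execute(
        "SELECT id, amount, updated_at FROM orders WHERE updated_at > ?",
        (watermark,),
    ).fetchall()
    dst.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    dst.commit()
    return len(rows)

# Demo: in-memory databases stand in for the source DB and the warehouse.
src = sqlite3.connect(":memory:")
dst = sqlite3.connect(":memory:")
for conn in (src, dst):
    conn.execute("CREATE TABLE orders (id INTEGER, amount REAL, updated_at INTEGER)")
src.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, 10.0, 100), (2, 20.0, 200)])
print(incremental_load(src, dst))  # first run loads both rows: 2
src.execute("INSERT INTO orders VALUES (3, 30.0, 300)")
print(incremental_load(src, dst))  # second run loads only the new row: 1
```

In ADF the same idea is typically realized with a Lookup activity reading the stored watermark and a Copy activity whose source query filters on it.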
TECHNICAL SKILLS:
Big Data Technologies Hive, Sqoop, PySpark, Apache Spark, Hadoop, YARN, Kafka, Oozie, PyCharm, Maven
Cloud Technologies Azure Data Factory, Azure Databricks, Gen2 Storage, Blob Storage, Cosmos DB, ADLS,
StreamSets, Snowflake
Programming Languages Python, Scala, and SQL
Methodologies Agile, Kanban, and Waterfall
ETL Tools SSIS, Azure Data Factory
Database Oracle, MySQL, SQL Server, Snowflake
CI/CD Tools Terraform, Azure DevOps
Operating Systems UNIX, Windows
EDUCATION DETAILS:
MASTER'S IN COMPUTER SCIENCE, 2012, AUBURN UNIVERSITY AT MONTGOMERY.
BACHELOR'S IN COMPUTER SCIENCE, 2011, OSMANIA UNIVERSITY.
WORK EXPERIENCE
Agiliti Health, Dallas, TX Mar 2022 to Present
Lead Azure Data Engineer.
Responsibilities:
Worked on Migration of Data from On-prem SFTP server to Azure Cloud using Azure data factory.
Implemented pipelines to extract data from on-premises source systems to Azure cloud data lake storage.
Implemented a StreamSets pipeline to load incremental data from a PostgreSQL database into Confluent Kafka
and loaded the data into ADLS using Kafka topics.
Implemented Kafka topics to receive data from StreamSets and implemented connectors to load the data
and push it into ADLS.
Built an ETL pipeline to load historical and incremental data from a PostgreSQL database into a
Snowflake database.
Set up configuration settings in Databricks Notebook to load data from Kafka and load it into Delta tables.
Implemented Delta tables using SQL and Python code to load the data and parsed the JSON file format
schema using Scala.
Monitored the scheduled Azure Data Factory pipelines and configured the alerts to get notifications of
failure pipelines.
Orchestrated Extract, Transform, Load (ETL) processes using Logic Apps, automating data transformations
and ensuring timely and accurate data delivery.
Designed and implemented complex ETL processes using SSIS for data extraction, transformation, and
loading.
Implemented star schema and snowflake schema designs for optimal query performance.
Configured and implemented the Azure Data Factory Pipelines and scheduled the Triggers.
Established end-to-end application performance monitoring using Azure Application Insights.
Created dashboards and alerts in Azure Monitor to proactively respond to application issues.
Implemented end-to-end data pipelines with Kafka Connect to integrate systems and applications with
Kafka topics. Built custom connectors for streaming data from databases, APIs, files, and SaaS applications.
Established monitoring, security, access control, and operational processes for Kafka infrastructure running
on Azure. Improved cluster reliability and efficiency.
Developed and maintained interactive and visually appealing SSRS reports for business users.
Led optimization of Kafka clusters for scale, stability, and costs including partitioning, replication,
compaction, and retention policies to support high event volumes.
Developed interactive real-time applications using Kafka Streams DSL for data processing including filtering,
mapping, joining, windowing, and aggregating data streams.
Built a scalable data pipeline to load processed Confluence data from Kafka topics into Azure Data Lake
Storage (ADLS) using Kafka Connect.
Integrated Power BI reports and dashboards with existing SSRS and SSAS solutions.
Collaborated with business stakeholders to gather and understand reporting requirements.
Integrated CI/CD workflows with version control to trigger automated builds based on code changes.
Implemented a natural language processing (NLP) model for sentiment analysis on customer feedback,
resulting in a 20% increase in customer satisfaction.
Used advanced features of T-SQL to design and tune T-SQL to interface with the Database and other
applications in the most efficient manner and created stored Procedures for the business logic using T-SQL.
Managed and enforced Azure RBAC to control permissions and access to Azure resources, ensuring the
principle of least privilege.
Spearheaded the development of a fraud detection system using machine learning algorithms, resulting in
a 25% reduction in fraudulent transactions and mitigating potential financial losses.
Utilized Agile process and JIRA issue management to track sprint cycles.
Created pipelines to extract data from Salesforce source systems to Azure cloud data lake storage.
Migrated Account, Users, Contact, and Opportunity object APIs from Salesforce using Azure Data Factory.
Implemented performance-tuning logic on targets, sources, mappings, and sessions to provide maximum
efficiency and performance.
Participated in the development, improvement, and maintenance of Snowflake database applications.
Implemented an Azure linked service to load data into Snowflake and implemented stored procedures called
from ADF into Snowflake using the Script activity.
Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and
aggregation from multiple file formats for analysis.
Built complex ETL jobs that transform data visually with data flows or by using compute services such as
Azure Databricks and Azure SQL Database.
Worked on building the data pipeline using Azure Services like Data Factory to load the data from the
Legacy, SQL server to Azure Data warehouse using Data Factories and Databricks Notebooks.
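The Kafka-to-Delta flow described in the responsibilities above (parse JSON messages, upsert into a Delta table) can be sketched without the Databricks runtime. This plain-Python stand-in shows only the last-write-wins merge logic; the `id` and `ts` fields are illustrative, and a real implementation would use Spark Structured Streaming with a Delta `MERGE`.

```python
import json

def merge_events(table: dict, messages: list) -> dict:
    """Upsert JSON events into a keyed table, last-write-wins by timestamp."""
    for raw in messages:
        event = json.loads(raw)  # parse the Kafka message payload
        key = event["id"]
        current = table.get(key)
        # Keep the newer record, mirroring MERGE ... WHEN MATCHED THEN UPDATE.
        if current is None or event["ts"] >= current["ts"]:
            table[key] = event
    return table

messages = [
    '{"id": 1, "status": "new", "ts": 100}',
    '{"id": 1, "status": "shipped", "ts": 200}',
    '{"id": 2, "status": "new", "ts": 150}',
]
table = merge_events({}, messages)
print(table[1]["status"])  # the later event for id 1 wins: shipped
```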
Environment: Azure, Azure Data Factory, Snowflake, performance tuning, ADLS Gen2, dataflow jobs, Copy activity,
Lookup activity, Data Flow, linked services, StreamSets, Databricks, Snowpipe, Confluent Kafka, Scala, SQL, PySpark,
Python, Azure Key Vault, JIRA, GitHub.
First Republic Bank, San Francisco, California Nov 2018 to Aug 2020
Azure ETL Data Engineer.
Responsibilities:
• Creating pipelines, data flows, and complex data transformations and manipulations using ADF and PySpark
with Databricks.
• Ingested data into Azure Data Lake and then into Azure DW for data migration, processing the data in Azure Databricks.
• Used Data Flow debugging for effectively building ADF data flow pipelines.
• Improved performance by using optimization options by effectively using partitions during various
transformations.
• Implemented complex ETL Azure Data Factory pipelines using mapping data flows with multiple
Input/output transformations
• Worked on Azure Blob and Data Lake storage and loaded data into Azure Synapse Analytics (SQL DW).
• Used Azure Key Vault as a central repository for maintaining secrets and referenced the secrets in Azure
Data Factory and in Databricks notebooks.
• Built a common SFTP download or upload framework using Azure Data Factory and Databricks.
• Developed Databricks ETL pipelines using notebooks, Spark Data frames, SPARK SQL, and Python scripting.
• Implemented Databricks Job workflows, extracting data from SQL server and uploading the files to SFTP
using PySpark and Python.
• Worked with ARM templates to deploy to production using Azure DevOps.
• Developed Spark applications using PySpark and Spark-SQL for data extraction, transformation, and
aggregation from multiple file formats.
• Responsible for estimating the cluster size, and monitoring and troubleshooting the Spark Databricks
cluster.
• Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
• Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch
processing.
• Configured Power BI Report Server for SSRS, and deployed and scheduled reports in Report Manager.
• Developed transformation logic using Snowpipe. Hands-on experience with Snowflake utilities, SnowSQL,
Snowpipe, and Big Data modeling techniques using Python/Scala.
• Responsible for Building and scaling ETL / Event processing systems that organize data and manage
complex rule sets in batch and real-time.
• Built ETL pipelines in and out of the data warehouse using a combination of Python and Snowflake's SnowSQL,
writing SQL queries against Snowflake.
• Involved in Branching, Tagging, and Release Activities on GitHub Version Control.
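The Spark Streaming bullet above, dividing streaming data into batches for the batch engine, amounts to micro-batching. A minimal stdlib sketch of that idea (not the Spark API itself; batch size and input are illustrative):

```python
from itertools import islice

def micro_batches(stream, batch_size):
    """Group an unbounded record stream into fixed-size micro-batches."""
    it = iter(stream)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

# Each batch can then be handed to a batch-style transformation,
# the way Spark Streaming feeds DStream batches to the Spark engine.
batches = list(micro_batches(range(7), 3))
print(batches)  # [[0, 1, 2], [3, 4, 5], [6]]
```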
Environment: Azure, ADF, Azure Data Lake Gen2, PySpark, dataflow jobs, Copy activity, Lookup activity, Data Flow,
linked services, Logic Apps, Event Hub, Scala, Snowflake, Streaming, Agile methods, Git.