
PRAVEEN J

(Azure Data Engineer)

Ph: 469-790-0580 | jpraveen2025@gmail.com

PROFILE SUMMARY:
• Overall 12 years of experience across Azure services and Big Data, including analysis, design, and development of Big Data solutions using Hadoop, Python, Data Lake, Scala, and PySpark, as well as database and data warehousing development using MySQL, Oracle, and data warehouse platforms.
• 5 years of experience as an Azure Cloud Data Engineer working with Microsoft Azure cloud technologies including Azure Data Factory (ADF), Azure Data Lake Storage (ADLS), Azure Synapse Analytics (SQL Data Warehouse), Azure SQL Database, Azure Analysis Services, Azure Cosmos DB (NoSQL), Azure Key Vault, Azure DevOps, Azure HDInsight, and Big Data technologies such as Hadoop, Apache Spark, and Azure Databricks.
• 4 years of experience as a Data warehouse developer handling Microsoft Business Intelligence Tools.
• Experience in developing pipelines in Spark using Scala and PySpark.
• Managed and administered Hadoop clusters, ensuring high availability, scalability, and optimal
performance.
• Experience building ETL data pipelines in Azure Databricks leveraging PySpark and Spark SQL.
• Extensively worked on Azure Databricks.
• Proficient in Azure Data Factory for performing incremental loads from Azure SQL DB to Azure Synapse (a simplified sketch of the watermark pattern appears after this list).
• Hands-on experience in Azure Cloud Services (PaaS & IaaS), Azure Synapse Analytics, SQL Azure, Data
Factory, Azure Analysis Services, Application Insights, Azure Monitoring, Key Vault, and Azure Data Lake.
• Experience in building the Orchestration on Azure Data Factory for scheduling purposes.
• Hands-on experience in the Azure cloud, working with App Services, Azure SQL Database, Azure Blob Storage, Azure Functions, Virtual Machines, Azure AD, Azure Data Factory, Event Hubs, and event queues.
• Accomplished lead data engineer with 1+ years of experience in designing and implementing scalable and
efficient data solutions.
• Designed and optimized CDC pipelines to efficiently capture and replicate data changes from various source
systems such as relational databases (e.g., MySQL, PostgreSQL, Oracle) and NoSQL databases (e.g.,
MongoDB, Cassandra).
• Proven track record of leading high-performing teams in complex data engineering projects.
• Integrated Hadoop with various components of the big data ecosystem, including HBase, Hive, Spark, and
Pig, to support diverse data processing and analytics requirements.
• Experience with Azure Logic Apps using different triggers.
• Implemented Unity Catalog features including metadata sharing, access control, and lineage.
• Implemented indexing and partitioning strategies for large datasets to enhance query performance.
• Orchestrated data integration pipelines in ADF using various activities such as Get Metadata, Lookup, ForEach, Wait, Execute Pipeline, Set Variable, Filter, and Until.
• Strong experience in migrating other databases to Snowflake.
• Experience with Snowflake Multi-Cluster Warehouses.
• Conducted A/B testing and data analysis to identify key factors influencing customer behavior.
• Collaborated with data scientists to identify opportunities for automation and process improvement.
• Experience with MS Azure (Databricks, Data Factory, Data Lake, Azure SQL, Event Hub, etc.)
• Experience in using Snowflake Clone and Time Travel.
• Hands-on working experience in developing large-scale data pipelines using Spark and Hive.
• Experience working with ARM templates to deploy in production using Azure DevOps.
• Experience in developing very complex mappings, reusable transformations, sessions, and workflows using
the Informatica ETL tool to extract data from various sources and load it into targets.
• Used T-SQL stored procedures to transfer data from OLTP databases to the staging area and finally into data marts, and worked with XML data.
• Implemented production scheduling jobs using Control-M, and Airflow.
• Used various file formats like Avro, Parquet, Sequence, JSON, ORC, and text for loading data, parsing,
gathering, and performing transformations.
• Extensive experience with T-SQL in constructing Triggers and tables, implementing stored Procedures,
Functions, Views, User Profiles, Data Dictionaries, and Data Integrity.
• Hands-on experience with Confluent Kafka to load data from StreamSets directly into ADLS.
• Strong experience building data pipelines and performing large-scale data transformations.
• In-depth knowledge in working with Distributed Computing Systems and parallel processing techniques to
efficiently deal with Big Data.
• Designed and Implemented Hive external tables using a shared meta-store with Static & Dynamic
partitioning, bucketing, and indexing.
• Experience in handling, configuration, and administration of databases like MySQL and NoSQL databases
like MongoDB and Cassandra.
• Utilized Service Bus Explorer and Azure Portal for monitoring and managing Service Bus entities.
• Optimized Service Bus configurations for performance and scalability requirements.
• Extensive hands-on experience in Spark performance tuning and optimizing Spark jobs.
• Experienced in working with structured data using HiveQL and optimizing Hive queries.
• Solid capabilities in exploratory data analysis, statistical analysis, and visualization using R, Python, SQL, and Tableau.
• Running and scheduling workflows using Oozie and Zookeeper, identifying failures, and integrating,
coordinating, and scheduling jobs.
• Knowledge of Database Architecture for OLAP and OLTP Applications, Database designing, Data Migration,
and Data Warehousing Concepts, emphasizing ETL.
• Experience in Data Modeling & Analysis using Dimensional and Relational Data Modeling.
• Experience in using Star Schema and Snowflake schema for Modeling and using FACT & Dimensions tables,
Physical & Logical Data Modeling.
• Defining user stories and driving the agile board in JIRA during project execution, participating in sprint
demos and retrospectives.
• Maintained and administered GIT source code repository and GitHub Enterprise.
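
The incremental-load and CDC work referenced above follows a common watermark pattern. The sketch below illustrates that pattern only, as a minimal PySpark/Delta example run from a Databricks notebook; it is not a specific production pipeline, and the control table, source table, watermark column, and Key Vault secret names are hypothetical placeholders.

from pyspark.sql import functions as F
from delta.tables import DeltaTable

# "spark" and "dbutils" are provided by the Databricks notebook runtime.

# 1. Read the high-water mark persisted by the previous run (hypothetical control table).
last_mark = (spark.table("etl_control.watermarks")
             .filter(F.col("table_name") == "dbo.Orders")
             .agg(F.max("last_loaded_at"))
             .first()[0])

# 2. Pull only rows changed since that watermark from the source Azure SQL database.
#    The JDBC URL is read from a Key Vault-backed secret scope (placeholder names).
incremental = (spark.read.format("jdbc")
               .option("url", dbutils.secrets.get("kv-scope", "azure-sql-jdbc-url"))
               .option("query", f"SELECT * FROM dbo.Orders WHERE ModifiedDate > '{last_mark}'")
               .load())

# 3. Merge the changed rows into the curated Delta table that feeds Synapse.
target = DeltaTable.forName(spark, "curated.orders")
(target.alias("t")
 .merge(incremental.alias("s"), "t.OrderID = s.OrderID")
 .whenMatchedUpdateAll()
 .whenNotMatchedInsertAll()
 .execute())

# 4. Compute the new watermark for the next run (persisting it back is omitted for brevity).
new_mark = incremental.agg(F.max("ModifiedDate").alias("m")).first()["m"]
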

TECHNICAL SKILLS:
Big Data Technologies: Hive, Sqoop, PySpark, Apache Spark, Hadoop, YARN, Kafka, Oozie, PyCharm, Maven
Cloud Technologies: Azure Data Factory, Azure Databricks, ADLS Gen2 storage, Blob Storage, Cosmos DB, StreamSets, Snowflake
Programming Languages: Python, Scala, and SQL
Methodologies: Agile, Kanban, and Waterfall
ETL Tools: SSIS, Azure Data Factory
Databases: Oracle, MySQL, SQL Server, Snowflake
CI/CD Tools: Terraform, Azure DevOps
Operating Systems: UNIX, Windows

EDUCATION DETAILS:
Master's in Computer Science, 2012, Auburn University at Montgomery.
Bachelor's in Computer Science, 2011, Osmania University.

WORK EXPERIENCE
Agiliti Health, Dallas, TX Mar 2022 to Present
Lead Azure Data Engineer.
Responsibilities:
• Worked on migration of data from an on-prem SFTP server to the Azure cloud using Azure Data Factory.
• Implemented pipelines to extract data from on-premises source systems to Azure cloud data lake storage.
• Implemented a StreamSets pipeline to load incremental data from a PostgreSQL database into Confluent Kafka and loaded the data into ADLS using Kafka topics.
• Created topics in Kafka to receive data from StreamSets and implemented connectors to load the data and push it into ADLS.
• Built ETL pipelines to load historical and incremental data from the PostgreSQL database into the Snowflake database.
• Set up configuration in Databricks notebooks to read data from Kafka and load it into Delta tables (a minimal sketch of this pattern follows this list).
• Implemented Delta tables using SQL and Python code to load the data and parsed the JSON file format schema using Scala.
• Monitored the scheduled Azure Data Factory pipelines and configured alerts to receive notifications for failed pipelines.
• Orchestrated Extract, Transform, Load (ETL) processes using Logic Apps, automating data transformations and ensuring timely and accurate data delivery.
• Designed and implemented complex ETL processes using SSIS for data extraction, transformation, and loading.
• Implemented star schema and snowflake schema designs for optimal query performance.
• Configured and implemented Azure Data Factory pipelines and scheduled the triggers.
• Established end-to-end application performance monitoring using Azure Application Insights.
• Created dashboards and alerts in Azure Monitor to proactively respond to application issues.
• Implemented end-to-end data pipelines with Kafka Connect to integrate systems and applications with Kafka topics. Built custom connectors for streaming data from databases, APIs, files, and SaaS applications.
• Established monitoring, security, access control, and operational processes for Kafka infrastructure running on Azure, improving cluster reliability and efficiency.
• Developed and maintained interactive and visually appealing SSRS reports for business users.
• Led optimization of Kafka clusters for scale, stability, and cost, including partitioning, replication, compaction, and retention policies to support high event volumes.
• Developed interactive real-time applications using the Kafka Streams DSL for data processing, including filtering, mapping, joining, windowing, and aggregating data streams.
• Built a scalable data pipeline to load processed Confluence data from Kafka topics into Azure Data Lake Storage (ADLS) using Kafka Connect.
• Integrated Power BI reports and dashboards with existing SSRS and SSAS solutions.
• Collaborated with business stakeholders to gather and understand reporting requirements.
• Integrated CI/CD workflows with version control to trigger automated builds based on code changes.
• Implemented a natural language processing (NLP) model for sentiment analysis on customer feedback, resulting in a 20% increase in customer satisfaction.
• Used advanced T-SQL features to design and tune queries that interface with the database and other applications in the most efficient manner, and created stored procedures for business logic using T-SQL.
• Managed and enforced Azure RBAC to control permissions and access to Azure resources, ensuring the principle of least privilege.
• Spearheaded the development of a fraud detection system using machine learning algorithms, resulting in a 25% reduction in fraudulent transactions and mitigating potential financial losses.
• Utilized the Agile process and JIRA issue management to track sprint cycles.
• Created pipelines to extract data from Salesforce source systems to Azure cloud data lake storage.
• Migrated Account, Users, Contact, and Opportunity object APIs from Salesforce using Azure Data Factory.
• Implemented performance-tuning logic on targets, sources, mappings, and sessions to provide maximum efficiency and performance.
• Participated in the development, improvement, and maintenance of Snowflake database applications.
• Implemented an Azure linked service to load data into Snowflake and implemented stored procedures called from ADF against Snowflake using the Script activity.
• Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats for analysis and transformation.
• Built complex ETL jobs that transform data visually with data flows or by using compute services such as Azure Databricks and Azure SQL Database.
• Worked on building data pipelines using Azure services like Data Factory to load data from legacy SQL Server systems to the Azure data warehouse using Data Factory and Databricks notebooks.
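
A minimal, illustrative sketch of the Kafka-to-Delta ingestion described above, written in PySpark for a Databricks notebook. The broker addresses, topic name, JSON schema, paths, and table names are hypothetical placeholders, and Confluent authentication options are omitted.

from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

# Hypothetical schema for the JSON messages on the Kafka topic.
event_schema = StructType([
    StructField("order_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("updated_at", TimestampType()),
])

# Read the stream from Kafka ("spark" is provided by the Databricks notebook runtime).
raw = (spark.readStream.format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
       .option("subscribe", "orders-topic")
       .option("startingOffsets", "earliest")
       .load())

# Parse the JSON payload into typed columns.
parsed = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(F.from_json("json", event_schema).alias("e"))
          .select("e.*"))

# Append the parsed records to a Delta table, checkpointing to mounted ADLS storage.
(parsed.writeStream
 .format("delta")
 .option("checkpointLocation", "/mnt/adls/checkpoints/orders")
 .outputMode("append")
 .toTable("bronze.orders"))
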
Environment: Azure, Azure Data Factory, Snowflake, performance tuning, ADLS Gen2, dataflow jobs, copy activity, lookup activity, Data Flow, linked services, StreamSets, Databricks, Snowpipe, Confluent Kafka, Scala, SQL, PySpark, Python, Azure Key Vault, JIRA, GitHub.

AT&T, California Sep 2020 to Feb 2022


Sr. Azure Data Engineer.
Responsibilities:
• Involved with Azure cloud services including App Services, Azure SQL Database, Azure Blob Storage, Azure Functions, Virtual Machines, Azure AD, Azure Data Factory, Event Hubs, and event queues.
• Worked on building data pipelines using Azure services like Data Factory to load data from legacy SQL Server systems to the Azure data warehouse using Data Factory and Databricks notebooks.
• Built complex ETL jobs that transform data visually with data flows or by using compute services such as Azure Databricks and Azure SQL Database.
• Administered Azure Active Directory (Azure AD) groups, including the creation, modification, and deletion
of groups for efficient access control.
• Developed and implemented Runbooks to automate routine management tasks, reducing manual effort
and minimizing the risk of human error.
• Created and maintained SSIS packages for integrating data from various sources into the data warehouse.
• Set up monitoring tools to track cluster performance and troubleshoot issues in real time, ensuring the
health of the Hadoop infrastructure.
• Implemented a time series forecasting model using LSTM neural networks, enabling accurate demand
forecasting and reducing inventory costs by 10%.
• Established Kafka topics with appropriate partitioning and replication factors to support event streams with
volumes of over 1 million messages per second.
• Utilized SSIS to migrate data between on-premises and cloud-based databases.
• Implemented role-based access control for Kafka using ACLs to control access to topics/consumer groups
based on environment (dev/test/prod) and application role.
• Upgraded Kafka clusters across multiple versions and stacks with no downtime and no loss of data while
applications keep operating at scale.
• Implemented robust error-handling mechanisms in scripts to ensure graceful degradation and provide
meaningful error messages for troubleshooting.
• Configured various types of SQL Server replication, including transactional, snapshot, and peer-to-peer
replication, to support data distribution and reporting requirements.
• Configured CDC to detect and capture data changes, including inserts, updates, and deletes, while ensuring
data consistency and accuracy.
• Implemented pipelines to extract data from SQL Server to Azure cloud data lake storage.
• Implemented end-to-end Data pipelines using ADF services to load data from On-premises to Azure SQL
server for Data orchestration.
• Extensively worked on copy activities and implemented copy behaviors such as flatten hierarchy, preserve hierarchy, and merge hierarchy.
• Implemented performance-tuning techniques in Azure Data Factory and Synapse.
• Monitored the scheduled Azure Data Factory pipelines and configured the alerts to get notifications of
failure pipelines.
• Created Workspace and Apps in Power BI service (SaaS) and gave access to end users to Apps.
• Configured and implemented the Azure Data Factory Triggers and scheduled the Pipelines.
• Involved in designing and developing Azure stream analytics jobs to process real-time data using Azure
event hubs.
• Configured Azure Logic Apps to handle email notifications to end users and key stakeholders with the help of the web services activity.
• Maintained Terraform configurations in a version-controlled repository to track infrastructure changes alongside application code.
• Implemented Terraform configurations into Azure DevOps pipelines for Continuous deployment and
infrastructure changes as part of the CI/CD process.
• Extensively used Azure key vaults to configure the connections in linked services.
• Create and maintain optimal data pipeline architecture in cloud Microsoft Azure using Data Factory and
Azure Databricks.
• Exposed transformed data in the Azure Databricks platform in Parquet format for efficient data storage.
• Analyzed data where it lives by Mounting Azure Data Lake and Blob to Databricks.
• Extensively worked on Azure Data Lake Analytics with the help of Azure Databricks to implement SCD Type 1 and SCD Type 2 approaches (a brief SCD Type 2 sketch follows this list).
• Implemented Azure Logic Apps, Azure Functions, Azure Storage, and Service Bus queues for large enterprise-level ERP integration systems.
• Implemented and configured a new event hub with the provided event hubs namespace.
• Extensive hands-on experience tuning Spark jobs.
• Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats for analysis and transformation.
• Developed a customer message consumer to consume data from the Kafka producer and push the messages to HDFS.
• Participated in the development, improvement, and maintenance of Snowflake database applications.
• Built the logical and physical data models for Snowflake per the required changes.
• Implemented a Git repository and added the project to GitHub.
• Utilized Agile process and JIRA issue management to track sprint cycles.
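
A brief, hypothetical sketch of the SCD Type 2 approach referenced above, using a Delta Lake MERGE in PySpark from a Databricks notebook. The dimension table, business key, and tracked columns are placeholders, not the actual production model.

from delta.tables import DeltaTable
from pyspark.sql import functions as F

# "spark" is provided by the Databricks notebook runtime.
updates = spark.table("staging.customer_updates")
current = spark.table("dw.dim_customer").filter("is_current = true")

# 1. Keep only customers that are new or whose tracked attributes changed.
changed = (updates.alias("u")
           .join(current.alias("d"), F.col("u.customer_id") == F.col("d.customer_id"), "left")
           .where("d.customer_id IS NULL OR u.address <> d.address OR u.segment <> d.segment")
           .select("u.*"))

# 2. Expire the current versions of the changed customers.
dim = DeltaTable.forName(spark, "dw.dim_customer")
(dim.alias("d")
 .merge(changed.alias("u"),
        "d.customer_id = u.customer_id AND d.is_current = true")
 .whenMatchedUpdate(set={"is_current": "false", "end_date": "current_date()"})
 .execute())

# 3. Append the new versions with fresh effective dates.
(changed
 .withColumn("is_current", F.lit(True))
 .withColumn("start_date", F.current_date())
 .withColumn("end_date", F.lit(None).cast("date"))
 .write.format("delta").mode("append").saveAsTable("dw.dim_customer"))
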

Environment: Azure, Azure Data Factory, Azure Synapse, performance tuning, ADLS Gen2, dataflow jobs, copy activity, lookup activity, Data Flow, linked services, Logic Apps, Event Hub, Databricks, Snowflake, PySpark, Python, PyCharm, JIRA, GitHub.

First Republic Bank, San Francisco, California Nov 2018 to Aug 2020
Azure ETL Data Engineer.
Responsibilities:
• Creating pipelines, data flows, and complex data transformations and manipulations using ADF and PySpark
with Databricks.
• Performed data ingestion into Azure Data Lake and Azure DW for data migration, processing the data in Azure Databricks.
• Used Data Flow debugging for effectively building ADF data flow pipelines.
• Improved performance by using optimization options by effectively using partitions during various
transformations.
• Implemented complex ETL Azure Data Factory pipelines using mapping data flows with multiple
Input/output transformations
• Worked on Azure BLOB and Data Lake storage and loading data into Azure SQL Synapse analytics (DW).
• Used Azure Key Vault as a central repository for maintaining secrets and referenced the secrets in Azure Data Factory as well as in Databricks notebooks.
• Built a common SFTP download or upload framework using Azure Data Factory and Databricks.
• Developed Databricks ETL pipelines using notebooks, Spark Data frames, SPARK SQL, and Python scripting.
• Implemented Databricks Job workflows, extracting data from SQL server and uploading the files to SFTP
using PySpark and Python.
• Worked with ARM templates to deploy to production using Azure DevOps.
• Developed Spark applications using PySpark and Spark-SQL for data extraction, transformation, and
aggregation from multiple file formats.
• Responsible for estimating the cluster size, and monitoring and troubleshooting the Spark Databricks
cluster.
• Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
• Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch
processing.
• Configured Power BI Report Server for SSRS and deployed and scheduled reports in Report Manager.
• Developed transformation logic using Snowpipe; hands-on experience with Snowflake utilities, SnowSQL, Snowpipe, and big data modeling techniques using Python and Scala.
• Responsible for Building and scaling ETL / Event processing systems that organize data and manage
complex rule sets in batch and real-time.
• Built ETL pipelines in and out of the data warehouse using a combination of Python and Snowflake SnowSQL, writing SQL queries against Snowflake (a short Python connector sketch follows this list).
• Involved in Branching, Tagging, and Release Activities on GitHub Version Control.
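
A short sketch of driving SnowSQL-style statements from Python as described above, using the snowflake-connector-python client. The account, warehouse, database, stage, and table names are hypothetical placeholders; in practice the credentials would be pulled from Azure Key Vault rather than hard-coded.

import snowflake.connector

# Connection parameters are placeholders, not real account details.
conn = snowflake.connector.connect(
    account="xy12345.east-us-2.azure",
    user="ETL_SVC",
    password="***",
    warehouse="LOAD_WH",
    database="ANALYTICS",
    schema="STAGING",
)

try:
    cur = conn.cursor()
    # Load Parquet files landed in an external (ADLS-backed) stage into a staging table.
    cur.execute("""
        COPY INTO STAGING.ORDERS
        FROM @ADLS_STAGE/orders/
        FILE_FORMAT = (TYPE = PARQUET)
        MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
    """)
    # Simple downstream transformation into a reporting table.
    cur.execute("""
        INSERT INTO REPORTING.DAILY_ORDER_TOTALS
        SELECT order_date, SUM(amount) FROM STAGING.ORDERS GROUP BY order_date
    """)
finally:
    conn.close()
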
Environment: Azure, ADF, Azure Data Lake Gen2, PySpark, dataflow jobs, copy activity, lookup activity, Data Flow, linked services, logic apps, Event Hub, Scala, Snowflake, Streaming, Agile methods, Git.

Client: ACI Worldwide, Omaha, Nebraska. Jul 2017 to Oct 2018


Role: Big Data developer.
Responsibilities:
• Worked with the sourcing team to understand the format and delimiters of the data file.
• Running Periodic Map-Reduce jobs to load data from Cassandra into Hadoop.
• Involved in creating Hive tables, loading with data, and writing Hive queries, which will invoke and run
Map-Reduce jobs in the backend.
• In-depth knowledge of Hadoop architecture and various components such as HDFS, application master, node manager, resource manager, name node, data node, and MapReduce concepts.
• Involved in developing a Map Reduce framework that filters bad and unnecessary records.
• Wrote Hive queries to extract the processed data.
• Implemented Hive tables per requirements as internal or external tables, defined with appropriate static and dynamic partitions for efficiency (a brief sketch follows this list).
• Implemented the workflows using the Apache Oozie framework to automate tasks.
• Developing design documents considering all possible approaches and identifying the best of them.
• Worked on GIT to maintain source code in Git and GitHub repositories.
• Performed all necessary day-to-day Git support for different projects and was responsible for maintaining the Git repositories and access control strategies.
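
A brief sketch of the partitioned external Hive tables described above. PyHive is assumed here as the client (an assumption, not stated in the project), and the HiveServer2 host, database, table, and HDFS location are hypothetical placeholders.

from pyhive import hive

conn = hive.connect(host="hs2.example.internal", port=10000, database="payments")
cur = conn.cursor()

# External table over raw HDFS files, partitioned by load date to enable partition pruning.
cur.execute("""
    CREATE EXTERNAL TABLE IF NOT EXISTS payments.transactions_raw (
        txn_id STRING,
        account_id STRING,
        amount DOUBLE
    )
    PARTITIONED BY (load_date STRING)
    STORED AS ORC
    LOCATION '/data/payments/transactions_raw'
""")

# Dynamic-partition insert from a staging table.
cur.execute("SET hive.exec.dynamic.partition.mode=nonstrict")
cur.execute("""
    INSERT OVERWRITE TABLE payments.transactions_raw PARTITION (load_date)
    SELECT txn_id, account_id, amount, load_date FROM payments.transactions_stage
""")
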

Environment: Cloudera Distribution, Hadoop, HDFS, MapReduce, Cassandra, Hive, Oozie, Pig, Shell Scripting,
MySQL, GIT.

Client: BNY Mellon, New Jersey April 2013 to June 2017


Role: Data Warehouse Developer.
Responsibilities:
• Experience in developing complex stored procedures, efficient triggers, required functions, and creating indexes and indexed views for performance.
• Excellent experience in monitoring and tuning SQL Server performance.
• Expert in designing ETL data flows using SSIS; creating mappings/workflows to extract data from SQL Server
and Data Migration and Transformation from Access/Excel Sheets using SQL Server SSIS.
• Experience in Error and Event Handling: Precedence Constraints, Break Points, Check Points, Logging.
• Experienced in Building Cubes and Dimensions with different Architectures and Data Sources for Business
Intelligence and writing MDX Scripting.
• Efficient in dimensional data modeling for Data Mart design, identifying facts and dimensions, and developing fact tables and dimension tables using Slowly Changing Dimensions (SCD).
• Thorough knowledge of Features, Structure, Attributes, Hierarchies, and Star and Snow Flake Schemas of
Data Marts.
• Create paginated reports using Power BI Report Builder for on-premises and cloud data sources.
• Developing SSAS Cubes, Aggregation, KPIs, Measures, Partitioning Cubes, Data Mining Models, and
Deploying and Processing SSAS objects.
• Developed Star and Snow Flake Schemas of Data Marts.
• Creating Ad hoc reports and reports with complex formulas and querying the database for Business
Intelligence.
• Expertise in developing Parameterized, Chart, Graphs, Linked, Dashboard, Scorecards, and Report on SSAS
Cube using Drill-down, Drill-through, and Cascading reports using SSRS.
• Flexible, enthusiastic, and project-oriented team player with excellent written, and verbal communication
and leadership skills to develop creative solutions for challenging client needs.

Environment: MS SQL Server 2016, Visual Studio 2017/2019, SSIS, SharePoint, MS Access, Team Foundation Server, Git.
