
Kanishk, Senior Data Engineer

Email: dkanishk15@gmail.com
Phone number: 980-655-9675
LinkedIn: www.linkedin.com/in/kanishk-d-a04856255

Professional Summary:

• 8+ years of industry experience as a Data Engineer/Analyst, with analytical programming in SQL, Python,
Snowflake, Azure, and AWS.
• Good working knowledge of multi-tiered distributed environments and a solid understanding of the Software
Development Lifecycle (SDLC), including Agile and Waterfall methodologies.
• Ability to develop reliable, maintainable, and efficient code in SQL, Hive, and Python.
• Experience in importing and exporting data with Sqoop from HDFS to Relational Database Management
Systems (RDBMS) and from RDBMS to HDFS (an illustrative Spark JDBC sketch follows this summary).
• Ability to work with several Python packages such as Pandas, NumPy, Matplotlib, Beautiful Soup, and PySpark.
• Strong knowledge of various data warehousing methodologies and data modeling concepts.
• Experience in the development of ETL processes and frameworks for large-scale, complex datasets.
• Experience with application development on Linux, Python, RDBMS, NoSQL, and ETL solutions.
• Experience in designing and operating very large data warehouses.
• Extensive experience in working with Informatica PowerCenter.
• Good knowledge of BI and data visualization tools such as Tableau.
• Expertise in writing Hadoop jobs to analyze data using MapReduce, Hive, and Pig.
• Knowledge of writing AWS CloudFormation templates and deploying AWS resources.
• Design and build data processing pipelines using tools and frameworks such as Spark, Kafka, and Airflow.
• Understanding of data transformation and translation requirements and which tools to leverage to get the job
done.
• Understanding of data pipelines and modern, cloud-based approaches to automating them.
• Good understanding of and hands-on experience in setting up and maintaining NoSQL databases such as
MongoDB and HBase.
• Experience in designing, architecting, and implementing scalable cloud-based web applications using AWS
and GCP.
• Familiarity with new advances in the data engineering space such as EMR and NoSQL technologies like
DynamoDB.
• Hands-on experience with Continuous Integration and Continuous Deployment (CI/CD).
• Defined user stories and drove the agile board in JIRA during project execution; participated in sprint demos
and retrospectives.
• Experience in using various version control systems like Git.
• Strong analytical skills with the ability to collect, organize and analyze large amounts of information with
attention to detail and accuracy.
• Able to work across both GCP and Azure clouds in parallel.
• Hands-on experience with Microsoft Azure cloud services, Storage Accounts, and Virtual Networks; worked
on securing web applications with Azure and deployed web applications to Azure.
• Developed and designed a system to collect data from multiple portals using Kafka and process it using
Spark; designed and implemented Kafka by configuring topics in a new Kafka cluster across all environments.
• Migrated on-premises data (Oracle, SQL Server, DB2, MongoDB) to Azure Data Lake Store (ADLS) using
Azure Data Factory (ADF V1/V2).
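
The Sqoop-based RDBMS-to-HDFS movement noted above is illustrated by the hedged sketch below, which uses Spark's JDBC reader to show the same pattern; the connection URL, credentials, table, and output path are placeholders rather than details from any engagement described here.

```python
# Illustrative sketch only: pull a table from an RDBMS and land it on HDFS,
# mirroring a Sqoop-style import with Spark's JDBC reader. The URL, user,
# table, and target path are placeholders; the matching JDBC driver jar must
# be on the Spark classpath.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdbms_to_hdfs_import").getOrCreate()

source_df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:mysql://db-host:3306/sales")  # placeholder connection
    .option("dbtable", "orders")                       # placeholder table
    .option("user", "etl_user")
    .option("password", "****")
    .option("fetchsize", "10000")                      # fetch rows in batches
    .load()
)

# Land the extract on HDFS as Parquet, partitioned by a date column.
(
    source_df.write.mode("overwrite")
    .partitionBy("order_date")
    .parquet("hdfs:///data/raw/sales/orders")
)
```
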
Technical Skills:

• Programming Languages: Python, PySpark, SQL, Spark SQL, MySQL, PostgreSQL, Shell Scripting (UNIX, Bash)
• Big Data: Hadoop, Sqoop, Apache Spark (PySpark, Spark SQL), NiFi, Kafka, Snowflake, Cloudera, Databricks
• Data Modeling Tools: Erwin Data Modeler, ER Studio v17
• Operating Systems: UNIX, Linux, Solaris, Mainframes
• Databases: Oracle, SQL Server, MySQL, DB2, Hive, Snowflake
• Cloud Technologies: AWS, Azure
• IDE Tools: Aginity for Hadoop, PyCharm, Toad, SQL Developer, SQL*Plus, Sublime Text, vi editor
• Data Visualization Tools: Tableau, Power BI, SSAS, Business Objects, and Crystal Reports 9
• ETL/Data Warehouse Tools: Informatica PowerCenter 9.6/9.1, Power BI, and Tableau

Professional Experience:

Client: Truist Bank, Charlotte, NC April 2021 to Present
Role: Senior Data Engineer
Responsibilities:
• Extensively worked with the Azure cloud platform (HDInsight, Data Lake, Databricks, Blob Storage, Data
Factory, Synapse, Azure SQL, Azure SQL DB, SQL DW, and Storage Explorer).
• Extensive work in Informatica PowerCenter.
• Used DataStage Administrator for defining environment variables and project-level settings.
• Developed and designed a system to collect data from multiple portals using Kafka and process it using Spark;
designed and implemented Kafka by configuring topics in a new Kafka cluster across all environments.
• Consumed data from Kafka into Databricks and performed various transformations and manipulations on the
data using Python and PySpark scripts (a minimal streaming sketch appears after this list).
• Created a database on InfluxDB, worked on the interface created for Kafka, and checked the measurements in
the databases. Worked on the Kafka backup index, minimized logs via the Log4j appender, and pointed Ambari
server logs to NAS storage.
• Experience with Snowflake multi-cluster warehouses, Snowflake cloning, and SnowSQL.
• Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination
of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics).
• Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and
processed the data in Azure Databricks.
• Created pipelines in Azure Data Factory (ADF) using Linked Services, Datasets, and Pipelines to extract,
transform, and load data between sources such as Azure SQL, Blob Storage, and Azure SQL DW, including
write-back scenarios.
• Created data pipelines to load different event data from Azure Blob Storage into Hive external tables, and used
Hive optimization techniques such as partitioning, bucketing, and map joins.
• Worked on migration of data from On-prem SQL server to Cloud databases (Azure Synapse Analytics (DW) &
Azure SQL DB).
• Configured Input & Output bindings of Azure Function with Azure Cosmos DB collection to read and write
data from the container whenever the function executes.
• Designed and deployed data pipelines using Data Lake, Databricks, and Airflow.
• Extracted tables and exported data from Teradata through Sqoop and loaded them into Cassandra.
• Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and
aggregation from multiple file formats, analyzing and transforming the data to uncover insights into
customer usage patterns.
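
A minimal sketch of the Kafka-to-Databricks consumption referenced above, assuming Spark Structured Streaming with a JSON payload; the broker address, topic, schema, and Delta paths are illustrative placeholders, not details from the engagement.

```python
# Minimal sketch, not the production job: consume a Kafka topic into Databricks
# with Spark Structured Streaming and apply simple PySpark transformations.
# Broker, topic, schema, and paths below are assumed placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("kafka_to_databricks").getOrCreate()

event_schema = StructType([
    StructField("portal_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_ts", TimestampType()),
])

raw_stream = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")  # placeholder broker
    .option("subscribe", "portal_events")                # placeholder topic
    .option("startingOffsets", "latest")
    .load()
)

# Kafka delivers the payload as bytes; cast and parse the JSON value,
# then keep only well-formed events.
events = (
    raw_stream
    .select(from_json(col("value").cast("string"), event_schema).alias("evt"))
    .select("evt.*")
    .filter(col("event_type").isNotNull())
)

# Write the curated stream to a Delta location with checkpointing.
query = (
    events.writeStream.format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/portal_events")
    .outputMode("append")
    .start("/mnt/curated/portal_events")
)
```
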
Environment: Microsoft Azure, SQL, Python, PySpark, APIs, Tableau, Kafka, Snowflake, Databricks, Docker,
PL/SQL, ETL/ELT, Data analysis, Informatica, Data staging

Client: Global Atlantic Financial Group, Indianapolis, IN October 2019 to March 2021
Role: Senior Data Engineer
Responsibilities:

• Worked as a Sr. Data Engineer to import and export data from different databases.
• Involved with all the phases of Software Development Life Cycle (SDLC) methodologies throughout the
project life cycle.
• Experience with Unix/Linux systems, scripting, and building data pipelines.
• In-depth knowledge of Data Sharing in Snowflake.
• Working experience with Kimball Methodology and Data Vault Modeling.
• Used the DataFrame API in Scala to convert distributed data into named columns and helped develop predictive
analytics using Apache Spark Scala APIs.
• Hands-on ER and dimensional data modeling experience designing OLTP database, Data Vault, Star Schema,
and Snowflake Schema models in Sybase PowerDesigner 15 and ERwin r7.3.
• Designed dimensional models, data lake architecture, and Data Vault 2.0 on Snowflake, and used the Snowflake
logical data warehouse for compute.
• Extensive experience in migrating data from legacy platforms into the cloud with Talend, AWS and
Snowflake.
• Expertise in automating builds and deployment process using Bash, Python and Shell scripts with focus on
CI/CD, AWS Cloud Architecture.
• Optimized Pig scripts and performed user interface analysis, performance tuning, and analysis.
• Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the
dashboard.
• Involved in migrating Oozie workflows to Airflow to automate data pipelines to extract data and weblogs
from DynamoDB and Oracle databases.
• Used SSIS to build automated multi-dimensional cubes.
• Used Kafka streaming to receive data from various sources and publish the data into S3 (Final storage)
• Wrote indexing and data distribution strategies optimized for sub-second query response
• Worked on Informatica Power Center tools- Designer, Repository Manager, Workflow Manager, and
Workflow Monitor.
• Used various transformations like Filter, Expression, Sequence Generator, Update Strategy, Joiner, Stored
Procedure, and Union to develop robust mappings in the Informatica Designer.
• Developed PySpark code for AWS Glue jobs.
• Experienced working on AWS using EMR; performed operations with EC2 instances, S3 storage, IAM, RDS, and
analytical Redshift workloads, and wrote various data normalization jobs for new data ingested into Redshift,
building multi-terabyte data frames.
• Designed AWS architecture, cloud migration, DynamoDB, and event processing using Lambda functions.
• Managed storage in AWS using Elastic Block Store and S3; created volumes and configured snapshots.
• Worked with WSDL and SoapUI for APIs.
• Wrote SQL queries and created test data in Salesforce for unit testing of Informatica Cloud mappings.
• Prepared TDDs and test case documents after each process was developed.
• Identified and validated data between source and target applications.
• Created DAGs in Airflow to trigger multiple jobs at scheduled times so that all operations run on time without
manual triggering (a minimal DAG sketch appears after this list).
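
A minimal sketch of the scheduled Airflow DAG mentioned above, assuming Airflow 2.x with classic operators; the DAG id, schedule, and task callables are placeholders rather than the actual project pipeline.

```python
# Minimal Airflow DAG sketch (assumed Airflow 2.x): schedule two dependent
# tasks so they run daily without manual triggering. DAG id, schedule, and
# task logic are placeholders, not the actual project pipeline.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_orders(**_):
    # Placeholder: pull a daily extract from the source database.
    print("extracting daily orders")


def load_to_snowflake(**_):
    # Placeholder: load the curated extract into Snowflake.
    print("loading extract into Snowflake")


default_args = {
    "owner": "data-engineering",
    "retries": 2,
    "retry_delay": timedelta(minutes=10),
}

with DAG(
    dag_id="daily_orders_pipeline",     # placeholder DAG id
    start_date=datetime(2021, 1, 1),
    schedule_interval="0 6 * * *",      # run every day at 06:00
    catchup=False,
    default_args=default_args,
) as dag:
    extract = PythonOperator(task_id="extract_orders", python_callable=extract_orders)
    load = PythonOperator(task_id="load_to_snowflake", python_callable=load_to_snowflake)

    extract >> load                     # load runs only after extract succeeds
```
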
Environment: SQL, Python, PySpark, Airflow, AWS, Snowflake, Kafka, Pig, Informatica, Data modeling,
Hadoop, Unix/Linux

Client: Change Healthcare, Nashville, TN February 2017 to September 2019
Role: Data Engineer
Responsibilities:

• Responsible for the design, development, and testing of SSIS packages to load data from various databases
and files.
• Worked in Agile development methodology environment.
• Responsible for designing COSMOS code to generate data and load it into servers.
• Developed automated job flows, run daily and on demand through Oozie, which execute MapReduce jobs
internally.
• Involved in design, implementation and modifying back-end Python code and MySQL database schema.
• Created several Databricks Spark jobs with PySpark to perform table-to-table operations (a minimal sketch
appears after this list).
• Developed data ingestion modules using AWS step functions.
• Used Excel VLOOKUPs to look up customer data and created pivot tables to easily access and validate data.
• Designed ETL packages dealing with different data sources (SQL Server, Flat Files) and loaded the data into
target data sources by performing different kinds of transformations using SSIS.
• Modifying scripts to handle automated Loading/Extraction and Transformation (ETL) of data using SSIS.
• Responsible for coding SSIS processes to import data into the Data Warehouse from Excel spreadsheets, Flat
Files and OLEDB Sources.
• Used Shell Scripting for UNIX Jobs which included Job scheduling, batch-job scheduling, process control,
forking and cloning and checking status.
• Experience in performing data masking/protection using Pentaho Data Integration (Kettle).
• Automated different workflows, previously initiated manually, with Python scripts and UNIX shell scripting.
• Wrote and executed MySQL queries from Python using the Python-MySQL connector and MySQLdb packages.
• Performed Incremental load with several Dataflow tasks and Control Flow Tasks using SSIS.
• Organized error and event handling using precedence constraints, breakpoints, checkpoints, and logging to
send the result files to the clients and vendors.
• Created stored procedures, Triggers, User-defined Functions, Views for both Online and Batch requests
handling business logic and functionality of various modules.
• Involved in performance tuning using Activity Monitor, Performance Monitor, SQL Profiler, SQL Query
Analyzer, and Index tuning wizards.
• Implemented batch processing using Jobs and DTS.
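
A minimal sketch of the Databricks table-to-table jobs mentioned above, assuming tables registered in the metastore; the table and column names, filter, and aggregation are placeholders.

```python
# Minimal sketch of a Databricks table-to-table job in PySpark: read a
# registered source table, apply a filter and aggregation, and overwrite a
# target table. Table and column names are assumed placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("claims_table_to_table").getOrCreate()

# Read the registered source table from the metastore.
claims = spark.table("raw.claims")

# Keep only finalized claims and summarize by provider and month.
claims_summary = (
    claims
    .filter(F.col("claim_status") == "FINAL")
    .groupBy(
        "provider_id",
        F.date_trunc("month", F.col("service_date")).alias("service_month"),
    )
    .agg(
        F.count("*").alias("claim_count"),
        F.sum("billed_amount").alias("total_billed"),
    )
)

# Overwrite the curated target table with the refreshed summary.
claims_summary.write.mode("overwrite").saveAsTable("curated.claims_monthly_summary")
```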

Environment: MS SQL Server 2014/2012, SSIS/SSAS/SSRS, ADF, ADB, ADL, Data Modeling, Big Data, T-SQL, Visual
Studio 2010/2012, C#, SQL Server Profiler, Power BI, Power Query, SSMS, Team Foundation Server (TFS), SQL Server
Data tools, SharePoint 2012.

Client: Maisa Solutions Private Limited, Hyderabad, India June 2014 to November 2016
Role: Data Analyst
Responsibilities:

• Worked as a Data Analyst to generate data models using Erwin and developed relational database systems.
• Involved in extracting and mining data for analysis to aid in solving business problems.
• Used Azure Data Lake as a source and pulled data using Azure PolyBase.
• Formulated SQL queries, Aggregate Functions, and database schema to automate information retrieval.
• Involved in manipulating data to fulfill analytical and segmentation requests.
• Managed data privacy and security in Power BI.
• Wrote complex SQL queries for data analysis to meet business requirements.
• Used data visualization tools and techniques to best share data with business partners.
• Designed and implemented a Data Lake to consolidate data from multiple sources, using Hadoop stack
technologies such as Sqoop and Hive.
• Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from
UNIX, NoSQL, and a variety of portfolios.
• Reviewed code and system interfaces and extracts to handle the migration of data between
systems/databases.
• Developed a Conceptual model using Erwin based on requirements analysis.
• Involved in preparing ETL mapping documents for data warehouse projects.
• Involved in loading data into Teradata from legacy systems and flat files using complex MultiLoad and
FastLoad scripts.
• Used SQL Server Integration Services (SSIS) for extracting, transforming, and loading data into the target
system from multiple sources.
• Implemented data ingestion and cluster handling for real-time processing using Kafka (a minimal consumer
sketch appears after this list).
• Created and tuned PL/SQL procedures and SQL queries for data validation in the ETL process.
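
A minimal sketch of the real-time Kafka ingestion noted above, using the kafka-python client as an assumed stand-in for the actual consumer; the topic, brokers, and downstream handling are placeholders.

```python
# Minimal sketch of real-time ingestion from Kafka, assuming the kafka-python
# client; the topic, brokers, and downstream handling are placeholders and
# not the actual project consumer.
import json

from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "portfolio_events",                              # placeholder topic
    bootstrap_servers=["broker1:9092"],              # placeholder brokers
    group_id="ingestion-workers",
    auto_offset_reset="earliest",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

for message in consumer:
    event = message.value
    # Placeholder handling: in the real pipeline this would land the event
    # in the data lake or an HBase/Hive staging table.
    print(f"partition={message.partition} offset={message.offset} event={event}")
```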
Environment: Azure Data Lake, Erwin, Power BI, Hadoop, HBase, Teradata, T-SQL, SSIS, PL/SQL.

Client: Brio Technologies Private Limited, Hyderabad, India August 2013 to May 2014
Role: Data Analyst
Responsibilities:

• Performed Data Analysis using SQL queries on source systems to identify data discrepancies and determine
data quality.
• Performed extensive data validation and data verification against the Data Warehouse and debugged SQL
statements and stored procedures for business scenarios.
• Designed and developed Tableau dashboards using stack bars, bar graphs, scattered plots, and Gantt charts.
• Familiar with DBMS table design, loading, and data modeling, with experience in SQL.
• Worked on ER/Studio for conceptual, logical, and physical data modeling and for generating DDL scripts.
• Handled performance requirements for databases in OLTP and OLAP models.
• Analyzed the data consuming the most resources and made changes to the back-end code using PL/SQL
stored procedures and triggers.
• Performed data completeness, correctness, transformation, and quality testing using SQL (a minimal validation
sketch appears after this list).
• Involved in designing Business Objects universes and creating reports.
• Conducted design walkthrough sessions with the Business Intelligence team to ensure that reporting
requirements are met for the business.
• Prepared complex T-SQL queries, views, and stored procedures to load data into staging area.
• Wrote UNIX shell scripts to invoke all the stored procedures, parse the data and load into flat files.
• Created reports analyzing large-scale databases utilizing Microsoft Excel analytics within the legacy system.
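
The completeness and correctness checks above were SQL-driven; as an assumed illustration only, the sketch below compares source and target row counts and key coverage with Spark SQL, using placeholder table and column names.

```python
# Illustrative data-quality sketch only: compare row counts and key coverage
# between a source extract and the warehouse target using Spark SQL.
# Table and column names are assumed placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("dq_completeness_check").getOrCreate()

source_count = spark.table("staging.customer_orders").count()
target_count = spark.table("dw.fact_customer_orders").count()

# Rows present in the source that never reached the target (keys only).
missing_keys = spark.sql("""
    SELECT s.order_id
    FROM staging.customer_orders s
    LEFT JOIN dw.fact_customer_orders t
      ON s.order_id = t.order_id
    WHERE t.order_id IS NULL
""")

print(f"source rows = {source_count}, target rows = {target_count}")
print(f"orders missing from target = {missing_keys.count()}")
```
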
Environment: Hadoop 3.0, Agile, Azure, PostgreSQL, Spark, Oozie 4.3, Cassandra, Hive 2.3, HDFS, Apache Flume 1.8,
MapReduce, Apache Pig 0.17, Scala, Sqoop 1.4, Zookeeper, Apache NiFi.

Educational Background:

Bachelor of Technology in Computer Science Aug 2009 – May 2013

Jawaharlal Nehru Technological University, Hyderabad | India
