N Jaya Mani - Data Engineer
Professional Summary:
Over 8 years of experience in Information Technology, including hands-on development of ETL and data warehousing solutions, database administration, and implementation of various applications in both Big Data and relational database environments.
Expertise in using major Hadoop ecosystem components such as HDFS, YARN, MapReduce, Hive, Impala, Pig, Sqoop, HBase, Spark, Spark SQL, Kafka, Spark Streaming, Oozie, ZooKeeper, and Hue.
Created snowflake schemas by normalizing dimension tables as appropriate, including a sub-dimension named Demographic as a subset of the Customer dimension.
Good understanding of Azure Big Data technologies such as Azure Data Lake Analytics, Azure Data Lake Store, and Azure Data Factory; created a POC for moving data from flat files and SQL Server using U-SQL jobs.
Experience with Azure transformation projects and Azure architecture decision-making; implemented ETL and data movement solutions using Azure Data Factory (ADF).
Created data pipelines using Copy Activity, moving and transforming data with custom Azure Data Factory pipeline activities for on-cloud ETL processing.
Recreated existing application logic and functionality in the Azure Data Factory (ADF), Azure SQL Database, and Azure SQL Data Warehouse environment.
Loaded data from SFTP and on-premises sources through Azure Data Factory to Azure SQL Database, and automated pipeline schedules using event-based triggers in Azure Data Factory (ADF).
Customized Power BI reports, dashboards, and Azure Data Factory (ADF) pipelines when source data was updated.
Hands-on experience with Azure Databricks, Azure Data Factory, Azure SQL, and PySpark.
Used Amazon Web Services Elastic Compute Cloud (AWS EC2) to launch cloud instances.
Hands-on experience working with Amazon Web Services (AWS) using Elastic MapReduce (EMR), Redshift, and EC2 for data processing.
Hands-on experience with Amazon EC2, Amazon S3, Amazon RDS, VPC, IAM, Elastic Load Balancing, Auto Scaling, DynamoDB, Lambda, CloudFront, CloudWatch, SNS, SES, SQS, and other services of the AWS family.
Experience in Python programming using Pandas and NumPy.
Experienced in migrating ETL transformations using Pig Latin scripts, including transformations and join operations.
Good understanding of MPP databases such as HP Vertica and Impala.
Extensive experience in text analytics, generating data visualizations using R and Python, and creating dashboards using tools like Tableau and Power BI.
Automated workflows and scheduled jobs using Oozie and Airflow.
Good knowledge of proofs of concept (PoCs) and gap analysis; gathered the necessary data for analysis from different sources and prepared it for exploration using data munging.
Skilled in data parsing, data manipulation, and data preparation, with methods including describing data contents, computing descriptive statistics, regex, split and combine, remap, merge, subset, reindex, melt, and reshape.
Good understanding of web design based on HTML5, CSS3, and JavaScript.
Ability to work effectively in cross-functional team environments, with excellent communication and interpersonal skills.
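The data-preparation methods listed above (describe, merge, subset, melt/reshape) can be sketched in Pandas; the table and column names here are hypothetical, for illustration only.

```python
import pandas as pd

# Hypothetical sample data illustrating the listed preparation steps
sales = pd.DataFrame({
    "region": ["east", "west", "east"],
    "q1": [100, 80, 120],
    "q2": [110, 95, 130],
})

# Compute descriptive statistics of the numeric columns
stats = sales.describe()

# Melt: reshape from wide (q1, q2 columns) to long format
long_form = sales.melt(id_vars="region", var_name="quarter", value_name="revenue")

# Merge: join with a hypothetical lookup table, then subset and reindex
managers = pd.DataFrame({"region": ["east", "west"], "manager": ["Ana", "Raj"]})
merged = long_form.merge(managers, on="region", how="left")
subset = merged[merged["revenue"] > 100].reset_index(drop=True)
```

Each step maps directly to a method named in the bullet above: `describe` for descriptive statistics, `melt` for reshaping, `merge` for joins, and boolean indexing with `reset_index` for subset and reindex.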
Technical Skills:
Big Data Tools: Hadoop Ecosystem, MapReduce, Spark, Airflow, NiFi, HBase, Hive, Pig, Sqoop, Kafka, Oozie, Snowflake, Databricks.
Methodologies: RAD, JAD, System Development Life Cycle (SDLC), Agile
Cloud Platform: AWS (Amazon Web Services), Microsoft Azure
Data Modeling Tools: Erwin Data Modeler, ER Studio v17
Programming Languages: SQL, PL/SQL, Python, Scala, and UNIX shell scripting.
OLAP Tools: Tableau, SSAS, Business Objects.
Databases: Oracle 12c/11g, MS SQL Server, MySQL, Teradata R15/R14.
ETL/Data warehouse Tools: Informatica 9.6/9.1, and Tableau.
Operating System: Windows, Unix, Sun Solaris
Development Methods: Agile/Scrum, Waterfall
Professional Experience:
Environment: Hadoop, Hive, Oozie, Spark, Spark SQL, Python, PySpark, Azure Data Factory, Azure SQL, Azure Databricks, Azure DW, Blob Storage, Scala, AWS, Linux, Maven, Oracle 11g/10g, ZooKeeper, MySQL, Snowflake.
Environment: Redshift, DynamoDB, PySpark, Snowflake, EC2, EMR, Glue, S3, Java, Kafka, IAM, PostgreSQL, Jenkins, Maven, AWS CLI, Shell Scripting, Git.
Environment: Hadoop 3.0, Microservices, Java 8, MapReduce, Agile, HBase 1.2, JSON, Spark 2.4, Kafka, JDBC, Hive 2.3, Pig 0.17, SQL Server, NoSQL, PostgreSQL, Oozie.
Environment: Linux, Apache Hadoop Framework, HDFS, YARN, Hive, HBase, AWS (S3, EMR), Scala, Spark, Sqoop, MS SQL Server 2014, Teradata, ETL, Tableau (Desktop 9.x/Server 9.x), Python 3.x (Scikit-Learn/SciPy/NumPy/Pandas), AWS Redshift, Spark (PySpark, MLlib, Spark SQL).
Environment: SQL Server 2010, SQL Query Analyzer, MS Access, MS Excel, Visual Studio 2010.
Education
Bachelor's in IT from Anna University - 2016
Master's in CS from Texas A&M University - 2022