
Monish K Mohan Raj

Data Engineer
Email: Monish96raj@gmail.com | Phone: (872) 262-7157 | Location: IL | GitHub | LinkedIn

SUMMARY
● 3 years of professional experience in data analysis, with strong communication and leadership skills.
● Reviewed business requirements and developed Big Data solutions focused on pattern matching and predictive modeling.
● Handled data ingestion from various external data sources and performed transformations using Spark.
● Implemented ETL pipelines and data lake solutions using Azure data engineering technologies such as Azure Data Factory (ADF), Azure Data Lake Storage Gen2, Azure Blob Storage, Azure SQL Database, Azure Databricks, Azure HDInsight, and Microsoft Power BI.
● Expertise in building end-to-end data processing jobs to analyze data using MapReduce, Spark, and Hive.
● Integrated data from HTTP clients, Azure Blob Storage, and Azure Data Lake Storage Gen2 using Azure Data Factory.
● Skilled in data parsing, ingestion, manipulation, architecture, modeling, and preparation, using methods including describing data contents, computing descriptive statistics, regex, split and combine, remap, merge, subset, reindex, melt, and reshape.
● Experience with Microsoft Azure services including Azure SQL Data Warehouse, Azure SQL Database, Azure SQL Server, Synapse, Azure Databricks, Azure Data Lake, Data Lake Analytics, Azure Blob Storage, and Azure Data Factory.
● Experience extracting data from production databases through Sqoop, performing data validations, and managing jobs in the production environment.
● Worked with NoSQL databases like HBase, performing CRUD operations and ingesting large sets of semi-structured data from various sources.
● Performed data preprocessing and feature engineering for predictive analytics using Python Pandas.
● Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics).
● Able to create scripts using Azure Shell for automation and build processes.
● Created Power BI dashboards with large data volumes from SQL Server data sources.

SKILLS
Languages: SQL, Python, NoSQL, R programming, HTML & CSS, JavaScript
Tools & Frameworks: Tableau, Power BI, Advanced Excel, Google Sheets, NumPy, Pandas, Matplotlib, Seaborn, Scikit-learn, PySpark (pyspark.sql, pyspark.streaming, pyspark.ml, pyspark.mllib)
Database Management Systems: Apache Hadoop ecosystem (Apache HBase, Apache Phoenix, Hive, Spark), MongoDB, MySQL, PostgreSQL, Oracle, Microsoft SQL Server, Azure Data Factory
Machine Learning Algorithms: Linear regression, KNN, logistic regression, Naive Bayes classifiers, support vector machines, neural networks

EDUCATION
MS in Information Technology May 2023
Loyola University Chicago, IL

EXPERIENCE
Epsilon, IL Nov 2022 - Present
Data Engineer (Intern)
● Created Hive tables with partitioning and bucketing.
● Worked on various file formats generated by client applications.
● Created pipelines in ADF using Linked Services, Datasets, and Pipelines to extract, transform, and load data between sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse, and back.
● Involved in the analysis, design, and implementation of business user requirements.
● Worked on collection of large sets of Structured and Unstructured data using Python scripting, PySpark, Spark SQL.
● Actively involved in designing and developing data ingestion, aggregation, and integration in the Hadoop
environment.
● Developed Sqoop scripts to import and export data from relational sources and handled incremental loading of customer and transaction data by date.
● Identified inconsistencies in data collected from different sources.
● Wrote deployment scripts for Sqoop, Hive, and Spark job submission in the production environment.
● Wrote Spark applications for data validation, cleansing, transformations, and custom aggregations.
● Developed custom aggregate functions using Spark SQL and performed interactive querying.
● Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode high availability, capacity planning, and slot configuration.
● Developed Spark applications for the entire batch processing by using PySpark.
● Visualized results using Tableau dashboards and used the Python Seaborn library for data interpretation in deployment.
● Used parameters and variables in pipelines, datasets, and linked services to create metadata-driven pipelines in Azure Data Factory (ADF); debugged data pipelines and resolved issues.
● Scheduled pipelines using triggers such as Event Trigger, Schedule Trigger and Tumbling Window Trigger in Azure
Data Factory (ADF).
● Took personal responsibility for meeting deadlines and delivering high-quality work, and created POCs to demonstrate new technologies including Jupyter Notebooks and PySpark.
● Strived to continually improve existing methodologies, processes, and deliverable templates.

Dell Technologies, India Jan 2019 – Jul 2021
Data Analyst
● Assisted in data analysis tasks using SQL, Python, and Excel to support decision-making processes.
● Conducted exploratory data analysis using Python libraries such as NumPy, Pandas, and Matplotlib to identify trends,
patterns, and outliers in the data.
● Supported the design and development of data visualization dashboards using Tableau, presenting insights to
stakeholders.
● Assisted in data cleaning and transformation activities to improve data quality and consistency.
● Collaborated with the team to identify data analysis opportunities and propose solutions.
● Assisted in the implementation of data analytics projects, contributing to data collection, preprocessing, and analysis.
● Conducted data mining and sentiment analysis on social media data to gain insights into customer preferences and
behavior.
● Assisted in conducting A/B testing experiments to evaluate the effectiveness of different marketing strategies.
● Assisted in developing reports and presentations to communicate findings and recommendations to stakeholders.
● Assisted in the documentation of data analysis processes and methodologies for future reference.
● Demonstrated strong attention to detail and accuracy in data analysis tasks.
● Adapted quickly to new technologies and tools, acquiring proficiency in advanced Excel and Google Sheets.
● Assisted in data integration tasks, ensuring seamless flow of data between different systems and databases.

PROJECTS
Traffic Crash Analysis System (Data Mining):
● The aim of the project was to predict the number of crashes in the city of Chicago, Illinois, and the circumstances behind them. We used the RapidMiner tool to build predictive models on this dataset, which contained over 700,000 crash records from the Chicago Police Department.
Prediction of Heart Disease (Machine Learning):
● In this project, we built a model to predict whether patients have heart disease based on their features. We used Logistic Regression and SVM algorithms and compared the results of the two to maximize model accuracy. We cleaned and prepared the data using Python libraries, then split the data frame into train and test sets to predict future outcomes.
Chicago Police Crime Dataset (Big Data):
● In this project, I managed datasets of over 1 million rows by importing them into HDFS. I created an HBase table using MapReduce and used Phoenix for querying and analysis. I also leveraged Impala, creating an external table mapped to HBase for advanced analysis. Through optimization and integration with visualization tools, I derived meaningful insights from the Chicago police crime dataset.
