
Avinash Reddy

Data Engineer || avinasd413@gmail.com || 858-901-4040

PROFESSIONAL SUMMARY

 Over 9 years of experience in Data Analysis, Statistical Analysis, Machine Learning, and Deep Learning with large structured and unstructured data sets in travel services, with strong functional knowledge of business processes and the latest market trends.
 Developed predictive models using Decision Tree, Random Forest, Naïve Bayes, Logistic Regression,
Cluster Analysis, and Neural Networks.
 Expert in the entire Data Science process life cycle including Data Acquisition, Data Preparation, Data
Manipulation, Feature Engineering, Machine Learning Algorithms, Validation, and Visualization.
 Hands-on experience in Data Analytics services such as Athena, Glue Data Catalog, and QuickSight.
 Knowledge of the HDFS file system and Hadoop daemons such as Resource Manager, Node Manager, Name Node, Data Node, Secondary Name Node, and Containers, the MapReduce programming paradigm, and good hands-on experience in PySpark and SQL queries.
 Good at developing enterprise web applications using Core Java.
 Developed a data pipeline using Kafka and Spark Streaming to store data into HDFS and performed real-time analytics on the incoming data (see the first sketch at the end of this summary).
 Hands-on experience in designing UI web applications using front-end technologies like HTML, CSS, JavaScript, jQuery, AngularJS, and Bootstrap.
 Ingested real-time data into Hadoop using Kafka and implemented Oozie jobs for daily imports.
 Worked on the installation, configuration, and setup of the OBIEE administration tool and platform.
 Identified and executed process improvements; hands-on with technologies such as MySQL, Oracle, and Business Objects, and with Databricks Python scripts.
 Expertise and Vast knowledge of Enterprise Data Warehousing including Data Modelling, Data
Architecture, Data Integration (ETL/ELT), and Business Intelligence.
 Designed and implemented a distributed Splunk monitoring and data mining solution to support Dev, STG, and Prod environments.
 Developed custom Machine Learning (ML) algorithms in Scala and made them available to MLlib in Python via wrappers.
 Used Airflow for orchestration and scheduling of the ingestion scripts.
 Experience in creating fact and dimension models in MS SQL Server and Snowflake databases utilizing the cloud-based Matillion ETL tool.
 Collaborated with the business and third-party vendors like TSG, and with the Veeva API, to help with the upcoming migration from Documentum to Veeva.
 Used DataStage as an ETL tool to extract data from source systems and load it into the Oracle database.
 Performed ETL test scenarios by writing SQL scripts with consideration of business scenarios.
 Experience in Google Cloud Platform (GCP) services like Compute Engine, Cloud Functions, Cloud DNS, Cloud Storage, and Cloud Deployment Manager.
 Understanding of structured data sets, data pipelines, ETL tools, and data reduction, transformation, and aggregation techniques; knowledge of tools such as dbt and DataStage.
 Imported data from different sources like HDFS/HBase into a graph database.
 Worked with NoSQL databases like HBase, creating HBase tables to load large sets of semi-structured data coming from various sources; worked on NoSQL databases like MongoDB and DocumentDB and graph databases like Neo4j.
 Worked with Linux systems and RDBMS database on a regular basis in order to ingest data using Sqoop.
 Involved in converting Hive/SQL queries into Spark transformations using Spark RDD and Pyspark
concepts.
 Expert in core Java concepts such as Collections, Multithreading, Serialization, and Exception handling.
 Applied MLOps practices covering the experimentation, iteration, and continuous improvement of the machine learning lifecycle.
 Created pipelines in ADF using Linked Services, Datasets, and Pipelines to extract, transform, and load data from different sources like Azure SQL, Blob Storage, Azure SQL Data Warehouse, and the write-back tool, and backwards.
 Experience in developing Map Reduce Programs using Apache Hadoop for analysing the big data as per
the requirement.
 Used Apache Airflow in the GCP Cloud Composer environment to build data pipelines and used various Airflow operators like the Bash operator, Hadoop operators, Python callable operators, and branching operators.
 Experience in designing connection pools and schemas like star and snowflake schema on the OBIEE
repository.
 Have good experience working with Azure BLOB and Data Lake storage and loading data into Azure SQL
Synapse Analytics (DW).
 Proficient in Python and its libraries such as NumPy, Pandas, Scikit-Learn, Matplotlib, and Seaborn.
 Built an ETL process for continuously bulk importing dmat data from SQL Server into Elasticsearch.
 Developed RESTful web services using Java and Spring Boot.
 Expert in preprocessing data in Pandas using visualization, data cleaning, and engineering methods such as looking for correlations, imputation, scaling, and handling categorical features (see the second sketch at the end of this summary).
 Implemented Elasticsearch services for big data searches, wrote complex queries, and designed/implemented clusters with multiple nodes.
 Expertise in AngularJS controllers, directives, components, factories, service resources, routing, and events.
 Loaded tables from Azure Data Lake to Azure Blob Storage for pushing them to Snowflake.
 Wrote complex Unix/Windows scripts for file transfers and emailing tasks over FTP/SFTP.
 Experience in building various machine learning models using algorithms such as Linear Regression, Gradient Descent, Support Vector Machines (SVM), Logistic Regression, KNN, Decision Trees, and ensembles such as Random Forest, AdaBoost, and Gradient Boosting Trees.
 Deployed the Data Model, Data entities and View for the Talend MDM.
 Experienced in data migration from AWS to GCP by creating a hybrid cloud infrastructure through a tunnel initialized between both clouds; migrated EC2 instances from AWS to Compute Engine in GCP with the help of CloudEndure.
 Excellent understanding of machine learning techniques and algorithms, such as k-NN, Naive Bayes, SVM,
Decision Forests, natural language processing (NLP), etc.
 Experienced in Data Integration Validation and Data Quality controls for ETL process and Data
Warehousing using MS Visual Studio SSIS, SSAS, SSRS.
 Expert in developing data conversions/migrations from legacy systems and various sources (flat files, Oracle, non-Oracle databases) to Oracle systems using SQL*Loader, external tables, and calls to the appropriate interface tables and APIs in Informatica.
 Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
 Expert in using Databricks with Azure Data Factory (ADF) to compute large volumes of data.
 Building and configuring Red Hat Linux systems over the network, implementing automated tasks through crontab, and resolving tickets on a priority basis.
 Participated in, ran, and validated the ETL interfaces in system testing; executed test cases and documented the results.
 Migrated Matillion pipelines and Looker reports from Amazon Redshift to the Snowflake data warehouse.
 Implemented multiple data pipeline DAGs and maintenance DAGs in Airflow orchestration.
 Used Cosmos DB for storing catalog data and for event sourcing in order processing pipelines.
 Strong working experience on Teradata query performance tuning by analysing CPU, AMP Distribution,
Table Skewness, and IO metrics.
 Good understanding of working on Artificial Neural Networks and Deep Learning models using the Theano and TensorFlow packages in Python.
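
A minimal PySpark Structured Streaming sketch of the Kafka-to-HDFS pattern referenced above; the broker address, topic name, and HDFS paths are illustrative assumptions, not values from any specific project, and the Kafka connector package is assumed to be on the Spark classpath.

# Read events from Kafka and land them in HDFS as Parquet.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("kafka-to-hdfs")
         .getOrCreate())

# Subscribe to a Kafka topic; Kafka delivers key/value as binary, so cast value to string.
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
          .option("subscribe", "events")                      # placeholder topic
          .load()
          .selectExpr("CAST(value AS STRING) AS raw_event", "timestamp"))

# Append the raw events to HDFS; the checkpoint directory tracks streaming progress.
query = (events.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/raw/events")           # placeholder output path
         .option("checkpointLocation", "hdfs:///checkpoints/events")
         .outputMode("append")
         .start())

query.awaitTermination()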
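A small pandas/scikit-learn sketch of the preprocessing steps mentioned above (correlations, imputation, scaling, and handling categories); the dataset path and column names are hypothetical.

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.read_csv("bookings.csv")            # placeholder dataset

numeric_cols = ["fare", "trip_days"]        # placeholder numeric columns
categorical_cols = ["cabin_class"]          # placeholder categorical column

# Quick look at pairwise correlations between numeric features.
print(df[numeric_cols].corr())

# Impute then scale numerics; impute then one-hot encode categoricals.
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric_cols),
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("encode", OneHotEncoder(handle_unknown="ignore"))]), categorical_cols),
])

X = preprocess.fit_transform(df[numeric_cols + categorical_cols])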

Professional Experience

Project 1 Mar 2022 - Till Date


Client: Fidelity Investments Role: Data Engineer

Description: Fidelity International Ltd, or FIL for short, is a company that provides investment
management services including mutual funds, pension management and fund
platforms to private and institutional investors. Fidelity International was originally established in 1969 as the
international investment subsidiary of Fidelity Investments in Boston before being spun out as an independent
business in 1980. Since then, it has continued to operate as a private employee-owned company.
Responsibilities:

 Writing complex SQL queries for validating the data against different kinds of reports generated by
Cognos.
 Designed an object detection program by utilizing Python, a YOLO model, and TensorFlow to identify objects by drawing the boundaries of each identified object in an image.
 Analyzed data to identify trends, patterns, insights, and discrepancies using SAS and conveyed ideas with visuals created in SAS, Matplotlib, Tableau, and Azure Databricks.
 Migrated data into Elasticsearch through ES-Spark integration and created mappings and indexes in Elasticsearch for quick retrieval.
 Designed and developed an efficient and flexible flight search using graph database by optimizing the
power of Neo4j and graph analytics technique.
 Using AngularJS, created custom directives for data manipulation and to display data in the company-standard format in the UI.
 Designed and developed MapReduce modules with Java as the programming language on highly unstructured and semi-structured data.
 Validated the data from SQL Server to Snowflake to make sure it has an apples-to-apples match.
 Developed an MDM integration plan and hub architecture for customers, products, and vendors; designed an MDM solution for three domains.
 Identified customer/product segments for a wholesaler of industrial products, mapped potential buyers to
products using k-means clustering, and developed an RFM scoring system to assign a score to each customer.
 Created and developed data load and scheduler processes for ETL jobs using the Matillion ETL package.
 Designed and developed user-defined functions, stored procedures, and triggers for Cosmos DB.
 Exported the analyzed data to the relational databases using Sqoop, to further visualize and generate
reports for the BI team.
 Worked on Airflow scheduler (Celery) and worker settings in the airflow.cfg file.
 Responsible for building scalable distributed data solutions using Azure Data Lake, Azure Databricks, Azure HDInsight, and Azure Cosmos DB.
 Performed problem resolution for the Veeva API: analyzed recurring incidents to identify and resolve underlying problems, and logged and resolved problems in SPARC.
 Used the DataStage Designer to develop processes for extracting, cleansing, transforming, integrating, and loading data into staging tables.
 Involved with ETL test data creation for all the ETL mapping rules based on mapping document.
 Strong skills in visualization tools such as Power BI and Microsoft Excel (formulas, Pivot Tables, Charts) and DAX commands; performed data extraction, aggregation, and consolidation of Adobe data in Glue using PySpark.
 Developed JSON Scripts for deploying the pipeline in Azure Data Factory(ADF) that process the data using
the SQL Activity.
 Developed the OBIEE repository as per user requirements and the available information in the Data Warehouse.
 Implemented IoT streaming with Databricks Delta tables and Delta Lake to enable ACID transaction logging.
 Utilized big data tools for MLOps such as GCP BigQuery and Dataproc for streamlining data lakes, and AutoML for automating the model-building process.
 Built a Scala API for backend support of the graph database user interface; coded the Scala API to insert/delete predicates in the graph DB after transforming and mapping incoming data.
 Collected and aggregated large amounts of log data using Flume and staged the data in HDFS for further analysis.
 Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
 Developed Spark applications using Java and implemented Apache Spark data processing to handle data from various RDBMS and streaming sources.
 Configured Airflow DAGs to customize dependencies, retry attempts, catchup behavior, email notification on failure, etc. (see the first sketch after these responsibilities).
 Extensive experience in Text Analytics, developing different Statistical Machine Learning, Data mining
solutions to various business problems, and generating data visualizations using R, Python, and Tableau.
 Worked on Azure Databricks to run Spark Python notebooks through ADF pipelines.
 Mentored and guided analysts on building purposeful analytics tables in dbt for cleaner schemas.
 Developed Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark for data aggregation and queries, and wrote data back into the RDBMS through Sqoop.
 Integrated multiple custom platforms to effectively alert and report with Splunk.
 Worked with CI/CD pipeline creation using GitLab and Ansible.
 Designed, built, and deployed a set of Python modeling APIs for customer analytics, which integrate multiple machine learning techniques for various user behavior predictions.
 Extensively worked on UNIX shell Scripting for splitting group of files to various small files and file transfer
automation.
 Developed PySpark and SparkSQL code to process the data in Apache Spark on Amazon EMR to perform
the necessary transformations based on the STMs developed.
 Designed and optimized queries and wrote complex SQL scripts during ETL development and testing.
 Worked on Hadoop Architecture and various components using HDFS, Job Tracker, Task Tracker, Name
Node, Data Node, Secondary Name Node, and Map Reduce concepts.
 Built predictive models using clustering, SVM, Bayes, and Elastic Net with an evaluation of their performance for automatic classification of comments based on business-defined attributes.
 Filtered the discovered boundaries by implementing a non-max suppression algorithm to achieve an optimal bounding box per identified object (see the second sketch after these responsibilities).
 Utilized the Matillion ETL solution to develop pipelines that extract and transform data from multiple sources and load it to Snowflake.
 Evaluated models using cross-validation, the log loss function, and ROC curves, used AUC for feature selection, and worked with Elastic technologies like Elasticsearch, Kibana, Kafka, etc.
 Implemented statistical modeling with the XGBoost Machine Learning software package using Python to
determine the predicted probabilities of each model.
 Formulated several graphs to show the performance of the students by demographics and their mean
scores in different USMLE exams.
 Built reports for monitoring data loads into GCP and drive reliability at the site level.
 Built and maintained Docker/Kubernetes container clusters managed by Kubernetes using Linux, Bash, Git, and Docker on GCP.
 Monitored the DataStage jobs on a daily basis by running UNIX shell scripts and made a force start whenever a job failed.
 Application of various Artificial Intelligence(AI)/machine learning algorithms and statistical modeling like
decision trees, text analytics, natural language processing(NLP), supervised and unsupervised, regression
models.
 Hands-on experience in Azure analytics services: Azure Data Lake Store (ADLS), Azure Data Lake Analytics (ADLA), Azure SQL DW, Azure Data Factory (ADF), etc.
 Implemented data integration using Data Factory and Databricks from input sources to Azure services.
 Built data warehouse solutions using PolyBase/external tables on Azure Synapse/Azure SQL Data Warehouse (Azure DW), using Azure Data Lake as the source; rewrote existing SSAS cubes for Azure Synapse/Azure SQL Data Warehouse (Azure DW).
 Built a Hadoop cluster for Splunk and ELK data archiving.
 Created and managed the roles and user access system in OBIEE WebLogic and configured security for users, groups, and application roles.
 Designed and implemented an Elasticsearch cluster to divert web traffic to the ES domain.
 Performed Data Cleaning, features scaling, features engineering using pandas and NumPy packages in
python.
 Worked on Docker based containers for using Airflow.
 Worked with Microsoft Cosmos DB distributed database technology.
 Guided the teams of developers to optimize the usage of NoSQL databases.
 Created deep learning models using TensorFlow and Keras by combining all tests into a single normalized score and predicting the residency attainment of students.
 Used Python, Flask, and Angular as a stack to build modern web applications.
 Used the XGBoost classifier when the target is a categorical variable and the XGBoost regressor for continuous variables, and combined them using the FeatureUnion and FunctionTransformer methods in the Natural Language Processing pipeline.
 Encoded and decoded JSON objects using PySpark to create and modify the DataFrames in Apache Spark.
 Involved in full life cycle Business Intelligence implementations and understanding of all aspects of an implementation project using OBIEE.
 Experienced in SAP BusinessObjects, Informatica Data Quality, Informatica MDM, Informatica PowerExchange, data migration, and data warehousing.
 Created data layers as signals to Signal Hub to predict new unseen data with performance not less than the static model built using a deep learning framework.
 Coordinating with apps & database teams to apply patching, Managing SAN environment from the Linux
point of view, managing Physical and Logical volumes.
 Worked with the Data Governance group in creating a custom data dictionary template to be used across
the various business lines.
 Created statistical models using distributed and standalone models to build various diagnostic, predictive, and prescriptive solutions.
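
A minimal Airflow sketch of the DAG configuration described above (dependencies, retries, catchup, email on failure); the DAG id, schedule, task commands, and alert address are placeholders.

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "owner": "data-engineering",
    "retries": 2,                               # retry attempts per task
    "retry_delay": timedelta(minutes=5),
    "email": ["oncall@example.com"],            # placeholder alert address
    "email_on_failure": True,
}

with DAG(
    dag_id="daily_ingest",                      # placeholder DAG id
    default_args=default_args,
    start_date=datetime(2022, 3, 1),
    schedule_interval="@daily",
    catchup=False,                              # do not backfill missed runs
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo extract")
    load = BashOperator(task_id="load", bash_command="echo load")

    extract >> load                             # dependency: extract before load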
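A short NumPy sketch of a standard non-max suppression routine like the one mentioned above; boxes are assumed to be an (N, 4) array in [x1, y1, x2, y2] format and the IoU threshold is illustrative.

import numpy as np

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring box and drop overlapping boxes above the IoU threshold."""
    order = np.argsort(scores)[::-1]          # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        if order.size == 1:
            break
        rest = order[1:]
        # Intersection between the top box and the remaining boxes.
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.maximum(0, x2 - x1) * np.maximum(0, y2 - y1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_rest = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_rest - inter)
        order = rest[iou <= iou_threshold]    # keep only boxes with low overlap
    return keep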

Environment: Python 2.x/3.x, Hive, GCP, Linux, Tableau Desktop, Microsoft Excel, NLP, Java, deep learning frameworks such as TensorFlow and Keras, boosting algorithms, DB2, R, AngularJS, Databricks, Visio, HP ALM, Agile, Jenkins pipelines.

Project 2 Nov 2020 - April 2022


Client: Starbucks, Redmond, WA Role: Data Engineer

Description: At BTIS, we pride ourselves on providing insurance for the construction industry. The entire
executive team has experience working in this arena.

 Designed and developed data pipelines, data warehouses, and data marts to integrate new data sets from different sources into a data platform.

 Implemented enterprise-level Azure solutions such as Azure Databricks, Azure ML, AKS, Azure Data Factory (ADF), Logic Apps, Azure Storage Account, and Azure SQL DB.

 Designed, developed, and deployed ETL solutions utilizing Microsoft Synapse and Data Factory to extract,
transform, and load data from various sources to the data warehouse.

 Created dashboards and reports using Power BI to provide business insights to stakeholders.

 Developed integration solutions using Logic Apps and APIs to connect disparate systems and automate
data flows.
 Used SSIS to create ETL packages to validate, Extract, Transform and load data into Data Warehouse and
Data Mart.
 Maintained and developed complex SQL queries, stored procedures, views, functions and report to meet
customer requirements using Microsoft SQL Server 2008 R2.
 Provide production support for existing products that include SSIS,SQL Server, stored Procedures, interim
data marts.
 Created Views and Table-valued Functions, Common Table Expression (CTE), joins, complex subqueries to
provide the reporting solutions.
 Optimized the performance of queries with modification in T-SQL queries, removed the unnecessary
columns and redundant data, normalized tables, established joins and created index.
 Created SSIS packages using Pivot Transformation, Fuzzy Lookup, Derived Column, Conditional Split, Aggregate, Execute SQL Task, Data Flow Task, and Execute Package Task.
 Used SAS/SQL to pull data out from databases and aggregate to provide detailed reporting based on the
user requirements.
 Used SAS for pre-processing data, SQL queries, data analysis, generating reports, graphics, and statistical
analysis.
 Provided statistical research analysis and data modelling support for mortgage product.
 Tested Complex ETL Mappings and Sessions based on Business user requirement and business rules to
load data from source flat files and RDBMS tables to target tables.
 Perform analysis such as regression analysis, logistic regression, discriminant analysis, cluster analysis
using SAS programming.
 Gathered, analyzed, documented, and translated application requirements into data models and supported standardization of documentation and the adoption of standards and practices related to data and applications.
 Good understanding of NoSQL databases and hands-on work experience in writing applications on NoSQL databases like Cosmos DB.
 Wrote complex Spark SQL queries for data analysis to meet business requirement.
 Developed MapReduce/Spark Python modules for predictive analytics and machine learning in Hadoop on
AWS.
 Participated in feature engineering such as feature intersection generation, feature normalization, and label encoding with Scikit-learn (see the sketch below). Developed graph database nodes and relationships using the Cypher language.
 Created multiple custom SQL queries in Teradata SQL workbench to prepare the right data sets for
Tableau dashboards. Queries involved retrieving data from multiple tables using various join commands
that enabled to utilize efficiently optimized data extracts for Tableau workbooks.

 Designed and developed a scalable data warehouse using Azure Blob Storage, Data Lake, and Synapse to
store and manage large volumes of data.
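
A small scikit-learn sketch of the feature-engineering steps noted above (a feature-intersection term, normalization, and label encoding); the DataFrame columns are made-up examples.

import pandas as pd
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

df = pd.DataFrame({
    "visits": [3, 10, 4, 7],                  # placeholder features
    "spend": [25.0, 180.0, 40.0, 95.0],
    "segment": ["retail", "wholesale", "retail", "online"],
})

# Feature intersection: combine two raw features into an interaction term.
df["visits_x_spend"] = df["visits"] * df["spend"]

# Normalize numeric features into [0, 1].
scaler = MinMaxScaler()
df[["visits", "spend", "visits_x_spend"]] = scaler.fit_transform(
    df[["visits", "spend", "visits_x_spend"]])

# Label-encode the categorical column into integer codes.
df["segment_code"] = LabelEncoder().fit_transform(df["segment"])
print(df)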

Environment: SAS, R, MLlib, Python, Data Governance, MDM, MATLAB, Tableau, AngularJS, Azure SQL Server, Azure Synapse.

Project 3 Jan 2019 – Oct 2020


Client: Microsoft, Redmond, WA. Role: Data Engineer/Data Analyst

Description: Microsoft Corporation is an American multinational technology company with headquarters in Redmond, Washington.

Responsibilities:

 Analysed and solved business problems and found patterns and insights within structured and unstructured
data.
 Implemented advanced computer vision techniques like distortion correction, thresholding techniques, and
the sliding window method to identify the lane markings to highlight the entire lane.
 Utilized a diverse array of technologies and tools as needed, to deliver insights such as R, SAS, MATLAB,
Tableau, and more.
 Developed highly optimized Spark applications to perform various data cleansing, validation, transformation,
and summarization activities according to the requirement.
 Developed service classes, domain/DAOs, and controllers using Java/J2EE technologies.
 Normalized the data according to the business needs like data cleansing, modifying the data types, and various
transformations using Spark, Scala.
 Experience working with SQL Server Analysis Services (SSAS), Multi - dimensional and Tabular Cube, SQL
Server Reporting Services (SSRS), SQL Server Integration Services (SSIS), Tableau, Power BI, DAX.
 Developed Python scripts to do file validations in Databricks and automated the process using ADF.
 Good hands-on experience in creating RDDs and DataFrames for the required input data and performing the data transformations using Spark Scala.
 Worked on Angular concepts like building components, data binding, string interpolation, property binding, event binding, and two-way data binding.
 Built a data pipeline consisting of Spark, Hive, Sqoop, and custom-built input adapters to ingest, transform, and analyse operational data.
 Used Kafka HDFS connector to export data from Kafka topic to HDFS files in a variety of formats and
integrated with apache hive to make data immediately available for HQL querying.
 Designed, built, and managed the ELK (Elasticsearch, Logstash, Kibana) cluster for centralized logging and search functionality for the app.
 Used Karma as a unit testing tool and Jasmine as a testing library for AngularJS applications.
 Translated functional requirements to technical solutions in the Veeva API.
 Worked with monitoring and logging tools such as the ELK Stack (Elasticsearch and Kibana).
 Analysed Oracle 11g stored procedures, materialized views, and tables to translate the structures and business logic from the RDBMS to a NoSQL database and Pentaho ETL.
 Performed ETL data translation using Informatica, turning functional requirements into source-to-target data mapping documents to support large datasets (big data) on the AWS cloud databases Snowflake and Aurora.
 Designed and developed Informatica ETLs, views, materialized views, and modeling on OBIEE.
 Installed, tested and deployed monitoring solutions with Splunk services.
 Created stored procedures to import data into the Elasticsearch engine.
 Partnered with the Sales and Marketing teams and collaborated with a cross-functional team to frame and answer important data questions, prototyping and experimenting with ML/DL algorithms and integrating them into a production system for different business needs.
 Did POCs on newly adopted technologies like Apache Airflow, Snowflake, and GitLab.
 Designed, built, and deployed a set of Python modeling APIs for customer analytics, which integrate multiple machine learning techniques for various user behavior predictions and support multiple marketing segmentation programs.
 Used DataStage Manager for importing metadata from the repository, creating new job categories, and creating new data elements.
 Developed various Spark applications using Scala to perform various enrichments of clickstream data merged with user profile data.
 Used the AWS Glue catalog with a crawler to get the data from S3 and perform SQL query operations.
 Imported and exported data files to and from SAS using PROC IMPORT and PROC EXPORT, from Excel and various delimited text-based data files such as .TXT (tab-delimited) and .CSV (comma-delimited) files, into SAS datasets for analysis.
 Designed data profiles for processing, including running SQL, Procedural/SQL queries and using Python and R
for Data Acquisition and Data Integrity which consists of Datasets Comparing and Dataset schema checks.
 Managing and troubleshooting issues of Redhat Linux servers in Chase Mortgage Banking.
 Involved in designing and architecting data warehouses and data lakes on regular (Oracle, SQL Server) high
performance (Netezza and Teradata) and big data (Hadoop - MongoDB, Hive, Cassandra, and HBase)
databases.
 Developed Splunk infrastructure and related solutions as per automation toolsets.
 Developed a framework for converting existing PowerCenter mappings to PySpark (Python and Spark) jobs.
 Created PySpark DataFrames to bring data from DB2 to Amazon S3 (see the sketch after these responsibilities).
 Trained and served ML pipelines using MLOps practices, which aim to deploy and maintain ML systems in production reliably and efficiently.
 Modeled and shifted custom SQL and transposed LookML into dbt for materializing incremental views.
 Built servers using GCP, importing volumes, launching EC2 and RDS instances, and creating security groups, auto-scaling, and load balancers (ELBs) in the defined virtual private connection.
 Detected near-duplicated news by applying NLP methods and developing machine learning models like label
spreading and clustering.
 Prototyped and experimented with ML algorithms and integrated them into a production system for different business needs.
 Implemented a number of customer clustering models, and these clusters are plotted visually using Tableau legends for higher management.
 Process automation using Python/R scripts with Oracle database to generate and write the results in the
production environment every week.
 Generated a script in AWS Glue to transfer the data and utilized AWS Glue to run ETL jobs and run aggregation on
PySpark code.
 Worked on Airflow performance tuning of the DAGs and task instances.
 Used Data Quality validation techniques to validate Critical Data Elements (CDE) and identified various
anomalies.
 Developed multiple POCs using Scala and deployed on the Yarn cluster, compared the performance of Spark
with Hive and SQL / Teradata .
 Performing Data Validation / Data Reconciliation between the disparate source and target systems for various
projects.
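
A minimal PySpark sketch of the DB2-to-S3 pattern mentioned above; the JDBC URL, credentials, table, and S3 bucket are placeholders, and the DB2 JDBC driver is assumed to be on the Spark classpath.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("db2-to-s3").getOrCreate()

# Read the source table from DB2 over JDBC.
source = (spark.read.format("jdbc")
          .option("url", "jdbc:db2://db2-host:50000/SALESDB")   # placeholder URL
          .option("driver", "com.ibm.db2.jcc.DB2Driver")
          .option("dbtable", "SCHEMA.ORDERS")                   # placeholder table
          .option("user", "db2_user")                           # placeholder credentials
          .option("password", "db2_password")
          .load())

# Land the extracted table in S3 as Parquet files.
(source.write.mode("overwrite")
       .parquet("s3a://my-data-lake/raw/orders/"))              # placeholder bucket/prefix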
Environment: Python (Scikit-Learn/SciPy/NumPy/Pandas), Linux, Tableau, Hadoop, MapReduce, Hive, Databricks, Oracle, Windows 10/XP, JIRA, Perl, Azure ADLS.
Project 4 Mar 2017 - Dec 2018
Client: Aetna Role: Data Analyst

Description: Aetna Inc. is an American managed health care company that sells traditional and consumer directed health care insurance and related services, such as medical, pharmaceutical, dental, behavioural health, long-term care, and disability plans, primarily through employer-paid (fully or partly) insurance and benefit programs, and through Medicare. Since November 28, 2018, the company has been a subsidiary of CVS Health.

Responsibilities:

 Performed data integrity checks, data cleansing, exploratory analysis, and feature engineering using Python and data visualization packages such as Matplotlib and Seaborn.
 Used Python to develop a variety of models and algorithms for analytic purposes.
 Developed logistic regression models to predict subscription response rates based on customer variables
like past transactions, promotions, response to prior mailings, demographics, interests, and hobbies, etc.
 Used Python to implement different machine learning algorithms, including Generalized Linear Model,
Random Forest, and Gradient Boosting.
 Evaluated parameters with K-Fold Cross-Validation and optimized the performance of models (see the sketch after these responsibilities).
 Recommended and evaluated marketing approaches based on quality analytics on customer consuming
behavior.
 Analyzed large data sets, applied machine learning techniques, and developed and enhanced predictive and statistical models by leveraging best-in-class modeling techniques.
 Coordinating with apps & database teams to apply patching, Managing SAN environment from the Linux
point of view, managing Physical and Logical volumes.
 Created event management that listens continuously for events in the MDM.
 Automated dataflows and pipelines which are interacting with multiple azure services using Azure
Databricks, Power automate (Flow).
 Worked on database design, relational integrity constraints, OLAP, OLTP, Cubes and Normalization (3NF),
and De-normalization of the database.
 Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, MLlib, and Python with a broad variety of machine learning methods including classification, regression, dimensionality reduction, etc.
 Responsible for solving technical problems related to system administration (Linux) for our clients; maintained and troubleshot the FTP server and Samba server of the client.
 Extensively worked on UNIX Shell Scripting for file transfer and error logging.
 Performed data visualization and Designed dashboards with Tableau, and provided complex reports,
including charts, summaries, and graphs to interpret the findings to the team and stakeholders.
 Identified process improvements that significantly reduce workloads or improve quality.
 Performed data management, including creating SQL Server Reporting Services reports to develop reusable code and an automatic reporting system, and designed user acceptance tests to provide end users with an opportunity to give constructive feedback.
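
A short scikit-learn sketch of K-Fold cross-validation as referenced above; the synthetic data stands in for customer response features and the fold count is illustrative.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for customer response data.
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

model = LogisticRegression(max_iter=1000)
cv = KFold(n_splits=5, shuffle=True, random_state=42)

# Mean ROC AUC across five folds gives a more stable estimate than a single split.
scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
print(f"ROC AUC per fold: {scores.round(3)}, mean: {scores.mean():.3f}")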

Environment: Hadoop, MapReduce, Sqoop, HDFS, Hive, Pig, Oozie, Java, Oracle 10g, MySQL, and Impala.

Project 5 Dec 2016 - Feb 2017


Client: Sterling Insurance, Brentwood, TN. Role: Data Analyst

Description: Since 1895, the mission of Sterling Insurance Company has been to provide “unexcelled service and
sound insurance protection” to its policyholders.

Responsibilities:

 Responsible for data identification, collection, exploration, cleaning for modelling.


 Involved in creating database solutions, evaluating requirements, preparing design reports, and migrating
data from legacy systems to new solutions
 Worked with various database administrators/operations and analysts to secure easy access to data
 Build and maintain data in HDFS by identifying structural and installation solutions
 Analyzing structural requirements for new applications which will be sourced.
 Developed Spark batch job to automate creation/metadata update of external Hive table on top of datasets
residing in HDFS.
 Developed a Data Serialization Spark common Immuta module for converting complex objects into sequence bits using AVRO, Parquet, JSON, and CSV formats.
 Worked on ER Modeling, Dimensional Modelling (Star Schema, Snowflake Schema), Data warehousing, and
OLAP tools.
 Worked in writing SPARK SQL query scripts for optimizing the query performance.
 Implemented Spark scripts using SparkSession, Python, and Spark SQL to access Hive table data flowing into Spark for faster processing (see the sketch after these responsibilities).
 Worked on POCs for various Spark optimization techniques covering memory management, garbage collection, serialization, and custom partitioning.
 Developed Spark programs to parse the raw data, populate staging tables, and store the refined data in
partitioned tables in the EDW.
 Created new workflows and maintained data access for existing ETL workflows, data management, and data query components.
 Designed, developed, and orchestrated data pipelines for real-time and batch data processing using AWS Redshift.
 Performed Exploratory Data Analysis and Data visualizations using Python and Tableau.
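
A minimal PySpark sketch of querying Hive tables through a Hive-enabled SparkSession, as referenced above; the database and table names are illustrative placeholders.

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-spark-sql")
         .enableHiveSupport()                 # read Hive metastore tables directly
         .getOrCreate())

# Run Spark SQL against a (placeholder) partitioned Hive table.
daily_counts = spark.sql("""
    SELECT event_date, COUNT(*) AS events
    FROM analytics.web_events
    WHERE event_date >= '2017-01-01'
    GROUP BY event_date
    ORDER BY event_date
""")

daily_counts.show()

# Persist the refined result back into a partitioned warehouse table.
(daily_counts.write.mode("overwrite")
 .partitionBy("event_date")
 .saveAsTable("analytics.web_events_daily"))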

Project 6 Oct 2014 - Aug 2016


Client: Code& Pixels, Hyderabad. Role: Data Analyst.

Description: Code and Pixels Interactive Technologies Private Limited is an E-Learning service provider based in
Hyderabad (India). We provide end-to-end E-Learning solutions and specialized in the innovative use of
technology.

Responsibilities:

 Developed Spark Programs for Batch and Real-Time Processing to process incoming streams of data from
Kafka sources and transform them into Data frames and load those data frames into Hive and HDFS.
 Experience in developing SQL scripts using Spark for handling different data sets and verifying the
performance of Map Reduce jobs.
 Developed Spark programs using Spark-SQL library to perform analytics on data in Hive.
 Developed various JAVA UDF functions to use in both Hive and Impala for ease of usage in various
requirements.
 Created multiple MapReduce jobs in Pig and Hive for data cleaning and preprocessing.
 Created Hive views/tables for providing SQL-like interface.
 Successfully loading files to Hive and HDFS from Oracle, SQL Server using SQOOP.
 Writing Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying on
the log data.
 Using Hive to analyze the partitioned data and compute various metrics for reporting.
 Transformed the Impala queries into hive scripts which can be run using the shell commands directly for a
higher performance rate.
 Created the shell scripts which can be scheduled using Oozie workflows and even the Oozie Coordinators.
 Developed the Oozie workflows to generate monthly report files automatically.
 Managing and reviewing the Hadoop log files.
 Exporting data from HDFS environment into RDBMS using Sqoop for report generation and visualization
purposes.

TECHNICAL SKILLS

Programming & Scripting Languages: Python (Keras, Scikit-learn, TensorFlow), SQL, SAS, R, AngularJS, JavaScript, Bash scripting
Statistics / Machine Learning Methods: Hypothesis Testing, Regression Analysis, SVM, ANOVA, Naive Bayes
Analysis / Visualization Tools: Tableau, Microsoft Power BI, Matplotlib, Microsoft Excel (Pivot tables, VBA/Macros), Adobe Analytics, Stata, Google Analytics, OBIEE, SAP BEx Web
Machine Learning Libraries: TensorFlow, Keras, PyTorch, NumPy, OpenCV, Scikit-Learn, SciPy, Pandas
Data Modelling: Erwin r9.6/9.5/9.1/8.x, Rational Rose, ER/Studio, MS Visio, SAP PowerDesigner
Teradata Utilities: BTEQ, FastLoad, FastExport, MultiLoad, TPump, TPT
Big Data: Hadoop, Apache Spark, Hive, MapReduce, Flume, Azure ADF
Database / Cloud Platforms: MySQL, Microsoft SQL Server, MongoDB, GraphDB, Amazon Web Services, Azure Synapse Analytics

EDUCATIONAL DETAILS

Bachelor's from Osmania University

Master's in Computer Science from Wright State University
