
Name: Prem G

Email: prem.g775978@gmail.com
Phone: (573) 615-1914 | ext 1036

PROFESSIONAL SUMMARY:

 8+ years of IT experience across all phases of software development with Agile methodology, including user interaction, business analysis/modeling, design and development, integration, planning, testing, migration, and documentation for applications built on ETL pipelines and distributed systems.
 Worked with Google Cloud Platform (GCP) services such as Compute Engine, Cloud Functions, Cloud DNS, Cloud Storage, and Cloud Deployment Manager, applying SaaS, PaaS, and IaaS cloud computing concepts in GCP implementations.
 Handled operations and maintenance support for AWS cloud resources, including launching, maintaining, and troubleshooting EC2 instances, S3 buckets, Auto Scaling, DynamoDB, AWS IAM, Elastic Load Balancers (ELB), and Relational Database Service (RDS); also created data snapshots stored in Amazon S3.
 2+ years of strong expertise using Hadoop and big data technologies: Apache Spark, Scala, Python, Kafka, Cassandra, Jenkins pipelines, Kubernetes, Kibana, Rancher, GitHub, Hadoop HDFS, Hive, IntelliJ, SQL Server, etc.
 Extensive experience with data extraction, transformation, and loading (ETL) from disparate data sources and targets, including multiple relational databases and cloud platforms (Apache Spark, Scala, the HDFS file system, Hive, Cassandra, Teradata, Oracle, SQL Server, DB2, Salesforce), XML, and various file structures.
 Worked on optimizing volumes and EC2 instances and created multiple VPC instances; deployed applications on AWS using Elastic Beanstalk and implemented and set up Route 53 for AWS web instances.
 Experience ingesting data from RDBMS sources such as MySQL, MS SQL Server, Teradata, and Oracle into the Hadoop Distributed File System (HDFS) using Sqoop; developed Oozie workflows to automate loading the data into HDFS and pre-processing it with Pig.
 Worked with various Informatica transformations such as Normalizer, Expression, Rank, Filter, Group, Aggregator, Lookup, Joiner, Sequence Generator, Sorter, SQL, Stored Procedure, Update Strategy, Source Qualifier, Transaction Control, Java, Union, and CDC.
 Excellent experience with the Requests, NumPy, Matplotlib, SciPy, PySpark, and Pandas Python libraries throughout the development lifecycle, and experience developing application APIs using Python, Django, MongoDB, Express, ReactJS, and NodeJS.
 Knowledge of setting up a Python REST API framework using Django.
 Azure data stack: Azure Databricks, ADLS, ADF, AAS, DAX, Azure Automation Accounts, Azure Active Directory (AD), Azure IAM security groups, PySpark, Spark SQL, Azure Data Warehouse (ADW), Power BI, DAX coding, MSBI, SSAS, CI/CD, and production support.
 Used Python with OpenStack, OpenERP (now Odoo), SQLAlchemy, and Django CMS.
 Strong experience with Spark real-time data streaming using Kafka and Spring Boot APIs.
 2+ years of strong expertise using the ETL tool Informatica PowerCenter 10.x/9.x/8.x (Designer, Workflow Manager, Repository Manager) for ETL and data warehousing.
 Very good data modeling knowledge of dimensional data modeling, star schemas, snowflake schemas, and fact and dimension tables.
 Designed and developed Spark jobs with DataFrames using different file formats such as Text, SequenceFile, XML, Parquet, and Avro (a brief sketch follows this summary).
 Proficient with AWS services such as VPC, Glue pipelines, Glue Crawler, CloudFront, EC2, ECS, EKS, Elastic Beanstalk, Lambda, S3, Storage Gateway, RDS, DynamoDB, Redshift, ElastiCache, DMS, SMS, Data Pipeline, IAM, WAF, Artifact, API Gateway, SNS, SQS, SES, Auto Scaling, CloudFormation, CloudWatch, and CloudTrail.
 Excellent domain knowledge of healthcare, banking/financial, manufacturing, entertainment, and insurance. Experience manipulating and analyzing large datasets and finding patterns and insights within structured and unstructured data.
 Coordinated with business users, the functional design team, and the testing team during different phases of project development and in resolving issues.
 Validated data against files and performed technical data quality checks to certify source and
target/business usage.
 Experience writing UNIX shell scripts for data warehouse job processing, file operations, and data analytics.
 Worked with different databases (Oracle, SQL Server, Teradata, Cassandra) and SQL programming.
 Experienced in using advanced concepts of Informatica like push down optimization (PDO).
 Experienced in Performance Tuning and Debugging of existing ETL processes.
 Very good in defining standards, methodologies and performing technical design reviews.
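
A minimal PySpark sketch of the multi-format DataFrame work summarized above; all paths, column names, and the spark-avro package version are hypothetical assumptions rather than details from any specific project.

    # Minimal PySpark sketch: read Text/Parquet/Avro inputs, align them, and write partitioned Parquet.
    # All paths and column names are hypothetical placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (
        SparkSession.builder
        .appName("multi-format-etl-sketch")
        # spark-avro is an external package; the version must match the Spark build (assumption).
        .config("spark.jars.packages", "org.apache.spark:spark-avro_2.12:3.3.0")
        .getOrCreate()
    )

    # Text source: one pipe-delimited record per line (hypothetical layout).
    text_df = (
        spark.read.text("/data/raw/events.txt")
        .select(F.split("value", r"\|").alias("cols"))
        .select(
            F.col("cols")[0].alias("event_id"),
            F.col("cols")[1].alias("event_ts"),
            F.col("cols")[2].cast("double").alias("amount"),
        )
    )

    parquet_df = spark.read.parquet("/data/raw/events_parquet/")
    avro_df = spark.read.format("avro").load("/data/raw/events_avro/")

    # Align the three sources on a common schema, union them, and write partitioned Parquet.
    def normalize(df):
        return df.select("event_id", "event_ts", F.col("amount").cast("double").alias("amount"))

    combined = (
        normalize(text_df)
        .unionByName(normalize(parquet_df))
        .unionByName(normalize(avro_df))
        .withColumn("event_date", F.to_date("event_ts"))
    )

    combined.write.mode("overwrite").partitionBy("event_date").parquet("/data/curated/events/")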

TECHNICAL SKILLS:
Hadoop/Big Data Technologies: HDFS, MapReduce, Sqoop, Flume, Pig, Hive, Oozie, Impala, Spark, Zookeeper, Cloudera Manager, Splunk
NoSQL Databases: HBase, Cassandra
Monitoring and Reporting: Tableau, custom shell scripts
Hadoop Distributions: Hortonworks, Cloudera, MapR
Build Tools: Maven, SQL Developer
Programming & Scripting: Java, C, SQL, Shell Scripting, Python, Scala
Java Technologies: Servlets, JavaBeans, JDBC, Spring, Hibernate, SOAP/REST services
Databases: Oracle, MySQL, MS SQL Server, Teradata
Web Dev. Technologies: HTML, XML, JSON, CSS, jQuery, JavaScript, AngularJS
Version Control: SVN, CVS, Git
Operating Systems: Linux, Unix, Mac OS X, CentOS, Windows 10, Windows 8, Windows 7, Windows Server 2008/2003

PROFESSIONAL EXPERIENCE:

Client: Verizon, Irving, TX Feb 2023 - Present


Role: Data Engineer
Roles and Responsibilities:

 Developed code using Apache Spark and Scala, IntelliJ, NoSQL databases (Cassandra), Jenkins, Docker pipelines, GitHub, Kubernetes, the HDFS file system, Hive, Kafka for real-time data streaming, and Kibana for monitoring logs, and applied authentication/authorization to the data (a streaming sketch follows this section).
 Worked on SQL tools like TOAD and SQL Developer to run SQL Queries and validate the data.
 Worked extensively on Informatica power center Mappings, Mapping Parameters, Workflows,
Variables, and Session Parameters.
 Designed the front end and back end of the application using Python on the Django web framework; developed consumer-facing features and applications with Python and Django using test-driven development and pair programming.
 Responsible for checking data in DynamoDB tables and verifying that EC2 instances are up and running for the DEV, QA, CERT, and PROD environments in AWS.
 Participated in requirement grooming meetings, which involved understanding functional requirements from a business perspective and providing estimates to convert those requirements into software solutions (designing, developing, and delivering code to IT/UAT/PROD, and validating and managing data pipelines from multiple applications) in a fast-paced Agile environment using sprints and the JIRA management tool.
 Developed Spark applications using Scala and Java and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
 Experienced in developing multiple MapReduce programs in Java for data extraction, transformation, and aggregation from multiple file formats, including XML, JSON, and CSV.
 Scheduled Informatica Jobs through Autosys scheduling tool.
 Experienced in designing Spark applications in Databricks for data extraction, transformation, and
aggregation from a variety of file types utilizing Spark-SQL.
 Created quick filters and customized calculations with SOQL for SFDC queries; used Data Loader for ad-hoc data loads in Salesforce.
 Analyzed existing data flows and created high-level/low-level technical design documents for business stakeholders, confirming that the technical design aligns with business requirements.
 Implemented AWS solutions using EC2, S3, RDS, EBS, Elastic Load Balancer, Glue Pipelines, Glue
Crawler, Auto scaling groups, Optimized volumes, and EC2 instances and created monitors, alarms,
and notifications for EC2 hosts using Cloud Watch.
 Extensively worked on Data Modeling involving Dimensional Data modeling, Star
Schema/Snowflake schema, FACT & Dimensions tables, Physical & logical data modeling.
 Involved in Onsite & Offshore coordination to ensure the deliverables.
 Studied the existing system and conducted reviews to provide a unified review on jobs.
 Responsible for facilitating load data pipelines and benchmarking the developed product with the set
performance standards.
 Worked with Google Cloud (GCP) Services like Compute Engine, Cloud Functions, Cloud DNS,
Cloud Storage, and Cloud Deployment Manager and SaaS, PaaS, and IaaS concepts of Cloud
Computing and Implementation using GCP.
 Designed, developed, and did maintenance of data integration programs in a Hadoop and RDBMS
environment with both traditional and non-traditional source systems as well as RDBMS and
NoSQL data stores for data access and analysis.
 Used Debugger within the Mapping Designer to test the data flow between source and target and to
troubleshoot the invalid mappings.
 Responsible for deployments to DEV, QA, PRE-PROD (CERT), and PROD using AWS.
 Created and deployed Spark jobs in different environments and loaded data into NoSQL and storage systems (Cassandra/Hive/HDFS); secured the data by implementing encryption-based security.
 Involved in designing the snowflake schema for the data warehouse and ODS architecture using data modeling tools such as Erwin.
 Developed data models and data migration strategies utilizing concepts of snowflake schema.
 Involved in testing the database using complex SQL scripts and handling the performance issues
effectively.
Environment: Apache Spark 2.4.5, Databricks, Scala 2.1.1, Cassandra, HDFS, Hive, GitHub, Jenkins, Kafka, Informatica PowerCenter 10.x, SQL Server 2008, Salesforce Cloud, Visio, TOAD, Putty, Autosys Scheduler, UNIX, AWS, Snowflake, GCP, CSV, WinSCP, Salesforce Data Loader, SFDC Developer Console, Python, VersionOne, ServiceNow, etc.
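
An illustrative sketch of the Kafka-to-Cassandra streaming flow described in this role, written here in PySpark rather than the Scala used on the project; the broker, topic, schema, keyspace/table names, and checkpoint path are placeholders, and the spark-sql-kafka and DataStax Spark Cassandra connector packages are assumed to be on the classpath.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

    spark = SparkSession.builder.appName("kafka-to-cassandra-sketch").getOrCreate()

    # Hypothetical JSON payload schema for the Kafka topic.
    schema = StructType([
        StructField("order_id", StringType()),
        StructField("customer_id", StringType()),
        StructField("amount", DoubleType()),
        StructField("event_ts", TimestampType()),
    ])

    raw = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker1:9092")   # placeholder broker
        .option("subscribe", "orders")                        # placeholder topic
        .option("startingOffsets", "latest")
        .load()
    )

    orders = (
        raw.selectExpr("CAST(value AS STRING) AS json")
        .select(F.from_json("json", schema).alias("o"))
        .select("o.*")
    )

    def write_to_cassandra(batch_df, batch_id):
        # Batch write via the DataStax Spark Cassandra connector (assumed available).
        (batch_df.write.format("org.apache.spark.sql.cassandra")
            .mode("append")
            .options(keyspace="sales", table="orders_by_customer")   # placeholder keyspace/table
            .save())

    query = (
        orders.writeStream
        .foreachBatch(write_to_cassandra)
        .option("checkpointLocation", "/checkpoints/orders")          # placeholder path
        .start()
    )
    query.awaitTermination()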

Client: Fannie Mae, Dallas, TX Jan 2022 - Jan 2023


Role: Data Engineer

Roles and Responsibilities:

 Developed JSON Scripts for deploying the Pipeline in Azure Data Factory (ADF) that process the
data using the SQL Activity.
 Involved in data analysis using python and handling the ad-hoc requests as per requirement.
 Worked on multiple modules, HCM global integration with different regions, and ONECRM Salesforce Cloud.
 Experienced in performance tuning of Spark applications: setting the right batch interval time, the correct level of parallelism, and memory tuning.
 Analyzed and developed Data Integration templates to extract, cleanse, transform, integrate, and load
to data marts for user consumption. Review the code against standards and checklists.
 Used Python with OpenStack, OpenERP (now Odoo), SQLAlchemy, Django CMS, and so forth.
 Maintained user accounts (IAM) and the RDS, SES, and SNS services in the AWS cloud; created an RDBMS system with relational mappings and migrated the server to a cloud environment using AWS services.
 Analyzed, designed, and built modern data solutions using Azure PaaS services to support data visualization; understood the current production state of the application and determined the impact of new implementations on existing business processes.
 Performed Analysis on the existing source systems, understand the Informatica/ETL/SQL/Unix based
applications and provide the services which are required for development & maintenance of the
applications.
 Involved in gathering and analyzing the requirements and preparing business Requirements.
 Responsible for estimating the cluster size, monitoring, and troubleshooting of the Spark Databricks
cluster.
 Created High- & Low-level design documents for the various modules. Review the design to ensure
adherence to standards, templates, and corporate guidelines. Validate design specifications against the
results from proof of concept and technical considerations.
 Developed Spark applications using Pyspark and Spark-SQL for data extraction, transformation, and
aggregation from multiple file formats for analyzing & transforming the data to uncover insights into
the customer usage patterns.
 Deployed the application using Docker and AWS Console services.
 Migrated SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlled and granted database access; and migrated on-premises databases to Azure Data Lake Store using Azure Data Factory.
 Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics); ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed it in Azure Databricks (see the PySpark sketch after this section).
 Demonstrated expert level technical capabilities in areas of Azure Batch and Interactive solutions,
Azure Machine learning solutions and operationalizing end to end Azure Cloud Analytics solutions.
 Developed Spark applications for data extraction, transformation, and aggregation from numerous file types using PySpark and Spark SQL, then analyzed and transformed the data.
 DevOps role converting existing AWS infrastructure to Server-less architecture (AWS Lambda,
Kinesis) deployed via CloudFormation.
 Experience ingesting data from RDBMS sources such as MySQL, MS SQL Server, Teradata, and Oracle into the Hadoop Distributed File System (HDFS) using Sqoop; developed Oozie workflows to automate loading the data into HDFS and pre-processing it with Pig.
 Implemented large Lambda architectures using Azure Data platform capabilities like Azure Data
Lake, Azure Data Factory, HDInsight, Azure SQL Server, Azure ML and Power BI.
 Created pipelines in ADF using linked services, datasets, and pipelines to extract, transform, and load data between different sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse, including write-back.
 Coordinated with the application support team and helped them understand the business and the components necessary for integration, extraction, transformation, and loading of data.
 Ingestion of data into one or more Azure services (Azure Data Lake, Azure Data Warehouse) and
processing in Azure Databricks.
 Experienced in developing SQL scripts for automation purposes.

Environment: Hadoop, MapReduce, HDFS, Pig, Hive, Spark, Databricks, Kafka, IntelliJ, ADF, Cosmos, sbt, Zeppelin, YARN, Scala, SQL, Git, Informatica 10.1.1, Oracle, GCP, SQL Server, AWS, Unix, flat files, AutoSys, web services, HCM Oracle Fusion, SoapUI, Salesforce Cloud, Python, Oracle MDM, ESB.
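
An illustrative PySpark sketch of the Databricks extraction/transformation pattern described in this role; the ADLS storage account, container, and column names are hypothetical placeholders.

    from pyspark.sql import functions as F

    # On Databricks a SparkSession named `spark` is provided; the ADLS paths and columns are placeholders.
    raw_path = "abfss://raw@examplestorageacct.dfs.core.windows.net/claims/2023/"
    curated_path = "abfss://curated@examplestorageacct.dfs.core.windows.net/claims/"

    claims = (
        spark.read
        .option("header", "true")
        .option("inferSchema", "true")
        .csv(raw_path)
    )

    # Basic cleansing with Spark SQL functions (column names are assumptions).
    curated = (
        claims
        .dropDuplicates(["claim_id"])
        .withColumn("claim_amount", F.col("claim_amount").cast("double"))
        .filter(F.col("claim_amount").isNotNull())
    )

    # Aggregate for downstream reporting.
    claims_by_month = (
        curated
        .withColumn("claim_month", F.date_format(F.to_date("claim_date"), "yyyy-MM"))
        .groupBy("claim_month")
        .agg(
            F.sum("claim_amount").alias("total_amount"),
            F.count(F.lit(1)).alias("claim_count"),
        )
    )

    # Write curated and aggregate layers back to the lake as Parquet.
    curated.write.mode("overwrite").parquet(curated_path)
    claims_by_month.write.mode("overwrite").parquet(curated_path + "monthly/")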

Client: Techlogix Inc. – Woburn, Massachusetts Oct 2019–Dec 2021


Role: Data Engineer

Roles and Responsibilities:


 Installed and configured Hive, Pig, Sqoop, Flume, and Oozie on the Hadoop cluster.
 Developed Simple to complex MapReduce Jobs using Hive and Pig.
 Responsible for building scalable distributed data solutions using Hadoop.
 Worked with Google Cloud (GCP) Services like Compute Engine, Cloud Functions, Cloud DNS,
Cloud Storage, and Cloud Deployment Manager and SaaS, PaaS, and IaaS concepts of Cloud
Computing and Implementation using GCP.
 Experienced in designing and deployment of Hadoop cluster and different Big Data analytic tools
including Pig, Hive, HBase, Oozie, Sqoop, Kafka, Spark, Impala.
 Performed various benchmarking steps to optimize the performance of spark jobs and thus improve
the overall processing.
 Involved in converting MapReduce programs into Spark transformations using Spark RDDs in Python (a small RDD sketch follows this section).
 Developed Merge jobs in Python to extract and load data into MySQL database.
 Worked on implementing pipelines and analytical workloads using big data technologies such as
Hadoop, Spark, Hive, and HDFS.
 Analyzed the data by performing Hive queries and running Pig scripts to study employee behavior.
 Consumed the data from Kafka using Apache spark.
 Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted the data from MySQL into HDFS using Sqoop.
 Optimized MapReduce Jobs to use HDFS efficiently by using various compression mechanisms.
 Used Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive and involved
in creating Hive Tables, loading with data, and writing Hive queries which will run MapReduce jobs
in the background.
 Coordinated with the application support team and helped them understand the business and the components necessary for integration, extraction, transformation, and loading of data.
 Performed Analysis on the existing source systems, understand the Informatica/Teradata based
applications and provide the services which are required for development & maintenance of the
applications.
 Created High- & Low-level design documents for the various modules. Review the design to ensure
adherence to standards, templates, and corporate guidelines. Validate design specifications against the
results from proof of concept and technical considerations.
 Developed Spark scripts using Python and shell commands as per the requirements.

Environment: Hadoop, Hive, Zookeeper, Python, MapReduce, Sqoop, Pig 0.10 and 0.11, JDK 1.6, HDFS, Flume, Oozie, DB2, HBase, Mahout, Unix, Linux, Informatica PowerCenter 9.6.1, PowerExchange, Teradata database and utilities, Oracle, GCP, Business Objects, Tableau, flat files, UC4, big data, Maestro scheduler.
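
An illustrative sketch of converting a MapReduce-style aggregation into Spark RDD transformations in Python, as referenced in this role; the input layout (comma-delimited employee activity records on HDFS) and paths are assumptions.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("mr-to-rdd-sketch").getOrCreate()
    sc = spark.sparkContext

    # Hypothetical comma-delimited records: employee_id,action,timestamp
    lines = sc.textFile("hdfs:///data/employee_activity/part-*")

    # map + reduceByKey replaces the hand-written mapper/reducer pair:
    # the mapper emitted (employee_id, 1); the reducer summed the counts.
    counts = (
        lines.map(lambda line: line.split(","))
             .filter(lambda fields: len(fields) >= 2)
             .map(lambda fields: (fields[0], 1))
             .reduceByKey(lambda a, b: a + b)
    )

    # Persist results back to HDFS as text, mirroring the old MapReduce output.
    counts.map(lambda kv: f"{kv[0]},{kv[1]}").saveAsTextFile("hdfs:///data/employee_activity_counts")
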
Client: Zoho Technologies, India May 2017 – Sep 2019
Role: Big Data Developer/Admin

Roles and Responsibilities:

 Developed Simple to complex MapReduce Jobs using Hive and Pig.


 Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
 Worked with application teams to install operating system, Hadoop updates, patches, version
upgrades as required.
 Analyzed large data sets to determine the optimal way to aggregate and report on them (an illustrative aggregation sketch follows this section).
 Optimized MapReduce Jobs to use HDFS efficiently by using various compression mechanisms.
 Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
 Handled importing of data from various data sources, performed transformations using Hive,
MapReduce, loaded data into HDFS, and extracted the data from MySQL into HDFS using Sqoop.
 Responsible for building scalable distributed data solutions using Hadoop.
 Responsible for facilitating load data pipelines and benchmarking the developed product with the set
performance standards.
 Worked with Google Cloud (GCP) Services like Compute Engine, Cloud Functions, Cloud DNS,
Cloud Storage, and Cloud Deployment Manager and SaaS, PaaS, and IaaS concepts of Cloud
Computing and Implementation using GCP.
 Used Debugger within the Mapping Designer to test the data flow between source and target and to
troubleshoot the invalid mappings.
 Responsible for deployments to DEV, QA, PRE-PROD (CERT), and PROD using AWS.
 Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
Installed and configured Pig and written Pig Latin scripts.
 Exported the analyzed data to the relational databases using Sqoop for visualization and to generate
reports for the BI team. Extensively used Pig for data cleansing.
 Loaded and transformed large sets of structured, semi-structured, and unstructured data; responsible for managing data coming from different sources.
 Created partitioned tables in Hive. Managed and reviewed Hadoop log files.

Environment: Hadoop, MapReduce, HDFS, Hive, Pig, Java (JDK 1.6), SQL, Sqoop, Eclipse, Git, Unix, Linux, Subversion.
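
The MapReduce jobs in this role were produced through Hive and Pig; purely as an illustration of the equivalent hand-written logic, a minimal Hadoop Streaming mapper/reducer pair in Python (assuming a common space-delimited access-log layout) could look like the following.

    # mapper.py - emits "url<TAB>1" for each request line read from stdin.
    import sys

    for line in sys.stdin:
        fields = line.split()
        if len(fields) > 6:            # crude guard for the assumed access-log layout
            url = fields[6]            # request path position in a common access-log format
            print(f"{url}\t1")

    # reducer.py - sums the counts per URL (Hadoop delivers the input sorted by key).
    import sys

    current_url, current_count = None, 0
    for line in sys.stdin:
        url, count = line.rstrip("\n").split("\t")
        if url == current_url:
            current_count += int(count)
        else:
            if current_url is not None:
                print(f"{current_url}\t{current_count}")
            current_url, current_count = url, int(count)
    if current_url is not None:
        print(f"{current_url}\t{current_count}")

    # Submitted roughly like this (jar location and HDFS paths are placeholders):
    # hadoop jar hadoop-streaming.jar -input /logs/access/ -output /logs/url_counts \
    #   -mapper mapper.py -reducer reducer.py -file mapper.py -file reducer.py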

Client: Pratian Technologies, India Aug 2015 – April 2017


Role: ETL Developer

Roles and Responsibilities:

 Worked proficiently on different database versions across all platforms (Windows, Unix).
 Developed data archiving, data loading, and performance test suites using ETL tools such as PowerCenter and DMExpress, Teradata, Unix, and SSIS.
 Involved in the SDLC using Informatica PowerCenter; the DMExpress ETL tool was implemented with the help of Teradata load utilities.
 Created High- & Low-level design documents for the various modules. Review the design to ensure
adherence to standards, templates, and corporate guidelines. Validate design specifications against the
results from proof of concept and technical considerations.
 Developed Spark applications using Pyspark and Spark-SQL for data extraction, transformation, and
aggregation from multiple file formats for analyzing & transforming the data to uncover insights into
the customer usage patterns.
 Deployed the application using Docker and AWS Console services.
 Migrated SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlled and granted database access; and migrated on-premises databases to Azure Data Lake Store using Azure Data Factory.
 Expertise in extracting and analyzing data from existing data stores using PowerCenter and DMExpress tools and performing ad-hoc queries against warehouse environments such as Teradata (a JDBC extraction sketch follows this section).
 Extensively worked on ad-hoc requests using different ETL tools to load data.
 Proficiency in Data Analysis, handling complex query building and performance tuning.
 Involved in data scaling and data dicing.

Environment: Informatica PowerCenter, Teradata 13.11, DB2, SQL Server, flat files, DMExpress, SSIS, SSRS, Unix shell scripting, MicroStrategy.
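
The ad-hoc warehouse queries in this role were run through PowerCenter/DMExpress and Teradata utilities; as a rough PySpark illustration of the same extraction pattern, the sketch below reads a pushed-down query over JDBC (the JDBC URL format, driver class, credentials, table, and output path are all placeholders).

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("warehouse-adhoc-extract-sketch").getOrCreate()

    # All connection details below are placeholders; the Teradata JDBC driver jar
    # is assumed to be on the Spark classpath.
    jdbc_url = "jdbc:teradata://warehouse-host/DATABASE=sales_dw"   # hypothetical URL
    query = "(SELECT region, SUM(revenue) AS revenue FROM fact_sales GROUP BY region) AS q"

    regional_revenue = (
        spark.read.format("jdbc")
        .option("url", jdbc_url)
        .option("dbtable", query)                       # push the ad-hoc query down to the warehouse
        .option("driver", "com.teradata.jdbc.TeraDriver")
        .option("user", "etl_user")                     # placeholder credentials
        .option("password", "******")
        .load()
    )

    regional_revenue.show()
    regional_revenue.write.mode("overwrite").parquet("/data/extracts/regional_revenue/")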
