Certified professional data engineer with more than 7 years of overall IT experience across different domains, clients, and technology spectrums. Experienced in building highly scalable, large-scale applications with Cloud, Big Data, DevOps, and Spring Boot technologies. Also adept at working in different environments such as Agile and Waterfall, with experience on multi-cloud, migration, and scalable application projects.
Professional Summary
BigData
Experience working with various Hadoop distributions such as Cloudera, Hortonworks, and MapR.
Expert in ingesting batch data for incremental loads from various RDBMS sources using Apache Sqoop.
Developed scalable applications for real-time ingestion into various databases using Apache Kafka.
Developed Pig Latin scripts and MapReduce jobs for large data transformations and loads.
Experience in using optimized data formats such as ORC, Parquet, and Avro.
Experience in building optimized ETL data pipelines using Apache Hive and Spark.
Implemented various optimization techniques in Hive scripts for data crunching and transformations.
Experience in building ETL scripts in Impala for faster access at the reporting layer.
Built Spark data pipelines with various optimization techniques using Python and Scala.
Experience in loading transactional and delta loads into NoSQL databases like HBase.
Developed various automation flows using Apache Oozie, Azkaban, and Airflow.
Experience in working with NoSQL databases like HBase, Cassandra, and MongoDB.
Experience with integration tools such as Talend and NiFi for ingesting batch and streaming data.
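The incremental batch loads above follow a watermark pattern similar to Sqoop's `--incremental lastmodified` mode: remember the highest timestamp already loaded, and pull only rows newer than it. A minimal pure-Python sketch of that logic (the row shape and field names are hypothetical, for illustration only):

```python
from datetime import datetime

def incremental_load(rows, last_watermark):
    """Select only rows newer than the saved watermark and return
    the delta together with the advanced watermark, mirroring the
    idea behind Sqoop's --incremental lastmodified imports."""
    new_rows = [r for r in rows if r["updated_at"] > last_watermark]
    new_watermark = max((r["updated_at"] for r in new_rows),
                        default=last_watermark)
    return new_rows, new_watermark

rows = [
    {"id": 1, "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "updated_at": datetime(2024, 1, 5)},
]
# Only rows changed after the stored watermark are picked up
delta, watermark = incremental_load(rows, datetime(2024, 1, 2))
```

In production the watermark would be persisted (Sqoop stores it in the saved-job metastore) so the next run resumes where the last one stopped.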
Cloud
Experience in working with various cloud platforms such as AWS, Azure, and GCP.
Developed various ETL applications using Databricks Spark distributions and notebooks.
Implemented streaming applications to consume data from Event Hubs and Pub/Sub.
Developed various scalable big data applications in Azure HDInsight for ETL services.
Experience in building data pipelines using Azure Data Factory and Azure Databricks.
Developed scalable applications using AWS tools such as Redshift, DynamoDB, and Kinesis.
Worked on building pipelines using Snowflake for extensive data aggregations.
Working knowledge of GCP tools such as BigQuery, Pub/Sub, Cloud SQL, and Cloud Functions.
Experience in visualizing reporting data using tools such as Power BI and Google Analytics.
DevOps
Experience in building continuous integration and deployment pipelines using Jenkins, Drone, and Travis CI.
Expert in building containerized apps with Docker and Kubernetes, and in provisioning infrastructure with Terraform.
Developed reusable application libraries using Docker containers.
Experience in building metrics dashboards and alerts using Grafana and Kibana.
Expert in Java and Scala build tools such as Maven (POM-based) and SBT for application development.
Experience in working with tools such as GitHub, GitLab, and SVN for code repositories.
Expert in writing various YAML scripts for automation purposes.
Education
Experience Summary
Responsibilities
Experience in migrating existing legacy applications into optimized data pipelines using Spark with
Scala and Python, supporting testability and observability.
Experience in developing scalable real-time applications for ingesting clickstream data using Kafka
Streams and Spark Streaming.
Developed optimized and tuned ETL operations in Hive and Spark scripts using techniques such as
partitioning, bucketing, vectorization, serialization, configuring memory and number of executors.
Worked on Talend integrations to ingest data from multiple sources into Data Lake.
Developed an MVP for exporting data to Snowflake to understand the usage and benefits of migration.
Experience in automating end-to-end Hadoop jobs in an optimized way using Oozie.
Implemented cloud integrations to GCP and Azure for bi-directional flow setups for data migrations.
Developed various scripting functionality using Shell Script and Python.
Developed APIs for quick real-time lookup on top of HBase tables for transactional data.
Built Jupyter notebooks using PySpark for extensive data analysis and exploration.
Implemented code coverage and integrations using Sonar for improving code testability.
Pushed application logs and data stream logs to a Kibana server for monitoring and alerting purposes.
Worked on migrating data from HDFS to Azure HDInsight and Azure Databricks.
Experience designing solutions with Azure tools such as Azure Data Factory, Azure Data Lake, Azure SQL, Azure SQL Data Warehouse, and Azure Functions.
Migrated existing processes and data from our on-premises SQL Server and other environments to Azure Data Lake.
Implemented multiple modules in microservices to expose data through RESTful APIs.
Implemented various optimization techniques for Spark applications for improving performance.
Developed Jenkins and Drone pipelines for continuous integration and deployment purposes.
Built SFTP integrations using various VMware solutions for external vendor onboarding.
Developed an automated file transfer mechanism in Python from MFT and SFTP to HDFS.
Technologies: HDFS, Hive, Spark, Oozie, Python, Scala, Shell, Talend, Snowflake, Azure, Azure HDInsight, Databricks, Grafana, Jenkins, Azure Data Lake, Azure SQL
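The executor-count and memory tuning mentioned in the responsibilities above usually starts from simple cluster arithmetic before any workload-specific adjustment. A sketch of one common sizing rule of thumb (the cluster numbers are illustrative, not taken from the original):

```python
def size_executors(nodes, cores_per_node, mem_per_node_gb,
                   cores_per_executor=5, overhead_frac=0.10):
    """Common rule of thumb for sizing Spark-on-YARN executors:
    reserve 1 core and 1 GB per node for OS/Hadoop daemons,
    cap executors at ~5 cores each for good HDFS throughput,
    leave one executor slot for the YARN application master,
    and subtract ~10% of heap for YARN memory overhead."""
    usable_cores = (cores_per_node - 1) * nodes
    executors = usable_cores // cores_per_executor - 1   # -1 for the AM
    executors_per_node = (cores_per_node - 1) // cores_per_executor
    mem_per_executor = (mem_per_node_gb - 1) // executors_per_node
    heap = int(mem_per_executor * (1 - overhead_frac))
    return {"num_executors": executors,
            "executor_cores": cores_per_executor,
            "executor_memory_gb": heap}

# Hypothetical cluster: 10 nodes, 16 cores and 64 GB RAM each
conf = size_executors(10, 16, 64)
```

The resulting numbers would feed `--num-executors`, `--executor-cores`, and `--executor-memory` on `spark-submit`; real tuning then iterates from this baseline using the Spark UI.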
Responsibilities
Experience in building PySpark applications for ingesting data from various sources into the Data Lake.
Developed scalable streaming applications using Kafka and Spark Streaming for real time ingestions.
Built Spring Boot microservices as part of real-time data ingestion pipelines, and ingested data into the Data Lake using Spark Kafka consumers developed with Java high-level APIs.
Experience with data migration from traditional RDBMS to Big Data and Cloud using tools such as
Sqoop, Spark JDBC connectors and AWS Glue.
Extensively used Spark Core and Spark SQL for data processing, data enrichment, and generating reports per business user requirements.
Developed custom UDFs in Hive and Spark.
Exposure to performance tuning for Hive scripts and Spark applications.
Experience in designing data solutions in AWS including data distributions and partitions,
scalability, disaster recovery and high availability.
Experience in monitoring and optimizing data solutions in AWS including usage of AWS
CloudWatch.
Used AWS Lambda functions as triggers for event-based Glue jobs.
Experience working with AWS Glue components such as data catalog, crawlers and developing
scripts in Glue using Spark and Python.
Used AWS Athena for ad-hoc querying.
Developed numerous ETL operations using Redshift and Glue for business analysis purposes.
Implemented several modules in a microservices application for streaming pipelines using Kafka.
In-depth understanding of and proficiency in automation of cloud and data platforms.
Experience in automating end to end production and development jobs using Airflow.
Developed end to end deployment pipelines using CI/CD using Jenkins.
Technologies: PySpark, Kafka, Spark, Sqoop, Hive, AWS, AWS Glue, Redshift, Airflow, Jenkins, Grafana, Python, Shell, Microservices, Java, RESTful APIs
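As an illustration of the custom Hive and Spark UDFs mentioned above, here is the kind of logic such a function might hold, written as a plain Python function (the masking rule and names are hypothetical). With Spark it would be wrapped via `pyspark.sql.functions.udf`, which is omitted here because registering it needs a live SparkSession:

```python
# Logic of a simple data-masking UDF for the reporting layer;
# with Spark: mask_udf = pyspark.sql.functions.udf(mask_account)  (not run here)
def mask_account(account_id):
    """Keep the last 4 characters and mask the rest -- a typical
    cleansing/enrichment UDF applied column-wise during ETL."""
    if account_id is None or len(account_id) <= 4:
        return account_id          # pass through nulls and short values
    return "*" * (len(account_id) - 4) + account_id[-4:]

masked = mask_account("4111111111111111")
```

In Hive the same logic would live in a Java class extending `GenericUDF`; keeping the core rule in one small pure function makes it easy to unit-test outside the cluster.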
Responsibilities
Technologies: HDP, HDFS, Hive, Spark, Oozie, NiFi, Kibana, Grafana, Talend, Sqoop, Kafka, Scala, Python, Shell, Spring Boot, Avro
Responsibilities
Technologies: CDH, MapReduce, HDFS, Pig, Hive, Spark, Python, Scala, Java, Bash, Cassandra, Kafka, Jenkins, Storm, Oozie, SQL
Client: HSBC | Location: Hyderabad, India
Designation: Data Engineer | Duration: August 2012 – June 2014
Responsibilities
Developed Hive queries as per required analytics for report generation in QlikView.
Involved in developing Pig scripts to process data coming from different sources.
Developed custom MapReduce code in Java for data cleansing and crunching for further usage.
Worked on data cleaning using Pig scripts and storing the results in HDFS.
Worked on Pig user-defined functions (UDFs) written in Java for external functions.
Scheduled and automated regularly executing jobs using Oozie.
Worked on building custom UDFs in Hive using Java.
Expertise in building custom alerts and validations using Python scripting.
Implemented PL/SQL stored procedures, functions, and triggers for the persistence layer.
Used SVN for source code versioning and code repository.
Technologies: Cloudera, HDFS, MapReduce, Pig, Hive, SQL, Impala, Java, Python, QlikView, Oozie
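The custom MapReduce cleansing work above follows the classic map → shuffle → reduce shape: the map phase drops malformed records and emits key/value pairs, the framework groups by key, and the reduce phase aggregates each group. A minimal pure-Python sketch of that pattern (the CSV-like record format is hypothetical):

```python
from itertools import groupby
from operator import itemgetter

def map_phase(records):
    """Map: drop malformed rows (cleansing) and emit (key, value) pairs."""
    for line in records:
        parts = line.strip().split(",")
        if len(parts) == 2 and parts[1].isdigit():   # cleansing filter
            yield parts[0], int(parts[1])

def reduce_phase(pairs):
    """Shuffle (sort/group by key), then reduce: sum values per key."""
    ordered = sorted(pairs, key=itemgetter(0))
    return {key: sum(v for _, v in group)
            for key, group in groupby(ordered, key=itemgetter(0))}

raw = ["a,1", "b,2", "bad-row", "a,3"]
totals = reduce_phase(map_phase(raw))
```

In Hadoop the same two functions become `Mapper.map()` and `Reducer.reduce()` in Java, with the framework handling the sort/shuffle between them across the cluster.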