Data Engineer
Email: akashvarma7439@gmail.com
LinkedIn: linkedin.com/in/akash-varma7
Phone: (513) 760-1004
Professional Summary:
Accomplished data engineer with 8 years of experience, adept at leveraging a comprehensive
suite of tools and technologies to design, develop, and maintain sophisticated data solutions.
Proficient in ETL (Extract, Transform, Load) processes, utilizing tools such as
Informatica PowerCenter, Talend, and SSIS to ensure efficient data integration across diverse
platforms.
Skilled in working with big data frameworks and technologies, including Hadoop, Spark, and
tools like Apache Hive, Apache Pig, or Apache Kafka, to enable robust data processing and
analysis.
Experienced in cloud-based data engineering, leveraging tools such as AWS Glue, Azure Data
Factory, and Dataflow to build scalable and resilient data pipelines.
Strong understanding of data warehousing concepts, utilizing tools like Amazon Redshift and
Snowflake to design and optimize data storage and retrieval systems.
Strong understanding of Azure Stream Analytics, allowing real-time data processing and
analytics on streaming data sources.
Proficient in SQL querying and scripting against relational databases such as Oracle, SQL
Server, and PostgreSQL to extract, manipulate, and analyze data.
Skilled in data modeling and schema design using tools like ERwin, ER/Studio, or Lucidchart
to ensure data consistency, integrity, and optimal performance.
Expertise in data visualization tools such as Tableau, Power BI, or QlikView, enabling the
creation of interactive dashboards and reports to deliver actionable insights.
Experienced in working with distributed data processing frameworks like Apache Spark or
Apache Flink, harnessing their capabilities for large-scale data processing and real-time analytics.
Skilled in stream processing tools like Apache Kafka and AWS Kinesis, enabling real-time data
ingestion, transformation, and event-driven architectures.
Proficient in developing data pipelines and ETL processes on Azure, ensuring efficient and
reliable extraction, transformation, and loading of data from diverse sources.
Strong understanding of data governance principles and practices, employing tools like Collibra,
Alation, or Informatica Axon to ensure data quality, lineage, and regulatory compliance.
Proficient in data cataloging tools such as Alation, Collibra Catalog, or Informatica Enterprise
Data Catalog, facilitating data discovery and fostering collaboration across teams.
Experienced in data quality management tools like Talend Data Quality, Informatica Data
Quality, or Trifacta Wrangler, ensuring data accuracy, consistency, and completeness.
Experienced in data integration and hybrid cloud solutions, employing Azure Stack and Azure
Data Gateway for seamless data movement between on-premises and cloud environments.
Skilled in data preparation and wrangling tools such as Trifacta Wrangler, Alteryx, or Dataiku
DSS, simplifying the process of transforming raw data into analysis-ready formats.
Proficient in data orchestration and workflow management tools like Apache Airflow, Luigi, or
Oozie, automating and scheduling complex data processing tasks and dependencies.
Experienced in data replication and synchronization tools like Attunity, GoldenGate, or AWS
Database Migration Service, enabling real-time data replication across heterogeneous systems.
Skilled in data exploration and discovery tools like Apache Zeppelin, Jupyter Notebook, or
Databricks, facilitating interactive data analysis and collaboration among data scientists and
analysts.
Skilled in data compression and optimization techniques using columnar and serialization
formats like Parquet, ORC, and Avro, reducing storage costs and improving query performance in
big data environments.
Experienced in implementing data security and compliance measures on Azure, including
encryption, access controls, and audit trails, to protect sensitive data and meet regulatory
requirements.
Experienced in data masking and anonymization tools such as Informatica Persistent Data
Masking, Delphix, or IBM InfoSphere Optim, safeguarding sensitive data during development
and testing.
Skilled in data profiling and data quality assessment tools like Talend Data Stewardship,
Trillium, or IBM InfoSphere QualityStage, identifying data anomalies and improving data
integrity.
Proficient in data versioning and lineage tools like Apache Atlas, Collibra Data Lineage, or
Informatica Enterprise Data Catalog, providing visibility into data changes and dependencies.
Experienced in data integration and synchronization tools like Oracle Data Integrator,
Informatica PowerExchange, or Talend Data Integration, facilitating seamless data movement
between systems.
Skilled in real-time analytics platforms like Apache Druid and AWS Athena, enabling ad-hoc
querying and exploration of large datasets with minimal latency.
Proven track record of successfully implementing security and compliance measures on Azure,
ensuring data protection and adherence to regulatory requirements.
Experienced in data archival and lifecycle management tools like Informatica Data Archive,
IBM InfoSphere Optim, or AWS Glacier, optimizing storage costs and compliance requirements.
Skilled in data profiling and metadata management tools like Informatica Metadata Manager,
Collibra Data Governance, or Talend Data Catalog, facilitating data understanding and lineage
documentation.
TECHNICAL SKILLS:
Hadoop Components / Big Data: HDFS, MapReduce, Spark, Airflow, YARN, HBase, Hive, Pig,
Flume, Sqoop, Kafka, Oozie, Zookeeper, Spark SQL
Languages: SQL, PL/SQL, Python, Java, Scala, C, HTML, Unix/Linux shell scripting
Cloud Platforms: AWS (Amazon Web Services), Microsoft Azure
ETL Tools: Informatica PowerCenter, SSIS, Talend
Reporting Tools: Power BI, SSRS, Tableau
Tracking Tool: JIRA
MS Office Package: Microsoft Office (Word, Excel, PowerPoint, Visio, Project)
Databases: MySQL, Oracle, Redshift, PostgreSQL
Operating Systems: Windows, Linux, Unix
Version Control: Bitbucket, GitHub
PROFESSIONAL EXPERIENCE:
Environment: Hive, HBase, Spark, UNIX, SQL Server, Ansible, MapReduce, RESTful services, Maven,
GIT, JIRA, ETL, AWS, R, Python, Scala, NumPy, Informatica, Talend, APIs.
Education: Bachelor of Technology in Computer Science, JNTUH College of Engineering,
Hyderabad, India (Aug 2012 to April 2016).