
STREAMLINING DATA INSIGHTS:

A COMPREHENSIVE DATA
ENGINEERING
DASHBOARD

TANZIL AHMED (24)
KOUSTAV DUTTA (49)
CONTEXT

INTRODUCTION
GOOGLE CLOUD PLATFORM
EXTRACT, TRANSFORM, LOAD
DASHBOARD
CONCLUSION
INTRODUCTION

This project integrates Python, Google Cloud Platform services, and a robust ETL pipeline to create a scalable data ecosystem. A well-structured data model, coupled with GCP's capabilities, forms the foundation for insightful analytics and a user-friendly dashboard. The ultimate goal is to unlock the full potential of data for informed decision-making.
WHAT IS ETL?

DATA EXTRACTION
DATA TRANSFORMATION
DATA LOADING
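The three stages can be sketched as a toy Python pipeline (the file contents and column names below are illustrative stand-ins, not the project's actual dataset):

```python
import io

import pandas as pd

# Extract: read raw records (an inline CSV stands in for a real source file).
raw_csv = "trip_id,fare,tip\n1,10.0,2.0\n2,7.5,0.0\n"

def extract(source: str) -> pd.DataFrame:
    return pd.read_csv(io.StringIO(source))

# Transform: derive a total_amount column from the raw measures.
def transform(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    out["total_amount"] = out["fare"] + out["tip"]
    return out

# Load: a real pipeline writes to a warehouse; here we emit CSV text.
def load(df: pd.DataFrame) -> str:
    return df.to_csv(index=False)

result = load(transform(extract(raw_csv)))
```

Each stage takes the previous stage's output, so the whole pipeline composes as a single expression.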
OUR MODEL

RAW DATA → ETL → ANALYTICS → LOOKER

DATA: THE HEARTBEAT OF
DECISIONS, CURRENCY OF
PROGRESS AND KEY TO
UNDERSTANDING


TOOLS USED

JUPYTER NOTEBOOK
GOOGLE CLOUD PLATFORM
MAGE AI
LOOKER
ENTITY RELATIONSHIP DIAGRAM

FACT TABLE
PRIMARY KEY – VendorID

DIMENSION TABLES
o passenger_count_dim
o rate_code_id
o trip_distance_id
o payment_type_dim
o datetime_dim
o pickup_location_dim
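A minimal illustration of how the fact table references these dimension tables (hedged: the columns below are simplified stand-ins for the actual schema, which keys the fact table on VendorID):

```python
import pandas as pd

# Toy raw trips table mixing keys, descriptive text, and measures.
trips = pd.DataFrame({
    "VendorID": [1, 2, 3],
    "payment_type": ["card", "cash", "card"],
    "trip_distance": [1.2, 3.4, 0.5],
})

# Build payment_type_dim: one row per distinct value, plus a surrogate key.
payment_type_dim = (
    trips[["payment_type"]].drop_duplicates().reset_index(drop=True)
)
payment_type_dim["payment_type_id"] = payment_type_dim.index

# The fact table keeps foreign keys and measures, not the descriptive text.
fact_table = trips.merge(payment_type_dim, on="payment_type")[
    ["VendorID", "payment_type_id", "trip_distance"]
]
```

The same pattern repeats for each dimension: deduplicate, assign a surrogate key, then join the key back into the fact table.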
DATA TRAINING

IMPORTING REQUIRED PACKAGES
PANDAS DATA FRAME
SORTING
MERGING
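The sorting and merging steps can be sketched with pandas (the column names here are illustrative, not the project's exact schema):

```python
import pandas as pd

rides = pd.DataFrame({
    "ride_id": [3, 1, 2],
    "zone_id": [10, 20, 10],
})
zones = pd.DataFrame({
    "zone_id": [10, 20],
    "zone_name": ["Downtown", "Airport"],
})

# Sorting: order rides by their identifier.
rides_sorted = rides.sort_values("ride_id").reset_index(drop=True)

# Merging: enrich each ride with its zone name via a left join.
enriched = rides_sorted.merge(zones, on="zone_id", how="left")
```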
GOOGLE CLOUD PLATFORM

VIRTUAL MACHINE (COMPUTE ENGINE)
SQL
GCP BUCKET (STORAGE)
BIG QUERY
DATA EXTRACTION

IMPORTS THE NECESSARY LIBRARIES: IO AND PANDAS.
CHECKS IF THE DATA LOADER VARIABLE IS ALREADY DEFINED.
DEFINES A FUNCTION CALLED LOAD_DATA_FROM_API().
INSIDE THE LOAD_DATA_FROM_API() FUNCTION:

Uses the requests library to download the source data file.
Uses the io.StringIO() function to create a string buffer from the file's text contents.
Uses the pandas.read_csv() function to read the data from the string buffer into a pandas DataFrame.
Returns the pandas DataFrame.
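A sketch of the loader described above. The URL would point at the real data source; here the buffer-then-parse step is also demonstrated on an inline sample so it runs without network access:

```python
import io

import pandas as pd

def load_data_from_api(url: str) -> pd.DataFrame:
    """Download a CSV-formatted file and parse it into a DataFrame."""
    import requests  # third-party HTTP client

    response = requests.get(url)
    response.raise_for_status()
    # Wrap the downloaded text in a string buffer, then parse it as CSV.
    return pd.read_csv(io.StringIO(response.text))

# The same buffer-then-parse step, shown on an inline sample:
sample_text = "VendorID,trip_distance\n1,2.5\n2,0.9\n"
df = pd.read_csv(io.StringIO(sample_text))
```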
DATA TRANSFORMATION

Importing libraries
Loading data
Creating datetime dimensions
Creating trip distance dimension
Mapping rate code
Combining dimensions
Dropping duplicates
Renaming columns
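The datetime dimension step can be sketched as follows (a minimal sketch; the column names are assumptions, not the project's exact schema):

```python
import pandas as pd

trips = pd.DataFrame({
    "pickup_datetime": pd.to_datetime(
        ["2023-01-01 08:15:00", "2023-01-01 17:40:00"]
    ),
})

# Break each timestamp into the attributes analysts filter and group by.
datetime_dim = (
    trips[["pickup_datetime"]].drop_duplicates().reset_index(drop=True)
)
datetime_dim["pick_hour"] = datetime_dim["pickup_datetime"].dt.hour
datetime_dim["pick_day"] = datetime_dim["pickup_datetime"].dt.day
datetime_dim["pick_month"] = datetime_dim["pickup_datetime"].dt.month
datetime_dim["pick_weekday"] = datetime_dim["pickup_datetime"].dt.weekday
datetime_dim["datetime_id"] = datetime_dim.index  # surrogate key
```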
DATA LOAD

Importing libraries
os.path
get_repo_path
Config File Loader
Data Frame
Dropping duplicates
Renaming columns
BigQuery
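A sketch of the load step: the in-memory cleanup runs as-is, while the BigQuery export (via Mage AI's exporter and an io_config.yaml profile, both assumptions about the project layout) is wrapped in a function so it is only invoked inside a pipeline with credentials configured:

```python
import pandas as pd

# Final cleanup before export: drop duplicates and rename columns.
df = pd.DataFrame({"VendorID": [1, 1, 2], "trip_distance": [2.5, 2.5, 0.9]})
df = df.drop_duplicates().reset_index(drop=True)
df = df.rename(columns={"trip_distance": "trip_distance_miles"})

def export_to_bigquery(df: pd.DataFrame, table_id: str) -> None:
    """Export via Mage AI's BigQuery helper (assumes mage_ai is installed
    and io_config.yaml holds the GCP credentials)."""
    from os import path

    from mage_ai.io.bigquery import BigQuery
    from mage_ai.io.config import ConfigFileLoader
    from mage_ai.settings.repo import get_repo_path

    config_path = path.join(get_repo_path(), "io_config.yaml")
    loader = ConfigFileLoader(config_path, "default")
    # if_exists='replace' overwrites the target table on each run.
    BigQuery.with_config(loader).export(df, table_id, if_exists="replace")
```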
DASHBOARD
CONCLUSION

In conclusion, our exploration into the integration of Python, GCP's Cloud Services, and a robust ETL (Extract, Transform, Load) pipeline has unveiled a comprehensive approach to handling data efficiently. The outlined objectives led us to develop a model supported by a well-designed ER diagram, utilizing Python for key tasks such as indexing, merging, and facilitating seamless interactions with a diverse dataset.
THANK YOU
