
STREAMLINING DATA INSIGHTS:

A COMPREHENSIVE DATA
ENGINEERING
DASHBOARD

TANZIL AHMED (24)
KOUSTAV DUTTA (49)
CONTEXT

INTRODUCTION
GOOGLE CLOUD PLATFORM
EXTRACT, TRANSFORM, LOAD
DASHBOARD
CONCLUSION
INTRODUCTION

This project integrates Python, Google Cloud Platform services, and a robust ETL pipeline to create a scalable data ecosystem. A well-structured data model, coupled with GCP's capabilities, forms the foundation for insightful analytics and a user-friendly dashboard. The ultimate goal is to unlock the full potential of data for informed decision-making.
WHAT IS ETL?

DATA EXTRACTION
DATA TRANSFORMATION
DATA LOADING
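The three stages can be sketched as a toy Python pipeline (the file contents and column names below are illustrative stand-ins, not the project's actual dataset):

```python
import io

import pandas as pd

# Extract: read raw records (an inline CSV stands in for a real source file).
raw_csv = "trip_id,fare,tip\n1,10.0,2.0\n2,7.5,0.0\n"

def extract(source: str) -> pd.DataFrame:
    return pd.read_csv(io.StringIO(source))

# Transform: derive a total_amount column from the raw measures.
def transform(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    out["total_amount"] = out["fare"] + out["tip"]
    return out

# Load: a real pipeline writes to a warehouse; here we emit CSV text.
def load(df: pd.DataFrame) -> str:
    return df.to_csv(index=False)

result = load(transform(extract(raw_csv)))
```

Each stage takes the previous stage's output, so the whole pipeline composes as a single expression.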
OUR MODEL

RAW DATA → ETL → ANALYTICS → LOOKER

DATA: THE HEARTBEAT OF
DECISIONS, CURRENCY OF
PROGRESS AND KEY TO
UNDERSTANDING


TOOLS USED

JUPYTER NOTEBOOK
GOOGLE CLOUD PLATFORM
MAGE AI
LOOKER
ENTITY RELATIONSHIP DIAGRAM

FACT TABLE
PRIMARY KEY – VendorID

DIMENSION TABLES
o passenger_count_dim
o rate_code_id
o trip_distance_id
o payment_type_dim
o datetime_dim
o pickup_location_dim
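A minimal illustration of how the fact table references these dimension tables (hedged: the columns below are simplified stand-ins for the actual schema, which keys the fact table on VendorID):

```python
import pandas as pd

# Toy raw trips table mixing keys, descriptive text, and measures.
trips = pd.DataFrame({
    "VendorID": [1, 2, 3],
    "payment_type": ["card", "cash", "card"],
    "trip_distance": [1.2, 3.4, 0.5],
})

# Build payment_type_dim: one row per distinct value, plus a surrogate key.
payment_type_dim = (
    trips[["payment_type"]].drop_duplicates().reset_index(drop=True)
)
payment_type_dim["payment_type_id"] = payment_type_dim.index

# The fact table keeps foreign keys and measures, not the descriptive text.
fact_table = trips.merge(payment_type_dim, on="payment_type")[
    ["VendorID", "payment_type_id", "trip_distance"]
]
```

The same pattern repeats for each dimension: deduplicate, assign a surrogate key, then join the key back into the fact table.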
DATA TRAINING

IMPORTING REQUIRED PACKAGES
PANDAS DATA FRAME
SORTING
MERGING
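The sorting and merging steps can be sketched with pandas (the column names here are illustrative, not the project's exact schema):

```python
import pandas as pd

rides = pd.DataFrame({
    "ride_id": [3, 1, 2],
    "zone_id": [10, 20, 10],
})
zones = pd.DataFrame({
    "zone_id": [10, 20],
    "zone_name": ["Downtown", "Airport"],
})

# Sorting: order rides by their identifier.
rides_sorted = rides.sort_values("ride_id").reset_index(drop=True)

# Merging: enrich each ride with its zone name via a left join.
enriched = rides_sorted.merge(zones, on="zone_id", how="left")
```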
GOOGLE CLOUD PLATFORM

VIRTUAL MACHINE (COMPUTE ENGINE)
SQL
GCP BUCKET (STORAGE)
BIG QUERY
DATA EXTRACTION

IMPORTS THE NECESSARY LIBRARIES: IO AND PANDAS.
CHECKS IF THE DATA LOADER VARIABLE IS ALREADY DEFINED.
DEFINES A FUNCTION CALLED LOAD_DATA_FROM_API().
INSIDE THE LOAD_DATA_FROM_API() FUNCTION:

Uses the requests library to download the source data file.
Uses the io.StringIO() function to create a string buffer from the file's text contents.
Uses the pandas.read_csv() function to read the data from the string buffer into a pandas DataFrame.
Returns the pandas DataFrame.
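A sketch of the loader described above. The URL would point at the real data source; here the buffer-then-parse step is also demonstrated on an inline sample so it runs without network access:

```python
import io

import pandas as pd

def load_data_from_api(url: str) -> pd.DataFrame:
    """Download a CSV-formatted file and parse it into a DataFrame."""
    import requests  # third-party HTTP client

    response = requests.get(url)
    response.raise_for_status()
    # Wrap the downloaded text in a string buffer, then parse it as CSV.
    return pd.read_csv(io.StringIO(response.text))

# The same buffer-then-parse step, shown on an inline sample:
sample_text = "VendorID,trip_distance\n1,2.5\n2,0.9\n"
df = pd.read_csv(io.StringIO(sample_text))
```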
DATA TRANSFORMATION

Importing libraries
Loading data
Creating datetime dimensions
Creating trip distance dimension
Mapping rate code
Combining dimensions
Dropping duplicates
Renaming columns
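The datetime dimension step can be sketched as follows (a minimal sketch; the column names are assumptions, not the project's exact schema):

```python
import pandas as pd

trips = pd.DataFrame({
    "pickup_datetime": pd.to_datetime(
        ["2023-01-01 08:15:00", "2023-01-01 17:40:00"]
    ),
})

# Break each timestamp into the attributes analysts filter and group by.
datetime_dim = (
    trips[["pickup_datetime"]].drop_duplicates().reset_index(drop=True)
)
datetime_dim["pick_hour"] = datetime_dim["pickup_datetime"].dt.hour
datetime_dim["pick_day"] = datetime_dim["pickup_datetime"].dt.day
datetime_dim["pick_month"] = datetime_dim["pickup_datetime"].dt.month
datetime_dim["pick_weekday"] = datetime_dim["pickup_datetime"].dt.weekday
datetime_dim["datetime_id"] = datetime_dim.index  # surrogate key
```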
DATA LOAD

Importing libraries
os.path
get_repo_path
Config File Loader
Data Frame
Dropping duplicates
Renaming columns
BigQuery
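A sketch of the load step: the in-memory cleanup runs as-is, while the BigQuery export (via Mage AI's exporter and an io_config.yaml profile, both assumptions about the project layout) is wrapped in a function so it is only invoked inside a pipeline with credentials configured:

```python
import pandas as pd

# Final cleanup before export: drop duplicates and rename columns.
df = pd.DataFrame({"VendorID": [1, 1, 2], "trip_distance": [2.5, 2.5, 0.9]})
df = df.drop_duplicates().reset_index(drop=True)
df = df.rename(columns={"trip_distance": "trip_distance_miles"})

def export_to_bigquery(df: pd.DataFrame, table_id: str) -> None:
    """Export via Mage AI's BigQuery helper (assumes mage_ai is installed
    and io_config.yaml holds the GCP credentials)."""
    from os import path

    from mage_ai.io.bigquery import BigQuery
    from mage_ai.io.config import ConfigFileLoader
    from mage_ai.settings.repo import get_repo_path

    config_path = path.join(get_repo_path(), "io_config.yaml")
    loader = ConfigFileLoader(config_path, "default")
    # if_exists='replace' overwrites the target table on each run.
    BigQuery.with_config(loader).export(df, table_id, if_exists="replace")
```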
DASHBOARD
CONCLUSION

In conclusion, our exploration into the integration of Python, GCP's Cloud Services, and a robust ETL (Extract, Transform, Load) pipeline has unveiled a comprehensive approach to handling data efficiently. The outlined objectives led us to develop a model supported by a well-designed ER diagram, utilizing Python for key tasks such as indexing, merging, and facilitating seamless interactions with a diverse dataset.
THANK YOU
