Welcome to Scribd!

Skip carousel

M1.4 Summary

Uploaded by

AnuprekshaChowhan

0% found this document useful (0 votes)

6 views4 pages

Original Title

M1.4_Summary

Copyright

Available Formats

PDF, TXT or read online from Scribd

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Report this Document

Copyright:

Available Formats

Download as PDF, TXT or read online from Scribd

Flag for inappropriate content

0% found this document useful (0 votes)

6 views4 pages

M1.4 Summary

Uploaded by

AnuprekshaChowhan

Copyright:

Available Formats

Download as PDF, TXT or read online from Scribd

Flag for inappropriate content

Jump to Page

You are on page 1of 4

Search inside document

Proprietary + Confidential

Modernizing Data
Lakes and Data
Warehouses with
Google Cloud

Summary

Let’s review some keys concepts we covered on data lakes and data warehouses.
Proprietary + Confidential

Summary

● Data engineers build data pipelines.

● The customers of a data engineer are all the people who make decisions with data.

● The three primary advantages of doing data engineering in the cloud are:
○ Ability to separate compute and storage
○ Serverless products
○ Not having to manage infrastructure

● The primary role of a data engineer is to build data pipelines.

● The ultimate purpose of a data pipeline is to enable stakeholders in an
organization to use data to make faster and better decisions.
● While the role of a data engineer is not new, being able to build data pipelines
entirely in the cloud is relatively new. We argue that doing data engineering in
the cloud is advantageous because you can separate compute from storage,
and you don’t have to worry about managing infrastructure and even software.
This allows you to spend more time on what matters; getting insights from
data.
Proprietary + Confidential

Summary

● Difference between a data lake and data warehouse.

● Google Cloud Storage as a data lake solution.

● BigQuery as a data warehouse solution.

● Differences between ETL, ELT and EL.

● Google Cloud reference architectures for ETL, ELT and EL.

● We introduced data lakes and data warehouses and discussed the key
differences between the two. At a high level, a data lake is a place to store
unprocessed data. While a data warehouse is a place to store transformed
data that you ultimately want to use for analytics, machine learning, and
dashboards.
● Next, we discussed Cloud Storage as the data lake solution on Google Cloud
in some technical depth. We also presented other Google Cloud solutions for
low-latency requirements, transactional workloads, and structured data.
● We introduced BigQuery as the data warehouse solution on Google Cloud.
We discussed partitioning and clustering in BigQuery as techniques for
improving query performance.
● Also, we talked about E-L, E-L-T, and E-T-L and how these relate to data lakes
and warehouses.
● Finally, we presented some reference architectures on Google Cloud for
streaming and batch data pipelines. The hope is that these reference
architectures serve as a starting point for your data pipeline.
Proprietary + Confidential

Data Engineering on Google Cloud learning path

Modernizing Data Lakes and Data

1 Warehouses with Google Cloud

Building Batch Data Pipelines on

2 Google Cloud

Building Resilient Streaming Analytics

3 Systems on Google Cloud

Smart Analytics, Machine Learning

4 and AI on Google Cloud

We’ve completed Modernizing Data Lakes and Data Warehouses with Google
Cloud.

Building Batch Data Pipelines on Google Cloud is the second part of the Data
Engineering on Google Cloud course, which we will cover on day 2.

Google Cloud Platform for Data Engineering: From Beginner to Data Engineer using Google Cloud Platform
From Everand
Google Cloud Platform for Data Engineering: From Beginner to Data Engineer using Google Cloud Platform
alasdair gilchrist
Rating: 5 out of 5 stars
5/5 (1)
OD M4 Summary of Introduction To Data Engineering
Document4 pages
OD M4 Summary of Introduction To Data Engineering
obumnwabude
No ratings yet
Serverless Data Engineering
From Everand
Serverless Data Engineering
Chuck Sherman
No ratings yet
M4 - Summary
Document5 pages
M4 - Summary
Edgar Sanchez
No ratings yet
M1.1 Introduction To Data Engineering
Document75 pages
M1.1 Introduction To Data Engineering
AnuprekshaChowhan
No ratings yet
OD M1 Introduction To Data Engineering
Document69 pages
OD M1 Introduction To Data Engineering
obumnwabude
No ratings yet
M1 - Introduction To Data Engineering
Document65 pages
M1 - Introduction To Data Engineering
Edgar Sanchez
No ratings yet
TF2023313 - BPP Big Data and AMV File 2
Document14 pages
TF2023313 - BPP Big Data and AMV File 2
Nisha
No ratings yet
01-Migrating Enterprise Databases To The Cloud
Document36 pages
01-Migrating Enterprise Databases To The Cloud
sanjeevi81
100% (1)
Preparing For Your Professional Data Engineer Journey - T-GCPPDE-A-m0-l6-file-en-7
Document80 pages
Preparing For Your Professional Data Engineer Journey - T-GCPPDE-A-m0-l6-file-en-7
RashidMahmood
No ratings yet
Big Query Google
Document62 pages
Big Query Google
Juan Carlos Lara Gallego
No ratings yet
GCP Storage
Document12 pages
GCP Storage
Paresh Bapat
No ratings yet
T-GCPBDML-B - M3 - Big Data With BigQuery - ILT Slides
Document75 pages
T-GCPBDML-B - M3 - Big Data With BigQuery - ILT Slides
Adha Jamil
No ratings yet
Google Cloud Analytics Lakehouse
Document47 pages
Google Cloud Analytics Lakehouse
Javier
No ratings yet
T-GCPBDML-B - M3 - Big Data With BigQuery - ILT Slides
Document58 pages
T-GCPBDML-B - M3 - Big Data With BigQuery - ILT Slides
Pongsakorn
No ratings yet
GCP Fund Module 4 Storage in The Cloud
Document37 pages
GCP Fund Module 4 Storage in The Cloud
Edogaru
100% (1)
Migrating Your Databases To Managed Services On Google Cloud
Document29 pages
Migrating Your Databases To Managed Services On Google Cloud
Akhilesh Tripathi
100% (1)
Unlock The Power of Private Cloud Big Data Analytics Ref Arch
Document18 pages
Unlock The Power of Private Cloud Big Data Analytics Ref Arch
soo kyung Lee
No ratings yet
Final Project
Document28 pages
Final Project
Erik Sanchez Perez
No ratings yet
Building Batch Data Pipelines On Google Cloud
Document18 pages
Building Batch Data Pipelines On Google Cloud
21020781
No ratings yet
Intro To Data Science On Cloud
Document1 page
Intro To Data Science On Cloud
Nageswar Makala
No ratings yet
Analyse Data in GCP
Document14 pages
Analyse Data in GCP
jaspreet
No ratings yet
Unit - 1 Cloud Introduction
Document11 pages
Unit - 1 Cloud Introduction
Aman Rai
No ratings yet
Google Cloud Fund M8 Big Data and Machine Learning in The Cloud
Document44 pages
Google Cloud Fund M8 Big Data and Machine Learning in The Cloud
Giuseppe Tritto
No ratings yet
M3 - Building A Data Warehouse
Document123 pages
M3 - Building A Data Warehouse
Edgar Sanchez
No ratings yet
Professional Data Engineer Certification Exam Guide - Learn - Google Cloud
Document10 pages
Professional Data Engineer Certification Exam Guide - Learn - Google Cloud
saurabhsaini31
No ratings yet
11 - Getting Started With Google Cloud
Document35 pages
11 - Getting Started With Google Cloud
ancgate
No ratings yet
Gartner-Building A Data Mana 771797 NDX
Document65 pages
Gartner-Building A Data Mana 771797 NDX
Aadi Manchanda
No ratings yet
T-GCPBDML-B - M1 - Big Data and Machine Learning On Google Cloud - ILT Slides
Document63 pages
T-GCPBDML-B - M1 - Big Data and Machine Learning On Google Cloud - ILT Slides
Pongsakorn
No ratings yet
T-GCPBDML-B - M1 - Big Data and Machine Learning On Google Cloud - ILT Slides
Document70 pages
T-GCPBDML-B - M1 - Big Data and Machine Learning On Google Cloud - ILT Slides
Adha Jamil
No ratings yet
Amrinder Singh - Azure
Document10 pages
Amrinder Singh - Azure
praja
No ratings yet
Naukri SambaiahMitta (7y 5m)
Document6 pages
Naukri SambaiahMitta (7y 5m)
Arup Naskar
No ratings yet
Role of Cloud Computing For Big Data
Document5 pages
Role of Cloud Computing For Big Data
International Journal of Innovative Science and Research Technology
No ratings yet
ByteHouse Ebook CNCH v1.1 Compressed
Document27 pages
ByteHouse Ebook CNCH v1.1 Compressed
Souley Goubekoye
No ratings yet
Fundamentals of Big Data and Business Analytics
Document6 pages
Fundamentals of Big Data and Business Analytics
jitendra.jgec8525
No ratings yet
Module - II Data Lake
Document8 pages
Module - II Data Lake
Ashish Ratahva
No ratings yet
Deloitte Take Home Challenge - V2
Document83 pages
Deloitte Take Home Challenge - V2
Likhith D
No ratings yet
Google Bigquery & Tableau: Best Practices
Document14 pages
Google Bigquery & Tableau: Best Practices
Juan
No ratings yet
Methodology For Solving Data Portability in Cloud Information Technology Essay
Document4 pages
Methodology For Solving Data Portability in Cloud Information Technology Essay
Karthik Krishnamurthi
No ratings yet
GCP Training: © 2020 Coforge © 2020 Coforge
Document17 pages
GCP Training: © 2020 Coforge © 2020 Coforge
bittuankit
No ratings yet
A Practitioners Guide To Databricks Vs Snowflake
Document8 pages
A Practitioners Guide To Databricks Vs Snowflake
Tapan S
No ratings yet
How Is Bigdata Handled in Kaggle?: 17Cp006-Leenanci Parmar 17CP012-DHRUVI LAD
Document18 pages
How Is Bigdata Handled in Kaggle?: 17Cp006-Leenanci Parmar 17CP012-DHRUVI LAD
Darshan Tank
No ratings yet
Google Cloud Platform (GCP) at A Glance
Document8 pages
Google Cloud Platform (GCP) at A Glance
temp mail
No ratings yet
cs320 Data Platforms Databricks Long
Document47 pages
cs320 Data Platforms Databricks Long
J.B
No ratings yet
GCP Data Engineer Course Content
Document7 pages
GCP Data Engineer Course Content
AMIT DHOMNE
No ratings yet
M4 - Manage Data Pipelines With Cloud Data Fusion and Cloud Composer
Document95 pages
M4 - Manage Data Pipelines With Cloud Data Fusion and Cloud Composer
Edgar Sanchez
No ratings yet
Lecture 1.1 - Introduction To DE
Document27 pages
Lecture 1.1 - Introduction To DE
zakiamine97
No ratings yet
Preparing For The Google Cloud Professional Data Engineer Exam
Document3 pages
Preparing For The Google Cloud Professional Data Engineer Exam
Prashant Rohilla
No ratings yet
GCP - DataPlex - Building A Data Lakehouse
Document19 pages
GCP - DataPlex - Building A Data Lakehouse
Jonathan Clifton
No ratings yet
2.1 Cloud Solution Architect Sec 07 Chap 001 Bigdata V01 PDF
Document17 pages
2.1 Cloud Solution Architect Sec 07 Chap 001 Bigdata V01 PDF
Souvik Pratiher
No ratings yet
Yang Fengming - 2019213704 - DraftReport
Document31 pages
Yang Fengming - 2019213704 - DraftReport
empireinfolytx
No ratings yet
Designing and Planning A Cloud SA
Document32 pages
Designing and Planning A Cloud SA
tunguyen.itsys
No ratings yet
Google - Cloud Digital Leader.v2023 06 22.q106
Document50 pages
Google - Cloud Digital Leader.v2023 06 22.q106
peacegroupng
No ratings yet
Choosing Storage and Data Solutions: Priyanka Vergadia Developer Advocate, Google Cloud
Document27 pages
Choosing Storage and Data Solutions: Priyanka Vergadia Developer Advocate, Google Cloud
ricardoccnp
No ratings yet
Review Paper On Big Data Management and Cloud Computing.
Document10 pages
Review Paper On Big Data Management and Cloud Computing.
serkalem yimer
No ratings yet
Thimmarayudu. Gangavaram 8007779596: Loading Unloading
Document4 pages
Thimmarayudu. Gangavaram 8007779596: Loading Unloading
Tech Spore
No ratings yet
Bigquery, Google'S Enterprise Data Warehouse: Slid02
Document3 pages
Bigquery, Google'S Enterprise Data Warehouse: Slid02
reet12345
No ratings yet
Google Cloud Architect Resume - Ajmal M Shaikh
Document6 pages
Google Cloud Architect Resume - Ajmal M Shaikh
annur keerthana
No ratings yet
Module 3 - Predicting Visitor Purchases With Classification Using BigQuery ML
Document112 pages
Module 3 - Predicting Visitor Purchases With Classification Using BigQuery ML
Edgar Sanchez
No ratings yet
Question - Quora
Document24 pages
Question - Quora
gazi faisal
No ratings yet
Part of The Computer and Its Functions
Document5 pages
Part of The Computer and Its Functions
Fritzjohn Jo
No ratings yet
Create Bootable USB Flash Drive To Install Windows 10 - Tutorials
Document2 pages
Create Bootable USB Flash Drive To Install Windows 10 - Tutorials
xi si
No ratings yet
Sptve - Icf 7-Q1-M16
Document16 pages
Sptve - Icf 7-Q1-M16
JONATHAN QUINTANO
No ratings yet
Ict CSS NC Ii 2ND Sem Final Exam
Document2 pages
Ict CSS NC Ii 2ND Sem Final Exam
Trexy Heredero
No ratings yet
Dell Technologies Proven Professional Exam Updates
Document9 pages
Dell Technologies Proven Professional Exam Updates
Mỹ Linh Nguyễn
No ratings yet
Chapter 5 Database Recovery Techniques
Document46 pages
Chapter 5 Database Recovery Techniques
Firaol Bonsa
No ratings yet
Tle Last Topic
Document1 page
Tle Last Topic
Charles Kenn Mantilla
No ratings yet
CA Tut14 ANS
Document2 pages
CA Tut14 ANS
Tamizharasan P S
No ratings yet
Gestión de Almacenamiento: Facultad de Ciencias de La Ingeniería
Document43 pages
Gestión de Almacenamiento: Facultad de Ciencias de La Ingeniería
LuisToapanta
No ratings yet
Kioxia SSD BG4 Product Brief
Document2 pages
Kioxia SSD BG4 Product Brief
tanzim ornab
No ratings yet
DVL 909
Document47 pages
DVL 909
Flaviano Saccà
No ratings yet
63009-001 RevC Users Guide 5099EK 5125EK 5150EK
Document56 pages
63009-001 RevC Users Guide 5099EK 5125EK 5150EK
jair
No ratings yet
Grade 11 ICT Short Note
Document1 page
Grade 11 ICT Short Note
vihasnajayasena
No ratings yet
Clean Log
Document3 pages
Clean Log
HKM PUDU
No ratings yet
EMC VMAX Tdev Properties
Document6 pages
EMC VMAX Tdev Properties
Shashi Kanth
No ratings yet
07 Rep
Document320 pages
07 Rep
api-3806887
No ratings yet
Exercise 06
Document2 pages
Exercise 06
•-• Vũuu
No ratings yet
MBR GPT Cheatsheet
Document3 pages
MBR GPT Cheatsheet
zekuelbouzrazi
No ratings yet
U4 L4 - CAche and Virtual Memory
Document14 pages
U4 L4 - CAche and Virtual Memory
Tejas Dixit
No ratings yet
LESSON 1 - Disassembling A Computer
Document7 pages
LESSON 1 - Disassembling A Computer
Lenover
No ratings yet
Classification of Memory
Document16 pages
Classification of Memory
yashikakaimwaal123
No ratings yet
HHDLU Details
Document4 pages
HHDLU Details
fightingfalcon5
No ratings yet
Computer Science Chapter 1 - Binary Systems and Hexadecimal
Document26 pages
Computer Science Chapter 1 - Binary Systems and Hexadecimal
bud Mur
No ratings yet
CO Previous Year O.U
Document7 pages
CO Previous Year O.U
Vistas
No ratings yet
IBM 355 Disk Storage Unit: Close Up View of The Disks in A 355
Document16 pages
IBM 355 Disk Storage Unit: Close Up View of The Disks in A 355
jheiar06
No ratings yet
CYBER 502x Unit 3 Sleuthkit
Document28 pages
CYBER 502x Unit 3 Sleuthkit
neon48
No ratings yet
E 09 DIAG0
Document181 pages
E 09 DIAG0
Andrew Dowset
No ratings yet
Compendium of Notes in Tle: First Quarter
Document9 pages
Compendium of Notes in Tle: First Quarter
Raymond Puno
No ratings yet
Lalitha Kumari - Backup - TechM - Bangalore
Document4 pages
Lalitha Kumari - Backup - TechM - Bangalore
SHAIK AJEES
No ratings yet
SQL Server: Buffer Manager Object
Document3 pages
SQL Server: Buffer Manager Object
srinivas ma
No ratings yet