quantities of data and exploring the data through machine learning models.

A unified platform for all data, analytics, and AI/ML workloads.

The Databricks Lakehouse Platform brings Data Engineering, BI & SQL Analytics, Data Science
and ML, and real-time data applications under one umbrella.

The Azure Databricks workspace provides a unified interface and tools for most data
tasks, including:

- Data processing workflow scheduling and management


- Supports structured, unstructured, and semi-structured data
- Supports ACID transactions
- Schema on Read
- Does not leave the data in a corrupt state

A Data Lakehouse is essentially your existing data lake (for example, ADLS): once you write the files
in Delta format, the data lake becomes a Lakehouse.
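As a minimal sketch of what "writing the files in Delta format" can look like in SQL, the statements below show two common routes. The table name default.events_delta and the abfss:// paths are hypothetical placeholders, not values from these notes.

```sql
-- Option 1: read existing Parquet files and write them out as a new Delta table.
-- The path is a placeholder; substitute your own container and storage account.
CREATE TABLE default.events_delta
USING DELTA
AS SELECT *
FROM parquet.`abfss://<container>@<storage-account>.dfs.core.windows.net/raw/events`;

-- Option 2: convert an existing Parquet directory in place.
-- This adds a Delta transaction log next to the existing data files.
CONVERT TO DELTA parquet.`abfss://<container>@<storage-account>.dfs.core.windows.net/raw/events`;
```

Option 1 copies the data into a new table, while Option 2 keeps the files where they are. Which one fits depends on whether the original Parquet directory must stay untouched.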

DESCRIBE EXTENDED default.people10m;

CREATE TABLE default.people10m (
  Id INT,
  Firstname STRING,
  Lastname STRING,
  timeCreated TIMESTAMP
) USING DELTA;

DROP TABLE

Managed table – Both the data and the metadata are dropped.

External table – Only the metadata is dropped; the underlying data files remain in storage.
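The difference in DROP TABLE behavior can be sketched as follows. This is an illustration under stated assumptions: the table names people_managed and people_external and the abfss:// path are hypothetical, and the table type depends on whether a LOCATION is given.

```sql
-- Managed table: no LOCATION, so the metastore decides where the files live.
CREATE TABLE default.people_managed (
  Id INT,
  Firstname STRING
) USING DELTA;

-- External table: LOCATION points at storage that you manage yourself.
-- The path below is a placeholder, not a real location.
CREATE TABLE default.people_external (
  Id INT,
  Firstname STRING
) USING DELTA
LOCATION 'abfss://<container>@<storage-account>.dfs.core.windows.net/people_external';

-- The "Type" row of the output reports MANAGED or EXTERNAL.
DESCRIBE EXTENDED default.people_external;

-- Dropping the managed table removes its metadata and its data.
DROP TABLE default.people_managed;

-- Dropping the external table removes only the metadata;
-- the files under LOCATION are left in place.
DROP TABLE default.people_external;
```

Because the external table's files survive the drop, re-running the same CREATE TABLE ... LOCATION statement should register the existing data again.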

Delta Lake is an open-source storage framework that enables building a Lakehouse architecture.

Delta Lake sits on top of your data lake (ADLS, Amazon S3, etc.).

Delta Lake features:


- Support ACID transactions
- Scalable Metadata
- Time Travel
- Open Source
- Unified Batch/Streaming
- Audit History
- DML Operations
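To make the "DML Operations" and "ACID transactions" items concrete, here is a short sketch against the default.people10m table defined earlier. The inserted values and the source table default.people_updates are hypothetical and exist only for illustration.

```sql
-- Each statement below commits as a single atomic transaction on the Delta table.

-- Insert one sample row (illustrative values only).
INSERT INTO default.people10m
VALUES (1, 'Ada', 'Lovelace', current_timestamp());

-- Update rows in place.
UPDATE default.people10m
SET Lastname = 'King'
WHERE Id = 1;

-- Delete rows.
DELETE FROM default.people10m
WHERE Id = 1;

-- Upsert from a hypothetical source table that has the same columns.
MERGE INTO default.people10m AS t
USING default.people_updates AS s
ON t.Id = s.Id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;
```

Each committed statement creates a new table version, which is what DESCRIBE HISTORY and time travel operate on.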

DELTA LAKE –

- Reliability with Transactions


- 48x faster data processing
- Data governance at scale with fine-grained access control

Delta Table Advanced:

OPTIMIZE: Compact many small files

ZORDER: Co-locate related values in the same files so queries can skip more data (used together with OPTIMIZE)

VACUUM: Clean up Stale Data files

RESTORE: Rollback to Older version

OPTIMIZE students
ZORDER BY (id)

DESCRIBE HISTORY students

SELECT * FROM students VERSION AS OF 3

RESTORE TABLE students TO VERSION AS OF 4

VACUUM students RETAIN 168 HOURS
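The commands above also have a few variants worth knowing. The sketch below assumes the same students table; the timestamp literal is a placeholder and must fall within the table's retained history.

```sql
-- Time travel by timestamp instead of version number (placeholder timestamp).
SELECT * FROM students TIMESTAMP AS OF '2024-01-01 00:00:00';

-- Roll the table back to its state at a point in time.
-- RESTORE records a new version in the history rather than deleting later versions.
RESTORE TABLE students TO TIMESTAMP AS OF '2024-01-01 00:00:00';

-- Preview which files VACUUM would delete, without deleting anything.
VACUUM students DRY RUN;
```

168 hours equals 7 days, which is the default retention threshold. Once VACUUM has removed the data files belonging to older versions, time travel and RESTORE to those versions are no longer possible, so the retention period effectively bounds how far back you can go.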
