

Databricks – Databricks is an industry-leading, cloud-based data engineering platform used for processing and transforming massive quantities of data and for exploring that data through machine learning models.
Unified platform for all data, analytics, AI/ML workloads.

The Databricks Lakehouse Platform brings Data Engineering, BI & SQL Analytics, Data Science and ML, and real-time data applications under one umbrella.

The Azure Databricks workspace provides a unified interface and tools for most data
tasks, including:

- Scheduling and management of data processing workflows
- Generating dashboards and visualizations
- Managing security, governance, high availability, and disaster recovery
- Data discovery, annotation, and exploration
- Machine learning (ML) modeling, tracking, and model serving
- Generative AI solutions

Data Warehouse + Data Lake = Data Lakehouse

- Supports structured, semi-structured, and unstructured data
- Supports ACID transactions
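A Delta table is created with USING DELTA. A minimal sketch (the id and name columns are illustrative; only timeCreated appears in the original notes):

CREATE TABLE students (
  id INT,            -- illustrative column
  name STRING,       -- illustrative column
  timeCreated TIMESTAMP
) USING DELTA;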

DROP TABLE behaviour:

Managed table – Both the data and the metadata are dropped.

External table – Only the metadata is dropped; the underlying data files remain in the external storage location.
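A minimal sketch of the difference (table names and the storage path are hypothetical):

-- Managed table: Databricks manages both metadata and data files
CREATE TABLE students_managed (id INT, name STRING) USING DELTA;

-- External table: data files live at a location you control
CREATE TABLE students_external (id INT, name STRING)
USING DELTA
LOCATION 'abfss://container@storageaccount.dfs.core.windows.net/students';

-- Dropping the managed table deletes data + metadata;
-- dropping the external table removes only the metadata entry
DROP TABLE students_managed;
DROP TABLE students_external;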

Delta Lake is an open-source storage framework that enables building a Lakehouse architecture.

Delta Lake sits right above your data lake (ADLS, Amazon S3, etc.).

Delta Lake features:

- Supports ACID transactions
- Scalable metadata handling
- Time Travel
- Open Source
- Unified Batch/Streaming
- Audit History
- DML Operations
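The DML operations can be run directly in SQL against a Delta table. A minimal sketch (the students table is the illustrative one above; students_updates is a hypothetical staging table):

UPDATE students SET name = 'Asha' WHERE id = 1;
DELETE FROM students WHERE id = 2;

-- Upsert from a staging table
MERGE INTO students AS t
USING students_updates AS s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;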

DELTA LAKE –

- Reliability with transactions
- 48x faster data processing
- Data governance at scale with fine-grained access control
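As a sketch of fine-grained access control in Databricks SQL (the analysts group is hypothetical):

-- Grant read-only access on a single table to a group
GRANT SELECT ON TABLE students TO `analysts`;

-- Revoke it again
REVOKE SELECT ON TABLE students FROM `analysts`;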

Delta Table Advanced:

OPTIMIZE: Compacts many small files into fewer, larger ones

ZORDER: Co-locates related data in the same files to improve data skipping (a form of indexing)

VACUUM: Cleans up stale data files

RESTORE: Rolls back to an older version

OPTIMIZE students
ZORDER BY (id)

DESCRIBE HISTORY students

SELECT * FROM students VERSION AS OF 3

RESTORE TABLE students TO VERSION AS OF 4

VACUUM students RETAIN 168 HOURS -- 168 hours = 7 days, the default retention threshold
