Onehouse Introduction
Supercharge performance and automate
your Apache Hudi™ data platform
Today’s Agenda
• Introductions
• CCC Intelligent Solutions Goals
• How Onehouse can help
• Next steps
Tell us more about your platform
Questions (capture key observations):
• What is your business use case?
• What are your data sources?
• How is data ingested into Hudi?
• At what scale?
• What query engines do you use?
• Where do you have current challenges?
Goals and Pains
Any of these examples?
Cost Savings
1. Overprovisioned AWS infra?
2. Any vendor costs growing unexpectedly?
3. Spark cluster or storage costs?

Performance
1. Any slow queries?
2. Difficult Hudi perf tuning?
3. Slow writes / freshness latency?

Eng Productivity
1. How much time on-call?
2. What routine ops do you want to automate?
3. Hudi monitoring metrics?

Reliability
1. Hudi pipeline stability?
2. Any internal SLA goals to hit?
3. Disaster recovery?

Architecture Changes
1. Iceberg interop?
2. Warehouse migrations?
3. Database ingestion?
4. New query engines or catalogs?

Hudi Expertise
1. How many Hudi pros on your team?
2. Hudi version upgrades?
3. Any special tuning projects planned?
Example impact with our customers
Cost Savings
• 40% total infra reduction vs OSS Spark on bare-metal EC2 (TBs/day, Hudi experts on the team)

Performance
• 30x query performance in 1 day, after their team had spent months tuning
• 6h write latency -> minutes

Eng Productivity
• 4-6 FTEs managing Spark -> 0.5 FTE, by building ETL pipeline templates

Reliability
• ~75% estimated downtime reduction

Architecture Changes
• Hudi + Iceberg interop
• Snowflake migrations
• Glue + DataHub multi-catalog

Hudi Expertise
• Hudi v0.8 -> 0.14 upgrades
• Professional services for issue resolution and perf tuning
How Onehouse Can Help:
The Universal Data Lakehouse
Supercharge performance and automate
your Apache Hudi™ data platform
“Onehouse is going to make broadly accessible what has to-date been a tightly held secret used by only the most advanced data teams.”
Aaron Schildkrout, Investor - Addition
Delivering an Open Data Lake for the Enterprise
Take your data lakehouse to the next level
Infra Cost + Performance
  DIY/OSS: continuous analysis & DIY tuning
  Onehouse: 10x faster innovations on top of OSS
  - Multiplexed ingestion
  - Optimized record-level index (RLI)
  - Column-level partial overwrites (coming soon)
  - Analytics- and data science-ready
  - Auto perf-tuned file sizes, sorting, and partitioning; operates low-latency pipelines

Developer Experience
  DIY/OSS: deep knowledge of Spark, Scala, cloud infra ++
  Onehouse: no-knobs tuning, with deep automatic database optimizations

Time to production
  DIY/OSS: months
  Onehouse: days to weeks

Maintenance
  DIY/OSS: DIY on-call, monitoring, version upgrades, security patches, backfilling
  Onehouse: Onehouse handles everything
What is Onehouse?
Fully-Managed Universal Data Lakehouse
Fully managed, securely in your account
[Architecture diagram: data warehouses, data streams, databases, and cloud storage flow through the Onehouse control plane (LakeView & control plane, table format interop & catalog sync, continuous ingestion, table optimizations, SQL transformations, Spark/Python jobs) out to query engines, real-time engines, data engineering, and data science, all running on the Onehouse Compute Runtime.]
Onehouse Compute Runtime (OCR)
A specialized runtime in your VPC

Serverless Compute Manager
○ Elastic cluster scaling - for handling data workload spikes and swings
○ Multi-cluster management - for flexible resource allocation and isolation
○ BYOC infrastructure - for security, sovereignty, and flexibility

Adaptive Workload Optimizations
○ Multiplexed job scheduler - to minimize compute footprint
○ Lag-aware scheduling - to enforce latency SLAs (see the sketch below)
○ Performance profiles - to balance write vs query performance

High-performance Lakehouse I/O
○ Vectorized columnar merging - for fast writes
○ Parallel pipelined execution - to maximize CPU efficiency
○ Optimized storage access - reduces network requests vs OSS Parquet readers
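To make "lag-aware scheduling" concrete, here is a hypothetical sketch (not Onehouse's implementation) of the policy idea: pending pipeline runs are ordered by how close each table's freshness lag is to its latency SLA, so tables about to breach their SLA run first.

```python
# Hypothetical sketch of a lag-aware scheduling policy (illustration only).
from dataclasses import dataclass

@dataclass
class Pipeline:
    name: str
    lag_seconds: float  # time since the table's last successful commit
    sla_seconds: float  # freshness SLA for this table

def schedule(pipelines: list[Pipeline]) -> list[Pipeline]:
    # Highest lag-to-SLA ratio first; a ratio above 1.0 means the SLA
    # is already breached, so that pipeline jumps the queue.
    return sorted(pipelines, key=lambda p: p.lag_seconds / p.sla_seconds,
                  reverse=True)

runs = schedule([
    Pipeline("orders", lag_seconds=540, sla_seconds=600),   # at 90% of SLA
    Pipeline("clicks", lag_seconds=120, sla_seconds=3600),  # at 3% of SLA
])
print([p.name for p in runs])  # ['orders', 'clicks']
```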
4 ways to engage

Onehouse Cloud: Ingestion/ETL
● 10x faster vs existing OSS Hudi, Iceberg, Delta pipelines
● CDC ingestion from popular databases to any engine
● Auto-scaling infra with a serverless experience in your VPC

Onehouse Table Optimizer
● No-knobs tuning for Apache Hudi pipelines
● Automates advanced optimizations for 10x faster perf
● No change to your existing pipelines

Onehouse LakeView (free!)
● Advanced Hudi dashboards to monitor OSS pipelines
● Monitoring + tuning insights for your existing data pipelines

Enterprise Hudi Support + Services
● Enterprise SLA (24x7, guaranteed response time)
● Hudi professional services (e.g., Hudi 1.0 version upgrades)
Differentiation vs AWS EMR

Autoscaling
  Onehouse: advanced auto-scaling tuned for Hudi pipelines
  AWS EMR: generic Spark autoscaling

Compute backbone
  Onehouse: multiplexed Spark compute with a serverless experience inside your secure VPC
  AWS EMR: provision, tune, and manage your own Spark clusters

Hudi table services
  Onehouse: automated Apache Hudi services continuously optimize tables
  AWS EMR: DIY custom code

Ingestion
  Onehouse: fully managed ingest services that beat cost even vs OSS Spark on raw EC2
  AWS EMR: DIY custom code

ETL
  Onehouse: low-code transformers, extensible Spark code, SQL endpoint
  AWS EMR: custom Spark, PySpark, Spark SQL code

Observability
  Onehouse: rich dashboards/alerts for pipeline health, perf, quality, lag, cost, and optimizations, plus the Hudi timeline
  AWS EMR: basic Spark monitors lack the detail to debug and tune Hudi pipelines

Support
  Onehouse: 24x7 Hudi and Spark experts
  AWS EMR: 24x7 generic AWS Spark support
Examples and scale

Industry Vertical: Retail
Use Case: Near real-time lakehouse and ingestion for IoT, orders, and inventory management
Onehouse Impact: Cost reduction from BigQuery; faster, fresher lakehouse ingestion; moved from OSS to managed table services

Industry Vertical: Utilities
Use Case: Power-meter sensor IoT; predictive analytics for grid failures
Onehouse Impact: Unlocked previously untapped 10-year historical data that was estimated to take more than 6 months just to process

Industry Vertical: Finance
Use Case: Credit card issuing, transaction processing, fraud detection, and more
Onehouse Impact: Snowflake migration; streaming ingestion of raw data into Hudi and XTable/Iceberg tables; table management for downstream pipelines

Sample customer workload:
• 1.7PB+ data processed
• 1,000+ active pipelines
• 700k+ compute hours
• 1.5 TB/min peak burst rate for a single pipeline
EXAMPLE Customer TCO vs EMR
(Example only; actual $ estimates will vary.)

OSS Hudi - Total Cost: $447K
• Engineering: 1.5 FTE(1), $375K
• SavePoint/Rollback*: $12K
• Monitoring: $12K
• Observability: $12K
• Hudi Table Services: $12K
• AWS: $24K(2)

Onehouse Platform - Total Cost: $131K
• Engineering: 0.25 FTE(1), $60K
• Onehouse(5) (Foundation + Table Optimizer): $53K
• AWS: $18K(2)

Impact: ~20% infra cost reduction | ~1,200 query hours gained | ~75% downtime avoidance | ~2,400 engineering hours saved
Proven industry traction for Onehouse & Hudi

• 50% reduction in data pipeline engineering expenses
• 98% reduction in CDC sync frequency
• 10X reduction in data processing times, after reducing backfill time from half a year to days
• <1s query latency at data lake scale
• 80% reduction in compute costs
• 80% reduction in compute costs and 90% reduction in storage costs, over millions of vCores
• 5X+ faster ingestion and 2-10X typical query performance, at over 100TB/day ingested
• 1.25M saved per year vs Fivetran and Snowflake, while powering Notion AI
Peek at the feature menu 👀

Ingestion
• Stream ingest from Kafka, RDBMS, or S3
• Incremental change data capture
• Data quality quarantine

Transformations
• Low-code transformations
• Extensible Spark-code transforms
• SQL endpoint

Table Services
• Clustering, cleaning, compaction
• Multi-catalog sync
• Table format interop
• More coming soon ++

Infrastructure
• Serverless in-VPC dataplane
• Auto-scaling Spark clusters
• Infra-as-code APIs
• Monitoring/alerting
• Multiplexed compute
• Hudi enterprise support
Next Steps
Next Steps Example
Identify Use Case → Technical Trial Setup → Production
(Executive alignment; free trial)
Finding the Use Case (example)

Current State (Q3 2023)
• Primarily an append-only workload, in shadow mode in production.
• Only a few small tables with mutable workloads.
• Old version of Apache Hudi, without the metadata table enabled.

Future State
• Improved Data Accessibility & Analytics: support for multiple formats doesn't mean more pipelines and duplicate data.
• Scalability & Flexibility: easily accommodates future data growth and the integration of new data sources. Leverage any query or data processing engine.
• Cost Optimization: leverages scalable cloud-native storage and elastic compute.

Required Capabilities
● Support ingestion, backfill, and historical upserts.
● Highly available infrastructure with minimized downtime.
● Store the data in an open-standards, future-proof format on cloud-native technology.
● Out-of-the-box data automation to enable data freshness, reduce data corruption, and enhance query performance through data-skew reduction and data "defragging".
● Stored securely in AWS?

Success Metrics
• 20% reduced infrastructure costs?
• Enable interoperability with Apache Iceberg and Delta Lake?
• 30% query performance boost?
• Autoscale ingestion?
NEXT STEPS
Your Universal Data Lakehouse is Waiting For You
● Identify Use Case
● Free 1-month trial to get started
● Invite your team to learn more:
⁃ Custom demo
⁃ Detailed feature overview
⁃ Case studies
⁃ Pricing
⁃ Hudi details
⁃ Anything around data
Email gtm@onehouse.ai for exclusive access
Thank You
Your ETL transformations, your way
Managed Incremental Pipelines (Preview*)
No-code, low-code, or BYO-code to ingest data in real time and run ELT workloads.

Spark Jobs (Preview*)
Easy, fast experience over OSS Spark/AWS EMR, plus open table formats.

SQL Endpoint (Preview*)
Execute SQL pipelines using popular tools: dbt, Metabase, Airflow, Hex, notebooks, etc.

SQL Editor (Preview*)
Onehouse SQL editor for quick exploration.
Spark Jobs (Preview*)
● Submit Spark jobs (Scala/Python) to the Onehouse Jobs API or via the UI (see the sketch below)
● Executes on the Onehouse Compute Runtime (OCR), 100% Spark-compatible
● Isolate workloads using auto-scaling serverless clusters that lower compute costs
● Job monitoring and history, log tracking
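To ground the bullets above, here is a minimal PySpark job sketch of the kind you might submit. It is plain Spark code (the slide states OCR is 100% Spark-compatible); the bucket paths and app name are hypothetical placeholders, and the Jobs API/UI submission step itself is not shown.

```python
# Minimal PySpark job sketch; paths and names are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("daily-orders-rollup").getOrCreate()

# Read a bronze table and produce a daily rollup.
orders = spark.read.parquet("s3://bucket/bronze/orders/")
daily = (orders
         .groupBy(F.to_date("order_ts").alias("order_date"))
         .agg(F.count("*").alias("order_count"),
              F.sum("amount").alias("revenue")))

# Write the silver aggregate back to cloud storage.
daily.write.mode("overwrite").parquet("s3://bucket/silver/daily_orders/")
spark.stop()
```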
Onehouse SQL Endpoint (Preview*)
● Endpoint for popular external SQL tools:
○ dbt, Airflow (SQL pipelines)
○ Metabase, notebooks (data exploration)
● Seamless Onehouse integration
○ Observability in the Onehouse console
○ Managed, async optimizations on your
SQL-generated tables
○ Multi-catalog sync for your SQL-generated
tables to Snowflake, Glue, DataHub, etc.
● Reduces spend on premium cloud warehouse compute
○ SQL functions to author pipelines running merges/loads incrementally (see the sketch below)
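To illustrate the incremental merge/load point, a hedged sketch of the kind of SQL an external tool (dbt, Airflow, a notebook) might send to the endpoint. Table and column names are hypothetical, the watermark subquery is just one way to pick up new rows, and the connection/submission mechanism is tool-specific and not shown.

```python
# Hypothetical incremental MERGE a SQL tool might run against the endpoint.
INCREMENTAL_MERGE = """
MERGE INTO silver.orders t
USING (
    SELECT *                                   -- new/changed rows only
    FROM bronze.orders_changes
    WHERE _change_ts > (SELECT MAX(_change_ts) FROM silver.orders)
) s
ON t.order_id = s.order_id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *
"""
```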
Onehouse SQL Editor (Preview*)
● Author and run SQL queries directly through the
Onehouse SQL Editor
● Get quick analysis at a glance
● Backfill or update critical data
● Create new tables and experiment
Announcing the Apache Hudi 1.0 release
1. Secondary Indexing
   a. 95% faster TPC-DS with secondary indexes you can create/drop asynchronously with simple SQL (see the sketch below)
2. Partial Updates
   a. 2.6x perf and 85% less write amplification, with MERGE INTO modifying only the changed fields of a record
3. Logical Partitioning
   a. Postgres-style expression indexes that treat partitions as coarse-grained indexes
4. Merge Modes
   a. First-class support for commit_time_ordering and event_time_ordering
5. Non-blocking Concurrency
   a. Multiple writers and compaction of the same record, without blocking any involved processes
6. LSM Timeline
   a. Revamped timeline, allowing users to retain a large amount of table history

https://hudi.apache.org/blog/2024/12/16/announcing-hudi-1-0-0
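A short Spark SQL sketch of the secondary-index and partial-update features above. The CREATE/DROP INDEX syntax follows the Hudi 1.0 announcement; the table, index, and column names are hypothetical, and a Spark session already configured with the Hudi 1.0 bundle is assumed.

```python
# Sketch of Hudi 1.0 secondary indexing + partial updates via Spark SQL.
# Assumes `spark` is a SparkSession configured for Hudi 1.0; the table,
# index, and column names are hypothetical.

# Secondary index on a non-key column, created/dropped with plain SQL.
spark.sql("CREATE INDEX idx_city ON hudi_table USING secondary_index(city)")

# Point lookups on `city` can now prune file groups via the index.
spark.sql("SELECT * FROM hudi_table WHERE city = 'sunnyvale'").show()

# Partial update: MERGE INTO writes only the changed field of each record.
spark.sql("""
    MERGE INTO hudi_table t
    USING updates u ON t.record_key = u.record_key
    WHEN MATCHED THEN UPDATE SET t.city = u.city
""")

# Drop the index when it is no longer needed.
spark.sql("DROP INDEX idx_city ON hudi_table")
```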
Record Index performance

1. Write performance
   a. Boosts index lookups 4-10x for large-scale datasets
2. Read performance
   a. Applies to "EqualTo" and "IN" predicates on record key columns in Spark
   b. 98% reduction in query latency on a 400GB dataset with 20,000 file groups
3. Efficient storage
   a. Stored as a separate partition under the metadata table
   b. Maps record key to record location in HFile, for fast updates and lookups
   c. Avoids the cost of gathering data from table data files
   d. Touches fewer file groups instead of all data files
   e. No linear increase with table size

(A write-side config sketch follows the chart captions below.)
[Chart: index and write latency speedup on a 1TB dataset, 200MB batches, random updates, Spark datasource]
[Chart: reduced SQL point-lookup latency on a TPC-DS 10TB dataset, store_sales table, Spark]
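As referenced above, a hedged sketch of enabling the record-level index on a Hudi write path. The config keys are standard Hudi options (available since 0.14); the input path, table name, and field names are hypothetical.

```python
# Enabling the Hudi record-level index (RLI) on a write path (sketch).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rli-demo").getOrCreate()
updates = spark.read.parquet("s3://bucket/updates/")  # hypothetical input

(updates.write.format("hudi")
    .option("hoodie.table.name", "orders")
    .option("hoodie.datasource.write.recordkey.field", "order_id")
    .option("hoodie.datasource.write.precombine.field", "updated_at")
    # Build the record-level index inside the metadata table...
    .option("hoodie.metadata.record.index.enable", "true")
    # ...and use it to tag incoming records to their file groups.
    .option("hoodie.index.type", "RECORD_INDEX")
    .mode("append")
    .save("s3://bucket/tables/orders"))
```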
Onehouse Table Optimizer
Operating open table formats is hard…
● Apache Hudi requires maintenance for cleaning, clustering, compaction, file sizing, etc.
● Many configs and parameters to tune (a sample below):
  ○ Frequency, budgets, triggers, partition spread, parallelism, retention, concurrency, etc.
● Inline services compete with write operations
● Getting your table service configs right can mean 2-100x perf swings
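To make the tuning surface concrete, a hedged sample of real OSS Hudi table-service knobs that otherwise need per-table hand-tuning. The keys are standard Hudi configs; the values are arbitrary examples, not recommendations.

```python
# A sample of the Hudi table-service knobs behind the bullets above.
# Keys are standard Hudi configs; values are arbitrary examples.
hudi_table_service_opts = {
    # Cleaning: how much commit history to retain.
    "hoodie.clean.automatic": "true",
    "hoodie.cleaner.commits.retained": "10",
    # Compaction (MoR): when to merge log files into base files.
    "hoodie.compact.inline": "false",
    "hoodie.compact.inline.max.delta.commits": "5",
    # Clustering: trigger cadence and target file size.
    "hoodie.clustering.inline": "false",
    "hoodie.clustering.plan.strategy.target.file.max.bytes": str(1024**3),
    # File sizing: small-file handling on ingest.
    "hoodie.parquet.small.file.limit": str(100 * 1024 * 1024),
    "hoodie.parquet.max.file.size": str(120 * 1024 * 1024),
}
```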
Operating Hudi pipelines
[Diagram: existing bronze → silver → gold pipelines, with inline resource contention or complex operational burden]

Onehouse Table Optimizer
Managed Hudi Table Services
[Diagram: the same existing Hudi bronze → silver → gold pipelines, with Onehouse optimizing them alongside]
Hands-free optimizations
● Async + on-demand
● Auto-scaling + spot nodes
● Off-peak-hours execution
Table Optimizer

→ Auto file-sizing: never worry about small files again
→ Auto cleaning: enforce retention and clean versions and failed commits
→ Zero-config compaction: set-it-and-forget-it continuous merging of Hudi MoR Parquet base and Avro log files
→ Adaptive clustering: incremental + global clustering with several sorting algorithms, including linear, Z-order, and Hilbert curves (config sketch below)
→ Partition monitoring: monitoring and dashboards to identify partition skew over time
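For reference, the OSS clustering knobs that adaptive clustering manages for you: a hedged sketch with real Hudi config keys, and arbitrary example columns and sizes.

```python
# The OSS clustering knobs Table Optimizer tunes automatically (sketch).
# Keys are standard Hudi configs; columns/sizes are arbitrary examples.
clustering_opts = {
    "hoodie.clustering.inline": "true",
    "hoodie.clustering.inline.max.commits": "4",
    # Sort layout strategy: "linear", "z-order", or "hilbert".
    "hoodie.layout.optimize.strategy": "z-order",
    "hoodie.clustering.plan.strategy.sort.columns": "city,event_ts",
    # Only rewrite file groups smaller than this threshold.
    "hoodie.clustering.plan.strategy.small.file.limit": str(300 * 1024 * 1024),
}
```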
Table Optimizer Architecture

Table Services: clustering, cleaning, compaction, multi-catalog sync, table format interop, more coming soon ++
Infrastructure: serverless in-VPC dataplane, auto-scaling Spark clusters, infra-as-code APIs, monitoring/alerting, multiplexed compute, Hudi enterprise support
Table Services Impact

20-30% storage savings: cleaning enforces retention and cleans Hudi timeline versions
10-50% faster writes: async table services eliminate writer compute contention
2-30x faster queries: clustering sorts data to accelerate query access
Universal Table Services Interoperability

1. Multi-catalog sync makes data analytics-ready in any engine
2. XTable conversion produces tables as Hudi, Iceberg, or Delta

[Diagram: table services (clustering, cleaning, compaction, multi-catalog sync, table format interop, more coming soon ++)]
Infrastructure Impact

~80% engineering time savings
1. Hands-free infrastructure
   a. Removes complex DIY tuning, configuring, and troubleshooting of Spark clusters and Hudi table services
2. Enterprise monitoring and alerting
   a. Dashboards with the Hudi timeline, job performance, and resource usage
   b. Configurable alerts to monitor pipeline health
3. 24/7 production support with Onehouse expert guidance on table optimization strategies

10-70% compute savings
1. Advanced auto-scaling: push table maintenance to smaller SKUs or spot nodes for cost savings

[Diagram: infrastructure (serverless in-VPC dataplane, auto-scaling Spark clusters, infra-as-code APIs, monitoring/alerting, multiplexed compute, Hudi enterprise support)]
Advanced Monitoring/Alerting
● Per-job cost attribution tracking
● Advanced lag and latency detection with configurable alert thresholds
● Detailed stats on bytes written per operation type, cluster utilization, failures, and much more
● In-depth timeline of table-level operations to audit all writers and services that touch the table
● Critical Hudi metrics, including partition skew, file sizes, compaction backlog, and concurrent operations
Table Optimizer Roadmap

Now
● Auto file-sizing: never worry about small files again
● Auto cleaning: enforce retention and clean versions and failed commits
● Zero-config compaction: set-it-and-forget-it continuous merging of Hudi MoR Parquet base and Avro log files
● Adaptive clustering: incremental + global clustering with several sorting algorithms, including linear, Z-order, and Hilbert curves
● Partition monitoring: monitoring and dashboards to identify partition skew over time

Next
● Compaction Accelerator: out-of-the-box 2-10x perf improvements over OSS
● TTL management: set and enforce data expiry lifetimes
● Compaction Balancer: balances compaction patterns across recent partitions for cost/performance
● Intelligent Clustering: monitors storage access patterns to automatically select clustering strategies

Later
● Auto-indexing subsystem: given write patterns, applies indexes for acceleration
● Fully managed lock manager: hosted and auto-configured locking service
● Savepoint/Restore: automatically takes savepoints, with an on-demand restore service
● Iceberg/Delta: native optimizations
LIVE DEMO
Thank You