Apache Hadoop


Unit 17: HBase - Contd


HBase vs RDBMS

Column Families

Access HBase Data

HBase API

Runtime modes

Running HBase


HBase: Keys and Column Families

Keys and Column Families


Key

Byte array

Serves as the primary key for the table

Indexed for fast lookup

Column Family

Has a name (string)

Contains one or more related columns

Column

Belongs to one column family

Included inside the row

familyName:columnName
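
A cell is addressed by row key, column family, column qualifier, and (optionally) version. The following is a minimal sketch using the HBase 1.x Java client; the table name "users", family "info", and qualifier "email" are illustrative and not defined elsewhere in these notes:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseCellExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("users"))) {   // hypothetical table

            // Key, family, qualifier, and value are all byte arrays
            Put put = new Put(Bytes.toBytes("user#1001"));                // row key
            put.addColumn(Bytes.toBytes("info"),                         // column family
                          Bytes.toBytes("email"),                        // column qualifier
                          Bytes.toBytes("alice@example.com"));           // value (cell)
            table.put(put);

            // Read the same cell back, addressed as info:email
            Result row = table.get(new Get(Bytes.toBytes("user#1001")));
            byte[] email = row.getValue(Bytes.toBytes("info"), Bytes.toBytes("email"));
            System.out.println(Bytes.toString(email));
        }
    }
}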


Column Families

Version Number & Value


Version Number

Unique within each key

By default, the system's timestamp

Data type is Long

Value (Cell)

Byte array


Notes on Data Model


HBase schema consists of several Tables

Each table consists of a set of Column Families

Columns are not part of the schema

HBase has Dynamic Columns

Because column names are encoded inside the cells

Different cells can have different columns

Notes on Data Model (2)


The version number can be user-supplied

It does not even have to be inserted in increasing order

Version numbers are unique within each key

Table can be very sparse

Many cells are empty

Keys are indexed as the primary key


HBase Physical Model


Each column family is stored in a separate set of files (HFiles)

Key & Version numbers are replicated with each column family

Empty cells are not stored

HBase maintains a multi-level index on values: <key, column family, column name, timestamp>

Column Families
Different sets of columns may have different properties and access patterns

Configurable by column family:

Compression (none, gzip, LZO)

Version retention policies

Cache priority

CFs stored separately on disk: access one without wasting IO on the other
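
These per-family properties are declared when the table is created or altered. A minimal sketch, assuming the HBase 1.x admin API and illustrative table/family names:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.io.compress.Compression;

public class CreateTableExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            HTableDescriptor users = new HTableDescriptor(TableName.valueOf("users"));  // hypothetical table
            HColumnDescriptor info = new HColumnDescriptor("info");
            info.setCompressionType(Compression.Algorithm.GZ);  // per-CF compression
            info.setMaxVersions(3);                             // per-CF version retention
            info.setInMemory(true);                             // per-CF cache priority
            users.addFamily(info);
            admin.createTable(users);
        }
    }
}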


HBase Deployment

HBase vs HDFS

                        Plain HDFS/MR                                   HBase
Write pattern           Append-only                                     Random write, bulk incremental
Read pattern            Full table scan, partition table scan           Random read, small range scan, or table scan
Hive (SQL) performance  Very good                                       4-5x slower
Structured storage      Do-it-yourself / TSV / SequenceFile / Avro / ?  Sparse column-family data model
Max data size           30 PB                                           ~1 PB


HBase vs RDBMS

                              RDBMS                            HBase
Data layout                   Row-oriented                     Column-family-oriented
Transactions                  Multi-row ACID                   Single row only
Query language                SQL                              get/put/scan/etc.
Security                      Authentication / Authorization   Work in progress
Indexes                       On arbitrary columns             Row-key only
Max data size                 TBs                              ~1 PB
Read/write throughput limits  1000s of queries/second          Millions of queries/second

When to use HBase


You need random write, random read or both (but not neither)

You need to do many thousands of operations per second on multiple TB of data

Your access patterns are well-known and simple


Conclusion
Summary

Unit 18: Working with Flume


Flume - How it works

Flume Complex Flow - Calculation/ Multiplexing


What is Flume?
Collection, Aggregation of streaming Event Data

Typically used for log data

Significant advantages over ad-hoc solutions

Reliable, Scalable, Manageable, Customizable and High Performance

Declarative, Dynamic Configuration

Contextual Routing

Feature rich

Fully extensible

Core Concepts: Event


An Event is the fundamental unit of data transported by Flume from its point of origination to its final
destination

Event is a byte array payload accompanied by optional headers

Payload is opaque to Flume

Headers are specified as an unordered collection of string key-value pairs, with keys being unique
across the collection

Headers can be used for contextual routing


Core Concepts: Client


An entity that generates events and sends them to one or more Agents

Example:

Flume log4j Appender

Custom Client using Client SDK (org.apache.flume.api)

Decouples Flume from the system where event data is consumed from

Not needed in all cases
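
As a rough sketch of the Client SDK path (the host, port, and header values below are illustrative; the agent is assumed to expose an Avro source):

import org.apache.flume.Event;
import org.apache.flume.api.RpcClient;
import org.apache.flume.api.RpcClientFactory;
import org.apache.flume.event.EventBuilder;

import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class FlumeClientExample {
    public static void main(String[] args) throws Exception {
        // Connect to an agent's Avro source (hypothetical host/port)
        RpcClient client = RpcClientFactory.getDefaultInstance("flume-agent.example.com", 41414);
        try {
            Map<String, String> headers = new HashMap<>();
            headers.put("priority", "high");  // headers enable contextual routing downstream

            // The payload is an opaque byte array as far as Flume is concerned
            Event event = EventBuilder.withBody(
                    "app started".getBytes(StandardCharsets.UTF_8), headers);
            client.append(event);             // hand the event off to the agent
        } finally {
            client.close();
        }
    }
}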

Core Concepts: Agent


A container for hosting Sources, Channels, Sinks and other components that enable the
transportation of events from one place to another

Fundamental part of a Flume flow

Provides Configuration, Life-Cycle Management, and Monitoring Support for hosted components


Typical Aggregation Flow

Core Concepts: Source


An active component that receives events from a specialized location or mechanism and places them on one or more Channels

Different Source types:

Specialized sources for integrating with well-known systems. Example: Syslog, Netcat

Auto-Generating Sources: Exec, SEQ

IPC sources for Agent-to-Agent communication: Avro

Require at least one channel to function


Core Concepts: Channel


A passive component that buffers the incoming events until they are drained by Sinks

Different Channels offer different levels of persistence:

Memory Channel: volatile

File Channel: backed by a write-ahead log (WAL) implementation

JDBC Channel: backed by embedded Database

Channels are fully transactional

Provide weak ordering guarantees

Can work with any number of Sources and Sinks

Core Concepts: Sink


An active component that removes events from a Channel and transmits them to their next hop
destination

Different types of Sinks:

Terminal sinks that deposit events to their final destination. For example: HDFS, HBase

Auto-Consuming sinks. For example: Null Sink

IPC sink for Agent-to-Agent communication: Avro

Require exactly one channel to function


Flow Reliability

Reliability based on:

Transactional Exchange between Agents

Persistence Characteristics of Channels in the Flow

Also Available:

Built-in Load balancing Support

Built-in Failover Support

Flow Reliability (2)


Flow Handling
Channels decouple the impedance of upstream and downstream stages

Upstream burstiness is damped by channels

Downstream failures are transparently absorbed by channels

Sizing of channel capacity is key in realizing these benefits

Configuration
Java Properties File Format

# Comment line
key1 = value
key2 = multi-line \
value

Hierarchical, Name Based Configuration

agent1.channels.myChannel.type = FILE
agent1.channels.myChannel.capacity = 1000

Uses soft references for establishing associations

agent1.sources.mySource.type = HTTP
agent1.sources.mySource.channels = myChannel


Configuration (2)
Global List of Enabled Components

agent1.sources = mySource1 mySource2


agent1.sinks = mySink1 mySink2
agent1.channels = myChannel
...
agent1.sources.mySource3.type = Avro
...

Custom Components get their own namespace

agent1.sources.mySource1.type = org.example.source.AtomSource
agent1.sources.mySource1.feed = http://feedserver.com/news.xml
agent1.sources.mySource1.cache-duration = 24
...

Configuration (3)
agent1.properties: Active Agent Components (Sources, Channels, Sinks)

# Active components
agent1.sources = src1
agent1.channels = ch1
agent1.sinks = sink1

Individual Component Configuration

# Define and configure src1


agent1.sources.src1.type = netcat
agent1.sources.src1.channels = ch1
agent1.sources.src1.bind = 127.0.0.1
agent1.sources.src1.port = 10112

# Define and configure sink1


agent1.sinks.sink1.type = logger
agent1.sinks.sink1.channel = ch1


# Define and configure ch1


agent1.channels.ch1.type = memory
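
Channel sizing (see Flow Handling above) is configured on the same component; the values below are illustrative:

agent1.channels.ch1.capacity = 10000
agent1.channels.ch1.transactionCapacity = 100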


Configuration (4)
A configuration file can contain configuration information for many Agents

Only the portion of configuration associated with the name of the Agent will be loaded

Components defined in the configuration but not in the active list will be ignored

Components that are misconfigured will be ignored

Agent automatically reloads configuration if it changes on disk

Configuration (5)


Contextual Routing
Achieved using Interceptors and Channel Selectors

Contextual Routing (2)


An Interceptor is a component applied to a Source, in a pre-specified order, to enable decorating and filtering of events where necessary

Built-in Interceptors allow adding headers such as timestamps, hostname, static markers, etc.

Custom interceptors can introspect event payload to create specific headers where necessary
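
As a sketch, built-in interceptors are attached to a source by name and applied in the listed order (the agent, source, and interceptor names below are illustrative):

agent1.sources.src1.interceptors = ts host
agent1.sources.src1.interceptors.ts.type = timestamp
agent1.sources.src1.interceptors.host.type = host
agent1.sources.src1.interceptors.host.hostHeader = hostname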


Contextual Routing (3)


Channel Selector allows a Source to select one or more Channels from all the Channels that the
Source is configured with based on preset criteria

Built-in Channel Selectors:

* Replicating: for duplicating the events


* Multiplexing: for routing based on headers

Channel Selector Configuration


Applied via Source configuration under namespace "selector"

# Active components
agent1.sources = src1
agent1.channels = ch1 ch2
agent1.sinks = sink1 sink2

# Configure src1
agent1.sources.src1.type = AVRO
agent1.sources.src1.channels = ch1 ch2
agent1.sources.src1.selector.type = multiplexing
agent1.sources.src1.selector.header = priority
agent1.sources.src1.selector.mapping.high = ch1
agent1.sources.src1.selector.mapping.low = ch2
agent1.sources.src1.selector.default = ch2
...


Contextual Routing
Terminal Sinks can directly use Headers to make destination selections

HDFS Sink can use header values to create a dynamic path for the files that events will be added to (see the sketch below)

Some headers such as timestamps can be used in a more sophisticated manner

Custom Channel Selector can be used for doing specialized routing where necessary
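
A sketch of an HDFS Sink whose path interpolates a header and the event timestamp (sink name, header name, and paths are illustrative; the timestamp escapes assume a timestamp header is present, e.g. from the timestamp interceptor):

agent1.sinks.hdfsSink.type = hdfs
agent1.sinks.hdfsSink.channel = ch1
agent1.sinks.hdfsSink.hdfs.path = hdfs://namenode/flume/%{priority}/%Y-%m-%d
agent1.sinks.hdfsSink.hdfs.fileType = DataStream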

Load Balancing and Failover


Sink Processor

A Sink Processor is responsible for invoking one sink from an assigned group of sinks

Built-in Sink Processors:

Load Balancing Sink Processor: uses RANDOM, ROUND_ROBIN, or a custom selection algorithm

Failover Sink Processor

Default Sink Processor


Sink Processor
Invoked by Sink Runner

Acts as a proxy for a Sink

Sink Processor Configuration


Applied via "Sink Groups"

Sink Groups declared at global level:

# Active components
agent1.sources = src1
agent1.channels = ch1
agent1.sinks = sink1 sink2 sink3
agent1.sinkgroups = foGroup

# Configure foGroup
agent1.sinkgroups.foGroup.sinks = sink1 sink3
agent1.sinkgroups.foGroup.processor.type = failover
agent1.sinkgroups.foGroup.processor.priority.sink1 = 5
agent1.sinkgroups.foGroup.processor.priority.sink3 = 10
agent1.sinkgroups.foGroup.processor.maxpenalty = 1000
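
For comparison, a sketch of a load-balancing group (the group name and property values are illustrative; it would also need to be listed in agent1.sinkgroups):

agent1.sinkgroups.lbGroup.sinks = sink1 sink2
agent1.sinkgroups.lbGroup.processor.type = load_balance
agent1.sinkgroups.lbGroup.processor.selector = round_robin
agent1.sinkgroups.lbGroup.processor.backoff = true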


Sink Processor Configuration (2)


A Sink can exist in at most one group

A Sink that is not in any group is handled via Default Sink Processor

Caution:

Removing a Sink Group does not make the sinks inactive!

Summary
Clients send Events to Agents

Agents host a number of Flume components: Sources, Interceptors, Channel Selectors, Channels, Sink Processors, and Sinks

Sources and Sinks are active components, whereas Channels are passive

Source accepts Events, passes them through Interceptor(s), and if not filtered, puts them on
channel(s) selected by the configured Channel Selector

A Sink Processor identifies a sink to invoke, which takes Events from a Channel and sends them to their next-hop destination

Channel operations are transactional to guarantee one-hop delivery semantics

Channel persistence allows for ensuring end-to-end reliability


Conclusion
Summary

Unit 19: Cassandra


What is Apache Cassandra?

How does Cassandra work

Installing Cassandra

Moving data to or from other databases

COPY command

Cassandra bulk loader (sstableloader)

OpsCenter

Querying Cassandra

Using DevCenter and cqlsh


What is Cassandra?
Open-source database management system (DBMS)

Several key features of Cassandra differentiate it from other similar systems

History of Cassandra
Cassandra was created to power the Facebook Inbox Search

Facebook open-sourced Cassandra in 2008, and it became an Apache Incubator project

In 2010, Cassandra graduated to a top-level project; regular updates and releases followed


Motivation and Function


Designed to handle large amounts of data across multiple servers

There is a lot of unorganized data out there

Easy to implement and deploy

Mimics traditional relational database systems, but with triggers and lightweight transactions

Raw, simple data structures

General Design Features


Emphasis on performance over analysis

Still has support for analysis tools such as Hadoop

Organization

Rows are organized into tables

First component of a table's primary key is the partition key

Rows are clustered by the remaining columns of the key

Columns may be indexed separately from the primary key

Tables may be created, dropped, altered at runtime without blocking queries

Language

CQL (Cassandra Query Language) introduced, similar to SQL (flattened learning curve)


Peer to Peer Cluster


Decentralized design

Each node has the same role

No single point of failure

Avoids issues of master-slave DBMS's

No bottlenecking

Fault Tolerance/Durability
Failures happen all the time with multiple nodes

Hardware Failure

Bugs

Operator error

Power Outage, etc.

Solution: Buy cheap, redundant hardware, replicate, maintain consistency


Fault Tolerance/Durability (2)


Replication

Data is automatically replicated to multiple nodes

Allows failed nodes to be immediately replaced

Distribution of data to multiple data centers

An entire data center can go down without data loss occurring

Performance
Core architectural designs allow Cassandra to outperform its competitors

Very good read and write throughputs

Consistently ranks as the fastest among comparable NoSQL DBMSs with large data sets


Scalability
Read and write throughput increase linearly as more machines are added

Comparisons
Apache Cassandra
  Storage Type: Column
  Best Use: Write often, read less
  Concurrency Control: MVCC
  Characteristics: High Availability, Partition Tolerance, Persistence

Google BigTable
  Storage Type: Column
  Best Use: Designed for large scalability
  Concurrency Control: Locks
  Characteristics: Consistency, High Availability, Partition Tolerance, Persistence

Amazon DynamoDB
  Storage Type: Key-Value
  Best Use: Large database solution
  Concurrency Control: ACID
  Characteristics: Consistency, High Availability

Key Point – Cassandra offers a healthy cross between BigTable and Dynamo


Key-Value Model
Cassandra is a column oriented NoSQL system

Column families: sets of key-value pairs

Think of a column family as a table and key-value pairs as a row (using a relational database analogy)

A row is a collection of columns labeled with a name

Cassandra Row


Cassandra Row (2)


The value of a row is itself a sequence of key-value pairs

Such nested key-value pairs are columns

key = column name

A row must contain at least 1 column

Example of Columns


Column names storing values


key: User ID

column names store tweet ID values

values of all column names are set to "-" (empty byte array) as they are not used

Key Space
A Key Space is a group of column families together

It is only a logical grouping of column families and provides an isolated scope for names


Comparing Cassandra (C*) and RDBMS


With RDBMS, a normalized data model is created without considering the exact queries

SQL can return almost anything through Joins

With C*, the data model is designed for specific queries

the schema is adjusted as new queries are introduced

C*: NO joins, relationships, or foreign keys

a separate table is leveraged per query

data required by multiple tables is denormalized across those tables

Cassandra Query Language - CQL


creating a keyspace - namespace of tables

CREATE KEYSPACE demo
WITH replication = {'class': 'SimpleStrategy',
                    'replication_factor': 3};

To use namespace:

USE demo;


Cassandra Query Language - CQL (2)


Creating tables:

CREATE TABLE users (
  email varchar,
  bio varchar,
  birthday timestamp,
  active boolean,
  PRIMARY KEY (email));

CREATE TABLE tweets (
  email varchar,
  time_posted timestamp,
  tweet varchar,
  PRIMARY KEY (email, time_posted));
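
Following the query-driven modeling described earlier, the tweets table above serves "tweets by user"; a different query would typically get its own denormalized table. A sketch with illustrative names:

CREATE TABLE tweets_by_tag (
  tag varchar,
  time_posted timestamp,
  email varchar,
  tweet varchar,
  PRIMARY KEY (tag, time_posted));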

Cassandra Query Language - CQL (3)


Inserting data:

INSERT INTO users (email, bio, birthday, active)
VALUES ('john.doe@example.com', 'Example Teammate', 516513600000, true);

timestamp fields are specified in milliseconds since epoch


Cassandra Query Language - CQL (4)


Querying tables

A SELECT expression reads one or more records from a Cassandra column family and returns a result set of rows

SELECT * FROM users;


SELECT email FROM users WHERE active = true;
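
Note that filtering on a non-key column such as active generally requires a secondary index (or ALLOW FILTERING); a sketch:

CREATE INDEX ON users (active);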

Cassandra Architecture Overview


Cassandra was designed with the understanding that system/hardware failures can and do occur

Peer-to-peer, distributed system

All nodes are the same

Data partitioned among all nodes in the cluster

Custom data replication to ensure fault tolerance

Read/Write-anywhere design


Google BigTable & Amazon Dynamo


Google BigTable - data model

Column Families

Memtables

SSTables

Amazon Dynamo - distributed systems technologies

Consistent hashing

Partitioning

Replication

One-hop routing

Transparent Elasticity
Nodes can be added and removed from Cassandra online, with no downtime being experienced


Transparent Scalability
Adding Cassandra nodes increases performance linearly, as well as the ability to manage TBs to PBs of data

High Availability
Cassandra, with its peer-to-peer architecture, has no single point of failure


Multi-Geography / Zone Aware


Cassandra allows a single logical database to span 1-N datacenters that are geographically
dispersed

Also supports a hybrid on-premise / Cloud implementation

Data Redundancy
Cassandra allows for customizable data redundancy so that data is completely protected

Also supports rack awareness (data can be replicated between different racks to guard against
machine/rack failures).

Uses ZooKeeper to choose a leader, which tells nodes the range they are replicas for


Partitioning
Nodes are logically structured in Ring Topology

Hashed value of key associated with data partition is used to assign it to a node in the ring

Hashing wraps around after a certain value to support the ring structure

Lightly loaded nodes move position to alleviate highly loaded nodes

Partitioning & Replication


Gossip Protocols
Used to discover location and state information about the other nodes participating in a Cassandra
cluster

Network communication protocol inspired by real-life rumor spreading

Periodic, Pairwise, inter-node communication

Low frequency communication ensures low cost

Random selection of peers

Gossip Protocols example


Example: Node A wishes to search for a pattern in the data

Round 1: Node A searches locally and then gossips with Node B

Round 2: Nodes A and B gossip with Nodes C and D

Round 3: Nodes A, B, C, and D gossip with 4 other nodes

Round by round doubling makes protocol very robust


Failure Detection
Gossip process tracks heartbeats from other nodes both directly and indirectly

Node Fail state is given by variable Φ

Φ indicates how likely it is that a node has failed (a suspicion level) instead of a simple binary value (up/down)

This type of system is known as Accrual Failure Detector

Takes into account network conditions, workload, or other conditions that might affect perceived
heartbeat rate

A threshold for Φ is used to decide if a node is dead

If the node is up, Φ will stay around a constant value set by the application

Generally, Φ(t) = 0

Write Operation Stages


Logging data in the commit log

Writing data to the memtable

Flushing data from the memtable

Storing data on disk in SSTables


Write Operations
Commit Log

First place a write is recorded

Crash recovery mechanism

Write not successful until recorded in commit log

Once recorded in commit log, data is written to Memtable

Memtable

Data structure in memory

Once memtable size reaches a threshold, it is flushed (appended) to SSTable

Several may exist at once (1 current, any others waiting to be flushed)

First place read operations look for data

SSTable

Kept on disk

Immutable once written

Periodically compacted for performance

Write Operations (2)


Consistency
Read Consistency

Number of nodes that must agree before read request returns

ONE to ALL

Write Consistency

Number of nodes that must be updated before a write is considered successful

ANY to ALL

At ANY, a hinted handoff is all that is needed to return

QUORUM

Commonly used middle-ground consistency level

Defined as (replication_factor / 2) + 1, rounded down (e.g. with replication_factor = 3, QUORUM = 2)

Write Consistency (ONE)

INSERT INTO table (column1, ...) VALUES (value1, ...) USING CONSISTENCY ONE


Write Consistency (QUORUM)

INSERT INTO table (column1, ...) VALUES (value1, ...) USING CONSISTENCY QUORUM

Write Operations: Hinted Handoff


Write intended for a node that's offline

An online node, processing the request, makes a note to carry out the write once the node comes
back online


Hinted Handoff

INSERT INTO table (column1, ...) VALUES (value1, ...) USING CONSISTENCY ANY

Note: A hinted handoff does not count toward the consistency level (except at ANY)

Delete Operations
Tombstones

On delete request, records are marked for deletion

Similar to 'Recycle Bin'

Data is actually deleted on major compaction or configurable timer


Compaction
Compaction runs periodically to merge multiple SSTables

Reclaims space

Creates new index

Merges keys

Combines columns

Discards tombstones

Improves performance by minimizing disk seeks

Two types

Major

Read-only

Compaction (2)


Compaction (3)

Anti-Entropy
Ensures synchronization of data across nodes

Compares data checksums against neighboring nodes

Uses Merkle trees (hash trees)

Snapshot of data sent to neighboring nodes

Created and broadcasted on every major compaction

If two nodes take snapshots within TREE_STORE_TIMEOUT of each other, snapshots are
compared and data is synced


Merkle Tree

Read Operations
Read Repair

On read, nodes are queried until the number of nodes responding with the most recent value meets the specified consistency level (from ONE to ALL)

If the consistency level is not met, nodes are updated with the most recent value which is then
returned

If the consistency level is met, the value is returned and any nodes that reported old values
are then updated


Read Repair

SELECT * FROM table USING CONSISTENCY ONE

Read Operations: Bloom Filters


Bloom filters provide a fast way of checking if a value is not in a set


Read

Cassandra Advantages
Perfect for time-series data

High performance

Decentralization

Nearly linear scalability

Replication support

No single points of failure

MapReduce support


Cassandra Weaknesses
No referential integrity

No concept of JOIN

Querying options for retrieving data are limited

Sorting data is a design decision

No GROUP BY

No support for atomic operations

If an operation fails, changes can still occur

First think about queries, then about data model

Cassandra: Points to Consider


Cassandra is designed as a distributed database management system

Use it when you have a lot of data spread across multiple servers

Cassandra write performance is always excellent, but read performance depends on write patterns

It is important to spend enough time to design proper schema around the query pattern

Having a high-level understanding of some internals is a plus

It helps ensure a strong design for applications built atop Cassandra


Conclusion
Summary
