
BSD Kqueue – A generic and scalable event notification facility

- Narasimha Datta
Contents

Introduction

Limitations of the current model

Problems with poll/select

Kqueue Design Goals

Kqueue API
Introduction

Applications are frequently event-driven, i.e., most
of their work involves responding to external events

Performance of an application thus depends on
efficient and scalable event processing

Currently FreeBSD provides two system calls for
event delivery, namely select and poll
Limitations of the current model

Select and poll have performance limitations as the number
of monitored file descriptors grows large

Aside from events occurring on open file descriptors, an
application may be interested in other kinds of events such
as:

Completion of an asynchronous I/O request

Delivery of a signal

Changes to a file in the file system

Exiting of a process

None of these other event notification mechanisms is
efficiently implemented

Having multiple programming interfaces adds complexity
Problems with poll/select

Poll and select require an application to pass in the
entire list of descriptors to be monitored for every
call.

This requires two memory copies:

On call, the kernel copies the requested fd list from user
memory to kernel memory.

On return, the kernel copies the active fd list from kernel
memory to user memory.

For large fd lists where only a few of the fds are
active at any given time, most of the memory copies
are unnecessary.

Multiple passes over the descriptor list required:

Kernel makes one pass to look for pending events

If poll/select sleep, then on wakeup, the kernel makes another pass
to record the active descriptors

User application makes one pass to determine the active
descriptors.

Each pass is an O(N) activity, where N is the number of
descriptors – this doesn't scale well.

Kernel needs to allocate memory for large lists.

The central problem is that poll/select are stateless by
design, i.e., the kernel does not remember the set of
monitored descriptors between calls.
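
The per-call cost described above can be seen in a minimal C sketch. The function name poll_demo and the constant NPIPES are illustrative assumptions, not part of the original slides; pipes stand in for whatever descriptors an application would monitor.

```c
#include <poll.h>
#include <unistd.h>

#define NPIPES 4   /* illustrative size; real servers watch thousands */

/* Hypothetical demo: returns how many of NPIPES pipes poll() reports as
 * readable, or -1 on error. Only one pipe is given data. */
int poll_demo(void)
{
    int pipes[NPIPES][2];
    struct pollfd fds[NPIPES];
    int ready, readable = 0;

    for (int i = 0; i < NPIPES; i++) {
        if (pipe(pipes[i]) == -1)
            return -1;
        fds[i].fd = pipes[i][0];
        fds[i].events = POLLIN;
        fds[i].revents = 0;
    }
    if (write(pipes[2][1], "x", 1) != 1)
        return -1;

    /* The full array is handed to the kernel on every call, because the
     * kernel keeps no record of it between calls. */
    ready = poll(fds, NPIPES, 0);

    /* The caller then scans all N entries to find the active ones. */
    for (int i = 0; ready > 0 && i < NPIPES; i++)
        if (fds[i].revents & POLLIN)
            readable++;

    for (int i = 0; i < NPIPES; i++) {
        close(pipes[i][0]);
        close(pipes[i][1]);
    }
    return ready < 0 ? -1 : readable;
}
```

Calling poll_demo() from a main function reports one readable descriptor, yet both the kernel and the caller had to walk all four entries to find it.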
Kqueue Design Goals

The facility should be efficient and scalable to a large
number of descriptors, on the order of several thousand.

The system should be flexible and capable of handling
several types of events.

The programming interface should be simple, easy to
understand and allow existing poll/select-based applications
to be easily ported to the new interface.

Additional information for each event should be provided as
far as possible.
 For instance, a read event on a socket should indicate how many
bytes can be read by the application without blocking.

The mechanism should be reliable and should not silently
fail or return an inconsistent state to the application.
 For example, memory allocation should be done at the time of the
system call rather than at the time of event occurrence.
 Multiple occurrences of an event (for example, several packets
arriving on one socket) are coalesced into a single event, so that no
notification is lost.

Events are considered to be “level-triggered” rather than
“edge-triggered”.
 For example, if the application does not consume all data in a
socket buffer, then the event is reported again on the next call for
the data that remains.

The API should be correct; in other words, events should be
delivered only when applicable.

For instance, if the application closes a socket before any pending
data in the socket buffer is delivered, then the event should be
discarded.
 Similarly if the application registers a socket read event after some
data has arrived in the socket buffer, then the event must be
delivered (level-triggering).

The system should provide multiple event notification
channels per process.
 Enables library code to use the API without fear of conflicting
with the linking program.
 This facility is not available with signal handlers and X-Windows
event loops.
Kqueue API

Two new system calls are introduced, kqueue and
kevent.

int kqueue(void)

Creates a new event queue where the application registers for
and retrieves event notifications.

Return value is a regular descriptor that can be passed to poll,
select or registered in another kqueue.

In the latter case, an event is delivered on the kqueue
descriptor if an event is ready to be returned by the original
kqueue.

This also enables the application to construct a hierarchy of
kqueues, i.e., determine which kqueue will be processed first.

struct kevent {
    uintptr_t ident;   // identifier for event
    short     filter;  // filter for event
    u_short   flags;   // action flags for kqueue
    u_int     fflags;  // filter flag value
    intptr_t  data;    // filter data value
    void     *udata;   // opaque identifier
};

int kevent(int kq, const struct kevent *changelist, int nchanges,
           struct kevent *eventlist, int nevents,
           const struct timespec *timeout);

EV_SET(&kev, ident, filter, flags, fflags, data, udata)

The kevent call is used by the application to register
for new events (using the changelist) and retrieve
event notifications (from the eventlist).

This eliminates having two separate system calls.

The kevent call returns the number of events actually
delivered.

The timeout parameter plays the same role as the timeout
in poll, but is passed as a pointer to a struct timespec.

A NULL pointer indicates that the call should block until
an event is ready.

A zero-valued structure directs the call to check for
pending events and return immediately.

The flags member is a bitmask built from the
following values, which may be combined with
bitwise OR: EV_ADD, EV_DELETE, EV_ENABLE,
EV_DISABLE, EV_CLEAR, EV_ONESHOT (input flags)
and EV_EOF, EV_ERROR (output flags).

The ident member is typically a file descriptor,
process identifier or signal number, depending on
the filter.

The filter member can be one of EVFILT_READ,
EVFILT_WRITE, EVFILT_AIO,
EVFILT_VNODE, EVFILT_PROC,
EVFILT_SIGNAL or EVFILT_TIMER.

For EVFILT_READ and EVFILT_WRITE, the data
member of the kevent structure contains the number
of bytes that can be read or written without blocking;
EV_EOF is set in flags once the other end has closed
the connection, irrespective of whether data is still
available to read.

For EVFILT_VNODE, the fflags member is a set of
bit flags indicating which actions have occurred on a
file or directory; any combination of NOTE_DELETE,
NOTE_WRITE, NOTE_EXTEND, NOTE_ATTRIB,
NOTE_LINK and NOTE_RENAME may be set.

If the filter is EVFILT_PROC, then the fflags
member can be NOTE_EXIT, NOTE_FORK,
NOTE_EXEC, NOTE_TRACK (input/output flags)
or NOTE_CHILD, NOTE_TRACKERR (output
flags).

The NOTE_TRACK flag is used to follow a process
across fork calls; the parent process returns with
NOTE_TRACK set in the fflags field whereas the
child process returns with NOTE_CHILD set in the
fflags field and the parent PID in the data field.
