include people following each other on Twitter, status and location updates for friends, customized news feeds, and so on. While these instances share a common problem formulation, the specifics of scale, update rates, skew, etc. vary widely, and a good optimization framework is required to build a robust platform.

We can think of the network as a relation
ConnectionNetwork(Producer, Consumer). Typically, the connection network is stored explicitly in a form that supports efficient lookup by producer, by consumer, or by both. For example, to push an event for producer $p_j$ to interested consumers, we must look up $p_j$ in the connection network and retrieve the set of consumers following that producer, which is $\{c_i : f_{ij} \in F\}$. In contrast, to pull events for a consumer $c_i$, we need to look up $c_i$ in the network and retrieve the set of producers for that consumer, which is $\{p_j : f_{ij} \in F\}$. In this latter case, we may actually define the relation as ConnectionNetwork(Consumer, Producer) to support clustering by Consumer. If we want to support both access paths (via producer and via consumer), we must build an index in addition to the ConnectionNetwork relation.

Each producer $p_j$ generates a stream of events, which we can model as a producer events relation $PE_j$(EventID, Timestamp, Payload) (i.e., there is one relation per producer). When we want to show a user his feed, we must execute a feed query over the $PE_j$ relations. There are two possibilities for the feed query. The first is that the consumer wants to see the most recent $k$ events across all of the producers he follows. We call this option global coherency and define the feed query as:
Q1.
$$\sigma_{(k\ \text{most recent events})}\left(\bigcup_{\forall j:\, f_{ij} \in F} PE_j\right)$$
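As a rough illustration of Q1, the sketch below assumes each $PE_j$ is held in memory as a Python list of (timestamp, event_id, payload) tuples sorted newest first, and that the consumer's followed producers $\{p_j : f_{ij} \in F\}$ have already been looked up. The names `global_feed` and `SAMPLE_PE` and the small integer timestamps are hypothetical, chosen to mirror the Alice/Bob/Chad feeds used later in this section. The selection is done with a lazy k-way merge of the per-producer streams, stopping after $k$ events.

```python
import heapq
import itertools
from typing import Dict, Iterable, List, Tuple

# Hypothetical in-memory stand-in for one producer events relation PE_j:
# (timestamp, event_id, payload) tuples, kept sorted newest first.
Event = Tuple[int, str, str]

# Illustrative data mirroring the Alice/Bob/Chad example feeds
# (timestamps are arbitrary small integers, not real clock times).
SAMPLE_PE: Dict[str, List[Event]] = {
    "alice": [(6, "e6", "Alice is at work"), (5, "e5", "Alice is driving"),
              (4, "e4", "Alice had lunch"), (2, "e2", "Alice is hungry"),
              (0, "e0", "Alice is awake")],
    "bob": [(1, "e1", "Bob is at work")],
    "chad": [(3, "e3", "Chad is tired")],
}


def global_feed(followed: Iterable[str],
                producer_events: Dict[str, List[Event]],
                k: int) -> List[Event]:
    """Q1 (global coherency): the k most recent events in the union of
    PE_j over every producer j that the consumer follows."""
    streams = [producer_events.get(p, []) for p in followed]
    # Each stream is already sorted by descending timestamp, so a lazy
    # k-way merge with reverse=True yields events newest first overall.
    merged = heapq.merge(*streams, key=lambda e: e[0], reverse=True)
    # Stop after k events instead of materializing the whole union.
    return list(itertools.islice(merged, k))


if __name__ == "__main__":
    for ts, event_id, text in global_feed(["alice", "bob", "chad"], SAMPLE_PE, 5):
        print(event_id, text)
```

With $k = 5$ this returns e6, e5, e4, e3, e2, the same selection as Feed 2 in Section 2.2: Alice's frequent events push Bob's only event out of the result.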
A second possibility is that we want to retrieve $k$ events per producer, to help ensure diversity of producers in the consumer's view. We call this option per-producer coherency and define the feed query as:
Q2.
$$\bigcup_{\forall j:\, f_{ij} \in F} \sigma_{(k\ \text{most recent events})}\left(PE_j\right)$$
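A comparable sketch of Q2, under the same assumptions as before (each $PE_j$ as an in-memory list of (timestamp, event_id, payload) tuples; `per_producer_feed` and `SAMPLE_PE` are hypothetical names). Here the selection $\sigma$ is applied to each $PE_j$ separately, using `heapq.nlargest` so the input lists need not be pre-sorted, and the union is kept grouped by producer. The further step that narrows this result down to $k$ events is deliberately left out, since the text defers it to the next section.

```python
import heapq
from typing import Dict, Iterable, List, Tuple

# (timestamp, event_id, payload); a hypothetical stand-in for a row of PE_j.
Event = Tuple[int, str, str]

# Illustrative data mirroring the Alice/Bob/Chad example feeds.
SAMPLE_PE: Dict[str, List[Event]] = {
    "alice": [(6, "e6", "Alice is at work"), (5, "e5", "Alice is driving"),
              (4, "e4", "Alice had lunch"), (2, "e2", "Alice is hungry"),
              (0, "e0", "Alice is awake")],
    "bob": [(1, "e1", "Bob is at work")],
    "chad": [(3, "e3", "Chad is tired")],
}


def per_producer_feed(followed: Iterable[str],
                      producer_events: Dict[str, List[Event]],
                      k: int) -> Dict[str, List[Event]]:
    """Q2 (per-producer coherency): for each followed producer j, the k most
    recent events of PE_j. The union of these selections is returned grouped
    by producer; at most k events come from any single producer."""
    return {
        # nlargest returns up to k events, newest first, without
        # requiring the producer's list to be sorted beforehand.
        p: heapq.nlargest(k, producer_events.get(p, []), key=lambda e: e[0])
        for p in followed
    }


if __name__ == "__main__":
    for producer, events in per_producer_feed(["alice", "bob", "chad"], SAMPLE_PE, 2).items():
        print(producer, [event_id for _, event_id, _ in events])
```

With $k = 2$, Alice contributes only e6 and e5, so Bob's e1 and Chad's e3 remain in the candidate set; this cap on each producer's share is what makes per-producer coherency preserve diversity.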
Further processing is then needed to narrow down the per-producer result to a set of $k$ events, as described in the next section. We next examine the question of when we might prefer global or per-producer coherency.
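Returning to the ConnectionNetwork relation described at the start of this section, the sketch below shows one way to support both access paths in memory. It assumes the relation is a set of follow pairs $f_{ij}$ (consumer $c_i$ follows producer $p_j$), with a primary structure clustered by producer for the push path and a secondary index keyed by consumer for the pull path. The class and method names are hypothetical; a real system would keep these structures in a persistent store rather than Python dictionaries.

```python
from collections import defaultdict
from typing import DefaultDict, Set


class ConnectionNetwork:
    """Hypothetical in-memory model of ConnectionNetwork(Producer, Consumer)
    that supports lookup by producer (push) and by consumer (pull)."""

    def __init__(self) -> None:
        # Primary organization, clustered by producer: p_j -> {c_i : f_ij in F}
        self._by_producer: DefaultDict[str, Set[str]] = defaultdict(set)
        # Secondary index by consumer: c_i -> {p_j : f_ij in F}
        self._by_consumer: DefaultDict[str, Set[str]] = defaultdict(set)

    def follow(self, consumer: str, producer: str) -> None:
        """Record f_ij; both access paths must be updated together."""
        self._by_producer[producer].add(consumer)
        self._by_consumer[consumer].add(producer)

    def unfollow(self, consumer: str, producer: str) -> None:
        """Remove f_ij from both access paths (no error if it is absent)."""
        self._by_producer[producer].discard(consumer)
        self._by_consumer[consumer].discard(producer)

    def consumers_of(self, producer: str) -> Set[str]:
        """Push path: consumers that should receive an event from p_j."""
        # .get avoids inserting empty entries into the defaultdict.
        return set(self._by_producer.get(producer, ()))

    def producers_of(self, consumer: str) -> Set[str]:
        """Pull path: producers whose events make up c_i's feed."""
        return set(self._by_consumer.get(consumer, ()))


if __name__ == "__main__":
    net = ConnectionNetwork()
    net.follow("c1", "alice")
    net.follow("c1", "bob")
    net.follow("c2", "alice")
    print(sorted(net.consumers_of("alice")), sorted(net.producers_of("c1")))
```

The design choice mirrors the trade-off in the text: keeping the second access path makes both push and pull lookups cheap, at the price of writing every follow or unfollow twice.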
2.2 Consumer feeds
We now consider the properties of consumer feeds. A feed query is executed whenever the consumer logs on or refreshes their page. We may also automatically retrieve a consumer's updated feed, perhaps using Ajax, Flash or some other technology. The feed itself is a display of an ordered collection of events from one or more of the producers followed by the user. A feed typically shows only the $N$ most recent events, although a user can usually request earlier events (e.g., by clicking "next"). We identify several properties which capture users' expectations for their feed:
• Time-ordered: Events in the feed are displayed in timestamp order, such that for any two events $e_1$ and $e_2$, if $\mathrm{Timestamp}(e_1) < \mathrm{Timestamp}(e_2)$, then $e_1$ precedes $e_2$ in the feed¹.
¹ Note that many sites show recent events at the top of the
• Gapless: Events from a particular producer are displayed without gaps, i.e., if there are two events $e_1$ and $e_2$ from producer $p_j$, $e_1$ precedes $e_2$ in the feed, and there is no event from $p_j$ in the feed which succeeds $e_1$ but precedes $e_2$, then there is no event in $PE_j$ with a timestamp greater than that of $e_1$ but less than that of $e_2$.
• No duplicates: No event $e_i$ appears twice in the feed.

When a user retrieves their feed twice, they have expectations about how the feed changes between the first and second retrieval. In particular, if they have seen some events in a particular order, they usually expect to see those events again. Consider for example a feed that contains $N = 5$ events and includes these events when retrieved at 2:00 pm:
Feed 1
Event   Time   Producer   Text
$e_4$   1:59   Alice      Alice had lunch
$e_3$   1:58   Chad       Chad is tired
$e_2$   1:57   Alice      Alice is hungry
$e_1$   1:56   Bob        Bob is at work
$e_0$   1:55   Alice      Alice is awake

At 2:02 pm, the user might refresh their feed page, causing a new version of the feed to be retrieved. Imagine that in this time two new events have been generated by Alice:
Feed 2
Event   Time   Producer   Text
$e_6$   2:01   Alice      Alice is at work
$e_5$   2:00   Alice      Alice is driving
$e_4$   1:59   Alice      Alice had lunch
$e_3$   1:58   Chad       Chad is tired
$e_2$   1:57   Alice      Alice is hungry

In this example, the two new Alice events caused the two oldest events ($e_0$ and $e_1$) to disappear, and the global ordering of all events across the user's producers is preserved. This is the global coherency property: the sequence of events in the feed matches the underlying timestamp order of all events from the user's producers, and event orders are not shuffled from one view of the feed to the next. This model is familiar from email readers that show emails in time order, and is used in follows applications like Twitter.

In some cases, however, global coherency is not desirable. Consider the previous example: in Feed 2, there are many Alice events and no Bob events. This lack of diversity results when some producers temporarily or persistently have higher event rates than other producers. To preserve diversity, we may prefer per-producer coherency: the ordering of events from a given producer is preserved, but no guarantees are made about the relative ordering of events between producers. Consider the above example again. When viewing the feed at 2:02 pm, the user might see:
Feed 2'
Event   Time   Producer   Text
$e_6$   2:01   Alice      Alice is at work
$e_5$   2:00   Alice      Alice is driving
$e_4$   1:59   Alice      Alice had lunch
$e_3$   1:58   Chad       Chad is tired
$e_1$   1:56   Bob        Bob is at work

This feed preserves diversity, because the additional Alice events did not cause the Bob event to disappear. How-
page, so "preceded" in the feed means "below" when the feed is actually displayed.