Extract Random Documents in MongoDB While Preserving Pagination

Because sometimes you need to grab a random document from your database

Ivano Di Gese
May 25 · 5 min read

Photo by Patrick Fore on Unsplash

It may sound weird. It may sound extreme. It may sound unconventional. But extracting random documents from a MongoDB collection is actually a common requirement that sooner or later every programmer has to implement.

First of All: Use Cases and Real Needs

After all, you really do need to be able to randomize document extraction.

Imagine a use case or an app where you have to randomize the information you show, even with a basic algorithm and not necessarily in a purely random way. Apps show random or pseudorandom data more often than you think, to cycle through content and make it always appear fresh.

Instagram itself uses an approach not much different from this to select the pictures shown in its Explore section.

These pictures are based on content you usually enjoy, there's no doubt about it, but the images you see must be rotated, shuffled, and mixed in a certain way.
So even apart from the algorithm or the logic you incorporate into this technique, your goal is often to pick random (and genuine, of course) documents from your collection and send them to your client application, which in this case is the mobile app.

Here are some scenarios that may come up:

 Showing popular posts
 Extracting random news from a specific category
 Outputting sample data
 Applying randomness to meet specific goals, for example giving your content the same impression rate and distribution
 Generating random content to randomize a behaviour, maybe for testing purposes, like input data for unit tests

The problem of pagination

Simple and very frustrating: when you pull out the first 50 documents, you want to be able to paginate the data, using $skip and $limit correctly, showing coherent data in every subsequent query and avoiding duplicate content.
If the extraction were purely random, you could obviously get the same document in separate queries, because each skipped query doesn't care about what happened before or what was already sent in the output.
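The duplicate-content bug is easy to see in a plain-JS simulation (no MongoDB needed); the 10-document collection and the page size here are invented for the demo:

```javascript
// Each "page" re-shuffles the collection independently, exactly like
// re-running a purely random query: documents can repeat across pages.
const docs = Array.from({ length: 10 }, (_, i) => ({ _id: i }));

function randomPage(pageSize, pageNumber) {
  // Fresh shuffle on every request: the second page knows nothing
  // about what the first page already returned.
  const shuffled = [...docs].sort(() => Math.random() - 0.5);
  return shuffled.slice(pageNumber * pageSize, (pageNumber + 1) * pageSize);
}

const page1 = randomPage(5, 0).map((d) => d._id);
const page2 = randomPage(5, 1).map((d) => d._id);
// page1 and page2 may overlap, because the second shuffle is
// completely unrelated to the first one.
```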

The first approach would be to remember (or cache) the content emitted in every previous output. But that's bad: there's no way to scale this approach up, and we don't want to store useless data alongside our x-billion-document collection. We want to do better. So how can we?

Seeding the query

If the output could be random but tied to a seed, a value the client keeps throughout the pagination, we'd have it. If we could randomly extract data, perhaps ordering by a specific input, we'd just need to tell the client to use the same input for every request after the first. But how?

This approach actually exists in SQL. The popular ORDER BY RANDOM() can be seeded by passing a specific value inside the function's parentheses (MySQL's RAND(123), for instance). So a query like this …

SELECT * FROM theCollection ORDER BY RANDOM(123)

… would do the trick in some RDBMSs. So what about MongoDB?

MongoDB doesn't provide any specific operator or function to randomize access to a collection other than the $sample operator.

The ‘$sample’ operator: not a solution

The $sample operator can be used inside any aggregation pipeline, as it exists as a pipeline stage.

It's a very straightforward operator: a versatile, easy-to-use way to apply randomness to an aggregation pipeline. But it's also very limited and basic. Here's how it works:

db.theCollection.aggregate(
  [ { $sample: { size: 10 } } ]
)

$sample uses a pseudorandom cursor to select documents: it basically just positions the cursor at a random spot to extract pseudorandom data. Very easy, very low complexity.

That's why it's not the solution to our problem: it can't take previous queries into account, and it provides no way to repeat the query after the first extraction while avoiding duplicate documents.

The correct way

To maintain logic that's incorporated into the query's matching criteria, the only way to meet our goal is to customize the extraction with an algorithm of our own.
So instead of using the $sample operator, we come back to the seeding approach: pass a parameter as input to our query and make it work as the seed from which the randomness of the extraction is determined.

 The client gives us a seed
 The query uses the seed to shuffle, randomize, and then project a new field
 The query then orders the results by this new value
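The three steps above can be sketched as an aggregation pipeline builder. The field name (`counter`), the hashing formula, and the prime modulus are illustrative assumptions, not a prescribed recipe:

```javascript
// Build a seeded, paginatable aggregation pipeline.
function seededPipeline(seed, pageNumber, pageSize) {
  return [
    // 1. Project a deterministic pseudorandom rank from the seed and
    //    a numeric field of the document (assumed here: "counter").
    { $addFields: { _rank: { $mod: [{ $multiply: ["$counter", seed] }, 9973] } } },
    // 2. Order by the rank (same seed => same order on every request),
    //    with _id as a tiebreaker so the ordering is total.
    { $sort: { _rank: 1, _id: 1 } },
    // 3. Plain pagination on the now-stable ordering.
    { $skip: pageNumber * pageSize },
    { $limit: pageSize },
  ];
}

// Usage against a real collection:
// db.theCollection.aggregate(seededPipeline(123, 0, 50));
```

Because the rank depends only on the seed and the document itself, $skip and $limit behave exactly as they do on any other sorted query.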

The real problem is how to relate the seed to our documents' field values. Consider these hints and suggestions:

 If the seed is a number, it can be used in a mathematical formula, which gives us good distribution and cardinality
 If the seed is a string, you could manipulate its value or somehow relate it to one of your fields
 If the seed is a date, you could calculate the time difference from a timestamp, or something like that, to randomize the new value

It's all up to you, and it depends heavily on the collection structure and the data modeling of your database.
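One way to keep the pipeline uniform across all three seed types is to normalize the seed to a number first. The function below is an illustrative assumption, not a fixed recipe:

```javascript
// Turn a numeric, string, or Date seed into a plain number that can
// feed a formula like the ones described above.
function numberFromSeed(seed) {
  if (typeof seed === "number") return seed;
  if (seed instanceof Date) {
    // Date seed: derive a number from the timestamp.
    return seed.getTime() % 1e9;
  }
  // String seed: a cheap character-code hash.
  let h = 0;
  for (const ch of String(seed)) h = (h * 31 + ch.charCodeAt(0)) % 1e9;
  return h;
}
```

The only property the seeding trick needs is determinism: the same input must always yield the same number.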

In my experience, using numbers as seeds is much easier than manipulating string formats (character occurrences, string length, etc.).

By using a number as a seed, it's easy to find a formula to $project a new field with a random value. I personally used the modulus (%) or division (/) operator with very good results, keeping complexity very low and avoiding CPU overhead.
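A plain-JS sketch shows why a modulus-based rank preserves pagination: the same seed gives the same total order, so consecutive pages never overlap. The `counter` field and the constants are invented for the demo:

```javascript
// A 20-document collection with a numeric field to hash against.
const collection = Array.from({ length: 20 }, (_, i) => ({ _id: i, counter: i * 31 }));

function rank(doc, seed) {
  return (doc.counter * seed) % 9973; // cheap, CPU-friendly formula
}

function fetchPage(seed, pageNumber, pageSize) {
  return [...collection]
    .sort((a, b) => rank(a, seed) - rank(b, seed) || a._id - b._id)
    .slice(pageNumber * pageSize, (pageNumber + 1) * pageSize)
    .map((d) => d._id);
}

const first = fetchPage(123, 0, 10);
const second = fetchPage(123, 1, 10);
// Disjoint pages that together cover the whole collection.
```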

Conclusions

Extracting random documents from a collection can be tricky, at least as tricky as the worst aggregations are to build.

But after all, keeping things simple, obtaining a seed, and then $projecting a new field (the one you'll order by) can be a good idea if you find an easy way to calculate that field's value from the seed value.

And obviously, the seed must be refreshed every time you want to re-randomize your data. That's why it can be a good idea to create a new seed value every time the first page of results is requested; after that, every subsequent page uses the same seed value.
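That seed lifecycle can be sketched from the client's side; the function and parameter names are hypothetical:

```javascript
// Mint a seed for the first page, then reuse it for every
// subsequent page of the same result set.
function newSeed() {
  return Math.floor(Math.random() * 1e6) + 1;
}

function requestParams(pageNumber, currentSeed) {
  const seed = pageNumber === 0 ? newSeed() : currentSeed;
  return { page: pageNumber, seed };
}

// const first = requestParams(0, null);       // fresh seed
// const next  = requestParams(1, first.seed); // same seed, next page
```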
Thanks to Zack Shapiro. 

Programming · MongoDB · Random · Database · NoSQL

WRITTEN BY

Ivano Di Gese
Passionate IT skills on the run: keep calm, do your stuff and code
better

Better Programming
Advice for programmers.
