Infrastructure

Storing and retrieving millions of ad impressions per second
An important component of any ads ecosystem is the ability to store and retrieve
ad impression metadata accurately and consistently. This infrastructure powers
our analytics pipelines, billing systems, and prediction models. Given the
centrality of this system, it’s important that it evolves with the growing needs of the
business and the Revenue organization. The previous iteration of this system at
Twitter was designed almost ten years ago, when our team was much smaller and
served only a single type of ad. Today, Twitter's Revenue organization has 10 times as many engineers and ~$3.7B (https://investor.twitterinc.com/financial-information/financial-releases/default.aspx) in revenue, and supports multiple ad formats: Brand, Video, and Cards. We set out to design a system to meet the growing demands of our platform.
There are three distinct stages in the AdServer: Candidate Selection, Ranking, and Creative Hydration. Each of these components needs to store different fields associated with each served ad. For example, the Prediction component might run an experiment and want to store a field associated with it for later processing. Throughout the funnel, each component attaches its relevant fields to a data structure called ImpressionMetadata, which is then persisted through AdMixer.
Once the served ads receive impressions and engagements, our ads callback
systems query this saved data and send it to all our downstream systems for
processing.
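To make this concrete, here is a minimal sketch in Scala of how each funnel stage might attach its fields before AdMixer persists them. The field names here are entirely hypothetical; the real ImpressionMetadata structure is richer than this.

```scala
// Hypothetical sketch of the kind of per-ad metadata the funnel accumulates.
// Field names are illustrative, not the actual ImpressionMetadata schema.
case class ImpressionMetadata(
  impressionId: Long,
  selectionFields: Map[String, String] = Map.empty, // attached by Candidate Selection
  rankingFields: Map[String, String] = Map.empty,   // attached by Ranking / Prediction
  creativeFields: Map[String, String] = Map.empty   // attached by Creative Hydration
)

// Each funnel stage enriches the metadata before AdMixer persists it,
// for example when the Prediction component records an experiment id.
def attachRankingExperiment(meta: ImpressionMetadata, experimentId: String): ImpressionMetadata =
  meta.copy(rankingFields = meta.rankingFields + ("experiment_id" -> experimentId))
```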
While this system served us and our customers well for over a decade, the growing demands of the business made it increasingly difficult to extend without compromising on engineering principles. Here are a few of the problems we encountered:
Now that each problem space was logically separated, we considered the benefits of physically separating the data being stored. Storing data in the same physical space meant that, operationally, we were still coupled: capacity planning, maintenance, and cost were shared between the teams. We started exploring a world where the problem spaces were both logically and physically separated. The benefits of this approach were clear: further separation of concerns, stronger interfaces between different components, and independent maintenance, cost, and operability of datasets.
Finally, an interface was needed between the AdServer funnel and the multiple datastores. We called this the Impression Data Service. This microservice is in charge of storing and retrieving impression metadata for all ads served. This interface ensured a service-level contract with the producers of the fields and helped us consolidate the core business logic of handling candidates and winning ads into a single system.
In our new system, different problem spaces can independently grow, the
interfaces between multiple components of the AdServer are not leaky, and
developers working in one problem space do not need to gain context on spaces
outside of their domain of expertise. This means that Candidate Selection,
Creative Hydration, and Candidate Ranking can independently store metadata
relevant to their problem spaces. While previously ads metadata had to be
passed from one component to another, in our new system, each component calls
Impression Data Service to store relevant metadata. Impression Data Service
handles the business logic of correctly storing different metadata to different
physical stores.
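As a rough illustration of that routing, here is a hedged sketch in Scala of what an interface like Impression Data Service could look like. The trait, method names, and key scheme are assumptions made for illustration, not the production API.

```scala
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

// Hypothetical problem spaces, each backed by its own physical store.
sealed trait ProblemSpace
case object CandidateSelection extends ProblemSpace
case object CandidateRanking   extends ProblemSpace
case object CreativeHydration  extends ProblemSpace

// Minimal key-value abstraction standing in for the real datastores.
trait KeyValueStore {
  def put(key: String, fields: Map[String, String]): Future[Unit]
  def get(key: String): Future[Map[String, String]]
}

class ImpressionDataService(stores: Map[ProblemSpace, KeyValueStore]) {

  // Store metadata produced by one component of the funnel.
  def store(impressionId: Long, space: ProblemSpace, fields: Map[String, String]): Future[Unit] =
    stores(space).put(s"$impressionId:$space", fields)

  // Retrieve the full metadata for an impression by fanning out to every store
  // and merging the results, so callers never see the physical layout.
  def retrieve(impressionId: Long): Future[Map[String, String]] =
    Future
      .traverse(stores.toSeq) { case (space, store) => store.get(s"$impressionId:$space") }
      .map(_.foldLeft(Map.empty[String, String])(_ ++ _))
}
```

Because callers only ever see store and retrieve, the mapping from problem space to physical store can change without touching any of the funnel components.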
There are two major aspects of the migration and testing framework:
1. Missing fields: We needed checks and balances in place to ensure that the
new system does not have any missing fields.
2. Mismatched values: When hundreds of fields, written by dozens of
systems, are being migrated to a new logical and physical store, we expect
some mismatches in the values of those fields.
Baking these two checks into the system itself made it a lot more predictable and reliable. We built in automatic fallback mechanisms that ensured that any time our new system was inconsistent with our old system, the request would fall back to the old system and log an error for the team to investigate.
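A minimal sketch of that fallback idea, assuming both systems' responses can be flattened into maps of field names to values (the production logic operates on Thrift responses and is more involved):

```scala
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

// Read from both systems, compare, and serve the legacy response (while
// logging an error) whenever the new system is missing fields or disagrees.
// Types and names here are illustrative only.
def readWithFallback(
    readLegacy: () => Future[Map[String, String]],
    readNew: () => Future[Map[String, String]],
    logError: String => Unit
): Future[Map[String, String]] =
  readLegacy().zip(readNew()).map { case (legacy, fresh) =>
    val missing    = legacy.keySet -- fresh.keySet
    val mismatched = legacy.keySet.intersect(fresh.keySet).filter(k => legacy(k) != fresh(k))
    if (missing.isEmpty && mismatched.isEmpty) fresh
    else {
      logError(s"Falling back to legacy store: missing=$missing mismatched=$mismatched")
      legacy
    }
  }
```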
Because the shapes of the data structures stored by the legacy and new systems were quite different, we built a Thrift parser to recursively find field names and their corresponding values in responses from the two systems. Once we brought the new system to 100% correctness, we could start shedding load from the older system and ramping up traffic to the newer one.
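The comparison itself can be sketched as recursively flattening each response into field-path/value pairs and then diffing the two maps. The tree type below is a stand-in for the real Thrift structures:

```scala
// A simple recursive tree standing in for parsed Thrift responses.
sealed trait Node
case class Leaf(value: String)               extends Node
case class Struct(fields: Map[String, Node]) extends Node

// Recursively flatten a response into a map from field path to value,
// e.g. "ranking.experiment_id" -> "exp_42".
def flatten(node: Node, path: String = ""): Map[String, String] = node match {
  case Leaf(v) => Map(path -> v)
  case Struct(fields) =>
    fields.flatMap { case (name, child) =>
      flatten(child, if (path.isEmpty) name else s"$path.$name")
    }
}

// Field paths that are missing or whose values disagree between the two responses.
def mismatches(legacy: Node, fresh: Node): Set[String] = {
  val l = flatten(legacy)
  val f = flatten(fresh)
  (l.keySet ++ f.keySet).filter(k => l.get(k) != f.get(k))
}
```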
Network Bandwidth
We needed to understand the total bandwidth the new system would require in each datacenter, taking failovers into account. The network bandwidth required in each datacenter is determined by the request rate and the request and response sizes for the service.
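As a back-of-the-envelope illustration, with made-up numbers rather than actual Twitter figures, the estimate is essentially the request rate multiplied by the combined request and response size, scaled for failover:

```scala
// Illustrative bandwidth estimate; all numbers are invented for the example.
val requestsPerSecond  = 2000000L   // peak impressions written per second
val requestSizeBytes   = 2 * 1024L  // average request payload
val responseSizeBytes  = 1024L      // average response payload
val failoverMultiplier = 2.0        // one datacenter absorbing another's traffic

val steadyStateBytesPerSec = requestsPerSecond * (requestSizeBytes + responseSizeBytes)
val provisionedGbps =
  steadyStateBytesPerSec * failoverMultiplier * 8 / 1e9 // bytes/s to gigabits/s

println(f"Provisioned network bandwidth: $provisionedGbps%.1f Gbps")
```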
Conclusion
First, throughout the design and development of the new system, it was critical to stay customer-focused and create a tight feedback loop with the current and future users of the system. Working closely with customers during the problem discovery, design, implementation, and testing phases enabled us to design a
system that truly solved their pain points. Working with systems that have stood the test of time for over a decade requires the utmost care in validating problem statements and solutions.
Second, for migrations of this scale, it's important to build migration and testing frameworks that bake graceful degradation and fallback mechanisms into the system itself. This allows for a seamless and confident transition of traffic from legacy to new systems. As a result of our investment in this framework, our
system achieved 100% match rate with the legacy system and provided us with
multiple automatic fallback mechanisms.
Acknowledgements
Such a large-scale effort would not have been possible without the cooperation of
multiple teams. We would like to thank those who worked on this project: Andrew
Taeoalii, Catia Goncalves, Corbin Betheldo, Ilho Ye, Irina Sch, Julio Ng,
Mohammad Saiyad, Ranjan Banerjee, Siyao Zhu, Tushar Singh, Jessica Law,
Juan Serrano, Mark Shields, Sandy Strong, Vivek Nagabadi, Kevin Donghua Liu,
Rashmi Ramesh, Justin Hendryx, Karthik Katooru, Sean Ellis, Bart Robinson,
Kevin Yang, Andrea Horst, Eric Lai, Ian Downes, Kristen Daves, Yogi Sharma,
Ming Liu, Yiming Zang, Kavita Kanetkar, Brian Kahrs, Dan Kang, J Lewis, Kai
Chen, James Gao, George Sirois, Fabian Menges, Somsak Assawakulpaibool,
Steven Yoo, Tanooj Parekh, Jean-Pascal Billaud, Jiyuan Qian, Pawan Valluri,
Paul Burstein, Xiao Chen, Yong Wang, Yudian Zheng, Tim Marks, Luke Simon,
Helen Friedland, Sergej Kovalenko, Dwarak Bakshi, Marcin Kadluczka.
Siddharth Rao (https://www.twitter.com/sidgrao)
Kai Zhu (https://www.twitter.com/zhukai_cs)