Paul-Henri Leobon

Building a Serverless Ad Tracker: Scaling to Millions of Events and Back

Nov 22, 2025

The advertising industry is much wider than search and social ads. One small niche of it is performance email marketing. ThePerformer is a company working in this field. As their CTO, I had to architect a complete ad tracking and attribution platform capable of handling a few million events per day while staying cost-effective during quiet periods.

TL;DR: AWS serverless services scale very well, GoLang is fast, and Rails can run inside Lambda.

First, a little bit about the business

In email marketing, there are three parties involved: the advertiser, the publisher, and the customer who receives the email.

Thanks to our tracking, whenever someone clicks on an ad, we record it, bill the advertiser, pay the publisher and redirect the customer to the advertiser's website.

Requirements

ThePerformer is a small startup with a limited budget and a very unpredictable load. You'll never get a warning before a big sale that can 10x traffic overnight, so your infrastructure always has to be ready. Most importantly, it has to remain cost-efficient when there are no campaigns.

Our customers come to us to drive more sales, and in B2C, speed is everything. If our services are slow or unreliable, that's lost revenue for both the advertiser and the publisher, and lost reputation for the agency.

This led to some hard requirements: absorb sudden traffic spikes without manual intervention, keep redirect latency low, stay highly available, and cost next to nothing when no campaigns are running.

The tracker

The tracker is the heart of our company. I wanted it to be lightweight and to solve only one problem: tracking.

It had to behave like a function: f(RawEvent) -> TrackedEvent

A raw event can be a click, an impression, or a conversion. Each one has to be linked to a campaign. Campaigns are configured in our console (details below) and dispatched to the tracker as events through AWS EventBridge. The tracker then stores the campaign information in MySQL with a Redis cache in front.
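
Here's a minimal sketch of that campaign-sync consumer in Go. The event shape, key names, and table layout are illustrative, not the production code:

```go
package main

import (
	"context"
	"database/sql"
	"encoding/json"
	"log"
	"time"

	"github.com/aws/aws-lambda-go/events"
	"github.com/aws/aws-lambda-go/lambda"
	_ "github.com/go-sql-driver/mysql"
	"github.com/redis/go-redis/v9"
)

// Campaign is an illustrative subset of what the console publishes.
type Campaign struct {
	ID          string `json:"id"`
	RedirectURL string `json:"redirect_url"`
	Status      string `json:"status"`
}

var (
	db    *sql.DB       // MySQL: source of truth for campaign config
	cache *redis.Client // Redis: cache sitting in front of MySQL
)

// handleCampaignEvent is invoked by EventBridge whenever the console creates
// or updates a campaign. It persists the config and refreshes the cache so the
// click path never has to touch MySQL.
func handleCampaignEvent(ctx context.Context, e events.CloudWatchEvent) error {
	var c Campaign
	if err := json.Unmarshal(e.Detail, &c); err != nil {
		return err
	}

	// Upsert into MySQL (hypothetical table and columns).
	if _, err := db.ExecContext(ctx,
		`INSERT INTO campaigns (id, redirect_url, status) VALUES (?, ?, ?)
		 ON DUPLICATE KEY UPDATE redirect_url = VALUES(redirect_url), status = VALUES(status)`,
		c.ID, c.RedirectURL, c.Status); err != nil {
		return err
	}

	// Refresh the cache entry the click path reads.
	payload, _ := json.Marshal(c)
	return cache.Set(ctx, "campaign:"+c.ID, payload, 24*time.Hour).Err()
}

func main() {
	var err error
	db, err = sql.Open("mysql", "user:pass@tcp(rds-endpoint:3306)/tracker") // placeholder DSN
	if err != nil {
		log.Fatal(err)
	}
	cache = redis.NewClient(&redis.Options{Addr: "redis-endpoint:6379"}) // placeholder address
	lambda.Start(handleCampaignEvent)
}
```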

The RawEvent comes from API Gateway. Whenever a user interacts with one of our ads, they are redirected to the tracker, which processes the event and returns a proper response (either a redirect or a pixel). TrackedEvents are pushed to the tracking event stream (AWS Kinesis).

+-----------------------------+
|        ENTRY POINTS         |
|                             |
|   +---------------------+   |   Clicks / Impressions /
|   |     API Gateway     +---+-------- Conversions --------.
|   +---------------------+   |                             |
|                             |                             |
|                             |                             v
|                             |                     +----------------+           +-----------------------------+
|                             |                     |                |           |    TRACKING EVENT STREAM    |
|                             |                     |    TRACKER     |           |                             |
|   +---------------------+   |      Campaign       |    (Lambda)    | Tracking  |      +---------------+      |
|   |     EventBridge     +---+--- Create/Update -->|                +-----------+----->|    KINESIS    |      |
|   +---------------------+   |                     |                |   Event   |      +---------------+      |
|                             |                     +----------------+           |                             |
+-----------------------------+                                                  +-----------------------------+

The result is simple and elegant. The tracker can scale up and down as needed with no bottlenecks: AWS Kinesis can handle several MB/s and Redis hundreds of thousands of queries per second, far more than we need.

The Tech Stack

GoLang was used for its speed and low memory footprint. The Lambda needed only 128 MB of RAM and had a p95 latency of ~25ms. Cold starts were a non-issue with GoLang (p99 ~100ms).
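
For a sense of shape, here's what the hot path can look like as a single Go handler: API Gateway request in, TrackedEvent out to Kinesis, redirect back to the visitor. The field names, query parameter, and stream name are illustrative:

```go
package main

import (
	"context"
	"encoding/json"
	"time"

	"github.com/aws/aws-lambda-go/events"
	"github.com/aws/aws-lambda-go/lambda"
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/kinesis"
)

// TrackedEvent is an illustrative subset of what gets pushed to the stream.
type TrackedEvent struct {
	CampaignID string    `json:"campaign_id"`
	Kind       string    `json:"kind"` // click, impression or conversion
	OccurredAt time.Time `json:"occurred_at"`
}

var stream = kinesis.New(session.Must(session.NewSession()))

// handler turns a raw API Gateway request into a TrackedEvent, pushes it to
// Kinesis, and answers the visitor with a redirect (or a pixel for impressions).
func handler(ctx context.Context, req events.APIGatewayProxyRequest) (events.APIGatewayProxyResponse, error) {
	ev := TrackedEvent{
		CampaignID: req.QueryStringParameters["campaign"], // hypothetical parameter name
		Kind:       "click",
		OccurredAt: time.Now().UTC(),
	}

	payload, err := json.Marshal(ev)
	if err != nil {
		return events.APIGatewayProxyResponse{StatusCode: 500}, err
	}

	if _, err := stream.PutRecordWithContext(ctx, &kinesis.PutRecordInput{
		StreamName:   aws.String("tracking-events"), // hypothetical stream name
		PartitionKey: aws.String(ev.CampaignID),
		Data:         payload,
	}); err != nil {
		return events.APIGatewayProxyResponse{StatusCode: 500}, err
	}

	// The real destination comes from the cached campaign config; this is a placeholder.
	return events.APIGatewayProxyResponse{
		StatusCode: 302,
		Headers:    map[string]string{"Location": "https://advertiser.example/landing"},
	}, nil
}

func main() {
	lambda.Start(handler)
}
```

In this sketch, partitioning by campaign ID keeps events for the same campaign ordered on the same shard, which keeps downstream consumers simple.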

Consuming the stream

Now that we have this clean stream of tracked events, we just need to consume it. We needed two consumers: one for our real-time dashboard and another for long-term analytics.

The latter was the simpler of the two. We used AWS Firehose to consume the Kinesis stream and feed the results into our S3 data lake, then a few Glue jobs transform the data into a more queryable format. Everything is now available in AWS QuickSight and Athena.

For real-time analytics, we went with another consumer: a Lambda function. It loads batches of events from Kinesis, filters them, aggregates them by various dimensions (e.g., campaign, publisher, hour), and pushes the aggregated results into a MySQL database.

This is the 2nd function of the system: f(TrackedEvent[]) -> StatsAggregate

Aggregating the data considerably reduced the number of inserts needed, which in turn reduced the load on the database. The same Lambda also exposes an HTTP API (via API Gateway), allowing other services to query the aggregated stats in real time (near real time, to be precise).

                                                 (Cold / Long-term Path)
       +-----------------------+                 +---------------------+       +------------------+
       | TRACKING EVENT STREAM |            .--->|    AWS Firehose     |------>|   S3 Datalake    |
       |                       |            |    |  (Slow/Cold Path)   |       | (Raw / Archive)  |
       |       [KINESIS]       +------------|    +---------------------+       +------------------+
       |                       |            |
       +-----------------------+            |
                                            |
                                            |     (Hot / Realtime Path)
           +-------------+                  |    +---------------------+       +------------------+
           |             |                  `--->|    Stats LAMBDA     |       |     MySQL DB     |
           | API GATEWAY +---------------------->|                     |------>|   (Aggregated    |
           |             |                       |   1. Batch Poll     |       |   Stats Store)   |
           +-------------+                       |   2. Filter         |       +------------------+
                                                 |   3. Aggregate      |
                                                 +---------------------+

The Tech Stack

GoLang was used once more for its speed, allowing us to process large amounts of data efficiently. MySQL was the known bottleneck of this service, but it still handled all the load. The plan is to move from RDS to a time-series database like TigerData when the business grows sufficiently.
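
Sketched in Go, that second function is a Kinesis batch handler that folds events into per-(campaign, hour) counters and issues one upsert per bucket instead of one insert per event. The schemas, table, and filter here are illustrative:

```go
package main

import (
	"context"
	"database/sql"
	"encoding/json"
	"log"
	"time"

	"github.com/aws/aws-lambda-go/events"
	"github.com/aws/aws-lambda-go/lambda"
	_ "github.com/go-sql-driver/mysql"
)

// TrackedEvent mirrors what the tracker pushes to the stream (illustrative).
type TrackedEvent struct {
	CampaignID string    `json:"campaign_id"`
	Kind       string    `json:"kind"` // click, impression or conversion
	OccurredAt time.Time `json:"occurred_at"`
}

// statsKey is one aggregation bucket: campaign x hour.
type statsKey struct {
	CampaignID string
	Hour       time.Time
}

var db *sql.DB

// handler implements f(TrackedEvent[]) -> StatsAggregate: it folds a Kinesis
// batch into counters and issues one upsert per (campaign, hour) bucket.
func handler(ctx context.Context, batch events.KinesisEvent) error {
	counts := map[statsKey]int64{}

	for _, rec := range batch.Records {
		var ev TrackedEvent
		if err := json.Unmarshal(rec.Kinesis.Data, &ev); err != nil {
			continue // drop malformed records; real code would report them
		}
		if ev.Kind != "click" { // filter step: only clicks in this sketch
			continue
		}
		key := statsKey{CampaignID: ev.CampaignID, Hour: ev.OccurredAt.Truncate(time.Hour)}
		counts[key]++
	}

	// One row per aggregate instead of one per raw event keeps MySQL happy.
	for key, n := range counts {
		if _, err := db.ExecContext(ctx,
			`INSERT INTO campaign_stats (campaign_id, hour, clicks) VALUES (?, ?, ?)
			 ON DUPLICATE KEY UPDATE clicks = clicks + VALUES(clicks)`,
			key.CampaignID, key.Hour, n); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	var err error
	db, err = sql.Open("mysql", "user:pass@tcp(rds-endpoint:3306)/stats") // placeholder DSN
	if err != nil {
		log.Fatal(err)
	}
	lambda.Start(handler)
}
```

Kinesis delivers records at least once, so a production version also has to cope with duplicate batches; that detail is left out of the sketch.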

Managing the system

These two simple “functions” are the core of our system; they allow us to track campaigns and analyse their performance. The business is now able to make decisions based on their output (send more emails, remove bad actors, pay our publishers).

For our management console, I wanted a simple, traditional MVC dashboard; runtime performance wasn't a concern, but speed and ease of development were. I picked Ruby on Rails hosted on AWS Lambda using Lamby. Contrary to what you may think, cold starts are not a problem and happen rarely.

I added an event bus to this Rails app: any change to a campaign, medium, publisher, or ad is automatically propagated to the tracker and other consumers through EventBridge. The Rails app can also consume campaign metrics by calling the stats Lambda.

                                                     +-------------------+
                                      Updates        |    EVENTBRIDGE    |
                                  .----------------->|   (Change Events) |
                                  |                  +-------------------+
+---------------+      +----------+----------+
| Business User |      | MANAGEMENT DASHBOARD|
|    (HTTP)     +----->|   (Rails + Lamby)   |
+---------------+      +----------+----------+
                                  |
                                  |  Get Metrics     +-------------------+
                                  '----------------->|     STATS API     |
                                                     +-------------------+

The Tech Stack

Ruby on Rails + MySQL were used for their speed of development. Scalability was not a concern; the dashboard is not meant to be used by more than a few hundred users. Despite this, we had an average response time of 140ms, with cold starts at around two seconds.
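
The event bus mentioned above boils down to an EventBridge PutEvents call. The dashboard makes it from Ruby, but here's the shape of that call sketched in Go for consistency with the rest of the post; the bus name, source, and detail type are made up:

```go
package main

import (
	"encoding/json"
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/eventbridge"
)

// publishCampaignChange pushes a campaign change event onto the bus that the
// tracker and other consumers subscribe to.
func publishCampaignChange(bus *eventbridge.EventBridge, campaignID string) error {
	detail, _ := json.Marshal(map[string]string{"id": campaignID, "status": "active"})

	_, err := bus.PutEvents(&eventbridge.PutEventsInput{
		Entries: []*eventbridge.PutEventsRequestEntry{{
			EventBusName: aws.String("theperformer-bus"),  // hypothetical bus name
			Source:       aws.String("console.campaigns"), // hypothetical source
			DetailType:   aws.String("CampaignUpdated"),   // hypothetical detail type
			Detail:       aws.String(string(detail)),
		}},
	})
	return err
}

func main() {
	bus := eventbridge.New(session.Must(session.NewSession()))
	if err := publishCampaignChange(bus, "campaign-123"); err != nil {
		fmt.Println("publish failed:", err)
	}
}
```

In the real app this happens from Rails; the sketch only shows what lands on the bus for the tracker to consume.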

The results

99.999% uptime over the last year, millions of events processed with peaks reaching a few million events per day. All at a production cost of around $450 per month.

There's a growing trend of developers criticising cloud services for their high markup over bare metal, but I can't overstate the value cloud services provided us; it far exceeds what we would have paid in complexity and engineering time had we decided to run on bare metal.

The ease of development, scalability, and reliability of this tracker stems directly from the cloud services we used.

I skipped a few infrastructure choices for brevity. Overall, whenever we picked an AWS service, we went with the serverless option, except for Redis and MySQL; those two are HA managed clusters using MemoryStore and RDS.