Visualizing Lamport Clocks & Distributed Databases

A dive into distributed systems and React/TypeScript

Alternate timelines

I like to picture distributed systems as alternate timelines, and this project turned into a way to visualize them. What, not everyone does that? Anyway, I didn’t have that goal in mind when I started this project. Instead, I wanted to alleviate my confusion about clocks.

Logical time

Every explanation of Lamport clocks I read boiled down to the same counter rules: Receive a message? Add one. Send a message? Add one.
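
As a minimal sketch (my own illustration, not the project's node code), those rules look something like this; the standard receive rule actually takes the max of the local and incoming timestamps before adding one:

```typescript
// A minimal Lamport clock: a counter that only moves forward.
class LamportClock {
  private time = 0;

  // Local event or outgoing message: add one.
  tick(): number {
    this.time += 1;
    return this.time;
  }

  // Incoming message: jump past the sender's timestamp, then add one.
  receive(remoteTime: number): number {
    this.time = Math.max(this.time, remoteTime) + 1;
    return this.time;
  }
}

const clock = new LamportClock();
const sendTimestamp = clock.tick(); // attach this to an outgoing message
clock.receive(sendTimestamp + 5);   // a message arrives from a "later" node
```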

However, none of the articles explained why systems need these weird clocks. So I said, “Screw it, I’m gonna build a database and find out for myself.” Before long I started to send multiple write requests, and the model sometimes recorded data incorrectly. Here’s an example:

Let’s send two operations to the database one after the other.

  • First, we send item 1 = apple.
  • Right after that we send item 1 = banana.
  • After these operations complete, we expect the database to return item 1 = banana.

What happens if our requests get switched en route because of random network delay?

  • item 1 = banana arrives first and is written to the database.
  • item 1 = apple arrives second and is written to the database.
  • The database returns item 1 = apple because that request arrived last.
  • That’s wrong — the database should return item 1 = banana.

Lamport clocks prevent disappearing bananas by including timestamps with each message. apple is sent with timestamp 1, and banana is sent with timestamp 2. Now, the database knows that banana was sent second because its timestamp is greater than apple’s timestamp. We keep our potassium, and the database is correct.
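
As an illustration of that rule (the Replica class here is my own sketch, not the simulation's code), a replica can simply refuse to overwrite a value with an older timestamp:

```typescript
// A replica only applies a write if its timestamp is newer than the one it
// already stored, so a delayed `apple` can't clobber `banana`.
type TimestampedWrite = { key: string; value: string; timestamp: number };

class Replica {
  private values = new Map<string, { value: string; timestamp: number }>();

  apply(write: TimestampedWrite): void {
    const current = this.values.get(write.key);
    if (!current || write.timestamp > current.timestamp) {
      this.values.set(write.key, { value: write.value, timestamp: write.timestamp });
    }
    // Otherwise the write is stale: ignore it, even though it arrived last.
  }

  read(key: string): string | undefined {
    return this.values.get(key)?.value;
  }
}

const replica = new Replica();
replica.apply({ key: 'item 1', value: 'banana', timestamp: 2 }); // arrives first
replica.apply({ key: 'item 1', value: 'apple', timestamp: 1 });  // arrives late
console.log(replica.read('item 1')); // "banana" — the potassium survives
```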

To see logical clocks in action, visit my distributed database simulation at https://whatisdist.com. Now, how do we actually build a real-time distributed database simulation with React and TypeScript?

Application Architecture

  1. The core data models
  2. The front-end
  3. The “API” component

The core data models

I built this project with some simple starting assumptions: each node starts with a single random connection to another node, and every message sent between nodes is subject to a random network delay. Here are the resulting data models:
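
Here's a sketch of what those data models might look like under these assumptions; apart from Network, nodeMap, and findAllNodes, the identifiers are placeholders of mine:

```typescript
interface DbMessage {
  key: string;          // e.g. "item 1"
  value: string;        // e.g. "banana"
  timestamp: number;    // Lamport timestamp attached to every message
  delayMs: number;      // random network delay applied in transit
}

interface DbNode {
  id: string;
  delayMs: number;                       // this node's random send delay
  connection: DbNode | null;             // the single random peer it starts with
  knownNodes: Map<string, DbNode>;       // everything discovered later
  send(message: DbMessage, to: DbNode): Promise<void>; // resolves when delivered
  findAllNodes(originalNode: DbNode): void;
}

class Network {
  nodeMap = new Map<string, DbNode>();

  constructor(makeNode: (id: string) => DbNode, nodeCount: number) {
    // Generate the nodes (the random details live in the makeNode factory)...
    for (let i = 0; i < nodeCount; i++) {
      const node = makeNode(`node-${i}`);
      this.nodeMap.set(node.id, node);
    }
    // ...give each one a random connection to some other node...
    const all = [...this.nodeMap.values()];
    for (const node of all) {
      const peers = all.filter((other) => other.id !== node.id);
      node.connection = peers[Math.floor(Math.random() * peers.length)] ?? null;
    }
    // ...then instruct every node to find all the other nodes on the network.
    for (const node of all) {
      node.findAllNodes(node);
    }
  }
}
```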

The Network object generates nodes with random information and connections. It puts all the nodes in a nodeMap, then instructs each node to find all the other nodes on the network.

The node object is the core of the application. It was inspired by CockroachDB’s method of distributing a single binary to all nodes on the network. Once the nodes come online, they will find each other, capture metadata about the Network, and form a functional distributed database. The code that simulates this behavior is particularly interesting:
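
Here's my reconstruction of that recursion, with a small stand-alone node class so the snippet runs on its own; only findAllNodes, originalNode, and newNode are the project's names:

```typescript
class SimNode {
  knownNodes = new Map<string, SimNode>();

  constructor(public id: string, public connection: SimNode | null = null) {}

  // Each node calls this exactly once on itself: node.findAllNodes(node).
  findAllNodes(originalNode: SimNode): void {
    const newNode = this.connection;
    if (!newNode || newNode === originalNode || originalNode.knownNodes.has(newNode.id)) {
      return; // dead end, or we've already discovered this node: stop recursing
    }

    // In a real system, this exchange would be an API call to newNode.
    originalNode.knownNodes.set(newNode.id, newNode);

    // The next call runs on newNode, so we hand it the SAME originalNode
    // reference; that reference stays constant down the whole recursive chain.
    newNode.findAllNodes(originalNode);
  }
}
```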

findAllNodes() uses recursion to gather metadata about all nodes. Each node calls this function once to gather metadata about the network. In a real system, the spots where one node reads another node directly would be API calls to that node.

There’s a problem here. Each recursive call runs on the newNode, not on the originalNode, so each newNode must be able to send information back to the originalNode.

We solve this problem by passing a reference to the originalNode and keeping that reference constant throughout the recursive chain. This is one of those situations where an understanding of pointers and references, the kind of intuition C/C++ builds, makes it much easier to write correct JavaScript. I learned C before JS, so I call it “thinking in C,” and it’s such a pleasure to think in C on the front-end.

The front-end

The front-end captures database commands and sends them to the API. The API returns a list of events that happened in the database and at what time they happened.

The front-end uses the API response to animate the instructions as they complete, and the user can pause the execution at any time to see how the database works internally before it returns a response.
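
In sketch form, the contract between the two layers might look like this; the type and field names are placeholders rather than the app's real types:

```typescript
type Command = { key: string; value: string };

type DatabaseEvent = {
  nodeId: string;
  description: string;  // e.g. "received write item 1 = banana"
  delayMs: number;      // the random network delay attached to this event
};

// The front-end only cares that something with this shape exists,
// whether it lives in the browser or on a server.
interface SimulationApi {
  execute(commands: Command[]): Promise<DatabaseEvent[]>;
}
```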

Now here’s the interesting part: The API is actually just a front-end component that performs the simulation and returns the list of events to the rest of the application.

I built it this way so that if the actual simulation moves to the back-end in the future, the front-end is already built to receive that list of events from the API. It’s modular. Now how does the API do the simulation and gather the event list?

The “API” component

This is where things get really fun. Take a gander at the code below:
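
Here's the core idea as a sketch, with the Network reduced to a stub; only Network and eventStream are the project's names:

```typescript
import { Subject } from 'rxjs';

type DatabaseEvent = { nodeId: string; description: string; delayMs: number };

class Network {
  constructor(public eventStream: Subject<DatabaseEvent>) {
    // ...generate the nodes and hand each of them the eventStream...
  }
}

// Created once, when the app loads.
const eventStream = new Subject<DatabaseEvent>();
const network = new Network(eventStream);
```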

So the API generates a new Network when the app loads, and it includes an eventStream which has type Subject from RxJS. Basically, the Subject is an event emitter.

When eventStream.next() is called, eventStream will emit the object that next() was called with. This is perfect for emitting events that happen in real-time.

Every time a database node receives or sends a message, it will use the eventStream to emit the message it received or sent and the random network delay. With the events emitted in order and the network delay of each event, I can re-create a timeline of events and animate it on the front-end.
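
A node's side of that might look roughly like this; the class and method names here are my own:

```typescript
import { Subject } from 'rxjs';

type DatabaseEvent = { nodeId: string; description: string; delayMs: number };

class SimulatedNode {
  constructor(public id: string, private eventStream: Subject<DatabaseEvent>) {}

  receive(key: string, value: string, delayMs: number): void {
    // ...apply the write locally, then report what just happened...
    this.eventStream.next({
      nodeId: this.id,
      description: `received ${key} = ${value}`,
      delayMs, // the random network delay, so the timeline can be re-created
    });
  }
}
```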

But how do I gather those emitted events? There’s this thing called a Subscription.
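
As a sketch (eventStream and emittedEvents are the project's names; the rest is boilerplate of mine):

```typescript
import { Subject, Subscription } from 'rxjs';

type DatabaseEvent = { nodeId: string; description: string; delayMs: number };

const eventStream = new Subject<DatabaseEvent>();
const emittedEvents: DatabaseEvent[] = [];

// Every time the eventStream emits, push the event onto emittedEvents.
const subscription: Subscription = eventStream.subscribe((event) => {
  emittedEvents.push(event);
});
```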

The Subscription object is returned when Subject.subscribe() is called. If the Subject is the event emitter, the Subscription is the event consumer. Every time an event is emitted, the function passed to subscribe() is called. In this example, that function simply pushes each emitted event onto the emittedEvents array (yes, this should be a concat).

But I can’t send the API response until I know for sure that the simulation is finished. How do I do that? Enter the final pieces of magic, unsubscribe() and React side effects:
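
Here's a rough sketch of how that hand-off could be wired up as a custom hook; apiFinishedExecuting comes from the project, while useDeliverEvents, finish, and onEvents are hypothetical names:

```typescript
import { useEffect, useState } from 'react';
import { Subscription } from 'rxjs';

type DatabaseEvent = { nodeId: string; description: string; delayMs: number };

function useDeliverEvents(
  subscription: Subscription,
  emittedEvents: DatabaseEvent[],
  onEvents: (events: DatabaseEvent[]) => void,
) {
  const [apiFinishedExecuting, setApiFinishedExecuting] = useState(false);

  // Called by executeAllCommands() once the last command has completed.
  function finish(): void {
    subscription.unsubscribe();     // stop consuming the eventStream
    setApiFinishedExecuting(true);  // trigger the side effect below
  }

  useEffect(() => {
    if (apiFinishedExecuting) {
      onEvents(emittedEvents);      // hand the full event list to the front-end
    }
  }, [apiFinishedExecuting, emittedEvents, onEvents]);

  return finish;
}
```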

First, the unsubscribe. When the executeAllCommands() function detects that it has finished all of the commands, it calls Subscription.unsubscribe() on the eventStream subscription we created earlier. Once unsubscribe() is called, the Subscription shuts down and no longer runs the function that was passed to subscribe(). The eventStream itself can still accept any number of subscribers.

After that, the apiFinishedExecuting React state change is triggered, and a useEffect() watching that flag delivers the complete event list to the front-end for animation. Woohoo!

You might be wondering how the executeAllCommands() call knows that it’s done executing an instruction. The answer is that the node data model returns a Promise, just like a real API would. When a Promise resolves, the Promise.then() callback kicks in and increments a counter in executeAllCommands(). When the counter reaches the length of the instruction list, the next recursive call is made with an updated index argument. Finally, during the final recursive call the index equals the length of the instruction list, and we know that the database is finished executing.
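
A simplified, serial sketch of that flow, assuming one Promise per command (the real executeAllCommands() also tracks a counter, which this stripped-down version skips); commands, node.execute, and onFinished are names I made up:

```typescript
function executeAllCommands(
  commands: string[],
  node: { execute: (cmd: string) => Promise<void> }, // hypothetical node API
  onFinished: () => void,
  index = 0,
): void {
  // Final recursive call: the index equals the length of the command list,
  // so the database is finished executing.
  if (index === commands.length) {
    onFinished(); // in the real component, this is where unsubscribe() happens
    return;
  }

  // The node returns a Promise, just like a real API would.
  node.execute(commands[index]).then(() => {
    // Promise.then() kicks in: recurse with an updated index argument.
    executeAllCommands(commands, node, onFinished, index + 1);
  });
}
```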

If you’ve checked out the site, you can probably guess that I’m a visual learner. I believe that most engineering concepts sound much more difficult and intimidating than they actually are, and well-thought-out visualizations help people see the underlying simplicity.

Thanks for reading!

A million lines of code live under our applications, and I love studying them. OMS CS at Georgia Tech, Software Engineer at Facebook.