A few years ago, I was lying in bed at 2am listening to my favorite fanfiction podcast for the 47th time, drifting in and out of sleep, brain spinning on various features and bugs I'd programmed that day.

Suddenly, my eyes flew open as a realization hit me.

That realization sparked a chain reaction of events that spanned years, and led to discovering an entirely new memory safety approach.

This is the first post (of hopefully many!) about the journey of creating, designing, and implementing regions, a static analysis technique that lets us make efficient, memory-safe code without garbage collection, reference counting, or a borrow checker.

This post in particular talks about what originally led to the idea, and the first bits of implementation. If you instead want to learn about regions themselves, head over to the regions overview! 1

Regions and the Chronobase

The idea originally came from a game I made for the 7-Day Roguelike Challenge. In this game, you could time travel and team up with your past self to defeat hordes of enemies.

The time-traveling system involved a lot of repetitive boilerplate code, so I made a tiny language and compiler to generate the code for me. 2 3 I could then write a tiny bit of code in this language to add a new item, enemy, tile, or component, and it would instantly generate the game state code.

In the end, it was a sort of "time-traveling database", so I called it the chronobase.

The 7-day challenge started, and the chronobase worked wonderfully. By the end of the week, I had so many kinds of items, enemies, tiles, and components in the game that the resulting generated code was over 150,000 lines.

After the challenge, I wanted to write functions in the chronobase language, not just data. These functions would behave kind of like regular databases' stored procedures. 4

For various reasons, I decided that the outside objects (the ones given as inputs to these functions) would be immutable. An outside Goblin would be r'Goblin and immutable, while a Goblin already in the database would be db'Goblin and mutable.

And at 2am, that fateful night, it occurred to me that we can skip reference counting and generation checks for any of these inputs (like r'Goblin) because they're temporarily immutable while we're inside the function. 5
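To make the idea a bit more concrete, here's a minimal sketch in Python (not Vale, and not the chronobase's actual code) of what a generation check costs and what skipping it looks like. Slot, GenRef, free, deref_checked, deref_unchecked, and the two total_hp functions are all hypothetical names for illustration. Python can't enforce the immutability itself; the sketch simply assumes the caller guarantees it, which in a real language would be the compiler's job.

```python
class Slot:
    """One heap slot. Its generation is bumped whenever its object is freed."""
    def __init__(self, value):
        self.value = value
        self.generation = 0


class GenRef:
    """A reference that remembers the slot's generation at creation time."""
    def __init__(self, slot):
        self.slot = slot
        self.generation = slot.generation


# Counts run-time generation checks, just to make the overhead visible.
stats = {"checks": 0}


def free(slot):
    """Free the object in a slot; outstanding GenRefs will now fail their check."""
    slot.value = None
    slot.generation += 1


def deref_checked(ref):
    """Dereference with a generation check (the mutable, db'Goblin-style case)."""
    stats["checks"] += 1
    if ref.generation != ref.slot.generation:
        raise RuntimeError("use after free")
    return ref.slot.value


def deref_unchecked(ref):
    """Dereference without a check (the immutable, r'Goblin-style case).

    Assumption: the caller guarantees the slot's region is immutable right now,
    so nothing can be freed and the generation cannot have changed.
    """
    return ref.slot.value


def total_hp_db(goblins):
    # Every access pays for a check, because the data could change under us.
    return sum(deref_checked(g)["hp"] for g in goblins)


def total_hp_readonly(goblins):
    # No checks needed while the inputs are temporarily immutable.
    return sum(deref_unchecked(g)["hp"] for g in goblins)
```

Both functions compute the same total, but only the first one touches the check counter. The check still earns its keep in the mutable case: after free(slot), deref_checked raises instead of reading stale data.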

If you're curious about why this works, check out the explanation, but for now I'll continue the tale!

Side Notes
(interesting tangential thoughts)
1

If you like this post, consider submitting it to Hacker News, Reddit, or your favorite link aggregator! It helps me out a lot, and helps spread these ideas more widely.

2

It was a domain specific language named VSS, with a tiny Scala-based compiler to generate the chronobase code.

3

Time traveling is hard to do efficiently; one cannot simply copy the entire game state for every turn. Instead, I used journaling and persistent hash maps to be able to reconstruct the game state from any point in time.
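As a rough illustration of the journaling half of that, here's a minimal Python sketch. It is not the chronobase's generated code: JournaledState and its methods are hypothetical names, and it naively replays the journal from the start instead of using persistent hash maps to make lookups fast.

```python
class JournaledState:
    """Records every change with the turn it happened on, instead of copying state."""

    def __init__(self):
        self.turn = 0
        self.journal = []  # (turn, key, new_value) entries, in chronological order

    def set(self, key, value):
        """Record a change made during the current turn."""
        self.journal.append((self.turn, key, value))

    def end_turn(self):
        self.turn += 1

    def state_at(self, turn):
        """Reconstruct the state as of the end of the given turn."""
        state = {}
        for entry_turn, key, value in self.journal:
            if entry_turn > turn:
                break  # entries are chronological, so nothing later applies
            state[key] = value
        return state
```

Each turn only stores what changed, so memory grows with the number of changes and not with the size of the whole game state times the number of turns.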

4

In fact, if we added stored procedures and made the tiny language deterministic, we could have it record all of its inputs and be able to perfectly replay and reproduce any bugs we encountered, so I added that in. This later inspired Vale's Perfect Replayability feature.
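The principle fits in a few lines. Here's a minimal Python sketch, assuming a deterministic update function; apply_input, run_and_record, and replay are hypothetical names, not the chronobase's or Vale's actual API.

```python
# Movement deltas for each command; the only inputs this toy "game" accepts.
MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


def apply_input(state, command):
    """Deterministic update: the result depends only on state and command."""
    x, y = state
    dx, dy = MOVES[command]
    return (x + dx, y + dy)


def run_and_record(initial_state, commands):
    """Run normally, keeping a recording of every input we consumed."""
    recording = []
    state = initial_state
    for command in commands:
        recording.append(command)
        state = apply_input(state, command)
    return state, recording


def replay(initial_state, recording):
    """Feed the recorded inputs back in; determinism guarantees the same result."""
    state = initial_state
    for command in recording:
        state = apply_input(state, command)
    return state
```

Because nothing else influences apply_input (no clocks, no randomness, no hidden globals), replaying a recording from the same starting state retraces the original run exactly, bugs included.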

5

This idea eventually evolved into immutable borrowing, the core concept that enables regions.

An Alien Concept

I feverishly scribbled it in a notebook for hours. When I woke up, I discovered this notebook, and read the page.

"...what the hell am I looking at?"

This was a common occurrence, honestly. "Notes from Night Evan" is an ongoing joke in my circle, about the weirdest things I'd written to myself. And this was definitely up there.

"Huh."

It kind of made sense.

And then I noticed a triple-underlined statement: "can skip run-time safety overhead!".

"Holy hells!" I said to myself and sprang out of bed. It all came rushing back, and I finally remembered all the details. 6

With this technique, it might be possible to skip almost all memory-safety overhead in a program, and it didn't even require aliasing restrictions! 7 8

I knew I had to bring it into Vale.

6

At the time, Vale was based on a single-ownership form of reference counting called constraint references, and Night Evan had realized that references into immutable regions could completely skip the reference counting overhead. Nowadays, Vale is based on generational references, but regions have the same optimizing effect for them too.

7

It's generally impossible to skip all memory safety overhead in a program. One will always need at least bounds checking.

8

This is referring to the borrow checker's restrictions, which enforce that nobody else can have a reference to anything that might be modified.

More Alien Concepts

I posted the designs, and people started reaching out to me about other designs in the space. I had a call with Forty2's Marco Servetto and we talked about our different takes on regions, and someone shared Milano, Turcotti, and Myers's work on concurrency models.

I'm told that regions look like some sort of unified generalized form of affine typing (one of the concepts underlying Rust's borrow checker), Pony's iso permission, and a few other languages' mechanisms, all combined into one.

I hammered on this design in my head for a long time while I built out Vale's foundations. 9 10 Eventually, generational references and regions combined to form something that looked suspiciously like an entirely new paradigm. 11

9

I haven't stopped hammering on it in my mind, even now. People still find me randomly staring at walls, thinking about regions. It's only mildly concerning.

10

The biggest revelation came while I was lying on the floor in a random Georgia Airbnb named "Little Mexico": I realized that we could make "one-way isolation". Check out One-way Isolation for more on this. It's pretty trippy.

11

One person in our server described it as a "higher-level, more precise, opt-in Rust".

The First Pieces

Implementing something like this is no easy task. It took years to implement Rust's borrow checker, and I don't have an entire team behind me.

Large projects like these require more than coding. They require patience, introspection, planning, a high yak tolerance, and a bit of insanity. 12

The most important technique in planning something like this is to break up large tasks, even if it slows you down. Don't embark on monumental rewrite-the-world odysseys, and don't implement large features all at once.

There are a few reasons for this, 13 but the biggest one is that we humans need timely rewards for our emotional investment, to avoid burnout. A good sub-project is one that someone can work on for a month or two at most, then release, feel the satisfaction of it being done, and see the users' delight in using it. 14 15

A few years ago, I worked on a large refactor to add internal namespacing 16 to the Vale compiler. After too many months of working on it, I burned out hard. I had to take a break from Vale for two months before my motivation returned. I didn't want that to happen again with regions.

To avoid that kind of situation, we plan intermediate, useful, cohesive goals.

One possibility jumped out immediately. I had a post-regions plan to make it so we could safely call into code written in other languages that don't have as strong safety guarantees, such as C, Zig, or Rust. To do this, the backend needed to be aware of multiple regions, and properly handle data between them.

After five weeks of adding partial backend region support and three weeks of FFI code, we released Fearless FFI.

That was three weeks of not working on regions, but at the end I had a complete new feature which we could release and talk about.

The friendly folks in our Discord server gave me hearty congratulations, and we also got a couple of new sponsors! One even told me how much he believed in what I was doing for the software world, and that he hoped I could succeed in making systems programming much more accessible to the everyday programmer.

Words cannot express how much words like that mean to me, and how much they keep me going when things get rough. 17

12

After all, what kind of sane person would spend years working on a language?

13

Some more reasons:

  • It helps stave off the boredom from working on one thing for too long.
  • Your stakeholders (investors, managers, users) like to see active updates, which indirectly helps keep the project alive.
  • It helps avoid merge problems (unless you're continuously merging behind a feature flag, which is often the best approach in my opinion).
  • Combined with other measures, it helps new engineers resolve their impostor syndrome.

14

The reward can come in other ways, of course. Sometimes, the reward can even come from doing the work itself, if the problem is particularly fascinating. However, most projects are mostly made up of tedious tasks, so the reward needs to come from elsewhere.

15

There are ways to mitigate it (demos, congratulations, etc.) but if the sub-project continues too long unchecked, we find ourselves continuing only because of our momentum and force of will.

16

Internal namespacing means that locals, generic arguments, generic variables, lambdas, etc. all have absolute names. For example, x becomes myFunc.lambda3.x. Because multiple functions can share a name, we also add disambiguating information such as parameter types to support overloads: two functions named myFunc become myFunc(int) and myFunc(bool), so each has a unique name.
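Here's a minimal Python sketch of that naming scheme. The absolute_name helper and its parameters are hypothetical, written only to reproduce the examples above; the Vale compiler's real name representation isn't shown in this post.

```python
def absolute_name(scope_path, name, param_types=None):
    """Build an absolute name from the enclosing scopes plus the local name.

    scope_path:  enclosing scopes, outermost first, e.g. ["myFunc", "lambda3"]
    param_types: for functions, the parameter types used to tell overloads apart
    """
    full = ".".join(list(scope_path) + [name])
    if param_types is not None:
        # Overloads share a name, so the parameter types make each one unique.
        full += "(" + ", ".join(param_types) + ")"
    return full
```

With this, absolute_name(["myFunc", "lambda3"], "x") gives myFunc.lambda3.x, and the two overloads come out as myFunc(int) and myFunc(bool).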

17

Especially when people ask me why I'm working on a language when the perfect language already exists. It helps to know that others see the value of making programming easier and more accessible to more programmers!

Uncomfortably Exciting

At my last job, we often used the phrase "uncomfortably exciting" to describe when you're a bit scared of your own idea. That definitely captures how it feels to work on regions!

In the next post, I'll talk about the massive yak that reared up when trying to add regions to the frontend. 18

That's all for now! I hope you enjoyed this article. Keep an eye out for the next one on our RSS feed, Twitter, Discord server, or subreddit!

See you next time!

- Evan Ovadia

18

Spoiler alert: it's generics. It's always generics.