Explorations in Vale

In this article, I'm going to show you that:

  • While Rust is today's gold standard of speed and safety, it's not perfect; the borrow checker's safety guarantees incur both speed and memory overhead.
  • There are a lot of new projects and ideas that are improving on these drawbacks, to make faster and safer languages.

I'm a big fan of Rust, and think it's a huge leap forward in language design. Rust has shown us a lot of techniques that work really well.

This isn't a hit piece; it's an honest criticism of Rust. 0 Still, those who identify heavily with it might want to stop reading here. Keep in mind that I'm comparing with Rust only because, as mentioned before, Rust is the gold standard for safety and speed.

Some assumptions up-front:

  • This article assumes we're not using any unsafe, except indirectly the unsafe in the standard library.
  • This article also assumes we're not using any Rc because it's widely regarded as un-idiomatic 1, but also because it slows Rust down enough to make this article moot. This article is about achieving perfect safety with maximum performance.

Notes
0

Rust is what ignited my passion for programming language development in the first place! So it's rather ironic that I'm writing an article about what might come after Rust.

1

See this article for this perspective.

Rust's Tradeoffs

The Speed Tradeoff

Contrary to popular belief, the borrow checker incurs run-time overhead.

This isn't to say that Rust is slow. Despite the below costs, Rust is still one of the fastest languages in existence. It's not perfect though, and that means languages of the future might do better.

An Example

To illustrate the borrow checker's overhead, let's say we have a Vec<Spaceship>, and a dogfight function that:

  • Takes two &mut Spaceship parameters (a and b),
  • Uses a's weapons member to damage b's health...
  • ...and vice versa...
  • ...and repeat until one runs out of health.

The borrow checker rejects our call dogfight(&mut spaceships[shipAIndex], &mut spaceships[shipBIndex]). 2
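
To make this concrete, here's a minimal sketch of the setup (the Spaceship fields and the numbers are assumptions for illustration):

```rust
struct Spaceship { health: i32, weapons: i32 }

fn dogfight(a: &mut Spaceship, b: &mut Spaceship) {
    // a and b take turns damaging each other until one runs out of health
    while a.health > 0 && b.health > 0 {
        b.health -= a.weapons;
        if b.health <= 0 { break; }
        a.health -= b.weapons;
    }
}

fn main() {
    let mut spaceships = vec![
        Spaceship { health: 100, weapons: 10 },
        Spaceship { health: 100, weapons: 15 },
    ];
    let (ship_a_index, ship_b_index) = (0, 1);
    // error[E0499]: cannot borrow `spaceships` as mutable more than once at a time
    dogfight(&mut spaceships[ship_a_index], &mut spaceships[ship_b_index]);
}
```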

The three main workarounds: 3

  • Use split_at_mut twice to get a mutable reference to the first spaceship and two slices for the remaining Spaceships, and search through those to get a mutable reference to the other.
  • Change dogfight to accept the entire Vec<Spaceship>, plus a shipAIndex and a shipBIndex (sketched after the next list). 4
  • Store the ships in a Vec<RefCell<Spaceship>> instead.

Each of these three workarounds introduces some run-time overhead. Respectively:

  • Each split_at_mut involves branching, and if we ever want a reference to any other Spaceship, we do more branching to figure out which of our three slices has it.
  • Every time we "dereference" an index, it causes a bounds check.
  • RefCell incurs overhead every time we borrow its contents.
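
To make the second workaround concrete, here's a minimal sketch (reusing the hypothetical Spaceship from above); note that every indexing expression incurs a bounds check:

```rust
fn dogfight(ships: &mut Vec<Spaceship>, ship_a_index: usize, ship_b_index: usize) {
    // the check the borrow checker couldn't do statically, done at run-time
    assert!(ship_a_index != ship_b_index);
    loop {
        // every indexing expression below is bounds-checked at run-time
        let damage = ships[ship_a_index].weapons;
        ships[ship_b_index].health -= damage;
        if ships[ship_b_index].health <= 0 { break; }
        let damage = ships[ship_b_index].weapons;
        ships[ship_a_index].health -= damage;
        if ships[ship_a_index].health <= 0 { break; }
    }
}
```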

Memory safety is never free. Memory-safe languages always incur run-time overhead. One just has to zoom out enough to see it. 5

There are other safe and fast memory models with different tradeoffs (covered further below) and more are being discovered every year! It's an exciting time for language design.

2

It rejects it because it can't guarantee that shipAIndex != shipBIndex.

3

There are other tricks we can do to work around the borrow checker here, but none that are generally applicable in any situation. These three solutions can always be used.

4

We could also sunder our Vec<Spaceship> into a parallel Vec for health, and Vec for weapons. But at that point, we'd unwittingly have been forced into an ECS framework, which is sometimes not the best choice of architecture.

5

One can contrive very small and simple programs that use no Vecs, but such programs are incredibly rare.

Vecs and Indexing In General

In general, Rust forces us to use a lot of Vecs, which always have costs.

Not only does indexing into them require bounds checks, but expanding them is very costly...

  • We call malloc, which is expensive.
  • We copy all the data, which causes a lot of cache misses. 6

...and not as amortized as one would think. 7

If one is using generational indices, looking up an element will incur a branch when checking that the generations match.

6

The CPU can avoid cache misses for linear copies, but only for large Vecs (~50 cache lines).

7

For example, if a Vec is used for a temporary calculation in a function, the Vec's resizing cost isn't amortized across all the function calls.

The Memory Tradeoff

Rust forces us to use a lot of Vecs to store our objects. However, Vecs have a tradeoff: they use a lot of memory. Your Vec with three elements might actually be 128 elements wide, if at some point in the past it held 65 elements. A Rust program's memory usage thus tends to ratchet upward as time goes on. 8
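
For example (a sketch that relies on the current standard library's doubling growth strategy, which is an implementation detail):

```rust
fn main() {
    let mut v: Vec<i32> = Vec::new();
    for i in 0..65 {
        v.push(i);
    }
    // With today's doubling growth strategy (an implementation detail
    // of std), the 65th push grows the buffer to capacity 128.
    v.truncate(3);
    assert_eq!(v.len(), 3);
    // truncate frees nothing: the buffer keeps its high-water-mark
    // capacity until shrink_to_fit is called explicitly.
    assert!(v.capacity() >= 65);
}
```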

A very common technique in idiomatic Rust is to use generational indices to refer to objects. These too have memory overhead: a "current generation" integer for a slot in the array, and a "target generation" integer in every generational_index that is serving as a non-owning reference.
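
Here's a minimal sketch of the pattern (names are hypothetical), showing both the extra integers and the generation-check branch mentioned earlier:

```rust
struct Slot<T> {
    current_generation: u64, // memory overhead on every slot
    value: Option<T>,
}

struct GenerationalIndex {
    index: usize,
    target_generation: u64, // memory overhead on every non-owning reference
}

struct Arena<T> {
    slots: Vec<Slot<T>>,
}

impl<T> Arena<T> {
    fn get(&self, gi: &GenerationalIndex) -> Option<&T> {
        let slot = self.slots.get(gi.index)?; // bounds check
        if slot.current_generation == gi.target_generation { // the branch
            slot.value.as_ref()
        } else {
            None // the slot has since been reused by another object
        }
    }
}
```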

If a language could offer more flexibility on where objects reside and how they're represented, then it might avoid this memory overhead.

The Safety Tradeoff

Rust reaches a miraculous level of safety compared to, say, C++. Its ability to be both very safe and low-level is truly an achievement.

However, it's only a moderately safe language compared to other languages, for example JavaScript. This is because of the unsafe keyword.

As said in the beginning, this article is assuming that we're not using any unsafe in our code. However, we can never be certain if a dependency is using unsafe wisely, and it's rarely feasible to audit all dependencies' usage of unsafe.

This problem is exacerbated by the borrow checker, which makes unsafe very tempting for some developers. 9 A very widely used framework, Actix, had an alarming amount of memory unsafety. 10

Contrary to popular belief, not all memory unsafety happens inside unsafe blocks:

  • Risky behavior inside an unsafe block can trigger memory unsafety outside, in safe Rust code.
  • Bugs in safe Rust code can trigger memory unsafety inside unsafe blocks, as the sketch below illustrates.
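
Here's a contrived sketch of the second point (all names hypothetical): the unsafe block is locally "correct," but a bug in safe code breaks the invariant it relies on:

```rust
struct TinyBuf {
    data: [u8; 4],
    len: usize, // invariant: len <= 4, relied upon by `last`
}

impl TinyBuf {
    // Safe code can break the invariant without ever writing `unsafe`...
    fn set_len(&mut self, len: usize) {
        self.len = len; // bug: no validation that len <= 4
    }

    fn last(&self) -> u8 {
        // ...so this unsafe block, which trusted the invariant, now
        // reads out of bounds: memory unsafety triggered by a bug in
        // *safe* Rust code.
        unsafe { *self.data.get_unchecked(self.len - 1) }
    }
}
```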

If a language can achieve Rust's performance without dropping into unsafe, and make a few other adjustments, 11 then this problem evaporates. I believe such a thing is possible.

8

There's another paradigm that gives you speed if you sacrifice memory: garbage collection! GC becomes faster and faster the more memory you give it.

9

The experienced Rustacean knows that any program is technically expressible in idiomatic Rust, but not everyone who contributes to the ecosystem is so experienced.

10

We shouldn't pretend that we can simply grep for unsafe in our dependencies and immediately see problems. It's not that simple; it's very easy for a bug in safe Rust to violate the assumptions in an unsafe block.

11

It's important to note that offering FFI (a foreign function interface) is just as bad as unsafe blocks, so any language that wishes to improve upon this will need to address FFI's potential unsafety as well.

The Flexibility Tradeoff

In idiomatic Rust (without Rc), the borrow checker is very limiting, and rejects a staggering number of safe patterns.

For example, Rust won't let you use the observer pattern:

```rust
let my_object = SomeObject(...);
my_button.add_click_observer(|| {
    // To observe my_object, the closure must capture it, but the borrow
    // checker rejects a capture that outlives this scope. (on_click is
    // a hypothetical method, for illustration.)
    my_object.on_click();
});
```

There's only one option here: use globally-accessible data. 12
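
For example, since a non-capturing closure can still reach global state, something like this sketch satisfies the borrow checker (Button and add_click_observer are hypothetical, as above):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Globally-accessible state: the closure below captures nothing,
// so the borrow checker has nothing to object to.
static CLICK_COUNT: AtomicU64 = AtomicU64::new(0);

fn register(my_button: &mut Button) {
    my_button.add_click_observer(|| {
        CLICK_COUNT.fetch_add(1, Ordering::Relaxed);
    });
}
```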

A great example of this is the NAME HERE library. Because Rust can't handle observers, it requires that all your data be in a SOMETHING struct, accessible to all observers.

An experienced architect will back up, and question why we have to use a framework for such a simple and safe pattern.

There are countless other examples of patterns that the borrow checker rejects.

If a language could offer static analysis on par with the borrow checker or integrate the borrow checker well with other strategies, this problem could be solved.

Development Time -> Efficiency

Because Rust is so inflexible, it takes longer to use, even after the initial learning curve. Compared to a language like JavaScript, you will spend much more time refactoring your code to appease the borrow checker whenever you make a change.

That development time is valuable, because it can be spent on optimizing the code that matters. If a hypothetical language was half as time-consuming as Rust, then that extra time could be spent optimizing the hot path, 13 which could result in a more efficient program in the end than the Rust one.

12

This can either be an actual global, or some other data that's widely accessible to most of the program. Both are harmful.

13

It's often said that a program spends 95% of its time in 10% of the code. The "hot path" refers to that 10%.

Where can we go from here?

Rust isn't perfect, and one day, we'll find a language that will give us better safety, or more speed, or both. There are already a lot of fascinating techniques on the horizon which show a lot of promise.

Lobster: Compile-Time Reference Counting

Lobster is a new language that achieves some pretty amazing performance. At compile-time, its algorithm looks at all aliases for a given object, infers which one should be the "owning" reference, and makes all other references into borrow references. For the rare cases where this doesn't work, it falls back on reference counting. The algorithm eliminates 95% of all reference count adjustments.

It's like an "automatic borrow checker" in a way; it produces code that's similar to a Rust program that uses the occasional Rc, but much easier to use and with more potential optimizations since Rc is built into the language itself.
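
As a loose analogy, here's roughly what Lobster's output resembles, written out by hand in Rust (the functions are hypothetical, and Lobster performs this inference automatically):

```rust
use std::rc::Rc;

fn example() {
    // What Lobster infers automatically, written out by hand:
    let ship = Spaceship::new(); // inferred "owning" reference
    inspect(&ship);              // other aliases become borrows...
    inspect(&ship);              // ...with no reference-count traffic

    // The rare case the analysis can't prove falls back to reference
    // counting, like an occasional Rc in a Rust program:
    let shared = Rc::new(Spaceship::new());
    store_for_later(Rc::clone(&shared));
}
```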

Additionally, it isolates memory between threads (similar to Rust and Vale), which drastically reduces reference counting's cost.

This is exciting because it shows that reference counting can be incredibly fast. There's a lot of different directions we can go with something like this, so keep an eye out for Lobster!

Cone: Maximum Flexibility via Regions

Cone is a language exploring what happens when we build the borrow checker on top of any kind of "region" the user wants: lexical, arena, pool, reference counting, or garbage collection; the user can also easily code their own region.

I'm particularly excited about this because a user can create their own region to suit a particular use case. Perhaps these could soon be possible:

  • If someone knew that a region would only ever contain 50,000 objects, they could make its pointers 16 bits wide (sketched after this list), which is more cache-friendly than the usual 64 bits and could drastically improve performance.
  • If someone knew that objects only pointed to nearby objects in memory, they could use offsets instead of pointers.
  • Maybe someone could make some sort of interning region, where all objects are unique, to drastically speed up comparisons.
  • If someone knew that we often took snapshots of a region, one could use persistent data structures to index various versions of any piece of data.
  • If someone wanted to rewind an entire region, one could have the region keep a changelog.
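
To sketch just the first of these ideas (in Rust for familiarity; everything here is hypothetical):

```rust
// A hypothetical region holding at most 65,536 objects, so a "pointer"
// into it is a 2-byte handle rather than an 8-byte pointer; four times
// as many references fit in every cache line.
struct SmallRegion<T> {
    objects: Vec<T>,
}

#[derive(Clone, Copy)]
struct Handle(u16);

impl<T> SmallRegion<T> {
    fn alloc(&mut self, value: T) -> Handle {
        let index = self.objects.len();
        assert!(index <= u16::MAX as usize, "region is full");
        self.objects.push(value);
        Handle(index as u16)
    }

    fn get(&self, handle: Handle) -> &T {
        &self.objects[handle.0 as usize]
    }
}
```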

I suspect that one day, we'll see a flood of interesting new memory management techniques, specialized for all sorts of use cases. After that ecosystem of regions forms, we'll start to learn when to use one over the other, how to seamlessly use them together, and achieve some kind of memory management enlightenment.

Region Borrow Checking

Vale is exploring how RAII interacts with different kinds of memory regions. Because all mutable objects are owned by a single owner, the language can "secede" an object (and all those it indirectly owns) into an isolated region, which we can then keep separate or merge with other regions.

With this, Vale makes it so that a function call (and any functions it indirectly calls) can use a specific region for its own allocations, while still being able to access the caller's region.

It keeps these separate by tracking which region every reference is pointing into.

You can read more about this at Zero-Cost References with Regions.

This is particularly exciting because:

  • It makes it incredibly easy to use a bump allocator or object pooling on a temporary basis for any function, without having to change that function.
  • Pure functions can know that the caller's region is completely immutable, which lets us do the same optimizations Rust does.

Hybrid-Generational Memory

Recently, the Vale team invented a new memory management paradigm called generational memory, an alternative to borrow checking, reference counting, and garbage collection. In generational memory, every allocation has an "actual generation" number at the top, 14 which means "I am the nth inhabitant to occupy this location in RAM." Every (non-owning, non-borrow) reference is made of a pointer and a "target generation" number. Before each dereference, the program checks that the allocation's actual generation still matches the reference's target generation, effectively checking whether the target object is still alive.
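
Here's a rough Rust model of that check (hypothetical layout; in Vale the check is generated by the compiler):

```rust
// Hypothetical layout: every allocation starts with its generation.
// (Per footnote 14, these allocations live in a dedicated heap, so the
// generation number is never overwritten by unrelated data.)
struct Allocation<T> {
    actual_generation: u64, // "I am the nth inhabitant of this spot in RAM"
    value: T,
}

struct GenerationalRef<T> {
    allocation: *const Allocation<T>,
    target_generation: u64, // the inhabitant this reference expects
}

impl<T> GenerationalRef<T> {
    // The check inserted before every dereference, written out by hand.
    fn deref(&self) -> &T {
        let alloc = unsafe { &*self.allocation };
        assert!(
            alloc.actual_generation == self.target_generation,
            "error: target object no longer exists"
        );
        &alloc.value
    }
}
```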

Generational memory is 2.5x as fast as reference counting, 15 so this is particularly promising. You can read more about this at Generational References.

Vale's next goal is to build on generational memory to make hybrid-generational memory, which blends in scope tethering 16 to enable automatic borrow checking, much as an experienced Rust programmer would do by hand, but without the difficulties of the borrow checker. You can get a sneak peek at the design at Hybrid-Generational Memory.

Constraint References

Address Sanitizer is a tool that numerous companies use to gain more confidence in their programs' memory safety. It instruments programs during development and testing to detect invalid memory accesses. 17 18 Then, after rigorous testing with it, a team will turn off Address Sanitizer in production, to maintain maximum performance. For certain use cases (such as games and apps), this is a perfect balance between safety and speed.

Constraint references go one step further than Address Sanitizer: instead of waiting until you dereference a dangling pointer, a constraint reference can detect the moment a pointer becomes dangling. 19 With this, a program can detect risky pointers before they're even dereferenced, similar to how SQL's foreign key constraints work (hence the name).
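
One way to model that behavior is with a count of live constraint references, asserted at destruction time; a hedged sketch (a ConstraintRef type, not shown, would increment the count on creation and decrement it on drop):

```rust
use std::cell::Cell;

// An owning value that counts its live constraint references. Freeing
// it while any still exist halts the program immediately, rather than
// waiting for a later dereference of a dangling pointer.
struct ConstraintOwned<T> {
    value: T,
    live_refs: Cell<usize>,
}

impl<T> Drop for ConstraintOwned<T> {
    fn drop(&mut self) {
        assert!(
            self.live_refs.get() == 0,
            "constraint violation: freed an object that still has references"
        );
    }
}
```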

This was first implemented by Adam Dingle and David Bacon in an experimental language named Gel. We experimented with this in Vale and were wonderfully surprised at how easy constraint references were to learn and use, and how few constraint halts we encountered (only two in a 1,500-line program, both of which actually found legitimate mishandling). We decided to keep it in the language as Assist mode, and added an Unsafe mode that turns off all protections once one deems their testing with Assist mode sufficient. You can read more about this at The Next Steps for Single Ownership and RAII.

This is particularly exciting because, given a good test suite, it's conservative enough to give high confidence in safety, 20 with zero memory or speed overhead, which will be the best choice for a vast number of programs.

And beyond!

The above languages are pushing the state of the art in regions, static analysis, and new memory management paradigms.

I predict that the 2020s will be a decade of exploration in the best ways to blend static analysis with a combination of different memory management paradigms in the same program, in a way that Rust programmers manually do right now.

I'd love to hear your thoughts! Swing by our discord server or the r/vale subreddit!

Afterword: More Ideas, Experiments, and Languages!

If there's something that writing this article has taught me, it's that we need more programming languages. We should explore these realms more, because there's so much interesting potential! If I had to list a few interesting ideas:

  • Virtual memory (mmap, mprotect, etc.) tricks are constantly hinting at hidden potential.
  • Perhaps there's more that languages can do in the realm of concurrency, such as harnessing multiple cores' caches more intelligently.
  • There are interesting blends in the vast no-man's-land between native code and garbage collection.

There are a lot of other ideas and experiments and new languages using techniques which I couldn't fit here. In a future article, I could cover Nim's ORC, Zig's ReleaseSafe, Basil's stack-based model, and much much more. Comment and let me know what you'd like to hear about!

14

This requires that our malloc uses size classes (like jemalloc and mimalloc) in a heap specifically for these "generationed allocations" so we can be certain nothing will overwrite them.

15

This is comparing naive generational memory to naive reference counting. In other words, there are no optimizations on top of either.

16

This means that locals can share ownership of an object by marking its "tethered" bit, preventing it from disappearing and allowing more aggressive static analysis.

17

This means that the compiler inserts assembly instructions before every dereference to make sure we're accessing only valid memory.

18

Asan is basically like generational memory, but 20-40x slower.

19

If the user intends for a reference to dangle, then they can use a weak reference instead of a constraint reference.

20

Keep in mind, Rust programs (especially those with dependencies) still have to worry about memory unsafety too, because of unsafe.