The Most Memory Safe Native Programming Language

Memory safety is an incredibly useful property for a programming language to have. It protects us from maddening bugs, devious vulnerabilities, and cantankerous shenanigans.

Getting memory safety with predictable performance [0] is quite a challenge! Most languages need to sacrifice memory safety for other features, such as FFI or unsafe blocks.

We've discovered a better way to offer memory safety.

Generational References' Hidden Superpower

Vale starts with a foundation of generational references. With generational references, every object has a "current generation" integer which we increment on free, and every pointer carries a "remembered generation" number copied from the object. When we dereference a pointer, we assert that those two numbers still match. See Generational References for a more in-depth explanation!
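To make the mechanism concrete, here is a minimal Python sketch of the generation check described above. It is a simulation for illustration only, not Vale's implementation: the names Heap, GenRef, and UseAfterFree are invented here, and a real native implementation would store the generation next to the object in memory rather than in Python lists.

```python
class UseAfterFree(Exception):
    """Raised when a reference outlives the object it pointed to."""


class Heap:
    """A toy allocator whose slots each carry a 'current generation' integer."""

    def __init__(self):
        self.generations = []  # current generation of each slot
        self.values = []       # payload stored in each slot
        self.free_slots = []   # slots available for reuse

    def alloc(self, value):
        if self.free_slots:
            slot = self.free_slots.pop()
            self.values[slot] = value
        else:
            slot = len(self.values)
            self.values.append(value)
            self.generations.append(0)
        # The reference remembers the slot's generation at creation time.
        return GenRef(self, slot, self.generations[slot])

    def free(self, ref):
        ref.deref()  # freeing through a stale reference is also an error
        self.generations[ref.slot] += 1  # invalidates every outstanding reference
        self.values[ref.slot] = None
        self.free_slots.append(ref.slot)


class GenRef:
    """A pointer (slot index) plus the generation it remembers."""

    def __init__(self, heap, slot, remembered_generation):
        self.heap = heap
        self.slot = slot
        self.remembered_generation = remembered_generation

    def deref(self):
        # The generation check: the two numbers must still match.
        if self.heap.generations[self.slot] != self.remembered_generation:
            raise UseAfterFree(f"slot {self.slot} was freed")
        return self.heap.values[self.slot]
```

In this sketch, a reference taken before a free keeps failing its check even after the slot is reused for a new object, because the slot's generation has moved on while the reference still remembers the old one.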

Most other languages use garbage collection, reference counting, or borrow checking, but we chose generational references because they're fast [1] and more flexible than borrow checking, allowing us to use common safe patterns like observers, higher RAII, the dependency injection pattern, delegates, back-references, graphs, and so on.

We also discovered that they enable Vale to have complete memory safety, something no native language has been able to achieve.

This is because generational references:

Are not reference counted, so if we give one to some extern C code, we don't need to trust it to maintain reference counts to uphold our memory safety. [2]
Are able to detect when an object has been deallocated.
Require no unsafe blocks, because there is no borrow checker we need to work around.

Generational references are pretty stellar for memory safety. However, there's one remaining problem: what about FFI?

FFI and Region Isolation

A Foreign Function Interface (FFI) lets a language call into another language's code. For example, Objective-C might make a pointer to some data and pass it to C, which might have a bug that corrupts that data.

This is called "leaky safety", and its bugs are very difficult to track down, because their symptoms manifest so far from their cause.

This can also happen when a language has unsafe escape hatches. If some unsafe code corrupts some memory, it can cause undefined behavior in safe code. For example, see this Rust snippet where an unsafe block corrupts some memory that's later used by the safe code.

In all these cases, we know that the unsafe language was involved somewhere in the chain of events, but since the bugs actually happen later on, in supposedly safe code, there's no easy way to identify which unsafe code was the original culprit.

To solve this, Vale has Fearless FFI, which decouples and isolates unsafe C data from safe Vale data. It does three things.

Separates the safe memory from the unsafe memory (such as the memory managed by C). This includes:

Not allowing safe objects to contain unsafe objects.
Not allowing unsafe objects to contain safe objects.
Using a different stack for the unsafe code.

Allows references between the two:

A safe object can contain a reference to an unsafe object.
An unsafe object can contain a reference to a safe object, but that reference is automatically scrambled.

Enables passing memory between the two by copying, also known as message passing.

See Fearless FFI for more on this!

This protects us from any bugs in C that might otherwise accidentally corrupt our Vale data. [3]
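The isolation rules above can be pictured with a small Python sketch. Everything in it is an assumption made for illustration: the Boundary class, its method names, and the choice of random tokens as the "scrambling" scheme are invented here, and the article does not say how Vale actually scrambles references. The sketch only shows the two ideas named above: foreign code gets copies of data rather than the data itself, and any reference it holds to a safe object is an opaque value it cannot dereference on its own.

```python
import copy
import secrets


class Boundary:
    """Hypothetical boundary between a safe region and unsafe foreign code."""

    def __init__(self):
        self._handles = {}  # scrambled handle -> safe object

    def export_ref(self, safe_obj):
        # Foreign code receives an opaque random token, never a real address.
        handle = secrets.randbits(64)
        while handle in self._handles:
            handle = secrets.randbits(64)
        self._handles[handle] = safe_obj
        return handle

    def resolve(self, handle):
        # A forged or corrupted handle is rejected instead of dereferenced.
        if handle not in self._handles:
            raise ValueError("unknown handle from foreign code")
        return self._handles[handle]

    def call_foreign(self, foreign_fn, message):
        # Message passing: the foreign side works on a deep copy,
        # and its result is copied again on the way back.
        result = foreign_fn(copy.deepcopy(message))
        return copy.deepcopy(result)
```

With this shape, a buggy foreign function that mangles its argument only damages its own copy, so the failure stays on the unsafe side of the boundary instead of surfacing later in safe code.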

Performance

Generational references are very fast, but if we want that extra sliver of performance, Vale plans to add the Check Override Operator.

Note that the check override operator, region borrow checker, and hybrid-generational memory are all upcoming features, not available yet. We mention them here to show our plans for balancing safety with speed!

Most generation checks will be skipped by the region borrow checker and Hybrid-Generational Memory. For example, we recently implemented a Cellular Automata algorithm, and by our measurements, regions and hybrid-generational memory would eliminate every single generation check in the entire algorithm.

Still, for the occasional generation check that those two might not eliminate, we have the Check Override Operator, which will skip the generation check for a generational reference.

"But what if a dependency uses a check override operator, and causes memory unsafety? Isn't this just as bad as an unsafe block?"

It would seem so, except for one key detail: the check override operator is ignored by default for any dependencies. One must explicitly opt in, per dependency, before that dependency's check overrides take effect.

Most people will use a dependency with its checks on, so everyone will find out very quickly if there's any unsafe behavior in practice. Anyone who wants that extra sliver of performance can then opt in to skipping the checks in release mode, with a little more confidence that any unsafety will already have been detected by other users of the library, or in development or testing.

And if a user isn't comfortable with that for their situation, they simply stick with the defaults, which ignore the check override operator.
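The opt-in rule can be sketched in a few lines of Python. This is only a model of the policy described above, since the check override operator is an upcoming feature: BuildConfig, honored_modules, and the override flag are hypothetical names, not Vale syntax or build settings.

```python
class StaleReference(Exception):
    """Raised when a generation check fails."""


class Obj:
    def __init__(self, value):
        self.value = value
        self.generation = 0  # incremented when the object is freed


class Ref:
    def __init__(self, obj):
        self.obj = obj
        self.remembered = obj.generation


class BuildConfig:
    """Hypothetical build settings listing modules whose overrides are honored."""

    def __init__(self, honored_modules=()):
        self.honored_modules = frozenset(honored_modules)


def deref(ref, module, config, override=False):
    # An override only takes effect if the build opted in for this module.
    skip_check = override and module in config.honored_modules
    if not skip_check and ref.obj.generation != ref.remembered:
        raise StaleReference(f"stale reference used in {module}")
    # With the check skipped, a stale reference goes undetected here;
    # in a native language that would be undefined behavior.
    return ref.obj.value
```

Under a default BuildConfig(), a dependency's override is ignored and a stale reference still raises. Only a build that lists the dependency in honored_modules takes on the risk of skipping the check.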

We think this is a perfect tradeoff to allow extra performance when it's critical, while not compromising the safety of the language or ecosystem.

We're also thinking of adding a way to skip all generation checks for a given block of code. [4]

Tying it All Together

We talked about three mechanisms:

Generational references
Fearless FFI
Check Override Operator

With these measures in place, Vale will be the first completely memory safe native language!

Side Notes

(interesting tangential thoughts)

[0] Tracing garbage collection (like in Java) is a great solution for memory safety, but there are cases where more predictable performance is highly desirable, such as in embedded devices, games, or certain kinds of servers.

[1] Generational references are not just fast; they could get even faster when we introduce regions and Hybrid-Generational Memory.

[2] For example, when Python code sends a Python object into C, if the C code doesn't correctly call Py_INCREF, it will corrupt Python's memory and cause some mysterious behavior later on in the Python code.

[3] We could also protect against malicious code with sandboxing, via WebAssembly or subprocesses. This is a planned feature, see Fearless FFI for more on this!

[4] We might even call it an unsafe block, if that doesn't cause confusion with other languages.