Immutable Code in Unison
Surprisingly good things happen if your language doesn’t let you change the code you write.
I’m going to show you some magic, and then reveal how it is done.
To do that, I’m going to use the Unison programming language, which enforces both immutable data and immutable code.[1]
This is a long post: I promise it’s worth reading it all. By the end, you’ll see why immutable code has the potential to fix both dependency hell and software distribution: no more package managers, no more builders/bundlers, and no more complex deployment.
Before we start, there are two things you need to know:
Unison function definitions are like Haskell’s. Here’s a function that multiplies two natural numbers:[2]
Unison
times: Nat -> Nat -> Nat
times a b = a * b
The first line is the type signature: times takes a natural number (Nat), then another Nat, and returns a Nat.
The second line is the definition: the function has two parameters, a and b, shown to the left of the equals sign, and a body, to the right. The value of the function when called is the value of the body.
Unison doesn’t use (long lived) source files. Instead, it manages all your code itself inside a namespaced repository.[3] Your interface to this repository is the ucm[4] command.
In normal use, you cd into some temporary, empty directory and run the ucm command. In another window, you use your favorite editor to create Unison source files in that same directory. When you save a file, ucm compiles its contents and shows you the functions it found. It also runs any tests you’ve defined. If you’re happy, you can add the function to (or update the function in) the Unison code repository.
In the magic show that follows, you’ll see ucm sessions along with Unison source code that I entered into my editor. I’ll label these blocks with UCM or Unison for clarity.
Abracadabra
We start in ucm, where we create a namespace and import the standard library into it.
UCM
.> cd magic
.magic> fork .base lib.base
Done.
Now, in the editor, we’ll create a function that adds its arguments.
Unison
add: Nat -> Nat -> Nat
add a b = a + b
test> add.tests = check (add 3 4 == 7)
The last line defines a test (which becomes its own function). We bind it to the name add.tests. This isn’t necessary, but it makes it easier to see what’s going on.
We save the source, and the ucm window bursts into life:
UCM
⍟ These new definitions are ok to `add`:
add : Nat -> Nat -> Nat
add.tests : [Result]
Now evaluating any watch expressions (lines starting with `>`)...
4 | test> add.tests = check (add 3 4 == 7)
✅ Passed : Proved.
Ucm now has local copies of our add and add.tests functions.
We’ll add these two functions to the repository:
UCM
.magic> add
⍟ I've added these definitions:
add : Nat -> Nat -> Nat
add.tests : [Result]
Back in our editor we’ll define two new functions, square and sumSquare, along with a couple of tests.
Unison
square: Nat -> Nat
square a = a * a
test> square.tests = check (square 3 == 9)
sumSquare : Nat -> Nat -> Nat
sumSquare a b = add (square a) (square b)
test> sumSquare.tests = check (sumSquare 3 4 == 25)
Save it, and ucm reports:
UCM
⍟ These new definitions are ok to `add`:
square : Nat -> Nat
square.tests : [Result]
sumSquare : Nat -> Nat -> Nat
sumSquare.tests : [Result]
4 | test> square.tests = check (square 3 == 9)
✅ Passed : Proved. (cached)
9 | test> sumSquare.tests = check (sumSquare 3 4 == 25)
✅ Passed : Proved. (cached)
Add square and sumSquare to the repository (along with their tests).
UCM
.magic> add
⍟ I've added these definitions:
square : Nat -> Nat
sumSquare : Nat -> Nat -> Nat
At this point, feel free to exit your editor and delete the scratch source file. Honest.
Back in ucm, we can prove that our source is still safely stored, and that the tests still pass:
UCM
.magic> view sumSquare
sumSquare : Nat -> Nat -> Nat
sumSquare a b = add (square a) (square b)
.magic> test
◉ square.tests : Proved.
◉ add.tests : Proved.
◉ sumSquare.tests : Proved.
✅ 3 test(s) passing
It looks like our code is safely tucked away inside Unison.
Part Two: What’s in a Name?
Looking at the test results from the previous ucm output, I’m struck by the fact we called our addition function add and the function that adds two squares sumSquare. Fortunately, we can use ucm to rename add.
UCM
.magic> move.term add sum
Done.
Let’s run the tests again:
UCM
.magic> test
◉ square.tests : Proved.
◉ add.tests : Proved.
◉ sumSquare.tests : Proved.
✅ 3 test(s) passing
Look at that. We renamed the add function, but not its test. We’ll fix that:
UCM
.magic> move.term add.tests sum.tests
Done.
.magic> test
◉ square.tests : Proved.
◉ sum.tests : Proved.
◉ sumSquare.tests : Proved.
✅ 3 test(s) passing
Cool. Except… it shouldn’t have worked. The sumSquare function was written to use add, but add no longer exists. Let’s double check the source:
UCM
.magic> view sumSquare
sumSquare : Nat -> Nat -> Nat
sumSquare a b = sum (square a) (square b)
We never touched the source of sumSquare, but somehow the call to add was replaced by a call to sum. That’s why the tests ran.
OK, you’re thinking. When we renamed add, ucm went through the repository and changed the word add to sum.
The truth is far cooler than that.
Part Three: Bring out the chainsaw
We now have our code safely stored in a box, the Unison repository. Time for the finale of our trick: let’s do some damage and saw it in half.
Open up your editor and create a new definition for the function sum. This version takes three parameters, not two.
Unison
sum: Nat -> Nat -> Nat -> Nat
sum a b c = a + b + c
test> sum.tests = check (sum 3 4 5 == 12)
ucm, being the honey badger of development environments, just doesn’t care:
UCM
⍟ These names already exist. You can `update` them to your new
definition:
sum : Nat -> Nat -> Nat -> Nat
sum.tests : [Result]
6 | test> sum.tests = check (sum 3 4 5 == 12)
✅ Passed : Proved. (cached)
Let’s again do as it suggests and update the sum function, then rerun all the tests:
UCM
.magic> update
⍟ I've updated these names to your new definition:
sum : Nat -> Nat -> Nat -> Nat
sum.tests : [Result]
.magic> test
Cached test results (`help testcache` to learn more)
◉ square.tests : Proved.
◉ sum.tests : Proved.
◉ sumSquare.tests : Proved.
So that’s just weird. We replaced the sum function with one that is incompatible with the original, and yet sumSquare still works.
Let’s have a look at sumSquare one more time:
UCM
.magic> view sumSquare
sumSquare : Nat -> Nat -> Nat
sumSquare a b = #aut6jgfc1j (square a) (square b)
Whoa: there’s a strange set of characters, #aut6jgfc1j, where the function sum used to be. If we assume it’s a function name, perhaps we can view it:
UCM
.magic> view #aut6jgfc1j
#aut6jgfc1j : Nat -> Nat -> Nat
#aut6jgfc1j a b =
  use Nat +
  a + b
That’s our original sum function, but with a new name.
Part 4: Finale
Some time later, after I’d forgotten about the whole sum/add fiasco, I was back writing code. I needed a function to total some numbers, so I wrote:
Unison
total : Nat -> Nat -> Nat
total addend1 addend2 =
  addend1 + addend2
test> total.tests = check (total 5 6 == 11)
I saved it, and over in ucm I added it to the repository and ran tests.
UCM
⍟ These new definitions are ok to `add`:
total : Nat -> Nat -> Nat
total.tests : [Result]
4 | test> total.tests = check (total 5 6 == 11)
✅ Passed : Proved.
.magic> add
⍟ I've added these definitions:
total : Nat -> Nat -> Nat
total.tests : [Result]
At that point I remembered the strange function name in sumSquare. I decided to have another look at it:
UCM
.magic> view sumSquare
sumSquare : Nat -> Nat -> Nat
sumSquare a b = total (square a) (square b)
ucm noticed that the total function does exactly what the old sum function did, even though the name and parameter names are different. This means it can now use the more readable name, total, in place of #aut6jgfc1j.
Magic!
What we saw
Unison manages your source code internally: what you see in your editor is ephemeral.
The names of functions can be changed (one way is to use move.term), and that change is reflected in all sites that reference that function.
Replacing a function with a new, incompatible version doesn’t break existing uses of that function. Instead, those existing uses are replaced with a strange name starting with #.
If you subsequently define a function that does the same thing as the original, Unison replaces the #... name with that new function’s name.
Next we’ll see how all this is done. Now might be a good time for a stretch…
How It Works
We’ve all heard of immutable data. It’s one of the cornerstones of functional programming. Immutability makes it easier to reason about your code, easier to reuse your functions, and easier to write concurrent functions.
Immutable data sounds like a good idea.
Well, it turns out that having immutable code is also amazingly beneficial: it’s how Unison pulls off all the tricks with our add function.
Let’s investigate this from the bottom up.
What is a Function?
Here’s the definition of a function that sums its arguments, written in several languages:
sum a b = a + b
const sum = (a, b) => a + b
sum = -> (a, b) { a + b }
sum = lambda a, b : a + b
If you’re like me,[5] you’ll interpret this code as “create a function called sum that adds its arguments.” But that’s not really true. All of these code fragments do two things: they create a function that sums its arguments, and then they associate that function with a variable or constant called sum. The function and the name are separate.
If you’re not convinced, have a look at this JavaScript fragment.
let sum = (a, b) => a + b
let add = sum
sum = "Hello"
console.log(add(3, 4)) // => 7
console.log(sum(3, 4)) // error: `sum` is not a function
On line 2 we copy the function value into the variable add, and on line 3 we reassign sum.
The function is independent of the name it’s bound to.
The Names of a Function
But I tell you, a cat needs a name that’s particular,
A name that’s peculiar, and more dignified,
Else how can he keep up his tail perpendicular,
Or spread out his whiskers, or cherish his pride?
Of names of this kind, I can give you a quorum,
Such as Munkustrap, Quaxo, or Coricopat,
Such as Bombalurina, or else Jellylorum—
Names that never belong to more than one cat.
T. S. Eliot, The Naming of Cats
T. S. Eliot tells us cats have three names: the name we call them, the unique name they call each other, and a secret name known only to the cat itself.
Functions are like cats in a couple of ways: they often ignore what we tell them to do, and they have their own secret name as well as the names that we call them.
For functions, that secret name is their implementation; their code.
How can we turn an implementation into some kind of name? One way is to use the abstract syntax tree (AST) which the compiler uses to represent the function.
The AST of the function implementation is just a data structure. We can generate a hash value from it, and use that hash as the internal (or secret) name of this particular function.
In Unison, this hash value is 512 bits long, making the chances that two different functions will have the same hash effectively zero.
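To make the idea concrete, here’s a minimal Python sketch of turning a syntax tree into a “secret name”. Everything in it is an assumption made for illustration: the nested-list tree shape, the JSON serialisation, and the choice of SHA3-512 (picked only because it yields a 512-bit digest) are mine, not a description of Unison’s actual encoding.

```python
import hashlib
import json

def full_digest(ast):
    """Hash a nested-list AST; equal trees always give equal digests."""
    # Serialise deterministically so the same tree yields the same bytes.
    encoded = json.dumps(ast, separators=(",", ":")).encode("utf-8")
    return hashlib.sha3_512(encoded).hexdigest()  # 128 hex chars = 512 bits

def secret_name(ast):
    """A short, display-only prefix of the digest (format is hypothetical)."""
    return "#" + full_digest(ast)[:10]

# Toy trees for `a + b` and `a * b`, with parameters referred to by position.
add_ast = ["lambda", 2, ["call", "Nat.+", ["param", 0], ["param", 1]]]
times_ast = ["lambda", 2, ["call", "Nat.*", ["param", 0], ["param", 1]]]

print(secret_name(add_ast) == secret_name(add_ast))    # same tree, same name
print(secret_name(add_ast) == secret_name(times_ast))  # different tree
```

The point is only that the name is derived from the content: nothing about the label “add” or “times” takes part in the hash.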
Initially Defining a Function
When we defined the add function, Unison computed the hash of its AST, and associated the hash with the AST representation. Once added to the repository, these two things will never change: the function body and the hash that references it are both immutable.
At the same time, Unison created an alias to the hash. In this case, that alias is add, the name we first bound to the function.
We then used move.term in ucm. Although it looked like we were renaming the function, all we were really doing was replacing an alias to that function: the alias add went away, and the alias sum now points at the same hash.
Even if we deleted all aliases, the function would still be there, and could be referenced using its secret hash name.
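One way to picture this is as two tables: one from hashes to definitions, and one from names to hashes. The Python sketch below is a hypothetical model of that idea, not how ucm is implemented; the function move_term here is my own stand-in for the ucm command of the same name.

```python
# Hypothetical model: definitions are keyed by hash and never modified;
# names are just entries in a separate alias table.
definitions = {"#aut6jgfc1j": "a b = a + b"}
names = {"add": "#aut6jgfc1j"}

def move_term(names, old, new):
    """'Rename' by re-pointing an alias; the stored definition is untouched."""
    names[new] = names.pop(old)

move_term(names, "add", "sum")
print(names)        # the alias changed...
print(definitions)  # ...the definition did not
```

Deleting every entry in names would leave definitions intact, which is why the function stays reachable by its hash.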
Using That Function
We then defined two more functions, square and sumSquare. The square function introduces no new concepts: an AST is created, a hash is used to name it, and the alias square references that hash.
But sumSquare lets us explore the second trick in our magic show: how did Unison update the call to add to a call to sum when we renamed it?
It turns out there’s nothing really to do. When creating the AST, Unison resolves the names of functions that are called via their aliases to their underlying hash.
Let’s repeat that, because it is the foundation from which all the benefits of immutability grow. Functions are always called by their internal hash, and not their local name.
In the AST for sumSquare a b = add (square a) (square b), the name in the call node is the hash for the add function.
Because the hash corresponds to the implementation of the function, what we’re effectively doing is calling the implementation of the function which just happens to have a particular name (add, initially).
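As a rough illustration of that resolution step, here’s a Python sketch that rewrites a parsed call tree so every callee name becomes a hash. The tuple-based tree and the resolve function are assumptions of mine, and "#squarehash" is a made-up placeholder because the article never shows the real hash of square.

```python
# Alias table at the time sumSquare was written.
# "#squarehash" is a placeholder; the real hash isn't shown in the article.
names = {"add": "#aut6jgfc1j", "square": "#squarehash"}

def resolve(node, names):
    """Replace each callee name in a call tree with the hash it points to."""
    if node[0] == "call":
        _, callee, *args = node
        return ("call", names[callee], *[resolve(arg, names) for arg in args])
    return node  # variables pass through unchanged

# Parsed form of: add (square a) (square b)
parsed = ("call", "add",
          ("call", "square", ("var", "a")),
          ("call", "square", ("var", "b")))

stored = resolve(parsed, names)
print(stored)
```

Once stored, the tree no longer mentions add at all, so nothing that later happens to that name can affect it.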
Viewing a function
UCM stores terms in the repository as their AST, not as their source. Whenever we fetch that term to display or edit it, ucm converts that AST back into plain text.[6] As part of that process, it looks up the secret names of functions in the aliases list. If it finds an alias, it substitutes it for the hash name. That’s why when we said view sumSquare, we initially saw it using add as the name of the function it called. However, when we replaced add with sum in the alias table, Unison did the same lookup and came back with the new name: miraculously the function is now called sum. (Of course, all the time it’s actually called #aut6jgfc1j.)
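The lookup in the other direction can be sketched in Python too. This is a toy pretty-printer under my own assumptions (the tuple tree, the render function, and the "#squarehash" placeholder are all hypothetical); it simply prefers an alias when one exists and falls back to the raw hash when none does.

```python
def render(node, names, top=True):
    """Turn a stored call tree back into text, preferring aliases to hashes."""
    if node[0] == "var":
        return node[1]
    _, callee, *args = node
    by_hash = {h: name for name, h in names.items()}  # reverse alias lookup
    shown = by_hash.get(callee, callee)               # no alias: show the hash
    text = " ".join([shown] + [render(arg, names, top=False) for arg in args])
    return text if top else "(" + text + ")"

# Stored body of sumSquare; "#squarehash" is a made-up placeholder.
stored = ("call", "#aut6jgfc1j",
          ("call", "#squarehash", ("var", "a")),
          ("call", "#squarehash", ("var", "b")))

print(render(stored, {"add": "#aut6jgfc1j", "square": "#squarehash"}))
print(render(stored, {"sum": "#aut6jgfc1j", "square": "#squarehash"}))
```

The stored tree is identical in both calls; only the alias table passed in differs, and that alone changes the text that comes out.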
Changing a Function
Then we changed the definition of sum to have three arguments.
When ucm compiles this, the AST will be different to that of the original add or sum; its hash will be different.[7] This means that when it gets stored, it will be under the new hash, and the alias sum will point to that rather than the original.
But nothing has changed in the implementation of sumSquare: that function call still references the hash #aut6jgfc1j, which refers to the original implementation. When we call sumSquare it will still call our original two-argument version of sum.
However, if we view sumSquare, ucm won’t be able to find an alias for that hash when generating the text version of the function. That’s why you see the hash instead.
UCM
.magic> view sumSquare
sumSquare : Nat -> Nat -> Nat
sumSquare a b = #aut6jgfc1j (square a) (square b)
That’s actually valid Unison syntax: you can use the (not-so-)secret hash in place of the function name.
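Here’s a small runnable Python model of why the old caller keeps working after an update. It’s a sketch under my own assumptions: the store and names dictionaries and the update function are hypothetical stand-ins, with Python lambdas playing the role of stored definitions. The two hashes are the ones ucm reported in this article.

```python
# Definitions keyed by hash. Entries are added but never overwritten.
store = {"#aut6jgfc1j": lambda a, b: a + b}
names = {"sum": "#aut6jgfc1j"}

def sum_square(a, b):
    # The stored caller refers to the hash, not to whatever `sum` means today.
    return store["#aut6jgfc1j"](a * a, b * b)

def update(name, new_hash, implementation):
    """Store a new definition under its own hash and re-point the alias."""
    store.setdefault(new_hash, implementation)  # existing entries stay as-is
    names[name] = new_hash

update("sum", "#r4ohr76lvt", lambda a, b, c: a + b + c)

print(sum_square(3, 4))             # still uses the two-argument version
print(store[names["sum"]](3, 4, 5)) # the name `sum` now means the new one
```

After the update, both definitions live side by side in the store; the only thing that moved was the alias.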
Discovering a New Name
Finally we created a total function:
Unison
total : Nat -> Nat -> Nat
total addend1 addend2 =
  addend1 + addend2
Compare it to our original add function:
Unison
add: Nat -> Nat -> Nat
add a b = a + b
Because Unison doesn’t care about the names of parameters or the textual layout of the function when creating the AST hash, it turns out that the total function has the identical hash to the add function, #aut6jgfc1j.
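One plausible way to get that behaviour, sketched in Python, is to replace each parameter name with its position before hashing. This is my own illustration, not Unison’s documented algorithm: the normalise and hash_function helpers, the list-based body, and the use of SHA3-512 over JSON are all assumptions.

```python
import hashlib
import json

def normalise(body, params):
    """Replace each parameter name with its position in the parameter list."""
    if isinstance(body, str):
        return ["param", params.index(body)] if body in params else body
    return [normalise(part, params) for part in body]

def hash_function(params, body):
    """Hash a function so that only its structure matters, not its names."""
    canonical = ["lambda", len(params), normalise(body, params)]
    encoded = json.dumps(canonical).encode("utf-8")
    return hashlib.sha3_512(encoded).hexdigest()

add_hash = hash_function(["a", "b"], ["Nat.+", "a", "b"])
total_hash = hash_function(["addend1", "addend2"], ["Nat.+", "addend1", "addend2"])
swapped_hash = hash_function(["a", "b"], ["Nat.+", "b", "a"])

print(add_hash == total_hash)    # renamed parameters: same hash
print(add_hash == swapped_hash)  # different structure: different hash
```

Layout never enters the picture because the hash is computed from the tree, not from the text that produced it.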
At the time we created total, #aut6jgfc1j was pointing to our original code, and there were no aliases referencing it (because sum had moved). So Unison says “this function already exists in the repository, so I don’t need to store it again. All I have to do is add the alias total to the existing hash.”
This means that the next time we fetch the source for sumSquare it can resolve the function call to the name total.
Our “add two natural numbers” function was compiled, hashed, and stored in the repository at the very beginning of this episode. Since then, it hasn’t changed: it can’t change. But, locally, we’ve referred to it using three aliases as well as by its internal hash, we changed its name and its implementation, and all the while the rest of our code just kept running.
You’re writing an application that uses external libraries.
Imagine that you didn’t have to use a package manager or edit a TOML or JSON file to add a library. Instead you just called it, and the then-current version of it was associated with your code. You don’t need a local copy; ucm can handle all that behind the scenes. Maybe it does caching, but who cares?
Imagine coming back to that code two years later and expecting it to just run. Why wouldn’t it? Nothing has changed.
Imagine being able to change a function that you or someone else wrote and not having to worry about breaking code that depends on that function.
Imagine being able to distribute code by just giving the target machine the hash of your main function.
Imagine immutable code.
There’s Always More
This article is already twice as long as I’d hoped, so I’ve had to leave out a lot. In particular, ucm basically has a git-like diff/merge/patch system built in, along with the ability to traverse dependencies. This makes it easy to cherry-pick updates when new versions of functions you use are produced. It also allows you to generate a patch file that others can apply to their local namespace.
Obviously, there’s also a whole lot to be said about Unison the language, but in these articles I wanted to focus on specific and unique features that I find exciting.
The next article in this series will talk about Unison abilities. These let you isolate mutable state and scope access to it via the call chain rather than syntactically.
After that I want to talk about the way Unison abilities made me rethink how distributed code could work.
Footnotes
[1] Unison is a crazily innovative language. As well as this article, I’ll be writing about its effect system and about its super-easy distributed computing model. But this isn’t a Unison tutorial (let me know if you want one). Instead, I’m just using it to illustrate some points.
[2] The term natural numbers here means positive integers along with zero.
[3] If you’ve come across Smalltalk, this is quite similar to its idea of an image.
[4] Unison Code Manager
[5] Or, at least, me as of a few months ago.
[6] Which means that there are no more discussions about source layout. You can submit source to ucm formatted using arcane Tarot rules, but when it comes back out it will look like every other piece of Unison code.
[7] #r4ohr76lvt versus the original #aut6jgfc1j