
Unix Delenda Est

March 3, 2015

Intro

When I first read about Japan’s fax machine use I smiled at how backwards it was.

Then I realized the web is also a fax machine. The situation doesn’t seem amusing anymore. Shoehorning a general purpose computing environment into our document viewers isn’t really “funny”. It’s more of a historical-scale disaster, like lead in the water supply.

I’m not sure what excuse technologists have for not exposing a “compose” primitive to millions of people (copy/paste doesn’t count). Or a “save” primitive that gives actual data, not a blob of markup. Or a “repeat” primitive. Hello RSI.

But if web browsers are so bad, why do they keep spreading?

Partly because they’re fine for very casual users.[1] Some people like Disneyland more than Europe. There’s nothing wrong with that.

But mainly because the alternative is defective. The startup, academic and open source communities all have their weight behind Unix, but Unix is a dead end.

The Unix Problem

The first part of the Unix problem is that Unix is such an enticing mirage. It looks perfect. When you fire up the terminal, you get this:

$

Which is exactly what a humane computing environment should look like.

The second part of the Unix problem is that the Unix user environment is awful.

All that superficial perfection is fakery. Unix sits just close enough to good design that it can soak up millions of hours of developer improvements – each of them seeming productive at the time – and still remain a broken system.

This is because the Unix filesystem and the Unix way are the wrong tools for general computing. A single global, mutable tree of untyped text is a bad persistence model. Regexes are bad tools for most data manipulation.[2] For a computer to be a good computer, it must provide good initial defaults. Experts can build a good system on bad defaults (this is exactly what they’ve done with Unix), but this doesn’t make Unix computers good computers.

To illustrate the problems with the Unix approach I’ll talk about what could be instead. None of the following requires new technology, just new tools and conventions.

The Cabinet Computer

“[A] computer does not primarily compute in the sense of doing arithmetic. […] They primarily are filing systems.”

– Richard Feynman (from this video at 5:10)

1. Tagged Filesystem

Hierarchical filesystems are an anti-pattern – the more you organize them, the less useful they become. A simple example follows.

$ tree
.
└── paperwork
    ├── consulting
    ├── personal
    └── work

The following is more organized, but harder to work with.

$ tree
.
└── paperwork
    ├── consulting
    │   ├── 2013
    │   └── 2014
    ├── personal
    │   ├── 2013
    │   └── 2014
    └── work
        ├── 2013
        └── 2014

And there’s more than one way to organize the same files, with no reason to prefer one over the other. Something is wrong.

$ tree
.
└── paperwork
    ├── 2013
    │   ├── consulting
    │   ├── personal
    │   └── work
    └── 2014
        ├── consulting
        ├── personal
        └── work

The Cabinet Computer’s tagged filesystem is much better.

λ get tag=paperwork,consulting
λ get tag=paperwork,consulting date=2014

Hierarchical filesystems are only synthesized when necessary for backwards-compatibility (e.g. when a compiler or interpreter expects one).

The Camlistore docs include a perfect description of why this is the correct way to do things:

[I wanted to] … not always be forced into a POSIX-y filesystem model. That involves thinking of where to put stuff, and most the time I don’t even want filenames. If I take a bunch of photos, those don’t have filenames (or not good ones, and not unique). They just exist. They don’t need a directory or a name. Likewise with blog posts, comments, likes, bookmarks, etc. They’re just objects.

See also: Sections 1 and 2 of Seltzer and Murphy’s Hierarchical File Systems are Dead.
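The tag queries above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: `TagStore`, `Item`, and the `year` argument are hypothetical names, and a real store would persist and index its data rather than scan a list.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Item:
    """A stored object: no path, no required filename, just tags and a date."""
    content: str
    tags: frozenset
    created: date


class TagStore:
    """Hypothetical in-memory stand-in for the tagged filesystem."""

    def __init__(self):
        self._items = []

    def put(self, content, tags, created):
        item = Item(content, frozenset(tags), created)
        self._items.append(item)
        return item

    def get(self, tags=(), year=None):
        """Return items carrying every requested tag, optionally from one year."""
        wanted = frozenset(tags)
        return [
            item for item in self._items
            if wanted <= item.tags and (year is None or item.created.year == year)
        ]


store = TagStore()
store.put("invoice for client A", {"paperwork", "consulting"}, date(2013, 5, 2))
store.put("invoice for client B", {"paperwork", "consulting"}, date(2014, 3, 9))
store.put("tax return", {"paperwork", "personal"}, date(2014, 4, 15))

# Roughly: get tag=paperwork,consulting date=2014
consulting_2014 = store.get(tags={"paperwork", "consulting"}, year=2014)
# Everything from 2014, without choosing year-first or category-first.
all_2014 = store.get(tags={"paperwork"}, year=2014)
```

Both queries run against the same flat collection, so the "which directory goes on top" question never comes up.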

2. Data is Typed

λ get type=python author=me | grep fooBarBaz
fooBarBaz(a, b) # Some note about the tricky use of fooBarBaz

The reader might understandably wonder why there’s still grep and pipe in a Unix hater’s dream environment.

The answer is that it’s not composability that’s bad, but string typing. The command above searches through a lifetime of code very quickly, because it knows the difference between Python source files (regardless of extension), .pyz files, images, etc.

Now I’ve written an error checker for Python. Let’s try it.

λ get type=python author=me | myNewErrorChecker
found 24 interesting errors ...

Next I want to compare its output with an old error checker, but I forgot its name. So I type

λ get type=python author=me | 

And press tab. Autocompletion offers every function that can take a Python source file, or a list of such files, as an argument (ordered by specificity and frequency of use).
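Here is one way that type-aware completion could work, as a rough sketch. The command registry, the type chain (Python source is a kind of text, text is a kind of anything), `oldLintTool`, and the usage counts are all made up for illustration.

```python
from collections import Counter

# Hypothetical registry: command name -> the input type it declares.
# "any" marks commands that accept every type (the least specific match).
COMMANDS = {
    "myNewErrorChecker": "python",
    "oldLintTool": "python",
    "grep": "text",
    "wc": "any",
    "sketchyLyricsAdder": "music",
}

# Assumed subtype relation: each type points at its more general parent.
PARENTS = {"python": "text", "text": "any", "music": "any"}


def distance(data_type, accepted):
    """Steps up the type chain from data_type to accepted, or None if no match."""
    steps = 0
    current = data_type
    while current is not None:
        if current == accepted:
            return steps
        current = PARENTS.get(current)
        steps += 1
    return None


def complete(data_type, usage):
    """Commands that accept data_type, most specific first, then most used."""
    matches = []
    for name, accepted in COMMANDS.items():
        d = distance(data_type, accepted)
        if d is not None:
            matches.append((d, -usage[name], name))
    return [name for _, _, name in sorted(matches)]


usage = Counter({"grep": 40, "myNewErrorChecker": 3, "oldLintTool": 9, "wc": 12})
suggestions = complete("python", usage)
```

With these made-up numbers the Python-specific checkers come first, then `grep`, then `wc`, and the music tool is never offered at all.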

The type system is implemented via Camlistore-style metadata, where metadata files are separate from the files they’re describing. Changing a file’s metadata doesn’t change the hash of the file, which yields a cornucopia of nice things.
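A minimal sketch of the idea of keeping metadata apart from content, assuming SHA-256 content addresses and plain dictionaries. This is not Camlistore's actual schema or API; `blob_ref`, `put`, and `get` are hypothetical helpers.

```python
import hashlib


def blob_ref(content: bytes) -> str:
    """Content address: the SHA-256 of the bytes and nothing else."""
    return hashlib.sha256(content).hexdigest()


blobs = {}      # ref -> immutable bytes
metadata = {}   # ref -> attributes stored apart from the bytes


def put(content: bytes, **attrs) -> str:
    ref = blob_ref(content)
    blobs[ref] = content
    metadata.setdefault(ref, {}).update(attrs)
    return ref


def get(**query):
    """Refs whose metadata matches every key=value pair in the query."""
    return [
        ref for ref, attrs in metadata.items()
        if all(attrs.get(k) == v for k, v in query.items())
    ]


source = b"def fooBarBaz(a, b):\n    return a + b\n"
ref = put(source, type="python", author="me")

# Retagging touches only the metadata record; the blob and its hash stay put.
metadata[ref]["tag"] = "utilities"
same_ref = blob_ref(blobs[ref])
```

Because the hash covers only the bytes, anything that refers to the file by hash keeps working no matter how often its type, tags, or author fields are edited.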

3. Data is Immutable

This gives “versioning” as a primitive.

In a reasonable world, file system and version control would be structurally identical. “Commit” would be “put a name on the current tree”.

– Gary Bernhardt

Each application shouldn’t have to implement its own “undo”. This problem is correctly solved at the infrastructure level.

λ install sketchyLyricsAdder permissions=music

λ get type=music | sketchyLyricsAdder

λ diff type=music current-1 current
12,542 files changed, 155 files removed

λ undo

λ diff type=music current-2 current
none

λ remove sketchyLyricsAdder
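The session above can be approximated with an append-only list of snapshots. This is a sketch under my own assumptions: `VersionedStore` is a hypothetical class, whole snapshots are copied for simplicity, and a real system would share unchanged data between versions.

```python
class VersionedStore:
    """Hypothetical store where every change appends a new snapshot."""

    def __init__(self):
        self._history = [{}]  # name -> content snapshots, oldest first

    @property
    def current(self):
        return self._history[-1]

    def apply(self, changes):
        """Record a new snapshot; a value of None removes the file from the view."""
        snapshot = dict(self.current)
        for name, content in changes.items():
            if content is None:
                snapshot.pop(name, None)
            else:
                snapshot[name] = content
        self._history.append(snapshot)

    def undo(self):
        """Undo by appending a copy of the previous snapshot; nothing is erased."""
        if len(self._history) >= 2:
            self._history.append(dict(self._history[-2]))

    def diff(self, older, newer=0):
        """Compare snapshots counted back from current (0 = current, 1 = current-1)."""
        a = self._history[-1 - older]
        b = self._history[-1 - newer]
        changed = sorted(n for n in a.keys() & b.keys() if a[n] != b[n])
        removed = sorted(a.keys() - b.keys())
        return changed, removed


music = VersionedStore()
music.apply({"song1.ogg": "audio-1", "song2.ogg": "audio-2"})
# The sketchy tool rewrites one file and removes another.
music.apply({"song1.ogg": "audio-1 + bad lyrics", "song2.ogg": None})

after_tool = music.diff(1)   # like: diff current-1 current
music.undo()
after_undo = music.diff(2)   # like: diff current-2 current
```

Note that `undo` is itself a new version, which is why the second comparison looks two steps back, just as in the session above.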

There can still be a “sports car” mode that allows mutating data when needed for performance, for instance when editing video. But storage is cheap. The Cabinet Computer never throws away a single piece of human-written text without a specific instruction to do so (it obeys the Second Law of Sane Personal Computing). Let’s search through every line of code I’ve ever written.

λ get includeTrash type=code author=me | grep fooBarBaz
fooBarBaz(a, b) # Some note about the tricky use of fooBarBaz
x = fooBarBaz(y, z) # TODO: is this a security risk?

4. Server First

The server is the primary computing device. This provides the crucial primitive “share”, which a local computer will never be able to replicate. This unlocks web hosting, friends-only web pages (Facebook), data hosting (photos, calendars, etc.), collaborative work (documents, code, etc.), syncing (Dropbox), and so on.

There’s no magic preventing a normal person from having a server, but it must be kept simple. Unix utterly fails here. As the Urbit people say, “[Unix is] a 747, not a car.”

Local storage is still nice, but it’s just a cache of server data.

This also lets us do some cool things if our friends have Cabinet Computers.

λ befriend ma.shaftoe.com
λ befriend robin.shaftoe.com
λ befriend randy-waterhouse.net
λ get type=python author=friends | myNewErrorChecker
found 51 interesting errors ...
λ get unread type=review tag=book author=friends
1. Dune (by Marcus)
2. Marooned in Realtime (by Randy)
3. Captain Blood (by Robin)
...

5. Decent Shell Language

The default shell language is designed for beginners. It has as few gotchas as possible.

This allows amateurs to automate simple tasks and prototype more complicated systems. They can use the Cabinet Computer as an “Excel of Computers”.

Metadata: type=program name=personalsupplies live=true

Contents:

  monthly:
    # Price files on Cabinet Computers are immutable.
    # Shops using Cabinet Computers can post new prices for
    # their products, but if they try a ninja edit, buy will
    # fail and email us an error.

    buy snacks.net/8573a99821 # Beef jerky
    buy www.officesupplies.com/9884e01ba8 # Paper
    buy www.officesupplies.com/c91f6f4e11 # Pens

  weekly monday 08:00:
    # Cuban Coffee delivery
    buy horace-bury-cafe.net/f842792a70 tip=$1.50

  weekly friday 08:00:
    # 68b37b18ed is their Friday special
    special = get horace-bury-cafe.net/68b37b18ed | latest

    # Sometimes their special isn't coffee. This is bad.
    if (special | contains "coffee") and special.price < $5:
      buy special tip=$1.50
    else:
      email me "No coffee today" body=special
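The "ninja edit" protection in the comments above could work by pinning each purchase to a content hash. The sketch below assumes that the IDs in the program are hash prefixes of the offer's content, which the program itself never states. `offer_id`, `PriceChanged`, and the offer fields are hypothetical.

```python
import hashlib
import json


def offer_id(offer: dict) -> str:
    """Assumed ID scheme: first 10 hex digits of the SHA-256 of the canonical JSON."""
    canonical = json.dumps(offer, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()[:10]


class PriceChanged(Exception):
    """Raised when the content behind a pinned ID no longer matches it."""


def buy(pinned_id: str, fetch):
    """Fetch the offer and refuse to pay unless it still hashes to pinned_id."""
    offer = fetch(pinned_id)
    if offer_id(offer) != pinned_id:
        raise PriceChanged(f"offer {pinned_id} was edited after it was pinned")
    return {"paid": offer["price_cents"], "item": offer["item"]}


jerky = {"item": "beef jerky", "price_cents": 899}
pinned = offer_id(jerky)

honest_shop = {pinned: jerky}
receipt = buy(pinned, honest_shop.get)

# A "ninja edit": same ID, silently raised price.
sneaky_shop = {pinned: {"item": "beef jerky", "price_cents": 1299}}
try:
    buy(pinned, sneaky_shop.get)
    rejected = False
except PriceChanged:
    rejected = True
```

A shop that wants to change a price has to publish a new offer with a new ID, so an existing script keeps buying at the old terms or fails loudly.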

6. Reasonably Secure

Using memory-unsafe languages in security-critical code is engineering malpractice. The only exceptions are when there’s no other choice, such as extremely low-level programming, or when you have the resources to read everything 20 times (e.g. you’re NASA).

We are not NASA. Red Hat Linux 7.1 had 30 million lines of code, 71% of them C, in 2001! How many hundreds of vulnerabilities lurk there, unseen by human eyes?

There should be as little memory-unsafe code outside of sandboxes as possible. Optional additional safety can be gained situationally with static typing and pure functional code, but maximizing memory safety is more important.

7. Browsable Executables

The web is winning for a reason. The Cabinet Computer allows you to click a link to a program and have that program run instantly in a sandbox.

Sandboxes are a great place to run memory-unsafe code for research, gaming, and so on.

Permissions are programmatic. Once you find a list of permissions you trust, you’ll never have to grant “use my webcam” to another video chat program again.

Sandboxes make a much better security primitive than sudo, file permissions, etc., which are useless for protecting a single user’s data.
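As a rough sketch of what "programmatic" permissions might mean: a trusted policy is just data, and a sandbox checks each program's request against it. The categories, permission names, and `grant` function here are all hypothetical.

```python
# Hypothetical policy: program category -> permissions granted without asking.
TRUSTED = {
    "video-chat": {"webcam", "microphone", "network"},
    "music-tool": {"music"},
}


def grant(category, requested, trusted=TRUSTED):
    """Split a program's request into auto-granted and still-needs-asking sets."""
    allowed = trusted.get(category, set())
    requested = set(requested)
    return requested & allowed, requested - allowed


# A new video chat program gets its webcam without a prompt,
# but its attempt to read contacts is held back for the user to decide.
granted, ask_user = grant("video-chat", {"webcam", "microphone", "contacts"})
```

Since the policy is an ordinary file, it can be versioned, shared, and reused like any other data on the Cabinet Computer.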

Conclusion

The web is a peasantizing tech stack. It has to be killed. We can make a better alternative, but first we have to abandon our failed dreams.

Unix delenda est.

Inspiration

Footnotes

[1] It seems obvious to me that we shouldn’t focus our effort on very casual users. It’s great that we can provide a nice environment for Bob-the-once-a-week-email-checker. That we can’t do the same for small businesses is not great. That we can’t do the same for hospitals is a disaster.

[2] If I seem bitter about this it’s because I am. When I was a new programmer I believed the articles on the internet about how elegant piping text around is. It was not very nice of them to mislead me, but I didn’t know better.

(For other beginners out there – when you send structured data through a regex it becomes programmatically useless. You can still read it or treat it as a text blob, but you can never be sure it’s still formatted correctly. For that you need a parser.)
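A small demonstration of that last point, using JSON as the structured data. The record and the replacement text are made up for the example.

```python
import json
import re

raw = json.dumps({"title": "Dune", "review": "ok"})

# Text-level edit: swap in a new review that happens to contain quotes.
new_review = 'Marcus called it "great"'
via_regex = re.sub(
    r'"review": "[^"]*"',
    lambda m: f'"review": "{new_review}"',
    raw,
)


def is_valid_json(text):
    """True if text still parses as JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Parser-level edit: decode, change the field, encode again.
data = json.loads(raw)
data["review"] = new_review
via_parser = json.dumps(data)

regex_ok = is_valid_json(via_regex)
parser_ok = is_valid_json(via_parser)
```

The regex version still looks fine to a human reader, but the unescaped quotes mean no program can load it again. The parser version escapes them and round-trips cleanly.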
