User permissions for Glee

The previous user permission system for Glee that I had in mind was dead simple: you are either unauthenticated, a user, or an admin, and each repository was configured to have certain permissions (read/write) for users and admins. Unfortunately, upon further reflection I have concluded that this system is utterly useless for anything besides systems with a very small number of people (usually 1), and at that point you may as well run cgit. So I am revamping the design with the explicit goal of refining access control even for large organizations, though the primary target audience of small groups will not change.

There is a question of whether we ought to store the permission information with users (each user has a list of paths they have permissions to), or with paths (each repository lists out the users that are allowed to read/write/admin it). The answer is dead obvious in retrospect: store permissions with paths because files can/will be frequently moved around. Looking through every user and editing their permissions to point to the new path names is massively inconvenient. So when I say “this user has permission to write to this repository”, the permission-giver is the repository.

So what does each path store? Dead simple:

a list of people who can read the path;
a list of people who can write to the path;
a list of people who can administrate the path.

With that out of the way, let’s flesh out the permissions system by considering an example. We will work with the following Glee filesystem:

gym/
    squat.git
    bench.git
    deadlift.git
running.git

(For clarity, the .git suffix is how Git stores bare repositories by default and is what we will use to differentiate git repositories from directories. Using the filesystem as an analogy, the repositories are “files” and the directories are… directories.)

Suppose I have administrative privileges over the entire system, i.e. I have admin permissions in /. I invite my friend Carl as an admin of gym/. Carl can now write to every repository in gym/, because being an admin necessarily means you can do that. Furthermore, Carl can invite Alice to gym/ and give her whatever permissions in gym/ he wants, including making her an administrator of gym/ as well. However, Carl cannot step outside the bounds of gym/ and give her any permissions over running.git. Furthermore, unless I explicitly state otherwise, Carl has no permissions in running.git.

This way we can have maintainers of entire subsystems without giving them any permissions out of their scope.
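
To make the inheritance rule concrete, here is a minimal sketch of how a permission check could walk from a path up to the root. The names PathPermissions, Level, self_and_ancestors, and is_allowed are hypothetical, as is keeping everything in an in-memory HashMap keyed by path (with "" standing for /). It also assumes that admin implies write and that write implies read; only the first half of that is stated above.

```rust
use std::collections::{HashMap, HashSet};

/// The three lists each path stores (hypothetical type name).
#[derive(Default)]
pub struct PathPermissions {
    pub read: HashSet<String>,
    pub write: HashSet<String>,
    pub admin: HashSet<String>,
}

/// Ordered so that Read < Write < Admin.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum Level {
    Read,
    Write,
    Admin,
}

/// Returns the path itself followed by each ancestor, ending at the root "".
pub fn self_and_ancestors(path: &str) -> Vec<&str> {
    let mut current = path.trim_matches('/');
    let mut out = Vec::new();
    loop {
        out.push(current);
        if current.is_empty() {
            break;
        }
        current = match current.rfind('/') {
            Some(i) => &current[..i],
            None => "",
        };
    }
    out
}

/// A user is allowed if the path or any ancestor grants at least `wanted`.
/// Assumption: a higher level implies every lower one.
pub fn is_allowed(
    store: &HashMap<String, PathPermissions>,
    user: &str,
    path: &str,
    wanted: Level,
) -> bool {
    self_and_ancestors(path).into_iter().any(|p| {
        store.get(p).map_or(false, |perms| {
            let granted = if perms.admin.contains(user) {
                Some(Level::Admin)
            } else if perms.write.contains(user) {
                Some(Level::Write)
            } else if perms.read.contains(user) {
                Some(Level::Read)
            } else {
                None
            };
            granted.map_or(false, |g| g >= wanted)
        })
    })
}
```

With Carl listed as an admin of gym, is_allowed returns true for a write to gym/squat.git and false for any access to running.git, which matches the example above.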

No negative permissions

I cannot specify that Carl can administrate all of gym/ except gym/bench.git. This would make life far too complex, and if you want to give Carl access to all of gym/ except gym/bench.git, that’s a sign that semantically, gym/bench.git should be moved to a different path.

Unauthenticated read access

We still want people who are not logged in to the Glee instance, i.e. the general public, to be able to publicly see some repositories. Here, we do want negative permissions to exist; we may want the general public to be able to see everything except running.git, which is a highly private repository.

For this, we will use the ideas in the initial design post. To rehash, here is how it will work:

For each path, we will store a value unauthenticated_read: Option<bool>. For those not familiar with Rust or functional languages, this means unauthenticated_read is either set to true, set to false, or not set at all.
To determine whether a repository can be read, we will look at the value of unauthenticated_read. If it is unset, we will recursively traverse up one directory and look at unauthenticated_read, stopping when we see that unauthenticated_read is set or when we reach /, whichever comes first.
For example, take a path like gym/squat.git. Say unauthenticated_read is not set, then we will look at gym/. If gym/ has unauthenticated_read set to false, then we make gym/squat.git unreadable. If it was set to true, then gym/squat.git would be readable.
If gym/squat.git, gym/, and / all do not have unauthenticated_read set, i.e. we do not find any unauthenticated_read value, we default to making the path unreadable for the sake of security.

So whether a repository is readable follows a simple algorithm: if the first unauthenticated_read value we find is true when traversing up the directory structure, then the repository is readable. Otherwise, the repository is not readable.
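
The lookup above can be sketched in a few lines of Rust. This is only an illustration: the function names and the in-memory HashMap (keyed by path without a trailing slash, with "" standing for /) are assumptions, not how Glee necessarily stores things.

```rust
use std::collections::HashMap;

/// Returns the enclosing directory of `path`, or None once we are at the root "".
fn parent(path: &str) -> Option<&str> {
    if path.is_empty() {
        None
    } else {
        Some(path.rfind('/').map_or("", |i| &path[..i]))
    }
}

/// Walks upward from `path` and returns the first `unauthenticated_read`
/// value that is set. If nothing is set all the way up to the root,
/// the path is treated as unreadable.
pub fn resolve_unauthenticated_read(
    settings: &HashMap<String, Option<bool>>,
    path: &str,
) -> bool {
    let mut current = Some(path.trim_matches('/'));
    while let Some(p) = current {
        // A missing entry and an entry set to None both mean "not set here".
        if let Some(Some(value)) = settings.get(p) {
            return *value;
        }
        current = parent(p);
    }
    false
}
```

The loop stops at the first value it finds, so a true on gym/bench.git wins over a false on gym/, and a path with no values anywhere above it falls through to false.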

Now what happens if we stumble upon a path where unauthenticated_read resolves to false (according to the recursive algorithm above)? The answer is not simply “make the path unreadable”, because it may have readable children. Instead,

if the path has any readable children, we will list all of said readable children;
if the path has no readable children, then the path will not be readable; in other words, it will return a 404.

Consider the following example:

a/              UNREADABLE
    b/          READABLE
        c.git   UNREADABLE

When we read a/, since b/ is readable, we will display

a/
    b/

This is despite the fact that b/ has no readable repositories as children!
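
Here is a rough sketch of that listing rule, assuming "children" means direct children only and that a child counts as readable when its own unauthenticated_read resolves to true. The Settings map and the function names are hypothetical, and None stands in for the 404.

```rust
use std::collections::BTreeMap;

/// Every known path mapped to its own `unauthenticated_read` setting.
/// Keys have no trailing slash, and "" stands for the root.
type Settings = BTreeMap<String, Option<bool>>;

/// First value that is set when walking up from `path`, defaulting to false.
fn resolve(settings: &Settings, path: &str) -> bool {
    let mut current = path;
    loop {
        if let Some(Some(value)) = settings.get(current) {
            return *value;
        }
        match current.rfind('/') {
            Some(i) => current = &current[..i],
            None if current.is_empty() => return false,
            None => current = "",
        }
    }
}

/// Paths that sit exactly one level below `dir`.
fn direct_children<'a>(settings: &'a Settings, dir: &str) -> Vec<&'a str> {
    let prefix = if dir.is_empty() {
        String::new()
    } else {
        format!("{}/", dir)
    };
    settings
        .keys()
        .map(String::as_str)
        .filter(|p| {
            p.strip_prefix(prefix.as_str())
                .map_or(false, |rest| !rest.is_empty() && !rest.contains('/'))
        })
        .collect()
}

/// Some(children) is the listing to display; None means "respond with a 404".
pub fn list_directory<'a>(settings: &'a Settings, dir: &str) -> Option<Vec<&'a str>> {
    let readable: Vec<&str> = direct_children(settings, dir)
        .into_iter()
        .filter(|child| resolve(settings, child))
        .collect();
    if readable.is_empty() && !resolve(settings, dir) {
        None
    } else {
        Some(readable)
    }
}
```

In the a/b/c.git example, listing a returns just a/b, listing a/b returns an empty listing, and listing an unreadable directory with no readable children returns None.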

The frontend

User registration

Before this, when permissions were much simpler (so simple that they were essentially useless), the only way to make a user would be for an administrator to invite them directly. Now we will allow anyone to make an account. However, instead of letting a user create an account first and verify their email afterwards (a stupid decision, given that a malicious actor could register with your email address before you do without much effort), someone who owns abc@def.xyz will register as follows:

they will visit https://git.dennisc.net/register and send themselves a registration email, which contains the registration link https://git.dennisc.net/register?code=abcd;
they visit https://git.dennisc.net/register?code=abcd, and since they have a valid registration code, they will register on that page. Note the registration code abcd is tied to the email abc@def.xyz.
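
The "code is tied to the email" part could be modelled roughly like this. PendingRegistrations and its methods are made-up names, generating and mailing the code is left out, and making codes single-use is my assumption rather than something decided above.

```rust
use std::collections::HashMap;

/// Registration codes that have been mailed out but not yet used.
#[derive(Default)]
pub struct PendingRegistrations {
    // code -> email address the code was sent to
    by_code: HashMap<String, String>,
}

impl PendingRegistrations {
    /// Step 1: remember which email a freshly generated code was mailed to.
    pub fn issue(&mut self, code: String, email: String) {
        self.by_code.insert(code, email);
    }

    /// Step 2: exchange a code for the email it is tied to.
    /// The code is removed so it cannot be used twice (an assumption).
    pub fn redeem(&mut self, code: &str) -> Option<String> {
        self.by_code.remove(code)
    }
}
```

The registration page would call redeem with the code from the query string and only create an account, for the returned email and no other, when it gets Some back.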

Changing permissions for a path

Suppose you want to change the permissions for gym/ and you are an admin for gym/. Then https://git.dennisc.net/gym will display a “Manage permissions” link, which will take you to https://git.dennisc.net/gym?permissions. Then there will be a TOML file structured like

read = ["somereader@gmail.com"]
write = ["somewriter@outlook.com"]
admin = ["abc@def.xyz"]

i.e. there is a list of readers, writers, and admins for the path. You will be able to edit this file in a <textarea> and submit the new values through an HTML form.

It goes without saying, but you will not be able to do any of this if you are not an admin for gym/.

How are permissions stored?

They are stored in the filesystem. You are literally (indirectly) reading/writing to the filesystem, though of course the latter step only happens after validating your input. Also relevant: the default TOML file will be

read = []
write = []
admin = []

This TOML file will be populated immediately upon creation of the path.

Handling query strings (implementation details)

In axum, the way to handle a query string with no value is to make the datatype

use serde::Deserialize;

#[derive(Deserialize)]
pub struct Permissions {
    permissions: Option<()>,
}

and then the function for the get API route would be

use axum::{extract::Query, response::Response};

pub async fn get(
    Query(Permissions { permissions }): Query<Permissions>,
) -> Response {
    ...
}

where we can parse permissions in order to determine whether the query string ?permissions is present.

I haven’t tested this but it should be right in principle, at least.

Caching permissions

This is now getting down to implementation details, so the design here is much more uncertain. (After all, I can’t know if an implementation detail is a good idea until I get down to implementing it, unlike high-level designs.)

Perhaps it would be a good idea to cache permissions so that, instead of resolving permissions for a/b/c.git by recursively looking through the permissions of a/b/, a/, and /, we can instead resolve “who has permissions to a/b/c.git?” every time we edit the permissions of a/b/c.git or any of its parents. And any time we edit the permissions of a path, we update the cache for it and all of its children.

The argument against doing this is that it increases the complexity of the code, and you’d have to be kind of a monster to have deep enough paths for this kind of thing to matter.
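
For what it is worth, a cache along those lines might look something like the sketch below. To keep it short it tracks a single set of users per path instead of separate read/write/admin lists, and it assumes every path gets registered through set when it is created, even with an empty set. Store, set, and lookup are hypothetical names.

```rust
use std::collections::{BTreeMap, BTreeSet};

type Users = BTreeSet<String>;

pub struct Store {
    /// What was explicitly granted at each path ("" is the root).
    direct: BTreeMap<String, Users>,
    /// Union of `direct` over the path and all of its ancestors.
    cache: BTreeMap<String, Users>,
}

impl Store {
    pub fn new() -> Self {
        Store { direct: BTreeMap::new(), cache: BTreeMap::new() }
    }

    /// True if `path` is `of` itself or lives somewhere below it.
    fn is_self_or_descendant(path: &str, of: &str) -> bool {
        of.is_empty()
            || path == of
            || path.strip_prefix(of).map_or(false, |rest| rest.starts_with('/'))
    }

    /// The slow path: walk up to the root and merge every grant on the way.
    fn compute(&self, path: &str) -> Users {
        let mut users = Users::new();
        let mut current = path;
        loop {
            if let Some(direct) = self.direct.get(current) {
                users.extend(direct.iter().cloned());
            }
            match current.rfind('/') {
                Some(i) => current = &current[..i],
                None if current.is_empty() => return users,
                None => current = "",
            }
        }
    }

    /// Edits a path, then refreshes the cache for it and all of its children.
    pub fn set(&mut self, path: &str, users: Users) {
        self.direct.insert(path.to_string(), users);
        let affected: Vec<String> = self
            .direct
            .keys()
            .filter(|p| Self::is_self_or_descendant(p, path))
            .cloned()
            .collect();
        for p in affected {
            let resolved = self.compute(&p);
            self.cache.insert(p, resolved);
        }
    }

    /// The fast path: a single map lookup, no walking.
    pub fn lookup(&self, path: &str) -> Option<&Users> {
        self.cache.get(path)
    }
}
```

The trade-off is visible even at this size: lookup becomes trivial, but every edit has to touch the whole subtree, and the invalidation logic is exactly the extra complexity the argument above is worried about.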

A small note about redirects for renamed repositories

This has nothing to do with the rest of this note, but it is something I was thinking about when re-reading the original design post and thinking about my old solution for redirects. I realize now there is a better solution and will write it down before I forget.

There is a dead simple solution: symlinks. If you rename old.git to new.git, then run

mv old.git new.git
ln -s new.git old.git

And whenever we try to create either the directory old/ or the repository old.git, we

check whether old/ or old.git already exists: old.git does;
then check whether old.git is a symlink: it is;
since it is a symlink, we may remove it without worries and create the directory old/ or repository old.git as desired.
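
The check for a leftover redirect symlink can be sketched with the standard library alone. clear_redirect is a hypothetical helper name, and the sketch assumes a Unix-like system, where remove_file deletes a symlink without touching whatever it points to.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Frees `path` for a new repository or directory if the only thing
/// occupying it is a redirect symlink.
/// Returns Ok(true) if the name is free afterwards and Ok(false) if a real
/// file or directory lives there and must not be replaced.
pub fn clear_redirect(path: &Path) -> io::Result<bool> {
    // symlink_metadata inspects the link itself instead of following it.
    match fs::symlink_metadata(path) {
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(true),
        Err(e) => Err(e),
        Ok(meta) if meta.file_type().is_symlink() => {
            fs::remove_file(path)?;
            Ok(true)
        }
        Ok(_) => Ok(false),
    }
}
```

The caller would run this for both candidate names (the directory and the .git repository) before creating anything, and refuse to continue if either call returns Ok(false).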

Addendum (2024/07/24)

The problem is not as simple as I made it out to be. Suppose you make repository a.git, and then rename it to b.git and then to c.git. You initially would have

a.git -> b.git -> c.git

But what happens if you replace b.git? Now you have

a.git -> b.git (NEW)
c.git

Now a.git points to the new b.git instead of c.git, the repository it was actually renamed to. If the whole point of redirects is to introduce stability, then this defeats the whole purpose.

In principle, it is possible to hunt down every symlink for every rename. My original solution of a history array would have the same problem. Plus, if you were to rename a directory, that would be a lot of work to do. And what if your redirect /foo/bar.git was stored in the directory foo, and then you moved /foo to /baz? How would you even keep track of that?

So for the sake of simplicity and predictability, I am inclined to drop redirects entirely. If I think of a really good solution, I will use it, but given how freely files can be moved around in a filesystem, keeping redirects consistent while still keeping renames performant seems quite hard. GitHub can do redirects easily because their paths have a fixed depth; ours do not.