User permissions for Glee
The previous user permission system for Glee that I had in mind was dead simple: you are either unauthenticated, a user, or an admin, and each repository was configured to have certain permissions (read/write) for users and admins. Unfortunately, upon further reflection I have concluded that this system is utterly useless for anything besides systems with a very small number of people (usually 1), and at that point you may as well run cgit. So I am revamping the design with the explicit goal of refining access control even for large organizations, though the primary target audience of small groups will not change.
There is a question of whether we ought to store the permission information with users (each user has a list of paths they have permissions to), or with paths (each repository lists out the users that are allowed to read/write/admin it). The answer is dead obvious in retrospect: store permissions with paths because files can/will be frequently moved around. Looking through every user and editing their permissions to point to the new path names is massively inconvenient. So when I say “this user has permission to write to this repository”, the permission-giver is the repository.
So what does each path store? Dead simple:
- a list of people who can read the path;
- a list of people who can write to the path;
- a list of people who can administrate the path.
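A minimal sketch of that per-path record, under some assumptions: the names `PathPermissions` and `Level` are made up for illustration, users are identified by email string, and write access is assumed to imply read access (only "admin implies write" is actually stated in this note).

```rust
use std::collections::BTreeSet;

/// Hypothetical permission levels; the names are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Level {
    Read,
    Write,
    Admin,
}

/// Hypothetical per-path record: three lists of users, keyed here by email.
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct PathPermissions {
    pub read: BTreeSet<String>,
    pub write: BTreeSet<String>,
    pub admin: BTreeSet<String>,
}

impl PathPermissions {
    /// Whether `user` is granted `level` by this path's own lists.
    /// Assumption: admin implies write, and write implies read.
    pub fn grants(&self, user: &str, level: Level) -> bool {
        let is_admin = self.admin.contains(user);
        let is_writer = is_admin || self.write.contains(user);
        match level {
            Level::Admin => is_admin,
            Level::Write => is_writer,
            Level::Read => is_writer || self.read.contains(user),
        }
    }
}
```

This only looks at the lists stored on one path; permissions granted on parent directories are not considered here.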
With that out of the way, let’s flesh out the permissions system by considering an example. We will work with the following Glee filesystem:
gym/
    squat.git
    bench.git
    deadlift.git
running.git
(For clarity, the .git suffix is how Git stores bare repositories by default and is what we will use to differentiate Git repositories from directories. Using the filesystem as an analogy, the repositories are “files” and the directories are… directories.)
Suppose I have administrative privileges over the entire system, i.e. I have admin permissions in /. I invite my friend Carl as an admin of gym/. Carl can now write to every repository in gym/, because being an admin necessarily means you can do that. Furthermore, Carl can invite Alice to gym/ and give her whatever permissions in gym/ he wants, including making her an administrator of gym/ as well. However, Carl cannot step outside the bounds of gym/ and give her any permissions over running.git. Furthermore, unless I explicitly state otherwise, Carl has no permissions in running.git.
This way we can have maintainers of entire subsystems without giving them any permissions out of their scope.
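The scoping rule in the Carl example can be sketched as a walk up the path. This is only an illustration: the `Admins` map and the `is_admin` function are hypothetical names, and paths are written without trailing slashes (`/gym` rather than `gym/`).

```rust
use std::collections::{HashMap, HashSet};
use std::path::{Path, PathBuf};

/// Hypothetical store: for each path, the users listed as admin on it.
pub type Admins = HashMap<PathBuf, HashSet<String>>;

/// A user administrates `path` if they are listed as admin on `path`
/// itself or on any ancestor directory, up to and including `/`.
pub fn is_admin(admins: &Admins, user: &str, path: &Path) -> bool {
    path.ancestors()
        .any(|p| admins.get(p).map_or(false, |set| set.contains(user)))
}
```

With me listed on `/` and Carl on `/gym`, `is_admin` holds for Carl on `/gym/squat.git` but not on `/running.git`. The same check would decide whether Carl may hand out permissions on a given path, since only admins of a path can edit its lists.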
No negative permissions
I cannot specify that Carl can administrate all of gym/ except gym/bench.git. This would make life far too complex, and if you want to give Carl access to all of gym/ except gym/bench.git, that’s a sign that semantically, gym/bench.git should be moved to a different path.
Unauthenticated read access
We still want people who are not logged in to the Glee instance, i.e. the general public, to be able to publicly see some repositories. Here, we do want negative permissions to exist; we may want the general public to be able to see everything except running.git, which is a highly private repository.
For this, we will use the ideas in the initial design post. To rehash, here is how it will work:
- For each path, we will store a value unauthenticated_read: Option<bool>. For those not familiar with Rust or functional languages, it means unauthenticated_read will be either set to true, set to false, or not set at all.
- To determine whether a repository can be read, we will look at the value of unauthenticated_read. If it is unset, we will recursively traverse up one directory and look at unauthenticated_read, stopping when we see that unauthenticated_read is set or when we reach /, whichever comes first.
- For example, take a path like gym/squat.git. Say unauthenticated_read is not set, then we will look at gym/. If gym/ has unauthenticated_read set to false, then we make gym/squat.git unreadable. If it was set to true, then gym/squat.git would be readable.
- If gym/squat.git, gym/, and / all do not have unauthenticated_read set, i.e. we do not find any unauthenticated_read value, we default to making the path unreadable for the sake of security.
So whether a repository is readable follows a simple algorithm: if the first unauthenticated_read value we find is true when traversing up the directory structure, then the repository is readable. Otherwise, the repository is not readable.
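The lookup described above can be sketched in a few lines. The `PublicRead` map and the function name `publicly_readable` are assumptions for illustration; how the setting is actually stored per path is not decided here.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Hypothetical store: the `unauthenticated_read` setting of each path.
/// A missing entry and `None` both mean "not set".
pub type PublicRead = HashMap<PathBuf, Option<bool>>;

/// Walk from `path` up to `/` and use the first value that is set.
/// If nothing is set anywhere on the way, default to unreadable.
pub fn publicly_readable(settings: &PublicRead, path: &Path) -> bool {
    path.ancestors()
        .find_map(|p| settings.get(p).copied().flatten())
        .unwrap_or(false)
}
```

`Path::ancestors` yields the path itself first and `/` last, which matches the "stop at the first value that is set" rule.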
Now what happens if we stumble upon a path where unauthenticated_read resolves to false (according to the recursive algorithm mentioned above)? The answer is not “make the path unreadable”: what if it has readable children?
Instead,
- if the path has any readable children, we will list all of said readable children;
- if the path has no readable children, then the path will not be readable; in other words, it will return a 404.
Consider the following example:
a/ UNREADABLE
    b/ READABLE
        c.git UNREADABLE
When we read a/, since b/ is readable, we will display
a/
    b/
This is despite the fact that b/ has no readable repositories as children!
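Here is a rough sketch of the listing rule, using a hypothetical in-memory tree (`Node`, `View`, and `view` are illustrative names). It makes two assumptions not settled above: only direct children are considered, and a readable directory still hides its unreadable children.

```rust
/// Hypothetical in-memory node: a directory or repository with its own
/// `unauthenticated_read` setting and its children.
pub struct Node {
    pub name: String,
    pub unauthenticated_read: Option<bool>,
    pub children: Vec<Node>,
}

/// What an unauthenticated visitor gets when requesting a directory.
#[derive(Debug, PartialEq)]
pub enum View {
    /// Names of the children to display.
    Listing(Vec<String>),
    /// Nothing visible: respond with a 404.
    NotFound,
}

/// `inherited` is the value resolved from the parent chain
/// (false when nothing above this node is set).
pub fn view(node: &Node, inherited: bool) -> View {
    let readable = node.unauthenticated_read.unwrap_or(inherited);
    // A child with no setting of its own inherits this node's value.
    let visible: Vec<String> = node
        .children
        .iter()
        .filter(|c| c.unauthenticated_read.unwrap_or(readable))
        .map(|c| c.name.clone())
        .collect();
    if readable || !visible.is_empty() {
        View::Listing(visible)
    } else {
        View::NotFound
    }
}
```

On the example tree, viewing a/ lists only b/, and viewing b/ gives an empty listing rather than a 404, because b/ itself is readable.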
The frontend
User registration
Before this, when permissions were much simpler (so simple that they were essentially useless), the only way to make a user would be for an administrator to invite them directly. Now we will allow anyone to make an account. However, instead of allowing a user to create an account and then verify their email (a stupid decision given that a malicious actor could register under your email address without much effort), the way someone who owns abc@def.xyz will register is as follows:
- they will visit https://git.dennisc.net/register and send themselves a registration email, which contains the registration link https://git.dennisc.net/register?code=abcd;
- they visit https://git.dennisc.net/register?code=abcd, and since they have a valid registration code, they will register on that page. Note the registration code abcd is tied to the email abc@def.xyz.
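The important property is that a code can only ever create an account for the email it was sent to. A minimal sketch, assuming an in-memory table (`PendingRegistrations` is a made-up name) and assuming codes are single-use; generating an unguessable code and sending the email are out of scope here.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory table of pending registrations: code -> email.
#[derive(Default)]
pub struct PendingRegistrations {
    by_code: HashMap<String, String>,
}

impl PendingRegistrations {
    /// Remember that `code` was emailed to `email`.
    pub fn issue(&mut self, code: &str, email: &str) {
        self.by_code.insert(code.to_string(), email.to_string());
    }

    /// Redeem a code: returns the email it was issued for and forgets the
    /// code, so it cannot be redeemed twice (single use is an assumption).
    pub fn redeem(&mut self, code: &str) -> Option<String> {
        self.by_code.remove(code)
    }
}
```

The registration page would call `redeem` with the `code` query parameter and create the account for whatever email comes back, never for an email typed in by the visitor.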
Changing permissions for a path
Suppose you want to change the permissions for gym/ and you are an admin for gym/. Then https://git.dennisc.net/gym will display a “Manage permissions” link, which will take you to https://git.dennisc.net/gym?permissions. Then there will be a TOML file structured like
read = ["somereader@gmail.com"]
write = ["somewriter@outlook.com"]
admin = ["abc@def.xyz"]
i.e. there is a list of readers, writers, and admins for each path.
You will be able to edit this file in a <textarea> and submit the new values in an HTML form.
It goes without saying, but you will not be able to do any of this if you are not an admin for gym/.
How are permissions stored?
They are stored in the filesystem. You are literally (indirectly) reading/writing to the filesystem, though of course the latter step only happens after validating your input. Also relevant: the default TOML file will be
read = []
write = []
admin = []
This TOML file will be populated immediately upon creation of the path.
Handling query strings (implementation details)
In axum, the way to handle a query string with no value is to make the datatype
use serde::Deserialize;

#[derive(Deserialize)]
pub struct Permissions {
    // Intended to be Some(()) when the URL contains ?permissions.
    permissions: Option<()>,
}
and then the function for the get API route would be
use axum::{extract::Query, response::Response};

pub async fn get(
    Query(Permissions { permissions }): Query<Permissions>,
) -> Response {
    ...
}
where we can inspect permissions in order to determine whether the query string ?permissions is present.
I haven’t tested this but it should be right in principle, at least.
Caching permissions
This is now getting down to implementation details, so the design here is much more uncertain. (After all, I can’t know if an implementation detail is a good idea until I get down to implementing it, unlike high-level designs.)
Perhaps it would be a good idea to cache permissions so that, instead of resolving permissions for a/b/c.git on every request by recursively looking through the permissions of a/b/, a/, and /, we resolve “who has permissions to a/b/c.git?” once, whenever we edit the permissions of a/b/c.git or any of its parents. In other words, any time we edit the permissions of a path, we update the cache for it and all of its children.
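To make the trade-off concrete, here is a sketch of such a cache for admins only. Everything in it is hypothetical (`CachedAdmins`, `set`, `is_admin`), and it assumes every existing path has an entry, which fits the idea that the permissions file is created together with the path.

```rust
use std::collections::{BTreeMap, BTreeSet};
use std::path::{Path, PathBuf};

pub type Users = BTreeSet<String>;

/// Hypothetical cache: `direct` holds what each path's own file lists,
/// `resolved` holds the union of `direct` over the path and its ancestors.
#[derive(Default)]
pub struct CachedAdmins {
    direct: BTreeMap<PathBuf, Users>,
    resolved: BTreeMap<PathBuf, Users>,
}

impl CachedAdmins {
    /// Replace the admins listed on `path`, then refresh the cache for
    /// `path` and every known path beneath it.
    pub fn set(&mut self, path: &Path, users: Users) {
        self.direct.insert(path.to_path_buf(), users);
        let affected: Vec<PathBuf> = self
            .direct
            .keys()
            .filter(|p| p.starts_with(path))
            .cloned()
            .collect();
        for p in affected {
            let all: Users = p
                .ancestors()
                .filter_map(|a| self.direct.get(a))
                .flatten()
                .cloned()
                .collect();
            self.resolved.insert(p, all);
        }
    }

    /// Single map lookup at read time: no walk up the tree.
    pub fn is_admin(&self, user: &str, path: &Path) -> bool {
        self.resolved.get(path).map_or(false, |u| u.contains(user))
    }
}
```

Reads become one lookup, but every edit touches the whole subtree, and the two maps have to be kept in sync. That is the extra complexity weighed below.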
The argument against doing this is that it increases the complexity of the code, and you’d have to be kind of a monster to have deep enough paths for this kind of thing to matter.
A small note about redirects for renamed repositories
This has nothing to do with the rest of this note, but it is something I was thinking about when re-reading the original design post and thinking about my old solution for redirects. I realize now there is a better solution and will write it down before I forget.
There is a dead simple solution: symlinks. If you rename old.git to new.git, then run
mv old.git new.git
ln -s new.git old.git
And whenever we try to create either the directory old/ or the repository old.git, we
- check whether old/ or old.git already exists: old.git does;
- then check whether old.git is a symlink: it is;
- since it is a symlink, we may remove it without worries and create the directory old/ or repository old.git as desired.
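The check-then-replace steps above could look roughly like this. `clear_redirect` is a hypothetical helper, and the sketch assumes a Unix filesystem, where `remove_file` removes a symlink without touching its target.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Returns Ok(true) if something new may be created at `path`: either
/// nothing exists there, or only a leftover redirect symlink, which is
/// removed. Returns Ok(false) if a real directory or repository is there.
pub fn clear_redirect(path: &Path) -> io::Result<bool> {
    // symlink_metadata inspects the link itself instead of following it.
    match fs::symlink_metadata(path) {
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(true),
        Err(e) => Err(e),
        Ok(meta) if meta.file_type().is_symlink() => {
            fs::remove_file(path)?;
            Ok(true)
        }
        Ok(_) => Ok(false),
    }
}
```

Using `symlink_metadata` rather than `metadata` matters here: `metadata` would follow the redirect and report the real repository, so the symlink would never be detected.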
Addendum (2024/07/24)
The problem is not as simple as I made it out to be. Suppose you make repository a.git, and then rename it to b.git and then to c.git. You initially would have
a.git -> b.git -> c.git
But what happens if you replace b.git? Now you have
a.git -> b.git (NEW)
c.git
Now a.git points to the new b.git instead of c.git. If the whole point of redirects is to introduce stability, then this defeats the whole purpose.
In principle, it is possible to hunt down every symlink for every rename. My original solution of a history array would have the same problem. Plus, if you were to rename a directory, that would be a lot of work to do. And what if your redirect /foo/bar.git was stored in the directory foo, and then you moved /foo to /baz? How would you even keep track of that?
So for the sake of simplicity and predictability, I am inclined to drop redirects entirely. If I think of a really good solution, I will use it, but given how freely files can be moved around in a filesystem, keeping redirects consistent while still keeping renames performant seems quite hard. GitHub can do redirects easily because they have fixed depth; we do not.