A plan of attack for Glee

As of a few hours ago, my understanding of trees in Postgres has shot up a lot, thanks to a blog post by Leonard Marc. The approach we will be using is #2, i.e. storing the parent directory as a foreign key. (More details soon.) Suffice it to say that I feel very stupid and enlightened at the same time, because now I believe I have a perfectly sound model of Glee. In fact, it is sound enough that I am confident that I can begin programming, and I am so averse to programming that my mailserver has not been able to send emails for the last eight months.

Storytime (skippable)

Previously I had a hare-brained idea to store the metadata of a repository (who has permissions to act in this repository?) as an actual file in the filepath. It suffices to say this is a terrible idea, but it would have worked just fine with my previous design.

Then I started thinking about the goals of Glee. At the start, Glee was meant to be a host that didn’t need to scale, but now my goal is for Glee to scale towards large single-organization hosts. (Because otherwise it is utterly useless and you can just use cgit.) The underlying complexity, however, will still remain very low, somewhere between cgit and sourcehut. It is the goal that

all of the design decisions can be explained in a single medium-sized document,
understanding the design is sufficient to use the software proficiently.

But this post is not meant to be that document. I will hold off on that until I have Glee in a somewhat usable state.

Now, if we want large single organizations to use this for enterprise-ish stuff, we really don’t want random people creating accounts. (I also cannot be bothered to implement a “send registration email” feature.) The logical flow is that if you are an admin of a directory or repository (which we will henceforth call a path), then you may give an email address (say dchen@dennisc.net) permissions to (say) write to the path. If dchen@dennisc.net is already associated with an account, then that account will now have permissions to write to said path.

If not, an email will be sent containing an invite link, which dchen@dennisc.net can use to register. In the database, we will note that this invite link should also give the new account permissions to the path (say) a/b/c.git. So we will store the permission write: a/b/c.git in the associated database entry for that invite link, and when the invite link is used, that permission will be added for the new user.

Now what if a/b/c.git moves to a/e.git after the invite link is created but before it is used?

Oh.

The correct way to deal with this issue is to instead point to a unique id representing the path, one that doesn’t change. Which calls for storing information about paths in the database.
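Here is a minimal sketch of why pointing at an id fixes the problem. Everything named here (Invite, PathEntry, redeem) is made up for illustration, and ids are small integers standing in for uuids; the real schema may look quite different.

```rust
use std::collections::HashMap;

/// Stand-in for a uuid (assumption: real code would use a proper uuid type).
type Id = u32;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Permission {
    Read,
    Write,
    Admin,
}

/// Hypothetical invite record: it names the path by id, never by its location.
struct Invite {
    email: String,
    path: Id,
    permission: Permission,
}

/// Hypothetical path record, reduced to the fields needed here.
struct PathEntry {
    name: String,
    parent: Option<Id>,
    permissions: Vec<(Permission, Id)>,
}

/// Redeem an invite for a freshly registered user.
/// Returns false if the path behind the invite no longer exists.
fn redeem(paths: &mut HashMap<Id, PathEntry>, invite: &Invite, new_user: Id) -> bool {
    match paths.get_mut(&invite.path) {
        Some(entry) => {
            entry.permissions.push((invite.permission, new_user));
            true
        }
        None => false,
    }
}
```

Because the invite only holds the id, renaming or moving the entry between creating and redeeming the invite changes nothing about which entry receives the permission.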

Directories and Repositories

To motivate this section, I will say what I have said before in many of my previous Glee posts. Glee is about storing your Git repositories as a filesystem.

We are going to use a standard tree structure in Postgres. We will have two tables, Directories and Repositories. The fields of each are going to be

id: a uuid
public: whether the directory or repository is publicly viewable, an optional boolean (no value means “inherit”)
name: the name of the directory or repository, for example c.git
parent: a nullable foreign key pointing to the uuid of a directory
permissions: a list of read/write/admin with user ids, e.g. [read, b3994226-6761-456d-879c-7b18facbbd81]

Of course, the root directory / will have no parent. It will be the sole path with no parent. It also will be the sole path we cannot move.

For now I’m thinking the struct representing this unified model in Rust should be DirRepo in backend.rs. Or maybe just Path, but that is not the most ideal name because it conflicts with a filesystem path, and Path implies a complete path rather than just one step (i.e. current file plus parent).
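A rough sketch of what such a struct could look like, given the fields listed above. The Kind enum and the effective_public helper are my own additions for illustration, ids are small integers standing in for uuids, and treating a path as private when nothing up to the root sets public is an assumption.

```rust
use std::collections::HashMap;

/// Stand-in for a uuid (assumption).
type Id = u32;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Permission {
    Read,
    Write,
    Admin,
}

/// Hypothetical tag telling the two tables apart in the unified model.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    Directory,
    Repository,
}

#[derive(Debug, Clone)]
struct DirRepo {
    id: Id,
    kind: Kind,
    /// None means "inherit from the parent".
    public: Option<bool>,
    name: String,
    /// None only for the root directory.
    parent: Option<Id>,
    permissions: Vec<(Permission, Id)>,
}

/// Walk up the parents until some entry sets `public` explicitly.
/// Assumption: if nothing up to the root sets it, the path is private.
fn effective_public(paths: &HashMap<Id, DirRepo>, id: Id) -> bool {
    let mut current = paths.get(&id);
    while let Some(entry) = current {
        if let Some(value) = entry.public {
            return value;
        }
        current = entry.parent.and_then(|parent| paths.get(&parent));
    }
    false
}
```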

We will index the column parent in both tables. That way we can emulate ls for the directory b3994226-6761-456d-879c-7b18facbbd81 by simply searching for anything with a parent id of b3994226-6761-456d-879c-7b18facbbd81 and have this query be efficient.
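To make the ls idea concrete, here is an in-memory analogue: a map from parent id to children plays the role of the database index. This is only a sketch; Node, index_by_parent and ls are hypothetical names, and ids are small integers rather than uuids.

```rust
use std::collections::HashMap;

/// Stand-in for a uuid (assumption).
type Id = u32;

/// Hypothetical row, reduced to the columns that matter for listing.
struct Node {
    id: Id,
    name: String,
    parent: Option<Id>,
}

/// Group rows by parent id, the in-memory analogue of an index on the parent column.
fn index_by_parent(nodes: &[Node]) -> HashMap<Id, Vec<&Node>> {
    let mut index: HashMap<Id, Vec<&Node>> = HashMap::new();
    for node in nodes {
        if let Some(parent) = node.parent {
            index.entry(parent).or_default().push(node);
        }
    }
    index
}

/// Emulate `ls`: the sorted names of everything whose parent is `dir`.
fn ls(index: &HashMap<Id, Vec<&Node>>, dir: Id) -> Vec<String> {
    let mut names: Vec<String> = index
        .get(&dir)
        .map(|children| children.iter().map(|child| child.name.clone()).collect())
        .unwrap_or_default();
    names.sort();
    names
}
```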

We will have to validate that name does not contain any / characters upon any client POST request for obvious reasons.
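The check itself is tiny. In this sketch, only the / rule comes from the design above; also rejecting empty names, "." and ".." is an extra assumption of mine, and is_valid_name is a made-up name.

```rust
/// Accept a name only if it can be a single path component.
/// The '/' rule is from the design; the other three rules are assumptions.
fn is_valid_name(name: &str) -> bool {
    !name.is_empty() && name != "." && name != ".." && !name.contains('/')
}
```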

Handling redirects

Now handling redirects is trivial, which means we will do it. We will have a table of Redirects which stores

parent
name
link: which ID the redirect goes to.

Let me give you a concrete example to explain how resolving redirects will work. Suppose we rename a/b/c.git -> a/d.git. As expected, we look at the entry for c.git,

change its name to d.git,
change its parent to a (more accurately the id of a).

Furthermore, we create a Redirect with

parent: whatever the id of b is (this is the parent of c.git before rename)
name: c.git
link: whatever the id of c.git was, i.e. the id of the new d.git.

Note that the path a/b still exists; we just moved c.git. Here is what happens when we try to navigate to a/b/c.git:

There is a directory named a with parent /.
There is a directory named b with parent a.
There is no directory or file named c.git with parent b. But there is a Redirect with the name c.git with parent b that points to d.git. So now we look at d.git and get a repository.

To be clear, when we say “with parent /”, we really mean that its parent is the id of the root directory, etc.

(Basically this is the idea I had with symlinks, but it solves the problem of changing identifiers because we use a static id.)

To demonstrate the robustness of this idea, suppose we now move a/b to a/e. We still want a/b/c.git to go to the correct repository. What happens?

There is a directory named a with parent /.
There is no directory named b with parent a. However, there is a redirect with name b and parent a that goes to e (which has path a/e).
There is no directory or file named c.git with parent e (remember that when we do parent checks, it is with the id; saying “with parent e” is merely a shorthand). However, there is a redirect with name c.git with parent e that points to d.git. So now we get the same d.git, precisely as desired!

In fact, a/b/c.git will always redirect to that same repository until a new directory or repository is made at that same path.
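The two walks above can be sketched in a few lines. This is an in-memory toy, not the real Postgres-backed implementation: Slot, Tree, resolve and rename are hypothetical names, ids are small integers instead of uuids, and a single map keyed by (parent id, name) stands in for the three tables. It does not check for moving a directory into its own descendant.

```rust
use std::collections::HashMap;

/// Stand-in for a uuid (assumption).
type Id = u32;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Slot {
    /// A real directory or repository with this id.
    Path(Id),
    /// A redirect whose link is this id.
    Redirect(Id),
}

struct Tree {
    root: Id,
    /// (parent id, name) -> what lives at that location.
    slots: HashMap<(Id, String), Slot>,
}

impl Tree {
    /// Resolve a path such as "a/b/c.git" to an id, following redirects at every step.
    fn resolve(&self, path: &str) -> Option<Id> {
        let mut current = self.root;
        for name in path.split('/').filter(|part| !part.is_empty()) {
            current = match self.slots.get(&(current, name.to_string()))? {
                Slot::Path(id) | Slot::Redirect(id) => *id,
            };
        }
        Some(current)
    }

    /// Move or rename an entry and leave a redirect behind at the old location.
    /// Refuses if the source is not a real path or the destination holds a real path.
    fn rename(&mut self, old_parent: Id, old_name: &str, new_parent: Id, new_name: &str) -> bool {
        let old_key = (old_parent, old_name.to_string());
        let new_key = (new_parent, new_name.to_string());
        let id = match self.slots.get(&old_key) {
            Some(Slot::Path(id)) => *id,
            _ => return false,
        };
        if let Some(Slot::Path(_)) = self.slots.get(&new_key) {
            return false;
        }
        self.slots.insert(old_key, Slot::Redirect(id));
        self.slots.insert(new_key, Slot::Path(id));
        true
    }
}
```

Since the redirect for c.git is keyed by the id of b rather than by the name b, it keeps working after b itself is renamed to e.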

And these redirects persist until they are “overwritten” by a new path at the same location. When the overwrite occurs, we will delete the redirect. The leading principle here is very simple:

The combination of name and parent id must be unique among all directories, repositories, and redirects.

That means there are no unused redirects lying around, meaning that we never have to prune redirects. So our analogy of a filesystem with repositories and directories can be extended with redirects. Now we just have two types of files, redirects and repositories, in addition to directories.

What do we do when we try to create a new directory/repository and there is a conflict?

If the conflict is with a redirect, simply delete the redirect.
If the conflict is with another directory/repository, forbid the operation.
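These two rules fit in one small function. As before this is a sketch under assumptions: a single in-memory map keyed by (parent id, name) stands in for the tables, and Slot, CreateError and create are hypothetical names.

```rust
use std::collections::HashMap;

/// Stand-in for a uuid (assumption).
type Id = u32;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Slot {
    /// A real directory or repository with this id.
    Path(Id),
    /// A redirect whose link is this id.
    Redirect(Id),
}

#[derive(Debug, PartialEq)]
enum CreateError {
    /// A real directory or repository already has this (parent, name).
    NameTaken,
}

/// Create a new directory or repository with id `new_id` at (parent, name).
fn create(
    slots: &mut HashMap<(Id, String), Slot>,
    parent: Id,
    name: &str,
    new_id: Id,
) -> Result<(), CreateError> {
    let key = (parent, name.to_string());
    if let Some(Slot::Path(_)) = slots.get(&key) {
        return Err(CreateError::NameTaken);
    }
    // Inserting replaces any redirect stored under this key,
    // which is the "simply delete the redirect" case.
    slots.insert(key, Slot::Path(new_id));
    Ok(())
}
```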

How are we concretely storing repositories?

Now it would be stupid to actually perform a filesystem move every time we do a “virtual” move in the database. The correct answer is very obvious: we store the repository with uuid b3994226-6761-456d-879c-7b18facbbd81 in the path b3994226-6761-456d-879c-7b18facbbd81.git.¹ That way when we resolve a path to a repository, we merely need to look at uuid.git in the filesystem. Furthermore, because we never move repositories in the actual filesystem and never change uuids, any bug with filepath resolution is fixable. This means we will never corrupt our data with moves, because we are never changing the underlying data; bugs will only appear due to incorrect filepath resolution.
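In code, locating a repository on disk is then a one-liner. In this sketch repo_dir is a made-up name, data_dir stands for wherever Glee keeps its data, and the uuid is passed as a string that is assumed to have been validated already.

```rust
use std::path::{Path, PathBuf};

/// On-disk location of a repository: <data_dir>/<uuid>.git.
/// Assumption: `uuid` has already been checked to be a well-formed uuid.
fn repo_dir(data_dir: &Path, uuid: &str) -> PathBuf {
    data_dir.join(format!("{}.git", uuid))
}
```

Nothing about the repository's name or parent enters this function, so no rename or move in the database can ever change the result.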

User permissions, again

Specifically we will talk about

how users ought to be invited
how permissions will be managed on the frontend

because those are the only things which I have changed the design of.

A link to a special page to “manage permissions” for a directory/repository will appear if you have admin access. We will not be modifying a raw TOML file because that is a bad idea. Here is our new approach:

When resolving whether a user is an admin, we should also determine whether they have directly been defined to be admin (i.e. in the current directory or repository) or whether they have inherited admin from a parent directory. We will say an admin is an Inherit Admin if they have inherited and a Direct Admin if they have been directly defined as an admin.

An inherit admin will be stronger than a direct admin. So if we have determined a user is a direct admin, we also must check whether they inherit admin as well, since making someone a direct admin on top of being an inherit admin should not reduce their permissions.

A direct admin cannot delete admins in the current directory/repository. An inherit admin can delete direct admins in the current directory/repository.
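A sketch of this classification, assuming each entry carries just a parent id and a list of admin user ids. AdminKind, Node, admin_kind and can_remove_direct_admin are hypothetical names, and ids are small integers instead of uuids.

```rust
use std::collections::HashMap;

/// Stand-in for a uuid (assumption).
type Id = u32;

#[derive(Debug, Clone, Copy, PartialEq)]
enum AdminKind {
    Inherit,
    Direct,
}

/// Hypothetical entry, reduced to what the check needs.
struct Node {
    parent: Option<Id>,
    admins: Vec<Id>,
}

/// Classify `user` at `path`. Inherit is stronger than Direct,
/// so every ancestor is checked before the entry itself.
fn admin_kind(nodes: &HashMap<Id, Node>, path: Id, user: Id) -> Option<AdminKind> {
    let node = nodes.get(&path)?;
    let mut ancestor = node.parent;
    while let Some(id) = ancestor {
        let current = nodes.get(&id)?;
        if current.admins.contains(&user) {
            return Some(AdminKind::Inherit);
        }
        ancestor = current.parent;
    }
    if node.admins.contains(&user) {
        Some(AdminKind::Direct)
    } else {
        None
    }
}

/// Only an inherit admin may remove direct admins of the current path.
fn can_remove_direct_admin(actor: Option<AdminKind>) -> bool {
    actor == Some(AdminKind::Inherit)
}
```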

Regardless of what permissions you have (read/write/admin), the main page of the repo will tell you what permissions you have upfront. (Many other sites are awful at doing this.)

Admins, whether inherit or direct, can see both who has direct permissions and who has inherited permissions on the “manage permissions” page. You may think that resolving who has inherited permissions might be complex, and you would be right, but we already do this work when determining whether to show the permissions page. Instead of just resolving permissions for one user, we will create a list of all users and the permissions they have. For example, we might say

Alice is an admin of the repository
Bob inherits admin from /gym
Charles inherits write from /gym

However, it might be privileged information that Dean is an admin of /, and it would be bad if a low-level admin saw “Dean inherits admin from /”. Suppose the highest parent directory we inherit admin from is /path. Then we only want to show users who inherit permissions from /path or lower. So when we traverse the tree to resolve permissions, we will be keeping track of

the highest parent directory you inherit admin from,
and the lowest parent directory every other user inherits their strongest permission from,

where “high” means “less deep” and “low” means “deeper”.

If your highest inheriting directory is the same as or higher than a user’s lowest inheriting directory, then you will see that the user inherits their permissions from said lowest inheriting directory.
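Since every directory involved is an ancestor of the current path, they all lie on one chain from the root, and "higher" versus "lower" reduces to comparing depths. Here is a sketch under that assumption, with depth 0 for / and made-up names (InheritedRow, visible_rows). It follows the "/path or lower" wording, so a user inheriting from exactly your highest directory is shown.

```rust
/// Depth on the chain from the root (depth 0) down to the current path.
type Depth = usize;

/// Hypothetical row of the resolved permission list.
struct InheritedRow {
    user: String,
    permission: String,
    /// Lowest directory the user inherits their strongest permission from.
    from: String,
    from_depth: Depth,
}

/// Keep only the rows a viewer may see, given the depth of the highest
/// directory the viewer inherits admin from.
fn visible_rows(viewer_highest: Depth, rows: &[InheritedRow]) -> Vec<String> {
    rows.iter()
        .filter(|row| row.from_depth >= viewer_highest)
        .map(|row| format!("{} inherits {} from {}", row.user, row.permission, row.from))
        .collect()
}
```

With a viewer whose highest inheriting directory is /gym, Bob and Charles show up while Dean, who inherits from /, does not.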

Also, there will be a special “manage permissions” page on / which allows admins of / to delete any user who is not an admin of /.

Displaying the git log as a graph

Here’s a tip that will change your life: try using git log --graph. GitHub, GitLab, and SourceHut’s log views are all linear, meaning they do not show the commit graph. Bitbucket of all places does. We will show the commit graph as well because that is the right thing to do, although we will shamelessly fail on unreasonably large octopus merges.

This will require a good understanding of libgit2’s revwalk API and significant thought about the frontend design of the Git log. Of all the things I want to implement in Glee, this seems like it will be the hardest.

Plan of action

Having finally fleshed out the design, I plan to implement Glee in this order:

Revamp the user model.
- Remove admin, because we are now handling permissions on the filesystem in a more sophisticated manner.
- Maybe start using Redis (well, really Valkey now). Scalability was not a goal before, but the whole point of several of the new ideas is that it now matters. We want big organizations to be able to use this, at least in theory, so Postgres authentication might become a bottleneck. (But I will have to research whether this is actually worth doing, though my gut says using Redis is the right thing to do.)
Create Directory/Repository tables.
- Use foreign key pointer approach.
- Make an index on the parent id and set up scaffolding for initializing indices.
Revamp Invite Token model.
- Invites should be associated with “what do you want to invite user to”, so path + permission (read/write/admin).
- Said invites will also modify the appropriate directory/repository entry.
Implement permissions page for each directory/repository.
Implement directory main page view.
Implement repository main page view.
Implement log/trunk view, etc. for repositories.
- Need to figure out how to emulate git log --graph, but with a web UI.
Figure out SSH interceptors to implement write access.

When all this is done, we will have a reasonably complete product. I feel that I finally have the requisite understanding of Postgres to implement the database-side stuff, though I will have to spend some more time understanding Git better. But at the very least, I can implement everything up to the repository main page view without understanding Git one bit more. So the goal will be to get to that point soon.

¹ Really, we store this repository in the data directory of Glee.