A High-Level Overview of TeX

If you’re a beginner and want to just get LaTeX to work, read LaTeX for Beginners instead. I assume you know the basics of LaTeX and have a working installation.

This post isn’t mainly about how the TeX programming language itself works. Rather, it’s about the environment that (La)TeX works in, and why some of its more puzzling behaviors occur.

History

TeX and LaTeX are very old, so some of their design decisions are quite archaic.[1] If you want to do anything nontrivial (like randomizing hints), I strongly recommend using another language to generate an intermediate TeX build file, and then compiling that into a PDF.[2]

Keep this in mind as I go through some of these concepts. A lot of these design choices may not make sense at first, but they become clearer in the context of “computers used to be really weak.”

Compiler

The compiler does not care what you used to edit the plain text. It just cares what’s in the text file.

By the way, the LaTeX language is not tied to online editors like Overleaf.[3] Thinking otherwise would be like buying an all-in-one desktop, where the monitor is built into the computer, and concluding that the monitor and the computer are the same thing, or that you have to buy them together. The worst conclusion to come to would be that all-in-ones are the only type of desktop.[4]

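There are a ton of text editors, but not that many compilers. The most popular are pdfLaTeX, XeTeX, and LuaTeX. (It’s mostly pdfLaTeX.) ConTeXt sometimes gets listed alongside them, but it is a separate macro package built on TeX rather than a compiler for LaTeX documents. Compilers are usually shipped with a distribution; more on that below.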

TeX Distributions

A TeX distribution is a mechanism for distributing a compiler and a set of tools to help it. Typically this “set of tools” is a standardized set of classes and packages, alongside the compilers themselves (like pdflatex, XeTeX, LuaTeX). If you are using TeX Live, this “standardized set” comes from CTAN.[5] MiKTeX also uses CTAN, but supposedly updates are slower since there’s a single maintainer. This doesn’t matter unless you want a bleeding-edge TeX install.

It’s important to make the following observation: when you import a document class or package, you’re loading a .cls or .sty file, no matter which name you put in the braces. For instance, calling \documentclass{article} reads article.cls into your document. It’s just that you haven’t ever looked at article.cls before. If you want to find it, run kpsewhich article.cls.[6]
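
One way to see this for yourself is LaTeX’s \listfiles command, which prints the files a run loaded at the end of the log. This is only a minimal sketch: amsmath is an arbitrary example package, and the exact list you get depends on your installation.

```latex
% Ask LaTeX to print a "*File List*" section at the end of the .log file.
\listfiles
\documentclass{article}   % loads article.cls
\usepackage{amsmath}      % loads amsmath.sty (arbitrary example package)
\begin{document}
Hello.
\end{document}
```

After compiling, the file list in the log names article.cls and amsmath.sty, along with the other files those two pulled in.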

Every software distribution comes with a package manager, and TeX Live/MiKTeX are no exception. TeX Live’s package manager is tlmgr; MiKTeX ships its own (the MiKTeX Console). For those of you coming from a Linux background, this terminology will be familiar. Otherwise, here is a quick explanation of what a package manager is: it is a standardized interface which helps you install a set of software from a repository. So instead of installing each piece of software through its own installer, each with its own idiosyncrasies, you run tlmgr install PACKAGE and the command automatically takes care of extraction and installation for you.

Build Process

If you want to do something non-trivial with TeX, I weakly recommend you refrain from using the latexmk tool until you understand why it would be helpful.[7] Otherwise this section will make no sense, because “whee latexmk compiles exactly the number of times I need it to.”[8]

Anyone who’s run pdflatex instead of latexmk will have noticed a couple of things.

This is because LaTeX code executes sequentially, period. You can’t just go back and scan every line whenever you have a reference in the code. This isn’t a batch script, people! (And what happens when the user makes an error and the reference is undefined?)

Why is this an issue? Well, let’s consider the following example (fill in the rest of the document in your head):

\tableofcontents
\section{Insert Title Here}

Clearly \tableofcontents must run before \section. But we have a problem: How is \tableofcontents supposed to know what comes after it? It’s not like it can look ahead. Well, you could make it, but it would end up being slow, annoying, and buggy.

So to work around this, whenever pdflatex is run, it creates some auxiliary files, like the .aux and .toc files. These hold information recorded during the last run of pdflatex, so the next run can read them and generate references correctly. This is why a fresh document needs at least two runs before its table of contents and references show up.
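
To watch this happen, here is the example above filled out into a complete file. Treat it as a sketch: the file name main.tex is an assumption, and the exact contents of the auxiliary files vary between LaTeX versions.

```latex
% main.tex (hypothetical file name)
%
% Run 1: pdflatex main.tex
%   main.toc does not exist yet, so the table of contents has a heading
%   but no entries. While processing \section, LaTeX records the title
%   in main.aux, and from there it ends up in main.toc.
% Run 2: pdflatex main.tex
%   \tableofcontents reads the main.toc left by run 1 and prints the entry.
\documentclass{article}
\begin{document}
\tableofcontents
\section{Insert Title Here}
Some text.
\end{document}
```

If you delete main.aux and main.toc and compile again, you are back to the empty table of contents from run 1.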

Keep in mind that synctex files are not part of this; they just exist so you can navigate between the PDF and the code.

TeX Paths

There are seven directories which a TeX install writes to; we’ll only explain three of them:

Typically a TeX installation writes to a system-wide directory that only root can modify. I think this is kind of stupid if you’re the only user running TeX on your machine (or the only user on your machine, actually). If you put the TeX tree there, you’ll need root permissions to run tlmgr. But the root user doesn’t know what tlmgr is, because it’s only on your user’s PATH, so you then need to point root at the right place. Bah! Fortunately, it’s very easy to move a TeX install from root to HOME; you just need to update your PATH variable afterwards.

No matter where you install TeX Live, any personal packages or classes you use should go into TEXMFHOME, period. A quick refresher: TEXMFHOME is ~/texmf by default. You can change this in texmf.cnf (run kpsewhich texmf.cnf to find it), or you can put export TEXMFHOME= in your bashrc or the equivalent for whatever shell you use.
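
As a sketch of what a “personal package” can look like, here is a tiny one. The name hypotheticalpkg and the \hello macro are made up for illustration. The filecontents environment writes the .sty next to the document so that the example is self-contained; for real use you would save the file under TEXMFHOME instead, where the usual layout is tex/latex/hypotheticalpkg/ inside that directory.

```latex
% Writes hypotheticalpkg.sty into the current directory (illustration only;
% if the file already exists, LaTeX leaves it alone and issues a warning).
\begin{filecontents}{hypotheticalpkg.sty}
\NeedsTeXFormat{LaTeX2e}
\ProvidesPackage{hypotheticalpkg}[2024/01/01 v0.1 made-up example package]
\newcommand{\hello}{Hello from a .sty file.}
\endinput
\end{filecontents}

\documentclass{article}
\usepackage{hypotheticalpkg} % found in the current directory or in TEXMFHOME
\begin{document}
\hello
\end{document}
```

Once the file is in place, kpsewhich hypotheticalpkg.sty should print the path it was found at, the same way it does for article.cls.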

Concluding Thoughts

Hopefully this explains why TeX behaves on your system the way it does. If you want to learn how TeX runs under the hood, Overleaf’s “How TeX macros actually work” is really good. A lot of LaTeX’s implementation details depend on this, because TeX really does two things uniquely:

  1. It can typeset stuff nicely. (You probably already knew this.)
  2. It uses category codes (“catcodes”) to interpret TeX documents.

Said article goes in depth about the second part. So far, I have yet to find an accessible explanation of how TeX actually decides how to typeset stuff, but I also haven’t looked very hard.


  1. On the bright side, the PDF size of a compiled document is super small. Pretty much every PDF is reasonably under 512KB.↩︎

  2. On second thought, scrambledenvs might have been a smart thing to write in LaTeX. One, more people can use it: it’s included by default in TeX Live 2021 (which is why you should install vanilla ;). Two, the alternative would have been to write the environment’s contents to a file, randomise them outside of LaTeX, and make sure that refs are still generated correctly, instead of just doing everything in LaTeX. Using some other language in tandem might actually be harder. I don’t know for sure, though, since I never tried.↩︎

  3. I say this because my school math team wanted to continue using Overleaf because “the syntax is easy to understand.” When you don’t understand what pieces are independent from each other, you don’t just make bad choices, you straight up don’t understand what the choices are.↩︎

  4. Yes, all-in-one desktops might be more convenient in that they take up less room. But they are less convenient in the long run: in exchange for skipping an upfront investment of time/money, you suffer from having a computer with shitty specs. The same goes for using Overleaf instead of installing a TeX distribution (more on what a TeX distribution is later).↩︎

  5. Fun fact: I actually have packages on CTAN.↩︎

  6. You’ll notice the source code for article.cls isn’t very readable. That’s because article.cls is a generated file: the real source is a .dtx file, and the comments are stripped out when the .cls is generated from it. Typically maintainers use .dtx files because they can generate multiple files from the same source and the documentation is coupled with the source code. This is called literate programming. It is CTAN standard to use dtx files, but you do not have to.↩︎

  7. If you are using Windows this will sort of happen by force. The OS is so shitty that you have no clue what is happening half the time and using the terminal to compile TeX is too scary. You’ll probably be doing one of two things:

    - Using the TeXWorks editor and pressing green buttons to run pdflatex, asy *.asy, pdflatex every time.
    - Using VSCode and pressing a green button.

    You will end up running pdflatex twice enough times to pick up on what LaTeX is really doing.↩︎

  8. You might reasonably ask, “Well shouldn’t running the compile command once always be enough to compile it?” This is one of the situations where you have to consider the context in which TeX was made. However, some recent TeX-based macro packages like ConTeXt (I believe) have made it so that you do not need to compile multiple times.↩︎