A High-Level Overview of TeX
If you’re a beginner and want to just get LaTeX to work, read LaTeX for Beginners instead. I assume you know the basics of LaTeX and have a working installation.
This isn’t mainly about how the TeX programming language itself works. Rather, it’s more about the environment that (La)TeX works in, and why some of its more puzzling behaviors occur.
History
TeX and LaTeX are very old, so some of the decisions they make will be quite archaic.[1] I strongly recommend that if you want to do anything nontrivial (like randomizing hints), you just use another language to generate an intermediate TeX file, and then compile that into a PDF.[2]
Just keep this in mind as I go through some of these concepts. A lot of these design choices may not make sense at first, but they will become more clear through the context of “computers used to be really weak.”
Compiler
You write plain text in a file that follows the LaTeX specifications. Specifications are basically rules you have to follow in order for your code (yes, LaTeX is code) to be considered “correct.”
The compiler is a tool that interprets your plain text according to the rules of LaTeX and generates a PDF for you. The thing about using a tool is you have to have it in the first place. There’s no point following the instructions for, say, how to use a screwdriver if you don’t have one, so get a goddamn screwdriver!!
The compiler does not care what you used to edit the plain text. It just cares what’s in the text file.
By the way, the LaTeX language is not unique to online editors like Overleaf.[3] That would be like thinking that just because you bought an all-in-one desktop where the monitor is built into the computer, the monitor and the computer are the same thing, or that you have to buy them together. The worst conclusion to come to would be that all-in-ones are the only type of desktop.[4]
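There are a ton of text editors, but not that many compilers. The most popular are pdfLaTeX, XeLaTeX, and LuaLaTeX, which run LaTeX on the pdfTeX, XeTeX, and LuaTeX engines respectively. (It’s mostly pdfLaTeX.) ConTeXt is sometimes listed alongside them, but it is a separate macro package built on TeX rather than a LaTeX compiler. Compilers are usually shipped with a distribution; more on that below.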
TeX Distributions
A TeX distribution is a mechanism for distributing a compiler and a set of tools to help it. Typically this “set of tools” is a standardized set of classes and packages, as well as compilers (like pdflatex, XeTeX, LuaTeX). If you are using TeX Live, this “standardized set” comes from CTAN.[5] MiKTeX also uses CTAN, but supposedly updates are slower since there’s a single maintainer. This doesn’t matter unless you want a bleeding-edge TeX install.
It’s important to make the following observation: when you import a document class or package, you’re importing a .cls or .sty file, no matter which name you put in there. For instance, calling \documentclass{article} loads article.cls into your document. It’s just that you haven’t ever looked at article.cls before. If you want to find it, run kpsewhich article.cls.[6]
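If you want to see this happen, here is a minimal sketch. \listfiles is a standard LaTeX command that makes the log report every file the document loaded. The choice of amsmath is just an arbitrary example package, not something the rest of this post relies on.

```latex
% Ask LaTeX to report every class and package file it loads.
\listfiles
\documentclass{article}   % loads article.cls
\usepackage{amsmath}      % loads amsmath.sty (arbitrary example package)
\begin{document}
Hello.
\end{document}
```

After compiling, look for the file list near the end of the .log file. It should include article.cls and amsmath.sty, along with whatever other files those two pull in.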
Every software distribution comes with a package manager, and TeX Live/MiKTeX are no exception. TeX Live’s package manager is tlmgr; MiKTeX ships its own, the MiKTeX Console. For those of you coming from a Linux background, this terminology will be familiar. Otherwise, here is a quick explanation of what a package manager is: it is a standardized interface which helps you install software from a repository. So instead of installing each piece of software through its own installer, each of which has its own idiosyncrasies, you run tlmgr install PACKAGE (on TeX Live) and the command automatically takes care of extraction and installation for you.
Build Process
If you want to do something non-trivial with TeX, I weakly recommend you refrain from using the latexmk tool until you understand why it would be helpful.[7] Otherwise this will make no sense because “whee latexmk compiles exactly the number of times I need it to.”[8]
Anyone who’s run pdflatex instead of latexmk will have noticed a couple of things.
- If the document has any cross-references (toc, bibliography, etc.), it has to be compiled at least twice.
- If the document has Asymptote, you must run pdflatex; asy *.asy; pdflatex. (Well, okay, this is a bit simplified since it’s asy -f pdf *.asy on any Unix-based operating system, but asy *.asy should work on Windows.)
This is because LaTeX code executes sequentially, period. You can’t just go back and scan every line whenever you have a reference in the code. This isn’t a batch script, people! (And what happens when the user makes an error and the reference is undefined?)
Why is this an issue? Well, let’s consider the following example (fill in the rest of the document in your head):
\tableofcontents
\section{Insert Title Here}
Clearly \tableofcontents must run before \section. But we have a problem: How is \tableofcontents supposed to know what comes after it? It’s not like it can look ahead. Well, you could make it, but it would end up being slow, annoying, and buggy.
So to work around this, whenever pdflatex is run, it creates some auxiliary files like .aux and .toc files. Inside these auxiliary files is information about the last run of pdflatex, so when you run it next time, it reads the auxiliary files and can generate references correctly.
Keep in mind that synctex files are not part of this. They just exist so you can navigate between PDF and code.
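To watch the auxiliary files do their job, here is a small complete document you can compile twice. It is only a sketch: main.tex is a filename I am assuming, and the label sec:title is made up for the example.

```latex
% Save as main.tex (assumed filename) and run pdflatex on it twice.
\documentclass{article}
\begin{document}
% Pass 1: main.toc does not exist yet, so only the "Contents" heading prints.
% Pass 2: main.toc written during pass 1 is read here, so the entry appears.
\tableofcontents

% \section records its title and page number in main.aux,
% and that information ends up in main.toc at the end of the run.
\section{Insert Title Here}\label{sec:title}

% \ref looks up the label data from the previous run's main.aux.
% On the first pass there is none, so it prints "??" and LaTeX warns.
This is Section~\ref{sec:title}.
\end{document}
```

If you delete main.aux and main.toc and compile once more, the table of contents and the reference go back to being empty, which is a quick way to convince yourself that the information really lives in those files.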
TeX Paths
There are seven directories which a TeX install writes to; we’ll only explain three of them:
- TEXDIR: Where the install is located
- TEXMFLOCAL: Local files (packages, fonts, etc) that are available to all users.
- TEXMFSYSVAR
- TEXMFSYSCONFIG
- TEXMFVAR
- TEXMFCONFIG
- TEXMFHOME: Local files available only to the current user.
Typically a TeX installation writes to a root-owned system directory (on Unix-like systems, TeX Live defaults to /usr/local/texlive). I think this is kind of stupid if you’re the only user running TeX on your machine (or the only user on your machine, actually). If you put the TeX tree inside a root-owned directory, then you’ll need root permissions to run tlmgr. But the root user doesn’t know what tlmgr is because it’s in your user PATH variable, and so you then need to point it to the right place. Bah! Fortunately, it’s very easy to move a TeX install from the system location to HOME; you just need to change your PATH variable afterwards.
No matter where you install TeX Live, any personal packages or classes you use should go into TEXMFHOME, period. A quick refresher: TEXMFHOME is ~/texmf by default (~/Library/texmf with MacTeX). You can change this in texmf.cnf (run kpsewhich texmf.cnf to find it), or you can put export TEXMFHOME= in your .bashrc or the startup file of whatever shell you use.
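As a rough sketch of what “goes into TEXMFHOME” looks like in practice, here is a tiny personal package. Everything specific in it is an assumption for illustration: TEXMFHOME is taken to be ~/texmf, and mymacros, \RR, and the date are made-up placeholders. The tex/latex/ subdirectory is where a texmf tree keeps LaTeX packages.

```latex
% Hypothetical file: ~/texmf/tex/latex/mymacros/mymacros.sty
% (assumes TEXMFHOME is ~/texmf; "mymacros" and \RR are made-up names)
\NeedsTeXFormat{LaTeX2e}
\ProvidesPackage{mymacros}[2024/01/01 Personal macros (placeholder date)]
\RequirePackage{amssymb}        % provides \mathbb
\newcommand{\RR}{\mathbb{R}}    % example shortcut for the real numbers
\endinput
```

With that file in place, \usepackage{mymacros} should work from any document on your account, and kpsewhich mymacros.sty should print the path above.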
Concluding Thoughts
Hopefully this explains why TeX behaves on your system the way it does. If you want to learn how TeX runs under the hood, Overleaf’s “How TeX macros actually work” is really good. A lot of LaTeX’s implementation details depend on this, because TeX really does two things uniquely:
- It can typeset stuff nicely. (You probably already knew this.)
- It uses category codes (“catcodes”) to interpret TeX documents.
Said article goes in depth about the second part. So far, I have yet to find an accessible explanation of how TeX actually decides how to typeset stuff, but I also haven’t looked very hard.
[1] On the bright side, the PDF size of a compiled document is super small. Pretty much every PDF is comfortably under 512KB.
[2] On second thought, scrambledenvs might have been a smart thing to write in LaTeX. One, more people can use it: it’s included by default in TeX Live 2021 (which is why you should install vanilla ;). Two, you could either write the environment’s contents to a file anyway, then randomise them outside of LaTeX and make sure that refs are generated correctly, or just do everything in LaTeX. Using some other language in tandem might actually be harder; I don’t know for sure, though, since I never tried.
[3] I say this because my school math team wanted to continue using Overleaf because “the syntax is easy to understand.” When you don’t understand what pieces are independent from each other, you don’t just make bad choices, you straight up don’t understand what the choices are.
[4] Yes, all-in-one desktops might be more convenient in that they take up less room. But they are less convenient because, in exchange for saving an upfront investment of time/money, you suffer from having a computer with shitty specs. The same goes for using Overleaf instead of installing a TeX distribution (more on what a TeX distribution is later).
[5] Fun fact: I actually have packages on CTAN.
[6] You’ll notice the source code for article.cls isn’t very readable. That’s because article.cls is a generated file, and the source is a .dtx file. Typically maintainers use .dtx files because they can generate multiple files from the same source and the documentation is coupled with the source code. This is called literate programming. It is CTAN standard to use dtx files, but you do not have to.
[7] If you are using Windows this will sort of happen by force. The OS is so shitty that you have no clue what is happening half the time and using the terminal to compile TeX is too scary. You’ll probably be doing one of two things:
- Using the TeXworks editor and pressing green buttons to run pdflatex, asy *.asy, pdflatex every time.
- Using VSCode and pressing a green button.
You will end up running pdflatex twice enough times to pick up on what LaTeX is really doing.
[8] You might reasonably ask, “Well shouldn’t running the compile command once always be enough to compile it?” This is one of the situations where you have to consider the context in which TeX was made. However, some recent TeX-based macro packages like ConTeXt (I believe) have made it so that you do not need to compile multiple times.