Discrete Math the Hard Way

In October, I was staring down a long, cold winter in Minnesota. I knew I’d need a hobby. It had been a while since I completed a certificate course, so I pulled up Coursera to see what they had to offer on the topic of computer science. I found a certificate series on discrete mathematics for computer science from UC San Diego. I’ve been interested in computer science fundamentals so, after reviewing the courses, I decided to dive in.

I’ve always struggled with math. I feel comfortable with geometry, but algebra is a foreign language. The math I used in my software development career felt more like geometry – when paginating lists, for example, or calculating array indexes. It’s not always easy, but I feel like I have an aptitude for it and I wanted to expand on that. I hoped that, after completing this course, I would be able to unlock the mysteries of the science of computation!

The Challenge

Discrete math is the math of integers. It includes topics such as logic, set theory, combinatorics, probability, and graph theory. Classic problems and puzzles include magic squares, chess puzzles, the Monty Hall problem, map coloring, the travelling salesman problem, and encryption. Over five courses, I would have to complete quizzes and assignments that consisted of solving puzzles and implementing algorithms in Python.

Before this course, I hadn’t written a line of Python. Most of my professional experience had been using a form of BASIC. Now that I program for fun, I’m more interested in older languages like Awk and Forth. I really wanted to dig into those comp sci fundamentals and Python, alone, wasn’t going to cut it for me. I decided that I wanted to really roll up my sleeves. After completing course exercises in Python, I would do them again using Forth.

The difference between programming in Python and programming in Forth is akin to the difference between wrenching on a Tesla and wrenching on a pre-OBD2 carbureted engine. Python is safe, user-friendly, and gives you everything you need to be successful without having to sweat the small stuff. Forth, on the other hand, demands your full attention and there are myriad ways for things to go wrong but, when it works, it’s a thing a beauty.

Riker rizzes a Ferengi in ST:TNG S6E7

Translating Exercises to Forth

The N Queens puzzle asks you to determine the placement of some number n queens on an nxn chessboard such that no two queens are attacking each other. A brute-force approach to solving this requires considering all permutations of the positions of the queens and testing those permutations to see if they meet the criterion. Python makes this easy by providing a library, itertools, that includes a permutations function. You just give it the data and it gives back the permutations. Easy, right?

I created my own version of that permutations function in Forth. I didn’t create a generic, do-it-all function. Mine was specific to my N Queens solver. Concrete implementation is a cultural value among Forthers. Some call it YAGNI, or “you aren’t gonna need it.” I like to say, “solve the problems you actually have.” Pre-work is re-work.

For me, creating something with Forth begins by imagining a vast expanse of undefined computer memory. What data structures will I need to represent real-life objects like the chessboard, the chess pieces, and the attack lines? Most of my initial work is done on paper, drawing out diagrams and data models, prototyping algorithms, and figuring out what pointers I’m going to need.

I think about the algorithms that need to be executed by my application. In practice, an algorithm usually consists of a loop that iterates over some piece of data and changes other data along the way. Just as in math, an algorithm takes an input and produces an output. The algorithm requires some way to access and manipulate my data structures, so I define supplementary functions and variables that give the algorithm a handle on those data structures.

I pseudocode as much as I can to check my thinking and to give myself a rough guide to follow. Then I start building the application from the bottom up, creating and testing data structures and helper functions before implementing logic and algorithms. This is usually followed by a period of testing and troubleshooting until it works. Once it’s working, I’ll clean up and refactor the code, test one more time, and call it a day.

The two things that drew me to Forth were the postfix syntax and the lack of memory management.

Because Forth is stack-based and uses postfix syntax, translation really makes you think through an algorithm. It’s not just a matter of changing a function name or some punctuation. For example, in combinatorics we have a function called choose. This takes two operands, n and k. n choose k represents the number of combinations of k items that can be selected from a set of n items. For example, if your local pizza shop offers 30 different toppings, how many different 3-topping pizzas can you make? Let’s assume you can’t repeat toppings, and their sequence doesn’t matter. This is calculated as n! / k!(n-k)!. Plug “30 choose 3” into Wolfram Alpha and you’ll get the answer, 4,060! Now you can quantify your dread when asked to get pizza for the whole office.

In Python, this calculation can be accomplished like this:

import math

answer = math.factorial(30) / ( math.factorial(3) * math.factorial(30-3) )

That looks a lot like the formula, doesn’t it? In Forth, things look a little different. First, we have to translate the formula into postfix notation, like this:

n ! k ! n k - ! * /

In postfix notation, the order of operations is always left to right. This is aligned with how computers do math in practice and eliminates some of the ambiguity of infix notation. The factorial operator, !, is unary, meaning it takes only one operand. Compare with binary operators, such as the familiar +, - , *, and /, that operate on two operands. Implicit in postfix expressions is the existence of a stack that holds intermediate results.

Now we can define our own factorial and choose functions. The Forth code is a little messier than the postfix formula because we have to manipulate the stack where our values are stored.

variable answer

: fact ( n -- n! )
  1 swap 1+ 2 do i * loop ;

: choose ( n k -- u )
  2dup - fact swap fact * swap fact swap / ;

30 3 choose answer !

Let me break that down. First, we create a variable, answer, that will store the result of this operation. The fact function multiplies the value on the top of the stack by each integer between 1 and itself, exclusive. The choose function calculates n-k!, then k!, then multiplies those two terms, then calculates n! and, finally, divides n! by the product of k! and n-k!. This is equivalent to the postfix-formatted formula. The last line puts 30 and 3, our n and k, on the stack and invokes the choose function. This leaves the result, 4,060, on the top of the stack. With “answer !”, we store the result in memory at the location assigned to our variable, answer. In other languages, this is denoted by the familiar assignment operator, “=”.

You can see my N Queens solver at Gitlab. I enjoy using Forth because it allows me to play with the computery parts of computer science. I learned a lot by figuring out the things that Python gives you for free. If I had done this in Python, it would have taken a fraction of the time and code and allowed me to get a result without having to think through the minutiae. That’s actually the point in most cases but, in a learning scenario, I think it short-changes the learner.

I’m not claiming that Forth is magic; I think someone could get a similar benefit using Python by just skipping on libraries. In a professional setting, you will almost certainly be using libraries that have been created by someone else. Also, I would not suggest that someone take this approach to learning programming in general. For that, I suggest picking up an appropriate book on Python or even BASIC. If you are learning programming in the context of a holistic CS course, you will probably start with Java.

Branch and Bound

As I worked my way through the encryption course and the final delivery problem course, I was challenged enough that writing Forth on the side was starting to feel like a distraction. After breaking around the holidays, I put my head down and completed the remaining courses in the specialization. Relieved, and with certificate in hand, I wanted one last challenge.

The final course focused on applications of graph theory via to the travelling salesman problem (TSP). The final test was an implementation of a 2-approximation algorithm that quickly calculates a solution that is, at worst, twice the optimal value. Along the way, the course material described another algorithm, branch and bound, that calculates the optimal solution. The algorithm seemed simple enough in principle and it used a couple of common data structures, so I decided to try it out.

A complete graph on six vertices.

Defining data structures is key in Forth. This wasn’t a data structures and algorithms course, though, and I didn’t want to get bogged down in research, so I turned to ChatGPT to help me get started. I asked for general advice on data structures I should use to represent the graph and the tree. It suggested a few ways to represent a graph and I chose the adjacency matrix. An adjaceny matrix stores the edge weight (or distance) between each pair of vertices. Graphs for TSP are usually complete graphs, meaning each vertex is connected to every other vertex. This mirrors the real world where you can go to any point from any other point (unless it’s rush hour and one of those points is the 35W entrance ramp on 10th street).

Association Matrix

Branch and bound uses a tree structure to represent the various possible paths through the graph so that it can evaluate them systematically. As we evaluate each path, we keep track of the shortest path we’ve found so far. If the current path gets longer than that, we back up and try the next one. Since this is a complete graph, there are n! possible paths. If the start and end point is fixed, a graph with just six vertices still has 5!, or 120, possible paths! I was having a hard time imagining how a node in the tree could have an arbitrary number of children. ChatGPT clued me into having a parent node point to its first child, then have each child node point to its next sibling. I knew I would need to go up the tree as well, so I added a parent property to the model for my nodes.

My tree structure

You can see my branch and bound implementation on Gitlab. There’s plenty of room for improvement, but it works for small graphs! I’m sure it could handle larger graphs if I put some effort into memory management. I thought I could even use it for practical purposes by plugging in GPS coordinates. They would need to be scaled up or the coordinates changed to floating point numbers.

Lessons Learned

I’m still not a math whiz, but I learned some patterns that I can imagine using in real life. I think about graphs quite often in the form of decision trees, dependency trees, and project networks. I finally understand how key-pair encryption actually works under the hood, and elliptic curve encryption is just a variation on that. I also feel that I have a slightly better grasp on probability, but it’s a challenging subject. At one point I was trying to decompose Euclid’s algorithm so that I could simplify my code. My research led me to the Wikipedia article on Bézout’s identity. The part I was trying to isolate is known as the Bézout coefficient pair. I thought it was interesting that my effort to refactor these functions in Forth drove me to understand the math more deeply.

Doing the exercises in Forth was a fun challenge. I do feel that I got more out of the experience having done that. I got to play with recursion, too, which I hadn’t really done yet. It helped me to work things out on a notepad first, then pseudocode, then code. Often, I would try to think of a clever (or efficient, or abstract, or re-usable) solution to a problem, but I would get stuck. Upon recognizing this trap, I would move forward with a naive approach. I may not have arrived at the most elegant solution, but this resulted in code that worked. I can always refactor later, if needed.

“When in doubt, use brute force.” - Ken Thompson

Tips I’m writing down for future me:

Create small words and test them individually.
- This is one of the strengths of interactive Forth.
Use words that make sense.
- This is another strength of Forth, the names you choose can make your code easier to read.
Lean into the way Forth works.
I learned how to use assertions and floating-point in Gforth.
- After watching If Considered Harmful, I’ve come to appreciate assertions.
- Potential future deep-dive: What Every Computer Scientist Should Know About Floating-Point Arithmetic
Don’t make a word deal with any items on the stack that it doesn’t need.
- Tying to pass stack items through creates complexity. Make the calling word responsible for the stack items it needs.
- As soon as stack items are no longer needed, get rid of them. Don’t continue to dance around them.
- I rediscovered a lot of what’s documented here: Designing the Stack Effect
If I’m going to need something for a computation later, I’ll stash it on the return stack.
- There’s always money in the return stack.

In engineering, we use a lot of patterns and algorithms. There are lots of ways to solve problems. The hard part is often to identify the type of problem you’re dealing with. If you want to learn more about that, I recommend reading Programming Pearls by Jon Bentley. If you want to learn more about Forth, I recommend reading Starting Forth by Leo Brodie.

If you are considering a similar challenge, here are some things to keep in mind:

Consistency is key
Take breaks
One way to get good at something is to do it a lot!