Linked List Code-Along

Low Level Learning published a video demonstrating the implementation of a linked list in C. I thought it would be fun to code along - in forth!

I enjoy learning about data models and I thought this demonstration was a lot of fun to watch. The progressive elaboration approach along with live-coding inspired me to follow along. I’ve been interested in the forth programming language for a couple years now, and I thought it would be an interesting challenge to translate the C code in the video into equivalent forth code.

After explaining linked lists at a high-level, Low Level Learning (LLL) breaks the programming task into about five parts:

Define the data structure of a node
Display a menu for the user
Add a node to the list
Remove a node from the list
Insert a node into the list

If you’re unfamiliar with forth, there are many places to learn about it, including YouTube. In short, functions in forth are called words. Words implicitly operate upon an ever-present stack of values. In C, you might write something like dup(stack[0]), but in forth it’s just dup.

You’ll see a lot of symbols that look like ( n – n ). These are comments called stack pictures. They indicate what a word takes from the stack and what it leaves on the stack when it’s done.

I’ll be presenting the words I wrote that are relevant to each section, but these examples won’t work on their own. You can see the completed code in this snippet at Gitlab.

Create the data structure

LLL uses a C struct to define a node. Forth is usually typeless, so it’s really about how memory gets used. Forth divides memory into word-length cells. On a 64-bit system, a cell is 8 bytes. My nodes will be two adjacent cells, one for the pointer to the next node, and one for the value that the node holds. Depending on the application, that value could be a signed or unsigned integer, a floating point value, or a string of eight characters.

I’ll discuss the structure further later on but, for now it’s important to have a picture in my head of how memory will be used for a node.

The menu presents the operations available to the user and captures their selection. My code looks like this:

: getoption ( -- n )
  cr
  ." You have the following options:" cr cr
  ." 1. Add a node" cr
  ." 2. Remove a node" cr
  ." 3. Insert a node" cr
  ." 4. Quit" cr
  key 48 -
;

: getvalue ( -- n )
  pad 32 accept pad swap s>number? 2drop \ drop the flag and half of the double
;

: main ( -- )
  begin
    print
    getoption
    dup 4 <>
  while
    dup 3 = if
      ." What data to insert? " getvalue cr
      ." What position?" getvalue
      insert drop then
    dup 2 = if 
      ." What data to remove? "  getvalue
      remove 0 = if ." Element not found" cr then
    then
    dup 1 = if
      ." What value to add? " getvalue 
      add drop
    then
    drop
  repeat
  drop
;

Even if you don’t read forth, the strings and the structure of these words should give you a clue about what’s going on. The main word is a bit long by forth conventions. The code in the while clause could be broken up into about four words, one with the control flow and three for the individual actions: adding, removing, and inserting.

Notice that main is unaware of the structure of a node. Depending on the user’s response, main invokes getvalue and then add, remove, or insert. If the structure of a node ever changes, it’s entirely possible that main would be unaffected.

Add a node

LLL’s addNode function takes the user-provided value as its operand and returns a pointer to the new node. This is also what add does, invoking addempty or addnext, depending on the value stored in the head pointer. The words addempty and addnext reserve memory for the node and give a pointer to add that it then stores in the head pointer.

Here is my code for adding a node:

: addempty ( n -- addr )
  here swap 0 , ,
;

: addnext ( n -- addr )
  here swap head @ , ,
;

: add ( value -- addr )
  head @ 0 = if addempty else addnext then dup head !
;

Printing the list

LLL takes a sidetrack here to create a function that will print all the values in the list. I think comparing C and forth here should be instructive. I’ll break down my print word and show the stack at each step. The stack grows left to right in this table.

Here is print as it appears in my code:

: data> ( addr -- n )
  cell + @
;

: next> ( addr -- addr )
  @
;

: print ( -- )
  cr head @ begin dup while dup data> . ." ->" next> repeat cr drop
;

Here’s the breakdown:

Word	Stack	Description
cr		Output a newline
head	->head	Put a pointer to the head pointer on the stack
@	addr	Dereference the head pointer, putting the address of the head node on the stack
begin	addr	Start a loop
dup	addr addr	duplicate the value on the top of the stack
while	addr	if the address is null, end the loop
dup	addr addr	duplicate the top stack value
data>	addr data	get the data value from the node referenced by the pointer
.	addr	print the data value to the screen
.” ->”	addr	print a literal “->”
next>	addr’	dereference the address on the stack to retrieve the pointer to the next node
repeat	addr’	return to beginning of the loop
cr	addr’	Output a newline
drop		remove the value on the top of the stack

The key piece here is next> which transforms the pointer to the current node into a pointer to the next node by dereferencing the pointer. This value is then used in the next loop iteration. All data> does is dereference the adjacent cell where the node’s value is stored.

Remove a node

A LLL describes, removing a node involves

Locating the value the user wants to remove, and then either 2a. If the node to be removed is referenced by head, store the node’s next pointer as the head pointer, thereby removing the node from the list, or 2b. Change the previous node’s pointer to point to the following node, thereby removing the current node from the list.

: remove ( val  -- f )
  head @ dup ( val prev current )
  begin
  dup
  while
  ( val prev current )
  rot over data> over ( prev current val data val )
  = if ( prev current val ) \ found the value
    drop dup ( prev current current )
    \ if the current node is the head node, store prev in head
    head @ = if next> head ! drop 
    else \ if the current node is not the head node, store the current node's next pointer into the prev node
      ( prev current ) next> swap !
    then
    -1 exit \ leave a true flag
  then
  -rot nip dup next> ( val prev->current current->next )
  repeat
  \ if we get here, we never found it, drop everything and leave a zero flag
  drop drop drop 0
;

This word, too, is quite long by forth convention. There’s also quite a bit of so-called stack-juggling going on. This is more a reflection of my skill level than a requirement of forth, per se.

I’ve also made no attempt to free memory allocated to nodes that have been removed. Memory management in forth, as in life, is an exercise left for the reader.

Insert a node

Inserting a node builds on concepts I’ve already used: searching the list, adding a node, and updating pointers.

: insert ( data pos -- addr )
  head @ swap ( data addr pos )
  begin
  dup
  while ( data addr pos )
  swap next> ( data pos next )
  dup 0 = if ( data pos next )
    ." Requested position too far into the list" cr
    drop drop drop exit
  then
  swap 1 - ( data next pos-1 )
  repeat
  drop swap addempty ( current new )
  tuck swap 2dup ( new new curr new curr )
  \ store curr next in new
  next> swap ! ( new new curr )
  \ store new in curr next, leaving new on the stack
  ! 
;

This, too, could be refactored. A new word, search say, could take a value on the stack and return the address of the first node having that value – or all nodes having that value. Then, insert could be responsible only for inserting a new node in a given position.

Conclusion

I had a lot of fun with this code-along. I don’t program professionally, but I love the arcana of computer science and the brutalism of forth is especially appealing to me. Charles Moore, credited with inventing forth, has said, “Forth is highly factored code. […] If you have a lot of small definitions you are writing Forth.” This preference for small functions can be taken to any language, and I think that’s worth thinking about.

In programming, or any endeavor, I think it’s easy to overestimate one’s cognitive capacity. If you haven’t defined exactly what you’re doing, it’s easy to get in over your head. That’s why I think it’s important to ask questions, write things down, and make sure it makes sense.