Show HN: Lisp with copying GC in 537 lines of C (plus Lisp 1.5 on top)

blt · on Oct 14, 2018

This is cool. I wrote a similar Lisp but got stuck debugging my copying GC. The interaction between the Lisp environment chain and the C stack is tricky.

I prefer something like this over Norvig's Lisp interpreter written in Python. Norvig's is good for understanding environments and eval/apply, but IMO it's cheating to use the host language's GC for allocations.

I would add integers, even if it costs a few lines of code... using `itos` and re-parsing the string when evaluating an int is just too painful for me.

mpirstitz · on Oct 15, 2018

Here's my own attempt at implementing a somewhat readable copying GC in C - https://github.com/matp/tiny-lisp/blob/master/tiny-lisp.c#L1...

krig · on Oct 15, 2018

Cool, that's an interesting solution! I tried hard to keep the GC implementation as straight-forward as possible, as a teaching aid, but I'll admit that debugging the GC was a bit of a struggle when roots weren't clearly indicated as such.

krig · on Oct 15, 2018

I felt the same way about Norvigs version which was one reason to do this. I feel like the recursive lisp interpreter-on-an-interpreter thing is not as clear when so much of the host language is relied on.

Oh, and I hear you about integers as strings. This choice was in part for compactness and in part inspired by the first LISP papers where there are only symbols and no other types.

slaymaker1907 · on Oct 15, 2018

To be fair to Norvig, malloc is also a relatively hefty abstraction comparable to a GC on a systems level.

krig · on Oct 15, 2018

That is certainly a good point. The problem I felt with dropping below malloc is that the code becomes very platform-dependent.

blt · on Oct 15, 2018

but copying GC can be easily implemented with sbrk, or on the bare metal if no MMU is present. mpirstitz's GC uses mmap but could easily be replaced with sbrk. the main challenge of dynamic memory management is fragmentation, which is completely handled by using a copying GC. an implementation on top of malloc is not leaning on malloc's fragmentation handling at all.

FullyFunctional · on Oct 15, 2018

Always good fun. Reminds me of XLISP which was my first exposure to Lisp.

There's a lot to be said for small implementations that gets to the heart of the issues. Most great oeuvres of software had a kernel beginning like this.

Some fun directions you could take this: Change the GC to Baker-style incremental (~ "real-time"), use cdr-coding, use tagged integers, add a compiler and compile to byte-code or threaded code. EDIT: grammar.

krig · on Oct 15, 2018

Thanks!

Yeah, I am thinking that adding a similarly minimal compiler to this would be interesting as a next step.

There is a previous effort in the ”old-version” directory which had tagged integers and strings and an attempt at macros, but it was a bit too unfocused and I never finished it.

With this one the goal was to make it as small as possible but still be a complete implementation including GC, which made for a clearer end goal.

FullyFunctional · on Oct 16, 2018

Spot on. I have a partially finished byte-compiled Scheme which I abandoned when it grew to big for its original target platform, sigh. So much to be said for something that actually runs.

daef · on Oct 15, 2018

ahahahaAHAHAhahaha...

    To prevent reading from continuing indefinitely, each packet should end with STOP followed by a large number of right parentheses. An unpaired right parenthesis will cause a read error and terminate reading.

    STOP )))))))))))))))))

waterhouse · on Oct 15, 2018

Things I see...

1. Mac OS apparently has a conflicting definition of "isnumber", as "__DARWIN_CTYPE_TOP_inline int isnumber(int _c)" in "/usr/include/ctype.h:323". Thanks, Mac OS. So I'm running this on another machine of mine.

2. My usual test: iteration expressed as anonymous recursion, which should be possible to run in constant space:

  pi@raspberrypi:~/LISP $ echo '
  ((lambda (f) (f f 10 0))
   (lambda (f n tt)
     (cond ((equal? n 0) tt)
           (#t (f f (- n 1) (+ n tt))))))' | ./komplott
  55
  pi@raspberrypi:~/LISP $ echo '
  ((lambda (f) (f f 100 0))
   (lambda (f n tt)
     (cond ((equal? n 0) tt)
           (#t (f f (- n 1) (+ n tt))))))' | ./komplott
  5050
  pi@raspberrypi:~/LISP $ echo '
  ((lambda (f) (f f 1000 0))
   (lambda (f n tt)
     (cond ((equal? n 0) tt)
           (#t (f f (- n 1) (+ n tt))))))' | ./komplott
  Out of memory
  Aborted

I noticed from the source that eval did recursive calls to itself. I expected that to be its bane, but I'm surprised it hit an OOM rather than a stack overflow... It looks like, in lisp_eval's "apply" case, it'll call gc_protect on all the relevant stuff, then only do gc_pop after everything returns; perhaps that's why.

In the absence of other looping constructs, looping by tail recursion seems the best available option, and therefore an important usage to support. Implementing that, along with a moving GC, in a language that doesn't do its own tail call elimination is a pain.

krig · on Oct 15, 2018

Yeah the TCO is not complete, it allocates a call frame for each call regardless which is why it ran out of memory. It doesn’t use up the C call stack though, so you could in theory increase the GC heap size to make it survive longer. Due to the primitive environment implementation it’ll get slower and slower as it goes.

Thanks for the note on Mac OS, I’ll fix that.

kwccoin · on Oct 15, 2018

Before I try, just always have a question - is it lisp-1 or lisp-2? Is it scheme/lisp?

Given the lisp15.scm is a scm it seems to be underlying is a lisp 1.5 running under scheme. But given it is doing make, is it just share code?

Too many of this but really want to have a code for simple C based lisp to play with. Is that it?

hawkice · on Oct 15, 2018

Considering the example code and test code is all (define func-name (lambda (....) )), this has got to be a lisp-1.

krig · on Oct 15, 2018

The base language interpreter written in C is a lisp-1, and it’s not even close to a full scheme implementation but it is a subset of scheme.

The LISP 1.5 interpreter runs on top and is implemented in the base language.

rurban · on Oct 14, 2018

I'm ok with minimalism, but interning strings in a simple array with linear search only scales to ~100 entries. a hash table can be written in 15 lines.

UncleEntity · on Oct 14, 2018

It's actually a linked list (with linear search).

Good enough for such a simple interpreter methinks...

sitkack · on Oct 14, 2018

Sounds like a great pedagogical tool to me!

krig · on Oct 15, 2018

It works surprisingly well, and appeals to me for its compactness. But i maybe I’ll switch to hashing if I can squeeze that in.

shawn · on Oct 15, 2018

This isn’t true. Emacs interns strings in a global obarray, which is a simple array with linear search. I hear emacs scales well.

pg314 · on Oct 15, 2018

That is incorrect. An Emacs obarray is a kind of hash table [1].

[1] https://www.gnu.org/software/emacs/manual/html_node/elisp/Cr...

shawn · on Oct 15, 2018

(make-vector length 0)

Interesting. That threw me off. I overlooked the previous paragraph:

In Emacs Lisp, an obarray is actually a vector. Each element of the vector is a bucket; its value is either an interned symbol whose name hashes to that bucket, or 0 if the bucket is empty. Each interned symbol has an internal link (invisible to the user) to the next symbol in the bucket. Because these links are invisible, there is no way to find all the symbols in an obarray except using mapatoms (below). The order of symbols in a bucket is not significant.

Thanks!

brian_herman · on Oct 14, 2018

Good job

krig · on Oct 15, 2018

Thanks!

iddan · on Oct 14, 2018

[flagged]

YorickPeterse · on Oct 14, 2018

Heavens forbid people having fun by implementing a Lisp.

bbeonx · on Oct 14, 2018

I feel like these are both excellent points.

krig · on Oct 15, 2018

Well, I’m also learning to play the piano, and that had certainly been done before and better than me.

But the reason for both is that it’s fun and I enjoy doing it.

Annatar · on Oct 15, 2018

Because it's a good programming exercise towards the goal of mastering C, and because being a functional programming language, LISP embodies software that is easy to debug, easy to understand and easy to extend.

SamReidHughes · on Oct 15, 2018

Yes, like implementing a statically typed language, but we all have to start somewhere.

avhon1 · on Oct 15, 2018

When I took a programming languages class with Dan Friedman [0], the entire class was taught in scheme. One of our assignments was to implement a typerchecker for the simply-typed lambda calculus in minikanren [1]. I recall it being a relatively straightforward assignment.

[0] https://en.wikipedia.org/wiki/Daniel_P._Friedman

[1] http://minikanren.org/

shawn · on Oct 15, 2018

I actually need to do this for my lisp. Do you know if that class is available online? I only found things like http://mullr.github.io/micrologic/literate.html

which doesn’t seem to do type checking.

avhon1 · on Oct 15, 2018

The current version of the course is locked away on Canvas, but an older version (nearly the same as it was in 2017) is at [0]. It has all of the assignments, including the partially-complete type checker. (Look in Assignment 10.)

We spent a couple of weeks learning minikanren out of "The Reasoned Schemer", using Will Byrd's implementation of minikanren.

[0] https://web.archive.org/web/20070610012057/http://www.cs.ind...

shawn · on Oct 16, 2018

I've been studying this for the past day. I just wanted to say: thank you. This is the most helpful and relevant resource for my work that I've come across in some time.

I really appreciate that you took the time to send me this.