
Lil' Fun Langs' Guts

Article URL: https://taylor.town/scrapscript-001




I'm still thinking about those lil' fun langs. How do they
work? What's inside them? Do I need my pancreas? What if I don't want to
normalize my IR? Is laziness a virtue?

Haskell-esque languages may look alike, but they differ across many dimensions:

  • strict vs. lazy
  • curried vs. bland
  • bootstrapped vs. hosted
  • interpreted vs. compiled
  • nominal vs. structural types
  • pretty vs. ugly errors

Most implementations use standard compilation phases:

  • Lexing: Source → Token stream
  • Parsing: Tokens → Surface AST
  • Desugaring: Surface AST → Core AST
  • Type Inference: Core AST → Typed AST
  • Pattern Match Compile: Typed AST → Case trees
  • Normalization (ANF/K): Typed AST → Normalized IR
  • Optimization: Normalized IR → Normalized IR
  • Closure Conversion: Normalized IR → Closure-explicit
  • Code Generation: Closure IR → Target (asm/bytecode/C/LLVM)
  • Runtime System (if native): GC, primitives, entry point
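The first few phases are small enough to sketch end to end. Here's a toy version (in Python for concreteness; the grammar and every name are illustrative, not from any particular compiler) of lexing, parsing, and a direct tree-walking evaluation of arithmetic:

```python
import re

def lex(src):                       # Source -> token stream
    return re.findall(r"\d+|[()+*]", src)

def parse(tokens):                  # Tokens -> surface AST (nested tuples)
    def expr(i):                    # expr := term ('+' term)*
        node, i = term(i)
        while i < len(tokens) and tokens[i] == "+":
            rhs, i = term(i + 1)
            node = ("+", node, rhs)
        return node, i
    def term(i):                    # term := atom ('*' atom)*
        node, i = atom(i)
        while i < len(tokens) and tokens[i] == "*":
            rhs, i = atom(i + 1)
            node = ("*", node, rhs)
        return node, i
    def atom(i):                    # atom := int | '(' expr ')'
        if tokens[i] == "(":
            node, i = expr(i + 1)
            return node, i + 1      # skip ")"
        return ("int", int(tokens[i])), i + 1
    return expr(0)[0]

def eval_(node):                    # walk the AST directly -- no IR, no codegen
    tag = node[0]
    if tag == "int": return node[1]
    if tag == "+":   return eval_(node[1]) + eval_(node[2])
    if tag == "*":   return eval_(node[1]) * eval_(node[2])

print(eval_(parse(lex("1 + 2 * (3 + 4)"))))  # 15
```

A real lil' lang adds desugaring, inference, and the rest of the pipeline between `parse` and `eval_`, but the shape -- each phase a function from one representation to the next -- stays the same.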

Strict vs. Lazy

In strict evaluation, arguments are evaluated before being passed to a function.
In lazy evaluation, arguments are only evaluated if their value is actually
needed; the result is cached, so the work happens at most once.

-- lazy eval returns `3` without applying `foo`
length [ 1, foo 2, 4 ]
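The "evaluated at most once, then cached" behavior is easy to sketch outside Haskell. A minimal call-by-need thunk (in Python; `Thunk` and `expensive` are illustrative names, not from any real runtime):

```python
class Thunk:
    def __init__(self, compute):
        self.compute = compute      # zero-argument suspended computation
        self.forced = False
        self.value = None

    def force(self):
        if not self.forced:         # first demand: do the work
            self.value = self.compute()
            self.compute = None     # drop the closure, keep the cached value
            self.forced = True
        return self.value           # every later demand is a cache hit

calls = 0
def expensive():
    global calls
    calls += 1
    return 42

t = Thunk(expensive)
xs = [1, t, 4]
print(len(xs))               # 3 -- length never forces the thunk
print(t.force(), t.force())  # 42 42 -- computed once
print(calls)                 # 1
```

The lazy runtime machinery (update frames, eval/apply) exists to do exactly this bookkeeping, but for every value in the program.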
  Aspect              Strict (ML, OCaml)               Lazy (Haskell)
  Normalization       ANF / K-normal form              STG / thunks required
  Closure conversion  Standard flat closures           Closures + thunks + update frames
  Code generation     Straightforward                  Requires eval/apply or push/enter
  Memory management   Values are always evaluated      May contain unevaluated thunks
  Tail calls          Simple (jump)                    Complex (enters, updates)
  Debugging           Easy (call stack is meaningful)  Hard (thunks obscure control flow)
  Runtime complexity  Simpler (~200 LOC C)             More complex (~500–2000 LOC C)

Strict evaluation is the simple choice. If you want laziness, Peyton Jones's
STG machine is the standard approach. MicroHs sidesteps the STG machine by
compiling directly to combinatory logic with graph reduction.

Lazy evaluation also unlocks infinite collections — you can define an infinite
list and consume only what you need.
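Sketched with Python generators (an analogy, not how a lazy runtime actually builds lists -- those are made of thunks):

```python
import itertools

def naturals():          # "define an infinite list"
    n = 0
    while True:
        yield n
        n += 1

# "consume only what you need"
print(list(itertools.islice(naturals(), 5)))  # [0, 1, 2, 3, 4]
```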

Curried vs. Bland


  • Curried (Haskell, Ben Lynn, MicroHs): free in combinator backends;
    native backends need arity analysis to avoid allocating a closure per
    argument.
  • Bland (MinCaml, OCaml internally, Grace, EYG): simpler codegen --
    multi-arg functions are just functions that take tuples or multiple
    params.

In a curried language, f x y is (f x) y: two function applications. If
your backend doesn't detect that f always takes two arguments (arity
analysis), you pay for a heap allocation on every multi-argument call.
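You can see the allocation directly by writing the curried style by hand (Python here; `f` and `g` are illustrative):

```python
# Curried: f x y is really two applications. The partial application
# `f(1)` allocates a closure capturing x before y ever shows up.
def f(x):
    return lambda y: x + y

add1 = f(1)        # heap-allocated closure over x
print(add1(2))     # 3

# "Bland": one call, all arguments at once, no intermediate closure.
def g(x, y):
    return x + y

print(g(1, 2))     # 3
```

Arity analysis is the optimization that rewrites saturated calls like `f(1)(2)` into the second shape behind the scenes.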

Bootstrapped vs. Hosted

I tried to teach myself to play the guitar. But I'm a horrible teacher —
because I do not know how to play a guitar.

Mitch Hedberg

Most compilers are written in an existing language (e.g. C, Rust, Haskell,
OCaml) and lean on that host's ecosystem for parsing libraries, build tools, and
package management.

A bootstrapped compiler compiles itself. You write the compiler in the language
it compiles, then use an earlier version of the compiler (or a minimal seed
runtime) to build the next version. Your language becomes self-sustaining; the
compiler is its own test suite.

There are many exemplary self-hosted languages to study:

  • MicroHs is a Haskell compiler that compiles Haskell to combinators. The
    combinator reducer is implemented in C. The compiler is written in
    Haskell and can compile itself. Bootstrapping requires only a C
    compiler -- no pre-existing Haskell installation.

  • Ben Lynn starts with a minimal runtime in C (~350 LOC), then constructs
    increasingly capable compilers, each written in the subset that the
    previous one can compile. Each stage is ~100–300 LOC of the language
    being defined. The total chain is ~2000 LOC + 350 LOC C.

C runtime (350 LOC)
→ compiler₁: lambda calculus + integers
→ compiler₂: + let, letrec, ADTs
→ compiler₃: + type inference
→ compiler₄: + pattern matching
→ compiler₅: + type classes
→ ...
→ compilerₙ: near-Haskell-98
  • Newt is a dependently typed language whose compiler is written in Newt,
    targeting JavaScript. It bootstraps by keeping the generated JS checked
    in. This works best when your target is a high-level runtime (JS, JVM)
    rather than native code.

Interpreted vs. Compiled

An interpreter executes the program directly by walking its AST or stepping
through bytecode. A compiler translates the program into another language (e.g.
x86, C, JS) and lets that target handle execution.

The boundary here is blurry. Bytecode VMs compile to an intermediate
form. "Transpilers" compile to source code rather than machine instructions.


  • Tree-walking interpreter (PLZoo poly, Eff, Frank, Grace, 1ML; 50–200
    LOC): simplest -- no codegen, no runtime -- but slow (10–100× native).
  • Bytecode VM (OCaml's ZINC, Tao, PLZoo miniml; 200–500 LOC): middle
    ground -- portable, reasonable speed; write ~30–50 instructions.
  • Native compilation (MinCaml, mlml, AQaml; 500–1500 LOC): fast
    execution, but you own register allocation, calling conventions, ABI.
  • Transpile to C (Koka, Scrapscript, Chicken, Austral; 200–500 LOC): best
    of both worlds -- portable native speed; the C compiler does the hard
    parts.
  • Transpile to JS/Go (Newt, SOSML, Borgo; 200–400 LOC): web/ecosystem
    deployment, but you inherit the target's performance model.
  • Combinator reduction (Ben Lynn, MicroHs; 100–300 LOC): no closures, no
    registers -- a graph-reduction evaluator in C. Simple but slow.

Lil' fun langs are usually interpreters: without a codegen backend, there is
no runtime system to build. The leap from interpreter to compiler costs
~500–2000 LOC.
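A bytecode VM really can be tiny. A hedged sketch (in Python; the instruction set is made up for illustration -- a real VM like OCaml's ZINC has ~30–50 instructions):

```python
# Four opcodes, a stack, a dispatch loop. That's the whole machine.
PUSH, ADD, MUL, HALT = range(4)

def run(code):
    stack, pc = [], 0
    while True:
        op = code[pc]; pc += 1
        if op == PUSH:
            stack.append(code[pc]); pc += 1   # inline operand
        elif op == ADD:
            b, a = stack.pop(), stack.pop(); stack.append(a + b)
        elif op == MUL:
            b, a = stack.pop(), stack.pop(); stack.append(a * b)
        elif op == HALT:
            return stack.pop()

# 1 + 2 * 3, already "compiled" to postorder bytecode
print(run([PUSH, 1, PUSH, 2, PUSH, 3, MUL, ADD, HALT]))  # 7
```

The compiler half of the middle ground is a postorder walk of the AST that emits these opcodes -- noticeably more work than a tree-walker, far less than native codegen.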

Nominal vs. Structural Types

type Meters  = Int
type Seconds = Int

-- Nominal:     Meters ≠ Seconds  (different names)
-- Structural:  Meters = Seconds  (same shape)


  • Nominal (OCaml, Haskell, Austral): name is identity -- same shape
    doesn't mean same type.
  • Structural (EYG, Grace, TypeScript, Simple-sub): shape is identity --
    same fields/variants means same type.

Most ML-family languages are nominal for algebraic data types but structural for
records (if implemented). Row polymorphism (EYG, Grace, Koka) is inherently
structural -- it acts on "any record with at least these fields." Simple-sub
goes further: union and intersection types, with principal inference intact.
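The "at least these fields" check at the heart of row polymorphism is a few lines of code. A sketch (in Python; `satisfies`, `Point2D`, and the dict encoding are all illustrative assumptions, not any real checker's representation):

```python
# A record satisfies a structural type if it has at least the required
# fields, each with the required type. Extra fields are fine -- that's
# the "row" left open.
def satisfies(record, required):
    return all(k in record and type(record[k]) is t
               for k, t in required.items())

Point2D = {"x": float, "y": float}       # "any record with at least x, y"
point3d = {"x": 1.0, "y": 2.0, "z": 3.0}

print(satisfies(point3d, Point2D))       # True: the extra z is fine
print(satisfies({"x": 1.0}, Point2D))    # False: missing y
```

A nominal checker would instead compare declared type names and reject `point3d` outright, no matter its shape.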

Pretty vs. Ugly Errors

-- Ugly:
Error: type mismatch: int vs string

-- Pretty:
3 │ let x = 1 + "hello"
  │             ^^^^^^^
Error: I expected an `int` here, but got a `string`.
The left side of `+` is `int`, so the right side must be too.

Pretty errors cannot be achieved with a coat of paint. To point at a line/region
of code, you must thread source locations through every compiler phase. A
minimum viable error system:

  • Source spans on every AST node. Every expression, pattern, and type
    carries { file, start_line, start_col, end_line, end_col }. This costs
    one extra field per node.
  • Preserve spans through desugaring. When you lower where to let, the new
    let node inherits the span of the where.
  • Preserve spans through type inference. When unification fails, you need
    the spans of both conflicting types.
  • Format errors with context. Show the source line, underline the
    relevant span, explain the mismatch.
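The last step -- rendering a span against its source line -- is the easy, purely mechanical part. A sketch (in Python; `render_error` and its signature are illustrative, assuming 0-based columns and 1-based lines):

```python
# Given the source text, a 1-based line, a 0-based column span, and a
# message: show the line, underline the span, state the error.
def render_error(source, line, start_col, end_col, message):
    src_line = source.splitlines()[line - 1]
    gutter = f"{line} | "
    underline = " " * (len(gutter) + start_col) + "^" * (end_col - start_col)
    return f"{gutter}{src_line}\n{underline}\nError: {message}"

src = 'let x = 1 + "hello"'
print(render_error(src, 1, 12, 19,
                   'I expected an `int` here, but got a `string`.'))
```

The hard part is the first three bullets: getting a correct span to hand this function, which is why locations must survive every phase.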


[...]


Original source

📄 spineless-tagless-gmachine.pdf
