diff --git a/SPEC.md b/SPEC.md
index 813e7143..c33933da 100644
--- a/SPEC.md
+++ b/SPEC.md
@@ -441,6 +441,7 @@ Common shapes reached for from other languages. The parser and lexer surface eac
| `fmt "{}" +0.1 0.2` -> `0.30000000000000004` (float Display = full IEEE 754) | `fmt "{:.2f}" (+0.1 0.2)` for human-readable; `fmt2 v N` for precise dp | docs only |
| `*/ sz 0.3 0` ("scale then div by 0") | `*/a b c` is `(a/b)*c` — b is the divisor; for `(a*b)/c` use `/*sz 0.3 0` or bind `r=*sz 0.3;/r 0` | hint only |
| `?h a b` (keyword form on bare ref) | `? a b` (bare-bool prefix ternary) | `ILO-W003` |
+| `pred q:t>b;=q "" 1;false` (guard tail literal) | `=q "" true;false` (tail value must match declared return type) | `ILO-T008` |
Each case fires a hint pointing at the canonical form; the agent's first retry should be the right one. Identifier-shaped collisions with builtin names (`len=...`, `sin=...`) are rejected with `ILO-P011` plus a rename suggestion.
diff --git a/ai.txt b/ai.txt
index 06c92ac6..58b58a79 100644
--- a/ai.txt
+++ b/ai.txt
@@ -2,7 +2,7 @@ INTRO: ilo is a token-optimised programming language for AI agents. Every design
FILE VERSION PRAGMA: Optional. ^26.5 -- rest of file Top-of-file declaration of the minimum required runtime. First line, no leading whitespace. Sigil-led (principle 4), ~3 tokens (principle 1). First-class syntax, not a magic comment - the lexer recognises `^` only at file start, so `^` elsewhere keeps its `return err` meaning. Pragma absent=Assume latest installed runtime, no diagnostic File targets older than runtime, breaking change between=Fail with migration pointer File targets newer than runtime=Fail asking to upgrade Tooling: `ilo --version-of ` reads the pragma (returns nothing when absent); the formatter canonicalises position when present, never inserts one. Ships with the CalVer cut; 0.x files have no pragma and verify silently.
FUNCTIONS: : ...>; No parens around params - `>` separates params from return type `;` separates statements - no newlines required Last expression is the return value (no `return` keyword) Zero-arg call: `make-id()` Paren-form call (ILO-51): `spl(row, ",")` is sugar for `spl row ","` — same AST, postfix is canonical Labelled args (ILO-71): `dtfmt epoch:e fmt:"%Y"` — optional `label:value` form for any callable with declared parameter names. Labels resolve to positional by name; order is free. Mixed positional + labelled is allowed (positional fill from left; labels fill remaining slots by name). Unknown or duplicate labels surface `ILO-P019` at parse time. Works in both postfix and paren form: `f(b:2, a:1)` ≡ `f a:1 b:2`. **Two body forms — both fully supported:** -- Inline: semicolons separate statements; last expression returns. add-and-double x:n y:n>n;s=+x y;*s 2 -- Brace-block, single-line: explicit braces wrap the whole body (same semantics). add-and-double x:n y:n>n { s = +x y; *s 2 } -- Brace-block, multi-line: newlines inside `{ ... }` act as statement separators -- (same as `;`). The brace form may be inline or multi-line interchangeably. add-and-double x:n y:n>n { s = +x y *s 2 } Multi-step transforms bind intermediate results as locals: tot p:n q:n r:n>n;s=*p q;t=*s r;+s t Early return: braceless guard (`>=x 0 val` exits the function immediately when true); `ret val` exits from any depth including inside a loop or braced conditional. Result unwrap mid-body: `v=call!` extracts the Ok value and propagates Err out of the function before continuing. 
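The labelled-argument rule in the FUNCTIONS section (positional values fill from the left, labels fill the remaining slots by name, unknown or duplicate labels are rejected) can be sketched in Python. This is only an illustration of the stated rule, not ilo's parser: `resolve_args` and its error messages are hypothetical, and treating a label that names an already-filled positional slot as a duplicate is an assumption the text does not spell out.

```python
def resolve_args(param_names, positional, labelled):
    """Turn a mixed positional + labelled call into a purely positional one.

    param_names: declared parameter names, in declaration order.
    positional:  values written without a label, in call order.
    labelled:    (label, value) pairs, in call order.
    """
    if len(positional) > len(param_names):
        raise ValueError("too many positional arguments")

    # Positional values fill the declared slots from the left.
    slots = dict(zip(param_names, positional))

    # Labels fill the remaining slots by name; order is free.
    for label, value in labelled:
        if label not in param_names:
            raise ValueError(f"ILO-P019: unknown label '{label}'")
        if label in slots:
            # Assumption: a label hitting a filled slot counts as a duplicate.
            raise ValueError(f"ILO-P019: duplicate label '{label}'")
        slots[label] = value

    missing = [p for p in param_names if p not in slots]
    if missing:
        raise ValueError(f"missing arguments: {missing}")
    return [slots[p] for p in param_names]


# `f(b:2, a:1)` and `f a:1 b:2` resolve to the same positional call.
print(resolve_args(["a", "b"], [], [("b", 2), ("a", 1)]))
# Mixed form, as in `dtfmt e fmt:"%Y"`.
print(resolve_args(["epoch", "fmt"], ["e"], [("fmt", "%Y")]))
```

Both calls produce the arguments in declaration order, which is what "labels resolve to positional by name" amounts to.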
TYPES: `n`=number (f64) `t`=text (string) `b`=bool `_`=any/unknown (wildcard type) `L n`=list of number `R n t`=result: ok=number, err=text `O n`=optional number (nil or n) `M t n`=map from text keys to numbers `S red green blue`=sum type - one of named text variants `F n t`=function type: takes n, returns t (used in HOF params) `W`=capability World token — `w:W` declares a capability parameter (ILO-68) `order`=named type `a`=type variable - any single lowercase letter except n, t, b [Optional (`O T`)] `O T` accepts either `nil` or a value of type `T`. f x:O n>n;??x 0 -- unwrap optional or default to 0 g>O n;nil -- returns nil (valid O n) h>O n;42 -- returns 42 (valid O n) `??x default` - nil-coalesce: returns `x` if non-nil, else `default`. Unwraps `O T` to `T`. [Sum types (`S a b c`)] Closed set of named text variants. Verifier-enforced; runtime value is always `t`. color x:S red green blue > t ?x{red:"ff0000";green:"00ff00";blue:"0000ff"} Sum types are compatible with `t` - a sum value can be passed to any `t` parameter. [Discriminated union types (`type Foo = A | B(n) | C(t)`)] Named sum types with optional per-variant payloads (Rust-style enums). Each variant is either payload-less or carries exactly one value of a primitive type. type shape = circle(n) | square(n) | point area s:shape > n ?s{circle(r):*3.14159 *r r;square(side):*side side;point:0} **Declaration**: `type Name = V1 | V2(payloadType) | ...` at top level. **Construction**: `circle 5` (payload variant), `point` (payload-less variant used as value directly). **Pattern match**: `?s{circle(r):...; square(side):...; point:...}` using `tag(binding):` or `tag:` arms. **Exhaustiveness**: verifier (ILO-T024) checks all variants are covered; the error lists every missing variant by name and suggests the correct arm syntax (`tag(v): ` for payload variants, `tag: ` for payload-less). A wildcard `_:` arm satisfies exhaustiveness. Missing multiple variants produces a single diagnostic naming all of them. 
**VM**: programs using discriminated unions fall back to the tree interpreter (JIT codegen deferred). [Generic discriminated union types (`type Result = ok(a) | err(b)`)] Sum type declarations accept type parameters (ILO-402), enabling reusable polymorphic variants. type result = ok(a) | err(b) type option = some(a) | none type either = left(a) | right(b) **Syntax**: `type Name` or `type Name` — one or more single-letter type variables (commas optional). **Type variables**: declared letters (including `n`, `t`, `b`) are treated as type variables in variant payloads, not as primitives. **Erasure**: type variables are erased at runtime — no boxing or specialisation. The verifier accepts any concrete type for a type-variable payload. **Usage**: construct and match exactly like non-generic sum types; the concrete type is inferred from context. safe-div x:n y:n>result =(y) 0{ret err "division by zero"} ok /x y main>t dv=safe-div 10 2 ?dv{ok(v):str v;err(msg):msg} -- "5" [Map type (`M k v`)] Dynamic key-value collection. Keys are typed: text (`t`) or integer (`n`). `Int(1)` and `Text("1")` are distinct keys. mmap -- empty map mset m k v -- return new map with key k set to v mget m k -- value at key k, or nil mget-or m k default -- value at key k, or default if missing (never nil) mhas m k -- b: true if key exists mkeys m -- L t: sorted list of keys mvals m -- L v: values sorted by key mpairs m -- L (L _): sorted [k, v] pairs; mpairs m == zip (mkeys m) (mvals m) mdel m k -- return new map with key k removed len m -- number of entries Numeric keys work directly - no `str` conversion needed. Float keys floor to `i64` at the builtin boundary (matching `at xs i`); NaN/Infinity raise at runtime. idx=mmap idx=mset idx 7 "seven" -- M n t, integer key mget idx 7 -- "seven" mhas idx 7 -- true mhas idx "7" -- false (Int and Text are distinct) `jdmp` stringifies numeric keys for JSON output (JSON object keys are always strings). 
The round-trip via `jpar` is lossy - numeric keys come back as text. Example: scores>M t n m=mmap m=mset m "alice" 99 m=mset m "bob" 87 mget m "alice" -- 99 [Type variables] A single lowercase letter (other than `n`, `t`, `b`) in type position is a type variable. Used for higher-order function signatures: identity x:a>a;x apply f:F a a x:a>a;f x **Without a bound declaration** type variables are treated as `unknown` during verification — the verifier accepts any type for `a` without consistency checking across call sites (legacy behaviour; backward compatible). [Bounded generics] Explicit generic type parameters allow the verifier to enforce two properties at call sites: 1. All arguments bound to the same type variable have the same concrete type. 2. The concrete type satisfies the declared bound. **Syntax:** `name` before the parameter list. Bounds are optional per variable; omitting `:bound` defaults to `any`. gmn x:a y:a>a -- min of two comparable values gadd x:a y:a>a -- addition, numeric values only grep s:a n:n>t -- repeat text gid x:a>a -- identity, any type **Bound set** (small and fixed): `any`=any type (default when bound omitted) `comparable`=`n`, `t`, `b` `numeric`=`n` `text`=`t` **Call-site checking:** gmn 3 7 -- ok: both n gmn "a" "b" -- ok: both t gmn 1 "two" -- ILO-T044: 'a' bound to n then t (inconsistent) gadd "x" "y" -- ILO-T044: 't' does not satisfy numeric bound Unbounded legacy type-variable usage (`identity x:a>a;x`) continues to work without changes. [Inline lambdas] Pass a function literal directly to a HOF instead of defining a one-off top-level helper: by-dist xs:L n>L n;srt (x:n>n;abs x) xs nonempty ws:L t>L t;flt (s:t>b;>(len s) 0) ws sumsq xs:L n>n;fld (a:n x:n>n;+a *x x) xs 0 Syntax: `(: ...>;)`. Same shape as a top-level function declaration, wrapped in parens, no name. **Brace-lambda shorthand** (`{params> stmts}`): bare param names (types inferred as `any`) and no explicit return type. 
Useful for compact multi-statement bodies in `map`/`flt`/`fld`: sumsq xs:L n>n;fld {a x>; tmp=*x x; +a tmp} xs 0 dbl xs:L n>L n;map {x> *x 2} xs pos xs:L n>L n;flt {x> >x 0} xs The `;` after `>` is optional. The body supports the same `;`-chained statement forms as the paren lambda and top-level function bodies (let-bindings, guards, match, loops, `ret`/`brk`/`cnt`). Closure capture also works — any name that isn't a param or body-local is snapshot from the enclosing scope. **Phase 1 (no captures)** lifts the literal to a synthetic top-level decl and works across every engine (tree, VM, Cranelift JIT, AOT). The body's free variables must all be params, locals defined inside the lambda body, or known top-level fns. **Phase 2 (closure capture)** lets the body reference variables from the enclosing scope: f xs:L n thr:n>L n;flt (x:n>b;>x thr) xs -- captures `thr` (paren form) f xs:L n thr:n>L n;flt {x> >x thr} xs -- captures `thr` (brace form) **Builtins are not first-class values.** Builtin names (`sha256`, `hmac-sha256`, `b64`, etc. — every name in the builtin table) are call-only: they can appear in call position but cannot be passed by name to a HOF. `map sha256 xs` fails ILO-T004 ("undefined variable 'sha256'") with a hint pointing to the canonical wrap-as-lambda rewrite. Wrap the builtin in an inline lambda instead: hashes xs:L t>L t;map (x:t>t;sha256 x) xs -- paren form hashes xs:L t>L t;map {x> sha256 x} xs -- brace form A handful of arithmetic/string builtins (`abs`, `min`, `max`, `mod`, `sum`, `prod`, `len`, `upr`, `lwr`, `trm`, `cap`, `padl`, `padr`, `ord`, `chr`, `chars`, `str`, `num`, `jdmp`, `fmod`, `flr`, `cel`, `rou`, `avg`, `median`, `stdev`, `variance`) are promoted to `Ty::Fn` at the verifier so they *can* be passed directly (see `builtins-as-hof.ilo`); everything else needs the lambda wrap. Phase 2 captures run natively on every engine: the tree interpreter, the register VM, the Cranelift JIT, and the Cranelift AOT backend. 
Each free variable is snapshot by value at the call site (`Expr::MakeClosure`) and appended to the call frame's arg slice on dispatch. The AOT backend additionally embeds the postcard-serialised `CompiledProgram` into the binary's `.rodata` and publishes TLS pointers on startup, so dispatch helpers can re-enter the VM on user-fn callbacks. The ctx-arg form (`srt fn ctx xs`) remains the cross-engine alternative when you want explicit state without forming a closure. **Braceless guards are rejected inside lambda bodies (`ILO-P023`).** A braceless guard at statement position (`>=x 0 val`, `=x 0 val`, etc.) early-returns from the *enclosing function*, not from the lambda — see "Early Return" below. Inside a lambda body that semantics is almost never what the author meant; the lambda body would silently skip past the guard and the outer caller would return out from under the higher-order call. The parser therefore rejects braceless guards inside lambda bodies and asks for one of two expression-shaped rewrites: **Prefix ternary** when both arms are values: `map (x:n>n;?>=x 0 0 x) xs` **Braced match** when arms need statements: `map (x:n>n;?>=x 0{0}{x}) xs` Braceless guards at top-level function bodies continue to work — this restriction is lambda-body only. A future runtime change (follow-up to ILO-473) may switch the early-return target inside lambdas; until then the diagnostic prevents the silent miscompile. **Rejected lambda shapes (ILO-456).** Only the paren form `(x:t>r;body)` and the bare-param brace form `{x> body}` are accepted. 
Three shapes from other functional languages look plausible but are deliberately rejected — each emits a targeted hint naming both canonical forms and the call-site rewrite: `flt {x:t> body} xs`=`flt (x:t>r;body) xs` (paren is the typed form) `flt \x:t>body xs`=`flt (x:t>r;body) xs` or `flt {x> body} xs` `flt fn x:t>r;body xs`=`flt (x:t>r;body) xs` or `flt {x> body} xs` The brace form is the *bare-param* shorthand — its params are inferred as `any`, so `{x:n> body}` is a category error rather than a typed-brace lambda. [Trailing-semicolon semantics] `;` is the **statement separator** in ilo. A trailing `;` — one that appears after the last statement with nothing following it before the next structural boundary — is **always silently consumed** (ignored). It is never required, never an error, and never changes the meaning of the body. This applies uniformly across all three body contexts: Top-level function declaration=`name params>return;body` — the `;` after the return type separates the header from the body; it is **optional** when a newline is present=A trailing `;` after the last statement is consumed and ignored Inline lambda=`(params>return;body)` — the `;` after the return type separates the header from the body; it is **optional**=A trailing `;` before the closing `)` is consumed and ignored Match / guard arm body=`arm:body;` — `;` terminates an arm and starts the next; a trailing `;` before `}` is consumed and ignored=Consumed silently; arm body is parsed as-is The parser calls `parse_body_with` (for function bodies) and `parse_lambda_body` (for inline-lambda bodies). After consuming each `;` separator between statements, if the next token is at a body-end boundary (`EOF`, `}`, `)`, or the start of a new sibling function declaration) the loop breaks without error. No statement is emitted for the trailing `;`. **Practical rules:** `f>n;42` and `f>n;42;` are identical — both parse to a single-statement body returning `42`. 
`(x:n>n;+x 1)` and `(x:n>n;+x 1;)` are identical inline lambdas. `?x{a:1;b:2;}` and `?x{a:1;b:2}` parse identically — the trailing `;` before `}` is silently dropped. A `;` at the very start of a body (before any statement) is **not** a trailing semicolon — it is a missing-statement parse error (`ILO-P001`/`ILO-P003`). Only a `;` after a valid statement is silently consumed. The header/body separator `;` in `name params>return;body` is similarly optional when the token stream contains a newline at that boundary (the lexer converts indented newlines to `;`). The parser checks `peek() == Semi` and advances past it if present.
-NAMING: Short names everywhere. 1–3 chars. `order`=`ord`=truncate `customers`=`cs`=consonants `data`=`d`=single letter `level`=`lv`=drop vowels `discount`=`dc`=initials `final`=`fin`=first 3 `items`=`its`=first 3 Function names follow the same rules. Field names in constructors and external tool names keep their full form - they define the public interface. [Identifier syntax] Identifiers are lowercase ASCII only, optionally with hyphenated segments. Formally: `[a-z][a-z0-9]*(-[a-z0-9]+)*`. Capital letters and underscores are rejected at the binding and call site. run -- OK run-d -- OK (hyphen separates segments) r2 -- OK (digit after first letter) runD -- ERROR (capital letter) RunD -- ERROR (leading capital) run_d -- ERROR (underscore not allowed in bindings) -run -- ERROR (must start with a letter) `runD` in the interactive CLI surfaces as `ILO-L003 unexpected token` with a suggestion to use `run-d` or `rund`. The constraint is intentional: a single lexical shape per identifier keeps the token stream predictable for agents and avoids style debates over camelCase vs snake_case vs kebab-case. **Hyphen vs subtraction.** A hyphen with no surrounding whitespace is always part of an identifier — `best-d` is one token, never `best - d`.
Subtraction requires whitespace on at least the operator side: `- best d` (prefix form) or `best - d` (infix form). When an unbound kebab ident has every segment bound, `ILO-T004` adds a hint pointing at the prefix form. When an unbound kebab ident splits uniquely into two bound names (e.g. `zr-sq-zi-sq` → `zr-sq` and `zi-sq`), the hint shows both the prefix form (`- zr-sq zi-sq`) and the infix-with-spaces form (`zr-sq - zi-sq`). The only place capital letters and underscores are accepted is **after `.` or `.?`** at field-access position, so heterogeneous JSON keys from real APIs work without rewriting. See [Field names at dot-access](#field-names-at-dot-access) for the full list of post-dot relaxations (`r.URL`, `r.AccessKey`, `r.user_name`, etc.). Binding names (`AccessKey = ...`) and function names (`AccessKey x:n>n;...`) still error. [Reserved words] The following identifiers are reserved and cannot be used as names: `if`, `return`, `let`, `fn`, `def`, `var`, `const`. Using them produces a friendly error with the ilo equivalent: -- ERROR: `if` is a reserved word. Use: ?cond{true:...;false:...} -- ERROR: `return` is a reserved word. Last expression is the return value. -- ERROR: `let` is a reserved word. Use: name = expr -- ERROR: `fn`/`def` is a reserved word. Use: name param:type > rettype; body These checks fire at parse time across every context the keyword can appear in: top-level declaration head (`fn>n;...`), binding LHS (`fn=5`), and **parameter position** (`g fn:n>n;fn` rejects with ILO-P011 against the param name, not a cryptic ILO-P003 against the missing `>`). Builtin names (`flat`, `frq`, `map`, `flt`, `cat`, `len`, `srt`, `hd`, `tl`, `ord`, `fld`, `lst`, ...) are also rejected as user-function names and as local-binding LHS. Without this, calls to the user fn or use sites of the local binding silently mis-dispatch to the builtin and surface as a confusing `ILO-T006` arity mismatch. 
The parser intercepts at the declaration site with ILO-P011 and a rename hint: flat n:n>n;n -- ERROR ILO-P011: `flat` is a builtin and cannot be used as a function name -- hint: rename to something like `myflat` or `flatof`. main>n;flat=cat xs " ";spl flat ". " -- ERROR ILO-P011: `flat` is a builtin and cannot be used as a binding name -- hint: rename to something like `myflat` or `flatv`. [Reserved namespaces] Short builtin names are precious surface and ilo reserves a stable subset of them. To save agents (and their carry-forward scripts) from "what got reserved this release?" debugging cycles, the language publishes the full short-name reserve list plus a forward-compatibility rule for future builtins. **Type-sigil letters are not reserved as identifiers.** The primitive type letters `n` (number), `t` (text), `b` (bool) — and the compound sigils `L`, `R`, `O`, `M`, `S`, `F` — are *position*-scoped. They are recognised as types only after `:` in a parameter binding or after `>` in a return-type annotation. Everywhere else (binding LHS, expression operand, fn name) they are normal lowercase identifiers. `t = 5` binds a local `t`; the canonical first example `tot p:n q:n r:n>n;s=*p q;t=*s r;+s t` uses `t` as a scratch local. Agents are free to use `n`, `t`, `b` as variables. Capital letters remain rejected as user identifiers (the rule `[a-z][a-z0-9]*(-[a-z0-9]+)*` is the source of truth). See [ILO-478](https://linear.app/ilo-lang/issue/ILO-478) for the in-flight migration that makes the primitive sigils uppercase (`N`/`T`/`B`) and removes this positional caveat. 
**Currently reserved short names (1-3 characters).** Every name in this list is a builtin today and triggers `ILO-P011` if used as a binding or user-function name: 1-char e 2-char at hd pi tl rd wr ct 3-char abs avg b64 bor cap cat cel chr cos del det dot env ewm exp fft fld flr flt fmt frq get grp has hed hex inv len log lsd lst lwr map max min mod now num opt ord pat pow pst put rdb rdl rep rev rgx rng rnd rou run sin slc spl srt str sum tan tau trm unq upr wra wrl wro zip All builtin aliases (`head`, `length`, `filter`, `concat`, `tail`, `sort`, `reverse`, `flatten`, `contains`, `group`, `average`, `print`, `trim`, `split`, `format`, `regex`, `read`, `readlines`, `readbuf`, `write`, `writelines`, `lset`, `floor`, `ceil`, `round`, `rand`, `random`, `rng`, `string`, `number`, `slice`, `unique`, `fold`) are reserved with the same shadow-prevention semantics as canonical builtin names. Binding an alias name or using it as a user-function name fires `ILO-P011` at parse time with the canonical form in the diagnostic, since the call-site rewrite to the canonical builtin silently bypasses any user binding of the same name. Previously only `rng` and `rand` had individual guards; as of 0.12.1 every alias in the table above is covered by a single `resolve_alias` check, so new aliases automatically inherit the protection when added to the table. Longer builtin names (`acos`, `asin`, `atan`, `flat`, `take`, `drop`, `mget`, `mset`, `mmap`, `prnt`, `mapr`, `solve`, `lstsq`, `clamp`, `cumsum`, `cprod`, `median`, `matmul`, `range`, `window`, `chunks`, `walk`, `glob`, `prod`, `fsize`, `mtime`, `isfile`, `isdir`, `band`, `bxor`, `bnot`, `bshl`, `bshr`, `brot`, …) are also reserved and rejected by `ILO-P011`, but the short-name namespace above is where carry-forward scripts most often collide, so it gets explicit enumeration. 
Longer builtin names (`acos`, `asin`, `atan`, `flat`, `take`, `drop`, `mget`, `mset`, `mmap`, `prnt`, `mapr`, `solve`, `clamp`, `cumsum`, `cprod`, `median`, `matmul`, `range`, `window`, `chunks`, `walk`, `glob`, `prod`, `fsize`, `mtime`, `isfile`, `isdir`, `ones`, `linspace`, …) are also reserved and rejected by `ILO-P011`, but the short-name namespace above is where carry-forward scripts most often collide, so it gets explicit enumeration. **Forward-compatibility rule.** Future ilo releases add new builtins under names **4 characters or longer**. A 2-character name that is not on this list today is safe to use as a binding or function name and stays safe across releases. A 3-character name that is not on this list is _highly likely_ to stay safe but is not a hard promise - the 3-char surface is already dense, and a rare ergonomic win may justify an addition, called out in the changelog. This gives agents a deterministic safe-name strategy: **2 chars**: any unreserved 2-char name is permanently fine for bindings (`ce` for "category", `ix` for index, `mn` for "mean", `pq` for "priority queue", …). Names on the reserved list above never get removed. **3 chars**: prefer unreserved 3-char names where possible. If a future release reserves one, the migration is a 1-character rename plus a changelog entry. **4+ chars**: always safe. New builtins land here first; any short alias is added later only if the long name is unambiguous and the short doesn't shadow a plausible user binding. When a collision does happen, `ILO-P011` surfaces it at the binding site with a rename suggestion - never silently mis-dispatches at the call site (see the `flat=cat xs " "` example above). Combined with the reserve list, that turns every name-collision incident into a single-character rename instead of a debugging spiral. [Cross-language gotchas] Common shapes reached for from other languages. 
The parser and lexer surface each with a friendly hint: `AND a b`, `OR a b`, `NOT a`=`&a b`, `|a b`, `!a`=`ILO-L001` `=a b`=`<=a b`, `>=a b` (single token)=`ILO-P003` `f=fn x:n>n;+x 1` (lambda)=`(x:n>n;+x 1)` (parenthesised lambda)=`ILO-P009` `\x{+x 1}` (Haskell/Rust lambda)=`(x:n>n;+x 1)` (parenthesised lambda)=`ILO-L001` `flt {x:t> body} xs` (typed-brace at HOF)=`flt (x:t>r;body) xs` (paren = typed; brace = bare params)=`ILO-P001` `flt \x:t>body xs` (typed backslash)=`flt (x:t>r;body) xs` or `flt {x> body} xs`=`ILO-L001` `flt fn x:t>r;body xs` (`fn`-keyword inline)=`flt (x:t>r;body) xs` or `flt {x> body} xs`=`ILO-P009` `main:>n;body`=`main>n;body` (no `:` before `>`)=`ILO-P003` Multi-line body without braces=`@k xs{body}`, `cond{body}` on one line=`ILO-P003` `cond{^"err"}` braced-cond=Braceless `cond ^"err"` for early return=hint only `- -*a b *c d` (double-minus)=`- 0 +*a b *c d` (negate the sum)=`ILO-P021` `[k fmt2 v 2]` (call in list)=`[k (fmt2 v 2)]` or bind-first=`ILO-P101` `[login "a" logout "b"]` (variant ctor in list)=`[(login "a") (logout "b")]` or bind-first=`ILO-T047` `pts=gen-pts;cs0=[...];prnt cs0` at top level=`main>_;pts=gen-pts;cs0=[...];prnt cs0` (wrap in `main>_;`)=`ILO-P102` `((((...((1+1))))...))` 1000 deep=bind intermediates, or pass `--max-ast-depth N`=`ILO-P103` `dx=xj 0-xi` (call vs binop)=`-xj xi` or pre-bind: `nxi=0-xi;+xj nxi`=`ILO-T005` `wc==q ""` (no space, binding+equality)=`wc = =q ""` (single `=` to bind, then prefix `=a b` for equality)=`ILO-T005` `tup.0` / `pair.0` (tuple access)=bind from `zip`-pair, then `at pair 0` (no tuple type)=`ILO-T004` `?? 
(num s) 0` (`??` on `R T E`)=`default-on-err (num s) 0` or `?(num s){~v:v;^_:0}`=`ILO-T041` `?bool{body}` (bool-conditional)=guard `=bool true body`, braced `=bool true{body}`, ternary `?bool a b`, or match `?bool{true:a; false:b}`=`ILO-P011` `(x:n>n;>=x 0 0;x)` (braceless guard inside lambda)=`(x:n>n;?>=x 0 0 x)` (prefix ternary) or `(x:n>n;?>=x 0{0}{x})` (braced match)=`ILO-P023` `+a+" "+b+c` (infix-style chain with leading prefix `+`)=drop the leading `+`: `a+" "+b+c`; or `fmt "{} {} {}" a b c`; or nested prefix `+a +" " +b c`; or bind intermediates=`ILO-P010` `fmt "{}" +0.1 0.2` -> `0.30000000000000004` (float Display = full IEEE 754)=`fmt "{:.2f}" (+0.1 0.2)` for human-readable; `fmt2 v N` for precise dp=docs only `*/ sz 0.3 0` ("scale then div by 0")=`*/a b c` is `(a/b)*c` — b is the divisor; for `(a*b)/c` use `/*sz 0.3 0` or bind `r=*sz 0.3;/r 0`=hint only `?h a b` (keyword form on bare ref)=`? a b` (bare-bool prefix ternary)=`ILO-W003` Each case fires a hint pointing at the canonical form; the agent's first retry should be the right one. Identifier-shaped collisions with builtin names (`len=...`, `sin=...`) are rejected with `ILO-P011` plus a rename suggestion. The list-literal call trap (`ILO-P101`) catches the case where a variadic builtin (`fmt`, `fmt2`) appears bare inside `[...]`. Fixed-arity builtins (`str`, `at`, `map`, ...) auto-expand to a call as one element, but variadic ones can't (the parser doesn't know where their args end), so the bare form would silently fall through as multiple elements with the builtin name as an undefined Ref. Fix by wrapping the call in parens (`[k (fmt2 v 2)]`) or binding first. The top-level chain trap (`ILO-P102`) catches a bare `name=expr` at the top level. 
ilo requires every binding to live inside a function body; a top-level `pts=gen-pts;cs0=[[...]]; ...; prnt cs2` without a `main>_;` (or any) header used to either die on the `=` (a bare `ILO-P003`) or get slurped into a previous function's body and emit a wall of misleading `ILO-T005` cascades on the wrong line. `ILO-P102` collapses both shapes into a single diagnostic that names the offending binding and suggests the canonical `main>_;` wrapper. The double-minus trap (`ILO-P021`) catches the silent-miscompile shape `- - a b c d` for `` in `{+,*,/}`. Read intuitively as `-(a*b) - (c*d)` but parses as `-((a*b) - (c*d)) = -(a*b) + (c*d)` because the inner `-` greedily consumes both prefix-binop groups as binary subtract and the outer `-` falls back to unary negate. Fix by negating the sum (`- 0 +*a b *c d`) or binding first (`p=*a b;q=*c d;- 0 +p q`). Single-atom variants like `- -a b` remain accepted since they're unambiguous. The glued-`==` binding trap (`ILO-T005` with the ILO-469 hint) catches `name==expr` written without a space. Both `=` and `==` lex as a single `Token::Eq`, so `wc==q ""` parses as the binding `wc = (q "")` — a call on `q` — and the verifier fails because `q` isn't a function. The hint names the missing space and shows the canonical rewrite `wc = =q ""` (single `=` for the binding, then prefix `=a b` for equality). ilo does not fuse `==` into a single bind-then-equality token; the diagnostic is a nudge, not a syntactic concession. The call-vs-binop trap (`ILO-T005` with tailored hint) catches the assignment-RHS shape `name expr` where `name` is a bound non-fn value (typically a parameter). Whitespace-juxtaposition is the call syntax in ilo, so `dx=xj 0-xi` parses as `dx=(xj 0)-xi` — a call to `xj` with argument `0`. Verification fails because `xj` isn't a function. The hint surfaces the prefix-operator alternatives (`-xj xi`, `+xj `) and the pre-bind workaround. 
The misparse is most common when an agent reaches for infix arithmetic between a parameter and a subexpression; pre-binding the operand always resolves the ambiguity. `ilo --explain ILO-T005` includes the full gotcha walkthrough. The tuple-access trap (`ILO-T004` with the `at ` hint) catches `tup.0` / `pair.0` shapes where `tup` / `pair` was never bound. ilo has no tuple type. `zip xs ys` returns `L (L n)` — a list of two-element lists — so destructuring a pair is `at pair 0` / `at pair 1`, not `pair.0` / `pair.1`. The hint names the exact `at` call to write. (`pair.0` itself is still valid sugar for list indexing once `pair` is bound to an `L T`; the diagnostic only fires when the identifier is unbound.) The AST depth cap (`ILO-P103`) catches deeply nested source that would otherwise blow the parser stack. Any context that compiles untrusted text - `ilo serv`, the bare-positional dispatch, the `--ast` dump - is exposed to a payload of the shape `((((...((1+1))))...))` 1000 levels deep that recurses straight through the OS thread stack. The default cap of 256 is far above anything hand-written (the in-tree examples top out under 20) and low enough to keep the worst-case stack frame in `parse_atom`/`parse_expr` inside the default 8 MB main-thread stack. Override with `--max-ast-depth N` on `ilo`, `ilo run`, `ilo check`, `ilo build`, and `ilo serv` when a legitimate program needs deeper nesting.
+NAMING: Short names everywhere. 1–3 chars. `order`=`ord`=truncate `customers`=`cs`=consonants `data`=`d`=single letter `level`=`lv`=drop vowels `discount`=`dc`=initials `final`=`fin`=first 3 `items`=`its`=first 3 Function names follow the same rules. Field names in constructors and external tool names keep their full form - they define the public interface. [Identifier syntax] Identifiers are lowercase ASCII only, optionally with hyphenated segments. Formally: `[a-z][a-z0-9]*(-[a-z0-9]+)*`. Capital letters and underscores are rejected at the binding and call site.
run -- OK run-d -- OK (hyphen separates segments) r2 -- OK (digit after first letter) runD -- ERROR (capital letter) RunD -- ERROR (leading capital) run_d -- ERROR (underscore not allowed in bindings) -run -- ERROR (must start with a letter) `runD` in the interactive CLI surfaces as `ILO-L003 unexpected token` with a suggestion to use `run-d` or `rund`. The constraint is intentional: a single lexical shape per identifier keeps the token stream predictable for agents and avoids style debates over camelCase vs snake_case vs kebab-case. **Hyphen vs subtraction.** A hyphen with no surrounding whitespace is always part of an identifier — `best-d` is one token, never `best - d`. Subtraction requires whitespace on at least the operator side: `- best d` (prefix form) or `best - d` (infix form). When an unbound kebab ident has every segment bound, `ILO-T004` adds a hint pointing at the prefix form. When an unbound kebab ident splits uniquely into two bound names (e.g. `zr-sq-zi-sq` → `zr-sq` and `zi-sq`), the hint shows both the prefix form (`- zr-sq zi-sq`) and the infix-with-spaces form (`zr-sq - zi-sq`). The only place capital letters and underscores are accepted is **after `.` or `.?`** at field-access position, so heterogeneous JSON keys from real APIs work without rewriting. See [Field names at dot-access](#field-names-at-dot-access) for the full list of post-dot relaxations (`r.URL`, `r.AccessKey`, `r.user_name`, etc.). Binding names (`AccessKey = ...`) and function names (`AccessKey x:n>n;...`) still error. [Reserved words] The following identifiers are reserved and cannot be used as names: `if`, `return`, `let`, `fn`, `def`, `var`, `const`. Using them produces a friendly error with the ilo equivalent: -- ERROR: `if` is a reserved word. Use: ?cond{true:...;false:...} -- ERROR: `return` is a reserved word. Last expression is the return value. -- ERROR: `let` is a reserved word. Use: name = expr -- ERROR: `fn`/`def` is a reserved word. 
Use: name param:type > rettype; body These checks fire at parse time across every context the keyword can appear in: top-level declaration head (`fn>n;...`), binding LHS (`fn=5`), and **parameter position** (`g fn:n>n;fn` rejects with ILO-P011 against the param name, not a cryptic ILO-P003 against the missing `>`). Builtin names (`flat`, `frq`, `map`, `flt`, `cat`, `len`, `srt`, `hd`, `tl`, `ord`, `fld`, `lst`, ...) are also rejected as user-function names and as local-binding LHS. Without this, calls to the user fn or use sites of the local binding silently mis-dispatch to the builtin and surface as a confusing `ILO-T006` arity mismatch. The parser intercepts at the declaration site with ILO-P011 and a rename hint: flat n:n>n;n -- ERROR ILO-P011: `flat` is a builtin and cannot be used as a function name -- hint: rename to something like `myflat` or `flatof`. main>n;flat=cat xs " ";spl flat ". " -- ERROR ILO-P011: `flat` is a builtin and cannot be used as a binding name -- hint: rename to something like `myflat` or `flatv`. [Reserved namespaces] Short builtin names are precious surface and ilo reserves a stable subset of them. To save agents (and their carry-forward scripts) from "what got reserved this release?" debugging cycles, the language publishes the full short-name reserve list plus a forward-compatibility rule for future builtins. **Type-sigil letters are not reserved as identifiers.** The primitive type letters `n` (number), `t` (text), `b` (bool) — and the compound sigils `L`, `R`, `O`, `M`, `S`, `F` — are *position*-scoped. They are recognised as types only after `:` in a parameter binding or after `>` in a return-type annotation. Everywhere else (binding LHS, expression operand, fn name) they are normal lowercase identifiers. `t = 5` binds a local `t`; the canonical first example `tot p:n q:n r:n>n;s=*p q;t=*s r;+s t` uses `t` as a scratch local. Agents are free to use `n`, `t`, `b` as variables. 
Capital letters remain rejected as user identifiers (the rule `[a-z][a-z0-9]*(-[a-z0-9]+)*` is the source of truth). See [ILO-478](https://linear.app/ilo-lang/issue/ILO-478) for the in-flight migration that makes the primitive sigils uppercase (`N`/`T`/`B`) and removes this positional caveat. **Currently reserved short names (1-3 characters).** Every name in this list is a builtin today and triggers `ILO-P011` if used as a binding or user-function name: 1-char e 2-char at hd pi tl rd wr ct 3-char abs avg b64 bor cap cat cel chr cos del det dot env ewm exp fft fld flr flt fmt frq get grp has hed hex inv len log lsd lst lwr map max min mod now num opt ord pat pow pst put rdb rdl rep rev rgx rng rnd rou run sin slc spl srt str sum tan tau trm unq upr wra wrl wro zip All builtin aliases (`head`, `length`, `filter`, `concat`, `tail`, `sort`, `reverse`, `flatten`, `contains`, `group`, `average`, `print`, `trim`, `split`, `format`, `regex`, `read`, `readlines`, `readbuf`, `write`, `writelines`, `lset`, `floor`, `ceil`, `round`, `rand`, `random`, `rng`, `string`, `number`, `slice`, `unique`, `fold`) are reserved with the same shadow-prevention semantics as canonical builtin names. Binding an alias name or using it as a user-function name fires `ILO-P011` at parse time with the canonical form in the diagnostic, since the call-site rewrite to the canonical builtin silently bypasses any user binding of the same name. Previously only `rng` and `rand` had individual guards; as of 0.12.1 every alias in the table above is covered by a single `resolve_alias` check, so new aliases automatically inherit the protection when added to the table. 
Longer builtin names (`acos`, `asin`, `atan`, `flat`, `take`, `drop`, `mget`, `mset`, `mmap`, `prnt`, `mapr`, `solve`, `lstsq`, `clamp`, `cumsum`, `cprod`, `median`, `matmul`, `range`, `window`, `chunks`, `walk`, `glob`, `prod`, `fsize`, `mtime`, `isfile`, `isdir`, `band`, `bxor`, `bnot`, `bshl`, `bshr`, `brot`, `ones`, `linspace`, …) are also reserved and rejected by `ILO-P011`, but the short-name namespace above is where carry-forward scripts most often collide, so it gets explicit enumeration. **Forward-compatibility rule.** Future ilo releases add new builtins under names **4 characters or longer**. A 2-character name that is not on this list today is safe to use as a binding or function name and stays safe across releases. A 3-character name that is not on this list is _highly likely_ to stay safe but is not a hard promise - the 3-char surface is already dense, and a rare ergonomic win may justify an addition, called out in the changelog. This gives agents a deterministic safe-name strategy: **2 chars**: any unreserved 2-char name is permanently fine for bindings (`ce` for "category", `ix` for "index", `mn` for "mean", `pq` for "priority queue", …). Names on the reserved list above never get removed. **3 chars**: prefer unreserved 3-char names where possible. If a future release reserves one, the migration is a 1-character rename plus a changelog entry. **4+ chars**: always safe. New builtins land here first; any short alias is added later only if the long name is unambiguous and the short doesn't shadow a plausible user binding.
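The identifier grammar and the short-name reserve list above can be combined into a small pre-flight check. What follows is a minimal sketch, not the ilo implementation: `is_valid_ident`, `check_name`, and `NameCheck` are hypothetical names, the reserved table only carries the 1-3 character list (longer builtins such as `flat` are deliberately not modelled), and the real diagnostic remains `ILO-P011` from the parser.

```rust
// Hypothetical pre-flight name check; illustrative only.

/// Reserved 1-3 character builtin names, copied from the list above.
const RESERVED_SHORT: &[&str] = &[
    "e", "at", "hd", "pi", "tl", "rd", "wr", "ct",
    "abs", "avg", "b64", "bor", "cap", "cat", "cel", "chr", "cos", "del", "det", "dot",
    "env", "ewm", "exp", "fft", "fld", "flr", "flt", "fmt", "frq", "get", "grp", "has",
    "hed", "hex", "inv", "len", "log", "lsd", "lst", "lwr", "map", "max", "min", "mod",
    "now", "num", "opt", "ord", "pat", "pow", "pst", "put", "rdb", "rdl", "rep", "rev",
    "rgx", "rng", "rnd", "rou", "run", "sin", "slc", "spl", "srt", "str", "sum", "tan",
    "tau", "trm", "unq", "upr", "wra", "wrl", "wro", "zip",
];

#[derive(Debug, PartialEq)]
enum NameCheck {
    BadShape,     // does not match [a-z][a-z0-9]*(-[a-z0-9]+)*
    Reserved,     // on the short-name reserve list
    Stable,       // unreserved 2-char name: stays safe across releases
    LikelyStable, // unreserved 3-char name: highly likely to stay safe
    Unchecked,    // other lengths: outside what this sketch models
}

/// Hand-rolled matcher for `[a-z][a-z0-9]*(-[a-z0-9]+)*`.
fn is_valid_ident(s: &str) -> bool {
    let mut segments = s.split('-');
    // First segment: a lowercase letter, then lowercase letters or digits.
    let first_ok = match segments.next() {
        Some(seg) => {
            let mut chars = seg.chars();
            matches!(chars.next(), Some('a'..='z'))
                && chars.all(|c| c.is_ascii_lowercase() || c.is_ascii_digit())
        }
        None => false,
    };
    // Every later segment: one or more lowercase letters or digits.
    first_ok
        && segments.all(|seg| {
            !seg.is_empty()
                && seg.chars().all(|c| c.is_ascii_lowercase() || c.is_ascii_digit())
        })
}

fn check_name(name: &str) -> NameCheck {
    if !is_valid_ident(name) {
        return NameCheck::BadShape;
    }
    if RESERVED_SHORT.contains(&name) {
        return NameCheck::Reserved;
    }
    match name.len() {
        2 => NameCheck::Stable,
        3 => NameCheck::LikelyStable,
        _ => NameCheck::Unchecked,
    }
}

fn main() {
    for name in ["ix", "tot", "len", "runD", "run-d"] {
        println!("{name}: {:?}", check_name(name));
    }
}
```

Note that a name can be well-formed and still reserved: `run` matches the grammar but sits on the 3-char reserve list, so this sketch reports it as `Reserved`.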
When a collision does happen, `ILO-P011` surfaces it at the binding site with a rename suggestion - it never silently mis-dispatches at the call site (see the `flat=cat xs " "` example above). Combined with the reserve list, that turns every name-collision incident into a single-character rename instead of a debugging spiral. [Cross-language gotchas] Common shapes reached for from other languages. The parser and lexer surface each with a friendly hint: `AND a b`, `OR a b`, `NOT a`=`&a b`, `|a b`, `!a`=`ILO-L001` `=<a b`, `=>a b`=`<=a b`, `>=a b` (single token)=`ILO-P003` `f=fn x:n>n;+x 1` (lambda)=`(x:n>n;+x 1)` (parenthesised lambda)=`ILO-P009` `\x{+x 1}` (Haskell/Rust lambda)=`(x:n>n;+x 1)` (parenthesised lambda)=`ILO-L001` `flt {x:t> body} xs` (typed-brace at HOF)=`flt (x:t>r;body) xs` (paren = typed; brace = bare params)=`ILO-P001` `flt \x:t>body xs` (typed backslash)=`flt (x:t>r;body) xs` or `flt {x> body} xs`=`ILO-L001` `flt fn x:t>r;body xs` (`fn`-keyword inline)=`flt (x:t>r;body) xs` or `flt {x> body} xs`=`ILO-P009` `main:>n;body`=`main>n;body` (no `:` before `>`)=`ILO-P003` Multi-line body without braces=`@k xs{body}`, `cond{body}` on one line=`ILO-P003` `cond{^"err"}` braced-cond=Braceless `cond ^"err"` for early return=hint only `- -*a b *c d` (double-minus)=`- 0 +*a b *c d` (negate the sum)=`ILO-P021` `[k fmt2 v 2]` (call in list)=`[k (fmt2 v 2)]` or bind-first=`ILO-P101` `[login "a" logout "b"]` (variant ctor in list)=`[(login "a") (logout "b")]` or bind-first=`ILO-T047` `pts=gen-pts;cs0=[...];prnt cs0` at top level=`main>_;pts=gen-pts;cs0=[...];prnt cs0` (wrap in `main>_;`)=`ILO-P102` `((((...((1+1))))...))` 1000 deep=bind intermediates, or pass `--max-ast-depth N`=`ILO-P103` `dx=xj 0-xi` (call vs binop)=`-xj xi` or pre-bind: `nxi=0-xi;+xj nxi`=`ILO-T005` `wc==q ""` (no space, binding+equality)=`wc = =q ""` (single `=` to bind, then prefix `=a b` for equality)=`ILO-T005` `tup.0` / `pair.0` (tuple access)=bind from `zip`-pair, then `at pair 0` (no tuple
type)=`ILO-T004` `?? (num s) 0` (`??` on `R T E`)=`default-on-err (num s) 0` or `?(num s){~v:v;^_:0}`=`ILO-T041` `?bool{body}` (bool-conditional)=guard `=bool true body`, braced `=bool true{body}`, ternary `?bool a b`, or match `?bool{true:a; false:b}`=`ILO-P011` `(x:n>n;>=x 0 0;x)` (braceless guard inside lambda)=`(x:n>n;?>=x 0 0 x)` (prefix ternary) or `(x:n>n;?>=x 0{0}{x})` (braced match)=`ILO-P023` `+a+" "+b+c` (infix-style chain with leading prefix `+`)=drop the leading `+`: `a+" "+b+c`; or `fmt "{} {} {}" a b c`; or nested prefix `+a +" " +b c`; or bind intermediates=`ILO-P010` `fmt "{}" +0.1 0.2` -> `0.30000000000000004` (float Display = full IEEE 754)=`fmt "{:.2f}" (+0.1 0.2)` for human-readable; `fmt2 v N` for precise dp=docs only `*/ sz 0.3 0` ("scale then div by 0")=`*/a b c` is `(a/b)*c` — b is the divisor; for `(a*b)/c` use `/*sz 0.3 0` or bind `r=*sz 0.3;/r 0`=hint only `?h a b` (keyword form on bare ref)=`? a b` (bare-bool prefix ternary)=`ILO-W003` `pred q:t>b;=q "" 1;false` (guard tail literal)=`=q "" true;false` (tail value must match declared return type)=`ILO-T008` Each case fires a hint pointing at the canonical form; the agent's first retry should be the right one. Identifier-shaped collisions with builtin names (`len=...`, `sin=...`) are rejected with `ILO-P011` plus a rename suggestion. The list-literal call trap (`ILO-P101`) catches the case where a variadic builtin (`fmt`, `fmt2`) appears bare inside `[...]`. Fixed-arity builtins (`str`, `at`, `map`, ...) auto-expand to a call as one element, but variadic ones can't (the parser doesn't know where their args end), so the bare form would silently fall through as multiple elements with the builtin name as an undefined Ref. Fix by wrapping the call in parens (`[k (fmt2 v 2)]`) or binding first. The top-level chain trap (`ILO-P102`) catches a bare `name=expr` at the top level. 
ilo requires every binding to live inside a function body; a top-level `pts=gen-pts;cs0=[[...]]; ...; prnt cs2` without a `main>_;` (or any) header used to either die on the `=` (a bare `ILO-P003`) or get slurped into a previous function's body and emit a wall of misleading `ILO-T005` cascades on the wrong line. `ILO-P102` collapses both shapes into a single diagnostic that names the offending binding and suggests the canonical `main>_;` wrapper. The double-minus trap (`ILO-P021`) catches the silent-miscompile shape `- -<op> a b <op> c d` for `<op>` in `{+,*,/}`. It reads intuitively as `-(a*b) - (c*d)` but parses as `-((a*b) - (c*d)) = -(a*b) + (c*d)` because the inner `-` greedily consumes both prefix-binop groups as binary subtract and the outer `-` falls back to unary negate. Fix by negating the sum (`- 0 +*a b *c d`) or binding first (`p=*a b;q=*c d;- 0 +p q`). Single-atom variants like `- -a b` remain accepted since they're unambiguous. The glued-`==` binding trap (`ILO-T005` with the ILO-469 hint) catches `name==expr` written without a space. Both `=` and `==` lex as a single `Token::Eq`, so `wc==q ""` parses as the binding `wc = (q "")` - a call on `q` - and the verifier fails because `q` isn't a function. The hint names the missing space and shows the canonical rewrite `wc = =q ""` (single `=` for the binding, then prefix `=a b` for equality). ilo does not fuse `==` into a single bind-then-equality token; the diagnostic is a nudge, not a syntactic concession. The call-vs-binop trap (`ILO-T005` with tailored hint) catches the assignment-RHS shape `name expr` where `name` is a bound non-fn value (typically a parameter). Whitespace-juxtaposition is the call syntax in ilo, so `dx=xj 0-xi` parses as `dx=(xj 0)-xi` - a call to `xj` with argument `0`. Verification fails because `xj` isn't a function. The hint surfaces the prefix-operator alternatives (`-xj xi`, `+xj `) and the pre-bind workaround.
The misparse is most common when an agent reaches for infix arithmetic between a parameter and a subexpression; pre-binding the operand always resolves the ambiguity. `ilo --explain ILO-T005` includes the full gotcha walkthrough. The tuple-access trap (`ILO-T004` with the `at ` hint) catches `tup.0` / `pair.0` shapes where `tup` / `pair` was never bound. ilo has no tuple type. `zip xs ys` returns `L (L n)` — a list of two-element lists — so destructuring a pair is `at pair 0` / `at pair 1`, not `pair.0` / `pair.1`. The hint names the exact `at` call to write. (`pair.0` itself is still valid sugar for list indexing once `pair` is bound to an `L T`; the diagnostic only fires when the identifier is unbound.) The AST depth cap (`ILO-P103`) catches deeply nested source that would otherwise blow the parser stack. Any context that compiles untrusted text - `ilo serv`, the bare-positional dispatch, the `--ast` dump - is exposed to a payload of the shape `((((...((1+1))))...))` 1000 levels deep that recurses straight through the OS thread stack. The default cap of 256 is far above anything hand-written (the in-tree examples top out under 20) and low enough to keep the worst-case stack frame in `parse_atom`/`parse_expr` inside the default 8 MB main-thread stack. Override with `--max-ast-depth N` on `ilo`, `ilo run`, `ilo check`, `ilo build`, and `ilo serv` when a legitimate program needs deeper nesting. COMMENTS: -- full line comment +a b -- end of line comment -- no multi-line comments; use consecutive -- lines -- like this Single-line only. `--` to end of line. No multi-line comment syntax - newlines are a human display concern, not a language concern. An entire ilo program can be one line. Use consecutive `--` lines when humans need multi-line comments. Stripped at the lexer level before parsing - comments produce no AST nodes and cost zero runtime tokens. Generating `--` costs 1 LLM token, so comments are essentially free. 
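The lexer-level stripping described above can be sketched in a few lines. This is an illustrative model, not the ilo lexer: `strip_comment` is a hypothetical helper, it works on a single line, and it assumes that `--` inside a double-quoted string literal is text rather than a comment, which the spec text here does not state either way.

```rust
/// Return the part of `line` that precedes a `--` comment.
/// Assumption of this sketch: `--` inside a double-quoted string is not a
/// comment. Multi-line triple-quoted strings are not handled.
fn strip_comment(line: &str) -> &str {
    let bytes = line.as_bytes();
    let mut in_string = false;
    let mut i = 0;
    while i < bytes.len() {
        match bytes[i] {
            // Inside a string, a backslash escapes the next byte; skip it.
            b'\\' if in_string => i += 1,
            // A double quote opens or closes a string literal.
            b'"' => in_string = !in_string,
            // Two adjacent minus signs outside a string start a comment.
            b'-' if !in_string && bytes.get(i + 1) == Some(&b'-') => return &line[..i],
            _ => {}
        }
        i += 1;
    }
    line
}

fn main() {
    for line in ["+a b -- end of line comment", "--x 1", "- -x 1"] {
        println!("{:?} -> {:?}", line, strip_comment(line));
    }
}
```

Because `--` is matched greedily, a line such as `--x 1` strips to nothing, while `- -x 1` survives intact since the two minus signs are separated by a space.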
**Gotcha:** `--x 1` is a comment, not "negate (x minus 1)". The lexer matches `--` greedily as a comment and eats the rest of the line. To negate a subtraction, use a space or bind first: -- DON'T: --x 1 (comment, not negate-subtract) -- DO: - -x 1 (space separates the two minus operators) -- DO: r=-x 1;-r (bind first) OPERATORS: Both prefix and infix notation are supported. **Prefix is preferred** - it is the token-optimal form that eliminates parentheses and produces denser code. Infix is available for readability when needed. [Binary] `+a b`=`a + b`=add / concat / list concat=`n`, `t`, `L` `+=a v`=append to list (returns new list, see [Append semantics](#append-semantics-+=))=`L` `-a b`=`a - b`=subtract=`n` `*a b`=`a * b`=multiply=`n` `/a b`=`a / b`=divide=`n` `=a b`=`a == b`=equal (prefix `=` is preferred; `==a b` also accepted)=any `!=a b`=`a != b`=not equal=any `>a b`=`a > b`=greater than=`n`, `t` `<a b`=`a < b`=less than=`n`, `t` `>=a b`=`a >= b`=greater or equal=`n`, `t` `<=a b`=`a <= b`=less or equal=`n`, `t` `&a b`=`a & b`=logical AND (short-circuit)=any (truthy) `|a b`=`a | b`=logical OR (short-circuit)=any (truthy) [Append semantics (`+=`)] `+=xs v` is **pure-shaped**, despite the imperative-looking syntax. It returns a new list with `v` appended and does **not** mutate `xs` in the caller's scope. It works in every position a value-producing expression works: -- 1. Rebind (canonical accumulator pattern) xs=[];@i 0..3{xs=+=xs i};xs -- [0, 1, 2] -- 2. Non-rebind assignment (xs preserved) xs=[1, 2, 3];ys=+=xs 99 -- xs is still [1, 2, 3]; ys is [1, 2, 3, 99] -- 3. Pipeline / argument position len +=xs 99 -- length of [xs..., 99] sum +=xs 99 -- sum of [xs..., 99] The rebind shape `xs = +=xs v` is the standard foreach-build accumulator. When the binding is RC=1 the engines mutate the underlying buffer in place (amortised O(1) per push) - but this is a behind-the-scenes optimisation. To any observer the operation is still functional: nothing outside the rebind sees the old `xs`.
The non-rebind shape `ys = +=xs v` always allocates a fresh list and leaves `xs` untouched, so source aliases are safe. There is no separate `push` builtin. `+=` covers every use case and is shorter; adding an alias would mean two ways to spell the same operation, costing reasoning tokens and surface area. [Unary] `-x`=negate=`n` `!x`=logical NOT=any (truthy) [Special infix] `a??b`=nil-coalesce (if a is nil, return b)=any `a>>f`=pipe (desugar to `f(a)`)=any **`??` precedence.** Infix `??` is parsed by `maybe_nil_coalesce` after the primary expression — it binds **looser than every arithmetic, comparison, and boolean operator**, and tighter than `>>` (pipe). So `c??0+1` is `c ?? (0+1)`, not `(c??0) + 1`. Prefix `??x default` mirrors the infix form: the default slot is a full expression, exactly like the right operand of any other prefix binop. This means **`??` inside a prefix-binop chain follows the standard prefix-binop rule**: the outer op consumes its left atom, and `??` then binds the next atom as its value and the rest as its default. To get `(a ?? d) + b` you must bind first or wrap in parens: +a ??d b -- = a + (d ?? b) ← parses as prefix `??d b` +(a??d) b -- = (a ?? d) + b ← parens force the grouping x=a??d;+x b -- = (a ?? d) + b ← bind-first, manifesto-preferred The same shape applies to every prefix binop (`-a ??d b`, `*x ??y z`, `>p ??d r`, etc.). The grouping is consistent with `+a *b c` = `a + (b*c)` — a prefix op in the right-operand slot consumes its own operands greedily. The trap is that `??` reads visually like it should be sticky to the preceding atom; it isn't. When the LHS of `??` is the value being defaulted, bind first or wrap in parens. The analogous shape with the boolean operators (`+a |0 b`, `*a &1 b`) parses the same way, but those produce a type error at verify time (`+` / `*` on a bool result), so they fail loudly rather than silently miscompiling. 
The `??` shape is the dangerous one: both sides of `??` can be `n`, so the parse silently produces the wrong arithmetic. [Prefix nesting (no parens needed)] +*a b c -- (a * b) + c *a +b c -- a * (b + c) >=+x y 100 -- (x + y) >= 100 -*a b *c d -- (a * b) - (c * d) +a ??c 0 -- a + (c ?? 0) ← not (a ?? 0) + c *x ??y 1 -- x * (y ?? 1) ← not (x ?? y) * 1 The outer prefix op binds the inner prefix subexpression as its **left** operand, regardless of operator precedence. With two same-precedence ops side by side this is easy to misread: */a b c -- (a/b) * c ← NOT (a*b)/c, NOT a 3-arg compound op /*a b c -- (a*b) / c ← NOT (a/b)*c +-a b c -- (a-b) + c ← NOT (a+b)-c -+a b c -- (a+b) - c ← NOT (a-b)+c `*/` is **not** a 3-arg compound multiply-then-divide. It is the prefix-`*` op with a nested prefix-`/` subexpression as its left operand, so the divisor is the **second** atom (`b`), not the third (`c`). Reading agents commonly mis-write `*/ sz 0.3 0` expecting `sz * 0.3 / 0` and trip a divide-by-zero from the `/0.3`-shaped subexpression once the inputs make `b` zero, since they assumed the trailing `0` was the divisor. The runtime emits a `hint:` diagnostic when one of these four pairs appears at a prefix position, since the parse order disagrees with the natural left-to-right reading. To force the other grouping, swap the ops or bind the inner result first: -- Want (a*b)/c with a=6, b=2, c=3: r=*a b;/r c -- bind, then divide → 4 /*a b c -- equivalent, swapping the prefix-pair order [Infix-style chained `+` with a leading prefix `+`] Infix `a+b+c` parses cleanly. But adding a leading `+` (`+a+b+c`) flips the expression into **prefix mode**: the parser reads `+a` as a prefix binop, so the immediately-following `+b` orphans and the chain unwinds into `ILO-P010` (expected expression, got EOF). 
The parser detects the `+atom+atom+...` shape (adjacent `+` between atoms) and attaches a hint pointing at the canonical rewrites: +a+" "+b+" "+c -- rejected (ILO-P010 + targeted hint) a+" "+b+" "+c -- OK (pure infix concat) fmt "{} {} {}" a b c -- OK (formatted string) +a +" " +b +" " c -- OK (nested prefix, right-associative) s1=+a " ";s2=+s1 b;s3=+s2 " ";+s3 c -- OK (bind intermediates) The same rule applies to numeric chains (`+a+b+c` regardless of operand type). The fix is always one of: drop the leading `+`, switch to `fmt`, nest prefix ops with spaces, or bind intermediates. [Infix precedence] Standard mathematical precedence (higher binds tighter): 6=`*` `/` 5=`+` `-` `+=` 4=`>` `<` `>=` `<=` 3=`=` `!=` 2=`&` 1=`|` 0=`??` (binds looser than every arithmetic/boolean op; tighter than `>>`) Function application binds tighter than all infix operators: f a + b -- (f a) + b, NOT f(a + b) x * y + 1 -- (x * y) + 1 (x + y) * 2 -- parens override precedence Each nested prefix operator saves 2 tokens (no `(` `)` needed). Flat prefix like `+a b` saves 1 char vs `a + b`. Across 25 expression patterns, prefix notation saves **22% tokens** and **42% characters** vs infix. See [research/explorations/prefix-vs-infix/](research/explorations/prefix-vs-infix/) for the full benchmark. Disambiguation: `-` followed by one atom is unary negate, followed by two atoms is binary subtract. [Operands] Operator operands are **atoms** (literals, refs, field access), **nested prefix operators**, or **known-arity function calls**. 
The prefix-binop operand parser dispatches to call parsing when the ident at the cursor is a known-arity user fn or builtin AND the next token can start another operand: wh >len q 0{body} -- parses as wh > (len q) 0 { body } +f g h -- if f is 1-arity: BinOp(+, Call(f, [g]), h) -lnx 5 lnx 3 -- BinOp(-, Call(lnx, [5]), Call(lnx, [3])) dbl 5 -- Negate(Call(dbl, [5])) - unary on a call This parallels the `??` precedent: `??x default` accepts a call expression on the value side. Applies to every prefix-binop family member - `+`, `-`, `*`, `/`, comparisons, `&`, `|`, `+=` - and to unary negate when the call consumes the only operand. The same expansion also applies to the then/else slots of the prefix-ternary family (`?=cond a b`, `?>cond a b`, …) and the `?h cond a b` keyword form, so `?h =a b sev sc "NONE"` parses `sev sc` as a nested call without parens or a bind-first. Bare locals that shadow a user fn name still resolve via `Ref` rather than expanding into a zero-arg call, so `&e f{...}` where `f` is a local still parses as the bool operator with two refs. When the call expansion isn't available (the ident is a local that shadows a fn name, or the call's arity doesn't fit the remaining tokens), bind the call result first: r=fac p;*n r -- bind, then operate - always unambiguous **Negative literals vs binary minus**: the lexer greedily includes a leading `-` into number tokens. `-1`, `-7`, `-0` are all number literals at fresh-expression positions. To subtract from zero at the start of a statement, use a space: `- 0 v` (Minus token, then `0`, then `v`). f v:n>n;-0 v -- WRONG: -0 is Number(-0.0); v is a stray token f v:n>n;- 0 v -- OK: binary subtract: 0 - v = -v The lexer splits a glued negative literal back into `Minus + Number` when the previous token is one of `;`, `\n`, `=`, `{`, `(`, or `-`. 
The `-` context covers the operand slot of an outer prefix-minus, so `- -0 a b` lexes as `-, -, 0, a, b` and parses as `Subtract(Subtract(0, a), b)` = `-a - b` rather than tripping `ILO-P020`. Negative literals after an Ident, `[`, or another prefix binop (`+`, `*`, `/`) stay glued so call args (`at xs -1`), list literals (`[-2 1 3]`), and binary operands (`+a -3`) read naturally. **Subtraction spacing convention**: for general subtraction at statement position, write `a - b` with spaces on **both** sides. `a -b` (glued, no space before the `-`) is not a binary subtract: the lexer packs `-b` into a negative-literal token because the previous token (`a`, an Ident) is one of the keep-glued contexts above. That's deliberate so call args and list elements read naturally, but it means `0 -1.5` is a parse error (`ILO-P001: expected declaration, got number `-1.5`` with a tailored hint pointing at this rule). For a bare negative value as an expression, wrap in parens: `(-1.5)`. STRING LITERALS: Text values are written in double quotes. Escape sequences: `\n`=newline (0x0A) `\t`=tab (0x09) `\r`=carriage return (0x0D) `\f`=form feed (0x0C, PDF page separator) `\b`=backspace (0x08) `\v`=vertical tab (0x0B) `\a`=bell (0x07) `\0`=null (0x00) `\"`=literal double quote `\\`=literal backslash `\/`=literal forward slash (JSON passthrough) Unknown escapes (e.g. `\z`) preserve the backslash + char verbatim. "hello\nworld" -- two-line string "col1\tcol2" -- tab-separated spl text "\n" -- split file content into lines spl pdf "\f" -- split pdftotext output into pages [Triple-quoted strings: `"""..."""`] Same surface as `"..."` (same escape decoding, same `{name}` interpolation) with two extra affordances: 1. Raw newlines are allowed inside the literal, so multi-line content does not need `cat`-concatenation or `\n` escapes. 2. 
When the closing `"""` sits on its own line, the leading newline is dropped and the common leading whitespace (matching the indent of the closing-`"""` line) is stripped from every content line. The terminating `\n` of the last content line is preserved. This is the Python PEP 257 / Rust `indoc!` convention, so indented source produces clean output. banner>t """ line one line two """ -- value is "line one\nline two\n" inline>t """foo bar""" -- value is "foo\n bar" (no dedent: closing inline) len """hello""" -- 5 (single-line form, no newline) len """""" -- 0 (empty body) Inside `"""..."""` a single `"` is literal: only `"""` ends the literal. Escapes (`\n`, `\t`, ...) and `{name}` interpolation decode identically to the single-quoted form, so triple-quoted is a drop-in upgrade rather than a parallel surface. [Interpolation: `{name}`] A bare `{name}` slot inside a double-quoted string desugars at parse time to a `fmt` call with the binding looked up by name. Manifesto principle 1: `"hello {name}"` is cheaper for an agent to write than the verbose `fmt "hello {}" name`, and both produce the same AST so they cost nothing extra at verify or run time. greet name:t>t fmt "hello {name}" -- desugars to: fmt "hello {}" name pair a:t b:t>t fmt "{a} and {b}" -- multiple slots, resolved left-to-right with-braces name:t>t fmt "{{json}} {name}" -- {{ / }} escape to literal { / } Scope (deliberately tight to keep the surface predictable): Only single-identifier slots matching the ident regex (`[a-z][a-z0-9]*(-[a-z0-9]+)*`). `{a-b}` works; `{Foo}`, `{x + 1}`, `{ }` pass through verbatim. `{{` / `}}` escape to literal `{` / `}` in **every** double-quoted string literal (Rust `format!` / Python `str.format` convention). The agent always has a way to emit a literal `{` or `}` without dropping to `chr 123` + concat. A lone unmatched `{` or `}` in a string literal is a parse error (`ILO-P024`) whose hint points at `{{` / `}}` as the canonical escape. Matched but non-ident `{...}` (e.g. 
`{Foo}`, `{x+1}`) still passes through verbatim so existing `fmt` templates keep working. Bare `{}` keeps its existing meaning as a positional placeholder filled by trailing args of the enclosing `fmt` call. Mixing `{ident}` and bare `{}` in the same string is left verbatim: pick one style per string. Use `fmt "{name} {} done" other` and the parser keeps the `{name}` literal so the bare `{}` resolves to `other`, or write `"{name} {other} done"` and drop the trailing arg. Undefined `{name}` slots surface as a normal ILO-T004 undefined-variable diagnostic against the desugared `fmt` arg, not a silent empty substitution. Interpolation does not apply in pattern literals (`"foo":` arm of a match) - literal patterns stay literal. diff --git a/src/verify.rs b/src/verify.rs index fd52d6f4..06ca9a5a 100644 --- a/src/verify.rs +++ b/src/verify.rs @@ -5313,6 +5313,7 @@ impl VerifyContext { condition, body, else_body, + braceless, .. } => { let _ = self.infer_expr(func, scope, condition, span); @@ -5354,6 +5355,49 @@ impl VerifyContext { scope.pop(); } + // ILO-468: a braceless guard early-returns the value of its + // tail expression. The function's existing return-type check + // only sees the body's *last* statement, so a guard with a + // type-wrong tail value followed by a type-correct fallback + // (`cond wrong-val;fallback`) slips through silently. Surface + // the mismatch here, against the enclosing function's + // declared return type. 
+ if *braceless + && let Some(last) = body.last() + && let Stmt::Expr(_) = &last.node + && body_ty != Ty::Unknown + && let Some(sig) = self.functions.get(func) + { + let expected = sig.return_type.clone(); + if expected != Ty::Unknown + && !compatible_ext(&body_ty, &expected, &self.types) + { + let hint = match (&body_ty, &expected) { + (Ty::Number, Ty::Text) => { + Some("use 'str' to convert: str ".to_string()) + } + (Ty::Text, Ty::Number) => { + Some("use 'num' to parse text (returns R n t)".to_string()) + } + (Ty::Number, Ty::Bool) => Some( + "guards return early — the tail value must be the function's return type. To test for non-zero, use `!=val 0`; for bool from comparison, the comparison itself already yields bool".to_string(), + ), + _ => Some(format!( + "the braceless-guard tail value is an early return — change it to {expected}, or adjust the function's return type" + )), + }; + self.err( + "ILO-T008", + func, + format!( + "braceless-guard tail value type mismatch: expected {expected}, got {body_ty}" + ), + hint, + Some(last.span), + ); + } + } + body_ty } Stmt::Match { subject, arms } => { @@ -7925,6 +7969,59 @@ mod tests { assert!(parse_and_verify("f x:b>t;!x{\"yes\"};\"no\"").is_ok()); } + // ILO-468: a braceless guard early-returns its tail value. When the tail + // value's type doesn't match the function's declared return type, the + // fallback expression after the guard could mask the mismatch — the + // function body's last_ty equals the fallback's type, so the existing + // return-type check passes and the bug silently slips through. + #[test] + fn ilo468_braceless_guard_tail_wrong_type_named_fn() { + // pred returns b (bool); guard tail `1` is n (number); fallback `false` + // is bool, which by itself satisfies the function return type. Without + // the guard-tail check this passed verification silently. 
+ let result = parse_and_verify("pred q:t>b;=q \"\" 1;false"); + assert!(result.is_err(), "expected ILO-T008 for guard-tail mismatch"); + let errors = result.unwrap_err(); + assert!( + errors.iter().any(|e| e.code == "ILO-T008" + && e.message.contains("braceless-guard tail value type mismatch")), + "expected ILO-T008 braceless-guard-tail diagnostic, got {errors:?}" + ); + } + + #[test] + fn ilo468_braceless_guard_tail_wrong_type_inline_lambda() { + // Same trap inside an inline lambda passed to `flt`. The synthetic + // lambda function `__lit_0` must surface the mismatch. + let result = + parse_and_verify("m xs:L t>L t;flt (x:t>b;=x \"\" 1;false) xs"); + assert!(result.is_err(), "expected ILO-T008 for lambda guard-tail mismatch"); + let errors = result.unwrap_err(); + assert!( + errors.iter().any(|e| e.code == "ILO-T008" + && e.message.contains("braceless-guard tail value type mismatch")), + "expected ILO-T008 braceless-guard-tail diagnostic, got {errors:?}" + ); + } + + #[test] + fn ilo468_braceless_guard_tail_matching_type_ok() { + // Tail value is bool (matches declared return) — no diagnostic. + assert!(parse_and_verify("pred q:t>b;=q \"\" true;false").is_ok()); + } + + #[test] + fn ilo468_braceless_guard_negated_matching_type_ok() { + // `>` form of braceless guard, matching type — still clean. + assert!(parse_and_verify("pred q:t>b;>q \"\" false;true").is_ok()); + } + + #[test] + fn ilo468_braceless_guard_number_return_ok() { + // Function declared to return n; guard tail is also n — clean. + assert!(parse_and_verify("fz n:n>n;=n 0 99;n").is_ok()); + } + #[test] fn index_on_non_list() { let result = parse_and_verify("f x:n>n;x.0");
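Reviewer note: the decision logic of the new ILO-468 check can be exercised in isolation with a simplified model. The sketch below is an assumption-laden stand-in, not the code in `src/verify.rs`: `Ty` is reduced to four variants, plain equality replaces `compatible_ext`, and `guard_tail_hint` is a hypothetical name that returns only the hint text.

```rust
// Simplified, hypothetical model of the ILO-468 guard-tail check.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Ty {
    Number,
    Text,
    Bool,
    Unknown,
}

/// Returns a hint when a braceless-guard tail type disagrees with the
/// declared return type, or `None` when no diagnostic is needed.
fn guard_tail_hint(tail: Ty, declared: Ty) -> Option<String> {
    // Unknown on either side means inference gave up; stay silent.
    if tail == Ty::Unknown || declared == Ty::Unknown {
        return None;
    }
    // Assumption: plain equality stands in for `compatible_ext`.
    if tail == declared {
        return None;
    }
    Some(match (tail, declared) {
        (Ty::Number, Ty::Text) => "use 'str' to convert".to_string(),
        (Ty::Text, Ty::Number) => "use 'num' to parse text (returns R n t)".to_string(),
        (Ty::Number, Ty::Bool) => {
            "guards return early; the tail value must be the function's return type".to_string()
        }
        _ => format!(
            "change the tail value to {declared:?}, or adjust the function's return type"
        ),
    })
}

fn main() {
    // `pred q:t>b;=q "" 1;false`: tail `1` is a number, declared return is bool.
    println!("{:?}", guard_tail_hint(Ty::Number, Ty::Bool));
    // `pred q:t>b;=q "" true;false`: tail matches, so no diagnostic.
    println!("{:?}", guard_tail_hint(Ty::Bool, Ty::Bool));
}
```

The two calls in `main` correspond to the failing and passing `pred` cases in the tests above; the fallback expression after the guard plays no part in this check, which is the gap the change closes.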