Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 8 additions & 0 deletions config.json
Original file line number Diff line number Diff line change
Expand Up @@ -809,6 +809,14 @@
"practices": [],
"prerequisites": [],
"difficulty": 9
},
{
"slug": "assemble",
"name": "assemble",
"uuid": "17337704-ef28-490b-bbdd-f8e3f51e326b",
"practices": [],
"prerequisites": [],
"difficulty": 10
}
]
},
Expand Down
141 changes: 141 additions & 0 deletions exercises/practice/assemble/.docs/instructions.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,141 @@
# Instructions

In this exercise, you will define the syntax and execution model for a small assembly-like language inspired by the `x86-64` assembly language.

The goal is to formalize how code is written, parsed, and executed within a constrained environment.

## Execution Model

Your assembly code will be written inside a special construct:

```lean
let program := assemble!(
-- assembly code here
)
```

This construct must produce a **program**, which simulates a function that executes the assembly code and returns a value.
This program is then invoked using the following syntax, where `a`, `b`, `c`, etc. are arguments:

```lean
let result := program(a, b, c, ...)
```

All arguments and the return value are _64-bit signed integers_.
A maximum of _6 arguments_ is allowed.

## Registers

The return value and all arguments are passed in **registers**.
Your language must support the following registers:

| register | role |
|----------|--------------|
| `rdi` | 1st argument |
| `rsi` | 2nd argument |
| `rdx` | 3rd argument |
| `rcx` | 4th argument |
| `r8` | 5th argument |
| `r9` | 6th argument |
| `rax` | return value |

Registers start with a default value of `0`, unless they are used to pass arguments to the function.

For example, in the following program, all registers have a value of `0` with the exception of `rdi`, which is initialized with `10`, and `rsi`. which is initialized with `-20`:

```lean
program(10, -20)
```

The return value of the program is always stored in `rax`.

Note that register names are _case-insensitive_.
This means that `rax`, `RAX` and `rAx` all refer to the same register.

## Assembly code

Each line in the assembly code can be of two forms:

- **Labels** mark specific places of the program to be used by some instructions.
- **Instructions** represent operations executed by the program.

Unless modified by some instruction, the execution flow of the program proceeds linearly.
This means that a previous line is fully executed before the line after it is executed.

### Labels

Labels have the following syntax:

```lean
<label>:
```

They do not alter the values of any register or have any effect to the program other than marking specific places of the code so they can be used by instructions.

Note that labels are _case-sensitive_.
This means that `Start`, `start` and `sTart` are all different labels.

### Two-operand instructions

Most instructions have the following syntax:

```lean
<opcode> <destination>, <source>
```

The `opcode` indicates the operation being executed, using the values of the `destination` and `source` operands.
The result of the operation is stored in the `destination` operand.

For example, the instruction `add` sums the values in the `destination` and the `source` operands and stores this result in the `destination` operand.

The `destination` operand is _always_ a register, whereas the `source` operand may be a register or a literal number.

This is a list of instructions your program must support:

| opcode | operation performed |
|--------|---------------------------------------|
| `mov` | destination := source |
| `add` | destination := destination + source |
| `sub` | destination := destination - source |
| `mul` | destination := destination * source |
| `div` | destination := destination / source |
| `xor` | destination := destination ^^^ source |
| `and` | destination := destination &&& source |
| `or` | destination := destination ||| source |
| `shl` | destination := destination <<< source |
| `shr` | destination := destination >>> source |

Other than those, your program must support the two-operand `cmp` instruction.

This instruction does not modify any register, but instead compares the `destination` and the `source` operands and sets an internal state of the program to one of three states:

- _greater than_, if `destination` > `source`
- _equal_, if `destination` == `source`
- _lesser than_, if `destination` < `source`

How you keep track internally of this state is up to you.

### One-operand instructions

There are some instructions that alter the flow of the program, transferring execution to another point of the code.
They are called _jumping_ instructions.

They all take just one operand, which is a label that indicates the target of the jump, i.e., the point of the code where execution will continue:

```lean
<opcode> <label>
```

The `jmp` instruction _always_ makes the jump to the target label.

Other jumping instructions make the jump only if the internal state of the program is set to a specific value.
This means they are usually preceded by a `cmp` instruction.

Those are the jumping instructions your program must support:

| instruction | operation performed |
|-------------|----------------------------------------------------|
| `jmp` | unconditional jump. The jump is always performed |
| `je` | jumps if the internal state is _equal_ |
| `jl` | jumps if the internal state is _less than_ |
| `jg` | jumps if the internal state is _greater than_ |
157 changes: 157 additions & 0 deletions exercises/practice/assemble/.meta/Example.lean
Original file line number Diff line number Diff line change
@@ -0,0 +1,157 @@
import Std.Data.HashMap

open Lean Syntax Macro Std

structure CPU where
regs : Vector Int64 10
deriving Repr

structure Program where
labels : HashMap String Nat
ops : Array (List String)

private def getStr (id : TSyntax `ident) : String :=
id.getId.toString.toLower

private def regIdx (reg : String) : Fin 10 :=
match reg with
| "rdi" => 0
| "rsi" => 1
| "rdx" => 2
| "rcx" => 3
| "r8" => 4
| "r9" => 5
| "r10" => 6
| "r11" => 7
| "rax" => 8
| "rflags" => 9
| _ => panic! "invalid register"

private def getVal (cpu : CPU) (reg : String) : Int64 :=
match reg.toInt? with
| some n => n.toInt64
| none => cpu.regs[regIdx reg]!

private def unFlag : Int64 := 1
private def eqFlag : Int64 := (1 : Int64) <<< 6
private def gtFlag : Int64 := (1 : Int64) <<< 5
private def ltFlag : Int64 := (1 : Int64) <<< 4

private def compareRegs (dest src : Int64) : Int64 := Id.run do
let mut acc : Int64 := 0
acc := acc ||| ((dest == src).toInt64 <<< 6) -- equal
acc := acc ||| ((decide (dest > src)).toInt64 <<< 5) -- greater
acc := acc ||| ((decide (dest < src)).toInt64 <<< 4) -- lesser
return acc

private def dispatchCondJmp (cpu : CPU) (dest : String) (labels : Std.HashMap String Nat) (flag : Int64) (RIP : Nat) : Nat :=
if flag == unFlag || getVal cpu "rflags" &&& flag != 0 then
if h₀: dest ∈ labels then
labels[dest] + 1
else panic! "Invalid label"
else RIP + 1

declare_syntax_cat x86_dest
syntax ident : x86_dest

declare_syntax_cat x86_src
syntax ident : x86_src
syntax num : x86_src
syntax "-" num : x86_src

declare_syntax_cat x86_inst
syntax ident x86_dest "," x86_src : x86_inst
syntax ident x86_dest : x86_inst
syntax ident ":" : x86_inst

syntax "assemble!(" (x86_inst)* ")" : term
syntax term "(" (term),* ")" : term

private def Program.evaluate (args : Array Int64) (program : Program) : Int64 := Id.run do
let regs := (Array.finRange 6).foldl (init := Vector.replicate 10 0) fun acc i =>
acc.set i (args[i]?.getD 0)

let mut acc : CPU := { regs }
let mut i := 0

while h: i < program.ops.size do
match program.ops[i] with
| [op, dest, src] =>
i := i + 1
let srcVal := getVal acc src
let destVal := getVal acc dest
match op with
| "mov" => acc := { acc with regs := acc.regs.set (regIdx dest) srcVal }
| "add" => acc := { acc with regs := acc.regs.set (regIdx dest) (destVal + srcVal) }
| "sub" => acc := { acc with regs := acc.regs.set (regIdx dest) (destVal - srcVal) }
| "xor" => acc := { acc with regs := acc.regs.set (regIdx dest) (destVal ^^^ srcVal) }
| "and" => acc := { acc with regs := acc.regs.set (regIdx dest) (destVal &&& srcVal) }
| "or" => acc := { acc with regs := acc.regs.set (regIdx dest) (destVal ||| srcVal) }
| "shl" => acc := { acc with regs := acc.regs.set (regIdx dest) (destVal <<< srcVal) }
| "shr" => acc := { acc with regs := acc.regs.set (regIdx dest) (destVal >>> srcVal) }
| "mul" => acc := { acc with regs := acc.regs.set (regIdx dest) (destVal * srcVal) }
| "div" => acc := { acc with regs := acc.regs.set (regIdx dest) (destVal / srcVal) }
| "cmp" => acc := { acc with regs := acc.regs.set (regIdx "rflags") (compareRegs destVal srcVal) }
| _ => continue
| [op, dest] =>
match op with
| "jmp" => i := dispatchCondJmp acc dest program.labels unFlag i
| "je" => i := dispatchCondJmp acc dest program.labels eqFlag i
| "jl" => i := dispatchCondJmp acc dest program.labels ltFlag i
| "jg" => i := dispatchCondJmp acc dest program.labels gtFlag i
| _ => i := i + 1
| _ => i := i + 1

return acc.regs.get $ regIdx "rax"

macro_rules
| `(assemble!( $[$insts]* )) => do
let mut labelNames : Array (TSyntax `term) := #[]
let mut labelIndices : Array (TSyntax `term) := #[]
let mut ops : Array (TSyntax `term) := #[]

for i in [:insts.size] do
match insts[i]! with
| `(x86_inst| $opId:ident $dest:x86_dest , $src:x86_src) =>
match dest with
| `(x86_dest| $destId:ident) =>
match src with
| `(x86_src| $srcId:ident) =>
let opStr := mkStrLit $ getStr opId
let destStr := mkStrLit $ getStr destId
let srcStr := mkStrLit $ getStr srcId
ops := ops.push (← `([$opStr, $destStr, $srcStr]))
| `(x86_src| $srcN:num) =>
let opStr := mkStrLit $ getStr opId
let destStr := mkStrLit $ getStr destId
let srcStr := mkStrLit $ s!"{srcN.getNat}"
ops := ops.push (← `([$opStr, $destStr, $srcStr]))
| `(x86_src| -$srcN:num) =>
let opStr := mkStrLit $ getStr opId
let destStr := mkStrLit $ getStr destId
let srcStr := mkStrLit $ s!"-{srcN.getNat}"
ops := ops.push (← `([$opStr, $destStr, $srcStr]))
| _ => throwUnsupported
| _ => throwUnsupported
| `(x86_inst| $opId:ident $dest:x86_dest) =>
match dest with
| `(x86_dest| $destId:ident) =>
let opStr := mkStrLit $ getStr opId
let destStr := mkStrLit destId.getId.toString
ops := ops.push (← `([$opStr, $destStr]))
| _ => throwUnsupported
| `(x86_inst| $labelId:ident :) =>
labelNames := labelNames.push (mkStrLit labelId.getId.toString)
labelIndices := labelIndices.push (mkNumLit s!"{i}")
ops := ops.push (← `([]))
| _ => throwUnsupported

`(({
ops := #[$[$ops],*],
labels := HashMap.ofList [ $[ ($labelNames, $labelIndices) ],* ]
} : Program)
)
| `($program( $[$args],* )) => do
let args' ← args.mapM fun a => `(($a : Int64))
let result ← `(Program.evaluate #[$[$args'],*] $program)
`(($result))
18 changes: 18 additions & 0 deletions exercises/practice/assemble/.meta/config.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
{
"authors": [
"oxe-i"
],
"files": {
"solution": [
"Assemble.lean"
],
"test": [
"AssembleTest.lean"
],
"example": [
".meta/Example.lean"
]
},
"icon": "binary",
"blurb": "Define the syntax of a minimal x86-64–inspired assembly language"
}
Loading
Loading