A general-purpose tokenizer and Markdown parser with HTML rendering for Go.
- Lexical Scanner: Tokenizes text into identifiers, numbers, strings, operators, and punctuation
- Markdown Parser: Converts Markdown text into an Abstract Syntax Tree (AST)
- HTML Renderer: Renders Markdown AST to HTML with proper escaping
- Configurable: Optional features like comment parsing, newline handling, and float parsing
```bash
go get github.com/mutablelogic/go-tokenizer
```

Requires Go 1.23 or later.
Tokenizing a string:

```go
package main

import (
	"fmt"
	"strings"

	"github.com/mutablelogic/go-tokenizer"
)

func main() {
	scanner := tokenizer.NewScanner(strings.NewReader("hello world 123"), tokenizer.Pos{})
	for {
		tok := scanner.Next()
		if tok.Kind == tokenizer.EOF {
			break
		}
		fmt.Printf("%s: %q\n", tok.Kind, tok.Value)
	}
}
```

Output:

```
Ident: "hello"
Space: " "
Ident: "world"
Space: " "
NumberInteger: "123"
```

Parsing Markdown and rendering it to HTML:

```go
package main

import (
	"fmt"
	"strings"

	"github.com/mutablelogic/go-tokenizer"
	"github.com/mutablelogic/go-tokenizer/pkg/markdown"
	"github.com/mutablelogic/go-tokenizer/pkg/markdown/html"
)

func main() {
	input := `# Hello World

This is **bold** and _italic_ text.

- Item 1
- Item 2
- Item 3
`
	doc := markdown.Parse(strings.NewReader(input), tokenizer.Pos{})
	output := html.RenderString(doc)
	fmt.Println(output)
}
```

Output:
```html
<h1>Hello World</h1><p>This is <strong>bold</strong> and <em>italic</em> text.</p><ul><li>Item 1</li><li>Item 2</li><li>Item 3</li></ul>
```

The `tokenizer` package provides the lexical scanner that breaks input text into tokens.
Token Types:

- `Ident` - Identifiers (`hello`, `world`)
- `NumberInteger`, `NumberFloat`, `NumberHex`, `NumberOctal`, `NumberBinary` - Numbers
- `String`, `QuotedString` - String literals
- `Hash`, `Asterisk`, `Underscore`, `Backtick`, `Tilde` - Special characters
- `Space`, `Newline` - Whitespace
- `Comment` - Comments (when enabled)
- And more...
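To make the token kinds above concrete, here is a minimal, self-contained sketch of how a scanner might group characters into `Ident`, `NumberInteger`, and `Space` tokens. It is not the library's implementation: the `Kind`, `Token`, `classify`, and `scan` names are hypothetical and only illustrate the idea, and the `Other` kind is a stand-in for everything the sketch does not model.

```go
package main

import (
	"fmt"
	"unicode"
)

// Kind is a hypothetical token category, named after the kinds listed above.
type Kind string

const (
	Ident         Kind = "Ident"
	NumberInteger Kind = "NumberInteger"
	Space         Kind = "Space"
	Other         Kind = "Other" // stand-in for kinds this sketch does not model
)

// Token pairs a kind with the text it covers.
type Token struct {
	Kind  Kind
	Value string
}

// classify returns the kind a single rune belongs to.
func classify(r rune) Kind {
	switch {
	case unicode.IsLetter(r):
		return Ident
	case unicode.IsDigit(r):
		return NumberInteger
	case unicode.IsSpace(r):
		return Space
	default:
		return Other
	}
}

// scan groups consecutive runes of the same kind into one token.
// Runes of kind Other are emitted one per token.
func scan(input string) []Token {
	var tokens []Token
	runes := []rune(input)
	for i := 0; i < len(runes); {
		kind := classify(runes[i])
		j := i + 1
		for kind != Other && j < len(runes) && classify(runes[j]) == kind {
			j++
		}
		tokens = append(tokens, Token{Kind: kind, Value: string(runes[i:j])})
		i = j
	}
	return tokens
}

func main() {
	for _, tok := range scan("hello world 123") {
		fmt.Printf("%s: %q\n", tok.Kind, tok.Value)
	}
}
```

The sketch deliberately simplifies: an input such as `abc123` is split into two tokens here, whereas a real scanner would normally allow digits inside an identifier.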
Scanner Features:

```go
// Enable features with bitwise OR.
// r is an io.Reader and pos a tokenizer.Pos, as in the example above.
scanner := tokenizer.NewScanner(r, pos,
	tokenizer.HashComment| // # style comments
		tokenizer.LineComment| // // style comments
		tokenizer.BlockComment| // /* */ style comments
		tokenizer.NewlineToken| // Emit newlines as separate tokens
		tokenizer.UnderscoreToken| // Emit underscores as separate tokens
		tokenizer.NumberFloatToken, // Parse floating point numbers
)
```

The `ast` package defines the AST node types and tree traversal.
```go
// Node interface
type Node interface {
	Kind() Kind
	Children() []Node
}

// Walk the AST, printing each node kind indented by its depth
ast.Walk(doc, func(node ast.Node, depth int) error {
	fmt.Printf("%s%s\n", strings.Repeat(" ", depth), node.Kind())
	return nil
})
```

The `markdown` package parses Markdown text into an AST.
Supported Syntax:

- Headings: `# H1` through `###### H6`
- Paragraphs: Text separated by blank lines
- Emphasis: `_italic_` or `*italic*`
- Strong: `__bold__` or `**bold**`
- Strikethrough: `~~deleted~~`
- Inline code: `` `code` ``
- Code blocks: ```` ```language ... ``` ````
- Links: `[text](url)` or `<url>`
- Images: `![alt](url)`
- Blockquotes: `> quoted text`
- Unordered lists: `- item`, `* item`, or `+ item`
- Ordered lists: `1. item` or `1) item`
- Horizontal rules: `---`, `***`, or `___`
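As an illustration of the heading rule in the list above, the following sketch recognises a `#`-style heading line and extracts its level and text. It assumes the CommonMark convention that the marker is one to six `#` characters followed by a space; the parser in this package may be more or less strict, and `headingLevel` is a hypothetical helper, not part of its API.

```go
package main

import (
	"fmt"
	"strings"
)

// headingLevel reports the heading level (1-6) of a line and its text.
// It returns level 0 and the unchanged line when the line is not a heading.
func headingLevel(line string) (int, string) {
	trimmed := strings.TrimLeft(line, "#")
	level := len(line) - len(trimmed)
	if level < 1 || level > 6 {
		return 0, line
	}
	// Assumption: the marker must be followed by a space (or nothing at all).
	if trimmed != "" && !strings.HasPrefix(trimmed, " ") {
		return 0, line
	}
	return level, strings.TrimSpace(trimmed)
}

func main() {
	lines := []string{"# H1", "###### H6", "####### too deep", "#hashtag", "plain text"}
	for _, line := range lines {
		level, text := headingLevel(line)
		fmt.Printf("level=%d text=%q\n", level, text)
	}
}
```

Only the first two lines are reported as headings; the other three come back with level 0 because they have too many markers, no space after the marker, or no marker at all.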
The `html` package renders a Markdown AST to HTML.

```go
// Render to string
output := html.RenderString(doc)

// Render to io.Writer with indentation
renderer := html.NewRenderer(w).WithIndent(true)
err := renderer.Render(doc)
```

Features:

- HTML escaping of rendered content to prevent XSS
- Optional indented output for readability
- Language classes on code blocks: `<code class="language-go">`
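The escaping and language-class features can be pictured with a short sketch built on Go's standard library `html` package (a different package from the renderer described here). The `renderCodeBlock` helper is hypothetical and only shows the general technique; the renderer's actual output may differ in detail.

```go
package main

import (
	"fmt"
	"html"
)

// renderCodeBlock wraps source in <pre><code>, escaping it and adding a
// language class when a language is given. Hypothetical helper for illustration.
func renderCodeBlock(language, source string) string {
	class := ""
	if language != "" {
		class = fmt.Sprintf(` class="language-%s"`, html.EscapeString(language))
	}
	return fmt.Sprintf("<pre><code%s>%s</code></pre>", class, html.EscapeString(source))
}

func main() {
	// The angle bracket is escaped, so it cannot open a tag in the output.
	fmt.Println(renderCodeBlock("go", "a < b"))
	// Without a language, no class attribute is emitted.
	fmt.Println(renderCodeBlock("", "<script>alert(1)</script>"))
}
```

Escaping every piece of user-supplied text before it reaches the output is what keeps markup embedded in the Markdown source from being interpreted by the browser.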
The AST node kinds and the HTML each one renders to:

| Kind | Description | HTML Output |
|---|---|---|
| `Document` | Root node | (container) |
| `Paragraph` | Text block | `<p>...</p>` |
| `Heading` | H1-H6 | `<h1>...</h1>` |
| `Text` | Plain text | (escaped text) |
| `Emphasis` | Italic | `<em>...</em>` |
| `Strong` | Bold | `<strong>...</strong>` |
| `Strikethrough` | Deleted | `<del>...</del>` |
| `Code` | Inline code | `<code>...</code>` |
| `CodeBlock` | Fenced code | `<pre><code>...</code></pre>` |
| `Link` | Hyperlink | `<a href="...">...</a>` |
| `Image` | Image | `<img src="..." alt="..."/>` |
| `Blockquote` | Quote | `<blockquote>...</blockquote>` |
| `List` | Ordered/Unordered | `<ol>...</ol>` or `<ul>...</ul>` |
| `ListItem` | List item | `<li>...</li>` |
| `HorizontalRule` | Divider | `<hr/>` |
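To show how a kind-to-HTML mapping like this can drive rendering, here is a self-contained sketch that walks a small tree depth-first and emits tags for a few of the kinds. The `node` struct, the `tags` map, and the `render` and `renderString` functions are hypothetical; only the two-method `Node` interface mirrors the one shown earlier, and the real renderer is likely structured differently.

```go
package main

import (
	"fmt"
	"html"
	"strings"
)

// Kind identifies a node type; only a few kinds from the table are modelled.
type Kind string

const (
	Document  Kind = "Document"
	Paragraph Kind = "Paragraph"
	Text      Kind = "Text"
	Emphasis  Kind = "Emphasis"
	Strong    Kind = "Strong"
)

// Node mirrors the two-method interface shown earlier.
type Node interface {
	Kind() Kind
	Children() []Node
}

// node is a hypothetical concrete node; value is only used by Text nodes.
type node struct {
	kind     Kind
	value    string
	children []Node
}

func (n *node) Kind() Kind       { return n.kind }
func (n *node) Children() []Node { return n.children }

// tags maps container kinds to HTML element names. Document has no entry,
// so it renders only its children.
var tags = map[Kind]string{Paragraph: "p", Emphasis: "em", Strong: "strong"}

// render writes n and its descendants depth-first into sb.
func render(sb *strings.Builder, n Node) {
	if t, ok := n.(*node); ok && t.kind == Text {
		sb.WriteString(html.EscapeString(t.value))
		return
	}
	tag, hasTag := tags[n.Kind()]
	if hasTag {
		sb.WriteString("<" + tag + ">")
	}
	for _, child := range n.Children() {
		render(sb, child)
	}
	if hasTag {
		sb.WriteString("</" + tag + ">")
	}
}

// renderString renders a tree to a string.
func renderString(n Node) string {
	var sb strings.Builder
	render(&sb, n)
	return sb.String()
}

func main() {
	doc := &node{kind: Document, children: []Node{
		&node{kind: Paragraph, children: []Node{
			&node{kind: Text, value: "This is "},
			&node{kind: Strong, children: []Node{&node{kind: Text, value: "bold"}}},
			&node{kind: Text, value: " & safe"},
		}},
	}}
	fmt.Println(renderString(doc))
}
```

Keeping the mapping in a table rather than in per-kind methods makes it easy to see at a glance which kinds are containers and which produce text.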
Apache 2.0 - see LICENSE for details.