Metanorm

A high-performance Unicode processing framework for Swift, built with Metal GPU acceleration and optimized for modern text processing applications.

Overview

Metanorm provides a comprehensive solution for Unicode text processing, including normalization, encoding/decoding, and tokenization operations. The framework is designed with performance in mind, utilizing Apple's Metal framework for GPU-accelerated operations and SIMD instructions for parallel processing.

Features

Unicode Engine

Decode/Encode: Convert between text and Unicode codepoints
Normalization: Support for NFC, NFD, NFKC, and NFKD forms
Decomposition: Unicode character decomposition operations
Analysis: Text analysis and script detection
GPU Acceleration: Metal-based parallel processing
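
To make the four normalization forms concrete, here is a minimal sketch that uses Foundation's string properties rather than Metanorm itself. It only illustrates what NFC, NFD, NFKC, and NFKD do to a string; it says nothing about how Metanorm implements them.

```swift
import Foundation

// "é" written as a base letter plus a combining acute accent (decomposed form).
let decomposed = "e\u{301}"

// NFC composes canonical sequences; NFD splits them apart again.
let nfc = decomposed.precomposedStringWithCanonicalMapping
let nfd = nfc.decomposedStringWithCanonicalMapping

// The "ﬁ" ligature (U+FB01) is untouched by NFC/NFD and only changes
// under the compatibility forms NFKC/NFKD.
let ligature = "\u{FB01}"
let ligatureNFC = ligature.precomposedStringWithCanonicalMapping
let nfkc = ligature.precomposedStringWithCompatibilityMapping
let nfkd = ligature.decomposedStringWithCompatibilityMapping

print(nfc.unicodeScalars.count)          // 1
print(nfd.unicodeScalars.count)          // 2
print(ligatureNFC.unicodeScalars.count)  // 1
print(nfkc.unicodeScalars.count)         // 2 ("f" followed by "i")
```

Swift's `==` on `String` compares by canonical equivalence, so `nfc == nfd` is true even though the scalar sequences differ. Compare `unicodeScalars` when the exact form matters.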

Tokenizer Engine

BPE Tokenization: Byte Pair Encoding implementation
Model Support: GPT-2 and other transformer models
GPU Processing: Parallel tokenization using Metal
Cache Management: Automatic rule caching and optimization
Performance Metrics: Built-in benchmarking and analysis
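
For readers new to Byte Pair Encoding, the core merge loop can be sketched in a few lines. This is a toy CPU version with an invented three-rule merge table. The function name `bpeMerge` and the rank table are illustrative assumptions, not Metanorm's implementation or GPT-2's actual vocabulary.

```swift
// Toy BPE: repeatedly merge the adjacent pair with the lowest rank.
// Keys join the two symbols with a space, so symbols must not contain spaces.
func bpeMerge(_ symbols: [String], ranks: [String: Int]) -> [String] {
    var parts = symbols
    while parts.count > 1 {
        // Find the mergeable adjacent pair with the lowest rank, if any.
        var best: (index: Int, rank: Int)? = nil
        for i in 0..<(parts.count - 1) {
            let key = parts[i] + " " + parts[i + 1]
            if let rank = ranks[key], rank < (best?.rank ?? Int.max) {
                best = (index: i, rank: rank)
            }
        }
        // Stop when no adjacent pair appears in the merge table.
        guard let choice = best else { break }
        // Replace the chosen pair with its concatenation.
        let merged = parts[choice.index] + parts[choice.index + 1]
        parts.replaceSubrange(choice.index...(choice.index + 1), with: [merged])
    }
    return parts
}

// Hypothetical merge table: lower rank means the rule was learned earlier.
let ranks = ["l l": 0, "h e": 1, "ll o": 2]
let pieces = bpeMerge("hello".map { String($0) }, ranks: ranks)
print(pieces)  // ["he", "llo"]
```

A production tokenizer would then map each resulting piece to a token ID through the model's vocabulary.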

Architecture

Modular Design: Separate Unicode and Tokenizer engines
Unified API: Single entry point for all operations
Swift Package Manager: Easy integration and dependency management
Concurrency Safe: MainActor-based thread safety

Installation

Swift Package Manager

Add Metanorm to your project using Swift Package Manager:

dependencies: [
    .package(url: "https://github.com/toprakdeviren/Metanorm.git", from: "1.0.0")
]

Then add it to your target:

.target(
    name: "YourTarget",
    dependencies: ["Metanorm"]
)
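
Put together, a complete manifest might look like the sketch below. The package name `YourPackage` is a placeholder, the platform list is taken from the Requirements section, and the product name `Metanorm` is assumed to match the package name.

```swift
// swift-tools-version:5.7
import PackageDescription

let package = Package(
    name: "YourPackage",  // placeholder
    platforms: [
        // Minimum versions listed under Requirements.
        .macOS(.v10_15), .iOS(.v13), .tvOS(.v13), .watchOS(.v6)
    ],
    dependencies: [
        .package(url: "https://github.com/toprakdeviren/Metanorm.git", from: "1.0.0")
    ],
    targets: [
        .target(
            name: "YourTarget",
            dependencies: ["Metanorm"]
        )
    ]
)
```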

Usage

Basic Unicode Operations

import Metanorm

// Unicode decode/encode
let text = "Hello 🌍 World"
let codepoints = try Metanorm.decodeUnicode(text)
let reconstructed = try Metanorm.encodeUnicode(codepoints)

// Unicode normalization
let normalized = try Metanorm.unicode.normalize(text, form: .NFC)

// Text analysis
let analysis = Metanorm.unicode.analyze(text)
print("Script: \(analysis.primaryScript)")
print("Character count: \(analysis.characterCount)")
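
To show what "decoding to codepoints" means in practice, the following sketch does the same round trip with the Swift standard library only. The helper names `decodeScalars` and `encodeScalars` are hypothetical, and Metanorm's own return types may differ.

```swift
// Decode: turn a String into its Unicode scalar values (code points).
func decodeScalars(_ text: String) -> [UInt32] {
    text.unicodeScalars.map { $0.value }
}

// Encode: turn code points back into a String. Returns nil for invalid
// values such as surrogates (0xD800...0xDFFF) or anything above 0x10FFFF.
func encodeScalars(_ codepoints: [UInt32]) -> String? {
    var view = String.UnicodeScalarView()
    for value in codepoints {
        guard let scalar = Unicode.Scalar(value) else { return nil }
        view.append(scalar)
    }
    return String(view)
}

let sample = "Hello 🌍 World"
let points = decodeScalars(sample)
print(points.count)                     // 13: the emoji is a single scalar
print(String(points[6], radix: 16))     // 1f30d
print(encodeScalars(points) == sample)  // true
```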

Tokenization

import Metanorm

// Initialize tokenizer
let tokenizer = Metanorm.tokenizer(model: .gpt2)

// Tokenize text
let text = "Hello world!"
let tokens = try tokenizer.encode(text)
print("Tokens: \(tokens)")

// Detokenize
let detokenized = try tokenizer.decode(tokens)
print("Detokenized: \(detokenized)")

// Performance analysis
let metrics = tokenizer.benchmark(text)
print("Processing time: \(metrics.totalTime)ms")
print("Throughput: \(metrics.throughput) tokens/sec")
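
If you want to cross-check the reported numbers, a hand-rolled timing loop is easy to write. The sketch below is an assumption about how such metrics could be computed, not a description of `benchmark(_:)`. `SketchMetrics` and `measure` are hypothetical names, and the stand-in encoder simply emits one token per UTF-8 byte.

```swift
import Foundation
import Dispatch

// Hypothetical metrics container; field names mirror the printout above.
struct SketchMetrics {
    let totalTime: Double   // milliseconds
    let throughput: Double  // tokens per second
}

// Times `encode` over `iterations` runs of the same input.
func measure(_ text: String, iterations: Int = 100,
             encode: (String) -> [Int]) -> SketchMetrics {
    var tokenCount = 0
    let start = DispatchTime.now().uptimeNanoseconds
    for _ in 0..<iterations {
        tokenCount += encode(text).count
    }
    let elapsedNs = DispatchTime.now().uptimeNanoseconds - start
    // Clamp to avoid dividing by zero on very fast runs.
    let seconds = max(Double(elapsedNs) / 1e9, 1e-9)
    return SketchMetrics(totalTime: seconds * 1000,
                         throughput: Double(tokenCount) / seconds)
}

// Stand-in encoder: one "token" per UTF-8 byte.
let sketchMetrics = measure("Hello world!") { $0.utf8.map { Int($0) } }
print("Processing time: \(sketchMetrics.totalTime) ms")
print("Throughput: \(sketchMetrics.throughput) tokens/sec")
```

Timings from a loop like this vary between runs, so compare averages over several invocations rather than single measurements.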

Advanced Usage

import Metanorm

// Direct engine access
let unicodeEngine = Metanorm.unicode
let tokenizerEngine = Metanorm.tokenizer(model: .gpt2)

// Round-trip testing
let testText = "Test text with emoji 🚀"
let result = try Metanorm.roundTripTest(testText, model: .gpt2)
print("Accuracy: \(result.accuracy)%")

// Framework information
let buildInfo = Metanorm.buildInfo
print("Version: \(buildInfo["version"] ?? "Unknown")")

API Reference

Metanorm (Main Interface)

unicode: Access to the Unicode processing engine
tokenizer(model:): Create a tokenizer for a specific model
decodeUnicode(_:): Decode text to Unicode codepoints
encodeUnicode(_:): Encode Unicode codepoints to text
tokenize(_:model:): Quick tokenization
detokenize(_:model:): Quick detokenization
roundTripTest(_:model:): Test round-trip accuracy

UnicodeEngine

decode(_:): Convert text to Unicode codepoints
encode(_:): Convert Unicode codepoints to text
normalize(_:form:): Normalize text to specified form
decompose(_:): Decompose Unicode characters
analyze(_:): Analyze text properties

TokenizerEngine

encode(_:): Tokenize text to token IDs
decode(_:): Convert token IDs back to text
analyze(_:): Analyze token properties
benchmark(_:): Measure performance metrics

Requirements

Platform: macOS 10.15+, iOS 13.0+, tvOS 13.0+, watchOS 6.0+
Swift: 5.7+
Xcode: 14.0+
Metal: A Metal-capable GPU is required for GPU acceleration
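
Because GPU acceleration depends on the hardware, it can be useful to check for a Metal device before relying on it. The sketch below uses the standard `MTLCreateSystemDefaultDevice()` call. The helper name `isMetalAvailable` is hypothetical, and how Metanorm behaves without a Metal device is not specified here.

```swift
#if canImport(Metal)
import Metal
#endif

// Returns true when a default Metal device can be created on this machine.
func isMetalAvailable() -> Bool {
    #if canImport(Metal)
    return MTLCreateSystemDefaultDevice() != nil
    #else
    return false
    #endif
}

print(isMetalAvailable() ? "Metal device found" : "No Metal device found")
```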

Performance

Metanorm is optimized for high-performance text processing:

GPU Acceleration: Metal-based parallel processing
SIMD Instructions: Vectorized operations where possible
Memory Efficient: Optimized buffer management
Cache System: Automatic rule caching and optimization
Concurrent Processing: Multi-threaded operations

Thread Safety

All public APIs are designed to be thread-safe:

Main actor isolation for shared state
Concurrent access to immutable operations
Safe cross-thread communication

Error Handling

Throwing calls report failures as Swift errors. Tokenizer failures surface as BpeTokenizerError cases that can be matched individually:

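import Metanorm

let tokenizer = Metanorm.tokenizer(model: .gpt2)
let text = "Hello world!"

do {
    let tokens = try tokenizer.encode(text)
    print("Tokens: \(tokens)")
} catch BpeTokenizerError.gpuExecutionFailed {
    // Handle GPU errors
} catch BpeTokenizerError.pipelineStateNotFound {
    // Handle pipeline errors
} catch {
    // Handle other errors
}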
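
One common way to handle a GPU failure is to retry on a CPU path. The following self-contained sketch shows that pattern with a stand-in error type whose case names are borrowed from the snippet above. `SketchTokenizerError`, `encodeWithFallback`, and both closures are hypothetical, and whether Metanorm offers a CPU path is an assumption.

```swift
// Stand-in error type for illustration only.
enum SketchTokenizerError: Error {
    case gpuExecutionFailed
    case pipelineStateNotFound
}

// Tries the GPU path first and falls back to the CPU path on a GPU failure.
func encodeWithFallback(_ text: String,
                        gpu: (String) throws -> [Int],
                        cpu: (String) -> [Int]) throws -> [Int] {
    do {
        return try gpu(text)
    } catch SketchTokenizerError.gpuExecutionFailed {
        // GPU execution failed: retry on the CPU.
        return cpu(text)
    } catch {
        // Anything else (for example a missing pipeline) goes to the caller.
        throw error
    }
}

// A GPU path that always fails, and a CPU path that emits UTF-8 byte values.
let failingGPU: (String) throws -> [Int] = { _ in
    throw SketchTokenizerError.gpuExecutionFailed
}
let byteCPU: (String) -> [Int] = { $0.utf8.map { Int($0) } }

let fallbackTokens = (try? encodeWithFallback("hi", gpu: failingGPU, cpu: byteCPU)) ?? []
print(fallbackTokens)  // [104, 105]
```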

Contributing

Contributions are welcome. Please ensure:

Code follows Swift style guidelines
Tests are included for new features
Documentation is updated
Performance impact is considered

License

This project is licensed under the MIT License. See the LICENSE file for details.

Support

For questions, issues, or contributions, please visit the GitHub repository.

Changelog

Version 1.0.0

Initial release
Unicode processing engine
BPE tokenization support
Metal GPU acceleration
Swift Package Manager support
