Metanorm

A high-performance Unicode processing framework for Swift, built with Metal GPU acceleration and optimized for modern text processing applications.

Overview

Metanorm handles Unicode text processing in Swift: normalization, conversion between text and codepoints, and BPE tokenization. It is designed for performance, using Apple's Metal framework for GPU-accelerated operations and SIMD instructions for parallel processing.

Features

Unicode Engine

  • Decode/Encode: Convert between text and Unicode codepoints
  • Normalization: Support for NFC, NFD, NFKC, and NFKD forms
  • Decomposition: Unicode character decomposition operations
  • Analysis: Text analysis and script detection
  • GPU Acceleration: Metal-based parallel processing
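
To make the four normalization forms concrete, here is a minimal sketch that uses Foundation's built-in string mappings rather than Metanorm itself. The names `SketchForm` and `sketchNormalize` are hypothetical and exist only for this illustration; nothing here reflects how Metanorm implements normalization internally.

```swift
import Foundation

// Illustration only: Foundation's normalization, not Metanorm's implementation.
enum SketchForm { case nfc, nfd, nfkc, nfkd }   // hypothetical name

func sketchNormalize(_ text: String, _ form: SketchForm) -> String {
    switch form {
    case .nfc:  return text.precomposedStringWithCanonicalMapping
    case .nfd:  return text.decomposedStringWithCanonicalMapping
    case .nfkc: return text.precomposedStringWithCompatibilityMapping
    case .nfkd: return text.decomposedStringWithCompatibilityMapping
    }
}

// "e" followed by COMBINING ACUTE ACCENT composes to a single scalar under NFC.
let nfc = sketchNormalize("e\u{0301}", .nfc)
// The precomposed U+00E9 splits into two scalars under NFD.
let nfd = sketchNormalize("\u{00E9}", .nfd)
print(nfc.unicodeScalars.count)   // 1
print(nfd.unicodeScalars.count)   // 2

// Compatibility forms also fold characters such as the "fi" ligature (U+FB01).
print(sketchNormalize("\u{FB01}", .nfkc))   // fi
```

The canonical forms (NFC, NFD) only change how a character is composed, while the compatibility forms (NFKC, NFKD) may also replace a character with a plainer equivalent, which is why the ligature survives NFC but not NFKC.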

Tokenizer Engine

  • BPE Tokenization: Byte Pair Encoding implementation
  • Model Support: GPT-2 and other transformer models
  • GPU Processing: Parallel tokenization using Metal
  • Cache Management: Automatic rule caching and optimization
  • Performance Metrics: Built-in benchmarking and analysis
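
For readers new to Byte Pair Encoding, the core merge loop can be sketched in a few lines of plain Swift. This is a toy CPU version under stated assumptions: `SymbolPair`, `sketchBPE`, and the merge table are made up for illustration, the input is split into Unicode scalars rather than bytes, and the final mapping from symbols to token IDs is left out. It says nothing about how Metanorm's GPU tokenizer is structured.

```swift
// Toy BPE merge loop. The merge table is invented for this example;
// real models such as GPT-2 ship their own learned merges and vocabulary.
struct SymbolPair: Hashable {
    let left: String
    let right: String
}

func sketchBPE(_ word: String, merges: [SymbolPair]) -> [String] {
    // Rank of each merge rule: a lower index means higher priority.
    var rank: [SymbolPair: Int] = [:]
    for (index, pair) in merges.enumerated() where rank[pair] == nil {
        rank[pair] = index
    }

    var symbols = word.unicodeScalars.map { String($0) }
    while symbols.count > 1 {
        // Find the adjacent pair with the best (lowest) rank.
        var best: SymbolPair? = nil
        var bestRank = Int.max
        for i in 0..<(symbols.count - 1) {
            let candidate = SymbolPair(left: symbols[i], right: symbols[i + 1])
            if let r = rank[candidate], r < bestRank {
                best = candidate
                bestRank = r
            }
        }
        guard let pair = best else { break }   // no rule applies any more

        // Merge every occurrence of that pair, scanning left to right.
        var merged: [String] = []
        var i = 0
        while i < symbols.count {
            if i + 1 < symbols.count, symbols[i] == pair.left, symbols[i + 1] == pair.right {
                merged.append(pair.left + pair.right)
                i += 2
            } else {
                merged.append(symbols[i])
                i += 1
            }
        }
        symbols = merged
    }
    return symbols
}

let toyMerges = [
    SymbolPair(left: "l", right: "o"),
    SymbolPair(left: "lo", right: "w"),
    SymbolPair(left: "e", right: "r"),
]
print(sketchBPE("lower", merges: toyMerges))   // ["low", "er"]
```

Each pass applies the highest-priority rule that matches, so "lower" becomes `l o w e r`, then `lo w e r`, then `low e r`, and finally `low er`.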

Architecture

  • Modular Design: Separate Unicode and Tokenizer engines
  • Unified API: Single entry point for all operations
  • Swift Package Manager: Easy integration and dependency management
  • Concurrency Safe: MainActor-based thread safety

Installation

Swift Package Manager

Add Metanorm to your project using Swift Package Manager:

dependencies: [
    .package(url: "https://github.com/toprakdeviren/Metanorm.git", from: "1.0.0")
]

Then add it to your target:

.target(
    name: "YourTarget",
    dependencies: ["Metanorm"]
)
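
For context, the two fragments above could sit in a complete manifest roughly like the following. This is a sketch, not a file taken from the repository: the package name `YourPackage` is a placeholder, the platform list is copied from the Requirements section, and the product name `Metanorm` is assumed to match the package name.

```swift
// swift-tools-version:5.7
import PackageDescription

let package = Package(
    name: "YourPackage",   // placeholder name
    platforms: [
        // Minimum versions taken from the Requirements section.
        .macOS(.v10_15), .iOS(.v13), .tvOS(.v13), .watchOS(.v6)
    ],
    dependencies: [
        .package(url: "https://github.com/toprakdeviren/Metanorm.git", from: "1.0.0")
    ],
    targets: [
        .target(
            name: "YourTarget",
            dependencies: ["Metanorm"]   // assumes the product is named "Metanorm"
        )
    ]
)
```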

Usage

Basic Unicode Operations

import Metanorm

// Unicode decode/encode
let text = "Hello 🌍 World"
let codepoints = try Metanorm.decodeUnicode(text)
let reconstructed = try Metanorm.encodeUnicode(codepoints)

// Unicode normalization
let normalized = try Metanorm.unicode.normalize(text, form: .NFC)

// Text analysis
let analysis = Metanorm.unicode.analyze(text)
print("Script: \(analysis.primaryScript)")
print("Character count: \(analysis.characterCount)")
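
As a point of reference for what decoding to codepoints means, the round trip can be reproduced with the standard library alone. `sketchDecode` and `sketchEncode` below are hypothetical stand-ins written for this README, not Metanorm functions, and the failure case they show is only one plausible reason an encode call might need `try`.

```swift
// Stand-ins built on the standard library; not Metanorm's code.
func sketchDecode(_ text: String) -> [UInt32] {
    text.unicodeScalars.map { $0.value }
}

func sketchEncode(_ codepoints: [UInt32]) -> String? {
    var scalars = String.UnicodeScalarView()
    for value in codepoints {
        // Surrogates (U+D800...U+DFFF) and values above U+10FFFF are not valid scalars.
        guard let scalar = Unicode.Scalar(value) else { return nil }
        scalars.append(scalar)
    }
    return String(scalars)
}

let sample = "Hello 🌍 World"
let points = sketchDecode(sample)
print(points.count)        // 13 codepoints
print(sample.utf8.count)   // 16 bytes in UTF-8
print(sketchEncode(points) == sample)   // true
```

The emoji is a single codepoint (U+1F30D) but four UTF-8 bytes, which is why the two counts differ.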

Tokenization

import Metanorm

// Initialize tokenizer
let tokenizer = Metanorm.tokenizer(model: .gpt2)

// Tokenize text
let text = "Hello world!"
let tokens = try tokenizer.encode(text)
print("Tokens: \(tokens)")

// Detokenize
let detokenized = try tokenizer.decode(tokens)
print("Detokenized: \(detokenized)")

// Performance analysis
let metrics = tokenizer.benchmark(text)
print("Processing time: \(metrics.totalTime)ms")
print("Throughput: \(metrics.throughput) tokens/sec")
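
If you want to cross-check reported numbers, a time and throughput measurement can be sketched independently of the framework. Everything below is an assumption made for illustration: `SketchMetrics` and `sketchBenchmark` are hypothetical names, and a whitespace split stands in for a real tokenizer.

```swift
import Foundation

// Hypothetical metrics container; field names loosely follow the output above.
struct SketchMetrics {
    let totalTimeMs: Double
    let throughput: Double   // tokens per second
}

// Runs `work` repeatedly; `work` returns how many tokens it produced.
func sketchBenchmark(iterations: Int = 100, _ work: () -> Int) -> SketchMetrics {
    precondition(iterations > 0, "iterations must be positive")
    var tokens = 0
    let start = ProcessInfo.processInfo.systemUptime
    for _ in 0..<iterations {
        tokens += work()
    }
    // Clamp to avoid dividing by zero on very fast runs.
    let seconds = max(ProcessInfo.processInfo.systemUptime - start, 1e-9)
    return SketchMetrics(totalTimeMs: seconds * 1_000,
                         throughput: Double(tokens) / seconds)
}

// Stand-in workload: count whitespace-separated words.
let sketchMetrics = sketchBenchmark {
    "Hello world!".split(separator: " ").count
}
print("Processing time: \(sketchMetrics.totalTimeMs) ms")
print("Throughput: \(sketchMetrics.throughput) tokens/sec")
```

Averaging over many iterations matters for short inputs, where a single call can finish faster than the timer's resolution.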

Advanced Usage

import Metanorm

// Direct engine access
let unicodeEngine = Metanorm.unicode
let tokenizerEngine = Metanorm.tokenizer(model: .gpt2)

// Round-trip testing
let testText = "Test text with emoji 🚀"
let result = try Metanorm.roundTripTest(testText, model: .gpt2)
print("Accuracy: \(result.accuracy)%")

// Framework information
let buildInfo = Metanorm.buildInfo
print("Version: \(buildInfo["version"] ?? "Unknown")")

API Reference

Metanorm (Main Interface)

  • unicode: Access to Unicode processing engine
  • tokenizer(model:): Create a tokenizer for a specific model
  • decodeUnicode(_:): Decode text to Unicode codepoints
  • encodeUnicode(_:): Encode Unicode codepoints to text
  • tokenize(_:model:): Quick tokenization
  • detokenize(_:model:): Quick detokenization
  • roundTripTest(_:model:): Test round-trip accuracy

UnicodeEngine

  • decode(_:): Convert text to Unicode codepoints
  • encode(_:): Convert Unicode codepoints to text
  • normalize(_:form:): Normalize text to specified form
  • decompose(_:): Decompose Unicode characters
  • analyze(_:): Analyze text properties

TokenizerEngine

  • encode(_:): Tokenize text to token IDs
  • decode(_:): Convert token IDs back to text
  • analyze(_:): Analyze token properties
  • benchmark(_:): Measure performance metrics

Requirements

  • Platform: macOS 10.15+, iOS 13.0+, tvOS 13.0+, watchOS 6.0+
  • Swift: 5.7+
  • Xcode: 14.0+
  • Metal: Supported GPU required for GPU acceleration
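
Whether a Metal device is present can be checked at runtime with Apple's `MTLCreateSystemDefaultDevice()`, which returns `nil` when no device is available. The helper name `metalIsAvailable` is made up for this sketch, and how Metanorm itself reacts to a missing GPU is not specified here.

```swift
#if canImport(Metal)
import Metal
#endif

/// Returns true when a default Metal device can be created on this machine.
func metalIsAvailable() -> Bool {
    #if canImport(Metal)
    return MTLCreateSystemDefaultDevice() != nil
    #else
    return false   // platforms where the Metal framework cannot be imported
    #endif
}

print(metalIsAvailable() ? "Metal device found" : "No Metal device available")
```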

Performance

Metanorm is optimized for high-performance text processing:

  • GPU Acceleration: Metal-based parallel processing
  • SIMD Instructions: Vectorized operations where possible
  • Memory Efficient: Optimized buffer management
  • Cache System: Automatic rule caching and optimization
  • Concurrent Processing: Multi-threaded operations

Thread Safety

All public APIs are designed to be thread-safe:

  • Main actor isolation for shared state
  • Concurrent access to immutable operations
  • Safe cross-thread communication
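
The pattern described above can be sketched as follows. The types are hypothetical and written only for illustration; they are not Metanorm's actual declarations. Shared mutable state lives on the main actor, pure functions over immutable input stay callable from anywhere, and code running off the main actor reaches the shared state with `await`.

```swift
// Hypothetical types for illustration; not Metanorm's declarations.
@MainActor
final class SketchRuleCache {
    static let shared = SketchRuleCache()
    private var storage: [String: [String]] = [:]

    func rules(for model: String) -> [String]? { storage[model] }
    func store(_ rules: [String], for model: String) { storage[model] = rules }
}

enum SketchText {
    // Pure function over immutable input: no isolation needed.
    static func scalarCount(_ text: String) -> Int {
        text.unicodeScalars.count
    }
}

// From outside the main actor, isolated state is reached with `await`.
func warmCache() async {
    await SketchRuleCache.shared.store(["l o", "lo w"], for: "toy")
    let cached = await SketchRuleCache.shared.rules(for: "toy")
    print("Cached rules: \(cached?.count ?? 0)")
}
```

One consequence of this design is that callers in a synchronous, non-main context cannot touch the isolated state directly and need an async hop to do so.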

Error Handling

Throwing APIs report failures as Swift errors. Tokenizer failures surface as BpeTokenizerError cases that can be matched individually:

import Metanorm

let tokenizer = Metanorm.tokenizer(model: .gpt2)

do {
    let tokens = try tokenizer.encode("Hello world!")
    print("Tokens: \(tokens)")
} catch BpeTokenizerError.gpuExecutionFailed {
    // Handle GPU errors
} catch BpeTokenizerError.pipelineStateNotFound {
    // Handle pipeline errors
} catch {
    // Handle other errors
}
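
One possible way to fill in the GPU error branch is to retry on a CPU path. The sketch below is self-contained and rests on assumptions: `SketchTokenizerError` is a stand-in that only mirrors the two case names above, and `encodeWithFallback` with its `gpu` and `cpu` closures is hypothetical. Whether Metanorm offers a CPU path at all is not stated in this README.

```swift
// Stand-in error type mirroring the two case names used above; the real
// BpeTokenizerError may have more cases or associated values.
enum SketchTokenizerError: Error {
    case gpuExecutionFailed
    case pipelineStateNotFound
}

// Hypothetical helper: try the GPU encoder first, fall back to the CPU encoder.
func encodeWithFallback(_ text: String,
                        gpu: (String) throws -> [Int],
                        cpu: (String) -> [Int]) throws -> [Int] {
    do {
        return try gpu(text)
    } catch SketchTokenizerError.gpuExecutionFailed {
        return cpu(text)   // GPU run failed: retry on the CPU
    }
    // Any other error, including pipelineStateNotFound, propagates to the caller.
}

// A GPU closure that always fails, and a CPU closure that returns UTF-8 bytes.
let failingGPU: (String) throws -> [Int] = { _ in
    throw SketchTokenizerError.gpuExecutionFailed
}
let cpuPath: (String) -> [Int] = { $0.utf8.map { Int($0) } }

let ids = try encodeWithFallback("Hi", gpu: failingGPU, cpu: cpuPath)
print(ids)   // [72, 105]
```

Only the failed-execution case triggers the fallback here; a missing pipeline state is treated as a setup problem and left for the caller to handle.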

Contributing

Contributions are welcome. Please ensure:

  1. Code follows Swift style guidelines
  2. Tests are included for new features
  3. Documentation is updated
  4. Performance impact is considered

License

This project is licensed under the MIT License. See the LICENSE file for details.

Support

For questions, issues, or contributions, please visit the GitHub repository at https://github.com/toprakdeviren/Metanorm.

Changelog

Version 1.0.0

  • Initial release
  • Unicode processing engine
  • BPE tokenization support
  • Metal GPU acceleration
  • Swift Package Manager support
