A high-performance Unicode processing framework for Swift, built with Metal GPU acceleration and optimized for modern text processing applications.
Metanorm provides a comprehensive solution for Unicode text processing, including normalization, encoding/decoding, and tokenization operations. The framework is designed with performance in mind, utilizing Apple's Metal framework for GPU-accelerated operations and SIMD instructions for parallel processing.
- Decode/Encode: Convert between text and Unicode codepoints
- Normalization: Support for NFC, NFD, NFKC, and NFKD forms
- Decomposition: Unicode character decomposition operations
- Analysis: Text analysis and script detection
- GPU Acceleration: Metal-based parallel processing
- BPE Tokenization: Byte Pair Encoding implementation
- Model Support: GPT-2 and other transformer models
- GPU Processing: Parallel tokenization using Metal
- Cache Management: Automatic rule caching and optimization
- Performance Metrics: Built-in benchmarking and analysis
- Modular Design: Separate Unicode and Tokenizer engines
- Unified API: Single entry point for all operations
- Swift Package Manager: Easy integration and dependency management
- Concurrency Safe: MainActor-based thread safety
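The normalization forms listed above can be previewed with Foundation alone, independent of Metanorm. This sketch uses the standard `precomposedStringWithCanonicalMapping` and `decomposedStringWithCanonicalMapping` APIs to show why NFC/NFD equivalence matters:

```swift
import Foundation

// "é" written two ways: one precomposed scalar (NFC form) vs.
// base letter "e" + U+0301 COMBINING ACUTE ACCENT (NFD form).
let nfc = "\u{00E9}"
let nfd = "e\u{0301}"

// Swift's String compares by canonical equivalence, so both are "equal"...
assert(nfc == nfd)

// ...even though their underlying scalar counts differ.
assert(nfc.unicodeScalars.count == 1)
assert(nfd.unicodeScalars.count == 2)

// Foundation converts between the two forms:
assert(nfd.precomposedStringWithCanonicalMapping == nfc)
assert(nfc.decomposedStringWithCanonicalMapping == nfd)
print("canonical equivalence holds")
```

This is exactly the class of mismatch that byte-level pipelines trip over, which is why normalization sits alongside encoding and tokenization in the framework.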
Add Metanorm to your project using Swift Package Manager:
```swift
dependencies: [
    .package(url: "https://github.com/toprakdeviren/Metanorm.git", from: "1.0.0")
]
```

Then add it to your target:

```swift
.target(
    name: "YourTarget",
    dependencies: ["Metanorm"]
)
```

```swift
import Metanorm

// Unicode decode/encode
let text = "Hello π World"
let codepoints = try Metanorm.decodeUnicode(text)
let reconstructed = try Metanorm.encodeUnicode(codepoints)

// Unicode normalization
let normalized = try Metanorm.unicode.normalize(text, form: .NFC)

// Text analysis
let analysis = Metanorm.unicode.analyze(text)
print("Script: \(analysis.primaryScript)")
print("Character count: \(analysis.characterCount)")
```

```swift
import Metanorm

// Initialize tokenizer
let tokenizer = Metanorm.tokenizer(model: .gpt2)

// Tokenize text
let text = "Hello world!"
let tokens = try tokenizer.encode(text)
print("Tokens: \(tokens)")

// Detokenize
let detokenized = try tokenizer.decode(tokens)
print("Detokenized: \(detokenized)")

// Performance analysis
let metrics = tokenizer.benchmark(text)
print("Processing time: \(metrics.totalTime) ms")
print("Throughput: \(metrics.throughput) tokens/sec")
```

```swift
import Metanorm

// Direct engine access
let unicodeEngine = Metanorm.unicode
let tokenizerEngine = Metanorm.tokenizer(model: .gpt2)

// Round-trip testing
let testText = "Test text with emoji π"
let result = try Metanorm.roundTripTest(testText, model: .gpt2)
print("Accuracy: \(result.accuracy)%")

// Framework information
let buildInfo = Metanorm.buildInfo
print("Version: \(buildInfo["version"] ?? "Unknown")")
```

The top-level `Metanorm` entry point provides:

- `unicode`: Access to the Unicode processing engine
- `tokenizer(model:)`: Create a tokenizer for a specific model
- `decodeUnicode(_:)`: Decode text to Unicode codepoints
- `encodeUnicode(_:)`: Encode Unicode codepoints to text
- `tokenize(_:model:)`: Quick tokenization
- `detokenize(_:model:)`: Quick detokenization
- `roundTripTest(_:model:)`: Test round-trip accuracy
The Unicode engine (`Metanorm.unicode`) provides:

- `decode(_:)`: Convert text to Unicode codepoints
- `encode(_:)`: Convert Unicode codepoints to text
- `normalize(_:form:)`: Normalize text to the specified form
- `decompose(_:)`: Decompose Unicode characters
- `analyze(_:)`: Analyze text properties
The tokenizer engine provides:

- `encode(_:)`: Tokenize text to token IDs
- `decode(_:)`: Convert token IDs back to text
- `analyze(_:)`: Analyze token properties
- `benchmark(_:)`: Measure performance metrics
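At its core, the BPE tokenization described above repeatedly merges adjacent symbol pairs according to a learned rule list. The following is an illustrative pure-Swift sketch of that merge loop, not Metanorm's implementation; `bpeMerge` and its rule format are hypothetical:

```swift
// Minimal BPE merge loop over a word's characters (illustrative only).
func bpeMerge(_ word: String, merges: [[String]]) -> [String] {
    var symbols = word.map { String($0) }
    for pair in merges {                  // apply merge rules in priority order
        guard pair.count == 2 else { continue }
        var i = 0
        while i < symbols.count - 1 {
            if symbols[i] == pair[0] && symbols[i + 1] == pair[1] {
                symbols[i] = pair[0] + pair[1]   // merge the adjacent pair
                symbols.remove(at: i + 1)
            } else {
                i += 1
            }
        }
    }
    return symbols
}

// With rules ("l","l") then ("ll","o"): "hello" -> ["h", "e", "llo"]
let tokens = bpeMerge("hello", merges: [["l", "l"], ["ll", "o"]])
print(tokens)
```

Production tokenizers add byte-level fallback, regex pre-splitting, and vocabulary lookup on top of this loop; the GPU path parallelizes the pair-scanning step across many words at once.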
- Platform: macOS 10.15+, iOS 13.0+, tvOS 13.0+, watchOS 6.0+
- Swift: 5.7+
- Xcode: 14.0+
- Metal: Supported GPU required for GPU acceleration
Metanorm is optimized for high-performance text processing:
- GPU Acceleration: Metal-based parallel processing
- SIMD Instructions: Vectorized operations where possible
- Memory Efficient: Optimized buffer management
- Cache System: Automatic rule caching and optimization
- Concurrent Processing: Multi-threaded operations
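The SIMD point above can be illustrated with Swift's built-in SIMD vector types. This is a hedged sketch of the general technique (not Metanorm's kernels): classifying 16 bytes per step, with a scalar tail for the remainder:

```swift
// Vectorized ASCII check: examine 16 bytes per iteration with SIMD16<UInt8>.
func isAllASCII(_ bytes: [UInt8]) -> Bool {
    var i = 0
    while i + 16 <= bytes.count {
        let chunk = SIMD16<UInt8>(bytes[i..<i + 16])
        // Any byte >= 0x80 is non-ASCII; .max() reduces over all 16 lanes.
        if chunk.max() >= 0x80 { return false }
        i += 16
    }
    // Scalar tail for the remaining < 16 bytes.
    while i < bytes.count {
        if bytes[i] >= 0x80 { return false }
        i += 1
    }
    return true
}

print(isAllASCII(Array("Hello, world! plain ASCII".utf8)))  // true
print(isAllASCII(Array("café".utf8)))                       // false
```

Fast-pathing pure-ASCII input this way lets a Unicode pipeline skip normalization and multi-byte decoding entirely for the common case.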
All public APIs are designed to be thread-safe:
- Main actor isolation for shared state
- Concurrent access to immutable operations
- Safe cross-thread communication
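As a sketch of what main-actor isolation means for callers (the `RuleCache` type here is hypothetical, not one of Metanorm's classes): mutable shared state lives on the main actor, and code running elsewhere reaches it with `await`, which serializes access and rules out data races:

```swift
// Hypothetical sketch of MainActor-isolated shared state.
@MainActor
final class RuleCache {
    private var cache: [String: Int] = [:]

    func lookupCount(_ text: String) -> Int {
        if let hit = cache[text] { return hit }      // served from cache
        let count = text.unicodeScalars.count        // "compute" on a miss
        cache[text] = count
        return count
    }
}

// Off the main actor, callers hop over with `await`:
func warm(_ cache: RuleCache) async {
    _ = await cache.lookupCount("Hello")
}
```

Purely read-only operations on immutable data need no such hop, which is why they can run concurrently as noted above.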
The framework provides comprehensive error handling:
```swift
do {
    let tokens = try tokenizer.encode(text)
} catch BpeTokenizerError.gpuExecutionFailed {
    // Handle GPU errors
} catch BpeTokenizerError.pipelineStateNotFound {
    // Handle pipeline errors
} catch {
    // Handle other errors
}
```

Contributions are welcome. Please ensure:
- Code follows Swift style guidelines
- Tests are included for new features
- Documentation is updated
- Performance impact is considered
This project is licensed under the MIT License. See the LICENSE file for details.
For questions, issues, or contributions, please visit the GitHub repository.
- Initial release
- Unicode processing engine
- BPE tokenization support
- Metal GPU acceleration
- Swift Package Manager support