@andyundso
Contributor

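This PR replaces the StringScanner in unescape with a plain gsub. To keep the pattern from matching too much, I added negative lookbehind and lookahead assertions to the UCHAR4 and UCHAR8 regexes, so that a \u or \U whose backslash is itself preceded by a backslash is left alone. In the benchmark below, the new version is still about 1.7x faster than the previous one.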
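As a rough illustration of what the lookarounds buy, here is a minimal standalone sketch. It copies the two patterns from this PR into top-level constants and wraps the gsub in a helper named unescape_uchars; that helper name and the top-level constants are assumptions made for the example, not part of the library. It only covers the \u/\U step, not the other escape characters that the real unescape handles first.

```ruby
# Standalone sketch: these constants mirror the patterns in the PR but are
# defined at top level for illustration, not taken from RDF::NTriples::Reader.
UCHAR4 = /(?<!\\)\\(?!\\)u([0-9A-Fa-f]{4,4})/
UCHAR8 = /(?<!\\)\\(?!\\)U([0-9A-Fa-f]{8,8})/
UCHAR  = Regexp.union(UCHAR4, UCHAR8)

# Hypothetical helper: replaces \uXXXX and \UXXXXXXXX with the code point
# they name. $1 is set for a UCHAR4 match, $2 for a UCHAR8 match.
def unescape_uchars(string)
  string.gsub(UCHAR) { [($1 || $2).hex].pack('U*') }
end

# Single-quoted literals keep the backslash, so these contain real escapes.
plain   = unescape_uchars('D\u00FCrst')     # => "Dürst"
astral  = unescape_uchars('\U0001F600')     # => "😀"

# Two backslashes before the u: the lookbehind rejects the second backslash
# and the lookahead rejects the first, so nothing is replaced.
escaped = unescape_uchars('D\\\\u00FCrst')  # string is returned unchanged

puts plain, astral, escaped
```

Note that the input in the benchmark is a double-quoted Ruby literal, so Ruby has already turned \u00FC into ü before unescape sees it; the single-quoted literals above are what exercise the replacement itself.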

require 'benchmark/ips'
require_relative 'lib/rdf'

Benchmark.ips do |x|
  x.report('without') do
    RDF::NTriples::Reader.unescape("D\u00FCrst")
  end

  if ENV['WITH_MODULE'] == 'true'
    module RDF::NTriples
      class Reader
        UCHAR4          = /(?<!\\)\\(?!\\)u([0-9A-Fa-f]{4,4})/.freeze
        UCHAR8          = /(?<!\\)\\(?!\\)U([0-9A-Fa-f]{8,8})/.freeze

        def self.unescape(string)
          # Note: avoiding copying the input string when no escaping is needed
          # greatly reduces the number of allocations and the processing time.
          string = string.dup.force_encoding(Encoding::UTF_8) unless string.encoding == Encoding::UTF_8

          string
            .gsub(ESCAPE_CHARS_ESCAPED_REGEXP, ESCAPE_CHARS_ESCAPED)
            .gsub(UCHAR) do
              [($1 || $2).hex].pack('U*')
            end
        end
      end
    end
  end

  x.report('with') do
    RDF::NTriples::Reader.unescape("D\u00FCrst")
  end
  x.hold! 'temp_results'
  x.compare!
end

ruby 3.4.4 (2025-05-14 revision a38531fd3f) +PRISM [x86_64-linux]
Warming up --------------------------------------
                with   250.703k i/100ms
Calculating -------------------------------------
                with      2.530M (± 2.5%) i/s  (395.26 ns/i) -     12.786M in   5.056920s

Comparison:
                with:  2530003.9 i/s
             without:  1478262.7 i/s - 1.71x  slower

@coveralls

Coverage Status

coverage: 91.721% (-0.01%) from 91.731%
when pulling 7622a28 on andyundso:remove-string-scanner
into cae8390 on ruby-rdf:develop.

@gkellogg
Member

Thanks, this looks good. I'll plan a new release soon.

@gkellogg merged commit c119c20 into ruby-rdf:develop on May 18, 2025
9 checks passed