Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More optimizations after #157 #158

Merged
merged 2 commits into from
Oct 8, 2015
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
52 changes: 49 additions & 3 deletions lib/csvlint/validate.rb
Original file line number Diff line number Diff line change
@@ -1,6 +1,52 @@
module Csvlint

class Validator
class LineCSV < CSV
ENCODE_RE = Hash.new do |h,str|
h[str] = Regexp.new(str)
end

ENCODE_STR = Hash.new do |h,encoding_name|
h[encoding_name] = Hash.new do |h,chunks|
h[chunks] = chunks.map { |chunk| chunk.encode(encoding_name) }.join('')
end
end

ESCAPE_RE = Hash.new do |h,re_chars|
h[re_chars] = Hash.new do |h,re_esc|
h[re_esc] = Hash.new do |h,str|
h[str] = str.gsub(re_chars) {|c| re_esc + c}
end
end
end

# Optimization: Memoize `encode_re`.
# @see https://github.com/ruby/ruby/blob/v2_2_3/lib/csv.rb#L2273
def encode_re(*chunks)
ENCODE_RE[encode_str(*chunks)]
end

# Optimization: Memoize `encode_str`.
# @see https://github.com/ruby/ruby/blob/v2_2_3/lib/csv.rb#L2281
def encode_str(*chunks)
ENCODE_STR[@encoding.name][chunks]
end

# Optimization: Memoize `escape_re`.
# @see https://github.com/ruby/ruby/blob/v2_2_3/lib/csv.rb#L2265
def escape_re(str)
ESCAPE_RE[@re_chars][@re_esc][str]
end

# Optimization: Disable the CSV library's converters feature.
# @see https://github.com/ruby/ruby/blob/v2_2_3/lib/csv.rb#L2100
def init_converters(options, field_name = :converters)
@converters = []
@header_converters = []
options.delete(:unconverted_fields)
options.delete(field_name)
end
end

include Csvlint::ErrorCollector

Expand Down Expand Up @@ -132,12 +178,12 @@ def parse_contents(stream, line = nil)
@csv_options[:encoding] = @encoding

begin
row = CSV.parse_line(stream, @csv_options)
row = LineCSV.parse_line(stream, @csv_options)
# this is a one line substitute for CSV.new followed by row = CSV.shift. a CSV Row class is required
# CSV.parse will return an array of arrays which breaks subsequent each_with_index invocations
# TODO investigate if above would be a drag on memory

rescue CSV::MalformedCSVError => e
rescue LineCSV::MalformedCSVError => e
build_exception_messages(e, stream, current_line)
end

Expand Down Expand Up @@ -228,7 +274,7 @@ def header?

def report_line_breaks(line_no=nil)
return if @input !~ /[\r|\n]/ # Return straight away if there's no newline character - i.e. we're on the last line
line_break = CSV.new(@input).row_sep
line_break = LineCSV.new(@input).row_sep
@line_breaks << line_break
unless line_breaks_reported?
if line_break != "\r\n"
Expand Down