Added Bayes backend benchmarks (#98)
* Disabled Redis disc persistence and refactored integration test, fixes #95
* Added Bayes backend benchmarks
1 parent a8c7578, commit e21a9be
Showing 5 changed files with 119 additions and 3 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,43 @@ | ||
# encoding: utf-8 | ||
|
||
module BayesianCommonBenchmarks | ||
def load_data | ||
sms_spam_collection = File.expand_path(File.dirname(__FILE__) + '/../data/corpus/SMSSpamCollection.tsv') | ||
File.read(sms_spam_collection).force_encoding("utf-8").split("\n") | ||
end | ||
|
||
def bench_train | ||
assert_performance_linear do |n| | ||
n.times do |i| | ||
parts = @data[i].strip.split("\t") | ||
@classifiers[n].train(parts.first, parts.last) | ||
end | ||
end | ||
end | ||
|
||
def bench_train_untrain | ||
assert_performance_linear do |n| | ||
n.times do |i| | ||
parts = @data[i].strip.split("\t") | ||
@classifiers[n].train(parts.first, parts.last) | ||
end | ||
n.times do |i| | ||
parts = @data[i].strip.split("\t") | ||
@classifiers[n].untrain(parts.first, parts.last) | ||
end | ||
end | ||
end | ||
|
||
def bench_train_classify | ||
assert_performance_linear do |n| | ||
n.times do |i| | ||
parts = @data[i].strip.split("\t") | ||
@classifiers[n].train(parts.first, parts.last) | ||
end | ||
n.times do |i| | ||
parts = @data[i].strip.split("\t") | ||
@classifiers[n].classify(parts.last) | ||
end | ||
end | ||
end | ||
end |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
# encoding: utf-8 | ||
|
||
require File.dirname(__FILE__) + '/../test_helper' | ||
require_relative './bayesian_common_benchmarks' | ||
|
||
class BayesianMemoryBenchmark < Minitest::Benchmark | ||
MAX_RECORDS = 5000 | ||
|
||
include BayesianCommonBenchmarks | ||
|
||
def self.bench_range | ||
(bench_exp(1, MAX_RECORDS) << MAX_RECORDS).uniq | ||
end | ||
|
||
def setup | ||
@data ||= load_data | ||
if @data.length < MAX_RECORDS | ||
skip("Not enough records in the dataset") | ||
end | ||
@classifiers = {} | ||
self.class.bench_range.each do |n| | ||
@classifiers[n] = ClassifierReborn::Bayes.new 'Ham', 'Spam' | ||
end | ||
print "memory_" | ||
end | ||
end |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,38 @@ | ||
# encoding: utf-8 | ||
|
||
require File.dirname(__FILE__) + '/../test_helper' | ||
require_relative './bayesian_common_benchmarks' | ||
|
||
class BayesianRedisBenchmark < Minitest::Benchmark | ||
MAX_RECORDS = 5000 | ||
|
||
include BayesianCommonBenchmarks | ||
|
||
def self.bench_range | ||
(bench_exp(1, MAX_RECORDS) << MAX_RECORDS).uniq | ||
end | ||
|
||
def setup | ||
@data ||= load_data | ||
if @data.length < MAX_RECORDS | ||
skip("Not enough records in the dataset") | ||
end | ||
@classifiers = {} | ||
self.class.bench_range.each_with_index do |n, i| | ||
begin | ||
redis_backend = ClassifierReborn::BayesRedisBackend.new(db: i) | ||
redis_backend.instance_variable_get(:@redis).config(:set, "save", "") | ||
@classifiers[n] = ClassifierReborn::Bayes.new 'Ham', 'Spam', backend: redis_backend | ||
rescue Redis::CannotConnectError => e | ||
skip(e) | ||
end | ||
end | ||
print "redis_" | ||
end | ||
|
||
def teardown | ||
self.class.bench_range.each do |n| | ||
@classifiers[n].instance_variable_get(:@backend).instance_variable_get(:@redis).flushdb | ||
end | ||
end | ||
end |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,8 +1,9 @@ | ||
$LOAD_PATH.unshift(File.dirname(__FILE__) + '/../lib') | ||
|
||
require 'minitest/autorun' | ||
+ require "minitest/benchmark" | ||
require 'minitest/reporters' | ||
- Minitest::Reporters.use! | ||
+ Minitest::Reporters.use! unless ENV['NOPROGRESS'] | ||
|
||
require 'pry' | ||
require 'classifier-reborn' | ||
include ClassifierReborn |
@Ch4s3 and @parkr, when I ran the recently written benchmarks I hit an issue caused by the `Minitest::Reporters` inclusion: the fancy progress meter was getting mixed up with the reported benchmark values. That's why I used this `ENV` hack to conditionally turn the fancy progress meter off and fall back to the default one. However, when I tried to set this special ENV in the `:bench` Rake task, it was applied to all the tasks. As a result, we now have to pass something like `NOPROGRESS=T` along with the `rake bench` command. I don't like this ugliness and would prefer a more sophisticated approach. Any better approaches?
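One possible direction, offered as a sketch rather than a tested fix for this repository: set the variable inside the body of a small prerequisite task instead of at the top level of the Rakefile or in the task-definition block. Code at those two places runs whenever the Rakefile is loaded, which would explain why the setting leaked into every task. The helper task name `:bench_env` and the file pattern below are hypothetical; only `:bench` and `NOPROGRESS` come from the comment above.

```ruby
# Rakefile fragment (sketch). :bench_env and the pattern are assumptions.
require 'rake'
require 'rake/testtask'

extend Rake::DSL # already in effect inside a Rakefile; harmless to repeat

# Defining the benchmark task does not touch ENV.
Rake::TestTask.new(:bench) do |t|
  t.libs << 'lib'
  t.pattern = 'test/**/*_benchmark.rb' # assumed layout
end

# A task body runs only when that task is invoked, so the variable is set
# for `rake bench` but not for `rake test` or any other task.
task :bench_env do
  ENV['NOPROGRESS'] = 'T'
end

# Attach the helper as a prerequisite so it runs before the benchmarks start.
task bench: :bench_env
```

With this in place, a plain `rake bench` should behave like `NOPROGRESS=T rake bench`, since the test files are normally run in a child process that inherits the environment of the Rake process.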