Training error. Help #21

ilichev-andrey · 2016-11-29T17:17:03Z

Hello.
I am training a neural network to distinguish between my two classes.
An error occurs during training.
How can I fix it?

th> require 'nn';
th> trainset = torch.load('animals_peoples2.t7')
th> testset = torch.load('animals_peoples2.t7')
th> classes = {'animals', 'peoples'}

th> print(trainset)
{
data : ByteTensor - size: 17299x3x96x96
label : ByteTensor - size: 17299
}

th> print(#trainset.data)
17299
3
96
96
[torch.LongStorage of size 4]

th> setmetatable(trainset,
..> {__index = function(t, i)
..> return {
..> t.data[i],
..> t.label[i]
..> }
..> end}
..> );

th> function trainset:size()
..> return self.data:size(1)
..> end

th> trainset.data = trainset.data:double()

th> print(trainset:size())
17299

th> print(trainset[33])
{
1 : DoubleTensor - size: 3x96x96
2 : 1
}

th> redChannel = trainset.data:select(2, 1)

th> print(#redChannel)
17299
96
96
[torch.LongStorage of size 3]

th> mean = {} -- store the mean, to normalize the test set in the future

th> stdv = {} -- store the standard-deviation for the future

th> for i=1,3 do -- over each image channel
..> mean[i] = trainset.data:select(2, 1):mean() -- mean estimation
..> print('Channel ' .. i .. ', Mean: ' .. mean[i])
..> trainset.data:select(2, 1):add(-mean[i]) -- mean subtraction
..>
..> stdv[i] = trainset.data:select(2, i):std() -- std estimation
..> print('Channel ' .. i .. ', Standard Deviation: ' .. stdv[i])
..> trainset.data:select(2, i):div(stdv[i]) -- std scaling
..> end
Channel 1, Mean: 0
Channel 1, Standard Deviation: 0
Channel 2, Mean: nan
Channel 2, Standard Deviation: 0
Channel 3, Mean: nan
Channel 3, Standard Deviation: 0
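In the loop above, the mean is computed and subtracted with `select(2, 1)` on every iteration, so channel 1 is shifted three times while channels 2 and 3 never have their mean removed; the per-channel version would use `select(2, i)`. A standard deviation of 0 also means every value in that channel is identical (channel 1 additionally has mean 0, so it appears to be all zeros), and dividing by it produces nan, which then leaks into every later mean. It may be worth checking how `animals_peoples2.t7` was built. The sketch below shows the intended per-channel logic in plain Lua, assuming each channel's pixel values have been gathered into a flat array (in Torch the same steps would operate on `trainset.data:select(2, i)`); `normalizeChannels` is a hypothetical helper.

```lua
-- Hypothetical helper: channels[i] is a flat array holding every pixel value
-- of channel i. Each channel is normalized with its OWN mean and standard
-- deviation, and the division is skipped when the deviation is 0.
local function normalizeChannels(channels)
  local mean, stdv = {}, {}
  for i, values in ipairs(channels) do
    local n, sum = #values, 0
    for _, v in ipairs(values) do
      sum = sum + v
    end
    mean[i] = sum / n

    -- mean subtraction for channel i, accumulating squared deviations
    local sq = 0
    for j, v in ipairs(values) do
      values[j] = v - mean[i]
      sq = sq + values[j] ^ 2
    end

    -- unbiased standard deviation estimate (divides by n - 1)
    stdv[i] = math.sqrt(sq / (n - 1))

    -- a constant channel has stdv 0; dividing would give 0/0 = nan
    if stdv[i] > 0 then
      for j, v in ipairs(values) do
        values[j] = v / stdv[i]
      end
    end
  end
  return mean, stdv
end

local channels = { {1, 2, 3}, {5, 5, 5} }
local mean, stdv = normalizeChannels(channels)
for i = 1, #channels do
  print('Channel ' .. i .. ', Mean: ' .. mean[i] .. ', Standard Deviation: ' .. stdv[i])
end
```

Keeping the guard on the standard deviation means a degenerate channel stays at 0 instead of turning the whole tensor into nan.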

th> net = nn.Sequential()
th> net:add(nn.SpatialConvolution(3, 6, 9, 9)) -- 3 input image channels, 6 output channels, 9x9 convolution kernel
th> net:add(nn.ReLU()) -- non-linearity
th> net:add(nn.SpatialMaxPooling(2,2,2,2)) -- A max-pooling operation that looks at 2x2 windows and finds the max.
th> net:add(nn.SpatialConvolution(6, 16, 9, 9))
th> net:add(nn.ReLU()) -- non-linearity
th> net:add(nn.SpatialMaxPooling(2,2,2,2))
th> net:add(nn.View(16*9*9)) -- reshapes from a 3D tensor of 16x5x5 into 1D tensor of 16*5*5
th> net:add(nn.Linear(16*9*9, 120)) -- fully connected layer (matrix multiplication between input and weights)
th> net:add(nn.ReLU()) -- non-linearity
th> net:add(nn.Linear(120, 84))
th> net:add(nn.ReLU()) -- non-linearity
th> net:add(nn.Linear(84, 10)) -- 10 is the number of outputs of the network (in this case, 10 digits)
th> net:add(nn.LogSoftMax()) -- converts the output to a log-probability. Useful for classification problems

th> criterion = nn.ClassNLLCriterion()

th> trainer = nn.StochasticGradient(net, criterion)
th> trainer.learningRate = 0.001
th> trainer.maxIteration = 5 -- just do 5 epochs of training.

th> trainer:train(trainset)


StochasticGradient: training

/root/facedetect/torch/install/share/lua/5.1/nn/THNN.lua:110: Assertion `THIndexTensor_(size)(target, 0) == batch_size' failed. at /tmp/luarocks_nn-scm-1-1625/nn/lib/THNN/generic/ClassNLLCriterion.c:50
stack traceback:
[C]: in function 'v'
/root/facedetect/torch/install/share/lua/5.1/nn/THNN.lua:110: in function 'ClassNLLCriterion_updateOutput'
...ect/torch/install/share/lua/5.1/nn/ClassNLLCriterion.lua:43: in function 'forward'
...ct/torch/install/share/lua/5.1/nn/StochasticGradient.lua:35: in function 'train'
[string "_RESULT={trainer:train(trainset)}"]:1: in main chunk
[C]: in function 'xpcall'
/root/facedetect/torch/install/share/lua/5.1/trepl/init.lua:661: in function 'repl'
...tect/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:199: in main chunk
[C]: at 0x004064f0
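The assertion is raised by nn.ClassNLLCriterion when the number of prediction rows differs from the number of targets. With 9x9 kernels, each 96x96 image reaches nn.View as a 16x18x18 tensor (5184 values). nn.View accepts any input whose element count is a multiple of the requested size and treats the quotient as a mini-batch size, so a view size such as 16*9*9 = 1296 (apparently what was typed instead of the displayed 1699, per a later comment) turns one image into 4 rows, while StochasticGradient supplies a single label. The plain-Lua sketch below imitates that divisibility check; it is a simplified model written for illustration, not nn.View's actual source, and `viewBatchSize` is a hypothetical name.

```lua
-- Simplified model of nn.View's batch-size inference (illustration only):
-- the input's element count must be a multiple of the requested view size,
-- and the quotient is interpreted as the mini-batch size.
local function viewBatchSize(inputDims, viewSize)
  local total = 1
  for _, d in ipairs(inputDims) do
    total = total * d
  end
  if total % viewSize ~= 0 then
    error(string.format('input view (%s) and desired view (%d) do not match',
                        table.concat(inputDims, 'x'), viewSize))
  end
  return math.floor(total / viewSize)
end

-- One 16x18x18 feature map (9x9 kernels) viewed as 16*9*9 values:
print(viewBatchSize({16, 18, 18}, 16 * 9 * 9))  -- 4 rows, but only 1 label
-- The view size that matches a single image:
print(viewBatchSize({16, 18, 18}, 5184))         -- 1
```

When the sizes do not divide evenly, the same check produces the "input view ... and desired view ... do not match" error seen later in this thread instead of the criterion assertion.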

mhmtsarigul · 2016-11-30T08:24:13Z

You have 2 classes, but your LogSoftMax produces 10 outputs.

ilichev-andrey · 2016-11-30T08:52:36Z

I changed the line net:add(nn.Linear(84, 10)) to net:add(nn.Linear(84, 2)), but the error remains.
Please help, I'm new to this.

mhmtsarigul · 2016-11-30T10:22:15Z

Your batch size is very big; that could be the problem. Try a smaller batch.

ilichev-andrey · 2016-11-30T13:27:13Z

It does not work. Could you help me with the correct values?
My goal is for the neural network to distinguish humans from animals.

mhmtsarigul · 2016-11-30T13:40:06Z

net:add(nn.View(1699))

must be

net:add(nn.View(5184))

for 18*18*16.

I don't know exactly what the problem is. The message says something about batch size. Is this the full error message?
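For reference, this is how 5184 comes out for the 9x9 kernels in the first post, assuming stride-1 convolutions without padding and 2x2 max-pooling with stride 2 (each pooling halves the size, rounding down):

```latex
96 \xrightarrow{\text{conv } 9\times 9} 88
   \xrightarrow{\text{pool } 2\times 2} 44
   \xrightarrow{\text{conv } 9\times 9} 36
   \xrightarrow{\text{pool } 2\times 2} 18,
\qquad 16 \times 18 \times 18 = 5184
```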

ilichev-andrey · 2016-11-30T14:35:40Z

New error!

nn.Sequential {
[input -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> (11) -> (12) -> (13) -> output]
(1): nn.SpatialConvolution(3 -> 6, 18x18)
(2): nn.ReLU
(3): nn.SpatialMaxPooling(2x2, 2,2)
(4): nn.SpatialConvolution(6 -> 16, 18x18)
(5): nn.ReLU
(6): nn.SpatialMaxPooling(2x2, 2,2)
(7): nn.View(5184)
(8): nn.Linear(5184 -> 120)
(9): nn.ReLU
(10): nn.Linear(120 -> 84)
(11): nn.ReLU
(12): nn.Linear(84 -> 2)
(13): nn.LogSoftMax
}
[0.0371s]
th> criterion = nn.ClassNLLCriterion()
[0.0761s]
th>
[0.0000s]
th> trainer = nn.StochasticGradient(net, criterion)
[0.0001s]
th> trainer.learningRate = 0.001
[0.0000s]
th> trainer.maxIteration = 5
[0.0000s]
th> trainer:train(trainset)

StochasticGradient: training

.../facedetect/torch/install/share/lua/5.1/nn/Container.lua:67:
In 7 module of nn.Sequential:
/root/facedetect/torch/install/share/lua/5.1/nn/View.lua:47: input view (16x11x11) and desired view (5184) do not match
stack traceback:
[C]: in function 'error'
/root/facedetect/torch/install/share/lua/5.1/nn/View.lua:47: in function 'batchsize'
/root/facedetect/torch/install/share/lua/5.1/nn/View.lua:79: in function </root/facedetect/torch/install/share/lua/5.1/nn/View.lua:77>
[C]: in function 'xpcall'
.../facedetect/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
...facedetect/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
...ct/torch/install/share/lua/5.1/nn/StochasticGradient.lua:35: in function 'train'
[string "_RESULT={trainer:train(trainset)}"]:1: in main chunk
[C]: in function 'xpcall'
/root/facedetect/torch/install/share/lua/5.1/trepl/init.lua:661: in function 'repl'
...tect/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:199: in main chunk
[C]: at 0x004064f0

WARNING: If you see a stack trace below, it doesn't point to the place where this error occurred. Please use only the one above.
stack traceback:
[C]: in function 'error'
.../facedetect/torch/install/share/lua/5.1/nn/Container.lua:67: in function 'rethrowErrors'
...facedetect/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
...ct/torch/install/share/lua/5.1/nn/StochasticGradient.lua:35: in function 'train'
[string "_RESULT={trainer:train(trainset)}"]:1: in main chunk
[C]: in function 'xpcall'
/root/facedetect/torch/install/share/lua/5.1/trepl/init.lua:661: in function 'repl'
...tect/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:199: in main chunk
[C]: at 0x004064f0
[0.3159s]

At the beginning, the code was actually:
th> net:add(nn.View(16*9*9)), not "net:add(nn.View(1699))"
th> net:add(nn.Linear(16*9*9, 120)), not "net:add(nn.Linear(1699, 120))"

mhmtsarigul · 2016-11-30T14:44:35Z

Ok. 16 x 11 x 11 would be correct.
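The printed network uses 18x18 kernels. Under the same assumptions (stride-1 convolutions without padding, 2x2/stride-2 pooling that rounds down), the arithmetic agrees with the 16x11x11 reported by nn.View, so both nn.View and the first nn.Linear need 16*11*11 = 1936 as long as the kernels stay 18x18:

```latex
96 \xrightarrow{\text{conv } 18\times 18} 79
   \xrightarrow{\text{pool } 2\times 2} 39
   \xrightarrow{\text{conv } 18\times 18} 22
   \xrightarrow{\text{pool } 2\times 2} 11,
\qquad 16 \times 11 \times 11 = 1936
```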

ilichev-andrey · 2016-11-30T14:55:04Z

I replaced these lines:

th> net:add(nn.SpatialConvolution(3, 6, 9, 9))
th> net:add(nn.SpatialConvolution(6, 16, 9, 9))
th> net:add(nn.View(16*9*9))
th> net:add(nn.Linear(16*9*9, 120))

and rebuilt the network from the beginning:

net = nn.Sequential()
net:add(nn.SpatialConvolution(3, 6, 11, 11))
net:add(nn.ReLU())
net:add(nn.SpatialMaxPooling(2,2,2,2))
net:add(nn.SpatialConvolution(6, 16, 11, 11))
net:add(nn.ReLU())
net:add(nn.SpatialMaxPooling(2,2,2,2))
net:add(nn.View(16*11*11))
net:add(nn.Linear(16*11*11, 120))
net:add(nn.ReLU())
net:add(nn.Linear(120, 84))
net:add(nn.ReLU())
net:add(nn.Linear(84, 2))
net:add(nn.LogSoftMax())

Error:
th> trainer:train(trainset)

StochasticGradient: training

.../facedetect/torch/install/share/lua/5.1/nn/Container.lua:67:
In 7 module of nn.Sequential:
/root/facedetect/torch/install/share/lua/5.1/nn/View.lua:47: input view (16x16x16) and desired view (1936) do not match
stack traceback:
[C]: in function 'error'
/root/facedetect/torch/install/share/lua/5.1/nn/View.lua:47: in function 'batchsize'
/root/facedetect/torch/install/share/lua/5.1/nn/View.lua:79: in function </root/facedetect/torch/install/share/lua/5.1/nn/View.lua:77>
[C]: in function 'xpcall'
.../facedetect/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
...facedetect/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
...ct/torch/install/share/lua/5.1/nn/StochasticGradient.lua:35: in function 'train'
[string "_RESULT={trainer:train(trainset)}"]:1: in main chunk
[C]: in function 'xpcall'
/root/facedetect/torch/install/share/lua/5.1/trepl/init.lua:661: in function 'repl'
...tect/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:199: in main chunk
[C]: at 0x004064f0

WARNING: If you see a stack trace below, it doesn't point to the place where this error occurred. Please use only the one above.
stack traceback:
[C]: in function 'error'
.../facedetect/torch/install/share/lua/5.1/nn/Container.lua:67: in function 'rethrowErrors'
...facedetect/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
...ct/torch/install/share/lua/5.1/nn/StochasticGradient.lua:35: in function 'train'
[string "_RESULT={trainer:train(trainset)}"]:1: in main chunk
[C]: in function 'xpcall'
/root/facedetect/torch/install/share/lua/5.1/trepl/init.lua:661: in function 'repl'
...tect/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:199: in main chunk
[C]: at 0x004064f0

Then I wrote 16*16*16 and got the next error:
input view (16x12x12) and desired view (4096) do not match

Maybe I'm changing the value in the wrong place?
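The flattened size depends on the kernel size, so it has to be recomputed whenever the convolution layers change. With 11x11 kernels and the same pooling (note the rounding down from 33 to 16), the 16x16x16 in the error checks out:

```latex
96 \xrightarrow{\text{conv } 11\times 11} 86
   \xrightarrow{\text{pool } 2\times 2} 43
   \xrightarrow{\text{conv } 11\times 11} 33
   \xrightarrow{\text{pool } 2\times 2} \lfloor 33/2 \rfloor = 16,
\qquad 16 \times 16 \times 16 = 4096
```

The later message (input view 16x12x12) suggests the kernel size had changed again; for that configuration nn.View and the first nn.Linear would need 16*12*12 = 2304.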

ilichev-andrey · 2016-11-30T16:27:40Z

How do I determine these parameters?

The 5x5 convolution kernel:
net:add(nn.SpatialConvolution(3, 6, 5, 5))

The reshape from a 3D tensor of 16x5x5 into a 1D tensor of 16*5*5:
net:add(nn.View(16*5*5))

mhmtsarigul · 2016-12-01T17:24:22Z

A convolution with an m x n filter (stride 1, no padding) makes your images smaller by m-1 rows and n-1 columns. Subsampling (pooling) divides the size, e.g. by 2 for 2x2 pooling with stride 2. Your images are 96x96, so a 5x5 convolution without padding makes them 92x92. Read up on convolution and pooling.
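The rule above can be turned into a small helper. This is a sketch in plain Lua (no Torch needed), assuming the layer layout used throughout this thread: two stride-1 convolutions without padding, each followed by 2x2 max-pooling with stride 2, which in Torch rounds down unless ceil mode is enabled. The names `convOut`, `poolOut` and `flattenedSize` are made up for illustration.

```lua
-- Spatial size after a k x k convolution with stride 1 and no padding
local function convOut(size, k)
  return size - k + 1
end

-- Spatial size after k x k max-pooling with stride s, rounding down
local function poolOut(size, k, s)
  return math.floor((size - k) / s) + 1
end

-- Number of values reaching nn.View for conv -> pool -> conv -> pool
-- on square images; returns the total and the final spatial size.
local function flattenedSize(imageSize, kernel, channels)
  local s = convOut(imageSize, kernel)
  s = poolOut(s, 2, 2)
  s = convOut(s, kernel)
  s = poolOut(s, 2, 2)
  return channels * s * s, s
end

for _, k in ipairs({5, 9, 11, 18}) do
  local n, s = flattenedSize(96, k, 16)
  print(string.format('kernel %dx%d: 16x%dx%d -> nn.View(%d)', k, k, s, s, n))
end
-- kernel 5x5: 16x21x21 -> nn.View(7056)
-- kernel 9x9: 16x18x18 -> nn.View(5184)
-- kernel 11x11: 16x16x16 -> nn.View(4096)
-- kernel 18x18: 16x11x11 -> nn.View(1936)
```

The number printed for a given kernel size is the value to pass to both nn.View and the first nn.Linear; for the 5x5 kernels asked about above, that would be 16*21*21 = 7056 rather than 16*5*5.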
