Conceptual Question on Video Bottleneck Models #82
Which model exactly are you referring to? Neither the first example you provide nor the second is the one we implemented. In our bottleneck blocks we always have something like the following; let's start with the 3D case:
Note that this is the design from ResNet [He et al., CVPR'16], where there are about 4x as many 1x1 filters as in the middle 3x3 layer; that's why it is named a bottleneck. The above is a naive extension to the 3D case. When we move to (2+1)D, we design the (2+1)D block to match the parameter count of the 3x3x3 3D conv, so it becomes:
So virtually, the (2+1)D block is designed to replace the 3x3x3 conv at the same parameter/FLOPs cost; obviously it has increased memory overhead. M_i is specified in the paper as well as in the code. Hope this helps.
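For concreteness, here is a minimal Python sketch of that parameter matching (an illustration of the formula described above, not code taken from the repo):

```python
def midplanes(n_in, n_out, t=3, d=3):
    # M_i: intermediate planes chosen so the (2+1)D pair matches the
    # parameter count of a full t x d x d 3D conv (biases ignored)
    return (t * d * d * n_in * n_out) // (d * d * n_in + t * n_out)

def params_3d(n_in, n_out, t=3, d=3):
    return t * d * d * n_in * n_out            # one t x d x d conv

def params_2plus1d(n_in, n_out, t=3, d=3):
    m = midplanes(n_in, n_out, t, d)
    return d * d * n_in * m + t * m * n_out    # 1xdxd conv + tx1x1 conv

# e.g. a 64 -> 64 middle conv: both designs cost the same ~110k parameters
print(params_3d(64, 64), params_2plus1d(64, 64))  # 110592 110592
```

The extra memory overhead comes from materializing the M_i intermediate feature maps between the spatial and temporal convolutions.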
Looking at the official implementation [1], I don't think that's correct; specifically, it looks like the naive bottleneck is the following:
If you look at the output of that code (omitting BNs and ReLUs), the middle bottleneck for the naive ResNet-152 layer 1 looks something like the following:
But this is beside the point. Since we know that M_i = floor((t * d^2 * N_{i-1} * N_i) / (d^2 * N_{i-1} + t * N_i)), the actual M_i that I see in the weight dimensions is 144, which corresponds to that formula with t = 3 hard-coded.
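A quick sanity check of that number (assuming the usual 64-channel width of the first residual stage, so N_{i-1} = N_i = 64, with t = d = 3):

```python
# Hypothetical widths for the first stage's middle conv: 64 -> 64
t, d, n_in, n_out = 3, 3, 64, 64
m = (t * d * d * n_in * n_out) // (d * d * n_in + t * n_out)
print(m)  # 110592 // 768 = 144
```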
It is exactly the formula, except that here I hard-coded t=3. To conclude, it is pretty much your design choice how many filters you want to use for those layers. My suggestions are:
Sounds good, thanks for the clarification.
Hi @dutran et al,
I was wondering: is not including the block expansion in the spatio-temporal convolution by design?
Specifically, let's say VideoModelBuilder makes a model where layer_2 has 2 bottleneck layers (for illustration purposes). Then the graph from VMZ looks like:
Whereas, according to the block formula, one would expect the middle layer of the second bottleneck to be expanded as well, i.e.:
Note that the difference is in the separated conv layer, specifically the midplanes computation.
In the original paper the midplanes formula is specified, but it's ambiguous whether N_i refers to blocks or layers.
Just as a side note, I found that models with the block expansion accounted for get approx. 0.9% better results on Kinetics with R(2+1)D-50, but the setup was different from the one specified in the paper, so it's not really an apples-to-apples comparison.
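To make the ambiguity concrete, here is a toy comparison of the two readings for the second bottleneck (hypothetical widths: base width 64 with the standard 4x expansion, t = d = 3; this is just my illustration, not VMZ code):

```python
def midplanes(n_in, n_out, t=3, d=3):
    # midplanes formula from the paper, applied to the separated conv
    return (t * d * d * n_in * n_out) // (d * d * n_in + t * n_out)

# Second bottleneck of the stage, hypothetical widths:
print(midplanes(64, 64))    # 144 if N_{i-1} is the un-expanded width
print(midplanes(256, 64))   # 177 if the 4x-expanded block input is used
```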
Cheers,
Bruno