fair1m_1_5 baseline fails to run #49
Comments
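This may be caused by insufficient GPU memory; a run needs roughly 12G of GPU memory. A GPU with more than 12G is recommended.
Thanks for your reply. After changing batch_size=1 in configs/s2anet/s2anet_r50_fpn_1x_fair1m_1_5.py, the error is still there. I also ran a few tests based on suggestions from the forum.
Roughly how much GPU memory does your machine have?
I'm on a laptop with a 1060-6g, which is a long way from the 12g you mentioned 😂
Then you could try a lighter model, or rent a server for training.
My 2080Ti with 11G of GPU memory can't run it either; same problem.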
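To compare a machine against the roughly 12G figure mentioned above, the total and free GPU memory can be read from `nvidia-smi`. The following is a minimal sketch, assuming `nvidia-smi` is on the PATH; the helper names `parse_gpu_memory` and `query_gpu_memory` are hypothetical and not part of JDet or Jittor.

```python
import subprocess


def parse_gpu_memory(csv_text):
    """Parse 'total, used' rows (in MiB) from nvidia-smi CSV output."""
    rows = []
    for line in csv_text.strip().splitlines():
        total, used = (int(field.strip()) for field in line.split(","))
        rows.append({"total_mib": total, "used_mib": used, "free_mib": total - used})
    return rows


def query_gpu_memory():
    """Run nvidia-smi and return one dict of memory figures per GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total,memory.used",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_memory(result.stdout)
```

Calling `query_gpu_memory()` before starting training shows how much memory is already taken by other processes (for example the desktop session), which matters on cards that are close to the limit.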
Test environment
windows10 21H2
wsl2 ubuntu 22.04 LTS 4.19.128-microsoft-standard
miniconda3+python3.7
cuda 11.7, GPU model: 1060
Errors
With CUDA
Loading config from: configs/s2anet/s2anet_r50_fpn_1x_fair1m_1_5.py
[w 0809 11:21:49.947748 32 init.py:1344] load parameter fc.weight failed ...
[w 0809 11:21:49.947908 32 init.py:1344] load parameter fc.bias failed ...
[w 0809 11:21:50.017176 32 init.py:1363] load total 267 params, 2 failed
Tue Aug 9 11:21:50 2022 Start running
Traceback (most recent call last):
File "tools/run_net.py", line 56, in
main()
File "tools/run_net.py", line 47, in main
runner.run()
File "/home/la/JT/JDet/python/jdet/runner/runner.py", line 84, in run
self.train()
File "/home/la/JT/JDet/python/jdet/runner/runner.py", line 126, in train
losses = self.model(images,targets)
File "/home/la/miniconda3/envs/JT/lib/python3.7/site-packages/jittor/init.py", line 950, in call
return self.execute(*args, **kw)
File "/home/la/JT/JDet/python/jdet/models/networks/s2anet.py", line 35, in execute
outputs = self.bbox_head(features, targets)
File "/home/la/miniconda3/envs/JT/lib/python3.7/site-packages/jittor/init.py", line 950, in call
return self.execute(*args, **kw)
File "/home/la/JT/JDet/python/jdet/models/roi_heads/s2anet_head.py", line 627, in execute
return self.loss(*outs,*self.parse_targets(targets))
File "/home/la/JT/JDet/python/jdet/models/roi_heads/s2anet_head.py", line 360, in loss
sampling=self.sampling)
File "/home/la/JT/JDet/python/jdet/models/boxes/anchor_target.py", line 74, in anchor_target
unmap_outputs=unmap_outputs)
File "/home/la/JT/JDet/python/jdet/utils/general.py", line 53, in multi_apply
return tuple(map(list, zip(*map_results)))
File "/home/la/JT/JDet/python/jdet/models/boxes/anchor_target.py", line 127, in anchor_target_single
if not inside_flags.any(0):
File "/home/la/miniconda3/envs/JT/lib/python3.7/site-packages/jittor/init.py", line 1735, in to_bool
return ori_bool(v.item())
RuntimeError: [f 0809 11:21:58.030555 32 executor.cc:665]
Execute fused operator(26/2009) failed.
[JIT Source]: /home/la/.cache/jittor/jt1.3.5/g++11.2.0/py3.7.13/Linux-4.19.128xc4/IntelRCoreTMi5xbf/default/cu11.7.99/jit/__opkey0_broadcast_to__Tx_float32__DIM_7__BCAST_19__opkey1_reindex__Tx_float32__XDIM_4__YD___hash_8f91e55bdd99985a_op.cc
[OP TYPE]: fused_op:( broadcast_to, reindex, binary.multiply, reduce.add,)
[Input]: float32[64,64,3,3,]backbone.layer1.0.conv2.weight, float32[2,64,256,256,],
[Output]: float32[2,64,256,256,],
[Async Backtrace]: ---
tools/run_net.py:56 <>
tools/run_net.py:47
/home/la/JT/JDet/python/jdet/runner/runner.py:84
/home/la/JT/JDet/python/jdet/runner/runner.py:126
/home/la/miniconda3/envs/JT/lib/python3.7/site-packages/jittor/init.py:950 <call>
/home/la/JT/JDet/python/jdet/models/networks/s2anet.py:30
/home/la/miniconda3/envs/JT/lib/python3.7/site-packages/jittor/init.py:950 <call>
/home/la/JT/JDet/python/jdet/models/backbones/resnet.py:166
/home/la/miniconda3/envs/JT/lib/python3.7/site-packages/jittor/init.py:950 <call>
/home/la/miniconda3/envs/JT/lib/python3.7/site-packages/jittor/nn.py:2054
/home/la/miniconda3/envs/JT/lib/python3.7/site-packages/jittor/init.py:950 <call>
/home/la/JT/JDet/python/jdet/models/backbones/resnet.py:84
/home/la/miniconda3/envs/JT/lib/python3.7/site-packages/jittor/init.py:950 <call>
/home/la/miniconda3/envs/JT/lib/python3.7/site-packages/jittor/nn.py:847
[Reason]: [f 0809 11:21:58.030132 32 helper_cuda.h:128] CUDA error at /home/la/miniconda3/envs/JT/lib/python3.7/site-packages/jittor/src/mem/allocator/cuda_managed_allocator.cc:23 code=2( cudaErrorMemoryAllocation ) cudaMallocManaged(&ptr, size)
With the --no_cuda argument added
Tue Aug 9 11:15:30 2022 Start running
Traceback (most recent call last):
File "tools/run_net.py", line 56, in
main()
File "tools/run_net.py", line 47, in main
runner.run()
File "/home/la/JT/JDet/python/jdet/runner/runner.py", line 84, in run
self.train()
File "/home/la/JT/JDet/python/jdet/runner/runner.py", line 126, in train
losses = self.model(images,targets)
File "/home/la/miniconda3/envs/JT/lib/python3.7/site-packages/jittor/init.py", line 950, in call
return self.execute(*args, **kw)
File "/home/la/JT/JDet/python/jdet/models/networks/s2anet.py", line 35, in execute
outputs = self.bbox_head(features, targets)
File "/home/la/miniconda3/envs/JT/lib/python3.7/site-packages/jittor/init.py", line 950, in call
return self.execute(*args, **kw)
File "/home/la/JT/JDet/python/jdet/models/roi_heads/s2anet_head.py", line 625, in execute
outs = multi_apply(self.forward_single, feats, self.anchor_strides)
File "/home/la/JT/JDet/python/jdet/utils/general.py", line 53, in multi_apply
return tuple(map(list, zip(*map_results)))
File "/home/la/JT/JDet/python/jdet/models/roi_heads/s2anet_head.py", line 236, in forward_single
align_feat = self.align_conv(x, refine_anchor.clone(), stride)
File "/home/la/miniconda3/envs/JT/lib/python3.7/site-packages/jittor/init.py", line 950, in call
return self.execute(*args, **kw)
File "/home/la/JT/JDet/python/jdet/models/roi_heads/s2anet_head.py", line 722, in execute
x = self.relu(self.deform_conv(x, offset_tensor))
File "/home/la/miniconda3/envs/JT/lib/python3.7/site-packages/jittor/init.py", line 950, in call
return self.execute(*args, **kw)
File "/home/la/JT/JDet/python/jdet/ops/dcn_v1.py", line 696, in execute
self.dilation, self.groups, self.deformable_groups)
File "/home/la/miniconda3/envs/JT/lib/python3.7/site-packages/jittor/init.py", line 1603, in apply
return func(*args, **kw)
File "/home/la/miniconda3/envs/JT/lib/python3.7/site-packages/jittor/init.py", line 1559, in call
ori_res = self.execute(*args)
File "/home/la/JT/JDet/python/jdet/ops/dcn_v1.py", line 589, in execute
raise NotImplementedError
NotImplementedError
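What I have tried
After some searching, code=2( cudaErrorMemoryAllocation ) appears to be memory-related: the error is raised when the program cannot get the memory it needs. Searching this repo, I found an issue with a similar error code, but I don't know how to fix it.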
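For a sense of scale, the size of the tensor named in the `[Output]` line of the CUDA log, `float32[2,64,256,256,]`, can be worked out by hand. The sketch below only does that arithmetic; the helper `tensor_bytes` is hypothetical. Note that one such tensor is small, so the allocation failure most likely reflects the cumulative memory held by the whole model and its activations rather than this single buffer.

```python
from functools import reduce
from operator import mul

# Bytes per element for a few common dtypes.
DTYPE_BYTES = {"float32": 4, "float16": 2, "int32": 4}


def tensor_bytes(shape, dtype="float32"):
    """Return the memory footprint in bytes of a dense tensor."""
    return reduce(mul, shape, 1) * DTYPE_BYTES[dtype]


# Shape taken from the [Output] line of the log above.
output_shape = (2, 64, 256, 256)
size = tensor_bytes(output_shape)
print(f"{size} bytes = {size / 2**20:.0f} MiB")  # 33554432 bytes = 32 MiB
```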
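I also don't really understand the error raised after adding the argument that disables CUDA; I only know that it means a subclass has not implemented an interface that the parent class requires it to implement.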
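A `NotImplementedError` can also be raised deliberately by a code path that was never written, not only by a missing subclass method. What `dcn_v1.py` actually does at line 589 is not shown in the traceback, so the following is only a hypothetical illustration of one common pattern: an operator that ships a GPU kernel and refuses to run on the CPU. The class `DeformConvSketch` and the function `run` are made up for this example.

```python
class DeformConvSketch:
    """Hypothetical stand-in for an operator that only has a CUDA kernel."""

    def __init__(self, use_cuda):
        self.use_cuda = use_cuda

    def execute(self, x):
        if not self.use_cuda:
            # No CPU implementation exists in this sketch, so fail loudly.
            raise NotImplementedError("CPU path is not implemented")
        return [v * 2 for v in x]  # placeholder for the real GPU computation


def run(op, x):
    """Call the operator and report an unsupported path instead of crashing."""
    try:
        return op.execute(x)
    except NotImplementedError as err:
        return f"unsupported: {err}"
```

If the real operator follows this pattern, running with `--no_cuda` would fail no matter how much memory is available, which would be consistent with the second traceback above.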