Segmentation fault at resnet18 when running the benchmark on a ThunderX2 machine #5605

Closed
violet73 opened this issue Jul 28, 2024 · 4 comments

@violet73
Contributor

violet73 commented Jul 28, 2024

error log

context (build/runtime environment)

cd ncnn; mkdir build; cd build; cmake .. -G Ninja; cmake --build .

how to reproduce

  1. Run the benchmark directly: cd benchmark; ../build/benchmark/benchncnn 4 1

more

I debugged a debug build with gdb. The crash is at src/layer/arm/convolution_im2col_gemm.h:ncnn::convolution_gemm_transB_packed_tile:3405: "ld1 {v4.4s, v5.4s, v6.4s, v7.4s}, [%1], #64 \n"
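
For readers less familiar with AArch64: the faulting instruction loads four 128-bit NEON registers (4 floats each, so 16 floats or 64 bytes in total) from the address held in operand %1, then post-increments %1 by 64 bytes. A fault on this line therefore means some address in the 64-byte range starting at %1 was not readable. The following is a minimal portable C++ sketch of that behaviour, assuming a plain float buffer; `Float4` and `ld1_x4_post64` are hypothetical names, and the bounds check stands in for the hardware fault.

```cpp
#include <array>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical model of one 128-bit NEON register holding 4 floats (.4s).
struct Float4 { float lane[4]; };

// Hypothetical portable model of:
//   ld1 {v4.4s, v5.4s, v6.4s, v7.4s}, [%1], #64
// 'pos' plays the role of %1, counted in floats instead of bytes.
// Instead of segfaulting, an out-of-range read throws.
std::array<Float4, 4> ld1_x4_post64(const std::vector<float>& buf, std::size_t& pos)
{
    const std::size_t kFloats = 16; // 4 registers * 4 lanes = 64 bytes
    if (pos + kFloats > buf.size())
        throw std::out_of_range("ld1 would read past the end of the buffer");

    std::array<Float4, 4> regs;
    std::memcpy(regs.data(), buf.data() + pos, kFloats * sizeof(float));
    pos += kFloats; // post-index: [%1], #64
    return regs;
}
```

Reading a 32-float buffer succeeds twice and advances `pos` to 32; a third call would need floats 32..47 and is rejected, which is the portable analogue of the pointer running off the end of a packed tile.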

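I enabled logging in the convolution_im2col_gemm function in the same file and found that for the earlier test cases that passed, K and TILE_K were the same, but for resnet18 it printed K == 2*TILE_K, so I suspect the problem is related to TILE_K. Forcing TILE_K = K lets every case in the benchmark pass.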
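
To make the K-tiling observation concrete: when a GEMM is blocked along K, the reduction is split into ceil(K / TILE_K) passes, and every pass after the first adds onto the partial sums already in C. With K == 2*TILE_K there are exactly two passes; with TILE_K = K there is only one, so forcing TILE_K = K avoids any code path that only runs when there are multiple K tiles. The following is a plain scalar C++ sketch of K blocking, not ncnn's packed NEON kernel; `gemm_k_tiled` and `gemm_naive` are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical illustration: C (M x N) = A (M x K) * B (K x N), row-major,
// with the K dimension processed in blocks of tile_k.
std::vector<float> gemm_k_tiled(const std::vector<float>& A, const std::vector<float>& B,
                                int M, int N, int K, int tile_k)
{
    std::vector<float> C(static_cast<std::size_t>(M) * N, 0.0f);
    for (int k0 = 0; k0 < K; k0 += tile_k)
    {
        const int k1 = std::min(k0 + tile_k, K); // the last tile may be shorter
        for (int i = 0; i < M; ++i)
            for (int j = 0; j < N; ++j)
            {
                float partial = 0.0f;
                for (int k = k0; k < k1; ++k)
                    partial += A[i * K + k] * B[k * N + j];
                C[i * N + j] += partial; // later K tiles accumulate onto earlier ones
            }
    }
    return C;
}

// Reference GEMM without K blocking, for comparison.
std::vector<float> gemm_naive(const std::vector<float>& A, const std::vector<float>& B,
                              int M, int N, int K)
{
    std::vector<float> C(static_cast<std::size_t>(M) * N, 0.0f);
    for (int i = 0; i < M; ++i)
        for (int j = 0; j < N; ++j)
            for (int k = 0; k < K; ++k)
                C[i * N + j] += A[i * K + k] * B[k * N + j];
    return C;
}
```

With small integer-valued inputs, `gemm_k_tiled(A, B, M, N, K, K)`, `gemm_k_tiled(A, B, M, N, K, K / 2)` and the untiled reference all produce identical results: in a correct kernel TILE_K changes only performance, never output.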

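But forcing no tiling along the K dimension is obviously not a reasonable fix. So I also ran it on a Kunpeng machine with the same build and run steps, and the problem does not occur there, which is very confusing.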

@violet73
Contributor Author

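Update: I copied the binary that crashes on ThunderX2 over to the Kunpeng machine, and it also runs the whole benchmark there without problems. I don't get it; I'm completely stumped.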
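
One possible explanation for the same binary behaving differently, offered only as an assumption since this thread does not confirm it: if tile sizes are chosen at runtime from a detected cache size, two CPUs can pick different TILE_K values for the same layer, so only one of them reaches the K == 2*TILE_K path. The following sketch shows that effect with a hypothetical `choose_tile_k` heuristic; the formula is illustrative and is not ncnn's actual tile-selection code.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical heuristic: pick tile_k so that an (tile_m x tile_k) block of A
// plus a (tile_k x tile_n) block of B fit in cache_bytes of float32 data.
// Illustrative only; NOT the real ncnn formula.
int choose_tile_k(int K, int tile_m, int tile_n, std::size_t cache_bytes)
{
    const std::size_t floats = cache_bytes / sizeof(float);
    // Solve tile_k * (tile_m + tile_n) <= floats for tile_k.
    int tile_k = static_cast<int>(floats / static_cast<std::size_t>(tile_m + tile_n));
    tile_k = std::max(tile_k / 4 * 4, 4); // round down to a multiple of 4 lanes
    return std::min(tile_k, K);           // never larger than K itself
}
```

With K = 512 and 64 x 64 output tiles, a 1 MiB budget yields TILE_K = 512 (a single K tile), while a 128 KiB budget yields TILE_K = 256, i.e. K == 2*TILE_K, even though the binary is identical.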

@nihui nihui added the bug label Aug 16, 2024
@nihui
Member

nihui commented Aug 16, 2024

Reproduced; issue confirmed.

@nihui
Member

nihui commented Aug 16, 2024

#5631

@nihui
Member

nihui commented Aug 16, 2024

Fixed, thanks for the report!

@nihui nihui closed this as completed Aug 16, 2024