🐞 [Bug]: A lot of nodes with fake GPU cards on mainnet #2770

Closed
A-Harby opened this issue May 26, 2024 · 5 comments
Assignees: maayarosama
Labels: dashboard, type_bug (Something isn't working)
Milestone: 2.5.0

Comments

@A-Harby (Contributor) commented May 26, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Which package/s did you face the problem with?

Dashboard

What happened?

I tried deploying a VM on a node with a GPU after renting it, but the deployment failed repeatedly, and even when it succeeded I couldn't SSH into the VM.

Many of the nodes that report one or more GPUs either expose fake cards or fail when deploying to or connecting to them.

Steps To Reproduce

No response

Which network/s did you face the problem on?

Main

version

2.4.2

Twin ID/s

No response

Node ID/s

No response

Farm ID/s

No response

Contract ID/s

No response

Relevant screenshots/screen records

[screenshot]

Relevant log output

[
  {
    "version": 0,
    "contractId": 459604,
    "nodeId": 4771,
    "name": "vmjsu3m",
    "created": 1716720653,
    "status": "ok",
    "message": "",
    "flist": "https://hub.grid.tf/tf-official-vms/ubuntu-22.04.flist",
    "publicIP": null,
    "planetary": "301:7100:a1fd:e83f:e79f:4fc8:32b5:369c",
    "myceliumIP": "",
    "interfaces": [
      {
        "network": "nws8a98",
        "ip": "10.20.2.2"
      }
    ],
    "capacity": {
      "cpu": 1,
      "memory": 2048
    },
    "mounts": [
      {
        "name": "diskh72",
        "mountPoint": "/",
        "size": 26843545600,
        "state": "ok",
        "message": ""
      }
    ],
    "env": {
      "SSH_KEY": "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQCvmSYj5BKWkizdNyIziMfMm0CRrgq0UxFtBfBdArUMXSFkU30rtExbg6dlVFJCufw6UsCESm/QBSwlOyYKRdpsfVARcAw5G5OG0iX/22n+tYmCcWpmOJLJZaAmSynF1kwpBx3XDy40mFL6OMXjqFcU3DirbcucIL185XfAsTzrq3tDtSvmYPwbXMVMbs1ZAZValxOQNuaLty4qGD2awxVNZq4vtAzgpz3FFo3cs2g70jsGiPq/rsH+NxZHYmqvYnF/S0aPGBrbOe2yW8Cz0mqq5CzqbjYd+Qorkmt7uyJrlCBI43ky+pUuiAYeE1eJlWl5svA+ibWk5yYoa6svLUkaL1kcOMeZWLzDjL3OCrno9gglNp8zUkdSagHROoYlGgpc5fnzMGX4MKDLpHdE9xdk4NjpjB2Iq3hV3DamexXAhLEirLYv9fSilSyKVf/2kvYAa9o+NS4DgBVvXg9J54By/mAVsydD4lbpNe/2pX/WoFGCczmoQfWdHgJPYVCf2vU= harby@ahmed-Saleh-Harby"
    },
    "entrypoint": "/init.sh",
    "metadata": "",
    "description": "",
    "rootfs_size": 2147483648,
    "corex": false,
    "gpu": [
      "0000:01:00.0/10de/2488"
    ]
  }
]
A-Harby added the type_bug (Something isn't working) label May 26, 2024
ramezsaeed added this to the 2.5.0 milestone May 27, 2024
AhmedHanafy725 moved this to Accepted in 3.14.x May 27, 2024
maayarosama self-assigned this May 29, 2024
maayarosama moved this from Accepted to In Progress in 3.14.x May 29, 2024
maayarosama moved this from In Progress to Accepted in 3.14.x May 30, 2024
maayarosama moved this from Accepted to In Progress in 3.14.x May 30, 2024
@maayarosama (Contributor) commented:

Investigation and Solution:

It took me some time to set up mainnet locally, since some relay updates are not yet available on mainnet.

The issue is reproducible on the same node; I'm still investigating what's causing it.

@xmonader (Contributor) commented:

It could be a built-in GPU rather than an external one. @muhamadazmy, correct?

@muhamadazmy (Member) commented:

I don't believe this is a built-in GPU. You can always look up the model using the GPU identity string suffix: searching for 10de:2488 shows that this device is an RTX 3070.

https://www.techpowerup.com/vgabios/236303/gigabyte-rtx3070-8192-210526
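
For context, a minimal sketch of splitting the identity string into the vendor/device pair you would look up (assumption: parseGpuId is a hypothetical helper, not part of the grid client; the vendor/device meaning follows the public PCI ID database):

// Hypothetical helper: split a zos GPU identity string such as
// "0000:01:00.0/10de/2488" into PCI address, vendor id and device id.
// In the PCI ID database vendor 10de is NVIDIA, and device 2488 maps to a
// GA104-based GeForce RTX 3070 variant, matching the comment above.
function parseGpuId(id: string): { pciAddress: string; vendor: string; device: string } {
  const [pciAddress, vendor, device] = id.split("/");
  return { pciAddress, vendor, device };
}

// The GPU reported in the log output above:
console.log(parseGpuId("0000:01:00.0/10de/2488"));
// -> { pciAddress: "0000:01:00.0", vendor: "10de", device: "2488" }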

Failure to deploy can be due to other reasons. It would be very useful if we could collect the deployment status after failure (get the changes to the deployment from the node), not only the final status. This would give more details on what exactly happened.
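
As a rough illustration of that suggestion (assumptions: rmbRequest is a hypothetical transport helper standing in for whatever RMB client is already in use, and the "zos.deployment.changes" command name and payload shape should be verified against the node's RMB API):

// Sketch only: ask the node (addressed by its twin id) for the full change
// history of a deployment, so intermediate failure states are visible rather
// than only the final status.
async function getDeploymentChanges(
  rmbRequest: (twinId: number, cmd: string, payload: unknown) => Promise<unknown>,
  nodeTwinId: number,
  contractId: number,
): Promise<unknown> {
  // Assumed command name and payload shape; verify against the node version.
  return rmbRequest(nodeTwinId, "zos.deployment.changes", { contract_id: contractId });
}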

It would also be very useful to collect node logs at the time of the problem. Since I have no idea when this deployment was created, it's not possible to find the corresponding node logs.

In general the node logs look normal, and I don't believe the card is fake.

@maayarosama (Contributor) commented:

After some investigation, I noticed that the deployment with a GPU crashes; I opened an issue for this.

I also noticed that the VM isn't listed, but the contract doesn't get deleted.

maayarosama moved this from In Progress to Blocked in 3.14.x Jun 2, 2024
@AhmedHanafy725 (Contributor) commented:

The issue happened on the node side, so nothing can be done here.

github-project-automation bot moved this from Blocked to Done in 3.14.x Jun 10, 2024
Projects: 3.14.x (Status: Done)
Development: no branches or pull requests
6 participants