
🐞 [Bug]: When the machine crashes, it disappears from the deployment list #2827

Open
1 task done
maayarosama opened this issue Jun 2, 2024 · 3 comments
Labels
dashboard type_bug Something isn't working

Comments

@maayarosama
Contributor

Is there an existing issue for this?

  • I have searched the existing issues

which package/s did you face the problem with?

Dashboard

What happened?

I rented node 4771 and tried deploying a VM with a GPU on it, but I couldn't SSH into it. After some investigation, including retrieving the contract's changes, I noticed that the machine crashed and the contract's status became deleted. The deployment isn't shown in the deployment list, while the machine's contract, now deleted, is shown in my contracts list.

Steps To Reproduce

  1. Go to mainnet
  2. Rent node 4771
  3. Deploy a VM with a GPU on it
  4. Try to SSH into it

which network/s did you face the problem on?

Main

version

2.4.2

Twin ID/s

No response

Node ID/s

4771

Farm ID/s

No response

Contract ID/s

No response

Relevant screenshots/screen records

[Screenshot: 2024-06-02_11-30]
[Screenshot: 2024-06-02_11-46]

Relevant log output

[
  {
    version: 0,
    name: 'vmgpuDisk',
    type: 'zmount',
    data: { size: 107374182400 },
    metadata: '',
    description: 'test deploying VM with GPU via ts grid3 client',
    result: { created: 1717314878, state: 'init', message: '', data: null }
  },
  {
    version: 0,
    name: 'vmgpu',
    type: 'zmachine',
    data: {
      flist: 'https://hub.grid.tf/tf-official-vms/ubuntu-22.04.flist',
      network: [Object],
      size: 0,
      compute_capacity: [Object],
      mounts: [Array],
      entrypoint: '/',
      env: [Object],
      corex: false,
      gpu: [Array]
    },
    metadata: '',
    description: 'test deploying VM with GPU via ts grid3 client',
    result: { created: 1717314878, state: 'init', message: '', data: null }
  },
  {
    version: 0,
    name: 'vmgpuDisk',
    type: 'zmount',
    data: { size: 107374182400 },
    metadata: '',
    description: 'test deploying VM with GPU via ts grid3 client',
    result: { created: 1717314880, state: 'ok', message: '', data: [Object] }
  },
  {
    version: 0,
    name: 'vmgpu',
    type: 'zmachine',
    data: {
      flist: 'https://hub.grid.tf/tf-official-vms/ubuntu-22.04.flist',
      network: [Object],
      size: 0,
      compute_capacity: [Object],
      mounts: [Array],
      entrypoint: '/',
      env: [Object],
      corex: false,
      gpu: [Array]
    },
    metadata: '',
    description: 'test deploying VM with GPU via ts grid3 client',
    result: { created: 1717314887, state: 'ok', message: '', data: [Object] }
  },
  {
    version: 0,
    name: 'vmgpu',
    type: 'zmachine',
    data: {
      flist: 'https://hub.grid.tf/tf-official-vms/ubuntu-22.04.flist',
      network: [Object],
      size: 0,
      compute_capacity: [Object],
      mounts: [Array],
      entrypoint: '/',
      env: [Object],
      corex: false,
      gpu: [Array]
    },
    metadata: '',
    description: 'test deploying VM with GPU via ts grid3 client',
    result: {
      created: 1717314925,
      state: 'deleted',
      message: 'workload decommissioned by system, reason: deleting vm due to so many crashes',
      data: null
    }
  }
]
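The history above shows the `vmgpu` zmachine going from `init` to `ok` and then to `deleted` with a "decommissioned by system" message, while the `vmgpuDisk` zmount stays `ok`. A minimal TypeScript sketch of how a client could reduce such a history to each workload's latest state and flag system-decommissioned workloads, so that they could, for example, still be surfaced to the user. The `WorkloadEntry`/`WorkloadResult` interfaces and the `latestByName`/`systemDecommissioned` helpers are hypothetical, modelled only on the fields visible in this log; they are not the grid3 client's or the dashboard's actual types, and the `data` payloads are replaced with `null`.

```typescript
// Hypothetical shapes mirroring only the fields visible in the logged history.
interface WorkloadResult {
  created: number; // Unix timestamp in seconds
  state: string; // e.g. "init", "ok", "deleted"
  message: string;
  data: unknown;
}

interface WorkloadEntry {
  version: number;
  name: string;
  type: string; // e.g. "zmount", "zmachine"
  result: WorkloadResult;
}

// Keep only the most recent entry for each workload name.
function latestByName(history: WorkloadEntry[]): Map<string, WorkloadEntry> {
  const latest = new Map<string, WorkloadEntry>();
  for (const entry of history) {
    const prev = latest.get(entry.name);
    // ">=" lets a later log entry with the same timestamp win.
    if (!prev || entry.result.created >= prev.result.created) {
      latest.set(entry.name, entry);
    }
  }
  return latest;
}

// Assumption: the prefix below is copied from this log, not from a documented contract.
const SYSTEM_DECOMMISSION_PREFIX = "workload decommissioned by system";

// Workloads whose latest state is "deleted" because the system decommissioned them.
function systemDecommissioned(history: WorkloadEntry[]): WorkloadEntry[] {
  return Array.from(latestByName(history).values()).filter(
    (e) =>
      e.result.state === "deleted" &&
      e.result.message.startsWith(SYSTEM_DECOMMISSION_PREFIX),
  );
}

// The history from this issue, trimmed to the fields declared above.
const history: WorkloadEntry[] = [
  { version: 0, name: "vmgpuDisk", type: "zmount", result: { created: 1717314878, state: "init", message: "", data: null } },
  { version: 0, name: "vmgpu", type: "zmachine", result: { created: 1717314878, state: "init", message: "", data: null } },
  { version: 0, name: "vmgpuDisk", type: "zmount", result: { created: 1717314880, state: "ok", message: "", data: null } },
  { version: 0, name: "vmgpu", type: "zmachine", result: { created: 1717314887, state: "ok", message: "", data: null } },
  {
    version: 0,
    name: "vmgpu",
    type: "zmachine",
    result: {
      created: 1717314925,
      state: "deleted",
      message: "workload decommissioned by system, reason: deleting vm due to so many crashes",
      data: null,
    },
  },
];

for (const e of systemDecommissioned(history)) {
  // prints: zmachine "vmgpu": workload decommissioned by system, reason: deleting vm due to so many crashes
  console.log(`${e.type} "${e.name}": ${e.result.message}`);
}
```

Reducing by name first matters here: the `vmgpu` workload has an earlier `ok` entry, so checking "any entry is ok" would wrongly treat the VM as healthy.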
maayarosama added the type_bug label Jun 2, 2024
ramezsaeed added this to 3.14.x Jun 3, 2024
ramezsaeed added this to the 2.5.0 milestone Jun 3, 2024
ramezsaeed removed this from 3.14.x Jun 3, 2024
ramezsaeed added this to 3.15.x Jun 3, 2024
ramezsaeed modified the milestones: 2.5.0, 2.6.0 Jun 3, 2024
AhmedHanafy725 moved this to Accepted in 3.15.x Jun 3, 2024
zaelgohary self-assigned this Jun 26, 2024
zaelgohary moved this from Accepted to In Progress in 3.15.x Jun 26, 2024
@zaelgohary
Contributor

Investigation:

I searched for the node and found that it's shared, not dedicated.

Although the node is shared, its node details show GPU information after clicking "Try again".

@zaelgohary
Contributor

Blocked on threefoldtech/zos#2350

zaelgohary moved this from In Progress to Blocked in 3.15.x Jun 26, 2024
@xmonader
Contributor

Won't be fixed in 3.15

xmonader added this to 3.16.x Oct 14, 2024
xmonader removed this from 3.15.x Oct 14, 2024
xmonader moved this to Blocked in 3.16.x Oct 14, 2024
xmonader modified the milestones: 2.6.0, 2.7.0 Oct 14, 2024
AhmedHanafy725 modified the milestones: 2.7.0, 2.8.0 Jan 29, 2025
Projects
Status: Blocked
Development

No branches or pull requests

5 participants