Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PERFSCALE-3427] Public VLAN improvements #549

Merged
merged 10 commits into from
Oct 2, 2024

Conversation

rsevilla87
Copy link
Member

Adding some improvements to deploy in public VLANs in bm. Also updating the docs accordigly

@rsevilla87 rsevilla87 changed the title Public VLAN improvements [PERFSCALE-3427] Public VLAN improvements Sep 18, 2024
@rsevilla87
Copy link
Member Author

cc @akrzos

@akrzos akrzos requested review from dbutenhof and akrzos September 18, 2024 19:08
Comment on lines 5 to 11
{% if lab_name == "scalelab" %}
base_dns_name=rdu2.scalelab.redhat.com
{% elif lab_name == "performancelab" %}
base_dns_name=rdu3.labs.perfscale.redhat.com
{% else %}
base_dns_name=example.com
{% endif %}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will have to carefully review this as base_dns_name today is a var within this file and not an inventory var. Inventory vars must be referenced from the context of a host vs just being a var. We should be able to switch it just need to think through all of the potential cases this would touch.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to include it as an inventory lab since this variable its referenced in the create-ai-cluster role, which is executed in the deploy stage. What do you suggest?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, but it's not actually referenced as a host var, the difference would be something like such:

Inventory:

[all:vars]
allocation_node_count=15
supermicro_nodes=False
base_dns_name=bar

[bastion]
e38-h01-...

Playbook:

- name: Test
  hosts: bastion
  gather_facts: false
  vars:
    base_dns_name: foo
  tasks:
  - name: debug task
    debug:
      msg: "{{ item }}"
    loop:
    - "{{ base_dns_name }}"
    - "{{ hostvars[inventory_hostname].base_dns_name }}"

Output:

PLAY [Test] **********************************************************************************************************************************************************************************

TASK [debug task] ****************************************************************************************************************************************************************************
Friday 20 September 2024  09:33:28 -0400 (0:00:00.005)       0:00:00.005 ****** 
ok: [e38-h01-...] => (item=foo) => {
    "msg": "foo"
}
ok: [e38-h01-...] => (item=bar) => {
    "msg": "bar"
}

PLAY RECAP ***********************************************************************************************************************************************************************************
e38-h01-...

So if we want the automatic configuration of the base_dns_name with the determination to be in "step1" the create-inventory we would need to migrate all references of base_dns_name (aside from create-inventory templating the inventory) to {{ hostvars[inventory_hostname].base_dns_name }} Let me know if this makes sense because it would apply to cluster_name as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So my suggestion would be that base_dns_name and cluster_name be "step1 one vars" in all.yml meaning they just get rendered into the inventory file and all future usage of them is actually through a hostvar. Folks would have to keep in mind if they intend on changing the cluster_name or base_dns_name they would have to start from the first step which is creating inventory.


- `public_vlan`: Enable this variable **before creating the inventory**, to let jetlag to auto-configure the interface number (`controlplane_network_interface_idx`) to the last of the available NICs of the ocpinventory.json file, when this variable is configured, `base_dns_name` is set to `rdu2.scalelab.redhat.com` in the inventory file.
- `cluster_name`: Should be configured to the public VLAN name. i.e: `vlan600`, check the public VLANs allocation info in the [scalelab's wiki](https://wiki.rdu3.labs.perfscale.redhat.com/vlans/)
- `controlplane_network`, `network_prefix` and `gateway` also need to be configured to its corresponding values.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could get quads to expose these values as json too, and add automation to automatically grab the typical values.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, that would be ideal, but as far as I know, quads only expose VLAN allocation at https://wiki.rdu3.labs.perfscale.redhat.com/vlans/, I'll ask the quads team just in case

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, I included some quads API magic to autogen most of the variables we need to deploy in the public VLAN, seems like quads has an older version in the performancelab though which and the API endpoint is not available, currently investigating that

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the performancelab working code, not very fancy... (actually very ugly), but as long as quads runs an older version in the performancelab I haven't found another alternatyve

- [Disconnected and ipv6 vars](#disconnected-and-ipv6-vars)
- [Review vars `all.yml`](#review-vars-allyml)
- [Run playbooks](#run-playbooks)
- [Monitor install and interact with cluster](#monitor-install-and-interact-with-cluster)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🥇 Just wanted to point out I really appreciate the inclusion of documentation with your pull request, this will be super helpful for when other folks want to deploy a public vlan cluster.

@@ -11,6 +11,9 @@ lab_cloud:
# Either bm or rwn or sno
cluster_type:

# Applies to both bm/rwn clusters
cluster_name: bm
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should adjust the default for cluster_name to mno since it will only effect multi-node clusters. I also believe we should remove it from here since we want to bubble this up to the all.yml. WDYT?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -1,6 +1,16 @@
[all:vars]
allocation_node_count={{ ocpinventory.json.nodes | length }}
supermicro_nodes={{ has_supermicro | bool }}
cluster_name={{ cluster_name }}
Copy link
Member Author

@rsevilla87 rsevilla87 Sep 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cluster_name is required here as it needs to be configured by the create-inventory playbook when using a publi VLAN

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should put this inside the public_vlan conditional.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Collaborator

@dbutenhof dbutenhof left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the record, I'm still uncomfortable with the grep -B7. The risk here is that something will eventually break these assumptions before before the performancelab QUADS is upgraded and "someone" gets around to removing your performancelab workaround.

I don't see a practical alternative; so even though I dislike leaving that sort of tech debt, I'm approving.

@akrzos
Copy link
Member

akrzos commented Sep 30, 2024

@radez Did you have any feedback to this PR? I know you have been working with public vlan SNO clusters.

@@ -324,13 +344,22 @@ lab_cloud: cloud99
# Either bm or rwn or sno
cluster_type: bm

# Applies to both bm/rwn clusters
cluster_name: mno
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should also stay bm until bm is replaced with mno

@@ -321,9 +341,27 @@ lab_cloud: cloud99
# Either bm or rwn or sno
cluster_type: bm

# Applies to both bm/rwn clusters
cluster_name: mno
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another bm/mno

@akrzos akrzos merged commit 8aee2a8 into redhat-performance:main Oct 2, 2024
@rsevilla87 rsevilla87 deleted the public-vlan branch October 2, 2024 15:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants