diff --git a/website/content/docs/v0.3/Reference/minimum-requirements.md b/website/content/docs/v0.3/Reference/minimum-requirements.md
new file mode 100644
index 000000000..caf188ff5
--- /dev/null
+++ b/website/content/docs/v0.3/Reference/minimum-requirements.md
@@ -0,0 +1,22 @@
+---
+description: "System Requirements"
+weight: 1
+title: System Requirements
+---
+
+## System Requirements
+
+Most of the time, Sidero does very little, so it needs very few resources.
+However, since it is in charge of any number of workload clusters, it **should**
+be built with redundancy.
+It is also common, if the cluster is single-purpose,
+to combine the controlplane and worker node roles.
+Virtual machines are also
+perfectly well-suited for this role.
+
+Minimum suggested dimensions:
+
+- Node count: 3
+- Node RAM: 4GB
+- Node CPU: ARM64 or x86-64 class
+- Node storage: 32GB storage on system disk
diff --git a/website/content/docs/v0.3/Tutorial/create-workload.md b/website/content/docs/v0.3/Tutorial/create-workload.md
new file mode 100644
index 000000000..a666fda4d
--- /dev/null
+++ b/website/content/docs/v0.3/Tutorial/create-workload.md
@@ -0,0 +1,125 @@
+---
+description: "Create a Workload Cluster"
+weight: 1
+---
+
+# Create a Workload Cluster
+
+Once created and accepted, you should see the servers that make up your ServerClasses appear as "available":
+
+```bash
+$ kubectl get serverclass
+NAME   AVAILABLE                                   IN USE
+any    ["00000000-0000-0000-0000-d05099d33360"]    []
+```
+
+## Generate Cluster Manifests
+
+We are now ready to generate the configuration manifest templates for our first workload
+cluster.
+
+There are several configuration parameters that should be set in order for the templating to work properly:
+
+- `CONTROL_PLANE_ENDPOINT`: The endpoint used for the Kubernetes API server (e.g. `https://1.2.3.4:6443`).
+  This is the equivalent of the `endpoint` you would specify in `talosctl gen config`.
+  There are a variety of ways to configure a control plane endpoint.
+  Some common ways for an HA setup are to use DNS, a load balancer, or BGP.
+  A simpler method is to use the IP of a single node.
+  This has the disadvantage of being a single point of failure, but it can be a simple way to get running.
+- `CONTROL_PLANE_SERVERCLASS`: The server class to use for control plane nodes.
+- `WORKER_SERVERCLASS`: The server class to use for worker nodes.
+- `TALOS_VERSION`: The version of Talos to deploy (e.g. `v0.10.3`).
+- `KUBERNETES_VERSION`: The version of Kubernetes to deploy (e.g. `v1.21.1`).
+- `CONTROL_PLANE_PORT`: The port used for the Kubernetes API server (port 6443).
+
+For instance:
+
+```bash
+export CONTROL_PLANE_SERVERCLASS=any
+export WORKER_SERVERCLASS=any
+export TALOS_VERSION=v0.10.3
+export KUBERNETES_VERSION=v1.21.1
+export CONTROL_PLANE_PORT=6443
+export CONTROL_PLANE_ENDPOINT=1.2.3.4
+
+clusterctl config cluster cluster-0 -i sidero > cluster-0.yaml
+```
+
+Take a look at this new `cluster-0.yaml` manifest and make any changes as you
+see fit.
+Feel free to adjust the `replicas` field of the `TalosControlPlane` and `MachineDeployment` objects to match the number of machines you want in your controlplane and worker sets, respectively.
+`MachineDeployment` (worker) count is allowed to be 0.
+
+Of course, these may also be scaled up or down _after_ they have been created.
+
+## Create the Cluster
+
+When you are satisfied with your configuration, go ahead and apply it to Sidero:
+
+```bash
+kubectl apply -f cluster-0.yaml
+```
+
+At this point, Sidero will allocate Servers according to the requests in the
+cluster manifest.
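+
+If you would like to confirm the allocation itself, the same commands used earlier in
+this tutorial can be reused as an optional spot check (this assumes the `any`
+ServerClass used throughout this tutorial):
+
+```bash
+# The chosen Server should eventually show ALLOCATED as true, and its UUID
+# should move from the AVAILABLE column to the IN USE column of the ServerClass.
+kubectl get servers -o wide
+kubectl get serverclass any
+```
+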
+Once allocated, each of those machines will be installed with Talos, given their
+configuration, and form a cluster.
+
+You can watch the progress of the Servers being selected:
+
+```bash
+watch kubectl --context=sidero-demo \
+  get servers,machines,clusters
+```
+
+First, you should see the Cluster created in the `Provisioning` phase.
+Once the Cluster is `Provisioned`, a Machine will be created in the
+`Provisioning` phase.
+
+![machine provisioning](./images/sidero-cluster-start.png)
+
+During the `Provisioning` phase, a Server will become allocated, the hardware
+will be powered up, Talos will be installed onto it, and it will be rebooted
+into Talos.
+Depending on the hardware involved, this may take several minutes.
+
+Eventually, the Machine should reach the `Running` phase.
+
+![machine_running](./images/sidero-cluster-up.png)
+
+The initial controlplane Machine will always be started first.
+Any additional nodes will be started after that and will join the cluster when
+they are ready.
+
+## Retrieve the Talosconfig
+
+In order to interact with the new machines (outside of Kubernetes), you will
+need to obtain the `talosctl` client configuration, or `talosconfig`.
+You can do this by retrieving the resource of the same type from the Sidero
+management cluster:
+
+```bash
+kubectl --context=sidero-demo \
+  get talosconfig \
+  -l cluster.x-k8s.io/cluster-name=cluster-0 \
+  -o jsonpath='{.items[0].status.talosConfig}' \
+  > cluster-0-talosconfig.yaml
+```
+
+## Retrieve the Kubeconfig
+
+With the talosconfig obtained, the workload cluster's kubeconfig can be retrieved in the normal Talos way:
+
+```bash
+talosctl --talosconfig cluster-0-talosconfig.yaml kubeconfig
+```
+
+## Check access
+
+Now, you should have two clusters available: your management cluster
+(`sidero-demo`) and your workload cluster (`cluster-0`).
+
+```bash
+kubectl --context=sidero-demo get nodes
+kubectl --context=cluster-0 get nodes
+```
diff --git a/website/content/docs/v0.3/Tutorial/expose-services.md b/website/content/docs/v0.3/Tutorial/expose-services.md
new file mode 100644
index 000000000..f16df6c42
--- /dev/null
+++ b/website/content/docs/v0.3/Tutorial/expose-services.md
@@ -0,0 +1,37 @@
+---
+description: "Expose Sidero Services"
+weight: 1
+---
+
+# Expose Sidero Services
+
+> If you built your cluster as specified in the [Prerequisite: Kubernetes](prereq-kubernetes) section in this tutorial, your services are already exposed and you can skip this section.
+
+There are two external Services which Sidero serves and which must be made
+reachable by the servers which it will be driving.
+
+For most servers, TFTP (port 69/udp) will be needed.
+This is used for PXE booting, both BIOS and UEFI.
+Because TFTP is a primitive UDP protocol, many load balancers do not support it.
+Instead, solutions such as [MetalLB](https://metallb.universe.tf) may be used to expose TFTP over a known IP address.
+For servers which support UEFI HTTP Network Boot, TFTP need not be used.
+
+The kernel, initrd, and all configuration assets are served from the HTTP service
+(port 8081/tcp).
+It is needed for all servers, but since it is HTTP-based, it
+can be easily proxied, load balanced, or run through an ingress controller.
+
+The main thing to keep in mind is that the services **MUST** match the IP or
+hostname specified by the `SIDERO_CONTROLLER_MANAGER_API_ENDPOINT` environment
+variable (or configuration parameter) when you installed Sidero.
+
+It is a good idea to verify that the services are exposed as you think they
+should be.
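+
+For the TFTP service, a quick spot check with a TFTP client (here `atftp`, against
+the `192.168.1.150` endpoint used elsewhere in this tutorial) is one option; the
+[Troubleshooting](troubleshooting) guide covers this in more detail:
+
+```bash
+# Fetching the iPXE binary over TFTP should only take a few seconds on a local
+# network; a long hang usually means the service is not reachable.
+$ atftp 192.168.1.150
+tftp> get ipxe.efi
+```
+
+The HTTP service can be checked similarly with `curl`, as shown below.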
+
+```bash
+$ curl -I http://192.168.1.150:8081/tftp/ipxe.efi
+HTTP/1.1 200 OK
+Accept-Ranges: bytes
+Content-Length: 1020416
+Content-Type: application/octet-stream
+```
diff --git a/website/content/docs/v0.3/Tutorial/images/sidero-cluster-start.png b/website/content/docs/v0.3/Tutorial/images/sidero-cluster-start.png
new file mode 100644
index 000000000..f53c0140f
Binary files /dev/null and b/website/content/docs/v0.3/Tutorial/images/sidero-cluster-start.png differ
diff --git a/website/content/docs/v0.3/Tutorial/images/sidero-cluster-up.png b/website/content/docs/v0.3/Tutorial/images/sidero-cluster-up.png
new file mode 100644
index 000000000..0eb0ae5b6
Binary files /dev/null and b/website/content/docs/v0.3/Tutorial/images/sidero-cluster-up.png differ
diff --git a/website/content/docs/v0.3/Tutorial/import-machines.md b/website/content/docs/v0.3/Tutorial/import-machines.md
new file mode 100644
index 000000000..5c19d740b
--- /dev/null
+++ b/website/content/docs/v0.3/Tutorial/import-machines.md
@@ -0,0 +1,74 @@
+---
+description: "Import Workload Machines"
+weight: 1
+---
+
+# Import Workload Machines
+
+At this point, any servers on the same network as Sidero should network boot from Sidero.
+To register a server with Sidero, simply turn it on and Sidero will do the rest.
+Once the registration is complete, you should see the servers registered with `kubectl get servers`:
+
+```bash
+$ kubectl get servers -o wide
+NAME                                   HOSTNAME        ACCEPTED   ALLOCATED   CLEAN
+00000000-0000-0000-0000-d05099d33360   192.168.1.201   false      false       false
+```
+
+## Accept the Servers
+
+Note in the output above that the newly registered servers are not `accepted`.
+In order for a server to be eligible for consideration, it _must_ be marked as `accepted`.
+Before a `Server` is accepted, no write action will be performed against it.
+This default is for safety (don't accidentally delete something just because it
+was plugged in) and security (make sure you know the machine before it is given
+credentials to communicate).
+
+> Note: if you are running in a safe environment, you can configure Sidero to
+> automatically accept new machines.
+
+For more information on server acceptance, see the [server docs](/docs/v0.3/configuration/servers/#server-acceptance).
+
+## Create ServerClasses
+
+By default, Sidero comes with a single ServerClass `any` which matches any
+(accepted) server.
+This is sufficient for this demo, but you may wish to have
+more flexibility by defining your own ServerClasses.
+
+ServerClasses allow you to group machines which are sufficiently similar to
+allow for unnamed allocation.
+This is analogous to cloud providers using such classes as `m3.large` or
+`c2.small`, but the names are free-form and only need to make sense to you.
+
+For more information on ServerClasses, see the [ServerClass
+docs](/docs/v0.3/configuration/serverclasses/).
+
+## Hardware differences
+
+In baremetal systems, there are commonly certain small features and
+configurations which are unique to the hardware.
+In many cases, such small variations may not require special configurations, but
+others do.
+
+If hardware-specific differences do mandate configuration changes, we need a way
+to keep those changes local to the hardware specification so that at the higher
+level, a Server is just a Server (or a server in a ServerClass is just a Server
+like all the others in that Class).
+
+The most common variations seem to be the installation disk and the console
+serial port.
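+
+As a preview of the patching mechanism described below (a sketch only; the exact
+schema and authoritative examples are in the Installation Disk and Patching docs
+linked at the end of this page), an install-disk override attached directly to a
+Server might look roughly like this:
+
+```yaml
+apiVersion: metal.sidero.dev/v1alpha1
+kind: Server
+metadata:
+  name: 00000000-0000-0000-0000-d05099d33360
+spec:
+  accepted: true
+  configPatches:
+    # JSON patch applied to the generated Talos machine configuration:
+    # this particular machine should install to its NVMe drive.
+    - op: replace
+      path: /machine/install/disk
+      value: /dev/nvme0n1
+```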
+
+Some machines have NVMe drives, which show up as something like `/dev/nvme0n1`.
+Others may be SATA or SCSI, which show up as something like `/dev/sda`.
+Some machines use `/dev/ttyS0` for the serial console; others `/dev/ttyS1`.
+
+Configuration patches can be applied to either Servers or ServerClasses, and
+those patches will be applied to the final machine configuration for those
+nodes without having to know anything about those nodes at the allocation level.
+
+For examples of install disk patching, see the [Installation Disk
+doc](/docs/v0.3/configuration/servers/#installation-disk).
+
+For more information about patching in general, see the [Patching
+Guide](/docs/v0.3/guides/patching).
diff --git a/website/content/docs/v0.3/Tutorial/index.md b/website/content/docs/v0.3/Tutorial/index.md
new file mode 100644
index 000000000..9b663063b
--- /dev/null
+++ b/website/content/docs/v0.3/Tutorial/index.md
@@ -0,0 +1,62 @@
+---
+description: "Tutorial Index"
+weight: 1
+---
+
+# Tutorial Index
+
+This tutorial will walk you through a complete Sidero setup and the formation,
+scaling, and destruction of a workload cluster.
+
+To complete this tutorial, you will need a few things:
+
+- ISC DHCP server.
+  While any DHCP server will do, we will be presenting the
+  configuration syntax for ISC DHCP.
+  This is the standard DHCP server available on most Linux distributions (NOT
+  dnsmasq) as well as on the Ubiquiti EdgeRouter line of products.
+- Machine or Virtual Machine on which to run Sidero itself.
+  The requirements for this machine are very low, but it does need to be x86 for
+  now, and it should have at least 4GB of RAM.
+- Machines on which to run Kubernetes clusters.
+  These have the same minimum specifications as the Sidero machine.
+- Workstation on which `talosctl`, `kubectl`, and `clusterctl` can be run.
+
+## Steps
+
+1. Prerequisite: CLI tools
+1. Prerequisite: DHCP server
+1. Prerequisite: Kubernetes
+1. Install Sidero
+1. Expose Services
+1. Import workload machines
+1. Create a workload cluster
+1. Scale the workload cluster
+1. Destroy the workload cluster
+1. Optional: Pivot management cluster
+
+## Useful Terms
+
+**ClusterAPI** or **CAPI** is the common system for managing Kubernetes clusters
+in a declarative fashion.
+
+**Management Cluster** is the cluster on which Sidero itself runs.
+It is generally a special-purpose Kubernetes cluster whose sole responsibility
+is maintaining the CRD database of Sidero and providing the services necessary
+to manage your workload Kubernetes clusters.
+
+**Sidero** is the ClusterAPI-powered system which manages baremetal
+infrastructure for Kubernetes.
+
+**Talos** is the Kubernetes-focused Linux operating system built by the same
+people who bring you Sidero.
+It is a very small, entirely API-driven OS which is meant to provide a reliable
+and self-maintaining base on which Kubernetes clusters may run.
+More information about Talos can be found at
+[https://talos.dev](https://talos.dev).
+
+**Workload Cluster** is a cluster, managed by Sidero, on which your Kubernetes
+workloads may be run.
+The workload clusters are where you run your own applications and infrastructure.
+Sidero creates them from your available resources, maintains them over time as
+your needs and resources change, and removes them whenever it is told to do so.
diff --git a/website/content/docs/v0.3/Tutorial/install-clusterapi.md b/website/content/docs/v0.3/Tutorial/install-clusterapi.md
new file mode 100644
index 000000000..a014b160d
--- /dev/null
+++ b/website/content/docs/v0.3/Tutorial/install-clusterapi.md
@@ -0,0 +1,45 @@
+---
+description: "Install Sidero"
+weight: 1
+---
+
+# Install Sidero
+
+Sidero is included as a default infrastructure provider in `clusterctl`, so the
+installation of both Sidero and the Cluster API (CAPI) components is as simple
+as using the `clusterctl` tool.
+
+> Note: Because Cluster API upgrades are _stateless_, it is important to keep all Sidero
+> configuration for reuse during upgrades.
+
+Sidero has a number of configuration options which should be supplied at install
+time, kept, and reused for upgrades.
+These can also be specified in the `clusterctl` configuration file
+(`$HOME/.cluster-api/clusterctl.yaml`).
+You can reference the `clusterctl`
+[docs](https://cluster-api.sigs.k8s.io/clusterctl/configuration.html#clusterctl-configuration-file)
+for more information on this.
+
+For our purposes, we will use environment variables for our configuration
+options.
+
+```bash
+export SIDERO_CONTROLLER_MANAGER_HOST_NETWORK=true
+export SIDERO_CONTROLLER_MANAGER_API_ENDPOINT=192.168.1.150
+
+clusterctl init -b talos -c talos -i sidero
+```
+
+First, we are telling Sidero to use `hostNetwork: true` so that it binds its
+ports directly to the host, rather than being available only from inside the
+cluster.
+There are many ways of exposing the services, but this is the simplest
+path for the single-node management cluster.
+When you scale the management cluster, you will need to use an alternative
+method, such as an external load balancer or something like
+[MetalLB](https://metallb.universe.tf).
+
+Second, the `192.168.1.150` value is the IP address or DNS hostname of the Sidero
+services as seen from the workload clusters.
+In our case, this should be the main IP address of your Docker
+workstation.
diff --git a/website/content/docs/v0.3/Tutorial/local-settings.md b/website/content/docs/v0.3/Tutorial/local-settings.md
new file mode 100644
index 000000000..5ad5f7c3c
--- /dev/null
+++ b/website/content/docs/v0.3/Tutorial/local-settings.md
@@ -0,0 +1,171 @@
+---
+description: "A guide for bootstrapping Sidero management plane"
+weight: 1
+---
+
+# Local Configuration
+
+## Create the Default Environment
+
+We must now create an `Environment` in our bootstrap cluster.
+An environment is a CRD that tells the PXE component of Sidero what information to return to nodes that request a PXE boot after completing the registration process above.
+Things that can be controlled here are kernel flags and the kernel and initrd images to use.
+
+To create a default environment that will use the latest published Talos release, issue the following:
+
+```bash
+cat < management-plane.yaml
+```
+
+Note that there are several variables that should be set in order for the templating to work properly:
+
+- `CONTROL_PLANE_ENDPOINT`: The endpoint used for the Kubernetes API server (e.g. `https://1.2.3.4:6443`).
+  This is the equivalent of the `endpoint` you would specify in `talosctl gen config`.
+  There are a variety of ways to configure a control plane endpoint.
+  Some common ways for an HA setup are to use DNS, a load balancer, or BGP.
+  A simpler method is to use the IP of a single node.
+  This has the disadvantage of being a single point of failure, but it can be a simple way to get running.
+- `CONTROL_PLANE_SERVERCLASS`: The server class to use for control plane nodes.
+- `WORKER_SERVERCLASS`: The server class to use for worker nodes.
+- `KUBERNETES_VERSION`: The version of Kubernetes to deploy (e.g. `v1.19.4`).
+- `CONTROL_PLANE_PORT`: The port used for the Kubernetes API server (port 6443).
+
+For instance:
+
+```bash
+export CONTROL_PLANE_SERVERCLASS=master
+export WORKER_SERVERCLASS=worker
+export KUBERNETES_VERSION=v1.20.1
+export CONTROL_PLANE_PORT=6443
+export CONTROL_PLANE_ENDPOINT=1.2.3.4
+clusterctl config cluster management-plane -i sidero > management-plane.yaml
+```
+
+In addition, you can specify the number of replicas for the control plane and worker
+nodes in the `management-plane.yaml` manifest by editing the `TalosControlPlane` and
+`MachineDeployment` objects.
+They can also be scaled after creation if needed:
+
+```bash
+kubectl get taloscontrolplane
+kubectl get machinedeployment
+kubectl scale taloscontrolplane management-plane-cp --replicas=3
+```
+
+Now that we have the manifest, we can simply apply it:
+
+```bash
+kubectl apply -f management-plane.yaml
+```
+
+**NOTE: The templated manifest above is meant to act as a starting point.
+If customizations are needed to ensure proper setup of your Talos cluster, they should be added before applying.**
+
+Once the management plane is set up, you can fetch the talosconfig by using the cluster label.
+Be sure to update the cluster name and issue the following command:
+
+```bash
+kubectl get talosconfig \
+  -l cluster.x-k8s.io/cluster-name=<cluster-name> \
+  -o jsonpath='{.items[0].status.talosConfig}' > management-plane-talosconfig.yaml
+```
+
+With the talosconfig in hand, the management plane's kubeconfig can be fetched with `talosctl --talosconfig management-plane-talosconfig.yaml kubeconfig`.
+
+## Pivoting
+
+Once we have the kubeconfig for the management cluster, we now have the ability to pivot the cluster from our bootstrap.
+Using clusterctl, issue:
+
+```bash
+clusterctl init --kubeconfig=/path/to/management-plane/kubeconfig -i sidero -b talos -c talos
+```
+
+Followed by:
+
+```bash
+clusterctl move --to-kubeconfig=/path/to/management-plane/kubeconfig
+```
+
+Upon completion of this command, we can now tear down our bootstrap cluster with `talosctl cluster destroy` and begin using our management plane as our point of creation for all future clusters!
diff --git a/website/content/docs/v0.3/Tutorial/pivot.md b/website/content/docs/v0.3/Tutorial/pivot.md
new file mode 100644
index 000000000..89a4ebf23
--- /dev/null
+++ b/website/content/docs/v0.3/Tutorial/pivot.md
@@ -0,0 +1,44 @@
+---
+description: "Optional: Pivot management cluster"
+weight: 1
+---
+
+# Optional: Pivot management cluster
+
+Having the Sidero cluster running inside a Docker container is not the most
+robust place for it, but it did make for an expedient start.
+
+Conveniently, you can create a Kubernetes cluster in Sidero and then _pivot_ the
+management plane over to it.
+
+Start by creating a workload cluster as you have already done.
+In this example, this new cluster is called `management`.
+
+After the new cluster is available, install Sidero onto it as we did before,
+making sure to set all the environment variables or configuration parameters for
+the _new_ management cluster first.
+
+```bash
+export SIDERO_CONTROLLER_MANAGER_API_ENDPOINT=sidero.mydomain.com
+
+clusterctl init \
+  --kubeconfig-context=management \
+  -i sidero -b talos -c talos
+```
+
+Now, you can move the database from `sidero-demo` to `management`:
+
+```bash
+clusterctl move \
+  --kubeconfig-context=sidero-demo \
+  --to-kubeconfig-context=management
+```
+
+## Delete the old Docker Management Cluster
+
+If you created your `sidero-demo` cluster using Docker as described in this
+tutorial, you can now remove it:
+
+```bash
+talosctl cluster destroy --name sidero-demo
+```
diff --git a/website/content/docs/v0.3/Tutorial/prereq-cli-tools.md b/website/content/docs/v0.3/Tutorial/prereq-cli-tools.md
new file mode 100644
index 000000000..54b658467
--- /dev/null
+++ b/website/content/docs/v0.3/Tutorial/prereq-cli-tools.md
@@ -0,0 +1,58 @@
+---
+description: "Prerequisite: CLI tools"
+weight: 10
+---
+
+# Prerequisite: CLI tools
+
+You will need three CLI tools installed on your workstation in order to interact
+with Sidero:
+
+- `kubectl`
+- `clusterctl`
+- `talosctl`
+
+## Install `kubectl`
+
+Since `kubectl` is the standard Kubernetes control tool, many distributions
+already exist for it.
+Feel free to check your own package manager to see if it is available natively.
+
+Otherwise, you may install it directly from the main distribution point.
+The main article for this can be found
+[here](https://kubernetes.io/docs/tasks/tools/#kubectl).
+
+```bash
+sudo curl -Lo /usr/local/bin/kubectl \
+  "https://dl.k8s.io/release/$(\
+  curl -L -s https://dl.k8s.io/release/stable.txt\
+  )/bin/linux/amd64/kubectl"
+sudo chmod +x /usr/local/bin/kubectl
+```
+
+## Install `clusterctl`
+
+The `clusterctl` tool is the standard control tool for ClusterAPI (CAPI).
+It is less common, so it is also less likely to be in package managers.
+
+The main article for installing `clusterctl` can be found
+[here](https://cluster-api.sigs.k8s.io/user/quick-start.html#install-clusterctl).
+
+```bash
+sudo curl -Lo /usr/local/bin/clusterctl \
+  "https://github.com/kubernetes-sigs/cluster-api/releases/download/v0.3.14/clusterctl-linux-amd64"
+sudo chmod +x /usr/local/bin/clusterctl
+```
+
+## Install `talosctl`
+
+The `talosctl` tool is used to interact with the Talos (our Kubernetes-focused
+operating system) API.
+The latest version can be found on our
+[Releases](https://github.com/talos-systems/talos/releases) page.
+
+```bash
+sudo curl -Lo /usr/local/bin/talosctl \
+  "https://github.com/talos-systems/talos/releases/latest/download/talosctl-$(uname -s | tr '[:upper:]' '[:lower:]')-amd64"
+sudo chmod +x /usr/local/bin/talosctl
+```
diff --git a/website/content/docs/v0.3/Tutorial/prereq-dhcp.md b/website/content/docs/v0.3/Tutorial/prereq-dhcp.md
new file mode 100644
index 000000000..77e5d2e72
--- /dev/null
+++ b/website/content/docs/v0.3/Tutorial/prereq-dhcp.md
@@ -0,0 +1,143 @@
+---
+description: "Prerequisite: DHCP Service"
+weight: 11
+---
+
+# Prerequisite: DHCP service
+
+In order to network boot Talos, we need to set up our DHCP server to supply the
+network boot parameters to our servers.
+For maximum flexibility, Sidero makes use of iPXE to be able to reference
+artifacts via HTTP.
+Some modern servers support direct UEFI HTTP boot, but most existing servers
+still rely on the old, slow TFTP-based PXE boot first.
+Therefore, we need to tell our DHCP server to find the iPXE binary on a TFTP
+server.
+
+Conveniently, Sidero comes with a TFTP server which will serve the appropriate
+files.
+We need only set up our DHCP server to point to it.
+
+The tricky bit is that at different phases, we need to serve different assets,
+but they all use the same DHCP metadata key.
+
+In fact, for each architecture, we have as many as four different client types:
+
+- Legacy BIOS-based PXE boot (undionly.kpxe via TFTP)
+- UEFI-based PXE boot (ipxe.efi via TFTP)
+- UEFI HTTP boot (ipxe.efi via HTTP URL)
+- iPXE (boot.ipxe via HTTP URL)
+
+## Common client types
+
+If you are lucky and all of the machines in a given DHCP zone can use the same
+network boot client mechanism, your DHCP server only needs to provide two
+options:
+
+- `Server-Name` (option 66) with the IP of the Sidero TFTP service
+- `Bootfile-Name` (option 67) with the appropriate value for the boot client type:
+  - Legacy BIOS PXE boot: `undionly.kpxe`
+  - UEFI-based PXE boot: `ipxe.efi`
+  - UEFI HTTP boot: `http://sidero-server-url/tftp/ipxe.efi`
+  - iPXE boot: `http://sidero-server-url/boot.ipxe`
+
+In the ISC DHCP server, these options look like:
+
+```config
+next-server 172.16.199.50;
+filename "ipxe.efi";
+```
+
+## Multiple client types
+
+Any given server will usually use only one of those, but if you have a mix of
+machines, you may need a combination of them.
+In this case, you would need a way to provide different images for different
+client or machine types.
+
+Both ISC DHCP server and dnsmasq provide ways to supply such conditional responses.
+In this tutorial, we are working with ISC DHCP.
+
+For modularity, we are breaking the conditional statements into a separate file
+and using the `include` statement to load them into the main `dhcpd.conf` file.
+
+In our example below, `172.16.199.50` is the IP address of our Sidero service.
+
+`ipxe-metal.conf`:
+
+```config
+allow bootp;
+allow booting;
+
+# IP address for PXE-based TFTP methods
+next-server 172.16.199.50;
+
+# Configuration for iPXE clients
+class "ipxeclient" {
+  match if exists user-class and (option user-class = "iPXE");
+  filename "http://172.16.199.50/boot.ipxe";
+}
+
+# Configuration for legacy BIOS-based PXE boot
+class "biosclients" {
+  match if not exists user-class and substring (option vendor-class-identifier, 15, 5) = "00000";
+  filename "undionly.kpxe";
+}
+
+# Configuration for UEFI-based PXE boot
+class "pxeclients" {
+  match if not exists user-class and substring (option vendor-class-identifier, 0, 9) = "PXEClient";
+  filename "ipxe.efi";
+}
+
+# Configuration for UEFI-based HTTP boot
+class "httpclients" {
+  match if not exists user-class and substring (option vendor-class-identifier, 0, 10) = "HTTPClient";
+  option vendor-class-identifier "HTTPClient";
+  filename "http://172.16.199.50/tftp/ipxe.efi";
+}
+```
+
+Once this file is created, we can include it from our main `dhcpd.conf` inside a
+`subnet` section.
+
+```config
+shared-network sidero {
+    subnet 172.16.199.0 netmask 255.255.255.0 {
+        option domain-name-servers 8.8.8.8, 1.1.1.1;
+        option routers 172.16.199.1;
+        include "/etc/dhcp/ipxe-metal.conf";
+    }
+}
+```
+
+Since we use a number of Ubiquiti EdgeRouter devices especially in our home test
+networks, it is worth mentioning the curious syntax gymnastics we must go
+through there.
+Essentially, the quotes around the path need to be entered as HTML entities:
+`&quot;`.
+
+Ubiquiti EdgeRouter configuration statement:
+
+```config
+set service dhcp-server shared-network-name sidero \
+  subnet 172.16.199.1 \
+  subnet-parameters "include &quot;/config/ipxe-metal.conf&quot;;"
+```
+
+Also note that there are two semicolons at the end of the line;
+the first is part of the HTML-encoded quote;
+the second is the actual terminating semicolon.
+
+## Troubleshooting
+
+Getting the netboot environment working is tricky, and debugging it is difficult.
+Once running, it will generally stay running;
+the problem is nearly always one of a missing or incorrect configuration, since
+the process involves several different components.
+
+We are working toward integrating as much as possible into Sidero, to provide as
+much intelligence and automation as can be had, but until then, you will likely
+need to figure out how to begin hunting down problems.
+
+See the Sidero [Troubleshooting](troubleshooting) guide for more assistance.
diff --git a/website/content/docs/v0.3/Tutorial/prereq-kubernetes.md b/website/content/docs/v0.3/Tutorial/prereq-kubernetes.md
new file mode 100644
index 000000000..8bbcaf44d
--- /dev/null
+++ b/website/content/docs/v0.3/Tutorial/prereq-kubernetes.md
@@ -0,0 +1,87 @@
+---
+description: "Prerequisite: Kubernetes"
+weight: 11
+---
+
+# Prerequisite: Kubernetes
+
+In order to run Sidero, you first need a Kubernetes "cluster".
+There is nothing special about this cluster.
+It can be, for example:
+
+- a Kubernetes cluster you already have
+- a single-node cluster running in Docker on your laptop
+- a cluster running inside a virtual machine stack such as VMware
+- a Talos Kubernetes cluster running on a spare machine
+
+Two important things are needed in this cluster:
+
+- Kubernetes `v1.18` or later
+- Ability to expose TCP and UDP Services to the workload cluster machines
+
+For the purposes of this tutorial, we will create this cluster in Docker on a
+workstation, perhaps a laptop.
+
+If you already have a suitable Kubernetes cluster, feel free to skip this step.
+
+## Create a Local Management Cluster
+
+The `talosctl` CLI tool has built-in support for spinning up Talos in Docker containers.
+Let's use this to our advantage as an easy Kubernetes cluster to start from.
+
+Issue the following to create a single-node Docker-based Kubernetes cluster:
+
+```bash
+export HOST_IP="192.168.1.150"
+
+talosctl cluster create \
+  --name sidero-demo \
+  -p 69:69/udp,8081:8081/tcp \
+  --workers 0 \
+  --config-patch '[{"op": "add", "path": "/cluster/allowSchedulingOnMasters", "value": true}]' \
+  --endpoint $HOST_IP
+```
+
+The `192.168.1.150` IP address should be changed to the IP address of your Docker
+host.
+This is _not_ the Docker bridge IP but the standard IP address of the
+workstation.
+
+Note that there are two ports mentioned in the command above.
+The first (69) is
+for TFTP.
+The second (8081) is for the web server (which serves netboot
+artifacts and configuration).
+
+Exposing them here allows us to access the services that will get deployed on this node.
+In turn, we will be running our Sidero services with `hostNetwork: true`,
+so the Docker host will forward these to the Docker container,
+which will in turn be running in the same namespace as the Sidero Kubernetes components.
+A full separate management cluster will likely approach this differently,
+with a load balancer or a means of sharing an IP address across multiple nodes (such as with MetalLB).
+
+Finally, the `--config-patch` is optional,
+but since we are running a single-node cluster in this tutorial,
+adding this will allow Sidero to run on the controlplane.
+Otherwise, you would need to add worker nodes to this management plane cluster to be
+able to run the Sidero components on it.
+
+## Access the cluster
+
+Once the cluster create command is complete, you can retrieve the kubeconfig for it using the Talos API:
+
+```bash
+talosctl kubeconfig
+```
+
+> Note: by default, Talos will merge the kubeconfig for this cluster into your
+> standard kubeconfig under the context name matching the cluster name you
+> created above.
+> If this name conflicts, it will be given a `-1`, a `-2`, or so
+> on, so it is generally safe to run.
+> However, if you would prefer to not modify your standard kubeconfig, you can
+> supply a directory name as the third parameter, which will cause a new
+> kubeconfig to be created there instead.
+> Remember that if you choose to not use the standard location, you should set
+> your `KUBECONFIG` environment variable or pass the `--kubeconfig` option to
+> tell the `kubectl` client the name of the `kubeconfig` file.
diff --git a/website/content/docs/v0.3/Tutorial/scale-workload.md b/website/content/docs/v0.3/Tutorial/scale-workload.md
new file mode 100644
index 000000000..dec7f77d5
--- /dev/null
+++ b/website/content/docs/v0.3/Tutorial/scale-workload.md
@@ -0,0 +1,15 @@
+---
+description: "Scale the Workload Cluster"
+weight: 1
+---
+
+# Scale the Workload Cluster
+
+If you have more machines available, you can scale both the controlplane
+(`TalosControlPlane`) and the workers (`MachineDeployment`) for any cluster
+after it has been deployed.
+This is done just like normal Kubernetes `Deployments`.
+
+```bash
+kubectl scale taloscontrolplane cluster-0-cp --replicas=3
+```
diff --git a/website/content/docs/v0.3/Tutorial/troubleshooting.md b/website/content/docs/v0.3/Tutorial/troubleshooting.md
new file mode 100644
index 000000000..440693a49
--- /dev/null
+++ b/website/content/docs/v0.3/Tutorial/troubleshooting.md
@@ -0,0 +1,78 @@
+---
+description: "Troubleshooting"
+weight: 99
+---
+
+# Troubleshooting
+
+The first thing to do in troubleshooting problems with the Sidero installation
+and operation is to figure out _where_ in the process that failure is occurring.
+
+Keep in mind the general flow of the pieces.
+For instance:
+
+1. A server is configured by its BIOS/CMOS to attempt a network boot using the PXE firmware on
+   its network card(s).
+1. That firmware requests network and PXE boot configuration via DHCP.
+1. DHCP points the firmware to the Sidero TFTP or HTTP server (depending on the firmware type).
+1. The second stage boot, iPXE, is loaded and makes an HTTP request to the
+   Sidero metadata server for its configuration, which contains the URLs for
+   the kernel and initrd images.
+1. The kernel and initrd images are downloaded by iPXE and boot into the Sidero
+   agent software (if the machine is not yet known and assigned by Sidero).
+1. The agent software reports the hardware information of the machine to the Sidero metadata server via HTTP.
+1. A (usually human or external API) operator verifies and accepts the new
+   machine into Sidero.
+1. The agent software reboots and wipes the newly-accepted machine, then powers
+   off the machine to wait for allocation into a cluster.
+1. The machine is allocated by Sidero into a Kubernetes Cluster.
+1. Sidero tells the machine, via IPMI, to boot into the OS installer
+   (following all the same network boot steps above).
+1. The machine downloads its configuration from the Sidero metadata server via
+   HTTP.
+1. The machine applies its configuration, installs a bootloader, and reboots.
+1. The machine, upon reboot from its local disk, joins the Kubernetes cluster
+   and continues until Sidero tells it to leave the cluster.
+1. Sidero tells the machine to leave the cluster and reboots it into network
+   boot mode, via IPMI.
+1. The machine netboots into wipe mode, wherein its disks are again wiped to
+   come back to the "clean" state.
+1. The machine again shuts down and waits to be needed.
+
+## Device firmware (PXE boot)
+
+The worst place to fail is also, unfortunately, the most common.
+This is the firmware phase, where the network card's built-in firmware attempts
+to initiate the PXE boot process.
+This is the worst place because the firmware is completely opaque, with very
+little logging, and what logging _does_ appear frequently is wiped from the
+console faster than you can read it.
+
+If you fail here, the problem will most likely be with your DHCP configuration,
+though it _could_ also be in the Sidero TFTP service configuration.
+
+## Validate Sidero TFTP service
+
+The easiest check is to use a `tftp` client to validate that the Sidero
+TFTP service is available at the IP you are advertising via DHCP.
+
+```bash
+$ atftp 172.16.199.50
+tftp> get ipxe.efi
+```
+
+TFTP is an old, slow protocol with very little feedback or checking.
+Your only real way of telling if this fails is by timeout.
+Over a local network, this `get` command should take a few seconds.
+If it takes longer than 30 seconds, it is probably not working.
+
+Success is also not usually indicated:
+you just get a prompt returned, and the file should show up in your current
+directory.
+
+If you are failing to connect to TFTP, the problem is most likely with your
+Sidero Service exposure:
+how are you exposing the TFTP service in your management cluster to the outside
+world?
+This normally involves either setting host networking on the Deployment or
+installing and using something like MetalLB.
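+
+If you are unsure how the services ended up exposed, one generic check (assuming
+the default `sidero-system` namespace and the `sidero-demo` context used in this
+tutorial; adjust both to your environment) is to look at the Sidero pods and
+Services directly:
+
+```bash
+# The sidero-controller-manager pod should be Running; with hostNetwork: true,
+# its pod IP will be the node's own address, which is the address that DHCP,
+# iPXE, and the booting servers must be able to reach.
+kubectl --context=sidero-demo -n sidero-system get pods -o wide
+kubectl --context=sidero-demo -n sidero-system get services
+```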