Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for Windows hosts through WinRM #216

Merged
merged 1 commit into from
May 13, 2020

Conversation

dvanbrug
Copy link
Contributor

Fixes #173.

Based on WinRM plugin from https://github.com/Azure/vagrant-azure
and existing code here.

Windows instances are already setup to use WinRM, however
setting up a temporary password is required before being
able to connect. This feature uses the Google APIs directly
since the functionality to reset Windows passwords is not
available in fog-google.

Once a temporary password is set, the final password is pulled
from the Vagrantfile and set over the WinRM channel.

@dvanbrug
Copy link
Contributor Author

@Temikus, Here is the Pull request I mentioned. This is my first Vagrant plugin so I would appreciate any feedback.

The plugin does require the setting of the GOOGLE_APPLICATION_CREDENTIALS environment variable to use the same JSON file that is used by the rest of the vagrant-google plugin. I could not figure out how to pass the file directly to the googl-api-ruby-client.

@Temikus
Copy link
Collaborator

Temikus commented May 12, 2019

@dvanbrug Sorry for the response delay - this is really cool! I think we can pull through the functionality to reset windows passwords into fog-google as I maintain that library as well.

Can you give me an example Vagrantfile that I can use to debug/repro?

@dvanbrug
Copy link
Contributor Author

Oh cool! Here is an example Vagrantfile: https://gist.github.com/dvanbrug/f24572ac7eb340cda096273a640c9e99

One note, I've found a couple of bugs in this implementation but have not had time to chase them down yet. They both have to do with how the Ansible hosts file, likely due to some race conditions or incorrect error handling on my part.

dvanbrug#1
dvanbrug#2

@GodKratos
Copy link

GodKratos commented Sep 19, 2019

+1 for me, really need to build some windows machines with WinRM access.
Looking through the code though (and its my first time with Ruby) it looks like the ports are hard coded and it doesn't support all the winrm communicator options that are available.
https://www.vagrantup.com/docs/vagrantfile/winrm_settings.html
It would need to be able to use all of those options to work properly as I'm forced to use non standard settings in my environment.

I would have thought the winrm connection stuff would have been separate to the provider however. The provider builds a box and exposes its public/private IP and then the config.vm.communicator and config.winrm settings should define how vagrant connects to that box after it has been provisioned which should be the same regardless of whether its gce, azure, aws or any other provider.

@dvanbrug
Copy link
Contributor Author

@GodKratos, are you still interested in this? I thought I'd try and dust it off and give things another go.

The current version was really setup just to make WinRM communicator work with GCE defaults. Part of this was dealing with the dance of GCE that requires setting a temporary password and then setting your desired password. The rest was using the default ports that GCE had in place.

For your "non-standard" settings, is this based on the standard windows images provided by GCE or custom windows images that you are using? If the latter, is it correct to assume you don't have to go through Google APIs to set a temporary password before being able to setup WinRM settings?

@dvanbrug
Copy link
Contributor Author

@Temikus I am brushing this off and reworking the pull request. I wanted to get your opinion on my approach.

In order to get a Google-based Windows instance up and running, we need to perform the following steps:

  • Reset password / Create User
    • This actually creates the Windows user that will be used by the WinRM communicator. A randomly generated password is then returned for that user
  • Set Longterm Password
    • This uses WinRM with the temporary password from the previous step to change the password.

For the rework, first, I've pulled out the "Password Reset" functionality into fog-google. This allows vagrant-google to just call the reset functionality. I'll submit that pull request shortly.

Second, I am trying to figure out the best place to add the needed functionality within vagrant-google. I am thinking I'd create a new action called setup_windows_password but am not sure where I should call it from.

My first thought was in run-instance, around line 289

This is because after this point, while the next block of code is a little specific to SSH, if we succeed in setting up the correct WinRM password, the next block should work as written.

If this is a good location, what would be a good way to call the reset code only if we are using a Google provided image? Some sort of flag within the config to indicate that we should reset the password? Are there any examples of these sort of flags being used elsewhere?

Thank you any ideas you may have!

@Temikus
Copy link
Collaborator

Temikus commented Mar 27, 2020

@Temikus I am brushing this off and reworking the pull request. I wanted to get your opinion on my approach.

Awesome, thanks! Apologies this didn't move much last year. I've changed roles and then had some medical issues which took me out of commission for a while m(_ _)m
I'm back in business now though, so should be able to give you quick feedback 👍

For the rework, first, I've pulled out the "Password Reset" functionality into fog-google. This allows vagrant-google to just call the reset functionality. I'll submit that pull request shortly.

Great! I'm prepping a new release so I think we should be able to push it through quickly.

Second, I am trying to figure out the best place to add the needed functionality within vagrant-google. I am thinking I'd create a new action called setup_windows_password but am not sure where I should call it from.

My first thought was in run-instance, around line 289

Generally new actions should be called in the action.rb only, e.g. somewhere after RunInstance in the initialization flow:
https://github.com/mitchellh/vagrant-google/blob/master/lib/vagrant-google/action.rb#L151

If this is a good location, what would be a good way to call the reset code only if we are using a Google provided image? Some sort of flag within the config to indicate that we should reset the password? Are there any examples of these sort of flags being used elsewhere?

Hm, if it's only google images that need to be reset, I think it's reasonable to hardcode a list of windows image projects (e.g. windows-cloud) and default to a password reset behaviour for those + make an option to override it, if needed. You can also try spelunking in the fields available in the API response: https://cloud.google.com/compute/docs/reference/rest/v1/images/get
Maybe something unique there for Google's windows images? We should be able to access any of those fields through Fog.

And if I'm missing something - just let me know. I'm happy to help!

@Temikus
Copy link
Collaborator

Temikus commented Mar 27, 2020

@dvanbrug ^

( Forgot to @ you :) )

@dvanbrug
Copy link
Contributor Author

@Temikus, No worries. I too was pulled away and the current global situation provided a great time to dig back into this.

Generally new actions should be called in the action.rb only, e.g. somewhere after RunInstance in the initialization flow:
https://github.com/mitchellh/vagrant-google/blob/master/lib/vagrant-google/action.rb#L151

If I understand how action.rb works correctly, I think this may be a problem. Since RunInstance creates and boots the instance and then waits for env[:machine].communicate.ready?, this will never pass without first setting up the Windows password. Also, this password cannot be set until the machine has been created and booted.

That is why I was thinking it belonged on line 289.

Hm, if it's only google images that need to be reset, I think it's reasonable to hardcode a list of windows image projects (e.g. windows-cloud) and default to a password reset behaviour for those + make an option to override it, if needed. You can also try spelunking in the fields available in the API response: https://cloud.google.com/compute/docs/reference/rest/v1/images/get
Maybe something unique there for Google's windows images? We should be able to access any of those fields through Fog.

I'll dig into those and see if I can find a suitable indicator.

@Temikus
Copy link
Collaborator

Temikus commented Mar 30, 2020

@dvanbrug Thanks for the explanation!

If I understand how action.rb works correctly, I think this may be a problem. Since RunInstance creates and boots the instance and then waits for env[:machine].communicate.ready?, this will never pass without first setting up the Windows password. Also, this password cannot be set until the machine has been created and booted.

That is why I was thinking it belonged on line 289.

Got ya. Let's proceed with that then and I'll later refactor this out. Probably going to make communication with the VM a separate action.

@Temikus
Copy link
Collaborator

Temikus commented Apr 20, 2020

@dvanbrug Just to clarify - I didn't forget about this - just figuring out the fog release. Will ping you here as soon as it's released. Planning to work on it sometime this week.

@dvanbrug
Copy link
Contributor Author

@Temikus No worries, I figured that was the case. It makes sense to get the fog-google change released first.

@Temikus
Copy link
Collaborator

Temikus commented Apr 21, 2020

@dvanbrug released: https://github.com/fog/fog-google/releases/tag/v1.10.0

Can you update the deps in the PR and let me know if this vagrantfile is still valid? https://gist.github.com/dvanbrug/f24572ac7eb340cda096273a640c9e99

Thank you for your patience and hard work ❤️

@dvanbrug
Copy link
Contributor Author

@Temikus I just updated the dependency and confirmed it worked with the published v1.10 of fog-google. I updated the original Vagrantfile within the Gist so it corresponds to the latest version. Please let me know if you run into any trouble.

@Temikus
Copy link
Collaborator

Temikus commented Apr 22, 2020

Alright, acceptance tests are green:

Finished in 25 minutes 14 seconds (files took 0.74341 seconds to load)
9 examples, 0 failures

Only left to check one other scenario and we should be good to go :)

Copy link
Collaborator

@Temikus Temikus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dvanbrug Ok, so - I admit I spent 3-4 solid hours on this and didn't manage to get WinRM to connect 😞 After debugging for a while I think it's purely my problem (some org policy I suspect), but during the debugging I did uncover some issues that we ideally want to fix.

I'm sorry to lay even more changes on you m(_ _)m PTAL

In the meantime I'll take another stab at this on Monday.

# https://github.com/GoogleCloudPlatform/compute-image-windows/blob/master/examples/windows_auth_python_sample.py
# to enable WinRM with vagrant.

require "fog/google"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those requires should not be needed, can you take a look?

require "fog/google" 
require "googleauth"
require "google/apis/compute_v1"
require "vagrant/../../plugins/communicators/winrm/communicator"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed the first 3 when I refactored to use fog-google directly. For the others, things seem to work if I remove all of the requires. I would have thought I'd need at least the following 2 because I make use of those Classes directly.

require "log4r"
require "vagrant/../../plugins/communicators/winrm/communicator"

At this point I've removed them all.

env[:ui].info("Setting up WinRM Password")
env[:action_runner].run(Action.action_setup_winrm_password, env)
end

unless env[:terminated]
Copy link
Collaborator

@Temikus Temikus Apr 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So while debugging I noticed that winrm options are not really set:

ls @machine.winrm_info
NoMethodError: undefined method `winrm_info' for #<Vagrant::Machine: winserver (VagrantPlugins::Google::Provider)>
from /Users/temikus/.rbenv/versions/2.4.3/lib/ruby/gems/2.4.0/gems/pry-0.12.2/lib/pry/core_extensions.rb:17:in `__pry__'

I suspect it may be because we never set up the winrm capability in the provider, like Azure did:

https://github.com/Azure/vagrant-azure/blob/97f4d125aebcb5669e9d1278b702e13554bef640/lib/vagrant-azure/capabilities/winrm.rb

I'm raising this since this may cause issues in the future as Vagrant has some autodetect logic which this may break.

Can you take a look?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had originally had all the features from the Azure plugin included, like this capability file but refactored it out when it didn't seem to do anything. I am confused as to what the difference is between including the details in the config (e.g. config.winrm.*) and the need for setting up env[:machine_winrm_info] via this capability.

The WinRM communicator does not appear to need this winrm_info, using the details from the Vagrantfile config instead. However, I do see that there is a machine_ssh_info, so it would make sense to mimic that with WinRM.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After some more digging, I've found that if I add in a read_winrm_info functionality, I am actually breaking things. I've found that in the following file, it has a default way of computing winrm_info if it is not explicitly defined. That must be why it works without adding in this functionality.

https://github.com/hashicorp/vagrant/blob/2c4a40fccbcd1869c7ea457d6f251c67c30eda71/plugins/communicators/winrm/helper.rb#L9

Would this be sufficient? Else I can explicitly add in this functionality, though I am still troubleshooting it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should've been more clear here, sorry m(_ _)m

So, ideally we would want the provider to explicitly set that guest capability to make sure we don't rely on heuristics. However, if it's too difficult I'm also open leaving that as an explicit TODO in the code and creating an issue for the future.

WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll take another shot at implementing this and see if I can get it working correctly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's go ahead and use the default heuristics.

@dvanbrug
Copy link
Contributor Author

@Temikus I started digging into this and noticed that things are responding much slower and setting up the password does not even complete. I'll spend some time this weekend debugging to see if I can figure out what is going on. If I had to guess, I'm thinking there are some timing issues with Windows being set up and restarting a few times during setup.

@dvanbrug
Copy link
Contributor Author

dvanbrug commented Apr 25, 2020

@Temikus So I think I've narrowed down the issues I was having to the latest windows-2016 server image (`windows-server-2016-dc-v20200421). From my initial debugging, it looks like WinRM is not set up the same. Also, these images seem to have longer startup times which causes the password reset to time out.

If you use Windows Server 2019 google.image_family = "windows-2019" or a previous Windows 2016 image google.image = "windows-server-2016-dc-v20200414", things seem to work just fine.

Also, if you are still having trouble, I made a note in the Vagrantfile that you'll need to create a new firewall rule that allows both TCP and UDP connections to 5985 and 5986 ports and associate it with the target tag winrm.

From there, I found this site useful for WinRM troubleshooting (http://www.hurryupandwait.io/blog/understanding-and-troubleshooting-winrm-connection-and-authentication-a-thrill-seekers-guide-to-adventure). Notably, the following command will at least tell you if you have connectivity to the WinRM service as needed. 0 is success, 1 is failure. This is the command that fails on the latest version of windows-2016 images.

$ nc -z -w1 <IP or host name> 5986;echo $?

@dvanbrug
Copy link
Contributor Author

dvanbrug commented Apr 27, 2020

@Temikus, I've addressed the two changes above. Let me know if you run into further issues getting WinRM working. I am hoping my previous comment can help a bit.

Thanks again for all your help!

@Temikus
Copy link
Collaborator

Temikus commented Apr 27, 2020

@dvanbrug - those comments help a lot, lemme try with a different image.

And no - thank you for your efforts and patience :)

@Temikus
Copy link
Collaborator

Temikus commented Apr 27, 2020

Ok, so now I am finally getting a connection and the port does open, but it gives me a WinRMAuthorisation Error:

==> winserver: Changing password from temporary to winrm password
==> winserver: Waiting for Communicator to become available...
An authorization error occurred while connecting to WinRM.

User: temikus
Endpoint: https://35.222.5.11:5986/wsman
Message: WinRM::WinRMAuthorizationError

Is there anything specific I should be doing?

I've set the account user/pass:

  config.winrm.username = 'temikus'
  config.winrm.password = 'somepassword'

vagrant winrm -s cmd just exits.

@dvanbrug
Copy link
Contributor Author

@Temikus I have seen this error when the WinRM password is not correct. This can happen if the password reset doesn't succeed and then a WinRM command is issued. I have not seen it during the password reset process though. But from your output, it would look like the "Changing password from temporary ... " part passed successfully.

One way to confirm this is what happened is, if the instance did not terminate, edit your Vagrantfile to set the password to the temp one that was printed to stdout. Then try running the test below.

My go to test for WinRM is to run the dir command as follows:

$ vagrant winrm -c "dir C:\\"

WinRM does not allow for an interactive shell. The -s command allows you to set what shell will interpret the command that you provide if I understand correctly.

I haven't been able to reproduce this during the first boot up but I'll keep trying to see if I can.

@dvanbrug
Copy link
Contributor Author

@Temikus Ok, so I still haven't been able to reproduce this on my. Does the image terminate or are you able to run the vagrant winrm command from my previous comment after changing the password in the Vagrantfile? Also:

  • What version of the winrm plugin and vagrant are you using? I've been using vagrant-2.2.6 and winrm-2.3.2.

  • What Windows image are you using? I've been using windows-server-2016-dc-v20200414.

@Temikus
Copy link
Collaborator

Temikus commented Apr 28, 2020

@dvanbrug - I'm using the vagrant from the dev bundle, i.e. bundle exec vagrant up

Regarding image - I used the "Windows 2019" family but lemme retry with 2016.

@Temikus
Copy link
Collaborator

Temikus commented Apr 28, 2020

Ok, I see. The password doesn't get changed, I am able to send winrm functions if I set the password to a temporary one:

λ bundle exec vagrant winrm -c "dir C:\\"

    Directory: C:\


Mode                LastWriteTime         Length Name
----                -------------         ------ ----
d-----        4/16/2020   8:07 PM                PerfLogs
d-r---        4/16/2020   8:13 PM                Program Files
d-----        4/16/2020   8:13 PM                Program Files (x86)
d-r---        4/28/2020   4:06 AM                Users
d-----        4/28/2020   4:03 AM                Windows

Continuing to debug.

@Temikus
Copy link
Collaborator

Temikus commented Apr 28, 2020

Something's not working when actually changing the password, i.e.:

          winrmcomm.test(cmd, opts)

It does return "true" but maybe doesn't stick properly? I'm not sure 🤔

I'll try debugging a bit more this week. If I don't nail it - let's just merge and leave it as an issue which I will dig into a bit more, since obviously this does work for you and I don't want to block you on this.

@dvanbrug
Copy link
Contributor Author

dvanbrug commented May 2, 2020

@Temikus So I just started digging back into this and I am concerned about how this doesn't work on a lot of different images. I'll try and iron things out this weekend and see what I can do. The problems I have is that same call just hanging:

winrmcomm.test(cmd, opts)

I am thinking I should add in some more logic to at least provide more in depth information if WinRM does not respond as expected.

@dvanbrug
Copy link
Contributor Author

dvanbrug commented May 4, 2020

@Temikus So it turns out my issues were due to a problematic firewall rule. I've tested the code out against the following images and it works on all but two.

  • windows-2012-r2-core
  • windows-2016
  • windows-2019
  • windows-1809-core
  • windows-1903-core: Hangs on Changing Temporary Password
  • windows-1909-core: Hangs on Changing Temporary Password

Also, in trying to reproduce your error, I spun up a new Ruby environment on a vanilla Centos 7 image on Google Compute. However, I could not reproduce the error there either. The only errors I've had is that the "Changing Temporary Password" hangs because either 1.) The firewall is blocking connections, or 2.) The WinRM service is not running on port 5986 due to problems with certificates not being setup correctly.

Since you are able to run WinRM commands through Vagrant, are you able to set the password manually?

vagrant winrm -c "net user <username> <password>"

The only other thought I have is trying to debug the winrm plugin. All the testing I did was with the changed plugin installed directly instead of using bundle exec vagrant up. I just noticed that rbenv is installing winrm-2.3.4 instead of the winrm-2.3.2 that I have installed on my system. I'll retry my tests all using bundle and see if I can reproduce the error. Spot checking a few images and it seemed to work the same.

@Temikus
Copy link
Collaborator

Temikus commented May 6, 2020

@dvanbrug nah, vagrant winrm doesn't work for me

If you cannot repro I'm happy to go with your lead here and merge and then if you run into it we can dig around a bit more.

@Temikus
Copy link
Collaborator

Temikus commented May 11, 2020

@dvanbrug just circling back here - WDYT? :)

@dvanbrug
Copy link
Contributor Author

@Temikus Sorry for the delay. Let's go ahead and merge it for now and troubleshoot it as we go.

@Temikus
Copy link
Collaborator

Temikus commented May 13, 2020

@dvanbrug Alright, merging this in. Thanks for your collosal efforts, they're super appreciated ❤️

@Temikus Temikus merged commit 07689b3 into mitchellh:master May 13, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for deploying Windows instances
3 participants