Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update project_usermap creation script (INF-1060) #19

Draft
wants to merge 19 commits into
base: master
Choose a base branch
from

Conversation

williamnswanson
Copy link
Collaborator

Uses LDAP for determining group membership.

Uses LDAP to determine active users, and only adds a user to the map if they are active (from SOFTWARE-5413)

Still uses the COmanage API to find out which CO groups are projects, but now caches them so if the map needs to be recreated in a short amount of time, it doesn't need to make so many API calls (cache currently stales after 0.5 hours, but changing it should only take changing a global)

Uses LDAP search to find active and provisioned groups then compares them to the list project groups in COmanage to determine which LDAP groups and users to build the map out of.
And clean up of no-longer-used methods

remove collections import
Writes the osggids to project name dict to a file containing the epoch time the cache was made. If the file is found to exist and was made in the past 0.5 hours (by default), the program will read from the cache instead of making COmanage API calls. Otherwise, the program will get the project data from the API and overwrite any existing cache.
@williamnswanson williamnswanson self-assigned this Nov 27, 2023
@williamnswanson williamnswanson marked this pull request as draft November 28, 2023 21:56
@williamnswanson
Copy link
Collaborator Author

williamnswanson commented Nov 28, 2023

From talking with Brian L, projects might not need to be provisioned in LDAP, so assuming that LDAP has all the groups that are projects is an error. The project-usermap script should instead use the COmanage API to find the members of project groups (based on the group member entry's Member field being true)
We are moving to a paradigm of "All ospool projects should be provisioned in LDAP", and while the use of LDAP to find the provisioned projects is nessisary for this, we should wait to merge this pr until all ospool projects are ready in LDAP.

Copy link
Member

@brianhlin brianhlin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@williamnswanson this is a solid first attempt but I'm really interested in having you going back to the drawing board using the ldap3 library because I think it will simplify a lot. You've got a lot of pounds of code here to parse the shell output and I think as a result, it looks like you've got more data structures than you actually need. With the ldap3 library, I think you can have two data sources:

  1. LDAP: All active users that are part of the relevant AP login group. I'd expect the data struct that you get back to include every group that the user is a member of
  2. COManage API: All project groups. It looks like you already have this, more or less, but I reserve the right to pick nits in the future :)

Then project map construction should just be as simple as iterating through the users and taking the intersection of their groups + (2) above

import getopt
import collections
import subprocess
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think subprocess is necessary here as you can use existing ldap libraries (see https://github.com/opensciencegrid/topology/blob/master/src/webapp/ldap_data.py#L145-L179).

I'd say generally, you want to use first class libraries instead of subprocess callouts as the latter are often pretty brittle. Part of that is that first class libraries will give you Python-native data structures so that you don't have to write hard-to-debug, easy-to-break regular expressions for parsing output.

Comment on lines 23 to 25
LDAP_AUTH_COMMAND = [
"awk", "/ldap_default_authtok/ {print $3}", "/etc/sssd/conf.d/0060_domain_CILOGON.ORG.conf",
]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assumes a few things that I don't think are necessarily safe to assume:

  1. The user running this script can read this file
  2. That this file exists

Instead, you should have an option that allows the caller to reference a file containing the password. I prefer that over options specifying the password directly since you don't want the password showing up in plain text in the process tree.

Comment on lines 50 to 51
"|", "grep", "voPersonApplicationUID",
"|", "sort",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you do need to call out to subprocess, you certainly don't need pipes. You'll have better output processing tools for you in native Python!

Also, move get_projects_to_setup() functionality into several different methods for clarity.
Created a file to hold methods that are, or will likely be, used by multiple scripts in the osg-comanage-rpoject-usermap repo. Refactored existing scripts to use the new library.
Projects that needed an OSGGID identifier weren't being provisioned if a UNIX cluster group already existed. Added identifier updating after adding missing identifiers and moved adding identifiers to before checking what projects need UNIX cluster groups and provisioning.
Uses LDAP search to find active and provisioned groups then compares them to the list project groups in COmanage to determine which LDAP groups and users to build the map out of.
And clean up of no-longer-used methods

remove collections import
Writes the osggids to project name dict to a file containing the epoch time the cache was made. If the file is found to exist and was made in the past 0.5 hours (by default), the program will read from the cache instead of making COmanage API calls. Otherwise, the program will get the project data from the API and overwrite any existing cache.
@williamnswanson williamnswanson force-pushed the INF-1060.member-removals branch 2 times, most recently from af0d2eb to a753c07 Compare December 29, 2023 16:51
@williamnswanson williamnswanson force-pushed the INF-1060.member-removals branch from a753c07 to 3d78c61 Compare December 29, 2023 17:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants