Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ocis server startup error #3900

Closed
phil-davis opened this issue Jun 1, 2022 · 4 comments · Fixed by #3933
Closed

ocis server startup error #3900

phil-davis opened this issue Jun 1, 2022 · 4 comments · Fixed by #3933
Assignees
Labels
Priority:p2-high Escalation, on top of current planning, release blocker Type:Bug

Comments

@phil-davis
Copy link
Contributor

phil-davis commented Jun 1, 2022

Describe the bug

https://drone.owncloud.com/owncloud/ocis/12206/39/3

+ ocis/bin/ocis server
{"level":"error","service":"proxy","error":"{\"id\":\"go.micro.client\",\"code\":500,\"detail\":\"service com.owncloud.api.settings: not found\",\"status\":\"Internal Server Error\"}","time":"2022-06-01T03:55:43Z","message":"Could not load roles"}

Is this a "normal" startup error message?

https://drone.owncloud.com/owncloud/ocis/12206/39/6

    Given user "Alice" has been created with default attributes and without skeleton files # FeatureContext::userHasBeenCreatedWithDefaultAttributesAndWithoutSkeletonFiles()
      GraphContext::throwHttpException
      Could not create user Alice
      HTTP status code: 401
      Error code: accessDenied
      Message: Unauthorized (Exception)

"something happens" (tm) sometimes on ocis startup, and admin cannot create users. So the whole CI pipeline obviously gets lots of unexpected failures.

Steps to reproduce

Run lots of CI pipelines and see one fail.

Expected behavior

ocis starts cleanly every time.

Actual behavior

ocis sometimes has an error on startup

Setup

current ocis master branch in CI

@micbar
Copy link
Contributor

micbar commented Jun 1, 2022

I had this problem locally when i was running out of file descriptors. But that should not be the case in CI.

could not load roles is a critical message during startup and brings ocis down.

@micbar micbar added the Priority:p2-high Escalation, on top of current planning, release blocker label Jun 1, 2022
@micbar
Copy link
Contributor

micbar commented Jun 1, 2022

Setting it to p2 because startup errors like these should not happen.

@rhafer rhafer self-assigned this Jun 2, 2022
@rhafer
Copy link
Contributor

rhafer commented Jun 7, 2022

+ ocis/bin/ocis server
{"level":"error","service":"proxy","error":"{\"id\":\"go.micro.client\",\"code\":500,\"detail\":\"service com.owncloud.api.settings: not found\",\"status\":\"Internal Server Error\"}","time":"2022-06-01T03:55:43Z","message":"Could not load roles"}

Is this a "normal" startup error message?

I wouldn't say normal, but it can happen when there a request issued against ocis before all services are fully up. I think this message is issue during the wait-for-ocis-server check (https://github.com/owncloud/ocis/blob/master/.drone.star#L1591). Unfortunately that check succeeds (as most of the other service are already up) and a failure to load the user's roles are not considered fatal here: https://github.com/owncloud/ocis/blob/master/extensions/proxy/pkg/user/backend/cs3.go#L79

It looks to me as if the settings service is just needing an awful long time to come up. Though I don't think that this is the root cause for the failing test afterwards. If it were, we'd see more error messages like the above in the logs (caused by the failing user creating during the tests). But as said this is just guessing for more detailed analysis we'd need more logging.

@rhafer
Copy link
Contributor

rhafer commented Jun 8, 2022

I think we found a possible race condition in the proxy that might be causing the kind of bug experienced here.

From GetUserByClaims() (https://github.com/owncloud/ocis/blob/master/extensions/proxy/pkg/user/backend/cs3.go#L52):

	if user.Id.Type != cs3.UserType_USER_TYPE_LIGHTWEIGHT {
		roleIDs, err = loadRolesIDs(ctx, user.Id.OpaqueId, c.settingsRoleService)
		if err != nil {
			c.logger.Error().Err(err).Msgf("Could not load roles")
		}
	}

If the settings service is not fully up at this point. roleIDs is empty. (And the error returned by loadRoleIDs is ignored. Now we're trying to add a default role to the user and if the settings service becomes available before doing that the code below will overwrite the role-assignment of the user. (If this was an admin user, it will loose the privilege to create users ...)

	// if roles are empty, assume we haven't seen the user before and assign a
	// default user role. At least until proper roles are provided. See
	// https://github.com/owncloud/ocis/v2/issues/1825 for more context.
	if len(roleIDs) == 0 {
		if user.Id.Type == cs3.UserType_USER_TYPE_PRIMARY {
			c.logger.Info().Str("userid", user.Id.OpaqueId).Msg("user has no role assigned, assigning default user role")
			_, err := c.settingsRoleService.AssignRoleToUser(ctx, &settingssvc.AssignRoleToUserRequest{
				AccountUuid: user.Id.OpaqueId,
				RoleId:      settingsService.BundleUUIDRoleUser,
			})
			if err != nil {
				c.logger.Warn().Err(err).Msg("Could not add default role")
			}
			roleIDs = append(roleIDs, settingsService.BundleUUIDRoleUser)
		}
	}

rhafer added a commit to rhafer/ocis that referenced this issue Jun 8, 2022
This reworks the assignment of the default role at login. The assignment
now only happens if settings service is reachable and the current user
does not have an assignment yet (we check for the NotFound status).
If the settings service returns an error other than 404, the
GetUserByClaims() (and with it the authentication) will also error out.

Closes: owncloud#3900
rhafer added a commit to rhafer/ocis that referenced this issue Jun 8, 2022
This reworks the assignment of the default role at login. The assignment
now only happens if settings service is reachable and the current user
does not have an assignment yet (we check for the NotFound status).
If the settings service returns an error other than 404, the
GetUserByClaims() (and with it the authentication) will also error out.

Closes: owncloud#3900
rhafer added a commit to rhafer/ocis that referenced this issue Jun 8, 2022
This reworks the assignment of the default role at login. The assignment
now only happens if settings service is reachable and the current user
does not have an assignment yet (we check for the NotFound status).
If the settings service returns an error other than 404, the
GetUserByClaims() (and with it the authentication) will also error out.

Closes: owncloud#3900
rhafer added a commit that referenced this issue Jun 9, 2022
This reworks the assignment of the default role at login. The assignment
now only happens if settings service is reachable and the current user
does not have an assignment yet (we check for the NotFound status).
If the settings service returns an error other than 404, the
GetUserByClaims() (and with it the authentication) will also error out.

Closes: #3900
ownclouders pushed a commit that referenced this issue Jun 9, 2022
Author: Ralf Haferkamp <[email protected]>
Date:   Wed Jun 8 14:16:04 2022 +0200

    Rework default role provisioning

    This reworks the assignment of the default role at login. The assignment
    now only happens if settings service is reachable and the current user
    does not have an assignment yet (we check for the NotFound status).
    If the settings service returns an error other than 404, the
    GetUserByClaims() (and with it the authentication) will also error out.

    Closes: #3900
@micbar micbar mentioned this issue Jun 22, 2022
40 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Priority:p2-high Escalation, on top of current planning, release blocker Type:Bug
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

3 participants