Implementing regular ALS #19
What's the loss function you're trying to optimize here? If you're going for minimizing sum over (u, i) of (Pui - Xu·Yi)^2, I think that's correct - but there are substantially faster ways of doing this. Since A = YtY + reg*I is the same for all users, you can break up the solving stage: compute the Cholesky decomposition outside the loop, and then just reuse it to solve for the different 'b' values. The code for this would look something like:

```python
# factor A = YtY + reg*I once, since it is identical for every user
# (Y is items x factors, so the Gram matrix is Y.T.dot(Y))
YtY = Y.T.dot(Y) + reg * numpy.eye(Y.shape[1])
U, info = scipy.linalg.lapack.dpotrf(YtY)

for u in range(users):
    b = np.zeros(factors)
    for i, preference in nonzeros(Pui, u):
        b += Y[i] * preference
    # reuse the Cholesky factor to solve A x = b for this user
    X[u] = scipy.linalg.lapack.dpotrs(U, b)[0]
```

Note that I haven't tested this out - but I think the basic idea should work.
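The factored-out Cholesky approach above can be sketched as a self-contained function. This is a hedged illustration, not the library's actual code: the shapes (`Pui` as users × items, `Y` as items × factors) and the dense `Pui` are assumptions made so the example runs without a sparse-matrix helper.

```python
import numpy as np
import scipy.linalg.lapack


def least_squares_cholesky(Pui, Y, reg):
    """Solve (YtY + reg*I) x = Y^T p_u for every user, factoring once.

    Assumed shapes (illustrative only): Pui is (users, items) dense,
    Y is (items, factors).
    """
    users, factors = Pui.shape[0], Y.shape[1]

    # A = YtY + reg*I is identical for every user, so factor it once
    YtY = Y.T.dot(Y) + reg * np.eye(factors)
    U, info = scipy.linalg.lapack.dpotrf(YtY)
    assert info == 0, "YtY + reg*I should be positive definite"

    X = np.zeros((users, factors))
    for u in range(users):
        # b = sum of item factors weighted by this user's preferences
        b = Pui[u].dot(Y)
        # reuse the factorization to solve A x = b
        X[u], info = scipy.linalg.lapack.dpotrs(U, b)
    return X
```

Compared with calling `np.linalg.solve` per user, this amortizes the O(factors^3) factorization across all users, leaving only the cheap triangular solves inside the loop.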
Thank you - yes, the loss function is minimizing sum over (u, i) of (Pui - Xu·Yi)^2.
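For concreteness, the loss being discussed can be restated as a short function. This is a sketch under the assumption that dense user and item factor matrices `X` (users × factors) and `Y` (items × factors) are available, with L2 regularization added as is conventional for ALS:

```python
import numpy as np


def als_loss(P, X, Y, reg):
    """Squared-error ALS loss: sum over (u, i) of (Pui - Xu.Yi)^2,
    plus L2 regularization on both factor matrices."""
    err = P - X.dot(Y.T)  # residual Pui - Xu.Yi for every (u, i) pair
    return (err ** 2).sum() + reg * ((X ** 2).sum() + (Y ** 2).sum())
```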
Would it be possible for you to email me the matrix causing the issue with the CG code? I'm interested in seeing if I can fix it.
The Cu matrix is not too big:
Here is the configuration for matrix factorization:
After changing iterations to 5, or changing mf_n_factors to 5
Thanks! I'll dig in on the weekend and hopefully fix it.
#19 (comment) There was an issue with certain small input matrices causing NaN values to appear in the CG optimizer. The reason seems to be a divide by zero when rsold approached zero. I added a check to exit early in that case, since the optimization has already succeeded at that point.
It turned out to be a simple fix - that last commit should resolve this issue. Thanks for letting me know!
Thanks! I suspected a division by zero too.
Hi Ben, it feels like the current code is still suffering from the same issue: when I tried to run the 'basic usage' example on a few small 100×200 random 0-1 matrices, the output scores are always NaN. Could you try to look into that again? Thanks.
With synthetic data and a large regularization parameter, the CG ALS model would converge so that some user/item factors had 0 vectors as solutions. The CG update would fail in this case, setting all the factors to NaN (#106). Fix by detecting when this would occur and aborting. A previous check handled the case inside the loop (#19 (comment)); this handles the case where rsold = 0 entering the loop.
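The two guards described in the commit message can be illustrated with a generic conjugate-gradient solver. This is a minimal sketch of the idea, not the library's actual Cython implementation: when the squared residual `rsold` reaches (or starts at) roughly zero, the solution has converged, and continuing would divide by zero and fill the factors with NaN.

```python
import numpy as np


def cg_solve(A, b, x0=None, iterations=10, eps=1e-20):
    """Conjugate gradient for A x = b (A symmetric positive definite),
    with early exits when the residual vanishes."""
    x = np.zeros_like(b) if x0 is None else x0.copy()
    r = b - A.dot(x)
    p = r.copy()
    rsold = r.dot(r)
    if rsold < eps:
        # rsold = 0 entering the loop: already converged,
        # proceeding would compute 0/0 and produce NaN
        return x
    for _ in range(iterations):
        Ap = A.dot(p)
        alpha = rsold / p.dot(Ap)
        x += alpha * p
        r -= alpha * Ap
        rsnew = r.dot(r)
        if rsnew < eps:
            # converged mid-loop: the same guard, applied inside
            break
        p = r + (rsnew / rsold) * p
        rsold = rsnew
    return x
```

With `b = 0` (the zero-vector-solution case the commit describes), the entry guard returns the zero solution immediately instead of producing NaN.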
Hi Ben,
I am trying to modify your code to implement the regular ALS matrix factorization algorithm (for sparse matrices).
The code seems to work for now.
However, could you please take a look and verify the correctness of the proposed changes?