
Large number of DB queries during execution of a single API call #33289

Closed
starypatyk opened this issue Jul 18, 2022 · 17 comments
Labels
0. Needs triage Pending check for reproducibility or if it fits our roadmap performance 🚀

Comments

@starypatyk
Contributor

When diagnosing slow loading of chat messages in the Nextcloud Talk application, I noticed that in my environment (NC 24.0.3, MySQL database) NC sends over 2000 DB queries during the execution of a single API call (retrieving chat messages).

I use the following call (same as the Nextcloud Talk Android app):

https://<server>/ocs/v2.php/apps/spreed/api/v1/chat/<room-token>?lastKnownMessageId=<msgid>&includeLastKnown=1&limit=100&setReadMarker=1&timeout=0&lookIntoFuture=0

Most of the queries (over 2100 out of about 2500) are:

INSERT INTO `oc_mounts` (`storage_id`,`root_id`,`user_id`,`mount_point`,`mount_id`,`mount_provider_class`) SELECT ?,?,?,?,?,? FROM `oc_mounts` WHERE `root_id` = ? AND `user_id` = ? HAVING COUNT(*) = 0

There are about 1300 records in the x_mounts table. The majority of them (about 1100) have NULL in the mount_provider_class column.
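
These counts can be obtained with a query along these lines (a sketch, assuming the default oc_ table prefix):

-- Rows with NULL here are the entries left over from before the upgrade
SELECT mount_provider_class, COUNT(*) AS cnt
FROM oc_mounts
GROUP BY mount_provider_class;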

These NULL values cause the code in the registerMounts method of the UserMountCache class to ignore the existing cache entries and try to re-add them - see the addToCache call.

However, the corrected records are never actually added to the x_mounts table, since records with the same root_id and user_id already exist (the ones with NULL in the mount_provider_class column), and the same situation repeats on each subsequent API call.
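
To illustrate the mechanism, here is the statement above with the placeholders filled in for one of the affected entries (a sketch - the concrete values are only illustrative, and backslashes are doubled for MySQL string escaping):

INSERT INTO `oc_mounts` (`storage_id`,`root_id`,`user_id`,`mount_point`,`mount_id`,`mount_provider_class`)
SELECT 3, 381, 'admin', '/admin/files/Talk/009-IMG_20170828_112601.jpg/', NULL, 'OCA\\Files_Sharing\\MountProvider'
FROM `oc_mounts`
WHERE `root_id` = 381 AND `user_id` = 'admin'
HAVING COUNT(*) = 0;
-- A row with root_id = 381 and user_id = 'admin' already exists (the stale one with
-- NULL in mount_provider_class), so COUNT(*) = 1, the HAVING clause filters out the
-- single aggregate row, and nothing is inserted. On the next request the cache still
-- sees NULL and issues the same statement again.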

As I understand it, the mount_provider_class column was added only recently. However, all the records that already existed in the x_mounts table before the upgrade seem to end up with NULL in the new column, which in turn leads to the observed behaviour.

I am not sure how the issue should be fixed.

  • Is it safe to delete all records from the x_mounts table? Will they be recreated properly by the registerMounts code?
  • If yes to the above - should the upgrade delete these records after adding the column?
@starypatyk starypatyk added 0. Needs triage Pending check for reproducibility or if it fits our roadmap performance 🚀 labels Jul 18, 2022
@szaimen szaimen transferred this issue from nextcloud/server Jul 18, 2022
@nickvergessen nickvergessen transferred this issue from nextcloud/spreed Jul 20, 2022
@nickvergessen
Member

Moving back to server.
@szaimen this is outside of Talk's stack.

@nickvergessen
Member

This should all be improved since #31265.
But as you are running 24.0.3, there still seems to be an issue somewhere.

@szaimen
Contributor

szaimen commented Jul 20, 2022

@szaimen this is outside of Talk's stack.

All right, sorry for that! It looked like a Talk-specific issue to me.

@starypatyk
Contributor Author

I suspect that the oc_mounts table in my installation contains invalid entries after upgrading to 24. See below for a sample:

[screenshot: sample rows from the oc_mounts table]

  • Entries for files that were shared in Talk before upgrading to 24 have NULL instead of OCA\Files_Sharing\MountProvider in the mount_provider_class column.
  • A few entries at the bottom relate to files shared after the upgrade to 24, and these look OK.

@starypatyk
Contributor Author

I went ahead and deleted all records from the oc_mounts table with NULL in the mount_provider_class column. The required records were recreated automatically (as expected), and now all records in the oc_mounts table have reasonable values in the mount_provider_class column.
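
The cleanup was equivalent to the following (a sketch; the default oc_ table prefix is assumed):

-- Remove only the stale pre-upgrade entries; they are recreated automatically on the next request
DELETE FROM oc_mounts WHERE mount_provider_class IS NULL;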

This helped significantly. 😉

Now only two INSERT INTO... statements (down from over 2100) are executed, and the total execution time of the API call dropped from about 2.8 s to 1.4 s.

My preliminary conclusions are that deleting the records with NULL in the mount_provider_class column is safe - they are recreated automatically with correct values - and that the upgrade should probably clean up such records after adding the column.

Any comments/suggestions?

@starypatyk
Contributor Author

I have investigated the remaining INSERT INTO... statements. It turns out that the number of statements varies between 1 and 5 (5 being the number of participants in the chat).

The code tries to update the entries for the home mounts - i.e. the ones with OC\Files\Mount\LocalHomeMountProvider in the mount_provider_class column.

[screenshot: home mount entries in the oc_mounts table]

Apparently it always tries to update the entry for the current user - so there is at least one INSERT INTO... statement. The other entries are updated only sometimes, seemingly depending on timing.
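
For reference, the affected home-mount rows can be listed with a query like this (a sketch - the default oc_ prefix is assumed, and backslashes are doubled for MySQL string escaping):

SELECT user_id, root_id, mount_point
FROM oc_mounts
WHERE mount_provider_class = 'OC\\Files\\Mount\\LocalHomeMountProvider';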

I have tried to debug the UserMountCache::registerMounts method to find the root cause. Unfortunately this requires much deeper knowledge of the file system setup process than I currently have. 😞

I hope the information above will help someone else find and fix the issue.

@starypatyk
Contributor Author

I have been able to reproduce the issue on a fresh Nextcloud installation (using Docker) with the following scenario.

  1. Create a new Nextcloud instance using version 23.0.7.
  2. Install Talk.
  3. Create an additional test account.
  4. Upload a few images from the test account and share them in a Talk chat with the admin user.
  5. Upgrade the instance to 24.0.3.

After these steps I got the following contents in the oc_mounts table:

MariaDB [nextcloud]> select * from oc_mounts;
+----+------------+---------+---------+------------------------------------------------+----------+----------------------+
| id | storage_id | root_id | user_id | mount_point                                    | mount_id | mount_provider_class |
+----+------------+---------+---------+------------------------------------------------+----------+----------------------+
|  1 |          1 |       1 | admin   | /admin/                                        |     NULL | NULL                 |
|  2 |          3 |     125 | test01  | /test01/                                       |     NULL | NULL                 |
|  3 |          4 |     131 | test12  | /test12/                                       |     NULL | NULL                 |
|  4 |          3 |     381 | admin   | /admin/files/Talk/009-IMG_20170828_112601.jpg/ |     NULL | NULL                 |
|  5 |          3 |     357 | admin   | /admin/files/Talk/006-IMG_20170827_124444.jpg/ |     NULL | NULL                 |
|  6 |          3 |     372 | admin   | /admin/files/Talk/008-IMG_20170827_194823.jpg/ |     NULL | NULL                 |
|  7 |          3 |     511 | admin   | /admin/files/Talk/037-DSC05971.jpg/            |     NULL | NULL                 |
+----+------------+---------+---------+------------------------------------------------+----------+----------------------+

After deleting all records from the oc_mounts table, they were recreated automatically on the next request, as expected (with correct values in the mount_provider_class column).

MariaDB [nextcloud]> select * from oc_mounts;
+----+------------+---------+---------+------------------------------------------------+----------+---------------------------------------+
| id | storage_id | root_id | user_id | mount_point                                    | mount_id | mount_provider_class                  |
+----+------------+---------+---------+------------------------------------------------+----------+---------------------------------------+
|  8 |          1 |       1 | admin   | /admin/                                        |     NULL | OC\Files\Mount\LocalHomeMountProvider |
|  9 |          3 |     381 | admin   | /admin/files/Talk/009-IMG_20170828_112601.jpg/ |     NULL | OCA\Files_Sharing\MountProvider       |
| 10 |          3 |     357 | admin   | /admin/files/Talk/006-IMG_20170827_124444.jpg/ |     NULL | OCA\Files_Sharing\MountProvider       |
| 11 |          3 |     372 | admin   | /admin/files/Talk/008-IMG_20170827_194823.jpg/ |     NULL | OCA\Files_Sharing\MountProvider       |
| 12 |          3 |     511 | admin   | /admin/files/Talk/037-DSC05971.jpg/            |     NULL | OCA\Files_Sharing\MountProvider       |
| 13 |          3 |     125 | test01  | /test01/                                       |     NULL | OC\Files\Mount\LocalHomeMountProvider |
+----+------------+---------+---------+------------------------------------------------+----------+---------------------------------------+

Apparently the potential gains from #30978 cannot be fully realized due to this issue. 😞

@PVince81
Member

@icewind1991

@icewind1991
Member

#33540 should fix pre-24 mountpoints not having their mount_provider updated.

@starypatyk
Contributor Author

Thanks @icewind1991! I have checked #33540 in my dev environment - works as expected. 😄

@starypatyk
Contributor Author

As #33540 is now merged, I have moved the issue to Done in the Performance project and am closing it now.

@meonkeys

meonkeys commented Nov 13, 2022

@starypatyk I found this issue after searching and debugging a bunch because I...

  1. experienced seemingly random slow HTTP responses
  2. watched the mariadb processlist closely and
  3. noticed tons of inserts into oc_mounts

I have 7474 rows in my oc_mounts table and 1708 of these rows have NULL values for mount_provider_class. I'm running the Nextcloud v24.0.6-apache Docker image.
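
(The counts above come from queries along these lines - a sketch, assuming the default oc_ table prefix.)

SELECT COUNT(*) FROM oc_mounts;
SELECT COUNT(*) FROM oc_mounts WHERE mount_provider_class IS NULL;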

  1. Should I go ahead and DELETE FROM 'oc_mounts'? Or is there perhaps some safer / higher-level way to fix this, maybe occ scan or some button I should click in the web UI instead?
  2. Does anyone want more info for debugging? I'm happy to wait to do this brute-force fix if it would be helpful.

@PVince81
Member

@icewind1991 is it safe to clear the oc_mounts table in NC 24?

@meonkeys

meonkeys commented Nov 14, 2022

I updated to v24.0.7. I did not delete any rows in oc_mounts. Everything feels faster. I don't seem to be having any more slow HTTP responses. 🎉

select count(*) from oc_mounts where mount_provider_class IS NULL now returns 30.

Related thread: https://help.nextcloud.com/t/rogue-orphaned-mount-points-in-oc-mounts/118920/4

@PVince81
Member

As far as I remember, the ones with "NULL" are supposed to auto-populate during access.

@starypatyk
Contributor Author

@meonkeys - Indeed the fix mentioned here (#33540) has been backported to 24 (#34797) and released in 24.0.7.

@meonkeys

Right on, thanks.

  1. Is it normal to have 7k rows in oc_mounts?
  2. If not, is it safe/recommended to truncate rows?

I'm asking because I am still noticing some slow responses. I've also been watching the processlist in MariaDB, and I see lots of DML against the oc_mounts table during these slow responses - mostly INSERTs and DELETEs, and maybe some UPDATEs too. Is this normal? It seems like a lot of churn on that table, and I don't see a clear pattern as to which requests trigger it.

The count of rows in my oc_mounts table dips below 7k for a couple of seconds, then goes back up to just above 7k. The churn stops after about 10 seconds.
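
The monitoring described above was done from the MariaDB client, roughly along these lines (a sketch):

-- Look for INSERT/DELETE/UPDATE statements hitting oc_mounts during a slow response
SHOW FULL PROCESSLIST;
-- Poll the row count to watch the dip-and-recover churn
SELECT COUNT(*) FROM oc_mounts;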
