Strange leak of open file handles in Java using BlobDB #13066

Closed
vmv890 opened this issue Oct 14, 2024 · 3 comments

vmv890 commented Oct 14, 2024

I am noticing a strange issue with open file handles for deleted blob files in Java using rocksdbjni. I do not see this issue in 8.11.x, but I do see it in 9.5.x and 9.6.x. Running lsof shows a growing number of handles to deleted blob files. I do not create those files myself, so I am not sure how to close them properly, or whether something in my API usage is supposed to close or clear them. Is this expected in the 9.x series compared with 8.x?

Tested in JDK 17 and 21

Expected behavior

Zero open file handles for deleted blob files (or at least a count that does not keep growing).

Actual behavior

Running lsof -p <PID> | grep deleted | wc -l shows the number of open file handles growing in 9.5.x and 9.6.x, but not in 8.11.x.

-- lsof showing handles to deleted files (not on disk) --
java ... /data/testLeakingFileHandles/000106.blob (deleted)
java ... /data/testLeakingFileHandles/000088.blob (deleted)
java ... /data/testLeakingFileHandles/000101.blob (deleted)
java ... /data/testLeakingFileHandles/000095.blob (deleted)
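As an alternative to running lsof by hand, the same count can be polled from Java on Linux by reading the symlinks under /proc/<pid>/fd: the kernel appends " (deleted)" to the target of a descriptor whose file has been unlinked, which is what lsof shows above. This is a minimal sketch that assumes Linux and permission to read the target process's fd directory (same user or root); the class name DeletedFdCounter is hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class DeletedFdCounter {

    /** Counts descriptor targets like "/data/x/000106.blob (deleted)" for the given file suffix. */
    public static long countDeleted(List<String> fdTargets, String fileSuffix) {
        String marker = fileSuffix + " (deleted)";
        return fdTargets.stream().filter(t -> t.endsWith(marker)).count();
    }

    /** Reads the symlink targets under /proc/<pid>/fd (Linux only). */
    public static List<String> readFdTargets(long pid) throws IOException {
        List<String> targets = new ArrayList<>();
        Path fdDir = Paths.get("/proc", Long.toString(pid), "fd");
        try (DirectoryStream<Path> fds = Files.newDirectoryStream(fdDir)) {
            for (Path fd : fds) {
                try {
                    targets.add(Files.readSymbolicLink(fd).toString());
                } catch (IOException e) {
                    // The descriptor was closed between listing and reading it; skip it.
                }
            }
        }
        return targets;
    }

    public static void main(String[] args) throws IOException {
        // Pass the PID printed by the reproducer, or default to this JVM's own PID.
        long pid = args.length > 0 ? Long.parseLong(args[0]) : ProcessHandle.current().pid();
        System.out.println("Open handles to deleted .blob files: " + countDeleted(readFdTargets(pid), ".blob"));
    }
}
```

Running it periodically against the reproducer's PID should show the same trend as the lsof pipeline, without spawning an external process.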

Steps to reproduce the behavior

package test;

import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

import java.lang.management.ManagementFactory;
import java.nio.charset.StandardCharsets;
import java.util.Random;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class RocksDeletedFileHandleTest {

    static {
        RocksDB.loadLibrary();
    }

    public static void main(String[] argv) throws InterruptedException {
        var random = new Random();
        var dbPath = "/tmp/testLeakingFileHandles"; // <-- Use any directory on your system with enough free space

        var dbQueryThread = new ThreadPoolExecutor(1, 1, 100L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>(10));
        try (var opts = new Options()
                     .setCreateIfMissing(true)
                     .setEnableBlobFiles(true)
                     .setMinBlobSize(0)             // every value goes to a blob file
                     .setBlobFileSize(1024 * 1024); // 1 MB, so blob files are created (and later deleted) often
             var db = RocksDB.open(opts, dbPath)) {

            System.out.println("----------- Running Test - PID: " + ManagementFactory.getRuntimeMXBean().getName().replace("@", " Host: "));

            // Run `lsof -p <PID> | grep deleted | wc -l` to see open file handles grow in 9.6.1 but not 8.11.4

            // -- Run random point lookups in a background thread
            dbQueryThread.submit(() -> {
                while (!Thread.currentThread().isInterrupted()) {
                    try {
                        String randomKeyToQuery = "key." + random.nextInt(1_000_000);
                        db.get(randomKeyToQuery.getBytes(StandardCharsets.UTF_8));
                        Thread.sleep(100);
                    } catch (InterruptedException e) {
                        return; // shutdownNow() was called: stop querying
                    } catch (Exception e) {
                        System.out.println("----------- Exiting due to Error: " + e.getMessage());
                        return;
                    }
                }
            });

            // -- Insert data in a loop (runs for a very long time; stop the process when done observing)
            for (int loop = 0; loop < 1_000_000; loop++) {
                long start = System.currentTimeMillis();
                for (int k = 0; k < 1_000_000; k++) {
                    db.put(("key." + k).getBytes(StandardCharsets.UTF_8), ("value." + k).getBytes(StandardCharsets.UTF_8));
                }
                System.out.println("----------- Inserted 1M keys in " + ((System.currentTimeMillis() - start) / 1000) + " seconds");
            }

            // Stop the query thread before try-with-resources closes the DB
            dbQueryThread.shutdownNow();
            dbQueryThread.awaitTermination(10, TimeUnit.SECONDS);

        } catch (RocksDBException ex) {
            ex.printStackTrace();
        } finally {
            dbQueryThread.shutdownNow(); // also lets the JVM exit if an exception was thrown
        }
    }
}

alanpaxton (Contributor) commented

Hi @vmv890 - thanks for the report. I don't think this is expected. I bisected it, and it looks like it was introduced in 9.4.0 by commit b34cef5.

@pdillinger I presume this is not the intended behaviour of the change. Do you think increasing uncache_aggressiveness would mitigate it? We could think about adding it to the Java API.

ltamasi (Contributor) commented Oct 15, 2024

The immediate cause of the issue is most likely the change to VersionSet::AddObsoleteBlobFile. Blob files do live in the same file cache (confusingly still called TableCache) as SST files because they are subject to the combined max_open_files limit. Cc @pdillinger

EDIT: Or rather, TableCache and BlobFileCache use the same cache under the hood.
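To illustrate why a missing cache erase shows up as leaked handles, here is a toy Java model. It is not RocksDB's implementation; SharedFileCacheModel and its methods are hypothetical. It models a single LRU cache of open readers shared by SST and blob files and bounded by one capacity, in the spirit of the combined max_open_files limit mentioned above. In the model a handle is closed only when its entry is evicted or explicitly erased, so if an obsolete blob file is never erased and the capacity is large, its handle stays open after the file is deleted.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;

/**
 * Toy model, NOT RocksDB code: one LRU cache of open file readers shared by SST and blob
 * files and bounded by a single capacity. A handle is closed only on eviction or erase().
 */
public class SharedFileCacheModel {
    private final int capacity;
    private int openHandles = 0;
    // accessOrder = true: iteration order is least-recently-used first.
    private final LinkedHashMap<Long, String> readers = new LinkedHashMap<>(16, 0.75f, true);

    public SharedFileCacheModel(int capacity) {
        this.capacity = capacity;
    }

    /** Returns the cached reader, "opening" the file on a miss and evicting the LRU entry if full. */
    public String get(long fileNumber, String kind) {
        String reader = readers.get(fileNumber);
        if (reader == null) {
            reader = kind + " reader for file " + fileNumber;
            readers.put(fileNumber, reader);
            openHandles++;
            if (readers.size() > capacity) {
                Iterator<Long> lru = readers.keySet().iterator();
                lru.next();
                lru.remove(); // eviction closes the least-recently-used handle
                openHandles--;
            }
        }
        return reader;
    }

    /** Drops an obsolete file from the cache, closing its handle if it was cached. */
    public void erase(long fileNumber) {
        if (readers.remove(fileNumber) != null) {
            openHandles--;
        }
    }

    public int openHandles() {
        return openHandles;
    }

    public static void main(String[] args) {
        // Large capacity, so eviction never kicks in during this demo.
        SharedFileCacheModel leaky = new SharedFileCacheModel(1_000_000);
        SharedFileCacheModel fixed = new SharedFileCacheModel(1_000_000);
        for (long blob = 1; blob <= 100; blob++) {
            leaky.get(blob, "blob");
            fixed.get(blob, "blob");
            // The blob file now becomes obsolete and is deleted from disk.
            fixed.erase(blob); // only the "fixed" model also removes it from the cache
        }
        System.out.println("leaky=" + leaky.openHandles() + " fixed=" + fixed.openHandles());
    }
}
```

Running main prints leaky=100 fixed=0: without the erase, every obsolete blob file in the model keeps an open handle until something evicts it.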

@pdillinger pdillinger self-assigned this Oct 29, 2024
pdillinger added a commit to pdillinger/rocksdb that referenced this issue Oct 31, 2024
pdillinger added a commit to pdillinger/rocksdb that referenced this issue Nov 1, 2024
pdillinger added a commit that referenced this issue Nov 1, 2024
Summary:
An earlier change (b34cef5) removed apparently unused functionality where an obsolete blob file number is passed for removal from TableCache, which manages SST files. This functionality was actually relying on a broken/fragile abstraction wherein TableCache and BlobFileCache share the same Cache, with the TableCache interface being used to manipulate blob file caching. No unit test was actually checking for removal of obsolete blob files from the cache (which is somewhat tricky to check and a second-order correctness requirement).

Here we fix the leak and add a DEBUG+ASAN-only check in DB::Close() that no obsolete files are lingering in the table/blob file cache.

Fixes #13066

Important follow-up (FIXME): The added check discovered some apparent cases of leaked (into table_cache) SST file readers that would stick around until DB::Close(). Need to enable that check, diagnose, and fix.

Pull Request resolved: #13106

Test Plan:
Added a check that is called during DB::Close in ASAN builds (to minimize paying the cost in all unit tests). Without the fix, the check failed in at least these tests:

```
db_blob_basic_test DBBlobBasicTest.DynamicallyWarmCacheDuringFlush
db_blob_compaction_test DBBlobCompactionTest.CompactionReadaheadMerge
db_blob_compaction_test DBBlobCompactionTest.MergeBlobWithBase
db_blob_compaction_test DBBlobCompactionTest.CompactionDoNotFillCache
db_blob_compaction_test DBBlobCompactionTest.SkipUntilFilter
db_blob_compaction_test DBBlobCompactionTest.CompactionFilter
db_blob_compaction_test DBBlobCompactionTest.CompactionReadaheadFilter
db_blob_compaction_test DBBlobCompactionTest.CompactionReadaheadGarbageCollection
```

Reviewed By: ltamasi

Differential Revision: D65296123

Pulled By: pdillinger

fbshipit-source-id: 2276d76482beb2c75c9010bc1bec070bb23a24c0
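The commit message describes two parts: erasing obsolete blob files from the shared cache again, and a DEBUG+ASAN-only check in DB::Close() that nothing obsolete is lingering in the table/blob file cache. The sketch below models only the idea of that close-time check in Java; ObsoleteFileCheckSketch and its methods are hypothetical names, and the real check lives in RocksDB's C++ code.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch (not RocksDB code) of a close-time "no obsolete files cached" check. */
public class ObsoleteFileCheckSketch {
    private final Set<Long> cachedFileNumbers = new HashSet<>();
    private final Set<Long> liveFileNumbers = new HashSet<>();

    public void addLiveFile(long fileNumber) {
        liveFileNumbers.add(fileNumber);
    }

    public void cacheReader(long fileNumber) {
        cachedFileNumbers.add(fileNumber);
    }

    /** A file becomes obsolete; eraseFromCache=false models the buggy path that skips the erase. */
    public void onObsoleteFile(long fileNumber, boolean eraseFromCache) {
        liveFileNumbers.remove(fileNumber);
        if (eraseFromCache) {
            cachedFileNumbers.remove(fileNumber);
        }
    }

    /** Close-time check: any cached file number that is no longer live is a leak. */
    public Set<Long> lingeringObsoleteFiles() {
        Set<Long> lingering = new TreeSet<>(cachedFileNumbers);
        lingering.removeAll(liveFileNumbers);
        return lingering;
    }

    public static void main(String[] args) {
        ObsoleteFileCheckSketch db = new ObsoleteFileCheckSketch();
        for (long n = 1; n <= 3; n++) {
            db.addLiveFile(n);
            db.cacheReader(n);
        }
        db.onObsoleteFile(1, true);  // fixed path: also erased from the cache
        db.onObsoleteFile(2, false); // buggy path: left behind in the cache
        Set<Long> lingering = db.lingeringObsoleteFiles();
        if (!lingering.isEmpty()) {
            System.out.println("Obsolete files still cached at close: " + lingering);
        }
    }
}
```

Running main prints "Obsolete files still cached at close: [2]". Comparing the cache contents against the set of live files is what makes such a check cheap to express, which fits the commit's choice to run it only once, at close, in ASAN builds.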
pdillinger (Contributor) commented

Fixed in v9.8.1, v9.7.4, v9.6.2
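Based on the versions in this thread (regression bisected to 9.4.0; fix released in 9.6.2, 9.7.4 and 9.8.1), a small helper can flag whether a given RocksDB version falls in the affected range. This sketch rests on two assumptions that the thread does not state: that 9.4.x and 9.5.x received no backported fix, and that every release from 9.9.0 onward includes the fix. BlobHandleLeakVersionCheck is a hypothetical name. Here the version string comes from the command line; recent RocksJava releases are assumed to also expose the linked library version via RocksDB.rocksdbVersion(), but check the Javadoc for your release.

```java
public class BlobHandleLeakVersionCheck {

    /**
     * True if major.minor.patch is in the range reported in issue #13066: regression bisected to
     * 9.4.0, fix released in 9.6.2, 9.7.4 and 9.8.1. ASSUMPTIONS (not stated in the thread):
     * 9.4.x and 9.5.x got no backported fix, and every release from 9.9.0 on includes the fix.
     */
    public static boolean isAffected(int major, int minor, int patch) {
        if (major != 9 || minor < 4) {
            return false; // 8.x and 9.0-9.3 predate the regression; 10.x and later assumed fixed
        }
        return switch (minor) {
            case 4, 5 -> true;
            case 6 -> patch < 2;
            case 7 -> patch < 4;
            case 8 -> patch < 1;
            default -> false; // assumed fixed from 9.9.0 on
        };
    }

    public static void main(String[] args) {
        String version = args.length > 0 ? args[0] : "9.6.1";
        String[] parts = version.split("\\.");
        int major = Integer.parseInt(parts[0]);
        int minor = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
        int patch = parts.length > 2 ? Integer.parseInt(parts[2]) : 0;
        System.out.println(version + (isAffected(major, minor, patch) ? " is" : " is not")
                + " in the affected range");
    }
}
```

With no arguments it prints "9.6.1 is in the affected range"; passing 9.6.2 or 8.11.4 reports that the version is not affected.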
