Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

delete tmp- segment directories on server startup #7961

Merged
merged 15 commits into from
Jan 11, 2022
Merged

delete tmp- segment directories on server startup #7961

merged 15 commits into from
Jan 11, 2022

Conversation

jadami10
Copy link
Contributor

@jadami10 jadami10 commented Dec 27, 2021

Description

Fixes #7954. This adds code at the very beginning of server startup to delete all tmp- directories created before the application started. I initially implemented this as a Periodic task, but that seemed unnecessary since we really just want to run this once at startup.

Given the many java file libraries, I added unit tests to ensure this only deletes directories, directories starting with tmp-, and directories last modified before the given start time. I also deployed this on a QA cluster and watched it log and delete 2 tmp- directories.

Upgrade Notes

Does this PR prevent a zero down-time upgrade? (Assume upgrade order: Controller, Broker, Server, Minion)

  • Yes (Please label as backward-incompat, and complete the section below on Release Notes)

Does this PR fix a zero-downtime upgrade introduced earlier?

  • Yes (Please label this as backward-incompat, and complete the section below on Release Notes)

Does this PR otherwise need attention when creating release notes? Things to consider:

  • New configuration options
  • Deprecation of configurations
  • Signature changes to public methods/interfaces
  • New plugins added or old plugins removed
  • Yes (Please label this PR as release-notes and complete the section on Release Notes)

Release Notes

Documentation

@codecov-commenter
Copy link

codecov-commenter commented Dec 27, 2021

Codecov Report

Merging #7961 (9f79c18) into master (99762cb) will decrease coverage by 0.11%.
The diff coverage is 85.71%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master    #7961      +/-   ##
============================================
- Coverage     71.34%   71.23%   -0.12%     
- Complexity     4191     4214      +23     
============================================
  Files          1595     1595              
  Lines         82568    82725     +157     
  Branches      12320    12343      +23     
============================================
+ Hits          58906    58927      +21     
- Misses        19678    19808     +130     
- Partials       3984     3990       +6     
Flag Coverage Δ
integration1 28.87% <85.71%> (-0.17%) ⬇️
integration2 27.66% <85.71%> (+0.06%) ⬆️
unittests1 68.14% <71.42%> (-0.09%) ⬇️
unittests2 14.27% <0.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
.../pinot/core/data/manager/BaseTableDataManager.java 84.77% <83.33%> (-0.13%) ⬇️
...ata/manager/realtime/RealtimeTableDataManager.java 67.88% <100.00%> (ø)
...data/manager/realtime/DefaultSegmentCommitter.java 0.00% <0.00%> (-80.00%) ⬇️
...readers/forward/BaseChunkSVForwardIndexReader.java 46.15% <0.00%> (-46.95%) ⬇️
...a/manager/realtime/RealtimeSegmentDataManager.java 50.00% <0.00%> (-25.00%) ⬇️
...er/api/resources/LLCSegmentCompletionHandlers.java 43.56% <0.00%> (-18.82%) ⬇️
...data/manager/realtime/SegmentCommitterFactory.java 88.23% <0.00%> (-11.77%) ⬇️
...inot/client/JsonAsyncHttpPinotClientTransport.java 49.31% <0.00%> (-8.75%) ⬇️
...altime/ServerSegmentCompletionProtocolHandler.java 51.42% <0.00%> (-6.67%) ⬇️
...gregation/function/StUnionAggregationFunction.java 73.52% <0.00%> (-2.95%) ⬇️
... and 27 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 99762cb...9f79c18. Read the comment docs.

Copy link
Member

@jackjlli jackjlli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There could be multiple servers running on the same machine (e.g. running integration test that spins up multiple servers). Will the cleanup from the 2nd server wipe out the tmp- directory of the 1st server?

@jadami10
Copy link
Contributor Author

jadami10 commented Jan 3, 2022

There could be multiple servers running on the same machine (e.g. running integration test that spins up multiple servers). Will the cleanup from the 2nd server wipe out the tmp- directory of the 1st server?

That's a fair point. The integration tests passed which makes me think they're ok. But if someone is actually running it this way in production, this would be bad for them. The directories themselves have no tie to the server, so maybe the only option is to provide a longer threshold. Either delete for than N hours ago or offer a configuration for it?

@jackjlli
Copy link
Member

jackjlli commented Jan 3, 2022

There could be multiple servers running on the same machine (e.g. running integration test that spins up multiple servers). Will the cleanup from the 2nd server wipe out the tmp- directory of the 1st server?

That's a fair point. The integration tests passed which makes me think they're ok. But if someone is actually running it this way in production, this would be bad for them. The directories themselves have no tie to the server, so maybe the only option is to provide a longer threshold. Either delete for than N hours ago or offer a configuration for it?

Not a big fan of introducing a config just for cleaning up temp directory but I'm fine with having a fixed threshold just as you mentioned: delete for more than N hours. We shouldn't see multiple servers running on the same machine in produciton.

@jadami10
Copy link
Contributor Author

jadami10 commented Jan 4, 2022

Not a big fan of introducing a config just for cleaning up temp directory but I'm fine with having a fixed threshold just as you mentioned: delete for more than N hours. We shouldn't see multiple servers running on the same machine in produciton.

I went with 3 hours for the cutoff. I don't think in our use we've ever seen anything that old that still needed to stay around

@mayankshriv
Copy link
Contributor

mayankshriv commented Jan 4, 2022

Thanks @jadami10 for the PR, much appreciated. Wondering what the root cause is, as in why the files are not cleaned up. Looking at the issue linked, the temp file generated was recently added: #7319 (cc @klsince), or not sure if it existed before too.

My worry is that explicitly cleaning up based on tmp- prefix might have some side effects later on (I realize you have put in some guards there). Let me also look at why the files are not being deleted.

@mayankshriv
Copy link
Contributor

It seems that the problem may be attributed to jvm shutdown before temp files are deleted:

// TODO: This could leave temporary directories in _indexDir if JVM shuts down before the temporary directory is

The changes in this PR are in line with what the TODO suggests.

return;
}
IOFileFilter beforeStartTimeFilter = new AgeFileFilter(startTime, true);
IOFileFilter tmpPrefixFilter = new PrefixFileFilter("tmp-");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The prefix could be an argument to this method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it could, but that felt like a better and easy addition for a future PR that requires it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would this delete data from table whose name has tmp- prefix? because table segments are put under dir like <dataDir>/<tableNameWithType>/... as initialized here:

defaultConfig.addProperty(TABLE_DATA_MANAGER_DATA_DIRECTORY,
        instanceDataManagerConfig.getInstanceDataDir() + "/" + tableNameWithType);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, these are the side effects that would be good to minimize. Perhaps it can be a config too?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm.. I gave it a try in my test env and found I could create tables with 'tmp-' prefix. Things like pushing segment to it and querying it worked as usual. In the server's dataDir, there was folder for the table like below. So it may be risky to use "tmp-" to clean up dirs.

...PinotServerDataDir0/tmp-baseballStats_OFFLINE/tmp-baseballStats_OFFLINE_0/v3/
columns.psf          creation.meta        index_map            metadata.properties

image

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing out! If that's the case, can we revisit the logic of creating the tmp- directory inside the data directory? Can we move it anywhere else?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may extract tmp dir creation and cleanup logic to a util class to manage its naming convention and location in one place, in case later on, folks use tmp. or temp- prefix or put them outside dataDir, making your cleanup logic less effective.

This is all fair feedback. I thought about this and figured it was unlikely people were creating tmp- tables, but let's not leave it to chance. It seems we have 2 options:

  1. encode the tmp segment name with something that cannot be used for real table names. The only thing disallowed in code is double underscore: // Double underscore is reserved for real-time segment name delimiter
  2. put tmp segments in a different directory

2 seems like the better option given that's how most uses of tmp things work in typical software. I believe our issue does stem from BaseTableDataManager, so how about we add a new getTmpSegmentDataDir function that puts things in File(_indexDir, "tmp", segmentName);? At that point, can we just clear that directory every time the process comes up regardless of age?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the proposals. I prefer to option 2 as well.

There are a few places creating tmp- dir in pinot/core/data/manager module. Your util method may be used to unify all those cases. Right now, their dir suffix are unique, which (luckily) could help one figure out where deletion fails. So for the new util method, I'd suggest to allow caller to customize some part of the dir name like its suffix to help debugging.

pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java:    File tempRootDir = getSegmentDataDir("tmp-" + segmentName + "-" + UUID.randomUUID());
pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/HLRealtimeSegmentDataManager.java:          File tempSegmentFolder = new File(_resourceTmpDir, "tmp-" + System.currentTimeMillis());
pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/LLRealtimeSegmentDataManager.java:      File tempSegmentFolder = new File(_resourceTmpDir, "tmp-" + _segmentNameStr + "-" + now());
pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/RealtimeTableDataManager.java:    File tempSegmentDir = new File(_indexDir, "tmp-" + segmentName + "." + System.currentTimeMillis());

if (_serverConf.getProperty(Server.CONFIG_OF_STARTUP_ENABLE_TEMP_CLEANUP,
Server.DEFAULT_STARTUP_ENABLE_TEMP_CLEANUP)) {
File dataDir = new File(_serverConf.getProperty(CommonConstants.Server.CONFIG_OF_INSTANCE_DATA_DIR));
// We use 3 hours as the cutoff time as a general heuristic for when
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lookback window can be encoded in the config?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a big fan of introducing a config just for cleaning up temp directory but I'm fine with having a fixed threshold just as you mentioned: delete for more than N hours. We shouldn't see multiple servers running on the same machine in produciton.

@jackjlli preferred it this way, and I'm inclined to agree. There's already a large number of configs, and ideally we remove this config in the future and make this default behavior on server startup.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be great to add more details on the purpose of introducing this threshold in the comment, like what we discuss in this PR.

@@ -350,6 +350,10 @@ private CommonConstants() {
public static final String CONFIG_OF_SHUTDOWN_RESOURCE_CHECK_INTERVAL_MS =
"pinot.server.shutdown.resourceCheckIntervalMs";
public static final long DEFAULT_SHUTDOWN_RESOURCE_CHECK_INTERVAL_MS = 10_000L;
public static final String CONFIG_OF_STARTUP_ENABLE_TEMP_CLEANUP =
"pinot.server.startup.enablePeriodicTasks";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...enablePeriodicTasks seems not for tmp cleanup?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch! this is leftover from when I implemented this as a periodic task rather than running once on startup

@klsince
Copy link
Contributor

klsince commented Jan 4, 2022

Thanks for the fix, @jadami10!

As to the concern of running multiple Server instances on a single machine, the interference may be avoided if each Server instance creates tmp- dir under its own dataDir, and this is already done for all tmp- dir created in the pinot/core/data/manager module today.

We may extract tmp dir creation and cleanup logic to a util class to manage its naming convention and location in one place, in case later on, folks use tmp. or temp- prefix or put them outside dataDir, making your cleanup logic less effective.

Copy link
Contributor Author

@jadami10 jadami10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure of what the best test plan is here. I don't know if the unit/integration tests reliably test restarting the server and ensuring things are still working.

Locally I ran the realtime quickstart and created a table that flushes segments on every row then used fswatch to ensure things were happening correctly. You can see the directory being updated, renamed, and removed.

fswatch -x /var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/tmp-7b51ce50-d181-4555-a8c8-bdbf05984987/venue_name.dict Created AttributeModified IsFile Updated
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/tmp-7b51ce50-d181-4555-a8c8-bdbf05984987/group_city.dict Created AttributeModified IsFile Updated
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/tmp-7b51ce50-d181-4555-a8c8-bdbf05984987/group_name.dict Created AttributeModified IsFile Updated
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/tmp-7b51ce50-d181-4555-a8c8-bdbf05984987/group_lon.dict Created AttributeModified IsFile Updated
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/tmp-7b51ce50-d181-4555-a8c8-bdbf05984987/mtime.dict Created AttributeModified IsFile Updated
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/tmp-7b51ce50-d181-4555-a8c8-bdbf05984987/rsvp_count.dict Created AttributeModified IsFile Updated
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/tmp-7b51ce50-d181-4555-a8c8-bdbf05984987/event_id.dict Created AttributeModified IsFile Updated
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/tmp-7b51ce50-d181-4555-a8c8-bdbf05984987/group_country.dict Created AttributeModified IsFile Updated
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/tmp-7b51ce50-d181-4555-a8c8-bdbf05984987/group_id.dict Created AttributeModified IsFile Updated
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/tmp-7b51ce50-d181-4555-a8c8-bdbf05984987/event_name.dict Created AttributeModified IsFile Updated
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/tmp-7b51ce50-d181-4555-a8c8-bdbf05984987/location.dict Created AttributeModified IsFile Updated
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/tmp-7b51ce50-d181-4555-a8c8-bdbf05984987/event_time.dict Created AttributeModified IsFile Updated
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/tmp-7b51ce50-d181-4555-a8c8-bdbf05984987/group_lat.dict Created AttributeModified IsFile Updated
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/tmp-7b51ce50-d181-4555-a8c8-bdbf05984987/metadata.properties Created IsFile Updated
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/tmp-7b51ce50-d181-4555-a8c8-bdbf05984987/venue_name.sv.sorted.fwd Created AttributeModified IsFile Updated
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/tmp-7b51ce50-d181-4555-a8c8-bdbf05984987/group_city.sv.unsorted.fwd Created AttributeModified IsFile Updated
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/tmp-7b51ce50-d181-4555-a8c8-bdbf05984987/group_name.sv.sorted.fwd Created AttributeModified IsFile Updated
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/tmp-7b51ce50-d181-4555-a8c8-bdbf05984987/group_lon.sv.unsorted.fwd Created AttributeModified IsFile Updated
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/tmp-7b51ce50-d181-4555-a8c8-bdbf05984987/mtime.sv.sorted.fwd Created AttributeModified IsFile Updated
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/tmp-7b51ce50-d181-4555-a8c8-bdbf05984987/rsvp_count.sv.sorted.fwd Created AttributeModified IsFile Updated
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/tmp-7b51ce50-d181-4555-a8c8-bdbf05984987/event_id.sv.unsorted.fwd Created AttributeModified IsFile Updated
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/tmp-7b51ce50-d181-4555-a8c8-bdbf05984987/group_country.sv.sorted.fwd Created AttributeModified IsFile Updated
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/tmp-7b51ce50-d181-4555-a8c8-bdbf05984987/group_id.sv.sorted.fwd Created AttributeModified IsFile Updated
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/tmp-7b51ce50-d181-4555-a8c8-bdbf05984987/event_name.sv.unsorted.fwd Created AttributeModified IsFile Updated
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/tmp-7b51ce50-d181-4555-a8c8-bdbf05984987/location.sv.sorted.fwd Created AttributeModified IsFile Updated
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/tmp-7b51ce50-d181-4555-a8c8-bdbf05984987/event_time.sv.sorted.fwd Created AttributeModified IsFile Updated
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/tmp-7b51ce50-d181-4555-a8c8-bdbf05984987/group_lat.sv.unsorted.fwd Created AttributeModified IsFile Updated
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/tmp-7b51ce50-d181-4555-a8c8-bdbf05984987 Created IsDir Renamed
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241.v3.tmp10475897190224634183/metadata.properties Created IsFile Updated
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241.v3.tmp10475897190224634183/index_map Created IsFile Updated
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241.v3.tmp10475897190224634183/columns.psf Created AttributeModified IsFile Updated
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241.v3.tmp10475897190224634183 OwnerModified Created IsDir Renamed
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241/v3 IsDir Renamed
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241/event_time.dict IsFile Removed
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241/group_country.dict IsFile Removed
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241/venue_name.dict IsFile Removed
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241/group_lon.dict IsFile Removed
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241/group_lat.dict IsFile Removed
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241/mtime.dict IsFile Removed
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241/rsvp_count.dict IsFile Removed
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241/rsvp_count.sv.sorted.fwd IsFile Removed
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241/group_city.dict IsFile Removed
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241/location.dict IsFile Removed
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241/group_name.dict IsFile Removed
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241/location.sv.sorted.fwd IsFile Removed
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241/venue_name.sv.sorted.fwd IsFile Removed
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241/event_id.sv.unsorted.fwd IsFile Removed
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241/group_city.sv.unsorted.fwd IsFile Removed
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241/group_name.sv.sorted.fwd IsFile Removed
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241/metadata.properties IsFile Removed
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241/group_lat.sv.unsorted.fwd IsFile Removed
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241/group_id.sv.sorted.fwd IsFile Removed
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241/group_id.dict IsFile Removed
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241/event_id.dict IsFile Removed
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241/event_name.sv.unsorted.fwd IsFile Removed
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241/group_country.sv.sorted.fwd IsFile Removed
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241/event_name.dict IsFile Removed
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241/event_time.sv.sorted.fwd IsFile Removed
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241/mtime.sv.sorted.fwd IsFile Removed
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241/group_lon.sv.unsorted.fwd IsFile Removed
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241/v3/creation.meta Created IsFile Updated
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241 IsDir Renamed
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104 Created IsDir Removed

@@ -165,7 +165,7 @@ public HLRealtimeSegmentDataManager(final SegmentZKMetadata segmentZKMetadata, f
_invertedIndexColumns);

_segmentEndTimeThreshold = _start + _streamConfig.getFlushThresholdTimeMillis();
_resourceTmpDir = new File(resourceDataDir, "_tmp");
_resourceTmpDir = new File(resourceDataDir, "tmp");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there wasn't really a good wait to access the TableDataManager from the SegmentDataManager, so this is hardcoded for now

@@ -102,6 +103,10 @@ public void init(TableDataManagerConfig tableDataManagerConfig, String instanceI
if (!_indexDir.exists()) {
Preconditions.checkState(_indexDir.mkdirs());
}
_resourceTmpDir = new File(_indexDir, "tmp");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So with this approach we

  1. move the tmp files into their own tmp folder
  2. create the tmp folder nested under the table's data dir

@klsince, how do you feel about this approach

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @mayankshriv wdyt?

this lgtm. perhaps use "_tmp" here so that no need to touch the HLR.../LLR.. subclasses

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, i'm going to switch the HL and LL segment managers back to _tmp and leave them as is for now to limit the scope of this PR.

  1. I believe our issues are coming from TableDataManager, so this should cover us for now.
  2. I have some concerns with the TableDataManager deleting the SegmentDataManager's temporary resources. I feel like we should implement this same logic in the SegmentDataManager or implement some sort of interface with "registerTemporaryResource", "cleanTemporaryResources", etc.

Either way, I think it will be best to limit this change to TableDataManager for now.

As for _tmp vs tmp. I feel like tmp is better since that's the typical name you expect for temporary directories in software. But happy to change it if you feel _tmp works better here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sgtm. tmp is fine.

perhaps leave a comment that this cleanup logic handles tmp resource from TableDataManager specifically, and tmp resources left by others may be cleaned as well if they happen to be put at tmp folder under table data dir.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a comment

@@ -465,9 +465,7 @@ private void downloadSegmentFromDeepStore(String segmentName, IndexLoadingConfig
*/
private void untarAndMoveSegment(String segmentName, IndexLoadingConfig indexLoadingConfig, File segmentTarFile)
throws IOException {
// TODO: This could leave temporary directories in _indexDir if JVM shuts down before the temporary directory is
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe keep this comment (removing the TODO:) for the context, and add how we cleanup the tmp dirs now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

@@ -102,6 +103,10 @@ public void init(TableDataManagerConfig tableDataManagerConfig, String instanceI
if (!_indexDir.exists()) {
Preconditions.checkState(_indexDir.mkdirs());
}
_resourceTmpDir = new File(_indexDir, "tmp");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @mayankshriv wdyt?

this lgtm. perhaps use "_tmp" here so that no need to touch the HLR.../LLR.. subclasses

@@ -465,9 +465,7 @@ private void downloadSegmentFromDeepStore(String segmentName, IndexLoadingConfig
*/
private void untarAndMoveSegment(String segmentName, IndexLoadingConfig indexLoadingConfig, File segmentTarFile)
throws IOException {
// TODO: This could leave temporary directories in _indexDir if JVM shuts down before the temporary directory is
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

@@ -102,6 +103,10 @@ public void init(TableDataManagerConfig tableDataManagerConfig, String instanceI
if (!_indexDir.exists()) {
Preconditions.checkState(_indexDir.mkdirs());
}
_resourceTmpDir = new File(_indexDir, "tmp");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, i'm going to switch the HL and LL segment managers back to _tmp and leave them as is for now to limit the scope of this PR.

  1. I believe our issues are coming from TableDataManager, so this should cover us for now.
  2. I have some concerns with the TableDataManager deleting the SegmentDataManager's temporary resources. I feel like we should implement this same logic in the SegmentDataManager or implement some sort of interface with "registerTemporaryResource", "cleanTemporaryResources", etc.

Either way, I think it will be best to limit this change to TableDataManager for now.

As for _tmp vs tmp. I feel like tmp is better since that's the typical name you expect for temporary directories in software. But happy to change it if you feel _tmp works better here.

@@ -102,6 +102,10 @@ public void testConvertToRawIndexTask()
File[] indexDirs = tableDataDir.listFiles();
Assert.assertNotNull(indexDirs);
for (File indexDir : indexDirs) {
// Skip the tmp directory since these are only temporary segments
if (indexDir.getName().equals("tmp")) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be the only thing looking at the tableDataDir directly like this. But it does worry me that there was at least 1 assumption that every directory in tableDataDir is a segment directory

@@ -102,6 +103,12 @@ public void init(TableDataManagerConfig tableDataManagerConfig, String instanceI
if (!_indexDir.exists()) {
Preconditions.checkState(_indexDir.mkdirs());
}
_resourceTmpDir = new File(_indexDir, "tmp");
// Delete and recreate the tmp directory if it exists as it may contain old files from previous runs.
_resourceTmpDir.delete();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use FileUtils.deleteQuietly, also is the exists check needed? I take it that the exists() check is needed in case the delete failed to delete existing directory?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on deleting the directory quietly. We don't want an exception thrown during the startup of the server.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, done. the exists check really shouldn't be needed here, but I figure it can't hurt since we're using Preconditions afterwards

@@ -102,6 +103,12 @@ public void init(TableDataManagerConfig tableDataManagerConfig, String instanceI
if (!_indexDir.exists()) {
Preconditions.checkState(_indexDir.mkdirs());
}
_resourceTmpDir = new File(_indexDir, "tmp");
// Delete and recreate the tmp directory if it exists as it may contain old files from previous runs.
_resourceTmpDir.delete();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on deleting the directory quietly. We don't want an exception thrown during the startup of the server.

// TODO: This could leave temporary directories in _indexDir if JVM shuts down before the temporary directory is
// deleted. Consider cleaning up all temporary directories when starting the server.
File tempSegmentDir = new File(_indexDir, "tmp-" + segmentName + "." + System.currentTimeMillis());
// This could leave temporary directories in _indexDir if JVM shuts down before the temp directory is deleted.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we really see a JVM gets shut down, I think it'd be good to consider changing JVM configs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What exactly do you have in mind here? I'm not sure what jvm config would make these code paths keep going and eventually delete

Copy link
Contributor

@klsince klsince left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the fix!

Copy link
Contributor Author

@jadami10 jadami10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you all for the reviews!

@@ -102,6 +103,12 @@ public void init(TableDataManagerConfig tableDataManagerConfig, String instanceI
if (!_indexDir.exists()) {
Preconditions.checkState(_indexDir.mkdirs());
}
_resourceTmpDir = new File(_indexDir, "tmp");
// Delete and recreate the tmp directory if it exists as it may contain old files from previous runs.
_resourceTmpDir.delete();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, done. the exists check really shouldn't be needed here, but I figure it can't hurt since we're using Preconditions afterwards

@@ -102,6 +103,10 @@ public void init(TableDataManagerConfig tableDataManagerConfig, String instanceI
if (!_indexDir.exists()) {
Preconditions.checkState(_indexDir.mkdirs());
}
_resourceTmpDir = new File(_indexDir, "tmp");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a comment

Copy link
Member

@jackjlli jackjlli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for making the changes!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Clean up unused tmp segment directories
6 participants