-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
delete tmp- segment directories on server startup #7961
delete tmp- segment directories on server startup #7961
Conversation
Codecov Report
@@ Coverage Diff @@
## master #7961 +/- ##
============================================
- Coverage 71.34% 71.23% -0.12%
- Complexity 4191 4214 +23
============================================
Files 1595 1595
Lines 82568 82725 +157
Branches 12320 12343 +23
============================================
+ Hits 58906 58927 +21
- Misses 19678 19808 +130
- Partials 3984 3990 +6
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There could be multiple servers running on the same machine (e.g. running integration test that spins up multiple servers). Will the cleanup from the 2nd server wipe out the tmp-
directory of the 1st server?
That's a fair point. The integration tests passed which makes me think they're ok. But if someone is actually running it this way in production, this would be bad for them. The directories themselves have no tie to the server, so maybe the only option is to provide a longer threshold. Either delete for than N hours ago or offer a configuration for it? |
Not a big fan of introducing a config just for cleaning up temp directory but I'm fine with having a fixed threshold just as you mentioned: |
I went with 3 hours for the cutoff. I don't think in our use we've ever seen anything that old that still needed to stay around |
Thanks @jadami10 for the PR, much appreciated. Wondering what the root cause is, as in why the files are not cleaned up. Looking at the issue linked, the temp file generated was recently added: #7319 (cc @klsince), or not sure if it existed before too. My worry is that explicitly cleaning up based on |
It seems that the problem may be attributed to jvm shutdown before temp files are deleted: Line 468 in a5f3dc5
The changes in this PR are in line with what the TODO suggests. |
return; | ||
} | ||
IOFileFilter beforeStartTimeFilter = new AgeFileFilter(startTime, true); | ||
IOFileFilter tmpPrefixFilter = new PrefixFileFilter("tmp-"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The prefix could be an argument to this method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it could, but that felt like a better and easy addition for a future PR that requires it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would this delete data from table whose name has tmp-
prefix? because table segments are put under dir like <dataDir>/<tableNameWithType>/...
as initialized here:
defaultConfig.addProperty(TABLE_DATA_MANAGER_DATA_DIRECTORY,
instanceDataManagerConfig.getInstanceDataDir() + "/" + tableNameWithType);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, these are the side effects that would be good to minimize. Perhaps it can be a config too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm.. I gave it a try in my test env and found I could create tables with 'tmp-' prefix. Things like pushing segment to it and querying it worked as usual. In the server's dataDir, there was folder for the table like below. So it may be risky to use "tmp-" to clean up dirs.
...PinotServerDataDir0/tmp-baseballStats_OFFLINE/tmp-baseballStats_OFFLINE_0/v3/
columns.psf creation.meta index_map metadata.properties
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing out! If that's the case, can we revisit the logic of creating the tmp-
directory inside the data directory? Can we move it anywhere else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may extract tmp dir creation and cleanup logic to a util class to manage its naming convention and location in one place, in case later on, folks use tmp. or temp- prefix or put them outside dataDir, making your cleanup logic less effective.
This is all fair feedback. I thought about this and figured it was unlikely people were creating tmp-
tables, but let's not leave it to chance. It seems we have 2 options:
- encode the tmp segment name with something that cannot be used for real table names. The only thing disallowed in code is double underscore:
// Double underscore is reserved for real-time segment name delimiter
- put tmp segments in a different directory
2 seems like the better option given that's how most uses of tmp things work in typical software. I believe our issue does stem from BaseTableDataManager
, so how about we add a new getTmpSegmentDataDir
function that puts things in File(_indexDir, "tmp", segmentName);
? At that point, can we just clear that directory every time the process comes up regardless of age?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the proposals. I prefer to option 2 as well.
There are a few places creating tmp- dir in pinot/core/data/manager module. Your util method may be used to unify all those cases. Right now, their dir suffix are unique, which (luckily) could help one figure out where deletion fails. So for the new util method, I'd suggest to allow caller to customize some part of the dir name like its suffix to help debugging.
pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java: File tempRootDir = getSegmentDataDir("tmp-" + segmentName + "-" + UUID.randomUUID());
pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/HLRealtimeSegmentDataManager.java: File tempSegmentFolder = new File(_resourceTmpDir, "tmp-" + System.currentTimeMillis());
pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/LLRealtimeSegmentDataManager.java: File tempSegmentFolder = new File(_resourceTmpDir, "tmp-" + _segmentNameStr + "-" + now());
pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/RealtimeTableDataManager.java: File tempSegmentDir = new File(_indexDir, "tmp-" + segmentName + "." + System.currentTimeMillis());
if (_serverConf.getProperty(Server.CONFIG_OF_STARTUP_ENABLE_TEMP_CLEANUP, | ||
Server.DEFAULT_STARTUP_ENABLE_TEMP_CLEANUP)) { | ||
File dataDir = new File(_serverConf.getProperty(CommonConstants.Server.CONFIG_OF_INSTANCE_DATA_DIR)); | ||
// We use 3 hours as the cutoff time as a general heuristic for when |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The lookback window can be encoded in the config?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a big fan of introducing a config just for cleaning up temp directory but I'm fine with having a fixed threshold just as you mentioned: delete for more than N hours. We shouldn't see multiple servers running on the same machine in produciton.
@jackjlli preferred it this way, and I'm inclined to agree. There's already a large number of configs, and ideally we remove this config in the future and make this default behavior on server startup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be great to add more details on the purpose of introducing this threshold in the comment, like what we discuss in this PR.
@@ -350,6 +350,10 @@ private CommonConstants() { | |||
public static final String CONFIG_OF_SHUTDOWN_RESOURCE_CHECK_INTERVAL_MS = | |||
"pinot.server.shutdown.resourceCheckIntervalMs"; | |||
public static final long DEFAULT_SHUTDOWN_RESOURCE_CHECK_INTERVAL_MS = 10_000L; | |||
public static final String CONFIG_OF_STARTUP_ENABLE_TEMP_CLEANUP = | |||
"pinot.server.startup.enablePeriodicTasks"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...enablePeriodicTasks
seems not for tmp cleanup?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch! this is leftover from when I implemented this as a periodic task rather than running once on startup
Thanks for the fix, @jadami10! As to the concern of running multiple Server instances on a single machine, the interference may be avoided if each Server instance creates tmp- dir under its own We may extract tmp dir creation and cleanup logic to a util class to manage its naming convention and location in one place, in case later on, folks use tmp. or temp- prefix or put them outside dataDir, making your cleanup logic less effective. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure of what the best test plan is here. I don't know if the unit/integration tests reliably test restarting the server and ensuring things are still working.
Locally I ran the realtime quickstart and created a table that flushes segments on every row then used fswatch
to ensure things were happening correctly. You can see the directory being updated, renamed, and removed.
fswatch -x /var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/tmp-7b51ce50-d181-4555-a8c8-bdbf05984987/venue_name.dict Created AttributeModified IsFile Updated
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/tmp-7b51ce50-d181-4555-a8c8-bdbf05984987/group_city.dict Created AttributeModified IsFile Updated
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/tmp-7b51ce50-d181-4555-a8c8-bdbf05984987/group_name.dict Created AttributeModified IsFile Updated
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/tmp-7b51ce50-d181-4555-a8c8-bdbf05984987/group_lon.dict Created AttributeModified IsFile Updated
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/tmp-7b51ce50-d181-4555-a8c8-bdbf05984987/mtime.dict Created AttributeModified IsFile Updated
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/tmp-7b51ce50-d181-4555-a8c8-bdbf05984987/rsvp_count.dict Created AttributeModified IsFile Updated
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/tmp-7b51ce50-d181-4555-a8c8-bdbf05984987/event_id.dict Created AttributeModified IsFile Updated
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/tmp-7b51ce50-d181-4555-a8c8-bdbf05984987/group_country.dict Created AttributeModified IsFile Updated
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/tmp-7b51ce50-d181-4555-a8c8-bdbf05984987/group_id.dict Created AttributeModified IsFile Updated
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/tmp-7b51ce50-d181-4555-a8c8-bdbf05984987/event_name.dict Created AttributeModified IsFile Updated
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/tmp-7b51ce50-d181-4555-a8c8-bdbf05984987/location.dict Created AttributeModified IsFile Updated
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/tmp-7b51ce50-d181-4555-a8c8-bdbf05984987/event_time.dict Created AttributeModified IsFile Updated
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/tmp-7b51ce50-d181-4555-a8c8-bdbf05984987/group_lat.dict Created AttributeModified IsFile Updated
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/tmp-7b51ce50-d181-4555-a8c8-bdbf05984987/metadata.properties Created IsFile Updated
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/tmp-7b51ce50-d181-4555-a8c8-bdbf05984987/venue_name.sv.sorted.fwd Created AttributeModified IsFile Updated
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/tmp-7b51ce50-d181-4555-a8c8-bdbf05984987/group_city.sv.unsorted.fwd Created AttributeModified IsFile Updated
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/tmp-7b51ce50-d181-4555-a8c8-bdbf05984987/group_name.sv.sorted.fwd Created AttributeModified IsFile Updated
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/tmp-7b51ce50-d181-4555-a8c8-bdbf05984987/group_lon.sv.unsorted.fwd Created AttributeModified IsFile Updated
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/tmp-7b51ce50-d181-4555-a8c8-bdbf05984987/mtime.sv.sorted.fwd Created AttributeModified IsFile Updated
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/tmp-7b51ce50-d181-4555-a8c8-bdbf05984987/rsvp_count.sv.sorted.fwd Created AttributeModified IsFile Updated
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/tmp-7b51ce50-d181-4555-a8c8-bdbf05984987/event_id.sv.unsorted.fwd Created AttributeModified IsFile Updated
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/tmp-7b51ce50-d181-4555-a8c8-bdbf05984987/group_country.sv.sorted.fwd Created AttributeModified IsFile Updated
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/tmp-7b51ce50-d181-4555-a8c8-bdbf05984987/group_id.sv.sorted.fwd Created AttributeModified IsFile Updated
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/tmp-7b51ce50-d181-4555-a8c8-bdbf05984987/event_name.sv.unsorted.fwd Created AttributeModified IsFile Updated
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/tmp-7b51ce50-d181-4555-a8c8-bdbf05984987/location.sv.sorted.fwd Created AttributeModified IsFile Updated
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/tmp-7b51ce50-d181-4555-a8c8-bdbf05984987/event_time.sv.sorted.fwd Created AttributeModified IsFile Updated
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/tmp-7b51ce50-d181-4555-a8c8-bdbf05984987/group_lat.sv.unsorted.fwd Created AttributeModified IsFile Updated
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/tmp-7b51ce50-d181-4555-a8c8-bdbf05984987 Created IsDir Renamed
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241.v3.tmp10475897190224634183/metadata.properties Created IsFile Updated
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241.v3.tmp10475897190224634183/index_map Created IsFile Updated
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241.v3.tmp10475897190224634183/columns.psf Created AttributeModified IsFile Updated
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241.v3.tmp10475897190224634183 OwnerModified Created IsDir Renamed
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241/v3 IsDir Renamed
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241/event_time.dict IsFile Removed
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241/group_country.dict IsFile Removed
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241/venue_name.dict IsFile Removed
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241/group_lon.dict IsFile Removed
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241/group_lat.dict IsFile Removed
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241/mtime.dict IsFile Removed
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241/rsvp_count.dict IsFile Removed
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241/rsvp_count.sv.sorted.fwd IsFile Removed
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241/group_city.dict IsFile Removed
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241/location.dict IsFile Removed
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241/group_name.dict IsFile Removed
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241/location.sv.sorted.fwd IsFile Removed
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241/venue_name.sv.sorted.fwd IsFile Removed
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241/event_id.sv.unsorted.fwd IsFile Removed
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241/group_city.sv.unsorted.fwd IsFile Removed
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241/group_name.sv.sorted.fwd IsFile Removed
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241/metadata.properties IsFile Removed
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241/group_lat.sv.unsorted.fwd IsFile Removed
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241/group_id.sv.sorted.fwd IsFile Removed
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241/group_id.dict IsFile Removed
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241/event_id.dict IsFile Removed
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241/event_name.sv.unsorted.fwd IsFile Removed
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241/group_country.sv.sorted.fwd IsFile Removed
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241/event_name.dict IsFile Removed
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241/event_time.sv.sorted.fwd IsFile Removed
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241/mtime.sv.sorted.fwd IsFile Removed
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241/group_lon.sv.unsorted.fwd IsFile Removed
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241/v3/creation.meta Created IsFile Updated
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104/meetupRsvp2_REALTIME_1641413427239_0__0__1641414187241 IsDir Renamed
/private/var/folders/gx/84dlvy8s1fj3z5m5fjb3mnmh0000gn/T/1641413124515/meetupRsvp/rawdata/PinotServerDataDir0/meetupRsvp2_REALTIME/tmp/tmp-1641414189104 Created IsDir Removed
@@ -165,7 +165,7 @@ public HLRealtimeSegmentDataManager(final SegmentZKMetadata segmentZKMetadata, f | |||
_invertedIndexColumns); | |||
|
|||
_segmentEndTimeThreshold = _start + _streamConfig.getFlushThresholdTimeMillis(); | |||
_resourceTmpDir = new File(resourceDataDir, "_tmp"); | |||
_resourceTmpDir = new File(resourceDataDir, "tmp"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there wasn't really a good wait to access the TableDataManager from the SegmentDataManager, so this is hardcoded for now
@@ -102,6 +103,10 @@ public void init(TableDataManagerConfig tableDataManagerConfig, String instanceI | |||
if (!_indexDir.exists()) { | |||
Preconditions.checkState(_indexDir.mkdirs()); | |||
} | |||
_resourceTmpDir = new File(_indexDir, "tmp"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So with this approach we
- move the tmp files into their own tmp folder
- create the tmp folder nested under the table's data dir
@klsince, how do you feel about this approach
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @mayankshriv wdyt?
this lgtm. perhaps use "_tmp" here so that no need to touch the HLR.../LLR.. subclasses
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, i'm going to switch the HL and LL segment managers back to _tmp
and leave them as is for now to limit the scope of this PR.
- I believe our issues are coming from TableDataManager, so this should cover us for now.
- I have some concerns with the TableDataManager deleting the SegmentDataManager's temporary resources. I feel like we should implement this same logic in the SegmentDataManager or implement some sort of interface with "registerTemporaryResource", "cleanTemporaryResources", etc.
Either way, I think it will be best to limit this change to TableDataManager for now.
As for _tmp
vs tmp
. I feel like tmp
is better since that's the typical name you expect for temporary directories in software. But happy to change it if you feel _tmp
works better here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sgtm. tmp
is fine.
perhaps leave a comment that this cleanup logic handles tmp resource from TableDataManager specifically, and tmp resources left by others may be cleaned as well if they happen to be put at tmp folder under table data dir.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added a comment
@@ -465,9 +465,7 @@ private void downloadSegmentFromDeepStore(String segmentName, IndexLoadingConfig | |||
*/ | |||
private void untarAndMoveSegment(String segmentName, IndexLoadingConfig indexLoadingConfig, File segmentTarFile) | |||
throws IOException { | |||
// TODO: This could leave temporary directories in _indexDir if JVM shuts down before the temporary directory is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe keep this comment (removing the TODO:
) for the context, and add how we cleanup the tmp dirs now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
@@ -102,6 +103,10 @@ public void init(TableDataManagerConfig tableDataManagerConfig, String instanceI | |||
if (!_indexDir.exists()) { | |||
Preconditions.checkState(_indexDir.mkdirs()); | |||
} | |||
_resourceTmpDir = new File(_indexDir, "tmp"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @mayankshriv wdyt?
this lgtm. perhaps use "_tmp" here so that no need to touch the HLR.../LLR.. subclasses
@@ -465,9 +465,7 @@ private void downloadSegmentFromDeepStore(String segmentName, IndexLoadingConfig | |||
*/ | |||
private void untarAndMoveSegment(String segmentName, IndexLoadingConfig indexLoadingConfig, File segmentTarFile) | |||
throws IOException { | |||
// TODO: This could leave temporary directories in _indexDir if JVM shuts down before the temporary directory is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
@@ -102,6 +103,10 @@ public void init(TableDataManagerConfig tableDataManagerConfig, String instanceI | |||
if (!_indexDir.exists()) { | |||
Preconditions.checkState(_indexDir.mkdirs()); | |||
} | |||
_resourceTmpDir = new File(_indexDir, "tmp"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, i'm going to switch the HL and LL segment managers back to _tmp
and leave them as is for now to limit the scope of this PR.
- I believe our issues are coming from TableDataManager, so this should cover us for now.
- I have some concerns with the TableDataManager deleting the SegmentDataManager's temporary resources. I feel like we should implement this same logic in the SegmentDataManager or implement some sort of interface with "registerTemporaryResource", "cleanTemporaryResources", etc.
Either way, I think it will be best to limit this change to TableDataManager for now.
As for _tmp
vs tmp
. I feel like tmp
is better since that's the typical name you expect for temporary directories in software. But happy to change it if you feel _tmp
works better here.
@@ -102,6 +102,10 @@ public void testConvertToRawIndexTask() | |||
File[] indexDirs = tableDataDir.listFiles(); | |||
Assert.assertNotNull(indexDirs); | |||
for (File indexDir : indexDirs) { | |||
// Skip the tmp directory since these are only temporary segments | |||
if (indexDir.getName().equals("tmp")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be the only thing looking at the tableDataDir
directly like this. But it does worry me that there was at least 1 assumption that every directory in tableDataDir
is a segment directory
@@ -102,6 +103,12 @@ public void init(TableDataManagerConfig tableDataManagerConfig, String instanceI | |||
if (!_indexDir.exists()) { | |||
Preconditions.checkState(_indexDir.mkdirs()); | |||
} | |||
_resourceTmpDir = new File(_indexDir, "tmp"); | |||
// Delete and recreate the tmp directory if it exists as it may contain old files from previous runs. | |||
_resourceTmpDir.delete(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we use FileUtils.deleteQuietly
, also is the exists
check needed? I take it that the exists()
check is needed in case the delete failed to delete existing directory?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on deleting the directory quietly. We don't want an exception thrown during the startup of the server.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point, done. the exists check really shouldn't be needed here, but I figure it can't hurt since we're using Preconditions
afterwards
@@ -102,6 +103,12 @@ public void init(TableDataManagerConfig tableDataManagerConfig, String instanceI | |||
if (!_indexDir.exists()) { | |||
Preconditions.checkState(_indexDir.mkdirs()); | |||
} | |||
_resourceTmpDir = new File(_indexDir, "tmp"); | |||
// Delete and recreate the tmp directory if it exists as it may contain old files from previous runs. | |||
_resourceTmpDir.delete(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on deleting the directory quietly. We don't want an exception thrown during the startup of the server.
// TODO: This could leave temporary directories in _indexDir if JVM shuts down before the temporary directory is | ||
// deleted. Consider cleaning up all temporary directories when starting the server. | ||
File tempSegmentDir = new File(_indexDir, "tmp-" + segmentName + "." + System.currentTimeMillis()); | ||
// This could leave temporary directories in _indexDir if JVM shuts down before the temp directory is deleted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we really see a JVM gets shut down, I think it'd be good to consider changing JVM configs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What exactly do you have in mind here? I'm not sure what jvm config would make these code paths keep going and eventually delete
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the fix!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you all for the reviews!
@@ -102,6 +103,12 @@ public void init(TableDataManagerConfig tableDataManagerConfig, String instanceI | |||
if (!_indexDir.exists()) { | |||
Preconditions.checkState(_indexDir.mkdirs()); | |||
} | |||
_resourceTmpDir = new File(_indexDir, "tmp"); | |||
// Delete and recreate the tmp directory if it exists as it may contain old files from previous runs. | |||
_resourceTmpDir.delete(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point, done. the exists check really shouldn't be needed here, but I figure it can't hurt since we're using Preconditions
afterwards
@@ -102,6 +103,10 @@ public void init(TableDataManagerConfig tableDataManagerConfig, String instanceI | |||
if (!_indexDir.exists()) { | |||
Preconditions.checkState(_indexDir.mkdirs()); | |||
} | |||
_resourceTmpDir = new File(_indexDir, "tmp"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for making the changes!
Description
Fixes #7954. This adds code at the very beginning of server startup to delete all
tmp-
directories created before the application started. I initially implemented this as a Periodic task, but that seemed unnecessary since we really just want to run this once at startup.Given the many java file libraries, I added unit tests to ensure this only deletes directories, directories starting with
tmp-
, and directories last modified before the given start time. I also deployed this on a QA cluster and watched it log and delete 2tmp-
directories.Upgrade Notes
Does this PR prevent a zero down-time upgrade? (Assume upgrade order: Controller, Broker, Server, Minion)
backward-incompat
, and complete the section below on Release Notes)Does this PR fix a zero-downtime upgrade introduced earlier?
backward-incompat
, and complete the section below on Release Notes)Does this PR otherwise need attention when creating release notes? Things to consider:
release-notes
and complete the section on Release Notes)Release Notes
Documentation