Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

delete tmp- segment directories on server startup #7961

Merged
merged 15 commits into from
Jan 11, 2022
Merged
Show file tree
Hide file tree
Changes from 8 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -18,8 +18,13 @@
*/
package org.apache.pinot.server.starter.helix;

import com.google.common.annotations.VisibleForTesting;
import com.google.common.collect.ImmutableList;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
Expand All @@ -29,8 +34,17 @@
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;
import java.util.stream.Stream;
import javax.annotation.Nonnull;
import javax.annotation.Nullable;
import org.apache.commons.io.FileUtils;
import org.apache.commons.io.filefilter.AgeFileFilter;
import org.apache.commons.io.filefilter.AndFileFilter;
import org.apache.commons.io.filefilter.DirectoryFileFilter;
import org.apache.commons.io.filefilter.IOFileFilter;
import org.apache.commons.io.filefilter.PrefixFileFilter;
import org.apache.commons.lang3.StringUtils;
import org.apache.helix.HelixAdmin;
import org.apache.helix.HelixManager;
Expand Down Expand Up @@ -363,6 +377,41 @@ private void startupServiceStatusCheck(long endTimeMs) {
ServiceStatus.getStatusDescription());
}

/**
* Recursively deletes all data directories starting with tmp- last modified before the startTime.
* @param startTime start time of the application
* @param dataDir data directory to start from
*/
@VisibleForTesting
public static void deleteTempFilesSinceCutoffTime(long startTime, @Nonnull File dataDir) {
if (!dataDir.exists() || !dataDir.isDirectory()) {
LOGGER.warn("Data directory {} does not exist or is not a directory", dataDir);
return;
}
IOFileFilter beforeStartTimeFilter = new AgeFileFilter(startTime, true);
IOFileFilter tmpPrefixFilter = new PrefixFileFilter("tmp-");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The prefix could be an argument to this method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it could, but that felt like a better and easy addition for a future PR that requires it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would this delete data from table whose name has tmp- prefix? because table segments are put under dir like <dataDir>/<tableNameWithType>/... as initialized here:

defaultConfig.addProperty(TABLE_DATA_MANAGER_DATA_DIRECTORY,
        instanceDataManagerConfig.getInstanceDataDir() + "/" + tableNameWithType);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, these are the side effects that would be good to minimize. Perhaps it can be a config too?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm.. I gave it a try in my test env and found I could create tables with 'tmp-' prefix. Things like pushing segment to it and querying it worked as usual. In the server's dataDir, there was folder for the table like below. So it may be risky to use "tmp-" to clean up dirs.

...PinotServerDataDir0/tmp-baseballStats_OFFLINE/tmp-baseballStats_OFFLINE_0/v3/
columns.psf          creation.meta        index_map            metadata.properties

image

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing out! If that's the case, can we revisit the logic of creating the tmp- directory inside the data directory? Can we move it anywhere else?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may extract tmp dir creation and cleanup logic to a util class to manage its naming convention and location in one place, in case later on, folks use tmp. or temp- prefix or put them outside dataDir, making your cleanup logic less effective.

This is all fair feedback. I thought about this and figured it was unlikely people were creating tmp- tables, but let's not leave it to chance. It seems we have 2 options:

  1. encode the tmp segment name with something that cannot be used for real table names. The only thing disallowed in code is double underscore: // Double underscore is reserved for real-time segment name delimiter
  2. put tmp segments in a different directory

2 seems like the better option given that's how most uses of tmp things work in typical software. I believe our issue does stem from BaseTableDataManager, so how about we add a new getTmpSegmentDataDir function that puts things in File(_indexDir, "tmp", segmentName);? At that point, can we just clear that directory every time the process comes up regardless of age?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the proposals. I prefer to option 2 as well.

There are a few places creating tmp- dir in pinot/core/data/manager module. Your util method may be used to unify all those cases. Right now, their dir suffix are unique, which (luckily) could help one figure out where deletion fails. So for the new util method, I'd suggest to allow caller to customize some part of the dir name like its suffix to help debugging.

pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java:    File tempRootDir = getSegmentDataDir("tmp-" + segmentName + "-" + UUID.randomUUID());
pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/HLRealtimeSegmentDataManager.java:          File tempSegmentFolder = new File(_resourceTmpDir, "tmp-" + System.currentTimeMillis());
pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/LLRealtimeSegmentDataManager.java:      File tempSegmentFolder = new File(_resourceTmpDir, "tmp-" + _segmentNameStr + "-" + now());
pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/RealtimeTableDataManager.java:    File tempSegmentDir = new File(_indexDir, "tmp-" + segmentName + "." + System.currentTimeMillis());

List<IOFileFilter> tmpFilters = Arrays.asList(beforeStartTimeFilter, tmpPrefixFilter, DirectoryFileFilter.INSTANCE);
IOFileFilter oldTmpDirectoryFileFilter = new AndFileFilter(tmpFilters);

AtomicInteger numDeletedDirectories = new AtomicInteger();
try (Stream<Path> walk = Files.walk(dataDir.toPath())) {
walk.forEach(path -> {
if (oldTmpDirectoryFileFilter.accept(path.toFile())) {
try {
FileUtils.deleteDirectory(path.toFile());
LOGGER.info("Deleted temporary file: {}", path);
numDeletedDirectories.incrementAndGet();
} catch (IOException e) {
LOGGER.warn("Failed to delete temporary file: {}", path, e);
}
}
});
} catch (IOException e) {
LOGGER.error("Failed to delete old tmp directories", e);
}
LOGGER.info("Deleted {} old tmp directories", numDeletedDirectories);
}

@Override
public ServiceRole getServiceRole() {
return ServiceRole.SERVER;
Expand All @@ -374,6 +423,15 @@ public void start()
LOGGER.info("Starting Pinot server");
long startTimeMs = System.currentTimeMillis();

if (_serverConf.getProperty(Server.CONFIG_OF_STARTUP_ENABLE_TEMP_CLEANUP,
Server.DEFAULT_STARTUP_ENABLE_TEMP_CLEANUP)) {
File dataDir = new File(_serverConf.getProperty(CommonConstants.Server.CONFIG_OF_INSTANCE_DATA_DIR));
// We use 3 hours as the cutoff time as a general heuristic for when
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lookback window can be encoded in the config?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a big fan of introducing a config just for cleaning up temp directory but I'm fine with having a fixed threshold just as you mentioned: delete for more than N hours. We shouldn't see multiple servers running on the same machine in produciton.

@jackjlli preferred it this way, and I'm inclined to agree. There's already a large number of configs, and ideally we remove this config in the future and make this default behavior on server startup.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be great to add more details on the purpose of introducing this threshold in the comment, like what we discuss in this PR.

// tmp directories should be deleted as they are definitely no longer being used.
long cutoffTimeMs = startTimeMs - Duration.ofHours(3).toMillis();
deleteTempFilesSinceCutoffTime(cutoffTimeMs, dataDir);
}

// install default SSL context if necessary (even if not force-enabled everywhere)
TlsConfig tlsDefaults = TlsUtils.extractTlsConfig(_serverConf, Server.SERVER_TLS_PREFIX);
if (StringUtils.isNotBlank(tlsDefaults.getKeyStorePath()) || StringUtils
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,300 @@
/**
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
package org.apache.pinot.server.starter;

import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import org.apache.commons.io.FileUtils;
import org.apache.pinot.server.starter.helix.BaseServerStarter;
import org.testng.Assert;
import org.testng.annotations.AfterMethod;
import org.testng.annotations.BeforeMethod;
import org.testng.annotations.Test;


public class TempSegmentCleanupTest {

private File _dataDir;

@BeforeMethod
public void createTempDir()
throws IOException {
_dataDir = Files.createTempDirectory("pinot_data").toFile();
}

@AfterMethod
public void deleteTempDir() {
try {
Files.delete(_dataDir.toPath());
} catch (IOException e) {
return;
}
}

private long getDataDirFileCount() {
return FileUtils.listFiles(_dataDir, null, true).stream().count();
}

@Test
public void worksWithNoDatadir()
throws IOException {
long currentTimestamp = System.currentTimeMillis();
deleteTempDir();
BaseServerStarter.deleteTempFilesSinceCutoffTime(currentTimestamp, _dataDir);
}

@Test
public void worksWithEmptyDirectory()
throws IOException {
long currentTimestamp = System.currentTimeMillis();
BaseServerStarter.deleteTempFilesSinceCutoffTime(currentTimestamp, _dataDir);
Assert.assertEquals(getDataDirFileCount(), 0);
}

@Test
public void deletesTmpDirectories()
throws IOException {
for (int i = 0; i < 5; i++) {
Path segmentDir = Paths.get(_dataDir.getAbsolutePath(), "tmp-segment-" + i);
Assert.assertTrue(Paths.get(segmentDir.toString(), "v3").toFile().mkdirs());
Assert.assertTrue(
Paths.get(segmentDir.toString(), "v3", "star_tree_index").toFile().createNewFile());
Assert.assertTrue(
Paths.get(segmentDir.toString(), "v3", "columns.psf").toFile().createNewFile());
Assert.assertTrue(
Paths.get(segmentDir.toString(), "v3", "creation.meta").toFile().createNewFile());
}
Assert.assertEquals(getDataDirFileCount(), 15);

long currentTimestamp = System.currentTimeMillis();
BaseServerStarter.deleteTempFilesSinceCutoffTime(currentTimestamp, _dataDir);
Assert.assertEquals(getDataDirFileCount(), 0);
}

@Test
public void deletesNestedTmpDirectories()
throws IOException {
for (int i = 0; i < 5; i++) {
Path segmentDir = Paths.get(_dataDir.getAbsolutePath(), "nested_1", "nested_2", "tmp-segment-" + i);
Assert.assertTrue(Paths.get(segmentDir.toString(), "v3").toFile().mkdirs());
Assert.assertTrue(
Paths.get(segmentDir.toString(), "v3", "star_tree_index").toFile().createNewFile());
Assert.assertTrue(
Paths.get(segmentDir.toString(), "v3", "columns.psf").toFile().createNewFile());
Assert.assertTrue(
Paths.get(segmentDir.toString(), "v3", "creation.meta").toFile().createNewFile());
}
Assert.assertEquals(getDataDirFileCount(), 15);

long currentTimestamp = System.currentTimeMillis();
BaseServerStarter.deleteTempFilesSinceCutoffTime(currentTimestamp, _dataDir);
Assert.assertEquals(getDataDirFileCount(), 0);
}

@Test
public void doesNotDeleteTmpFiles()
throws IOException {
for (int i = 0; i < 5; i++) {
Path segmentDir = Paths.get(_dataDir.getAbsolutePath(), "segment-" + i);
Assert.assertTrue(Paths.get(segmentDir.toString(), "v3").toFile().mkdirs());
Assert.assertTrue(
Paths.get(segmentDir.toString(), "v3", "tmp-star_tree_index").toFile().createNewFile());
Assert.assertTrue(
Paths.get(segmentDir.toString(), "v3", "tmp-columns.psf").toFile().createNewFile());
Assert.assertTrue(
Paths.get(segmentDir.toString(), "v3", "tmp-creation.meta").toFile().createNewFile());

Path nestedSegmentDir = Paths.get(_dataDir.getAbsolutePath(), "nested_1", "segment-" + i * 10);
Assert.assertTrue(Paths.get(nestedSegmentDir.toString(), "v3").toFile().mkdirs());
Assert.assertTrue(
Paths.get(nestedSegmentDir.toString(), "v3", "star_tree_index").toFile().createNewFile());
Assert.assertTrue(
Paths.get(nestedSegmentDir.toString(), "v3", "columns.psf").toFile().createNewFile());
Assert.assertTrue(
Paths.get(nestedSegmentDir.toString(), "v3", "creation.meta").toFile().createNewFile());
}
Assert.assertEquals(getDataDirFileCount(), 30);

long currentTimestamp = System.currentTimeMillis();
BaseServerStarter.deleteTempFilesSinceCutoffTime(currentTimestamp, _dataDir);
Assert.assertEquals(getDataDirFileCount(), 30);
}

@Test
public void onlyDeletesDirectoriesAfterCutoffTime()
throws IOException {

long currentTimestamp = System.currentTimeMillis();
for (int i = 0; i < 5; i++) {
Path segmentDir = Paths.get(_dataDir.getAbsolutePath(), "tmp-segment-" + i);
Assert.assertTrue(Paths.get(segmentDir.toString(), "v3").toFile().mkdirs());
Assert.assertTrue(
Paths.get(segmentDir.toString(), "v3", "star_tree_index").toFile().createNewFile());
Assert.assertTrue(
Paths.get(segmentDir.toString(), "v3", "columns.psf").toFile().createNewFile());
Assert.assertTrue(
Paths.get(segmentDir.toString(), "v3", "creation.meta").toFile().createNewFile());
segmentDir.toFile().setLastModified(currentTimestamp - 1000);
}
Assert.assertEquals(getDataDirFileCount(), 15);

for (int i = 0; i < 5; i++) {
Path segmentDir = Paths.get(_dataDir.getAbsolutePath(), "tmp-segment2-" + i);
Assert.assertTrue(Paths.get(segmentDir.toString(), "v3").toFile().mkdirs());
Assert.assertTrue(
Paths.get(segmentDir.toString(), "v3", "star_tree_index").toFile().createNewFile());
Assert.assertTrue(
Paths.get(segmentDir.toString(), "v3", "columns.psf").toFile().createNewFile());
Assert.assertTrue(
Paths.get(segmentDir.toString(), "v3", "creation.meta").toFile().createNewFile());
segmentDir.toFile().setLastModified(currentTimestamp + 1000);
}
Assert.assertEquals(getDataDirFileCount(), 30);

BaseServerStarter.deleteTempFilesSinceCutoffTime(currentTimestamp, _dataDir);
Assert.assertEquals(getDataDirFileCount(), 15);
}

@Test
public void deletesTmpDirectoriesAndKeepsNonTmpDirectories()
throws IOException {
for (int i = 0; i < 5; i++) {
Path tmpSegmentDir = Paths.get(_dataDir.getAbsolutePath(), "tmp-segment-" + i);
Assert.assertTrue(Paths.get(tmpSegmentDir.toString(), "v3").toFile().mkdirs());
Assert.assertTrue(
Paths.get(tmpSegmentDir.toString(), "v3", "star_tree_index").toFile().createNewFile());
Assert.assertTrue(
Paths.get(tmpSegmentDir.toString(), "v3", "columns.psf").toFile().createNewFile());
Assert.assertTrue(
Paths.get(tmpSegmentDir.toString(), "v3", "creation.meta").toFile().createNewFile());

Path segmentDir = Paths.get(_dataDir.getAbsolutePath(), "segment-" + i * 10);
Assert.assertTrue(Paths.get(segmentDir.toString(), "v3").toFile().mkdirs());
Assert.assertTrue(
Paths.get(segmentDir.toString(), "v3", "star_tree_index").toFile().createNewFile());
Assert.assertTrue(
Paths.get(segmentDir.toString(), "v3", "columns.psf").toFile().createNewFile());
Assert.assertTrue(
Paths.get(segmentDir.toString(), "v3", "creation.meta").toFile().createNewFile());
}
Assert.assertEquals(getDataDirFileCount(), 30);

long currentTimestamp = System.currentTimeMillis();
BaseServerStarter.deleteTempFilesSinceCutoffTime(currentTimestamp, _dataDir);
Assert.assertEquals(getDataDirFileCount(), 15);
}

@Test
public void deletesNestedTmpDirectoriesAndKeepsNonTmpDirectories()
throws IOException {
for (int i = 0; i < 5; i++) {
Path tmpSegmentDir = Paths.get(_dataDir.getAbsolutePath(), "nested_1", "nested_2", "tmp-segment-" + i);
Assert.assertTrue(Paths.get(tmpSegmentDir.toString(), "v3").toFile().mkdirs());
Assert.assertTrue(
Paths.get(tmpSegmentDir.toString(), "v3", "star_tree_index").toFile().createNewFile());
Assert.assertTrue(
Paths.get(tmpSegmentDir.toString(), "v3", "columns.psf").toFile().createNewFile());
Assert.assertTrue(
Paths.get(tmpSegmentDir.toString(), "v3", "creation.meta").toFile().createNewFile());

Path segmentDir = Paths.get(_dataDir.getAbsolutePath(), "nested_1", "nested_2", "segment-" + i * 10);
Assert.assertTrue(Paths.get(segmentDir.toString(), "v3").toFile().mkdirs());
Assert.assertTrue(
Paths.get(segmentDir.toString(), "v3", "star_tree_index").toFile().createNewFile());
Assert.assertTrue(
Paths.get(segmentDir.toString(), "v3", "columns.psf").toFile().createNewFile());
Assert.assertTrue(
Paths.get(segmentDir.toString(), "v3", "creation.meta").toFile().createNewFile());
}
Assert.assertEquals(getDataDirFileCount(), 30);

long currentTimestamp = System.currentTimeMillis();
BaseServerStarter.deleteTempFilesSinceCutoffTime(currentTimestamp, _dataDir);
Assert.assertEquals(getDataDirFileCount(), 15);
}

@Test
public void deletesMixedNestedTmpDirectoriesAndKeepsNonTmpDirectories()
throws IOException {
for (int i = 0; i < 5; i++) {

Path tmpSegmentDir = Paths.get(_dataDir.getAbsolutePath(), "tmp-segment-" + i);
Assert.assertTrue(Paths.get(tmpSegmentDir.toString(), "v3").toFile().mkdirs());
Assert.assertTrue(
Paths.get(tmpSegmentDir.toString(), "v3", "star_tree_index").toFile().createNewFile());
Assert.assertTrue(
Paths.get(tmpSegmentDir.toString(), "v3", "columns.psf").toFile().createNewFile());
Assert.assertTrue(
Paths.get(tmpSegmentDir.toString(), "v3", "creation.meta").toFile().createNewFile());

Path nestedOnceTmpSegmentDir = Paths.get(_dataDir.getAbsolutePath(), "nested_1", "tmp-segment-" + i);
Assert.assertTrue(Paths.get(nestedOnceTmpSegmentDir.toString(), "v3").toFile().mkdirs());
Assert.assertTrue(
Paths.get(nestedOnceTmpSegmentDir.toString(), "v3", "star_tree_index").toFile().createNewFile());
Assert.assertTrue(
Paths.get(nestedOnceTmpSegmentDir.toString(), "v3", "columns.psf").toFile().createNewFile());
Assert.assertTrue(
Paths.get(nestedOnceTmpSegmentDir.toString(), "v3", "creation.meta").toFile().createNewFile());

Path nestedTwiceTmpSegmentDir = Paths.get(_dataDir.getAbsolutePath(), "nested_1", "nested_2", "tmp-segment-" + i);
Assert.assertTrue(Paths.get(nestedTwiceTmpSegmentDir.toString(), "v3").toFile().mkdirs());
Assert.assertTrue(
Paths.get(nestedTwiceTmpSegmentDir.toString(), "v3", "star_tree_index").toFile().createNewFile());
Assert.assertTrue(
Paths.get(nestedTwiceTmpSegmentDir.toString(), "v3", "columns.psf").toFile().createNewFile());
Assert.assertTrue(
Paths.get(nestedTwiceTmpSegmentDir.toString(), "v3", "creation.meta").toFile().createNewFile());

Path segmentDir = Paths.get(_dataDir.getAbsolutePath(), "segment-" + i * 10);
Assert.assertTrue(Paths.get(segmentDir.toString(), "v3").toFile().mkdirs());
Assert.assertTrue(
Paths.get(segmentDir.toString(), "v3", "star_tree_index").toFile().createNewFile());
Assert.assertTrue(
Paths.get(segmentDir.toString(), "v3", "columns.psf").toFile().createNewFile());
Assert.assertTrue(
Paths.get(segmentDir.toString(), "v3", "creation.meta").toFile().createNewFile());

Path nestedOnceSegmentDir = Paths.get(_dataDir.getAbsolutePath(), "nested_1", "segment-" + i * 10);
Assert.assertTrue(Paths.get(nestedOnceSegmentDir.toString(), "v3").toFile().mkdirs());
Assert.assertTrue(
Paths.get(nestedOnceSegmentDir.toString(), "v3", "star_tree_index").toFile().createNewFile());
Assert.assertTrue(
Paths.get(nestedOnceSegmentDir.toString(), "v3", "columns.psf").toFile().createNewFile());
Assert.assertTrue(
Paths.get(nestedOnceSegmentDir.toString(), "v3", "creation.meta").toFile().createNewFile());

Path nestedTwiceSegmentDir = Paths.get(_dataDir.getAbsolutePath(), "nested_1", "nested_2", "segment-" + i * 10);
Assert.assertTrue(Paths.get(nestedTwiceSegmentDir.toString(), "v3").toFile().mkdirs());
Assert.assertTrue(
Paths.get(nestedTwiceSegmentDir.toString(), "v3", "star_tree_index").toFile().createNewFile());
Assert.assertTrue(
Paths.get(nestedTwiceSegmentDir.toString(), "v3", "columns.psf").toFile().createNewFile());
Assert.assertTrue(
Paths.get(nestedTwiceSegmentDir.toString(), "v3", "creation.meta").toFile().createNewFile());
}
Assert.assertEquals(getDataDirFileCount(), 90);

long currentTimestamp = System.currentTimeMillis();
BaseServerStarter.deleteTempFilesSinceCutoffTime(currentTimestamp, _dataDir);
Assert.assertEquals(getDataDirFileCount(), 45);
}
}
Loading