Copying multiple files to watchedFolder causes app to grab zero byte files #1214
Comments
A good callout and bug
Hi @Frooodle, I think this is a good issue to assign to someone like me who wishes to contribute to open source 😃 My initial idea for solving this issue is to update [...]. Let me know if there are any comments on the solution 😆
@kkdlau how's this going?
Here is an example. I haven't tested it, and I'm not a Java expert, but it could point in the right direction.
PipelineDirectoryProcessor.java
// [...]
import java.util.concurrent.TimeUnit;
// [...]
public class PipelineDirectoryProcessor {
    // [...]
    private static final long STABILITY_CHECK_DELAY = 1000; // 1 second
    private static final long STABILITY_CHECK_COUNT = 5; // Check 5 times

    private File[] collectFilesForProcessing(Path dir, Path jsonFile, PipelineOperation operation) throws IOException {
        try (Stream<Path> paths = Files.list(dir)) {
            if ("automated".equals(operation.getParameters().get("fileInput"))) {
                // Only pick up regular files that have finished copying
                return paths.filter(path -> !Files.isDirectory(path) && !path.equals(jsonFile) && isFileStable(path))
                        .map(Path::toFile)
                        .toArray(File[]::new);
            } else {
                String fileInput = (String) operation.getParameters().get("fileInput");
                return new File[] { new File(fileInput) };
            }
        }
    }

    // Returns true only if the file is non-empty and its size stays the same across all checks.
    // Checked exceptions are handled here because a stream filter lambda cannot throw them.
    private boolean isFileStable(Path path) {
        try {
            long initialSize = Files.size(path);
            for (int i = 0; i < STABILITY_CHECK_COUNT; i++) {
                TimeUnit.MILLISECONDS.sleep(STABILITY_CHECK_DELAY);
                if (Files.size(path) != initialSize) {
                    return false; // Size changed, so the file is still being written
                }
            }
            return initialSize > 0; // Also ensuring the file is not zero bytes
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (IOException e) {
            return false; // File vanished or is unreadable; skip it for this scan
        }
    }
    // [...]
}
// [...]
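One thing to keep in mind with the per-file check above: every file sleeps for the full check window on its own, so a folder with 200 PDFs would take a long time to scan. A possible variation, sketched below, is to snapshot the sizes of all files, wait once, and compare. This is only a rough, standalone sketch and not the project's code; the class name `StableFileSnapshot` and the method `collectStableFiles` are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of the project.
public class StableFileSnapshot {

    // Returns the regular files in dir whose size is non-zero and identical
    // before and after a single wait of delayMillis.
    public static List<Path> collectStableFiles(Path dir, long delayMillis)
            throws IOException, InterruptedException {
        Map<Path, Long> before = snapshotSizes(dir);
        Thread.sleep(delayMillis); // one wait for the whole batch
        Map<Path, Long> after = snapshotSizes(dir);

        List<Path> stable = new ArrayList<>();
        for (Map.Entry<Path, Long> entry : before.entrySet()) {
            Long newSize = after.get(entry.getKey());
            // Skip files that disappeared, are empty, or changed size
            if (newSize != null && newSize > 0 && newSize.equals(entry.getValue())) {
                stable.add(entry.getKey());
            }
        }
        return stable;
    }

    // Records the current size of every regular file directly inside dir.
    private static Map<Path, Long> snapshotSizes(Path dir) throws IOException {
        List<Path> listed;
        try (Stream<Path> paths = Files.list(dir)) {
            listed = paths.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        Map<Path, Long> sizes = new HashMap<>();
        for (Path path : listed) {
            sizes.put(path, Files.size(path));
        }
        return sizes;
    }
}
```

Files that appear during the wait are simply left for the next scan, since they are missing from the first snapshot.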
Hi, I was busy with my full-time work 😞 Will create a PR tonight (APAC time) 👍🏻
Thanks for the idea 👍🏻
…ctory does not exist
#1214 Only take pdf that are good for processing
I copied around 200 PDFs into the watchedFolder and realized there were more than 350 PDFs in the processing folder, which I found weird. Then I saw that many of the PDFs were "duplicated" and some of them had a size of zero bytes.
As I suspected, the app was starting to process the files before they were completely copied over.
I confirmed this by copying only 20 PDFs into the watchedFolder: same behavior.
I wish there were a way to tell the app to wait a bit before processing a file, similar to the variable PAPERLESS_CONSUMER_INOTIFY_DELAY in paperless-ngx.
The only workaround I have found so far is to stop the container, copy over the files, and then start the container again.
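To illustrate the kind of delay meant here, one possible shape is a "quiet period" check: a file is only picked up once it is non-empty and has not been modified for a configurable amount of time. The sketch below is an assumption about how such a check might look, not how the app or paperless-ngx implements it; `QuietPeriodCheck`, `isReady`, and the `quietPeriod` parameter are hypothetical names. It also assumes the copy tool updates the modification time while it writes, which may not hold for every tool.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper, not part of the app.
public class QuietPeriodCheck {

    // Returns true if the file is non-empty and its last modification
    // is at least quietPeriod older than the given point in time.
    public static boolean isReady(Path file, Duration quietPeriod, Instant now) throws IOException {
        if (Files.size(file) == 0) {
            return false; // zero-byte files are never ready
        }
        Instant lastModified = Files.getLastModifiedTime(file).toInstant();
        // Ready once lastModified + quietPeriod is not in the future
        return !lastModified.plus(quietPeriod).isAfter(now);
    }
}
```

A folder scan could call `isReady(path, Duration.ofSeconds(5), Instant.now())` for each candidate and skip anything that returns false until the next scan, so nothing has to sleep.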