Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spotless in a long-running multi-threaded JVM #1555

Closed
blacelle opened this issue Feb 5, 2023 · 6 comments
Closed

Spotless in a long-running multi-threaded JVM #1555

blacelle opened this issue Feb 5, 2023 · 6 comments

Comments

@blacelle
Copy link
Contributor

blacelle commented Feb 5, 2023

Here, I consider Spotless use through CleanThat

First, we try to process files in parallel. It lead to issues with some FormatterFunc which are not thread-safe. Typical example:

  • EclipseJdtFormatterStep: in the end, it relies on a single org.eclipse.jdt.core.formatter.CodeFormatter which is not thread-safe.

Then, we considered instantiating Spotless formatters on a per-thread basis. However, it lead to Metaspace issues (java.lang.OutOfMemoryError: Metaspace). This is quite normal as we end duplicating each Class (e.g. the whole Eclipse JDT eco-system) for each working thread.


While investigating this Metaspace issue, we spot that Eclipse starts a bunch of Thread (for each ClassLoader), hence remaining in RAM, hence maintaining a reference to the whole JarState.

Typical stacks:

1:

InternalWorker(Thread).<init>(ThreadGroup, Runnable, String, long, AccessControlContext, boolean) line: 396	
InternalWorker(Thread).<init>(ThreadGroup, Runnable, String, long) line: 708	
InternalWorker(Thread).<init>(String) line: 541	
InternalWorker.<init>(JobManager) line: 31	
JobManager.<init>() line: 297	
JobManager.getInstance() line: 223	
InternalJob.<clinit>() line: 71	
WorkManager.<init>(Workspace) line: 95	
Workspace.startup(IProgressMonitor) line: 2462	
Workspace.open(IProgressMonitor) line: 2231	
ResourcesPlugin.start(BundleContext) line: 475	
BundleController.addBundle(int, BundleActivator, Function<Bundle,BundleException>) line: 116	
SpotlessEclipseFramework.addPlugin(int, BundleActivator) line: 276	
SpotlessEclipseFramework.setup(SpotlessEclipseConfig) line: 245	
EclipseJdtFormatterStepImpl.<init>(Properties) line: 40	

2:

Worker(Thread).<init>(ThreadGroup, Runnable, String, long, AccessControlContext, boolean) line: 396	
Worker(Thread).<init>(ThreadGroup, Runnable, String, long) line: 708	
Worker(Thread).<init>(String) line: 541	
Worker.<init>(WorkerPool) line: 34	
WorkerPool.jobQueued() line: 155	
JobManager.schedule(InternalJob, long, boolean) line: 1320	
StringPoolJob(InternalJob).schedule(long) line: 385	
StringPoolJob(Job).schedule(long) line: 684	
StringPoolJob.addStringPoolParticipant(IStringPoolParticipant, ISchedulingRule) line: 67	
Workspace.open(IProgressMonitor) line: 2245	
ResourcesPlugin.start(BundleContext) line: 475	
BundleController.addBundle(int, BundleActivator, Function<Bundle,BundleException>) line: 116	
SpotlessEclipseFramework.addPlugin(int, BundleActivator) line: 276	
SpotlessEclipseFramework.setup(SpotlessEclipseConfig) line: 245	
EclipseJdtFormatterStepImpl.<init>(Properties) line: 40	

3:

Worker(Thread).<init>(ThreadGroup, Runnable, String, long, AccessControlContext, boolean) line: 396	
Worker(Thread).<init>(ThreadGroup, Runnable, String, long) line: 708	
Worker(Thread).<init>(String) line: 541	
Worker.<init>(WorkerPool) line: 34	
WorkerPool.jobQueued() line: 155	
WorkerPool.startJob(Worker) line: 269	
Worker.run() line: 58	

We considered calling FormatterStepImpl.Standard -> FormatterFunc.Closeable (e.g. to call org.eclipse.core.internal.jobs.JobManager.shutdown()). However, it seems not implemented in the EclipseJdt case.

We understand it could be fixed through:

  1. com.diffplug.spotless.extra.java.EclipseJdtFormatterStep#getFormatterFunc to add the FormatterFunc.Closeable facet.
  2. com.diffplug.spotless.Jvm.Support#suggestLaterVersionOnError to transfer the optional FormatterFunc.Closeable facet.

Is our approach correct? Do you have recommendations to rely on Spotless on a multithread+long-running environnement?

We may rather work in making each formatter thread-safe (which may not be achievable/quite difficult with some formatters (e.g. Prettier ?)) ?
Should we rather consider being able to share a JarState through multiple instance of the same formatterFunction ?

Thanks

@nedtwigg
Copy link
Member

nedtwigg commented Feb 5, 2023

TL;DR: I won't accept a thread-safety requirement for all of Spotless. I'm happy to merge thread-safety fixes and tests to keep them fixed for any subset of our formatters. In the long run, my advice would be to handle non-threadsafe formatters, but either way could work.

work in making each formatter thread-safe

We could put a thread-safe requirement on any given FormatterStep, but unless we ran threading stress tests on every formatter in our unit test suite, they would not actually stay thread-safe. That requirement would also raise the difficulty level for PR authors looking to add new formatters.

I am fine with making all/a-subset-of existing formatters thread safe, and I am fine with adding tests which keep them thread safe.

I do not want a contributor adding a new step to have thread-safety as a requirement. The build plugins parallelize work in other ways, such that we can get good performance without requiring each step to be thread safe. So the burden for keeping things thread-safe will always fall on the consumers who need that feature. But once you add a thread-safe-stress-test for a given step, the rest of the project can keep it running.

EclipseJdtFormatterStep#getFormatterFunc to add the FormatterFunc.Closeable facet.

We should absolutely do this, but it is blocked on #1524.

recommendations to rely on Spotless on a multithread+long-running environnement

I would recommend some kind of batching system, e.g.

class FormatterBatcher {
  fun clean(Formatter formatter, List<Path> paths) {
    for (int i = 0; i < numProcessors; ++i) {
      launchClean(formatter, paths.subList(...)
    }
  }
}

and maybe restart the whole thing every N minutes to deal with the JDT metaspace issue.

@blacelle
Copy link
Contributor Author

blacelle commented Feb 6, 2023

I won't accept a thread-safety requirement for all of Spotless.

Totally fine with me. The a priori lack of thread-safety was more a mere observation, ground for the rest of the analysis.

My initial strategy was to build a set of Formatters per Thread, to execute Spotless in a seemingly mono-thread fashion. However, it led to Metaspace issues. Hence the suggestion to be able to share a JarState through multiple Formatters (but, as described later, this is generally irrelevant given SpotlessCache and ClassLoaders caching).

I would recommend some kind of batching system, e.g.

In my case, a single JVM would process different stylesheets/repositories. Hence, I (supposedly) needs multiple Formatters.

maybe restart the whole thing every N minutes to deal with the JDT metaspace issue.

This would be the last resort.

The build plugins parallelize work in other ways, such that we can get good performance without requiring each step to be thread safe.

That's a bit optimistic regarding some build-systems. e.g., running Spotless on a multi-module projetcs would leads at the end to multiple Eclipse threads (one per module). Spotless maven-plugin loads a JarState per module, but each JarState returns the same ClassLoader with the help of caching (through SpotlessCache). I now wonder why the cache does not kicks in in my case. This may be my root issue (at first look, it may relates to some tmp stylesheet not always being written in the same place).


EclipseJdtFormatterStep#getFormatterFunc to add the FormatterFunc.Closeable facet.

We should absolutely do this, but it is blocked on #1524.

I had a look, but I do not get how it relates with closing Eclipse resources. I'm fine waiting #1524 to be completed before considering enabling proper closing of Eclipse resources.

@blacelle
Copy link
Contributor Author

blacelle commented Feb 6, 2023

Digging into this, I feel more and more confident this is a cache matter. The SpotlessCache relies on a State key, to re-use or not a CacheLoader. In my case, it does not kicks in as I consider an in-memory content, which is persisted in a temporary File. On a given Formatter, even though the file content is identical, the State is considered different as com.diffplug.spotless.FileSignature (e.g. in com.diffplug.spotless.extra.EclipseBasedStepBuilder.State#settingsFiles) relies on both the File name and a signature of the content (in this case: the file content hence the signature are identical, but the (tmp) File names differs).

The equivalent behavior can be observed with plugin-maven in case of a stylesheet not available as a File (e.g. but available as a Resource from a dependency): see com.diffplug.spotless.maven.FileLocator#locateFile

@blacelle
Copy link
Contributor Author

blacelle commented Feb 6, 2023

The behavior is drastically improved in my case with a cache Content -> TemporaryFile

	protected File locateFile(String stylesheetFile) throws IOException {
		Resource resource = CleanthatUrlLoader.loadUrl(codeProvider, stylesheetFile);

		if (resource.isFile()) {
			return resource.getFile();
		}

		byte[] content = ByteStreams.toByteArray(resource.getInputStream());

		File locatedFile;
		try {
			locatedFile = CONTENT_TO_FILE.get(Base64.getEncoder().encodeToString(content), () -> {
				Path tmpFileAsPath = Files.createTempFile("cleanthat-spotless-eclipse-", ".xml");

				Files.copy(resource.getInputStream(), tmpFileAsPath, StandardCopyOption.REPLACE_EXISTING);
				File tmpFile = tmpFileAsPath.toFile();
				tmpFile.deleteOnExit();

				return tmpFile;
			});
		} catch (ExecutionException e) {
			throw new RuntimeException("Issue provisioning tmpFile for " + resource.getFilename(), e);
		}

		return locatedFile;
	}

Would this be relevant to be introduced in com.diffplug.spotless.maven.FileLocator#locateFile or would this be considered early optimization? (I would rather not do it in maven-plugin as I feel nobody encountered such issue. It may impacts a multiModule maven projects with many modules, if it loads its stylesheet from a dependency).

@nedtwigg
Copy link
Member

nedtwigg commented Feb 6, 2023

I am always in favor of content-addressing, PR for that is welcome!

@blacelle
Copy link
Contributor Author

The main issue is fixed, and is confirmed as due to cache behavior (based on File, instead of content).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants