Performance

Michael O'Brien edited this page Jan 2, 2025 · 7 revisions

Experiments

Collatz | Hailstone numbers | 3n+1 problem
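As a point of reference for the experiments below, here is a minimal sketch of the 3n+1 rule itself: halve even numbers, map odd n to 3n+1, and repeat until reaching 1. The class name CollatzSketch and the methods next, steps and peak are hypothetical names chosen for illustration; they are not taken from the repository. Overflow is detected with Math.multiplyExact and Math.addExact rather than assumed away.

```java
final class CollatzSketch {

    // One step of the 3n+1 rule; throws ArithmeticException if 3n+1 overflows a long.
    static long next(long n) {
        return (n % 2 == 0) ? n / 2 : Math.addExact(Math.multiplyExact(3L, n), 1L);
    }

    // Number of steps needed to reach 1 from n.
    static long steps(long n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        long count = 0;
        while (n != 1) {
            n = next(n);
            count++;
        }
        return count;
    }

    // Largest value visited on the path from n down to 1 (including n itself).
    static long peak(long n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        long max = n;
        while (n != 1) {
            n = next(n);
            max = Math.max(max, n);
        }
        return max;
    }

    public static void main(String[] args) {
        // prints "27: steps=111 peak=9232"
        System.out.println("27: steps=" + steps(27) + " peak=" + peak(27));
    }
}
```

A record search of the kind run in these experiments would compare steps or peak against the best value seen so far; that bookkeeping is left out of this sketch.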

option 3: Java 8 lambda/streams parallelization

see https://github.com/ObrienlabsDev/performance/issues/19

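import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

// oddSearchCurrent is kept from the original signature; this method does not use it.
// isCollatzMax is not shown on this page.
public void searchCollatzParallel(long oddSearchCurrent, long secondsStart) {
	long batchBits = 5; // adjust this based on the chip architecture
	long searchBits = 32;
	long batches = 1L << batchBits; // 1L keeps the shift in long arithmetic
	long threadBits = searchBits - batchBits;
	long threads = 1L << threadBits; // count of numbers per batch, not OS threads

	// batches * threads covers the 2^searchBits space exactly, so stop at part < batches
	for (long part = 0; part < batches; part++) {
		// generate a limited collection for the search space - 32 is a good
		System.out.println("Searching: " + searchBits + " space, batch " + part + " of "
			+ batches + " with " + threadBits + " bits of " + threads + " threads");

		// LongStream.range excludes its upper bound, so the bound is (1 + part) * threads;
		// subtracting 1 here would drop the last odd number of every batch
		List<Long> oddNumbers = LongStream
					.range(1L + (part * threads), (1 + part) * threads)
					.filter(x -> x % 2 != 0) // TODO: find a way to avoid this filter using range above
					.boxed()
					.collect(Collectors.toList());

		// test each odd number in parallel and keep the ones flagged by isCollatzMax
		List<Long> results = oddNumbers
				.parallelStream()
				.filter(num -> isCollatzMax(num.longValue(), secondsStart))
				.collect(Collectors.toList());

		results.stream().sorted().forEach(System.out::println);
	}
	// last odd number covered by the final batch
	System.out.println("last number: " + ((batches * threads) - 1));
}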
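The TODO in the listing asks how to avoid the odd-number filter. One possible approach, sketched here under the assumption that the batch bounds are non-negative, is to index the odd numbers directly and map each index i to first + 2i, so no even number is ever generated. The class name OddRangeSketch and the method odds are hypothetical and not part of the repository.

```java
import java.util.Arrays;
import java.util.stream.LongStream;

final class OddRangeSketch {

    // Odd numbers in [start, endExclusive), produced without a filter step.
    // Assumes start >= 0.
    static LongStream odds(long start, long endExclusive) {
        long first = (start % 2 == 0) ? start + 1 : start; // first odd value >= start
        long count = (endExclusive > first) ? (endExclusive - first + 1) / 2 : 0;
        return LongStream.range(0, count).map(i -> first + 2 * i);
    }

    public static void main(String[] args) {
        long[] direct = odds(1, 32).toArray();
        long[] filtered = LongStream.range(1, 32).filter(x -> x % 2 != 0).toArray();
        // prints "true": both approaches yield 1, 3, ..., 31
        System.out.println(Arrays.equals(direct, filtered));
    }
}
```

Because the result is still a LongStream, it could be switched to parallel execution with .parallel() before filtering, which would also avoid boxing each value into a List<Long>. Whether that is faster on a given chip is something to measure rather than assume.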

20241212: Observation

  • Curiously, runs inside VMs are around 10-25% faster than native runs (edit: this may be due to differences between OpenJDK and commercial JDK 21)
  • The 13900KS is still faster than the M4 for single-core work
  • The M4 Max has more than double the throughput of the 32-thread 13900/13900
  • The M4 Max 40-core GPU is around half the speed of a comparable NVIDIA RTX 3500 Ada Generation mobile card; both have 5120 cores
  • https://github.com/obrienlabs/benchmark/issues/12