Performance

Performance testing of single-threaded, multithreaded, and GPU code - https://github.com/ObrienlabsDev/blog/issues/91
GPU performance - https://github.com/obrienlabs/benchmark/issues/13
CPU performance - https://github.com/ObrienlabsDev/performance/tree/main/cpu https://github.com/ObrienlabsDev/cuda/blob/main/add_example/kernel_collatz.cu https://github.com/obrienlabs/benchmark/issues/12

Experiments

Collatz | Hailstone numbers | 3n+1 problem
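
For orientation, the 3n+1 rule halves an even number and maps an odd number n to 3n+1, repeating until the value reaches 1. The following is a minimal sketch of that iteration, not the repository's implementation: the class name `CollatzSketch` and the methods `steps` and `maxValue` are hypothetical names chosen for illustration, and the code assumes inputs small enough that 3n+1 never overflows a `long`.

```java
// Hypothetical helper class; the names are illustrative and not taken from the repository.
final class CollatzSketch {

    /** Number of 3n+1 steps needed for n to reach 1 (assumes n >= 1 and no long overflow). */
    static long steps(long n) {
        if (n < 1) throw new IllegalArgumentException("n must be >= 1");
        long count = 0;
        while (n != 1) {
            n = (n % 2 == 0) ? n / 2 : 3 * n + 1; // halve evens, 3n+1 for odds
            count++;
        }
        return count;
    }

    /** Largest value visited on the path from n down to 1 (the hailstone "peak"). */
    static long maxValue(long n) {
        if (n < 1) throw new IllegalArgumentException("n must be >= 1");
        long max = n;
        while (n != 1) {
            n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
            if (n > max) max = n;
        }
        return max;
    }

    public static void main(String[] args) {
        // 27 is the classic example: a small start value with a long, high path
        System.out.println("27: steps=" + steps(27) + " max=" + maxValue(27)); // prints "27: steps=111 max=9232"
    }
}
```

A search for record holders would call something like `maxValue` or `steps` on every odd start value and keep the ones that exceed the best seen so far.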

option 3: Java 8 lambda/streams parallelization

see https://github.com/ObrienlabsDev/performance/issues/19

import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

// isCollatzMax(long, long) is defined elsewhere in the same class (see the issue linked above)
public void searchCollatzParallel(long oddSearchCurrent, long secondsStart) {
	long batchBits = 5; // adjust this based on the chip architecture
	long searchBits = 32;
	long batches = 1L << batchBits; // 1L keeps the shift in 64-bit arithmetic
	long threadBits = searchBits - batchBits;
	long threads = 1L << threadBits; // count of numbers covered by one batch

	// batches are numbered 0 .. batches - 1, which together cover the searchBits space
	for (long part = 0; part < batches; part++) {
		// generate a limited collection for the search space - 32 is a good
		System.out.println("Searching: " + searchBits + " space, batch " + part + " of "
			+ batches + " with " + threadBits + " bits of " + threads + " threads");

		// range() excludes its upper bound, so (1 + part) * threads keeps the last odd number of the batch
		List<Long> oddNumbers = LongStream
				.range(1L + (part * threads), (1 + part) * threads)
				.filter(x -> x % 2 != 0) // TODO: find a way to avoid this filter using range above
				.boxed()
				.collect(Collectors.toList());

		// test the candidates of this batch on the common fork/join pool
		List<Long> results = oddNumbers
				.parallelStream()
				.filter(num -> isCollatzMax(num.longValue(), secondsStart))
				.collect(Collectors.toList());

		results.stream().sorted().forEach(System.out::println);
	}
	System.out.println("last number: " + ((1 + (batches) * threads) - 1));
}
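
One possible way to address the TODO in the method above is to generate the odd numbers directly instead of filtering them out of a full range: map an index i to firstOdd + 2i. The sketch below also keeps the values in a primitive `LongStream`, which avoids boxing every candidate into a `List<Long>`. It is an illustration under assumptions, not the author's code: `OddBatchSketch`, `oddNumbers`, and `search` are hypothetical names, the batch size is assumed to be a power of two, and the `LongPredicate` argument stands in for `isCollatzMax`.

```java
import java.util.Arrays;
import java.util.function.LongPredicate;
import java.util.stream.LongStream;

// Hypothetical sketch; class and method names are not from the repository.
final class OddBatchSketch {

    /** Odd numbers of batch `part`, i.e. those in [part * batchSize, (part + 1) * batchSize). */
    static LongStream oddNumbers(long part, long batchSize) {
        long firstOdd = part * batchSize + 1; // batchSize is assumed even, so this is odd
        // batchSize / 2 odd values per batch, produced by index instead of by filtering
        return LongStream.range(0, batchSize / 2).map(i -> firstOdd + 2 * i);
    }

    /** Tests every odd number of a batch in parallel and returns the matches in ascending order. */
    static long[] search(long part, long batchSize, LongPredicate keep) {
        return oddNumbers(part, batchSize)
                .parallel()   // primitive stream, no boxing to Long
                .filter(keep) // stand-in for isCollatzMax
                .sorted()
                .toArray();
    }

    public static void main(String[] args) {
        long batchSize = 1L << 4; // 16 numbers per batch, tiny on purpose
        // Batch 1 covers 16..31; its odd members are 17, 19, ..., 31
        long[] hits = search(1, batchSize, n -> n % 3 == 0); // stand-in predicate
        System.out.println(Arrays.toString(hits)); // prints [21, 27]
    }
}
```

Whether this is faster than the boxed version depends on the predicate cost and the machine, so it would need to be measured rather than assumed.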

20241212: Observation

Curiously, running inside VMs is around 10-25% faster than running natively (edit: this may be due to differences between OpenJDK and the commercial JDK 21).
The 13900KS is still faster than the M4 for single-core work.
The M4 Max has more than double the throughput of the 32-thread 13900/13900.
The M4 Max 40-core GPU is around half the speed of a comparable NVIDIA RTX-3500 Ada generation mobile card; both have 5120 cores.
https://github.com/obrienlabs/benchmark/issues/12
