Performance
Michael O'Brien edited this page Jan 2, 2025 · 7 revisions
- Performance testing with single-threaded, multithreaded, and GPU code - https://github.com/ObrienlabsDev/blog/issues/91
- GPU performance - https://github.com/obrienlabs/benchmark/issues/13
- CPU performance - https://github.com/ObrienlabsDev/performance/tree/main/cpu https://github.com/ObrienlabsDev/cuda/blob/main/add_example/kernel_collatz.cu https://github.com/obrienlabs/benchmark/issues/12
See https://github.com/ObrienlabsDev/performance/issues/19
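import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

// isCollatzMax(long, long) is called below but is not shown on this page.
public void searchCollatzParallel(long oddSearchCurrent, long secondsStart) {
    long batchBits = 5; // adjust this based on the chip architecture
    long searchBits = 32;
    long batches = 1L << batchBits; // 1L keeps the shift in 64-bit arithmetic
    long threadBits = searchBits - batchBits;
    long threads = 1L << threadBits; // count of numbers covered by each batch
    // batches * threads == 2^searchBits, so parts 0 .. batches - 1 cover the whole space
    for (long part = 0; part < batches; part++) {
        // generate a limited collection for the search space - 32 is a good
        System.out.println("Searching: " + searchBits + " space, batch " + part + " of "
            + batches + " with " + threadBits + " bits of " + threads + " threads");
        // rangeClosed keeps the last odd number of the batch, ((1 + part) * threads) - 1
        List<Long> oddNumbers = LongStream
            .rangeClosed(1L + (part * threads), ((1 + part) * threads) - 1)
            .filter(x -> x % 2 != 0) // TODO: find a way to avoid this filter using range above
            .boxed()
            .collect(Collectors.toList());
        // test every odd candidate in the batch using a parallel stream
        List<Long> results = oddNumbers
            .parallelStream()
            .filter(num -> isCollatzMax(num.longValue(), secondsStart))
            .collect(Collectors.toList());
        results.stream().sorted().forEach(System.out::println);
    }
    // last odd number covered by the final batch
    System.out.println("last number: " + ((batches * threads) - 1));
}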
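One possible way to address the TODO in the listing above is to generate the odd numbers directly rather than filtering out the even ones. The following is a minimal sketch, not the repository's code: the class `OddBatches`, the method `oddsInBatch`, and the parameter `batchSize` are hypothetical names, and the sketch assumes the batch size is even (true for any power of two above 1).

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

// Hypothetical helper class; names are illustrative and not taken from the repository.
class OddBatches {

    // Returns the odd numbers in [part * batchSize, (part + 1) * batchSize).
    // Assumes batchSize is even, so each batch holds exactly batchSize / 2 odd values.
    static LongStream oddsInBatch(long part, long batchSize) {
        long firstIndex = part * (batchSize / 2);
        long endIndex = (part + 1) * (batchSize / 2);
        // Map index i to the i-th odd number, 2i + 1, instead of filtering out evens.
        return LongStream.range(firstIndex, endIndex).map(i -> 2 * i + 1);
    }

    public static void main(String[] args) {
        List<Long> batch0 = oddsInBatch(0, 8).boxed().collect(Collectors.toList());
        List<Long> batch1 = oddsInBatch(1, 8).boxed().collect(Collectors.toList());
        System.out.println(batch0); // [1, 3, 5, 7]
        System.out.println(batch1); // [9, 11, 13, 15]
    }
}
```

Generating by index halves the number of elements the stream has to visit, and the result could also be consumed as a parallel `LongStream` without boxing into a `List<Long>` first, if the memory cost of the intermediate list matters.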
- Curiously, running in a VM is around 10-25% faster than running natively (edit: this may be due to differences between OpenJDK and a commercial JDK 21)
- The 13900KS is still faster than the M4 for single-core work
- The M4 Max has more than double the throughput of the 32-thread 13900/13900
- The M4 Max 40-core GPU is around half the speed of a comparable NVIDIA RTX 3500 Ada Generation mobile card; both have 5120 cores
- https://github.com/obrienlabs/benchmark/issues/12
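The single-core versus multi-thread comparisons above can be explored in miniature with a sketch like the following. It is an assumption-laden illustration rather than the benchmark used for these results: `CollatzTiming`, `steps`, and `totalSteps` are hypothetical names, the workload is a plain Collatz step count (not the page's `isCollatzMax`), and the timings are rough wall-clock figures without JIT warm-up.

```java
import java.util.stream.LongStream;

// Hypothetical micro-benchmark; not the code behind the results listed above.
class CollatzTiming {

    // Number of Collatz steps needed for n to reach 1 (n/2 if even, 3n + 1 if odd).
    // Assumes n >= 1 and that intermediate values fit in a long.
    static long steps(long n) {
        long count = 0;
        while (n != 1) {
            n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
            count++;
        }
        return count;
    }

    // Sums the step counts of every odd start value below limit,
    // either sequentially or on a parallel stream.
    static long totalSteps(long limit, boolean parallel) {
        LongStream odds = LongStream.range(0, limit / 2).map(i -> 2 * i + 1);
        if (parallel) {
            odds = odds.parallel();
        }
        return odds.map(CollatzTiming::steps).sum();
    }

    public static void main(String[] args) {
        long limit = 1_000_000L;
        for (boolean parallel : new boolean[] {false, true}) {
            long start = System.nanoTime();
            long total = totalSteps(limit, parallel);
            long millis = (System.nanoTime() - start) / 1_000_000;
            System.out.println((parallel ? "parallel" : "sequential")
                + ": total steps " + total + " in " + millis + " ms");
        }
    }
}
```

Both runs should print the same total; only the elapsed time differs. For figures worth comparing across machines, a harness with warm-up iterations and repeated runs would be a better fit than a single timed pass.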