Kernel attributes #360
Conversation
/ok to test
I have a design question for any reviewers to weigh in on. There is another change in the works to add device properties to the Device class. The way I've implemented that is to have `device_instance.properties -> DeviceProperties`, where `DeviceProperties` lazily queries the properties and exposes them. In short, you would get a property like so:

    device = Device()
    device.properties.property_a

The reason I put all of the properties in the subclass is that there are a lot of them, and adding them straight to `Device` would make it very bloated. The question is whether you think I should do the same thing here. Prior to making the device property change, I thought this was the best way to implement it, but I am now leaning towards sticking the attributes in a subclass so they would be accessed like:

    kernel.attributes.attribute_a = True
    variable = kernel.attributes.attribute_b

One considerable difference is that all the device properties are read-only, while some of the kernel attributes are read/write.
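For illustration, here is a minimal sketch of the namespaced, lazily-populated attribute object being described. All names here (`Kernel`, `KernelAttributes`, `_query_attribute`) are hypothetical stand-ins, not the actual cuda.core implementation:

```python
def _query_attribute(handle, name):
    # Stand-in for a driver query such as cuFuncGetAttribute; returns a
    # dummy value so the sketch runs on its own.
    return {"max_threads_per_block": 1024}[name]


class KernelAttributes:
    def __init__(self, handle):
        self._handle = handle
        self._cache = {}

    def _get(self, name):
        # Query the driver lazily, once per attribute, then serve from cache.
        if name not in self._cache:
            self._cache[name] = _query_attribute(self._handle, name)
        return self._cache[name]

    @property
    def max_threads_per_block(self):
        return self._get("max_threads_per_block")


class Kernel:
    def __init__(self, handle=None):
        self._handle = handle
        self._attributes = None

    @property
    def attributes(self):
        # The subobject is created on first access, so tab completion on a
        # Kernel instance shows one entry point instead of ~16 attributes.
        if self._attributes is None:
            self._attributes = KernelAttributes(self._handle)
        return self._attributes


kernel = Kernel()
print(kernel.attributes.max_threads_per_block)  # 1024
```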
I really think this is the way to go! We definitely do not want to bloat the kernel/device instance when hitting tab completion.
Ok cool, I agree. Change made.
/ok to test
Updated the review to remove the setters on the read/write properties, in line with the discussion about deadlock between properties and launch config, plus a couple of formatting improvements to the docs.
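A sketch of what dropping the setters means at the Python level (illustrative names only, not the actual cuda.core class): defining a `property` with no corresponding setter makes assignment raise `AttributeError`, which sidesteps any ordering questions between attribute setters and the launch configuration.

```python
class KernelAttributes:
    """Illustration only; not the actual cuda.core class."""

    def __init__(self, value):
        self._max_dynamic_shared_size_bytes = value

    @property
    def max_dynamic_shared_size_bytes(self):
        return self._max_dynamic_shared_size_bytes
    # No @max_dynamic_shared_size_bytes.setter is defined, so the
    # attribute is read-only from Python's point of view.


attrs = KernelAttributes(48 * 1024)
print(attrs.max_dynamic_shared_size_bytes)   # 49152
# attrs.max_dynamic_shared_size_bytes = 0    # raises AttributeError
```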
/ok to test
Benchmark output from two runs of `python -m pytest tests/test_module.py -s` (platform linux, Python 3.12.7, pytest-8.3.3, pluggy-1.5.0, pytest-benchmark 4.0.0; rootdir: /home/ksimpson/code/cuda-python/cuda_core; 17 items collected). Average time per call to each attribute getter:

| Attribute | Run 1 (s/call) | Run 2 (s/call) |
| --- | --- | --- |
| max_threads_per_block | 0.0000001646 | 0.0000006603 |
| shared_size_bytes | 0.0000001421 | 0.0000006781 |
| const_size_bytes | 0.0000001451 | 0.0000005997 |
| local_size_bytes | 0.0000001464 | 0.0000006500 |
| num_regs | 0.0000001585 | 0.0000006209 |
| ptx_version | 0.0000002534 | 0.0000006196 |
| binary_version | 0.0000001346 | 0.0000006121 |
| cache_mode_ca | 0.0000001768 | 0.0000006328 |
| cluster_size_must_be_set | 0.0000002234 | 0.0000006298 |
| max_dynamic_shared_size_bytes | 0.0000001594 | 0.0000006944 |
| preferred_shared_memory_carveout | 0.0000001541 | 0.0000007717 |
| required_cluster_width | 0.0000001443 | 0.0000006319 |
| required_cluster_height | 0.0000001399 | 0.0000006384 |
| required_cluster_depth | 0.0000001660 | 0.0000006286 |
| non_portable_cluster_size_allowed | 0.0000001502 | 0.0000006788 |
| cluster_scheduling_policy_preference | 0.0000001410 | 0.0000008922 |

Run 1 finished with 16 passed, 1 xfailed in 2.66s; the captured output for run 2 is cut off before its summary line.
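For reference, a sketch of how per-call averages like these could be produced. The PR's actual tests run under pytest (with pytest-benchmark available), so the exact mechanism may differ; the lambda usage below against `kernel.attributes` is hypothetical.

```python
import time


def average_call_time(getter, iterations=100_000):
    # Time many back-to-back calls and report the mean cost of one call.
    start = time.perf_counter()
    for _ in range(iterations):
        getter()
    return (time.perf_counter() - start) / iterations


# Self-contained demo: time a cheap stdlib call.
avg = average_call_time(time.perf_counter)
print(f"Average time per call to perf_counter: {avg:.10f} seconds")

# Hypothetical usage against the attributes object from this PR:
# avg = average_call_time(lambda: kernel.attributes.max_threads_per_block)
# print(f"Average time per call to max_threads_per_block: {avg:.10f} seconds")
```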
/ok to test
Add getters and setters for the kernel attributes.
Closes #205