Introduction of order=0.5 mode in Single_crystal #1762
Merged
The optimization in Single_crystal that reuses calculations from the previous ray when the two rays are identical was supposed to speed up the calculation by a large factor, but in practice it only reached a speedup of about 2, even with large split numbers. This was explored as a tangent in issue #1725.
The reason is that the expensive hkl_search is performed both when the ray enters the crystal (where the result can be reused, because with split the wave vector is exactly the same as for the previous ray) and when the ray leaves the crystal. Hence at most half of the work can be reused.
This PR introduces a new mode, order=0.5, in which the attenuation from coherent scattering as the ray leaves the crystal is simply ignored, so the number of hkl_search calls can be reduced by a factor equal to the split number.
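The call counting behind this argument can be sketched in a few lines of Python. This is a simplified model, not the component code: the function names `hkl_search_calls` and `search_reduction` are hypothetical, and the model assumes exactly one hkl_search on entry and one on exit per ray copy, with the entry search shared across all copies of a split ray when reuse is active.

```python
def hkl_search_calls(n_rays, split, order, reuse=True):
    """Count hkl_search calls in a simplified model (assumption, not the real code).

    n_rays : number of rays arriving at the crystal before splitting
    split  : number of identical copies made of each ray
    order  : 0.5 skips the search on exit; any other value performs it
    reuse  : if True, the entry search is done once per incoming ray
    """
    if n_rays < 0 or split < 1:
        raise ValueError("n_rays must be >= 0 and split must be >= 1")
    copies = n_rays * split
    # All copies of a split ray enter with the same wave vector,
    # so with reuse the entry search is needed only once per incoming ray.
    entry_calls = n_rays if reuse else copies
    # After scattering, each copy leaves in its own direction,
    # so the exit search cannot be shared between copies.
    exit_calls = 0 if order == 0.5 else copies
    return entry_calls + exit_calls


def search_reduction(split, order, n_rays=1000):
    """Ratio of hkl_search calls without reuse to calls with reuse."""
    without = hkl_search_calls(n_rays, split, order, reuse=False)
    with_reuse = hkl_search_calls(n_rays, split, order, reuse=True)
    return without / with_reuse


if __name__ == "__main__":
    for split in (1, 10, 100):
        print(split, search_reduction(split, 1), search_reduction(split, 0.5))
```

In this model the reduction for order=1 is 2·split/(1+split), which approaches 2 for large split, while for order=0.5 it equals the split number.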
Here are results for lucine with different values for the order:
![lucine_order_comparison_wide](https://private-user-images.githubusercontent.com/24249970/387269863-067a6f4f-d528-4c37-922b-ae7d8d85266b.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkxMzcxMzgsIm5iZiI6MTczOTEzNjgzOCwicGF0aCI6Ii8yNDI0OTk3MC8zODcyNjk4NjMtMDY3YTZmNGYtZDUyOC00YzM3LTkyMmItYWU3ZDhkODUyNjZiLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMDklMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjA5VDIxMzM1OFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTYzMTNmYzllN2U5ZDA4MzBkMTRjZjBlNjYyMWI1NmUwYTRiNDAwNjU5ZTJkZTIyMDU1MzkwNTE2NGM2M2Y1MzQmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.01yGYEzCkITbuwfAm9Xy8byIN9i-aD08QjyezT3uHfc)
![lucine_order_comparison_narrow](https://private-user-images.githubusercontent.com/24249970/387269835-06e39a5d-b3f4-42cc-97db-8f0ccfaf6af3.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkxMzcxMzgsIm5iZiI6MTczOTEzNjgzOCwicGF0aCI6Ii8yNDI0OTk3MC8zODcyNjk4MzUtMDZlMzlhNWQtYjNmNC00MmNjLTk3ZGItOGYwY2NmYWY2YWYzLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMDklMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjA5VDIxMzM1OFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTJkMzllNDU0MjhhMTA4ZjlhOGRkMWUxYTliZDdkN2MzZmYwYjExNWVkNTcyZGYyNWUwZTFlZjQ4M2Y1NDE3ZTAmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.o97wUPBqY2BKUuo3G2i-grJcHqrWU4ANBJ7YKf0bdp8)
As well as rubredoxin:
![output_narrow_rubredoxin](https://private-user-images.githubusercontent.com/24249970/387270004-eadf0459-4b33-46f9-93f8-2d8608ac8c12.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkxMzcxMzgsIm5iZiI6MTczOTEzNjgzOCwicGF0aCI6Ii8yNDI0OTk3MC8zODcyNzAwMDQtZWFkZjA0NTktNGIzMy00NmY5LTkzZjgtMmQ4NjA4YWM4YzEyLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMDklMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjA5VDIxMzM1OFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTIxZWU4OTUwYjE0NzI3OGZmNmE2ZDRkNDhjYTEyNDk3ODU1NDhmOTNiZmJlZTRkMTc5MjZmZTkyMDNjYzA4MGUmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.3gfIzaFFF7HQrIoQyAIukp9qo-27QVFRy71GWeacYRQ)
The new mode, order=0.5, is stable when changing the split value, as expected:
![order_half_split_stable](https://private-user-images.githubusercontent.com/24249970/387270204-9b90a536-5aec-4912-9988-21b1acc0be71.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkxMzcxMzgsIm5iZiI6MTczOTEzNjgzOCwicGF0aCI6Ii8yNDI0OTk3MC8zODcyNzAyMDQtOWI5MGE1MzYtNWFlYy00OTEyLTk5ODgtMjFiMWFjYzBiZTcxLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMDklMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjA5VDIxMzM1OFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWNkODRhZTdmZTk0NjE2ODAxMjU1NTY2ZmI3Mjg4MjQ3NDA0YjgxOGFhOGRlNTBhMTg5NDM4OWI5YWMwZWM3MDUmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.rMocblFkJKSPe-KPtTSvTHNjcAwJKwPQOgnqIGJ8vy0)
The execution time as a function of order and split is shown in the graph below:
![execution_time_orders](https://private-user-images.githubusercontent.com/24249970/387270602-2adb29a4-3e43-43d9-84e6-7c78ea81e2e9.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkxMzcxMzgsIm5iZiI6MTczOTEzNjgzOCwicGF0aCI6Ii8yNDI0OTk3MC8zODcyNzA2MDItMmFkYjI5YTQtM2U0My00M2Q5LTg0ZTYtN2M3OGVhODFlMmU5LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMDklMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjA5VDIxMzM1OFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPThkN2Y1MzdkNTMxYjQ3NTcyODA4ZjI4OTBlMzZiMWFjMmM0YjgxNzYxZGEzMmE1YTQ4YmZmNzliYTg3NjA4OWMmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.FmB1tvg8CLlgiawW7RCgwOGXLSZgImKf7T8opoRJ-OY)
These runs target a fixed ncount after the crystal, so split 100 uses 100 times fewer rays than split 1. The fastest run with order=0.5 takes barely more than a second, while the slowest runs with higher order take about 100 seconds.
The speedup for the different order settings is shown below; the limit of 2 for order=1 is clearly visible.
![speedup_orders](https://private-user-images.githubusercontent.com/24249970/387270871-307c3e54-b010-4942-8819-acfdb65a5fef.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkxMzcxMzgsIm5iZiI6MTczOTEzNjgzOCwicGF0aCI6Ii8yNDI0OTk3MC8zODcyNzA4NzEtMzA3YzNlNTQtYjAxMC00OTQyLTg4MTktYWNmZGI2NWE1ZmVmLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMDklMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjA5VDIxMzM1OFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTQ1NTI4NjczMDE2NzQ4ZDg0YjljNjU4MmYwZTRkYmQxNDBlYmYyODU5ZmY5MjU3ZDM4M2Q5ZTUwMjc0M2Y1NmEmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.GaWD63--BcZDlnbivVLBGAXvu2ORB1pxthJtKWmPYwo)
Here a speedup of almost a factor of 30 is seen. The speedup is expected to grow for higher split numbers, but it will saturate as the time spent elsewhere in the simulation becomes the bottleneck.
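The saturation can be illustrated with an Amdahl-style estimate. This is a sketch under stated assumptions, not a fit to the measurements above: `overall_speedup` is a hypothetical helper, and `search_fraction` (the share of the original run time spent in hkl_search) is an assumed input, not a measured value.

```python
def overall_speedup(search_fraction, search_reduction):
    """Amdahl-style estimate of the total speedup (illustrative assumption).

    search_fraction  : fraction of the original run time spent in hkl_search (0..1)
    search_reduction : factor by which the hkl_search time is reduced (> 0)
    """
    if not 0.0 <= search_fraction <= 1.0:
        raise ValueError("search_fraction must be between 0 and 1")
    if search_reduction <= 0:
        raise ValueError("search_reduction must be positive")
    # Time outside hkl_search is unchanged; only the search part shrinks.
    remaining_time = (1.0 - search_fraction) + search_fraction / search_reduction
    return 1.0 / remaining_time


if __name__ == "__main__":
    # The 0.9 below is an arbitrary example value, not taken from the PR.
    for reduction in (1, 10, 100, 1000):
        print(reduction, overall_speedup(0.9, reduction))
```

However large the reduction in hkl_search calls becomes, the estimate never exceeds 1/(1 - search_fraction), which is the saturation described above.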