For details, please see the paper: https://ieeexplore.ieee.org/document/10433728
This repository contains the code for the CTE process. A tip: the batch_size in Section V of the paper refers to the batch size of the decoder. For training, the right value depends on your device; 8 or 16 is probably a good choice.
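
As a rough illustration of the batch size tip, here is a minimal sketch of how a training script might expose the decoder batch size as a command-line option. The names `build_parser` and `parse_batch_size` and the `--batch_size` flag are hypothetical and are not taken from this repository; the default of 8 simply follows the suggestion above.

```python
import argparse


def build_parser():
    # Hypothetical option parser; not the repository's actual interface.
    parser = argparse.ArgumentParser(description="Training options (sketch)")
    parser.add_argument(
        "--batch_size",
        type=int,
        default=8,  # assumption: the smaller of the two suggested values
        help="Decoder batch size; try 8 or 16 depending on device memory.",
    )
    return parser


def parse_batch_size(argv=None):
    # Parse the arguments and reject non-positive batch sizes.
    args = build_parser().parse_args(argv)
    if args.batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    return args.batch_size


if __name__ == "__main__":
    print("decoder batch size:", parse_batch_size())
```

If training runs out of memory at 16, falling back to 8 is the obvious first thing to try.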