diff --git a/06_cell_population_example.ipynb b/06_cell_population_example.ipynb index 365cea2..e3eee77 100644 --- a/06_cell_population_example.ipynb +++ b/06_cell_population_example.ipynb @@ -48,15 +48,19 @@ "\n", "These functions contain the ability to receive a seed to set up the random number generator. This is useful for testing and comparing performance as it allows us to guarantee the output of the simulation.\n", "\n", - "The script contain code which runs two single realisations - one which always dies out and one which always grows (the results are guaranteed by manually choosing the seed of the random number generator). The script then runs 100 realisations of the system. Each realisation receives the number of the realisation (from 0-99) as a seed to ensure reproducibility. The runtimes of the system are printed to the console. When I ran the code, the output was:\n", + "The script contains code which runs two single realisations - one which always dies out and one which always grows (the results are guaranteed by manually choosing the seed of the random number generator). The script then runs 200 realisations of the system. Each realisation receives the number of the realisation (from 0-199) as a seed to ensure reproducibility. The runtimes of the system are printed to the console. When I ran the code, the output was:\n", "\n", - "* Single realisation that always dies: 0.29s\n", - "* Single realisation that always grows: 2.25s\n", - "* 100 realisations: 129.7s\n", + "| Simulation Type | Running Time | Plotting Time |\n", + "|----------------------------------------|---------------------------------|---------------|\n", + "| Single realisation that always dies | 6.2 $\\times 10 ^{-5}\\textrm{s}$ | 0.31s |\n", + "| Single realisation that always grows | 2.0s | 0.20s |\n", + "| 200 realisations | 189s | 0.50s |\n", "\n", - "Each of these times includes the time spent to plot the output of the simulation. 
For the single realisation that dies, this is the majority of the runtime.\n", + "The time to simulate a growing population is significantly longer than the time to simulate a dying population as there are more cells to simulate. Around 20% of the simulations see a growing population. For a realisation which grows, different realisations may take different amounts of time, depending on how large the population gets. The figure below shows the amount of time taken for each realisation to run, with the numbers of the longest-running realisations added as annotations to the figure:\n", "\n", - "The time to simulate a growing population is significantly longer than the time to simulate a dying population as there are more cells to simulate. Just under 20% of the simulations see a growing population. As the number of realisations is not very high, we might expect there to be some variation in the outputs and the runtimes of the simulation with multiple realisations each time it is run." + "

\n", + "\"A\n", + "

" ] }, { @@ -69,17 +73,37 @@ "\n", "As we parallelise the code, we want to keep the interface for the functions a user might call as similar as possible, specifically, `run_single_realisation` and `run_multiple_realisations`. This means it will take minimal effort adapt existing tests and profiling, and any users running the code, or any places where the code is called in existing projects will not need to be changed.\n", "\n", - "Our first attempt to parallelising the code is to use a queue to store te results produced from a number of realisations in the file `06_cell_population_example/queue.py`. To do this we create the new function `run_n_realisation_queue` which is similar to the old function `run_multiple_realisations` but uses a queue to store the results of all realisations performed in a 2D Numpy array. This function will be called by each process. The function `run_multiple_realisations` is adapted to create the queue, start the processes, collect the results from the queue, and process the results. Each process returns a 2D Numpy array with the population at each time for each realisation.\n", + "Our first attempt at parallelising the code is to use a queue to store the results produced from a number of realisations in the file [`06_cell_population_example/queue_version.py`](06_cell_population_example/queue_version.py). To do this we create the new function `run_n_realisation_queue` which is similar to the old function `run_multiple_realisations` but uses a queue to store the results of all realisations performed in a 2D Numpy array. This function will be called by each process. The function `run_multiple_realisations` is adapted to create the queue, start the processes, collect the results from the queue, and process the results. 
Each process returns a 2D Numpy array with the population at each time for each realisation.\n", "\n", "When altering `run_multiple_realisations` we have made the number of processes an optional argument with a default value of 1. This means that calls made to the function without specifying the number of processes will still work, making integration of the new function into existing projects easier.\n", "\n", - "This implementation doesn't alter the runtime of the single realisations, but decreases the runtime from around 129s to around 69s on 4 cores. This is a decent speedup, but the code is not 4 times faster. Part of the reason for this becomes apparent when we run the code. The code prints when each process has finished its quarter of the realisations. Typically, the processes will finish at significantly different times. In one example I just ran, process 1 finished in 28 seconds, process 2 finished in 53 seconds, process 4 finished in 53 seconds and process 3 finished in 69 seconds. This is because each realisation does not take the same amount of time to run, with realisations that result in quick death of the cell population taking almost no time compared to a realisation where the population grows. If one process happens to simulate 10 realisations out of 25 where the cell population grows, it will take significantly longer to run than a process where only 2 grow. The figure below shows a hypothetical example of how the time each process spends on each realisation might vary.\n", + "This implementation doesn't alter the runtime of the single realisations, but decreases the runtime from around 189s to around 107s on 4 cores. This is a decent speedup, but the code is not 4 times faster. Part of the reason for this becomes apparent when we run the code and view which process is working on each realisation, as in the figure below:\n", "\n", "

\n", - "\"The\n", + "\"The\n", "

\n", "\n", - "Once a process has finished its realisations it will terminate and the physical core will be inactive. The code is left waiting for the slowest process to finish, meaning progressively fewer of the cores are active as the code runs. This is a common problem when parallelising code, and is known as load imbalance and is the main reason why the code is not 4 times faster when run on 4 cores." + "In the example above, Process 4 was running realisations 150-199, which happened to not include many realisations where the population grew and so finished in 26s. Processes 2 and 3 had more long-lived realisations and took 76s and 80s to run respectively. Process 1 happened to have several long-lived realisations and took 107s to run.\n", + "\n", + "Once a process has finished its realisations it will terminate and the physical core will be inactive. The code is left waiting for the slowest process to finish, meaning progressively fewer of the cores are active as the code runs. This is a common problem when parallelising code, and is known as load imbalance. Ideally, we would like a way to keep our processes busy for more of the time to make the overall calculation finish faster." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Pool Implementation\n", + "\n", + "To solve the problem of load imbalance, we can use a `Pool`. The advantage of a pool is that it will keep all the processes busy by assigning them new tasks as they finish their previous task. This means processes will be kept busy for more of the time, and the overall calculation will finish faster. This is implemented in the file [`06_cell_population_example/pool_version.py`](06_cell_population_example/pool_version.py).\n", + "\n", + "This version is arguably simpler than the queue version as we don't need to have a function like `run_n_realisation_queue` to use as an interface between `run_multiple_realisations` and `run_single_realisation`. 
Instead, we can use the `starmap` method of the `Pool` object to run the realisations. Once we receive the results from the `Pool` object, we can collect them into a 2D Numpy array and process them as before.\n", + "\n", + "The figure below shows the amount of time each process spent performing each realisation:\n", + "\n", + "

\n", + "\"The\n", + "\n", + "Primarily because of the way the `Pool` distributes work to the processes, the load is now much more evenly balanced between the processes. The code now takes around 83s to run on 4 cores." ] }, { diff --git a/06_cell_population_example/pool_runtimes.png b/06_cell_population_example/pool_runtimes.png index c384247..cce237f 100644 Binary files a/06_cell_population_example/pool_runtimes.png and b/06_cell_population_example/pool_runtimes.png differ diff --git a/06_cell_population_example/pool_version.py b/06_cell_population_example/pool_version.py index ae4cba0..4b8310e 100644 --- a/06_cell_population_example/pool_version.py +++ b/06_cell_population_example/pool_version.py @@ -107,10 +107,6 @@ def run_single_realisation(n_initial, reproduction_probability, mean_lifetime, o return run_time, plotting_time -def run_realisation_interface(args): - return run_realisation(*args) - - def run_multiple_realisations(n_initial, reproduction_probability, mean_lifetime, output_times, n_realisations, output_filepath, n_processes=1): ''' Run multiple realisations of the cell population model and plot the results. 
@@ -128,7 +124,7 @@ def run_multiple_realisations(n_initial, reproduction_probability, mean_lifetime arguments = [(n_initial, reproduction_probability, mean_lifetime, output_times, i) for i in range(n_realisations)] with multiprocessing.Pool(4) as p: - output_list = p.map(run_realisation_interface, arguments) + output_list = p.starmap(run_realisation, arguments) # Make a 2D array to store the populations of each realisation at each time output_populations = np.array([output[0] for output in output_list]) diff --git a/06_cell_population_example/queue_runtimes.png b/06_cell_population_example/queue_runtimes.png index e1a7468..5edcb18 100644 Binary files a/06_cell_population_example/queue_runtimes.png and b/06_cell_population_example/queue_runtimes.png differ diff --git a/06_cell_population_example/queue_version.py b/06_cell_population_example/queue_version.py index fe9163b..ece10ab 100644 --- a/06_cell_population_example/queue_version.py +++ b/06_cell_population_example/queue_version.py @@ -96,7 +96,7 @@ def run_single_realisation(n_initial, reproduction_probability, mean_lifetime, o ax.set_yscale('log') fig.savefig(output_filepath) - plotting_time = time.time() - run_time + plotting_time = time.time() - run_time - start_time return run_time, plotting_time diff --git a/06_cell_population_example/serial_runtimes.png b/06_cell_population_example/serial_runtimes.png index cdb5e3f..409051f 100644 Binary files a/06_cell_population_example/serial_runtimes.png and b/06_cell_population_example/serial_runtimes.png differ diff --git a/resources/pool_runtimes.png b/resources/pool_runtimes.png new file mode 100644 index 0000000..cce237f Binary files /dev/null and b/resources/pool_runtimes.png differ diff --git a/resources/queue_runtimes.png b/resources/queue_runtimes.png new file mode 100644 index 0000000..5edcb18 Binary files /dev/null and b/resources/queue_runtimes.png differ diff --git a/resources/serial_runtimes.png b/resources/serial_runtimes.png new file mode 100644 index 
0000000..409051f Binary files /dev/null and b/resources/serial_runtimes.png differ
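
Review note: the load imbalance argument in the notebook text above can be checked with a small back-of-the-envelope model. The sketch below is not taken from `queue_version.py` or `pool_version.py`; the function names `makespan_static` and `makespan_dynamic` and the task durations are hypothetical, made up purely for illustration. It compares the total finish time when each worker is given one fixed contiguous block of realisations (roughly what the queue version does) with the finish time when work is handed to whichever worker becomes free first (a simplified picture of what a `Pool` does; the real `Pool` also groups tasks into chunks, which the `chunksize` argument here only approximates).

```python
import heapq


def makespan_static(durations, n_workers):
    """Finish time when each worker gets one contiguous, equal-sized block."""
    block = -(-len(durations) // n_workers)  # ceiling division
    return max(sum(durations[i:i + block])
               for i in range(0, len(durations), block))


def makespan_dynamic(durations, n_workers, chunksize=1):
    """Finish time when each chunk goes to the worker that is free first."""
    free_at = [0.0] * n_workers  # time at which each worker becomes idle
    heapq.heapify(free_at)
    for i in range(0, len(durations), chunksize):
        start = heapq.heappop(free_at)  # earliest available worker
        heapq.heappush(free_at, start + sum(durations[i:i + chunksize]))
    return max(free_at)


# Hypothetical runtimes in seconds (not measured): a few slow "growing"
# realisations clustered at the start, followed by fast "dying" ones.
durations = [5.0, 5.0, 5.0, 5.0] + [0.1] * 12

print("static blocks :", makespan_static(durations, 4))
print("dynamic       :", makespan_dynamic(durations, 4))
```

With these made-up numbers the static split leaves one worker with all four slow realisations (20 s in total), while dynamic assignment spreads them out and finishes in roughly 5.3 s. Setting `chunksize=4` in the dynamic model reproduces the static result, which suggests that chunk size is one reason a pool may not balance perfectly in practice.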