Use joblib with mem-mapping for multiprocessing
This merge request replaces the current multiprocessing implementation with joblib-based parallelization, including mem-mapping.
Fixes https://github.com/GFZ/arosics/issues/36.
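For readers unfamiliar with joblib's mem-mapping, here is a minimal sketch of the general pattern. It is not the code from this merge request: `window_mean`, `compute_window_means`, the window size and the array shape are hypothetical names and values chosen for illustration. Only `Parallel`, `delayed` and the `backend`, `max_nbytes` and `mmap_mode` parameters are joblib's own.

```python
import numpy as np
from joblib import Parallel, delayed


def window_mean(image, row, col, size):
    """Return the mean of a square window (hypothetical stand-in for real work).

    Inside a worker, `image` may arrive as a read-only memory map.
    """
    return float(image[row:row + size, col:col + size].mean())


def compute_window_means(image, positions, size=64, n_jobs=2):
    """Evaluate `window_mean` at every (row, col) position in parallel."""
    # Arrays larger than `max_nbytes` are dumped to a temporary file and
    # opened by the workers as read-only memory maps, so the full array is
    # not pickled again for every task.
    parallel = Parallel(n_jobs=n_jobs, backend="loky", max_nbytes="1M", mmap_mode="r")
    return parallel(
        delayed(window_mean)(image, row, col, size) for row, col in positions
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((2048, 2048))  # about 32 MB, well above max_nbytes
    positions = [(r, c) for r in range(0, 2048, 512) for c in range(0, 2048, 512)]
    means = compute_window_means(image, positions)
    print(f"computed {len(means)} window means")
```

Because the workers only read from the memory-mapped array, `mmap_mode="r"` is sufficient here. Tasks that need to write results into a shared array would need a different mode and some care about overlapping writes.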
Here are some benchmark results with 2.5k and 10k tie points. The table lists the processing time for the tie point computation alone and for the entire workflow ("total"). The Linux machine has 32 cores, the Windows machine has 12 cores. On Windows, RAM usage is much higher with the former plain multiprocessing implementation than with joblib. Rows marked "main" were measured on the current main branch, rows marked "new" on the branch of this merge request:
Backend | OS / Branch | Tie points, 10k | Total, 10k | Tie points, 2.5k | Total, 2.5k |
---|---|---|---|---|---|
multiprocessing with progress | Linux main | 00:51 | 01:16 | 00:13 | 00:36 |
Loky with progress | Linux new | 01:02 | 01:26 | 00:18 | 00:41 |
multiprocessing without progress | Linux new | 00:55 | 01:19 | 00:15 | 00:37 |
multiprocessing with progress | Windows main | 03:40 | 04:19 | 00:55 | 01:33 |
Loky with progress | Windows new | 04:09 | 04:25 | 01:05 | 01:23 |
multiprocessing without progress | Windows new | 04:12 | 04:28 | 01:18 | 01:33 |
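So the new implementation is slightly slower than the old one: for 10k tie points, the tie point computation with Loky takes 01:02 instead of 00:51 on Linux and 04:09 instead of 03:40 on Windows, while the total run times stay close (01:26 vs. 01:16 and 04:25 vs. 04:19). In return, it requires much less memory on Windows and seems to fix the deadlocks on Windows. Processing can also be interrupted with Ctrl-C, and tracebacks are more useful.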
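As a quick check of how much slower "slightly slower" is, the relative change can be computed from the table. The sketch below uses the 10k tie point computation times for "multiprocessing with progress" on main and "Loky with progress" on the new branch, and assumes the values are formatted as mm:ss. The helper names are made up for this example.

```python
def to_seconds(mmss):
    """Convert a 'mm:ss' string (assumed table format) to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def slowdown_percent(old, new):
    """Relative change from `old` to `new` in percent (positive means slower)."""
    return 100.0 * (to_seconds(new) - to_seconds(old)) / to_seconds(old)


# (main, new) tie point computation times for 10k tie points, copied from the table
timings = {
    "Linux": ("00:51", "01:02"),
    "Windows": ("03:40", "04:09"),
}

for os_name, (old, new) in timings.items():
    print(f"{os_name}: {slowdown_percent(old, new):+.1f}%")
# prints:
# Linux: +21.6%
# Windows: +13.2%
```

These figures cover the tie point computation only. The total workflow times in the table differ by a smaller fraction.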