Faster Compile Times Through Parallel Compilation on Multiple Cores? (With Benchmarks)
I currently have access to two PCs with totally different form factors, a desktop PC and a laptop PC, which also have quite different CPUs: the former features a (then high-end) AMD Ryzen 9 3900X with 12 physical cores, the latter a (currently mid-tier, mobile) Intel Core i5-1135G7 with four physical cores. I actually expected long waiting times and my patience being put to the test when programming on the mobile i5-1135G7. However, I have to say that for many tasks, I cannot feel any difference between the two PCs. A closer look at how the two CPUs compare on UserBenchmark reveals very similar single-core scores, with the i5-1135G7 actually being attributed an 8% higher single-core score than the 3900X.
But what about the huge difference in the number of cores? Don’t programmers always need a lot of cores? Let us find out by doing some compile time benchmarks, shall we?!
In this blog post, I will compare the Ryzen 9 3900X with the Core i5-1135G7 in terms of compile times across different projects. The comparisons will not be 100% fair because the two systems do not have exactly the same hardware components for SSD, RAM, and suchlike, i.e., it will not exactly be an apples-to-apples comparison. However, the hardware specifications are not too different from each other, so at least overall performance trends should become visible. Table 1 shows a comparison of the two systems’ SSD speeds as determined with CrystalDiskMark. While they are not exactly the same, they are definitely in the same ballpark, which should be sufficient for getting an impression of performance trends in compile times later on.
Benchmark | Desktop Read | Desktop Write | Laptop Read | Laptop Write |
---|---|---|---|---|
SEQ1M Q8T1 | 1802 MB/s | 1705 MB/s | 2478 MB/s | 1789 MB/s |
SEQ1M Q1T1 | 1603 MB/s | 1591 MB/s | 1580 MB/s | 1477 MB/s |
RND4K Q32T1 | 551 MB/s | 445 MB/s | 318 MB/s | 366 MB/s |
RND4K Q1T1 | 51 MB/s | 170 MB/s | 42 MB/s | 97 MB/s |
Table 1: Read and write speeds of the SSDs in the two systems, measured with CrystalDiskMark.
Before studying compile times, let us take a look at benchmark results from PCMark 10 in Table 2. The tests were performed once on the desktop PC and twice on the laptop PC: once plugged in, and once running on battery.
Benchmark | Ryzen 9 3900X | i5-1135G7 plugged | i5-1135G7 battery |
---|---|---|---|
Firefox warm start | 1.56 s | 1.56 s | 1.82 s |
GIMP warm start | 2.33 s | 2.64 s | 2.97 s |
Cut and paste | 0.33 s | 0.32 s | 0.31 s |
Recalculate Stock history CPU | 0.85 s | 0.88 s | 1.01 s |
Edit cells | 0.79 s | 0.83 s | 0.85 s |
Save JPEG | 1.17 s | 1.24 s | 1.27 s |
Gaussian blur | 0.41 s | 0.35 s | 0.42 s |
Table 2: Comparing several selected benchmark results from PCMark 10 shows that for many everyday tasks there is no huge performance difference between the two CPUs, although they feature very different numbers of cores. Many other benchmark results, e.g., “Batch transformation”, clearly showed an advantage for the CPU with more cores, but these were left out of this table to make the point of comparable single-core performance. Warm start benchmarks have been selected instead of cold start benchmarks to reduce the influence of the different SSD speeds.
Compile Time Benchmarks
Now let us move on to the really interesting data for programmers: Can we see a difference in compile times with CPUs that feature higher numbers of cores? The following performance measurements were obtained by building different projects: ASSIMP (commit 376b3b2), Sascha Willems’ Vulkan Examples (commit ac4deed), and Gears-Vk (commit 08d4c97). The latter is a Vulkan rendering framework which I am maintaining, and I’ve included different configurations of it in the performance measurements: At the specified commit, the repository contains 11 example applications, all of which reference a common framework project. I measured the compile times of the framework project in isolation, and also those of the whole solution including all example applications. Another variant of the measurement includes a custom build step in the stated times. This build step compiles shaders and deploys asset files. It cannot be expected to parallelize well, but it nevertheless represents a real-world use case.
The first benchmarks, shown in Table 3, have been built with `MSBuild` and its default settings, i.e., the build instruction is as simple as:
.\MSBuild.exe path-to-solution.sln
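The post does not state how the build times were measured. One possible way to reproduce such measurements is a small Python script that launches MSBuild and records the wall-clock time. The sketch below is hypothetical: it assumes `MSBuild.exe` is on the `PATH`, forces a full rebuild with `-t:Rebuild` so that every run starts from the same state (an assumption about what should be timed), and formats results as m:ss.ss like the tables.

```python
import os
import subprocess
import time
from typing import Dict, List, Optional


def build_command(solution: str, extra_args: Optional[List[str]] = None,
                  msbuild: str = "MSBuild.exe") -> List[str]:
    """Assemble an MSBuild invocation; -t:Rebuild forces a full build (assumed desirable for timing)."""
    cmd = [msbuild, solution, "-t:Rebuild"]
    if extra_args:
        cmd.extend(extra_args)
    return cmd


def format_duration(seconds: float) -> str:
    """Format seconds as m:ss.ss, the notation used in the tables."""
    seconds = round(seconds, 2)  # round first so 59.999 becomes 1:00.00, not 0:60.00
    minutes, rest = divmod(seconds, 60)
    return f"{int(minutes)}:{rest:05.2f}"


def timed_build(cmd: List[str], extra_env: Optional[Dict[str, str]] = None) -> float:
    """Run one build and return the elapsed wall-clock time in seconds."""
    env = dict(os.environ)
    if extra_env:
        env.update(extra_env)
    start = time.perf_counter()
    subprocess.run(cmd, check=True, env=env)  # raises if the build fails
    return time.perf_counter() - start
```

For example, `format_duration(timed_build(build_command("path-to-solution.sln", ["-maxcpucount:24"])))` would correspond to a Ryzen run in the style of Table 4. Passing `extra_env={"CL": "/MP24"}` is one conceivable way to inject a compiler option via the `CL` environment variable that cl.exe reads; that the MSBuild-launched compiler picks it up, and that this matches how the `/MP` values in Table 5 were set, are assumptions.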
Benchmark | Ryzen 9 3900X | i5-1135G7 plugged | i5-1135G7 battery |
---|---|---|---|
ASSIMP | 0:20.24 | 0:52.50 | 0:55.16 |
Sascha Willems’ Vulkan Examples | 2:17.04 | 2:26.18 | 3:06.63 |
Gears-Vk, framework only | 0:23.58 | 0:26.24 | 0:29.55 |
Gears-Vk + examples w/o custom build step | 2:12.02 | 2:18.63 | 2:31.49 |
Gears-Vk + examples + custom build step | 2:36.94 | 2:52.52 | 3:09.93 |
Table 3: Build times (m:ss.ss) with default `MSBuild` settings. Surprisingly, we cannot really see significant differences in compile times across the tests. Only in the ASSIMP test is the 12-core CPU quite a bit faster.
The results from Table 3 are a bit of a disappointment from the 12-core CPU’s point of view. But a quick look into the options that `MSBuild` provides revealed an interesting parameter: `-maxcpucount`. It lets `MSBuild` build multiple projects in parallel. Let us investigate the effect this parameter has on compile times, setting it to the number of logical CPUs on each system. The results are presented in Table 4.
Benchmark | Ryzen 9 3900X | i5-1135G7 plugged | i5-1135G7 battery |
---|---|---|---|
ASSIMP | 0:20.60 | 0:54.44 | 0:56.99 |
Sascha Willems’ Vulkan Examples | 0:27.25 | 0:56.79 | 0:57.16 |
Gears-Vk, framework only | 0:23.50 | 0:28.34 | 0:29.17 |
Gears-Vk + examples w/o custom build step | 0:40.56 | 1:01.54 | 1:04.99 |
Gears-Vk + examples + custom build step | 0:43.53 | 1:13.03 | 1:14.29 |
Table 4: Compile time benchmark results (m:ss.ss), generated with `MSBuild` and the parameter `-maxcpucount:8` for the i5-1135G7 and `-maxcpucount:24` for the Ryzen 9 3900X. Compared to the results from Table 3, the effect of this setting can be observed for all solutions which contain multiple projects.
The `-maxcpucount` option turned out to be a success. Compile times benefit from this setting in solutions which contain more than one project. Since the benchmarks “ASSIMP” and “Gears-Vk, framework only” each contain only a single project, they did not see improvements in compile times.
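To put a number on this success, the Ryzen 9 3900X columns of Table 3 and Table 4 can simply be divided by each other. The short Python sketch below does exactly that; the dictionary keys are shortened project names chosen for this example, and the times are copied from the two tables.

```python
from typing import Dict


def to_seconds(duration: str) -> float:
    """Convert an 'm:ss.ss' string from the tables into seconds."""
    minutes, seconds = duration.split(":")
    return int(minutes) * 60 + float(seconds)


# Ryzen 9 3900X column of Table 3 (default settings) and of Table 4 (-maxcpucount:24).
DEFAULT = {
    "ASSIMP": "0:20.24",
    "Vulkan Examples": "2:17.04",
    "Gears-Vk framework": "0:23.58",
    "Gears-Vk + examples": "2:12.02",
    "Gears-Vk + examples + build step": "2:36.94",
}
MAXCPUCOUNT_24 = {
    "ASSIMP": "0:20.60",
    "Vulkan Examples": "0:27.25",
    "Gears-Vk framework": "0:23.50",
    "Gears-Vk + examples": "0:40.56",
    "Gears-Vk + examples + build step": "0:43.53",
}


def speedups(before: Dict[str, str], after: Dict[str, str]) -> Dict[str, float]:
    """Speed-up factor per project: old build time divided by new build time."""
    return {name: to_seconds(before[name]) / to_seconds(after[name]) for name in before}


speedup = speedups(DEFAULT, MAXCPUCOUNT_24)
for name, factor in speedup.items():
    print(f"{name:34s} {factor:4.2f}x")
```

This yields roughly 5x for the Vulkan Examples, between about 3.3x and 3.6x for Gears-Vk with its examples, and essentially 1x for the two single-project benchmarks.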
The 12-core CPU did not outperform the 4-core CPU in proportion to their physical core counts, i.e., by a factor of three. Among the solutions with multiple projects, only in “Sascha Willems’ Vulkan Examples” did the former manage to be about twice as fast as the latter. It might well be that there are certain dependencies which do not allow further parallelization, but this is a real-world usage scenario, and the payoff does not appear to be in proportion to the number of cores.
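The gap between the two CPUs can be quantified the same way: dividing the i5-1135G7 (plugged in) times from Table 4 by the Ryzen 9 3900X times gives the Ryzen’s advantage per project, which can then be compared with the ratio of physical core counts, 12 / 4 = 3. This Python sketch is nothing more than illustrative arithmetic on the published numbers; the project keys are shortened names chosen for the example.

```python
def to_seconds(duration: str) -> float:
    """Convert an 'm:ss.ss' string from the tables into seconds."""
    minutes, seconds = duration.split(":")
    return int(minutes) * 60 + float(seconds)


# Table 4: (Ryzen 9 3900X, i5-1135G7 plugged in)
TABLE_4 = {
    "ASSIMP": ("0:20.60", "0:54.44"),
    "Vulkan Examples": ("0:27.25", "0:56.79"),
    "Gears-Vk framework": ("0:23.50", "0:28.34"),
    "Gears-Vk + examples": ("0:40.56", "1:01.54"),
    "Gears-Vk + examples + build step": ("0:43.53", "1:13.03"),
}
CORE_RATIO = 12 / 4  # physical cores: Ryzen 9 3900X vs. i5-1135G7

# How many times faster the Ryzen is than the i5 for each project.
advantage = {name: to_seconds(i5) / to_seconds(ryzen) for name, (ryzen, i5) in TABLE_4.items()}

for name, factor in advantage.items():
    print(f"{name:34s} {factor:4.2f}x ({factor / CORE_RATIO:.0%} of the core ratio)")
```

ASSIMP comes closest at about 2.6x, the Vulkan Examples reach about 2.1x, and no project reaches the factor of three that the core counts would suggest.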
There is yet another interesting option, namely `/MP`, which lets the compiler build multiple compilation units in parallel. In a final benchmark, the effect of this option is investigated; the results are presented in Table 5.
Benchmark | m=1, MP=1 | m=1, MP=24 | m=24, MP=1 | m=24, MP=24 | m=12, MP=12 |
---|---|---|---|---|---|
ASSIMP | 2:45.45 | 0:20.58 | 2:40.09 | 0:18.96 | 0:23.63 |
Sascha Willems’ Vulkan Examples | 2:15.70 | 2:16.65 | 0:25.69 | 0:25.59 | 0:28.28 |
Gears-Vk + examples w/o custom build step | 2:18.50 | 2:18.84 | 0:39.71 | 0:39.60 | 0:39.55 |
Table 5: Compile times (m:ss.ss) for different settings of `-maxcpucount` (abbreviated as ‘m’) and `/MP` (abbreviated as ‘MP’), including combinations of the two. All measurements were taken on the Ryzen 9 3900X.
Different values for the `/MP` parameter show big effects even in solutions which contain only one project and therefore could not be sped up by the `-maxcpucount` parameter. For ASSIMP, the default build time from Table 3 (0:20.24) is very close to the m=1, MP=24 result (0:20.58) and far from the m=1, MP=1 result (2:45.45). This leads to the conclusion that `/MP` is turned on by default, at least for ASSIMP, in contrast to the `-maxcpucount` parameter.
For solutions containing multiple projects, the `-maxcpucount` parameter resulted in a bigger speed-up than the `/MP` parameter, while the former does not help at all if there is only one project in a solution.
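The comparison between the two switches can be made explicit by dividing each entry of Table 5 by the fully serial build (m=1, MP=1) of the same solution. The Python sketch below interprets every entry as m:ss.ss and, for each solution, reports which of the two switches helps more when used on its own; the shortened project keys are chosen for the example.

```python
def to_seconds(duration: str) -> float:
    """Convert an 'm:ss.ss' string from the tables into seconds."""
    minutes, seconds = duration.split(":")
    return int(minutes) * 60 + float(seconds)


CONFIGS = ["m=1, MP=1", "m=1, MP=24", "m=24, MP=1", "m=24, MP=24", "m=12, MP=12"]
# Table 5 (Ryzen 9 3900X): one row per solution, columns in CONFIGS order.
TABLE_5 = {
    "ASSIMP": ["2:45.45", "0:20.58", "2:40.09", "0:18.96", "0:23.63"],
    "Vulkan Examples": ["2:15.70", "2:16.65", "0:25.69", "0:25.59", "0:28.28"],
    "Gears-Vk + examples": ["2:18.50", "2:18.84", "0:39.71", "0:39.60", "0:39.55"],
}

speedup = {}
for name, row in TABLE_5.items():
    baseline = to_seconds(row[0])  # fully serial build: m=1, MP=1
    speedup[name] = {cfg: baseline / to_seconds(t) for cfg, t in zip(CONFIGS, row)}

for name, factors in speedup.items():
    # Which switch, used on its own, helps this solution more?
    best_single = max(["m=1, MP=24", "m=24, MP=1"], key=factors.get)
    print(f"{name}: best single switch {best_single} ({factors[best_single]:.2f}x)")
```

For ASSIMP, `/MP` alone gives roughly an 8x speed-up, while for the two multi-project solutions `-maxcpucount` alone accounts for nearly all of the gain.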
Takeaways
Compilation can be parallelized to a certain degree, resulting in significant reductions of compile time. However, parallel compilation is not necessarily turned on by default.
When building solutions with multiple projects using `MSBuild`, ensure that the `-maxcpucount` parameter is set; otherwise, projects will not be built in parallel. Visual Studio appears to build multiple projects in parallel by default, since the build times when building a solution from within Visual Studio are very close to the `MSBuild` results from Table 4.
The `/MP` setting can speed up compile times within a project by compiling multiple compilation units in parallel. It appears that this setting is on by default in `MSBuild`, at least for the ASSIMP solution, and most likely also when building via Visual Studio. To be totally sure, check the Visual Studio setting `Configuration Properties -> C/C++ -> General -> Multi-processor Compilation`.
One last but important point to consider is whether `/MP` can actually compile multiple compilation units in parallel, which might be prevented by high degrees of coupling between parts of the code, by excessive use of templates, and probably by a number of other situations. Avoiding such situations will lead to faster compile times when `/MP` is on, and then, yes, you will benefit from the high number of cores in your developer PC’s CPU.
While it is definitely beneficial to have multiple cores available for parallel compilation, the speed-up did not turn out to be in proportion to the number of cores. I therefore have to hypothesise that it might be better to go for higher single-core speed instead of a hugely higher core count, if you can only choose one. A different approach might be to gradually get rid of dependencies within your code which prevent further speed-ups through parallel compilation. I encourage you to set it as your goal to improve your project’s structure so much that your compile times continue to decline with higher core counts—I wonder if this leads to a great code structure. Please feel free to share your thoughts especially about possible code improvements in the comments section.
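One common way to frame this observation, which was not part of the original measurements, is Amdahl’s law: if a fraction p of the build can run in parallel on N workers, the best possible speed-up is S(N) = 1 / ((1 - p) + p / N). The Python sketch below inverts this formula to estimate p from the ASSIMP row of Table 5 (m=1, MP=1 versus m=24, MP=24). It treats each of the Ryzen’s 24 logical processors as a full worker, which is an assumption: SMT threads share a physical core, and clock speeds also drop under all-core load, so the estimate is rough at best.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Amdahl's law: ideal speed-up with parallel fraction p on n workers."""
    return 1.0 / ((1.0 - p) + p / n)


def parallel_fraction(speedup: float, n: int) -> float:
    """Solve Amdahl's law for p, given a measured speed-up on n workers."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / n)


serial_s = 2 * 60 + 45.45  # ASSIMP, m=1, MP=1 (Table 5)
parallel_s = 18.96         # ASSIMP, m=24, MP=24 (Table 5)
n_workers = 24             # assumption: every logical processor counts as a full worker

measured = serial_s / parallel_s
p = parallel_fraction(measured, n_workers)
print(f"measured speed-up {measured:.2f}x -> parallel fraction p = {p:.3f}")
for n in (4, 8, 12, 24, 48):
    print(f"n = {n:2d}: predicted speed-up {amdahl_speedup(p, n):5.2f}x")
print(f"upper bound for unlimited workers: {1 / (1 - p):.1f}x")
```

Under these assumptions, p comes out at roughly 0.92, which would cap the speed-up at about 13x no matter how many cores are added. Shrinking the serial part, e.g., by removing the dependencies mentioned above, raises that ceiling more effectively than adding cores.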