mm: parallelize deferred struct page initialization within each node
hulk inclusion
category: feature
bugzilla: 13228
CVE: NA

---------------------------

Deferred struct page initialization currently runs one thread per node, but this is a bottleneck during boot on big machines, so use ktask within each pgdatinit thread to parallelize the struct page initialization, allowing the system to take better advantage of its memory bandwidth.

Because the system is not fully up yet and most CPUs are idle, use more than the default maximum number of ktask threads. The kernel doesn't know a given system's memory bandwidth, so it can't compute the most efficient number of threads, and there's some guesswork involved. In testing, a reasonable value turned out to be about a quarter of the CPUs on the node.

__free_pages_core used to increase the zone's managed page count by the number of pages being freed. To accommodate multiple threads, however, account the number of freed pages with an atomic shared across the ktask threads and bump the managed page count with it after ktask is finished.

Test: Boot the machine with deferred struct page init three times.

Machine: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz, 88 CPUs, 503G memory, 2 sockets

kernel                speedup   max time per node (ms)   stdev
baseline (4.15-rc2)                               5860     8.6
ktask                   9.56x                      613    12.4

Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Signed-off-by: Hongbo Yao <yaohongbo@huawei.com>
Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
Tested-by: Hongbo Yao <yaohongbo@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
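The thread-count heuristic and the shared-atomic accounting described in this change can be illustrated with a small userspace sketch. This is not the kernel code: pthreads stand in for ktask, a loop that merely counts PFNs stands in for initializing and freeing struct pages, and struct zone_sketch, init_chunk(), nr_init_threads() and deferred_init_node() are hypothetical names chosen for illustration. It assumes a single contiguous deferred PFN range per node.

```c
/*
 * Userspace sketch (NOT kernel code) of the scheme described above:
 * split one node's deferred PFN range across worker threads, have each
 * worker add the number of pages it "freed" to a shared atomic, and
 * only add that total to the zone's managed page count after every
 * worker has finished. pthreads stand in for ktask; all names here
 * are hypothetical.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

struct zone_sketch {
	unsigned long managed_pages;	/* updated once, after all workers join */
};

struct chunk {
	unsigned long start_pfn, end_pfn;	/* half-open range [start, end) */
	atomic_ulong *nr_freed;			/* shared across all workers */
};

/* Heuristic from the commit message: about a quarter of the node's CPUs. */
static unsigned int nr_init_threads(unsigned int node_cpus)
{
	unsigned int n = node_cpus / 4;

	return n ? n : 1;
}

static void *init_chunk(void *arg)
{
	struct chunk *c = arg;
	unsigned long nr = 0;

	for (unsigned long pfn = c->start_pfn; pfn < c->end_pfn; pfn++)
		nr++;	/* stand-in for initializing and freeing one struct page */

	/* One atomic add per chunk rather than one per page. */
	atomic_fetch_add(c->nr_freed, nr);
	return NULL;
}

/* Returns 0 on success, -1 on allocation or thread-creation failure. */
static int deferred_init_node(struct zone_sketch *zone, unsigned long start_pfn,
			      unsigned long end_pfn, unsigned int node_cpus)
{
	unsigned int nthreads = nr_init_threads(node_cpus);
	unsigned long total = end_pfn - start_pfn;
	unsigned long per = (total + nthreads - 1) / nthreads;	/* round up */
	atomic_ulong nr_freed = 0;
	pthread_t *tids = calloc(nthreads, sizeof(*tids));
	struct chunk *chunks = calloc(nthreads, sizeof(*chunks));
	unsigned int started = 0;
	int ret = 0;

	if (!tids || !chunks) {
		free(tids);
		free(chunks);
		return -1;
	}

	for (unsigned int i = 0; i < nthreads; i++) {
		unsigned long s = start_pfn + i * per;
		unsigned long e = s + per;

		if (s >= end_pfn)
			break;		/* range exhausted (or empty) */
		if (e > end_pfn)
			e = end_pfn;	/* last chunk may be shorter */
		chunks[i] = (struct chunk){ s, e, &nr_freed };
		if (pthread_create(&tids[i], NULL, init_chunk, &chunks[i])) {
			ret = -1;
			break;
		}
		started++;
	}

	for (unsigned int i = 0; i < started; i++)
		pthread_join(tids[i], NULL);

	/* Single update of the zone counter once every worker is done. */
	if (ret == 0)
		zone->managed_pages += atomic_load(&nr_freed);

	free(tids);
	free(chunks);
	return ret;
}
```

Deferring the managed_pages update until after the join means the zone counter is touched once by one thread, while the workers contend only on the shared atomic, and only once per chunk. Assuming the 88 CPUs of the test machine are split evenly across its two nodes, this heuristic would pick 44 / 4 = 11 threads per node.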