Evaluation of an OpenMP Parallelization of Lucas-Kanade on a NUMA-Manycore
Abstract
The Lucas-Kanade algorithm is a well-known optical flow estimator, widely used in image processing for motion detection and object tracking. Like many image processing algorithms, it consists of a series of convolution masks followed by 2×2 linear systems whose solutions are the optical flow vectors. Since each stage of the algorithm is a stencil computation, the overhead of memory accesses is expected to be a serious scalability bottleneck, especially on a NUMA manycore configuration. The objective of this study is therefore to investigate an OpenMP parallelization of the Lucas-Kanade algorithm on a NUMA manycore, including the performance impact of NUMA-aware settings at runtime. Experimental results on a dual-socket Intel Broadwell-E/EP system are provided, together with the corresponding technical discussion.