Large-scale, brute-force N-body simulations with direct particle-particle summation are required to resolve accurately the transport of energy and angular momentum driven by two-body relaxation, as well as the interactions between supermassive black holes and particles of much smaller mass. Accurate direct N-body codes such as NBODY4 or NBODY6 (Aarseth 1999, 2003) are the tools most widely used for such simulations; see also Harfst et al. (2007) for a less complex code variant, which is used for the benchmarks in this paper. Makino (2002) has presented another direct-summation N-body code, which is optimized for a square layout of processors (p is required to be a square number).
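To make the term "direct summation" concrete, the following minimal Python sketch evaluates the gravitational acceleration on every particle by summing over all other particles, at a cost of O(N^2) pairwise terms per force evaluation. It is an illustration only and does not reproduce the scheme of any of the codes cited above; the function name `direct_accelerations`, the `softening` parameter, and the choice of units with G = 1 are assumptions made for this example.

```python
import math


def direct_accelerations(masses, positions, softening=0.0, G=1.0):
    """Brute-force O(N^2) gravitational accelerations.

    masses    : list of N floats
    positions : list of N [x, y, z] lists
    softening : hypothetical softening length (0.0 gives pure Newtonian gravity)
    G         : gravitational constant (1.0 assumes N-body units)
    """
    n = len(masses)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    eps2 = softening * softening

    for i in range(n):
        xi, yi, zi = positions[i]
        for j in range(n):
            if i == j:
                continue  # a particle exerts no force on itself
            dx = positions[j][0] - xi
            dy = positions[j][1] - yi
            dz = positions[j][2] - zi
            r2 = dx * dx + dy * dy + dz * dz + eps2
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            # Contribution of particle j to the acceleration of particle i
            acc[i][0] += G * masses[j] * dx * inv_r3
            acc[i][1] += G * masses[j] * dy * inv_r3
            acc[i][2] += G * masses[j] * dz * inv_r3
    return acc


# Example: one massive particle and two light ones, loosely mimicking
# a black hole surrounded by stars of much smaller mass.
example_masses = [1.0e6, 1.0, 1.0]
example_positions = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
example_acc = direct_accelerations(example_masses, example_positions)
```

Because every pair is evaluated explicitly, no force contribution is approximated, which is why the cost grows quadratically with the particle number and why parallel or special-purpose hardware becomes attractive for large N.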