In this paper, a new method based on a genetic algorithm and the Minkowski Island fractal is proposed for multiband antennas. Three antenna configurations are chosen to validate the proposed optimization procedure. The first configuration is a wide-band antenna operating in the WLAN (wireless local area network) UNII-2C band. The second configuration is a dual-band antenna operating in the WLAN UNII-2 and UNII-2C bands, while the third is a tri-band antenna operating in the UNII-2, UNII-2C, and UNII-3 bands. The optimization process is accelerated by using the Computer Simulation Technology (CST) Application Programming Interface, which allows all genetic operators to be performed in MATLAB while the numerical calculations run in the internal CST Finite-Difference Time-Domain (FDTD) solver using parallel computing with GPU acceleration. All three configurations are manufactured on a $0.8\;\text{mm}$ thick FR4 epoxy substrate with a relative dielectric constant of $4.8$. The measured return loss and radiation patterns agree well with the simulation results. Further, the presented methodology can be very effective in terms of size reduction; the designed antennas measure $24 \times 24 \times 0.8\;\text{mm}^3$ ($460\;\text{mm}^3$).
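As a rough illustration of the optimization loop described above, the following Python sketch runs a genetic algorithm over a bitstring genome. The `simulate_s11` surrogate is a hypothetical stand-in for the CST FDTD call that would mesh and evaluate the encoded Minkowski-island geometry; everything else (population size, operators, encoding) is an assumption for illustration only.

```python
import random

POP, GENS, BITS = 30, 50, 64           # population, generations, genome length

def simulate_s11(genome):
    # Toy surrogate standing in for the full-wave FDTD call: it rewards
    # genomes matching an arbitrary pattern. In the real loop this would
    # build the fractal geometry encoded by `genome` and return the worst
    # in-band S11 in dB over the target UNII band(s).
    target = [i % 2 for i in range(BITS)]
    return -0.5 * sum(g == t for g, t in zip(genome, target))

def fitness(genome):
    return -simulate_s11(genome)       # deeper (more negative) S11 is better

def crossover(a, b):
    cut = random.randrange(1, BITS)    # single-point crossover
    return a[:cut] + b[cut:]

def mutate(g, rate=0.02):
    return [1 - bit if random.random() < rate else bit for bit in g]

pop = [[random.randint(0, 1) for _ in range(BITS)] for _ in range(POP)]
for _ in range(GENS):
    pop.sort(key=fitness, reverse=True)          # rank by fitness
    elite = pop[: POP // 2]                      # keep the better half
    pop = elite + [mutate(crossover(*random.sample(elite, 2)))
                   for _ in range(POP - len(elite))]
print("best surrogate S11:", simulate_s11(max(pop, key=fitness)), "dB")
```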
A Monte Carlo algorithm simulates some prescribed number of samples, taking some random real time to complete the necessary computations. This work considers the converse: to impose a real-time budget on the computation, so that the number of samples simulated becomes random. To complicate matters, the real time taken for each simulation may depend on the sample produced, so that the samples themselves are not independent of their number, and a length bias with respect to compute time is apparent. This is especially problematic when a Markov chain Monte Carlo (MCMC) algorithm is used and the final state of the Markov chain, rather than an average over all states, is required, as is the case in parallel tempering implementations of MCMC. The length bias does not diminish with the compute budget in this case. It also occurs in sequential Monte Carlo (SMC) algorithms, which are the focus of this paper. We propose an anytime framework to address the concern, using a continuous-time Markov jump process to study the progress of the computation in real time. We first show that for any MCMC algorithm, the length bias of the final state's distribution due to the imposed real-time computing budget can be eliminated by using a multiple chain construction. The utility of this construction is then demonstrated on a large-scale SMC$ {}^2 $ implementation, using four billion particles distributed across a cluster of 128 graphics processing units on the Amazon EC2 service. The anytime framework imposes a real-time budget on the MCMC move steps within the SMC$ {}^2 $ algorithm, ensuring that all processors are simultaneously ready for the resampling step, demonstrably reducing idleness due to waiting times and providing substantial control over the total compute budget.
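A minimal sketch of the multiple chain idea, under simplifying assumptions (a toy Metropolis kernel for a standard normal target, with an artificial state-dependent compute cost): several chains are updated round-robin under a wall-clock budget, and at the deadline the chain due for the interrupted update is discarded rather than reported, removing its length bias toward fast-to-simulate states.

```python
import math, random, time

def mh_step(x):
    """One Metropolis step for a standard-normal target. The sleep makes
    compute time depend on the state, mimicking the length-bias mechanism."""
    time.sleep(0.001 * (1 + abs(x)))            # state-dependent cost (toy)
    y = x + random.gauss(0.0, 1.0)
    if math.log(random.random()) < (x * x - y * y) / 2.0:
        return y
    return x

def anytime_chains(budget_s, n_chains=4):
    chains = [random.gauss(0.0, 3.0) for _ in range(n_chains)]
    deadline = time.monotonic() + budget_s
    k = 0
    while time.monotonic() < deadline:          # real-time budget
        chains[k % n_chains] = mh_step(chains[k % n_chains])
        k += 1
    # The chain whose update was next/interrupted is length-biased;
    # report only the remaining n_chains - 1 states.
    return [x for i, x in enumerate(chains) if i != k % n_chains]

print(anytime_chains(0.5))
```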
The application of finite-difference methods to initial-value problems, with emphasis on parabolic equations, is considered using the one- and two-dimensional unsteady diffusion equations as model problems. Single-step methods are introduced for ordinary differential equations, and more general explicit and implicit methods are articulated for partial differential equations. Numerical stability analysis is covered using the matrix method, the von Neumann method, and the modified wavenumber method. These numerical methods are also applied to nonlinear convection problems. A brief introduction to numerical methods for hyperbolic equations is provided, and parallel computing is discussed.
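As a concrete instance of the model problem, the sketch below advances the 1D unsteady diffusion equation $u_t = \alpha u_{xx}$ with the explicit forward-time, centered-space (FTCS) scheme; von Neumann analysis bounds the diffusion number $r = \alpha\,\Delta t/\Delta x^2$ by $1/2$ for stability. The grid sizes and initial condition are arbitrary choices for illustration.

```python
import numpy as np

alpha, L, nx, t_end = 1.0, 1.0, 51, 0.05
dx = L / (nx - 1)
dt = 0.4 * dx * dx / alpha          # r = 0.4 < 0.5, so FTCS is stable

x = np.linspace(0.0, L, nx)
u = np.sin(np.pi * x)               # initial condition; u = 0 at both ends

r = alpha * dt / dx**2
t = 0.0
while t < t_end:
    # u_i^{n+1} = u_i^n + r (u_{i+1}^n - 2 u_i^n + u_{i-1}^n)
    u[1:-1] += r * (u[2:] - 2.0 * u[1:-1] + u[:-2])
    t += dt

# Exact solution for this initial condition: exp(-pi^2 alpha t) sin(pi x)
err = np.max(np.abs(u - np.exp(-np.pi**2 * alpha * t) * np.sin(np.pi * x)))
print(f"max error at t={t:.3f}: {err:.2e}")
```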
In the context of a motivating study of dynamic network flow data on a large-scale e-commerce website, we develop Bayesian models for online/sequential analysis, monitoring and adapting to changes reflected in node–node traffic. For large-scale networks, we customize core Bayesian time series analysis methods using dynamic generalized linear models (DGLMs). These are integrated into the context of multivariate networks using the decouple/recouple concept recently introduced in multivariate time series. This method enables flexible dynamic modeling of flows on large-scale networks and exploitation of partial parallelization of the analysis, while maintaining coherence with an over-arching multivariate dynamic flow model. The approach is anchored in a case study of Internet data, with flows of visitors to a commercial news website defining a long time series of node–node counts on over 56,000 node pairs. Central questions include characterizing inherent stochasticity in traffic patterns, understanding node–node interactions, adapting to dynamic changes in flows, and allowing for sensitive monitoring to flag anomalies. The methodology of dynamic network DGLMs applies to many dynamic network flow studies.
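As a much-simplified illustration of sequential learning for a single flow, the sketch below implements a gamma-Poisson filter with discounting, a basic exponential-family analogue of DGLM updating; the paper's decouple/recouple analysis runs far richer models per node pair in parallel and then recouples them. The counts and hyperparameters here are synthetic.

```python
def gamma_poisson_filter(counts, a0=1.0, b0=1.0, delta=0.95):
    """Toy univariate analogue of one flow's model: a conjugate gamma prior
    on the Poisson rate, with a discount factor delta inflating uncertainty
    between steps so the level can adapt to change."""
    a, b = a0, b0
    one_step_means = []
    for y in counts:
        a, b = delta * a, delta * b      # evolve: discount the gamma prior
        one_step_means.append(a / b)     # one-step forecast mean
        a, b = a + y, b + 1.0            # conjugate Poisson update
    return one_step_means

flows = [12, 15, 11, 14, 40, 38, 42, 41]   # synthetic node-pair counts
print(gamma_poisson_filter(flows))          # mean adapts after the jump
```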
Voice conversion aims to change a source speaker's voice to make it sound like that of a target speaker while preserving linguistic information. Despite the rapid advance of voice conversion algorithms in the last decade, most of them are still too complicated to be accessible to the public. With the popularity of mobile devices, especially smartphones, mobile voice conversion applications are highly desirable, so that everyone can enjoy the pleasure of high-quality voice mimicry and people with speech disorders can also potentially benefit from them. Given the limited computing resources on mobile phones, the major concern is the time efficiency of such an application, needed to guarantee a positive user experience. In this paper, we detail the development of a mobile voice conversion system based on the Gaussian mixture model (GMM) and weighted frequency warping methods. We attempt to boost computational efficiency by making the best of the hardware characteristics of today's mobile phones, such as parallel computing on multiple cores and advanced vectorization support. Experimental evaluation results indicate that our system achieves acceptable voice conversion performance while the conversion time for a five-second sentence takes only slightly more than one second on an iPhone 7.
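The classical GMM mapping underlying such systems converts each source frame by a posterior-weighted sum of conditional means, $\hat{y} = \sum_m P(m\mid x)\,[\mu_y^m + \Sigma_{yx}^m (\Sigma_{xx}^m)^{-1}(x - \mu_x^m)]$. A self-contained numpy sketch follows; the model parameters are random stand-ins rather than trained values, and the source covariances are taken as identity to keep the posterior simple.

```python
import numpy as np

rng = np.random.default_rng(0)
M, D = 4, 3                                   # mixtures, feature dimension
w = np.full(M, 1.0 / M)                       # mixture weights
mu_x, mu_y = rng.normal(size=(M, D)), rng.normal(size=(M, D))
sig_xx = np.array([np.eye(D) for _ in range(M)])        # source covariances
sig_yx = np.array([0.5 * np.eye(D) for _ in range(M)])  # cross-covariances

def convert(x):
    # Posterior P(m | x) under unit source covariances.
    logp = np.log(w) - 0.5 * ((x - mu_x) ** 2).sum(axis=1)
    p = np.exp(logp - logp.max())
    p /= p.sum()
    # Posterior-weighted conditional means E[y | x, m].
    return sum(p[m] * (mu_y[m]
                       + sig_yx[m] @ np.linalg.solve(sig_xx[m], x - mu_x[m]))
               for m in range(M))

print(convert(rng.normal(size=D)))            # one converted frame
```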
As multi-core computing is now standard, it seems irresponsible for constraints researchers to ignore its implications. Researchers need to address a number of issues to exploit parallelism, such as: investigating which constraint algorithms are amenable to parallelisation; whether to use shared memory or distributed computation; whether to use static or dynamic decomposition; and how best to exploit portfolios and cooperating search. We review the literature and see that we can sometimes do quite well, some of the time, on some instances, but we are far from a general solution. Yet there seems to be little overall guidance on how best to exploit multi-core computers to speed up constraint solving. We hope at least that this survey will provide useful pointers to future researchers wishing to correct this situation.
We present a JASMIN-based two-dimensional parallel implementation of an adaptive combined preconditioner for the solution of linear problems arising in the finite volume discretisation of one-group and multi-group radiation diffusion equations. We first propose the attribute of patch-correlation for cells of a two-dimensional monolayer piecewise rectangular structured grid without any suspensions, based on the patch hierarchy of JASMIN; we classify and reorder these cells via their attributes, and derive the conversion between cell-permutations. Using two cell-permutations, we then construct parallel incomplete LU factorisation and substitution algorithms to provide our parallel combined-preconditioner GMRES solver, with the help of the default BoomerAMG in the HYPRE library. Numerical results demonstrate that the proposed parallel incomplete LU (ILU) preconditioner is more efficient than its counterpart in the Euclid library, and that the proposed parallel combined-preconditioner GMRES solver is more robust and more efficient than the default BoomerAMG-GMRES solver.
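On a single core, the building blocks of such a solver are readily illustrated with SciPy: `spilu` supplies an incomplete LU factorisation used as a GMRES preconditioner for a model 2D diffusion matrix. This is only a serial analogue of the paper's parallel, JASMIN-based construction; the grid size and ILU parameters are arbitrary.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, gmres, spilu

n = 100                                      # n x n grid, 5-point Laplacian
I = sp.identity(n)
T = sp.diags([-1, 4, -1], [-1, 0, 1], shape=(n, n))
S = sp.diags([-1, -1], [-1, 1], shape=(n, n))
A = (sp.kron(I, T) + sp.kron(S, I)).tocsc()  # 2D diffusion model matrix
b = np.ones(A.shape[0])

ilu = spilu(A, drop_tol=1e-4, fill_factor=10)        # incomplete LU
M = LinearOperator(A.shape, ilu.solve)               # preconditioner ~ A^-1

x, info = gmres(A, b, M=M)                           # preconditioned GMRES
print("info:", info, "residual:", np.linalg.norm(b - A @ x))
```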
This paper presents a GPU-accelerated implementation of the Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) method with an inexact nullspace filtering approach to find eigenvalues in electromagnetics analysis with higher-order FEM. The performance of the proposed approach is verified using the Kepler (Tesla K40c) graphics accelerator, and is compared to the performance of the implementation based on functions from the Intel MKL on the Intel Xeon (E5-2680 v3, 12 threads) central processing unit (CPU) executed in parallel mode. Compared to the CPU reference implementation based on the Intel MKL functions, the proposed GPU-based LOBPCG method with inexact nullspace filtering allowed us to achieve up to 2.9-fold acceleration.
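A CPU-side sketch of the eigensolver at the heart of the paper: SciPy's `lobpcg` computes the smallest eigenpairs of a sparse symmetric operator from a block of initial vectors. The paper's contributions (higher-order FEM matrices, inexact nullspace filtering, and GPU offload) are not reproduced here; the operator below is a simple 1D Laplacian stand-in.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import lobpcg

n = 500
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")

rng = np.random.default_rng(1)
X = rng.normal(size=(n, 4))          # block of 4 initial vectors

# largest=False requests the smallest eigenvalues of the SPD operator.
vals, vecs = lobpcg(A, X, largest=False, tol=1e-6, maxiter=500)
print("smallest eigenvalues:", np.sort(vals))
```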
In this work, we extend Achi Brandt's notion of textbook multigrid efficiency (TME) to massively parallel algorithms. Using a finite element based geometric multigrid implementation, we recall the classical view on TME with experiments for scalar linear equations with constant and varying coefficients as well as linear systems with saddle-point structure. To extend the idea of TME to the parallel setting, we give a new characterization of a work unit (WU) in an architecture-aware fashion by taking into account performance modeling techniques. We illustrate our newly introduced parallel TME measure by large-scale computations, solving problems with up to 200 billion unknowns on a TOP-10 supercomputer.
We present a finite volume based cell-centered method for solving diffusion equations on three-dimensional unstructured grids with a general tensor conductivity. Our main motivation concerns the numerical simulation of the coupling between fluid flows and heat transfer. The corresponding numerical scheme is characterized by cell-centered unknowns and a local stencil; namely, the scheme results in a global sparse diffusion matrix that couples only the cell-centered unknowns. The space discretization relies on the partition of polyhedral cells into sub-cells and of cell faces into sub-faces. It is characterized by the introduction of sub-face normal fluxes and sub-face temperatures as auxiliary unknowns. A sub-cell-based variational formulation of the constitutive Fourier law allows us to construct an explicit approximation of the sub-face normal heat fluxes in terms of the cell-centered temperature and the adjacent sub-face temperatures. The elimination of the sub-face temperatures in terms of the cell-centered temperatures is achieved locally at each node by solving a small sparse linear system. This system is obtained by enforcing the continuity condition of the normal heat flux across each sub-cell interface impinging at the node under consideration. The parallel implementation of the numerical algorithm and its efficiency are described and analyzed. The accuracy and robustness of the proposed finite volume method are assessed by means of various numerical test cases.
The growth of cache size and of the number of processor cores in modern CPUs is a major factor in advancing the computing performance of modern machines. The effect of CPU cache size on performance in multicore computers has, however, attracted little attention in lubrication and engineering analyses. In this study, the effect of cache size on the computational performance of two parallel iterative methods in solving two Reynolds equations is examined. Four computers, with CPU cache sizes from 4 to 40 MB and 4 to 16 processor cores, were used. The sizes of the numerical grid were selected to range from small (256 × 256) to large (2048 × 2048) gridwork tasks. It is found that the size of the CPU cache is a major factor influencing the parallel efficiency of the RBSOR method. On the other hand, the SPSOR method obtains much higher parallel efficiency than the RBSOR for medium-grained tasks, regardless of the size of the CPU cache. The use of the SPSOR can, therefore, provide much better parallel computing performance than the RBSOR in cases with a large number of grid points or in a system with limited CPU cache.
We compare the performance of the very popular, publicly available Tree-GPU code BONSAI with the older particle-(multi)mesh code SUPERBOX. Both codes were run on the same hardware, using GPU acceleration for the force calculation. SUPERBOX is a particle-mesh code with high-resolution sub-grids and a higher-order NGP (nearest grid point) force-calculation scheme. In our research, we aim to demonstrate that the new parallel version of SUPERBOX is capable of high-resolution simulations of the interaction of composite disc-bulge-halo galaxies. We describe the improvement in performance and scalability of SUPERBOX, particularly on the Kepler cluster (NVIDIA K20 GPUs).
This paper presents a parallel Newton-Krylov-Schwarz method for the numerical simulation of unsteady flows at high Reynolds number around a high-speed train under crosswind. With a realistic train geometry, a realistic Reynolds number, and a realistic wind speed, this is a very challenging computational problem. Because of its limited parallel scalability, commercial CFD software is not suitable for supercomputers with a large number of processors. We develop a Newton-Krylov-Schwarz based fully implicit method, and the corresponding parallel software, for the 3D unsteady incompressible Navier-Stokes equations discretized with a stabilized finite element method on very fine unstructured meshes. We test the algorithm and software for flows past a train modeled after China's high-speed train CRH380B, and we also compare our results with those obtained from commercial CFD software. Our algorithm shows very good parallel scalability on a supercomputer with over one thousand processors.
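A much-reduced analogue of the solver strategy: SciPy's `newton_krylov` applies an inexact Newton method with a Krylov linear solver to a nonlinear discretized problem, here a toy 1D reaction-diffusion equation rather than the 3D Navier-Stokes system, and without the Schwarz preconditioning or domain decomposition of the paper.

```python
import numpy as np
from scipy.optimize import newton_krylov

n = 200
h = 1.0 / (n + 1)

def residual(u):
    # Discretized -u'' + u^3 = 1 on (0,1), homogeneous Dirichlet ends.
    up = np.concatenate(([0.0], u))[:-1]     # u_{i-1}
    un = np.concatenate((u, [0.0]))[1:]      # u_{i+1}
    return -(un - 2.0 * u + up) / h**2 + u**3 - 1.0

# Inexact Newton with a Krylov (LGMRES) inner solve.
u = newton_krylov(residual, np.zeros(n), method="lgmres", f_tol=1e-10)
print("residual norm:", np.linalg.norm(residual(u)))
```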
Parallel computing has become an important subject in the field of computer science and has proven to be critical when researching high performance solutions. The evolution of computer architectures (multi-core and many-core) towards a higher number of cores can only confirm that parallelism is the method of choice for speeding up an algorithm. In the last decade, the graphics processing unit, or GPU, has gained an important place in the field of high performance computing (HPC) because of its low cost and massive parallel processing power. Supercomputing has, for the first time, become available to anyone at the price of a desktop computer. In this paper, we survey the concept of parallel computing and especially GPU computing. Achieving efficient parallel algorithms for the GPU is not a trivial task: several technical restrictions must be satisfied in order to achieve the expected performance. Some of these limitations are consequences of the underlying architecture of the GPU and the theoretical models behind it. Our goal is to present a set of theoretical and technical concepts that are often required to understand the GPU and its massive parallelism model. In particular, we show how this new technology can help the field of computational physics, especially when the problem is data-parallel. We present four examples of computational physics problems: n-body, collision detection, the Potts model, and cellular automata simulations. These examples are good representatives of the kinds of problems that are suitable for GPU computing. By understanding the GPU architecture and its massive parallelism programming model, one can overcome many of the technical limitations found along the way, design better GPU-based algorithms for computational physics problems, and achieve speedups of up to two orders of magnitude over sequential implementations.
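The n-body force evaluation mentioned above is the archetypal data-parallel kernel: every pairwise interaction is independent. The numpy sketch below expresses it as bulk array operations, the same structure a GPU kernel would parallelise over threads; the particle data and softening parameter are arbitrary stand-ins.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1024
pos = rng.normal(size=(n, 3))                # positions
mass = rng.uniform(0.5, 1.5, size=n)         # masses (G = 1)
eps2 = 1e-3                                  # softening; also kills i == j

# All pairwise separations r_ij = pos_j - pos_i at once: shape (n, n, 3).
r = pos[None, :, :] - pos[:, None, :]
inv_d3 = (np.sum(r * r, axis=2) + eps2) ** -1.5

# acc_i = sum_j m_j * r_ij / |r_ij|^3 (softened); one independent reduction
# per body, exactly what a GPU thread block would compute.
acc = np.einsum("ij,ijk->ik", mass[None, :] * inv_d3, r)
print(acc.shape)                             # (n, 3)
```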
The aim of this work is to compare a new uncoupled solver for the cardiac Bidomain model with a usual coupled solver. The Bidomain model describes the bioelectric activity of cardiac tissue and consists of a system of a non-linear parabolic reaction-diffusion partial differential equation (PDE) and an elliptic linear PDE. This system models, at the macroscopic level, the evolution of the transmembrane and extracellular electric potentials of the anisotropic cardiac tissue. The evolution equation is coupled through the non-linear reaction term with a stiff system of ordinary differential equations (ODEs), the so-called membrane model, describing the ionic currents through the cellular membrane. A novel uncoupled solver for the Bidomain system is introduced here, based on solving the parabolic PDE twice and the elliptic PDE once at each time step, and it is compared with a usual coupled solver. Three-dimensional numerical tests have been performed to show that the proposed uncoupled method has the same accuracy as the coupled strategy. Parallel numerical tests on structured meshes have also shown that the uncoupled technique is as scalable as the coupled one. Moreover, the conjugate gradient method preconditioned by Multilevel Hybrid Schwarz preconditioners converges faster for the linear systems deriving from the uncoupled method than for those from the coupled one. Finally, in all parallel numerical tests considered, the proposed uncoupled technique is always about two to three times faster than the coupled approach.
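A heavily reduced 1D cartoon of the uncoupled step structure, under one plausible reading of "parabolic twice, elliptic once" (half parabolic step, elliptic solve, half parabolic step); the conductivities and the cubic "ionic current" below are toy stand-ins, not the paper's anisotropic operators or a physiological membrane model.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

n, dt, si, se = 200, 0.01, 1.0, 2.0          # grid, step, conductivities
L = (sp.diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(n, n)) * (n * n)).tocsc()
I = sp.identity(n, format="csc")

x = np.linspace(0.0, 1.0, n)
v = np.exp(-100.0 * (x - 0.3) ** 2)          # transmembrane potential
ue = np.zeros(n)                             # extracellular potential

def parabolic_half_step(v, ue):
    # Implicit half step of v_t = si*(v + ue)_xx - I_ion(v), with a toy
    # cubic ionic current I_ion(v) = v^3 - v treated explicitly.
    rhs = v + 0.5 * dt * (si * (L @ ue) - (v ** 3 - v))
    return spsolve((I - 0.5 * dt * si * L).tocsc(), rhs)

for _ in range(50):
    v = parabolic_half_step(v, ue)                        # parabolic solve 1
    ue = spsolve(((si + se) * L).tocsc(), -si * (L @ v))  # elliptic solve
    v = parabolic_half_step(v, ue)                        # parabolic solve 2
print(f"v in [{v.min():.3f}, {v.max():.3f}]")
```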
Hydrodynamic dispersion is a key controlling factor of solute transport in heterogeneous porous media. It critically depends on dimensionality: the asymptotic macrodispersion, transverse to the mean velocity direction, vanishes only in 2D and not in 3D. Using classical Gaussian correlated permeability fields with a lognormal distribution of variance $\sigma_Y^2$, the longitudinal and transverse dispersivities are determined numerically as a function of heterogeneity and dimensionality. We show that the transverse macrodispersion steeply increases with $\sigma_Y^2$, underlining the essential role of flow line braiding, a mechanism specific to 3D systems. The transverse macrodispersion remains, however, at least two orders of magnitude smaller than the longitudinal macrodispersion, which increases even more steeply with $\sigma_Y^2$. At moderate to high levels of heterogeneity, the transverse macrodispersion also converges much faster to its asymptotic regime than does the longitudinal macrodispersion. Braiding thus cannot be taken as the sole mechanism responsible for the high longitudinal macrodispersions. It could be either supplemented or superseded by stronger velocity correlations in 3D than in 2D. This assumption is supported by the much larger longitudinal macrodispersions obtained in 3D than in 2D, up to a factor of 7 for $\sigma_Y^2 = 7.56$.
In air foil bearing analysis, the model is usually solved iteratively, due in part to the nonlinearity of the modeling Reynolds equation and the compliance of the bearing surface. The solution procedure requires a nested iteration multiple levels deep, which involves extended solution time and convergence difficulty. In this study, a simple air foil bearing model is used and the compressible-fluid Reynolds equation for modeling gas lubrication is linearized by Newton's method. The discretized equation is solved by one of two parallel iterative methods: the red-black or the strip-partition successive over-relaxation (SOR) method. The parallel programming is conducted using OpenMP on an eight-core workstation. Then, a numerical damping scheme for the film-profile convergence is presented. Finally, a root-finding process is conducted to iteratively attain the eccentricity of the bearing for a given load. It is found that the numerical damping step is crucial, as it allows the use of a larger relaxation factor for a faster rate of convergence. Both parallel SOR methods are easy to implement, and the red-black SOR method exhibits better efficiency in the studied cases. This study presents a parallel computing scheme for analyzing bump-type air foil bearings on today's shared-memory multicore platforms.
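A serial numpy sketch of the red-black SOR iteration on the model problem $u_{xx} + u_{yy} = f$: points of one colour depend only on points of the other colour, so all same-colour updates can proceed in parallel (via OpenMP threads in the paper, via array masking here). The grid size, relaxation factor, and right-hand side are arbitrary; the linearized Reynolds equation would replace the Laplacian stencil.

```python
import numpy as np

n, omega = 128, 1.8
h = 1.0 / (n + 1)
u = np.zeros((n + 2, n + 2))               # includes the boundary ring
f = np.ones((n + 2, n + 2))                # right-hand side

ii, jj = np.indices(u.shape)
interior = (ii > 0) & (ii < n + 1) & (jj > 0) & (jj < n + 1)
red = interior & ((ii + jj) % 2 == 0)
black = interior & ((ii + jj) % 2 == 1)

def sweep(mask):
    # Gauss-Seidel value from the four neighbours; np.roll wrap-around only
    # touches the boundary ring, which is never updated.
    gs = 0.25 * (np.roll(u, 1, 0) + np.roll(u, -1, 0)
                 + np.roll(u, 1, 1) + np.roll(u, -1, 1) - h * h * f)
    u[mask] = (1.0 - omega) * u[mask] + omega * gs[mask]

for _ in range(500):
    sweep(red)      # all red points update together: they see only black
    sweep(black)    # and vice versa
print("u(center) ~", u[n // 2 + 1, n // 2 + 1])
```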
We introduce and study a parallel domain decomposition algorithm for the simulation of blood flow in compliant arteries using a fully-coupled system of nonlinear partial differential equations consisting of a linear elasticity equation and the incompressible Navier-Stokes equations with a resistive outflow boundary condition. The system is discretized with a finite element method on unstructured moving meshes and solved by a Newton-Krylov algorithm preconditioned with an overlapping restricted additive Schwarz method. The resistive outflow boundary condition plays an interesting role in the accuracy of the blood flow simulation and we provide a numerical comparison of its accuracy with the standard pressure type boundary condition. We also discuss the parallel performance of the implicit domain decomposition method for solving the fully coupled nonlinear system on a supercomputer with a few hundred processors.
This paper addresses a combinatorial optimization problem (COP), namely a variant of the (standard) matrix chain product (MCP) problem where the matrices are square and either dense (i.e. full) or lower/upper triangular. Given a matrix chain of length n, we first present a dynamic programming algorithm (DPA) adapted from the well-known standard algorithm and having the same $O(n^3)$ complexity. We then design and analyse two optimal $O(n)$ greedy algorithms leading in general to different optimal solutions, i.e. chain parenthesizations. Afterwards, we establish a comparison between these two algorithms based on the parallel computation of the matrix chain product through intra- and inter-subchain coarse-grain parallelism. Finally, an experimental study illustrates the theoretical parallel performance of the designed algorithms.
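For reference, the standard $O(n^3)$ dynamic program that the paper's DPA adapts, in Python: `dims[i]` and `dims[i+1]` are the row and column counts of matrix i, `m[i][j]` is the minimal scalar-multiplication count for the subchain i..j, and `s` records the optimal split points. The paper's variant additionally accounts for triangular operands; this sketch covers only the classical dense case.

```python
def matrix_chain(dims):
    """dims has length n+1; matrix i is dims[i] x dims[i+1]."""
    n = len(dims) - 1
    m = [[0] * n for _ in range(n)]
    s = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):           # subchain length
        for i in range(n - length + 1):
            j = i + length - 1
            m[i][j] = float("inf")
            for k in range(i, j):            # try every split point
                cost = (m[i][k] + m[k + 1][j]
                        + dims[i] * dims[k + 1] * dims[j + 1])
                if cost < m[i][j]:
                    m[i][j], s[i][j] = cost, k

    def paren(i, j):                         # recover the parenthesization
        if i == j:
            return f"A{i}"
        return f"({paren(i, s[i][j])}{paren(s[i][j] + 1, j)})"

    return m[0][n - 1], paren(0, n - 1)

# Classic textbook instance: optimal cost is 15125 scalar multiplications.
print(matrix_chain([30, 35, 15, 5, 10, 20, 25]))
```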