In this paper, a new method based on a genetic algorithm and the Minkowski Island fractal is proposed for multiband antennas. Three antenna configurations are chosen to validate the proposed optimization procedure. The first configuration is a wide-band antenna operating in the WLAN (wireless local area network) UNII-2C band. The second configuration is a dual-band antenna operating in the WLAN UNII-2 and UNII-2C bands, while the third is a tri-band antenna operating in the UNII-2, UNII-2C, and UNII-3 bands. The optimization process is accelerated by using the Computer Simulation Technology (CST) Application Programming Interface, which allows all genetic operators to be performed in MATLAB while the numerical calculations run in the internal CST Finite-Difference Time-Domain (FDTD) solver using parallel computing with GPU acceleration. All three configurations are manufactured on a $0.8\;\text{mm}$ thick FR4 epoxy substrate with a relative dielectric constant of $4.8$. The measured return loss and radiation patterns agree well with the simulation results. Further, the presented methodology can be very effective in terms of size reduction; the designed antennas measure $24 \times 24 \times 0.8\;\text{mm}^3$ ($460\;\text{mm}^3$).
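As a rough illustration of the optimization loop described above, the following Python sketch runs a genetic algorithm over a bitstring genome. The `simulate_s11` surrogate is a hypothetical stand-in for the CST FDTD call that would mesh and evaluate the encoded Minkowski-island geometry; everything else (population size, operators, encoding) is an assumption for illustration only.

```python
import random

POP, GENS, BITS = 30, 50, 64           # population, generations, genome length

def simulate_s11(genome):
    # Toy surrogate standing in for the full-wave FDTD call: it rewards
    # genomes matching an arbitrary pattern. In the real loop this would
    # build the fractal geometry encoded by `genome` and return the worst
    # in-band S11 in dB over the target UNII band(s).
    target = [i % 2 for i in range(BITS)]
    return -0.5 * sum(g == t for g, t in zip(genome, target))

def fitness(genome):
    return -simulate_s11(genome)       # deeper (more negative) S11 is better

def crossover(a, b):
    cut = random.randrange(1, BITS)    # single-point crossover
    return a[:cut] + b[cut:]

def mutate(g, rate=0.02):
    return [1 - bit if random.random() < rate else bit for bit in g]

pop = [[random.randint(0, 1) for _ in range(BITS)] for _ in range(POP)]
for _ in range(GENS):
    pop.sort(key=fitness, reverse=True)          # rank by fitness
    elite = pop[: POP // 2]                      # keep the better half
    pop = elite + [mutate(crossover(*random.sample(elite, 2)))
                   for _ in range(POP - len(elite))]
print("best surrogate S11:", simulate_s11(max(pop, key=fitness)), "dB")
```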
A Monte Carlo algorithm simulates some prescribed number of samples, taking some random real time to complete the necessary computations. This work considers the converse: to impose a real-time budget on the computation, so that the number of samples simulated becomes random. To complicate matters, the real time taken for each simulation may depend on the sample produced, so that the samples themselves are not independent of their number, and a length bias with respect to compute time is apparent. This is especially problematic when a Markov chain Monte Carlo (MCMC) algorithm is used and the final state of the Markov chain, rather than an average over all states, is required, as is the case in parallel tempering implementations of MCMC. The length bias does not diminish with the compute budget in this case. It also occurs in sequential Monte Carlo (SMC) algorithms, which are the focus of this paper. We propose an anytime framework to address the concern, using a continuous-time Markov jump process to study the progress of the computation in real time. We first show that for any MCMC algorithm, the length bias of the final state's distribution due to the imposed real-time computing budget can be eliminated by using a multiple chain construction. The utility of this construction is then demonstrated on a large-scale SMC$ {}^2 $ implementation, using four billion particles distributed across a cluster of 128 graphics processing units on the Amazon EC2 service. The anytime framework imposes a real-time budget on the MCMC move steps within the SMC$ {}^2 $ algorithm, ensuring that all processors are simultaneously ready for the resampling step, demonstrably reducing idleness due to waiting times and providing substantial control over the total compute budget.
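A minimal sketch of the multiple chain idea, under simplifying assumptions (a toy Metropolis kernel for a standard normal target, with an artificial state-dependent compute cost): several chains are updated round-robin under a wall-clock budget, and at the deadline the chain due for the interrupted update is discarded rather than reported, removing its length bias toward fast-to-simulate states.

```python
import math, random, time

def mh_step(x):
    """One Metropolis step for a standard-normal target. The sleep makes
    compute time depend on the state, mimicking the length-bias mechanism."""
    time.sleep(0.001 * (1 + abs(x)))            # state-dependent cost (toy)
    y = x + random.gauss(0.0, 1.0)
    if math.log(random.random()) < (x * x - y * y) / 2.0:
        return y
    return x

def anytime_chains(budget_s, n_chains=4):
    chains = [random.gauss(0.0, 3.0) for _ in range(n_chains)]
    deadline = time.monotonic() + budget_s
    k = 0
    while time.monotonic() < deadline:          # real-time budget
        chains[k % n_chains] = mh_step(chains[k % n_chains])
        k += 1
    # The chain whose update was next/interrupted is length-biased;
    # report only the remaining n_chains - 1 states.
    return [x for i, x in enumerate(chains) if i != k % n_chains]

print(anytime_chains(0.5))
```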
The application of finite-difference methods to initial-value problems, with emphasis on parabolic equations, is considered using the one- and two-dimensional unsteady diffusion equations as model problems. Single-step methods are introduced for ordinary differential equations, and more general explicit and implicit methods are articulated for partial differential equations. Numerical stability analysis is covered using the matrix method, the von Neumann method, and the modified wavenumber method. These numerical methods are also applied to nonlinear convection problems. A brief introduction to numerical methods for hyperbolic equations is provided, and parallel computing is discussed.
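As a concrete instance of the model problem, the sketch below advances the 1D unsteady diffusion equation $u_t = \alpha u_{xx}$ with the explicit forward-time, centered-space (FTCS) scheme; von Neumann analysis bounds the diffusion number $r = \alpha\,\Delta t/\Delta x^2$ by $1/2$ for stability. The grid sizes and initial condition are arbitrary choices for illustration.

```python
import numpy as np

alpha, L, nx, t_end = 1.0, 1.0, 51, 0.05
dx = L / (nx - 1)
dt = 0.4 * dx * dx / alpha          # r = 0.4 < 0.5, so FTCS is stable

x = np.linspace(0.0, L, nx)
u = np.sin(np.pi * x)               # initial condition; u = 0 at both ends

r = alpha * dt / dx**2
t = 0.0
while t < t_end:
    # u_i^{n+1} = u_i^n + r (u_{i+1}^n - 2 u_i^n + u_{i-1}^n)
    u[1:-1] += r * (u[2:] - 2.0 * u[1:-1] + u[:-2])
    t += dt

# Exact solution for this initial condition: exp(-pi^2 alpha t) sin(pi x)
err = np.max(np.abs(u - np.exp(-np.pi**2 * alpha * t) * np.sin(np.pi * x)))
print(f"max error at t={t:.3f}: {err:.2e}")
```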
In the context of a motivating study of dynamic network flow data on a large-scale e-commerce website, we develop Bayesian models for online/sequential analysis, monitoring and adapting to changes reflected in node–node traffic. For large-scale networks, we customize core Bayesian time series analysis methods using dynamic generalized linear models (DGLMs). These are integrated into the context of multivariate networks using the decouple/recouple concept recently introduced in multivariate time series. This method enables flexible dynamic modeling of flows on large-scale networks and exploitation of partial parallelization of the analysis, while maintaining coherence with an over-arching multivariate dynamic flow model. The approach is anchored in a case study of Internet data, with flows of visitors to a commercial news website defining a long time series of node–node counts on over 56,000 node pairs. Central questions include characterizing inherent stochasticity in traffic patterns, understanding node–node interactions, adapting to dynamic changes in flows, and allowing for sensitive monitoring to flag anomalies. The methodology of dynamic network DGLMs applies to many dynamic network flow studies.
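As a much-simplified illustration of sequential learning for a single flow, the sketch below implements a gamma-Poisson filter with discounting, a basic exponential-family analogue of DGLM updating; the paper's decouple/recouple analysis runs far richer models per node pair in parallel and then recouples them. The counts and hyperparameters here are synthetic.

```python
def gamma_poisson_filter(counts, a0=1.0, b0=1.0, delta=0.95):
    """Toy univariate analogue of one flow's model: a conjugate gamma prior
    on the Poisson rate, with a discount factor delta inflating uncertainty
    between steps so the level can adapt to change."""
    a, b = a0, b0
    one_step_means = []
    for y in counts:
        a, b = delta * a, delta * b      # evolve: discount the gamma prior
        one_step_means.append(a / b)     # one-step forecast mean
        a, b = a + y, b + 1.0            # conjugate Poisson update
    return one_step_means

flows = [12, 15, 11, 14, 40, 38, 42, 41]   # synthetic node-pair counts
print(gamma_poisson_filter(flows))          # mean adapts after the jump
```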
Voice conversion aims to change a source speaker's voice to make it sound like that of a target speaker while preserving linguistic information. Despite the rapid advance of voice conversion algorithms in the last decade, most of them are still too complicated to be accessible to the public. With the popularity of mobile devices, especially smartphones, mobile voice conversion applications are highly desirable, so that everyone can enjoy the pleasure of high-quality voice mimicry and people with speech disorders can also potentially benefit from them. Given the limited computing resources on mobile phones, the major concern is the time efficiency of such an application, needed to guarantee a positive user experience. In this paper, we detail the development of a mobile voice conversion system based on the Gaussian mixture model (GMM) and weighted frequency warping methods. We attempt to boost computational efficiency by making the best of the hardware characteristics of today's mobile phones, such as parallel computing on multiple cores and advanced vectorization support. Experimental evaluation results indicate that our system achieves acceptable voice conversion performance while the conversion time for a five-second sentence takes only slightly more than one second on an iPhone 7.
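The classical GMM mapping underlying such systems converts each source frame by a posterior-weighted sum of conditional means, $\hat{y} = \sum_m P(m\mid x)\,[\mu_y^m + \Sigma_{yx}^m (\Sigma_{xx}^m)^{-1}(x - \mu_x^m)]$. A self-contained numpy sketch follows; the model parameters are random stand-ins rather than trained values, and the source covariances are taken as identity to keep the posterior simple.

```python
import numpy as np

rng = np.random.default_rng(0)
M, D = 4, 3                                   # mixtures, feature dimension
w = np.full(M, 1.0 / M)                       # mixture weights
mu_x, mu_y = rng.normal(size=(M, D)), rng.normal(size=(M, D))
sig_xx = np.array([np.eye(D) for _ in range(M)])        # source covariances
sig_yx = np.array([0.5 * np.eye(D) for _ in range(M)])  # cross-covariances

def convert(x):
    # Posterior P(m | x) under unit source covariances.
    logp = np.log(w) - 0.5 * ((x - mu_x) ** 2).sum(axis=1)
    p = np.exp(logp - logp.max())
    p /= p.sum()
    # Posterior-weighted conditional means E[y | x, m].
    return sum(p[m] * (mu_y[m]
                       + sig_yx[m] @ np.linalg.solve(sig_xx[m], x - mu_x[m]))
               for m in range(M))

print(convert(rng.normal(size=D)))            # one converted frame
```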
As multi-core computing is now standard, it seems irresponsible for constraints researchers to ignore its implications. Researchers need to address a number of issues to exploit parallelism, such as: investigating which constraint algorithms are amenable to parallelisation; whether to use shared memory or distributed computation; whether to use static or dynamic decomposition; and how best to exploit portfolios and cooperating search. We review the literature and see that we can sometimes do quite well, some of the time, on some instances, but we are far from a general solution. Yet there seems to be little overall guidance on how best to exploit multi-core computers to speed up constraint solving. We hope at least that this survey will provide useful pointers to future researchers wishing to correct this situation.
We present a JASMIN-based two-dimensional parallel implementation of an adaptive combined preconditioner for the solution of linear problems arising in the finite volume discretisation of one-group and multi-group radiation diffusion equations. We first propose the attribute of patch-correlation for cells of a two-dimensional monolayer piecewise rectangular structured grid without any suspensions, based on the patch hierarchy of JASMIN; we classify and reorder these cells via their attributes, and derive the conversion between cell-permutations. Using two cell-permutations, we then construct parallel incomplete LU factorisation and substitution algorithms to provide our parallel combined-preconditioner GMRES solver, with the help of the default BoomerAMG in the HYPRE library. Numerical results demonstrate that the proposed parallel incomplete LU (ILU) preconditioner is more efficient than its counterpart in the Euclid library, and that the proposed parallel combined-preconditioner GMRES solver is more robust and more efficient than the default BoomerAMG-GMRES solver.
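On a single core, the building blocks of such a solver are readily illustrated with SciPy: `spilu` supplies an incomplete LU factorisation used as a GMRES preconditioner for a model 2D diffusion matrix. This is only a serial analogue of the paper's parallel, JASMIN-based construction; the grid size and ILU parameters are arbitrary.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, gmres, spilu

n = 100                                      # n x n grid, 5-point Laplacian
I = sp.identity(n)
T = sp.diags([-1, 4, -1], [-1, 0, 1], shape=(n, n))
S = sp.diags([-1, -1], [-1, 1], shape=(n, n))
A = (sp.kron(I, T) + sp.kron(S, I)).tocsc()  # 2D diffusion model matrix
b = np.ones(A.shape[0])

ilu = spilu(A, drop_tol=1e-4, fill_factor=10)        # incomplete LU
M = LinearOperator(A.shape, ilu.solve)               # preconditioner ~ A^-1

x, info = gmres(A, b, M=M)                           # preconditioned GMRES
print("info:", info, "residual:", np.linalg.norm(b - A @ x))
```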
This paper presents a GPU-accelerated implementation of the Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) method with an inexact nullspace filtering approach to find eigenvalues in electromagnetics analysis with higher-order FEM. The performance of the proposed approach is verified using the Kepler (Tesla K40c) graphics accelerator, and is compared to the performance of the implementation based on functions from the Intel MKL on the Intel Xeon (E5-2680 v3, 12 threads) central processing unit (CPU) executed in parallel mode. Compared to the CPU reference implementation based on the Intel MKL functions, the proposed GPU-based LOBPCG method with inexact nullspace filtering allowed us to achieve up to 2.9-fold acceleration.
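A CPU-side sketch of the eigensolver at the heart of the paper: SciPy's `lobpcg` computes the smallest eigenpairs of a sparse symmetric operator from a block of initial vectors. The paper's contributions (higher-order FEM matrices, inexact nullspace filtering, and GPU offload) are not reproduced here; the operator below is a simple 1D Laplacian stand-in.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import lobpcg

n = 500
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")

rng = np.random.default_rng(1)
X = rng.normal(size=(n, 4))          # block of 4 initial vectors

# largest=False requests the smallest eigenvalues of the SPD operator.
vals, vecs = lobpcg(A, X, largest=False, tol=1e-6, maxiter=500)
print("smallest eigenvalues:", np.sort(vals))
```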
In this work, we extend Achi Brandt's notion of textbook multigrid efficiency (TME) to massively parallel algorithms. Using a finite element based geometric multigrid implementation, we recall the classical view on TME with experiments for scalar linear equations with constant and varying coefficients as well as linear systems with saddle-point structure. To extend the idea of TME to the parallel setting, we give a new characterization of a work unit (WU) in an architecture-aware fashion by taking into account performance modeling techniques. We illustrate our newly introduced parallel TME measure by large-scale computations, solving problems with up to 200 billion unknowns on a TOP-10 supercomputer.
We present a finite volume based cell-centered method for solving diffusion equations on three-dimensional unstructured grids with a general tensor conductivity. Our main motivation concerns the numerical simulation of the coupling between fluid flows and heat transfer. The corresponding numerical scheme is characterized by cell-centered unknowns and a local stencil; namely, the scheme results in a global sparse diffusion matrix that couples only the cell-centered unknowns. The space discretization relies on the partition of polyhedral cells into sub-cells and of cell faces into sub-faces. It is characterized by the introduction of sub-face normal fluxes and sub-face temperatures as auxiliary unknowns. A sub-cell-based variational formulation of the constitutive Fourier law allows us to construct an explicit approximation of the sub-face normal heat fluxes in terms of the cell-centered temperature and the adjacent sub-face temperatures. The elimination of the sub-face temperatures in terms of the cell-centered temperatures is achieved locally at each node by solving a small sparse linear system. This system is obtained by enforcing the continuity condition of the normal heat flux across each sub-cell interface impinging at the node under consideration. The parallel implementation of the numerical algorithm and its efficiency are described and analyzed. The accuracy and robustness of the proposed finite volume method are assessed by means of various numerical test cases.
The growth of cache size and of the number of processor cores in modern CPUs is a major factor in advancing the computing performance of modern machines. The effect of CPU cache size on performance in multicore computers has, however, attracted little attention in lubrication and engineering analyses. In this study, the effect of cache size on the computational performance of two parallel iterative methods in solving two Reynolds equations is examined. Four computers, with CPU cache sizes from 4 to 40 MB and 4 to 16 processor cores, were used. The sizes of the numerical grid were selected to range from small (256 × 256) to large (2048 × 2048) gridwork tasks. It is found that the size of the CPU cache is a major factor influencing the parallel efficiency of the RBSOR method. On the other hand, the SPSOR method obtains much higher parallel efficiency than the RBSOR for medium-grained tasks, regardless of the size of the CPU cache. The use of the SPSOR can, therefore, provide much better parallel computing performance than the RBSOR in cases with a large number of grid points or in a system with limited CPU cache.
We compare the performance of the very popular, publicly available Tree-GPU code BONSAI with the older particle-(multi)mesh code SUPERBOX. Both codes were run on the same hardware, using GPU acceleration for the force calculation. SUPERBOX is a particle-mesh code with high-resolution sub-grids and a higher-order NGP (nearest grid point) force-calculation scheme. In our research, we aim to demonstrate that the new parallel version of SUPERBOX is capable of high-resolution simulations of the interaction of composite disc-bulge-halo galaxies. We describe the improvement in performance and scalability of SUPERBOX, particularly on the Kepler cluster (NVIDIA K20 GPUs).
This paper presents a parallel Newton-Krylov-Schwarz method for the numerical simulation of unsteady flows at high Reynolds number around a high-speed train under crosswind. With a realistic train geometry, a realistic Reynolds number, and a realistic wind speed, this is a very challenging computational problem. Because of its limited parallel scalability, commercial CFD software is not suitable for supercomputers with a large number of processors. We develop a Newton-Krylov-Schwarz based fully implicit method, and the corresponding parallel software, for the 3D unsteady incompressible Navier-Stokes equations discretized with a stabilized finite element method on very fine unstructured meshes. We test the algorithm and software for flows past a train modeled after China's high-speed train CRH380B, and we also compare our results with those obtained from commercial CFD software. Our algorithm shows very good parallel scalability on a supercomputer with over one thousand processors.
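A much-reduced analogue of the solver strategy: SciPy's `newton_krylov` applies an inexact Newton method with a Krylov linear solver to a nonlinear discretized problem, here a toy 1D reaction-diffusion equation rather than the 3D Navier-Stokes system, and without the Schwarz preconditioning or domain decomposition of the paper.

```python
import numpy as np
from scipy.optimize import newton_krylov

n = 200
h = 1.0 / (n + 1)

def residual(u):
    # Discretized -u'' + u^3 = 1 on (0,1), homogeneous Dirichlet ends.
    up = np.concatenate(([0.0], u))[:-1]     # u_{i-1}
    un = np.concatenate((u, [0.0]))[1:]      # u_{i+1}
    return -(un - 2.0 * u + up) / h**2 + u**3 - 1.0

# Inexact Newton with a Krylov (LGMRES) inner solve.
u = newton_krylov(residual, np.zeros(n), method="lgmres", f_tol=1e-10)
print("residual norm:", np.linalg.norm(residual(u)))
```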
Parallel computing has become an important subject in the field of computer science and has proven to be critical when researching high performance solutions. The evolution of computer architectures (multi-core and many-core) towards a higher number of cores can only confirm that parallelism is the method of choice for speeding up an algorithm. In the last decade, the graphics processing unit, or GPU, has gained an important place in the field of high performance computing (HPC) because of its low cost and massive parallel processing power. Supercomputing has, for the first time, become available to anyone at the price of a desktop computer. In this paper, we survey the concept of parallel computing and especially GPU computing. Achieving efficient parallel algorithms for the GPU is not a trivial task: several technical restrictions must be satisfied in order to achieve the expected performance. Some of these limitations are consequences of the underlying architecture of the GPU and the theoretical models behind it. Our goal is to present a set of theoretical and technical concepts that are often required to understand the GPU and its massive parallelism model. In particular, we show how this new technology can help the field of computational physics, especially when the problem is data-parallel. We present four examples of computational physics problems: n-body, collision detection, the Potts model, and cellular automata simulations. These examples are good representatives of the kinds of problems that are suitable for GPU computing. By understanding the GPU architecture and its massive parallelism programming model, one can overcome many of the technical limitations found along the way, design better GPU-based algorithms for computational physics problems, and achieve speedups of up to two orders of magnitude over sequential implementations.
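The n-body force evaluation mentioned above is the archetypal data-parallel kernel: every pairwise interaction is independent. The numpy sketch below expresses it as bulk array operations, the same structure a GPU kernel would parallelise over threads; the particle data and softening parameter are arbitrary stand-ins.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1024
pos = rng.normal(size=(n, 3))                # positions
mass = rng.uniform(0.5, 1.5, size=n)         # masses (G = 1)
eps2 = 1e-3                                  # softening; also kills i == j

# All pairwise separations r_ij = pos_j - pos_i at once: shape (n, n, 3).
r = pos[None, :, :] - pos[:, None, :]
inv_d3 = (np.sum(r * r, axis=2) + eps2) ** -1.5

# acc_i = sum_j m_j * r_ij / |r_ij|^3 (softened); one independent reduction
# per body, exactly what a GPU thread block would compute.
acc = np.einsum("ij,ijk->ik", mass[None, :] * inv_d3, r)
print(acc.shape)                             # (n, 3)
```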
The aim of this work is to compare a new uncoupled solver for the cardiac Bidomain model with a usual coupled solver. The Bidomain model describes the bioelectric activity of cardiac tissue and consists of a system of a non-linear parabolic reaction-diffusion partial differential equation (PDE) and an elliptic linear PDE. This system models, at the macroscopic level, the evolution of the transmembrane and extracellular electric potentials of the anisotropic cardiac tissue. The evolution equation is coupled through the non-linear reaction term with a stiff system of ordinary differential equations (ODEs), the so-called membrane model, describing the ionic currents through the cellular membrane. A novel uncoupled solver for the Bidomain system is introduced here, based on solving the parabolic PDE twice and the elliptic PDE once at each time step, and it is compared with a usual coupled solver. Three-dimensional numerical tests have been performed to show that the proposed uncoupled method has the same accuracy as the coupled strategy. Parallel numerical tests on structured meshes have also shown that the uncoupled technique is as scalable as the coupled one. Moreover, the conjugate gradient method preconditioned by Multilevel Hybrid Schwarz preconditioners converges faster for the linear systems deriving from the uncoupled method than for those from the coupled one. Finally, in all parallel numerical tests considered, the proposed uncoupled technique is always about two to three times faster than the coupled approach.
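A heavily reduced 1D cartoon of the uncoupled step structure, under one plausible reading of "parabolic twice, elliptic once" (half parabolic step, elliptic solve, half parabolic step); the conductivities and the cubic "ionic current" below are toy stand-ins, not the paper's anisotropic operators or a physiological membrane model.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

n, dt, si, se = 200, 0.01, 1.0, 2.0          # grid, step, conductivities
L = (sp.diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(n, n)) * (n * n)).tocsc()
I = sp.identity(n, format="csc")

x = np.linspace(0.0, 1.0, n)
v = np.exp(-100.0 * (x - 0.3) ** 2)          # transmembrane potential
ue = np.zeros(n)                             # extracellular potential

def parabolic_half_step(v, ue):
    # Implicit half step of v_t = si*(v + ue)_xx - I_ion(v), with a toy
    # cubic ionic current I_ion(v) = v^3 - v treated explicitly.
    rhs = v + 0.5 * dt * (si * (L @ ue) - (v ** 3 - v))
    return spsolve((I - 0.5 * dt * si * L).tocsc(), rhs)

for _ in range(50):
    v = parabolic_half_step(v, ue)                        # parabolic solve 1
    ue = spsolve(((si + se) * L).tocsc(), -si * (L @ v))  # elliptic solve
    v = parabolic_half_step(v, ue)                        # parabolic solve 2
print(f"v in [{v.min():.3f}, {v.max():.3f}]")
```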
Hydrodynamic dispersion is a key controlling factor of solute transport in heterogeneous porous media. It critically depends on dimensionality: the asymptotic macrodispersion, transverse to the mean velocity direction, vanishes only in 2D and not in 3D. Using classical Gaussian correlated permeability fields with a lognormal distribution of variance $\sigma_Y^2$, the longitudinal and transverse dispersivities are determined numerically as a function of heterogeneity and dimensionality. We show that the transverse macrodispersion steeply increases with $\sigma_Y^2$, underlining the essential role of flow line braiding, a mechanism specific to 3D systems. The transverse macrodispersion remains, however, at least two orders of magnitude smaller than the longitudinal macrodispersion, which increases even more steeply with $\sigma_Y^2$. At moderate to high levels of heterogeneity, the transverse macrodispersion also converges much faster to its asymptotic regime than does the longitudinal macrodispersion. Braiding thus cannot be taken as the sole mechanism responsible for the high longitudinal macrodispersions. It could be either supplemented or superseded by stronger velocity correlations in 3D than in 2D. This assumption is supported by the much larger longitudinal macrodispersions obtained in 3D than in 2D, up to a factor of 7 for $\sigma_Y^2 = 7.56$.
In air foil bearing analysis, the model is usually solved iteratively, due in part to the nonlinearity of the modeling Reynolds equation and the compliance of the bearing surface. The solution procedure requires a nested iteration multiple levels deep, which involves extended solution time and convergence difficulty. In this study, a simple air foil bearing model is used and the compressible-fluid Reynolds equation for modeling gas lubrication is linearized by Newton's method. The discretized equation is solved by one of two parallel iterative methods: the red-black or the strip-partition successive over-relaxation (SOR) method. The parallel programming is conducted using OpenMP on an eight-core workstation. Then, a numerical damping scheme for the film-profile convergence is presented. Finally, a root-finding process is conducted to iteratively attain the eccentricity of the bearing for a given load. It is found that the numerical damping step is crucial, as it allows the use of a larger relaxation factor for a faster rate of convergence. Both parallel SOR methods are easy to implement, and the red-black SOR method exhibits better efficiency in the studied cases. This study presents a parallel computing scheme for analyzing bump-type air foil bearings on today's shared-memory multicore platforms.
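A serial numpy sketch of the red-black SOR iteration on the model problem $u_{xx} + u_{yy} = f$: points of one colour depend only on points of the other colour, so all same-colour updates can proceed in parallel (via OpenMP threads in the paper, via array masking here). The grid size, relaxation factor, and right-hand side are arbitrary; the linearized Reynolds equation would replace the Laplacian stencil.

```python
import numpy as np

n, omega = 128, 1.8
h = 1.0 / (n + 1)
u = np.zeros((n + 2, n + 2))               # includes the boundary ring
f = np.ones((n + 2, n + 2))                # right-hand side

ii, jj = np.indices(u.shape)
interior = (ii > 0) & (ii < n + 1) & (jj > 0) & (jj < n + 1)
red = interior & ((ii + jj) % 2 == 0)
black = interior & ((ii + jj) % 2 == 1)

def sweep(mask):
    # Gauss-Seidel value from the four neighbours; np.roll wrap-around only
    # touches the boundary ring, which is never updated.
    gs = 0.25 * (np.roll(u, 1, 0) + np.roll(u, -1, 0)
                 + np.roll(u, 1, 1) + np.roll(u, -1, 1) - h * h * f)
    u[mask] = (1.0 - omega) * u[mask] + omega * gs[mask]

for _ in range(500):
    sweep(red)      # all red points update together: they see only black
    sweep(black)    # and vice versa
print("u(center) ~", u[n // 2 + 1, n // 2 + 1])
```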
We introduce and study a parallel domain decomposition algorithm for the simulation of blood flow in compliant arteries using a fully-coupled system of nonlinear partial differential equations consisting of a linear elasticity equation and the incompressible Navier-Stokes equations with a resistive outflow boundary condition. The system is discretized with a finite element method on unstructured moving meshes and solved by a Newton-Krylov algorithm preconditioned with an overlapping restricted additive Schwarz method. The resistive outflow boundary condition plays an interesting role in the accuracy of the blood flow simulation and we provide a numerical comparison of its accuracy with the standard pressure type boundary condition. We also discuss the parallel performance of the implicit domain decomposition method for solving the fully coupled nonlinear system on a supercomputer with a few hundred processors.
This paper addresses a combinatorial optimization problem (COP), namely a variant of the (standard) matrix chain product (MCP) problem where the matrices are square and either dense (i.e. full) or lower/upper triangular. Given a matrix chain of length n, we first present a dynamic programming algorithm (DPA) adapted from the well-known standard algorithm and having the same $O(n^3)$ complexity. We then design and analyse two optimal $O(n)$ greedy algorithms leading in general to different optimal solutions, i.e. chain parenthesizations. Afterwards, we establish a comparison between these two algorithms based on the parallel computation of the matrix chain product through intra- and inter-subchain coarse-grain parallelism. Finally, an experimental study illustrates the theoretical parallel performance of the designed algorithms.
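For reference, the standard $O(n^3)$ dynamic program that the paper's DPA adapts, in Python: `dims[i]` and `dims[i+1]` are the row and column counts of matrix i, `m[i][j]` is the minimal scalar-multiplication count for the subchain i..j, and `s` records the optimal split points. The paper's variant additionally accounts for triangular operands; this sketch covers only the classical dense case.

```python
def matrix_chain(dims):
    """dims has length n+1; matrix i is dims[i] x dims[i+1]."""
    n = len(dims) - 1
    m = [[0] * n for _ in range(n)]
    s = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):           # subchain length
        for i in range(n - length + 1):
            j = i + length - 1
            m[i][j] = float("inf")
            for k in range(i, j):            # try every split point
                cost = (m[i][k] + m[k + 1][j]
                        + dims[i] * dims[k + 1] * dims[j + 1])
                if cost < m[i][j]:
                    m[i][j], s[i][j] = cost, k

    def paren(i, j):                         # recover the parenthesization
        if i == j:
            return f"A{i}"
        return f"({paren(i, s[i][j])}{paren(s[i][j] + 1, j)})"

    return m[0][n - 1], paren(0, n - 1)

# Classic textbook instance: optimal cost is 15125 scalar multiplications.
print(matrix_chain([30, 35, 15, 5, 10, 20, 25]))
```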