Floating-point arithmetic
Sylvie Boldo, Claude-Pierre Jeannerod, Guillaume Melquiond, Jean-Michel Muller
Journal: Acta Numerica / Volume 32 / May 2023
Published online by Cambridge University Press: 11 May 2023, pp. 203-290
Print publication: May 2023

Floating-point numbers have an intuitive meaning when it comes to physics-based numerical computations, and they have thus become the most common way of approximating real numbers in computers. The IEEE-754 Standard has played a large part in making floating-point arithmetic ubiquitous today, by specifying its semantics in a strict yet useful way as early as 1985. In particular, floating-point operations should be performed as if their results were first computed with infinite precision and then rounded to the target format. A consequence is that floating-point arithmetic satisfies the ‘standard model’ that is often used for analysing the accuracy of floating-point algorithms. But that is only scratching the surface, and floating-point arithmetic offers much more.
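
The ‘standard model’ mentioned above is usually stated as fl(a op b) = (a op b)(1 + δ) with |δ| ≤ u, where u is the unit roundoff. The following is a minimal sketch, not taken from the survey, that checks this bound for addition. It assumes that Python floats are IEEE-754 binary64 numbers rounded to nearest (so u = 2^-53); the names `U`, `relative_error` and `worst_relative_error` are illustrative only.

```python
from fractions import Fraction
import random

# Unit roundoff for binary64 with round-to-nearest (assumption: Python's
# float is IEEE-754 binary64, which holds on mainstream platforms).
U = Fraction(1, 2**53)


def relative_error(a: float, b: float) -> Fraction:
    """Exact relative error of the floating-point sum a + b."""
    exact = Fraction(a) + Fraction(b)   # Fraction(float) is exact
    computed = Fraction(a + b)          # the rounded result, as a rational
    if exact == 0:
        return Fraction(0)
    return abs(computed - exact) / abs(exact)


def worst_relative_error(trials: int, seed: int = 0) -> Fraction:
    """Largest relative error observed over random pairs in [-1, 1]."""
    rng = random.Random(seed)
    return max(
        relative_error(rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
        for _ in range(trials)
    )


if __name__ == "__main__":
    worst = worst_relative_error(10_000)
    print("worst observed relative error / u =", float(worst / U))
    print("bound |delta| <= u holds:", worst <= U)
```

Exact rational arithmetic is used for the reference value so that the check itself introduces no rounding.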

In this survey we recall the history of floating-point arithmetic as well as its specification mandated by the IEEE-754 Standard. We also recall what properties it entails and what every programmer should know when designing a floating-point algorithm. We provide various basic building blocks that can be implemented with floating-point arithmetic. In particular, one can actually compute the rounding error caused by some floating-point operations, which paves the way to designing more accurate algorithms. More generally, properties of floating-point arithmetic make it possible to extend the accuracy of computations beyond working precision.
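
As an illustration of computing a rounding error exactly, the sketch below uses the classical TwoSum algorithm (attributed to Knuth and Møller). The abstract does not name a specific algorithm, so this choice is an assumption; it also assumes binary64 round-to-nearest arithmetic with no overflow. The function name `two_sum` is illustrative.

```python
from fractions import Fraction


def two_sum(a: float, b: float) -> tuple[float, float]:
    """Return (s, e) with s = fl(a + b) and e the rounding error of that sum.

    In round-to-nearest arithmetic without overflow, s + e == a + b holds
    exactly, using six floating-point operations and no branches.
    """
    s = a + b
    a_virtual = s - b          # the part of s that came from a
    b_virtual = s - a_virtual  # the part of s that came from b
    a_err = a - a_virtual
    b_err = b - b_virtual
    return s, a_err + b_err


if __name__ == "__main__":
    a, b = 1.0, 1e-16
    s, e = two_sum(a, b)
    # Verify exactness with rational arithmetic rather than floats.
    print(Fraction(s) + Fraction(e) == Fraction(a) + Fraction(b))
```

Carrying the pair (s, e) through later operations is the basic idea behind compensated summation and double-word arithmetic, which extend accuracy beyond the working precision.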

Numerical Effects of the Gaussian Recursive Filters in Solving Linear Systems in the 3Dvar Case Study
Salvatore Cuomo, Ardelio Galletti, Giulio Giunta, Livia Marcellino
Journal: Numerical Mathematics: Theory, Methods and Applications / Volume 10 / Issue 3 / August 2017
Published online by Cambridge University Press: 20 June 2017, pp. 520-540
Print publication: August 2017

In many applications, the Gaussian convolution is computed approximately by means of recursive filters, which significantly improves computational efficiency. We are interested in theoretical and numerical issues related to such a use of recursive filters in a three-dimensional variational data assimilation (3Dvar) scheme as it appears in the software OceanVar. In that context, the main numerical problem consists in solving large linear systems with high efficiency, so an iterative solver, namely the conjugate gradient method, is equipped with a recursive filter in order to compute matrix-vector multiplications that are in fact Gaussian convolutions. Here we present an error analysis that gives effective bounds for the perturbation on the solution of such linear systems when it is computed by means of recursive filters. We first prove that such a solution can be seen as the exact solution of a perturbed linear system. Then we study the related perturbation on the solution and we demonstrate that it can be bounded in terms of the difference between the two linear operators associated with the Gaussian convolution and the recursive filter, respectively. Moreover, we show through numerical experiments that the error on the solution exhibits a kind of edge effect, i.e. most of the error is localized in the first and last few entries of the computed solution, and that this effect is due to the structure of the difference of the two linear operators.
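
The abstract does not state the bound itself. As a hedged illustration of the argument it outlines, the classical perturbation identity reads as follows, where the symbols are assumptions introduced here: A is the exact operator (with Gaussian convolution), Ã = A + E the operator obtained with the recursive filter, x the exact solution and x̃ the computed one.

```latex
% Assumed notation (not from the paper): A x = b is the exact system,
% (A + E) \tilde{x} = b is the perturbed system solved in practice.
\begin{align*}
  A\,(x - \tilde{x}) &= b - A\tilde{x} = (A + E)\tilde{x} - A\tilde{x} = E\,\tilde{x}, \\
  x - \tilde{x} &= A^{-1} E\,\tilde{x}, \\
  \frac{\lVert x - \tilde{x} \rVert}{\lVert \tilde{x} \rVert}
    &\le \lVert A^{-1} \rVert \, \lVert E \rVert .
\end{align*}
```

The second line also suggests why the structure of E matters: if E is concentrated in particular rows and columns, the error inherits a related pattern.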

SPEEDUP Code for Calculation of Transition Amplitudes via the Effective Action Approach
Antun Balaž, Ivana Vidanović, Danica Stojiljković, Dušan Vudragović, Aleksandar Belić, Aleksandar Bogojević
Journal: Communications in Computational Physics / Volume 11 / Issue 3 / March 2012
Published online by Cambridge University Press: 20 August 2015, pp. 739-755
Print publication: March 2012

We present a Path Integral Monte Carlo C code for the calculation of quantum mechanical transition amplitudes for 1D models. The SPEEDUP C code is based on the use of higher-order short-time effective actions and is implemented to the maximal order p=18 in the time of propagation (Monte Carlo time step), which substantially improves the convergence of discretized amplitudes to their exact continuum values. Symbolic derivation of higher-order effective actions is implemented in the SPEEDUP Mathematica codes, using the recursive Schrödinger equation approach. In addition to the general 1D quantum theory, the developed Mathematica codes are capable of calculating effective actions for specific models, for general 2D and 3D potentials, as well as for a general many-body theory in an arbitrary number of spatial dimensions.

A Boundary Meshless Method for Solving Heat Transfer Problems Using the Fourier Transform
A. Tadeu, C. S. Chen, J. António, Nuno Simões
Journal: Advances in Applied Mathematics and Mechanics / Volume 3 / Issue 5 / October 2011
Published online by Cambridge University Press: 03 June 2015, pp. 572-585
Print publication: October 2011

The Fourier transform is applied to remove the time-dependent variable in the diffusion equation. Under non-harmonic initial conditions this gives rise to a non-homogeneous Helmholtz equation, which is solved by the method of fundamental solutions and the method of particular solutions. The particular solution of the Helmholtz equation is available as shown in [4, 15]. The approximate solution in the frequency domain is then inverted numerically using the inverse Fourier transform algorithm. Complex frequencies are used in order to avoid aliasing phenomena and to allow the computation of the static response. Two numerical examples are given to illustrate the effectiveness of the proposed approach for solving 2-D diffusion equations.
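
A short worked derivation shows how the transform produces a non-homogeneous Helmholtz equation. This is a sketch under stated assumptions, not the paper's own formulation: the symbols K (thermal diffusivity), T_0 (initial temperature field) and the one-sided transform convention below are introduced here, and the boundary term at infinity is assumed to vanish, which is what a complex frequency with a small damping part ensures.

```latex
% Assumed convention: \hat{T}(x,\omega) = \int_0^\infty T(x,t)\, e^{-i\omega t}\, dt
\begin{align*}
  \frac{\partial T}{\partial t} &= K\,\nabla^2 T, \qquad T(x,0) = T_0(x), \\
  \int_0^\infty \frac{\partial T}{\partial t}\, e^{-i\omega t}\, dt
    &= -T_0(x) + i\omega\,\hat{T}(x,\omega), \\
  \nabla^2 \hat{T} + k^2\,\hat{T} &= -\frac{T_0}{K},
    \qquad k^2 = -\frac{i\omega}{K}.
\end{align*}
```

With T_0 = 0 the right-hand side vanishes and the equation becomes homogeneous; other sign conventions for the transform change the sign of k^2 accordingly.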
