Book contents
- Frontmatter
- Dedication
- Contents
- Figures
- Tables
- Examples
- Preface
- 1 Introduction to GPU Kernels and Hardware
- 2 Thinking and Coding in Parallel
- 3 Warps and Cooperative Groups
- 4 Parallel Stencils
- 5 Textures
- 6 Monte Carlo Applications
- 7 Concurrency Using CUDA Streams and Events
- 8 Application to PET Scanners
- 9 Scaling Up
- 10 Tools for Profiling and Debugging
- 11 Tensor Cores
- Appendix A A Brief History of CUDA
- Appendix B Atomic Operations
- Appendix C The NVCC Compiler
- Appendix D AVX and the Intel Compiler
- Appendix E Number Formats
- Appendix F CUDA Documentation and Libraries
- Appendix G The CX Header Files
- Appendix H AI and Python
- Appendix I Topics in C++
- Index
5 - Textures
Published online by Cambridge University Press: 04 May 2022
Summary
Chapter 5 continues the theme of digital image manipulation and considers transformations such as rotation and scaling, which require pixel interpolation to produce the most accurate final result. The GPU hardware texture units are used for this, and their features are discussed. The cx utilities provided with our code include wrappers that significantly simplify the creation of CUDA textures. Curiously, these hardware texture units are rarely discussed in other CUDA tutorial material for scientific applications, but we find they can give a 5-fold performance boost. We show how OpenCV can be used to provide a simple GUI for viewing the transformed images with very little coding effort. We end the chapter with a fully working 3D image registration program that applies affine transformations to volumetric MRI data sets. The 3D affine transformations are about 1500 times faster on the GPU than on the host CPU, and a full registration between two MRI images of size 256 × 256 × 256 takes about one second.
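To make the interpolation step concrete, the following is a minimal host-side C++ sketch of the kind of bilinear lookup that a texture unit performs in linear-filtering mode, used here to rotate a 2D image by inverse mapping. It is not the chapter's code and does not use the cx wrappers: the names `clamp_index`, `bilinear_fetch` and `rotate_image` are hypothetical, pixel centres are assumed to sit at integer + 0.5, and out-of-range reads are assumed to clamp to the edge. Real texture hardware evaluates the interpolation weights at reduced precision, so its results may differ slightly from this full-precision reference.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: clamp an index into [0, n-1] (edge-clamp addressing assumed).
inline int clamp_index(int i, int n) { return i < 0 ? 0 : (i >= n ? n - 1 : i); }

// Bilinear lookup at (x, y) in pixel units. Pixel centres are assumed to lie
// at integer + 0.5, so (0.5, 0.5) returns img[0] exactly.
inline float bilinear_fetch(const std::vector<float>& img, int nx, int ny,
                            float x, float y)
{
    assert(nx > 0 && ny > 0);
    assert(img.size() == static_cast<std::size_t>(nx) * static_cast<std::size_t>(ny));

    const float xb = x - 0.5f;                    // shift to pixel-centre coordinates
    const float yb = y - 0.5f;
    const int ix = static_cast<int>(std::floor(xb));
    const int iy = static_cast<int>(std::floor(yb));
    const float ax = xb - static_cast<float>(ix); // fractional weights in [0, 1)
    const float ay = yb - static_cast<float>(iy);

    const int x0 = clamp_index(ix, nx), x1 = clamp_index(ix + 1, nx);
    const int y0 = clamp_index(iy, ny), y1 = clamp_index(iy + 1, ny);

    // Interpolate along x on the two neighbouring rows, then along y.
    const float top = (1.0f - ax) * img[y0 * nx + x0] + ax * img[y0 * nx + x1];
    const float bot = (1.0f - ax) * img[y1 * nx + x0] + ax * img[y1 * nx + x1];
    return (1.0f - ay) * top + ay * bot;
}

// Rotate an image about its centre by `angle` radians. Each output pixel is
// mapped back into the source image and sampled with bilinear_fetch.
inline std::vector<float> rotate_image(const std::vector<float>& src,
                                       int nx, int ny, float angle)
{
    std::vector<float> dst(src.size());
    const float c = std::cos(angle), s = std::sin(angle);
    const float cx = 0.5f * static_cast<float>(nx);
    const float cy = 0.5f * static_cast<float>(ny);
    for (int iy = 0; iy < ny; ++iy) {
        for (int ix = 0; ix < nx; ++ix) {
            const float dx = static_cast<float>(ix) + 0.5f - cx; // offset from centre
            const float dy = static_cast<float>(iy) + 0.5f - cy;
            const float sx =  c * dx + s * dy + cx;              // inverse rotation
            const float sy = -s * dx + c * dy + cy;
            dst[iy * nx + ix] = bilinear_fetch(src, nx, ny, sx, sy);
        }
    }
    return dst;
}
```

In a GPU version the two nested loops would typically become one thread per output pixel, and the body of `bilinear_fetch` would be replaced by a single hardware texture fetch, which is where a speed-up of the kind reported in the summary would be expected to come from.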
Type: Chapter
Information: Programming in Parallel with CUDA: A Practical Guide, pp. 142–177
Publisher: Cambridge University Press
Print publication year: 2022