Functional and dynamic programming in the design of parallel prefix networks

Published online by Cambridge University Press:  06 December 2010

MARY SHEERAN*
Affiliation:
CSE Department, Chalmers University of Technology, Göteborg, SE-41296, Sweden (e-mail: ms@chalmers.se)

Abstract

A parallel prefix network of width $n$ takes $n$ inputs, $a_1, a_2, \ldots, a_n$, and computes each $y_i = a_1 \circ a_2 \circ \cdots \circ a_i$ for $1 \le i \le n$, for an associative operator $\circ$. This is one of the fundamental problems in computer science, because it gives insight into how parallel computation can be used to solve an apparently sequential problem. As parallel programming becomes the dominant programming paradigm, parallel prefix or scan is proving to be a very important building block of parallel algorithms and applications. There are many different parallel prefix networks, with different properties such as number of operators, depth and allowed fanout from the operators. In this paper, ideas from functional programming are combined with search to enable a deep exploration of parallel prefix network design. Networks that improve on the best known previous results are generated. It is argued that precise modelling in a functional programming language, together with simple visualization of the networks, gives a new, more experimental, approach to parallel prefix network design, improving on the manual techniques typically employed in the literature. The programming idiom that marries search with higher order functions may well have wider application than the network generation described here.
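The prefix problem stated in the abstract can be illustrated with a minimal sketch. This is not the paper's own code (the paper models networks in a functional language); it is a Python illustration under stated assumptions, and the names `serial_prefix`, `sklansky` and `sklansky_cost` are hypothetical. It contrasts the serial computation with a divide-and-conquer construction in the style of Sklansky's network, and counts operators and depth, two of the properties the abstract mentions.

```python
from itertools import accumulate
from operator import add


def serial_prefix(op, xs):
    """Reference: y_i = a_1 op a_2 op ... op a_i, computed left to right.

    Uses n - 1 operators, but the depth is also n - 1.
    """
    return list(accumulate(xs, op))


def sklansky(op, xs):
    """Divide-and-conquer prefix in the style of Sklansky's construction.

    Assumes `op` is associative. The two halves are independent and
    could be evaluated in parallel; the last output of the left half
    is then combined with every output of the right half.
    """
    n = len(xs)
    if n <= 1:
        return list(xs)
    mid = n // 2
    left = sklansky(op, xs[:mid])
    right = sklansky(op, xs[mid:])
    carry = left[-1]  # a_1 op ... op a_mid
    # Operand order matters when op is not commutative.
    return left + [op(carry, r) for r in right]


def sklansky_cost(n):
    """Return (operators, depth) for the recursion above at width n."""
    if n <= 1:
        return (0, 0)
    mid = n // 2
    left_ops, left_depth = sklansky_cost(mid)
    right_ops, right_depth = sklansky_cost(n - mid)
    # One extra level that applies op to each of the n - mid right outputs.
    return (left_ops + right_ops + (n - mid), max(left_depth, right_depth) + 1)


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6, 7, 8]
    print(serial_prefix(add, data))
    print(sklansky(add, data))
    print(sklansky_cost(8))
```

For width 8 this recursion uses 12 operators at depth 3, whereas the serial version uses 7 operators at depth 7. This is the kind of size against depth trade-off that distinguishes the different prefix networks; the sketch does not model fanout, which is high in this construction.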

Type
Articles
Copyright
Copyright © Cambridge University Press 2010
