A run-time algorithm for managing the granularity of parallel functional programs

Published online by Cambridge University Press: 07 November 2008

Gad Aharoni, Dror G. Feitelson and Amnon Barak
Affiliation: Department of Computer Science, The Hebrew University of Jerusalem, Jerusalem 91904, Israel (e-mail: gadi@cs.huji.ac.il)

Abstract

We present an on-line (run-time) algorithm that manages the granularity of parallel functional programs. The algorithm exploits useful parallelism when it exists, and ignores ineffective parallelism in programs that produce many small tasks. The idea is to balance the amount of local work with the cost of distributing the work. This is achieved by ensuring that for every parallel task spawned, an amount of work equal to the cost of the spawn is performed locally. We analyse several cases and compare the algorithm to the optimal execution. In most cases the algorithm competes well with the optimal algorithm, even though the optimal algorithm has information about the future evolution of the computation that is not available to the on-line algorithm. This is quite remarkable, considering that we have chosen extreme cases whose optimal executions are contradictory. Moreover, we show that no other on-line algorithm can be consistently better than this one. We also present experimental results that demonstrate the effectiveness of the algorithm.
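
The balancing rule stated above (spawn a task only after an equal amount of local work has been done) can be illustrated with a small sequential simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: the class `GranularityController`, the function `pfib`, the constant spawn cost, and the one-unit cost per call are all hypothetical choices made for illustration, and "spawning" is only recorded, not executed on another processor.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GranularityController:
    """Hypothetical sketch of the rule stated in the abstract: before each
    spawn, an amount of local work equal to the spawn cost must have been
    performed. Unit costs here are illustrative assumptions."""
    spawn_cost: float        # assumed constant cost of distributing one task
    credit: float = 0.0      # local work not yet "paid" against a spawn
    total_work: float = 0.0  # all local work recorded so far
    spawned: int = 0         # potential tasks actually distributed
    inlined: int = 0         # potential tasks executed locally instead

    def record_local_work(self, units: float) -> None:
        self.credit += units
        self.total_work += units

    def should_spawn(self) -> bool:
        # Spawn only when accumulated local work covers the cost of the spawn.
        if self.credit >= self.spawn_cost:
            self.credit -= self.spawn_cost
            self.spawned += 1
            return True
        self.inlined += 1
        return False


def pfib(n: int, ctl: GranularityController, remote: List[int]) -> int:
    """Naive Fibonacci used as a fine-grained divide-and-conquer workload.
    Each call is assumed to cost one unit of local work."""
    ctl.record_local_work(1)
    if n < 2:
        return n
    if ctl.should_spawn():
        remote.append(n - 1)  # record that fib(n-1) would be shipped elsewhere
    # In this sequential simulation the child is evaluated here either way;
    # only the spawn/inline bookkeeping differs.
    left = pfib(n - 1, ctl, remote)
    right = pfib(n - 2, ctl, remote)
    return left + right
```

For example, `ctl = GranularityController(spawn_cost=10.0)` followed by `pfib(20, ctl, [])` reports how many of the potential tasks were distributed versus inlined. Because the credit never becomes negative, `ctl.spawned * ctl.spawn_cost` can never exceed `ctl.total_work`: the overhead of distributing work is bounded by the work performed locally, which is the balance the abstract describes.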

Type: Article
Copyright © Cambridge University Press 1992
