Push versus pull-based loop fusion in query engines

Published online by Cambridge University Press: 10 April 2018

AMIR SHAIKHHA
MOHAMMAD DASHTI
CHRISTOPH KOCH
Affiliation:
École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland (e-mails: amir.shaikhha@epfl.ch, mohammad.dashti@epfl.ch, christoph.koch@epfl.ch)

Abstract

Database query engines use pull-based or push-based approaches to avoid materializing data across query operators. In this paper, we study these two types of query engines in depth and present the advantages and limitations of each. Similarly, the programming languages community has developed loop fusion techniques to remove intermediate collections in collection programming. We draw parallels between database (DB) and programming language (PL) research by demonstrating the connection between pipelined query engines and loop fusion techniques. Based on this connection, we propose a new type of pull-based engine, inspired by a loop fusion technique, which combines the benefits of both approaches. We then experimentally evaluate the various engines in the context of query compilation, for the first time in a fair environment that eliminates the biasing impact of ancillary optimizations traditionally used with only one of the approaches. We show that, for realistic analytical workloads, neither form of pipelined query engine has a considerable advantage, contrary to what recent research suggests. Finally, using micro-benchmarks that expose edge cases on which one approach or the other performs better, we show that our proposed engine dominates the existing engines by combining the benefits of both.
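To make the pull/push contrast concrete, here is a small illustrative Python model that runs the same filter-then-sum query three ways. It is not the authors' implementation, which the abstract places in a query compiler. The operator names (`Scan`, `Select`, `StepScan`, `StepSelect`, `pull_sum`, `push_sum`, `step_sum`) and the tuple row type are hypothetical. The abstract does not name the loop fusion technique behind the proposed engine. The third variant therefore assumes a stream-fusion-style step function that returns Yield, Skip or Done, purely to show how a pull interface can avoid an inner loop in selection.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical row type: (value, label). All names below are illustrative.
Row = Tuple[int, str]
Pred = Callable[[Row], bool]


# 1) Pull-based (Volcano-style iterator): the consumer calls next() on its child.
class Scan:
    def __init__(self, rows: List[Row]) -> None:
        self.rows, self.i = rows, 0

    def next(self) -> Optional[Row]:
        if self.i >= len(self.rows):
            return None  # end of input
        row = self.rows[self.i]
        self.i += 1
        return row


class Select:
    def __init__(self, child: Scan, pred: Pred) -> None:
        self.child, self.pred = child, pred

    def next(self) -> Optional[Row]:
        # Selection needs its own inner loop to skip rows that fail the predicate.
        while True:
            row = self.child.next()
            if row is None or self.pred(row):
                return row


def pull_sum(op: Select) -> int:
    total = 0
    while (row := op.next()) is not None:
        total += row[0]
    return total


# 2) Push-based: the scan owns the only loop and pushes rows into callbacks.
def push_sum(rows: List[Row], pred: Pred) -> int:
    total = 0

    def sink(row: Row) -> None:
        nonlocal total
        total += row[0]

    def select(row: Row) -> None:
        if pred(row):  # no loop here: a rejected row is simply not forwarded
            sink(row)

    for row in rows:  # the scan operator
        select(row)
    return total


# 3) Pull-based with an assumed stream-fusion-style step: Yield, Skip or Done.
DONE, SKIP, YIELD = 0, 1, 2


class StepScan:
    def __init__(self, rows: List[Row]) -> None:
        self.rows, self.i = rows, 0

    def step(self) -> Tuple[int, Optional[Row]]:
        if self.i >= len(self.rows):
            return DONE, None
        row = self.rows[self.i]
        self.i += 1
        return YIELD, row


class StepSelect:
    def __init__(self, child: StepScan, pred: Pred) -> None:
        self.child, self.pred = child, pred

    def step(self) -> Tuple[int, Optional[Row]]:
        tag, row = self.child.step()
        if tag == YIELD and not self.pred(row):
            return SKIP, None  # no inner loop: a rejected row becomes a Skip
        return tag, row


def step_sum(op: StepSelect) -> int:
    total = 0
    while True:
        tag, row = op.step()
        if tag == DONE:
            return total
        if tag == YIELD:
            total += row[0]
```

For example, take `rows = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]` and the predicate `lambda r: r[0] % 2 == 0`. Then `pull_sum(Select(Scan(rows), pred))`, `push_sum(rows, pred)` and `step_sum(StepSelect(StepScan(rows), pred))` all return 6. What differs is the control flow. In the pull version, `Select.next` contains its own loop. In the push version, the scan owns the only loop. The step-based version keeps the pull interface while letting selection return a Skip instead of looping.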

Type
Research Article
Copyright
Copyright © Cambridge University Press 2018 
