Federated query processing on linked data: a qualitative survey and open challenges

Published online by Cambridge University Press: 30 October 2015

Damla Oguz
Affiliation:
Department of Computer Engineering, Izmir Institute of Technology, 35430 Izmir, Turkey e-mail: damlaoguz@iyte.edu.tr, belginergenc@iyte.edu.tr Department of Computer Engineering, Ege University, 35100 Izmir, Turkey e-mail: oguz.dikenelli@ege.edu.tr IRIT Laboratory, Paul Sabatier University, 31062 Toulouse, France e-mail: shaoyi.yin@irit.fr, abdelkader.hameurlain@irit.fr
Belgin Ergenc
Affiliation:
Department of Computer Engineering, Izmir Institute of Technology, 35430 Izmir, Turkey e-mail: damlaoguz@iyte.edu.tr, belginergenc@iyte.edu.tr
Shaoyi Yin
Affiliation:
IRIT Laboratory, Paul Sabatier University, 31062 Toulouse, France e-mail: shaoyi.yin@irit.fr, abdelkader.hameurlain@irit.fr
Oguz Dikenelli
Affiliation:
Department of Computer Engineering, Ege University, 35100 Izmir, Turkey e-mail: oguz.dikenelli@ege.edu.tr
Abdelkader Hameurlain
Affiliation:
IRIT Laboratory, Paul Sabatier University, 31062 Toulouse, France e-mail: shaoyi.yin@irit.fr, abdelkader.hameurlain@irit.fr

Abstract

A large number of data providers publish and connect their structured data on the Web as linked data, turning the Web of data into a global data space. In this paper, we first give an overview of the query processing approaches used in this interlinked and distributed environment, and then focus on federated query processing on linked data. We provide detailed insight into the data source selection, join methods and query optimization methods of existing query federation engines. Furthermore, we present a qualitative comparison of these engines and complement it with a comparison of the metrics measured for each engine, with the aim of pointing out the major strengths of each one. Finally, we discuss the major challenges of federated query processing on linked data.

Type
Articles
Copyright
© Cambridge University Press, 2015 
