Hostname: page-component-78c5997874-t5tsf Total loading time: 0 Render date: 2024-11-10T20:15:46.221Z Has data issue: false hasContentIssue false

Hybrid ASP-based Approach to Pattern Mining

Published online by Cambridge University Press:  18 January 2019

SERGEY PARAMONOV*
Affiliation:
KU Leuven, Leuven, Belgium (e-mail: sergey.paramonov@kuleuven.be)
DARIA STEPANOVA
Affiliation:
Max Planck Institute for Informatics, Saarbrücken, Germany (e-mails: dstepano@mpi-inf.mpg.de, pmiettin@mpi-inf.mpg.de)
PAULI MIETTINEN
Affiliation:
Max Planck Institute for Informatics, Saarbrücken, Germany (e-mails: dstepano@mpi-inf.mpg.de, pmiettin@mpi-inf.mpg.de)

Abstract

Detecting small sets of relevant patterns from a given data set is a central challenge in data mining. The relevance of a pattern is based on user-provided criteria; typically, all patterns that satisfy certain criteria are considered relevant. Rule-based languages like answer set programming (ASP) seem well suited for specifying such criteria in a form of constraints. Although progress has been made, on the one hand, on solving individual mining problems and, on the other hand, developing generic mining systems, the existing methods focus either on scalability or on generality. In this paper, we make steps toward combining local (frequency, size, and cost) and global (various condensed representations like maximal, closed, and skyline) constraints in a generic and efficient way. We present a hybrid approach for itemset, sequence, and graph mining which exploits dedicated highly optimized mining systems to detect frequent patterns and then filters the results using declarative ASP. To further demonstrate the generic nature of our hybrid framework, we apply it to a problem of approximately tiling a database. Experiments on real-world data sets show the effectiveness of the proposed method and computational gains for itemset, sequence, and graph mining, as well as approximate tiling.

Under consideration in Theory and Practice of Logic Programming.

Type
Original Article
Copyright
© Cambridge University Press 2019 

Access options

Get access to the full version of this content by using one of the access options below. (Log in options will check for institutional or personal access. Content may require purchase if you do not have access.)

Footnotes

This work has been supported by the FWO and by the ERC-ADG-201 project 694980 SYNTH funded by the European Research Council. This is an extended version of a paper presented at the RuleML+RR 2017 conference, which has been invited for submission to TPLP. The authors acknowledge the assistance of the RuleML+RR 2017 Program Chairs Stefania Costantini, Enrico Franconi, Fariba Sadri and William van Woensel.

References

Aggarwal, C. C. 2015. Data Mining: The Textbook. Springer, Cham.Google Scholar
Agrawal, R., Imielinski, T. and Swami, A. 1993. Mining association rules between sets of items in large databases. In SIGMOD 1993, ACM, New York, NY, USA, 207216.Google Scholar
Agrawal, R., Mannila, H., Srikant, R., Toivonen, H. and Verkamo, A. I. 1996. Fast discovery of association rules. In Advances in Knowledge Discovery and Data Mining. AAAI/MIT Press, Menlo Park, 307328.Google Scholar
Alviano, M., Dodaro, C., Faber, W., Leone, N. and Ricca, F. 2013. WASP: A native asp solver based on constraint learning. In International Conference on Logic Programming and Nonmonotonic Reasoning. Springer, Berlin, 5466.10.1007/978-3-642-40564-8_6CrossRefGoogle Scholar
Aoga, J. O. R., Guns, T. and Schaus, P. 2016. An efficient algorithm for mining frequent sequence with constraint programming. In ECML PKDD, Springer, Cham, 315330.Google Scholar
Babai, L. 2015. Graph isomorphism in quasipolynomial time. CoRR abs/1512.03547.Google Scholar
Bonchi, F. and Lucchese, C. 2006. On condensed representations of constrained frequent patterns. Knowledge and Information Systems 9, 2, 180201.10.1007/s10115-005-0201-1CrossRefGoogle Scholar
Cook, S. A. 1971. The complexity of theorem-proving procedures. In Proc. of the 3rd Annual ACM Symposium on Theory of Computing, Shaker Heights, OH, USA, 3–5 May 1971, Harrison, M. A., Banerji, R. B., and Ullman, J. D., Eds. ACM, New York, 151158.Google Scholar
Eiter, T., Brewka, G., Dao-Tran, M., Fink, M., Ianni, G. and Krennwallner, T. 2009a. Combining nonmonotonic knowledge bases with external sources. In Frontiers of Combining Systems, 7th International Symposium, Ghilardi, S. and Sebastiani, R., Eds. Frontiers of Combining Systems. FroCoS 2009. Lecture Notes in Computer Science, vol. 5749. Springer, Berlin, Heidelberg, 16–18 Sep. 2009, 1842.10.1007/978-3-642-04222-5_2CrossRefGoogle Scholar
Eiter, T., Ianni, G. and Krennwallner, T. 2009b. Answer set programming: A primer. In 5th International Reasoning Web Summer School (RW 2009), Brixen/Bressanone, Italy, 30 Aug.–4 Sep. 2009. LNCS, vol. 5689. Springer, Berlin.Google Scholar
Faber, W., Pfeifer, G., Leone, N., Dell’Armi, T. and Ielpa, G. 2008. Design and implementation of aggregate functions in the DLV system. CoRR abs/0802.3137.Google Scholar
Gebser, M., Guyet, T., Quiniou, R., Romero, J. and Schaub, T. 2016. Knowledge-based sequence mining with ASP. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, IJCAI 2016, AAAI, New York, NY, USA, 9–15 July 2016.Google Scholar
Gebser, M., Kaminski, R., Kaufmann, B. and Schaub, T. 2012. Answer Set Solving in Practice. Synthesis Lectures on Artificial Intelligence and Machine Learning. Morgan and Claypool Publishers, San Rafael.10.2200/S00457ED1V01Y201211AIM019CrossRefGoogle Scholar
Gebser, M., Kaufmann, B., Neumann, A. and Schaub, T. 2007. clasp: A conflict-driven answer set solver. In LPNMR, Springer, 260265.Google Scholar
Geerts, F., Goethals, B. and Mielikäinen, T. 2004. Tiling databases. In Proc. of the 7th International Conference on Discovery Science, DS 2004, Springer, Padova, Italy, 2–5 Oct. 2004, 278289.Google Scholar
Gelfond, M. and Lifschitz, V. 1988. The stable model semantics for logic programming. In Proceedings of ICLP/SLP, MIT Press, 10701080.Google Scholar
Guns, T., Dries, A., Nijssen, S., Tack, G. and De Raedt, L. 2017. MiningZinc: A declarative framework for constraint-based mining. Artificial Intelligence 244, 629.10.1016/j.artint.2015.09.007CrossRefGoogle Scholar
Guns, T., Nijssen, S., and de Raedt, L. 2013. k-pattern set mining under constraints. IEEE Transactions on Knowledge and Data Engineering 25, 2, 402418.10.1109/TKDE.2011.204CrossRefGoogle Scholar
Guns, T., Paramonov, S. and Négrevergne, B. 2016. On declarative modeling of structured pattern mining. In Declarative Learning Based Programming, Papers from the 2016 AAAI Workshop, AAAI Press, Phoenix, AZ, USA, 13 Feb. 2016.Google Scholar
Guyet, T., Moinard, Y. and Quiniou, R. 2014. Using answer set programming for pattern mining. CoRR abs/1409.7777.Google Scholar
Jabbour, S., Sais, L. and Salhi, Y. 2013. Boolean satisfiability for sequence mining. In 22nd ACM International Conference on Information and Knowledge Management, CIKM 2013, San Francisco, CA, USA, 27 Oct.–1 Nov. 2013, ACM, 649658.Google Scholar
Jabbour, S., Sais, L. and Salhi, Y. 2015. Decomposition based SAT encodings for itemset mining problems. In PAKDD, Springer, 662674.Google Scholar
Järvisalo, M. 2011. Itemset mining as a challenge application for answer set enumeration. In Proc. of the 11th International Conference on Logic Programming and Nonmonotonic Reasoning, LPNMR 2011, Vancouver, Canada, 16–19 May 2011, Springer, 304310.10.1007/978-3-642-20895-9_35CrossRefGoogle Scholar
Kimelfeld, B. and Kolaitis, P. G. 2014. The complexity of mining maximal frequent subgraphs. ACM Transactions on Database Systems 39, 4, 3233.10.1145/2629550CrossRefGoogle Scholar
Lifschitz, V. 2008. What is answer set programming? AAAI, Chicago, IL.Google Scholar
Mannila, H. and Toivonen, H. 1997. Levelwise search and borders of theories in knowledge discovery. Data Mining and Knowledge Discovery 1, 3, 241258.10.1023/A:1009796218281CrossRefGoogle Scholar
Métivier, J., Loudni, S. and Charnois, T. 2013. A constraint programming approach for mining sequential patterns in a sequence database. CoRR abs/1311.6907.Google Scholar
Miettinen, P. 2008. On the positive-negative partial set cover problem. Information Processing Letters 108, 4, 219221.10.1016/j.ipl.2008.05.007CrossRefGoogle Scholar
Miettinen, P. 2009. Matrix Decomposition Methods for Data Mining: Computational Complexity and Algorithms. Ph.D. thesis, Department of Computer Science, University of Helsinki.Google Scholar
Miettinen, P. 2015. Generalized matrix factorizations as a unifying framework for pattern set mining: Complexity beyond blocks. In ECMLPKDD 2015, Springer, 3652.Google Scholar
Miettinen, P., Mielikäinen, T., Gionis, A., Das, G. and Mannila, H. 2008. The discrete basis problem. IEEE Transactions on Knowledge and Data Engineering 20, 10, 13481362.10.1109/TKDE.2008.53CrossRefGoogle Scholar
Négrevergne, B., Dries, A., Guns, T. and Nijssen, S. 2013. Dominance programming for itemset mining. In 2013 IEEE 13th International Conference on Data Mining, Dallas, TX, USA, 7–10 Dec. 2013, IEEE, 557566.10.1109/ICDM.2013.92CrossRefGoogle Scholar
Négrevergne, B. and Guns, T. 2015. Constraint-based sequence mining using constraint programming. In CPAIOR, Springer, 288305.Google Scholar
Neumann, S. and Miettinen, P. 2017. Reductions for frequency-based data mining problems. In Proc. of the 2017 IEEE International Conference on Data Mining, IEEE, New Orleans, LA, USA, 9971002.10.1109/ICDM.2017.128CrossRefGoogle Scholar
Paramonov, S., van Leeuwen, M., Denecker, M. and De Raedt, L. 2015. An exercise in declarative modeling for relational query mining. In ILP, Springer, 166182.Google Scholar
Pei, J. and Han, J. 2000. Can we push more constraints into frequent pattern mining? In ACM SIGKDD, ACM, Boston, MA, USA, 350354.Google Scholar
Pei, J., Han, J. and Mao, R. 2000. CLOSET: An efficient algorithm for mining frequent closed itemsets. In ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, Dallas, TX, USA, 2130.Google Scholar
Pensa, R. G. and Boulicaut, J.-F. 2005. Towards fault-tolerant formal concept analysis. In AI*IA 2005, Springer, 212223.Google Scholar
Rojas, W. U., Boizumault, P., Loudni, S., Crémilleux, B. and Lepailleur, A. 2014. Mining (soft-) skypatterns using dynamic CSP. In CPAIOR, Springer, 7187.Google Scholar
Simons, P., Niemelä, I. and Soininen, T. 2002. Extending and implementing the stable model semantics. Artificial Intelligence 138, 1–2, 181234.10.1016/S0004-3702(02)00187-XCrossRefGoogle Scholar
Tyukin, A., Kramer, S. and Wicker, J. 2014. BMaD – A boolean matrix decomposition framework. In ECML PKDD 2014, Calders, T., Esposito, F., Hüllermeier, E. and Meo, R., Eds., Springer, 481484.Google Scholar
van der Hallen, M., Paramonov, S., Leuschel, M. and Janssens, G. 2016. Knowledge representation analysis of graph mining. CoRR abs/1608.08956.Google Scholar
Yan, X. and Han, J. 2002. gSpan: Graph-based substructure pattern mining. In ICDM 2002, Maebashi City, Japan, Japan, IEEE.Google Scholar
Yan, X., Han, J. and Afshar, R. 2003. CloSpan: Mining closed sequential patterns in large datasets. In SDM. 166–177.10.1137/1.9781611972733.15CrossRefGoogle Scholar
Zaki, M. J., Parthasarathy, S., Ogihara, M. and Li, W. 1997. New algorithms for fast discovery of association rules. Technical report, University of Rochester, Rochester, NY, USA.10.1007/978-1-4615-5669-5_1CrossRefGoogle Scholar