
Using automated planning for improving data mining processes

Published online by Cambridge University Press: 07 February 2013

Susana Fernández, Tomás de la Rosa, Fernando Fernández, Rubén Suárez, Javier Ortiz and Daniel Borrajo
Affiliation:
Departamento de Informática, Universidad Carlos III de Madrid, Avd. de la Universidad 30, 28911 Leganés (Madrid), Spain; e-mail: susana.fernandez@uc3m.es
David Manzano
Affiliation:
Ericsson España, S.A.U, Madrid R&D Center, Technology & Innovation, C/ Vía de los Poblados 13, 28013 Madrid, Spain; e-mail: david.manzano.macho@ericsson.com

Abstract

This paper presents a distributed architecture for automating data mining (DM) processes using standard languages. DM is a difficult task that relies on an exploratory, analytic process over large quantities of data in order to discover meaningful patterns. The increasing heterogeneity and complexity of available data require expert knowledge of how to combine the many alternative DM tasks that can be used to process the data. Here, we describe DM tasks in terms of Automated Planning, which allows us to automate the construction of the DM knowledge flow. The work is based on standards that have been defined in both the DM and the automated-planning communities. Specifically, we use PMML (Predictive Model Markup Language) to describe DM tasks. From the PMML, a problem description in PDDL (Planning Domain Definition Language) can be generated, so any current planning system can be used to generate a plan. This plan is, in turn, translated into a DM workflow description in the Knowledge Flow for Machine Learning format, i.e. a Knowledge Flow file for the WEKA (Waikato Environment for Knowledge Analysis) tool, so that the resulting DM workflow can be executed in WEKA.
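The pipeline summarized in the abstract (PMML in, a PDDL problem out, and a planner's plan translated back into an executable workflow) can be illustrated with a toy Python sketch. Everything in it is an assumption made for illustration: the stripped-down PMML-like fragment, the three-action domain and its predicate names (`loaded`, `has-continuous`, `all-nominal`, `model-built`, `evaluated`), the helper functions, and the mapping from plan steps to WEKA class names. It is not the paper's domain model or its translators, and a small breadth-first search stands in for an off-the-shelf PDDL planner.

```python
# Toy sketch of a PMML -> PDDL -> plan -> workflow pipeline.
# HYPOTHETICAL throughout: the domain, predicates, helpers and the
# WEKA component mapping are illustrative, not the paper's actual design.
import xml.etree.ElementTree as ET
from collections import deque

NS = {"p": "http://www.dmg.org/PMML-4_0"}

# Hypothetical grounded STRIPS actions: name -> (preconditions, add list, delete list).
ACTIONS = {
    "discretize": ({"loaded", "has-continuous"}, {"all-nominal"}, {"has-continuous"}),
    "train-model": ({"loaded", "all-nominal"}, {"model-built"}, set()),
    "evaluate": ({"model-built"}, {"evaluated"}, set()),
}

# Hypothetical mapping from plan steps to WEKA classes used as workflow components.
WEKA_COMPONENTS = {
    "discretize": "weka.filters.unsupervised.attribute.Discretize",
    "train-model": "weka.classifiers.bayes.NaiveBayes",
    "evaluate": "weka.classifiers.Evaluation",
}


def initial_state(pmml_text):
    """Derive initial facts from the optype attributes in a PMML DataDictionary."""
    root = ET.fromstring(pmml_text)
    fields = root.iterfind("p:DataDictionary/p:DataField", NS)
    optypes = {f.get("optype") for f in fields}
    kind = "has-continuous" if "continuous" in optypes else "all-nominal"
    return frozenset({"loaded", kind})


def to_pddl_problem(state, goal, dataset="ds1"):
    """Render facts as a PDDL problem with unary predicates over one dataset object."""
    init = " ".join(f"({fact} {dataset})" for fact in sorted(state))
    goals = " ".join(f"({fact} {dataset})" for fact in sorted(goal))
    return (f"(define (problem dm-task) (:domain dm-toy)\n"
            f"  (:objects {dataset} - dataset)\n"
            f"  (:init {init})\n"
            f"  (:goal (and {goals})))")


def bfs_plan(state, goal):
    """Breadth-first forward search; a stand-in for an external PDDL planner."""
    frontier = deque([(state, [])])
    seen = {state}
    while frontier:
        current, plan = frontier.popleft()
        if goal <= current:
            return plan
        for name, (pre, add, delete) in ACTIONS.items():
            if pre <= current:
                nxt = frozenset((current - delete) | add)
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, plan + [name]))
    return None  # goal unreachable with these actions


def plan_to_workflow(plan):
    """Translate a plan into an ordered chain of (step number, WEKA component) pairs."""
    return [(i + 1, WEKA_COMPONENTS[step]) for i, step in enumerate(plan)]


PMML_TEXT = """<PMML xmlns="http://www.dmg.org/PMML-4_0" version="4.0">
  <DataDictionary numberOfFields="2">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="churn" optype="categorical" dataType="string"/>
  </DataDictionary>
</PMML>"""

if __name__ == "__main__":
    start = initial_state(PMML_TEXT)
    goal = frozenset({"evaluated"})
    print(to_pddl_problem(start, goal))
    plan = bfs_plan(start, goal)
    print(plan)  # ['discretize', 'train-model', 'evaluate']
    for step, component in plan_to_workflow(plan):
        print(step, component)
```

The sketch feeds the same facts into both the generated PDDL text and the search, which mirrors the idea that the search function could be replaced by any planner reading the generated problem; a real translator would emit a Knowledge Flow file for WEKA rather than a Python list.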

Copyright © Cambridge University Press 2013 

