Regular expression pattern matching for XML

HARUO HOSOYA; BENJAMIN C. PIERCE

doi:10.1017/S0956796802004410

Abstract

We propose regular expression pattern matching as a core feature of programming languages for manipulating XML. We extend conventional pattern-matching facilities (as in ML) with regular expression operators such as repetition (*), alternation (|), etc., that can match arbitrarily long sequences of subtrees, allowing a compact pattern to extract data from the middle of a complex sequence. We then show how to check standard notions of exhaustiveness and redundancy for these patterns. Regular expression patterns are intended to be used in languages with type systems based on regular expression types. To avoid excessive type annotations, we develop a type inference scheme that propagates type constraints to pattern variables from the type of input values. The type inference algorithm translates types and patterns into regular tree automata, and then works in terms of standard closure operations (union, intersection, and difference) on tree automata. The main technical challenge is dealing with the interaction of repetition and alternation patterns with the first-match policy, which gives rise to subtleties concerning both the termination and precision of the analysis. We address these issues by introducing a data structure representing these closure operations lazily.
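As an illustration of the matching behaviour described above, the following is a minimal sketch in Python of regular expression patterns over a sequence, using a first-match policy. It is not the paper's algorithm, which works on regular tree automata, and it performs no type inference or exhaustiveness checking. The names `Elem`, `Seq`, `Alt`, `Star`, `matches`, and `first_match` are hypothetical, and each element is simplified to a flat `(label, value)` pair rather than a full subtree. Alternation tries its left branch first and repetition is greedy, so the first complete match found is the one returned.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Elem:
    """Match one element with the given label; optionally bind its value."""
    label: str
    var: Optional[str] = None


@dataclass(frozen=True)
class Seq:
    """Match the sub-patterns one after another."""
    parts: tuple


@dataclass(frozen=True)
class Alt:
    """Match either branch, preferring the left one (first-match)."""
    left: object
    right: object


@dataclass(frozen=True)
class Star:
    """Match the body zero or more times, preferring more repetitions."""
    body: object


def matches(pat, items, i, env):
    """Yield (next_index, bindings) for each way pat matches items[i:],
    in priority order (highest-priority match first)."""
    if isinstance(pat, Elem):
        if i < len(items) and items[i][0] == pat.label:
            yield i + 1, ({**env, pat.var: items[i][1]} if pat.var else env)
    elif isinstance(pat, Seq):
        if not pat.parts:
            yield i, env
        else:
            head, rest = pat.parts[0], Seq(pat.parts[1:])
            for j, env1 in matches(head, items, i, env):
                yield from matches(rest, items, j, env1)
    elif isinstance(pat, Alt):
        yield from matches(pat.left, items, i, env)
        yield from matches(pat.right, items, i, env)
    elif isinstance(pat, Star):
        for j, env1 in matches(pat.body, items, i, env):
            if j > i:  # skip empty iterations so the recursion terminates
                yield from matches(pat, items, j, env1)
        yield i, env  # lowest priority: stop repeating here


def first_match(pat, items):
    """Return the bindings of the first match that consumes all of items,
    or None if the pattern does not match."""
    for j, env in matches(pat, items, 0, {}):
        if j == len(items):
            return env
    return None


# Extract data around an arbitrarily long run of tel elements.
entry = [("name", "Ann"), ("tel", "123"), ("tel", "456"), ("email", "a@x")]
pattern = Seq((Elem("name", "n"), Star(Elem("tel")), Elem("email", "e")))
print(first_match(pattern, entry))  # {'n': 'Ann', 'e': 'a@x'}
```

Both branches of `Alt(Elem("a", "x"), Elem("a", "y"))` match the input `[("a", 1)]`, but the sketch returns `{'x': 1}` because the left branch has priority. This interaction between alternation, repetition, and priority is what the abstract identifies as the main technical challenge for the type analysis.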
