Regular expression pattern matching for XML

Published online by Cambridge University Press:  29 October 2003

HARUO HOSOYA
Affiliation:
Research Institute for Mathematical Sciences, Kyoto University, Sakyo-ku, Kyoto 606-8502, Japan (e-mail: hahosoya@kurims.kyoto-u.ac.jp)
BENJAMIN C. PIERCE
Affiliation:
Department of Computer and Information Science, University of Pennsylvania, 200 S. 33rd Street, Philadelphia, PA 19104-6389, USA (e-mail: bcpierce@cis.upenn.edu)

Abstract

We propose regular expression pattern matching as a core feature of programming languages for manipulating XML. We extend conventional pattern-matching facilities (as in ML) with regular expression operators such as repetition (*), alternation (|), etc., that can match arbitrarily long sequences of subtrees, allowing a compact pattern to extract data from the middle of a complex sequence. We then show how to check standard notions of exhaustiveness and redundancy for these patterns. Regular expression patterns are intended to be used in languages with type systems based on regular expression types. To avoid excessive type annotations, we develop a type inference scheme that propagates type constraints to pattern variables from the type of input values. The type inference algorithm translates types and patterns into regular tree automata, and then works in terms of standard closure operations (union, intersection, and difference) on tree automata. The main technical challenge is dealing with the interaction of repetition and alternation patterns with the first-match policy, which gives rise to subtleties concerning both the termination and precision of the analysis. We address these issues by introducing a data structure representing these closure operations lazily.
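
To make the matching semantics concrete, the following is a minimal sketch of regular expression patterns over flat sequences of labels, with alternation resolved by the first-match policy and repetition tried greedily. It is an illustration only, not the paper's formulation: the constructor names (`Lit`, `Wild`, `Seq`, `Alt`, `Star`, `Bind`) and the backtracking strategy are assumptions made for this example, elements are plain strings rather than XML subtrees, and neither the type inference scheme nor the tree automata construction is modelled.

```python
from dataclasses import dataclass


# Hypothetical pattern constructors, named for this sketch only.
@dataclass(frozen=True)
class Lit:            # matches one element equal to `label`
    label: str

@dataclass(frozen=True)
class Wild:           # matches any single element
    pass

@dataclass(frozen=True)
class Seq:            # matches its parts one after another
    parts: tuple

@dataclass(frozen=True)
class Alt:            # alternation: `left` has priority over `right`
    left: object
    right: object

@dataclass(frozen=True)
class Star:           # repetition: zero or more occurrences of `body`
    body: object

@dataclass(frozen=True)
class Bind:           # binds the matched subsequence to `name`
    name: str
    body: object


def _match(p, xs, i, env):
    """Yield (end_index, bindings) for matches of p starting at xs[i],
    in priority order (earlier results are preferred)."""
    if isinstance(p, Lit):
        if i < len(xs) and xs[i] == p.label:
            yield i + 1, env
    elif isinstance(p, Wild):
        if i < len(xs):
            yield i + 1, env
    elif isinstance(p, Seq):
        if not p.parts:
            yield i, env
        else:
            rest = Seq(p.parts[1:])
            for j, e in _match(p.parts[0], xs, i, env):
                yield from _match(rest, xs, j, e)
    elif isinstance(p, Alt):
        # First-match policy: every result of the left branch comes first.
        yield from _match(p.left, xs, i, env)
        yield from _match(p.right, xs, i, env)
    elif isinstance(p, Star):
        # Greedy: try another iteration before stopping. The j > i guard
        # prevents looping forever on bodies that match the empty sequence.
        for j, e in _match(p.body, xs, i, env):
            if j > i:
                yield from _match(p, xs, j, e)
        yield i, env
    elif isinstance(p, Bind):
        for j, e in _match(p.body, xs, i, env):
            yield j, {**e, p.name: list(xs[i:j])}
    else:
        raise TypeError(f"unknown pattern: {p!r}")


def match(p, xs):
    """Return the bindings of the highest-priority match of p against the
    whole sequence xs, or None if there is no match."""
    for j, env in _match(p, xs, 0, {}):
        if j == len(xs):
            return env
    return None


# Extract the run of "tel" entries from the middle of a longer sequence.
entry = Seq((Star(Lit("name")), Bind("t", Star(Lit("tel"))), Star(Wild())))
print(match(entry, ["name", "name", "tel", "tel", "email"]))
# {'t': ['tel', 'tel']}

# Both branches could match ["x"]; the first-match policy selects the left one.
choice = Alt(Bind("a", Lit("x")), Bind("b", Star(Wild())))
print(match(choice, ["x"]))   # {'a': ['x']}
print(match(choice, ["y"]))   # {'b': ['y']}
```

The sketch resolves ambiguity at run time by enumerating candidate matches in priority order. The analysis described in the abstract is static instead: it has to account for the fact that a later alternative only receives the values not already taken by earlier ones, which is where the difference operation on tree automata comes in.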

Type
Article
Copyright
© 2003 Cambridge University Press