Regular expression

From Citizendium

Regular expressions (often shortened to regex or regexp; plural regexes) are a family of related pattern-matching notations that let you find and extract strings within strings. They define meta-characters with special meanings, rules for escaping those meta-characters so that they match literally, and ways of specifying groups of characters to be matched.
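As a minimal sketch of escaping and character groups, the following uses Python's standard `re` module. The sample text and variable names are invented for illustration and do not come from any particular source.

```python
import re

text = "Total: 3.50 USD, was 4x50 USD"  # sample input, invented for this sketch

# Unescaped, '.' is a meta-character meaning "any character except a newline".
loose = re.findall(r"\d.\d\d", text)

# Escaped as '\.', it matches only a literal full stop.
strict = re.findall(r"\d\.\d\d", text)

# A character class such as [A-Z] matches any one character from the set;
# {3} asks for exactly three of them in a row.
codes = re.findall(r"[A-Z]{3}", text)

print(loose)   # ['3.50', '4x50']
print(strict)  # ['3.50']
print(codes)   # ['USD', 'USD']
```

The unescaped pattern also accepts '4x50', which is why literal punctuation in a pattern usually needs a backslash.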

Regular expressions are widely implemented: they work in a variety of programming and scripting languages including Java, Perl, Python, Ruby and Tcl. They are supported by the Unix tools Awk and Sed, and by the editors Emacs, Vim, TextMate, BBEdit, jEdit and many more.

An example regular expression is /^h/, which matches any line that starts with the lower-case letter 'h'. The '^' is part of the regular-expression syntax and refers to the start of the line; the surrounding slashes are delimiters used by tools such as Perl and sed, not part of the pattern itself.
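The same pattern can be tried in Python, where patterns are written as strings rather than between slashes. This is a small sketch with invented sample text; the `.*$` suffix is added only so that the whole matching line is returned.

```python
import re

lines = "hello\nHello\nother\nhi there"  # sample text, invented for this sketch

# re.MULTILINE makes '^' match at the start of every line,
# not only at the start of the whole string.
pattern = re.compile(r"^h.*$", re.MULTILINE)

matching_lines = pattern.findall(lines)
print(matching_lines)  # ['hello', 'hi there']
```

'Hello' is skipped because matching is case-sensitive by default, and 'other' is skipped because its 'h' is not at the start of the line.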

Regular expression syntax is more expressive than simple literal matching: it allows you to specify alternation, quantification, grouping, word-boundary matching and much more.

Regular expressions originate in the work of the mathematician Stephen Cole Kleene, who in the 1950s formalised the regular languages that they describe. Ken Thompson first implemented a regular expression library in the QED editor, and it was then implemented in ed, which later evolved into ex and then, when screens replaced teletypes, into vi, the ancestor of Vim. Several of the navigation keys in Vim resemble regular expression syntax (^, $, b, w); for example, ^ and $ move towards the start and end of a line, the same positions they anchor in a pattern.
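The alternation, quantification, grouping and word matching mentioned above can be sketched in a single Python pattern. The sentence and the choice of words to match are invented for illustration.

```python
import re

text = "The cat sat. The cats scattered; a dog and two dogs watched."

# Alternation:    'cat|dog' matches either word.
# Grouping:       the parentheses capture which alternative matched.
# Quantification: 's?' allows zero or one trailing 's'.
# Word matching:  '\b' requires a word boundary, so the 'cat' inside
#                 'scattered' is not matched.
pattern = re.compile(r"\b(cat|dog)s?\b")

words = [m.group(0) for m in pattern.finditer(text)]
animals = [m.group(1) for m in pattern.finditer(text)]

print(words)    # ['cat', 'cats', 'dog', 'dogs']
print(animals)  # ['cat', 'cat', 'dog', 'dog']
```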

There are currently two main families of regular expression syntax: Perl-Compatible Regular Expressions (PCRE) and POSIX-compatible regular expressions. Regular expressions are powerful, but they are a poor tool for parsing structured documents. Where a parsing library is available (as it is for data and document interchange formats such as HTML, XML, SGML and JSON), or where a parser based on a Backus-Naur Form grammar can be used, that should be preferred.
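A short Python sketch shows one way a regular expression can go wrong on a structured format. It compares a naive pattern with the standard `json` module on an invented JSON document; the pattern is deliberately simplistic and is only one illustration of the problem.

```python
import json
import re

# Invented sample document: the title contains escaped quotation marks.
document = '{"title": "He said \\"hi\\"", "tags": ["a", "b"]}'

# A naive pattern stops at the first quotation mark it sees,
# even though that one is escaped inside the string.
naive = re.search(r'"title": "([^"]*)"', document).group(1)

# A real parser applies the format's own escaping and nesting rules.
parsed = json.loads(document)

print(naive)            # He said \
print(parsed["title"])  # He said "hi"
print(parsed["tags"])   # ['a', 'b']
```

The pattern could be patched to handle escaped quotes, but each such patch re-implements a little more of the parser that the library already provides.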