Tuesday, December 14, 2010

Unveiling the mystery of regular expression syntax

Regular expressions (REs) are often mistakenly thought of as a mysterious language that only a few people understand.

On the surface they do look messy: if you don't know the syntax, the code is just a pile of textual junk in your eyes. In fact, regular expressions are quite simple and can be understood. After reading this article, you will be familiar with the common syntax of regular expressions.

Support for multiple platforms

Regular expressions were first proposed by the mathematician Stephen Kleene in 1956, as an outgrowth of his work on natural language. As a complete syntax for matching patterns of characters, regular expressions were later applied in the field of information technology. Since then, they have gone through several stages of development, and the current standard has been approved by the ISO (International Organization for Standardization) and confirmed by The Open Group.

Regular expressions are not a dedicated language but a standard for finding and replacing text in a file or a string. The standard has two flavors: basic regular expressions (BRE) and extended regular expressions (ERE). ERE includes the capabilities of BRE plus some additional concepts.

Many programs use regular expressions, including xsh, egrep, sed, vi, and other programs on UNIX platforms. Regular expressions have also been adopted by many languages, such as HTML and XML, which usually adopt only a subset of the full standard.

Regular expressions are more common than you might think. As they have made their way into cross-platform programming languages, their functionality has become more complete and their use more widespread. Search engines on the web use them, and e-mail programs use them. Even if you are not a UNIX programmer, you can use regular expressions to simplify your programs and shorten your development time.

Regular expressions 101

Many regular expression constructs look alike at first, simply because you have not studied them before. Wildcards are one type of RE construct, and repetition operators are another. Let's take a look at the most common basic syntax types in the ERE standard.
To provide examples that each serve a specific purpose, I will use several different programs.

Character matching

The key to a regular expression is deciding what you want to search for and match; without this concept, REs are of no use. Each expression contains instructions on what to find, as shown in Table A.

Repetition operators

Repetition operators, or quantifiers, describe how many times a particular character should be matched. They are often combined with character-matching syntax to find runs of several characters; see Table B.

Anchors

Anchors specify where in the text the pattern must match, as shown in Figure C. Using them makes it easier to find common combinations of characters. For example, I use the vi line editor command :s, which stands for substitute. The basic syntax of the command is:

    s/pattern_to_match/pattern_to_substitute/

Alternation

Another feature of REs is the alternation symbol, |. In effect, this symbol represents an OR statement. The following statement returns the occurrences of "nerd" and "merd" in the file sample.txt:

    egrep "(n|m)erd" sample.txt

Alternation is very powerful, especially when you are searching a file for different spellings of a word, but in this case you can get the same result with the following example:

    egrep "[nm]erd" sample.txt

Alternation really comes into its own when you combine it with the more advanced features of REs.

Some reserved characters

The last of the most important features of REs is reserved characters (also known as special characters). For example, if you want to find the strings "ne*rd" and "ni*rd", the pattern "n[ei]*rd" matches "neeeeerd" and "nieieierd", but not the strings you are looking for. Because '*' (asterisk) is a reserved character, you must escape it with a backslash, namely: "n[ei]\*rd". Other reserved characters include:

    ^ (caret)
    . (period)
    [ (left bracket)
    $ (dollar sign)
    ( (left parenthesis)
    ) (right parenthesis)
    | (pipe)
    * (asterisk)
    + (plus symbol)
    ? (question mark)
    { (left curly bracket, or left brace)
    \ (backslash)

Once you include these characters in your search, REs undoubtedly become very difficult to read. For example, the following eregi code from a PHP search engine is hard to read:

    eregi("^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$", $sendto)

As you can see, the intent of the program is very hard to grasp. But if you ignore the reserved characters, you will often misunderstand what the code means.

Summary

In this article, we unveiled the mystery of regular expressions and listed the common syntax of the ERE standard. If you want to read The Open Group's full description of the rules, see: Regular Expressions. You are welcome to post your questions or opinions in the discussion area.
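The alternation, bracket expression, and escaping examples from this article can also be tried outside of egrep. What follows is only a minimal sketch in Python. Python's re module is not a POSIX ERE implementation, but it treats these particular constructs the same way. The sample lines and the matching_lines helper are assumptions made for illustration, standing in for the contents of sample.txt and for egrep itself.

```python
import re

# Hypothetical sample lines standing in for the contents of sample.txt.
SAMPLE_LINES = ["nerd", "merd", "herd", "ne*rd", "ni*rd", "neeeeerd", "nieieierd"]


def matching_lines(pattern, lines):
    """Return the lines in which pattern matches anywhere, much as egrep would."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]


# Alternation and a bracket expression select the same lines here.
alternation = matching_lines(r"(n|m)erd", SAMPLE_LINES)
bracket = matching_lines(r"[nm]erd", SAMPLE_LINES)

# Unescaped, '*' repeats the preceding bracket expression zero or more times.
repeated = matching_lines(r"n[ei]*rd", SAMPLE_LINES)

# Escaped with a backslash, '\*' matches a literal asterisk.
literal = matching_lines(r"n[ei]\*rd", SAMPLE_LINES)

print(alternation)  # ['nerd', 'merd']
print(bracket)      # ['nerd', 'merd']
print(repeated)     # ['nerd', 'neeeeerd', 'nieieierd']
print(literal)      # ['ne*rd', 'ni*rd']
```

Raw strings (the r prefix) are used so that the backslash reaches the regular expression engine unchanged instead of being interpreted by Python first.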
