2013-07-28 | openGLUG

Friday 2 August 2013

A tiny introduction to Regular Expressions

In this tiny post I am going to give you some information about regular expressions.
A regular expression, in computer science, is a specialized string of characters that describes one or more strings sharing a particular format. In other words, a regular expression is a generalised representation of a particular family of strings.
For example, the regular expression "^([a-z])([0-9])$" represents the set of two-character strings whose first character is a lowercase letter and whose second character is a digit between 0 and 9. That set contains strings like a6, a3, s5 and so on.
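To see this in action, here is a minimal sketch in Python using the standard re module. The helper name matches and the list of sample strings are my own choices for illustration; only the pattern itself comes from the example above.

```python
import re

# The pattern from the example: a lowercase letter followed by a digit,
# anchored at both ends of the string.
PATTERN = re.compile(r"^([a-z])([0-9])$")


def matches(candidate: str) -> bool:
    """Return True if the example pattern matches the candidate string."""
    return PATTERN.match(candidate) is not None


# Try a few strings: the first three belong to the family, the rest do not.
for sample in ["a6", "a3", "s5", "A6", "a66", "6a"]:
    print(sample, matches(sample))
```

One Python-specific detail: '$' also matches just before a trailing newline, so if you need a strictly exact match, re.fullmatch is the safer call.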
So what do those symbols mean?
In the above example you can see some special characters like '^' and '$'. What does that weird string mean? If that is your question, the answer is simple: each symbol means exactly what it was designed to mean. Don't worry, I will walk you through them one by one.
The caret symbol ('^') is an anchor. It tells the regular expression engine that the match must start at the very beginning of the string, so whatever follows the caret in the pattern has to be the first thing in the string. The next character is '('. Parentheses are used for grouping in regular expressions, just as in mathematics. Nothing more interesting about them for now.
Let's move to the next character, '['. Square brackets have a special meaning in regular expressions: whatever is enclosed in '[ ]' is a character class, which matches exactly one character out of the set it lists. Inside a class, a hyphen between two characters denotes a range, so [0-3] behaves the same as (0|1|2|3). Here you can see one more special character, '|', the pipe symbol. It is the 'OR' (alternation) operator, and it tells the engine to match any one of the alternatives it separates. Let's not go deeper into it and continue with our example. So far we have one rule, which says the string should start with one lowercase letter. I hope you are getting the picture.
Now let's continue to the next group. It contains the range '[0-9]', and you already know what that means. The next character is '$', the dollar symbol. It is the counterpart of the caret: it anchors the match to the end of the string. So from the range 0-9 and the dollar symbol we get another rule, which says that the string should end with a single digit. Since nothing else is allowed between the two anchors, the two rules together explain why the regular expression above represents a3, k3 and so on.
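The grouping and the [0-3] versus (0|1|2|3) equivalence described above can be checked with a small Python sketch. The variable names and the agree helper are hypothetical, picked only for this illustration.

```python
import re

# Grouping: each parenthesised part of the pattern can be read back after a match.
m = re.match(r"^([a-z])([0-9])$", "k3")
letter, digit = m.group(1), m.group(2)

# The bracket form and the spelled-out alternation should accept
# exactly the same single characters.
as_class = re.compile(r"^[0-3]$")
as_alternation = re.compile(r"^(0|1|2|3)$")


def agree(chars: str) -> bool:
    """Return True if both patterns give the same verdict for every character."""
    return all(
        bool(as_class.match(c)) == bool(as_alternation.match(c)) for c in chars
    )


print(letter, digit)
print(agree("0123456789ab"))
```

Both spellings behave the same here; the bracket form is simply shorter to write and easier to read.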
Uses of Regular Expressions:
Regular expressions play an important role in many branches of computer science. Compilers use them extensively in lexical analysis, the stage that splits the source code written by a programmer into tokens before it is parsed. In web development, regular expressions are also used for checking URLs, email addresses and so on.
So far we have had just a small introduction to regular expressions. There is a lot more to them, and it is impossible to fit it all in one post. If you are interested, I suggest you search the internet for more information on this topic.
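As a rough illustration of the web development use mentioned above, here is a small Python sketch. Both patterns are deliberately simplified assumptions of mine and do not follow the real URL or email standards, which allow far more than this; the function names are hypothetical too.

```python
import re

# Simplified, illustrative patterns (assumptions, not standards-compliant).
# Email-like: some name characters, an '@', then at least two dot-separated labels.
EMAIL_LIKE = re.compile(r"^[A-Za-z0-9._-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+$")
# URL-like: http or https, a host name, and an optional path without spaces.
URL_LIKE = re.compile(r"^https?://[A-Za-z0-9.-]+(/\S*)?$")


def looks_like_email(text: str) -> bool:
    """Return True if text has the rough shape of an email address."""
    return EMAIL_LIKE.match(text) is not None


def looks_like_url(text: str) -> bool:
    """Return True if text has the rough shape of an http(s) URL."""
    return URL_LIKE.match(text) is not None


print(looks_like_email("user@example.com"))
print(looks_like_url("http://example.com/path"))
```

A check like this only catches obvious typing mistakes. It cannot tell you whether the address or the page really exists.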
That's all folks!
Thank you for visiting, and don't forget to come back. Comments are welcome too, good or bad, so just comment.
HACK ALL YOU CAN!