Regular Expressions

Sooner or later you end up having to write a regular expression to parse something. On the web, it’s normally because the site or service you want to use doesn’t provide an API, so you end up scraping the page and ripping the data out. Sometimes you need one for something simple, like finding links inside a block of plain text.

The common problem is that I can never remember them and end up rewriting them each time, so I’m listing some here.

Most of these patterns contain slashes and similar characters, so I normally use # as the pattern delimiter to avoid escaping them with backslashes. I work mostly in PHP, so the code here is PHP.
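To show why the delimiter choice matters, here is a small sketch that writes the same pattern twice, once with / and once with # as the delimiter. The sample URL and variable names are made up for illustration.

```php
<?php
// Hypothetical sample input, not from any real feed.
$url = 'http://example.com/path';

// With / as the delimiter, every literal slash in the pattern needs a backslash.
$withSlashes = preg_match('/^https?:\/\/[^\/]+\//', $url);

// With # as the delimiter, the slashes can be written as they are.
$withHash = preg_match('#^https?://[^/]+/#', $url);
```

Both calls test the same thing (scheme, host, then a slash); the second is just easier to read.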

URLs in plain text

With everyone wanting their latest tweet on their website, this is a pretty common case. You can use the plain-text version of the tweet from the person’s RSS feed (no need for OAuth) and replace any URLs with a working anchor tag:

$tweet = preg_replace('@(https?://[-\w.]+(:\d+)?(/([\w/_.%-]*(\?\S+)?)?)?)@i', '<a href="$1">$1</a>', $tweet);

Note: this doesn't like brackets in the URL. A bracket in the path ends the match early, so the link gets cut short. Still need to fix that.
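One possible way around the bracket problem is to match more loosely and then tidy up in a callback. This is only a sketch under a few assumptions: linkify() is a hypothetical helper name, the input is taken to be raw plain text (not already HTML-escaped), and a closing bracket is kept only when the URL contains a matching opening one.

```php
<?php
// Hypothetical helper, not part of the original snippet.
function linkify(string $text): string
{
    // Host, optional port, then any run of characters that are not
    // whitespace, angle brackets or double quotes as the path/query.
    $pattern = '@https?://[-\w.]+(?::\d+)?(?:/[^\s<>"]*)?@i';

    return preg_replace_callback($pattern, function (array $m): string {
        $url = $m[0];
        $trail = '';

        // Peel off trailing characters that are probably sentence
        // punctuation rather than part of the URL.
        while ($url !== '') {
            $last = substr($url, -1);
            $isPunct = strpos('.,;:!?', $last) !== false;
            // A closing bracket is stray when it has no opening partner.
            $isStrayBracket = $last === ')'
                && substr_count($url, ')') > substr_count($url, '(');
            if (!$isPunct && !$isStrayBracket) {
                break;
            }
            $trail = $last . $trail;
            $url = substr($url, 0, -1);
        }

        // Escape the URL before putting it into HTML (assumes raw text input).
        $safe = htmlspecialchars($url, ENT_QUOTES);

        return '<a href="' . $safe . '">' . $safe . '</a>' . $trail;
    }, $text);
}
```

With this, a link such as http://en.wikipedia.org/wiki/PHP_(disambiguation) keeps its brackets, while the closing bracket in "(see http://example.com/a)" stays outside the anchor.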

Find date & time

Sometimes you need to pull a date and time out of a block of copy (like an email) in an automated way. This one finds dates written as YYYY-MM-DD, with or without an HH:MM:SS time after them:

preg_match_all('#([0-9]{4}-[0-9]{2}-[0-9]{2}(?:\s[0-9]{2}:[0-9]{2}:[0-9]{2})?)#', $copy, $matches);
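The pattern only checks the shape of the text, so something like 2012-13-45 would still match. Here is a rough sketch of how the matches could be filtered afterwards; extractDates() is a hypothetical name, and using checkdate() for validation is my own addition.

```php
<?php
// Hypothetical helper: returns only the matches that are real calendar dates.
function extractDates(string $copy): array
{
    $pattern = '#([0-9]{4})-([0-9]{2})-([0-9]{2})'
             . '(?:\s([0-9]{2}):([0-9]{2}):([0-9]{2}))?#';

    // PREG_SET_ORDER gives one array per match: [full, year, month, day, ...].
    preg_match_all($pattern, $copy, $matches, PREG_SET_ORDER);

    $found = [];
    foreach ($matches as $m) {
        // checkdate() takes month, day, year in that order.
        if (!checkdate((int) $m[2], (int) $m[3], (int) $m[1])) {
            continue;
        }
        // The time groups are only present when a time was matched.
        if (isset($m[4]) && ((int) $m[4] > 23 || (int) $m[5] > 59 || (int) $m[6] > 59)) {
            continue;
        }
        $found[] = $m[0];
    }

    return $found;
}
```

Called on a string containing "2012-03-14 09:26:53", "2012-03-20" and "2012-13-45", this should return the first two and drop the third.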

Post code check

Several of our sites have geolocation-based services (finding your nearest dealer, for example) and we like to build those into the main search facility.

We take what you entered and use a regex to guess whether it looks like a postcode.

preg_match('#^([a-z]{1,3}[0-9]{1,2})+#i', $search_string);

If it matches, then we look for a dealer nearby.
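As a rough sketch of how that check might sit inside a search handler: the function names and the two return values below are made up for illustration, and the real dealer lookup is not shown.

```php
<?php
// Hypothetical helper: true when the search starts with letters then digits.
function looksLikePostcode(string $search): bool
{
    // 1-3 letters followed by 1-2 digits at the start of the string.
    return preg_match('#^[a-z]{1,3}[0-9]{1,2}#i', trim($search)) === 1;
}

// Hypothetical router: decides which kind of search to run.
function routeSearch(string $search): string
{
    return looksLikePostcode($search) ? 'dealer-lookup' : 'site-search';
}
```

It is only a guess: "SW1A 1AA" goes to the dealer lookup and "brake pads" goes to the normal search, but so would a search like "mp3 player", so the dealer lookup needs to cope with input that isn't really a postcode.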