Regular Expressions Tutorial

I have searched the web far and wide for a good tutorial on PHP regular expressions, and I have come up with a multitude of sites. However, I needed just a little bit of information from each of them, and I ended up moving between 10 different web pages to get the information I needed at any particular time. This tutorial is a collation of all those bits of information. Some of it is my own work, but it is mostly a collection of good tutorials available out there. In order to give the authors credit for their work, I have included ALL the links to those pages, and if anyone feels that this is an outrage, let me know and I will take down the relevant information. So here goes...

Basic Syntax of Regular Expressions (as from PHPBuilder.com)

First of all, let's take a look at two special symbols: '^' and '$'. They indicate the start and the end of a string, respectively, like this:

 * "^The": matches any string that starts with "The";
 * "of despair$": matches a string that ends in the substring "of despair";
 * "^abc$": a string that starts and ends with "abc" -- that could only be "abc" itself!
 * "notice": a string that has the text "notice" in it.

You can see that if you don't use either of the two characters we mentioned, as in the last example, you're saying that the pattern may occur anywhere inside the string -- you're not "hooking" it to any of the edges.

There are also the symbols '*', '+', and '?', which denote the number of times a character or a sequence of characters may occur. What they mean is: "zero or more", "one or more", and "zero or one." Here are some examples:

 * "ab*": matches a string that has an a followed by zero or more b's ("a", "ab", "abbb", etc.);
 * "ab+": same, but there's at least one b ("ab", "abbb", etc.);
 * "ab?": there might be a b or not;
 * "a?b+$": a possible a followed by one or more b's ending a string.

You can also use bounds, which come inside braces and indicate ranges in the number of occurrences:

 * "ab{2}": matches a string that has an a followed by exactly two b's ("abb");
 * "ab{2,}": there are at least two b's ("abb", "abbbb", etc.);
 * "ab{3,5}": from three to five b's ("abbb", "abbbb", or "abbbbb").

Note that you must always specify the first number of a range (e.g., "{0,2}", not "{,2}"). Also, as you might have noticed, the symbols '*', '+', and '?' have the same effect as using the bounds "{0,}", "{1,}", and "{0,1}", respectively.

Now, to quantify a sequence of characters, put them inside parentheses:

 * "a(bc)*": matches a string that has an a followed by zero or more copies of the sequence "bc";
 * "a(bc){1,5}": one through five copies of "bc".

There's also the '|' symbol, which works as an OR operator:

 * "hi|hello": matches a string that has either "hi" or "hello" in it;
 * "(b|cd)ef": a string that has either "bef" or "cdef";
 * "(a|b)*c": a string that has a sequence of a's and b's, in any order, ending in a c.

A period ('.') stands for any single character:

 * "a.[0-9]": matches a string that has an a followed by one character and a digit;
 * "^.{3}$": a string with exactly 3 characters.
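The patterns above can be tried out with PHP's preg_match(), which needs each pattern wrapped in delimiters (see Perl Style Delimiters further down). This is a minimal sketch assuming PHP 7.1 or later; the subject strings and the $cases/$results names are made up for illustration. It also shows that an unanchored pattern such as "ab{2}" still matches inside "abbb".

```php
<?php
// Patterns from the lists above, wrapped in '/' delimiters for preg_match().
// preg_match() returns 1 if the pattern occurs anywhere in the subject, else 0.
$cases = [
    ['/^The/',        'The end'],           // anchored at the start
    ['/of despair$/', 'winter of despair'], // anchored at the end
    ['/^abc$/',       'abcd'],              // only "abc" itself would match
    ['/ab+/',         'a'],                 // needs at least one b
    ['/ab{2}/',       'abbb'],              // unanchored: "abb" occurs inside "abbb"
    ['/^ab{2}$/',     'abbb'],              // anchored: exactly two b's required
    ['/a(bc){1,5}/',  'abcbc'],             // two copies of "bc"
    ['/(b|cd)ef/',    'cdef'],              // second alternative
    ['/^(a|b)*c$/',   'abbac'],             // any mix of a's and b's, then c
    ['/^.{3}$/',      'xyz'],               // exactly three characters
];

$results = [];
foreach ($cases as [$pattern, $subject]) {
    $hit = preg_match($pattern, $subject);
    $results[] = $hit;
    printf("%-16s %-20s %s\n", $pattern, $subject, $hit ? 'match' : 'no match');
}
```

Adding '^' and '$' around "ab{2}" is what turns it into a test for exactly "abb".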
Bracket expressions specify which characters are allowed in a single position of a string:

 * "[ab]": matches a string that has either an a or a b (that's the same as "a|b");
 * "[a-d]": a string that has lowercase letters 'a' through 'd' (that's equal to "a|b|c|d" and even "[abcd]");
 * "^[a-zA-Z]": a string that starts with a letter;
 * "[0-9]%": a string that has a single digit before a percent sign;
 * ",[a-zA-Z0-9]$": a string that ends in a comma followed by an alphanumeric character.

You can also list which characters you DON'T want -- just use a '^' as the first symbol in a bracket expression (e.g., "%[^a-zA-Z]%" matches a string with a character that is not a letter between two percent signs).

In order to be taken literally, you must escape the characters "^.[$()|*+?{\" with a backslash ('\'), as they have special meaning. On top of that, you must escape the backslash character itself in double-quoted PHP strings, so, for instance, the regular expression "(\$|¥)[0-9]+" would have the function call ereg("(\\$|¥)[0-9]+", $str). (What string does that validate?)
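ereg() was deprecated in PHP 5.3 and removed in PHP 7.0, so to try the bracket expressions and the escaping rule in current PHP the patterns need preg_match() and '/' delimiters. A rough sketch follows; the subject strings and variable names are invented for illustration, and preg_quote() is shown as one way to escape text you want matched literally.

```php
<?php
// Bracket expressions from the list above; the subject strings are made up.
$checks = [
    preg_match('/^[a-zA-Z]/', 'Hello'),        // starts with a letter
    preg_match('/[0-9]%/', 'growth of 5%'),    // a digit right before '%'
    preg_match('/,[a-zA-Z0-9]$/', 'a,b,c'),    // ends in a comma + alphanumeric
    preg_match('/%[^a-zA-Z]%/', '100%7%'),     // non-letter between two '%'
    preg_match('/%[^a-zA-Z]%/', '100%x%'),     // a letter there: no match
];

// The ereg() escaping example, rewritten for preg_match(). In single quotes
// PHP passes "\$" through unchanged, so only the regex-level escape remains.
$price = preg_match('/(\$|¥)[0-9]+/', 'Total: $42', $m) ? $m[0] : null;

// preg_quote() backslash-escapes regex metacharacters (and the delimiter
// given as its second argument), so arbitrary text can be matched literally.
$literal = preg_quote('1.5*(x+y)?', '/');
$found = preg_match('/' . $literal . '/', 'solve 1.5*(x+y)? now');

print_r($checks);
echo $price, "\n", $literal, "\n", $found, "\n";
```

So the pattern in the question validates a dollar or yen sign followed by one or more digits.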
Examples of valid patterns:

 * /<\/\w+>/
 * |(\d{3})-\d+|Sm
 * /^(?i)php[34]/
 * {^\s+(\s+)?$}

Examples of invalid patterns:

 * /href='(.*)' - missing ending delimiter
 * /\w+\s*\w+/J - unknown modifier 'J' (PHP 7.2 and later accept J as a modifier that allows duplicate subpattern names, so this one is only invalid in older versions)
 * 1-\d3-\d3-\d4| - missing starting delimiter

Some useful PHP Keywords and their use (php.net man pages)

preg_split (PHP 3 >= 3.0.9, PHP 4)
preg_split -- Split string by a regular expression

Description

array preg_split ( string pattern, string subject [, int limit [, int flags]])

Returns an array containing substrings of subject split along boundaries matched by pattern. If limit is specified, then only substrings up to limit are returned, with the rest of the string placed in the last substring; a limit of -1 actually means "no limit", which is useful for specifying the flags. flags can be any combination of the following flags (combined with the bitwise | operator):

 * PREG_SPLIT_NO_EMPTY: If this flag is set, only non-empty pieces will be returned by preg_split().
 * PREG_SPLIT_DELIM_CAPTURE: If this flag is set, parenthesized expressions in the delimiter pattern will be captured and returned as well. This flag was added in PHP 4.0.5.
 * PREG_SPLIT_OFFSET_CAPTURE: If this flag is set, for every occurring match the corresponding string offset will also be returned. Note that this changes the return value into an array where every element is an array consisting of the matched string at offset 0 and its string offset into subject at offset 1. This flag is available since PHP 4.3.0.

Example 1. preg_split() example: Get the parts of a search string
<?php
// split the phrase by any number of commas or space characters,
// which include " ", \r, \t, \n and \f
$keywords = preg_split ("/[\s,]+/", "hypertext language, programming");
?>
Example 2. Splitting a string into matches and their offsets
<?php
$str = 'hypertext language programming';
$chars = preg_split('/ /', $str, -1, PREG_SPLIT_OFFSET_CAPTURE);
print_r($chars);
?>
will yield:

Array ( [0] => Array ( [0] => hypertext [1] => 0 )
        [1] => Array ( [0] => language [1] => 10 )
        [2] => Array ( [0] => programming [1] => 19 ) )

Note: The flags parameter was added in PHP 4 Beta 3.

Example 3. Splitting a string into component characters
<?php
$str = 'string';
$chars = preg_split('//', $str, -1, PREG_SPLIT_NO_EMPTY);
print_r($chars);
?>
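PREG_SPLIT_DELIM_CAPTURE and a positive limit are described above but not shown in the examples. A minimal sketch of both follows; the arithmetic expression and the $tokens/$parts names are only illustrative.

```php
<?php
// PREG_SPLIT_DELIM_CAPTURE keeps the parenthesized part of the delimiter
// (here the operator) as a separate element; PREG_SPLIT_NO_EMPTY drops "" pieces.
$tokens = preg_split('/\s*([+\-*\/])\s*/', '3 + 4*2 - 1', -1,
    PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
print_r($tokens);

// A positive limit caps the number of pieces; the last one holds the rest.
$parts = preg_split('/,/', 'a,b,c,d', 2);
print_r($parts);
```

Without the parentheses in the delimiter pattern, the operators would simply be discarded.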
preg_match (PHP 3 >= 3.0.9, PHP 4)
preg_match -- Perform a regular expression match

Description

int preg_match ( string pattern, string subject [, array matches [, int flags]])

Searches subject for a match to the regular expression given in pattern. If matches is provided, then it is filled with the results of the search. $matches[0] will contain the text that matched the full pattern, $matches[1] will have the text that matched the first captured parenthesized subpattern, and so on.

flags can be the following flag:

 * PREG_OFFSET_CAPTURE: If this flag is set, for every occurring match the corresponding string offset will also be returned. Note that this changes the value of matches into an array where every element is an array consisting of the matched string at offset 0 and its string offset into subject at offset 1. This flag is available since PHP 4.3.0.

The flags parameter is available since PHP 4.3.0.

preg_match() returns the number of times pattern matches. That will be either 0 (no match) or 1, because preg_match() stops searching after the first match. preg_match_all(), on the contrary, continues until it reaches the end of subject. preg_match() returns FALSE if an error occurred.

Tip: Do not use preg_match() if you only want to check whether one string is contained in another. Use strpos() or strstr() instead, as they will be faster.

Example 1. Find the string of text "php"
<?php
// The "i" after the pattern delimiter indicates a case-insensitive search
if (preg_match ("/php/i", "PHP is the web scripting language of choice.")) {
print "A match was found.";
} else {
print "A match was not found.";
}
?>
Example 2. Find the word "web"
<?php
/* The \b in the pattern indicates a word boundary, so only the distinct
* word "web" is matched, and not a word partial like "webbing" or "cobweb" */
if (preg_match ("/\bweb\b/i", "PHP is the web scripting language of choice.")) {
print "A match was found.";
} else {
print "A match was not found.";
}
if (preg_match ("/\bweb\b/i", "PHP is the website scripting language of choice.")) {
print "A match was found.";
} else {
print "A match was not found.";
}
?>
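The return values and the PREG_OFFSET_CAPTURE flag described above can be sketched as follows, reusing the sentence from Examples 1 and 2. This assumes PHP 5.4 or later (where the matches argument of preg_match_all() is optional); the offsets are byte positions, and the variable names are illustrative.

```php
<?php
$subject = 'PHP is the web scripting language of choice.';

// PREG_OFFSET_CAPTURE: each entry of $m becomes [matched text, byte offset].
preg_match('/(web) (\w+)/', $subject, $m, PREG_OFFSET_CAPTURE);
print_r($m);

// preg_match() stops after the first match, so it returns 0 or 1;
// preg_match_all() scans the whole subject and returns the number of matches.
$first = preg_match('/i/', $subject);
$all   = preg_match_all('/i/', $subject);

// An invalid pattern makes preg_match() return false (and emit a warning,
// silenced here with @).
$bad = @preg_match('/(unclosed/', $subject);

// For a plain "is this substring present?" test, strpos() is enough.
$hasWeb = strpos($subject, 'web') !== false;

var_dump($first, $all, $bad, $hasWeb);
```

Comparing the result with === false is the only reliable way to tell an error apart from "no match", since both are falsy.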
Example 3. Getting the domain name out of a URL
<?php
// get host name from URL
preg_match("/^(http:\/\/)?([^\/]+)/i",
"http://www.php.net/index.html", $matches);
$host = $matches[2];
// get last two segments of host name
preg_match("/[^\.\/]+\.[^\.\/]+$/", $host, $matches);
echo "domain name is: {$matches[0]}\n";
?>
This example will produce:

domain name is: php.net

Perl Style Delimiters (as from crazygrrl.com)

When using Perl-style matching, the pattern also has to be enclosed by special delimiters. The default is the forward slash, though you can use others. For example:

/colou?r/

Usually you'll want to stick with the default, but if you need to use the forward slash a lot in the actual pattern (especially if you're dealing with pathnames), you might want to use something else:

!/root/home/random!

To make a match case-insensitive, all you need to do is append the option i to the pattern:

/colou?r/i

Perl-style functions support these extra metacharacters (this is not a full list):

 * \b  A word boundary, the spot between word (\w) and non-word (\W) characters.
 * \B  A non-word boundary.
 * \d  A single digit character.
 * \D  A single non-digit character.
 * \n  The newline character (ASCII 10).
 * \r  The carriage return character (ASCII 13).
 * \s  A single whitespace character.
 * \S  A single non-whitespace character.
 * \t  The tab character (ASCII 9).
 * \w  A single word character: alphanumeric and underscore.
 * \W  A single non-word character.

Example: /\bhomer\b/

 * "Have a donut, Homer": no match (the pattern is case-sensitive)
 * "A tale of homeric proportions!": no match ("homer" is followed by another word character, so there is no word boundary)
 * "Do you think he can hit a homer?": match

Corresponding to ereg()
is preg_match(). Syntax:

preg_match(pattern (string), target (string), optional_array);

Example:
<?php
$pattern = "/\b(do(ugh)?nut)\b.*\b(Homer|Fred)\b/i";
$target = "Have a donut, Homer.";
if (preg_match($pattern, $target, $matches)) {
    // $matches[0] is the whole match, $matches[1..3] the parenthesized groups
    print("<P>Match: $matches[0]</P>");
    print("<P>Pastry: $matches[1]</P>");
    print("<P>Variant: $matches[2]</P>");
    print("<P>Name: $matches[3]</P>");
} else {
    print("No match.");
}
?>

Results:

Match: donut, Homer
Pastry: donut
Variant: [blank because there was no "ugh"]
Name: Homer

If you use the $target "Doughnut, Frederick?" there will be no match, since there has to be a word boundary after Fred. But "Doughnut, fred?" will match, since we've specified it to be case-insensitive.

Contributed code which is applicable (and very useful!):

 * mkr at binarywerks dot dk: A (AFAIK) correct implementation of IPv4 validation. This one supports optional ranges (CIDR notation), and it validates numbers from 0-255 only in the address part and 1-32 only after the /.
 * plenque at hotmail dot com: I wrote a function that checks if a given regular expression is valid. I think some of you might find it useful. It changes the error_handler and restores it; I didn't find any other way to do it.

PHP Get_title tag code which uses simple regex and nice PHP string functions (Zend PHP):
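<?php
function get_title_tag($chaine){
    $fp = fopen($chaine, 'r');
    if (!$fp) {
        return false;
    }
    $contenu = '';
    // read line by line until the closing title tag has been seen
    while (!feof($fp)) {
        $contenu .= fgets($fp, 1024);
        if (stristr($contenu, '</title>')) {
            break;
        }
    }
    fclose($fp);
    // i: case-insensitive; s: "." also matches newlines; ".*?" stops at the first </title>
    if (preg_match('/<title>(.*?)<\/title>/is', $contenu, $out)) {
        return $out[1];
    }
    return false;
}
?>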
IPv4 validation (mkr at binarywerks dot dk):
<?php
function valid_ipv4($ip_addr)
{
    // one octet, 0-255
    $num = "([0-9]|1?\d\d|2[0-4]\d|25[0-5])";
    // optional CIDR prefix length, 1-32
    $range = "([1-9]|1\d|2\d|3[0-2])";
    if (preg_match("/^$num\.$num\.$num\.$num(\/$range)?$/", $ip_addr)) {
        return 1;
    }
    return 0;
}

$ip_array[] = "127.0.0.1";
$ip_array[] = "127.0.0.256";
$ip_array[] = "127.0.0.1/36";
$ip_array[] = "127.0.0.1/1";
foreach ($ip_array as $ip_addr) {
    if (valid_ipv4($ip_addr)) {
        echo "$ip_addr is valid<BR>\n";
    } else {
        echo "$ip_addr is NOT valid<BR>\n";
    }
}
?>

Regular expression check (plenque at hotmail dot com):
<?php
function IsRegExp($sREGEXP)
{
    // let TrapError() count any warning preg_match() raises for a bad pattern
    set_error_handler("TrapError");
    preg_match($sREGEXP, "");
    restore_error_handler();
    // called with no arguments, TrapError() returns the error count and resets it
    return !TrapError();
}

function TrapError()
{
    static $iERRORES = 0;
    if (!func_num_args()) {
        $iRETORNO = $iERRORES;
        $iERRORES = 0;
        return $iRETORNO;
    } else {
        // called as the error handler: count the error
        $iERRORES++;
    }
}
?>
(As from …)

My Own 'Visitor Trac' code which uses regex XML parsing methods:
<?php
$referer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';
$filename = $_SERVER['REMOTE_ADDR'] . '.txt';
//print_r($_SERVER);
if (file_exists($filename)){
$lastvisit = filectime($filename);
$currentdate = date('U');
$difference = round(($currentdate - $lastvisit)/86400); // 86400 seconds in a day
if ($difference > 7) {
unlink($filename);
$fp = fopen($filename, "a");
}
else $fp = fopen($filename, "a");
}
else $fp = fopen($filename, "a");
if (!$referer) $url_test = 'http://dinki.mine.nu/weblog/';
else $url_test = $referer;
$new_title = return_title ($url_test);
//print $new_title;
$new_name = stripslashes("<beg>$new_title\n");
$new_URL = stripslashes("<beg>$referer\n");
fwrite($fp,$new_URL);
fwrite($fp,$new_name);
fclose($fp);
$fp = fopen($filename, "r");
$file = implode('', file ($filename));
$foo = preg_split("/<beg>/",$file);
$number = count($foo);
//print $number;
if ($number > 11) {
fclose($fp);
$fp = fopen($filename, "w");
$count = $number - 10;
while ($count < $number) {
$print1 = $foo[$count];
$print2 = $foo[$count+1];
print " <img src = arrow.gif> ";
print "<a href=$print1>$print2</a>"; //print $count;
$count += 2;
$new_name = stripslashes("<beg>$print2");
$new_URL = stripslashes("<beg>$print1");
fwrite($fp,$new_URL);
fwrite($fp,$new_name);
}
fclose($fp);
}
//print_r($foo);
else {
$count = 1;
while ($count < $number - 1) { // stop at the last complete URL/title pair
$print1 = $foo[$count];
$print2 = $foo[$count+1];
print " <img src = arrow.gif> ";
print "<a href=$print1>$print2</a>"; //print $count;
$count += 2;
}
fclose($fp);
}
function return_title($url) {
$title = ''; // returned unchanged if no <title> line is found
$array = file ($url);
for ($i = 0; $i < count($array); $i++)
{
if (preg_match("/<title>(.*)<\/title>/i",$array[$i], $tag_contents)) {
$title = $tag_contents[1];
$title = strip_tags($title);
}
}
return $title;
}
?>
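return_title() above scans the fetched page one line at a time, so a title element split across several lines is never matched. One possible alternative, sketched under the assumption that the whole page is already in a string, is to match across lines with the s modifier and a lazy .*? (extract_title() and the sample $page are hypothetical, not part of the original code; PHP 7.1 or later for the nullable return type).

```php
<?php
// Hypothetical helper: pulls the <title> text out of an HTML string.
function extract_title(string $html): ?string
{
    // i: case-insensitive tag names; s: "." also matches newlines;
    // ".*?" is lazy, so it stops at the first closing tag.
    if (preg_match('/<title[^>]*>(.*?)<\/title>/is', $html, $m)) {
        return trim(strip_tags($m[1]));
    }
    return null;
}

$page = "<html><head>\n<TITLE>\n  My Weblog\n</TITLE>\n</head></html>";
echo extract_title($page), "\n";
```

Returning null rather than an empty string lets the caller distinguish "no title element" from an empty title.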
Good online articles as reference or extra reading:

# O'Reilly Pocket Reference - PHP Regular Expressions
# A very nice article on PHP Regular Expressions from DevArticle.com
# A good run-down of PHP regular expressions with emphasis on code
# Regular Expression Creator and Editor from the makers of PHPEdit
# Regular Expressions Library (with over 430 expressions and growing!!)