GIDForums  

  #1  
Old 12-Feb-2004, 19:42
Perculator

preg_replace question...


Hello -

I've put together a little PHP script to grab forum posts and parse the bulletin board codes into HTML, but one particular conversion seems to be making it screw up. I'd like to figure out what's going on, because I seem to be missing something completely here.

Here's a little snippet of the preg_replace portion...

PHP Code:

$original = array( // All tags are "/\[ start tag text \] between tag text \[\/ end tag text \]/"
                           "/\[[B|b]\](.*)\[\/[B|b]\]/",                        // Bold
                           "/\[[I|i]\](.*)\[\/[I|i]\]/",                        // Italic
                           "/\[[U|u]\](.*)\[\/[U|u]\]/",                        // Underline
                           "/\[[S|s][I|i][Z|z][E|e]=(\d*)\](.*)\[\/[S|s][I|i][Z|z][E|e]\]/",        // Size
                           "/\[[F|f][O|o][N|n][T|t]=(.*)\](.*)\[\/[F|f][O|o][N|n][T|t]\]/",        // Font
                           "/\[[C|c][O|o][L|l][O|o][R|r]=(\w*)\](.*)\[\/[C|c][O|o][L|l][O|o][R|r]\]/",    // Color
                           "/\[[U|u][R|r][L|l]=(.*)\](.*)\[\/[U|u][R|r][L|l]\]/",            // URL
                           "/\[[E|e][M|m][A|a][I|i][L|l]=(.*)\](.*)\[\/[E|e][M|m][A|a][I|i][L|l]\]/",    // Email
                           "/\[[I|i][M|m][G|g]\](.*)\[\/[I|i][M|m][G|g]\]/",                // Image
                           "/\[[C|c][O|o][D|d][E|e]\]([\s\S]*)\[\/[C|c][O|o][D|d][E|e]\]/",        // Code
                           "/\[[P|p][H|h][P|p]\]([\s\S]*)\[\/[P|p][H|h][P|p]\]/",            // PHP
                           "/\[[L|l][I|i][S|s][T|t]=[1]\]([\s\S]*)\[\/[L|l][I|i][S|s][T|t]=[1]\]/",    // Numberd List
                           "/\[[L|l][I|i][S|s][T|t]=[A|a]\]([\s\S]*)\[\/[L|l][I|i][S|s][T|t]=[A|a]\]/",    // Alpha List
                           "/\[[L|l][I|i][S|s][T|t]\]([\s\S]*)\[\/[L|l][I|i][S|s][T|t]\]/",        // Bullet List
                           "/\[\*\](.*)/",                                // List element
                           "/\[[Q|q][U|u][O|o][T|t][E|e]\]([\s\S]*)\[\/[Q|q][U|u][O|o][T|t][E|e]\]/"    // Quote
                         );
        $replace = array(
                           "<B>\\1</B>",                                // Bold
                           "<I>\\1</I>",                                // Italic
                           "<U>\\1</U>",                                // Underline
                           "<FONT SIZE=\\1>\\2</FONT>",                            // Size
                           "<FONT FACE=\"\\1\">\\2</FONT>",                        // Font
                           "<FONT COLOR=\\1>\\2</FONT>",                        // Color
                           "<A HREF=\"\\1\">\\2</A>",                            // URL
                           "<A HREF=\"mailto:\\1\">\\2</A>",                        // Email
                           "<IMG SRC=\"\\1\">",                                // Image
                           "<BLOCKQUOTE><PRE><FONT FACE=\"verdana,arial,helvetica\"
                 SIZE=\"1\">code:</FONT><HR>\\1<HR></PRE></BLOCKQUOTE>",         // Code
                           "<BLOCKQUOTE><PRE><FONT FACE=\"verdana,arial,helvetica\"
                 SIZE=\"1\">php:</FONT><HR>\\1<HR></PRE></BLOCKQUOTE>",            // PHP
                           "<OL TYPE=\"1\">\\1</OL>",                            // Numberd List
                           "<OL TYPE=\"A\">\\1</OL>",                            // Alpha List
                           "<UL>\\1</UL>",                                // Bullet List
                           "<LI>\\1",                                    // List element
                           "<BLOCKQUOTE><PRE><FONT FACE=\"verdana,arial,helvetica\"
                 SIZE=\"1\">quote:</FONT><HR>\\1<HR></PRE></BLOCKQUOTE>"        // Quote
                        );

        $pagetext = preg_replace($original, $replace, $pagetext); 



For the most part this works great, but the URL replacement (and possibly the email one too) seems to have problems. For the following BB post:

Code:
...about it [url=http://www.mysite.com/forums/showthread.php?s=&threadid=3357]here[/url]. For more information, check out the post [url=http://www.mysite.com/forums/showthread.php?s=&threadid=3598]here[/url]. Do your...

It returns this result:

HTML Code:
...about it <A HREF="http://www.mysite.com/forums/showthread.php?s=&threadid=3357]here[/url]. For more information, check out the post [url=http://www.mysite.com/forums/showthread.php?s=&threadid=3598">here</A>. Do your...

So what I would like to know is: why is it skipping the first match for the URL regex? Does anyone have any ideas?

Much thanks.

-Perc
  #2  
Old 12-Feb-2004, 19:47
Perculator
Oh, and as a side note on cleaning up the PHP above: is there a better way to match both upper and lower case in a regex than [U|u][R|r][L|l]?

Thanks.

-Perc
  #3  
Old 13-Feb-2004, 02:54
JdS

PCRE (Perl Compatible Regular Expressions) Pattern Modifiers


Hello Perculator,

You need what are known as PCRE pattern modifiers in your patterns above.

An example:

PHP Code:

<?php

// the 2 following patterns do the same thing
$find[0] = "/[a-zA-Z]+/";
$find[1] = "/[a-z]+/i"; // by simply appending the "i" modifier to the pattern,
                        // we match both, upper and lower case letters!

// so... to edit one of your existing patterns e.g.
$original[0] = "/\[[B|b]\](.*)\[\/[B|b]\]/";
// you can instead do this..
$original[0] = "/\[b\](.*)\[\/b\]/i";
?>
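A small aside on the original [B|b] style, offered as a sketch rather than a correction of the script above: inside a character class the | is a literal character, not alternation, so [B|b] also accepts a pipe. The variable names below are made up for illustration.

```php
<?php
// Inside [...] the pipe is a literal, so [B|b] accepts "B", "b" and "|".
$classPattern = '/^\[[B|b]\]$/';

// The "i" modifier gives case-insensitivity without that side effect.
$modifierPattern = '/^\[b\]$/i';

$upperWithClass = preg_match($classPattern, '[B]');     // matches
$pipeWithClass  = preg_match($classPattern, '[|]');     // also matches, probably unintended
$upperWithI     = preg_match($modifierPattern, '[B]');  // matches
$pipeWithI      = preg_match($modifierPattern, '[|]');  // does not match
```

So besides being shorter, the "i" version is slightly stricter about what counts as a tag.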



As for your other problem (the skipped URL match), this is again a pattern modifier issue: specifically, the Ungreedy modifier, "U".

Again, to show you how you could use these PCRE pattern modifiers more efficiently in your existing PHP script snippet above, refer to the example code below:

PHP Code:

<?php
$original[6] = "/\[[U|u][R|r][L|l]=(.*)\](.*)\[\/[U|u][R|r][L|l]\]/";  // URL
// you can change this to something like...
$original[6] = "/\[url=(.*)\](.*)\[\/url\]/iU"; // "iU" pattern modifiers
?>


For more detailed information on the various available PCRE Pattern Modifiers, please refer to this page: http://www.php.net/manual/en/pcre.pattern.modifiers.php
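To see the difference the "U" modifier makes, here is a minimal sketch that runs the URL pattern with and without it. The sample text in $post is an assumption modelled on the question (two [url] tags on one line); $link, $greedy and $ungreedy are illustrative names only.

```php
<?php
// Sample post with two [url] tags on one line (modelled on the question).
$post = 'about it [url=http://www.mysite.com/forums/showthread.php?s=&threadid=3357]here[/url]. '
      . 'check out [url=http://www.mysite.com/forums/showthread.php?s=&threadid=3598]here[/url].';

$link = '<A HREF="\\1">\\2</A>';

// Greedy (default): the first (.*) runs on to the last "]" that still
// lets the rest of the pattern match, swallowing the first closing tag.
$greedy = preg_replace('/\[url=(.*)\](.*)\[\/url\]/i', $link, $post);

// Ungreedy via "U": every quantifier stops as early as it can,
// so each [url] pair is replaced on its own.
$ungreedy = preg_replace('/\[url=(.*)\](.*)\[\/url\]/iU', $link, $post);

echo $greedy, "\n", $ungreedy, "\n";
```

The greedy run produces a single link whose HREF contains everything up to the second URL, which is the same shape as the broken output in the first post; the ungreedy run produces two separate links.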
  #4  
Old 13-Feb-2004, 05:58
Garth Farley
Yup, that's exactly it. Quantifiers in the preg functions are greedy by default, meaning they match as much as possible, so the U modifier tells them not to be.
GF
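For what it's worth, there are other ways to get the same effect without the U modifier. The sketch below is only an illustration with made-up sample data (the a.example URLs and variable names are assumptions): a trailing ? makes a single quantifier lazy, and a negated character class stops the URL capture at the first ].

```php
<?php
$post = '[url=http://a.example/1]one[/url] and [url=http://a.example/2]two[/url]';
$link = '<a href="$1">$2</a>';

// Lazy quantifiers: ".*?" takes as little as possible, no U modifier needed.
$lazy = preg_replace('/\[url=(.*?)\](.*?)\[\/url\]/i', $link, $post);

// Negated class: "[^\]]*" can never cross the "]" that closes the opening tag.
$negated = preg_replace('/\[url=([^\]]*)\](.*?)\[\/url\]/i', $link, $post);

echo $lazy, "\n", $negated, "\n";
```

Note that with U switched on the meaning of ? flips (it makes a quantifier greedy again), so it is probably best to pick one style and stick with it.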
  #5  
Old 13-Feb-2004, 22:19
Perculator
Spectacular. Worked like a charm.

My last question, as a follow-up: why were the bold and italic replacements not being greedy? There were several bold and italic sections, but they all converted properly. I think that's what was throwing me off the most: it was working for some and not for others.

Regardless, thank you very much for the quick and useful reply...

-Perc
  #6  
Old 15-Feb-2004, 05:05
JdS
Quote:
Originally Posted by Perculator
... is why were the bold and italic replacements not being greedy?...
I am quite certain they are (being GREEDY)... as long as they are on a single line.

What I actually mean: since you did NOT use the "s" (or PCRE_DOTALL) pattern modifier in your patterns, multiple [b] tags will work quite well, as LONG as each pair appears on a separate line - but the minute there are multiple [b] tags on a single line, you're screwed!

Example follows:

PHP Code:

<?php

$find[0]        =  '/\[b\](.*)\[\/b\]/i';
$replace[0]     =  "<strong>$1</strong>";

$before         =  "[b]Bold[/b] this.\r\nAnd [b]this[/b].";

$after          =  preg_replace( $find, $replace, $before );

/*
With each [b] pair on its own line, as in $before above, $after is:

<strong>Bold</strong> this.
And <strong>this</strong>.

With both pairs on a single line ("[b]Bold[/b] this. And [b]this[/b]."),
the greedy (.*) runs from the first [b] to the last [/b] instead:

<strong>Bold[/b] this. And [b]this</strong>.

*/

// So, ideally, $find[0] should look like this:
$find[0]       =  "#\[b\](.*)\[/b\]#isU";
?>
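As a rough illustration of what the "s" and "U" modifiers each contribute, the sketch below runs the bold pattern over a made-up string with two pairs on one line and a third pair that spans a line break. The sample text and the names $text, $plain and $fixed are assumptions for the example only.

```php
<?php
$text = "[b]first[/b] and [b]second[/b]\n[b]spans\ntwo lines[/b]";

// Without "s" the dot stops at "\n"; without "U" the match is greedy.
$plain = preg_replace('#\[b\](.*)\[/b\]#i', '<strong>$1</strong>', $text);

// With "s" the dot also matches newlines; "U" keeps each match short.
$fixed = preg_replace('#\[b\](.*)\[/b\]#isU', '<strong>$1</strong>', $text);

echo $plain, "\n---\n", $fixed, "\n";
```

In the first run the two pairs on line one collapse into a single <strong> element and the pair that spans two lines is left untouched; in the second run all three pairs are converted separately.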

  #7  
Old 16-Feb-2004, 22:25
Perculator
Very interesting, and very cool. Much thanks JDS...

-Perc
 
 

All times are GMT -6.

