grep puzzle

  • KhanyBoy

    grep puzzle

    Hi,

    This should test you gurus. I want a function that accepts text as an
    argument and converts every & into &amp;, except where it already starts
    an HTML entity such as &nbsp;, &quot;, and of course &amp;.

    If there is already a PHP function for this I would like to know, but
    if not, what is the grep (regular expression) equivalent?

    Thanks
  • Alvaro G Vicario

    #2
    Re: grep puzzle

    *** KhanyBoy wrote (10 Jun 2004 14:22:24 -0700):
    > This should test you gurus. I want a function that accepts text as an
    > argument and converts every & into &amp;, except where it already starts
    > an HTML entity such as &nbsp;, &quot;, and of course &amp;.

    I can't figure out how you manage to get such garbled input. Anyway, I
    guess a combination of html_entity_decode() and htmlentities() should
    help.


    --
    --
    -- Álvaro G. Vicario - Burgos, Spain
    --
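
A minimal sketch of the decode-then-encode combination suggested above, assuming the input is UTF-8. The function name normalize_entities() is made up for illustration and is not from the thread.

```php
<?php
// Decode every existing entity first, then encode exactly once,
// so that "&amp;" never turns into "&amp;amp;".
// normalize_entities() is a hypothetical name; UTF-8 input is an assumption.
function normalize_entities(string $text): string
{
    $decoded = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
    return htmlentities($decoded, ENT_QUOTES, 'UTF-8');
}

echo normalize_entities('Fish & chips &amp; peas &quot;to go&quot;'), "\n";
// prints: Fish &amp; chips &amp; peas &quot;to go&quot;
```

Note that this round trip touches more than ampersands: it also encodes <, >, quotes and accented characters. On PHP 5.2.3 or later, htmlspecialchars() accepts a fourth $double_encode argument, and passing false may already cover the original question, though it too escapes <, > and quotes.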


    • Justin Koivisto

      #3
      Re: grep puzzle

      KhanyBoy wrote:
      > This should test you gurus. I want a function that accepts text as an
      > argument and converts every & into &amp;, except where it already starts
      > an HTML entity such as &nbsp;, &quot;, and of course &amp;.
      >
      > If there is already a PHP function for this I would like to know, but
      > if not, what is the grep (regular expression) equivalent?

      OK, this does that, but it may not be a very elegant solution. I recently
      needed the same functionality for a project involving osCommerce.

      function ampersandFix($x){
          // Step 1: encode every ampersand, including those in existing entities.
          $x=str_replace('&','&amp;',$x);
          // Step 2: undo the encoding wherever a known entity name follows.
          $pattern='`&amp;(#[0-9]{2,3}|aacute|acirc|acute|aelig|agrave|amp'.
              '|aring|atilde|auml|brvbar|brkbar|ccedil|cedil|cent'.
              '|copy|curren|deg|divide|eacute|ecirc|egrave|eth|euml'.
              '|frac12|frac14|frac34|gt|iacute|icirc|iexcl|igrave'.
              '|iquest|iuml|laquo|lt|macr|hibar|micro|middot|nbsp|not'.
              '|ntilde|oacute|ocirc|ograve|ordf|ordm|oslash|otilde'.
              '|ouml|para|plusmn|pound|quot|raquo|reg|sect|shy|sup1|sup2'.
              '|sup3|szlig|thorn|times|uacute|ucirc|ugrave|uml'.
              '|die|uuml|yacute|yen|yuml);`i';
          $replace='&$1;';
          return preg_replace($pattern,$replace,$x);
      }


      --
      Justin Koivisto - spam@koivi.com
      PHP POSTERS: Please use comp.lang.php for PHP related questions,
      alt.php* groups are not recommended.


      • Christian Fersch

        #4
        Re: grep puzzle

        Justin Koivisto wrote:
        > KhanyBoy wrote:
        >> If there is already a PHP function for this I would like to know, but
        >> if not, what is the grep (regular expression) equivalent?
        >
        > OK, this does that, but it may not be a very elegant solution. I recently
        > needed the same functionality for a project involving osCommerce.
        >
        > function ampersandFix($x){
        > $x=str_replace('&','&amp;',$x);
        > $pattern='`&amp;(#[0-9]{2,3}|aacute|acirc|acute|aelig|agrave|amp'.
        > '|aring|atilde|auml|brvbar|brkbar|ccedil|cedil|cent'.
        > '|copy|curren|deg|divide|eacute|ecirc|egrave|eth|euml'.
        > '|frac12|frac14|frac34|gt|iacute|icirc|iexcl|igrave'.
        > '|iquest|iuml|laquo|lt|macr|hibar|micro|middot|nbsp|not'.
        > '|ntilde|oacute|ocirc|ograve|ordf|ordm|oslash|otilde'.
        > '|ouml|para|plusmn|pound|quot|raquo|reg|sect|shy|sup1|sup2'.
        > '|sup3|szlig|thorn|times|uacute|ucirc|ugrave|uml'.
        > '|die|uuml|yacute|yen|yuml);`i';
        > $replace='&$1;';
        > return preg_replace($pattern,$replace,$x);
        > }

        A lot faster, but not as accurate:
        $text = preg_replace('!&(?![#a-z0-9]{1,7};)!i','&amp;',$text);

        You could also use this method (it's a negative lookahead assertion) with
        Justin's function, which will still be a lot faster than his hack ;)

        Greetings Christian.
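
To make the lookahead idea above concrete, here is a small sketch that wraps the same pattern in a function and shows where the "not as accurate" caveat bites. The name escape_bare_ampersands() is hypothetical.

```php
<?php
// Encode only ampersands that do not already start something shaped like an entity.
// (?! ... ) is a negative lookahead: it inspects the text after "&" without
// consuming it, so existing references are left untouched.
// escape_bare_ampersands() is a hypothetical name, not code from the thread.
function escape_bare_ampersands(string $text): string
{
    return preg_replace('/&(?![#a-z0-9]{1,7};)/i', '&amp;', $text);
}

echo escape_bare_ampersands('Tom & Jerry &amp; friends &#169; 2004'), "\n";
// prints: Tom &amp; Jerry &amp; friends &#169; 2004

// The inaccuracy: "&bogus;" only looks like an entity, yet it is skipped.
echo escape_bare_ampersands('R&D &bogus; data'), "\n";
// prints: R&amp;D &bogus; data
```

The pattern checks the shape of what follows the ampersand, not whether the name is a real entity, which is the trade-off against an explicit list of names.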


        • John Dunlop

          #5
          Re: grep puzzle

          KhanyBoy wrote:
          > This should test you gurus. I want a function that accepts text as an
          > argument and converts every & into &amp;, except where it already starts
          > an HTML entity such as &nbsp;, &quot;, and of course &amp;.

          Why are entity references recognised in text?

          --
          Jock


          • FLEB

            #6
            Re: grep puzzle

            Regarding this well-known quote, often attributed to KhanyBoy's famous "10
            Jun 2004 14:22:24 -0700" speech:
            > Hi,
            >
            > This should test you gurus. I want a function that accepts text as an
            > argument and converts every & into &amp;, except where it already starts
            > an HTML entity such as &nbsp;, &quot;, and of course &amp;.
            >
            > If there is already a PHP function for this I would like to know, but
            > if not, what is the grep (regular expression) equivalent?
            >
            > Thanks

            It might be a different direction, but here are some functions to determine
            whether something is an HTML entity, using the built-in PHP entity
            functions. It's sort of a recycled reply to an earlier question:

            A simple (rough) test of the concept is online at:


            <?php
            /*
            How to determine whether a given string decodes to an HTML entity
            */

            function contains_entities($raw)
            {
                // $raw can be a string of any length
                $raw = trim($raw);
                return (strlen(htmlentities($raw)) > strlen($raw));
            }

            function is_entity_reference($raw)
            {
                // $raw should be a string with only the entity ref in it,
                // in the form "&...;"

                return (preg_match('/&.+;/', $raw)) &&
                    (strlen(html_entity_decode(trim($raw))) == 1);
            }
            ?>


            --
            -- Rudy Fleminger
            -- sp@mmers.and.evil.ones.will.bow-down-to.us
            (put "Hey!" in the Subject line for priority processing!)
            -- http://www.pixelsaredead.com
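
One caveat with the strlen(...) == 1 test above: strlen() counts bytes, and on recent PHP versions html_entity_decode() decodes to UTF-8 by default, where a character such as é takes two bytes. A hedged variant that pins the encoding to ISO-8859-1, so one Latin-1 character is one byte, might look like this. The function name is_latin1_entity_reference() and the stricter anchored pattern are assumptions of this sketch, not part of the original post.

```php
<?php
// Same single-entity test with the encoding made explicit.
// is_latin1_entity_reference() is a hypothetical name; restricting the check
// to entities representable in ISO-8859-1 is an assumption of this sketch.
function is_latin1_entity_reference(string $raw): bool
{
    $raw = trim($raw);
    // Require the whole string to have the form "&...;" with no whitespace inside.
    if (!preg_match('/^&[^;\s]+;$/', $raw)) {
        return false;
    }
    // An unknown name is left undecoded, so its length stays above 1.
    $decoded = html_entity_decode($raw, ENT_QUOTES, 'ISO-8859-1');
    return strlen($decoded) === 1;
}

var_dump(is_latin1_entity_reference('&eacute;')); // bool(true)
var_dump(is_latin1_entity_reference('&bogus;'));  // bool(false)
```

For entities outside Latin-1, a multibyte-aware length check would be needed instead.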


            • Justin Koivisto

              #7
              Re: grep puzzle

              Christian Fersch wrote:
              > Justin Koivisto wrote:
              >
              >> KhanyBoy wrote:
              >>
              >>> If there is already a PHP function for this I would like to know, but
              >>> if not, what is the grep (regular expression) equivalent?
              >>
              >> OK, this does that, but it may not be a very elegant solution. I
              >> recently needed the same functionality for a project involving
              >> osCommerce.
              >>
              >> function ampersandFix($x){
              >> $x=str_replace('&','&amp;',$x);
              >> $pattern='`&amp;(#[0-9]{2,3}|aacute|acirc|acute|aelig|agrave|amp'.
              >> '|aring|atilde|auml|brvbar|brkbar|ccedil|cedil|cent'.
              >> '|copy|curren|deg|divide|eacute|ecirc|egrave|eth|euml'.
              >> '|frac12|frac14|frac34|gt|iacute|icirc|iexcl|igrave'.
              >> '|iquest|iuml|laquo|lt|macr|hibar|micro|middot|nbsp|not'.
              >> '|ntilde|oacute|ocirc|ograve|ordf|ordm|oslash|otilde'.
              >> '|ouml|para|plusmn|pound|quot|raquo|reg|sect|shy|sup1|sup2'.
              >> '|sup3|szlig|thorn|times|uacute|ucirc|ugrave|uml'.
              >> '|die|uuml|yacute|yen|yuml);`i';
              >> $replace='&$1;';
              >> return preg_replace($pattern,$replace,$x);
              >> }
              >
              > A lot faster, but not as accurate:
              > $text = preg_replace('!&(?![#a-z0-9]{1,7};)!i','&amp;',$text);
              >
              > You could also use this method (it's a negative lookahead assertion) with
              > Justin's function, which will still be a lot faster than his hack ;)

              Let's fix the hack then, shall we?

              function ampersandFix($x){
                  // Encode "&" only when it is NOT followed by a known entity name and ";".
                  $pattern='`&(?!(#[0-9]{2,3}|aacute|acirc|acute|aelig|agrave|amp'.
                      '|aring|atilde|auml|brvbar|brkbar|ccedil|cedil|cent'.
                      '|copy|curren|deg|divide|eacute|ecirc|egrave|eth|euml'.
                      '|frac12|frac14|frac34|gt|iacute|icirc|iexcl|igrave'.
                      '|iquest|iuml|laquo|lt|macr|hibar|micro|middot|nbsp|not'.
                      '|ntilde|oacute|ocirc|ograve|ordf|ordm|oslash|otilde'.
                      '|ouml|para|plusmn|pound|quot|raquo|reg|sect|shy|sup1|sup2'.
                      '|sup3|szlig|thorn|times|uacute|ucirc|ugrave|uml'.
                      '|die|uuml|yacute|yen|yuml);)`i';
                  return preg_replace($pattern,'&amp;',$x);
              }

              Now it's faster AND accurate. ;)

              --
              Justin Koivisto - spam@koivi.com
              PHP POSTERS: Please use comp.lang.php for PHP related questions,
              alt.php* groups are not recommended.
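
A possible variation on the list-based lookahead above, sketched under the assumption that PHP's built-in HTML 4.01 entity table is an acceptable whitelist: build the alternation from get_html_translation_table() instead of typing the names by hand. The function name ampersand_fix_from_table() is hypothetical. Unlike the hand-written list, this version is case-sensitive and does not include the older names brkbar, hibar and die.

```php
<?php
// Derive the entity whitelist from PHP's own table rather than a hand-typed list.
// ampersand_fix_from_table() is a hypothetical name, not code from the thread.
function ampersand_fix_from_table(string $text): string
{
    // Values look like "&amp;", "&nbsp;", "&eacute;", ...
    $table = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES | ENT_HTML401, 'UTF-8');

    $names = [];
    foreach ($table as $entity) {
        // Keep named entities only; numeric ones are covered by the pattern below.
        if (preg_match('/^&([a-z0-9]+);$/i', $entity, $m)) {
            $names[] = $m[1];
        }
    }

    // Same negative lookahead as above, with the generated list of names.
    $pattern = '`&(?!(?:#[0-9]{2,3}|' . implode('|', $names) . ');)`';
    return preg_replace($pattern, '&amp;', $text);
}

echo ampersand_fix_from_table('R&D &bogus; &nbsp; &amp; &#169;'), "\n";
// prints: R&amp;D &amp;bogus; &nbsp; &amp; &#169;
```

Here &bogus; is escaped because it is not in the table, which is the accuracy the explicit list buys over the shape-only pattern.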
