"smart" quotes in PHP

  • Martin Goldman

    "smart" quotes in PHP

    Hello all,

    I've been struggling for a few days with the question of how to convert
    "smart" (curly) quotes into straight quotes. I tried playing with the
    htmlentities() function, but all that is doing is changing the smart
    quotes into nonsense characters. I also searched the web for quite a
    while and was unsuccessful in finding a solution.

    What puzzles me is that doing it the other way around is simple enough.
    For example, this works fine in converting a straight quote into an
    "open" smart quote:

    if ($content[$k] == "\"")
    $content = substr($content, 0, $k) . "&#147;" . substr
    ($content, $k+1, strlen($content)-$k+1);

    But the other way around doesn't work. Any ideas?

    Thanks,

    Martin Goldman
    My e-mail address's correct domain name is mgoldman.com.
  • Daniel Tryba

    #2
    Re: "smart" quotes in PHP

    Martin Goldman <www@nowhere.foo> wrote:
    > I've been struggling for a few days with the question of how to convert
    > "smart" (curly) quotes into straight quotes.

    Smart/curly quotes? straight quotes? What are these?

    > What puzzles me is that doing it the other way around is simple enough.
    > For example, this works fine in converting a straight quote into an
    > "open" smart quote:
    >
    > if ($content[$k] == "\"")
    > $content = substr($content, 0, $k) . "&#147;" . substr
    > ($content, $k+1, strlen($content)-$k+1);

    Funny way to do a str_replace :)

    What character is represented by #147? AFAIK it's not in any character
    set I know (ASCII or ISO-8859-x). So your actual problem might be that
    the character you want to replace is in a different encoding from the
    one PHP is actually using!

    BTW, the third parameter of htmlentities() specifies the character set.

    --

    Daniel Tryba

    • Andy Hassall

      #3
      Re: "smart" quotes in PHP

      On Fri, 14 Nov 2003 17:42:08 GMT, Martin Goldman <www@nowhere.foo> wrote:

      >I've been struggling for a few days with the question of how to convert
      >"smart" (curly) quotes into straight quotes. I tried playing with the
      >htmlentities() function, but all that is doing is changing the smart
      >quotes into nonsense characters. I also searched the web for quite a
      >while and was unsuccessful in finding a solution.

      You've got to work out what character set the text is encoded in, for
      starters, since 'smart quotes' exist in Microsoft's Codepage 1252 but not in
      the standard ISO 8859 character sets, e.g. iso-8859-15.

      In codepage 1252:

      hex  dec  Unicode  Unicode name
      91   145  8216     LEFT SINGLE QUOTATION MARK
      92   146  8217     RIGHT SINGLE QUOTATION MARK
      93   147  8220     LEFT DOUBLE QUOTATION MARK
      94   148  8221     RIGHT DOUBLE QUOTATION MARK

      But in iso-8859-15, 145-148 aren't defined as printable characters; 128-159
      are reserved for control characters.

      So if you change it to &#147;, but output your page encoded in iso-8859-1,
      you're just changing it to the code for a non-printable character. The same
      reference will appear as a left double quotation mark if the page is encoded
      in Windows-1252, though.
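
      A minimal sketch of the table above in use, assuming the text really is
      Windows-1252 so that each smart quote is a single byte. The function name
      cp1252_quotes_to_ascii is hypothetical, not something from the thread.

      ```php
      <?php
      // Assumption: $text is Windows-1252 encoded (one byte per smart quote).
      function cp1252_quotes_to_ascii(string $text): string
      {
          // strtr() with an array swaps every listed byte for its replacement.
          return strtr($text, [
              "\x91" => "'", // 145, LEFT SINGLE QUOTATION MARK
              "\x92" => "'", // 146, RIGHT SINGLE QUOTATION MARK
              "\x93" => '"', // 147, LEFT DOUBLE QUOTATION MARK
              "\x94" => '"', // 148, RIGHT DOUBLE QUOTATION MARK
          ]);
      }

      echo cp1252_quotes_to_ascii("\x93smart\x94 quotes"), "\n"; // "smart" quotes
      ```

      If the input is in some other encoding, these byte values will not match
      and the string comes back unchanged.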

      >What puzzles me is that doing it the other way around is simple enough.
      >For example, this works fine in converting a straight quote into an
      >"open" smart quote:
      >
      > if ($content[$k] == "\"")
      > $content = substr($content, 0, $k) . "&#147;" . substr
      >($content, $k+1, strlen($content)-$k+1);
      >
      >But the other way around doesn't work. Any ideas?

      In what way doesn't it work? What does str_replace(chr(147), '"', $content);
      appear to do in your setup?

      --
      Andy Hassall (andy@andyh.co.uk) icq(5747695) (http://www.andyh.co.uk)
      Space: disk usage analysis tool (http://www.andyhsoftware.co.uk/space)

      • John Dunlop

        #4
        Re: "smart" quotes in PHP

        Martin Goldman wrote:

        > I've been struggling for a few days with the question of how to convert
        > "smart" (curly) quotes into straight quotes.

        As D. Tryba hinted at, str_replace should work fine. After all,
        you're replacing one character with another.

        $string = str_replace($chr, '"', $string);

        where $chr is the character you want to replace.

        > I tried playing with the htmlentities() function, but all that is doing
        > is changing the smart quotes into nonsense characters.

        I'd be interested in seeing what you actually tried. Since so-called
        smart quotes aren't in the Latin-1 repertoire, you'd have to specify
        a charset other than the default ISO-8859-1. Say you typed smart
        quotes on a bog standard Windows system by holding down Alt and
        pressing 0, 1, 4, and 7 (or 8) on the numeric keypad, you'd use

        $string = htmlentities($string, ENT_COMPAT, 'cp1252');

        where $string is the string containing smart quotes. That converts
        smart quotes to their respective entity references.
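
        As a rough, self-contained illustration of that call, assuming the input
        bytes really are Windows-1252 (0x93 and 0x94 for the double smart
        quotes); the variable names are only for this example:

        ```php
        <?php
        // Assumption: the input is Windows-1252, where 0x93/0x94 are the double smart quotes.
        $cp1252 = "\x93smart\x94 quotes";

        // The third argument names the charset the input bytes are in.
        $encoded = htmlentities($cp1252, ENT_COMPAT, 'cp1252');
        echo $encoded, "\n";

        // Decoding with the same charset should give back the original bytes.
        $roundTrip = html_entity_decode($encoded, ENT_COMPAT, 'cp1252');
        var_dump($roundTrip === $cp1252);
        ```

        If the string is actually in another encoding (UTF-8, say), passing
        'cp1252' here will treat each byte as a separate character, which is
        one way to end up with nonsense output.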

        > What puzzles me is that doing it the other way around is simple enough.

        Eek! I'd have thought that was *more* difficult...

        > if ($content[$k] == "\"")
        > $content = substr($content, 0, $k) . "&#147;" . substr
        > ($content, $k+1, strlen($content)-$k+1);

        How does your script know that the quotation mark was intended as an
        opening quotation mark? ;-)

        In HTML, the character reference &#147; is undefined. The LEFT DOUBLE
        QUOTATION MARK can be represented using the character reference
        &#8220; or the entity reference &ldquo;. The RIGHT DOUBLE QUOTATION
        MARK can be represented using the character reference &#8221; or the
        entity reference &rdquo;.

        --
        Jock

        • Martin Goldman

          #5
          Re: "smart" quotes in PHP

          John Dunlop <john+usenet@johndunlop.info> wrote in
          news:MPG.1a1f806fb5038c649897c5@news.freeserve.net:

          > Martin Goldman wrote:

          > I'd be interested in seeing what you actually tried. Since so-called
          > smart quotes aren't in the Latin-1 repertoire, you'd have to specify
          > a charset other than the default ISO-8859-1. Say you typed smart
          > quotes on a bog standard Windows system by holding down Alt and
          > pressing 0, 1, 4, and 7 (or 8) on the numeric keypad, you'd use
          >
          > $string = htmlentities($string, ENT_COMPAT, 'cp1252');
          >
          > where $string is the string containing smart quotes. That converts
          > smart quotes to their respective entity references.
          >
          This results in the smart quotes being replaced with nonsense characters.
          The thing is, though, that I'm totally unfamiliar with character sets,
          the differences between them, etc. I've never had any reason to care
          about them. So I'm a little confused about what you guys are talking
          about when it comes to them.

          > How does your script know that the quotation mark was intended as an
          > opening quotation mark? ;-)
          Well, I didn't paste the whole thing. :) I wrote a loop that goes through
          the string. It toggles a flag each time a quotation mark is found. If the
          flag is set, it makes it an open quote; if it's not, it makes it a closed
          quote. Hence the reason I'm not just using a str_replace for that. :)
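
          A minimal sketch of the toggle loop described above. The full script
          was not posted, so the function name curlify_double_quotes and the
          choice to emit &ldquo; and &rdquo; (rather than &#147;) are
          assumptions made for this example.

          ```php
          <?php
          // Hypothetical reconstruction: alternate opening and closing quotes.
          function curlify_double_quotes(string $text): string
          {
              $out  = '';
              $open = true; // the next straight quote is treated as an opening quote
              $len  = strlen($text);

              for ($i = 0; $i < $len; $i++) {
                  if ($text[$i] === '"') {
                      $out .= $open ? '&ldquo;' : '&rdquo;';
                      $open = !$open; // flip the flag after every quote found
                  } else {
                      $out .= $text[$i];
                  }
              }
              return $out;
          }

          echo curlify_double_quotes('say "hi" now'), "\n"; // say &ldquo;hi&rdquo; now
          ```

          The flag assumes quotes are balanced; a stray quotation mark flips
          every quote after it.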

          Oh, and to answer Mr. Hassall's question -- str_replace(chr(147), "\"",
          $content) doesn't do anything. The exact same string is returned.

          -Martin

          • Daniel Tryba

            #6
            Re: "smart" quotes in PHP

            Martin Goldman <www@nowhere.foo> wrote:
            [confused about charsets]
            > Oh, and to answer Mr. Hassall's question -- str_replace(chr(147), "\"",
            > $content) doesn't do anything. The exact same string is returned.

            That might mean that there is no chr(147) in the string, although you
            _see_ a character that might be represented as the character you know as
            147 in cp1252! Another fine example is the euro symbol: IIRC it's 128 in
            cp1252 and 164 in iso-8859-15, while in iso-8859-1 164 is the generic
            currency symbol and the euro symbol is missing entirely. That's why, if
            you want to display the euro symbol, you are encouraged to use the HTML
            entity &euro;, which can be rendered in any font and any character set
            (with a fallback to EUR).

            So your job is to figure out how your quote is encoded (just step through
            the string and print the ord() value of each character)...
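
            One way to do that stepping-through, as a small sketch. The helper
            name byte_values is made up for this example, and the sample input
            is the three-byte sequence reported later in the thread.

            ```php
            <?php
            // Hypothetical helper: list the numeric value of every byte in a string.
            function byte_values(string $text): array
            {
                // str_split() yields one-byte chunks; ord() gives each byte's value.
                return array_map('ord', str_split($text));
            }

            echo implode(' ', byte_values("\xE2\x80\x9C")), "\n"; // 226 128 156
            ```

            A single value of 147 would point to cp1252; a run of values above
            127 for one visible character points to a multi-byte encoding.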

            BTW, Unicode kind of solves the problem by defining every known character
            in one set; the problem is that not every program supports it yet. But
            Unicode also introduces another problem: the way the characters are
            encoded (e.g. UTF-7, UTF-8, UTF-16...). I don't know if PHP supports UTF-16+.

            --

            Daniel Tryba

            • Martin Goldman

              #7
              Re: "smart" quotes in PHP

              Daniel Tryba <news_comp.lang.php@canopus.nl> wrote in
              news:bp5nhq$d0e$1@news.tue.nl:

              > That might mean that there is no chr(147) in the string, although you
              > _see_ a character that might be represented as the character you know
              > as 147 in cp1252! Another fine example is the euro symbol: IIRC it's
              > 128 in cp1252 and 164 in iso-8859-15, while in iso-8859-1 164 is the
              > generic currency symbol and the euro symbol is missing entirely.
              > That's why, if you want to display the euro symbol, you are encouraged
              > to use the HTML entity &euro;, which can be rendered in any font and
              > any character set (with a fallback to EUR).
              >
              > So your job is to figure out how your quote is encoded (just step through
              > the string and print the ord() value of each character)...
              Interesting you should suggest this, because I just did that. And indeed,
              it's not coming out as 147. It's coming out as 226, followed by 128,
              followed by 156. I suppose I could do a str_replace for these 3
              characters and replace it with 147. Although, then I'd have to do that
              for every character I want to support. What a drag.

              Thanks,
              Martin

              • Andy Hassall

                #8
                Re: "smart" quotes in PHP

                On Sat, 15 Nov 2003 19:57:14 GMT, Martin Goldman <www@nowhere.foo> wrote:

                >Daniel Tryba <news_comp.lang.php@canopus.nl> wrote in
                >news:bp5nhq$d0e$1@news.tue.nl:
                >
                >> That might mean that there is no chr(147) in the string, although you
                >> _see_ a character that might be represented as the character you know
                >> as 147 in cp1252! Another fine example is the euro symbol: IIRC it's
                >> 128 in cp1252 and 164 in iso-8859-15, while in iso-8859-1 164 is the
                >> generic currency symbol and the euro symbol is missing entirely.
                >> That's why, if you want to display the euro symbol, you are encouraged
                >> to use the HTML entity &euro;, which can be rendered in any font and
                >> any character set (with a fallback to EUR).
                >>
                >> So your job is to figure out how your quote is encoded (just step through
                >> the string and print the ord() value of each character)...
                >
                >Interesting you should suggest this, because I just did that. And indeed,
                >it's not coming out as 147. It's coming out as 226, followed by 128,
                >followed by 156. I suppose I could do a str_replace for these 3
                >characters and replace it with 147. Although, then I'd have to do that
                >for every character I want to support. What a drag.

                Your text is encoded in UTF-8. Going back to the characters again:

                hex  dec  Unicode  Unicode name
                91   145  8216     LEFT SINGLE QUOTATION MARK
                92   146  8217     RIGHT SINGLE QUOTATION MARK
                93   147  8220     LEFT DOUBLE QUOTATION MARK
                94   148  8221     RIGHT DOUBLE QUOTATION MARK

                226,128,156 in binary is:

                11100010
                10000000
                10011100

                '1110' in the first few bits of the first byte indicates it is a lead byte for
                a three-byte character. The remaining two are trail bytes, as they start with
                10. So separating out the data gets:

                1110 0010
                10 000000
                10 011100

                => 0010000000011100 (binary)
                = 8220 (decimal)

                Which is LEFT DOUBLE QUOTATION MARK.
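
                The same bit-masking can be sketched in PHP. The function name
                decode_three_byte_utf8 is hypothetical, and the sketch only
                covers the three-byte case worked through above; it does not
                reject overlong or otherwise invalid sequences.

                ```php
                <?php
                // Hypothetical helper: decode one three-byte UTF-8 sequence to its code point.
                function decode_three_byte_utf8(string $bytes): int
                {
                    if (strlen($bytes) !== 3) {
                        throw new InvalidArgumentException('expected exactly three bytes');
                    }
                    $lead   = ord($bytes[0]);
                    $trail1 = ord($bytes[1]);
                    $trail2 = ord($bytes[2]);

                    // The lead byte must look like 1110xxxx, the trail bytes like 10xxxxxx.
                    if (($lead & 0xF0) !== 0xE0
                        || ($trail1 & 0xC0) !== 0x80
                        || ($trail2 & 0xC0) !== 0x80) {
                        throw new InvalidArgumentException('not a three-byte UTF-8 sequence');
                    }

                    // Keep 4 payload bits from the lead byte and 6 from each trail byte.
                    return (($lead & 0x0F) << 12) | (($trail1 & 0x3F) << 6) | ($trail2 & 0x3F);
                }

                echo decode_three_byte_utf8(chr(226) . chr(128) . chr(156)), "\n"; // 8220
                ```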

                --
                Andy Hassall (andy@andyh.co.uk) icq(5747695) (http://www.andyh.co.uk)
                Space: disk usage analysis tool (http://www.andyhsoftware.co.uk/space)

                • Daniel Tryba

                  #9
                  Re: "smart" quotes in PHP

                  Andy Hassall <andy@andyh.co.uk> wrote:
                  >>> So your job is to figure out how your quote is encoded (just step through
                  >>> the string and print the ord() value of each character)...
                  >>
                  >>Interesting you should suggest this, because I just did that. And indeed,
                  >>it's not coming out as 147. It's coming out as 226, followed by 128,
                  >>followed by 156. I suppose I could do a str_replace for these 3
                  >>characters and replace it with 147. Although, then I'd have to do that
                  >>for every character I want to support. What a drag.
                  >
                  > Your text is encoded in UTF-8. Going back to the characters again:
                  [in depth UTF-8 decoding :)]

                  So Martin, you should take a look at iconv or, if your server lacks
                  support for it, utf8_decode(). The manual page for the latter also has
                  a user-contributed note on how to use str_replace on UTF-8 encoded strings.
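
                  For the str_replace route, a minimal sketch assuming the
                  text is UTF-8 (as established earlier in the thread). The
                  function name utf8_quotes_to_ascii is hypothetical; this
                  is not the user-contributed note mentioned above.

                  ```php
                  <?php
                  // Assumption: $text is UTF-8, so each smart quote is a three-byte sequence.
                  function utf8_quotes_to_ascii(string $text): string
                  {
                      $map = [
                          "\xE2\x80\x98" => "'", // U+2018 LEFT SINGLE QUOTATION MARK
                          "\xE2\x80\x99" => "'", // U+2019 RIGHT SINGLE QUOTATION MARK
                          "\xE2\x80\x9C" => '"', // U+201C LEFT DOUBLE QUOTATION MARK
                          "\xE2\x80\x9D" => '"', // U+201D RIGHT DOUBLE QUOTATION MARK
                      ];
                      // str_replace() compares raw bytes, so whole sequences are swapped at once.
                      return str_replace(array_keys($map), array_values($map), $text);
                  }

                  echo utf8_quotes_to_ascii("\xE2\x80\x9Csmart\xE2\x80\x9D quotes"), "\n"; // "smart" quotes
                  ```

                  Listing the sequences once in a map avoids writing a
                  separate three-byte replacement by hand for every character.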

                  --

                  Daniel Tryba

                  • Martin Goldman

                    #10
                    Re: "smart" quotes in PHP

                    Daniel Tryba <news_comp.lang.php@canopus.nl> wrote in
                    news:bpee7i$5fr$2@news.tue.nl:

                    > Andy Hassall <andy@andyh.co.uk> wrote:

                    > So Martin, you should take a look at iconv or, if your server lacks
                    > support for it, utf8_decode(). The manual page for the latter also has
                    > a user-contributed note on how to use str_replace on UTF-8 encoded strings.
                    >

                    Great. Thanks to everyone who replied.

                    -Martin
                    my correct domain name is mgoldman.com
