garbage characters are now on the site, although they weren't there originally

This topic is closed.
  • Lawrence Krubner

    garbage characters are now on the site, although they weren't there originally



    Once upon a time, there were no garbage characters on this page:

    http://www.teamlalala.com/blog/category/css/

    Now there are. For instance:

    The 2nd paragraph from page 114 of “The Zen Of CSS Design”


    For me, there are garbage characters before "The" and after "Design".

    The page has always, always been served as UTF-8.

    I'm having trouble figuring out what might have changed to cause these
    garbage characters. At a stretch, I think back to an incident a few
    months ago, when our server was hacked, and we had to do a re-install,
    with upgraded versions of stuff like Apache. So I could almost imagine
    Apache sending new headers, except that, in my case, the meta tag
    indicates UTF-8 and when I look at it in Firefox, Firefox correctly
    reads it as UTF-8.

    Anything else that could cause this?

    I can not find a character encoding that renders this page without
    garbage characters.

    -- lawrence krubner
  • Ben C

    #2
    Re: garbage characters are now on the site, although they weren't there originally

    On 2008-06-05, Lawrence Krubner <lawrence@krubn er.com> wrote:
    > Once upon a time, there were no garbage characters on this page:
    >
    > http://www.teamlalala.com/blog/category/css/
    >
    > Now there are. For instance:
    >
    > The 2nd paragraph from page 114 of “The Zen Of CSS Design”
    >
    > For me, there are garbage characters before "The" and after "Design".
    >
    > The page has always, always been served as UTF-8.
    >
    > I'm having trouble figuring out what might have changed to cause these
    > garbage characters. At a stretch, I think back to an incident a few
    > months ago, when our server was hacked, and we had to do a re-install,
    > with upgraded versions of stuff like Apache. So I could almost imagine
    > Apache sending new headers, except that, in my case, the meta tag
    > indicates UTF-8 and when I look at it in Firefox, Firefox correctly
    > reads it as UTF-8.
    >
    > Anything else that could cause this?
    >
    > I can not find a character encoding that renders this page without
    > garbage characters.

    The page _is_ valid UTF-8, and the server header says it's UTF-8, and it
    really does contain those characters (a with circumflex, euro symbol, oe
    diphthong ligature thing), encoded in UTF-8.

    How did they get there? Not sure, perhaps you "converted" the file from
    Latin1 to UTF-8 when it already was UTF-8 or something.
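
For reference, a minimal Python sketch of how a "conversion" like that can produce exactly these characters. It assumes the UTF-8 bytes were misread as Windows-1252 (what "Latin1" usually means in practice on the web) and then saved as UTF-8 again; the function name is made up for illustration.

```python
def double_encode(text: str) -> str:
    """Simulate UTF-8 text being misread as Windows-1252 before it is stored again."""
    raw = text.encode("utf-8")
    # errors="replace" because a few bytes (0x9D among them) are unassigned
    # in Python's cp1252 codec and would otherwise raise UnicodeDecodeError.
    return raw.decode("cp1252", errors="replace")


# The left curly quote U+201C is the bytes E2 80 9C in UTF-8, which read as
# a-circumflex, euro sign and oe ligature in Windows-1252.
print(double_encode("\u201c"))
# The right curly quote U+201D ends in byte 0x9D, which has no Windows-1252
# meaning, so the third character becomes the U+FFFD replacement character.
print(double_encode("\u201d"))
```

The mangled string is itself perfectly valid text, so saving it as UTF-8 gives a page that validates and still shows garbage in every encoding.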

    Anyway you should be OK if you just fix the page to contain instead the
    UTF-8 representations of the characters you want (presumably quotation
    marks).

    Never mind the meta tag-- the browser only uses that if the server fails
    to say what the encoding is. In your case the server is. The meta tag
    might as well be correct, but it won't cause or solve a real problem
    here.
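
A rough model of that precedence, as a Python sketch. It is simplified on purpose: real browsers also look at byte order marks and apply further sniffing rules, and `declared_charset` and `META_RE` are illustrative names, not part of any library.

```python
import re
from email.message import Message
from typing import Optional

# Matches both <meta charset="..."> and the older http-equiv form.
META_RE = re.compile(rb'<meta[^>]+charset=["\']?([\w-]+)', re.IGNORECASE)


def declared_charset(content_type_header: Optional[str], body: bytes) -> Optional[str]:
    """Return the charset from the HTTP header if present, else from the meta tag."""
    if content_type_header:
        msg = Message()
        msg["Content-Type"] = content_type_header
        charset = msg.get_content_charset()  # lower-cased, or None if absent
        if charset:
            return charset
    # Only consult the document itself when the server said nothing.
    match = META_RE.search(body[:1024])
    return match.group(1).decode("ascii").lower() if match else None


page = b'<html><head><meta charset="iso-8859-1"></head></html>'
print(declared_charset("text/html; charset=UTF-8", page))  # header wins
print(declared_charset("text/html", page))                 # falls back to meta
```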


    • Rik Wasmus

      #3
      Re: garbage characters are now on the site, although they weren't there originally

      On Thu, 05 Jun 2008 22:16:08 +0200, Lawrence Krubner
      <lawrence@krubn er.com> wrote:
      > Once upon a time, there were no garbage characters on this page:
      >
      > http://www.teamlalala.com/blog/category/css/
      >
      > Now there are. For instance:
      >
      > The 2nd paragraph from page 114 of “The Zen Of CSS Design�
      >
      > For me, there are garbage characters before "The" and after "Design".
      >
      > The page has always, always been served as UTF-8.
      >
      > I'm having trouble figuring out what might have changed to cause these
      > garbage characters. At a stretch, I think back to an incident a few
      > months ago, when our server was hacked, and we had to do a re-install,
      > with upgraded versions of stuff like Apache. So I could almost imagine
      > Apache sending new headers, except that, in my case, the meta tag
      > indicates UTF-8 and when I look at it in Firefox, Firefox correctly
      > reads it as UTF-8.
      >
      > Anything else that could cause this?
      >
      > I can not find a character encoding that renders this page without
      > garbage characters.

      Among the top reasons for double utf-8 encoding is an improper database
      export/import.
      --
      Rik Wasmus
      ....spamrun finished


      • VK

        #4
        Re: garbage characters are now on the site, although they weren't there originally

        On Jun 6, 12:16 am, Lawrence Krubner <lawre...@krubn er.com> wrote:
        > Once upon a time, there were no garbage characters on this page:
        >
        > http://www.teamlalala.com/blog/category/css/
        >
        > Now there are. For instance:
        >
        > The 2nd paragraph from page 114 of “The Zen Of CSS Design”
        >
        > For me, there are garbage characters before "The" and after "Design".
        >
        > The page has always, always been served as UTF-8.
        >
        > I'm having trouble figuring out what might have changed to cause these
        > garbage characters. At a stretch, I think back to an incident a few
        > months ago, when our server was hacked, and we had to do a re-install,
        > with upgraded versions of stuff like Apache. So I could almost imagine
        > Apache sending new headers, except that, in my case, the meta tag
        > indicates UTF-8 and when I look at it in Firefox, Firefox correctly
        > reads it as UTF-8.
        >
        > Anything else that could cause this?
        >
        > I can not find a character encoding that renders this page without
        > garbage characters.

        Don't use "smart quotes" in any form other than HTML entities. Better
        still, don't use them at all; but if you really need them, use only
        HTML entities. For static documents, always check the quotes for
        damage after the document has been opened in a rich text editor such
        as Microsoft Word. Better yet, don't open (X)HTML documents in a rich
        text editor at all. These are some of the golden rules of successful
        web design.
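
The entity approach can be sketched in Python as below. This is only an illustration using the standard `xmlcharrefreplace` error handler; the function name is hypothetical, and it does not escape `&`, `<` or `>`, which would need separate handling.

```python
def to_numeric_entities(text: str) -> str:
    """Replace every non-ASCII character with a numeric character reference."""
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


title = "\u201cThe Zen Of CSS Design\u201d"
print(to_numeric_entities(title))  # &#8220;The Zen Of CSS Design&#8221;
```

The result is pure ASCII, so it survives being read or written in Latin1, Windows-1252 or UTF-8 without changing.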


        • Lawrence Krubner

          #5
          Re: garbage characters are now on the site, although they weren't there originally

          Rik Wasmus wrote:
          > On Thu, 05 Jun 2008 22:16:08 +0200, Lawrence Krubner
          > <lawrence@krubn er.com> wrote:
          >> Once upon a time, there were no garbage characters on this page:
          >>
          >> http://www.teamlalala.com/blog/category/css/
          >>
          >> Now there are. For instance:
          >>
          >> The 2nd paragraph from page 114 of “The Zen Of CSS Design�
          >>
          >> For me, there are garbage characters before "The" and after "Design".
          >>
          >> The page has always, always been served as UTF-8.
          >>
          >> I'm having trouble figuring out what might have changed to cause these
          >> garbage characters. At a stretch, I think back to an incident a few
          >> months ago, when our server was hacked, and we had to do a re-install,
          >> with upgraded versions of stuff like Apache. So I could almost imagine
          >> Apache sending new headers, except that, in my case, the meta tag
          >> indicates UTF-8 and when I look at it in Firefox, Firefox correctly
          >> reads it as UTF-8.
          >>
          >> Anything else that could cause this?
          >>
          >> I can not find a character encoding that renders this page without
          >> garbage characters.
          >
          > Among the top reasons for double utf-8 encoding is an improper database
          > export/import.

          That must be it, then. Is there an automated way to undo the damage? Or
          do I have to fix every post by hand?

          Also, any tips on import/export, for the next time I have to do this?

          --lk




          • Keith Hughitt

            #6
            Re: garbage characters are now on the site, although they weren't there originally

            On Jun 7, 7:44 pm, Lawrence Krubner <lawre...@krubn er.com> wrote:
            > Rik Wasmus wrote:
            >> On Thu, 05 Jun 2008 22:16:08 +0200, Lawrence Krubner
            >> <lawre...@krubn er.com> wrote:
            >>> Once upon a time, there were no garbage characters on this page:
            >>>
            >>> http://www.teamlalala.com/blog/category/css/
            >>>
            >>> Now there are. For instance:
            >>>
            >>> The 2nd paragraph from page 114 of “The Zen Of CSS Design�
            >>>
            >>> For me, there are garbage characters before "The" and after "Design".
            >>>
            >>> The page has always, always been served as UTF-8.
            >>>
            >>> I'm having trouble figuring out what might have changed to cause these
            >>> garbage characters. At a stretch, I think back to an incident a few
            >>> months ago, when our server was hacked, and we had to do a re-install,
            >>> with upgraded versions of stuff like Apache. So I could almost imagine
            >>> Apache sending new headers, except that, in my case, the meta tag
            >>> indicates UTF-8 and when I look at it in Firefox, Firefox correctly
            >>> reads it as UTF-8.
            >>>
            >>> Anything else that could cause this?
            >>>
            >>> I can not find a character encoding that renders this page without
            >>> garbage characters.
            >>
            >> Among the top reasons for double utf-8 encoding is an improper database
            >> export/import.
            >
            > That must be it, then. Is there an automated way to undo the damage? Or
            > do I have to fix every post by hand?
            >
            > Also, any tips on import/export, for the next time I have to do this?
            >
            > --lk

            Somewhat off-topic question, but when you copy and paste text in
            windows/unix, is the encoding included in that information?
            I.e., if you saved a document in latin1 and wanted to get it to utf-8,
            could you just copy and paste the text into a new document
            and save it as utf-8?


            • Andreas Prilop

              #7
              Re: garbage characters are now on the site, although they weren't there originally

              On Tue, 10 Jun 2008, Keith Hughitt wrote:
              > Somewhat off-topic question, but, when you copy-and-paste text in
              > windows/unix, is the encoding included in that information?

              What is "windows/unix"?

              > I.e. if you saved a document in latin1 and wanted to get it to utf-8,
              > could you just copy and paste the text into a new document
              > and save it as utf-8?

              It depends on the program you use.
              On Unix, it depends also on your locale settings.
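
One way to take both the program and the locale out of the picture is to convert the file explicitly. The Python sketch below assumes the source file really is Latin-1; the function name and file names are invented for the example.

```python
import locale
import tempfile
from pathlib import Path


def convert_latin1_to_utf8(src: Path, dst: Path) -> None:
    """Decode src as Latin-1 and write the same text to dst as UTF-8."""
    text = src.read_text(encoding="latin-1")  # bytes -> characters
    dst.write_text(text, encoding="utf-8")    # characters -> bytes


# What a program falls back to when no encoding is given; on Unix this
# follows the locale settings.
print(locale.getpreferredencoding(False))

with tempfile.TemporaryDirectory() as tmp:
    old, new = Path(tmp) / "old.txt", Path(tmp) / "new.txt"
    old.write_bytes(b"caf\xe9")  # "café" as a single Latin-1 byte for é
    convert_latin1_to_utf8(old, new)
    print(new.read_bytes())      # é is now the two bytes C3 A9
```

Once the text is decoded it is just a sequence of characters; the encoding only comes back into play when those characters are written out as bytes again.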

              --
              In memoriam Alan J. Flavell


              • Blinky the Shark

                #8
                Re: garbage characters are now on the site, although they weren't there originally

                Andreas Prilop wrote:
                > On Tue, 10 Jun 2008, Keith Hughitt wrote:
                >
                >> Somewhat off-topic question, but, when you copy-and-paste text in
                >> windows/unix, is the encoding included in that information?
                >
                > What is "windows/unix"?

                s/\// or /


                --
                Blinky
                Killing all posts from Google Groups
                The Usenet Improvement Project -- http://improve-usenet.org
                Found 5/08: a free GG-blocking news *feed* -- http://usenet4all.se


                • Rik Wasmus

                  #9
                  Re: garbage characters are now on the site, although they weren't there originally

                  On Sun, 08 Jun 2008 01:44:50 +0200, Lawrence Krubner
                  <lawrence@krubn er.com> wrote:
                  > Rik Wasmus wrote:
                  >> On Thu, 05 Jun 2008 22:16:08 +0200, Lawrence Krubner
                  >> <lawrence@krub ner.com> wrote:
                  >>> Once upon a time, there were no garbage characters on this page:
                  >>>
                  >>> http://www.teamlalala.com/blog/category/css/
                  >>>
                  >>> Now there are. For instance:
                  >>>
                  >>> The 2nd paragraph from page 114 of “The Zen Of CSS Design�
                  >>>
                  >>> For me, there are garbage characters before "The" and after "Design".
                  >>>
                  >>> The page has always, always been served as UTF-8.
                  >>>
                  >>> I'm having trouble figuring out what might have changed to cause these
                  >>> garbage characters. At a stretch, I think back to an incident a few
                  >>> months ago, when our server was hacked, and we had to do a re-install,
                  >>> with upgraded versions of stuff like Apache. So I could almost imagine
                  >>> Apache sending new headers, except that, in my case, the meta tag
                  >>> indicates UTF-8 and when I look at it in Firefox, Firefox correctly
                  >>> reads it as UTF-8.
                  >>>
                  >>> Anything else that could cause this?
                  >>>
                  >>> I can not find a character encoding that renders this page without
                  >>> garbage characters.
                  >>
                  >> Among the top reasons for double utf-8 encoding is an improper
                  >> database export/import.
                  >
                  > That must be it, then. Is there an automated way to undo the damage? Or
                  > do I have to fix every post by hand?

                  I am not aware of a general quick easy fix, ask in a group dedicated to
                  the database of your choice, it isn't an uncommon problem.
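
For text that was double-encoded through Windows-1252, one possible repair is to run the mistake backwards. This is only a sketch under that assumption: the function name is hypothetical, characters already turned into U+FFFD cannot be recovered, and it should be tried on a copy of the data first.

```python
def undo_double_encoding(text: str) -> str:
    """Reverse one round of 'UTF-8 misread as Windows-1252'; return text unchanged if it does not fit."""
    raw = bytearray()
    for ch in text:
        try:
            raw += ch.encode("cp1252")
        except UnicodeEncodeError:
            if ord(ch) < 256:
                # Control characters such as U+009D have no cp1252 mapping
                # but usually stand for the byte with the same value.
                raw.append(ord(ch))
            else:
                return text  # cannot have come from a cp1252 misreading
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return text  # bytes are not valid UTF-8, so leave the text alone


damaged = "\u00e2\u20ac\u0153The Zen Of CSS Design\u00e2\u20ac\u009d"
print(undo_double_encoding(damaged))
```

Text that was never damaged, such as a correctly stored "café", fails the UTF-8 step and is returned as it was, so the function can be applied to every post without wrecking the good ones.
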

                  > Also, any tips on import/export, for the next time I have to do this?

                  If MySQL, be sure to set your connection characteristics to the proper
                  values. The first statement in your file to be imported in that case
                  should've been:

                  SET NAMES utf8;

                  HTH,
                  --
                  Rik Wasmus
                  ....spamrun finished


                  • Keith Hughitt

                    #10
                    Re: garbage characters are now on the site, although they weren't there originally

                    Hehe, what I meant was on either Windows or Unix (Linux). I'd be
                    interested to know how it works on both systems.


                    On Jun 10, 11:50 am, Andreas Prilop <prilop1...@tra shmail.net> wrote:
                    > On Tue, 10 Jun 2008, Keith Hughitt wrote:
                    >> Somewhat off-topic question, but, when you copy-and-paste text in
                    >> windows/unix, is the encoding included in that information?
                    >
                    > What is "windows/unix"?
                    >
                    >> I.e. if you saved a document in latin1 and wanted to get it to utf-8,
                    >> could you just copy and paste the text into a new document
                    >> and save it as utf-8?
                    >
                    > It depends on the program you use.
                    > On Unix, it depends also on your locale settings.
                    >
                    > --
                    > In memoriam Alan J. Flavell
                    > http://groups.google.com/groups/search?q=author:Alan.J.Flavell


                    • Andreas Prilop

                      #11
                      Re: garbage characters are now on the site, although they weren't there originally

                      Differently.
                      interested to know how it works on both systems.
                      Hehe, what I meant was on either Windows or Unix (Linux). I'd be
                      >
                      >What is "windows/unix"?
                      >>
                      >>windows/unix, is the encoding included in that information?
                      >>Somewhat off-topic question, but, when you copy-and-paste text in
                      On Wed, 11 Jun 2008, Keith Hughitt wrote:

                      --
                      Top-posting.
                      What's the most irritating thing on Usenet?
