Notepad and UTF-8

  • The Bicycling Guitarist

    Notepad and UTF-8

    Okay my web site grew up and is moving to a non-Windows server, Unix. I am
    converting my static HTML/CSS files to Drupal content management system. The
    leading white spaces I use to indent text for easy editing are not collapsed
    by Drupal, so I installed Cygwin on my Windows machine to simulate a Unix
    environment and ran a sed command to strip whitespace.

    When I opened the files in Notepad they were all on one line each. So I
    tried copying them from Microsoft FrontPage where they looked okay in HTML
    view and pasting them into Notepad then saving over the HTML file. I most
    definitely and carefully chose save as UTF-8 from the list of options
    offered by Notepad, but now all the files are ANSI instead of UTF-8. WTF?

    Please tell me there is an easier way... I need to
    a) strip leading whitespace from the content of my html files and
    b) save these files as UTF-8 and have them STAY UTF-8. Thanks


  • Ben C

    #2
    Re: Notepad and UTF-8

    On 2008-03-08, The Bicycling Guitarist <Chris@TheBicyclingGuitarist.net> wrote:
    > Okay my web site grew up and is moving to a non-Windows server, Unix. I am
    > converting my static HTML/CSS files to Drupal content management system. The
    > leading white spaces I use to indent text for easy editing are not collapsed
    > by Drupal, so I installed Cygwin on my Windows machine to simulate a Unix
    > environment and ran a sed command to strip whitespace.
    >
    > When I opened the files in Notepad they were all on one line each. So I
    > tried copying them from Microsoft FrontPage where they looked okay in HTML
    > view and pasting them into Notepad then saving over the HTML file. I most
    > definitely and carefully chose save as UTF-8 from the list of options
    > offered by Notepad, but now all the files are ANSI instead of UTF-8. WTF?

    Not sure what you mean by ANSI. Everything appeared on one line probably
    because Cygwin sed put Unix line separators (just LF, not CRLF) at the
    ends of the lines. You can configure Cygwin somehow not to do that, I
    think on a per-filesystem basis.

    Most editors even on Windows will sort of half-work with just LF, which
    is probably why it looked OK in FrontPage but not in Notepad.

    > Please tell me there is an easier way... I need to
    > a) strip leading whitespace from the content of my html files and
    > b) save these files as UTF-8 and have them STAY UTF-8. Thanks

    Just don't use Notepad or FrontPage. It could have been the copy and
    pasting from FrontPage that messed up the UTF-8.

    You could try to set up cygwin to use DOS line endings, or just stick to
    Unix line endings. But then you need to be careful because some Windows
    editors may open the file silently and apparently OK with the Unix line
    endings, but then save DOS line endings on the one or two lines you edit
    leaving you with an inconsistent mixture. Without any decent tools it's
    often hard to know what you've actually ended up with or why things are
    going wrong.
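
One way to find out what a file has actually ended up with is to count the line-ending sequences in its raw bytes. The following is only a minimal sketch in Python; the function names `count_line_endings` and `describe_line_endings` are made up for this illustration and do not come from any tool mentioned in the thread.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF, lone LF, and lone CR sequences in raw file bytes."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf   # LF bytes that are not part of a CRLF pair
    cr = data.count(b"\r") - crlf   # CR bytes that are not part of a CRLF pair
    return {"crlf": crlf, "lf": lf, "cr": cr}


def describe_line_endings(data: bytes) -> str:
    """Return 'crlf', 'lf', 'cr', 'mixed', or 'none' for the given bytes."""
    counts = count_line_endings(data)
    kinds = [name for name, n in counts.items() if n > 0]
    if not kinds:
        return "none"
    if len(kinds) > 1:
        return "mixed"
    return kinds[0]
```

To check a file, read it in binary mode so that Python does not translate the line endings first, for example `describe_line_endings(open("index.html", "rb").read())`, where `index.html` is a placeholder name. A result of `mixed` is the inconsistent state described above.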


    • Andreas Prilop

      #3
      Re: Notepad and UTF-8

      On Thu, 13 Mar 2008, Ben C wrote:
      > Better to use a Content-Language header and/or set the lang attribute on
      > the html element to tell the browser the language so it can use that as
      > a hint to pick a font.

      But that does not work in Internet Explorer. It works in Mozilla & Co.

      How about others like Opera?

      --
      In memoriam Alan J. Flavell


      • Ben C

        #4
        Re: Notepad and UTF-8

        On 2008-03-13, Andreas Prilop <aprilop2008@trashmail.net> wrote:
        > On Thu, 13 Mar 2008, Ben C wrote:
        >
        >> Better to use a Content-Language header and/or set the lang attribute on
        >> the html element to tell the browser the language so it can use that as
        >> a hint to pick a font.
        >
        > But that does not work in Internet Explorer.

        I didn't know that. It doesn't surprise me though.

        > It works in Mozilla & Co.
        >
        > How about others like Opera?

        In that test everything gets the same font. I think what Opera does,
        but this is just a guess, is choose a font based on the actual
        characters.

        Although I don't know how they tell the difference between zh-tw and
        zh-cn (languages and codepoints very similar but you need different
        fonts-- simplified characters for zh-cn and traditional ones for zh-tw).


        • Andreas Prilop

          #5
          Re: Notepad and UTF-8

          On Thu, 13 Mar 2008, Ben C wrote:
          > In that test everything gets the same font. I think what Opera does,
          > but this is just a guess, is choose a font based on the actual
          > characters.

          If that is true, you should be able to see different fonts for
          Latin letters and Greek letters on

          and different fonts for Latin letters and Cyrillic letters on


          But I doubt it. I believe Opera uses only one font for each of
          these two test pages.

          > Although I don't know how they tell the difference between zh-tw and
          > zh-cn (languages and codepoints very similar but you need different
          > fonts-- simplified characters for zh-cn and traditional ones for zh-tw).

          But how to do this with "charset=utf-8"? The codepoints in Unicode
          are the same for CN and TW and JP.

          --
          Solipsists of the world - unite!


          • Ben C

            #6
            Re: Notepad and UTF-8

            On 2008-03-14, Andreas Prilop <aprilop2008@trashmail.net> wrote:
            > On Thu, 13 Mar 2008, Ben C wrote:
            >
            >> In that test everything gets the same font. I think what Opera does,
            >> but this is just a guess, is choose a font based on the actual
            >> characters.
            >
            > If that is true, you should be able to see different fonts for
            > Latin letters and Greek letters on
            >
            > and different fonts for Latin letters and Cyrillic letters on
            >
            > But I doubt. I believe Opera uses only one font for each of
            > these two test pages.

            Probably. I don't know what it does.

            >> Although I don't know how they tell the difference between zh-tw and
            >> zh-cn (languages and codepoints very similar but you need different
            >> fonts-- simplified characters for zh-cn and traditional ones for zh-tw).
            >
            > But how to do this with "charset=utf-8"? The codepoints in Unicode
            > are the same for CN and TW and JP.

            Exactly, that was my point.
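
The point about shared codepoints can be checked directly. The sketch below uses Python's standard `unicodedata` module; the helper name `codepoint_info` and the choice of U+9AA8 (a commonly cited ideograph whose preferred glyph differs by region) are assumptions made for illustration only.

```python
import unicodedata


def codepoint_info(ch: str) -> tuple:
    """Return (codepoint label, Unicode name, UTF-8 bytes) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), ch.encode("utf-8"))


# One unified ideograph: the encoded bytes carry no hint of CN, TW, or JP.
info = codepoint_info("\u9aa8")
print(info)
```

Whatever language the page is written in, the character is stored as the same three bytes, so the encoding alone gives a browser nothing to choose a regional font from. That information has to come from somewhere else, such as the lang attribute discussed earlier.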


            • Man-wai Chang ToDie

              #7
              Re: Notepad and UTF-8

              > Please tell me there is an easier way... I need to
              > a) strip leading whitespace from the content of my html files and
              > b) save these files as UTF-8 and have them STAY UTF-8. Thanks

              Check out Notepad2 and Notepad++
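
If a script is preferred to an editor, both steps can be done in one pass. This is a rough sketch in Python, not something anyone in the thread used; the names `strip_leading_whitespace`, `clean_html_bytes`, and `clean_html_file` are hypothetical. It assumes the input is already UTF-8 (with or without a byte order mark) and raises `UnicodeDecodeError` otherwise, so a file in a Windows code page fails loudly instead of being silently mangled.

```python
import re
from pathlib import Path


def strip_leading_whitespace(text: str) -> str:
    """Remove spaces and tabs at the start of every line."""
    return re.sub(r"(?m)^[ \t]+", "", text)


def clean_html_bytes(raw: bytes, newline: str = "\n") -> bytes:
    """Decode UTF-8, strip indentation, and re-encode with one line-ending style."""
    text = raw.decode("utf-8-sig")  # accepts UTF-8 with or without a BOM
    # Normalise every line ending to LF before stripping.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    text = strip_leading_whitespace(text)
    if newline != "\n":
        text = text.replace("\n", newline)
    return text.encode("utf-8")  # written back without a BOM


def clean_html_file(path: Path, newline: str = "\n") -> None:
    """Rewrite one file in place; bytes mode avoids newline translation."""
    path.write_bytes(clean_html_bytes(path.read_bytes(), newline))
```

Pass `newline="\r\n"` to keep the result readable in Notepad. Note that this also strips indentation inside `<pre>` blocks, where leading whitespace is significant, so such files would need separate handling.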
