Special characters (æøå) and zipfiles

  • Roy W. Andersen

    Special characters (æøå) and zipfiles


    I've been searching Google about this for days but can't find anything,
    so I'm hoping someone here can help me out.

    I'm trying to create zip-files in PHP without needing PHP's zip
    extension, mainly because I need to be able to both create and extract
    zip-files. I've tried a couple of classes found here and there, and they
    all seem to have the same problem. I'm currently using PclZip
    (http://phpconcept.net/pclzip/), but even the simplest one I've tried
    (zip.lib.php from phpMyAdmin) gives the same result.

    This is the problem:

    When I create a zip-file containing any file with special characters in
    its filename, those characters get translated into different special
    characters. The three characters I'm having problems with are the
    Norwegian æ, ø and å (uppercase Æ, Ø and Å), all of which are very
    common in my language. The zip-file's own name can contain these
    characters without any problems; only the files put into the zip-file
    are affected. The same happens with directory names, obviously. The
    funny thing is, if I extract a zip-file using the same class, the
    conversion gets reversed, so the files do end up with the correct names
    after extraction. This of course means that if I upload a zip-file
    created with WinZip or any other zip application and extract it with the
    class, any files with special characters get translated into completely
    different characters again.

    I've made a table showing the converted characters which can be found
    here: http://akkar.sourceforge.net/zipchars.html

    Also very strange: I tried making a zip-file containing a zero-length
    file with the special characters in its filename. When I opened that
    zip-file in a hex editor, I couldn't find the hex values of the
    converted characters anywhere in the file, but the original filename
    characters were right where I expected them to be.
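
    Roy's hex-editor observation can be reproduced with a minimal sketch
    like the one below. It is an illustration, not PclZip's own code, and
    the helper name firstEntryName() is hypothetical. The sketch reads the
    first local file header of a ZIP archive with unpack() and returns the
    stored name bytes untouched. The archive holds whatever bytes the
    creating program wrote. Another program only shows different characters
    because it decodes those same bytes with a different code page.

```php
<?php
// Hypothetical helper: return the raw name of the first entry in a ZIP
// archive by reading its local file header. No charset conversion is done.
function firstEntryName(string $zip): array
{
    if (strlen($zip) < 30) {
        throw new InvalidArgumentException('data too short for a ZIP header');
    }
    // Fixed 30-byte header, little-endian (V = 32-bit, v = 16-bit).
    $format = 'Vsig/vversion/vflags/vmethod/vtime/vdate/'
            . 'Vcrc/Vcsize/Vusize/vnamelen/vextralen';
    $h = unpack($format, $zip);
    if ($h['sig'] !== 0x04034b50) { // the bytes "PK\x03\x04"
        throw new InvalidArgumentException('not a ZIP local file header');
    }
    // The name follows the fixed part directly.
    $name = substr($zip, 30, $h['namelen']);
    return ['flags' => $h['flags'], 'name' => $name, 'hex' => bin2hex($name)];
}

// Demo: a header for "æøå.txt" written as ISO-8859-1 bytes (assumed to be
// what the zip class received from the server's filesystem).
$name = "\xE6\xF8\xE5.txt";
$zip  = pack('VvvvvvVVVvv', 0x04034b50, 20, 0, 0, 0, 0, 0, 0, 0, strlen($name), 0)
      . $name;
echo firstEntryName($zip)['hex'], "\n"; // e6f8e52e747874
```

    You can also call firstEntryName(file_get_contents('archive.zip')) on a
    real file. This assumes the archive starts with a local file header,
    which holds for ordinary (non-self-extracting) archives.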

    If someone can help me figure out what's going on I would really
    appreciate it. I've submitted it as a bug for PclZip but it hasn't
    gotten any response yet, and since I've seen the same thing happen with
    other classes I sort of doubt it's only related to PclZip. I've tried it
    on different servers as well, with the same result. I've got the
    impression that PclZip is a popular class for managing zip-files, so I'm
    hoping someone with some experience with it can help me out.

    Thanks in advance :)


    Roy W. Andersen
    --
    ra at broadpark dot no / http://roy.netgoth.org/

    "Hey! What kind of party is this? There's no booze
    and only one hooker!" - Bender, Futurama
  • rl

    #2
    Re: Special characters (æøå) and zipfiles

    Hello Roy,

    The obvious problem is that the data of the files to be zipped is not
    treated as binary data, as it should be. This may be rooted in your own
    file/variable handling: PHP is loosely typed, which means that the type
    of a variable is chosen automatically by the PHP engine, and that is
    where things can go wrong. So more information on exactly what you do
    would be needed to locate the problem within your code or the library
    you use.

    The other thing is the treatment of the file names. These names are
    character data and are stored in a zip-file as such, with no information
    on the encoding, as far as I could see at a first glance at PKWARE's
    .ZIP Application Note.

    A second page I looked at also lists "No support for extended character
    sets in file names" as a limitation of this file format.
    So the only thing you can rely on is 7-bit ASCII. But since the
    ISO-Latin-1 code table is widely used (and contains all the
    Scandinavian special characters), the problems you face are probably
    caused by automatic conversions, too: a filename typed on your own
    computer shouldn't come out any different when displayed there again.
    Have a look at 'setlocale' at php.net.
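
    Later revisions of PKWARE's .ZIP Application Note (from version 6.3.0)
    address the point that the format records no encoding. If bit 11
    (0x0800) of an entry's general purpose flags is set, the name is UTF-8.
    If the bit is clear, the name is supposed to be in IBM code page 437.
    However, this thread's own resolution (CP850 rather than CP437) suggests
    that tools often wrote the system's OEM code page instead. The sketch
    below shows only that rule; the constant and function names are
    hypothetical.

```php
<?php
// Language encoding flag: general purpose bit 11 (value 0x0800).
const ZIP_FLAG_UTF8 = 0x0800;

// Hypothetical helper: the charset a reader should assume for an entry
// name, given the 16-bit general purpose flags from its header.
function zipNameCharset(int $flags): string
{
    return ($flags & ZIP_FLAG_UTF8) ? 'UTF-8' : 'CP437';
}

echo zipNameCharset(0x0000), "\n"; // CP437
echo zipNameCharset(0x0800), "\n"; // UTF-8
```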

    Cheers,

    Robert


    • Roy W. Andersen

      #3
      Re: Special characters (æøå) and zipfiles

      rl wrote:
      > Hello Roy,
      >
      [snip]

      Thanks for the help, it's appreciated even though it unfortunately
      doesn't help me much. Guess I'll just have to wait for the developer of
      my class to look into it. I don't really know where to begin looking
      for the cause, although I have tried. I suspect the error lies in the
      use of pack() and unpack(), functions whose workings I don't really
      understand (the PHP manual doesn't help me there; it's my limited
      knowledge of working with binary files that's the problem). I'm no
      expert on character encodings either (I know ISO-8859-1 and UTF-8, but
      that's about it), so I've been more or less stumbling around blindly in
      the code, and whenever I've tried some changes I've ended up with
      corrupted zip-files ;)

      I've tried several different settings using setlocale() and it doesn't
      make any difference at all, so I've concluded that the zip-class doesn't
      use any functions affected by PHP's own locale-setting.

      What's strange, though, is that I can't find any reference to this
      problem anywhere on the web or in the Google Usenet archive, and the
      problem doesn't only affect 'æ', 'ø' and 'å' but all special
      language characters. That led me to believe it was the settings on my
      server that were causing it, but when I tested it on my project's
      SourceForge webspace the same thing happened there as well, which
      again tells me the problem is with the class itself.


      Roy W. Andersen


      • Pedro Graca

        #4
        Re: Special characters (æøå) and zipfiles

        Roy W. Andersen wrote:
        > That lead me to believe ... the problem is with the class itself.

        Can you try the CLI version of PKZip or WinZip?
        I believe they have demo versions available.
        --
        Mail to my "From:" address is readable by all at http://www.dodgeit.com/
        == ** ## !! ------------------------------------------------ !! ## ** ==
        TEXT-ONLY mail to the whole "Reply-To:" address ("My Name" <my@address>)
        may bypass my spam filter. If it does, I may reply from another address!


        • Chung Leong

          #5
          Re: Special characters (æøå) and zipfiles

          "Roy W. Andersen" <roy-news@netgoth.org> wrote in message
          news:344dn8F44fjliU1@individual.net...
          > [ ... ]
          > This is the problem:
          >
          > When I create a zip-file containing any file with special characters in
          > their filenames, the characters gets translated into different special
          > characters. The three characters I myself am having problems with is the
          > Norwegian æ, ø and å (uppercase Æ, Ø and Å), all of which are very
          > common in my language. The zip-file itself can contain these characters
          > without any problems, the only files affected are the ones put into the
          > zip-file. Same happens with directories, obviously. The funny thing is,
          > if I extract a zip-file using the same class, the conversion gets
          > reversed, so the files do end up with the correct names after
          > extraction. This of course means that if I upload a zip-file created
          > using WinZip or any other zip-application, any files with special
          > characters will get translated into completely different characters again.
          >
          > I've made a table showing the converted characters which can be found
          > here: http://akkar.sourceforge.net/zipchars.html

          WinZip stores filenames in the CP437 (MS-DOS) charset, where Æ = 0x92
          and æ = 0x91 (see http://www.microsoft.com/globaldev/r...e/oem/437.htm).
          It's compatible with neither Unicode nor ISO-8859-1. Why they show up
          as the characters in your chart, I'm not sure.
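
          Chung's byte values can be turned into a small, dependency-free
          translation table. The sketch below is a minimal example; the
          constant and function names are hypothetical. It maps the
          Norwegian letters that CP437 does contain to and from
          ISO-8859-1. ø and Ø are deliberately missing, because CP437 has
          no code points for them.

```php
<?php
// CP437 byte => ISO-8859-1 byte for the Norwegian letters CP437 has.
// ø/Ø are absent: CP437 has no code points for them.
const CP437_TO_LATIN1 = [
    "\x91" => "\xE6", // æ
    "\x92" => "\xC6", // Æ
    "\x86" => "\xE5", // å
    "\x8F" => "\xC5", // Å
];

// Hypothetical helper: translate a CP437 entry name (as WinZip writes it,
// per the post above) to ISO-8859-1. Bytes not in the table pass through.
function cp437NameToLatin1(string $name): string
{
    return strtr($name, CP437_TO_LATIN1);
}

// Inverse direction, for ISO-8859-1 names about to be stored in a zip.
function latin1NameToCp437(string $name): string
{
    return strtr($name, array_flip(CP437_TO_LATIN1));
}

echo bin2hex(cp437NameToLatin1("\x92rlig.txt")), "\n"; // c6726c69672e747874
```

          Anything outside the table passes through unchanged. An
          ISO-8859-1 ø (0xF8) would therefore still be misread by a CP437
          reader, so a full code-page conversion such as iconv() is the more
          general route.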



          • Roy W. Andersen

            #6
            Re: Special characters (æøå) and zipfiles

            Chung Leong wrote:
            > "Roy W. Andersen" <roy-news@netgoth.org> wrote in message
            >
            > WinZip stores filenames in the CP437 (MS-DOS) charset, where Æ = 0x92, and æ
            > = 0x91 (see http://www.microsoft.com/globaldev/r...e/oem/437.htm). It's
            > neither Unicode or ISO-8859-1 compatible. Why they're showing up as
            > characters shown in your chart I'm not sure.

            Apparently that was exactly the right piece of information I needed :D

            The PclZip class has an option of passing the files through a callback
            function before adding from or extracting to the archive, and by using
            that option I actually managed to get it working.

            Before adding the file:
            iconv("ISO-8859-1", "CP437", $p_header['filename'])

            And, of course, before extracting:
            iconv("CP437", "ISO-8859-1", $p_header['filename'])
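
            Roy's fix can be sketched as a pair of PclZip callbacks. This is
            a hedged sketch, not Roy's actual code. It uses CP850, which his
            follow-up found to work better than CP437. It also assumes PclZip
            2.x semantics: a pre-add callback may change
            $p_header['stored_filename'] (the name written into the archive),
            and a pre-extract callback may change $p_header['filename'] (the
            path written to disk). The snippet above converts 'filename' in
            both places, so check which field your PclZip version lets the
            callback change. The function names, archive name and paths are
            hypothetical.

```php
<?php
// Assumptions: filenames on the server are ISO-8859-1, and the archives
// are read by WinZip on a system whose OEM code page is CP850.
const FS_CHARSET  = 'ISO-8859-1';
const ZIP_CHARSET = 'CP850';

// Pre-add callback: convert the name that will be stored in the archive.
// Returning 1 tells PclZip to add the file (0 would skip it).
function zipPreAdd($p_event, &$p_header)
{
    // @ hides iconv()'s notice; on failure the original name is kept.
    $name = @iconv(FS_CHARSET, ZIP_CHARSET, $p_header['stored_filename']);
    if ($name !== false) {
        $p_header['stored_filename'] = $name;
    }
    return 1;
}

// Pre-extract callback: convert the name back before the file is written.
function zipPreExtract($p_event, &$p_header)
{
    $name = @iconv(ZIP_CHARSET, FS_CHARSET, $p_header['filename']);
    if ($name !== false) {
        $p_header['filename'] = $name;
    }
    return 1;
}

// Wiring (archive name and paths are made up); runs only if
// pclzip.lib.php has been included.
if (class_exists('PclZip')) {
    $archive = new PclZip('example.zip');
    $archive->add('files/', PCLZIP_CB_PRE_ADD, 'zipPreAdd');
    $archive->extract(PCLZIP_OPT_PATH, 'extracted/',
                      PCLZIP_CB_PRE_EXTRACT, 'zipPreExtract');
}
```

            Keeping the original name when iconv() fails means the file is
            still added or extracted, just with its unconverted name, instead
            of being silently dropped.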

            And now it works! Thank you very much! I'd just written this problem on
            the "Known Issues" list for the upcoming release of my project, and now
            I can safely remove it again! Thank you thank you thank you! And
            everyone else who offered help as well, of course, but this little piece
            of info unlocked the riddle :)


            Roy W. Andersen


            • Roy W. Andersen

              #7
              Re: Special characters (æøå) and zipfiles

              Roy W. Andersen wrote:
              > Before adding the file:
              > iconv("ISO-8859-1", "CP437", $p_header['filename'])

              I jumped the gun a bit here, but I was on the right track at least :)

              Using CP850 worked, but CP437 didn't handle all of my characters
              (I remember code page 850 and/or 865 being what I used back in the
              days of good old MS-DOS).
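
              The code pages themselves are consistent with this finding.
              CP437 has æ, Æ, å and Å but no ø or Ø, while CP850 (and the
              Nordic CP865) has all six. The minimal sketch below (function
              name hypothetical) tries candidate code pages in order. It
              returns the first one that iconv() can convert a name into.

```php
<?php
// Hypothetical helper: return the first code page in $candidates that can
// represent $name (given as ISO-8859-1 bytes), or null if none can.
function firstCodePageFor(string $name, array $candidates): ?string
{
    foreach ($candidates as $cp) {
        // iconv() returns false on a character with no mapping in $cp;
        // @ suppresses the notice it raises in that case.
        if (@iconv('ISO-8859-1', $cp, $name) !== false) {
            return $cp;
        }
    }
    return null;
}

$name = "bl\xE5b\xE6r-r\xF8d.txt"; // "blåbær-rød.txt" in ISO-8859-1
var_dump(firstCodePageFor("\xE6\xE5", ['CP437', 'CP850'])); // string(5) "CP437"
var_dump(firstCodePageFor($name, ['CP437', 'CP850']));      // string(5) "CP850"
```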

              Still, it works now :) Hopefully others with the same problem in
              the future will have an easier time finding the answer thanks to
              this thread. I sure wish I had ;)


              Roy W. Andersen


              • rl

                #8
                Re: Special characters (æøå) and zipfiles

                Roy,
                > rl wrote:
                >
                >> Hello Roy,
                >>
                > [snip]
                > Thanks for the help, it's appreciated even though it unfortunately
                > doesn't help me much.
                Doesn't look like you've read it.
                > Guess I'll just have to wait for the developer of
                > my class to look into it - I don't really know where to begin looking
                > for the cause, although I have tried. I suspect the error lies in the
                > use of pack() and unpack() which are functions I don't understand how
                > work (the PHP manual doesn't help me there - it's just my knowledge of
                > working with binary files that's limiting). I'm no expert on character
                > encodings either (I know ISO-8859-1 and UTF-8, but that's about it), so
                > I've been more or less stumbling around blindly in the code, and when
                > I've tried some changes I've ended up with corrupted zip-files ;)
                pack() and unpack() leave the character encoding completely
                untouched if you specify the format codes accordingly. That's why I
                asked for the relevant code snippets (if you're allowed to share
                them).
                And: any files to be put into the zip must be opened in binary mode,
                if you need to open them yourself at all.
                I've had no problems storing complete binary files in a database and
                fetching them out again via unpack and pack.
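
                The point about pack() and unpack() is easy to check. The
                sketch below uses a made-up filename. The 'a' format code
                copies a string's bytes as they are, so ISO-8859-1 bytes come
                back out exactly as they went in. No character-set conversion
                happens in either direction.

```php
<?php
$latin1 = "\xC6\xD8\xC5.txt"; // "ÆØÅ.txt" as ISO-8859-1 bytes

// 16-bit little-endian length prefix followed by the raw name bytes,
// similar to how a ZIP header stores a name length and then the name.
$packed = pack('va*', strlen($latin1), $latin1);

// Read the length back, then the name starting at offset 2 (PHP >= 7.1).
$len  = unpack('v', $packed)[1];
$back = unpack('a' . $len . 'name', $packed, 2)['name'];

echo bin2hex($packed), "\n";  // 0700c6d8c52e747874
var_dump($back === $latin1);  // bool(true)
```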
                > I've tried several different settings using setlocale() and it doesn't
                > make any difference at all, so I've concluded that the zip-class doesn't
                > use any functions affected by PHP's own locale-setting.
                >
                > What's strange though is that I can't find any reference to this problem
                > anywhere on the web or the google Usenet archive, and the problem
                > doesn't only affect 'æ', 'ø' and 'å' but all special
                > language-characters. That lead me to believe it was the settings on my
                > server that was causing it, but when testing it on my project's
                > sourceforge webspace I got the same thing happening there as well, which
                > again tells me the problem is with the class itself.
                Or with your own code ...

