big binary string

  • Laurent Vogel

    big binary string


    hello,

    I'm writing a script which needs about 60 kbytes of text data.
    Currently it looks like:
    function1("small string1");
    function2("small string2");
    function1("small string3");
    and so on. In order to have it loaded more quickly, I'm considering
    actually putting all the data in one big string (something like:
    1<small string1>2<small string2>1<small string3>...
    ) and compress it (using a naive compression approach, I can
    shrink the 60 kb down to approximately 20 kb, plus 2 or 3 lines of
    javascript code to decompress the stuff). Now the question is:

    QUESTION:
    ---------

    Is it okay to have a statement like:

    mystring = "blablabla\
    blablabla\
    blablabla";

    with 20 kilobytes of binary blablabla? What characters may I use in
    the string? (Can I use any bytes from 1 to 255, or should I restrict
    myself to the printable subset of ISO Latin-1?)

    Thanks for any answers,

    Laurent Vogel


  • Lasse Reichstein Nielsen

    #2
    Re: big binary string

    "Laurent Vogel" <lvl@club-internet.fr> writes:
    > Is it okay to have a statement like:
    >
    > mystring = "blablabla\
    > blablabla\
    > blablabla";

    No. Javascript strings cannot span lines. You should do it as:
    var mystring = "blabla...bla" +
    "blablabl...." +
    "blablabl...." +
    "blablabl...." +
    "blablabl...."...
    although with that many lines, pure concatenation is probably
    inefficient (quadratic time complexity). I think a better solution is
    to build an array and join it in one go:
    var myarray = ["blabalbal...bla",
    "blabablablabl",
    "bablb",
    ...
    "lbalblab"];
    var mystring = myarray.join("");
    > with 20 kilo bytes of binary blablabla ?

    > What characters may I use in the string? (can I use any bytes from 1
    > to 255, or should I restrict to the printable subset of ISO latin 1)?

    Good question. That depends, among other things, on the encoding used
    to send the page. ECMAScript says:
    SourceCharacter ::
        any Unicode character

    StringLiteral ::
        " DoubleStringCharacters_opt "
        ' SingleStringCharacters_opt '

    DoubleStringCharacters ::
        DoubleStringCharacter DoubleStringCharacters_opt

    DoubleStringCharacter ::
        SourceCharacter but not double-quote " or backslash \ or LineTerminator
        \ EscapeSequence

    so any Unicode character should work, if you can send it using the
    encoding you use.

    I would restrict myself to, e.g., ISO latin 1, and possibly the
    printable subset. If you need codes outside of that, you can use
    escapes: \xff (hex) or \177 (octal), or for those that have an
    escape, you can use it: \n. You probably don't want to use the unicode
    escape, e.g., \u21e7.
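
To illustrate the escaping advice above, here is a minimal sketch of a helper that turns an arbitrary string into a double-quoted literal containing only printable ASCII (codes 32 to 126). The name `toAsciiLiteral` is hypothetical and not from the thread. It uses \x escapes for codes up to 255 and falls back to \u escapes only for anything larger.

```javascript
// Hypothetical helper (not from the thread): build a double-quoted
// Javascript string literal that contains only codes 32..126.
function toAsciiLiteral(s) {
  var out = '"';
  for (var i = 0; i < s.length; i++) {
    var ch = s.charAt(i);
    var code = s.charCodeAt(i);
    if (ch === '"' || ch === "\\") {
      out += "\\" + ch;                       // escape quote and backslash
    } else if (code >= 32 && code <= 126) {
      out += ch;                              // printable ASCII passes through
    } else if (code <= 0xff) {
      out += "\\x" + (code < 16 ? "0" : "") + code.toString(16);
    } else {
      out += "\\u" + ("0000" + code.toString(16)).slice(-4);
    }
  }
  return out + '"';
}

// Usage: the result can be pasted into a script as a literal.
var literal = toAsciiLiteral('say "hi"\n\xff');
```

Evaluating the generated literal should give back the original string, which is an easy check to run before shipping the data.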

    /L
    --
    Lasse Reichstein Nielsen - lrn@hotpop.com
    Art D'HTML: <URL:http://www.infimum.dk/HTML/randomArtSplit.html>
    'Faith without judgement merely degrades the spirit divine.'


    • Dr John Stockton

      #3
      Re: big binary string

      JRS: In article <3f6a98e2$0$20953$7a628cd7@news.club-internet.fr>, seen
      in news:comp.lang.javascript, Laurent Vogel <lvl@club-internet.fr>
      posted at Fri, 19 Sep 2003 07:49:20 :-
      > ...
      > I can
      >shrink the 60 kb down to approximately 20 kb, plus 2 or 3 lines of
      >javascript code to decompress the stuff). Now the question is:

      >Is it okay to have a statement like:
      >
      > mystring = "blablabla\
      > blablabla\
      > blablabla";

      AIUI, \ is not part of the standard; it may not always work.

      >with 20 kilo bytes of binary blablabla ? What characters may I use in
      >the string? (can I use any bytes from 1 to 255, or should I restrict

      0 to 255 ? But you would need to escape " \ and CR/LF (& LS, PS, FF?),
      at the very very least.

      >to the printable subset of ISO latin 1) ?

      Using characters 33 to 126 should be safe. IIRC, MIME uses 64 of them,
      storing 3 arbitrary bytes in 4 legible ones, and PostScript can use 85
      of them, storing 4 in 5. No lesser expansion from fully-compressed
      seems worth the effort.
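
As a rough illustration of the "4 in 5" idea above: 85^5 is slightly larger than 2^32, so any 4 bytes fit in 5 base-85 digits. At that ratio 20 kB of compressed bytes becomes about 25 kB of text, against about 26.7 kB at 3 in 4. The sketch below is not PostScript's actual encoding. The function names and the alphabet are assumptions: the alphabet takes 85 characters from codes 35 to 120 and skips the backslash, so the output needs no escaping inside a double-quoted literal.

```javascript
// Assumed alphabet: 85 printable characters, starting after the double
// quote (code 34) and skipping the backslash (code 92).
var ALPHABET = "";
for (var c = 35; ALPHABET.length < 85; c++) {
  if (c !== 92) ALPHABET += String.fromCharCode(c);
}

// Pack 4 bytes (each 0..255) into 5 characters.
function encodeGroup(b0, b1, b2, b3) {
  // Plain arithmetic instead of bit shifts, to stay clear of 32-bit sign issues.
  var value = ((b0 * 256 + b1) * 256 + b2) * 256 + b3;
  var chars = [];
  for (var k = 0; k < 5; k++) {
    chars.unshift(ALPHABET.charAt(value % 85));
    value = Math.floor(value / 85);
  }
  return chars.join("");
}

// Unpack 5 characters back into an array of 4 bytes.
function decodeGroup(s) {
  var value = 0;
  for (var k = 0; k < 5; k++) {
    value = value * 85 + ALPHABET.indexOf(s.charAt(k));
  }
  var bytes = [];
  for (var m = 0; m < 4; m++) {
    bytes.unshift(value % 256);
    value = Math.floor(value / 256);
  }
  return bytes;
}
```

A real encoder would also need a rule for a final group shorter than 4 bytes, which this sketch leaves out.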

      BUT :

      Over all links other than dial-up and long-distance radio, 60kB will not
      take very long; and, AIUI, the lower-speed links are generally data-
      compressed by hardware. Probably, if you know the structure of your
      data, you can compress more tightly; but I wonder whether you can make a
      gain sufficient to justify the effort?


      H'mmm - IIRC, "Basic English" can say anything that needs to be said,
      generally, with 850 simple words; and for special topics 150 special
      words. So if the text is so written, you could send a word-list plus
      for each string the numbers for each word. Whether French can be so
      treated I do not know.

      --
      © John Stockton, Surrey, UK. ?@merlyn.demon.co.uk Turnpike v4.00 IE 4 ©
      <URL:http://jibbering.com/faq/> Jim Ley's FAQ for news:comp.lang.javascript
      <URL:http://www.merlyn.demon.co.uk/js-index.htm> JS maths, dates, sources.
      <URL:http://www.merlyn.demon.co.uk/> TP/BP/Delphi/JS/&c., FAQ topics, links.


      • Laurent Vogel

        #4
        Re: big binary string


        Thanks for the answers.
        I don't quite understand why using ASCII 32 (space) in
        strings should cause any problem. Just to be sure I will
        restrict myself to the range 32-126.
        As for whether compression can be worth the trouble,
        frankly I don't know, but I maintain that I'm able to
        shrink approximately 60k into 20k of compressed data (plus
        three lines added for the decompression code). As an
        example here is the decompressor together with the
        (compressed) description of the compression format used.


        <html><head></head><body><script>
        /* here is the uncompressing routine */
        function u(s){var n,i=0,g=function(){if((n=s.charCodeAt(i++)-32)>=64)
        n=(n-64)*95+s.charCodeAt(i++)-32;},o,d="",j=0;for(;;){g();d+=s.substr(i,n);
        i+=n;j+=n;g();if(0==(o=n))return d;g();while(n-->0)d+=d.charAt(j++-o);}}

        /* and here is sample compressed data */
        document.write(u([
        "X<p>This is an example of compression for Javascript.
        The@0#mat`d$(\n very s",
        "i`f$%, and5$'best de^$5bed by the algorithm N%)mented in=%#\nuna\")?n g
        rout",
        "ine:</p>\n\n<p>Concep tuall`n& P4Hg will read objects one at a time from
        \n`",
        "`$'\"input\"au &Ealways append data to a \"buffer\". Wheaj&#jobbE$!d `x$
        az% ",
        "D() containsa`/!e`q&!.b-)(Now here`g$ bz- bP(<blockquote> <pre>forever
        {\n ",
        "b?&(<i>n</i>1$ av' 3(/ bytes verbatimbT& bS+% into1& az' `v,&offset`h%
        7$'q",
        "uit if:/$== 0aLK&copiedaJ& `i. @& aP'+current end`l$! fz$ ah, az2',
        allowe=",
        "$(overlappeQ%& egionsb#%' (i.e.a9/$&lt;ag)%)\n} </cj$\"</c|+\"\n
        dO%Lumbers ",
        "smaller than 64 are encoded as ASCII /$% 32 +af%!n[%*. Bigger
        \n0&!s[-!ugC%",
        "3two printable ascii^'#s (c9% a!.1to \n126 included)ii& i]* i\\%-crude,
        but",
        " itc,&#s a<& b1% i<0 a $%. Notg_$ hZ$ c_< j 3\"onj (4 provides run
        lengthbV",
        "& `c$ kg$$freeh9*!Ol) $)urse, perb*%\"nchG% ff$(e poor wiJ$ a.$)ared
        withff$",
        "!llC)-ors\nlike gzipbZ&%honorcX % af-2atio can be achiev`q$ jJ( b5+ j$'
        jC**",
        "many repealf$*substr ingsb-& "].join("")));
        </script></body></html>
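
For readers who want to follow the routine above, here is an expanded sketch of the same decoding loop with descriptive names. The names `decompress`, `readNumber` and `encodeNumber` are hypothetical. `encodeNumber` is an assumed inverse of the number format, inferred from the decoder rather than taken from the post. The stream is a sequence of (literal length, literal text, offset, copy length) records, and an offset of 0 ends it.

```javascript
// Assumed inverse of the decoder's number format: values below 64 take one
// character (code 32 + n), larger values take two printable characters.
function encodeNumber(n) {
  if (n < 64) return String.fromCharCode(32 + n);
  if (n > 30 * 95 + 94) throw new RangeError("number too large for two characters");
  return String.fromCharCode(96 + Math.floor(n / 95), 32 + (n % 95));
}

function decompress(s) {
  var pos = 0;   // read position in the compressed string
  var out = "";  // decompressed text so far (the "buffer")

  function readNumber() {
    var n = s.charCodeAt(pos++) - 32;
    if (n >= 64) n = (n - 64) * 95 + s.charCodeAt(pos++) - 32;
    return n;
  }

  for (;;) {
    // 1. Copy a run of literal characters verbatim.
    var literalLength = readNumber();
    out += s.substr(pos, literalLength);
    pos += literalLength;

    // 2. Read a back-reference; offset 0 marks the end of the data.
    var offset = readNumber();
    if (offset === 0) return out;

    // 3. Copy character by character so that overlapping regions work.
    var copyLength = readNumber();
    for (var k = 0; k < copyLength; k++) {
      out += out.charAt(out.length - offset);
    }
  }
}

// Hand-built stream: literal "abc", copy 6 characters from 3 back,
// literal "!", then the terminating offset 0.
var sample = encodeNumber(3) + "abc" + encodeNumber(3) + encodeNumber(6) +
             encodeNumber(1) + "!" + encodeNumber(0);
var text = decompress(sample); // "abcabcabc!"
```

The overlapping copy in step 3 is what gives run-length encoding for free: an offset of 1 simply repeats the last character.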


        regards,

        Laurent Vogel
        -- remove "ima" and "ictor" to get my email address



        • Steve van Dongen

          #5
          Re: big binary string

          On 21 Sep 2003 18:52:09 GMT, Laurent Vogel
          <LimaVictorLima@club-internet.fr> wrote:
          >
          >Thanks for the answers.
          >I don't quite understand why using ASCII 32 (space) in
          >strings should cause any problem. Just to be sure I will
          >restrict myself to the range 32-126.
          >As for whether compression can be worth the trouble,
          >frankly I don't know, but I maintain that I'm able to
          >shrink approximately 60k into 20k of compressed data (plus
          >three lines added for the decompression code). As an
          >example here is the decompressor together with the
          >(compressed) description of the compression format used.
          >
          >
          ><html><head></head><body><script>
          >/* here is the uncompressing routine */
          >function u(s){var n,i=0,g=function(){if((n=s.charCodeAt(i++)-32)>=64)
          >n=(n-64)*95+s.charCodeAt(i++)-32;},o,d="",j=0;for(;;){g();d+=s.substr(i,n);
          >i+=n;j+=n;g();if(0==(o=n))return d;g();while(n-->0)d+=d.charAt(j++-o);}}

          Have you done performance testing of compressing data like this vs.
          using uncompressed data over various link speeds? I'd be very wary of
          the above. String concatenation is one of the slowest operations you
          can do in Javascript due to naive memory allocation, and when you
          start concatenating large strings repeatedly in a loop... Well, let's
          just say that we used to have some scripts that did that. We were
          dealing with data sizes up in the 200-400K range and the script took
          15 minutes to run. After reducing the number of string concatenations
          where one of the two operands was very large the script only took 2.5
          minutes. You need to test whether this is worthwhile. I believe John
          is correct and it will not be.
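
One way to act on the concatenation concern is to collect the output in an array and join it once at the end, in the spirit of the array-and-join suggestion earlier in the thread. The sketch below assumes the stream format of the quoted routine. The name `decompressToArray` is hypothetical, and whether this is actually faster depends on the script engine, so it still needs measuring.

```javascript
// Variant of the quoted decoder that never appends to a large string:
// each output character is pushed onto an array, joined once at the end.
function decompressToArray(s) {
  var pos = 0;   // read position in the compressed string
  var out = [];  // one output character per element

  function readNumber() {
    var n = s.charCodeAt(pos++) - 32;
    if (n >= 64) n = (n - 64) * 95 + s.charCodeAt(pos++) - 32;
    return n;
  }

  for (;;) {
    // Literal run: copy characters straight from the input.
    var literalLength = readNumber();
    for (var k = 0; k < literalLength; k++) out.push(s.charAt(pos++));

    // Back-reference: offset 0 ends the stream.
    var offset = readNumber();
    if (offset === 0) return out.join("");

    // Copy from earlier output; overlapping regions are allowed.
    var copyLength = readNumber();
    for (var m = 0; m < copyLength; m++) out.push(out[out.length - offset]);
  }
}

// "#abc#&!! " encodes: literal "abc", copy 6 from 3 back, literal "!", end.
var result = decompressToArray("#abc#&!! "); // "abcabcabc!"
```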

          Regards,
          Steve
