Fast search for all positions in a string

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • czechboy

    Fast search for all positions in a string

    Hi,
    what is the fastest way to search for all occurences of a string in a
    very long string (from several KB to MB). What I want to get is an
    array of numbers with positions of each occurence.
    thanks
  • czechboy

    #2
    Re: Fast search for all positions in a string

    Also I would like to mention that I need case insensitive search.

    Comment

    • Bart Van der Donck

      #3
      Re: Fast search for all positions in a string

      czechboy wrote:
      what is the fastest way to search for all occurences of a string in a
      very long string (from several KB to MB). What I want to get is an
      array of numbers with positions of each occurence.
      var str = 'fdda54kokDdggf A54lf8754a544ro 9a54ypx9Wa54z'
      var mat = 'a54'
      var arr = str.split(mat)
      var res = new Array()
      var r = 0
      for (var i=0; i<arr.length-1; ++i) {
      res[i] = 1 + r + arr[i].length
      r += arr[i].length + mat.length
      }
      alert(res)

      Takes ca. 1 second runtime on my Firefox 2.0.0.14. for a 1MB string.
      MSIE is slower as expected. I don't see relevant possible performance
      boosts unless finding a way to not loop the array (which I don't think
      is possible here). Generally I would recommend to avoid such heavy
      strings.

      The search is case-sensitive.

      Hope this helps,

      --
      Bart

      Comment

      • Jorge

        #4
        Re: Fast search for all positions in a string

        On Jun 4, 11:58 am, czechboy <oldrich.s...@g mail.comwrote:
        Also I would like to mention that I need case insensitive search.
        <html><head><st yle>text { font-size: 16px; }</style>
        <script>
        window.onload= function () {
        var txt= "", b= [], i, d= document, t,
        y= function (p) { return d.createElement (p) },
        getMS= function (p) { return (new Date()).getTime ()-p };

        txt+= " Lorem ipsum dolor sit amet, consectetuer adipiscing.";
        txt+= " Nullam consequat lectus sit amet enim. Proin , ipsum";
        txt+= " eu tincidunt iaculis, massa enim quam, at molestie ";
        txt+= "enim tellus ac dolor. Aenean tempor tortor. ";
        txt+= "Vestibulum sed sem. Curabitur sed erat a nibh tristique.";
        for (i=0; i<6; i++) { txt+= txt+txt+txt+txt }

        t= "txt.length = " + txt.length;
        t+= " : \""+txt.substr( 0,30)+"...\"<br ><br>";
        (d.body.appendC hild(y('text')) ).innerHTML= t;

        //Replace this f() with others' versions to compare them.
        function search (whatTxt, inTxt) {
        var l, p, a= [], i= 0;
        inTxt= inTxt.toLowerCa se();
        l= (whatTxt= whatTxt.toLower Case()).length;
        while ((p= inTxt.indexOf(w hatTxt,i)) >= 0) {
        a.push(p);
        i= p+l;
        }
        return a;
        }

        b= ["lorem","IPSUM" ,"veni","iMac", "CoNsEqUaT","." ," ","r","iN"];
        b.push(" LoReM iPsUm DoLoR ");
        setTimeout(func tion () {
        var t, a, mS, c= b.shift();
        mS= getMS(0);
        a= search(c,txt);
        mS= getMS(mS);
        t= "\""+c+"\" : "+a.length+ " : "+mS+" mS<br>";
        (d.body.appendC hild(y('text')) ).innerHTML= t;
        if (b.length) { setTimeout(argu ments.callee, 111) } else {
        (d.body.appendC hild(y('text')) ).innerHTML= "Done.";
        }
        }, 1);
        };
        </script></head><body></body></html>

        I just turn it all to lowercase before matching,
        so it's case-insensitive. But that's not very smart : there
        has to be a better way. (RegExps /i ?).

        Replace search () with other versions to compare.

        Gives me this results in Safari/Mac :
        The test string is ~4Mb long.

        txt.length= 4031250 : " Lorem ipsum dolor sit amet, c..."

        "lorem" : 15625 : 121 mS
        "IPSUM" : 31250 : 144 mS
        "veni" : 0 : 128 mS
        "iMac" : 0 : 125 mS
        "CoNsEqUaT" : 15625 : 130 mS
        "." : 93750 : 173 mS
        " " : 625000 : 648 mS
        "r" : 187500 : 258 mS
        "iN" : 46875 : 156 mS
        " LoReM iPsUm DoLoR " : 15625 : 146 mS
        Done.

        And this in IE8b/WinXP :

        txt.length= 4031250 : " Lorem ipsum dolor sit amet, c..."

        "lorem" : 15625 : 190 mS
        "IPSUM" : 31250 : 310 mS
        "veni" : 0 : 140 mS
        "iMac" : 0 : 150 mS
        "CoNsEqUaT" : 15625 : 230 mS
        "." : 93750 : 491 mS
        " " : 625000 : 2804 mS
        "r" : 187500 : 530 mS
        "iN" : 46875 : 331 mS
        " LoReM iPsUm DoLoR " : 15625 : 250 mS
        Done.

        HTH,
        --Jorge.

        Comment

        • Jorge

          #5
          Re: Fast search for all positions in a string

          On Jun 4, 5:09 pm, Bart Van der Donck <b...@nijlen.co mwrote:
          >
             var str = 'fdda54kokDdggf A54lf8754a544ro 9a54ypx9Wa54z'
             var mat = 'a54'
             var arr = str.split(mat)
             var res = new Array()
             var r = 0
             for (var i=0; i<arr.length-1; ++i)  {
               res[i] = 1 + r + arr[i].length
               r += arr[i].length + mat.length
             }
             alert(res)
          >
          Pasted in as :

          //Replace this f() with others' versions to compare them.
          function search (whatTxt, inTxt) {
          var i, arr= inTxt.split(wha tTxt),
          res= [], r= 0;

          for (i=0; i<arr.length-1; ++i) {
          res[i]= 1 + r + arr[i].length;
          r+= arr[i].length + whatTxt.length;
          }

          return res;
          }

          Gives :

          Safari/Mac :
          txt.length= 4031250 : " Lorem ipsum dolor sit amet, c..."

          "Lorem" : 15625 : 21 mS
          "ipsum" : 31250 : 52 mS
          "veni" : 0 : 10 mS
          "iMac" : 0 : 19 mS
          "consequat" : 15625 : 39 mS
          "." : 93750 : 109 mS
          " " : 625000 : 706 mS
          "r" : 187500 : 278 mS
          "in" : 46875 : 85 mS
          " Lorem ipsum dolor " : 15625 : 49 mS
          Done.

          IE8b/WinXP :

          txt.length= 4031250 : " Lorem ipsum dolor sit amet, c..."

          "Lorem" : 15625 : 150 mS
          "ipsum" : 31250 : 281 mS
          "veni" : 0 : 90 mS
          "iMac" : 0 : 100 mS
          "consequat" : 15625 : 171 mS
          "." : 93750 : 601 mS
          " " : 625000 : 3695 mS
          "r" : 187500 : 1112 mS
          "in" : 46875 : 190 mS
          " Lorem ipsum dolor " : 15625 : 180 mS
          Done.

          You win... :-)

          Although that's case-sensitive.

          --Jorge.

          Comment

          • Dr J R Stockton

            #6
            Re: Fast search for all positions in a string

            In comp.lang.javas cript message <ad6736ef-bbc7-4e37-a70e-2e65c3f286f0@r6
            6g2000hsg.googl egroups.com>, Wed, 4 Jun 2008 02:53:29, czechboy
            <oldrich.svec@g mail.composted:
            >what is the fastest way to search for all occurences of a string in a
            >very long string (from several KB to MB). What I want to get is an
            >array of numbers with positions of each occurence.
            This could perhaps be developed into a clean method :-

            S = "0aaaa1bbbb2222 ccccccc3dd"
            A = []
            J = 10 // safety-belt
            R = new RegExp("\\d+", "gi")

            while (J--) {
            M = R.exec(S)
            if (!M) break
            A.push(RegExp.l astIndex-RegExp.lastMatc h.length)
            }
            A // Result - [0,5,10,21]

            It's a good idea to read the newsgroup c.l.j and its FAQ. See below.

            --
            (c) John Stockton, nr London UK. ?@merlyn.demon. co.uk IE7 FF2 Op9 Sf3
            news:comp.lang. javascript FAQ <URL:http://www.jibbering.c om/faq/index.html>.
            <URL:http://www.merlyn.demo n.co.uk/js-index.htmjscr maths, dates, sources.
            <URL:http://www.merlyn.demo n.co.uk/TP/BP/Delphi/jscr/&c, FAQ items, links.

            Comment

            Working...