regular expression help...

  • Ian Richardson

    regular expression help...

    I'm looking to use Javascript to pull apart a page of HTML I have
    already fetched. The page contains a table, within which there are rows
    containing...

    Either:

    0000000 - 0000000<some html>00<some html>0

    or:

    0000000 - 0000000<some html>00 - 00<some html>0

    I'm interested in extracting only the numbers (which may not always be
    0!), in each case. I bet this can be done using a regular expression (or
    two). Can anyone help?

    Thanks,

    Ian
  • Alberto

    #2
    Re: regular expression help...

    One possible way, not necessarily the best, is:

    <script>
    var foo="0000000 - 0000000<some html>00<some html>0"
    alert(foo.match(/\d+/g))
    </script>

    You can add attributes to the script tag if you prefer; that is not important here.

    I hope that is close to what you need.
    Note that, to be safe, you should first check whether match() actually
    returned an array: it returns null when there is no match.
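
The null check mentioned above could look like the sketch below. The helper name `extractDigitRuns` is made up for illustration and is not from the thread; the only behaviour relied on is that `match()` with the `g` flag returns an array of all matches, or `null` when there are none.

```javascript
// Hypothetical helper: returns every run of digits in `text` as an array of strings.
function extractDigitRuns(text) {
  // With the g flag, match() returns all matches, or null if nothing matched.
  var runs = text.match(/\d+/g);
  return runs === null ? [] : runs;
}

var foo = "0000000 - 0000000<some html>00<some html>0";
console.log(extractDigitRuns(foo));            // the four runs of digits
console.log(extractDigitRuns("<some html>"));  // an empty array rather than null
```

Returning an empty array instead of `null` lets the caller loop over the result without a separate test. Note that this still picks up any digits that appear inside the markup itself.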

    Maybe better solutions will follow. Mine is just one.

    ciao
    Alberto


    "Ian Richardson" <zathras@chaos.org.uk> wrote in message
    news:2ojb6sFb9pkfU1@uni-berlin.de...
    > I'm looking to use Javascript to pull apart a page of HTML I have
    > already fetched. The page contains a table, within which there are rows
    > containing...
    >
    > Either:
    >
    > 0000000 - 0000000<some html>00<some html>0
    >
    > or:
    >
    > 0000000 - 0000000<some html>00 - 00<some html>0
    >
    > I'm interested in extracting only the numbers (which may not always be
    > 0!), in each case. I bet this can be done using a regular expression (or
    > two). Can anyone help?
    >
    > Thanks,
    >
    > Ian



    • Mick White

      #3
      Re: regular expression help...

      Ian Richardson wrote:
      > Either:
      >
      > 0000000 - 0000000<some html>00<some html>0
      >
      > or:
      >
      > 0000000 - 0000000<some html>00 - 00<some html>0
      >
      > I'm interested in extracting only the numbers (which may not always be
      > 0!), in each case. I bet this can be done using a regular expression (or
      > two). Can anyone help?
      >
      > Thanks,
      >
      > Ian
      You don't necessarily need to use a regex; you can use the DOM methods,
      for example (without error checking):

      var tds=document.getElementsByTagName("TD")

      //Then loop through the collection
      var numbers=[];
      for(var i=0;i<tds.length;i++){
        if(!isNaN(tds.item(i).firstChild.data)){
          numbers.push(tds.item(i).firstChild.data)
        }
      }

      I'm not sure that the method above is the best at identifying numbers,
      and it will fail if there are tags nested inside the <td>, but it is
      something to get you thinking.
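
On the doubt about identifying numbers: `isNaN` converts its argument to a number first, so an empty cell or a string such as "1e3" also passes the `!isNaN(...)` test. A stricter check, sketched here with a hypothetical helper named `isDigitsOnly` (an assumption, not part of the original post), would accept only strings made up entirely of digits:

```javascript
// Hypothetical helper: true only for strings consisting solely of digits,
// optionally surrounded by whitespace.
function isDigitsOnly(text) {
  return typeof text === "string" && /^\s*\d+\s*$/.test(text);
}

console.log(!isNaN(""), isDigitsOnly(""));       // true false
console.log(!isNaN("1e3"), isDigitsOnly("1e3")); // true false
console.log(!isNaN("42"), isDigitsOnly("42"));   // true true
```

The `typeof` guard also covers the case where a cell has no text node and the value passed in is `undefined`.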


      Mick


      • Ian Richardson

        #4
        Re: regular expression help...

        Mick White wrote:
        > Ian Richardson wrote:
        >
        >> Either:
        >>
        >> 0000000 - 0000000<some html>00<some html>0
        >>
        >> or:
        >>
        >> 0000000 - 0000000<some html>00 - 00<some html>0
        >>
        >> I'm interested in extracting only the numbers (which may not always be
        >> 0!), in each case. I bet this can be done using a regular expression
        >> (or two). Can anyone help?
        >>
        >> Thanks,
        >>
        >> Ian
        >
        > You don't necessarily need to use regex, you can use the DOM methods,
        > for example (without error checking):

        I can't use DOM methods if all I'm dealing with is one long string of
        HTML which contains the numeric data I wish to extract...

        I can't just search through the string looking for numeric data as I'm
        likely to pick up other stuff which I don't need.

        I was really hoping for a regular expression to extract multiples of:

        7 digits and 7 digits separated by " - "
        (HTML I'm not interested in)
        Either 2 digits, or 2 digits followed by " - " and another 2 digits
        (HTML I'm not interested in)
        1 digit

        ...from within an HTML table, the markup of which can change...

        Any other ideas?

        Thanks,

        Ian


        • Michael Winter

          #5
          Re: regular expression help...

          On Thu, 19 Aug 2004 15:53:28 +0100, Ian Richardson <zathras@chaos.org.uk>
          wrote:

          [snip]
          > I was really hoping for a regular expression to extract multiples of:
          >
          > 7 digits and 7 digits separated by " - "
          > (HTML I'm not interested in)
          > Either 2 digits, or 2 digits followed by " - " and another 2 digits
          > (HTML I'm not interested in)
          > 1 digit
          >
          > ...from within an HTML table, the markup of which can change...

          Alberto's suggestion is a good start, but it depends on whether the
          mark-up can contain numbers itself. If it can, there needs to be more
          structure in the expression. Try:

          /(\d{7}) - (\d{7})\x3c.+\x3e(\d{2})( - (\d{2}))?\x3c.+\x3e(\d)/

          When used with the exec() method, it will return an array that contains:

          0 - the entire matched text (ignore)
          1 - first group of seven digits
          2 - second group of seven digits
          3 - first group of two digits
          4 - " - " plus the second group of two digits (ignore)
          5 - second group of two digits
              (undefined if it isn't present)
          6 - final digit

          This appears to be safe, even if numbers appear in the separating HTML,
          but without live data, I can't test properly.
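
As a rough illustration of reading those array elements, here is a minimal sketch. The helper `parseRow`, its property names, and the `<td>` sample rows are assumptions made up for this example; only the expression itself comes from the post. `exec()` returns `null` when nothing matches, so that case is checked first.

```javascript
// The expression from the post; \x3c and \x3e are the "<" and ">" characters.
var re = /(\d{7}) - (\d{7})\x3c.+\x3e(\d{2})( - (\d{2}))?\x3c.+\x3e(\d)/;

// Hypothetical helper: returns the captured numbers as an object, or null.
function parseRow(row) {
  var m = re.exec(row);
  if (m === null) {
    return null; // exec() found no match
  }
  return {
    first: m[1],
    second: m[2],
    pairStart: m[3],
    pairEnd: m[5] === undefined ? null : m[5], // absent in the "00" form
    last: m[6]
  };
}

console.log(parseRow("1234567 - 7654321<td>12</td><td>5"));
console.log(parseRow("1234567 - 7654321<td>12 - 34</td><td>5"));
```

Two caveats when trying this on real data: `.` does not match line breaks, so a row spread over several lines will not match, and `.+` is greedy, so a string holding several rows may need a lazy `.+?` or row-by-row processing.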

          Give it a try. If it doesn't work and you can't modify the expression,
          show the exact test case you used and we will see what we can do.

          Good luck,
          Mike

          --
          Michael Winter
          Replace ".invalid" with ".uk" to reply by e-mail.


          • Shawn Milo

            #6
            Re: regular expression help...

            Ian Richardson <zathras@chaos.org.uk> wrote in message news:<2ojb6sFb9pkfU1@uni-berlin.de>...
            > I'm looking to use Javascript to pull apart a page of HTML I have
            > already fetched. The page contains a table, within which there are rows
            > containing...
            >
            > Either:
            >
            > 0000000 - 0000000<some html>00<some html>0
            >
            > or:
            >
            > 0000000 - 0000000<some html>00 - 00<some html>0
            >
            <snip>


            Try this:

            <script type="text/javascript">


            // The pattern is joined with +, because a string literal cannot span lines.
            var reStrip = new RegExp("([0-9]{7}) - " +
                "([0-9]{7})<[^>]+>(([0-9]{2})( - ([0-9]{2}))?)<[^>]+>([0-9]{1})",
                'gi');

            var strTest = '';

            strTest = '0000000 - 0000000<some html>00<some html>0';
            if (strTest.match(reStrip)){

                alert('Match!');

            }else{

                alert('No Match!');

            }


            // The first number will be returned as $1, as in:
            // strTest.replace(reStrip, '$1')
            //
            // Just be aware that in the case of the second set of numbers
            // ('00 - 00' instead of '00'), you will have additional values
            // (one for each matching set of parentheses).


            strTest = '0000000 - 0000000<some html>00 - 00<some html>0';
            if (strTest.match(reStrip)){

                alert('Match!');

            }else{

                alert('No Match!');

            }



            </script>
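
To pull the numbers out of every row rather than just test for a match, one option is to call `exec()` repeatedly on a pattern with the `g` flag. The sketch below rewrites the pattern above as a regex literal; the helper name `extractRows` and the sample `page` string are assumptions for illustration only.

```javascript
// The same pattern as a regex literal; the g flag lets exec() resume after each match.
var reRow = /([0-9]{7}) - ([0-9]{7})<[^>]+>(([0-9]{2})( - ([0-9]{2}))?)<[^>]+>([0-9])/g;

// Hypothetical helper: collects the numbers from every matching row in `html`.
function extractRows(html) {
  var rows = [];
  var m;
  reRow.lastIndex = 0; // start from the beginning on every call
  while ((m = reRow.exec(html)) !== null) {
    // Groups 3 and 5 are wrappers; groups 1, 2, 4, 6 and 7 hold the numbers.
    // Group 6 is undefined when the row uses the "00" form.
    rows.push([m[1], m[2], m[4], m[6], m[7]]);
  }
  return rows;
}

var page = "1111111 - 2222222<some html>33<some html>4" +
           "<tr>5555555 - 6666666<some html>77 - 88<some html>9";
console.log(extractRows(page)); // two rows, five entries each
```

Note that `<[^>]+>` matches exactly one tag. If the real markup puts several tags or any text between the numbers, that part of the pattern would need adjusting, for example to `(?:<[^>]+>)+` for a run of adjacent tags.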
