Extract links from Javascript (not using Javascript)?

  • chrisspencer02@yahoo.com

    Extract links from Javascript (not using Javascript)?

    I am looking for a method to extract the links embedded within the
    Javascript in a web page: an ActiveX component, or example code in
    C++/Pascal/etc. I am looking for a general solution, not one tailored
    to a particular page/script.

    Hopefully, the problem can be solved without recreating a complete
    Javascript interpreter. Any ideas?

  • Ira Baxter

    #2
    Re: Extract links from Javascript (not using Javascript)?


    <chrisspencer02@yahoo.com> wrote in message
    news:1148670236.974770.107310@j55g2000cwa.googlegroups.com...
    > I am looking for a method to extract the links embedded within the
    > Javascript in a web page: an ActiveX component, or example code in
    > C++/Pascal/etc. I am looking for a general solution, not one tailored
    > to a particular page/script.
    >
    > Hopefully, the problem can be solved without recreating a complete
    > Javascript interpreter. Any ideas?

    If you expect to have any chance at getting at links that are anything
    other than coded directly in a string literal, you will need at least a
    full JavaScript parser.
    See http://www.semanticdesigns.com/Produ...nds/index.html
    for a JavaScript front end that is designed to be used in custom tasks
    like this.
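The one case that does not need a full parser is the one excluded above: a URL written out whole in a string literal. As a minimal sketch (in JavaScript for Node.js, since the poster said the implementation language does not matter), here is a scanner that skips comments, collects string literals, and keeps those that look like URLs. The function name `extractLiteralUrls` and the "looks like a URL" pattern are assumptions made for illustration; the scanner does not understand regular-expression literals or template strings, so a quote character inside either will confuse it.

```javascript
// Minimal sketch: collect URL-like string literals from JavaScript source.
// Assumptions: no regex literals or template strings containing quotes;
// escape handling is simplified; the URL test is a heuristic only.
function extractLiteralUrls(source) {
  const literals = [];
  let i = 0;
  while (i < source.length) {
    const c = source[i];
    const next = source[i + 1];
    if (c === "/" && next === "/") {
      // Line comment: skip to end of line.
      while (i < source.length && source[i] !== "\n") i++;
    } else if (c === "/" && next === "*") {
      // Block comment: skip past the closing */.
      const end = source.indexOf("*/", i + 2);
      i = end === -1 ? source.length : end + 2;
    } else if (c === '"' || c === "'") {
      // String literal: read until the matching unescaped quote.
      let value = "";
      i++;
      while (i < source.length && source[i] !== c) {
        if (source[i] === "\\" && i + 1 < source.length) {
          value += source[i + 1]; // simplified: keep escaped char, drop backslash
          i += 2;
        } else {
          value += source[i++];
        }
      }
      i++; // skip the closing quote
      literals.push(value);
    } else {
      i++;
    }
  }
  // Heuristic filter: absolute URLs, or paths ending in a common web extension.
  const looksLikeUrl =
    /^(https?:\/\/|mailto:)|\.(html?|php|aspx?|jsp|gif|jpe?g|png|css|js)(\?|#|$)/i;
  return literals.filter((s) => looksLikeUrl.test(s));
}

const sample = `
  // window.open("ignored.html");
  var home = "http://example.com/";
  if (x) location.href = 'page2.html?id=3';
  var label = "Click here";
`;
console.log(extractLiteralUrls(sample));
```

Anything assembled at run time (concatenation, array lookups, computed arguments) is invisible to this approach, which is exactly the limitation the rest of the thread turns on.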

    --
    Ira Baxter, CTO
    software, analysis, translation, porting, modification, generation, synthesis, reengineering, reverse engineering, testing, Hogan System, PLEX, AS/400, migration, program transformation, rules, parser, domain-specific, legacy, C/C++, Java, XML, COBOL, Ada, Fortran, PL/1, Agile, Testing, refactoring





    • Randy Webb

      #3
      Re: Extract links from Javascript (not using Javascript)?

      chrisspencer02@yahoo.com said the following on 5/26/2006 3:03 PM:
      > I am looking for a method to extract the links embedded within the
      > Javascript in a web page: an ActiveX component, or example code in
      > C++/Pascal/etc. I am looking for a general solution, not one tailored
      > to a particular page/script.

      There are too many possibilities to deal with for a solution to that
      question to be simple and/or general. Just too many ways that a URL can
      be put together in script.

      Can you give a general example of what you are trying to do though?
      --
      Randy
      comp.lang.javascript FAQ - http://jibbering.com/faq & newsgroup weekly
      Javascript Best Practices - http://www.JavascriptToolbox.com/bestpractices/


      • Randy Webb

        #4
        Re: Extract links from Javascript (not using Javascript)?

        Ira Baxter said the following on 5/26/2006 3:44 PM:
        > <chrisspencer02@yahoo.com> wrote in message
        > news:1148670236.974770.107310@j55g2000cwa.googlegroups.com...
        >> I am looking for a method to extract the links embedded within the
        >> Javascript in a web page: an ActiveX component, or example code in
        >> C++/Pascal/etc. I am looking for a general solution, not one tailored
        >> to a particular page/script.
        >>
        >> Hopefully, the problem can be solved without recreating a complete
        >> Javascript interpreter. Any ideas?
        >
        > If you expect to have any chance at getting at links that are anything
        > other than coded directly in a string literal, you will need at least a
        > full JavaScript parser.

        And even that is not a guarantee of success.

        > See http://www.semanticdesigns.com/Produ...nds/index.html
        > for a JavaScript front end that is designed to be used in custom tasks
        > like this.

        It is designed to parse out any and all URLs that a document possesses?

        I find that a dubious claim.


        • chrisspencer02@yahoo.com

          #5
          Re: Extract links from Javascript (not using Javascript)?

          Randy Webb wrote:
          > chrisspencer02@yahoo.com said the following on 5/26/2006 3:03 PM:
          > > I am looking for a method to extract the links embedded within the
          > > Javascript in a web page: an ActiveX component, or example code in
          > > C++/Pascal/etc. I am looking for a general solution, not one tailored
          > > to a particular page/script.
          >
          > There are too many possibilities to deal with for a solution to that
          > question to be simple and/or general. Just too many ways that a URL can
          > be put together in script.
          >
          > Can you give a general example of what you are trying to do though?

          I would like to transform web pages "in the wild" into tables of links
          for a site map, regardless of whether those links are encoded in HTML,
          CSS, Flash, Javascript, etc. Sounds like this is not possible,
          particularly for event-driven aspects of the script like rollover image
          menus?


          • Randy Webb

            #6
            Re: Extract links from Javascript (not using Javascript)?

            chrisspencer02@yahoo.com said the following on 5/26/2006 8:44 PM:
            > Randy Webb wrote:
            >> chrisspencer02@yahoo.com said the following on 5/26/2006 3:03 PM:
            >>> I am looking for a method to extract the links embedded within the
            >>> Javascript in a web page: an ActiveX component, or example code in
            >>> C++/Pascal/etc. I am looking for a general solution, not one tailored
            >>> to a particular page/script.
            >> There are too many possibilities to deal with for a solution to that
            >> question to be simple and/or general. Just too many ways that a URL can
            >> be put together in script.
            >>
            >> Can you give a general example of what you are trying to do though?
            >
            > I would like to transform web pages "in the wild" into tables of links
            > for a site map, regardless of whether those links are encoded in HTML,
            > CSS, Flash, Javascript, etc. Sounds like this is not possible,
            > particularly for event-driven aspects of the script like rollover image
            > menus?

            It could be done with regard to the CSS, HTML, and JS aspects, but it
            wouldn't be a pretty task to try to accomplish. Just trying to resolve
            relative paths would be a major headache.
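Relative-path resolution at least has a precise specification (RFC 3986, and the WHATWG URL Standard that browsers follow), and Node.js exposes the WHATWG algorithm as the global `URL` class, so it need not be written by hand. A minimal sketch, assuming you already know the URL of the page each candidate was found on; `resolveAll` is a hypothetical helper name, and a `<base href>` element in the page, if present, would have to be used as the base instead.

```javascript
// Resolve candidate links against the URL of the page they were found on,
// using the WHATWG URL parser built into Node.js (global `URL`).
function resolveAll(baseUrl, candidates) {
  const resolved = new Set();
  for (const href of candidates) {
    let url;
    try {
      url = new URL(href, baseUrl);
    } catch {
      continue; // not parseable as a URL: skip it
    }
    // Pseudo-links such as javascript: or mailto: are not pages to map.
    if (url.protocol !== "http:" && url.protocol !== "https:") continue;
    url.hash = ""; // a fragment points into the same document
    resolved.add(url.href); // the Set drops duplicates
  }
  return [...resolved];
}

console.log(resolveAll("http://example.com/dir/page.html", [
  "../img/logo.gif",
  "other.html#top",
  "/index.html",
  "javascript:void(0)",
  "//cdn.example.net/lib.js",
]));
```

Dropping the fragment is a design choice for a site map (one entry per document); keep it if you want anchors listed separately.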


            • Thomas 'PointedEars' Lahn

              #7
              Re: Extract links from Javascript (not using Javascript)?

              chrisspencer02@yahoo.com wrote:

              > Randy Webb wrote:
              >> chrisspencer02@yahoo.com said the following on 5/26/2006 3:03 PM:
              >> > I am looking for a method to extract the links embedded within the
              >> > Javascript in a web page: an ActiveX component, or example code in
              >> > C++/Pascal/etc.

              Obviously you are not yet sure what to use, so a newsgroup dedicated to a
              certain (group of) language(s), like this one, is not the place to start.
              Try comp.infosystems.www.authoring.misc, or comp.lang.misc.

              >> > I am looking for a general solution, not one tailored
              >> > to a particular page/script.
              >>
              >> There are too many possibilities to deal with for a solution to that
              >> question to be simple and/or general. Just too many ways that a URL can
              >> be put together in script.
              >>
              >> Can you give a general example of what you are trying to do though?
              >
              > I would like to transform web pages "in the wild" into tables of links
              > for a site map,

              A site map is best implemented using lists (in [X]HTML: `ul' and `ol'
              elements), not tables. A table is a table is a table. [psf 3.8]
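A minimal sketch of the list-based structure suggested above, rendering a link tree as nested `ul` elements. The node shape `{ title, url, children }` and the function names are assumptions for illustration; titles and URLs are HTML-escaped because they come from arbitrary pages.

```javascript
// Render a site-map tree as nested <ul> lists.
// Assumed node shape: { title: string, url: string, children?: node[] }.
function escapeHtml(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;")
          .replace(/>/g, "&gt;").replace(/"/g, "&quot;");
}

function toNestedList(nodes) {
  if (!nodes || nodes.length === 0) return ""; // leaf: no sub-list
  const items = nodes.map((n) =>
    '<li><a href="' + escapeHtml(n.url) + '">' + escapeHtml(n.title) + "</a>" +
    toNestedList(n.children) + "</li>");
  return "<ul>" + items.join("") + "</ul>";
}

const tree = [
  { title: "Home", url: "/", children: [
    { title: "About", url: "/about.html" },
    { title: "Q&A", url: "/faq.html" },
  ] },
];
console.log(toNestedList(tree));
```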

              > regardless of whether those links are encoded in HTML, CSS, Flash,
              > Javascript, etc. Sounds like this is not possible,

              It is possible to a certain point (I don't think decompiling Flash is
              possible easily). There is software for that already (Web spiders),
              and you could use its output.

              > particularly for event-driven aspects of the script like rollover image
              > menus?

              The rollover effect has to take place on existing markup, so it does not
              matter here. You will have difficulty recognizing client-side generated
              menus that do not degrade gracefully, and menus that use pseudo-links
              like (<a href="javascript:somefunction()">...</a>), though.
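The pseudo-link case can at least be detected mechanically from the `href` value. A minimal sketch; the category names are made up for this example, and an element classified as `pseudo` or `fragment-only` may still navigate through its `onclick` handler, which this function does not look at.

```javascript
// Classify an href attribute value as found in markup.
// Illustrative categories: "empty", "pseudo" (javascript: URL),
// "fragment-only" ("#" or "#name", same document), or "link".
function classifyHref(href) {
  const value = (href || "").trim();
  if (value === "") return "empty";
  if (/^javascript:/i.test(value)) return "pseudo";
  if (value.startsWith("#")) return "fragment-only";
  return "link";
}

for (const h of ["javascript:somefunction()", "#", "#top", "blurb.html", " JavaScript:void(0)"]) {
  console.log(JSON.stringify(h), "->", classifyHref(h));
}
```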

              Which also tells you that unless you are using server-side J(ava)Script,
              J(ava)Script is not the appropriate language for generating the site map.
              However, it can help later, e.g. by letting the user expand/collapse it.


              PointedEars
              --
              This is Usenet. It is a discussion group, not a helpdesk. You post
              something, we discuss it. If you have a question and that happens to get
              answered in the course of the discussion, then great. If not, you can
              have a full refund of your membership fees. -- Mark Parnell in alt.html


              • chrisspencer02@yahoo.com

                #8
                Re: Extract links from Javascript (not using Javascript)?

                Thomas 'PointedEars' Lahn wrote:
                > chrisspencer02@yahoo.com wrote:
                >
                > > Randy Webb wrote:
                > >> chrisspencer02@yahoo.com said the following on 5/26/2006 3:03 PM:
                > >> > I am looking for a method to extract the links embedded within the
                > >> > Javascript in a web page: an ActiveX component, or example code in
                > >> > C++/Pascal/etc.
                >
                > Obviously you are not yet sure what to use, so a newsgroup dedicated to a
                > certain (group of) language(s), like this one, is not the place to start.
                > Try comp.infosystems.www.authoring.misc, or comp.lang.misc.

                I am not *unsure* what language to use to solve this problem; actually
                I don't care. My question is about algorithms for parsing and
                interpreting Javascript.

                > >> > I am looking for a general solution, not one tailored
                > >> > to a particular page/script.
                > >>
                > >> There are too many possibilities to deal with for a solution to that
                > >> question to be simple and/or general. Just too many ways that a URL can
                > >> be put together in script.
                > >>
                > >> Can you give a general example of what you are trying to do though?
                > >
                > > I would like to transform web pages "in the wild" into tables of links
                > > for a site map,
                >
                > A site map is best implemented using lists (in [X]HTML: `ul' and `ol'
                > elements), not tables. A table is a table is a table. [psf 3.8]

                I do not mean "table" as in HTML table, but "table" as in raw data set.
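For a raw data set, one record per discovered link is enough. A minimal sketch that writes such records as tab-separated values; the field names `page`, `target`, and `foundIn` are hypothetical.

```javascript
// One row per discovered link; field names are illustrative.
// foundIn records where the link came from: "html", "css", "script", ...
function toTsv(records) {
  // Replace tabs/newlines inside a field so each record stays on one row.
  const clean = (s) => String(s).replace(/[\t\r\n]+/g, " ");
  const header = "page\ttarget\tfoundIn";
  const rows = records.map((r) => [r.page, r.target, r.foundIn].map(clean).join("\t"));
  return [header, ...rows].join("\n");
}

console.log(toTsv([
  { page: "http://example.com/", target: "http://example.com/a.html", foundIn: "html" },
  { page: "http://example.com/", target: "http://example.com/b.html", foundIn: "script" },
]));
```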

                > > regardless of whether those links are encoded in HTML, CSS, Flash,
                > > Javascript, etc. Sounds like this is not possible,
                >
                > It is possible to a certain point (I don't think decompiling Flash is
                > possible easily). There is software for that already (Web spiders),
                > and you could use its output.

                Have you used any that actually extract links from Javascript? I have
                not, though I know some claim to do so.

                > > particularly for event-driven aspects of the script like rollover image
                > > menus?
                >
                > The rollover effect has to take place on existing markup, so it does not
                > matter here. You will have difficulty recognizing client-side generated
                > menus that do not degrade gracefully, and menus that use pseudo-links
                > like (<a href="javascript:somefunction()">...</a>), though.
                >
                > Which also tells you that unless you are using server-side J(ava)Script,
                > J(ava)Script is not the appropriate language for generating the site map.
                > However, it can help later, e.g. by letting the user expand/collapse it.

                Again, I am not looking to write a solution *in* Javascript
                (necessarily), I am looking to read links *from* Javascript using
                whatever tools are available.


                • Andy Baxter

                  #9
                  Re: Extract links from Javascript (not using Javascript)?

                  chrisspencer02 said:

                  > I am looking for a method to extract the links embedded within the
                  > Javascript in a web page: an ActiveX component, or example code in
                  > C++/Pascal/etc. I am looking for a general solution, not one tailored
                  > to a particular page/script.

                  How general do you want this to be? A completely general solution is
                  probably impossible. I'm not being arsey about this; I'm just interested
                  in the problem.

                  E.g. sometimes people are going to write code which is something like this:

                  var siteName = "http://lofty.dyndns.info";
                  ...
                  var paths = Array("images", "js");
                  ...
                  var filename = "icon.gif";
                  ...
                  var url = siteName + "/" + paths[0] + "/" + filename;

                  So if you come at it from the side of parsing the code to see if there are
                  any valid links embedded in it, you won't get them all without (in
                  the worst case) writing some AI that is on a par with a human javascript
                  programmer...
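The easy end of that spectrum, where every piece is a string literal assigned once, yields to simple constant propagation. A minimal sketch, assuming straight-line code with one `var` declaration per line; `foldStrings` is a hypothetical name, it splits expressions naively on `+` (so a literal containing `+` breaks it), ignores escape sequences, and treats anything it does not understand, including array indexing like `paths[0]`, as unknown (`undefined`).

```javascript
// Tiny constant propagation over `var x = "a" + y + 'b';` declarations.
// Assumptions: straight-line code, one declaration per line, no control
// flow, naive split on "+". Any unknown term makes the whole value undefined.
function foldStrings(source) {
  const env = new Map();
  const declRe = /^\s*var\s+([A-Za-z_$][\w$]*)\s*=\s*(.+?);?\s*$/;
  for (const line of source.split("\n")) {
    const m = declRe.exec(line);
    if (!m) continue;
    const [, name, expr] = m;
    let value = "";
    for (const raw of expr.split("+")) {
      const term = raw.trim();
      const lit = /^(["'])(.*)\1$/.exec(term);
      if (lit) value += lit[2];                               // string literal
      else if (env.get(term) !== undefined) value += env.get(term); // known variable
      else { value = undefined; break; }                      // unknown: give up
    }
    env.set(name, value);
  }
  return env;
}

const env = foldStrings([
  'var siteName = "http://lofty.dyndns.info";',
  'var dir = "images";',
  'var paths = Array("images", "js");',
  'var filename = "icon.gif";',
  'var url = siteName + "/" + dir + "/" + filename;',
  'var viaArray = siteName + "/" + paths[0] + "/" + filename;',
].join("\n"));
console.log(env.get("url"));
console.log(env.get("viaArray"));
```

The `viaArray` case stays unknown: handling it means modelling arrays, then functions, then control flow, which is the slide towards a full interpreter described above.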

                  If you come at it from the side of running the code in a javascript
                  interpreter to see what links it generates, it could be just as bad. E.g.
                  someone might have a puzzle page that links you to another page when
                  you've solved the problem. To get at the url this way, you would have to
                  write some AI that could firstly work out that it /was/ a puzzle page, and
                  then solve the puzzle, which is even worse.
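The run-the-code approach can be sketched with Node.js's built-in `vm` module: execute the script against stub objects that record navigation instead of performing it. This is a minimal sketch under heavy assumptions: the stubs for `location`, `window.open`, and `Image` are hypothetical stand-ins for a tiny slice of the browser object model, only code that actually runs is observed (event handlers and the puzzle page above are not), and `vm` is not a security mechanism, so untrusted scripts need real isolation.

```javascript
const vm = require("vm");

// Run a script against stub browser objects and record every URL it tries
// to navigate to or load. Only code that actually executes is observed.
function recordUrls(script) {
  const seen = [];
  const location = {
    set href(url) { seen.push(String(url)); },
    get href() { return "about:blank"; },
    assign(url) { seen.push(String(url)); },
    replace(url) { seen.push(String(url)); },
  };
  class Image {
    set src(url) { seen.push(String(url)); } // image loads count as links here
  }
  const window = { location, open(url) { seen.push(String(url)); } };
  const sandbox = { window, location, Image };
  // The timeout guards against scripts that never finish.
  vm.runInNewContext(script, sandbox, { timeout: 1000 });
  return seen;
}

const recorded = recordUrls(`
  var base = "http://example.com/";
  var pages = ["a", "b"];
  for (var i = 0; i < pages.length; i++) window.open(base + pages[i] + ".html");
  location.href = base + "next.html";
  var img = new Image();
  img.src = base + "images/" + "icon.gif";
`);
console.log(recorded);
```

Unlike the static approaches, the loop and the concatenations are no obstacle here; the price is that a function which is only defined, such as an event handler, records nothing until something calls it.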

                  In practice it's probably not that bad, so you're probably better off
                  spending some time reading people's javascript, looking for common ways
                  people do stuff (e.g. rollover buttons), and then writing code tailored to
                  those.
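Pattern rules for common idioms are straightforward to write. A minimal sketch with four illustrative rules (the list is nowhere near complete); each rule only captures an argument that is a single quoted literal, and rollover scripts are covered only insofar as they assign a literal to some `.src`.

```javascript
// Pattern rules for common link idioms in script code. Each regex captures
// a single quoted literal in group 1; computed URLs are not matched.
const rules = [
  { name: "window.open", re: /\bwindow\.open\(\s*["']([^"']+)["']/g },
  { name: "location", re: /\blocation(?:\.href)?\s*=\s*["']([^"']+)["']/g },
  { name: "location.replace/assign", re: /\blocation\.(?:replace|assign)\(\s*["']([^"']+)["']/g },
  { name: "src assignment", re: /\.src\s*=\s*["']([^"']+)["']/g },
];

// Results are grouped by rule, not listed in source order.
function matchIdioms(source) {
  const found = [];
  for (const { name, re } of rules) {
    for (const m of source.matchAll(re)) found.push({ rule: name, url: m[1] });
  }
  return found;
}

const idiomSample = `
  function go() { window.open('popup.html', 'w'); }
  if (done) location.href = "thanks.html";
  btn.onmouseover = function () { this.src = 'images/over.gif'; };
  location.replace("moved.html");
`;
console.log(matchIdioms(idiomSample));
```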

                  --


                  remove 'n-u-l-l' to email me. html mail or attachments will go in the spam
                  bin unless notified with [html] or [attachment] in the subject line.


                  • Thomas 'PointedEars' Lahn

                    #10
                    Re: Extract links from Javascript (not using Javascript)?

                    chrisspencer02@yahoo.com wrote:

                    > Thomas 'PointedEars' Lahn wrote:
                    >> chrisspencer02@yahoo.com wrote:
                    >> > Randy Webb wrote:
                    >> >> chrisspencer02@yahoo.com said the following on 5/26/2006 3:03 PM:
                    >> >> > I am looking for a method to extract the links embedded within the
                    >> >> > Javascript in a web page: an ActiveX component, or example code in
                    >> >> > C++/Pascal/etc.
                    >>
                    >> Obviously you are not yet sure what to use, so a newsgroup dedicated to a
                    >> certain (group of) language(s), like this one, is not the place to start.
                    >> Try comp.infosystems.www.authoring.misc, or comp.lang.misc.
                    >
                    > I am not *unsure* what language to use to solve this problem; actually
                    > I don't care. My question is about algorithms for parsing and
                    > interpreting Javascript.

                    Interpretation of "Javascript" would first include the recognition that
                    there are different implementations of ECMAScript: JavaScript, JScript,
                    Opera-ECMAScript, and KJS, to name just the most widely distributed ones.

                    Whether script code executes or not, i.e. whether there is a "link" or
                    not, would depend entirely on how tightly it is coded to a specific
                    implementation, let alone to a specific execution environment or object
                    model.

                    Second, if you stick to strictly ECMAScript-conforming code, as
                    should be expected of an interoperable Web site that is to be parsed,
                    the matter of interpretation includes how you decide what is a
                    "link" and what is not. Because

                    var img = new Image();
                    img.src = "foo";

                    could be considered a link (to an image resource named `foo').

                    var img = new Object();
                    img.src = "foo";

                    could not.
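The difference between those two snippets is type information: a static tool has to know what `img` holds before `img.src` means anything. A minimal sketch of the crudest form of that, remembering which names were assigned `new Image()` and reporting `.src` assignments only for them. Treating `Image` objects as the only link-bearing case is an assumption for illustration, and the regex-based matching ignores scope, control flow, and aliasing entirely.

```javascript
// Report `x.src = "..."` only when x was most recently assigned `new Image(...)`.
// A later `x = new Something(...)` with another constructor untracks x.
function imageSources(source) {
  const images = new Set();
  const urls = [];
  const stmtRe =
    /\b(?:var\s+)?([A-Za-z_$][\w$]*)\s*=\s*new\s+(\w+)\s*\(|\b([A-Za-z_$][\w$]*)\.src\s*=\s*["']([^"']*)["']/g;
  for (const m of source.matchAll(stmtRe)) {
    if (m[1] !== undefined) {
      // `name = new Ctor(`: track or untrack the name depending on Ctor.
      if (m[2] === "Image") images.add(m[1]); else images.delete(m[1]);
    } else if (images.has(m[3])) {
      urls.push(m[4]); // `name.src = "literal"` on a tracked Image
    }
  }
  return urls;
}

const code = `
  var img = new Image();
  img.src = "foo";
  var obj = new Object();
  obj.src = "bar";
`;
console.log(imageSources(code));
```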

                    As for recognizing links and pseudo-links such as

                    function updateFrame(o)
                    {
                      var f = window.parent.frames['foo'];
                      if (f && f.location)
                      {
                        f.location.href = "bar/" + o.getAttribute("href");
                        return false;
                      }

                      return true;
                    }

                    <a href="blurb.html" onclick="return updateFrame(this);">...</a>

                    or the ill-conceived

                    <a href="#" onclick="location = foo + 'bar'">...</a>

                    <a href="javascript:someFunction()">...</a>

                    or even something dynamically scripted like

                    <script type="text/javascript">
                      var a = document.createElement("a");
                      if (a && isMethod(a.appendChild, a.addEventListener,
                                        document.createTextNode, document.body.appendChild))
                      {
                        a.appendChild(document.createTextNode("foo"));
                        a.addEventListener('click',
                          function(e)
                          {
                            if (!e) e = window.event;
                            if (e)
                            {
                              (dhtml.getElem("id", "bar") || {onclick: function() {}}).onclick();
                              if (isMethod(e.stopPropagation)) e.stopPropagation();
                              if (isMethod(e.preventDefault)) e.preventDefault();
                              if (typeof e.cancelBubble != "undefined") e.cancelBubble = true;
                            }
                          },
                          false);

                        document.body.appendChild(a);
                      }
                    </script>

                    how would you even /know/ that there is a "link" and where it points to
                    without implementing the script engine along with its execution environment
                    itself? I think there are far too many variables here to make even an
                    educated guess.
                    >> > regardless of whether those links are encoded in HTML, CSS, Flash,
                    >> > Javascript, etc. Sounds like this is not possible,
                    >>
                    >> It is possible to a certain point (I don't think decompiling Flash is
                    >> possible easily). There is software for that already (Web spiders),
                    >> and you could use its output.
                    >
                    > Have you used any that actually extract links from Javascript?

                    No. Probably for good reason.

                    > Again, I am not looking to write a solution *in* Javascript
                    > (necessarily), I am looking to read links *from* Javascript
                    > using whatever tools are available.

                    I don't think this is very much on topic here.


                    PointedEars
