Parsing JavaScript from HTML file

  • baldwasagar
    New Member
    • Mar 2008
    • 4

    Parsing JavaScript from HTML file

    I want to parse an HTML file in Java that also contains JavaScript, and I want to fetch the contents of the JavaScript tag as well. The tag is SCRIPT. Please help with suggestions / solutions.

    I have tried using the Java HTMLEditorKit API, but it does not work for the SCRIPT tag.

    Regards,
    Sagar
  • JosAH
    Recognized Expert MVP
    • Mar 2007
    • 11453

    #2
    I haven't tried it, but define an HTML.UnknownTag with "SCRIPT" as the
    identifier. Then try to get an HTMLDocument.Iterator from the HTMLDocument
    given this tag. Given this iterator you should be able to get the text between tags
    of the SCRIPT type.

    kind regards,

    Jos


    • blitzkreig
      New Member
      • Mar 2008
      • 7

      #3
      You can use the Jericho HTML Parser.
      http://jerichohtml.sourceforge.net/doc/index.html

      The author says that ASP, JSP, PSP, PHP and Mason server tags are explicitly recognised by the parser.


      • baldwasagar
        New Member
        • Mar 2008
        • 4

        #4
        I have tried using it, Jos, but it does not work.


        <SCRIPT> document.write ( "<A onclick=\"OpenMSDigestWin()\" target=\"MSDigestWin\" HREF = \"" + msdigestLink + "dna_reading_frame=1&access_method=Accession+Number&hide_protein_sequence=2&open_reading_frame=1&coverage_map=0+138+13+32+14+38+6+25+14+5+12+77+12+34+13+4+14+97+9+23+27&search_cycle=1&accession_num=P02769\"> 22%</A>" );</SCRIPT>


        I want to parse this code and get the text between the anchor tags (22% in this example). I am trying HTMLEditorKit.ParserCallback, but it does not work for the piece of code above.

        Please help me with the solution / suggestions.

        regards,
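
For the single-line SCRIPT shown above, the anchor text can be pulled out of the script body with a regular expression. The following is only a minimal sketch: the class name AnchorTextSketch and the method firstAnchorText are made up for illustration, and it assumes the opening <A ...> tag contains no literal ">" inside its attribute values.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library mentioned in this thread.
public class AnchorTextSketch {

    // Matches <A ...>text</A>, ignoring case; DOTALL lets the text span lines.
    // Assumption: no '>' character appears inside the opening tag's attributes.
    private static final Pattern ANCHOR =
            Pattern.compile("<A\\b[^>]*>(.*?)</A>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the trimmed text of the first anchor in scriptBody, or null if none is found. */
    public static String firstAnchorText(String scriptBody) {
        Matcher m = ANCHOR.matcher(scriptBody);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        // Shortened version of the document.write call from the post above.
        String script = "document.write ( \"<A onclick=\\\"OpenMSDigestWin()\\\" HREF = \\\"\" + msdigestLink"
                + " + \"accession_num=P02769\\\"> 22%</A>\" );";
        System.out.println(firstAnchorText(script)); // prints 22%
    }
}
```

Because the pattern works on plain text, it does not care that the anchor lives inside a JavaScript string literal rather than in the HTML itself.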


        • baldwasagar
          New Member
          • Mar 2008
          • 4

          #5
          I have built my own method which parses the SCRIPT tag. Using HTMLEditorKit.ParserCallback, I diverted the control flow whenever a SCRIPT tag occurred.

          This method should handle any <SCRIPT> tag in an HTML file. I also wanted the <A> tag that is placed inside the SCRIPT tag, which I am now able to get as well. Thanks to everyone who contributed to my queries.

          Below is the code.

          BufferedReader br;

          // True when the text left over from the previous line already contains a "<SCRIPT" tag.
          boolean checkLineForScript = false;
          String tempGlobalString = "";


          String parsedScriptTag = "", tempString = "", fullScriptTag = "",
                  scriptStart = "", scriptEnd = "", str = "", actualDataToken = "";

          int getScriptTagStartPosition = 0, getScripttagEndPosition = -1,
                  tempPosition = -1, lastindex = 0, anchorTagEndPosition = -1,
                  anchorTagStartPosition = -1;

          try
          {
              if(checkLineForScript == true)
              {
                  getScriptTagStartPosition = tempGlobalString.indexOf("<SCRIPT");
                  lastindex = tempGlobalString.length();
                  getScripttagEndPosition = tempGlobalString.indexOf("</SCRIPT>");

                  // The closing tag is not in the leftover text (-1), so keep reading lines.
                  if(getScriptTagStartPosition > getScripttagEndPosition)
                  {
                      fullScriptTag = tempGlobalString.substring(getScriptTagStartPosition, lastindex);
                      in: while((scriptEnd = br.readLine()) != null)
                      {
                          getScripttagEndPosition = scriptEnd.indexOf("</SCRIPT>");
                          if(getScripttagEndPosition == -1)
                          {
                              fullScriptTag = fullScriptTag + scriptEnd;
                              continue in;
                          }
                          else
                          {
                              tempString = scriptEnd.substring(0, getScripttagEndPosition);
                              // Append the last fragment (the posted code prepended it, which put it out of order).
                              fullScriptTag = fullScriptTag + tempString;
                              anchorTagEndPosition = fullScriptTag.indexOf("</A>");
                              if(anchorTagEndPosition > -1)
                              {
                                  // The anchor text sits between the first "> and </A>.
                                  anchorTagStartPosition = fullScriptTag.indexOf("\">");
                                  actualDataToken = fullScriptTag.substring(anchorTagStartPosition + 2,
                                          anchorTagEndPosition);
                                  //System.out.println(actualDataToken);
                              }
                              System.out.println(fullScriptTag);
                              break in;
                          }
                      }
                  }
                  // Opening and closing tags are both in the leftover text.
                  else if(getScriptTagStartPosition < getScripttagEndPosition)
                  {
                      parsedScriptTag = tempGlobalString.substring(getScriptTagStartPosition, getScripttagEndPosition);
                      anchorTagEndPosition = parsedScriptTag.indexOf("</A>");
                      if(anchorTagEndPosition > -1)
                      {
                          anchorTagStartPosition = parsedScriptTag.indexOf("\">");
                          actualDataToken = parsedScriptTag.substring(anchorTagStartPosition + 2,
                                  anchorTagEndPosition);
                          //System.out.println(actualDataToken);
                      }
                      System.out.println(parsedScriptTag);
                  }
              }
              else
              {
                  // Scan forward line by line until a complete SCRIPT block has been read.
                  out: while ((scriptStart = br.readLine()) != null)
                  {
                      getScriptTagStartPosition = scriptStart.indexOf("<SCRIPT");
                      lastindex = scriptStart.length();
                      getScripttagEndPosition = scriptStart.indexOf("</SCRIPT>");

                      // Block opens on this line but closes on a later one.
                      if(getScriptTagStartPosition > getScripttagEndPosition)
                      {
                          fullScriptTag = scriptStart.substring(getScriptTagStartPosition, lastindex);
                          in: while((scriptStart = br.readLine()) != null)
                          {
                              getScripttagEndPosition = scriptStart.indexOf("</SCRIPT>");
                              if(getScripttagEndPosition == -1)
                              {
                                  fullScriptTag = fullScriptTag + scriptStart;
                                  continue in;
                              }
                              else
                              {
                                  tempString = scriptStart.substring(0, getScripttagEndPosition);
                                  // Append the last fragment (the posted code prepended it, which put it out of order).
                                  fullScriptTag = fullScriptTag + tempString;
                                  anchorTagEndPosition = fullScriptTag.indexOf("</A>");
                                  if(anchorTagEndPosition > -1)
                                  {
                                      anchorTagStartPosition = fullScriptTag.indexOf("\">");
                                      actualDataToken = fullScriptTag.substring(anchorTagStartPosition + 2,
                                              anchorTagEndPosition);
                                      //System.out.println(actualDataToken);
                                  }
                                  //System.out.println(fullScriptTag);
                                  break out;
                              }
                          }
                      }
                      // Block opens and closes on the same line.
                      else if(getScriptTagStartPosition < getScripttagEndPosition)
                      {
                          parsedScriptTag = scriptStart.substring(getScriptTagStartPosition, getScripttagEndPosition);
                          anchorTagEndPosition = parsedScriptTag.indexOf("</A>");
                          if(anchorTagEndPosition > -1)
                          {
                              anchorTagStartPosition = parsedScriptTag.indexOf("\">");
                              actualDataToken = parsedScriptTag.substring(anchorTagStartPosition + 2,
                                      anchorTagEndPosition);
                              //System.out.println(actualDataToken);
                          }
                          //System.out.println(parsedScriptTag);
                          break out;
                      }
                  }

                  // scriptStart is null when the file ended before a closing tag was found.
                  if(scriptStart != null)
                  {
                      // Check whether another SCRIPT block starts after the closing tag on the same line.
                      lastindex = scriptStart.length();
                      str = scriptStart.substring(getScripttagEndPosition, lastindex);
                      tempPosition = str.indexOf("<SCRIPT");
                      lastindex = str.length();
                  }
              }
              if(tempPosition > 0)
              {
                  checkLineForScript = true;
                  tempGlobalString = str.substring(tempPosition, lastindex);
              }
              else
                  checkLineForScript = false;
          }
          catch (IOException e)
          {
              // br.readLine() can fail while reading the HTML file.
              e.printStackTrace();
          }
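
The line-by-line scan above can also be written as a small state machine that remembers whether it is currently inside a SCRIPT block. This is a hedged sketch rather than a replacement for the posted method: ScriptBlockSketch, scriptBodies and indexOfIgnoreCase are names invented here, and the sketch assumes that an opening <SCRIPT ...> tag is not itself split across lines and that "</SCRIPT>" never appears inside a JavaScript string.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class for illustration only.
public class ScriptBlockSketch {

    /** Case-insensitive indexOf; returns -1 when needle does not occur at or after from. */
    private static int indexOfIgnoreCase(String s, String needle, int from) {
        for (int i = Math.max(from, 0); i + needle.length() <= s.length(); i++) {
            if (s.regionMatches(true, i, needle, 0, needle.length())) {
                return i;
            }
        }
        return -1;
    }

    /** Collects the body of every SCRIPT block, including blocks that span several lines. */
    public static List<String> scriptBodies(BufferedReader reader) throws IOException {
        final String closeTag = "</script>";
        List<String> bodies = new ArrayList<>();
        StringBuilder current = null; // non-null while inside a SCRIPT block
        String line;
        while ((line = reader.readLine()) != null) {
            int pos = 0;
            while (pos <= line.length()) {
                if (current == null) {
                    int open = indexOfIgnoreCase(line, "<script", pos);
                    if (open < 0) break;              // no further block starts on this line
                    int tagEnd = line.indexOf('>', open);
                    if (tagEnd < 0) break;            // assumption: opening tag fits on one line
                    current = new StringBuilder();
                    pos = tagEnd + 1;
                } else {
                    int close = indexOfIgnoreCase(line, closeTag, pos);
                    if (close < 0) {                  // block continues on the next line
                        current.append(line, pos, line.length()).append('\n');
                        break;
                    }
                    current.append(line, pos, close);
                    bodies.add(current.toString());
                    current = null;
                    pos = close + closeTag.length();  // keep scanning the rest of the line
                }
            }
        }
        return bodies;
    }

    public static void main(String[] args) throws IOException {
        String html = "<html>\n<SCRIPT> a();\nb(); </SCRIPT><p>x</p><script>c()</script>\n";
        for (String body : scriptBodies(new BufferedReader(new StringReader(html)))) {
            System.out.println("[" + body + "]");
        }
    }
}
```

Keeping the "inside a block" state in one variable removes the need for the separate leftover-text branch, since a second SCRIPT tag on the same line is picked up by the inner loop.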
