Why are the loops taking so much time to execute in the following program?

**chaarmann** · Jan 25 '17, 12:58 PM

Your search() method scans the whole string and does not stop after finding the first match:

Code:

while(m.find()){count++;}

But the way you call it, finding the first match would be sufficient:

Code:

if(search("</script>",w)>0) ...

So why not write a method searchFirst() that has no while-loop and simply returns after finding the first occurrence?
You do not need regular expressions for that; just use searchString.indexOf(searchKey).
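A minimal sketch of that suggestion. The class name, the boolean return type, and the sample string are assumptions made for illustration; the original search() is not shown in this thread. Note that indexOf is case-sensitive, so this only fits if an exact-case match is what you need.

```java
// Hypothetical helper: names and signature are illustrative, not from the original program.
class SearchFirstSketch {
    /** Returns true if searchKey occurs at least once in searchString. */
    static boolean searchFirst(String searchKey, String searchString) {
        // indexOf returns the position of the first occurrence, or -1 if there is none,
        // so nothing after the first match has to be inspected.
        return searchString.indexOf(searchKey) >= 0;
    }

    public static void main(String[] args) {
        String w = "var x = 1;</script><p>text"; // made-up sample input
        if (searchFirst("</script>", w)) {
            System.out.println("closing script tag found");
        }
    }
}
```

The call site then becomes `if (searchFirst("</script>", w)) ...` instead of comparing a count with 0.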

Second, you split the string into parts:

Code:

h_text = h.split(">");

then assemble it into a new string:

Code:

filtered_text += w; filtered_text += "\n";

and split it again:

Code:

html_text = filtered_text.split("\n");

So why not put the split parts (variable w) directly into the html_text array,
for example html_text[i] = w?
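One way this could look, as a sketch only. The original filter condition is not shown here, so the `contains("<script")` check is a placeholder, and the class and method names are hypothetical. A list is used because the number of parts that survive the filter is not known in advance.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: collect the kept parts directly instead of
// concatenating them into one string and splitting that string again.
class DirectCollectSketch {
    static String[] keepParts(String h) {
        List<String> kept = new ArrayList<>();
        for (String w : h.split(">")) {
            // Placeholder condition; substitute the real filter logic here.
            if (!w.contains("<script")) {
                kept.add(w);
            }
        }
        return kept.toArray(new String[0]);
    }
}
```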

This is also a performance killer:

Code:

h = "<" + h;

This copies the whole string in memory again. Java strings are immutable, so "<" cannot be put in front of the existing string without copying all of its characters into a new string.

These are the reasons why it is so slow. If you want high performance, do it this way:
Use a regular expression on your string "html" that deletes all script tags and everything inside them.
Then split the result into single lines.
Like so:

Code:

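// (?is): case-insensitive, and '.' also matches line breaks, so multi-line scripts are removed too
html = html.replaceAll("(?is)<script\\b[^>]*>.*?</script>", "");
html_text = html.split("\n");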

Now measure the performance of these two lines, which replace your whole code logic. It should be much faster!
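A rough way to take that measurement, as a sketch. The class name, method name, and sample HTML are invented for illustration. The pattern uses the (?is) flags so that it also matches script tags with attributes and scripts that span several lines. A single System.nanoTime() measurement is only a coarse indication; JIT warm-up can distort timings on small inputs.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: strip script blocks, split into lines, and time the operation.
class StripScriptsSketch {
    // Compiled once and reused. (?i) = case-insensitive, (?s) = '.' also matches line breaks.
    private static final Pattern SCRIPT =
            Pattern.compile("(?is)<script\\b[^>]*>.*?</script>");

    static String[] stripAndSplit(String html) {
        return SCRIPT.matcher(html).replaceAll("").split("\n");
    }

    public static void main(String[] args) {
        String html = "<p>one</p>\n<script>\nvar x = 1;\n</script>\n<p>two</p>"; // made-up sample
        long start = System.nanoTime();
        String[] lines = stripAndSplit(html);
        long elapsedMicros = (System.nanoTime() - start) / 1_000;
        System.out.println(lines.length + " lines in " + elapsedMicros + " microseconds");
    }
}
```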

**rspvsanjay** · Feb 14 '17, 07:01 AM

OK, thank you.

How can I write this code using the replaceAll method?

String extractText(String s) throws IOException
{
    String html = fj.toHtmlString(s); // extracted html source code from wikipedia
    String filtered_text = " ";
    System.out.println("extracted \n\n");
    String[] html_text = html.split("\n");
    long start = System.currentTimeMillis();

    for (String h : html_text)
    {   // System.out.println("ky4" + h);
        if (Pattern.compile("</strong>", Pattern.CASE_INSENSITIVE + Pattern.LITERAL).matcher(h).find())
        {

        }
        else if (Pattern.compile("<strong", Pattern.CASE_INSENSITIVE + Pattern.LITERAL).matcher(h).find())
        {

        }
        else
        {
            filtered_text += h;
            filtered_text += "\n";
        }
    }
    long end = System.currentTimeMillis();
    System.out.println("loop end in " + (end - start) / 1000 + " seconds" + " or " + (end - start) + " miliseconds");
    // System.out.println(++i2 + " th loop end in " + (end - start) / 1000 + " seconds");
    return filtered_text;
}

**chaarmann** · Feb 16 '17, 12:35 PM

1.) To improve performance, compile the pattern outside the for-loop, that is, only once! Then apply it many times (via matcher) inside the for-loop.
2.) If you have a pattern A and a pattern B, do not write two if-statements that each search the whole string.
Just search the string once with the combined pattern "A|B". This goes through the string only once.
3.) Do not split the text at newlines first and then put the filtered pieces back together. Just apply the pattern once to the whole original string. This way it could be up to 10 times faster.
4.) Using StringBuilder instead of "+" to concatenate many strings is much faster.
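The four tips above can be sketched as follows. This is an illustration under assumptions, not the poster's final code: the class and method names are made up, the input is assumed to use "\n" line endings, and unlike the posted method the result does not start with a leading space. filterLines applies tips 1, 2 and 4 to the existing loop; filterWhole applies tip 3 by removing every line that contains one of the tags with a single replaceAll.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of the four tips; names are illustrative.
class FilterStrongSketch {
    // Tips 1 and 2: one combined pattern "A|B", compiled once.
    private static final Pattern STRONG =
            Pattern.compile("</strong>|<strong", Pattern.CASE_INSENSITIVE);

    // Tip 3: (?m) lets ^ match at each line start; '.' does not cross line breaks,
    // so each match covers exactly one line that contains a tag, plus its "\n".
    private static final Pattern STRONG_LINE =
            Pattern.compile("(?im)^.*(?:</strong>|<strong).*(?:\\n|$)");

    /** Loop variant: keeps every line that contains neither tag. */
    static String filterLines(String html) {
        StringBuilder filtered = new StringBuilder(); // tip 4: no repeated string copying
        for (String line : html.split("\n")) {
            if (!STRONG.matcher(line).find()) {
                filtered.append(line).append('\n');
            }
        }
        return filtered.toString();
    }

    /** Whole-string variant: deletes matching lines in a single pass. */
    static String filterWhole(String html) {
        return STRONG_LINE.matcher(html).replaceAll("");
    }
}
```

The two variants can differ at the very end of the text: filterLines always terminates the last kept line with "\n", while filterWhole leaves the end of the input as it was.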

**rspvsanjay** · Feb 16 '17, 01:27 PM

My problem is resolved, thank you.
