Help me remove the junk values from HTML files in C# 2008

  • Smish
    New Member
    • Jan 2007
    • 51

    Hi, I hope at least this question will be answered.
    I am using C# 2008 and reading an HTML file using:

    StreamReader sr = new StreamReader(path, Encoding.Default);

    string input = sr.Read();

    StreamWriter sw = new StreamWriter(path);

    sw.Write(input);
    sw.Close();

    The HTML file is fine initially, but after it is written back it contains junk characters that are visible in the browser.

    Thanks.
  • Sick0Fant
    New Member
    • Feb 2008
    • 121

    #2
    rdr.Read() reads a single character. You want rdr.ReadToEnd().
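To illustrate the difference, here is a minimal sketch. It uses a StringReader over an in-memory string (an assumption, made so the example needs no file on disk); StreamReader inherits the same Read() and ReadToEnd() members from TextReader. The class and method names are hypothetical.

```csharp
using System;
using System.IO;

static class ReadVersusReadToEnd
{
    // Returns only the first character, which is all a single Read() call delivers.
    public static char FirstChar(string text)
    {
        using (TextReader reader = new StringReader(text))
        {
            // Read() returns the next character as an int, or -1 at end of input.
            int next = reader.Read();
            if (next < 0) throw new InvalidOperationException("empty input");
            return (char)next;
        }
    }

    // Returns the whole text, which is what ReadToEnd() delivers.
    public static string Everything(string text)
    {
        using (TextReader reader = new StringReader(text))
        {
            return reader.ReadToEnd();
        }
    }

    static void Main()
    {
        string html = "<html><body>Apple</body></html>";
        Console.WriteLine(FirstChar(html));
        Console.WriteLine(Everything(html));
    }
}
```

Because Read() returns an int, assigning its result directly to a string does not compile, so the posted snippet must have differed from the code that actually ran.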

    • Smish
      New Member
      • Jan 2007
      • 51

      #3
      I read it using sr.ReadToEnd() too,
      but the problem is still the same. :(

      • Sick0Fant
        New Member
        • Feb 2008
        • 121

        #4
        Okay, what are these "junk" values of which you speak?

        • Smish
          New Member
          • Jan 2007
          • 51

          #5
          I have pasted the code snippet I am using here:

          StreamReader sr = File.OpenText("G:\\HtmlReports\\Original.htm");
          string input = sr.ReadToEnd();
          input = input.Trim();
          StreamWriter strWriter = new StreamWriter("G:\\HtmlReports\\ReplacedHtm.htm");
          strWriter.Write(input);

          The junk values I can see in the browser are "ï¿½ï¿½ï¿½ï¿½" or some other form.

          The HTML file has spaces in it to lay out the document, for example "(a)<span style='mso-spacerun:yes'> </span>Apple</span>". When I read this file through StreamReader, the space between "'mso-spacerun:yes'>" and "</span>" is turned into these junk values.

          So do you know any way to remove them?

          Thanks

          • Sick0Fant
            New Member
            • Feb 2008
            • 121

            #6
            Well, the whitespace didn't matter in my test... I always was able to get the html document without the junk.

            If all you're doing is moving one file to another location, .NET has a method to do that which is much more efficient than writing the file yourself.
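
The reply does not name the method; presumably it is File.Copy (or File.Move) in System.IO, which is an assumption. A minimal sketch follows, with temporary files standing in for the poster's paths and a hypothetical CopyUnchanged helper:

```csharp
using System;
using System.IO;

static class CopyHtmlFile
{
    // Copies the file byte for byte. Nothing is decoded, so the text
    // encoding of the HTML cannot be damaged along the way.
    public static void CopyUnchanged(string sourcePath, string targetPath)
    {
        File.Copy(sourcePath, targetPath, true); // true = overwrite an existing target
    }

    static void Main()
    {
        // Hypothetical temp files; the bytes are "A", a 0xA0 byte, and "B".
        string source = Path.GetTempFileName();
        string target = Path.GetTempFileName();
        File.WriteAllBytes(source, new byte[] { 0x41, 0xA0, 0x42 });

        CopyUnchanged(source, target);
        Console.WriteLine(BitConverter.ToString(File.ReadAllBytes(target))); // prints 41-A0-42

        File.Delete(source);
        File.Delete(target);
    }
}
```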

            But anyway, I would suspect that those spaces are actually characters that your editor can't display (though in my experience they usually display at least *something*, such as a smiley face). The quickest way to test this theory is to go back to your editor, delete all the "whitespace" where you're getting the junk, and replace it with spaces or tabs. If you're still getting the junk, watch the input variable after you've read the document to see whether it is being read correctly.

            Come back if you still get junk.
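
One plausible explanation for the specific junk reported (an assumption, since the thread never shows the file's raw bytes): the file was saved in a single-byte code page such as Windows-1252, and the "space" inside the mso-spacerun span is a non-breaking space, byte 0xA0. File.OpenText decodes as UTF-8, where a lone 0xA0 is invalid, so it becomes the replacement character U+FFFD. StreamWriter then writes U+FFFD as the UTF-8 bytes EF BF BD, and a browser that treats the page as single-byte text renders those three bytes as "ï¿½". The sketch below reproduces this in memory, using ISO-8859-1 as a stand-in for the single-byte code page; the names are hypothetical.

```csharp
using System;
using System.Text;

static class JunkDemo
{
    // Decodes single-byte text with the wrong decoder (UTF-8), re-encodes it
    // as UTF-8, and returns what a single-byte viewer would then display.
    public static string RoundTripThroughUtf8(byte[] singleByteText)
    {
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

        // Wrong decoder: the invalid byte 0xA0 turns into U+FFFD.
        string decoded = Encoding.UTF8.GetString(singleByteText);

        // Writing the string back as UTF-8 turns U+FFFD into EF BF BD.
        byte[] rewritten = Encoding.UTF8.GetBytes(decoded);

        // A viewer that assumes a single-byte code page shows one glyph per byte.
        return latin1.GetString(rewritten);
    }

    static void Main()
    {
        byte[] original = { 0x61, 0xA0, 0x62 }; // "a", non-breaking space, "b"
        string shown = RoundTripThroughUtf8(original);
        foreach (char c in shown)
        {
            Console.Write("U+{0:X4} ", (int)c);
        }
        Console.WriteLine();
    }
}
```

If this is what is happening, the data is already lost once U+FFFD has been written, so the fix belongs at the read step rather than in a later clean-up pass.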

            • Smish
              New Member
              • Jan 2007
              • 51

              #7
              If I remove the whitespace then it works fine, but the files coming from users could be in any format. So how do I remove these whitespace characters?
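
A hedged sketch of one way to approach this, assuming the incoming files are in a single-byte code page and the troublesome characters are non-breaking spaces (U+00A0). The thread does not establish the actual encoding, so the caller passes it in; on .NET Framework, Encoding.Default or Encoding.GetEncoding(1252) would be typical choices, while the Main method here uses ISO-8859-1 purely for illustration. HtmlCleaner, NormalizeSpaces, and CleanHtml are hypothetical names.

```csharp
using System;
using System.IO;
using System.Text;

static class HtmlCleaner
{
    // Replaces every non-breaking space (U+00A0) with an ordinary space.
    public static string NormalizeSpaces(string html)
    {
        return html.Replace('\u00A0', ' ');
    }

    // Reads and writes with the SAME encoding, so no character is decoded
    // one way and encoded another.
    public static void CleanHtml(string sourcePath, string targetPath, Encoding encoding)
    {
        string input;
        using (StreamReader reader = new StreamReader(sourcePath, encoding))
        {
            input = reader.ReadToEnd();
        }

        // The using block flushes and closes the writer, so all text reaches disk.
        using (StreamWriter writer = new StreamWriter(targetPath, false, encoding))
        {
            writer.Write(NormalizeSpaces(input));
        }
    }

    static void Main(string[] args)
    {
        if (args.Length != 2)
        {
            Console.WriteLine("usage: HtmlCleaner <source.htm> <target.htm>");
            return;
        }
        CleanHtml(args[0], args[1], Encoding.GetEncoding("iso-8859-1"));
    }
}
```

If files can truly arrive in any encoding, no single decoder is right for all of them; checking for a byte order mark or the charset declared in the HTML would be needed before choosing the encoding argument.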
