ifstream file gives weird characters for .html extension

  • eagerlearner
    New Member
    • Jul 2007
    • 29

    When I open a .html file, a few weird characters are printed before the file's actual content. But when the same content is saved with a .txt extension, there is no problem. Does anyone know why? Thanks.

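    Code:
    #include <iostream>
    #include <string>
    #include <fstream>
    using namespace std;
    
    int main()
    {
    	string buff, temp;
    	ifstream file;
    	file.open("htmlfile.html");
    	if(!file.is_open())
    	{
    		cerr << "could not open htmlfile.html\n";
    		return 1;
    	}
    	cout << "opened\n";
    	// getline() discards the '\n', so put it back to keep the line breaks
    	while(getline(file,temp))
    		buff += temp + '\n';
    	cout << buff;
    	return 0;
    }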
  • JosAH
    Recognized Expert MVP
    • Mar 2007
    • 11453

    #2
    Originally posted by eagerlearner
    When I open a .html file, a few weird characters are printed before the file's actual content. But when the same content is saved with a .txt extension, there is no problem. Does anyone know why? Thanks.

    Are you sure the two files are identical? What would happen if you got rid of
    the other .txt file, renamed your htmlfile.html to htmlfile.txt, did the same in your
    program, and tried again? As far as I know, ifstream::open() couldn't care less
    about the file extension.
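One way to answer the "are the two files identical?" question without trusting what an editor displays is to compare the files byte for byte. The sketch below reads each file in binary mode and compares the raw bytes; the helper names `readBytes` and `sameBytes` are hypothetical, not from the thread. If a .html and a .txt file with the same visible text compare as different, the difference lies in the bytes themselves (for example, a different encoding), not in how `ifstream` treats the extension.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: read a whole file in binary mode so that no byte
// is translated or dropped. Returns an empty string if the file cannot
// be opened.
std::string readBytes(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Hypothetical helper: true if both files contain exactly the same bytes,
// regardless of their names or extensions.
bool sameBytes(const std::string& pathA, const std::string& pathB)
{
    return readBytes(pathA) == readBytes(pathB);
}
```

For example, `sameBytes("htmlfile.html", "htmlfile.txt")` returning false would point at the file contents rather than the file name.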

    kind regards,

    Jos


    • eagerlearner
      New Member
      • Jul 2007
      • 29

      #3
      Originally posted by JosAH
      Are you sure the two files are identical? What would happen if you got rid of
      the other .txt file, renamed your htmlfile.html to htmlfile.txt, did the same in your
      program, and tried again? As far as I know, ifstream::open() couldn't care less
      about the file extension.

      kind regards,

      Jos
      Thanks for the hint.
      I deleted both the .html and .txt files, went to my project folder, and recreated both files with the following steps; now it works fine with my original code. FYI, I am using Windows Vista.
      - Right-click > "New" > "Text Document"
      - Name it "txt.txt"
      - Open it and type "abc"
      - Save the file
      - In the same text file, change the text to "def"
      - Choose "Save As..."
      - Type the file name "html.html"

      But I am still curious why it printed the weird characters before.


      • eagerlearner
        New Member
        • Jul 2007
        • 29

        #4
        lol, actually the real problem is the encoding:

        1) Open the file in Notepad, choose "Save As...", pick ANSI encoding,
        and run the code: no weird characters.

        2) Open the file in Notepad, choose "Save As...", pick UTF-8 encoding,
        and run the code: the weird characters come back.

        It only works with ANSI, but I never realized that I had saved the file as UTF-8.
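A file saved as UTF-8 this way starts with the three bytes EF BB BF (the UTF-8 byte order mark), and `ifstream` passes them through unchanged, so they get printed along with the text. A minimal sketch, assuming you want to keep reading such files with `getline()` rather than re-saving them as ANSI: read the first three bytes, keep them consumed only if they are exactly the BOM, and otherwise rewind to the start. The names `skipUtf8Bom` and `readTextFile` are hypothetical helpers, and the sketch assumes the stream is positioned at the beginning of the file.

```cpp
#include <fstream>
#include <istream>
#include <string>

// Hypothetical helper: if the stream starts with the UTF-8 byte order mark
// (EF BB BF), leave those three bytes consumed; otherwise rewind to the
// start so no content is lost. Assumes the stream is at position 0.
void skipUtf8Bom(std::istream& in)
{
    char bom[3] = {};
    in.read(bom, 3);
    if (in.gcount() == 3 &&
        static_cast<unsigned char>(bom[0]) == 0xEF &&
        static_cast<unsigned char>(bom[1]) == 0xBB &&
        static_cast<unsigned char>(bom[2]) == 0xBF)
        return;          // BOM found and skipped
    in.clear();          // a short read sets eofbit/failbit; reset them
    in.seekg(0);         // no BOM: start again from the first byte
}

// Hypothetical helper: read a text file line by line, keeping the line
// breaks that getline() strips, with any leading UTF-8 BOM removed.
std::string readTextFile(const std::string& path)
{
    std::ifstream file(path);
    std::string buff, line;
    if (!file.is_open())
        return buff;
    skipUtf8Bom(file);
    while (std::getline(file, line))
        buff += line + '\n';
    return buff;
}
```

In the original program this amounts to calling `skipUtf8Bom(file);` right after the file is opened, before the `getline()` loop. The remaining text is still UTF-8, so any non-ASCII characters (accented letters, for instance) only display correctly on a console that expects UTF-8.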


        • JosAH
          Recognized Expert MVP
          • Mar 2007
          • 11453

          #5
          Originally posted by eagerlearner
          Thanks for the hint.
          I deleted both the .html and .txt files, went to my project folder, and recreated both files with the following steps; now it works fine with my original code. FYI, I am using Windows Vista.

          But I am still curious why it printed the weird characters before.
          Well, we'll never know, will we? You've removed the cause from your hard disk;
          you threw it into the abyss of oblivion; we've lost it for posterity; the cause is
          no more; it has shuffled off its mortal coil; it's gone to meet its maker; it has joined the
          choir invisible; it's pushing up daisies; it's an ex-cause; it's rotting away where it
          is; the valley of tears doesn't matter to it anymore; it's growing funny wings.

          kind regards,

          Jos ;-)


          • eagerlearner
            New Member
            • Jul 2007
            • 29

            #6
            Thanks, you're a good poet, but I don't really understand. lol. I found more information on Wikipedia, in case anyone would like to know what those three weird characters are (I only just found out myself):

            Windows
            Although not part of the standard, many Windows programs (including Windows Notepad) use the byte sequence EF BB BF at the beginning of a file to indicate that the file is encoded using UTF-8. This is the Byte Order Mark U+FEFF encoded in UTF-8, which appears as the ISO-8859-1 characters "ï»¿" in most text editors and web browsers not prepared to handle UTF-8.
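To check this on your own files, you can dump the first few bytes in hex and show what a viewer that assumes ISO-8859-1 would make of them. In ISO-8859-1, byte 0xEF is "ï", 0xBB is "»" and 0xBF is "¿", which is exactly where the three characters come from. The helpers `hexPrefix` and `asLatin1` below are hypothetical, written only for illustration; `asLatin1` re-encodes each byte as UTF-8 so the result prints correctly on a UTF-8 terminal.

```cpp
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <string>

// Hypothetical helper: return the first `count` bytes of a file as
// space-separated uppercase hex, e.g. "EF BB BF 61".
std::string hexPrefix(const std::string& path, std::size_t count)
{
    std::ifstream in(path, std::ios::binary);
    std::string out;
    char c;
    for (std::size_t i = 0; i < count && in.get(c); ++i) {
        char hex[4];   // two hex digits, a space, and the terminating NUL
        std::snprintf(hex, sizeof hex, "%02X ",
                      static_cast<unsigned>(static_cast<unsigned char>(c)));
        out += hex;
    }
    if (!out.empty())
        out.pop_back();  // drop the trailing space
    return out;
}

// Hypothetical helper: interpret raw bytes as ISO-8859-1 (each byte is the
// code point of the same value) and re-encode them as UTF-8 for display.
std::string asLatin1(const std::string& bytes)
{
    std::string out;
    for (unsigned char b : bytes) {
        if (b < 0x80) {
            out += static_cast<char>(b);               // ASCII is unchanged
        } else {
            out += static_cast<char>(0xC0 | (b >> 6));   // UTF-8 lead byte
            out += static_cast<char>(0x80 | (b & 0x3F)); // continuation byte
        }
    }
    return out;
}
```

For a UTF-8 file with a BOM whose text starts with "abc", `hexPrefix(path, 4)` gives "EF BB BF 61", and `asLatin1("\xEF\xBB\xBF")` yields the UTF-8 text "ï»¿".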
