Converting Word files to HTML in Word Cleaner

  • Al Moritz

    Converting Word files to HTML in Word Cleaner

    Hi all,

    I was always told that the conversion of Word files to HTML as done by
    Word itself sucks - you get a lot of unnecessary code that can
    influence the design on web browsers other than Internet Explorer. Our
    computer expert in my company had told me a while ago that I
    should learn HTML and code the pages myself. I was never inclined to do so (I
    am no computer expert), and when, at his suggestion, I looked at how my
    pages (converted to HTML in Word) appeared in Netscape, they looked
    just fine.

    Lately however, some pages of my website that looked correct in
    Explorer got a screwed-up look in Netscape. Furthermore, when I
    recently converted Word documents on my new Mac, uploaded them to the
    web and looked at them on a PC, I was absolutely horrified. All kinds
    of strange characters appeared, and I took the pages off as fast as I
    had put them on.

    This did it for me: I had to get some serious HTML code design going.
    Still not inclined to learn HTML, however (something you can criticize
    me for, but that's not the point of this topic), I did some searching on
    the web and found the new program Word Cleaner:



    They claim that it's so good blah blah and that it cleans up Word
    files professionally blah blah, but instead of having to believe them
    before you buy, they offer a free 15-day trial version. I downloaded
    it. I discovered that the program does convert Word/HTML files made on
    a PC, but not those made on a Mac; what it does do, though, is convert
    Word .rtf files from both PC and Mac. That conversion of .rtf
    documents is what I used (it also converts .txt files); on my laptop
    it takes 2 seconds to convert an 80 KB document.

    I was amazed. My HTML file sizes shrank by half, and there was so
    much less code! Moreover, in Explorer the web pages created in Word
    Cleaner looked identical to those created in Word, and the few files
    converted in Word that looked screwed up in Netscape now looked fine
    when converted in Word Cleaner.

    I showed this to our computer expert in my company, and he said this
    really looks good - it actually looks like HTML design from a
    professional web designer, he said. Hmmm, you can judge for yourself.
    Go to my website:



    and look at the HTML source of any page except my main page.

    (That one looks correct in both Explorer and Netscape but has a few
    font problems in Safari - so I guess there is still some crappy code
    hidden somewhere. That file was converted to Word.rtf from a Word.html
    file, and from there converted to .html in Word Cleaner. All the other
    files were never .html files before, only Word.rtf or Word.doc (and
    from there rtf) files, before being converted to .html in Word
    Cleaner).

    See for yourself ("view - source" of the files), to judge what you
    think of the HTML code as generated by Word Cleaner. For comparison
    purposes, I also have uploaded the file "donnerstag 2" which you can
    view when you go to the link "Donnerstag aus Licht" and then insert a
    "2" between "donnerstag " and ".htm" in the URL. "donnerstag 2" is
    identical to "donnerstag " but was converted to HTML in Word - look at
    the gigantic file size (file - properties) and all the unnecessary,
    crappy codes!
  • West

    #2
    Re: Converting Word files to HTML in Word Cleaner

    "Al Moritz" wrote in message >
    [...]


    Maybe Al's post was Spam, maybe not?!

    $99 --- waaaay too expensive!

    It's a very simple and quick task to convert MSWord files to HTML without
    MSWord bloated code. If you use a wysiwyg html editor, here's one method:

    1. Copy and Paste the content from a word document into your Outlook Express
    (or other email client)
    2. Format as plain text, then Copy and Paste your plain text content into
    your wysiwyg FPage, Namo or 'whatever' editor.
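    For anyone with many files, a rough scripted equivalent of the
    paste-as-plain-text trick is to keep only the visible text of Word's HTML
    export and drop every tag, style block and comment. The following is a
    minimal sketch in Python using only the standard-library `html.parser`
    module; the class and function names are hypothetical, and, like the
    manual method, it throws away all formatting, links and images along with
    the bloat, so structure has to be re-applied in the editor afterwards.

```python
from html.parser import HTMLParser

# Tags whose contents are never visible text (Word puts CSS and XML data here).
SKIP = {"style", "script", "xml"}
# Tags that start a new line, roughly mimicking "paste as plain text".
BLOCK = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}


class PlainText(HTMLParser):
    """Collect the visible text of an HTML page and ignore all markup.

    Comments (including Word's conditional comments) are ignored because
    handle_comment is not overridden.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)  # turn &nbsp; etc. into characters
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self.skip_depth += 1
        elif tag in BLOCK:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def word_html_to_text(html):
    parser = PlainText()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    # Collapse whitespace inside each line and drop the empty lines.
    lines = (" ".join(line.split()) for line in text.splitlines())
    return "\n".join(line for line in lines if line)
```

    Paragraph breaks survive as newlines; everything else (fonts, spans,
    Word's `<o:p>` tags, class attributes) simply disappears.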

    Maybe there are other tried and trusted simple methods to rip that word
    bloat, without having to spend ?!

    :-)

    --
    W




    • Blinky the Shark

      #3
      Re: Converting Word files to HTML in Word Cleaner

      Peacenik wrote:
      > "Al Moritz" <amoritz@cellsignal.com> wrote in message
      > news:bf0c591d.0307190641.3d9da1d5@posting.google.com...
      >> This did it for me: I had to get some serious HTML code design going.
      >> Still not inclined to learn HTML however (something you can criticize
      >> me for, but not point of this topic), I did some search on the web,
      >> and found the new program Word Cleaner:
      >
      > ...and at this point, the red flag goes up, saying, "SPAM! SPAM! SPAM!"

      Not only crossposted, but multi-crossposted: there's at least one
      other copy crossposted to a bunch of MS groups.

      --
      Blinky Linux RU 297263
      Spam: The Boulder Pledge http://snurl.com/bpledge
      Digest: Best of Internet Oracularities http://snurl.com/dig_oracle


      • Al Moritz

        #4
        Re: Converting Word files to HTML in Word Cleaner

        "Peacenik" <criskity1@insightBBB.ReplaceBBBwithBBandPutDotComAfterItcom> wrote in message news:<sqiSa.87174$wk6.23122@rwcrnsc52.ops.asp.att.net>...
        > > This did it for me: I had to get some serious HTML code design going.
        > > Still not inclined to learn HTML however (something you can criticize
        > > me for, but not point of this topic), I did some search on the web,
        > > and found the new program Word Cleaner:
        >
        > ...and at this point, the red flag goes up, saying, "SPAM! SPAM! SPAM!"

        And West says:
        Maybe Al's post was Spam, maybe not?!

        Haha, that's what you get when you're enthusiastic about something:-)
        Oh well, enthusiasm has no place anymore in this cynical world I guess
        <g>
        I thought my:

        They claim that it's so good blah blah and that it cleans up Word
        files professionally blah blah,...

        would be a clear sign that this was no spam. Or have you ever
        seen self-deprecating spam? Me, never. Only TV commercials are
        sometimes self-deprecating, and then only in some rare cases and when
        the product is already super-established.

        Anyway, I haven't spent any money on the program yet (I still have a
        few days left on my trial version), but I will. It's just too
        convenient.

        Oh well, I waste my money, you waste your time!

        No, of course you don't, if you're proficient in HTML (I'm not). But
        even if you're proficient, I could imagine that the program might save
        you some time – converting in 2 seconds and then some amendments by
        hand, if necessary. That might still be faster than doing it by hand
        from scratch for every page – even with a fixed template at hand.
        Maybe I'm wrong, maybe not.

        In any case, I would appreciate it if you could give me feedback on the
        HTML code (again, not my main page, but any other page on my site).
        Does it look good to you?


        • Andy Mabbett

          #5
          Re: Converting Word files to HTML in Word Cleaner

          In message <xPrSa.461870$3C2.12638484@news3.calgary.shaw.ca>, Andrew
          Fedoniouk <andrew@terra-informatica.org> writes
          >Andrew. Author of the BlockNote.

          I can't see anything on your pages, that says BlockNote produces valid
          HTML.

          I did see this, though:

          <http://blocknote.net/features.html>

          Tables are essential in shaping and defining the layout of HTML
          documents.

          and your own pages are not only invalid, but mix CSS and non-CSS
          presentational markup.

          The same applies to your parent home page:

          <http://terra-informatica.org>

          which is clearly produced by BlockNote, and includes these gems:

          <TD nowrap bgcolor=#ffccff valign=middle align=center><FONT
          size=3> &nbsp;</FONT><A href="c-smile/index.htm"><FONT size=4
          color=#a0522d>C-SMILE</A></FONT></U></TD>


          TD nowrap bgcolor=#ffcc66 valign=middle align=center><FONT
          size=3> </FONT>micro<FONT size=3> </FONT><A
          href="utils/index.htm"><FONT size=4
          color=#a0522d>SMILES</A></FONT></U></TD>

          <DIV align=center>&nbsp;</DIV>
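          Most of what is wrong with these fragments is nesting: the `</A>`
          closes before the `<FONT>` opened inside it, and the `</U>` closes an
          element that was never opened. A minimal sketch of a stack-based
          nesting check in Python, using only the standard-library
          `html.parser` module (names are hypothetical), makes those errors
          visible. It is not a validator: it knows nothing about any DTD and
          treats every element as needing an end tag except a small, assumed
          list of void elements.

```python
from html.parser import HTMLParser

# Elements that never take an end tag in HTML 4.01 (partial list, assumed
# sufficient for this illustration).
VOID = {"br", "img", "hr", "input", "meta", "link", "area", "base", "col", "param"}


class NestingChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []     # currently open elements, innermost last
        self.problems = []  # human-readable error messages

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()  # correctly nested
        elif tag in self.stack:
            self.problems.append(
                f"</{tag}> closes <{tag}> before inner <{self.stack[-1]}>")
            # Recover by popping everything down to the matching element.
            while self.stack.pop() != tag:
                pass
        else:
            self.problems.append(f"</{tag}> has no matching start tag")


def check_nesting(fragment):
    checker = NestingChecker()
    checker.feed(fragment)
    checker.close()
    # Anything still open at the end is reported too.
    return checker.problems + [f"<{t}> never closed" for t in reversed(checker.stack)]
```

          Run on the first fragment, it reports the early `</a>`, the orphaned
          `</font>` left behind by the recovery, and the stray `</u>`
          (`html.parser` lowercases tag names).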


          (FU set)
          --
          Andy Mabbett
          USA imprisons children without trial, at Guantanamo Bay:
          <http://news.bbc.co.uk/1/hi/world/south_asia/2970279.stm>
          <http://web.amnesty.org/library/Index/ENGAMR510582003?open&of=ENG-USA>


          • Nico Schuyt

            #6
            Re: Converting Word files to HTML in Word Cleaner

            Andy Mabbett wrote:
            > Andrew Fedoniouk wrote
            >> Andrew. Author of the BlockNote.
            >
            > I can't see anything on your pages, that says BlockNote produces valid
            > HTML.

            Compare an editor with a photo camera. You can make ugly pictures with a
            Nikon of $5000 (or an award winning one with a camera of $20) :-)
            I even create valid HTML/CSS with FrontPage

            Nico



            • Nico Schuyt

              #7
              Re: Converting Word files to HTML in Word Cleaner

              Andrew Fedoniouk wrote:
              > http://blocknote.net
              > Andrew. Author of the BlockNote.
              > http://terra-informatica.org

              Nice editor!
              I don't have time to do a complete test, so a few questions:
              - Can I include a doctype?
              - Is it possible to apply CSS styles from the linked stylesheet?
              - Am I right that the built-in validator is limited? (No warning for a
              missing alt attribute, for example.)
              Regards,
              Nico






              • Andy Mabbett

                #8
                Re: Converting Word files to HTML in Word Cleaner

                In message <3f1a6b72$0$28905$1b62eedf@news.euronet.nl>, Nico Schuyt
                <nschuyt@hotmail.com> writes
                >>> Andrew. Author of the BlockNote.
                >
                >> I can't see anything on your pages, that says BlockNote produces valid
                >> HTML.
                >
                >Compare an editor with a photo camera. You can make ugly pictures with
                >a Nikon of $5000 (or an award winning one with a camera of $20) :-) I
                >even create valid HTML/CSS with FrontPage

                Why buy a dog, then bark yourself?
                --
                Andy Mabbett
                USA imprisons children without trial, at Guantanamo Bay:
                <http://news.bbc.co.uk/1/hi/world/south_asia/2970279.stm>
                <http://web.amnesty.org/library/Index/ENGAMR510582003?open&of=ENG-USA>


                • Andrew Fedoniouk

                  #9
                  Re: Converting Word files to HTML in Word Cleaner

                  Thanks, Andy, for your response!

                  IMHO!

                  Andy Mabbett > I can't see anything on your pages, that says BlockNote
                  produces valid HTML.

                  Aha! Valid HTML?! Each browser has its own understanding of validity....
                  Moreover, valid HTML is a sort of fuzzy set for any given browser. The same
                  formatting element could work in one place and not in another.
                  Superposition of such fuzzy sets gives us the Valid HTML Cloud.

                  The art of web design is to walk inside the cloud and stop if the density of
                  the cloud substance becomes low. (C) Mine :)

                  In BlockNote I've tried to outline 100% valid HTML border and use it.

                  And more about "validity":

                  There are three statements taken from standards (not exactly, just the
                  idea):

                  1. Client browsers MUST understand IMG tags without an ALT attribute.
                  2. All images SHOULD have an ALT attribute.
                  3. All of us MUST do as much as we can to reduce pollution (this implies
                  reducing bandwidth as much as you can).

                  Conflicting statements, huh?
                  >
                  > I did see this, though:
                  >
                  > <http://blocknote.net/features.html>
                  >
                  > Tables are essential in shaping and defining the layout of HTML
                  > documents.
                  >
                  > and your own pages are not only invalid, but mix CSS and non-CSS
                  > presentational markup.
                  >
                  > The same applies to your parent home page:
                  >
                  > <http://terra-informatica.org>
                  >

                  I beg your pardon! I am not a web designer. I just wanted to get things done.
                  It is not democratic at all to create such a damned complex sandwich from
                  SGML/HTML/XML, CSS and JavaScript and ask everybody to follow the rules.
                  I just don't have enough time to follow all these "should have"s.

                  It is the Internet, a common place, and I MUST have an opportunity to expose
                  myself by myself.
                  And I am not a full dummy there. Trust me. (At least I have a master's degree in
                  rocket science. Literally :)

                  Instead you will ask me to pay you my money and we will build for you nice,
                  clean (inside) HTML?
                  Otherwise you'll be "not valid"? This way? Thanks. Next time and on the
                  different globe.

                  I am trying to build a tool not for Professional Web Designers (I respect
                  them, honestly) but for the rest of us.
                  Democratic and simple. Yep, it costs something. The price of democracy?

                  DIXI.

                  And one more thought. The HTML standard must be redesigned completely from
                  scratch. Now it is a cesspit with remains of SGML, CSS scales, OBJECTs,
                  EMBEDs, frames... You can select any part of it and find that it is
                  incomplete or even conflicts with other parts. And XHTML seems like a
                  brand-new cerement for it.

                  IMHO!
                  > which is clearly produced by BlockNote, and includes these gems:
                  >
                  > <TD nowrap bgcolor=#ffccff valign=middle align=center><FONT
                  > size=3> &nbsp;</FONT><A href="c-smile/index.htm"><FONT size=4
                  > color=#a0522d>C-SMILE</A></FONT></U></TD>
                  > TD nowrap bgcolor=#ffcc66 valign=middle align=center><FONT
                  > size=3> </FONT>micro<FONT size=3> </FONT><A
                  > href="utils/index.htm"><FONT size=4
                  > color=#a0522d>SMILES</A></FONT></U></TD>
                  > <DIV align=center>&nbsp;</DIV>

                  Yep! These are good ones. I appreciate it a lot! They will be fixed ASAP!
                  I know, BlockNote is good but not perfect :)

                  Andrew Fedoniouk.





                  • Nick Kew

                    #10
                    Re: Converting Word files to HTML in Word Cleaner

                    In article <%7ESa.498017$Vi5.12927643@news1.calgary.shaw.ca>, one of infinite monkeys
                    at the keyboard of "Andrew Fedoniouk" <andrew@terra-informatica.org> wrote:
                    > (a whole bunch of clueless drivel)

                    Well, thanks for the insight into why one alleged authoring tool has no idea
                    about HTML. Not that it comes as any surprise.

                    --
                    Nick Kew

                    In urgent need of paying work - see http://www.webthing.com/~nick/cv.html


                    • Nico Schuyt

                      #11
                      Re: Converting Word files to HTML in Word Cleaner

                      Barbara de Zoete wrote:
                      >> Nico Schuyt wrote:
                      >> I even create valid HTML/CSS with FrontPage
                      > Can you explain how you do that? :-)
                      > Since FrontPage AFAIK doesn't add a <!DOCTYPE declaration at all for
                      > example? And sometimes removes the doctype, even if you've put it
                      > there yourself? And uses tons of non-CSS to 'make up' rather than
                      > mark up a page, which gets all mixed up with your CSS?

                      Use it like you should:
                      - Create a valid template and a good stylesheet.
                      - Copy the template for every new page.
                      - Don't use HTML markup where you can use the markup out of the stylesheet
                      FP has a reasonable integration with the stylesheet. Mark a piece of text
                      (in wysiwyg) and apply (or remove) the required style.
                      It doesn't change HTML if settings are correct.
                      > Or do you mean you create valid HTML/CSS with FrontPage, using the
                      > 'html'-view to clean up what FP just created for you?

                      Partially, yes. Some of the editing I do in html-view, some in wysiwyg.
                      WYSIWYG
                      - Make designs. Just for the look, not for the coding.
                      - Text editing and applying styles
                      - Importing text
                      - Changing text into links
                      - Creating blank tables and adding or removing rows.
                      - Insert a picture, see if it fits, resize it if necessary and change the
                      size in a graphic editor and refresh in FP to get the proper width and
                      height.

                      HTML-view:
                      - Add or change doctype
                      - Create inline or internal style (Though I avoid those styles and those
                      styles can be created with FP too)

                      GENERAL
                      - Handy tools for testing on broken links, file management etc.
                      - The wysiwyg-view corresponds to the result in IE (and in most cases to
                      Mozilla)

                      > If so, IMO it's not really FP that creates the HTML/CSS. How is that
                      > any different from using, let's say, NotePad?

                      See above.
                      So, perfect editor?
                      No:
                      - Even when used carefully it adds pieces of code I don't want.
                      - No good editor for external stylesheets (But I prefer to do it by hand
                      anyway instead of TopStyle for example)
                      - Very limited validator. (But AFAIK there is no good one for local use)
                      - Not able to parse PHP locally in preview
                      Didn't find a better editor so far however.
                      > Or do you mean "it validates, so it's good"? Never mind that it's
                      > about 30k too big.

                      Results are reasonable:



                      etc

                      Regards,
                      Nico



                      • Ben M

                        #12
                        Re: Converting Word files to HTML in Word Cleaner

                        >>> I MUST have an opportunity to expose myself by myself.
                        >>
                        >> Possibly, but not in front of the children.
                        >
                        > :))
                        >
                        > Thanks!
                        > Beg my pardon for my English. It is not my favorite one :))
                        >
                        >>> And I am not a full dummy there. Trust me. (At least I have master
                        >>> degree in rocket science. Literally :)
                        >
                        >> So. It's official. HTML *isn't* rocket science.
                        >
                        > :)) "HTML *isn't* a science" better.

                        HTML is a formally specified markup language with clearly defined semantics.
                        I would say that HTML is the result of scientific thought.

                        The roots of formally specified languages can be traced back to scientific
                        underpinnings. Your earlier posts imply a lack of knowledge of what
                        validity in an HTML document actually means, which reflects a lack of
                        understanding of the underlying formal nature of HTML and XHTML.

                        The fact that scientific thought has gone into the creation of HTML is
                        undoubted (thanks TBL at CERN). In fact, the formal nature of HTML and its
                        SGML underpinnings means that it is possible to determine with certainty
                        whether a document that claims to be an HTML document is in fact part of
                        the set of valid HTML documents (there may be some caveats to this with
                        very early versions of HTML).

                        There are of course differences between the various HTML versions, for
                        example a valid HTML 3.2 document may not be a valid HTML 4.01 document
                        (assuming we change the DTD) due to a missing alt attribute, like you
                        mentioned earlier in the thread. However the formal basis of HTML means that
                        such differences between specifications can be easily understood.
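                        The alt example is easy to make concrete. The HTML 3.2 DTD declares
                        `ALT` on `IMG` as optional (`#IMPLIED`), while HTML 4.01 declares it
                        `#REQUIRED`, so the same image markup can pass one DTD and fail the
                        other. A minimal sketch in Python (hypothetical names, standard-library
                        `html.parser` only) that lists the `IMG` elements failing this single
                        HTML 4.01 rule; it checks nothing else and is no substitute for a real
                        DTD-based validator.

```python
from html.parser import HTMLParser


class MissingAlt(HTMLParser):
    """Record the src of every <img> that has no alt attribute at all."""

    def __init__(self):
        super().__init__()
        self.offenders = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)  # html.parser lowercases attribute names
        # An empty alt="" is present, so it counts as valid here.
        if tag == "img" and "alt" not in attributes:
            self.offenders.append(attributes.get("src", "?"))


def imgs_missing_alt(html):
    finder = MissingAlt()
    finder.feed(html)
    finder.close()
    return finder.offenders
```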

                        Such inconsistencies between specifications are formally noted and your
                        earlier rant about HTML being inconsistent is fairly pointless. HTML 2 is
                        not HTML 3 is not HTML 4. Inconsistencies internal to a specification are of
                        course problems (which is why there are errata documents), however you did
                        not point any of them out, just inconsistencies between different
                        specifications.

                        Follow-ups set to comp.infosystems.www.authoring.html
                        --
                        BenM




                        • Barbara de Zoete

                          #13
                          Re: Converting Word files to HTML in Word Cleaner

                          Nico Schuyt wrote:
                          > Barbara de Zoete wrote:
                          >>> Nico Schuyt wrote:
                          >
                          >>> I even create valid HTML/CSS with FrontPage
                          >
                          >> Can you explain how you do that? :-)
                          >> Since FrontPage AFAIK doesn't add a <!DOCTYPE declaration at all for
                          >> example? And sometimes removes the doctype, even if you've put it
                          >> there yourself? And uses tons of non-CSS to 'make up' rather than
                          >> mark up a page, which gets all mixed up with your CSS?
                          >
                          > Use it like you should:
                          > - Create a valid template and a good stylesheet.
                          ^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^
                          And where or how do you do that? Within FP or with some other program?

                          <snip>
                          > GENERAL
                          > - Handy tools for testing on broken links, file management etc.

                          This I agree with. This is about the only part of FP I *do* use. But as you
                          put it yourself, this makes FP a handy tool for me. Nothing more. Not the
                          WYSIWYG web-edit and -design program it is said to be.
                          > - The wysiwyg-view corresponds to the result in IE (and in most cases
                          > to Mozilla)

                          I'm not very advanced in HTML and/or CSS, but so far I find that writing my
                          source by hand is by far the fastest way of creating pages that work in
                          various browsers. Especially the lack of the possibility in FP to work on
                          your stylesheet and the HTML source at the same time (and on top of that
                          have two or more browsers running to show immediate results) makes me move
                          away from FP further and further.

                          <snip>

                          But for a fast 'setup' of a site, or a quick 'preview' of a general layout or
                          design, it does work. So occasionally I too fall back on FP.


                          --

                          Barbara

                          http://home.wanadoo.nl/b.de.zoete/index.html - NL



                          • picayunish

                            #14
                            Re: Converting Word files to HTML in Word Cleaner

                            When Nico Schuyt was making a web page, a :-? appears and wrote:
                            >
                            > But, again, FP is not perfect but I didn't find a better alternative.

                            What about DW (Dreamweaver) as an alternative?
                            --
                            Edwin van der Vaart (No relation to....)
                            http://www.semi-conductors.nl/ PHP Redirect to semi-conductor.nl
                            http://www.semi-conductor.nl/ Links to Semiconductors sites
                            http://members.chello.nl/e.vandervaart/ Experimental site
                            http://host.deluxnetwork.com/~evdvaart/ Personal site



                            • Nico Schuyt

                              #15
                              Re: Converting Word files to HTML in Word Cleaner

                              picayunish wrote:
                              > Nico Schuyt wrote:
                              >> But, again, FP is not perfect but I didn't find a better alternative.
                              > What about DW as an alternative.

                              Price too high :-)
                              Any progress on the Jeannie site yet?
                              Nico

