How to fix PHP/HTML webpages that display Word resumes with funky characters

  • comp.lang.php

    How to fix PHP/HTML webpages that display Word resumes with funky characters

    I have a textarea where people can cut & paste their resume.
    Unfortunately they often cut & paste their Word resume into the
    textarea, funky characters and all.

    This mangles the display on the HTML side when people view pages
    showing these resumes, which are stored as entries in a MySQL TEXT field.

    How do I fix this? Also, how do I fix the display of the resumes
    already entered this way?

    Thanx
    Phil
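A note on the display side: one common cause of this kind of mangling is a character-set mismatch between the form, the MySQL connection, and the page that shows the stored text. That is an assumption here, since the thread doesn't show the page headers or the connection settings. A minimal sketch of the output step, assuming the stored text is meant to be UTF-8: escape it and keep its line breaks. render_resume() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: turn a stored resume into HTML that is safe to echo.
// Assumes the stored text is UTF-8. ENT_SUBSTITUTE replaces invalid byte
// sequences with U+FFFD; without it htmlspecialchars() returns '' for them.
function render_resume(string $text): string
{
    $escaped = htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
    // Keep the line breaks the user pasted into the textarea.
    return nl2br($escaped);
}

// Elsewhere in the page (shown as comments, not run here):
// header('Content-Type: text/html; charset=UTF-8');  // before any output
// $db->set_charset('utf8mb4');  // mysqli: same charset on the DB connection
// echo render_resume($row['resume']);
```

Escaping will not repair text that was already mis-encoded when it was saved, but it keeps pasted markup from breaking the page, and one bad byte no longer blanks out the whole resume.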

  • fiziwig

    #2
    Re: How to fix PHP/HTML webpages that display Word resumes with funky characters

    By "funky characters" I assume you mean the letter 'e' with the cute
    little French decoration on top. Try this:

    $new_string = str_replace('é', 'e', $old_string);

    To replace the decorated e's with plain vanilla e's.

    (The first e is the decorated one, in case it doesn't show up in your
    newsreader. For Windows users, that's entered by holding down the Alt
    key and typing 0233 on the numeric keypad.)

    You might also need to do it for the uppercase 'É' in case people
    write "RESUMÉ" in caps.

    --gary
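Along the same lines, a sketch that handles both cases of the accented e in one call and also flattens the "smart" punctuation Word typically inserts as you type (curly quotes, dashes, the ellipsis character). It assumes the submitted text is UTF-8; plain_ascii_punctuation() is a hypothetical name, and the map is illustrative rather than complete.

```php
<?php
// Hypothetical helper: flatten a few common Word "smart" characters to ASCII.
// Assumes UTF-8 input; keys use \u{...} escapes (PHP 7+) so the map does not
// depend on the encoding of this source file.
function plain_ascii_punctuation(string $text): string
{
    $map = [
        "\u{00E9}" => 'e',   // e with acute accent
        "\u{00C9}" => 'E',   // E with acute accent
        "\u{2018}" => "'",   // left single quote
        "\u{2019}" => "'",   // right single quote / apostrophe
        "\u{201C}" => '"',   // left double quote
        "\u{201D}" => '"',   // right double quote
        "\u{2013}" => '-',   // en dash
        "\u{2014}" => '--',  // em dash
        "\u{2026}" => '...', // horizontal ellipsis
        "\u{00A0}" => ' ',   // non-breaking space
    ];
    // strtr() with an array tries the longest keys first and never
    // re-scans text it has already replaced.
    return strtr($text, $map);
}
```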


    • fiziwig

      #3
      Re: How to fix PHP/HTML webpages that display Word resumes with funky characters

      Yikes! That's scary! 'Fraid I have no suggestions for that situation,
      unless you can identify the cases where it happens and actually do a
      string replace on that long cryptic string. That only works if it's
      invariant, which it probably isn't. :-(

      --gary
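If the garbage does turn out to be a fixed set of sequences, the replacement gary describes could look like the sketch below. The sequences listed are the usual result of UTF-8 text being re-read as Windows-1252: a curly apostrophe (U+2019) is the bytes E2 80 99, which display as "â€™". Likewise "ï¿½" is the UTF-8 encoding of the replacement character U+FFFD (EF BF BD) displayed the same way. The table is an assumption to be extended from whatever actually appears in the data; replace_known_garbage() is a hypothetical name.

```php
<?php
// Hypothetical helper: replace a fixed list of known mis-encoded sequences.
// Keys are the UTF-8 bytes of what the browser shows, written as escapes so
// they do not depend on this file's own encoding.
function replace_known_garbage(string $text): string
{
    $map = [
        "\xC3\xA2\xE2\x82\xAC\xE2\x84\xA2" => "'", // shows as a-circumflex, euro, TM (was U+2019)
        "\xC3\xA2\xE2\x82\xAC\xC5\x93"     => '"', // shows as a-circumflex, euro, oe (was U+201C)
        "\xC3\xA2\xE2\x82\xAC\xE2\x80\x9C" => '-', // a-circumflex, euro, left quote (was U+2013)
        "\xC3\xAF\xC2\xBF\xC2\xBD"         => '',  // i-diaeresis, inverted ?, 1/2 (was U+FFFD)
    ];
    return strtr($text, $map);
}
```

Note that anything already turned into U+FFFD cannot be recovered this way; the original character is simply gone, so the best a table can do is drop it.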


      • Roger Dodger

        #4
        Re: How to fix PHP/HTML webpages that display Word resumes with funky characters

        It is a huge task to decode Word from its proprietary format to plain
        text or regular old HTML. All the online resume forms I've submitted
        have required me to paste in either plain text or rich text format.
        That would be the easiest approach. The user just has to choose
        Save As... from the Word File menu.


        comp.lang.php wrote:
        > No, that's not the case. What is happening is that when someone
        > copies and pastes a Word document "as-is", with its own proprietary
        > spacing, fonts, etc., this is what you'll see on an HTML page:
        >
        > University of North Carolina
        > ����¯ �¿�½��ï ¿½���¢� ����à ?¯���à ?��¿��ï ¿½���½� ����à ?¯���à ?��¿��ï ¿½���½
        > Charlotte
        >
        > Phil


        • comp.lang.php

          #5
          Re: How to fix PHP/HTML webpages that display Word resumes with funky characters


          Roger Dodger wrote:
          > It is a huge task to decode Word from its proprietary format to plain
          > text or regular old HTML. All the online resume forms I've submitted
          > have required me to paste in either plain text or rich text format.
          > That would be the easiest approach. The user just has to choose
          > Save As... from the Word File menu.

          Right. I know that, going forward, I can give people the option to
          upload their Word doc or PDF (a security issue in and of itself,
          yikes!) alongside cutting & pasting, but there are two unanswered
          questions:

          1) What about those already in the database as text field values? What
          do I do about those?

          2) What is to stop the "challenged among us" from cutting & pasting a
          Word doc even though they have the option to upload, aside from telling
          them to upload instead?

          Phil
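For question 1, one approach that often works on rows that were double-encoded (UTF-8 bytes re-read as Windows-1252 and then encoded to UTF-8 a second time) is to reverse that last step, keeping the result only if it is valid UTF-8 and converts back to the original exactly. This is a heuristic sketch, assuming the mbstring extension and UTF-8 storage; undo_double_encoding() is a hypothetical name. It cannot bring back characters that were already replaced by U+FFFD, and it is worth running on a copy of the table first.

```php
<?php
// Hypothetical helper: undo one layer of UTF-8 -> Windows-1252 -> UTF-8
// double encoding. Requires the mbstring extension.
function undo_double_encoding(string $text): string
{
    // Pure ASCII cannot be double-encoded; leave it alone.
    if (!preg_match('/[\x80-\xFF]/', $text)) {
        return $text;
    }
    // Map each character back to the single Windows-1252 byte it came from.
    $candidate = mb_convert_encoding($text, 'Windows-1252', 'UTF-8');
    // Accept only if the bytes form valid UTF-8 and the conversion was
    // lossless (nothing was substituted with '?').
    if (mb_check_encoding($candidate, 'UTF-8')
        && mb_convert_encoding($candidate, 'UTF-8', 'Windows-1252') === $text) {
        return $candidate;
    }
    return $text;
}
```

Correctly stored text such as "café" fails the validity check after conversion and is returned unchanged, which is what makes the heuristic reasonably safe to run over a whole column.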


          • fiziwig

            #6
            Re: How to fix PHP/HTML webpages that display Word resumes with funky characters

            To prevent people from cutting and pasting Word docs, why not just scan
            the contents of the textarea for some telltale characters or sequences
            that would indicate a Word doc, and then treat that as an error and not
            put it into the DB?

            --gary
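A sketch of that check, assuming UTF-8 submissions: flag text containing byte sequences that almost never occur in ordinary prose but are typical of mis-encoded Word text. The marker list and looks_mis_encoded() are assumptions for illustration. Phil's objection in the next post, that there is no single pattern, still applies, so this catches common cases rather than all of them.

```php
<?php
// Hypothetical check: does the submitted text contain common signs of
// Windows-1252/UTF-8 mix-ups? The list is illustrative, not complete.
function looks_mis_encoded(string $text): bool
{
    $markers = [
        "\xC3\xA2\xE2\x82\xAC",     // prefix of mangled curly quotes and dashes
        "\xC3\xAF\xC2\xBF\xC2\xBD", // a mangled U+FFFD replacement character
        "\xEF\xBF\xBD",             // U+FFFD itself: a character was already lost
    ];
    foreach ($markers as $marker) {
        if (strpos($text, $marker) !== false) {
            return true;
        }
    }
    return false;
}

// Example use in the form handler (hypothetical field name):
// if (looks_mis_encoded($_POST['resume'])) { /* show an error, skip the INSERT */ }
```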


            • comp.lang.php

              #7
              Re: How to fix PHP/HTML webpages that display Word resumes with funky characters


              fiziwig wrote:
              > To prevent people from cutting and pasting Word docs, why not just scan
              > the contents of the textarea for some telltale characters or sequences
              > that would indicate a Word doc, and then treat that as an error and not
              > put it into the DB?
              >
              > --gary

              If there were a single pattern, I would do so, but I don't know
              what that pattern is. Moreover, what would prevent them from copying
              and pasting literally anything else they can think of?

              Phil


              • fiziwig

                #8
                Re: How to fix PHP/HTML webpages that display Word resumes with funky characters

                Perhaps limit them to pasting in text that contains only some set of
                printable characters: a..z, A..Z, 0..9, and the standard punctuation
                and math symbols ,.:;'"+-=><&^%... etc. If any character is found that
                is not in the set, reject the text. Presumably those Word docs have
                some binary info in them, or at least it looks like they contain a
                lot of characters not in the standard set. Of course, you'd have to take
                other languages into account if you planned to allow posting in
                languages other than English.

                --gary
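gary's whitelist can be written as a single regular expression. The sketch below, assuming UTF-8 input, accepts only printable ASCII plus tabs and line breaks by default; the optional mode, also an assumption, relaxes this to letters, marks, digits, punctuation, symbols and whitespace in any script, which addresses the other-languages caveat. is_plain_text() is a hypothetical name.

```php
<?php
// Hypothetical validator for the textarea contents.
function is_plain_text(string $text, bool $allowUnicode = false): bool
{
    if ($allowUnicode) {
        // Letters, marks, numbers, punctuation, symbols or whitespace in any
        // script. With the /u modifier, invalid UTF-8 makes the match fail.
        return preg_match('/^[\p{L}\p{M}\p{N}\p{P}\p{S}\s]*$/u', $text) === 1;
    }
    // Printable ASCII (space through tilde), tab, CR and LF only.
    return preg_match('/^[\x20-\x7E\t\r\n]*$/', $text) === 1;
}
```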


                • comp.lang.php

                  #9
                  Re: How to fix PHP/HTML webpages that display Word resumes with funky characters


                  fiziwig wrote:
                  > Perhaps limit them to pasting in text that contains only some set of
                  > printable characters: a..z, A..Z, 0..9, and the standard punctuation
                  > and math symbols ,.:;'"+-=><&^%... etc. If any character is found that
                  > is not in the set, reject the text. Presumably those Word docs have
                  > some binary info in them, or at least it looks like they contain a
                  > lot of characters not in the standard set. Of course, you'd have to take
                  > other languages into account if you planned to allow posting in
                  > languages other than English.
                  >
                  > --gary

                  Exactly, and that will make that kind of check nearly impossible to
                  perform. Besides, you have to remember that their resume already
                  exists as ASCII (just screwed-up ASCII), because it displays FROM
                  the database table text field query. That is, when you get the resume,
                  it's already in ASCII, so there are no non-ASCII characters to find
                  anymore, just a bunch of funky, screwed-up, yet 100% ASCII characters.
                  Those already-submitted resumes are the problem we're dealing with;
                  preventing others from doing so is only half the battle.

                  Phil
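On the "already ASCII" point: characters such as ï, ¿, ½, Â and Ã are not part of 7-bit ASCII, which ends at 0x7F, so the stored garbage can still be told apart from plain text byte by byte. A sketch for finding the affected rows, assuming the resumes have been fetched into PHP strings keyed by row id; ids_with_non_ascii() is a hypothetical name.

```php
<?php
// Hypothetical helper: given resumes keyed by row id, return the ids whose
// text contains any byte above 0x7F, i.e. anything outside 7-bit ASCII.
function ids_with_non_ascii(array $resumesById): array
{
    $flagged = [];
    foreach ($resumesById as $id => $text) {
        // No /u modifier, so the character class matches raw bytes.
        if (preg_match('/[\x80-\xFF]/', $text) === 1) {
            $flagged[] = $id;
        }
    }
    return $flagged;
}
```

Legitimate accented names will be flagged too, so the list is a set of rows to review (or to run a repair over), not a list of rows to delete.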


                  • Tim Roberts

                    #10
                    Re: How to fix PHP/HTML webpages that display Word resumes with funky characters

                    "comp.lang.php" <phillip.s.powell@gmail.com> wrote:
                    >I have a textarea where people can cut & paste their resume.
                    >Unfortunately they often cut & paste their Word resume into the
                    >textarea, funky characters and all.
                    >
                    >This causes the display to be mangled from the HTML end when people
                    >view pages with these resumes stored as a MySQL text field entry.
                    >
                    >How do I fix this, also, how do I fix the displays of those already
                    >entered this way?

                    You might consider using this as an intelligence test when evaluating the
                    resume...
                    --
                    - Tim Roberts, timr@probo.com
                    Providenza & Boekelheide, Inc.


                    • Jim Carlock

                      #11
                      Re: How to fix PHP/HTML webpages that display Word resumes with funky characters

                      "comp.lang.php" <phillip.s.powell@gmail.com> wrote:
                      > Exactly, and that will make that kind of check nearly impossible to
                      > perform.

                      Do all your character stripping and such, then present the information
                      back to them and activate an "Are you sure?" button.
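Jim's confirm-before-saving step might look roughly like the sketch below: show the cleaned text back, escaped, and carry the same cleaned text in a hidden field so the "Are you sure?" submit stores what the user saw. The form action, field name and preview_form() are hypothetical, and the cleaning itself is assumed to have happened already.

```php
<?php
// Hypothetical preview form: $cleaned is the already-stripped resume text.
function preview_form(string $cleaned): string
{
    $safe = htmlspecialchars($cleaned, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
    // The hidden field can be edited by the browser, so the script that
    // receives the final submit should validate the text again.
    return '<div class="preview">' . nl2br($safe) . "</div>\n"
         . '<form method="post" action="save_resume.php">' . "\n"
         . '<input type="hidden" name="resume" value="' . $safe . '">' . "\n"
         . '<input type="submit" value="Are you sure?">' . "\n"
         . "</form>\n";
}
```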

                      What's to prevent someone from typing in someone else's resume?
                      Do you collect information such as a phone number or email address?

                      If you collect an email address, send an automatic response to the
                      end user asking them to click an activation link.
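For the activation link, a minimal sketch: generate an unguessable token, store it with the pending row, put it in the emailed link, and compare it in constant time when the link is clicked. The URL, parameter names and helper names are hypothetical; random_bytes() and hash_equals() need PHP 7 or later.

```php
<?php
// Hypothetical token helpers for an email activation link.
function new_activation_token(): string
{
    // 16 random bytes -> 32 hex characters.
    return bin2hex(random_bytes(16));
}

function activation_link(string $baseUrl, int $pendingId, string $token): string
{
    return $baseUrl . '?' . http_build_query(['id' => $pendingId, 'token' => $token]);
}

function token_matches(string $stored, string $fromLink): bool
{
    // Constant-time comparison so response timing does not leak the token.
    return hash_equals($stored, $fromLink);
}
```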

                      You can store the garbage in an unapproved table and, once it's
                      been approved, move it to the legitimate recordset.
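The unapproved-table idea can be sketched with PDO: copy the row into the live table and delete it from the pending one inside a single transaction, so a resume never ends up in both tables or in neither. The table and column names (resumes_pending, resumes, id, email, body) are hypothetical, and the SQL sticks to statements that work in both MySQL and SQLite.

```php
<?php
// Hypothetical approval step: move one pending resume into the live table.
function approve_resume(PDO $db, int $id): bool
{
    $db->beginTransaction();
    try {
        $copy = $db->prepare(
            'INSERT INTO resumes (id, email, body)
             SELECT id, email, body FROM resumes_pending WHERE id = ?'
        );
        $copy->execute([$id]);
        if ($copy->rowCount() !== 1) {
            // Nothing to approve (unknown id, or already moved).
            $db->rollBack();
            return false;
        }
        $db->prepare('DELETE FROM resumes_pending WHERE id = ?')->execute([$id]);
        $db->commit();
        return true;
    } catch (Throwable $e) {
        $db->rollBack();
        throw $e;
    }
}
```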

                      Hope this helps.

                      Jim Carlock
                      Post replies to the group.



                      • Jerry Stuckle

                        #12
                        Re: How to fix PHP/HTML webpages that display Word resumes with funky characters

                        comp.lang.php wrote:
                        > I have a textarea where people can cut & paste their resume.
                        > Unfortunately they often cut & paste their Word resume into the
                        > textarea, funky characters and all.
                        >
                        > This causes the display to be mangled from the HTML end when people
                        > view pages with these resumes stored as a MySQL text field entry.
                        >
                        > How do I fix this, also, how do I fix the displays of those already
                        > entered this way?
                        >
                        > Thanx
                        > Phil

                        Phil,

                        I guess I look at the problem differently.

                        If I request a resume in Word format, I expect it in Word format. If someone
                        else sends in a plain text file, they aren't even considered for employment.
                        And vice versa.

                        I mean - they're trying to find a job. If you ask them for plain text and they
                        can't even follow that simple direction, could they follow more complicated
                        instructions? Would you want to hire them?

                        --
                        ==================
                        Remove the "x" from my email address
                        Jerry Stuckle
                        JDS Computer Training Corp.
                        jstucklex@attglobal.net
                        ==================


                        • comp.lang.php

                          #13
                          Re: How to fix PHP/HTML webpages that display Word resumes with funky characters


                          Jerry Stuckle wrote:
                          > Phil,
                          >
                          > I guess I look at the problem differently.
                          >
                          > If I request a resume in Word format, I expect it in Word format. If someone
                          > else sends in a plain text file, they aren't even considered for employment.
                          > And vice versa.
                          >
                          > I mean - they're trying to find a job. If you ask them for plain text and they
                          > can't even follow that simple direction, could they follow more complicated
                          > instructions? Would you want to hire them?

                          You do have a point. Unfortunately, it was never a requirement before
                          for them to upload a non-text resume; in fact, until recently it wasn't
                          even set up so that they could, so they had no choice but to copy and
                          paste, even if we told them text-only.

                          However, I can see how if the instructions say "copy and paste a
                          text-based resume only" and you can only cut and paste, perhaps you
                          might think that a Word resume isn't text.

                          But future managers in college don't know this, at least the ones
                          we've encountered.

                          Be afraid.

                          Phil

