Generating a site map

  • theschaef
    New Member
    • Sep 2008
    • 14

    Generating a site map

    Hey there everyone,
    Sorry if this isn't the right place for this post.
    I am building a script to generate a site map for me. For the most part it works fine, but a few errors are popping up.
    Google isn't liking several of my URLs. For instance, I have a URL that ends in "trews.html ". It looks like this line is fine in the site map, but when I type the URL into the browser I get a 404 error, as if the page doesn't exist. If I erase the URL and type it out manually it works fine, however, and when I look at the websites I recently visited, it lists "t%08rews.html" as one of them. It seems that Perl is interpreting some kind of character that isn't there? I'm not sure. How could this be corrected?
  • eWish
    Recognized Expert Contributor
    • Jul 2007
    • 973

    #2
    Without seeing some code, we can't tell what your problem is. Also, you might post some of the expected output.

    --Kevin

    • theschaef
      New Member
      • Sep 2008
      • 14

      #3
      Well, I have one program which generates all of my HTML pages as well as a text file containing all of the URLs that are being created. The problem seems to stem from there, as my sitemap program uses that text file to create its site map. The trouble is that I can't see the problem: in the text file it looks just fine as "trews.html ", but if I copy it into the browser it can't find that page unless I retype it. I can't really post the code as it is very long; I'm just wondering if anyone has seen something like this before and could put me on the right track.

      • theschaef
        New Member
        • Sep 2008
        • 14

        #4
        Copying the URL into Word, it appears as normal as "trews.html ", but when copying from Word into the browser it now appears as "t rews.html".

        • eWish
          Recognized Expert Contributor
          • Jul 2007
          • 973

          #5
          If you look up URL encoding, you'll see that %08 is a backspace character.

          So, I guess that your editor is adding some characters that don't need to be there. If you view and save your file using Notepad or another text editor (not Word or another word processor), do you still have the same problem?

          --Kevin
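
A quick way to confirm the diagnosis is to make the invisible character visible. The following is only a sketch: `show_hidden` is a hypothetical helper name, and the sample string is modelled on the URL quoted earlier in the thread, not taken from the poster's actual file. It percent-decodes the string seen in the browser history and then prints any control characters as `\xNN` escapes.

```perl
use strict;
use warnings;

# Hypothetical helper: render control characters (except tab and newline)
# as visible \xNN escapes so they can be spotted in a URL.
sub show_hidden {
    my ($text) = @_;
    (my $visible = $text) =~ s/([\x00-\x08\x0B-\x1F\x7F])/sprintf('\\x%02X', ord $1)/ge;
    return $visible;
}

# Assumed sample, based on the string reported from the browser history.
my $seen = "t%08rews.html";

# Percent-decode: each %XX becomes the character with that hex code.
(my $decoded = $seen) =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;

print ord(substr($decoded, 1, 1)), "\n";   # prints 8 (ASCII backspace)
print show_hidden($decoded), "\n";         # prints t\x08rews.html
```

Running each line of the URL list through a helper like this should show whether the stray character is already in the text file or is being introduced later.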

          • KevinADC
            Recognized Expert Specialist
            • Jan 2007
            • 4092

            #6
            Make sure your text file is plain text. Re-save the file as txt or ASCII instead of doc or other word processing format.

            • theschaef
              New Member
              • Sep 2008
              • 14

              #7
              The file is .txt with UTF-8 encoding.

              • theschaef
                New Member
                • Sep 2008
                • 14

                #8
                I have also tried opening the file with TextWrangler and using the "convert to ASCII" option, but no luck.

                • theschaef
                  New Member
                  • Sep 2008
                  • 14

                  #9
                  So I guess the question now is: how could I use Perl to remove a backspace?

                  • KevinADC
                    Recognized Expert Specialist
                    • Jan 2007
                    • 4092

                    #10
                    Quite odd to ever see a backspace in a text file, but I guess it's possible. You can try this. \b inside a character class is a backspace:

                    $str =~ s/[\b]//g;
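
For context, here is a minimal sketch of how that substitution might be applied to every URL before it goes into the site map. The name `clean_url` and the sample list are hypothetical, and stripping the other control characters and surrounding whitespace is an extra assumption beyond the one-liner above.

```perl
use strict;
use warnings;

# Hypothetical helper: remove backspaces, any other ASCII control
# characters, and leading/trailing whitespace from a single URL.
sub clean_url {
    my ($url) = @_;
    $url =~ s/[\b]//g;               # backspace: \b inside a character class
    $url =~ s/[\x00-\x1F\x7F]//g;    # assumption: drop remaining control chars too
    $url =~ s/^\s+|\s+$//g;          # trim surrounding whitespace
    return $url;
}

# Assumed sample lines, standing in for the contents of the URL text file.
my @urls = ("t\x08rews.html \n", "index.html\n");

print clean_url($_), "\n" for @urls;
# prints:
# trews.html
# index.html
```

Note that outside a character class `\b` means a word boundary, so the square brackets in `s/[\b]//g` are what make it match the backspace character.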

                    • theschaef
                      New Member
                      • Sep 2008
                      • 14

                      #11
                      That did the trick for me. Thanks so much. It was the weirdest thing ever: the URL looked fine when I copied it, but if you tried to erase it with backspace, it took two presses at the spot where the extra character was to remove it.

                      • KevinADC
                        Recognized Expert Specialist
                        • Jan 2007
                        • 4092

                        #12
                        You're welcome
