I used read() and readlines() but the format is not the same?

  • moroccanplaya
    New Member
    • Jan 2011
    • 80

    Is there a way to keep the format of HTML source code? For example, when you view the source of a website, is there any way to keep the same structure? I have tried read() and readlines(), and I have tried saving the source code and then opening it in my program, but it still failed to keep the same format. Is there any way around this problem?
  • bvdet
    Recognized Expert Specialist
    • Oct 2006
    • 2851

    #2
    HTML was designed for data display with a focus on how it looks. It won't display the same in a GUI widget as it does in a browser. A browser, such as Google Chrome, uses the HTML tags to interpret page content. Is that what you are referring to? Maybe you could parse the source code with BeautifulSoup or lxml.html and display the content only.
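A minimal sketch of the "display the content only" idea. It swaps in the standard library's `html.parser.HTMLParser` for BeautifulSoup or lxml.html so that it runs without third-party packages; the class name `TextOnly`, the helper `visible_text`, and the sample markup are made up for illustration.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect the text between tags and ignore the tags themselves."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for each run of text between tags; skip whitespace-only runs.
        if data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Return the text fragments found in an HTML string, in document order."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return parser.chunks


# Hypothetical sample page, not taken from any real site.
sample = (
    "<html><head><title>Demo</title></head>"
    "<body><h1>Hello</h1><p>Some <b>bold</b> text.</p></body></html>"
)
print(visible_text(sample))
```

Note that this keeps everything that is not a tag, including the page title, so it is only a starting point if the goal is to show what a browser would render.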


    • moroccanplaya
      New Member
      • Jan 2011
      • 80

      #3
      For example, this is the first line of the google.com source code, and it is shown on one line:

      <!doctype html><html itemscope itemtype="http://schema.org/WebPage"><head> <meta http-equiv="content-type" content="text/html; charset=UTF-8"><meta

      I want this to be displayed all on one line, unlike the two lines shown in the above example.

      Hope that makes sense.


      • Smygis
        New Member
        • Jun 2007
        • 126

        #4
        As far as I can tell you don't actually have an issue.

        And if you do have an issue, I cannot reproduce it. I believe it has nothing to do with Python; rather, I think it is a limit on the number of characters per line in whatever you use to view the output. In Python the line is correctly stored as one line when I use readlines().
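One way to check this claim without relying on how a viewer wraps text is to count the actual lines in the data. The sketch below uses an in-memory string rather than a live download; its first line is the one quoted in post #3 and the second line is made up.

```python
# First line of the page source quoted in post #3, plus a made-up second line.
source = (
    '<!doctype html><html itemscope itemtype="http://schema.org/WebPage">'
    '<head> <meta http-equiv="content-type" content="text/html; charset=UTF-8">\n'
    '<title>example</title>\n'
)

lines = source.splitlines()
print(len(lines))  # number of real lines, independent of any display width
for line in lines:
    # Show each line's length and its first 30 characters.
    print(len(line), repr(line[:30]))
```

If a line is long but contains no newline character, any break seen on screen comes from the viewer's wrapping, not from the stored text.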


        • moroccanplaya
          New Member
          • Jan 2011
          • 80

          #5
          Are you outputting the source code in the tkinter Text widget?


          • moroccanplaya
            New Member
            • Jan 2011
            • 80

            #6
            Plus, readlines() outputs the whole source code in one line, which I do not want.
            Last edited by moroccanplaya; Feb 17 '12, 03:42 PM. Reason: spelling


            • dwblas
              Recognized Expert Contributor
              • May 2008
              • 626

              #7
              You will have to post some code for a better answer. Incomplete questions equal incomplete answers. This works for me:
              Code:
              import urllib2
              fp= urllib2.urlopen('http://www.python.org/')
              
              for rec in fp:
                  print rec
                  print "-"*60
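The snippet above is Python 2 (`urllib2` and the `print` statement). A rough Python 3 equivalent might look like the sketch below. To keep it runnable without a network connection, an `io.BytesIO` object with made-up content stands in for the response from `urllib.request.urlopen`; both can be iterated line by line and both yield bytes.

```python
import io

# Stand-in for urllib.request.urlopen('http://www.python.org/'); made-up content.
fp = io.BytesIO(b"<html>\n<head><title>demo</title></head>\n</html>\n")

records = []
for rec in fp:
    # Each rec is a bytes object ending in b"\n"; decode it and drop the line ending.
    records.append(rec.decode("utf-8").rstrip("\r\n"))

for rec in records:
    print(rec)
    print("-" * 60)
```

The `"utf-8"` encoding is an assumption here; a real page may declare a different charset.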


              • moroccanplaya
                New Member
                • Jan 2011
                • 80

                #8
                To bvdet: I have downloaded the lxml parser. How would you parse the URL?

                To dwblas: I have trouble explaining this, so here is my full code. If you run it and type in any URL, it will get the source code and display it in a text box. When you compare the source code from the website with the source code copied into the text box, the format is different. The only exception is python.org.

                Code:
                #!/usr/bin/env python
                from tkinter import *
                from tkinter import ttk
                import tkinter.messagebox
                import os
                import urllib.request
                 
                sourcecode = ""
                 
                def geturl(*args):  # *args absorbs the event passed by the <Return> binding
                    global storecode
                    path = urlname.get()
                    if path == "":
                        tkinter.messagebox.showinfo("error", "please enter a url")
                        return  # nothing to fetch
                    if not path.startswith(("http://", "https://")):
                        path = "http://" + path
                    with urllib.request.urlopen(path) as url:
                        # read() returns bytes in Python 3, not str
                        sourcecode = url.read()
                    storecode = sourcecode
                    string2 = "source code copied from : " + path
                    tkinter.messagebox.showinfo("copied", string2)
                    Text.delete("1.0", END)  # delete what is currently in the text box
                    Text.insert(END, storecode)
                 
                # count the spaces in the source code
                def count_white_space():
                    global storecode
                    path = urlname.get()
                    if path == "":
                        tkinter.messagebox.showinfo("error", "please enter a url")
                        return  # nothing to fetch

                    if not path.startswith(("http://", "https://")):
                        path = "http://" + path
                    # fetch whether or not the prefix had to be added
                    with urllib.request.urlopen(path) as url:
                        sourcecode = url.readlines()  # list of bytes objects, one per line
                    storecode = sourcecode
                    # count space characters line by line
                    whitespace = sum(line.count(b" ") for line in sourcecode)
                    string1 = "There are " + str(whitespace) + " white spaces in: " + path
                    tkinter.messagebox.showinfo("whitespace", string1)
                 
                 
                app = Tk()
                app.title(" text editor")
                 
                content = ttk.Frame(app, padding=(3,3,12,12))
                content.grid(column=0, row=0,sticky=(N, S, E, W))
                 
                #creat label
                labeltext = StringVar()
                labeltext.set("enter url:")
                label1 = ttk.Label(content, textvariable=labeltext).grid(column=0, row=1, columnspan=1, rowspan=1, sticky=(N, W), padx=5)
                 
                #create text box
                urlname = StringVar()# text being enterd in tht text box is stored in urlname
                 
                urlname_entry = ttk.Entry(content, textvariable=urlname, width=67)
                urlname_entry.grid(column=1, row=1, columnspan=3,rowspan=5, sticky=(N,W) )
                #focus in the text box so user dont have to click on
                urlname_entry.focus()
                #create button
                 
                button1 = ttk.Button(content,text="get source", command=geturl)
                button2 = ttk.Button(content,text="count white spaces", command=count_white_space)
                button1.grid(column=3,row=0, columnspan=1, rowspan=2, sticky=(N,W))
                button2.grid(column=4,row=0,columnspan=2, rowspan=2, sticky=(N,W))
                 
                 
                scroll = tkinter.Scrollbar(content,borderwidth=2)
                Text = tkinter.Text(content,wrap=CHAR, width=50, height=20)
                scrollh = tkinter.Scrollbar(content,borderwidth=2, orient=HORIZONTAL)
                 
                 
                scrollh.config(command=Text.xview)
                Text.config(xscrollcommand=scrollh.set)
                 
                scroll.config(command=Text.yview)
                Text.config(yscrollcommand=scroll.set, wrap=tkinter.NONE,)
                 
                Text.grid(row=2, column=1,columnspan=1, rowspan=3, sticky=(N))
                scroll.grid(row=2,column=2, sticky='ns', rowspan=3)
                scrollh.grid(row=6, rowspan=1, column=1, sticky='ew')
                 
                app.columnconfigure(0, weight=1)
                app.rowconfigure(0, weight=1)
                content.columnconfigure(0, weight=3)
                content.columnconfigure(1, weight=3)
                content.columnconfigure(2, weight=3)
                content.columnconfigure(3, weight=1)
                content.columnconfigure(4, weight=1)
                content.rowconfigure(1, weight=1)
                 
                 
                 
                 
                #text = Text(app, width=80,height=40, wrap='none').grid(row=2, column=2)
                #adds spacing between widgets
                for child in app.winfo_children(): child.grid_configure(padx=5, pady=5)
                for child in content.winfo_children(): child.grid_configure(padx=5, pady=5)
                 
                app.bind('<Return>',geturl) #enter can also be hit
                 
                 
                 
                app.mainloop()
                By the way, I'm using Python 3.2.
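A possible explanation for the difference, offered as a sketch rather than a confirmed diagnosis of the program above: in Python 3, `read()` on the object returned by `urllib.request.urlopen` gives bytes and `readlines()` gives a list of bytes, so turning either into text with `str()` produces the repr (with `\r\n` shown as escape sequences) on a single line. Decoding first gives a str with real newlines. The sample bytes are made up, and the `"utf-8"` charset is an assumption; a page may declare another encoding.

```python
# Made-up three-line page, as bytes, the way read() returns it in Python 3.
raw = b"<html>\r\n<body>hi</body>\r\n</html>\r\n"

# What str() of a readlines()-style result looks like: one long line of repr text.
as_repr = str(raw.splitlines(True))

# Decoding turns the bytes into text that keeps its real line breaks.
as_text = raw.decode("utf-8", errors="replace")

print(as_repr)
print(as_text)
```

In the program above, that would mean passing something like the decoded text to `Text.insert(END, ...)` instead of the raw result of `read()` or `readlines()`.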


                • bvdet
                  Recognized Expert Specialist
                  • Oct 2006
                  • 2851

                  #9
                  Parsing the XML won't help you obtain the output you want. I tried it and the first line was 863 characters long. I am using Python 2.7.2.


                  • moroccanplaya
                    New Member
                    • Jan 2011
                    • 80

                    #10
                    So I'm guessing there is no way around this problem?
