Is there a way to keep the formatting of HTML source code? For example, when you view the source code of a website, is there any way to keep the same structure? I have tried read() and readlines(), and I also tried saving the source code and then opening it in my program, but I still failed to keep the same format. Is there any way around this problem?
I used read() and readlines(), but the format is not the same.
-
HTML was designed for displaying data, with a focus on how it looks. It won't display the same in a GUI widget as it does in a browser. A browser, such as Google Chrome, uses the HTML tags to interpret the page content. Is that what you are referring to? Maybe you could parse the source code with BeautifulSoup or lxml.html and display only the content.
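A minimal sketch of the "display only the content" idea. It uses the standard library's html.parser instead of BeautifulSoup or lxml.html, purely so that the example has no third-party dependency. The names TextExtractor and extract_text and the sample markup are made up for illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text of a page, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def extract_text(html):
    """Return the text chunks found in an HTML string (hypothetical helper)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


# Made-up sample markup, not taken from any real site.
sample = (
    "<html><head><title>Demo</title><style>p {color: red}</style></head>"
    "<body><p>Hello <b>world</b></p></body></html>"
)
print(extract_text(sample))
```

The resulting list of strings could then be joined with newlines and inserted into a text widget instead of the raw markup.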
For example, this is the first line of the google.com source code, and it is shown on one line:
<!doctype html><html itemscope itemtype="http://schema.org/WebPage"><head> <meta http-equiv="content-type" content="text/html; charset=UTF-8"><meta
I want this to be displayed all on one line, unlike the two lines shown in the above example.
Hope that makes sense.
-
As far as I can tell, you don't actually have an issue.
And if you do have an issue, I cannot reproduce it, since I believe it has nothing to do with Python. Rather, I think it's an issue with the line-width limit of whatever you use to view the output, because in Python it is correctly stored as one line when I use readlines().
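One way to check this without a GUI is to compare read() and readlines() on the same bytes. The sketch below uses io.BytesIO as a stand-in for the HTTP response, and the sample markup is made up, so treat it as an illustration rather than a reproduction of any real page.

```python
import io

# Stand-in for a response body (made-up sample; real pages differ).
raw = b"<!doctype html><html><head>\r\n<title>Demo</title>\r\n</head></html>"

whole = io.BytesIO(raw).read()       # a single bytes object
lines = io.BytesIO(raw).readlines()  # a list of bytes objects, one per line

# Both calls hold the same data; only the container differs.
print(len(lines))
print(b"".join(lines) == whole)

# The first source line is still one line, however long it is.
text = whole.decode("utf-8")
print(text.splitlines()[0])
```

If the joined lines equal the result of read(), the line structure is intact in Python and any extra line breaks come from how the viewer wraps long lines.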
-
Plus, readline outputs the whole source code on one line, which I do not want.
-
To bvdet: I have downloaded the lxml parser. How would you parse the URL?
To dwlabs: I have trouble explaining this, so here is my full code. If you run it and type in any URL, it will get the source code and display it in a text box. When you compare the source code from the website with the source code copied into the text box, the format is different. The only exception is python.org.
Code:
#!/usr/bin/env python
from tkinter import *
from tkinter import ttk
import tkinter.messagebox
import urllib.request

storecode = ""


def geturl(*args):  # accepts the event argument passed by the <Return> binding
    global storecode
    path = urlname.get()
    if path == "":
        tkinter.messagebox.showinfo("error", "please enter a url")
        return  # nothing to fetch
    if not path.startswith(("http://", "https://")):
        path = "http://" + path
    with urllib.request.urlopen(path) as url:
        # read() returns bytes; decode to str before displaying it
        # (UTF-8 is an assumption, the page may declare another charset)
        storecode = url.read().decode("utf-8", errors="replace")
    tkinter.messagebox.showinfo("copied", "source code copied from : " + path)
    Text.delete("1.0", END)  # delete what is currently in the text box
    Text.insert(END, storecode)


# find white spaces in source code
def count_white_space():
    global storecode
    path = urlname.get()
    if path == "":
        tkinter.messagebox.showinfo("error", "please enter a url")
        return
    if not path.startswith(("http://", "https://")):
        path = "http://" + path
    with urllib.request.urlopen(path) as url:
        storecode = url.read().decode("utf-8", errors="replace")
    # count spaces in the decoded text, not in the repr of a list of bytes
    whitespace = storecode.count(" ")
    string1 = "There are " + str(whitespace) + " white spaces in: " + path
    tkinter.messagebox.showinfo("whitespace", string1)


app = Tk()
app.title(" text editor")

content = ttk.Frame(app, padding=(3, 3, 12, 12))
content.grid(column=0, row=0, sticky=(N, S, E, W))

# create label
labeltext = StringVar()
labeltext.set("enter url:")
label1 = ttk.Label(content, textvariable=labeltext)
label1.grid(column=0, row=1, columnspan=1, rowspan=1, sticky=(N, W), padx=5)

# create entry box; the text entered is stored in urlname
urlname = StringVar()
urlname_entry = ttk.Entry(content, textvariable=urlname, width=67)
urlname_entry.grid(column=1, row=1, columnspan=3, rowspan=5, sticky=(N, W))
urlname_entry.focus()  # focus the entry so the user does not have to click on it

# create buttons
button1 = ttk.Button(content, text="get source", command=geturl)
button2 = ttk.Button(content, text="count white spaces", command=count_white_space)
button1.grid(column=3, row=0, columnspan=1, rowspan=2, sticky=(N, W))
button2.grid(column=4, row=0, columnspan=2, rowspan=2, sticky=(N, W))

# text box with scrollbars; wrap=NONE keeps each source line on one line
scroll = tkinter.Scrollbar(content, borderwidth=2)
scrollh = tkinter.Scrollbar(content, borderwidth=2, orient=HORIZONTAL)
Text = tkinter.Text(content, wrap=NONE, width=50, height=20)
scrollh.config(command=Text.xview)
scroll.config(command=Text.yview)
Text.config(xscrollcommand=scrollh.set, yscrollcommand=scroll.set)
Text.grid(row=2, column=1, columnspan=1, rowspan=3, sticky=(N))
scroll.grid(row=2, column=2, sticky='ns', rowspan=3)
scrollh.grid(row=6, rowspan=1, column=1, sticky='ew')

app.columnconfigure(0, weight=1)
app.rowconfigure(0, weight=1)
content.columnconfigure(0, weight=3)
content.columnconfigure(1, weight=3)
content.columnconfigure(2, weight=3)
content.columnconfigure(3, weight=1)
content.columnconfigure(4, weight=1)
content.rowconfigure(1, weight=1)
#text = Text(app, width=80,height=40, wrap='none').grid(row=2, column=2)

# add spacing between widgets
for child in app.winfo_children():
    child.grid_configure(padx=5, pady=5)
for child in content.winfo_children():
    child.grid_configure(padx=5, pady=5)

app.bind('<Return>', geturl)  # Enter can also be hit
app.mainloop()
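In Python 3, urllib.request.urlopen(...).read() returns bytes rather than str, which is one possible reason the text box does not match the browser's "view source". Below is a minimal sketch of decoding the bytes and normalising line endings before handing the text to a widget. The helper name to_display_text, the default UTF-8 encoding, and the sample bytes are assumptions for illustration; a real page may declare a different charset.

```python
def to_display_text(raw, encoding="utf-8"):
    """Decode response bytes and normalise line endings (hypothetical helper)."""
    text = raw.decode(encoding, errors="replace")
    # Convert Windows (\r\n) and old Mac (\r) line endings to \n.
    return text.replace("\r\n", "\n").replace("\r", "\n")


# Made-up sample bytes standing in for url.read().
raw = b"<html>\r\n<body>caf\xc3\xa9</body>\r\n</html>"

# str() on bytes gives the repr, with escape sequences instead of line breaks.
print(str(raw))

# Decoding gives real text with real line breaks.
print(to_display_text(raw))
```

The decoded string is what would be passed to the text widget's insert() call.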