Hi everyone, I want to scrape the book titles from
http://search.dangdang.com/search_pub.php?key=python
My code is:
Code:
import urllib
import lxml.html

down = 'http://search.dangdang.com/search_pub.php?key=python'
file = urllib.urlopen(down).read()  # Python 2 urllib
root = lxml.html.fromstring(file)
# select every title link in the search result list
tnodes = root.xpath("//div[@class='listitem detail']//li[@class='maintitle']//a")
for i, x in enumerate(tnodes):
    print i, " ", x.get('name'), x.get('href'), x.get('onclick'), x.text, "\n"
The output is:
0 p_name http://product.dangdang.com/product.aspx?product_id=20872365&ref=search-1-pub s('click','python','01.54.06.18','','86_1_25','','','20872365_1_22591_p','','',''); None
1 p_name http://product.dangdang.com/product.aspx?product_id=20255354&ref=search-1-pub s('click','python','01.54.06.18','','86_1_25','','','20255354_2_12605_p','','',''); None
2 p_name http://product.dangdang.com/product.aspx?product_id=20836565&ref=search-1-pub s('click','python','01.54.06.18','','86_1_25','','','20836565_3_2361_p','','',''); None
3 p_name http://product.dangdang.com/product.aspx?product_id=21004615&ref=search-1-pub s('click','python','01.54.06.18','','86_1_25','','','21004615_4_3387_p','','',''); None
4 p_name http://product.dangdang.com/product.aspx?product_id=21063086&ref=search-1-pub s('click','python','01.54.06.18','','86_1_25','','','21063086_5_18815_p','','',''); None
5 pr_name http://product.dangdang.com/product.aspx?product_id=20678461&ref=search-1-pub s('click','python','01.54.04.03,01.54.06.18','','86_1_25','','','20678461_6_3967_p','','','RECO'); None
6 pr_name http://product.dangdang.com/product.aspx?product_id=20650363&ref=search-1-pub s('click','python','01.54.19.00','','86_1_25','','','20650363_7_62_p','','','RECO'); 黑客之道:漏洞发掘的艺术(原书第二版)(赠1CD)(电子制品 CD-ROM)(
7 pr_name http://product.dangdang.com/product.aspx?product_id=20767932&ref=search-1-pub s('click','python','01.54.19.00','','86_1_25','','','20767932_8_4475_p','','','RECO'); Binary Hacks――黑客秘笈100选
8 p_name http://product.dangdang.com/product.aspx?product_id=20596189&ref=search-1-pub s('click','python','01.54.06.18','','86_1_25','','','20596189_9_639_p','','',''); None
9 p_name http://product.dangdang.com/product.aspx?product_id=20947680&ref=search-1-pub s('click','python','01.54.24.00,01.54.06.18','','86_1_25','','','20947680_10_7295_p','','',''); None
10 p_name http://product.dangdang.com/product.aspx?product_id=21050368&ref=search-1-pub s('click','python','01.54.19.00','','86_1_25','','','21050368_11_7039_p','','',''); None
11 p_name http://product.dangdang.com/product.aspx?product_id=20667966&ref=search-1-pub s('click','python','01.54.06.18','','86_1_25','','','20667966_12_383_p','','',''); None
12 p_name http://product.dangdang.com/product.aspx?product_id=21022493&ref=search-1-pub s('click','python','01.54.06.18','','86_1_25','','','21022493_13_5183_p','','',''); None
13 pr_name http://product.dangdang.com/product.aspx?product_id=479654&ref=search-1-pub s('click','python','01.54.06.08,01.54.06.18','','86_1_25','','','479654_14_2095_p','','','RECO'); Perl语言编程(第三版)
14 pr_name http://product.dangdang.com/product.aspx?product_id=20999855&ref=search-1-pub s('click','python','01.54.10.00','','86_1_25','','','20999855_15_6715_p','','','RECO'); 程序员的思维修炼:开发认知潜能的九堂课
15 pr_name http://product.dangdang.com/product.aspx?product_id=20696203&ref=search-1-pub s('click','python','01.54.06.08','','86_1_25','','','20696203_16_31615_p','','','RECO'); Perl语言入门(第五版)(原书名:Learning Perl,5/e)
16 p_name http://product.dangdang.com/product.aspx?product_id=20670643&ref=search-1-pub s('click','python','01.54.06.18','','86_1_25','','','20670643_17_24_p','','',''); 可爱的
17 p_name http://product.dangdang.com/product.aspx?product_id=20362210&ref=search-1-pub s('click','python','01.54.06.18','','86_1_25','','','20362210_18_32_p','','',''); 学习
18 p_name http://product.dangdang.com/product.aspx?product_id=9053236&ref=search-1-pub s('click','python','01.54.06.18','','86_1_25','','','9053236_19_4_p','','',''); 学习
19 p_name http://product.dangdang.com/product.aspx?product_id=20850780&ref=search-1-pub s('click','python','01.54.06.18','','86_1_25','','','20850780_20_1055_p','','',''); None
20 pr_name http://product.dangdang.com/product.aspx?product_id=20449068&ref=search-1-pub s('click','python','01.54.06.08','','86_1_25','','','20449068_21_38_p','','','RECO'); 精通Perl
21 p_name http://product.dangdang.com/product.aspx?product_id=21127816&ref=search-1-pub s('click','python','01.54.24.00,01.54.06.18','','86_1_25','','','21127816_22_12545_p','','',''); None
22 p_name http://product.dangdang.com/product.aspx?product_id=21107633&ref=search-1-pub s('click','python','01.54.06.18','','86_1_25','','','21107633_23_19245_p','','',''); Hadoop权威指南(第2版) 修订升级版
23 None http://bang.dangdang.com/product_redirect.php?product_id=9317290 None None
24 p_name http://product.dangdang.com/product.aspx?product_id=9317290&ref=search-1-pub s('click','python','01.54.06.06,01.49.01.11,01.54.26.00','','86_1_25','','','9317290_24_81727_p','','',''); Java编程思想(第4版)
25 p_name http://product.dangdang.com/product.aspx?product_id=20773186&ref=search-1-pub s('click','python','01.54.06.17','','86_1_25','','','20773186_25_80479_p','','',''); Android应用开发揭秘
The problem is x.text. For example:
1.
<a name="p_name" target="_blank" href="http://product.dangdang.com/product.aspx?product_id=20872365&ref=search-1-pub" onclick="s('click','python','01.54.06.18','','86_1_25','','','20872365_1_22591_p','','','');">
<font class="skcolor_ljg">Python</font>
基础教程(第2版)
</a>
What I want to get is "Python 基础教程(第2版)", but the output is None.
2.
<a name="p_name" target="_blank" href="http://product.dangdang.com/product.aspx?product_id=20670643&ref=search-1-pub" onclick="s('click','python','01.54.06.18','','86_1_25','','','20670643_17_24_p','','','');">
可爱的
<font class="skcolor_ljg">Python</font>
</a>
What I want to get is "可爱的Python", but the output is 可爱的.
Could you tell me how to revise my code?
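A likely explanation, sketched here rather than tested against the live page: in the ElementTree model that lxml follows, x.text holds only the text that comes before the first child element. Text after a child is stored in that child's .tail. The sketch below uses the standard library's xml.etree.ElementTree (Python 3) on simplified copies of the two anchors. The href and onclick attributes are dropped, and the helper name full_text is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Simplified copies of the two anchors from the post (href/onclick dropped).
first = ET.fromstring(
    '<a name="p_name"><font class="skcolor_ljg">Python</font>基础教程(第2版)</a>'
)
second = ET.fromstring(
    '<a name="p_name">可爱的<font class="skcolor_ljg">Python</font></a>'
)


def full_text(element):
    """Join every text fragment inside element: its own text, child text, and tails."""
    return "".join(element.itertext()).strip()


print(first.text)          # None: <a> starts directly with <font>
print(first[0].tail)       # 基础教程(第2版): stored as the tail of <font>
print(full_text(first))    # Python基础教程(第2版)
print(second.text)         # 可爱的: only the text before <font>
print(full_text(second))   # 可爱的Python
```

In the original loop, elements parsed with lxml.html also provide x.text_content(), which returns the same joined text, so replacing x.text with x.text_content() should print the full titles.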