Extracting a JavaScript var from an HTML page?

  • Hydex
    New Member
    • May 2010
    • 1

    Hey,

    I need to parse some information from a page. Unfortunately, it is contained in a JavaScript variable, which a function then uses to generate some sort of GUI.

    Anyway, here is the piece of script extracted from the page:
    Code:
    <script language="JavaScript"><!--
    	var myMenu =
    	[
    		['<span class="JSCookTreeFolderClosed"><i><img alt="" src="' + ctThemeXPBase + 'folder1.gif" /></i></span><span class="JSCookTreeFolderOpen"><i><img alt="" src="' + ctThemeXPBase + 'folderopen1.gif" /></i></span>', '/', '#', '', '', 
    		['<span class="JSCookTreeFolderClosed"><i><img alt="" src="' + ctThemeXPBase + 'folder1.gif" /></i></span><span class="JSCookTreeFolderOpen"><i><img alt="" src="' + ctThemeXPBase + 'folderopen1.gif"></i></span>', 'Cat1', '#', '', '',
    		['<span class="JSCookTreeFolderClosed"><i><img alt="" src="' + ctThemeXPBase + 'folder1.gif" /></i></span><span class="JSCookTreeFolderOpen"><i><img alt="" src="' + ctThemeXPBase + 'folderopen1.gif"></i></span>', 'Cat1.1', '#', '', '',,['','TitleDoc1.1 (Doc1.1.txt)','https://***/docman/view.php/22/9/Doc1.1.txt','','DescDoc1.1 10Char' ],
    		],
    		['<span class="JSCookTreeFolderClosed"><i><img alt="" src="' + ctThemeXPBase + 'folder1.gif" /></i></span><span class="JSCookTreeFolderOpen"><i><img alt="" src="' + ctThemeXPBase + 'folderopen1.gif"></i></span>', 'Cat1.2', '#', '', '',,
    		],,['','Agenda1 (Dummy.txt)','https://***/docman/view.php/22/14/Dummy.txt','','DescriptionAgenda1' ],['','TitleDoc1 (Doc1.txt)','https://***/docman/view.php/22/8/Doc1.txt','','DescDoc1 10 char' ],
    		],
    		['<span class="JSCookTreeFolderClosed"><i><img alt="" src="' + ctThemeXPBase + 'folder1.gif" /></i></span><span class="JSCookTreeFolderOpen"><i><img alt="" src="' + ctThemeXPBase + 'folderopen1.gif"></i></span>', 'Cat2', '#', '', '',,['','TitleDoc2 (Doc2.txt)','https://***/docman/view.php/22/10/Doc2.txt','','DescriptionDoc2' ],['','TitleDoc2.1 (Doc2.1.txt)','https://***/docman/view.php/22/12/Doc2.1.txt','','DescriptionDoc2.1' ],['','TitleDoc2.2 (Doc2.2.txt)','https://***/docman/view.php/22/11/Doc2.2.txt','','DescriptionDoc2.2' ],
    		],
    		['<span class="JSCookTreeFolderClosed"><i><img alt="" src="' + ctThemeXPBase + 'folder1.gif" /></i></span><span class="JSCookTreeFolderOpen"><i><img alt="" src="' + ctThemeXPBase + 'folderopen1.gif"></i></span>', 'Uncategorized Submissions', '#', '', '',,['','uncategDocument (DocPasCateg.txt)','https://***/docman/view.php/22/13/DocPasCateg.txt','','DescriptionUncatDoc' ],
    		],		]
    	];
    	ctDraw ('myMenuID', myMenu, ctThemeXP1, 'ThemeXP', 0, 1);
    	--></script>
    It's confusing, sorry :/
    I need to extract some information from it, such as the category names (Cat1, Cat1.1, Cat2...) and the documents inside these categories (
    Is there any way to retrieve this variable from Python without writing many complex regexps? I'd like to navigate it as some sort of list, which would make things much easier to parse imo.

    Thanks :)
  • Glenton
    Recognized Expert Contributor
    • Nov 2008
    • 391

    #2
    I think regular expressions were more or less designed for this kind of thing. Embrace them! They will (slowly) become your friends...
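
For what it's worth, a couple of small regular expressions may be enough to turn the variable into a nested Python list, rather than one pattern per field. The sketch below is only that, a sketch: it assumes the array contains nothing but single-quoted strings, nested arrays, `'...' + someVar + '...'` concatenations and elided elements (`,,`), as in the script posted above. The helper names `parse_menu`, `iter_categories` and `iter_documents` are made up for illustration, `SAMPLE` is a shortened copy of the posted script, and the value of `ctThemeXPBase` is simply dropped.

```python
import ast
import re

# Shortened copy of the script from the question (icon HTML simplified).
SAMPLE = r"""
<script language="JavaScript"><!--
var myMenu =
[
    ['<i><img src="' + ctThemeXPBase + 'folder1.gif" /></i>', '/', '#', '', '',
    ['<i><img src="' + ctThemeXPBase + 'folder1.gif" /></i>', 'Cat1', '#', '', '',
    ['<i><img src="' + ctThemeXPBase + 'folder1.gif" /></i>', 'Cat1.1', '#', '', '',,['','TitleDoc1.1 (Doc1.1.txt)','https://***/docman/view.php/22/9/Doc1.1.txt','','DescDoc1.1' ],
    ],
    ['<i><img src="' + ctThemeXPBase + 'folder1.gif" /></i>', 'Cat1.2', '#', '', '',,
    ],,['','TitleDoc1 (Doc1.txt)','https://***/docman/view.php/22/8/Doc1.txt','','DescDoc1' ],
    ],
    ]
];
ctDraw ('myMenuID', myMenu, ctThemeXP1, 'ThemeXP', 0, 1);
--></script>
"""


def parse_menu(html, var_name="myMenu"):
    """Return the JavaScript array assigned to var_name as nested Python lists."""
    # Grab everything from the opening '[' up to the first ']' followed by ';'.
    pattern = r"var\s+" + re.escape(var_name) + r"\s*=\s*(\[.*?\])\s*;"
    match = re.search(pattern, html, re.S)
    if match is None:
        raise ValueError("variable %r not found" % var_name)
    text = match.group(1)
    # Join 'abc' + someVar + 'def' into 'abcdef' (the variable's value is lost).
    text = re.sub(r"'\s*\+\s*\w+\s*\+\s*'", "", text)
    # Collapse elided elements (",,"), which Python's literal syntax rejects.
    text = re.sub(r",(?:\s*,)+", ",", text)
    # What is left is assumed to be a valid Python list literal.
    return ast.literal_eval(text)


def iter_categories(node, path=()):
    """Yield the path of every folder; folders use '#' as their link here."""
    if node[2] == "#":
        here = path + (node[1],)
        yield here
        for child in node[5:]:
            if isinstance(child, list):
                yield from iter_categories(child, here)


def iter_documents(node, path=()):
    """Yield (category_path, title, url, description) for every document."""
    name, link = node[1], node[2]
    if link == "#":
        for child in node[5:]:
            if isinstance(child, list):
                yield from iter_documents(child, path + (name,))
    else:
        yield path, name, link, node[4]


menu = parse_menu(SAMPLE)
for root in menu:
    for category in iter_categories(root):
        print("category:", " > ".join(category))
    for path, title, url, description in iter_documents(root):
        print("document:", " > ".join(path), "|", title, "|", url)
```

Once `parse_menu` has returned, `menu` is an ordinary list of lists, so it can also be indexed directly (`menu[0][1]` is the root folder's name). If the real page uses escapes or string contents that Python's literal syntax does not accept, `ast.literal_eval` will raise an error and the clean-up patterns would need adjusting.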
