Hey,
I need to parse some information from a page. Unfortunately, it's contained in a JavaScript variable, which a function then uses to generate some sort of GUI.
Anyway, here is the piece of script extracted from the page (it's confusing, sorry :/):
Code:
<script language="JavaScript"><!--
var myMenu =
[
['<span class="JSCookTreeFolderClosed"><i><img alt="" src="' + ctThemeXPBase + 'folder1.gif" /></i></span><span class="JSCookTreeFolderOpen"><i><img alt="" src="' + ctThemeXPBase + 'folderopen1.gif" /></i></span>', '/', '#', '', '',
['<span class="JSCookTreeFolderClosed"><i><img alt="" src="' + ctThemeXPBase + 'folder1.gif" /></i></span><span class="JSCookTreeFolderOpen"><i><img alt="" src="' + ctThemeXPBase + 'folderopen1.gif"></i></span>', 'Cat1', '#', '', '',
['<span class="JSCookTreeFolderClosed"><i><img alt="" src="' + ctThemeXPBase + 'folder1.gif" /></i></span><span class="JSCookTreeFolderOpen"><i><img alt="" src="' + ctThemeXPBase + 'folderopen1.gif"></i></span>', 'Cat1.1', '#', '', '',,['','TitleDoc1.1 (Doc1.1.txt)','https://***/docman/view.php/22/9/Doc1.1.txt','','DescDoc1.1 10Char' ],
],
['<span class="JSCookTreeFolderClosed"><i><img alt="" src="' + ctThemeXPBase + 'folder1.gif" /></i></span><span class="JSCookTreeFolderOpen"><i><img alt="" src="' + ctThemeXPBase + 'folderopen1.gif"></i></span>', 'Cat1.2', '#', '', '',,
],,['','Agenda1 (Dummy.txt)','https://***/docman/view.php/22/14/Dummy.txt','','DescriptionAgenda1' ],['','TitleDoc1 (Doc1.txt)','https://***/docman/view.php/22/8/Doc1.txt','','DescDoc1 10 char' ],
],
['<span class="JSCookTreeFolderClosed"><i><img alt="" src="' + ctThemeXPBase + 'folder1.gif" /></i></span><span class="JSCookTreeFolderOpen"><i><img alt="" src="' + ctThemeXPBase + 'folderopen1.gif"></i></span>', 'Cat2', '#', '', '',,['','TitleDoc2 (Doc2.txt)','https://***/docman/view.php/22/10/Doc2.txt','','DescriptionDoc2' ],['','TitleDoc2.1 (Doc2.1.txt)','https://***/docman/view.php/22/12/Doc2.1.txt','','DescriptionDoc2.1' ],['','TitleDoc2.2 (Doc2.2.txt)','https://***/docman/view.php/22/11/Doc2.2.txt','','DescriptionDoc2.2' ],
],
['<span class="JSCookTreeFolderClosed"><i><img alt="" src="' + ctThemeXPBase + 'folder1.gif" /></i></span><span class="JSCookTreeFolderOpen"><i><img alt="" src="' + ctThemeXPBase + 'folderopen1.gif"></i></span>', 'Uncategorized Submissions', '#', '', '',,['','uncategDocument (DocPasCateg.txt)','https://***/docman/view.php/22/13/DocPasCateg.txt','','DescriptionUncatDoc' ],
], ]
];
ctDraw ('myMenuID', myMenu, ctThemeXP1, 'ThemeXP', 0, 1);
--></script>
I need to extract some information from here, like the category names (Cat1, Cat1.1, Cat2...) and the documents inside these categories (
Is there any way to retrieve this variable from Python without writing lots of complex regexes? I'd like to navigate it as some sort of list, which would make it much easier to parse, imo.
Thanks :)
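For what it's worth, here is a rough sketch of one possible approach, not a tested solution: a tiny hand-written parser that turns the array literal into nested Python lists. It assumes the literal only contains quoted strings, nested arrays, and `+` concatenation with bare identifiers such as ctThemeXPBase; empty slots (`,,`) and trailing commas are simply skipped. The names extract_literal, parse_js_array, walk and SAMPLE are made up for the example, SAMPLE is a shortened version of the script above, and treating '#' as "this is a category" and index 5 onwards as children is only a guess from that data.

```python
import re

# One token: a quoted string, a bare identifier, or one of [ ] , +
TOKEN_RE = re.compile(r"""\s*(?:
      (?P<str>'(?:\\.|[^'\\])*'|"(?:\\.|[^"\\])*")
    | (?P<name>[A-Za-z_$][\w$]*)
    | (?P<punct>[\[\],+])
    )""", re.VERBOSE)


def tokenize(src):
    src = src.strip()
    pos = 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise ValueError(f"cannot tokenize at offset {pos}: {src[pos:pos + 20]!r}")
        yield m.lastgroup, m.group(m.lastgroup)
        pos = m.end()


def unquote(text):
    # Simplistic: drop the quotes and keep the character after each backslash.
    return re.sub(r"\\(.)", r"\1", text[1:-1])


def extract_literal(page_source, var_name):
    """Return the text of `var <var_name> = [...];` (assumes no "];" inside strings)."""
    m = re.search(r"var\s+" + re.escape(var_name) + r"\s*=\s*(\[.*?\])\s*;",
                  page_source, re.DOTALL)
    if m is None:
        raise ValueError(f"variable {var_name!r} not found")
    return m.group(1)


def parse_js_array(src, variables=None):
    """Parse an array literal of strings into nested lists; unknown identifiers become ''."""
    variables = variables or {}
    tokens = list(tokenize(src))
    pos = 0

    def parse_value():
        nonlocal pos
        if tokens[pos] == ("punct", "["):
            pos += 1
            items = []
            while tokens[pos] != ("punct", "]"):
                if tokens[pos] == ("punct", ","):
                    pos += 1          # separator, empty slot (",,") or trailing comma
                else:
                    items.append(parse_value())
            pos += 1                  # consume the closing "]"
            return items
        parts = []                    # string/identifier operands joined by "+"
        while True:
            kind, text = tokens[pos]
            if kind == "str":
                parts.append(unquote(text))
            elif kind == "name":
                parts.append(variables.get(text, ""))
            else:
                raise ValueError(f"unexpected token {text!r}")
            pos += 1
            if pos < len(tokens) and tokens[pos] == ("punct", "+"):
                pos += 1
            else:
                return "".join(parts)

    result = parse_value()
    if pos != len(tokens):
        raise ValueError("trailing tokens after the array literal")
    return result


def walk(node, path=()):
    """Yield (path, title, url) for every node; here url == '#' means a category."""
    title, url = node[1], node[2]
    yield path, title, url
    for child in node[5:]:            # assumption: fields 0-4 are fixed, the rest are children
        if isinstance(child, list):
            yield from walk(child, path + (title,))


SAMPLE = r"""
var myMenu =
[
['<img src="' + ctThemeXPBase + 'folder1.gif" />', '/', '#', '', '',
['', 'Cat1', '#', '', '',
['', 'Cat1.1', '#', '', '',,['','TitleDoc1.1 (Doc1.1.txt)','https://***/docman/view.php/22/9/Doc1.1.txt','','DescDoc1.1 10Char' ],
],,['','TitleDoc1 (Doc1.txt)','https://***/docman/view.php/22/8/Doc1.txt','','DescDoc1 10 char' ],
],
['', 'Cat2', '#', '', '',,
],
]
];
"""

if __name__ == "__main__":
    menu = parse_js_array(extract_literal(SAMPLE, "myMenu"))
    for path, title, url in walk(menu[0]):
        kind = "category" if url == "#" else "document"
        print(kind, "/".join(path + (title,)), url)
```

The parser is deliberately lenient about commas, so it won't complain about slightly malformed input, and it will raise an IndexError on unbalanced brackets. If the page ever uses numbers, objects or function calls inside the array, this sketch would need extending.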