Hi,
I've been trying to create a class that formats text copied and pasted from a Word document into an XML/XHTML-compliant string, complete with paragraphs, to be inserted into a database and, in turn, an RSS feed.
I'm 90% there, but I would like to know whether what I have done is correct and whether it is the best way to do it.
My class is posted below. My reasoning for the CharsetDecode/CharsetEncode round trip was that any characters outside the UTF-8 character encoding would be taken care of. Is this correct? The SQL list is something I lifted from someone else's function to ensure that the string is SQL safe.
Code:
<cfcomponent>
    <cffunction name="CustomParagraphFormatXMLSafe" access="public" returntype="string">
        <cfargument name="paragraph" type="string" required="yes">
        <cfscript>
            /**
             * Returns an XHTML string suitable for insertion into a database in the UTF-8 encoding format.
             * The string is then wrapped with opening and closing paragraph tags whilst ignoring list elements.
             *
             * @param paragraph String you want XHTML / XML formatted.
             * @return Returns a string.
             * @author ****
             * @version 1.0, December 10th, 2008
             */
            var returnValue = '';
            var newParagraph = arguments.paragraph;
            /* Declare the loop variables locally so they do not leak into the variables scope */
            var newParagraphCount = 0;
            var containsList = 0;
            var i = 0;
            /* Strings to neutralise: the SQL comment marker (two hyphens and a space) and the apostrophe */
            var sqlList = "-- ,'";
            /* Numeric character references for hyphen (45) and apostrophe (39), built with chr() */
            var replacementList = "#chr(38)##chr(35)##chr(52)##chr(53)##chr(59)##chr(38)##chr(35)##chr(52)##chr(53)##chr(59)# ,#chr(38)##chr(35)##chr(51)##chr(57)##chr(59)#";
            /* Make SQL safe */
            newParagraph = trim(replaceList(newParagraph, sqlList, replacementList));
            /* Make XML and UTF-8 safe */
            newParagraph = XMLFormat(CharsetEncode(CharsetDecode(newParagraph, "utf-8"), "utf-8"));
            /* Break into paragraphs; CR and LF each act as a list delimiter */
            newParagraph = ListToArray(newParagraph, Chr(13) & Chr(10));
            newParagraphCount = ArrayLen(newParagraph);
            /* Loop through the array of paragraphs, wrapping each in p elements and skipping list elements */
            for (i = 1; i LTE newParagraphCount; i = i + 1) {
                //WriteOutput(newParagraph[i]);
                /* Ignore blank lines */
                if (newParagraph[i] NEQ "") {
                    /* Remove excess paragraph elements; the result must be assigned back */
                    newParagraph[i] = REReplace(newParagraph[i], "</?p(\s[^>]*)?>", "", "All");
                    /* A non-zero position means the line contains a ul or li tag */
                    containsList = REFind("<\/?ul[^>]*>$|<\/?li[^>]*>", newParagraph[i]);
                    if (containsList EQ 0) {
                        returnValue = returnValue & "<p>" & newParagraph[i] & "</p>" & Chr(13) & Chr(10);
                    }
                    else {
                        returnValue = returnValue & newParagraph[i] & Chr(13) & Chr(10);
                    }
                }
            }
            return trim(returnValue);
        </cfscript>
    </cffunction>
</cfcomponent>
Another avenue I considered exploring was to create a large list of incorrect characters ("£# etc.) and then replace them with their chr() equivalents using ReplaceList.
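A minimal sketch of that idea, assuming the usual culprits are Word's curly quotes, dashes, and ellipsis. The function name ReplaceWordCharacters and the character list are placeholders of mine and nowhere near a complete mapping:
Code:
<cffunction name="ReplaceWordCharacters" access="public" returntype="string" output="false">
    <cfargument name="text" type="string" required="yes">
    <cfscript>
        /* Hypothetical, incomplete list: curly single quotes, curly double quotes, en dash, em dash, ellipsis */
        var wordChars = chr(8216) & "," & chr(8217) & "," & chr(8220) & "," & chr(8221) & "," & chr(8211) & "," & chr(8212) & "," & chr(8230);
        /* Plain equivalents, in the same order as wordChars */
        var plainChars = chr(39) & "," & chr(39) & "," & chr(34) & "," & chr(34) & ",-,-,...";
        /* ReplaceList swaps each element of the first list for the element at the same position in the second */
        return replaceList(arguments.text, wordChars, plainChars);
    </cfscript>
</cffunction>
I would presumably call this on arguments.paragraph before the XMLFormat step, so that the plain quotes it produces are escaped along with everything else.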
Any ideas or feedback are welcome.
Thanks,
Chromis