Correct way to format strings for entry into RSS Feed

  • chromis
    New Member
    • Jan 2008
    • 113

    Hi,

    I've been trying to create a class that formats text copied and pasted from a Word document into an XML / XHTML-compliant string, complete with paragraphs, to be inserted into a database and in turn into an RSS feed.
    I'm 90% there, but I would like to know whether what I have done is correct and whether it is the best way to do it.

    Here is my class:

    Code:
    <cfcomponent>
    	<cffunction name="CustomParagraphFormatXMLSafe" access="public" returntype="string">
    		<cfargument name="paragraph" type="string" required="yes">
            
    		<cfscript>
    		/**
    		 * Returns an XHTML string suitable for insertion into a database in the UTF-8 encoding format.
    		 * The string is then wrapped with opening and closing paragraph tags whilst ignoring list elements.
    		 * 
    		 * @param paragraph String you want XHTML / XML formatted. 
    		 * @return Returns a string. 
    		 * @author **** 
    		 * @version 1.0, December 10th, 2008
    		 */
    		 
    		var returnValue = '';
    		var newParagraph = arguments.paragraph;
    		var sqlList = "-- ,'";
    		var replacementList = "#chr(38)##chr(35)##chr(52)##chr(53)##chr(59)##chr(38)##chr(35)##chr(52)##chr(53)##chr(59)# , #chr(38)##chr(35)##chr(51)##chr(57)##chr(59)##chr(163)#";
    		/* Declare the remaining working variables so they stay local to this function */
    		var newParagraphCount = 0;
    		var containsList = 0;
    		var i = 0;
    		
    		/* Make sql safe */
    		newParagraph = trim(replaceList( newParagraph , sqlList , replacementList ));	
    			
    		/* Make XML and UTF-8 Safe */
    		newParagraph = XMLFormat(CharsetEncode(CharsetDecode(newParagraph,"utf-8"),"utf-8"));
    		
    		/* Break into paragraphs */
    		newParagraph = ListToArray(newParagraph,Chr(13) & Chr(10));
    		newParagraphCount = ArrayLen(newParagraph);
    		
    		for(i=1;i LTE newParagraphCount;i=i+1) {
    			
    			//WriteOutput(newParagraph[i]);
    			
    			/* Ignore blank lines */
    			if(newParagraph[i] NEQ "") {
    				
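    				/* Remove excess paragraph elements. REReplace returns a new string, so assign the result back. */
    				/* Note: XMLFormat above has already escaped literal tags, so unescaped <p> tags only reach this point if the escaping step is moved later. */
    				newParagraph[i] = REReplace(newParagraph[i], "</?p(\s[^>]*)?>", "", "All");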
    				  
    				/* Loop through array of paragraphs wrapping in p elements, skipping list elements */
    				containsList = REFind("<\/?ul[^>]*>$|<\/?li[^>]*>",newParagraph[i]); //
    				if(containsList EQ 0) { 
    					returnValue = returnValue & "<p>" & newParagraph[i] & "</p>" & Chr(13) & Chr(10);
    				}
    				else {
    					returnValue = returnValue & newParagraph[i] & Chr(13) & Chr(10);				
    				}
    			}
    		}
    		return trim(returnValue);
    		</cfscript>
    	</cffunction>
    </cfcomponent>
    My reasoning for using the charset encode / decode was that any characters outside the UTF-8 character encoding would be taken care of. Is this correct? The SQL list was something I lifted from someone else's function to ensure that the string is SQL safe.
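
    One way to check what the encode / decode step actually does is to round-trip a sample string and compare it with the original. This is a minimal sketch; the sample text and variable names are made up for illustration. As far as I understand it, a ColdFusion string is already Unicode internally, so this round trip would be expected to hand back the same string rather than clean anything up.

    ```cfm
    <cfscript>
    // Sample text with an e-acute and Word-style curly quotes (hypothetical input).
    original = "Caf" & Chr(233) & " " & Chr(8220) & "quoted" & Chr(8221);

    // CharsetDecode turns the string into UTF-8 bytes;
    // CharsetEncode turns those bytes back into a string.
    roundTripped = CharsetEncode(CharsetDecode(original, "utf-8"), "utf-8");

    // Compare() returns 0 when the two strings are identical.
    if (Compare(original, roundTripped) EQ 0) {
    	WriteOutput("Round trip left the string unchanged");
    } else {
    	WriteOutput("Round trip changed the string");
    }
    </cfscript>
    ```

    If the comparison reports no change for your own Word samples, the step is probably not doing the work it was added for.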

    Another avenue I considered exploring was to create a large list of incorrect characters ("£# etc.) and then replace them with their Chr() equivalents using ReplaceList.
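
    A rough sketch of that ReplaceList idea, limited to a handful of Word "smart" punctuation characters. The character list and the plain replacements chosen here are assumptions for illustration, not a complete mapping.

    ```cfm
    <cfscript>
    // Word "smart" punctuation, built with Chr() so the source file stays plain ASCII:
    // left/right single quotes, left/right double quotes, en dash, em dash, ellipsis.
    wordChars = Chr(8216) & "," & Chr(8217) & "," & Chr(8220) & "," & Chr(8221)
    	& "," & Chr(8211) & "," & Chr(8212) & "," & Chr(8230);

    // Plain replacements, in the same order as wordChars.
    plainChars = "','," & Chr(34) & "," & Chr(34) & ",-,--,...";

    // Hypothetical sample: curly-quoted text ending in an ellipsis character.
    sample = Chr(8220) & "It" & Chr(8217) & "s fine" & Chr(8221) & Chr(8230);
    cleaned = ReplaceList(sample, wordChars, plainChars);
    WriteOutput(cleaned);
    </cfscript>
    ```

    Because ReplaceList treats both arguments as comma-delimited lists, a comma cannot itself appear as a search or replacement item with this approach.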

    Any ideas or feedback are welcome.

    Thanks,

    Chromis
  • acoder
    Recognized Expert MVP
    • Nov 2006
    • 16032

    #2
    I don't have experience with RSS feeds specifically, but the validation does look right.

    Rather than a large list of incorrect characters, how about a list of valid characters or a regular expression?
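
    A minimal sketch of the valid-character approach using a regular expression. The allowed set below (tab, line feed, carriage return, and printable ASCII) is only an assumption for illustration; the real set would depend on what the feed should accept.

    ```cfm
    <cfscript>
    // Allowed set (assumed): tab, line feed, carriage return,
    // and the printable ASCII range from space (32) to tilde (126).
    allowed = Chr(9) & Chr(10) & Chr(13) & Chr(32) & "-" & Chr(126);

    // A negated character class matches any character outside the allowed set.
    disallowedPattern = "[^" & allowed & "]";

    // Hypothetical sample ending in a pound sign, which is outside the set.
    sample = "Cost in " & Chr(163);

    // REFind returns the position of the first match, or 0 when nothing matches.
    firstBad = REFind(disallowedPattern, sample);
    if (firstBad EQ 0) {
    	WriteOutput("All characters are in the allowed set");
    } else {
    	WriteOutput("First disallowed character at position " & firstBad);
    }
    </cfscript>
    ```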

    • chromis
      New Member
      • Jan 2008
      • 113

      #3
      The thing is, the content will be coming from a user (copied and pasted from Word, for instance), so any characters could be input. The ideal solution would therefore be to convert the incorrect characters rather than delete them. Should I carry on down this route?
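
      One possible shape for converting rather than deleting is to rewrite every character above the ASCII range as a numeric character reference. This is a sketch only: the function name encodeNonAscii is hypothetical, and it assumes the result is not passed through XMLFormat afterwards, since that would escape the ampersands again.

      ```cfm
      <cffunction name="encodeNonAscii" access="public" returntype="string" output="false">
      	<cfargument name="text" type="string" required="yes">
      	<cfscript>
      	var result = "";
      	var i = 0;
      	var c = "";
      	var code = 0;

      	for (i = 1; i LTE Len(arguments.text); i = i + 1) {
      		c = Mid(arguments.text, i, 1);
      		code = Asc(c);
      		if (code GT 127) {
      			// "##" is an escaped literal # inside a CFML string.
      			result = result & "&##" & code & ";";
      		} else {
      			result = result & c;
      		}
      	}
      	return result;
      	</cfscript>
      </cffunction>

      <cfscript>
      // Hypothetical usage: the pound sign, Chr(163), is rewritten as a numeric reference.
      WriteOutput(encodeNonAscii("Cost in " & Chr(163)));
      </cfscript>
      ```

      Characters outside the Basic Multilingual Plane are handled one code unit at a time here, so they would need extra care.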

      • acoder
        Recognized Expert MVP
        • Nov 2006
        • 16032

        #4
        Oh, I see. In that case, that sounds right. I was thinking more in terms of validation.
