Safely cut off short preview version of long string

  • Hans Gruber

    Hi all,

    Here's a problem I have been working on for a while, but can't seem to
    solve satisfactorily.

    I have a database with blog entries. Because each of those entries has a
    variable length which can be quite long, I want to build an overview page.
    Of each entry there will be a preview version, say 700 characters max.

    My problem has to do with HTML tags. If for example an entry contains a
    <BLOCKQUOTE> with a large quote, my function would break off somewhere
    halfway in the quote. The end result of course won't have the closing
    </BLOCKQUOTE>, which breaks the layout of the resulting page.

    I would like to build a function that breaks a string up to max X
    characters long, but plays it safe when it encounters any HTML tag: it
    does not matter if the end result is a string of say 670 characters long,
    it only matters that it approximates the max character setting and doesn't
    mess up the HTML tags.

    Can anyone point me in the right direction?

    Hans
  • Peter Fox

    #2
    Re: Safely cut off short preview version of long string

    Following on from Hans Gruber's message...
    >My problem has to do with HTML tags. If for example an entry contains a
    ><BLOCKQUOTE> with a large quote, my function would break off somewhere
    >halfway in the quote. The end result of course won't have the
    ></BLOCKQUOTE>, rendering the resulting page horribly bad.
    >
    >I would like to build a function that breaks a string up to max X
    >characters long, but plays it safe when it encounters any HTML tag: it
    >does not matter if the end result is a string of say 670 characters long,
    >it only matters that it approximates the max character setting and doesn't
    >mess up the HTML tags.

    A simple way would be to decide roughly where your end point is going to
    be (not inside <...>), then keep all the remaining tags but remove the
    text between them.

    The reason for putting all the following tags in is that you can have
    complex nested structures where you'd have to do lots of complicated
    parsing - just not worth the effort. Also the entry could start with
    say <center> and end with </center> many pages apart.


    E.g.:
    1 - Split the string into the first X chars and the remainder, then work
    with the remainder.
    2 - Explode the remainder by '<' so that every element _except possibly
    array[0]_ starts with a tag and therefore looks like "ATAG>some text" (or
    "/ATAG>some text").
    3 - If array[0] contains a '>', the cut landed inside a tag and array[0]
    starts with the tail of that tag: keep everything up to and including
    the '>'. Otherwise array[0] is plain text and can be dropped.
    (NB /sort of/ two special cases: with no more tags at all, array[0] is
    the only element and has no '>'; and when the cut tag is followed
    immediately by another tag, the '>' is the last character of array[0].)
    4 - Now strip the bits after '>' from each array element, implode with
    '<' and add to the end of the text.
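
    The four steps above can be sketched roughly as follows. This is only an
    illustration, written in Python, with str.split and str.join standing in
    for the explode and implode the steps mention. The function name
    cut_preview and its parameters are made up for this example, and it
    assumes the entry text contains no bare '<' or '>' outside of tags.

```python
def cut_preview(html: str, max_chars: int) -> str:
    """Keep the first max_chars characters, then every later tag without its text."""
    # Step 1: split into the head we keep and the remainder we strip.
    head, rest = html[:max_chars], html[max_chars:]

    # Step 2: split the remainder on '<'. Every element except possibly
    # parts[0] now looks like "TAG>some text" or "/TAG>some text".
    parts = rest.split("<")

    # Step 3: a '>' in parts[0] means the cut landed inside a tag, so keep
    # the tail of that tag. Otherwise parts[0] is plain text and is dropped.
    first = parts[0]
    tag_tail = first[: first.index(">") + 1] if ">" in first else ""

    # Step 4: strip everything after '>' from the remaining elements and
    # glue them back together, restoring the '<' that split() removed.
    tags = [p[: p.index(">") + 1] for p in parts[1:] if ">" in p]
    return head + tag_tail + "".join("<" + t for t in tags)


if __name__ == "__main__":
    entry = "<p>Hello <b>big</b> world</p><blockquote>long quote here</blockquote>"
    print(cut_preview(entry, 12))
```

    Because every later tag is kept, tags that render without any text (an
    <img> or <hr>, say) would still show up in the preview, so those may
    need to be filtered out separately.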

    --
    PETER FOX Not the same since the pancake business flopped
    peterfox@eminent.demon.co.uk.not.this.bit.no.html
    2 Tees Close, Witham, Essex.
    Gravity beer in Essex <http://www.eminent.demon.co.uk>
