Simple question on string extraction

  • Epetruk

    Simple question on string extraction

    Hi,

    I'm having to modify a PHP script even though I know little PHP. The
    script extracts specific strings from an HTML file, and I need it to
    extract some further information.

    Specifically, each file represents an article written by an author. The
    author's name is typically preceded by 'By' or 'by' and runs up to the
    end of the line (the carriage return).

    So for example, the file might contain something like this:


    The Need For Regeneration

    by <b>John Smith</b>

    We have seen the waste that has been produced....

    (rest of article)


    or


    How To Make Lots and Lots of Money Writing PHP

    by The Supreme Coder

    The first thing you need to know about making money is...

    (rest of article)


    So I need code that will search the file from the beginning for the
    words 'by ' or 'By ', then grab everything that follows up to the end of
    that line and assign it to a variable. In the examples I have given
    above, it would grab '<b>John Smith</b>' and 'The Supreme Coder'. I've seen
    a function called preg_match() which might do the job, but it uses regular
    expressions, which I know little about.

    Would someone be kind enough to post the arguments I would need to call
    this function with?
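    For reference, here is a minimal preg_match() sketch of what is being
    asked for. It assumes the byline starts a line of its own, as in both
    examples above; the helper name extract_author() and the sample text are
    illustrative, not part of the original script.

```php
<?php
// Hypothetical helper: return the text after a leading "by "/"By " up to
// the end of that line, or null if no such line exists.
function extract_author(string $html): ?string
{
    // Pattern flags: i = case-insensitive (matches "by" and "By"),
    // m = ^ matches at the start of every line, not only of the whole string.
    // [ \t]* allows indentation; [ \t]+ requires a space or tab after "by",
    // so words like "bypass" are not matched; ([^\r\n]+) captures the rest
    // of the line without its line break.
    if (preg_match('/^[ \t]*by[ \t]+([^\r\n]+)/im', $html, $m)) {
        return trim($m[1]); // $m[1] holds the first capture group
    }
    return null;
}

$page = "The Need For Regeneration\n\nby <b>John Smith</b>\n\nWe have seen the waste...";
echo extract_author($page), "\n"; // prints <b>John Smith</b>
```

    Wrap the result in strip_tags() if you want "John Smith" without the bold
    tags. If the byline can appear mid-line (for example inside a <p> tag),
    replacing ^[ \t]* with \b lets the pattern match the word "by" anywhere,
    at the cost of possibly catching an earlier "by" in ordinary text.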

    TIA,

    --
    Akin

    aknak at aksoto dot idps dot co dot uk


  • WombatBoy

    #2
    Re: Simple question on string extraction

    I've been doing something similar myself, but wanted to avoid the chance of
    getting an accidental early string match.

    The strpos() function will let you locate a string within another string
    (I'm assuming here that you've got the whole HTML page as a single string),
    and, if required, you can give it a starting position to search from.
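    A minimal sketch of strpos() with a starting offset, assuming the page is
    already in $rec and has a closing </head> tag; find_byline_offset() and
    the sample page are hypothetical. Note the strict === false comparison:
    strpos() returns 0 for a match at the very start of the string, which a
    loose check would mistake for "not found".

```php
<?php
// Hypothetical helper: locate " by " only after the end of the HTML head,
// so an early match (say, in the <title>) is skipped.
function find_byline_offset(string $rec)
{
    $p1 = strpos($rec, "</head>");
    if ($p1 === false) {   // strict check: 0 is a valid position, false means "not found"
        $p1 = 0;           // no head section, so search from the beginning
    }
    return strpos($rec, " by ", $p1); // int position, or false if there is no match
}

$rec = "<html><head><title>Written by nobody</title></head>"
     . "<body><p>Article by <b>John Smith</b></p></body></html>";
$pos = find_byline_offset($rec);
echo substr($rec, $pos, 7), "\n"; // prints " by <b>"
```

    Without the offset, strpos($rec, " by ") would stop at "Written by nobody"
    inside the title.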

    So something like

    $p1 = strpos($rec, "</head>");

    would let you skip past the HTML head section, then

    $p2 = strpos($rec, " by ", $p1);

    would let you find the first occurrence of " by " beyond position $p1 (or
    maybe "by<", depending on whether there's a space there or not),

    then you can search for <b> and </b> in the same way, adjust your
    arithmetic a bit, and get

    $author = substr($rec, $start, $length);

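    where $start is the position just after the marker you found (for the
    four-character string " by " found at $p2, that is $p2 + 4) and $length is
    the position of the closing marker (such as </b> or the line break) minus
    $start. The exact offsets depend on which markers you search for.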
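    Putting those steps together, a minimal sketch. The function name
    author_after_by() is hypothetical, and it assumes the name is either
    wrapped in <b>...</b> or runs to the end of the line, as in the two
    examples in the question.

```php
<?php
// Hypothetical helper combining the strpos()/substr() steps described above.
function author_after_by(string $rec): ?string
{
    $needle = " by ";
    $p1 = strpos($rec, "</head>");
    $p1 = ($p1 === false) ? 0 : $p1;      // no head section: start at the top
    $p2 = strpos($rec, $needle, $p1);
    if ($p2 === false) {
        return null;                      // no " by " found
    }
    $start = $p2 + strlen($needle);       // first character after " by "

    // Name wrapped in <b>...</b>: return just the bold text.
    if (substr($rec, $start, 3) === "<b>") {
        $close = strpos($rec, "</b>", $start);
        if ($close !== false) {
            $start += 3;                  // skip the opening "<b>"
            return substr($rec, $start, $close - $start);
        }
    }

    // Otherwise take everything up to the next line break (or end of string).
    $length = strcspn($rec, "\r\n", $start);
    return substr($rec, $start, $length);
}

echo author_after_by("<p>Article by <b>John Smith</b></p>"), "\n"; // prints John Smith
echo author_after_by("Essay\nWritten by The Supreme Coder\nThe first thing..."), "\n"; // prints The Supreme Coder
```

    Like the suggestion above, this only matches lowercase " by "; stripos()
    (PHP 5 and later) is the case-insensitive equivalent if "By" must match too.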


    Hope this helps. As an alternative, you might try the explode() function,
    using " by " as the string to split $rec on, and then checking each array
    element.
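    A minimal sketch of the explode() alternative, under the same assumptions.
    The helper name is hypothetical, and like strpos(), explode() is
    case-sensitive, so this only splits on lowercase " by ".

```php
<?php
// Hypothetical helper: split on " by " and return the first non-empty line
// that follows any occurrence of the separator.
function author_via_explode(string $rec): ?string
{
    $parts = explode(" by ", $rec);
    // $parts[0] is the text before the first " by "; each later element
    // begins immediately after an occurrence of the separator.
    for ($i = 1, $n = count($parts); $i < $n; $i++) {
        $line = explode("\n", $parts[$i], 2)[0]; // text up to the next newline
        $line = trim($line);                     // also drops a trailing "\r"
        if ($line !== "") {
            return $line;
        }
    }
    return null; // no " by " at all, or nothing after it
}

echo author_via_explode("Essay\nWritten by The Supreme Coder\nThe first thing..."), "\n"; // prints The Supreme Coder
```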





