Help with parsing data out of a HTML File?

  • Cliff Roman

    Help with parsing data out of a HTML File?

    I have a league for a game where we get exports after every session in HTML
    format

    It is broken down into 3 sections and each section has a Table with the
    results

    Right now I have to create 3 csv files manually out of it

    I would like to learn how to create a PHP script that can parse the data for
    me out of the html results and create the csv files that I need

    Can anyone give me some good (for newbie) references on where I can go to
    learn how to do this? I have been searching for a few days and I have not
    really been able to find anything that helps me.

    Thank you in advance


  • Reply via newsgroup

    #2
    Re: Help with parsing data out of a HTML File?

    Cliff Roman wrote:
    > I have a league for a game where we get exports after every session in HTML
    > format
    >
    > It is broken down into 3 sections and each section has a Table with the
    > results
    >
    > Right now I have to create 3 csv files manually out of it
    >
    > I would like to learn how to create a PHP script that can parse the data for
    > me out of the html results and create the csv files that I need
    >
    > Can anyone give me some good (for newbie) references on where I can go to
    > learn how to do this? I have been searching for a few days and I have not
    > really been able to find anything that helps me.
    >
    > Thank you in advance

    I can't write the code for you, but I suggest you have a look at
    strip_tags() to remove the HTML, then use something like explode() or
    implode() to do whatever it is you want to do...
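
    A minimal sketch of that suggestion, assuming the cell values contain no
    spaces of their own. The markup in $html is a made-up sample, not the real
    export, and preg_split() stands in for explode() here so that any run of
    whitespace counts as a single delimiter:

    ```php
    <?php
    // Hypothetical sample markup; the real export will differ.
    $html = '<TABLE><TR><TD>1</TD><TD>79</TD></TR>'
          . '<TR><TD>2</TD><TD>75</TD></TR></TABLE>';

    // Put a space before every tag so neighbouring cells do not run together,
    // then let strip_tags() remove the markup and leave only the text.
    $text = strip_tags(str_replace('<', ' <', $html));

    // Split on any run of whitespace; PREG_SPLIT_NO_EMPTY drops empty pieces.
    $values = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);

    // implode() goes the other way and joins the values with commas.
    echo implode(',', $values), "\n";
    ```

    For the sample above this prints 1,79,2,75. Values that contain spaces
    would be split apart, which is why the delimiters need pinning down first.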

    You could also try making a sample HTML file available online, because
    the spaces or other delimiters that signify the beginning of one value
    and the end of another need clarifying. As you look at your HTML
    table, you know that the border of the table separates one value from
    another. If your table cells are plain numeric and there's nothing
    else to confuse things, then it shouldn't be too difficult; the biggest
    problem is not the data, but the crap that might sit around it...

    If you still have difficulties, let us know and I'll try my hand to help
    out... but I'd really want to see some sort of sample output to know
    what it is you want to chew and spit.

    randelld


    • B. Johannessen

      #3
      Re: Help with parsing data out of a HTML File?


      Cliff Roman wrote:
      > I would like to learn how to create a PHP script that can parse the
      > data for me out of the html results and create the csv files that I
      > need

      The new XML features of PHP5 could probably do that with ease. Check
      out http://slides.bitflux.ch/phpconf2003/slide_23.html or the whole
      presentation at http://slides.bitflux.ch/phpconf2003/

      If upgrading your web server to PHP5 is not an option, you could
      still install the PHP5 CLI just for running this conversion.
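
      One way to read that suggestion, as a sketch only: the DOM extension
      that ships with PHP 5 can load HTML and walk the table rows. The markup
      in $html is a hypothetical sample, and the assumption is that every
      data row is a <tr> holding plain <td> cells:

      ```php
      <?php
      // Hypothetical sample markup; the real export will differ.
      $html = '<html><body><table>'
            . '<tr><td>1</td><td>15</td><td>J_Doe</td><td>22.288</td></tr>'
            . '<tr><td>2</td><td>2</td><td>J_Smith</td><td>22.310</td></tr>'
            . '</table></body></html>';

      $doc = new DOMDocument();
      // loadHTML() tolerates sloppy markup; @ hides its parser warnings.
      @$doc->loadHTML($html);

      $rows = array();
      foreach ($doc->getElementsByTagName('tr') as $tr) {
          $cells = array();
          foreach ($tr->getElementsByTagName('td') as $td) {
              $cells[] = trim($td->textContent);
          }
          if ($cells) {              // skip rows that have no <td> cells
              $rows[] = $cells;
          }
      }

      // One comma-separated line per table row.
      foreach ($rows as $cells) {
          echo implode(',', $cells), "\n";
      }
      ```

      Because the parser keeps track of rows and cells, values that contain
      spaces stay intact, which plain tag stripping cannot guarantee.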


      Bob

      --
      | B. Johannessen <bob@db.org> +47 97 15 20 09 - http://db.org/
      | Mail & Spam - News, Drafts & Standards - http://db.org/blog/
      | On The Origin Of Spam; Spam Statistics - http://db.org/spam/
      --


      • Cliff Roman

        #4
        Re: Help with parsing data out of a HTML File?

        Imagine that this file looked like this

        <H3 style="color : #173D54">
        Session: Practice
        </H3>
        <H3 style="color : #173D54">
        Date: 02/18/04
        </H3>

        Here is the part that I am unsure of.. My first step I guess is learning
        how to say basically
        After "Session:" The next word or string would = $session (in this case
        Practice)

        or in a repeating area (like a table)

        <TD style="background : #CEDAEB">
        1
        </TD>
        <TD style="background : #CEDAEB">
        79
        </TD>

        How I would end up saying
        After "<TD style="background : #CEDAEB">", the next item would = $rank (in
        this case 1)
        then have it say
        After the next "<TD style="background : #CEDAEB">" it would = $score

        etc
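
        The "after X, the next item is Y" idea can be expressed with regular
        expressions. This is only a sketch: the sample in $html is modelled on
        the snippets in this post, and the patterns assume exactly this layout
        (one value per cell, rank before score):

        ```php
        <?php
        // Sample markup modelled on the snippets in this post.
        $html = "<H3 style=\"color : #173D54\">\nSession: Practice\n</H3>\n"
              . "<TD style=\"background : #CEDAEB\">\n1\n</TD>\n"
              . "<TD style=\"background : #CEDAEB\">\n79\n</TD>\n";

        // The first word after "Session:" becomes $session.
        $session = null;
        if (preg_match('/Session:\s*(\S+)/', $html, $m)) {
            $session = $m[1];
        }

        // Capture the text of every <TD ...> cell in document order.
        // The i flag ignores case; the s flag lets . match newlines.
        preg_match_all('/<TD[^>]*>\s*(.*?)\s*<\/TD>/is', $html, $cells);
        list($rank, $score) = $cells[1];

        echo "$session: rank $rank, score $score\n";
        ```

        The [^>]* part skips whatever attributes the tag carries, so the
        pattern does not depend on the exact style string.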

        Maybe I am approaching this all wrong, I am unsure

        Thanks

        • Reply via newsgroup

          #5
          Re: Help with parsing data out of a HTML File?

          Cliff Roman wrote:
          > Imagine that this file looked like this
          >
          > <H3 style="color : #173D54">
          > Session: Practice
          > </H3>
          > <H3 style="color : #173D54">
          > Date: 02/18/04
          > </H3>
          >
          > Here is the part that I am unsure of.. My first step I guess is learning
          > how to say basically
          > After "Session:" The next word or string would = $session (in this case
          > Practice)
          >
          > or in a repeating area (like a table)
          >
          > <TD style="background : #CEDAEB">
          > 1
          > </TD>
          > <TD style="background : #CEDAEB">
          > 79
          > </TD>
          >
          > How I would end up saying
          > After "<TD style="background : #CEDAEB">", the next item would = $rank (in
          > this case 1)
          > then have it say
          > After the next "<TD style="background : #CEDAEB">" it would = $score
          >
          > etc
          >
          > Maybe I am approaching this all wrong, I am unsure
          >
          > Thanks

          Ignore the html tags, but look left to right, like you would be reading
          a book and confirm a few things for me...

          First, (again, reading from left to right) the start of a new table
          begins with "Session:" true?

          Second, the next description that is fixed is "Date:" true?

          Third, until the next "Session", everything else that follows is
          numeric, true?

          Fourth, how many columns wide is your table, or is it variable?

          Last, are your tables one under the other, or side by side? Or do they
          have anything else that might get in the way.

          Why?

          Well, I can try and bash out a script once I know some rough facts. I
          can use strip_tags() to get rid of the HTML, and after that we are left
          with a stream of text. We can use explode() to put each word/number into
          an element of an array on its own. We can use "Session:" as a flag to
          indicate a new table of scores is starting or ending, and we can have
          everything space delimited, which makes it easy to read and re-write
          everything else...
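
          Roughly, and only as a sketch under the assumptions in the questions
          above (each table starts with "Session:", then "Date:", then plain
          values with no spaces inside them), the token walk could look like
          this. The text in $text is made up, and preg_split() stands in for
          explode() so that runs of whitespace count as one delimiter:

          ```php
          <?php
          // Hypothetical text, as it might look after strip_tags().
          $text = "Session: Practice Date: 02/18/04 1 79 2 75 "
                . "Session: Qualifying Date: 02/18/04 1 81";

          $tokens  = preg_split('/\s+/', trim($text));
          $tables  = array();   // session name => list of values
          $session = null;
          $count   = count($tokens);

          for ($i = 0; $i < $count; $i++) {
              if ($tokens[$i] === 'Session:') {
                  // The token after the flag names the session; start a new table.
                  $session = isset($tokens[$i + 1]) ? $tokens[++$i] : '';
                  $tables[$session] = array();
              } elseif ($tokens[$i] === 'Date:') {
                  $i++;         // skip the date value itself
              } elseif ($session !== null) {
                  $tables[$session][] = $tokens[$i];
              }
          }

          print_r($tables);
          ```

          Knowing the column count would then let array_chunk() cut each flat
          list back into rows.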

          I'll keep an eye here for your answer and I will *try* to help further,
          however I can't guarantee...

          randelld


          • Cliff Roman

            #6
            Re: Help with parsing data out of a HTML File?

            I would need the script to create 3 csv files for me. If you can show me
            an example of the first one, then I would be more than happy to work
            through it and figure it out. I am just not sure where to start.

            So let me just look at the first one.

            Let's say, for example, the script.php file was in its own directory and the
            results were in a /results directory. Let's assume the file is called
            results.html

            If I remove the html tags it would look like this

            Session: Qualifying
            P # DRIVER TIME
            1 15 J_Doe 22.288
            2 2 J_Smith 22.310
            3 7 M_Johnson 22.376
            etc..

            The final result I would need would be something like this.. (in a file
            called qual.csv)

            1,15,J_Doe,22.288
            2,2,J_Smith,22.310
            3,7,M_Johnson,22.376
            etc

            I really appreciate the help you have given so far.. like I said, I have no
            problem working it out on my own if I can get an example

            Thanks,
            Cliff
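
            A rough sketch of the qual.csv step described in the last post. It
            assumes the stripped text really does come out one row per line, as
            in the Qualifying listing, and that driver names contain no spaces
            or commas. The helper session_rows() is a hypothetical name, and
            the file path in the comment is taken from the post's description:

            ```php
            <?php
            // Hypothetical helper: collect the data rows that follow "Session: <name>".
            function session_rows($text, $name)
            {
                $rows   = array();
                $inside = false;
                foreach (preg_split('/\r?\n/', $text) as $line) {
                    $fields = preg_split('/\s+/', trim($line), -1, PREG_SPLIT_NO_EMPTY);
                    if (!$fields) {
                        continue;                   // blank line
                    }
                    if ($fields[0] === 'Session:') {
                        // Collect only while inside the requested session.
                        $inside = isset($fields[1]) && $fields[1] === $name;
                    } elseif ($inside && preg_match('/^\d+$/', $fields[0])) {
                        $rows[] = $fields;          // data rows start with the position
                    }
                }
                return $rows;
            }

            // With the real file this might be (path assumed from the post):
            // $text = strip_tags(file_get_contents('results/results.html'));
            $text = "Session: Qualifying\n"
                  . "P # DRIVER TIME\n"
                  . "1 15 J_Doe 22.288\n"
                  . "2 2 J_Smith 22.310\n"
                  . "3 7 M_Johnson 22.376\n";

            $rows = session_rows($text, 'Qualifying');

            // Build the CSV text, one comma-separated line per row.
            $csv = '';
            foreach ($rows as $row) {
                $csv .= implode(',', $row) . "\n";
            }

            echo $csv;   // or: file_put_contents('qual.csv', $csv);
            ```

            The header line "P # DRIVER TIME" is skipped because it does not
            start with a number. If names could ever contain commas, fputcsv()
            (PHP 5.1 and later) would handle the quoting.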
