Finding a key word in a text file

  • Noam Dekers

    Finding a key word in a text file

    Hi all,
    I would like to find a word stored in a text file.

    Structure: I have one file named keyWords.txt that stores some key
    words I'm interested in finding. In addition I also have a file named
    textOrigin.txt in which I store the text to search in.
    I would like my program to check whether a certain word appears in the
    text and then tell me which line it was found on (if it was found).

    My problem is that the script can't find the words I'm looking for. I
    took one word from the word list and put it into the text file to be
    searched, but for some reason the program does not find it. I pressed
    'enter' at the end of each line. The word being used is on line 3 of
    the keyWords.txt file. I have some reason to believe that the cause
    lies here:
    if ($pos)
    {
    echo " line $i: $storeWord[$n]\n";
    }
    I also tried it with if (!$pos === FALSE) {...} but nothing there
    either...
    Anyone?
    Thank you very much for any help!

    Dekers


    the keyWords.txt file:
    -------------------------------
    Recording Site
    Recording Type
    INTRA
    SUA
    MUA
    LFP
    Acquisition Type
    Windowed
    Digitilized
    Electrode Type
    Tetrode
    Metal
    Pipette
    Pipette Charakteristics
    Tetrode Charakteristics
    Electrode Tip Length (µm)
    Electrode Tip OD (µm)
    Bandwidth in Hz
    Impegance in MegOhm
    Number of Penetrations
    Neurons Encountered
    Neurones Analysed
    Spike Amplitude in µVolts
    Spike Width in msec
    Number of Pyramidal
    Number of Interneurons
    Background Activity
    Max Modulation (Spikes/Sec)
    Min Modulation (Spikes/Sec)

    The textOrigin.txt file:
    -------------------------------
    I found the INTRA inside


    My code:
    *******************

    PHP:--------------------------------------------------------------------------------
    <?php

    $filesource = "keyWords.txt";
    $fp = fopen($filesource, "r");

    $storeWord = array();

    if ($fp)
    {
        $i = 0;
        while (!feof($fp))
        {
            /**********************************
            get the list of words to find and
            store them in an array for later use
            **********************************/
            $line = fgets($fp, 100);
            $storeWord[$i] = $line;

            $i = $i+1;
        }

        fclose($fp);
    }
    else
        echo "File could not be found";


    /**********************************
    open the text source file, pick each line
    and compare it to the complete list of key words
    **********************************/
    $filesource = "textOrigin.txt";
    $fp = fopen($filesource, "r");
    if ($fp)
    {
        $i = 1; //this is the line number
        while (!feof($fp))
        {
            $textLine = fgets($fp, 300);

            /**********************************
            compare all the words stored in the array
            with each line in the origin file
            **********************************/
            for ($n=0; $n<=count($storeWord)-1; $n=$n+1)
            {
                $pos = strpos($textLine, $storeWord[$n]);
                if ($pos)
                {
                    echo " line $i: $storeWord[$n]\n";
                }
                $i = $i+1;
            }
        }

        fclose($fp);
    }
    else
        echo "File could not be found";

    ?>
  • Default User

    #2
    Re: Finding a key word in a text file

    Noam Dekers wrote:
    >
    > Hi all,
    > I would like to find a word stored in a text file.
    >
    > the keyWords.txt file:
    > -------------------------------
    > Recording Site

    [snip]
    > The textOrigin.txt file:
    > -------------------------------
    > I found the INTRA inside
    >
    > My code:
    > *******************
    >
    > PHP:--------------------------------------------------------------------------------
    > <?php
    >
    > $filesource = "keyWords.txt";
    > $fp = fopen ($filesource, "r");
    >
    > $storeWord = array();
    >
    > if ($fp)
    > { $i = 0;
    > while (!feof($fp))
    > {
    > /**********************************
    > get the list of words to find and
    > store them in an array for later use
    > **********************************/
    > $line = fgets ($fp, 100);
    > $storeWord[$i] = $line;
    >
    > $i = $i+1;

    The index isn't necessary, just do

    $storeWord[] = $line;


    The biggest problem is that you are storing lines, not keywords. In the
    first place, many of these lines are multiple words. Is that what you
    want? If so, then call them key phrases.

    But most importantly, what about the newline at the end? That screws up
    matching. See the manual:

    fgets

    (PHP 3, PHP 4 )
    fgets -- Gets line from file pointer
    Description
    string fgets ( resource handle [, int length])

    Returns a string of up to length - 1 bytes read from the file pointed to
    by handle. Reading ends when length - 1 bytes have been read, on a
    newline (which is included in the return value), or on EOF (whichever
    comes first). If no length is specified, the length defaults to 1k, or
    1024 bytes.


    Remove whitespace from the end with chop().
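
The advice above can be sketched as follows. This is a minimal illustration, not code from the thread: the function name `readKeyPhrases` is hypothetical, and skipping blank lines is an added assumption. Only `fgets()` and `chop()` (an alias of `rtrim()`) come from the discussion.

```php
<?php
// Hypothetical helper (not from the thread): read each line of a file and
// strip the trailing newline and other whitespace that fgets() keeps.
function readKeyPhrases($path)
{
    $phrases = array();
    $fp = fopen($path, "r");
    if (!$fp) {
        return $phrases;            // file could not be opened
    }
    while (($line = fgets($fp)) !== false) {
        $line = rtrim($line);       // chop() is an alias of rtrim()
        if ($line !== "") {         // assumption: blank lines are not phrases
            $phrases[] = $line;     // append without an explicit index
        }
    }
    fclose($fp);
    return $phrases;
}
```

Looping on the return value of `fgets()` instead of `feof()` also avoids storing a spurious final entry when the file ends with a newline.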


    > /**********************************
    > open the text source file, pick each line
    > and compare it to the complete list of key words
    > **********************************/

    Again, even with removing the newline you have phrases not words. If you
    want words, you'll need more processing.



    Brian Rodenborn
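
One possible form of the "more processing" mentioned above is to split each key phrase into individual words. The sketch below is only an assumption about what that could look like: the function name `phrasesToWords` is hypothetical, words are taken to be runs of ASCII letters and digits (so a character such as µ acts as a separator), and lowercasing is an added choice.

```php
<?php
// Hypothetical helper: break key phrases into unique lowercase words.
function phrasesToWords(array $phrases)
{
    $words = array();
    foreach ($phrases as $phrase) {
        // Split on any run of characters that is not an ASCII letter or digit.
        $parts = preg_split('/[^A-Za-z0-9]+/', $phrase, -1, PREG_SPLIT_NO_EMPTY);
        foreach ($parts as $word) {
            $words[] = strtolower($word);
        }
    }
    // Drop duplicates and reindex from 0, keeping first-seen order.
    return array_values(array_unique($words));
}
```

Given the phrases "Electrode Type" and "Electrode Tip OD", this returns the words electrode, type, tip, and od.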


    • Noam Dekers

      #3
      Re: Finding a key word in a text file

      Default User <first.last@company.com> wrote in message news:<3F392601.677C01BE@company.com>...
      > Noam Dekers wrote:
      > >
      > > Hi all,
      > > I would like to find a word stored in a text file.
      >
      > >
      > > the keyWords.txt file:
      > > -------------------------------
      > > Recording Site
      >
      > [snip]
      >
      > > The textOrigin.txt file:
      > > -------------------------------
      > > I found the INTRA inside
      > >
      > > My code:
      > > *******************
      > >
      > > PHP:--------------------------------------------------------------------------------
      > > <?php
      > >
      > > $filesource = "keyWords.txt";
      > > $fp = fopen ($filesource, "r");
      > >
      > > $storeWord = array();
      > >
      > > if ($fp)
      > > { $i = 0;
      > > while (!feof($fp))
      > > {
      > > /**********************************
      > > get the list of words to find and
      > > store them in an array for later use
      > > **********************************/
      > > $line = fgets ($fp, 100);
      > > $storeWord[$i] = $line;
      > >
      > > $i = $i+1;
      >
      > The index isn't necessary, just do
      >
      > $storeWord[] = $line;
      >
      Thank you for this advice - I find the most difficult thing is to make
      my code efficient. I have only a little experience with PHP...
      >
      > The biggest problem is that you are storing lines, not keywords. In the
      > first place, many of these lines are multiple words. Is that what you
      > want? If so, then call them key phrases.

      Well, I sometimes want to search for words and sometimes for phrases.
      There are certain places where a constant set of words is written in
      the same order and I would like to track it as is.
      >
      > But most importantly, what about the newline at the end? That screws up
      > matching. See the manual:
      >
      > fgets
      >
      > (PHP 3, PHP 4 )
      > fgets -- Gets line from file pointer
      > Description
      > string fgets ( resource handle [, int length])
      >
      > Returns a string of up to length - 1 bytes read from the file pointed to
      > by handle. Reading ends when length - 1 bytes have been read, on a
      > newline (which is included in the return value), or on EOF (whichever
      > comes first). If no length is specified, the length defaults to 1k, or
      > 1024 bytes.
      >
      >
      > Remove white space from the end with chop().
      >
      Well - I must say I didn't understand that one. Why is white space a
      problem? Don't I have white space in both cases: in the keyWord/keyPhrase
      and on the line of the original text? Or - don't I actually have a
      'new line' at the end of each line? - I think I have a new line because
      I always press enter at the end of each line.

      Thank you for any answer, Dekers.
      >
      >
      > > /**********************************
      > > open the text source file, pick each line
      > > and compare it to the complete list of key words
      > > **********************************/
      >
      > Again, even with removing the newline you have phrases not words. If you
      > want words, you'll need more processing.
      >
      >
      >
      > Brian Rodenborn


      • Default User

        #4
        Re: Finding a key word in a text file

        Noam Dekers wrote:
        > > The biggest problem is that you are storing lines, not keywords. In the
        > > first place, many of these lines are multiple words. Is that what you
        > > want? If so, then call them key phrases.
        >
        > Well, I some times want to search for words and sometimes for phrases.
        > There are certain places when a constant set of words is written in
        > the same order and I would like to track it as is.

        Ok, but that's not how you have the code written. You are reading in
        lines from a file, storing them in an array, then searching for these
        strings within another string. For instance, your data set has
        "Electrode Type", but not "Electrode". There's no way for you to search
        for just that word as you have it coded. Is that what you want, does the
        data set cover all substrings?

        Again, key words is the wrong term. I don't care, I just want to make
        sure you know what you want.
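
The substring point can be checked directly. In this small sketch the helper name `containsPhrase` and the sample sentence are invented for illustration; only `strpos()` comes from the thread.

```php
<?php
// Hypothetical helper: true when $phrase occurs anywhere in $line.
function containsPhrase($line, $phrase)
{
    // strpos() returns an integer offset on success and false on failure.
    return strpos($line, $phrase) !== false;
}

// Example text invented for illustration.
$line = "The Electrode was replaced by a Tetrode";

var_dump(containsPhrase($line, "Electrode Type")); // bool(false)
var_dump(containsPhrase($line, "Tetrode"));        // bool(true)
var_dump(containsPhrase($line, "Electrode"));      // bool(true)
```

The stored phrase "Electrode Type" is not found because the whole phrase must appear in the line; the single word "Electrode" would match, but it is not in the key list.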

        > > But most importantly, what about the newline at the end? That screws up
        > > matching. See the manual:

        > > Remove white space from the end with chop().
        > >
        > Well - I must say I didn't understand that one. Why is white space a
        > problem? Don't I have a white space in both cases: keyWord/keyPhrase
        > and on the line of the original text? Or - don't I actually have a
        > 'new line' at the end of each line? - I think I have new line because
        > I always press enter in the end of each line.

        Whitespace is any of the following: newline, carriage return, space
        character, tab, some others. In this case, as you are reading from the
        file with fgets(), the newlines that you use to separate the lines in
        your file are retained. You might think the first line read from your
        file is "Recording Site" but it is really "Recording Site\n" where '\n'
        is the newline character. So you won't get a match with strpos() or
        strstr() unless that exact string appears.

        Use chop() or rtrim() to remove that from the end of each line, along
        with any spaces that might be there invisibly. Actually, trim() might be
        best, in case there are any space characters leading the strings.



        Brian Rodenborn
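
Putting the advice in this thread together, the whole search might look like the sketch below. The function names `loadPhrases` and `findPhrases` are hypothetical. Beyond the `trim()` suggested above, the sketch makes two changes that the replies do not raise and that are therefore assumptions about the intended behaviour: it compares the result of `strpos()` against `false` with `!==`, because a match at the very start of a line returns offset 0, which `if ($pos)` treats as "not found"; and it advances the line counter once per text line, where the original code advances it once per key phrase.

```php
<?php
// Hypothetical function names; only fgets(), trim() and strpos() come
// from the thread.

// Read key phrases, one per line, without surrounding whitespace.
function loadPhrases($path)
{
    $phrases = array();
    $fp = fopen($path, "r");
    if (!$fp) {
        return $phrases;
    }
    while (($line = fgets($fp)) !== false) {
        $line = trim($line);             // drop the newline and stray spaces
        if ($line !== "") {              // assumption: ignore blank lines
            $phrases[] = $line;
        }
    }
    fclose($fp);
    return $phrases;
}

// Return a list of array(lineNumber, phrase) pairs for every match.
function findPhrases(array $phrases, $textPath)
{
    $hits = array();
    $fp = fopen($textPath, "r");
    if (!$fp) {
        return $hits;
    }
    $lineNumber = 0;
    while (($textLine = fgets($fp)) !== false) {
        $lineNumber++;                   // once per text line, not per phrase
        foreach ($phrases as $phrase) {
            // Offset 0 is a valid match, so test against false explicitly.
            if (strpos($textLine, $phrase) !== false) {
                $hits[] = array($lineNumber, $phrase);
            }
        }
    }
    fclose($fp);
    return $hits;
}
```

Assuming the two files contain exactly what the original post shows, `findPhrases(loadPhrases("keyWords.txt"), "textOrigin.txt")` should return a single hit: line 1, phrase INTRA. Note that `strpos()` is case-sensitive, so "intra" would not match.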
