How to access random lines in textfile

  • Hans A

    How to access random lines in textfile

    I have a textfile "textfile.txt" containing a list of words. There is
    one word on each line. I want to pick two random lines from this
    textfile, and I have tried to do something like:

    // Loading the file into an array:
    $textarray = file("textfile.txt");

    // Using array_rand to pick two random keys
    $rand_numbers = array_rand($textarray, 2);

    // Reading out the two words:
    $rand_word_one = $textarray[$rand_numbers[0]];
    $rand_word_two = $textarray[$rand_numbers[1]];

    This seems to work OK if the textfile is small, but when I try a larger
    textfile, I get an error indicating a memory overload. I am not very
    surprised; loading the whole file using file() seems unnecessary.

    I guess a better solution would be to pick two random numbers between
    1 and the total number of lines in the textfile, and then try to read
    out these line numbers using readline etc., but how can I do this? Any
    suggestions are welcome!

    /H.A.

  • Ewoud Dronkert

    #2
    Re: How to access random lines in textfile

    On 23 May 2005 06:38:08 -0700, Hans A wrote:
    > pick two random numbers between 1 and the total number of lines
    > in the textfile, and then try to read out these line numbers

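    $fname = 'textfile.txt';
    $lines = 1000; // number of lines in the text file
    $words = 2;    // number of words (lines) to pick

    // For each word, choose how many lines to skip before reading it,
    // always leaving enough lines for the words still to be picked.
    $skip = array();
    for ( $i = 0; $i < $words; ++$i )
    {
        $r = mt_rand( 0, $lines - $words );
        $lines -= $r;
        $skip[] = $r;
    }

    $word = array();
    $fh = fopen( $fname, 'r' ); // open read-only
    for ( $i = 0; $i < $words; ++$i )
    {
        // Read $skip[$i] + 1 lines; the last one read is the chosen word.
        for ( $j = 0; $j <= $skip[$i]; ++$j )
            $w = fgets( $fh );
        $word[] = trim( $w );
    }
    fclose( $fh );

    echo 'Random words: '.implode( ', ', $word );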


    --
    Firefox Web Browser - Rediscover the web - http://getffox.com/
    Thunderbird E-mail and Newsgroups - http://gettbird.com/


    • dspohn

      #3
      Re: How to access random lines in textfile

      $lines = file("file.txt");

      echo $lines[0]; //First line;
      echo $lines[3]; //fourth line;


      • Chung Leong

        #4
        Re: How to access random lines in textfile

        Just fseek() to a random location in the file, then call fgets() twice:
        the first call discards the potentially truncated line, the second gets
        the next line.
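
A minimal sketch of the fseek() idea above, not a definitive implementation. The function name randomLineBySeek() is hypothetical, and wrapping around to the first line when the seek lands in the last line is an assumption added here so that the first line can be chosen at all. Note that this approach is not uniform: a line is chosen with probability proportional to the byte length of the line before it.

```php
<?php
// Hypothetical helper: returns one line found by seeking to a random byte offset.
function randomLineBySeek(string $filename): ?string
{
    $size = filesize($filename);
    if ($size === false || $size === 0) {
        return null;
    }
    $fh = fopen($filename, 'r');
    if ($fh === false) {
        return null;
    }
    fseek($fh, mt_rand(0, $size - 1));
    fgets($fh);             // discard the line we landed in (probably partial)
    $line = fgets($fh);     // the next complete line
    if ($line === false) {  // we landed in the last line: wrap to the first
        rewind($fh);
        $line = fgets($fh);
    }
    fclose($fh);
    return $line === false ? null : rtrim($line, "\r\n");
}
```

Only two lines are ever held in memory, and the cost does not depend on the size of the file, which is the appeal when the word list is large and roughly even in line length.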


        • Matt Raines

          #5
          Re: How to access random lines in textfile

          On Mon, 23 May 2005, Ewoud Dronkert wrote:
          > On 23 May 2005 06:38:08 -0700, Hans A wrote:
          >> pick two random numbers between 1 and the total number of lines
          >> in the textfile, and then try to read out these line numbers

          You don't need to know the number of lines in the text file before you
          start to select one line at random.

          Reading a line at a time from the file, just update the selected line if
          floor(mt_rand(0, $currentLineNumber - 1)) is 0. At the end, your selected
          line is a fair and random choice, but you don't need to store more than 2
          lines in memory at any one time and there's no need to count the lines
          first.

          Something like:

          <?php

          $file = fopen($filename, 'r'); // open read-only
          $counter = 0;
          $selectedLine = null;
          while (($line = fgets($file)) !== false) {
              // On the Nth line, mt_rand(0, N - 1) is 0 with probability 1/N.
              if (floor(mt_rand(0, $counter++)) == 0) {
                  $selectedLine = $line;
              }
          }
          fclose($file);
          // ... do something with $selectedLine ...

          ?>

          If there's one line, it always matches, since mt_rand(0, 0) always returns
          0.

          If there are two lines, there is a 1 in 2 chance the second line will
          overwrite our $selectedLine.

          By the third iteration, the 1 in 3 chance the third line will match is
          split evenly (on average) between the matches and non-matches for the
          second line, if you see what I mean, leaving a 1 in 3 chance for each of
          the three lines. And so on, ad nauseam.

          Fairly applying the requirement to select *two* lines at random, you might
          have to run through the file twice, ignoring the line you selected last
          time. In this case you'd need to store the selected line's index as well
          as its content. Or you could just add a second if statement to the loop,
          but you'd have to work around the chance of selecting the same line twice.
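
One way to get two (or more) distinct lines in a single pass is the generalisation usually called reservoir sampling. This is a swapped-in technique, not code from the post above: keep the first $k lines, then let each later line replace a randomly chosen kept line with probability $k divided by the number of lines seen so far. The function name pickRandomLines() is hypothetical.

```php
<?php
// Hypothetical helper: picks $k distinct lines uniformly at random in one pass.
function pickRandomLines(string $filename, int $k): array
{
    $fh = fopen($filename, 'r');
    if ($fh === false) {
        return [];
    }
    $selected = [];
    $seen = 0;
    while (($line = fgets($fh)) !== false) {
        $seen++;
        if ($seen <= $k) {
            $selected[] = $line;            // fill the reservoir first
        } else {
            $slot = mt_rand(0, $seen - 1);  // uniform over the lines seen so far
            if ($slot < $k) {
                $selected[$slot] = $line;   // replace one of the kept lines
            }
        }
    }
    fclose($fh);
    return array_map('trim', $selected);
}
```

Because each line position can occupy at most one slot, the same line is never returned twice (although two different lines holding the same word can both be returned). If the file has fewer than $k lines, all of them are returned.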

          --
          Matt


          • Ewoud Dronkert

            #6
            Re: How to access random lines in textfile

            On Tue, 24 May 2005 11:58:44 +0100, Matt Raines wrote:
            > Reading a line at a time from the file, just update the selected line if
            > floor(mt_rand(0, $currentLineNumber - 1)) is 0.

            Will never get beyond first line! And if implemented properly, I don't
            believe it's fair (but can't be bothered to do the math, sorry).

            Neither was my suggestion by the way; every next number is totally
            dependent on the previous one:
            > $skip = array();
            > for ( $i = 0; $i < $words; ++$i )
            > {
            >     $r = mt_rand( 0, $lines - $words );
            >     $lines -= $r;
            >     $skip[] = $r;
            > }

            But it was an easy way to avoid picking the same number twice or more.




            • Matt Raines

              #7
              Re: How to access random lines in textfile

              On Tue, 24 May 2005, Ewoud Dronkert wrote:
              > On Tue, 24 May 2005 11:58:44 +0100, Matt Raines wrote:
              >> Reading a line at a time from the file, just update the selected line
              >> if floor(mt_rand(0, $currentLineNumber - 1)) is 0.
              >
              > Will never get beyond first line!

              Perhaps I didn't make it clear that you need to iterate across every line
              of the file even if you find a match. The point is that at each iteration
              you change your selected line if the call to mt_rand returns 0; on the
              first line it should always match. A certain number of runs
              (1/numberOfLines) will never match again. The others will be distributed
              evenly across the lines in the file.
              > And if implemented properly, I don't believe it's fair (but can't be
              > bothered to do the math, sorry).

              Think of it like this: 100% of runs will match on the first line.
              <----------1----------->

              On the second line, 50% of runs will overwrite the selected line with the
              current line.
              <----1-----><----2----->

              On the third line, a third of runs will overwrite with the current line.
              But half of those (a sixth) will not have matched on line 2, and the other
              half will.
              <--1---><3-><--2---><3->

              This works out, when rearranged below, at exactly one third chance of
              matching each of the three lines.
              <--1---><--2---><--3--->

              You can continue to apply this logic for as many lines as you like, but
              I'm lazy so I choose to stop here. (note: also the reason I didn't bother
              to rejig it to return two lines instead of one :) ).
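
The same argument can be written out for any number of lines. As a sketch, assume a file of n lines where line i overwrites the selection with probability 1/i (the symbols n, i and j are introduced here for illustration). Line i ends up as the final choice only if it is picked at step i and none of the later lines overwrites it:

```latex
\[
P(\text{line } i \text{ is the final choice})
  = \frac{1}{i} \prod_{j=i+1}^{n} \left(1 - \frac{1}{j}\right)
  = \frac{1}{i} \cdot \frac{i}{i+1} \cdot \frac{i+1}{i+2} \cdots \frac{n-1}{n}
  = \frac{1}{n}
\]
```

The product telescopes to i/n, so every line has the same 1/n chance, which matches the three-line picture above for n = 3.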

              Disclaimer: I'm pretty sure I didn't come up with this logic. I probably
              read it in a book once.

              Cheers,
              --
              Matt



              • Ewoud Dronkert

                #8
                Re: How to access random lines in textfile

                On Tue, 24 May 2005 14:28:44 +0100, Matt Raines wrote:
                > Perhaps I didn't make it clear that [...]

                No, sorry, my fault. I half expected one thing then didn't read on very
                well.

                Your solution is nifty, but rather expensive because of every call to
                mt_rand() or rand() on each line of the file, especially if every word
                chosen requires another complete walk of the file (or concurrent but
                different rand() calls). My algorithm requires only one walk of the file
                for any number of random words picked, or two if the number of lines is
                not known.

                What is the best way to pick k numbers from range n while optimizing
                speed, storage and/or randomness? Maybe:

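                $n = 1000;
                $k = 2;
                $a = range( 0, $n - 1 );
                for ( $i = 0; $i < $k; ++$i )
                {
                    $j = mt_rand( $i, $n - 1 );
                    // Swap $a[$i] and $a[$j]; after earlier swaps $a[$i]
                    // may no longer hold $i, so a temporary is needed.
                    $t = $a[$i];
                    $a[$i] = $a[$j];
                    $a[$j] = $t;
                }
                $b = array_slice( $a, 0, $k );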

                (Still just as many calls to mt_rand() as no. of words picked).
                Btw, is this for-loop (but then with $i<$n) the way the shuffle function
                is implemented?
                To prep $b for acting as $skip from my first post:

                sort( $b );
                // Work backwards so each subtraction uses the original $b[$i - 1].
                for ( $i = $k - 1; $i > 0; --$i )
                    $b[$i] -= $b[$i - 1] + 1;
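
Putting the pieces of this post together, here is a rough end-to-end sketch under a few assumptions: the number of lines is not known in advance, so a first walk counts them, and the second walk looks up absolute line indices rather than skip counts. The function names countLines(), pickIndices() and readLinesAt() are hypothetical.

```php
<?php
// Hypothetical helper: first walk of the file, only to count the lines.
function countLines(string $filename): int
{
    $fh = fopen($filename, 'r');
    if ($fh === false) {
        return 0;
    }
    $n = 0;
    while (fgets($fh) !== false) {
        $n++;
    }
    fclose($fh);
    return $n;
}

// Hypothetical helper: $k distinct 0-based indices from 0..$n-1, sorted ascending.
function pickIndices(int $n, int $k): array
{
    if ($n <= 0 || $k <= 0) {
        return [];
    }
    $k = min($k, $n);
    $a = range(0, $n - 1);
    for ($i = 0; $i < $k; ++$i) {
        $j = mt_rand($i, $n - 1);
        [$a[$i], $a[$j]] = [$a[$j], $a[$i]];  // swap the two entries
    }
    $b = array_slice($a, 0, $k);
    sort($b);
    return $b;
}

// Hypothetical helper: second walk, stopping after the last wanted line.
function readLinesAt(string $filename, array $sortedIndices): array
{
    if ($sortedIndices === []) {
        return [];
    }
    $fh = fopen($filename, 'r');
    if ($fh === false) {
        return [];
    }
    $want = array_flip($sortedIndices);  // line index => position, for fast lookup
    $last = end($sortedIndices);
    $words = [];
    $lineNo = 0;
    while ($lineNo <= $last && ($line = fgets($fh)) !== false) {
        if (isset($want[$lineNo])) {
            $words[] = trim($line);
        }
        $lineNo++;
    }
    fclose($fh);
    return $words;
}
```

Usage would look like `readLinesAt($fname, pickIndices(countLines($fname), 2))`. Note that range() still builds an array of $n integers, so for very large files the index array costs memory even though the file contents are never loaded.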


