Problem with Numpy Standard Deviation

  • crumble113
    New Member
    • Oct 2012
    • 3

    Can anyone please tell me where I am going wrong with this for loop? It is meant to take a specific corpus, a sample size and a number of samples as input, and then give the averages of the expected sentiment tokens, normalised lexical diversity and probability of short sentences. It is also meant to give me the standard deviation of each of these three statistics. I'm a real beginner with Python, so I'm not really sure where I've gone wrong. Thanks a lot in advance.

    Code:
    def test_iterate(corpus_reader, sample_size, number_of_samples):
        for i in xrange(number_of_samples):
            tokens = corpus_reader.sample_words_by_sents(sample_size)
            sents = corpus_reader.sample_sents(sample_size)
            expected_sentiment_tokens(tokens)
            normalised_lexical_diversity(tokens)
            prob_short_sents(sents)
            stats = expected_sentiment_tokens(tokens)
            stats_two = normalised_lexical_diversity(tokens)
            stats_three = prob_short_sents(sents)
        print "Average expected no of sentiment tokens: %s" % average(stats)
        print "Average normalised lexical diversity: %s" % average(stats_two)
        print "Average probability of short sentences: %s" % average(stats_three)
        print "Standard deviation of sentiment tokens: %s" % std(stats)
        print "Standard deviation of normalised lexical diversity: %s" % std(stats_two)
        print "Standard deviation of probability of short sentences: %s" % std(stats_three)
    When I call for example
    Code:
    test_iterate(tcr, 500, 3)
    , the following output is given:

    Code:
    127.333333333 
    2.08398681196 
    0.506 
    116.25 
    2.21737363871 
    0.518 
    123.333333333 
    1.9821801535 
    0.534 
    Average expected no of sentiment tokens: 110.416666667 
    Average normalised lexical diversity: 2.89485940038 
    Average probability of short sentences: 0.518 
    Standard deviation of sentiment tokens: 0.0 
    Standard deviation of normalised lexical diversity: 0.0 
    Standard deviation of probability of short sentences: 0.0
    Last edited by Rabbit; Oct 15 '12, 10:00 PM. Reason: Reverting post
  • dwblas
    Recognized Expert Contributor
    • May 2008
    • 626

    #2
    I'm a real beginner with Python so not really sure where I've gone wrong
    What is wrong with what you have? What do you expect vs. what is printed? It looks like you are not saving the results from each iteration of the for loop, so you are only printing the values from the final pass, but I can't tell from what you have submitted. Add some print statements similar to this:
    Code:
    def test_iterate(corpus_reader, sample_size, number_of_samples):
         for i in xrange(number_of_samples):
             tokens = corpus_reader.sample_words_by_sents(sample_size)
             sents = corpus_reader.sample_sents(sample_size)
             expected_sentiment_tokens(tokens)
             normalised_lexical_diversity(tokens)
             prob_short_sents(sents)
    
             stats = expected_sentiment_tokens(tokens)
             print "stats in for loop =", stats
    
             stats_two = normalised_lexical_diversity(tokens)
             stats_three = prob_short_sents(sents)
    
         print "using stats =", stats, type(stats)
         print "Average expected no of sentiment tokens: %s" % average(stats)
    If "stats" is anything other than a numpy array then you are probably averaging one number instead of a list of numbers.
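
A minimal sketch of that check, assuming numpy is imported as np and written for Python 3 (so print is a function, unlike in the thread's Python 2 code). The numbers are the three sentiment values from the output in the first post; `last_value` and `all_values` are illustrative names, not from the original code. Given a bare float, np.average simply returns it and np.std returns 0.0, which would account for the zero standard deviations.

```python
import numpy as np

# The three sentiment values printed in the first post's output.
all_values = [127.333333333, 116.25, 123.333333333]

# What a variable reassigned on every pass holds after the loop:
# only the value from the final pass.
last_value = all_values[-1]

# A single float: the "average" is the number itself and the spread is zero.
single_mean = float(np.average(last_value))
single_std = float(np.std(last_value))
print(type(last_value), single_mean, single_std)

# A list of all the values: a real average and a non-zero spread.
many_mean = float(np.average(all_values))
many_std = float(np.std(all_values))
print(type(all_values), many_mean, many_std)
```

Note that np.std defaults to the population standard deviation (ddof=0); pass ddof=1 if the sample standard deviation is wanted.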

    • crumble113
      New Member
      • Oct 2012
      • 3

      #3
      Thanks for the quick reply. I just tried that and this is the output:
      Code:
      stats in for loop = 191.473684211
      stats in for loop = 186.277777778
      stats in for loop = 182.473684211
      stats in for loop = 182.611111111
      using stats = 182.611111111 <type 'float'>
      Average expected no of sentiment tokens: 182.611111111
      Do you know how I could fix my code to make it work correctly? Just need the averages and standard deviations for each statistic.

      Thanks again.
      Last edited by Rabbit; Oct 15 '12, 10:02 PM. Reason: Reverting post

      • dwblas
        Recognized Expert Contributor
        • May 2008
        • 626

        #4
        stats in for loop = 182.611111111 <----- same value
        using stats = 182.611111111 <type 'float'> on both lines
        You have to append each of the values in the loop to a numpy array and average the array (some definitions). Start with a simple average only and then expand the code. I am sure there are many tutorials/examples on the web for arrays, average, and standard deviation.
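
One way this could look, as a sketch in Python 3 rather than the thread's Python 2. The corpus reader and `expected_sentiment_tokens` are not available here, so `fake_statistic` is a hypothetical stand-in that returns a random number, and `iterate_samples` is an illustrative name. The values are collected in a plain Python list and converted to a numpy array once, after the loop.

```python
import random

import numpy as np


def fake_statistic(rng):
    """Hypothetical stand-in for expected_sentiment_tokens(tokens)."""
    return rng.uniform(180.0, 195.0)


def iterate_samples(number_of_samples, seed=0):
    rng = random.Random(seed)
    values = []  # created once, before the loop starts
    for _ in range(number_of_samples):
        # Keep the result of every pass instead of overwriting it.
        values.append(fake_statistic(rng))
    values = np.array(values)  # convert once, after the loop
    return values, float(np.average(values)), float(np.std(values))


values, mean, sd = iterate_samples(3)
print("Values collected:", values)
print("Average: %s" % mean)
print("Standard deviation: %s" % sd)
```

Appending to a list and converting at the end is usually simpler than growing a numpy array, since np.append returns a new array on each call rather than modifying one in place. The same pattern would be repeated with a separate list for each of the three statistics.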

        • crumble113
          New Member
          • Oct 2012
          • 3

          #5
          Could you please tell me what's wrong with this following code?
          Code:
          def test_iterate(corpus_reader, sample_size, number_of_samples):
              for i in xrange(number_of_samples):
                  tokens = corpus_reader.sample_words_by_sents(sample_size)
                  sents = corpus_reader.sample_sents(sample_size)
                  print expected_sentiment_tokens(tokens)
                  s = ([expected_sentiment_tokens(tokens)])
                  s.append(expected_sentiment_tokens(tokens))
              print "Average expected no of sentiment tokens: %s" % average(s)
          Code:
          test_iterate(rcr, 500, 3)
          gives output

          Code:
          191.823529412
          185.117647059
          185.166666667
          Average expected no of sentiment tokens: 185.166666667
          The average is being assigned just the last value.
          Last edited by Rabbit; Oct 15 '12, 10:02 PM. Reason: Reverting post

          • Rabbit
            Recognized Expert MVP
            • Jan 2007
            • 12517

            #6
            You're just getting the last value because that's all you have available. In line 6, you replace whatever you had with the most current value. Then in line 7, you append the most current value. The result is an average of the most current value. Then in the next iteration, you do it all over again. You will only ever have two copies of the most current value in there.
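
To make the difference concrete, here is a small sketch in Python 3 that replays the three values printed in post #5. It assumes `expected_sentiment_tokens(tokens)` returns the same number each time it is called within one pass. The `average` helper is a plain-Python stand-in for the numpy function, and `samples`, `buggy_average` and `fixed_average` are illustrative names.

```python
# The three values printed in post #5, one per pass through the loop.
samples = [191.823529412, 185.117647059, 185.166666667]


def average(values):
    """Plain-Python stand-in for numpy's average."""
    return sum(values) / len(values)


# As posted: the list is rebuilt on every pass, so earlier values are lost.
for value in samples:
    s = [value]      # line 6: replaces whatever the list held before
    s.append(value)  # line 7: appends the same value again
buggy_list = s
buggy_average = average(buggy_list)

# Possible fix: create the list once, before the loop, and only append.
s = []
for value in samples:
    s.append(value)
fixed_list = s
fixed_average = average(fixed_list)

print("As posted:", buggy_list, buggy_average)
print("List created before the loop:", fixed_list, fixed_average)
```

In the first version the list ends up holding two copies of the final value, so its average equals that value; in the second it holds all three values.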

            • Rabbit
              Recognized Expert MVP
              • Jan 2007
              • 12517

              #7
              I am reverting your posts for posterity.

              Please don't edit your posts and remove all traces of your question. If someone were to visit this thread to see the answer, we would like them to be able to view everything. Also, if you've solved the issue, please post the answer so that others facing the same issue may benefit from your solution.
