statistical analysis with decimal data retrieved from MySQL DB

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • ipab
    New Member
    • Apr 2010
    • 2

    statistical analysis with decimal data retrieved from MySQL DB

    I'm a complete Python/MySQL newbie - trying to retrieve and manipulate data from a MySQL DB over a VPN/LAN from an Ubuntu/Python shell.

    I've imported all the modules I think I need - MySQLdb, decimal, scipy, scipy.stats, numpy. I have been able to open the connector to the MySQL DB and retrieve the data of interest. My problem is as follows

    1. I open a connection with

    mydb = MySQLdb.connect (host = "xxx", port = yyy, user = "zzz", passwd = "nnnn", db = "whatineed" )
    cursor = mydb.cursor()

    2. For some reason, each entry is a tuple though I am retrieving a single column from the MySQL table with a command

    cursor.execute( "SELECT difference FROM growth_record")

    3. Data from the DB comes back in the format

    ((Decimal('-3.0600'),), (Decimal('-0.8500'),), (Decimal('-1.1900'),), (Decimal('-1.7000'),), (Decimal('0.480 0'),),...

    4. I am able to pull out the individual values by doing

    result = cursor.fetchall ()
    for i in range(len(resul t)):
    a = result[i][0]

    OR to get a floating point result
    for i in range(len(resul t)):
    a = float(str(resul t[i][0]))

    5. My problems

    a) I am not able to create an array with the assignment
    a[i] = result[i][0] - keep getting a type mismatch. I'd like to create an array with the direct results without the Tuple returned by the DB.

    b) I am not able to perform statistical analyses from scipy.stats (e.g. mean) with the results. The functions keep expecting floating point arguments, the numbers returned from the DB are decimal numbers.

    Is there an obviously simple way to do either?

    Thank you
    Bapi
  • Glenton
    Recognized Expert Contributor
    • Nov 2008
    • 391

    #2
    Hi

    Good work. Here's an interactive session, which should help. I've defined result as the tuple in your point 3, and I've imported decimal and scipy.stats.

    Then
    Code:
    In [16]: a=[]
    
    In [17]: for r in result:
       ....:     a.append(float(r[0]))
       ....:     
       ....:     
    
    In [18]: a
    Out[18]: 
    [-3.0600000000000001,
     -0.84999999999999998,
     -1.1899999999999999,
     -1.7,
     0.47999999999999998]
    
    In [19]: mean(a)
    Out[19]: -1.264
    For numpy arrays, instead of lists, the append function generally doesn't work/exist in my experience. So it's better to define it up front at the right size, and then update the elements:

    Code:
    In [21]: from numpy import *
    
    In [22]: a=zeros((5))
    
    In [23]: a
    Out[23]: array([ 0.,  0.,  0.,  0.,  0.])
    
    In [26]: for i,r in enumerate(result):
       ....:     a[i]=r[0]
       ....:     
       ....:     
    
    In [27]: a
    Out[27]: array([-3.06, -0.85, -1.19, -1.7 ,  0.48])
    
    In [28]: mean(a)
    Out[28]: -1.264
    In general the assignment (a[i]=foo) only works if a[i] already exists, which may be the source of problem a.

    Also, things like mean only work on lists or arrays or similar iterables.

    Good luck, and let us know how it goes!

    Comment

    • ipab
      New Member
      • Apr 2010
      • 2

      #3
      Glenton,

      Thanks. Both methods worked. I appreciate your response.

      Regards
      Bapi
      p.s. do you happen to know of any math modules that work with the Decimal data type?

      Comment

      Working...