Can I get help importing an excel document into mysql using python please?

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • ronparker
    New Member
    • Aug 2010
    • 27

    Can I get help importing an excel document into mysql using python please?

    Hello,
    This is my first time posting and just my second day using python on mysql, so please be patient with me. I should also say, I am using a linux machine. Using python I was able to make a database and a table. However, I am having difficulty getting the values from an excel and putting them into the table. The Excel document has just three columns, which are:
    MATCH - this is just a four digit number
    DATE - this is just a date in the 2009-06-29 format
    POS - this is either just the phrase 'LONG' or 'SHORT'

    However, after I do this and try to view the data, the MATCH values come out perfectly fine. The date values come out 0000-00-00 for all the dates, so that is not correct. and for the POS, I get this error:
    mysql_exception s.OperationalEr ror: (1054, “Unkown column 'SHORT' in 'field list'”)

    I have tried changing the variable types from VARCHAR to TEXT or DATE, but that didn't seem to work. Really any suggestions would be greatly appreciated.

    The Code I am using is:
    Code:
    from xlrd import open_workbook, cellname
    import MySQLdb as mysql
    db=mysql.connect(db_info)
    c=db.cursor()
    c.execute('CREATE DATABASE actual_data')
    
    db=mysql.connect(db_info)
    c=db.cursor()
    c.execute('CREATE TABLE actual_data_table (MATCH_ID VARCHAR(35) NOT NULL,DATETIME VARCHAR(35) NOT NULL,POS_TYPE VARCHAR(35) NOT NULL)')
    
    file_to_import='actualdata.xls'
    column_count=5
    book=open_workbook(file_to_import)
    sheet=book.sheet_by_index(0)
    conn=mysql.connect(db_info)
    cursor=conn.cursor()
    
    for row_index in range(sheet.nrows):
    
    	row_num=row_index
    	MATCH_ID=sheet.cell(row_index,0).value
    	DATETIME=sheet.cell(row_index,1).value
    	POS_TYPE=sheet.cell(row_index,2).value
    	
    	cursor.execute('INSERT INTO actual_data_table(MATCH_ID, DATETIME, POS_TYPE) VALUES (%s, %s, %s)'%(MATCH_ID, DATETIME, POS_TYPE))
    	
    	(MATCH_ID,DATETIME,POS_TYPE)
    	
    
    cursor.close()
    conn.commit()
    conn.close()
    Last edited by bvdet; Aug 18 '10, 01:24 PM. Reason: Add code tags - please use tags around posted code: [code].....[/code]
  • dwblas
    Recognized Expert Contributor
    • May 2008
    • 626

    #2
    You first want to make sure that the data types are the same, so read the first 10 recs or so and print out the fields, and type(DATE), etc. and make sure they are strings. Second, I think the format for a MySQL insert is (note that you should be using "c" according to your code, and not "cursor")
    c.execute('inse rt into actual_data_tab le values ("%s","%s","%s" )' % (MATCH_ID, DATETIME, POS_TYPE))

    Another option just in case because I use another SQL and have both of these examples in the MySQL notes
    c.execute(u'''I NSERT INTO actual_data_tab le VALUES (%s, %s, %s)''', MATCH_ID, DATETIME, POS_TYPE)
    This avoids SQL injection AFAIK

    Finally, you have duplicate code. You can delete the first occurrence of
    Code:
    db=mysql.connect(db_info)
    c=db.cursor()
    c.execute('CREATE DATABASE actual_data')

    Comment

    • bvdet
      Recognized Expert Specialist
      • Oct 2006
      • 2851

      #3
      Excel stores dates as integers. When you read an actual date from an Excel file with xlrd, the date comes in as a float. It represents the number of days since Jan 1, 1900 (Windows). To input the proper information into your database, you can first convert the float into a date. This works for me on my system:
      Code:
      import time
      import datetime
      
      wb = xlrd.open_workbook('data.xls')
      sheet1 = wb.sheet_by_index(0)
      
      rowList = sheet1.row_values(1)
      dateStr = datetime.datetime(*time.strptime("Dec 30 1899", "%b %d %Y")[:6]) + \
      datetime.timedelta(days=int(rowList[1]))
      I used "Dec 30, 1899" instead of "Jan 1, 1900" because of a 2 day difference between expected results and actual results. It's possible Excel and Python calculate leap years differently.

      Comment

      • ronparker
        New Member
        • Aug 2010
        • 27

        #4
        importing excel document with dates into mysql using python

        Hello,
        I am attempting to import an excel document into a mysql database using python. The excel document has three columns of data: MATCH_ID, DATETIME, and POS_TYPE. Using the code I'll show at the end, I have successfully imported the excel document except for one problem. The column DATETIME has values in the format of 'yyyy-mm-dd'. Yet when I try the importing, all the values just come out as '0000-00-00'. When I tried importing this as a varchar file, all of the numbers that came out were just a large five-digit number. I;m thinking this may have something to do with the formating of the cells in excel. The other two values which are just a number string and a short word import perfectly. I have tried changing the variable type, but that was not helpful. Clearly form my code you can see I quite new at this. Your help will be very much appreciated.

        Code:
        from xlrd import open_workbook, cellname
        import MySQLdb as mysql
        db=mysql.connect(db_info)
        c=db.cursor()
        c.execute('CREATE DATABASE actual_data')
        
        db=mysql.connect(db_info)
        c=db.cursor()
        c.execute('CREATE TABLE actual_data_table (MATCH_ID VARCHAR(35) NOT NULL,DATETIME DATE NOT NULL,POS_TYPE VARCHAR(35) NOT NULL, TICKER VARCHAR(35) NOT NULL, EXIT_TYPE TEXT NOT NULL)')
        
        file_to_import='actualdata.xls'
        column_count=5
        book=open_workbook(file_to_import)
        sheet=book.sheet_by_index(0)
        db=mysql.connect(db_info)
        c=db.cursor()
        
        for row_index in range(sheet.nrows):
        
        	row_num=row_index
        	MATCH_ID=sheet.cell(row_index,0).value
        	DATETIME=sheet.cell(row_index,1).value
        	POS_TYPE=sheet.cell(row_index,2).value
        	c.execute('insert into actual_data_table values ("%s","%s","%s")' % (MATCH_ID, DATETIME, POS_TYPE))
        	(MATCH_ID,DATETIME,POS_TYPE)
        	
        cursor.close()
        conn.commit()
        conn.close()

        Comment

        • bvdet
          Recognized Expert Specialist
          • Oct 2006
          • 2851

          #5
          Ron,

          Please read my reply to your thread here.

          To summarize, Excel stores dates as a number and formats the number for display. The number represents the number of days since Jan 1, 1900 (Windows). You can use that number to calculate the date.

          bvdet

          Comment

          • ronparker
            New Member
            • Aug 2010
            • 27

            #6
            Sorry, I did not keep scrolling down to see the second part of your message. However, that was a LOT of help. THANK YOU!!

            However, I have another question that should be much quicker. Using the following code I selected 11 numbers.
            Code:
            c.execute('SELECT REF_PNL FROM actual_data_table WHERE REF_ENTRY_TIME>"2010-06-29 09:45:00" and REF_ENTRY_TIME<"2010-06-29 10:00:00"')
            result1=c.fetchall()
            when I try to output these results, they come out like this:
            >>>result1
            (('15.64',), ('-5.2',), ('24.25',), ('22.18',), ('14.52',), ('21.8',), ('21.2804',), ('-0.13',), ('15.48',), ('2.2',), ('-6.81',))
            This is fine, however, I can't seem to do any computations with these numbers. I'm trying to take the average of these numbers using numpy. But then I get this error:
            TypeError: cannot perform reduce with flexible type

            The result looks like a tuple within a tuple or something. So is there anyway to break the results down into regular computable numbers?
            Last edited by bvdet; Aug 20 '10, 08:25 PM. Reason: Fix code tags. Use a forward slash for the closing code tag.

            Comment

            • bvdet
              Recognized Expert Specialist
              • Oct 2006
              • 2851

              #7
              No problem:
              Code:
              >>> result
              (('15.64',), ('-5.2',), ('24.25',), ('22.18',), ('14.52',), ('21.8',), ('21.2804',), ('-0.13',), ('15.48',), ('2.2',), ('-6.81',))
              >>> for item in result:
              ... 	print float(item[0])
              ... 	
              15.64
              -5.2
              24.25
              22.18
              14.52
              21.8
              21.2804
              -0.13
              15.48
              2.2
              -6.81
              >>> sum([float(item[0]) for item in result])/len(result)
              11.382763636363636
              >>>

              Comment

              Working...