Data Conversion?

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • jooon3
    New Member
    • Sep 2008
    • 8

    Data Conversion?

    Hi,

    I am responsible for a statistical analysis of a certain data for a research project. The program I want to run is matlab. So I want to make every cell to numbers but some exceptions with missing data.

    Assume I have the following form of data. (I made up this.)

    ID DATE X1 X2 X3 X4 X5 X6
    1 18SEP2008 5 3 4 -1 -2 A
    2 19SEP2008 9999 4 2 9 3 C
    3 16SEP2008 2 0 9999 -8 -7 B

    ....



    For X1 throught X3, 9999 means a missing data.
    For X4 and X5, any negative number is a missing data.

    I want to convert each date to the number representing days since Aug 10, 2008, for example.
    For X1 through X6, all the missing data would be converted to NaN, or anything that can be recognized as missing by matlab.
    For X6, A would be converted to 0 and B to 1, and C to 2. (the numbers don't matter as long as they are numbers distinguishing among different states A, B and so on.)

    In the end, I want the data to be reorganized as:

    ID DATE X1 X2 X3 X4 X5 X6
    1 39 5 3 4 NaN NaN 0
    2 40 NaN 4 2 9 3 2
    3 37 2 0 NaN NaN NaN 1

    ....

    (I may convert some data in matlab if it is more appropriate.)



    I am new to Access and so please understand if my question is too basic.
    If you can help me in whatever ways, please do that for me.


    Thank you,

    Joon
  • ADezii
    Recognized Expert Expert
    • Apr 2006
    • 8834

    #2
    Just subscribing, will come up with hopefully an approach within a day or two,

    Comment

    • youmike
      New Member
      • Mar 2008
      • 69

      #3
      It's not entirely clear what you need to do in detail, but as a starting point you should know that Access stores dates as numbers with the zero point being 31 Dec 1899. So, to get the number of days since your chosen start date, you simply need to subtract that date from each imported date. If you are doing this in a query, you will need to enclose the start date in hash signs - '#'. If you are trying to use Visual Basic, you should probably use the VBA function
      CDate(mm/dd/yyyy) to define your start date. The help files have a lot of guidance about how Access deals with dates. I don't know where you are, but you should probably be careful about the precise format of date you use. Because Access is US software, it is generally more comfortable with mm/dd/yy and you may very well use dd/mm/yy or even yy/mm/dd.

      Comment

      • ADezii
        Recognized Expert Expert
        • Apr 2006
        • 8834

        #4
        1. Let's make the simple assumption that your test data is stored in a Table named tblTestData. Original idea isn't it! (LOL).
        2. Create the following Table named tblFinalResults whose purpose is indicated by its Name. It will hold the converted Values suitable for MetLab. Define the following Fields:
          • [ID] - {LONG}
          • [DATE] - {INTEGER}
          • X1 - {TEXT}
          • X2 - {TEXT}
          • X3 - {TEXT}
          • X4 - {TEXT}
          • X5 - {TEXT}
          • X6 - {TEXT}
        3. Copy and Paste the following 'Public' Function into a Standard Code Module. Herein lies the logic which will perform the conversion based on your Specifications, then append the data to the tblFinalResults .
          Code:
          Public Function fConvertDataToMatLab()
          Dim MyDB As DAO.Database
          Dim rstTestData As DAO.Recordset
          Dim rstFinal As DAO.Recordset
          Dim lngID As Long
          Dim dteDate As Date
          Dim intCounter_1 As Integer
          Dim intDateDiff As Integer
          Dim varX1, varX2, varX3, varX4, varX5, varX6    'Variants
          Const dteBaseDate As Date = #8/10/2008#     'change if so desired
          
          Set MyDB = CurrentDb
          Set rstTestData = MyDB.OpenRecordset("tblTestData", dbOpenForwardOnly)
          Set rstFinal = MyDB.OpenRecordset("tblFinalResults", dbOpenDynaset)
          
          'DELETE any pre-existing results
          CurrentDb.Execute "Delete * From tblFinalResults", dbFailOnError
          
          With rstTestData
            Do While Not .EOF
              lngID = ![ID]
              'DDMMMYYYY is not a valid Date Format, but DD-MMM-YYYY is. We must convert to this
              'Format so that we can calculate the differential between this Data and a Base Date
              '(dteBaseDate), 8/10/2008 in this case
              dteDate = CDate(Left$(![Date], 2) & "-" & Mid$(![Date], 3, 3) & "-" & Right$(![Date], 4))
              intDateDiff = DateDiff("d", dteBaseDate, dteDate)
                If ![X1] = 9999 Then      '9999 indicates missing data, convert to NaN
                  varX1 = "NaN"
                Else
                  varX1 = ![X1]           'maintain existing value
                End If
                If ![X2] = 9999 Then      '9999 indicates missing data, convert to NaN
                  varX2 = "NaN"
                Else
                  varX2 = ![X2]           'maintain existing value
                End If
                If ![X3] = 9999 Then      '9999 indicates missing data, convert to NaN
                  varX3 = "NaN"
                Else
                  varX3 = ![X3]           'maintain existing value
                End If
                If ![X4] < 0 Then         'any Value < 0 is considered to be missing data,
                  varX4 = "NaN"           'convert to NaN
                Else
                  varX4 = ![X4]           'maintain existing value
                End If
                If ![X5] < 0 Then         'any Value < 0 is considered to be missing data,
                  varX5 = "NaN"           'convert to NaN
                Else
                  varX5 = ![X5]           'maintain existing value
                End If
                  varX6 = Asc(![X6]) - 65     'single characters A thru Z converted to
                rstFinal.AddNew               '0 to 25 respectively
                  rstFinal![ID] = ![ID]
                  rstFinal![Date] = intDateDiff
                  rstFinal![X1] = varX1
                  rstFinal![X2] = varX2
                  rstFinal![X3] = varX3
                  rstFinal![X4] = varX4
                  rstFinal![X5] = varX5
                  rstFinal![X6] = varX6
                rstFinal.Update
                .MoveNext
            Loop
          End With
          
          rstFinal.Close
          Set rstFinal = Nothing
          rstTestData.Close
          Set rstTestData = Nothing
          
          'Finally, let's see the results!
          DoCmd.OpenTable "tblFinalResults", acViewNormal, acReadOnly
          DoCmd.Maximize
          End Function
        4. Call the Function in any manner you so desire, I'll post the simplest:
          Code:
          Call fConvertDataToMatLab
        5. OUTPUT (tblFinalResult s):
          Code:
          ID	DATE	X1	X2	X3	X4	X5	X6
          1	39	5	3	4	NaN	NaN	0
          2	40	NaN	4	2	9	3	2
          3	37	2	0	NaN	NaN	NaN	1
        6. If you actually performed all the previous steps, shame on you! Just download the Test Database used for this Thread, I've made it available as an Attachment.
        7. Any questions, please feel free to ask. Either myself, or someone else, will be happy to answer them for you.

        Comment

        • jooon3
          New Member
          • Sep 2008
          • 8

          #5
          Thank you, Adezii.
          I really appreciate your help.

          Joon

          Comment

          • ADezii
            Recognized Expert Expert
            • Apr 2006
            • 8834

            #6
            Originally posted by jooon3
            Thank you, Adezii.
            I really appreciate your help.

            Joon
            You are quite welcome, Joon.

            Comment

            • Megalog
              Recognized Expert Contributor
              • Sep 2007
              • 378

              #7
              Nice conversion!

              But Joon should know that for the X6 conversion, it is case sensitive. 'A' will not result in the same number as 'a'. This would only be an issue I suppose if it's all being keyed in by hand, in which case a validation could be ran.

              Comment

              • ADezii
                Recognized Expert Expert
                • Apr 2006
                • 8834

                #8
                Originally posted by Megalog
                Nice conversion!

                But Joon should know that for the X6 conversion, it is case sensitive. 'A' will not result in the same number as 'a'. This would only be an issue I suppose if it's all being keyed in by hand, in which case a validation could be ran.
                Good point Megalog. I made the assumption, perhaps erroneously, that the data existing in the pre-converted Format would be in a rigid, consistent state. To allow for this possibility, all that would be needed is a simple code adjustment:
                Code:
                varX6 = Asc(UCase$(![X6])) - 65
                P.S. - The OP seemed to indicate that this value itself was not critical, only that numeric values generated from it would be 'Unique' for each Character and be able to be Grouped.

                Comment

                Working...