Extracting Relevant Data from a Block of Text Containing some Junk

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • ravnwing
    New Member
    • May 2014
    • 19

    Extracting Relevant Data from a Block of Text Containing some Junk

    Hello.
    For my job I get in txt files that look fairly well like gibberish if you don't know what you are looking for (and honestly a bit like gibberish even if you do.) My problem is that in the text files there are long repetitions of junk with relevant data in them. For example:

    Code:
    z260006U62425L                            U62425L                               
    z22000000000060000N# 000000000137000000000000137000000000006PzS000006000        
    z24JpPpN 6 JvWvL(S) QUpRTZ pNpL#5,z97Np xRpSS 5#Ly PLpTvy zpSv,z97Np STp7       
    z24NLvSS STvvL S#L7y 5#Ly PLpTvy STRpP                                          
    z24xy 0006 6 7601666060 0000006N#  00000000702 7901009033 00000000000 RP        
    z24xy 0006 1 7601666010 0000006N#  00000006055 7901009033 00000000000 zN        
    z24xy 0006 2 7601666020 0000006N#  00000000394 7901009033 00000000000 zN        
    z24xy 0006 5 7601666050 0000006N#  00000000014 7901009033 00000000000 RP
    And then patterns like this repeats 80 times or so, depending on the order it comes from. But in each of those blocks there is relevant information that I would like to keep
    What I would like to do is set it up so that a program or macro sifts through the blocks of text and puts the relevant information into the appropriate fields in a table in Microsoft Access. I know what the information is, and where it is, line and character. I just need a way to tell the program to ignore this set number of characters, copy these ones, then jump to the next line and continue. Does anyone have any idea how this can be done?

    Please get back to me asap. Thank you so much.
    Last edited by zmbd; May 12 '14, 06:26 PM. Reason: [Rabbit{Please use [code] and [/code] tags when posting code or formatted data.}][z{Oh, there's a format to that datadump?!(^_^) LOL}]
  • Seth Schrock
    Recognized Expert Specialist
    • Dec 2010
    • 2965

    #2
    It looks like you might be able to import the data from your text file and use the space as the field delimiter for those last lines, but as you said it all looks like gibberish, so I can't be sure. What exactly are you wanting it to look like once it has been processed?

    Comment

    • ravnwing
      New Member
      • May 2014
      • 19

      #3
      okay, so this is an image of the block of text with the fields I want to save for the access table highlighted. You are right about the spaces for the most part, but the second line is where that doesn't really work. Any ideas?

      Last edited by zmbd; May 12 '14, 04:40 PM. Reason: [z[third party sites are generally blocked for our experts, Please use the [advanced] posting option}]

      Comment

      • zmbd
        Recognized Expert Moderator Expert
        • Mar 2012
        • 5501

        #4
        1) You're going to have to explain just exactly what/where the data is coming from.

        2) in this mess:
        Code:
        z260006U62425L U62425L 
        z22000000000060000N# 000000000137000000000000137000000000006PzS00000600 0 
        z24JpPpN 6 JvWvL(S) QUpRTZ pNpL#5,z97Np xRpSS 5#Ly PLpTvy zpSv,z97Np STp7 
        z24NLvSS STvvL S#L7y 5#Ly PLpTvy STRpP
        <<snip>>
        Just what exactly it is that you're after

        3) Most of the Experts here work in secure environments. Thus, the third party storage sites are blocked. Furthermore, using third part site will invalidate the thread should it ever go down, or you decide you need the space in the account (^_^) please use the [advanced] posting option

        >HOWEVER<
        Please keep in mind, images/files are rarely needed if the question is well explained to begin with....
        Last edited by zmbd; May 12 '14, 06:26 PM. Reason: [z{putting in code tags as Rabbit did in OP}]

        Comment

        • Seth Schrock
          Recognized Expert Specialist
          • Dec 2010
          • 2965

          #5
          It sounds like you are going to need to read the file line by line into Access (See VBA Standard Text File I/O Statements) and then you will just have to know which line you are on (using a loop most likely) and then you can use a combination of the Left(), Mid(), and Right() functions to get what you need.

          Comment

          • NeoPa
            Recognized Expert Moderator MVP
            • Oct 2006
            • 32645

            #6
            Without a clearly-stated set of rules that can be used to determine what is junk and what is relevant data this question cannot be answered sensibly. There are various techniques that can be used in different circumstances. Without knowing the circumstances then anything we suggest is as likely to be misleading as it is to be helpful.

            Remember, whatever is designed into the program won't have someone there making an intelligent choice based on what you understand. Your understanding will have to be incorporated into the code. To help with that we have to have that understanding specified at the very least.

            Comment

            • ravnwing
              New Member
              • May 2014
              • 19

              #7
              I'm sorry, I didnt know how to do the advanced options. This is the first time I am using this site, as well as just beginning to teach myself access and vba.

              Most lines in this has some relevant information. The first line, for instance, I need characters 7-14. On the second I need 4-13. I think what Seth had to say in terms of using the left, right, and mid functions was a good one to pick out the relevant data. My question then becomes how do I take the data that I have picked out and put into variables in vba and move them into named fields in an access table?

              going with what I said before, the first set of numbers (ln1, chr 7-14) is an id number, and the second set of numbers (ln2, chr 4-13) is a quantity. Provided I can use a macro to start a new record, how would I then use visual basic to move those values, ID and quantity, into their appropriate fields?

              Comment

              • Seth Schrock
                Recognized Expert Specialist
                • Dec 2010
                • 2965

                #8
                In your OP, you mention that the pattern repeats about 80 times. Is the pattern the whole thing or just lines 5 through 8? If the whole thing, is it always an 8 line pattern?

                Comment

                • ravnwing
                  New Member
                  • May 2014
                  • 19

                  #9
                  The pattern is the whole thing, not just the repeated lines. And unfortunately it is not always an 8 line pattern, occasionally there is no line 4. But the relevant data that starts in line 5 always has "z24xy", so once I hit line line 4 I can check if the line begins with z24xy, and if it does not then just move to the next line and begin collecting the data from there. Lines 3, and in this case 4, have no relevant information.
                  Last edited by zmbd; May 13 '14, 06:10 PM. Reason: [z{check your Bytes.com inbox (^_^) }]

                  Comment

                  • zmbd
                    Recognized Expert Moderator Expert
                    • Mar 2012
                    • 5501

                    #10
                    As Seth said, follow the link to the IO information
                    You'll need to read in each line
                    From your very small dataset
                    I'd use the split() function to parse the string into an array based on spaces.
                    Then check each cell for the required data or starting string.
                    Depending how many records you'r dealing with would depend on if I would use an sql-execute method or open a recordset to push the data to table.

                    Comment

                    • ravnwing
                      New Member
                      • May 2014
                      • 19

                      #11
                      The problem is that I have no idea what I'm doing. I have spent, literally, the last 6 hours, and at least that much time yesterday trying to teach myself how to do what I need to do and I'm still totally lost.

                      The I/O information is not really built well to be a beginners understanding of VBA, I think its more meant to teach people who already know how to use VBA well how to do something new. The FreeFile information has no context to me at all, the open command makes sense to me, but then the input line code has a command and two parameters, but it needs three: a file string being accessed, a line being retrieved and a place to put the line. And then when I am parsing out the information that I need from the line that I have retrieved, I would have to understand how the mid knows what line to pull strings from. The fields that I need are these:

                      Field 1: ln1, mid(8, 7)
                      Field 2: ln2, mid(4, 11)
                      Field 3: ln2, mid(22, 13)
                      Field 4: ln2, mid(35, 15)
                      Field 5: ln5, mid(14, 8)
                      Field 6: ln5, mid(36, 11)
                      Field 7: ln6, mid(36, 11)
                      Field 8: ln7, mid(36, 11)
                      Field 9: ln8, mid(36, 11)

                      And what is most important is that I need to understand, once I have all of these into separate variables or more likely an array of variables, how do I use VBA WITHIN ACCESS to put all of these fields into a record and then start a new record so that I can begin again. I have some experience with other languages, so once I understand the logic behind doing that much I can write the code to put it into a loop to put in all the data until the file is done. I just dont understand how to work the code in this setting; using vba within access.

                      I hope that makes at least some more sense.

                      Comment

                      • zmbd
                        Recognized Expert Moderator Expert
                        • Mar 2012
                        • 5501

                        #12
                        The I/O information is not really built well to be a beginners understanding of VBA, I think its more meant to teach people who already know how to use VBA well how to do something new.
                        Agreed, it's not intended to be a tutorial in the basics of VBA script writing. Instead, it covers an often overlooked set of tools native to the VBA/BASIC language.

                        So let's throw a little context for you:

                        I'll assume that you know some basics (punny (^o^) ) about the editor and so forth; however, I know by default certain options are not set in Access so if you will read thru [*]> Before Posting (VBA or SQL) Code
                        >> Now I'm not after the forum rules here, what I'm after are the Options and the basic trouble shooting steps for later<<
                        Keep this link handy for when we do the debug\compile stuff.

                        So you have a text file that we need to open, read, and parse
                        Using the IO
                        We need to find first free pointer that we can assign our file to so we use the function FreeFile():
                        intPointer = FreeFile

                        More than likely we could simply use #1 however best practice is to use freefile to return the next available pointer.

                        So we have this number "intPointer " and we're going to assign the path of the text file that we want to parse to this value, and in this case it doesn't sound like we need to alter the file so I'll just open it to for reading (Output):
                        Open "C:\data\in.txt " For Output As intPointer
                        See that AS and intPointer, so for now on when using the IO functions we're coing to use "intPointer," just like we use "Me." for forms, to refer to the file located at "C:\data\in.txt "

                        So great, we have the file open (we hope and should test for but not right now - indeed, we should have checked to see if the file was available before trying to open it).

                        Now the file is open as i lets upull the data, taking from yhour posted data OP.
                        Line Input #intPointer, strReadLineIn
                        therefor:
                        strReadLineIn= "z260006U62 425L U62425L "
                        Now here's where things get funn... what to do with this...
                        I use the Split() or midstring to parse the infromation and then you will need to save it to a table.

                        Most likely you'll do this in a loop until you reach the endoffill (EOF)

                        You now need to close the file, a Close intPointer

                        Write your intial code to at least open and read the file.
                        debug.print strReadLineIN will print the line to the immediate window (press <ctrl><g> to open) so you can see how your file is being opened and parsed. Anyway, debug and run your code... let us know how things progress and we'll take the next step.

                        If you are truely stuck, even after he basic debugging, then post your work as described in the link above (and here)> Before Posting (VBA or SQL) Code and we'll tweek from there.
                        Last edited by zmbd; May 13 '14, 11:27 PM. Reason: [z{opps typo on line input}]

                        Comment

                        • ravnwing
                          New Member
                          • May 2014
                          • 19

                          #13
                          I'm not sure what isnt working, but it certainly doesnt like something. When I run it this way, it doesnt like the double "line input" that was recommended. The debugger tells me that it was expecting a #, so I guess it thought that the second line input was supposed to be the intpointer, but then when I take it away it gives me runtime error 54, the help of which is perfectly unhelpful.

                          I dont know if it matters, but as I said before, I'm running the VBA out of access. Any recommendations ?

                          Code:
                          Private Sub text_reader()
                          
                              Dim strLineIn
                          
                              intPointer = FreeFile
                              
                              Open "H:\test.txt" For Output As intPointer
                          
                              Line Input Line Input #intPointer, strLineIn, strLineIn
                              
                              Close intPointer
                              
                              Debug.Print strLineIn
                              
                          
                          
                          End Sub

                          Comment

                          • ravnwing
                            New Member
                            • May 2014
                            • 19

                            #14
                            and yes, I know that I didnt go on to parse the line, but I thought I would test that it could read the file first.

                            Comment

                            • zmbd
                              Recognized Expert Moderator Expert
                              • Mar 2012
                              • 5501

                              #15
                              Sorry, I had a typo in that post.

                              Try the following:

                              Code:
                              Private Sub text_reader()
                                  Dim intPointer As Integer
                                  Dim strLineIn As String
                              '
                              'Find the first free pointer
                                  intPointer = FreeFile
                              '
                              'open the text file for read only
                                  
                                  Open "H:\test.txt" For Input Access Read As intPointer
                              '
                              'While not the end of the file, read the line and
                              'and print in the debug window
                                  Do While Not EOF(intPointer)
                                      Line Input #intPointer, strLineIn
                                      Debug.Print strLineIn
                                  Loop
                              '
                              'close the file
                                  Close intPointer
                              End Sub
                              Note the difference in the variable casting... as a specific type of data
                              Last edited by zmbd; May 14 '14, 07:56 PM. Reason: z{fixed typo}]

                              Comment

                              Working...