Processing a Flat File

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • larre
    New Member
    • Apr 2007
    • 3

    Processing a Flat File

    Hi guys, i have a flat file, that i would like to populate into a database. The problem is that the file format is not standard, each record differs from the other depending on the fields available. I have tried using some ETL tools but they usually expect a file in a specific format.

    The file has a format similar to:

    Variable 1=Blah|Variable 2=2|Variable3=w oot|Variable4=h oop
    Variable 1=Blah|Variable 2=2|Variable4=h oop|Variable5=H ive
    Variable 1=Blah|Variable 2=2|Variable3=w oot

    The portion before the = indicates which variable it is and the order in which they appear cannot be guaranteed.

    Any idea how to get this done? Im sure this can be done with a scripting language like perl but dont have much experience in perl programming so if you could point me in the right direction it would be appreciated.
  • KevinADC
    Recognized Expert Specialist
    • Jan 2007
    • 4092

    #2
    it looks like | is the field delimiter so you would use the split function to return an array of the fields:

    Code:
    my @array = split(/\|/);

    Comment

    • larre
      New Member
      • Apr 2007
      • 3

      #3
      Originally posted by KevinADC
      it looks like | is the field delimiter so you would use the split function to return an array of the fields:

      Code:
      my @array = split(/\|/);

      Thanks i got it working now.

      Comment

      • larre
        New Member
        • Apr 2007
        • 3

        #4
        Thanks i got it working now, but im facing another issue with Regex

        as stated the file is in the format |field=blah|fie ld2=|field3=3 and so on.
        So the following is the file format

        1. the variables are not in any specific order, (The First Field is always the same though, but thats not required)

        2. The field could be in the record with a Null value (field1=xxx|fie ld2=|field3=yyy )

        3. The variable could be totally missing from the record so |field2=| would not even be there. (Field1=xxx|fie ld3=yyyy)

        I have the following code,

        Code:
        if ($a =~ /\|(BAL=.+?)\|/i )
        				{print OUTFILE "$1 \|";
        				} 
        				else {
        				print OUTFILE "BAL=  \| ";
        				}
        And so on...

        The code above returns the correct value where the field has a value (BAL=xxxx)
        or if it's not even in the record, But when i have the scenerio where it exists in the record but with null i.e (|BAL=|) it returns (|BAL=|) as well as the following fields in the record upto the next field with a value i.e something=xxx|.

        This is throwing off my formating as i need a standardised format to load into MYSQL.

        How do i get around this issue and ensure that even if there is nothing it just returns the BAL=| for all fields?

        Comment

        • KevinADC
          Recognized Expert Specialist
          • Jan 2007
          • 4092

          #5
          avoid using $a and $b in your code except for using with sort(), they are special variables that perl uses internally for sorting lists. Your regexp is probably failing becuase you have use the '+' (one or more) quantifier instead of '*' (zero or more):

          Code:
          if ($n =~ /\|(BAL=.*?)\|/i )
          but you may need to quantify the last pipe '|' if there is a chance it's not present:

          Code:
          if ($a =~ /\|(BAL=.+?)\|?/i )

          the '?' in that context means zero or one

          Comment

          Working...