Skipping null/empty fields caught by split()

  • sicarie
    Recognized Expert Specialist
    • Nov 2006
    • 4677

    Skipping null/empty fields caught by split()

    I am trying to parse a CSV file, but I'm not allowed to install a CSV parsing module for "security reasons" (what a joke), so I'm using 'split' to break up a comma-delimited file.

    My issue is that as soon as an "empty" field comes up (two commas in a row), split seems to think the line is done and goes to the next one.

    Everything I've read online says that split will return a null field, but I don't know how to get it to go to the next element and not just skip to the next line.

    [code=perl]
    while (<INFILE>) {
        # use 'split' to avoid module-dependent functionality
        # split line on commas, OS info in [3] (4th group, but
        # counting starts first element at 0)

        # line = <textonly>,<text+num>,<ip>,<whatIwant>,
        chomp;
        my @a_splitLine = split /,/, $_;

        # move OS info out of string to avoid accidentally
        # parsing over stuff
        my $s_info = $a_splitLine[3];
    }
    [/code]

    Could anyone see either a better way to accomplish what I'm trying to do, or help get split to capture all the elements?

    I was thinking I could substitute a known string for the empty fields before parsing (something ridiculous that'll never show up in my data, like &^%$#), then split, and then when printing, print some sort of whitespace wherever the current item matches that string. That doesn't sound like the best method to me, though. It feels like I'm overcomplicating it.
  • RonB
    Recognized Expert Contributor
    • Jun 2009
    • 589

    #2
    "My issue is that as soon as an "empty" field comes up (two commas in a row), split seems to think the line is done and goes to the next one."
    No it doesn't. You have a flawed impression of what's happening.

    Code:
    C:\TEMP>type test.pl
    #!/usr/bin/perl
    
    use strict;
    use warnings;
    use Data::Dumper;
    
    my $str = 'a,,,b,,,,6,,';
    my @fields = split /,/, $str;
    print Dumper @fields;
    Code:
    C:\TEMP>test.pl
    $VAR1 = 'a';
    $VAR2 = '';
    $VAR3 = '';
    $VAR4 = 'b';
    $VAR5 = '';
    $VAR6 = '';
    $VAR7 = '';
    $VAR8 = '6';
    Code:
    C:\TEMP>perldoc -f split
        split /PATTERN/,EXPR,LIMIT
        split /PATTERN/,EXPR
        split /PATTERN/
        split   Splits the string EXPR into a list of strings and returns that
                list. [b]By default, empty leading fields are preserved, and empty
                trailing ones are deleted.[/b] (If all fields are empty, they are
                considered to be trailing.)
    ....
    ....
    ....
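
The perldoc excerpt above says that empty trailing fields are deleted by default. As a minimal sketch, reusing the sample string from test.pl, passing a negative LIMIT as the third argument to split keeps those trailing fields. The variable names @default and @all are only illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $str = 'a,,,b,,,,6,,';

# Default behaviour: the two empty fields after '6' are dropped.
my @default = split /,/, $str;

# Negative LIMIT: every field is returned, including trailing empty ones.
my @all = split /,/, $str, -1;

print scalar(@default), " fields by default\n";
print scalar(@all), " fields with a LIMIT of -1\n";
```

This matters when the column count has to stay the same, for example when a line is joined back together after one field has been changed.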


    • sicarie
      Recognized Expert Specialist
      • Nov 2006
      • 4677

      #3
      Interesting, so then how would I access the b or the 6?

      [code=perl]
      #!/bin/perl

      use strict;
      use warnings;
      use Data::Dumper;

      my $str = 'a,,,b,,,,6,,';
      my @fields = split /,/, $str;
      my $n = 0;
      print Dumper @fields;
      while ($fields[$n]) {
      print "$n: $fields[$n]\n";
      $n++;
      }
      print "done!\n";
      [/code]
      [code=shell]
      $ ./splitTest.pl
      $VAR1 = 'a';
      $VAR2 = '';
      $VAR3 = '';
      $VAR4 = 'b';
      $VAR5 = '';
      $VAR6 = '';
      $VAR7 = '';
      $VAR8 = '6';
      0: a
      done!
      [/code]

      In the above, my attempt to print with a while loop stops as soon as the first empty field is reached. I'm guessing I'd have to check each one to see which are valid and which are not, but what am I looking for - null?
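
The loop above stops at index 1 because an empty string is defined but false in Perl, so while ($fields[$n]) cannot tell an empty field from the end of the array. The sketch below, which assumes the same sample string and uses an illustrative @report array, prints how each field looks to defined, a plain truth test, and length.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @fields = split /,/, 'a,,,b,,,,6,,';

# For each field, record whether it is defined, whether it is true,
# and how long it is. Empty fields are defined, false, and of length 0.
my @report;
for my $n (0 .. $#fields) {
    my $value = $fields[$n];
    push @report, sprintf(
        "%d: defined=%s true=%s length=%d",
        $n,
        (defined $value ? 'yes' : 'no'),
        ($value ? 'yes' : 'no'),
        length($value),
    );
}

print "$_\n" for @report;
```

A field containing just 0 would also be false, which is another reason to test with length (or defined) rather than the value's truth.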


      • RonB
        Recognized Expert Contributor
        • Jun 2009
        • 589

        #4
        If you know which field/index you want, then simply print that field.

        If you want/need to loop over the array elements, then use a for or foreach loop, not a while loop.
        Code:
        for my $i ( 0..$#fields ) {
            # only print fields that have a value
            print "index $i = '$fields[$i]'\n" if length $fields[$i];
        }


        • numberwhun
          Recognized Expert Moderator Specialist
          • May 2007
          • 3467

          #5
          I have to agree with Ron. Since this is a CSV file, you should already know which field is which, so all you have to do is reference it by its index. Otherwise, you can use the code above to iterate through the fields and pull out the ones that are not empty.

          Regards,

          Jeff


          • sicarie
            Recognized Expert Specialist
            • Nov 2006
            • 4677

            #6
            Cool, thanks. I am really only interested in one of those fields, but then have to make sure once I edit that field, I re-append all the others back on, so I will play around with that.

            Thanks again!
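
Editing one field and putting the others back can be sketched as split, modify, join. This is only an illustration: the sample line is hypothetical (made up to match the layout in post #1), and uc stands in for whatever the real edit is. The -1 LIMIT is used so the trailing empty field survives the round trip.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample line: <textonly>,<text+num>,<ip>,<whatIwant>,
my $line = 'host,web01,,Linux 2.6,';

# -1 keeps trailing empty fields so the column count is unchanged.
my @fields = split /,/, $line, -1;

# Stand-in for the real edit of the 4th field (index 3).
$fields[3] = uc $fields[3];

# Re-assemble the line with the same delimiter.
my $rebuilt = join ',', @fields;

print "$rebuilt\n";
```

Like any plain split on commas, this does not handle quoted fields that contain commas themselves.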
