An Odd Delimiter

  • ScarletFox
    New Member
    • Feb 2008
    • 4

    An Odd Delimiter

    For some strange reason my file is delimited by a character that does not display here. I didn't choose the character, and it is, at the moment, unlikely to change. In Unix the character appears as ^[ inside a file, but any time I attempt to run a command on the line to split on the character, it deletes the character after the ^[ and fails to even find a ^[.

    A line appears as follows, with ^[ representing the character above.

    123456^[SomeString^[AnotherString^[0.00000 sec

    I am attempting to get the time before the sec, but so far my commands have failed because the character tends to delete the first digit before the decimal point.
    My string ends up as:

    123456omeString notherString.00 000 sec

    The strings may contain * characters, spaces, and underscores, which makes the line difficult to split, and I cannot retrieve the correct number of seconds.
    I have tried standard Unix awk, grep, and other commands.
    Attempts to use a Perl regular expression on just the ^ return no results.
    Code:
    @cutline = split(/\^/, $line, 5);
    Splitting on \[ also produces no results.

    If anyone can offer some help, it would be appreciated.
    Edit:
    (I've noticed the character doesn't appear at all in the forum either... how fun. It looks a bit like this: <- )
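One way to find out what an invisible delimiter actually is: print every byte of a line as a hex code. The sketch below is only an illustration. The helper name `hex_dump` and the sample line are hypothetical, and the sample assumes the delimiter is the ESC control character (0x1B), which vi conventionally displays as ^[. If that assumption holds, it would also explain why splitting on a literal ^ or [ finds nothing: the ^[ on screen is a rendering of one byte, not a caret followed by a bracket.

```perl
use strict;
use warnings;

# Return the characters of a string as space-separated hex codes,
# so that unprintable characters become visible.
sub hex_dump {
    my ($text) = @_;
    return join ' ', map { sprintf '%02X', ord($_) } split //, $text;
}

# Hypothetical sample: "\x1B" (ESC) stands in for the unknown delimiter.
my $line = "12\x1BAb\x1B0.5 sec";
print hex_dump($line), "\n";
```

In the dump, any code that does not belong to a letter, digit, or punctuation mark is a candidate for the delimiter; a 1B would confirm ESC.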
  • KevinADC
    Recognized Expert Specialist
    • Jan 2007
    • 4092

    #2
    Post some of the lines from the file or, better, attach some of them to a post.

    • ScarletFox
      New Member
      • Feb 2008
      • 4

      #3
      As requested, here are a couple of lines from the file as seen in WordPad:

      000000*XXX_XX XXX_XXXX0.3056 87 sec
      000003xxx*XXX X_XXXX_XXXX_XXX X0.046740 sec

      As seen in vi:
      000000^[*^[XXX_XXXXX_XXXX^[0.305687 sec
      000003^[xxx*^[XXXX_XXXX_XXXX_ XXXX^[0.046740

      Replace the X's with whatever you'd like; each field can be any length and can include spaces. I'm trying to retrieve the time in seconds, which can range from less than 1 second to more than 10, so I can't know exactly how many digits to retrieve before the space.

      Once again, the delimiter that is used (which shows in WordPad) will probably not be seen here. It is a left-pointing arrow, and I haven't been able to find its ASCII value yet.
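A minimal sketch of splitting on the delimiter directly, under the assumption that it is ESC (0x1B). That assumption rests on vi showing the character as ^[, its usual rendering of ESC; it has not been confirmed against the real file. The function name `time_from_record` is made up for illustration, and the sample lines are the vi lines above with ^[ written as "\x1B".

```perl
use strict;
use warnings;

# Split a record on ESC (0x1B) and return the number that starts the last field.
# Assumption: the delimiter is ESC. Returns undef when no number is found.
sub time_from_record {
    my ($line) = @_;
    my @fields = split /\x1B/, $line;
    return unless @fields;
    my ($seconds) = $fields[-1] =~ /^(\d+(?:\.\d+)?)/;
    return $seconds;
}

# Sample lines modelled on the vi output, with ^[ written as "\x1B".
my @lines = (
    "000000\x1B*\x1BXXX_XXXXX_XXXX\x1B0.305687 sec",
    "000003\x1Bxxx*\x1BXXXX_XXXX_XXXX_ XXXX\x1B0.046740",
);
print time_from_record($_), "\n" for @lines;
```

Because the split is on a single specific byte, the * characters, spaces, and underscores inside the fields do not interfere, and the length of the time value does not need to be known in advance.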

      • eWish
        Recognized Expert Contributor
        • Jul 2007
        • 973

        #4
        Since I cannot see the actual delimiter, here is an example using [ as the delimiter. The regex I am using is greedy and can be fine-tuned for your needs. Essentially, the regex looks for a delimiter that is immediately followed by a digit (in this data, the last delimiter), then captures everything from that digit to the end of the string.

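        [CODE=perl]use strict;
        use warnings;

        my @sec_array;
        # Sample data; a literal ^[ stands in for the real delimiter.
        my @data = ('000000^[some string here^[000.00 sec',
                    '000000^[some string here^[111.11 sec',
                    '000000^[some string here^[222.22 sec',
                    '000000^[some string here^[333.33 sec');

        for (@data) {
            # Capture from the first digit that directly follows a [ to the end of the line.
            push @sec_array, $_ =~ /\[(\d+.*)$/g;
        }

        print join("\n", @sec_array);[/CODE]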

        Prints
        000.00 sec
        111.11 sec
        222.22 sec
        333.33 sec

        --Kevin

        • ScarletFox
          New Member
          • Feb 2008
          • 4

          #5
          I have been able to get the sequence after a delimiter in the past. The problem in this case is that I cannot find a matching expression for the delimiter, and that doesn't seem possible at the moment (if I can't get the blasted thing to show). Is there a way to retrieve the time going backwards? What I would like is to start from the final space before 'sec' and, going in reverse, retrieve one or two digits, a '.', and then however many digits precede it. Or, if that isn't feasible, find the first digit of the time, then step back and pick up the double up to the space. I've been looking through my books and online for the subject, but keep coming up short. I appreciate the help.
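The "going backwards" idea can be sketched by anchoring the regex at the end of the line instead of at the delimiter, so the delimiter never has to be matched at all. This is only one possible reading of the request. The function name `seconds_at_end` is hypothetical, "\x1B" in the sample is a stand-in for the unknown delimiter, and the pattern assumes every line ends in a number followed by "sec".

```perl
use strict;
use warnings;

# Capture the number that sits immediately before a trailing "sec",
# ignoring whatever precedes it. Returns undef if the line does not end that way.
sub seconds_at_end {
    my ($line) = @_;
    my ($seconds) = $line =~ /(\d+(?:\.\d+)?)\s*sec\s*$/;
    return $seconds;
}

# "\x1B" is a stand-in for the unknown delimiter.
my $line = "000000\x1B*\x1BXXX_XXXXX_XXXX\x1B0.305687 sec";
print seconds_at_end($line), "\n";
```

The capture stops at the first character to the left of the number that is not a digit or part of the decimal, which is where the delimiter sits, so the width of the time value does not matter.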

          • eWish
            Recognized Expert Contributor
            • Jul 2007
            • 973

            #6
            Using the code I posted above, you can set \d (digits) to a minimum and maximum count if you wish.

            Code:
            excerpt from [URL=http://perldoc.perl.org/5.8.8/perlre.html]perlre[/URL]
            {n}    Match exactly n times
            {n,}   Match at least n times
            {n,m}  Match at least n but not more than m times
            As an example, if you change the regex above to the following, you can tell it the minimum and maximum number of numeric characters to allow.

            [CODE=perl]push @sec_array, $_ =~ /\[(\d{1,3}\.\d{2} ).*$/g;[/CODE]
            --Kevin
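To make the {n,m} quantifiers concrete, here is a small sketch applied to a time field shaped like the ones in the sample file, which carry six decimal places rather than two. The bounds (1 to 3 digits before the point, 2 to 6 after) and the helper name `looks_like_time` are assumptions chosen for illustration, not values taken from the real data.

```perl
use strict;
use warnings;

# Return 1 if the field is 1-3 digits, a dot, 2-6 digits, then " sec"; otherwise 0.
# The bounds are illustrative assumptions.
sub looks_like_time {
    my ($field) = @_;
    return $field =~ /^\d{1,3}\.\d{2,6} sec$/ ? 1 : 0;
}

for my $field ('0.305687 sec', '12.50 sec', '1234.50 sec', '12.5 sec') {
    printf "%-14s %s\n", $field, looks_like_time($field) ? 'match' : 'no match';
}
```

The first two fields fall within the bounds; '1234.50 sec' has too many digits before the point and '12.5 sec' too few after it, so neither matches.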

            • ScarletFox
              New Member
              • Feb 2008
              • 4

              #7
              Thanks for the help. I've found that the delimiter is detected if I just have it detect everything other than a character or digit. Your code came in handy. Thanks again.
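The exact expression used is not shown in the thread. One possible way to express "anything that is not an ordinary character" is to split on control characters with the POSIX class [[:cntrl:]]. The sketch below is a guess at the approach, not the poster's code; the function name `split_on_control` is hypothetical, and "\x1B" again stands in for the real delimiter.

```perl
use strict;
use warnings;

# Split a record on runs of control characters (ESC, tab, newline, and so on).
# Printable characters such as *, space, and underscore stay inside the fields.
sub split_on_control {
    my ($line) = @_;
    return split /[[:cntrl:]]+/, $line;
}

# "\x1B" stands in for the real delimiter.
my @fields = split_on_control("000003\x1Bxxx*\x1BXXXX_XXXX_XXXX_ XXXX\x1B0.046740 sec\n");
print "$_\n" for @fields;
```

Note that tabs and newlines are control characters too, so this only suits data where the fields themselves contain none.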
