Complex regular expression?

  • M Wells

    Complex regular expression?

    Hi All,

    I couldn't find a regular expressions group to ask this in, so I
    thought I'd ask here as I'm a little familiar with php's regular
    expressions syntax.

    I have a comma delimited text file that I need to change to being tab
    delimited.

    My problem is that commas appear in the values of one of my columns,
    and I'm trying to think of a graceful way of changing the other commas
    (ie those that do indicate the delimitation of a field, rather than
    which appear within the value of a field) in the file to tabs without
    affecting the commas that appear in the column in question.

    An example of the contents of the file would be:

    1,"1","20040301","08-08","BOOK, RETAIL",20.00,23.56
    2,"1","20040301","03-09","BOOK, WHOLESALE, DISTRIBUTOR",15.99,22.00

    So, I'm trying to create a regular expression that will change all the
    commas to tabs, except where the comma(s) appear within quotes.

    I've tried several different approaches, including a three-step
    process where I just change the commas that appear within quotes to a
    known 'escape' value, then changing all the commas to tabs, then
    changing the 'escape' values back to commas, but I can't seem to
    create a regular expression that will take into account the
    possibility of several commas appearing between quotes.

    I'm wondering if anyone can help me understand this better?

    Many thanks in advance,

    Murray
  • Chung Leong

    #2
    Re: Complex regular expression?

    "M Wells" <planetquirky@planetthoughtful.org> wrote in message
    news:oaba701duhblolu9qknrf5176svlltfqko@4ax.com...
    > Hi All,
    >
    > I couldn't find a regular expressions group to ask this in, so I
    > thought I'd ask here as I'm a little familiar with php's regular
    > expressions syntax.
    >
    > I have a comma delimited text file that I need to change to being tab
    > delimited.
    >
    > My problem is that commas appear in the values of one of my columns,
    > and I'm trying to think of a graceful way of changing the other commas
    > (ie those that do indicate the delimitation of a field, rather than
    > which appear within the value of a field) in the file to tabs without
    > affecting the commas that appear in the column in question.
    >
    > An example of the contents of the file would be:
    >
    > 1,"1","20040301","08-08","BOOK, RETAIL",20.00,23.56
    > 2,"1","20040301","03-09","BOOK, WHOLESALE, DISTRIBUTOR",15.99,22.00
    >
    > So, I'm trying to create a regular expression that will change all the
    > commas to tabs, except where the comma(s) appear within quotes.
    >
    > I've tried several different approaches, including a three-step
    > process where I just change the commas that appear within quotes to a
    > known 'escape' value, then changing all the commas to tabs, then
    > changing the 'escape' values back to commas, but I can't seem to
    > create a regular expression that will take into account the
    > possibility of several commas appearing between quotes.

    A not so elegant way:

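    function to_tab($matches) {
        // Group 1 is the unquoted text: turn its commas into tabs.
        // Group 2 is the quoted field that follows: append it unchanged.
        return strtr($matches[1], ",", "\t") . $matches[2];
    }

    $r = preg_replace_callback('/([^"]*)("?[^"]*"?)/', 'to_tab', $s);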



    • Johannes Müller

      #3
      Re: Complex regular expression?

      M Wells wrote:

      > Hi All,
      >
      > I couldn't find a regular expressions group to ask this in, so I
      > thought I'd ask here as I'm a little familiar with php's regular
      > expressions syntax.
      >
      > I have a comma delimited text file that I need to change to being tab
      > delimited.
      >
      > My problem is that commas appear in the values of one of my columns,
      > and I'm trying to think of a graceful way of changing the other
      > commas (ie those that do indicate the delimitation of a field,
      > rather than which appear within the value of a field) in the file
      > to tabs without affecting the commas that appear in the column in
      > question.
      >
      > An example of the contents of the file would be:
      >
      > 1,"1","20040301","08-08","BOOK, RETAIL",20.00,23.56
      > 2,"1","20040301","03-09","BOOK, WHOLESALE, DISTRIBUTOR",15.99,22.00
      >
      > So, I'm trying to create a regular expression that will change all
      > the commas to tabs, except where the comma(s) appear within quotes.
      >
      > I've tried several different approaches, including a three-step
      > process where I just change the commas that appear within quotes to a
      > known 'escape' value, then changing all the commas to tabs, then
      > changing the 'escape' values back to commas, but I can't seem to
      > create a regular expression that will take into account the
      > possibility of several commas appearing between quotes.
      >
      > I'm wondering if anyone can help me understand this better?
      >
      > Many thanks in advance,
      >
      > Murray

      Another way to solve this problem is to replace only the commas that are
      NOT followed by whitespace. This works only if you can be sure that commas
      inside quotes are always followed by a space:

      $new_string = preg_replace('/,(\S)/', "\t$1", $string);
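If the space-after-comma assumption cannot be guaranteed, one alternative is a lookahead that treats a comma as a delimiter only when the rest of the line holds an even number of double quotes. This is only a sketch: the helper name delimiter_commas_to_tabs() is made up for illustration, and it assumes each record sits on one line with balanced quotes.

```php
<?php
// Hypothetical helper, not from the thread.
// A comma counts as a delimiter only when an even number of double quotes
// follows it on the line, i.e. when it is not inside a quoted field.
function delimiter_commas_to_tabs(string $line): string
{
    return preg_replace('/,(?=(?:[^"]*"[^"]*")*[^"]*$)/', "\t", $line);
}

// The quoted field here has commas with no space after them.
echo delimiter_commas_to_tabs('2,"1","03-09","BOOK,WHOLESALE,DISTRIBUTOR",15.99'), "\n";
```

The lookahead rescans the remainder of the line for every comma, so this is better suited to short records than to very long lines.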

      *Hannes*


      • Chris Hope

        #4
        Re: Complex regular expression?

        M Wells wrote:

        > I have a comma delimited text file that I need to change to being tab
        > delimited.
        >
        > My problem is that commas appear in the values of one of my columns,
        > and I'm trying to think of a graceful way of changing the other commas
        > (ie those that do indicate the delimitation of a field, rather than
        > which appear within the value of a field) in the file to tabs without
        > affecting the commas that appear in the column in question.

        fgetcsv(): Gets line from file pointer and parse for CSV fields

        If you have commas inside the quoted fields, this function takes care of
        them for you. You can also specify the delimiter (e.g. tab or comma).
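The description above is the manual summary for PHP's fgetcsv(). A minimal sketch of how it might be used for this conversion follows. The helper name csv_to_tsv() and the in-memory stream are illustrative assumptions; with a real file you would pass the handle returned by fopen() on that file instead.

```php
<?php
// Hypothetical helper, not from the thread: CSV text in, tab-delimited text out.
function csv_to_tsv(string $csv): string
{
    // fgetcsv() reads from a stream, so place the text in an in-memory one.
    $in = fopen('php://memory', 'w+');
    fwrite($in, $csv);
    rewind($in);

    $lines = [];
    // Arguments: stream, max line length (0 = no limit), delimiter, enclosure, escape.
    while (($fields = fgetcsv($in, 0, ',', '"', '\\')) !== false) {
        if ($fields === [null]) {
            continue; // fgetcsv() reports a blank line as an array holding one null
        }
        // fgetcsv() strips the enclosing quotes; commas inside a field survive.
        $lines[] = implode("\t", $fields);
    }
    fclose($in);

    return implode("\n", $lines);
}

$csv = "1,\"1\",\"20040301\",\"08-08\",\"BOOK, RETAIL\",20.00,23.56\n"
     . "2,\"1\",\"20040301\",\"03-09\",\"BOOK, WHOLESALE, DISTRIBUTOR\",15.99,22.00\n";
echo csv_to_tsv($csv), "\n";
```

Note that the output fields are no longer quoted, which is fine as long as no field contains a tab itself.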

        Chris

        --
        Chris Hope
        The Electric Toolbox Ltd


        • Garp

          #5
          Re: Complex regular expression?


          "M Wells" <planetquirky@planetthoughtful.org> wrote in message
          news:oaba701duhblolu9qknrf5176svlltfqko@4ax.com...
          > Hi All,
          >
          > I couldn't find a regular expressions group to ask this in, so I
          > thought I'd ask here as I'm a little familiar with php's regular
          > expressions syntax.
          >
          > I have a comma delimited text file that I need to change to being tab
          > delimited.
          >
          > My problem is that commas appear in the values of one of my columns,
          > and I'm trying to think of a graceful way of changing the other commas
          > (ie those that do indicate the delimitation of a field, rather than
          > which appear within the value of a field) in the file to tabs without
          > affecting the commas that appear in the column in question.
          >
          > An example of the contents of the file would be:
          >
          > 1,"1","20040301","08-08","BOOK, RETAIL",20.00,23.56
          > 2,"1","20040301","03-09","BOOK, WHOLESALE, DISTRIBUTOR",15.99,22.00
          >
          > So, I'm trying to create a regular expression that will change all the
          > commas to tabs, except where the comma(s) appear within quotes.
          >
          > I've tried several different approaches, including a three-step
          > process where I just change the commas that appear within quotes to a
          > known 'escape' value, then changing all the commas to tabs, then
          > changing the 'escape' values back to commas, but I can't seem to
          > create a regular expression that will take into account the
          > possibility of several commas appearing between quotes.
          >
          > I'm wondering if anyone can help me understand this better?
          >
          > Many thanks in advance,
          >
          > Murray

          Had a bit of a tinker, came up with this:

          <?php
          $x = '1,2,3,"some text in quotes",4,5,"some more, this time with a comma"';
          preg_match_all('/(".*?")/', $x, $r);

          $r[0] now looks like this:
          Array
          (
              [0] => "some text in quotes"
              [1] => "some more, this time with a comma"
          )

          As you can see, the non-greediness of the regexp is the key. Run
          your original line through substr_replace() to replace these strings
          with tokens, and resume where you left off.
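The three-step idea can be sketched roughly as follows. This version swaps in preg_replace_callback() and str_replace() rather than substr_replace(), and both the helper name csv_commas_to_tabs() and the "\x01" token are assumptions chosen for illustration. The token must be a character that never occurs in the data.

```php
<?php
// Hypothetical helper, not from the thread.
function csv_commas_to_tabs(string $line, string $token = "\x01"): string
{
    // Step 1: inside each quoted run (non-greedy match), swap commas for the token.
    $protected = preg_replace_callback(
        '/".*?"/',
        function (array $m) use ($token): string {
            return str_replace(',', $token, $m[0]);
        },
        $line
    );

    // Step 2: every comma that is left is a field delimiter.
    $tabbed = str_replace(',', "\t", $protected);

    // Step 3: put the protected commas back.
    return str_replace($token, ',', $tabbed);
}

echo csv_commas_to_tabs('1,"1","20040301","08-08","BOOK, RETAIL",20.00,23.56'), "\n";
```

Because the callback receives the whole quoted run, any number of commas between a pair of quotes is handled in one pass. The quotes themselves are kept in the output.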

          HTH
          Garp


