Parsing challenge...

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Artco News

    Parsing challenge...

    I thought I ask the scripting guru about the following.

    I have a file containing records of data with the following format(first
    column is the label):

    CODE#1^DESCRIPT ION^CODE#2^NOTE S
    NN-110^an info of NN-001^BRY234^some notes
    NN-111^1st line data
    2nd line data
    3rd line data^BRT345^ano ther notes
    NN-112^description of NN-112^BBC23^multi line
    notes blah
    blah
    blah
    NN-113^info info^MNO12^some notes here

    How do I parse so I can insert them in the database, e.g. MySQL/Access?

    Perhaps there are an advanced scripting language can do this easily.

    Thanks


  • Ara.T.Howard

    #2
    Re: Parsing challenge...

    On Tue, 7 Oct 2003, Artco News wrote:
    [color=blue]
    > I thought I ask the scripting guru about the following.
    >
    > I have a file containing records of data with the following format(first
    > column is the label):
    >
    > CODE#1^DESCRIPT ION^CODE#2^NOTE S
    > NN-110^an info of NN-001^BRY234^some notes
    > NN-111^1st line data
    > 2nd line data
    > 3rd line data^BRT345^ano ther notes
    > NN-112^description of NN-112^BBC23^multi line
    > notes blah
    > blah
    > blah
    > NN-113^info info^MNO12^some notes here
    >
    > How do I parse so I can insert them in the database, e.g. MySQL/Access?
    >
    > Perhaps there are an advanced scripting language can do this easily.[/color]

    ruby is one of the more advanced :-)

    ~/eg/ruby > cat ./parse.rb

    #!/usr/bin/env ruby

    txt = <<-txt
    CODE#1^DESCRIPT ION^CODE#2^NOTE S
    NN-110^an info of NN-001^BRY234^some notes
    NN-111^1st line data
    2nd line data
    3rd line data^BRT345^ano ther notes
    NN-112^description of NN-112^BBC23^multi line
    notes blah
    blah
    blah
    NN-113^info info^MNO12^some notes here
    txt


    pat = %r{([^^]+)\^([^^]+)\^([^^]+)\^([^^]+)\n}mox
    tuples = txt.scan pat

    tuples.map{|tup le| p tuple}


    ~/eg/ruby > ./parse.rb

    [" CODE#1", "DESCRIPTIO N", "CODE#2", "NOTES"]
    [" NN-110", "an info of NN-001", "BRY234", "some notes"]
    [" NN-111", "1st line data\n 2nd line data\n 3rd line data", "BRT345", "another notes"]
    [" NN-112", "descriptio n of NN-112", "BBC23", "multiline\ n notes blah\n blah\n blah"]
    [" NN-113", "info info", "MNO12", "some notes here"]

    -a
    =============== =============== ======
    | Ara Howard
    | NOAA Forecast Systems Laboratory
    | Information and Technology Services
    | Data Systems Group
    | R/FST 325 Broadway
    | Boulder, CO 80305-3328
    | Email: ahoward@noaa.go v
    | Phone: 303-497-7238
    | Fax: 303-497-7259
    | The difference between art and science is that science is what we understand
    | well enough to explain to a computer. Art is everything else.
    | -- Donald Knuth, "Discover"
    | ~ > /bin/sh -c 'for lang in ruby perl; do $lang -e "print \"\x3a\x2d\x29\ x0a\""; done'
    =============== =============== ======

    Comment

    • Paulus Magnus

      #3
      Re: Parsing challenge...


      "Artco News" <artconews@veri zon.net> wrote in message
      news:CTDgb.3152 2$yU5.13084@nwr dny01.gnilink.n et...[color=blue]
      > I thought I ask the scripting guru about the following.
      >
      > I have a file containing records of data with the following format(first
      > column is the label):
      >
      > CODE#1^DESCRIPT ION^CODE#2^NOTE S
      > NN-110^an info of NN-001^BRY234^some notes
      > NN-111^1st line data
      > 2nd line data
      > 3rd line data^BRT345^ano ther notes
      > NN-112^description of NN-112^BBC23^multi line
      > notes blah
      > blah
      > blah
      > NN-113^info info^MNO12^some notes here
      >
      > How do I parse so I can insert them in the database, e.g. MySQL/Access?[/color]

      <?
      //Assuming we use file to read the file we'll get each line in an array, so
      we'll use $testdata as our sample file
      $testdata = array ();
      $testdata[] = "CODE#1^DESCRIP TION^CODE#2^NOT ES\r\n";
      $testdata[] = "NN-110^an info of NN-001^BRY234^some notes\r\n";
      $testdata[] = "NN-111^1st line data\r\n";
      $testdata[] = "2nd line data\r\n";
      $testdata[] = "3rd line data^BRT345^ano ther notes\r\n";
      $testdata[] = "NN-112^description of NN-112^BBC23^multi line\r\n";
      $testdata[] = "notes blah\r\n";
      $testdata[] = "blah\r\n";
      $testdata[] = "blah\r\n";
      $testdata[] = "NN-113^info info^MNO12^some notes here\r\n";

      $dbdata = array ();
      $row = "";
      $cnt = 0;
      foreach ($testdata as $line) {
      $delimiters = preg_match_all ("/\^/", $line, $waste);
      if (($cnt + $delimiters) > 3) {
      $dbdata[] = $row;
      $cnt = $delimiters;
      $row = $line;
      } else {
      $row .= $line;
      $cnt += $delimiters;
      }
      }
      $dbdata[] = $row;
      print_r ($dbdata);
      ?>

      .... produces ...

      Array (
      [0] => CODE#1^DESCRIPT ION^CODE#2^NOTE S
      [1] => NN-110^an info of NN-001^BRY234^some notes
      [2] => NN-111^1st line data 2nd line data 3rd line data^BRT345^ano ther
      notes
      [3] => NN-112^description of NN-112^BBC23^multi line notes blah blah blah
      [4] => NN-113^info info^MNO12^some notes here
      )

      You can then easily iterate through this array, exploding each line by the ^
      and creating the INSERT INTO table VALUES (); bits of SQL.

      Paulus


      Comment

      • Useko Netsumi

        #4
        Re: Parsing challenge...

        this script failed if any of the cell is blank/no-value,
        e.g:

        CODE#1^DESCRIPT ION^CODE#2^NOTE S
        NN-110^^^some notes
        NN-111^1st line data
        2nd line data
        3rd line data^BRT345^ano ther notes
        NN-112^description of NN-112^BBC23^multi line
        notes blah
        blah
        blah
        NN-113^info info^MNO12^some notes here


        "Ara.T.Howa rd" <ahoward@fsl.no aa.gov> wrote in message
        news:Pine.LNX.4 .53.03100722185 60.32521@eli.fs l.noaa.gov...[color=blue]
        > On Tue, 7 Oct 2003, Artco News wrote:
        >[color=green]
        > > I thought I ask the scripting guru about the following.
        > >
        > > I have a file containing records of data with the following format(first
        > > column is the label):
        > >
        > > CODE#1^DESCRIPT ION^CODE#2^NOTE S
        > > NN-110^an info of NN-001^BRY234^some notes
        > > NN-111^1st line data
        > > 2nd line data
        > > 3rd line data^BRT345^ano ther notes
        > > NN-112^description of NN-112^BBC23^multi line
        > > notes blah
        > > blah
        > > blah
        > > NN-113^info info^MNO12^some notes here
        > >
        > > How do I parse so I can insert them in the database, e.g. MySQL/Access?
        > >
        > > Perhaps there are an advanced scripting language can do this easily.[/color]
        >
        > ruby is one of the more advanced :-)
        >
        > ~/eg/ruby > cat ./parse.rb
        >
        > #!/usr/bin/env ruby
        >
        > txt = <<-txt
        > CODE#1^DESCRIPT ION^CODE#2^NOTE S
        > NN-110^an info of NN-001^BRY234^some notes
        > NN-111^1st line data
        > 2nd line data
        > 3rd line data^BRT345^ano ther notes
        > NN-112^description of NN-112^BBC23^multi line
        > notes blah
        > blah
        > blah
        > NN-113^info info^MNO12^some notes here
        > txt
        >
        >
        > pat = %r{([^^]+)\^([^^]+)\^([^^]+)\^([^^]+)\n}mox
        > tuples = txt.scan pat
        >
        > tuples.map{|tup le| p tuple}
        >
        >
        > ~/eg/ruby > ./parse.rb
        >
        > [" CODE#1", "DESCRIPTIO N", "CODE#2", "NOTES"]
        > [" NN-110", "an info of NN-001", "BRY234", "some notes"]
        > [" NN-111", "1st line data\n 2nd line data\n 3rd line data",[/color]
        "BRT345", "another notes"][color=blue]
        > [" NN-112", "descriptio n of NN-112", "BBC23", "multiline\ n notes[/color]
        blah\n blah\n blah"][color=blue]
        > [" NN-113", "info info", "MNO12", "some notes here"]
        >
        > -a
        > =============== =============== ======
        > | Ara Howard
        > | NOAA Forecast Systems Laboratory
        > | Information and Technology Services
        > | Data Systems Group
        > | R/FST 325 Broadway
        > | Boulder, CO 80305-3328
        > | Email: ahoward@noaa.go v
        > | Phone: 303-497-7238
        > | Fax: 303-497-7259
        > | The difference between art and science is that science is what we[/color]
        understand[color=blue]
        > | well enough to explain to a computer. Art is everything else.
        > | -- Donald Knuth, "Discover"
        > | ~ > /bin/sh -c 'for lang in ruby perl; do $lang -e "print[/color]
        \"\x3a\x2d\x29\ x0a\""; done'[color=blue]
        > =============== =============== ======[/color]


        Comment

        • Useko Netsumi

          #5
          Re: Parsing challenge...

          Got it! I just have to replace the (+) sign with (*) for blank or any
          string.

          Next, how do I insert those values into MySQL database, assuming I have
          those table defined. Thanks.

          "Useko Netsumi" <usenets@nyc.rr .com> wrote in message
          news:bm08ec$gd9 eb$1@ID-159205.news.uni-berlin.de...[color=blue]
          > this script failed if any of the cell is blank/no-value,
          > e.g:
          >
          > CODE#1^DESCRIPT ION^CODE#2^NOTE S
          > NN-110^^^some notes
          > NN-111^1st line data
          > 2nd line data
          > 3rd line data^BRT345^ano ther notes
          > NN-112^description of NN-112^BBC23^multi line
          > notes blah
          > blah
          > blah
          > NN-113^info info^MNO12^some notes here
          >
          >
          > "Ara.T.Howa rd" <ahoward@fsl.no aa.gov> wrote in message
          > news:Pine.LNX.4 .53.03100722185 60.32521@eli.fs l.noaa.gov...[color=green]
          > > On Tue, 7 Oct 2003, Artco News wrote:
          > >[color=darkred]
          > > > I thought I ask the scripting guru about the following.
          > > >
          > > > I have a file containing records of data with the following[/color][/color][/color]
          format(first[color=blue][color=green][color=darkred]
          > > > column is the label):
          > > >
          > > > CODE#1^DESCRIPT ION^CODE#2^NOTE S
          > > > NN-110^an info of NN-001^BRY234^some notes
          > > > NN-111^1st line data
          > > > 2nd line data
          > > > 3rd line data^BRT345^ano ther notes
          > > > NN-112^description of NN-112^BBC23^multi line
          > > > notes blah
          > > > blah
          > > > blah
          > > > NN-113^info info^MNO12^some notes here
          > > >
          > > > How do I parse so I can insert them in the database, e.g.[/color][/color][/color]
          MySQL/Access?[color=blue][color=green][color=darkred]
          > > >
          > > > Perhaps there are an advanced scripting language can do this easily.[/color]
          > >
          > > ruby is one of the more advanced :-)
          > >
          > > ~/eg/ruby > cat ./parse.rb
          > >
          > > #!/usr/bin/env ruby
          > >
          > > txt = <<-txt
          > > CODE#1^DESCRIPT ION^CODE#2^NOTE S
          > > NN-110^an info of NN-001^BRY234^some notes
          > > NN-111^1st line data
          > > 2nd line data
          > > 3rd line data^BRT345^ano ther notes
          > > NN-112^description of NN-112^BBC23^multi line
          > > notes blah
          > > blah
          > > blah
          > > NN-113^info info^MNO12^some notes here
          > > txt
          > >
          > >
          > > pat = %r{([^^]+)\^([^^]+)\^([^^]+)\^([^^]+)\n}mox
          > > tuples = txt.scan pat
          > >
          > > tuples.map{|tup le| p tuple}
          > >
          > >
          > > ~/eg/ruby > ./parse.rb
          > >
          > > [" CODE#1", "DESCRIPTIO N", "CODE#2", "NOTES"]
          > > [" NN-110", "an info of NN-001", "BRY234", "some notes"]
          > > [" NN-111", "1st line data\n 2nd line data\n 3rd line data",[/color]
          > "BRT345", "another notes"][color=green]
          > > [" NN-112", "descriptio n of NN-112", "BBC23", "multiline\ n notes[/color]
          > blah\n blah\n blah"][color=green]
          > > [" NN-113", "info info", "MNO12", "some notes here"]
          > >
          > > -a
          > > =============== =============== ======
          > > | Ara Howard
          > > | NOAA Forecast Systems Laboratory
          > > | Information and Technology Services
          > > | Data Systems Group
          > > | R/FST 325 Broadway
          > > | Boulder, CO 80305-3328
          > > | Email: ahoward@noaa.go v
          > > | Phone: 303-497-7238
          > > | Fax: 303-497-7259
          > > | The difference between art and science is that science is what we[/color]
          > understand[color=green]
          > > | well enough to explain to a computer. Art is everything else.
          > > | -- Donald Knuth, "Discover"
          > > | ~ > /bin/sh -c 'for lang in ruby perl; do $lang -e "print[/color]
          > \"\x3a\x2d\x29\ x0a\""; done'[color=green]
          > > =============== =============== ======[/color]
          >
          >[/color]


          Comment

          • Robert Klemme

            #6
            Re: Parsing challenge...


            "Artco News" <artconews@veri zon.net> schrieb im Newsbeitrag
            news:CTDgb.3152 2$yU5.13084@nwr dny01.gnilink.n et...[color=blue]
            > I thought I ask the scripting guru about the following.
            >
            > I have a file containing records of data with the following format(first
            > column is the label):
            >
            > CODE#1^DESCRIPT ION^CODE#2^NOTE S
            > NN-110^an info of NN-001^BRY234^some notes
            > NN-111^1st line data
            > 2nd line data
            > 3rd line data^BRT345^ano ther notes
            > NN-112^description of NN-112^BBC23^multi line
            > notes blah
            > blah
            > blah
            > NN-113^info info^MNO12^some notes here
            >
            > How do I parse so I can insert them in the database, e.g. MySQL/Access?
            >
            > Perhaps there are an advanced scripting language can do this easily.[/color]

            Ruby:

            #!/usr/bin/ruby

            def process(rec)
            while rec.size > 4
            dbRec = rec.slice!( 0..3 )
            # db insertion here
            p dbRec
            end
            end

            rec = []

            while ( line = gets )
            line.chomp!
            rec.concat( line.split('^') )
            process rec
            end

            process rec

            Comment

            • Ara.T.Howard

              #7
              Re: Parsing challenge...

              On Wed, 8 Oct 2003, Useko Netsumi wrote:
              [color=blue]
              > Got it! I just have to replace the (+) sign with (*) for blank or any
              > string.
              >
              > Next, how do I insert those values into MySQL database, assuming I have
              > those table defined. Thanks.[/color]


              file: parse.rb
              ----CUT----
              #!/usr/bin/env ruby
              require 'mysql'

              # command line args
              host, user, passwd, db, relation = ARGV
              db ||= 'test'
              relation ||= 'test'

              # connect to db
              mysql = Mysql.connect host, user, passwd
              mysql.select_db db

              # parse
              txt = DATA.read
              pat = %r{([^^]*)\^([^^]*)\^([^^]*)\^([^^]*)\n}mox
              tuples = txt.scan pat

              # insert tuples
              sql = "insert into %s values('%s','%s ','%s','%s')"
              tuples.each do |tuple|
              begin
              insert = sql % [relation, *tuple]
              mysql.query insert
              rescue Exception => e
              p e
              end
              end

              # show results
              res = mysql.query('se lect * from %s' % [relation])
              while((row = res.fetch_row))
              p row
              end


              # sample input is embedded below - can be read via DATA object
              __END__
              CODE#1^DESCRIPT ION^CODE#2^NOTE S
              NN-110^an info of NN-001^BRY234^some notes
              NN-111^1st line data
              2nd line data
              3rd line data^BRT345^ano ther notes
              NN-112^description of NN-112^BBC23^multi line
              notes blah
              blah
              blah
              NN-113^info info^MNO12^some notes here
              ----CUT----



              running it looks like:

              ~/eg/ruby > ./parse.rb
              ["CODE#1", "DESCRIPTIO N", "CODE#2", "NOTES"]
              ["NN-110", "an info of NN-001", "BRY234", "some notes"]
              ["NN-111", "1st line data\n2nd line data\n3rd line data", "BRT345", "another notes"]
              ["NN-112", "descriptio n of NN-112", "BBC23", "multiline\nnot es blah\nblah\nbla h"]
              ["NN-113", "info info", "MNO12", "some notes here"]


              i created a database named 'test', and a table named 'test' using 'create
              table test(f0 text,f1 text,f2 text,f3 text)'

              hope that gets you going.

              -a









              [color=blue]
              >
              > "Useko Netsumi" <usenets@nyc.rr .com> wrote in message
              > news:bm08ec$gd9 eb$1@ID-159205.news.uni-berlin.de...[color=green]
              > > this script failed if any of the cell is blank/no-value,
              > > e.g:
              > >
              > > CODE#1^DESCRIPT ION^CODE#2^NOTE S
              > > NN-110^^^some notes
              > > NN-111^1st line data
              > > 2nd line data
              > > 3rd line data^BRT345^ano ther notes
              > > NN-112^description of NN-112^BBC23^multi line
              > > notes blah
              > > blah
              > > blah
              > > NN-113^info info^MNO12^some notes here
              > >
              > >
              > > "Ara.T.Howa rd" <ahoward@fsl.no aa.gov> wrote in message
              > > news:Pine.LNX.4 .53.03100722185 60.32521@eli.fs l.noaa.gov...[color=darkred]
              > > > On Tue, 7 Oct 2003, Artco News wrote:
              > > >
              > > > > I thought I ask the scripting guru about the following.
              > > > >
              > > > > I have a file containing records of data with the following[/color][/color]
              > format(first[color=green][color=darkred]
              > > > > column is the label):
              > > > >
              > > > > CODE#1^DESCRIPT ION^CODE#2^NOTE S
              > > > > NN-110^an info of NN-001^BRY234^some notes
              > > > > NN-111^1st line data
              > > > > 2nd line data
              > > > > 3rd line data^BRT345^ano ther notes
              > > > > NN-112^description of NN-112^BBC23^multi line
              > > > > notes blah
              > > > > blah
              > > > > blah
              > > > > NN-113^info info^MNO12^some notes here
              > > > >
              > > > > How do I parse so I can insert them in the database, e.g.[/color][/color]
              > MySQL/Access?[color=green][color=darkred]
              > > > >
              > > > > Perhaps there are an advanced scripting language can do this easily.
              > > >
              > > > ruby is one of the more advanced :-)
              > > >
              > > > ~/eg/ruby > cat ./parse.rb
              > > >
              > > > #!/usr/bin/env ruby
              > > >
              > > > txt = <<-txt
              > > > CODE#1^DESCRIPT ION^CODE#2^NOTE S
              > > > NN-110^an info of NN-001^BRY234^some notes
              > > > NN-111^1st line data
              > > > 2nd line data
              > > > 3rd line data^BRT345^ano ther notes
              > > > NN-112^description of NN-112^BBC23^multi line
              > > > notes blah
              > > > blah
              > > > blah
              > > > NN-113^info info^MNO12^some notes here
              > > > txt
              > > >
              > > >
              > > > pat = %r{([^^]+)\^([^^]+)\^([^^]+)\^([^^]+)\n}mox
              > > > tuples = txt.scan pat
              > > >
              > > > tuples.map{|tup le| p tuple}
              > > >
              > > >
              > > > ~/eg/ruby > ./parse.rb
              > > >
              > > > [" CODE#1", "DESCRIPTIO N", "CODE#2", "NOTES"]
              > > > [" NN-110", "an info of NN-001", "BRY234", "some notes"]
              > > > [" NN-111", "1st line data\n 2nd line data\n 3rd line data",[/color]
              > > "BRT345", "another notes"][color=darkred]
              > > > [" NN-112", "descriptio n of NN-112", "BBC23", "multiline\ n notes[/color]
              > > blah\n blah\n blah"][color=darkred]
              > > > [" NN-113", "info info", "MNO12", "some notes here"]
              > > >
              > > > -a
              > > > =============== =============== ======
              > > > | Ara Howard
              > > > | NOAA Forecast Systems Laboratory
              > > > | Information and Technology Services
              > > > | Data Systems Group
              > > > | R/FST 325 Broadway
              > > > | Boulder, CO 80305-3328
              > > > | Email: ahoward@noaa.go v
              > > > | Phone: 303-497-7238
              > > > | Fax: 303-497-7259
              > > > | The difference between art and science is that science is what we[/color]
              > > understand[color=darkred]
              > > > | well enough to explain to a computer. Art is everything else.
              > > > | -- Donald Knuth, "Discover"
              > > > | ~ > /bin/sh -c 'for lang in ruby perl; do $lang -e "print[/color]
              > > \"\x3a\x2d\x29\ x0a\""; done'[color=darkred]
              > > > =============== =============== ======[/color]
              > >
              > >[/color]
              >
              >
              >[/color]

              =============== =============== ======
              | Ara Howard
              | NOAA Forecast Systems Laboratory
              | Information and Technology Services
              | Data Systems Group
              | R/FST 325 Broadway
              | Boulder, CO 80305-3328
              | Email: ara.t.howard@no aa.gov
              | Phone: 303-497-7238
              | Fax: 303-497-7259
              | The difference between art and science is that science is what we understand
              | well enough to explain to a computer. Art is everything else.
              | -- Donald Knuth, "Discover"
              | ~ > /bin/sh -c 'for lang in ruby perl; do $lang -e "print \"\x3a\x2d\x29\ x0a\""; done'
              =============== =============== ======

              Comment

              Working...