best design for parse

  • GS

    #16
    Re: best design for parse

    Thank you.

    You do have a point, but the application I have in mind is meant to get most
    of the easy but boring and repetitive tasks out of the users' way quickly, to
    get their buy-in for the next phase. The application is not going to be
    perfect at version 0, but it must be flexible enough to adapt as needs change.

    Furthermore, I chose to normalize the date format to yyyy-mm-dd because that is
    the standard string date format accepted by almost all standard Windows
    applications for the users I deal with, regardless of locale or default
    display format.

    As a side note, this application at version zero is not meant to automate
    everything, but to help users do their jobs and help us gain an understanding
    of what they do, while at the same time validating the transform process that
    will be used later for automation. Version 1 will automate a lot more and may
    actually drive some Excel and Word processes.

    You could say version zero is closer to a Mickey Mouse utility, if you wish.

    "Stephany Young" <noone@localhost> wrote in message
    news:ukTMMqsMHHA.4712@TK2MSFTNGP04.phx.gbl...
    Again you're missing the point.

    I think the best thing you can do is post a relatively small sample of the
    text you are attempting to parse.

    While you're doing that, execute the following and observe the results. It
    demonstrates what I am talking about:

    ' Needs Imports System.Globalization and Imports System.Text.RegularExpressions
    Dim _source As String = "On 07/01/2007 the quick brown fox jumps over the lazy dog." & Environment.NewLine & _
      "On 08/01/2007 the quick brown fox again jumps over the lazy dog." & Environment.NewLine & _
      "On Jan/09/2007 the quick brown fox again jumps over the lazy dog." & Environment.NewLine & _
      "On 10/Jan/2007 the quick brown fox again jumps over the lazy dog." & Environment.NewLine & _
      "On 11/01/07 the quick brown fox again jumps over the lazy dog." & Environment.NewLine & _
      "On 01/12/07 the quick brown fox again jumps over the lazy dog." & Environment.NewLine & _
      "On Jan/13/07 the quick brown fox again jumps over the lazy dog." & Environment.NewLine & _
      "On 14/Jan/07 the quick brown fox again jumps over the lazy dog." & Environment.NewLine & _
      "On 15/01 the quick brown fox again jumps over the lazy dog." & Environment.NewLine & _
      "The part number XYZ/72/84 is now discontinued."

    ' One alternative per supported shape, longest shapes first
    Dim _regex As New Regex("\d{2}/\d{2}/\d{4}|[A-Za-z]{3}/\d{2}/\d{4}|\d{2}/[A-Za-z]{3}/\d{4}|\d{2}/\d{2}/\d{2}|[A-Za-z]{3}/\d{2}/\d{2}|\d{2}/[A-Za-z]{3}/\d{2}|\d{2}/\d{2}")

    Dim _candidates As Integer = 0
    Dim _matches As Integer = 0

    Dim _match As Match = _regex.Match(_source)

    While _match.Success
      _candidates += 1
      Console.WriteLine("{0} found at index {1}", _match.Value, _match.Index)
      Try
        ' A candidate only counts as a match if it parses as one of the known formats
        Console.WriteLine("Converted value = {0:yyyy-MM-dd}", _
          DateTime.ParseExact(_match.Value, New String() {"dd/MM/yyyy", "MM/dd/yyyy", _
          "MMM/dd/yyyy", "dd/MMM/yyyy", "dd/MM/yy", "MM/dd/yy", "dd/MMM/yy", _
          "MMM/dd/yy", "dd/MM"}, Nothing, DateTimeStyles.None))
        _matches += 1
      Catch _ex As Exception
        Console.WriteLine(_ex.Message)
      End Try
      _match = _match.NextMatch()
    End While

    Console.WriteLine("{0} candidates found", _candidates)

    Console.WriteLine("{0} matches found", _matches)

    "GS" <gsmsnews.microsoft.comGS@msnews.Nomail.com> wrote in message
    news:eFm$y5rMHHA.4376@TK2MSFTNGP03.phx.gbl...
    You are sort of on the same track as I am.

    I must first apologize: I did not tell you the complete story.

    Although the application does not know beforehand exactly what format the
    data may come in, part of the application allows the user to define and
    record a favourite for a website:
    - whether to extract by text or HTML
    - header content and format
    - record format and date format (that is where the date format mask
      comes in)
    - optionally, an ordinal number for each column, or a re-ordering
    - trailer content and format

    For a given batch, at least for the body, the date format is uniform.

    Furthermore, given the need to make the extract process generic and adaptable
    to the front end that takes the user definitions, I believe it would be
    easier to "normalize" the date string to "yyyy-mm-dd".

    Also, the end target may not necessarily be a SQL database; it may be text,
    pasted into a Word report or into Excel by the user.

    Therefore, I can transform the date format mask into a regex with the
    appropriate format and identifiers, and then use Regex.Replace to normalize
    the date. As a matter of fact, the date separator does not have to be /; it
    can be a space, as long as there are identifiable delimiters around the date
    string.

    I already have code for dealing with regexes for dates from a prior project.
    All I have to do is adapt it to the present need.

    Who knows, maybe I have taken a totally offbeat track.

    "GS" <gsmsnews.microsoft.comGS@msnews.Nomail.com> wrote in message
    news:%23vnOBJiMHHA.1280@TK2MSFTNGP04.phx.gbl...
    Thanks to all who have pitched in so far.

    Let me give it another shot.

    It looks like an easier way out would be:
    1. Copy the date format string into a regex string holder, and then derive
       the relevant regex expression to be used for date normalization later in
       part 2:
       - replace yyyy in the regex string with the regex year expression with a
         year identifier
       - look for yy, replace it with 20yy, and repeat the step above
       - replace mmm with the month regex expression associated with a month
         identifier
       - replace mm with the 2-digit month regex expression associated with a
         month identifier
       - replace dd with the 2-digit day regex expression associated with a day
         identifier

    2. Use the resulting regex in Regex.Replace to normalize to yyyy-mm-dd.

    Any problem with the above approach?
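
The two steps above could be sketched roughly as follows. This is only a minimal sketch under assumptions, not anyone's actual code from this thread: the names TokenToPattern, MaskToPattern, ToIso and NormalizeDates are hypothetical, every mask is assumed to contain a day and a month token, a 2-digit year is assumed to mean 20yy, a missing year falls back to the current year, and month abbreviations are assumed to be English.

```vbnet
Imports System
Imports System.Globalization
Imports System.Text.RegularExpressions

Module MaskNormalizerSketch

    ' Hypothetical helper: maps one mask token to a named-group regex fragment.
    Private Function TokenToPattern(ByVal m As Match) As String
        Select Case m.Value.ToLowerInvariant()
            Case "yyyy" : Return "(?<year>\d{4})"
            Case "yy" : Return "(?<year>\d{2})"
            Case "mmm" : Return "(?<month>[A-Za-z]{3})"
            Case "mm" : Return "(?<month>\d{2})"
            Case Else : Return "(?<day>\d{2})"   ' only "dd" is left
        End Select
    End Function

    ' Step 1: derive a regex from a mask such as "dd/mmm/yy". Longer tokens come
    ' first in the alternation so "yyyy" is never read as two "yy" tokens.
    Function MaskToPattern(ByVal mask As String) As String
        Dim escaped As String = Regex.Escape(mask)
        Return "\b" & Regex.Replace(escaped, "yyyy|yy|mmm|mm|dd", _
            New MatchEvaluator(AddressOf TokenToPattern), RegexOptions.IgnoreCase) & "\b"
    End Function

    ' Rewrites one matched date as yyyy-MM-dd.
    Private Function ToIso(ByVal m As Match) As String
        Dim yearText As String = Date.Today.Year.ToString()   ' assumption: no year means current year
        If m.Groups("year").Success Then yearText = m.Groups("year").Value
        If yearText.Length = 2 Then yearText = "20" & yearText ' the "yy -> 20yy" step
        Dim monthText As String = m.Groups("month").Value
        If Char.IsLetter(monthText, 0) Then
            Dim names As String() = CultureInfo.InvariantCulture.DateTimeFormat.AbbreviatedMonthNames
            Dim index As Integer = -1
            For i As Integer = 0 To 11
                If String.Equals(names(i), monthText, StringComparison.OrdinalIgnoreCase) Then index = i
            Next
            If index < 0 Then Return m.Value   ' unknown month name: leave the text untouched
            monthText = (index + 1).ToString("00")
        End If
        Return yearText & "-" & monthText & "-" & m.Groups("day").Value
    End Function

    ' Step 2: use the derived regex in Regex.Replace to normalize every date found.
    Function NormalizeDates(ByVal text As String, ByVal mask As String) As String
        Return Regex.Replace(text, MaskToPattern(mask), New MatchEvaluator(AddressOf ToIso))
    End Function

    Sub Main()
        Console.WriteLine(NormalizeDates("On 10/Jan/07 the fox jumps.", "dd/mmm/yy"))   ' On 2007-01-10 the fox jumps.
        Console.WriteLine(NormalizeDates("On 07/01/2007 the fox jumps.", "dd/mm/yyyy")) ' On 2007-01-07 the fox jumps.
    End Sub

End Module
```

Because the mask is passed through Regex.Escape before the tokens are substituted, a separator such as a space or a dot is matched literally, so the same sketch should cope with masks like "dd mm yy".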

    "Cor Ligthert [MVP]" <notmyfirstname@planet.nl> wrote in message
    news:%23Qj7TbWMHHA.3944@TK2MSFTNGP06.phx.gbl...
    GS,

    Maybe in 2007 you can avoid this, and all things like DateTime.ParseExact,
    and instead have a look at the globalization support that Microsoft has
    nicely built in, and then at the related ToString options.

    Cor

    "gs" <gs@dontMail.telus> wrote in message
    news:OtrnsPTMHHA.4720@TK2MSFTNGP03.phx.gbl...
    Let's say I have to deal with various date formats, and I am given a format
    string from one of the following:
    dd/mm/yyyy
    mm/dd/yyyy
    dd/mmm/yyyy
    mmm/dd/yyyy
    dd/mm/yy
    mm/dd/yy
    dd/mmm/yy
    mmm/dd/yy
    dd/mm
    What is the best way to come up with a relevant regex for the incoming
    format string?
    a) use two arrays and match statically
    b) use a regex to find the order


    • Stephany Young

      #17
      Re: best design for parse

      Now we're cooking with gas. I think that regex is overkill for this
      'problem'. Sure, you can use it if you wish but I think you will be making a
      rod for your own back.

      Here is a solution that works for your sample data. Create a Windows Forms
      project, plonk a button on the form and paste the following into the form:

      Private m_source1 As String = "Date  Parts ID  Parts Description  location  Quantitiy  Unit Cost  Total Cost" & Environment.NewLine & _
        "11 Dec  A1234987  Sample Parts description  W1I1R4S1  2  10.00  20.00" & Environment.NewLine & _
        "15 Dec  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
        "18 Dec  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
        "19 Dec  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
        "12 Dec  A1234988  Sample Parts description  1  10.00  20"

      Private m_source2 As String = "Date  Parts ID  Parts Description  location  Quantitiy  Unit Cost  Total Cost" & Environment.NewLine & _
        "11 12 06  A1234987  Sample Parts description  W1I1R4S1  2  10.00  20.00" & Environment.NewLine & _
        "15 12 06  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
        "18 12 06  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
        "19 12 06  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
        "12 12 06  A1234988  Sample Parts description  1  10.00  20"

      Private m_source3 As String = "Parts  Parts ID  Description  location  Quantitiy  Unit Cost  Total Cost" & Environment.NewLine & _
        "11/12/06  A1234987  Sample Parts description  W1I1R4S1  2  10.00  20.00" & Environment.NewLine & _
        "15/12/06  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
        "18/12/06  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
        "19/12/06  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
        "12/12/06  A1234988  Sample Parts description  1  10.00  20"

      Private m_source4 As String = "Date  Parts ID  Parts Description  location  Quantitiy  Unit Cost  Total Cost" & Environment.NewLine & _
        "11/dec/06  A1234987  Sample Parts description  W1I1R4S1  2  10.00  20.00" & Environment.NewLine & _
        "15/dec/06  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
        "18/dec/06  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
        "19/dec/06  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
        "12/dec/06  A1234988  Sample Parts description  1  10.00  20"

      Private m_source5 As String = "Date  Parts ID  Parts Description  location  Quantitiy  Unit Cost  Total Cost" & Environment.NewLine & _
        "12 13 06  A1234987  Sample Parts description  W1I1R4S1  2  10.00  20.00" & Environment.NewLine & _
        "12 15 06  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
        "12 18 06  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
        "12 19 06  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
        "12 12 06  A1234988  Sample Parts description  1  10.00  20"

      Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click

        Console.WriteLine()
        Console.WriteLine("Sample 1")
        ProcessData(m_source1)

        Console.WriteLine()
        Console.WriteLine("Sample 2")
        ProcessData(m_source2)

        Console.WriteLine()
        Console.WriteLine("Sample 3")
        ProcessData(m_source3)

        Console.WriteLine()
        Console.WriteLine("Sample 4")
        ProcessData(m_source4)

        Console.WriteLine()
        Console.WriteLine("Sample 5")
        ProcessData(m_source5)

        Console.WriteLine()

      End Sub

      Private Sub ProcessData(ByVal source As String)

        ' Assumption: Lines of data are separated by a carriage return/line feed pair
        Dim _lines As String() = source.Split(New String() {Environment.NewLine}, StringSplitOptions.RemoveEmptyEntries)

        ' Determined by eyeballing data: All 'fields' are delimited by a pair of spaces
        Dim _ss As String() = _lines(0).Split(New String() {"  "}, StringSplitOptions.None)

        ' Determine which line is the first line of actual data
        ' If the first line is a heading line then all characters of the first field will be letters
        Dim _lettercount As Integer = 0
        For Each _c As Char In _ss(0)
          If Char.IsLetter(_c) Then _lettercount += 1
        Next
        Dim _firstline As Integer = 0
        If _lettercount = _ss(0).Length Then _firstline = 1

        ' Split the first actual line on the field delimiter
        _ss = _lines(_firstline).Split(New String() {"  "}, StringSplitOptions.None)

        ' Determined by eyeballing data: The date field is always the first field in the line

        ' Determine the delimiter to be used for the date format
        Dim _delimiter As String = ""
        If _ss(0).IndexOf(" ") > 0 Then
          _delimiter = " "
        ElseIf _ss(0).IndexOf("/") > 0 Then
          _delimiter = "/"
        ElseIf _ss(0).IndexOf("-") > 0 Then
          _delimiter = "-"
        Else
          Console.WriteLine("Unable to determine delimiter out of " & _ss(0))
          Return
        End If
        Console.WriteLine("Determined delimiter as '" & _delimiter & "'")

        ' Construct the date format to be used
        Dim _format As String = String.Empty
        ' Split the first field on the date format delimiter
        Dim _parts As String() = _ss(0).Split(New String() {_delimiter}, StringSplitOptions.None)
        If _parts.Length = 2 Then
          ' If there are 2 parts then we only have day and month components
          If Char.IsDigit(_parts(0).Chars(0)) AndAlso Char.IsLetter(_parts(1).Chars(0)) Then
            ' The 1st part starts with a digit and the 2nd part starts with a letter
            ' so we can assume that the 1st part is the day and the 2nd part is the month
            _format = New String("d"c, _parts(0).Length) & _delimiter & "MMM"
            If _parts(1).Length > 3 Then _format &= "M"
          ElseIf Char.IsDigit(_parts(0).Chars(0)) AndAlso Char.IsDigit(_parts(1).Chars(0)) Then
            ' Both parts start with a digit
            ' Start with the assumption that the 1st part is the day and the 2nd part is the month
            _format = New String("d"c, _parts(0).Length) & _delimiter & New String("M"c, _parts(1).Length)
            If Integer.Parse(_parts(1)) > 12 Then
              ' The 1st part must be the month and the 2nd part must be the day
              ' (there is no year part here, so none is added to the format)
              _format = New String("M"c, _parts(0).Length) & _delimiter & New String("d"c, _parts(1).Length)
            End If
            ' There is a big gotcha here if both parts are <= 12 and are different
            ' E.g. we don't really know if 01 02 (01/02, 01-02, etc.) should be 1 February or January 2
          End If
        ElseIf _parts.Length = 3 Then
          ' If there are 3 parts then we have day, month and year components
          ' Assume that the year is always the 3rd part
          If Char.IsDigit(_parts(0).Chars(0)) AndAlso Char.IsLetter(_parts(1).Chars(0)) Then
            ' The 1st part starts with a digit and the 2nd part starts with a letter
            ' so we can assume that the 1st part is the day and the 2nd part is the month
            _format = New String("d"c, _parts(0).Length) & _delimiter & "MMM"
            If _parts(1).Length > 3 Then _format &= "M"
            _format &= _delimiter & New String("y"c, _parts(2).Length)
          ElseIf Char.IsDigit(_parts(0).Chars(0)) AndAlso Char.IsDigit(_parts(1).Chars(0)) Then
            ' Both parts start with a digit
            ' Start with the assumption that the 1st part is the day and the 2nd part is the month
            _format = New String("d"c, _parts(0).Length) & _delimiter & New String("M"c, _parts(1).Length) & _
              _delimiter & New String("y"c, _parts(2).Length)
            If Integer.Parse(_parts(1)) > 12 Then
              ' The 1st part must be the month and the 2nd part must be the day
              _format = New String("M"c, _parts(0).Length) & _delimiter & New String("d"c, _parts(1).Length) & _
                _delimiter & New String("y"c, _parts(2).Length)
            End If
            ' There is a big gotcha here if the first two parts are <= 12 and are different
            ' E.g. we don't really know if 01 02 (01/02, 01-02, etc.) should be 1 February or January 2
          End If
        End If
        If _format.Length = 0 Then
          ' We were unable to determine the date format from the available information
          Console.WriteLine("Unable to determine format from " & _ss(0))
          Return
        End If

        ' We were able to determine the date format so we can continue and parse the dates
        Console.WriteLine("Determined format as " & _format)

        ' Start from our actual first line of data
        For _i As Integer = _firstline To _lines.Length - 1
          _ss = _lines(_i).Split(New String() {"  "}, StringSplitOptions.None)
          Dim _date As DateTime = DateTime.ParseExact(_ss(0), _format, Nothing)
          Console.WriteLine("Read from input: " & _ss(0) & " - Interpreted date: " & _date.ToString("yyyy-MM-dd"))
        Next

      End Sub

      Note, from the results, that if there is no year part then
      DateTime.ParseExact will interpret the date as being in the current year, as
      determined from the system date at the time the code is executed.
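
If the current year is not what is wanted for year-less data, one possible workaround is to append a default year to both the text and the format before parsing. The sketch below is only an illustration: the helper name ParseWithYear and its defaultYear parameter are hypothetical, and it uses the invariant culture so that English month abbreviations such as "Dec" are recognised regardless of the machine's locale.

```vbnet
Imports System
Imports System.Globalization

Module MissingYearSketch

    ' Hypothetical helper: parses a day/month value and pins it to a caller-supplied
    ' year by appending that year to both the input text and the format string.
    Function ParseWithYear(ByVal text As String, ByVal format As String, ByVal defaultYear As Integer) As DateTime
        Return DateTime.ParseExact(text & " " & defaultYear.ToString("0000"), _
            format & " yyyy", CultureInfo.InvariantCulture)
    End Function

    Sub Main()
        ' "11 Dec" has no year, so 2006 is supplied explicitly
        Console.WriteLine(ParseWithYear("11 Dec", "dd MMM", 2006).ToString("yyyy-MM-dd"))   ' 2006-12-11
    End Sub

End Module
```

The default year could come from the user's settings or from other clues on the source page, which keeps the result independent of the date on which the code happens to run.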


      "GS" <gsmsnews.microsoft.comGS@msnews.Nomail.com> wrote in message
      news:uceG$z0MHHA.4916@TK2MSFTNGP06.phx.gbl...
      It looks like I am not expressing myself clearly. Although the application
      does not know in advance which format is used, it does know, for a given
      set, which date format it deals with, and it can expect the same format
      throughout a given set of input. I should not have used the term batch but
      rather a set of records. The only possible variation is that some records
      in certain sets may be split into 2 lines, but that is not critical, as the
      conditions can be described beforehand and normalized by another parse
      component.

      Sample data:

      Set 1: date format mask is "dd MMM"
      Date Parts ID Parts Description location Quantitiy Unit Cost Total Cost
      11 Dec A1234987 Sample Parts description W1I1R4S1 2 10.00 20.00
      15 Dec A1234988 Sample Parts description 1 10.00 20
      18 Dec A1234988 Sample Parts description 1 10.00 20
      19 Dec A1234988 Sample Parts description 1 10.00 20
      12 Dec A1234988 Sample Parts description 1 10.00 20

      Set 2: date format mask is "dd MM yy"
      Date Parts ID Parts Description location Quantitiy Unit Cost Total Cost
      11 12 06 A1234987 Sample Parts description W1I1R4S1 2 10.00 20.00
      15 12 06 A1234988 Sample Parts description 1 10.00 20
      18 12 06 A1234988 Sample Parts description 1 10.00 20
      19 12 06 A1234988 Sample Parts description 1 10.00 20
      12 12 06 A1234988 Sample Parts description 1 10.00 20

      Set 3: date format mask is "dd/MMM/06"
      Parts Description location Quantitiy Unit Cost Total Cost
      11/12/06 A1234987 Sample Parts description W1I1R4S1 2 10.00 20.00
      15/12/06 A1234988 Sample Parts description 1 10.00 20
      18/12/06 A1234988 Sample Parts description 1 10.00 20
      19/12/06 A1234988 Sample Parts description 1 10.00 20
      12/12/06 A1234988 Sample Parts description 1 10.00 20

      Set 4: date format mask is ""
      Date Parts ID Parts Description location Quantitiy Unit Cost Total Cost
      11/dec/06 A1234987 Sample Parts description W1I1R4S1 2 10.00 20.00
      15/dec/06 A1234988 Sample Parts description 1 10.00 20
      18/dec/06 A1234988 Sample Parts description 1 10.00 20
      19/dec/06 A1234988 Sample Parts description 1 10.00 20
      12/dec/06 A1234988 Sample Parts description 1 10.00 20

      How do I deal with a format without a year? I do have clues from other
      parts of the originating website, and an optional default set by the user.

      The sample data shows variation of date format from set to set, but the
      date format I need to deal with within a given set is consistent, and the
      user has influence over the date format mask used.

      As Cor suggested: don't let the user enter the format, but let the user
      pick from a list. That will likely be the case, at least in version 0.


      • Cor Ligthert [MVP]

        #18
        Re: best design for parse

        Stephany,

        I am curious: what does this phrase mean? I don't know it.
        "Now we're cooking with gas."
        (I live in Holland, which sits above what used to be one of the biggest
        gas fields in Europe.)

        Cor



        • Stephany Young

          #19
          Re: best design for parse

          It's an idiom for:

          Efficiently performing a task after a long period
          of inefficient performance or possibly failed
          attempts at the entire task or certain steps in the process.

          Before we saw the sample data we were 'shooting in the dark'. As soon as the
          sample data was posted it all became clear.


          • GS

            #20
            Re: best design for parse

            I see the code works hard and does work for a lot of cases, but isn't it
            better to get assistance from a user who knows what date format is being
            used? That is the rationale for letting the user pick the date format mask.
            Guessing the date format is tough to master for all cases. Not only can
            months and days be indeterminate at times; it is worse when a 2-digit year
            is used. I have seen some sample data that is way outside the ordinary date
            formats commonly seen in the US.

            Relying on the first 1 or 2 parts being numeric would miss quite a few cases.
            Nonetheless, the code can serve as a default in the absence of a user spec.
            Thank you very much for that.
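
One way such a default could work alongside a user-picked mask is to test a short list of candidate formats against every date value in the set, instead of guessing from the first row alone. The following is only a sketch under assumptions: the helper name FormatsThatFitAll and the candidate list are hypothetical, and the date values are assumed to have been extracted from their column already.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Globalization

Module FormatFallbackSketch

    ' Hypothetical helper: returns the candidate formats that parse every value in the set.
    Function FormatsThatFitAll(ByVal values As String(), ByVal candidates As String()) As List(Of String)
        Dim result As New List(Of String)
        For Each fmt As String In candidates
            Dim fitsAll As Boolean = True
            For Each value As String In values
                Dim parsed As DateTime
                If Not DateTime.TryParseExact(value, fmt, CultureInfo.InvariantCulture, _
                                              DateTimeStyles.None, parsed) Then
                    fitsAll = False
                    Exit For
                End If
            Next
            If fitsAll Then result.Add(fmt)
        Next
        Return result
    End Function

    Sub Main()
        Dim values As String() = {"12 13 06", "12 15 06", "12 12 06"}
        Dim candidates As String() = {"dd MM yy", "MM dd yy"}
        ' "dd MM yy" is rejected because 13 and 15 cannot be months
        For Each fmt As String In FormatsThatFitAll(values, candidates)
            Console.WriteLine(fmt)   ' MM dd yy
        Next
    End Sub

End Module
```

If exactly one format survives it can be offered as the default; if several survive, the set is genuinely ambiguous and the user's pick from the list decides.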

            Sorry for misleading you with incomplete data samples.
            There are sample data set where the first column is not date. on the other
            sometimes first 2 columns can also be dates as well as rarely another column
            else where can to date. this sound incredulous but that's what users have
            to content with.

            "Stephany Young" <noone@localhos twrote in message
            news:edlcDJ5MHH A.1252@TK2MSFTN GP02.phx.gbl...
            Now we're cooking with gas. I think that regex is overkill for this
            'problem'. Sure, you can use it if you wish but I think you will be making
            a
            rod for your own back.
            >
            Here is a solution that works for your sample data. Create a Windows Forms
            project, plonk a button on the form and paste the following into the form:
            >
            Private m_source1 As String = "Date Parts ID Parts Description location
            Quantitiy Unit Cost Total Cost" & Environment.New Line & _
            "11 Dec A1234987 Sample Parts description W1I1R4S1 2 10.00 20.00" &
            Environment.New Line & _
            "15 Dec A1234988 Sample Parts description 1 10.00 20" &
            Environment.New Line & _
            "18 Dec A1234988 Sample Parts description 1 10.00 20" &
            Environment.New Line & _
            "19 Dec A1234988 Sample Parts description 1 10.00 20" &
            Environment.New Line & _
            "12 Dec A1234988 Sample Parts description 1 10.00 20"
            >
            Private m_source2 As String = "Date Parts ID Parts Description location
            Quantitiy Unit Cost Total Cost" & Environment.New Line & _
            "11 12 06 A1234987 Sample Parts description W1I1R4S1 2 10.00 20.00" &
            Environment.New Line & _
            "15 12 06 A1234988 Sample Parts description 1 10.00 20" &
            Environment.New Line & _
            "18 12 06 A1234988 Sample Parts description 1 10.00 20" &
            Environment.New Line & _
            "19 12 06 A1234988 Sample Parts description 1 10.00 20" &
            Environment.New Line & _
            "12 12 06 A1234988 Sample Parts description 1 10.00 20"
            >
            Private m_source3 As String = "Parts Parts ID Description location
            Quantitiy Unit Cost Total Cost" & Environment.New Line & _
            "11/12/06 A1234987 Sample Parts description W1I1R4S1 2 10.00 20.00" &
            Environment.New Line & _
            "15/12/06 A1234988 Sample Parts description 1 10.00 20" &
            Environment.New Line & _
            "18/12/06 A1234988 Sample Parts description 1 10.00 20" &
            Environment.New Line & _
            "19/12/06 A1234988 Sample Parts description 1 10.00 20" &
            Environment.New Line & _
            "12/12/06 A1234988 Sample Parts description 1 10.00 20"
            >
            Private m_source4 As String = "Date Parts ID Parts Description location
            Quantitiy Unit Cost Total Cost" & Environment.New Line & _
            "11/dec/06 A1234987 Sample Parts description W1I1R4S1 2 10.00 20.00" &
            Environment.New Line & _
            "15/dec/06 A1234988 Sample Parts description 1 10.00 20" &
            Environment.New Line & _
            "18/dec/06 A1234988 Sample Parts description 1 10.00 20" &
            Environment.New Line & _
            "19/dec/06 A1234988 Sample Parts description 1 10.00 20" &
            Environment.New Line & _
            "12/dec/06 A1234988 Sample Parts description 1 10.00 20"
            >
            Private m_source5 As String = "Date Parts ID Parts Description location
            Quantitiy Unit Cost Total Cost" & Environment.New Line & _
            "12 13 06 A1234987 Sample Parts description W1I1R4S1 2 10.00 20.00" &
            Environment.New Line & _
            "12 15 06 A1234988 Sample Parts description 1 10.00 20" &
            Environment.New Line & _
            "12 18 06 A1234988 Sample Parts description 1 10.00 20" &
            Environment.New Line & _
            "12 19 06 A1234988 Sample Parts description 1 10.00 20" &
            Environment.New Line & _
            "12 12 06 A1234988 Sample Parts description 1 10.00 20"
            >
            Private Sub Button1_Click(B yVal sender As System.Object, ByVal e As
            System.EventArg s) Handles Button1.Click
            >
            Console.WriteLi ne()
            >
            Console.WriteLi ne("Sample 1")
            >
            ProcessData(m_s ource1)
            >
            Console.WriteLi ne()
            >
            Console.WriteLi ne("Sample 2")
            >
            ProcessData(m_s ource2)
            >
            Console.WriteLi ne()
            >
            Console.WriteLi ne("Sample 3")
            >
            ProcessData(m_s ource3)
            >
            Console.WriteLi ne()
            >
            Console.WriteLi ne("Sample 4")
            >
            ProcessData(m_s ource4)
            >
            Console.WriteLi ne()
            >
            Console.WriteLi ne("Sample 5")
            >
            ProcessData(m_s ource5)
            >
            Console.WriteLi ne()
            >
            End Sub
            >
            Private Sub ProcessData(ByVal source As String)

                ' Assumption: lines of data are separated by a carriage return/line feed pair
                Dim _lines As String() = source.Split(New String() {Environment.NewLine}, _
                    StringSplitOptions.RemoveEmptyEntries)

                ' Determined by eyeballing data: all 'fields' are delimited by a pair of spaces
                Dim _ss As String() = _lines(0).Split(New String() {"  "}, StringSplitOptions.None)

                ' Determine which line is the first line of actual data.
                ' If the first line is a heading line then all characters of the first
                ' field will be letters
                Dim _lettercount As Integer = 0
                For Each _c As Char In _ss(0)
                    If Char.IsLetter(_c) Then _lettercount += 1
                Next
                Dim _firstline As Integer = 0
                If _lettercount = _ss(0).Length Then _firstline = 1

                ' Split the first actual line on the field delimiter
                _ss = _lines(_firstline).Split(New String() {"  "}, StringSplitOptions.None)

                ' Determined by eyeballing data: the date field is always the first
                ' field in the line

                ' Determine the delimiter to be used for the date format
                Dim _delimiter As String = ""
                If _ss(0).IndexOf(" ") > 0 Then
                    _delimiter = " "
                ElseIf _ss(0).IndexOf("/") > 0 Then
                    _delimiter = "/"
                ElseIf _ss(0).IndexOf("-") > 0 Then
                    _delimiter = "-"
                Else
                    Console.WriteLine("Unable to determine delimiter out of " & _ss(0))
                    Return
                End If
                Console.WriteLine("Determined delimiter as '" & _delimiter & "'")
            >
                ' Construct the date format to be used
                Dim _format As String = String.Empty
                ' Split the first field on the date format delimiter
                Dim _parts As String() = _ss(0).Split(New String() {_delimiter}, StringSplitOptions.None)
                If _parts.Length = 2 Then
                    ' If there are 2 parts then we only have day and month components
                    If Char.IsDigit(_parts(0).Chars(0)) AndAlso Char.IsLetter(_parts(1).Chars(0)) Then
                        ' The 1st part starts with a digit and the 2nd part starts with a letter,
                        ' so we can assume that the 1st part is the day and the 2nd part is the month
                        _format = New String("d"c, _parts(0).Length) & _delimiter & "MMM"
                        If _parts(1).Length > 3 Then _format &= "M"
                    ElseIf Char.IsDigit(_parts(0).Chars(0)) AndAlso Char.IsDigit(_parts(1).Chars(0)) Then
                        ' Both parts start with a digit.
                        ' Start with the assumption that the 1st part is the day and the 2nd part is the month
                        _format = New String("d"c, _parts(0).Length) & _delimiter & _
                            New String("M"c, _parts(1).Length)
                        If Integer.Parse(_parts(1)) > 12 Then
                            ' The 1st part must be the month and the 2nd part must be the day
                            _format = New String("M"c, _parts(0).Length) & _delimiter & _
                                New String("d"c, _parts(1).Length)
                        End If
                        ' There is a big gotcha here if both parts are <= 12 and are different.
                        ' E.g. we don't really know if 01 02 (01/02, 01-02, etc.) should be
                        ' 1 February or January 2
                    End If
                ElseIf _parts.Length = 3 Then
                    ' If there are 3 parts then we have day, month and year components.
                    ' Assume that the year is always the 3rd part
                    If Char.IsDigit(_parts(0).Chars(0)) AndAlso Char.IsLetter(_parts(1).Chars(0)) Then
                        ' The 1st part starts with a digit and the 2nd part starts with a letter,
                        ' so we can assume that the 1st part is the day and the 2nd part is the month
                        _format = New String("d"c, _parts(0).Length) & _delimiter & "MMM"
                        If _parts(1).Length > 3 Then _format &= "M"
                        _format &= _delimiter & New String("y"c, _parts(2).Length)
                    ElseIf Char.IsDigit(_parts(0).Chars(0)) AndAlso Char.IsDigit(_parts(1).Chars(0)) Then
                        ' Both parts start with a digit.
                        ' Start with the assumption that the 1st part is the day and the 2nd part is the month
                        _format = New String("d"c, _parts(0).Length) & _delimiter & _
                            New String("M"c, _parts(1).Length) & _delimiter & _
                            New String("y"c, _parts(2).Length)
                        If Integer.Parse(_parts(1)) > 12 Then
                            ' The 1st part must be the month and the 2nd part must be the day
                            _format = New String("M"c, _parts(0).Length) & _delimiter & _
                                New String("d"c, _parts(1).Length) & _delimiter & _
                                New String("y"c, _parts(2).Length)
                        End If
                        ' There is a big gotcha here if the first two parts are <= 12 and are different.
                        ' E.g. we don't really know if 01 02 (01/02, 01-02, etc.) should be
                        ' 1 February or January 2
                    End If
                End If
                If _format.Length = 0 Then
                    ' We were unable to determine the date format from the available information
                    Console.WriteLine("Unable to determine format from " & _ss(0))
                    Return
                End If
            >
                ' We were able to determine the date format so we can continue and parse the dates
                Console.WriteLine("Determined format as " & _format)

                ' Start from our actual first line of data
                For _i As Integer = _firstline To _lines.Length - 1
                    _ss = _lines(_i).Split(New String() {"  "}, StringSplitOptions.None)
                    Dim _date As DateTime = DateTime.ParseExact(_ss(0), _format, Nothing)
                    Console.WriteLine("Read from input: " & _ss(0) & " - Interpreted date: " & _
                        _date.ToString("yyyy-MM-dd"))
                Next

            End Sub
            >
            Note, from the results, that if there is no year part then
            DateTime.ParseExact will interpret the date as being in the current year,
            as determined from the system date at the time the code is executed.
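
            If the current year is not the right default, one option is to append a
            year before parsing. The following is only a minimal sketch under that
            assumption: ParseWithDefaultYear and defaultYear are hypothetical names
            that do not appear in the code above, and the default year is assumed to
            come from somewhere else (a user setting, or another part of the page).

            ```vb
            Imports System
            Imports System.Globalization

            Module YearlessDates

                ' Hypothetical helper: parses text whose format has no year component
                ' by appending defaultYear to the text and " yyyy" to the format.
                Function ParseWithDefaultYear(ByVal text As String, ByVal format As String, _
                                              ByVal defaultYear As Integer) As DateTime
                    Dim textWithYear As String = text & " " & _
                        defaultYear.ToString("0000", CultureInfo.InvariantCulture)
                    Return DateTime.ParseExact(textWithYear, format & " yyyy", _
                        CultureInfo.InvariantCulture)
                End Function

                Sub Main()
                    Dim d As DateTime = ParseWithDefaultYear("11 Dec", "dd MMM", 2006)
                    ' Prints 2006-12-11
                    Console.WriteLine(d.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture))
                End Sub

            End Module
            ```

            Appending the year before parsing, rather than fixing up the year
            afterwards, means a value such as "29 Feb" is validated against the
            default year and not against the current system year.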

            "GS" <gsmsnews.microsoft.comGS@msnews.Nomail.com> wrote in message
            news:uceG$z0MHHA.4916@TK2MSFTNGP06.phx.gbl...
            It looks like I am not expressing myself clearly. Although the application
            does not know in advance which format is used, it does know which date
            format it is dealing with for a given set, and it can expect the same
            format throughout that set of input.
            I should not have used the term "batch" but "a set of records". The only
            possible variation is that some records in certain sets may be split into
            2 lines, but that is not critical, as the conditions can be described
            beforehand and normalized by another parse component.

            Sample data

            Set 1: date format mask is "dd MMM"
            Date  Parts ID  Parts Description  location  Quantity  Unit Cost  Total Cost
            11 Dec  A1234987  Sample Parts description  W1I1R4S1  2  10.00  20.00
            15 Dec  A1234988  Sample Parts description  1  10.00  20
            18 Dec  A1234988  Sample Parts description  1  10.00  20
            19 Dec  A1234988  Sample Parts description  1  10.00  20
            12 Dec  A1234988  Sample Parts description  1  10.00  20

            Set 2: date format mask is "dd MM yy"
            Date  Parts ID  Parts Description  location  Quantity  Unit Cost  Total Cost
            11 12 06  A1234987  Sample Parts description  W1I1R4S1  2  10.00  20.00
            15 12 06  A1234988  Sample Parts description  1  10.00  20
            18 12 06  A1234988  Sample Parts description  1  10.00  20
            19 12 06  A1234988  Sample Parts description  1  10.00  20
            12 12 06  A1234988  Sample Parts description  1  10.00  20

            Set 3: date format mask "dd/MMM/06"
            Parts  Description  location  Quantity  Unit Cost  Total Cost
            11/12/06  A1234987  Sample Parts description  W1I1R4S1  2  10.00  20.00
            15/12/06  A1234988  Sample Parts description  1  10.00  20
            18/12/06  A1234988  Sample Parts description  1  10.00  20
            19/12/06  A1234988  Sample Parts description  1  10.00  20
            12/12/06  A1234988  Sample Parts description  1  10.00  20

            Set 4: date format mask ""
            Date  Parts ID  Parts Description  location  Quantity  Unit Cost  Total Cost
            11/dec/06  A1234987  Sample Parts description  W1I1R4S1  2  10.00  20.00
            15/dec/06  A1234988  Sample Parts description  1  10.00  20
            18/dec/06  A1234988  Sample Parts description  1  10.00  20
            19/dec/06  A1234988  Sample Parts description  1  10.00  20
            12/dec/06  A1234988  Sample Parts description  1  10.00  20

            How do I deal with a format without a year? I do have clues from other
            parts of the originating website, and an optional default set by the user.

            The sample data show variation of date format from set to set, but the
            date format that I need to deal with within a given set is consistent, and
            the user has influence on the date format mask used.

            Like Cor's suggestion: don't let the user enter the format but let the
            user pick from a list. That will likely be the case, at least in version 0.


            "GS" <gsmsnews.microsoft.comGS@msnews.Nomail.com> wrote in message
            news:eFm$y5rMHHA.4376@TK2MSFTNGP03.phx.gbl...
            You are sort of on the same track as me.

            I must first apologize that I did not tell you the complete story.

            Although the application does not know beforehand exactly what format the
            data may come in, part of the application allows the user to define and
            record favourites for a website:
            - whether to extract by text or html
            - header content and format
            - record format and date format (that is where the date format mask
              comes in)
            - optionally, an ordinal number for each column, or re-ordering
            - trailer content and format

            For a given batch, at least for the body, the date format is uniform.

            Furthermore, given the need to make the extract process generic and
            adaptable to the front end that takes the user definitions, I believe it
            would be easier to "normalize" the date string to "yyyy-mm-dd".

            Also, the end target may not necessarily be a SQL database; it may be
            text pasted into a Word report or into Excel by the user.

            Therefore, I can transform the date format mask to a regex in the
            appropriate format with identifiers, and then use regex replace to
            normalize the date. As a matter of fact, the date separator does not have
            to be /; it can be a space, as long as there are identifiable delimiters
            around the date string.

            I already have code for dealing with regex for dates from a prior project.
            All I have to do is adapt it to the present need.

            Who knows, maybe I have taken a totally offbeat track.

            "GS" <gsmsnews.microsoft.comGS@msnews.Nomail.com> wrote in message
            news:%23vnOBJiMHHA.1280@TK2MSFTNGP04.phx.gbl...
            Thanks to all who have pitched in so far.

            Let me give it another shot.

            It looks like an easier way out would be:
            1. Copy the date format string into a regex string holder and then derive
               the relevant regex expression, to be used for date normalization later
               in part 2:
               - replace yyyy in the regex string with the regex year expression with
                 a year identifier
               - look for yy and replace it with 20yy, then repeat the step above
               - replace mmm with the month regex expression associated with a month
                 identifier
               - replace mm with the 2-digit month regex expression associated with a
                 month identifier
               - replace dd with the 2-digit day regex expression associated with a
                 day identifier

            2. Use the resulting regex in a regex replace to normalize to yyyy-mm-dd.


            Any problem with the above approach?
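
            The two steps above might be sketched roughly as follows. This is only an
            illustration under stated assumptions, not the poster's actual code:
            BuildDatePattern, NormalizeDates and the group names year/month/day are
            hypothetical, "mm" and "mmm" follow the mask notation used in this thread
            (month, not .NET minutes), two-digit years are assumed to mean 20yy, month
            names are assumed to be English three-letter abbreviations, and masks
            without a year are not handled.

            ```vb
            Imports System
            Imports System.Globalization
            Imports System.Text.RegularExpressions

            Module MaskToRegex

                Private ReadOnly MonthNames As String() = New String() _
                    {"jan", "feb", "mar", "apr", "may", "jun", _
                     "jul", "aug", "sep", "oct", "nov", "dec"}

                ' Maps one mask token to a regex fragment with a named group.
                Private Function TokenToPattern(ByVal m As Match) As String
                    Select Case m.Value.ToLowerInvariant()
                        Case "yyyy"
                            Return "(?<year>\d{4})"
                        Case "yy"
                            Return "(?<year>\d{2})"
                        Case "mmm"
                            Return "(?<month>[A-Za-z]{3})"
                        Case "mm"
                            Return "(?<month>\d{2})"
                        Case Else ' "dd"
                            Return "(?<day>\d{2})"
                    End Select
                End Function

                ' Step 1: derive a regex from a mask such as "dd/mmm/yy".
                ' Longer tokens come first in the alternation so "yyyy" wins over "yy".
                Function BuildDatePattern(ByVal mask As String) As String
                    Return Regex.Replace(Regex.Escape(mask), "yyyy|yy|mmm|mm|dd", _
                        AddressOf TokenToPattern, RegexOptions.IgnoreCase)
                End Function

                ' Converts one matched date to yyyy-MM-dd, or leaves it unchanged
                ' when the captured parts do not form a valid date.
                Private Function ToIso(ByVal m As Match) As String
                    Dim yearText As String = m.Groups("year").Value
                    Dim monthText As String = m.Groups("month").Value
                    Dim year As Integer = Integer.Parse(yearText, CultureInfo.InvariantCulture)
                    ' Assumption: a two-digit year means 20yy
                    If yearText.Length = 2 Then year += 2000
                    Dim month As Integer
                    If Char.IsDigit(monthText(0)) Then
                        month = Integer.Parse(monthText, CultureInfo.InvariantCulture)
                    Else
                        month = Array.IndexOf(MonthNames, monthText.ToLowerInvariant()) + 1
                    End If
                    Dim day As Integer = Integer.Parse(m.Groups("day").Value, CultureInfo.InvariantCulture)
                    If year < 1 OrElse month < 1 OrElse month > 12 Then Return m.Value
                    If day < 1 OrElse day > DateTime.DaysInMonth(year, month) Then Return m.Value
                    Return New DateTime(year, month, day).ToString("yyyy-MM-dd", _
                        CultureInfo.InvariantCulture)
                End Function

                ' Step 2: replace every date that matches the mask.
                Function NormalizeDates(ByVal text As String, ByVal mask As String) As String
                    Return Regex.Replace(text, "\b" & BuildDatePattern(mask) & "\b", AddressOf ToIso)
                End Function

                Sub Main()
                    ' Prints 2006-12-11 A1234987
                    Console.WriteLine(NormalizeDates("11/dec/06 A1234987", "dd/mmm/yy"))
                End Sub

            End Module
            ```

            Doing the token substitution in a single pass avoids re-scanning the
            regex fragments that were just inserted, which a chain of plain string
            replacements would have to be ordered carefully to get right.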

            "Cor Ligthert [MVP]" <notmyfirstname @planet.nlwrote in message
            news:%23Qj7TbWM HHA.3944@TK2MSF TNGP06.phx.gbl. ..
            GS,
            >
            Maybe can you avoid this in 2007 and all things like that as
            DateTime.parseE xact, but have a look to the nicely by Microsoft
            inbuild
            globalization and than the to that related ToString option.
            >
            Cor

            "gs" <gs@dontMail.telus> wrote in message
            news:OtrnsPTMHHA.4720@TK2MSFTNGP03.phx.gbl...
            Let's say I have to deal with various date formats and I am given a
            format string from one of the following:
            dd/mm/yyyy
            mm/dd/yyyy
            dd/mmm/yyyy
            mmm/dd/yyyy
            dd/mm/yy
            mm/dd/yy
            dd/mmm/yy
            mmm/dd/yy
            dd/mm
            What is the best way to come up with a relevant regex for the incoming
            format string?
            a) use two arrays and statically match
            b) use regex to find the order


            • GS

              #21
              Re: best design for parse- resent please ignore prev

              Oops. Please pardon my bad typos and proofreading.
              "GS" <gsmsnews.microsoft.comGS@msnews.Nomail.com> wrote in message
              news:OeomX07MHHA.2456@TK2MSFTNGP06.phx.gbl...
              I see that the code (you put a lot of effort into it) does work in a lot
              of cases. Much appreciated.

              However, would it not be better to get assistance from the user, who
              knows what date format is being used?

              That is the rationale for letting the user somehow pick the date format
              mask. Guessing the date format is tough to master for all cases. Not only
              can months and days be indeterminate at times; it is worse when a 2-digit
              year is used. I have seen some sample data that is way outside the
              ordinary date formats commonly seen in the US.

              Relying on the first 1 or 2 parts being numeric would miss quite a few
              cases. Nonetheless, the code can be a default process in the absence of a
              user spec. Thank you very much for that.

              Sorry for misleading you with incomplete data samples.
              There are sample data sets where the first column is not a date. On the
              other hand, sometimes the first 2 columns can both be dates, and, rarely,
              another column elsewhere can be a date as well. This sounds incredible,
              but that's what users have to contend with.
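
              As a rough illustration of the user-assisted route, here is a minimal
              sketch that assumes the user has picked a .NET format string from a
              list and has said which columns hold dates. NormalizeRecord,
              dateColumns and userFormat are hypothetical names, and the record is
              assumed to be already split into fields.

              ```vb
              Imports System
              Imports System.Globalization

              Module UserMaskParsing

                  ' Hypothetical helper: rewrites the given columns of one record as
                  ' yyyy-MM-dd. Fields that do not match userFormat are left unchanged.
                  Function NormalizeRecord(ByVal fields As String(), _
                                           ByVal dateColumns As Integer(), _
                                           ByVal userFormat As String) As String()
                      Dim result As String() = CType(fields.Clone(), String())
                      For Each index As Integer In dateColumns
                          If index < 0 OrElse index >= result.Length Then Continue For
                          Dim parsed As DateTime
                          If DateTime.TryParseExact(result(index), userFormat, _
                                  CultureInfo.InvariantCulture, DateTimeStyles.None, parsed) Then
                              result(index) = parsed.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)
                          End If
                      Next
                      Return result
                  End Function

                  Sub Main()
                      ' The date is in the second column here, not the first.
                      Dim fields As String() = New String() _
                          {"A1234987", "11/12/06", "Sample Parts description"}
                      Dim normalized As String() = NormalizeRecord(fields, New Integer() {1}, "dd/MM/yy")
                      ' Prints A1234987 | 2006-12-11 | Sample Parts description
                      Console.WriteLine(String.Join(" | ", normalized))
                  End Sub

              End Module
              ```

              TryParseExact is used instead of ParseExact so that one odd record does
              not abort the whole set; the guessing code could then serve as the
              fallback when no format has been chosen.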
              "Stephany Young" <noone@localhos twrote in message
              news:edlcDJ5MHH A.1252@TK2MSFTN GP02.phx.gbl...
              Now we're cooking with gas. I think that regex is overkill for this
              'problem'. Sure, you can use it if you wish but I think you will be
              making
              a
              rod for your own back.

              Here is a solution that works for your sample data. Create a Windows
              Forms
              project, plonk a button on the form and paste the following into the
              form:

              Private m_source1 As String = "Date Parts ID Parts Description
              location
              Quantitiy Unit Cost Total Cost" & Environment.New Line & _
              "11 Dec A1234987 Sample Parts description W1I1R4S1 2 10.00 20.00" &
              Environment.New Line & _
              "15 Dec A1234988 Sample Parts description 1 10.00 20" &
              Environment.New Line & _
              "18 Dec A1234988 Sample Parts description 1 10.00 20" &
              Environment.New Line & _
              "19 Dec A1234988 Sample Parts description 1 10.00 20" &
              Environment.New Line & _
              "12 Dec A1234988 Sample Parts description 1 10.00 20"

              Private m_source2 As String = "Date Parts ID Parts Description
              location
              Quantitiy Unit Cost Total Cost" & Environment.New Line & _
              "11 12 06 A1234987 Sample Parts description W1I1R4S1 2 10.00 20.00"
              &
              Environment.New Line & _
              "15 12 06 A1234988 Sample Parts description 1 10.00 20" &
              Environment.New Line & _
              "18 12 06 A1234988 Sample Parts description 1 10.00 20" &
              Environment.New Line & _
              "19 12 06 A1234988 Sample Parts description 1 10.00 20" &
              Environment.New Line & _
              "12 12 06 A1234988 Sample Parts description 1 10.00 20"

              Private m_source3 As String = "Parts Parts ID Description location
              Quantitiy Unit Cost Total Cost" & Environment.New Line & _
              "11/12/06 A1234987 Sample Parts description W1I1R4S1 2 10.00 20.00"
              &
              Environment.New Line & _
              "15/12/06 A1234988 Sample Parts description 1 10.00 20" &
              Environment.New Line & _
              "18/12/06 A1234988 Sample Parts description 1 10.00 20" &
              Environment.New Line & _
              "19/12/06 A1234988 Sample Parts description 1 10.00 20" &
              Environment.New Line & _
              "12/12/06 A1234988 Sample Parts description 1 10.00 20"

              Private m_source4 As String = "Date Parts ID Parts Description
              location
              Quantitiy Unit Cost Total Cost" & Environment.New Line & _
              "11/dec/06 A1234987 Sample Parts description W1I1R4S1 2 10.00 20.00"
              &
              Environment.New Line & _
              "15/dec/06 A1234988 Sample Parts description 1 10.00 20" &
              Environment.New Line & _
              "18/dec/06 A1234988 Sample Parts description 1 10.00 20" &
              Environment.New Line & _
              "19/dec/06 A1234988 Sample Parts description 1 10.00 20" &
              Environment.New Line & _
              "12/dec/06 A1234988 Sample Parts description 1 10.00 20"

              Private m_source5 As String = "Date Parts ID Parts Description
              location
              Quantitiy Unit Cost Total Cost" & Environment.New Line & _
              "12 13 06 A1234987 Sample Parts description W1I1R4S1 2 10.00 20.00"
              &
              Environment.New Line & _
              "12 15 06 A1234988 Sample Parts description 1 10.00 20" &
              Environment.New Line & _
              "12 18 06 A1234988 Sample Parts description 1 10.00 20" &
              Environment.New Line & _
              "12 19 06 A1234988 Sample Parts description 1 10.00 20" &
              Environment.New Line & _
              "12 12 06 A1234988 Sample Parts description 1 10.00 20"

              Private Sub Button1_Click(B yVal sender As System.Object, ByVal e As
              System.EventArg s) Handles Button1.Click

              Console.WriteLi ne()

              Console.WriteLi ne("Sample 1")

              ProcessData(m_s ource1)

              Console.WriteLi ne()

              Console.WriteLi ne("Sample 2")

              ProcessData(m_s ource2)

              Console.WriteLi ne()

              Console.WriteLi ne("Sample 3")

              ProcessData(m_s ource3)

              Console.WriteLi ne()

              Console.WriteLi ne("Sample 4")

              ProcessData(m_s ource4)

              Console.WriteLi ne()

              Console.WriteLi ne("Sample 5")

              ProcessData(m_s ource5)

              Console.WriteLi ne()

              End Sub

              Private Sub ProcessData(ByV al source As String)

              ' Assumption: Lines of data are seperated by a carriage return/line
              feed
              pair
              Dim _lines As String() = source.Split(Ne w String()
              {Environment.Ne wLine}, StringSplitOpti ons.RemoveEmpty Entries)

              ' Determined by eyeballing data: All 'fields' are delimited by a
              pair
              of
              spaces
              Dim _ss As String() = _lines(0).Split (New String() {" "},
              StringSplitOpti ons.None)

              ' Determine which line is the first line of actual data
              ' If the first line is a heading line then all characters of the
              first
              field will be letters
              Dim _lettercount As Integer = 0
              For Each _c As Char In _ss(0)
              If Char.IsLetter(_ c) Then _lettercount += 1
              Next
              Dim _firstline As Integer = 0
              If _lettercount = _ss(0).Length Then _firstline = 1

              'Split the first actual line on the field delimiter
              _ss = _lines(_firstli ne).Split(New String() {" "},
              StringSplitOpti ons.None)

              ' Determined by eyeballing data: The date field is always the first
              field in the line

              ' Determine the delimiter to be used for the date format
              Dim _delimiter As String = ""
              If _ss(0).IndexOf( " ") 0 Then
              _delimiter = " "
              ElseIf _ss(0).IndexOf( "/") 0 Then
              _delimiter = "/"
              ElseIf _ss(0).IndexOf( "-") 0 Then
              _delimiter = "-"
              Else
              Console.WriteLi ne("Unable to determine delimiter out of " &
              _ss(0))
              Return
              End If
              Console.WriteLi ne("Determined delimiter as '" & _delimiter & "'")

              ' Construct the date format to be used
              Dim _format As String = String.Empty
              ' Split the first field on the date format delimiter
              Dim _parts As String() = _ss(0).Split(Ne w String() {_delimiter},
              StringSplitOpti ons.None)
              If _parts.Length = 2 Then
              ' If there are 2 parts then we only have day and month components
              If Char.IsDigit(_p arts(0).Chars(0 )) AndAlso
              Char.IsLetter(_ parts(1).Chars( 0)) Then
              ' The 1st part starts with a digit and the 2nd part starts with
              a
              letter
              ' so we can assume that the 1st part is the day and the 2nd part
              is
              the month
              _format = New String("d"c, _parts(0).Lengt h) & _delimiter &
              "MMM"
              If _parts(1).Lengt h 3 Then _format &= "M"
              ElseIf Char.IsDigit(_p arts(0).Chars(0 )) AndAlso
              Char.IsDigit(_p arts(1).Chars(0 )) Then
              ' Both parts start with a digit
              ' Start with the assumption that the 1st part is the day and the
              2nd
              part is the month
              _format = New String("d"c, _parts(0).Lengt h) & _delimiter & New
              String("M"c, _parts(0).Lengt h))
              If Integer.Parse(_ parts(1)) 12 Then
              ' The 1st part must be the month and the 2nd part must be the
              day
              _format = New String("M"c, _parts(0).Lengt h) & _delimiter &
              New
              String("d"c, _parts(0).Lengt h) & _delimiter & New String("y"c,
              _parts(2).Lengt h)
              End If
              ' There is big gotcha here if both parts are < 12 and are
              different
              ' E.G. We don't really if 01 02 (01/02, 01-02, etc.) should be 1
              February or January 2
              End If
              ElseIf _parts.Length = 3 Then
              ' If there 3 parts then we have day, month and year components
              ' Assume that the year is always th 3rd part
              If Char.IsDigit(_p arts(0).Chars(0 )) AndAlso
              Char.IsLetter(_ parts(1).Chars( 0)) Then
              ' The 1st part starts with a digit and the 2nd part starts with
              a
              letter
              ' so we can assume that the 1st part is the day and the 2nd part
              is
              the month
              _format = New String("d"c, _parts(0).Lengt h) & _delimiter &
              "MMM"
              If _parts(1).Lengt h 3 Then _format &= "M"
              _format &= _delimiter & New String("y"c, _parts(2).Lengt h)
              ElseIf Char.IsDigit(_p arts(0).Chars(0 )) AndAlso
              Char.IsDigit(_p arts(1).Chars(0 )) Then
              ' Both parts start with a digit
              ' Start with the assumption that the 1st part is the day and the
              2nd
              part is the month
              _format = New String("d"c, _parts(0).Lengt h) & _delimiter & New
              String("M"c, _parts(0).Lengt h) & _delimiter & New String("y"c,
              _parts(2).Lengt h)
              If Integer.Parse(_ parts(1)) 12 Then
              ' The 1st part must be the month and the 2nd part must be the
              day
              _format = New String("M"c, _parts(0).Lengt h) & _delimiter &
              New
              String("d"c, _parts(0).Lengt h) & _delimiter & New String("y"c,
              _parts(2).Lengt h)
              End If
              ' There is big gotcha here if the forst two parts are < 12 and
              are
              different
              ' E.G. We don't really if 01 02 (01/02, 01-02, etc.) should be 1
              February or January 2
              End If
              End If
              If _format.Length = 0 Then
              ' We were unable to determine the date format from the available
              information
              Console.WriteLi ne("Unable to determine format from " & _ss(0))
              Return
              End If

              ' We were able to determine the date format so we can continue and
              parse
              the dates
              Console.WriteLi ne("Determined format as " & _format)

              ' Start from our actual first line of data
              For _i As Integer = _firstline To _lines.Length - 1
              _ss = _lines(_i).Spli t(New String() {" "},
              StringSplitOpti ons.None)
              Dim _date As DateTime = DateTime.ParseE xact(_ss(0), _format,
              Nothing)
              Console.WriteLi ne("Read from input: " & _ss(0) & " - Interpreted
              date:
              " & _date.ToString( "yyyy-MM-dd"))
              Next

              End Sub

              Note, from the results, that if there is no year part then
              DateTime.ParseE xact will interpret tahe date being in the current year
              as
              determined from the system date at the time the code is executed.


              "GS" <gsmsnews.micro soft.comGS@msne ws.Nomail.comwr ote in message
              news:uceG$z0MHH A.4916@TK2MSFTN GP06.phx.gbl...
              look like I am not expressing myself clearly. although the
              application
              does
              not know which format is used but does know for a given Set which date
              format I deals with and can expect the same format for a given Set of
              input.
              I should not have used the term batch but a set of record. The only
              possible variations are some records in certain sets may be split into
              2
              lines but that is not critical as the conditions can be described
              before
              hand and normalized by the another parse component
              >
              sample date
              >
              Set1: date format mask is "dd MMM"
              Date Parts ID Parts Description location Quantitiy Unit Cost
              Total
              Cost
              11 Dec A1234987 Sample Parts description W1I1R4S1 2 10.00 20.00
              15 Dec A1234988 Sample Parts description 1 10.00 20
              18 Dec A1234988 Sample Parts description 1 10.00 20
              19 Dec A1234988 Sample Parts description 1 10.00 20
              12 Dec A1234988 Sample Parts description 1 10.00 20
              >
              >
              Set 2 date format Mask is "dd MM yy"
              Date Parts ID Parts Description location Quantitiy Unit Cost
              Total
              Cost
              11 12 06 A1234987 Sample Parts description W1I1R4S1 2 10.00
              20.00
              15 12 06 A1234988 Sample Parts description 1 10.00 20
              18 12 06 A1234988 Sample Parts description 1 10.00 20
              19 12 06 A1234988 Sample Parts description 1 10.00 20
              12 12 06 A1234988 Sample Parts description 1 10.00 20
              >
              Set 3 date format mask "dd/MMM/06"
              Parts Description location Quantitiy Unit Cost Total Cost
              11/12/06 A1234987 Sample Parts description W1I1R4S1 2 10.00
              20.00
              15/12/06 A1234988 Sample Parts description 1 10.00
              2018/12/06 A1234988 Sample Parts description 1 10.00
              2019/12/06 A1234988 Sample Parts description 1 10.00
              2012/12/06 A1234988 Sample Parts description 1 10.00 20
              >
              Set 4 date format mask ""
              Date Parts ID Parts Description location Quantitiy Unit Cost
              Total
              Cost
              11/dec/06 A1234987 Sample Parts description W1I1R4S1 2 10.00
              20.00
              15/dec/06 A1234988 Sample Parts description 1 10.00 20
              18/dec/06 A1234988 Sample Parts description 1 10.00 20
              19/dec/06 A1234988 Sample Parts description 1 10.00 20
              12/dec/06 A1234988 Sample Parts description 1 10.00 20
              >
              how do I deal with format without year, I do have cluse for other
              parts
              of
              teh originatin website and optional default set by user
              >
              the sample data show variation of date format from set to set but the
              date
              format that I need to deal with within a given set is consistent, and the
              user has influence over the date format mask used.
              >
              Like Cor's suggestion: don't let the user enter the format, but let the
              user pick from a list. That will likely be the case, at least in version 0.
              >
              >
              "GS" <gsmsnews.microsoft.comGS@msnews.Nomail.com> wrote in message
              news:eFm$y5rMHHA.4376@TK2MSFTNGP03.phx.gbl...
              >You are sort of on the same track as mine.
              >>
              >I must first apologize: I did not tell you the complete story.
              >>
              >Although the application does not know beforehand exactly what format the
              >data may come in, part of the application allows the user to define and
              >record a favourite for a website:
              > - whether to extract by text or html
              > - header content and format
              > - record format and date format (that is where the date format mask
              >   comes in)
              > - optionally, an ordinal number for each column, or re-ordering
              > - trailer content and format
              >>
              >For a given batch, at least for the body, the date format is uniform.
              >>
              >Furthermore, given the need to make the extract process generic and
              >adaptable to the front end that takes the user definitions, I believe it
              >would be easier to "normalize" the date string to "yyyy-mm-dd".
              >>
              >Also, the end target may not necessarily be a SQL database; it may be
              >text pasted into a Word report, or Excel, by the user.
              >>
              >Therefore, I can transform the date format mask into a regex with the
              >appropriate format and identifiers, and I can use regex replace to
              >normalize the date. As a matter of fact, the date separator does not have
              >to be /; it can be a space, as long as there are identifiable delimiters
              >around the date string.
              >>
              >I already have code for dealing with regexes for dates from a prior
              >project. All I have to do is adapt it to the present need.
              >>
              >Who knows, maybe I have taken a totally offbeat track.
              >>
              >"GS" <gsmsnews.microsoft.comGS@msnews.Nomail.com> wrote in message
              >news:%23vnOBJiMHHA.1280@TK2MSFTNGP04.phx.gbl...
              Thanks to all who have pitched in so far.
              >
              Let me give it another shot.
              >
              It looks like an easier way out would be:
              1. Copy the date format string into a regex string holder, and then derive
                 the relevant regex expression to be used for date normalization later in
                 part 2:
                   - replace yyyy in the regex string with the regex year expression
                     carrying the year identifier
                   - look for yy, replace it with 20yy, and repeat the step above
                   - replace mmm with the month regex expression associated with the
                     month identifier
                   - replace mm with the 2-digit month regex expression associated with
                     the month identifier
                   - replace dd with the 2-digit day regex expression associated with the
                     day identifier
              >
              2. Use the resulting regex in a regex replace to normalize to yyyy-mm-dd.
              >
              >
              Any problem with the above approach?
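For what it's worth, the mask-to-regex steps above could be sketched roughly like this in VB.NET. This is only an illustration under stated assumptions, not code from the thread: MaskToPattern, TokenToGroup, ToIso and Normalize are made-up names, the mask is assumed to contain a year, a 2-digit year is assumed to mean 20yy (as in the step above), and only English 3-letter month abbreviations are recognised.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module MaskToRegexSketch

    ' Three-letter English month abbreviations, in calendar order.
    Private ReadOnly MonthNames As String() = New String() { _
        "jan", "feb", "mar", "apr", "may", "jun", _
        "jul", "aug", "sep", "oct", "nov", "dec"}

    ' Hypothetical helper: turns a mask such as "dd/mmm/yy" into a regex with
    ' named groups. The alternation lists longer tokens first, so "yyyy" is
    ' never mistaken for two "yy" tokens.
    Function MaskToPattern(ByVal mask As String) As String
        Dim escaped As String = Regex.Escape(mask.ToLowerInvariant())
        Return Regex.Replace(escaped, "yyyy|yy|mmm|mm|dd", AddressOf TokenToGroup)
    End Function

    ' Maps one mask token to the regex group that captures it.
    Private Function TokenToGroup(ByVal m As Match) As String
        Select Case m.Value
            Case "yyyy" : Return "(?<year>\d{4})"
            Case "yy" : Return "(?<year>\d{2})"
            Case "mmm" : Return "(?<mon>[A-Za-z]{3})"
            Case "mm" : Return "(?<month>\d{2})"
            Case Else : Return "(?<day>\d{2})"
        End Select
    End Function

    ' Rebuilds one matched date as yyyy-MM-dd.
    Private Function ToIso(ByVal m As Match) As String
        Dim year As String = m.Groups("year").Value
        ' Assumption taken from the post: a 2-digit year means 20yy.
        If year.Length = 2 Then year = "20" & year
        Dim month As String = m.Groups("month").Value
        If m.Groups("mon").Success Then
            Dim idx As Integer = Array.IndexOf(MonthNames, m.Groups("mon").Value.ToLowerInvariant())
            ' Leave text with an unknown month name untouched.
            If idx < 0 Then Return m.Value
            month = (idx + 1).ToString("00")
        End If
        Return year & "-" & month & "-" & m.Groups("day").Value
    End Function

    ' Replaces every date in text that fits the mask with its yyyy-MM-dd form.
    ' The \b anchors stop a "yy" mask from matching the first half of a 4-digit year.
    Function Normalize(ByVal text As String, ByVal mask As String) As String
        Dim pattern As String = "\b" & MaskToPattern(mask) & "\b"
        Return Regex.Replace(text, pattern, AddressOf ToIso, RegexOptions.IgnoreCase)
    End Function

    Sub Main()
        Console.WriteLine(Normalize("On 10/Jan/07 the quick brown fox", "dd/mmm/yy"))
        ' Prints: On 2007-01-10 the quick brown fox
    End Sub

End Module
```

Note that the regex only checks the shape of the date (two digits, three letters, and so on); it will happily "normalize" 31/02/07, so some range validation is still needed afterwards.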
              >
              "Cor Ligthert [MVP]" <notmyfirstname @planet.nlwrote in message
              news:%23Qj7TbWM HHA.3944@TK2MSF TNGP06.phx.gbl. ..
              GS,
              >
              Maybe you can avoid this in 2007, and all things like it such as
              DateTime.ParseExact; have a look instead at the globalization that
              Microsoft has nicely built in, and then at the ToString options related
              to it.
              >
              Cor
              >
              "gs" <gs@dontMail.telus> wrote in message
              news:OtrnsPTMHHA.4720@TK2MSFTNGP03.phx.gbl...
              Let's say I have to deal with various date formats, and I am given a format
              string from one of the following:
              dd/mm/yyyy
              mm/dd/yyyy
              dd/mmm/yyyy
              mmm/dd/yyyy
              dd/mm/yy
              mm/dd/yy
              dd/mmm/yy
              mmm/dd/yy
              dd/mm
              What is the best way to come up with a relevant regex for the incoming
              format string?
              a) use two arrays and match statically
              b) use regex to find the order


              Comment

              • Stephany Young

                #22
                Re: best design for parse- resent please ignore prev

                Now I'm confused.

                You have been giving the impression that you don't have any control over
                how the data is 'gathered'.

                Now you seem to be saying that you do have control.

                If that is the case simply validate the data at the time the user inputs it.

                If that is not the case then I think it's time you explained the big
                picture.


                "GS" <gsmsnews.micro soft.comGS@msne ws.Nomail.comwr ote in message
                news:eSykQ87MHH A.3424@TK2MSFTN GP02.phx.gbl...
                Oops. Please pardon my bad typos and proofreading.
                "GS" <gsmsnews.microsoft.comGS@msnews.Nomail.com> wrote in message
                news:OeomX07MHHA.2456@TK2MSFTNGP06.phx.gbl...
                I see that the code (which you put a lot of effort into) does work in a lot
                of cases. Much appreciated.
                >
                However, would it not be better to get assistance from the user, who knows
                what date format is being used?
                >
                That is the rationale for letting the user somehow pick the date format
                mask. Guessing the date format is tough to master for all cases. Not only
                can months and days be indeterminate at times; it is worse when a 2-digit
                year is used. I have seen some sample data that is way outside the ordinary
                date formats commonly seen in the US.
                >
                Relying on the first 1 or 2 being numeric would miss quite a few cases.
                Nonetheless, the code can be a default process in the absence of a user
                spec. Thank you very much for that.
                >
                Sorry for misleading you with incomplete data samples.
                There are sample data sets where the first column is not a date. On the
                other hand, sometimes the first 2 columns can both be dates, and, rarely,
                another column elsewhere can be a date too. This sounds incredible, but
                that's what the users have to contend with.
                >
                >"Stephany Young" <noone@localhos twrote in message
                >news:edlcDJ5MH HA.1252@TK2MSFT NGP02.phx.gbl.. .
                Now we're cooking with gas. I think that regex is overkill for this
                'problem'. Sure, you can use it if you wish, but I think you will be making
                a rod for your own back.
                >
                Here is a solution that works for your sample data. Create a Windows Forms
                project, plonk a button on the form and paste the following into the form:
                >
                ' Fields are delimited by a pair of spaces; the date field may contain single spaces
                Private m_source1 As String = "Date  Parts ID  Parts Description  location  Quantitiy  Unit Cost  Total Cost" & Environment.NewLine & _
                  "11 Dec  A1234987  Sample Parts description  W1I1R4S1  2  10.00  20.00" & Environment.NewLine & _
                  "15 Dec  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
                  "18 Dec  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
                  "19 Dec  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
                  "12 Dec  A1234988  Sample Parts description  1  10.00  20"

                Private m_source2 As String = "Date  Parts ID  Parts Description  location  Quantitiy  Unit Cost  Total Cost" & Environment.NewLine & _
                  "11 12 06  A1234987  Sample Parts description  W1I1R4S1  2  10.00  20.00" & Environment.NewLine & _
                  "15 12 06  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
                  "18 12 06  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
                  "19 12 06  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
                  "12 12 06  A1234988  Sample Parts description  1  10.00  20"

                Private m_source3 As String = "Parts  Parts ID  Description  location  Quantitiy  Unit Cost  Total Cost" & Environment.NewLine & _
                  "11/12/06  A1234987  Sample Parts description  W1I1R4S1  2  10.00  20.00" & Environment.NewLine & _
                  "15/12/06  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
                  "18/12/06  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
                  "19/12/06  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
                  "12/12/06  A1234988  Sample Parts description  1  10.00  20"

                Private m_source4 As String = "Date  Parts ID  Parts Description  location  Quantitiy  Unit Cost  Total Cost" & Environment.NewLine & _
                  "11/dec/06  A1234987  Sample Parts description  W1I1R4S1  2  10.00  20.00" & Environment.NewLine & _
                  "15/dec/06  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
                  "18/dec/06  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
                  "19/dec/06  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
                  "12/dec/06  A1234988  Sample Parts description  1  10.00  20"

                Private m_source5 As String = "Date  Parts ID  Parts Description  location  Quantitiy  Unit Cost  Total Cost" & Environment.NewLine & _
                  "12 13 06  A1234987  Sample Parts description  W1I1R4S1  2  10.00  20.00" & Environment.NewLine & _
                  "12 15 06  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
                  "12 18 06  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
                  "12 19 06  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
                  "12 12 06  A1234988  Sample Parts description  1  10.00  20"
                >
                Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click

                  Console.WriteLine()
                  Console.WriteLine("Sample 1")
                  ProcessData(m_source1)

                  Console.WriteLine()
                  Console.WriteLine("Sample 2")
                  ProcessData(m_source2)

                  Console.WriteLine()
                  Console.WriteLine("Sample 3")
                  ProcessData(m_source3)

                  Console.WriteLine()
                  Console.WriteLine("Sample 4")
                  ProcessData(m_source4)

                  Console.WriteLine()
                  Console.WriteLine("Sample 5")
                  ProcessData(m_source5)

                  Console.WriteLine()

                End Sub
                >
                Private Sub ProcessData(ByVal source As String)

                  ' Assumption: Lines of data are separated by a carriage return/line feed pair
                  Dim _lines As String() = source.Split(New String() {Environment.NewLine}, StringSplitOptions.RemoveEmptyEntries)

                  ' Determined by eyeballing data: All 'fields' are delimited by a pair of spaces
                  Dim _ss As String() = _lines(0).Split(New String() {"  "}, StringSplitOptions.None)

                  ' Determine which line is the first line of actual data
                  ' If the first line is a heading line then all characters of the first field will be letters
                  Dim _lettercount As Integer = 0
                  For Each _c As Char In _ss(0)
                    If Char.IsLetter(_c) Then _lettercount += 1
                  Next
                  Dim _firstline As Integer = 0
                  If _lettercount = _ss(0).Length Then _firstline = 1

                  ' Split the first actual line on the field delimiter
                  _ss = _lines(_firstline).Split(New String() {"  "}, StringSplitOptions.None)

                  ' Determined by eyeballing data: The date field is always the first field in the line

                  ' Determine the delimiter to be used for the date format
                  Dim _delimiter As String = ""
                  If _ss(0).IndexOf(" ") > 0 Then
                    _delimiter = " "
                  ElseIf _ss(0).IndexOf("/") > 0 Then
                    _delimiter = "/"
                  ElseIf _ss(0).IndexOf("-") > 0 Then
                    _delimiter = "-"
                  Else
                    Console.WriteLine("Unable to determine delimiter out of " & _ss(0))
                    Return
                  End If
                  Console.WriteLine("Determined delimiter as '" & _delimiter & "'")

                  ' Construct the date format to be used
                  Dim _format As String = String.Empty
                  ' Split the first field on the date format delimiter
                  Dim _parts As String() = _ss(0).Split(New String() {_delimiter}, StringSplitOptions.None)
                  If _parts.Length = 2 Then
                    ' If there are 2 parts then we only have day and month components
                    If Char.IsDigit(_parts(0).Chars(0)) AndAlso Char.IsLetter(_parts(1).Chars(0)) Then
                      ' The 1st part starts with a digit and the 2nd part starts with a letter
                      ' so we can assume that the 1st part is the day and the 2nd part is the month
                      _format = New String("d"c, _parts(0).Length) & _delimiter & "MMM"
                      If _parts(1).Length > 3 Then _format &= "M"
                    ElseIf Char.IsDigit(_parts(0).Chars(0)) AndAlso Char.IsDigit(_parts(1).Chars(0)) Then
                      ' Both parts start with a digit
                      ' Start with the assumption that the 1st part is the day and the 2nd part is the month
                      _format = New String("d"c, _parts(0).Length) & _delimiter & New String("M"c, _parts(1).Length)
                      If Integer.Parse(_parts(1)) > 12 Then
                        ' The 1st part must be the month and the 2nd part must be the day
                        ' (there is no 3rd part here, so no year component is appended)
                        _format = New String("M"c, _parts(0).Length) & _delimiter & New String("d"c, _parts(1).Length)
                      End If
                      ' There is a big gotcha here if both parts are <= 12 and are different
                      ' E.G. We don't really know if 01 02 (01/02, 01-02, etc.) should be 1 February or January 2
                    End If
                  ElseIf _parts.Length = 3 Then
                    ' If there are 3 parts then we have day, month and year components
                    ' Assume that the year is always the 3rd part
                    If Char.IsDigit(_parts(0).Chars(0)) AndAlso Char.IsLetter(_parts(1).Chars(0)) Then
                      ' The 1st part starts with a digit and the 2nd part starts with a letter
                      ' so we can assume that the 1st part is the day and the 2nd part is the month
                      _format = New String("d"c, _parts(0).Length) & _delimiter & "MMM"
                      If _parts(1).Length > 3 Then _format &= "M"
                      _format &= _delimiter & New String("y"c, _parts(2).Length)
                    ElseIf Char.IsDigit(_parts(0).Chars(0)) AndAlso Char.IsDigit(_parts(1).Chars(0)) Then
                      ' Both parts start with a digit
                      ' Start with the assumption that the 1st part is the day and the 2nd part is the month
                      _format = New String("d"c, _parts(0).Length) & _delimiter & New String("M"c, _parts(1).Length) & _delimiter & New String("y"c, _parts(2).Length)
                      If Integer.Parse(_parts(1)) > 12 Then
                        ' The 1st part must be the month and the 2nd part must be the day
                        _format = New String("M"c, _parts(0).Length) & _delimiter & New String("d"c, _parts(1).Length) & _delimiter & New String("y"c, _parts(2).Length)
                      End If
                      ' There is a big gotcha here if the first two parts are <= 12 and are different
                      ' E.G. We don't really know if 01 02 (01/02, 01-02, etc.) should be 1 February or January 2
                    End If
                  End If
                  If _format.Length = 0 Then
                    ' We were unable to determine the date format from the available information
                    Console.WriteLine("Unable to determine format from " & _ss(0))
                    Return
                  End If

                  ' We were able to determine the date format so we can continue and parse the dates
                  Console.WriteLine("Determined format as " & _format)

                  ' Start from our actual first line of data
                  For _i As Integer = _firstline To _lines.Length - 1
                    _ss = _lines(_i).Split(New String() {"  "}, StringSplitOptions.None)
                    Dim _date As DateTime = DateTime.ParseExact(_ss(0), _format, Nothing)
                    Console.WriteLine("Read from input: " & _ss(0) & " - Interpreted date: " & _date.ToString("yyyy-MM-dd"))
                  Next

                End Sub
                >
                Note, from the results, that if there is no year part then
                DateTime.ParseExact will interpret the date as being in the current year,
                as determined from the system date at the time the code is executed.
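If the current year is not what is wanted for a year-less mask such as "dd MMM", one hedged option is to append an explicit default year (for example, the one set by the user) to both the value and the mask before parsing. The sketch below is an illustration only: ParseWithYear and its parameters are made-up names, and CultureInfo.InvariantCulture is assumed so that English month abbreviations parse regardless of the machine's regional settings.

```vbnet
Imports System
Imports System.Globalization

Module DefaultYearSketch

    ' Hypothetical helper: parses a value whose mask has no year component by
    ' appending a caller-supplied year, instead of relying on the system date.
    Function ParseWithYear(ByVal value As String, ByVal mask As String, ByVal defaultYear As Integer) As DateTime
        Dim valueWithYear As String = value & " " & defaultYear.ToString("0000")
        Dim maskWithYear As String = mask & " yyyy"
        Return DateTime.ParseExact(valueWithYear, maskWithYear, CultureInfo.InvariantCulture)
    End Function

    Sub Main()
        Dim d As DateTime = ParseWithYear("11 Dec", "dd MMM", 2006)
        Console.WriteLine(d.ToString("yyyy-MM-dd"))
        ' Prints: 2006-12-11
    End Sub

End Module
```

Supplying the year up front also means a value like "29 Feb" is checked against the intended year rather than against whatever year the code happens to run in.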
                >
                >
                "GS" <gsmsnews.micro soft.comGS@msne ws.Nomail.comwr ote in message
                news:uceG$z0MHH A.4916@TK2MSFTN GP06.phx.gbl...
                It looks like I am not expressing myself clearly. Although the application
                does not know in advance which format is used, it does know, for a given
                set, which date format it is dealing with, and it can expect the same
                format throughout a given set of input.
                I should not have used the term "batch" but rather "a set of records". The
                only possible variation is that some records in certain sets may be split
                into 2 lines, but that is not critical, as the conditions can be described
                beforehand and normalized by another parse component.
                >
                Sample data
                >
                Set 1: date format mask is "dd MMM"
                Date  Parts ID  Parts Description  location  Quantitiy  Unit Cost  Total Cost
                11 Dec  A1234987  Sample Parts description  W1I1R4S1  2  10.00  20.00
                15 Dec  A1234988  Sample Parts description  1  10.00  20
                18 Dec  A1234988  Sample Parts description  1  10.00  20
                19 Dec  A1234988  Sample Parts description  1  10.00  20
                12 Dec  A1234988  Sample Parts description  1  10.00  20
                >
                >
                Set 2: date format mask is "dd MM yy"
                Date  Parts ID  Parts Description  location  Quantitiy  Unit Cost  Total Cost
                11 12 06  A1234987  Sample Parts description  W1I1R4S1  2  10.00  20.00
                15 12 06  A1234988  Sample Parts description  1  10.00  20
                18 12 06  A1234988  Sample Parts description  1  10.00  20
                19 12 06  A1234988  Sample Parts description  1  10.00  20
                12 12 06  A1234988  Sample Parts description  1  10.00  20
                >
                Set 3: date format mask "dd/MMM/06"
                Parts Description  location  Quantitiy  Unit Cost  Total Cost
                11/12/06  A1234987  Sample Parts description  W1I1R4S1  2  10.00  20.00
                15/12/06  A1234988  Sample Parts description  1  10.00  20
                18/12/06  A1234988  Sample Parts description  1  10.00  20
                19/12/06  A1234988  Sample Parts description  1  10.00  20
                12/12/06  A1234988  Sample Parts description  1  10.00  20
                >
                Set 4: date format mask ""
                Date  Parts ID  Parts Description  location  Quantitiy  Unit Cost  Total Cost
                11/dec/06  A1234987  Sample Parts description  W1I1R4S1  2  10.00  20.00
                15/dec/06  A1234988  Sample Parts description  1  10.00  20
                18/dec/06  A1234988  Sample Parts description  1  10.00  20
                19/dec/06  A1234988  Sample Parts description  1  10.00  20
                12/dec/06  A1234988  Sample Parts description  1  10.00  20
                >
                How do I deal with a format without a year? I do have clues from other
                parts of the originating website, and an optional default set by the user.
                >
                The sample data shows variation of date format from set to set, but the
                date format that I need to deal with within a given set is consistent, and
                the user has influence over the date format mask used.
                >
                Like Cor's suggestion: don't let the user enter the format, but let the
                user pick from a list. That will likely be the case, at least in version 0.

                Comment

                • kgerritsen

                  #23
                  Re: best design for parse


                  Cor Ligthert [MVP] wrote:
                  Stephany,
                  >
                  I am curious, what does this phrase mean, I don't know it.
                  >
                  Now we're cooking with gas.
                  >
                  (Living in Holland which is above one of the former biggest gasbells of
                  Europe)
                  >
                  Cor
                  It's somewhat, in some senses but not perfectly, an opposite of
                  "gezellig"

                  :)

                  Comment

                  • Cor Ligthert [MVP]

                    #24
                    Re: best design for parse- resent please ignore prev

                    GS,

                    You are assuming that there is a standard. For today's date I can write,
                    in my country:

                    9-1-07
                    09-1-07
                    9-01-7
                    09-1-2007
                    9-jan-2007
                    9-januari-2007
                    etc. No law tells me how to do it; it is also not seldom written as
                    2007-01-09, as in ISO.

                    And then every culture has its own style.


                    Cor


                    Comment

                    • Stephany Young

                      #25
                      Re: best design for parse

                      If my understanding of the meaning of 'gezellig' is correct, then 'cooking
                      with gas' is nothing like any sort of opposite of 'gezellig'.

                      My understanding of 'gezellig' is that, although not directly translatable,
                      it means something like 'feeling good amongst family and/or friends' but has
                      much more subtle meanings than that.

                      'Cooking with gas' means to be working fast/proceeding rapidly. For example:

                      After working with those old hand tools, power tools will
                      make you feel like you are really cooking with gas.

                      Metaphorically, it compares a gas cooker, where you get instant heat when
                      you light it, with other cookers (electric, wood, coal, etc.), which take a
                      while to warm up.


                      "kgerritsen " <kig25@drexel.e duwrote in message
                      news:1168354601 .946949.308190@ s34g2000cwa.goo glegroups.com.. .
                      >
                      Cor Ligthert [MVP] wrote:
                      >Stephany,
                      >>
                      >I am curious, what does this phrase mean, I don't know it.
                      >>
                      Now we're cooking with gas.
                      >>
                      >(Living in Holland which is above one of the former biggest gasbells of
                      >Europe)
                      >>
                      >Cor
                      >
                      It's somewhat, in some senses but not perfectly, an opposite of
                      "gezellig"
                      >
                      :)
                      >

                      Comment

                      • GS

                        #26
                        Re: best design for parse- resent please ignore prev

                        You're right, every culture has its own style.

                        However, the current project charter covers only data from
                        professional/commercial sites with a reputation for accurate date formats
                        and content, and from other applications that have a consistent date format.
                        Of course, if we can cover other unusual date format variations without
                        exceeding the budget, that will be welcome, but I certainly don't want to
                        get involved in that until everything in the charter is completed.

                        The dates in the data gathered by the users (they don't key them in
                        directly, thank goodness) fall into 2-digit day and month, standard English
                        3-letter month abbreviations, or full month names, with no spelling errors.
                        The users don't really enter the data; the user controls which site the
                        application visits. One way or another, the user specifies a date format
                        mask for the data to be processed.

                        No, the component and application are not expected to handle spelling
                        errors, but they are expected to deal with the common date formats in the
                        US and Canada (English). There may be more later on, but that is not my
                        worry within this project's scope.

                        Thanks to the aborted metrication (and so-called freedom of speech), there
                        are a few more variants of date format from the US.


                        The yyyy-mm-dd format is a safe common format to use in N. America for
                        software published by Microsoft. I have yet to see any Microsoft
                        application fail to convert a yyyy-mm-dd string to a date properly among my
                        user base, unless they arbitrarily set the Windows date format to
                        yyyy-dd-mm.

                        I suppose using yyyy-MMM-dd as the normalized string date will avoid that
                        issue altogether.

                        The really tricky part is validating the users' date format mask against
                        the actual data found. That is why regex replace was tempting to me.
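
                        One possible way to check a mask against the data without regex is to let
                        the framework do the matching. The following is only a sketch, assuming the
                        user's mask is already expressed as a .NET custom format string;
                        TryNormalize, the sample strings and the "dd/MM/yyyy" mask are made up for
                        illustration and are not part of the application discussed here.

                        ```vb
                        Imports System
                        Imports System.Globalization

                        Module MaskCheck

                            ' Hypothetical helper: returns True when text matches mask exactly and
                            ' hands back the date as a yyyy-MM-dd string in normalized.
                            Function TryNormalize(ByVal text As String, ByVal mask As String, _
                                                  ByRef normalized As String) As Boolean
                                Dim parsed As DateTime
                                ' InvariantCulture keeps English month names, whatever the Windows locale.
                                If DateTime.TryParseExact(text, mask, CultureInfo.InvariantCulture, _
                                                          DateTimeStyles.None, parsed) Then
                                    ' "MM" is the month; lowercase "mm" means minutes in .NET formats.
                                    normalized = parsed.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)
                                    Return True
                                End If
                                normalized = Nothing
                                Return False
                            End Function

                            Sub Main()
                                ' Assumed sample data and an assumed user-supplied mask.
                                Dim samples() As String = {"07/01/2007", "Jan/09/2007", "10/Jan/2007"}
                                Dim mask As String = "dd/MM/yyyy"

                                For Each sample As String In samples
                                    Dim result As String = Nothing
                                    If TryNormalize(sample, mask, result) Then
                                        Console.WriteLine("{0} -> {1}", sample, result)
                                    Else
                                        Console.WriteLine("{0} does not match mask {1}", sample, mask)
                                    End If
                                Next
                            End Sub

                        End Module
                        ```

                        A mask that rejects most of the sampled data is probably the wrong mask,
                        which is the validation being asked for. Note that a yyyy-mm-dd mask typed
                        by a user would have to be translated to yyyy-MM-dd before being passed to
                        TryParseExact.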


                        You are right, regex replace will still not be able to handle all mistakes.


                        "Cor Ligthert [MVP]" <notmyfirstname@planet.nl> wrote in message
                        news:u8MdYbBNHHA.4992@TK2MSFTNGP04.phx.gbl...
                        > GS,
                        >
                        > You are assuming something that is standard. For today I can write in my
                        > country
                        >
                        > 9-1-07
                        > 09-1-07
                        > 9-01-7
                        > 09-1-2007
                        > 9-jan-2007
                        > 9-januari-2007
                        > etc. not any law tells me how to do it, it is not seldom done as
                        > 2007-01-09 as well in ISO.
                        >
                        > And then every culture has its own style.
                        >
                        > Cor


                        • Cor Ligthert [MVP]

                          #27
                          Re: best design for parse- resent please ignore prev

                          GS,

                          I think you did not see my first piece of advice: simply checking the
                          string with DateTime.TryParse will tell you directly whether it can be a
                          valid DateTime.
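
                          As a rough illustration of that check, the sketch below runs one string
                          through DateTime.TryParse under two cultures. The sample text and the
                          choice of en-US and en-GB are assumptions made for the example. It shows
                          that TryParse answers whether the text can be a date, but the value it
                          returns depends on the culture passed in.

                          ```vb
                          Imports System
                          Imports System.Globalization

                          Module TryParseCheck

                              Sub Main()
                                  ' Assumed sample: readable as 1 July or as 7 January.
                                  Dim text As String = "07/01/2007"
                                  Dim cultures() As String = {"en-US", "en-GB"}

                                  For Each name As String In cultures
                                      Dim culture As New CultureInfo(name)
                                      Dim parsed As DateTime
                                      If DateTime.TryParse(text, culture, DateTimeStyles.None, parsed) Then
                                          ' Print with a month name so the day/month order is visible.
                                          Console.WriteLine("{0}: {1}", name, _
                                              parsed.ToString("yyyy-MMM-dd", CultureInfo.InvariantCulture))
                                      Else
                                          Console.WriteLine("{0}: not a valid date", name)
                                      End If
                                  Next
                              End Sub

                          End Module
                          ```

                          Both cultures accept the text, so a successful TryParse alone does not
                          settle which of the two dates was meant.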

                          One more addition: Canada (English) has, AFAIK, the same date/time pattern
                          as all former and current Commonwealth members in the parts where English
                          is the spoken language.

                          Cor


                          • Cor Ligthert [MVP]

                            #28
                            Re: best design for parse

                            Stephany,

                            Living in a country where gas is about the most ordinary, basic thing
                            there is, your sentence conveys nothing to me other than "not gezellig".

                            Sitting around an open fire in an open wood and talking with each other
                            is more what we would call gezellig, and for that, by the way, we have to
                            leave our country, unless we do it illegally or are really rich.

                            If I am well informed, you live in a country with not as many people per
                            square kilometre as here, so your idea about that can be completely the
                            opposite.

                            > After working with those old hand tools, power tools will
                            > make you feel like you are really cooking with gas.

                            Some people could apply that metaphor to simple, task-fulfilling
                            drag-and-drop tools; I can assure you that I find those far from gezellig.

                            :-)

                            Cor

                            "Stephany Young" <noone@localhost> wrote in message
                            news:%23SWxBjENHHA.4712@TK2MSFTNGP04.phx.gbl...
                            > If my understanding of the meaning 'gezellig' is correct, then 'cooking
                            > with gas' is nothing like any sort of opposite of 'gezellig'.
                            >
                            > My understanding of 'gezellig' is that, although not directly
                            > translatable, means something like 'feeling good amongst family and/or
                            > friends' but has much more subtle meanings than that.
                            >
                            > 'Cooking with gas' means to be working fast/proceeding rapidly. For
                            > example:
                            >
                            >     After working with those old hand tools, power tools will
                            >     make you feel like you are really cooking with gas.
                            >
                            > Metaphorically it is comparing a gas cooker where you get instant heat
                            > when you light it, to other cookers (electric, wood, coal, etc.) where
                            > they take a while to warm up.
                            >
                            > "kgerritsen" <kig25@drexel.edu> wrote in message
                            > news:1168354601.946949.308190@s34g2000cwa.googlegroups.com...
                            >> Cor Ligthert [MVP] wrote:
                            >>> Stephany,
                            >>>
                            >>> I am curious, what does this phrase mean, I don't know it.
                            >>>
                            >>>> Now we're cooking with gas.
                            >>>
                            >>> (Living in Holland which is above one of the former biggest gasbells of
                            >>> Europe)
                            >>>
                            >>> Cor
                            >>
                            >> It's somewhat, in some senses but not perfectly, an opposite of
                            >> "gezellig"
                            >>
                            >> :)


                            • Stephany Young

                              #29
                              Re: best design for parse

                              Let me put it this way:

                              You are working on a project and you are unable
                              to make any progress because you are waiting on
                              some vital information. At this point you are
                              'bogged down'.

                              The information that you are waiting on arrives,
                              and, as a result, you are now able to make rapid
                              progress toward completion of the project. Now
                              you are 'cooking with gas'.

                              'Cooking with gas' is to do with the 'rush' of activity.


                              • Cor Ligthert [MVP]

                                #30
                                Re: best design for parse

                                Stephany,

                                I had already understood this sentence the moment you posted it.

                                However, this really was an expression I had never seen before (it
                                exists, I know, before you start to correct me). I could not resist
                                writing what I did.

                                Cor
