Searching for timestamp in string

  • Brian Mitchell

    Searching for timestamp in string

    Is there an easy way to pull a date/time stamp from a string? The DateTime
    stamp is located in different parts of each string and the DateTime stamp
    could be in different formats (mm/dd/yy or dd/mm/yyyy, or hh:mm:ss
    dd/mm...etc.)

    Any ideas would be appreciated,
    Thanks!!


  • Jay B. Harlow [MVP - Outlook]

    #2
    Re: Searching for timestamp in string

    Brian,
    You could use a RegEx to search a string for DateTime-like formats.

    Something like:

    Imports System.Text.RegularExpressions

    ' Matches either a bare date, or a time followed by a date.
    Const pattern As String = _
    "(?<date>(\d{1,2}/\d{1,2}/\d{1,4})|(\d{2}:\d{2} \d{1,2}/\d{1,2}/\d{1,4}))"
    Static dateExpression As New Regex(pattern, RegexOptions.Compiled)
    Dim input As String = "Today is ""12/31/2004"" or 12:49 12/31/2004"
    For Each match As Match In dateExpression.Matches(input)
        ' Writes the text captured by the "date" group under the "found" category.
        Debug.WriteLine(match.Groups("date").Value, "found")
    Next

    You could expand the pattern to include multiple formats; I only show a date,
    and a time followed by a date. Just be careful of 4 & 2 digit years. Note that the
    RegEx pattern doesn't know or care if the date is mm/dd/yy or dd/mm/yy or
    even yy/mm/dd, just that it is 3 numbers separated by slashes...

    Once you have the Match I would recommend the following DateTime.ParseExact
    overload to parse the date found into a DateTime value.

    http://msdn.microsoft.com/library/de...xactTopic3.asp

    As it allows you to specify a number of custom formats to check against to
    convert the string to a DateTime.
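
    A minimal sketch of that ParseExact step, for illustration only. The list
    of formats and the sample text are assumptions, not something the original
    post specifies; the overload used is the one that takes an array of format
    strings.

    ```vbnet
    Imports System
    Imports System.Globalization

    Module ParseExactSketch
        Sub Main()
            ' Candidate formats, tried in order. This list is an assumption.
            Dim formats() As String = {"M/d/yyyy", "M/d/yy", "HH:mm M/d/yyyy"}

            ' Stand-in for match.Groups("date").Value from the RegEx search.
            Dim found As String = "12:49 12/31/2004"

            ' Throws FormatException if none of the formats fits.
            Dim stamp As DateTime = DateTime.ParseExact( _
                found, formats, CultureInfo.InvariantCulture, DateTimeStyles.None)

            Console.WriteLine(stamp.ToString("s"))
        End Sub
    End Module
    ```

    Note that if both "M/d/yyyy" and "d/M/yyyy" are in the list, a string such
    as 01/02/2004 parses under whichever comes first, so the ordering of the
    formats has to encode which reading you prefer.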

    A tutorial & reference on using regular expressions:
    http://www.regular-expressions.info/

    The MSDN's documentation on regular expressions:
    http://msdn.microsoft.com/library/de...geElements.asp

    Hope this helps
    Jay

    • Nick Malik [Microsoft]

      #3
      Re: Searching for timestamp in string

      Well, you could create multiple regular expressions that will parse out the
      date/time string: one expression for each format.
      Then, when you get a source string, loop through each of your regular
      expressions until one of them picks up a date.
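
      A rough sketch of that loop, assuming three illustrative patterns (the
      patterns, the module name and the FindStamp helper are hypothetical, not
      taken from the post). The expressions are created once and tried in
      order, most specific first.

      ```vbnet
      Imports System
      Imports System.Text.RegularExpressions

      Module PatternLoopSketch
          ' One expression per format, most specific first.
          Private ReadOnly expressions() As Regex = { _
              New Regex("\d{2}:\d{2} \d{1,2}/\d{1,2}/\d{4}"), _
              New Regex("\d{1,2}/\d{1,2}/\d{4}"), _
              New Regex("\d{1,2}/\d{1,2}/\d{2}")}

          ' Returns the text matched by the first expression that succeeds,
          ' or Nothing when no expression matches.
          Function FindStamp(ByVal input As String) As String
              For Each expression As Regex In expressions
                  Dim m As Match = expression.Match(input)
                  If m.Success Then Return m.Value
              Next
              Return Nothing
          End Function

          Sub Main()
              Console.WriteLine(FindStamp("Logged at 12:49 12/31/2004 by job 7"))
          End Sub
      End Module
      ```

      The order of the array matters here: with the 2-digit-year pattern
      first, a 4-digit year would be cut off after two digits.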

      --
      --- Nick Malik [Microsoft]
      MCSD, CFPS, Certified Scrummaster
      http://blogs.msdn.com/nickmalik

      Disclaimer: Opinions expressed in this forum are my own, and not
      representative of my employer.
      I do not answer questions on behalf of my employer. I'm just a
      programmer helping programmers.

      • Jay B. Harlow [MVP - Outlook]

        #4
        Re: Searching for timestamp in string

        Nick,
        Why have multiple regular expressions?

        Especially when the | operator (alternation construct) in RegEx allows for a
        single expression to match multiple terms?

        http://msdn.microsoft.com/library/de...Constructs.asp

        In other words instead of:
        Dim rx1 As New RegEx("a")
        Dim rx2 As New RegEx("b")
        Dim rx3 As New RegEx("c")

        You can use
        Dim rx As New RegEx("a|b|c")

        Hope this helps
        Jay

        • Brian Mitchell

          #5
          Re: Searching for timestamp in string

          Thank you very much for the info, this helps me a great deal. That is a
          great site for the tutorial, again thanks!!

          • Nick Malik [Microsoft]

            #6
            Re: Searching for timestamp in string

            Hi Jay,

            I break down regular expressions for the same reason I break down a
            complicated task into multiple calls to different methods: to make it easier
            to understand and debug.

            It's personal preference, really. A regular expression for matching one
            date format is not going to be all that trivial. The OP wants to match
            multiple date formats. Unless you are an expert at Regex, and most folks
            aren't, it will be fairly easy to make a mistake in one of them.

            If all of your regular expressions are combined into one complicated
            expression, separated by 'or' operators, and you make a mistake, it's that
            much harder to find and fix the mistake.

            I'll take my chances with multiple individual expressions.


            • Jay B. Harlow [MVP - Outlook]

              #7
              Re: Searching for timestamp in string

              Nick,
              I agree, I break down regular expressions while I am developing them.

              However, once I am comfortable that they work, I then combine them to
              "simplify" the supporting code.

              > It's personal preference, really.

              Is it? My concern is that with the manual looping you are adding unnecessary
              complexity to the code, hence my question. Plus you might be adding possible
              performance problems (evaluating multiple RegEx as opposed to a single
              complex one). Either method may be causing increased GC pressure. How does
              that saying go, "robbing Peter to pay Paul"? Don't get me wrong, sometimes it
              is "better" to write more complex supporting code to simplify the RegEx
              versus more complex RegEx to simplify the code...

              Also my concern (with both methods) is precedence, which is a problem my
              expression has with 2 & 4 digit years (it actually allows a 3 digit year).
              Manually looping over individual expressions may cause a different
              expression to be matched than a properly constructed group with alternation
              (I am not implying my expression is properly constructed!).
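
              To illustrate the precedence point, here is a small sketch (the
              patterns and names are assumptions for this example, not the
              expression from earlier in the thread). Alternatives are tried
              left to right, so the order of the year alternatives changes
              what is matched.

              ```vbnet
              Imports System
              Imports System.Text.RegularExpressions

              Module AlternationOrderSketch
                  Sub Main()
                      Dim input As String = "12/31/2004"

                      ' The 2-digit year is tried first and succeeds, so the
                      ' match stops after "12/31/20".
                      Dim shortFirst As New Regex("\d{1,2}/\d{1,2}/(\d{2}|\d{4})")

                      ' Trying the 4-digit year first, and refusing a trailing
                      ' digit, accepts 2 or 4 digit years but not 3 digit ones.
                      Dim longFirst As New Regex("\d{1,2}/\d{1,2}/(\d{4}|\d{2})(?!\d)")

                      Console.WriteLine(shortFirst.Match(input).Value)
                      Console.WriteLine(longFirst.Match(input).Value)
                      Console.WriteLine(longFirst.IsMatch("12/31/200"))
                  End Sub
              End Module
              ```

              The same ordering concern applies to a manual loop: whichever
              expression is tried first gets the first chance to match.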

              Also in this instance I would consider something like:

              Const pattern1 As String = "a"
              Const pattern2 As String = "b"
              Const pattern3 As String = "c"

              Const pattern As String = pattern1 & "|" & pattern2 & "|" & pattern3

              Which easily allows you to define & maintain the patterns separately, then
              gain the "simplicity" of combining the RegEx call... I would then structure
              my Unit Tests such that I could easily identify if pattern1, pattern2 or
              pattern3 was failing or working...
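
              One possible way to make the combined expression report which
              sub-pattern fired is to give each one its own named group. This
              is only a sketch; the group names and the two formats are
              assumptions chosen for the example.

              ```vbnet
              Imports System
              Imports System.Text.RegularExpressions

              Module NamedAlternativesSketch
                  ' Each format is defined and maintained on its own.
                  Const timeThenDate As String = _
                      "(?<timeDate>\d{2}:\d{2} \d{1,2}/\d{1,2}/\d{4})"
                  Const dateOnly As String = "(?<dateOnly>\d{1,2}/\d{1,2}/\d{4})"

                  ' The longer format is listed first so that it is tried first.
                  Const pattern As String = timeThenDate & "|" & dateOnly

                  Sub Main()
                      Dim expression As New Regex(pattern)
                      Dim input As String = "Today is ""12/31/2004"" or 12:49 12/31/2004"

                      For Each m As Match In expression.Matches(input)
                          ' Only the group of the alternative that matched succeeds.
                          Dim kind As String = "dateOnly"
                          If m.Groups("timeDate").Success Then kind = "timeDate"
                          Console.WriteLine(kind & ": " & m.Value)
                      Next
                  End Sub
              End Module
              ```

              A unit test can then assert on the group name as well as the
              matched text, which shows which pattern is failing or working.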

              > If all of your regular expressions are combined into one complicated
              > expression, separated by 'or' operators, and you make a mistake, it's that
              > much harder to find and fix the mistake.

              Note: | is the alternation operator, not the Or operator... As Or implies
              combining (when applied to numbers & boolean), where | does not combine, it
              provides alternatives!

              Just a thought
              Jay

              • Nick Malik [Microsoft]

                #8
                Re: Searching for timestamp in string

                Hi Jay,

                You are clearly one of the folks that I would describe as "more expert than
                I in RegEx."

                > I agree, I break down regular expressions, while I am developing them.
                >
                > However: Once I am comfortable that they work, I then combine them, to
                > "simplify" the supporting code.

                Code simplicity is an interesting term. Not sure I agree that combining two
                or three (or ten) expressions creates simplicity. The code is certainly
                shorter. However, I have no desire to make things simple for the compiler
                or the runtime. I want to make things simple for myself and the developer
                who will follow me, and have to maintain my code.

                > > It's personal preference, really.
                > Is it? My concern is the manual looping you are adding unnecessary
                > complexity to the code, hence my question.

                In my opinion, a loop is a fairly common construct, and therefore the
                complexity of adding a loop is small compared to the complexity of making
                the RegEx more difficult for a non-expert to read.

                > Plus you might be adding possible
                > performance problems (evaluating multiple RegEx as opposed to a single
                > complex one).

                If RegEx was being used in an inner loop, in a situation where we were
                processor bound, I would agree. I haven't run across that situation. I
                suppose my answer would become more cautious if I had. That said, RegEx is
                pretty efficient.

                > Either method may be causing increased GC pressure.

                Sorry to be thick, but I don't understand why. If I were doing a series of
                RegEx matches in a loop, I would create the expressions outside the loop and
                simply use them in the loop. A match is as good as a mile. Technically,
                that should create the same number of matches.

                Also, once again, most of the apps that I've done parsing in aren't tuned
                for Garbage Collection. It is nearly always easier to find opportunities to
                reduce GC pressure simply by applying StringBuilder where it is useful (the
                "80-20" rule).

                > Also my concern (with both methods) is precedence, which is a problem my
                > expression has with 2 & 4 digit years (it actually allows a 3 digit year).
                > Manually looping over individual expressions may cause a different
                > expression to be matched than a properly constructed group with alternation
                > (I am not implying my expression is properly constructed!).

                I completely agree. This is one place where I feel that a loop is better.
                You can add in extra logic by structuring the code so that you match your
                string against a couple of different patterns, and then YOU can apply a
                complex rule to decide which to use... with the RegEx language, you don't
                have the right to control precedence in as detailed a way as you can with
                logical constructs and business rules.
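
                As a sketch of what such a rule might look like in code: every
                pattern is tried, and a rule written in ordinary code picks the
                winner. The "longest match wins" rule, the patterns and the
                PickStamp helper are all assumptions made up for this example.

                ```vbnet
                Imports System
                Imports System.Text.RegularExpressions

                Module BestCandidateSketch
                    Private ReadOnly expressions() As Regex = { _
                        New Regex("\d{1,2}/\d{1,2}/\d{2}"), _
                        New Regex("\d{1,2}/\d{1,2}/\d{4}"), _
                        New Regex("\d{2}:\d{2} \d{1,2}/\d{1,2}/\d{4}")}

                    ' Illustrative rule: among all patterns that match, keep
                    ' the longest matched text. Returns Nothing if none match.
                    Function PickStamp(ByVal input As String) As String
                        Dim best As String = Nothing
                        For Each expression As Regex In expressions
                            Dim m As Match = expression.Match(input)
                            If m.Success AndAlso _
                               (best Is Nothing OrElse m.Length > best.Length) Then
                                best = m.Value
                            End If
                        Next
                        Return best
                    End Function

                    Sub Main()
                        Console.WriteLine(PickStamp("Backup ran 12:49 12/31/2004"))
                    End Sub
                End Module
                ```

                Because the rule lives in code rather than in the pattern, the
                order of the array no longer decides the result.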

                > Also in this instance I would consider something like:
                >
                > Const pattern1 As String = "a"
                > Const pattern2 As String = "b"
                > Const pattern3 As String = "c"
                >
                > Const pattern As String = pattern1 & "|" & pattern2 & "|" & pattern3
                >
                > Which easily allows you to define & maintain the patterns separately, then
                > gain the "simplicity" of combining the RegEx call...

                An excellent idea. One thing to consider, though. Each of the patterns
                above would need to be tested individually, and the combined pattern would
                need to be tested as well. If you do one, and not the other, it is possible
                for a small syntax error in two patterns to balance each other out, allowing
                the final construct to be legal, valid, and wrong.

                This adds to the testing burden a bit. Not much, perhaps, but still a bit.
                The unit tests that you describe should still cover it, as long as they look
                for boundary conditions effectively.

                > > If all of your regular expressions are combined into one complicated
                > > expression, separated by 'or' operators, and you make a mistake, it's that
                > > much harder to find and fix the mistake.
                > Note: | is the alternation operator, not the Or operator... As Or implies
                > combining (when applied to numbers & boolean), where | does not combine, it
                > provides alternatives!

                I stand corrected.
