elementary string processing question

  • tonywh00t

    elementary string processing question

    Hi everyone,

    I have a "simple" question, especially for people familiar with regex.
    I need to parse strings that have the form:

    1:3::5:9

    which indicates the set of integers {1 3 4 5 9}. In other words, I have
    a set of numbers separated by ":", where "::" indicates a range from
    lo to hi inclusive. It is desirable to error-check this string (i.e., it
    should start and end with a number, and be composed only of numbers,
    "::", and ":"). I'm currently using the Boost C++ library, and I've
    worked out some pretty ugly solutions. If anyone has a suggestion, I'd
    very much appreciate it. Thanks!
  • James Kanze

    #2
    Re: elementary string processing question

    On Nov 1, 4:28 am, tonywh00t <tony.s...@gmail.com> wrote:
    > I have a "simple" question, especially for people familiar
    > with regex. I need to parse strings that have the form:
    > 1:3::5:9
    > which indicates the set of integers {1 3 4 5 9}. In other
    > words, I have a set of numbers separated by ":", where "::"
    > indicates a range from lo to hi inclusive. It is desirable to
    > error-check this string (i.e., it should start and end with a
    > number, and be composed only of numbers, "::", and ":"). I'm
    > currently using the Boost C++ library, and I've worked out
    > some pretty ugly solutions. If anyone has a suggestion, I'd
    > very much appreciate it. Thanks!

    I presume that the number of entries in the string may vary;
    otherwise, of course, you said it yourself: regex. I'd still
    use regex to validate the string; something like
    "^\\d+(:\\d+|::\\d+)*$" should, I think, do the trick. (It would
    be really elegant if you could use capture, but capture doesn't
    work well within closures: only the last match is captured.)
    Then I'd simply break the string up into substrings at each ':':

    #include <algorithm>
    #include <string>
    #include <vector>

    // Split source at every ':'; a "::" yields an empty field.
    std::vector< std::string >
    parse( std::string const& source )
    {
        typedef std::string::const_iterator
                            TextIter ;
        std::vector< std::string >
                            result ;
        TextIter            current = source.begin() ;
        TextIter const      end = source.end() ;
        while ( current != end ) {
            TextIter            fieldBegin = current ;
            current = std::find( current, end, ':' ) ;
            result.push_back( std::string( fieldBegin, current ) ) ;
            if ( current != end ) {
                ++ current ;    // skip the ':' itself
            }
        }
        return result ;
    }

    This gives you an array of strings, with an empty string for each
    "::" (so when you see an empty string, you know you have a range).
    So you could do something like:

    #include <sstream>

    // Assumes the string has already been validated as all digits.
    int
    toInt( std::string const& string )
    {
        std::istringstream  cvt( string ) ;
        int                 result = 0 ;
        cvt >> result ;
        return result ;
    }

    #include <algorithm>
    #include <stdexcept>

    std::vector< int >
    convert( std::vector< std::string > const& source )
    {
        typedef std::vector< std::string >::const_iterator
                            FieldIter ;
        std::vector< int >  result ;
        FieldIter           current = source.begin() ;
        FieldIter const     end = source.end() ;
        while ( current != end ) {
            result.push_back( toInt( *current ) ) ;
            ++ current ;
            if ( current != end && *current == "" ) {
                // Empty field: the value just pushed starts a range.
                int                 bottom = result.back() ;
                ++ current ;
                int                 top = toInt( *current ) ;
                if ( top <= bottom ) {
                    throw std::invalid_argument( "range bounds out of order" ) ;
                }
                while ( ++ bottom <= top ) {
                    result.push_back( bottom ) ;
                }
                ++ current ;
            }
        }
        std::sort( result.begin(), result.end() ) ;
        // Or you might want to track the last seen to ensure
        // that the input was correctly sorted.
        return result ;
    }

    Note that all of the above code supposes the precheck on the
    format using regex. Otherwise, you'll need a lot more error
    handling and special cases.
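
As a rough illustration of the regex precheck described above, here is a minimal sketch. It assumes C++11's std::regex rather than Boost.Regex, and the function name isValidRangeList is made up for this example. The pattern is equivalent to the one suggested earlier in this post, written with "::?" in place of the alternation.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not from the thread): returns true when source
// consists of numbers separated by ":" or "::", and both starts and
// ends with a number.
bool isValidRangeList( std::string const& source )
{
    // std::regex_match only succeeds if the whole string matches,
    // so the ^ and $ anchors are not needed here.
    static std::regex const format( R"(\d+(::?\d+)*)" ) ;
    return std::regex_match( source, format ) ;
}
```

One would call this before parse() and convert(). Note that this pattern also accepts chained ranges such as "1::2::3"; the thread does not say whether those should be legal, and convert() as written does not expect them.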

    --
    James Kanze (GABI Software) email:james.kanze@gmail.com
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

      • Juha Nieminen

        #4
        Re: elementary string processing question

        tonywh00t wrote:
        > I'm currently using the Boost C++ library, and I've
        > worked out some pretty ugly solutions. If anyone has a suggestion, I'd
        > very much appreciate it. Thanks!

        My experience is that whenever you need to parse input data which is
        more complicated than fixed-format whitespace-separated elements, the
        parsing code always becomes very complicated in C++ (as well as C).
        C and C++ were clearly not designed as languages in which you can
        write complicated format parsers as one-liners. Often not even as
        100-liners (especially if you want full error checking).

        Of course, libraries have been developed over the decades to try to
        help with this, but they often help more with abstraction than with
        the verbosity and complexity of the code.
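
For a sense of scale, here is a minimal sketch of a hand-written single-pass parser for the format asked about in this thread, with error checking and without regex. The function name expandRangeList, the choice of std::invalid_argument, and the error messages are assumptions made for this example; integer overflow is deliberately left unchecked.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical function (not from the thread): expands a string such as
// "1:3::5:9" into {1, 3, 4, 5, 9}, throwing on malformed input.
std::vector< int >
expandRangeList( std::string const& source )
{
    std::vector< int >  result ;
    std::size_t         pos = 0 ;
    bool                inRange = false ;   // previous separator was "::"
    for ( ;; ) {
        // Read one non-empty run of digits.
        std::size_t const   start = pos ;
        int                 value = 0 ;
        while ( pos < source.size()
                && std::isdigit( static_cast< unsigned char >( source[ pos ] ) ) ) {
            value = value * 10 + ( source[ pos ] - '0' ) ;  // overflow not checked
            ++ pos ;
        }
        if ( pos == start ) {
            throw std::invalid_argument( "expected a number" ) ;
        }
        if ( inRange ) {
            // The lower bound is already in result; add the rest.
            int const           bottom = result.back() ;
            if ( value <= bottom ) {
                throw std::invalid_argument( "range bounds out of order" ) ;
            }
            for ( int i = bottom + 1 ; i <= value ; ++ i ) {
                result.push_back( i ) ;
            }
        } else {
            result.push_back( value ) ;
        }
        if ( pos == source.size() ) {
            return result ;
        }
        // Anything following a number must be ":" or "::".
        if ( source[ pos ] != ':' ) {
            throw std::invalid_argument( "unexpected character" ) ;
        }
        ++ pos ;
        inRange = pos < source.size() && source[ pos ] == ':' ;
        if ( inRange ) {
            ++ pos ;
        }
    }
}
```

Even for this small format the sketch runs to roughly fifty lines, most of them error handling. Unlike the regex-plus-split approach, it reports the first problem it finds and never builds intermediate strings.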

        • tonywh00t

          #5
          Re: elementary string processing question

          thanks guys very much for your suggestions and help =).
