Data Records Formats Testing Tool

  • Mark Jerde

    Data Records Formats Testing Tool

    (If these are the wrong groups please suggest the right one(s). Thanks.)

    I need to come up with a way to test potentially thousands of data (files /
    records / streams) to determine if they match one of about thirty defined
    data formats. If a record partially matches one of the formats I need to
    log why it failed.

    The formats are byte-oriented. Byte 0 is the type, byte 1 is the subtype,
    bytes 2-5 give the total record length, etc. There are two wrinkles.
    First, some of the formats allow 1..n subrecords, like a person listing her
    home phone, cell phone, fax number, ICQ #, the dog's cell phone, etc.
    Second, some of the formats allow other formats to be wholly contained in
    them, like an "inventory" format being made up of many separate items of
    different "item" format types.
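As a rough illustration of the fixed header described above, here is a minimal Python sketch. The byte order (big-endian) and the idea that the length field counts the whole record, header included, are assumptions; the post does not say either way. The name `read_header` is made up for this example.

```python
import struct

# type (1 byte), subtype (1 byte), total record length (4 bytes).
# ">" = big-endian with no padding; the byte order is an assumption.
HEADER = struct.Struct(">BBI")


def read_header(record: bytes):
    """Return (type, subtype, declared_length) or raise ValueError with a reason."""
    if len(record) < HEADER.size:
        raise ValueError(
            f"record is {len(record)} bytes, header needs {HEADER.size}"
        )
    rec_type, subtype, declared = HEADER.unpack_from(record, 0)
    # Assumes the length field covers the whole record, header included.
    if declared != len(record):
        raise ValueError(
            f"declared length {declared} != actual length {len(record)}"
        )
    return rec_type, subtype, declared
```

The exception text doubles as the "why it failed" log entry.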

    In the history of computers this *can't* be the first need for this kind of
    program. ;-) New formats are approved periodically so hard-coding
    everything in C# or VB.NET is a sub-optimal solution. ISTM it should be
    possible to write the permissible format "rules" in (XML / ASN.1 / RegEx /
    etc.), present the rules to a tried and true program, and smash data files
    against the program all day long.
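One way to picture "rules as data" is a small rule table that a generic checker walks, collecting a reason for every mismatch. The sketch below uses JSON only as a stand-in for whatever rule language is chosen; the rule keys (`equals`, `one_of`, `is_total_length`), the example format, and the big-endian byte order are all invented for illustration.

```python
import json

# Hypothetical rule file: one entry per fixed-position field.
RULES_JSON = """
{
  "name": "example-format",
  "fields": [
    {"name": "type",    "offset": 0, "size": 1, "equals": 1},
    {"name": "subtype", "offset": 1, "size": 1, "one_of": [1, 2, 3]},
    {"name": "length",  "offset": 2, "size": 4, "is_total_length": true}
  ]
}
"""


def check(record: bytes, rules: dict) -> list:
    """Return the reasons the record fails the rules (empty list = match)."""
    reasons = []
    for field in rules["fields"]:
        name = field["name"]
        start = field["offset"]
        end = start + field["size"]
        if end > len(record):
            reasons.append(
                f"{name}: record too short (needs {end} bytes, has {len(record)})"
            )
            continue
        # Byte order is an assumption; the post does not specify it.
        value = int.from_bytes(record[start:end], "big")
        if "equals" in field and value != field["equals"]:
            reasons.append(f"{name}: expected {field['equals']}, got {value}")
        if "one_of" in field and value not in field["one_of"]:
            reasons.append(f"{name}: {value} not in {field['one_of']}")
        if field.get("is_total_length") and value != len(record):
            reasons.append(f"{name}: declares {value}, record has {len(record)}")
    return reasons


rules = json.loads(RULES_JSON)
```

Adding a newly approved format then means adding a rule file, not recompiling the checker. Variable-length subrecords and nested formats would need more rule types than this sketch has.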

    Suggestions? Windows preferred but not required.

    Thanks.

    -- Mark


  • Ken Tucker [MVP]

    #2
    Re: Data Records Formats Testing Tool

    Hi,

    Convert the stream to a string and use regular expressions to
    match the format. Not sure how you will be able to tell if the phone number
    is a home number, fax, or dog's cell phone.

    http://msdn.microsoft.com/library/de...xpressions.asp

    Library of regular expressions.
    http://www.regexlib.com/
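For what it's worth, here is a small Python sketch of the regex idea. Python's `re` module can match `bytes` directly, so no string conversion is needed there; .NET would need a different approach. The header layout (type 0x01, subtype 0x01 to 0x03, 4-byte big-endian length) is a made-up example, not one of the real formats.

```python
import re

# Hypothetical format: type 0x01, subtype 0x01..0x03, then a 4-byte length.
# re.DOTALL lets "." match every byte value, including 0x0A (newline).
HEADER_RE = re.compile(rb"\x01[\x01-\x03](.{4})", re.DOTALL)


def header_matches(record: bytes) -> bool:
    """True if the header bytes fit the pattern and the length field is right."""
    m = HEADER_RE.match(record)
    if m is None:
        return False
    # A regex cannot compare the length field with the record size,
    # so that check has to happen in ordinary code.
    return int.from_bytes(m.group(1), "big") == len(record)
```

The pattern handles fixed bytes well, but anything computed from an earlier field (lengths, offsets) falls outside what the regex alone can verify.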



    Ken

    • Mark Jerde

      #3
      Re: Data Records Formats Testing Tool

      Ken Tucker [MVP] wrote:
      > Hi,
      >
      > Convert the stream to a string and use regular expressions
      > to match the format.

      Thanks, I'll look into this if we decide to write something. I don't know
      much about regular expressions yet but I'm concerned about the calculated
      offsets and regex complexity (and validation). See the phones example
      below.

      There are some advantages for this project to use a commercial or open
      source product. A "drag & drop" interface like Visio would be ideal.

      > Not sure how you will be able to tell if the
      > phone number is a home number, fax, or dog's cell phone.

      (My addition may be off...)
      Byte 10 - Length of the phone text description (call it n)
      Bytes 11 to 11+n-1 - Phone text description
      Byte 11+n - Length of the phone number (call it m)
      Bytes 12+n to 12+n+m-1 - Phone number
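The offset arithmetic for one phone entry could be sketched in Python as below. It assumes each length byte is followed immediately by the field it describes and that the first entry starts at byte 10; `parse_phone` and its return shape are invented for the example.

```python
def parse_phone(record: bytes, start: int = 10):
    """Parse one length-prefixed (description, number) pair beginning at `start`.

    Returns (description, number, next_offset); raises ValueError with a reason.
    """

    def take(offset, what):
        # The byte at `offset` is the length; the field follows it directly.
        if offset >= len(record):
            raise ValueError(
                f"{what}: length byte at offset {offset} is past end of record"
            )
        size = record[offset]
        begin, end = offset + 1, offset + 1 + size
        if end > len(record):
            raise ValueError(
                f"{what}: needs bytes {begin}..{end - 1}, record has {len(record)}"
            )
        return record[begin:end], end

    description, pos = take(start, "description")
    number, pos = take(pos, "number")
    return description, number, pos
```

Because the function returns the offset just past the entry, calling it in a loop would walk 1..n phone subrecords. This is exactly the calculated-offset logic that is hard to express in a regex.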

      -- Mark

      > http://msdn.microsoft.com/library/de...xpressions.asp
      >
      > Library of regular expressions.
      > http://www.regexlib.com/
      >
      > Ken

