Searching of byte array with wildcards?

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Richard Berg

    Searching of byte array with wildcards?

    Hello,

    I need to search a byte[] array for a sequence of bytes. The sequence
    may include wildcards. For example if the array contains 0xAA, 0xBB,
    0xAA, OxDD then I want to be able to search for 0xAA, 0x?? and get two
    matches... I've been all around Google but still can't find any
    suggestions on how anything like this can be implemented.... . Can
    someone please give me a clue? Some well-known algorithm maybe?

    Thanks!
  • Jeff Schwab

    #2
    Re: Searching of byte array with wildcards?

    Richard Berg wrote:
    [color=blue]
    > I need to search a byte[] array for a sequence of bytes. The sequence
    > may include wildcards. For example if the array contains 0xAA, 0xBB,
    > 0xAA, OxDD then I want to be able to search for 0xAA, 0x?? and get two
    > matches... I've been all around Google but still can't find any
    > suggestions on how anything like this can be implemented.... . Can
    > someone please give me a clue? Some well-known algorithm maybe?[/color]

    Try boost::regex.

    Comment

    • Siemel Naran

      #3
      Re: Searching of byte array with wildcards?

      "Richard Berg" <bik78@mail.com > wrote in message
      [color=blue]
      > I need to search a byte[] array for a sequence of bytes. The sequence
      > may include wildcards. For example if the array contains 0xAA, 0xBB,
      > 0xAA, OxDD then I want to be able to search for 0xAA, 0x?? and get two
      > matches... I've been all around Google but still can't find any
      > suggestions on how anything like this can be implemented.... . Can
      > someone please give me a clue? Some well-known algorithm maybe?[/color]

      Your post example suggests you're looking for ? or . matches, that is match
      any one character. In this case, you can use the strstr or
      std::search algorithm with an appropriate comparison function.

      struct eqany {
      bool operator()(char c1, char c2) const {
      if (c2=='?') return true;
      return c1 == c2;
      }
      };

      using std::string;
      const string source("ababcda bcxd");
      const string expected("cx");
      string::const_i terator find =
      std::search(sou rce.begin(), source.end(),
      expected.begin( ), expected.end(),
      eqany());


      The worst case running time of the above algorithm is O(n*m) where
      n = length(source) and m = length(expected ).

      If all the chars in the expected string are unique we can get O(n) running
      time. Maintain a pointer to the expected char in the expected string,
      initially pointing at 'c'. Now loop over the chars in the source string.
      When you find a char that matches increment
      the expected pointer. You'll loop through "ababc" and the last 'c' matches,
      so you increment the expected
      pointer to expect 'x'. But the next char is 'd' so reset the expected
      pointer back to 'c'. So eventually you'll get to "ababcdabc" and the
      'c' matches, so increment the expected pointer to 'x'. The next
      character in the sequence is 'x' which matches the expected character,
      so increment the expected pointer. At this point there are no more
      expected chars so it means you've found match.

      To match any number of characters, we could call std::search
      repeatedly. Suppose the expected string is "ba*cx". Split this
      into "ba" and "cx". Search the source string for "ba". If not
      found then "ba*cx" couldn't possibly be found either. But if
      "ba" found then find "cx".

      using std::string;
      typedef string::const_i terator Iter;
      const string source("ababcda bcxd");
      const string expected("ba*cx ");
      Iter expect = expected.begin( );
      while (expect!=expect ed.end())
      {
      using std::find;
      using std::search;
      Iter expectend = find(expect, expect.end(), '*');
      Iter s = search(source.b egin(), source.end(), expect, expectend);
      if (s == source.end()) return false; // not found
      if (expectend == expected.end()) return true;
      expect = expectend;
      ++expect;
      ++s;
      }


      The efficient algorithm is to loop over the chars in the search string
      's' or "ababcdabcx d". It is sufficient to loop over the first
      length(s)-length(s2)+1 chars.

      Note the std::search algorithm does not assume null terminated arrays
      and works with any container, such as std::vector<cha r>,
      std::vector<int >, etc.


      Comment

      • Matt

        #4
        Re: Searching of byte array with wildcards?

        Richard Berg wrote:[color=blue]
        > Hello,
        >
        > I need to search a byte[] array for a sequence of bytes. The sequence
        > may include wildcards. For example if the array contains 0xAA, 0xBB,
        > 0xAA, OxDD then I want to be able to search for 0xAA, 0x?? and get two
        > matches... I've been all around Google but still can't find any
        > suggestions on how anything like this can be implemented.... . Can
        > someone please give me a clue? Some well-known algorithm maybe?
        >
        > Thanks![/color]

        See language theory books such as _Introduction to Automata Theory,
        Languages, and Computation_, by Hopcroft and Ullman, 1979, or similar.

        See compiler textbooks such as _Compilers: Principles, Techniques, and
        Tools_, by Aho, Sethi, and Ullman, 1987, Chapter 3: Lexical Analysis, or
        similar.

        Topics: regular expressions, regular languages, deterministic finite
        automata, lexical analysis.

        You would build a little dynamic lexical-analyzer generator. The
        generated lexical analyzer would be a little deterministic finite
        automaton (DFA). Start off reading about DFAs and regular expressions
        and you will rather soon see how to do it.

        Comment

        • Siemel Naran

          #5
          Re: Searching of byte array with wildcards?

          "Matt" <matt@themattfe lla.zzzz.com> wrote in message news:aOpfc.1742
          [color=blue]
          > See language theory books such as _Introduction to Automata Theory,
          > Languages, and Computation_, by Hopcroft and Ullman, 1979, or similar.
          >
          > See compiler textbooks such as _Compilers: Principles, Techniques, and
          > Tools_, by Aho, Sethi, and Ullman, 1987, Chapter 3: Lexical Analysis, or
          > similar.
          >
          > Topics: regular expressions, regular languages, deterministic finite
          > automata, lexical analysis.
          >
          > You would build a little dynamic lexical-analyzer generator. The
          > generated lexical analyzer would be a little deterministic finite
          > automaton (DFA). Start off reading about DFAs and regular expressions
          > and you will rather soon see how to do it.[/color]

          Fine, but the real thig. But what if std::search will suffice?


          Comment

          • Matt

            #6
            Re: Searching of byte array with wildcards?

            Siemel Naran wrote:[color=blue]
            > "Matt" <matt@themattfe lla.zzzz.com> wrote in message news:aOpfc.1742
            >
            >[color=green]
            >>See language theory books such as _Introduction to Automata Theory,
            >>Languages, and Computation_, by Hopcroft and Ullman, 1979, or similar.
            >>
            >>See compiler textbooks such as _Compilers: Principles, Techniques, and
            >>Tools_, by Aho, Sethi, and Ullman, 1987, Chapter 3: Lexical Analysis, or
            >>similar.
            >>
            >>Topics: regular expressions, regular languages, deterministic finite
            >>automata, lexical analysis.
            >>
            >>You would build a little dynamic lexical-analyzer generator. The
            >>generated lexical analyzer would be a little deterministic finite
            >>automaton (DFA). Start off reading about DFAs and regular expressions
            >>and you will rather soon see how to do it.[/color]
            >
            >
            > Fine, but the real thig. But what if std::search will suffice?
            >
            >[/color]

            For the OP's special purpose, your solution is simpler and better.

            Comment

            Working...