Pattern Matching in C/C++

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • abz89
    New Member
    • Mar 2010
    • 21

    Pattern Matching in C/C++

    I've learned a programming language in C/C++. I want to make a program which try to compare two arrays (random chars)and notify the user about the differrents. The program have to notify both the subtitution and insertion/deletion.

    Here are the examples :

    array 1 : xxxxxxxxx...(as standard)
    array 2 : xxxyxxxxx...

    the program will notify that,
    "there are subtitution in 4th"

    and

    array 1 : xxxxxxx..
    array 2 : xxyyxxxxx..

    "there are insertion yy at 3rd"

    I've designed the algorithm with conditional if, but they are still many bugs when i'm used on variant data (array2)
  • Dheeraj Joshi
    Recognized Expert Top Contributor
    • Jul 2009
    • 1129

    #2
    What errors you are getting? What does the code look like. Please post the appropriate code snippet along with code tags.

    Regards
    Dheeraj Joshi

    Comment

    • abz89
      New Member
      • Mar 2010
      • 21

      #3
      i don't know the error mechanism,
      this algorithm results the wrong analyze output..

      here is the code (in C#) but i want to porting it to C++

      Code:
      /* here is the fragment of my program (abz)
      the program kindly array comparer
      
      */
      public void algoritma(string dt, string ds, ref string dm, int n, int m, int array2_counts, int array1_counts, string filename)
              {
                  // algorithm attributtes
                  int different_counts = 0;
                  int deletion_counts = 0;
                  int insertion_counts = 0;
                  array2_counts = ds.Length;
                  dm += "array filename: " + filename + "\r\n" + "total char: " + array2_counts + " \r\n ";
                  ds += "XXXXXXXXXXXX";
                  dt += "XXXXXXXXXXXX";
                  array1_counts = dt.Length;
                  array2_counts = ds.Length;
                  this.progressBar1.Maximum = array1_counts;
                  if (this.Button1.Clicked)
                  {
                      #conditional region
                      try
                      {
      while (n < array2_counts - 12 && m < array1_counts)
                          {
                              if (ds[n] != dt[m])
                              {
                                  if (n + 5 < array2_counts && m + 6 < array1_counts)
                                  {
                                      if ((ds[n] == dt[m + 1]) && (ds[n + 1] == dt[m + 2]) && (ds[n + 2] == dt[m + 3]) && (ds[n + 3] == dt[m + 4]) && (ds[n + 4] == dt[m + 5]) && (ds[n + 5] == dt[m + 6]))
                                      {
                                          dm += "there are deletion at, " + dt[m + 1] + " " + (m + 1) + " D";
                                          dm += "\r\n";
                                          m++;
                                          different_counts++;
                                          deletion_counts++;
                                      }
                                      else if (n + 5 < array2_counts && m + 5 < array1_counts)
                                      {
                                          if ((ds[n + 1] == dt[m + 1]) && (ds[n + 2] == dt[m + 2]) && (ds[n + 3] == dt[m + 3]) && (ds[n + 4] == dt[m + 4]) && (ds[n + 5] == dt[m + 5]))
                                          {
                                              dm += "there are substitution at, " + dt[m] + " " + (m + 1) + " " + ds[n];
                                              dm += "\r\n";
                                              different_counts++;
                                          }
                                          else if (n + 5 < array2_counts && m + 5 < array1_counts)
                                          {
                                              if ((ds[n + 1] == dt[m + 1]) && (ds[n + 2] != dt[m + 2]) && (ds[n + 3] == dt[m + 3]) && (ds[n + 4] == dt[m + 4]) && (ds[n + 5] == dt[m + 5]))
                                              {
                                                  dm += "there are substitution at, " + dt[m] + " " + (m + 1) + " " + ds[n];
                                                  dm += "\r\n";
                                                  different_counts++;
                                              }
                                              else if (n + 8 < array2_counts && m + 8 < array1_counts)
                                              {
                                                  if ((ds[n + 2] == dt[m + 2]) && (ds[n + 3] == dt[m + 3]) && (ds[n + 4] == dt[m + 4]) && (ds[n + 5] == dt[m + 5]) && (ds[n + 6] == dt[m + 6]) && (ds[n + 7] != dt[m + 7]) && (ds[n + 8] == dt[m + 8]))
                                                  {
                                                      dm += "there are substitution at, " + dt[m + 1] + " " + (m + 1) + " " + ds[n];
                                                      dm += "\r\n";
      
                                                      dm += "ada substitusii di posisi, " + dt[m + 1] + " " + (m + 2) + " " + ds[n + 1];
                                                      dm += "\r\n";
                                                      m++;
                                                      n++;
                                                      different_counts = different_counts + 2;
                                                  }
                                                  else if (n + 4 < array2_counts && m + 6 < array1_counts)
                                                  {
                                                      if ((ds[n] == dt[m + 2]) && (ds[n + 1] == dt[m + 3]) && (ds[n + 2] == dt[m + 4]) && (ds[n + 3] == dt[m + 5]) && (ds[n + 4] == dt[m + 6]))
                                                      {
                                                          dm += "there are deletion at, " + dt[m + 1] + " " + (m + 1) + "D";
                                                          dm += "\r\n";
                                                          dm += "there are deletion at, " + dt[m + 2] + " " + (m + 2) + "D";
                                                          dm += "\r\n";
                                                          m = m + 2;
                                                          different_counts = different_counts + 2;
                                                          deletion_counts = deletion_counts + 2;
                                                      }
                                                      else if (n + 4 < array2_counts && m + 13 < array1_counts)
                                                      {
                                                          if ((ds[n] == dt[m + 9]) && (ds[n + 1] == dt[m + 10]) && (ds[n + 2] == dt[m + 11]) && (ds[n + 3] == dt[m + 12]) && (ds[n + 4] == dt[m + 13]))
                                                          {
                                                              dm += "there are deletion at, " + dt[m + 1] + " " + (m + 1) + " D" + "\r\n";
                                                              dm += "there are deletion at, " + dt[m + 2] + " " + (m + 2) + " D" + "\r\n";
                                                              dm += "there are deletion at, " + dt[m + 3] + " " + (m + 3) + " D" + "\r\n";
                                                              dm += "there are deletion at, " + dt[m + 4] + " " + (m + 4) + " D" + "\r\n";
                                                              dm += "there are deletion at, " + dt[m + 5] + " " + (m + 5) + " D" + "\r\n";
                                                              dm += "there are deletion at, " + dt[m + 6] + " " + (m + 6) + " D" + "\r\n";
                                                              dm += "there are deletion at, " + dt[m + 7] + " " + (m + 7) + " D" + "\r\n";
                                                              dm += "there are deletion at, " + dt[m + 8] + " " + (m + 8) + " D" + "\r\n";
                                                              dm += "there are deletion at, " + dt[m + 9] + " " + (m + 9) + " D" + "\r\n";
                                                              m = m + 9;
                                                              different_counts = different_counts + 9;
                                                              deletion_counts = deletion_counts + 9;
                                                          }
                                                          else if (n + 5 < array2_counts && m + 4 < array1_counts)
                                                          {
                                                              if ((ds[n + 1] == dt[m]) && (ds[n + 2] == dt[m + 1]) && (ds[n + 3] == dt[m + 2]) && (ds[n + 4] == dt[m + 3]) && (ds[n + 5] == dt[m + 4]))
                                                              {
                                                                  dm += "there are insertion at, " + "I " + m + ".1" + " " + ds[n + 1] + "\r\n";
                                                                  m = m - 1;
                                                                  different_counts++;
                                                                  insertion_counts++;
                                                              }
                                                              else if (n + 7 < array2_counts && m + 5 < array1_counts)
                                                              {
                                                                  if ((ds[n + 2] == dt[m]) && (ds[n + 3] == dt[m + 1]) && (ds[n + 4] == dt[m + 2]) && (ds[n + 5] == dt[m + 3]) && (ds[n + 6] == dt[m + 4]) && (ds[n + 7] == dt[m + 5]))
                                                                  {
                                                                      dm += "there are insertion at, " + "I " + m + ".2" + " " + ds[n + 2] + "\r\n";
                                                                      m = m - 1;
                                                                      n++;
      
                                                                      different_counts = different_counts + 2;
                                                                      insertion_counts = insertion_counts + 2;
                                                                  }
                                                                  else if (n + 8 < array2_counts && m + 4 < array1_counts)
                                                                  {
                                                                      if ((ds[n + 4] == dt[m]) && (ds[n + 5] == dt[m + 1]) && (ds[n + 6] == dt[m + 2]) && (ds[n + 7] == dt[m + 3]) && (ds[n + 8] == dt[m + 4]))
                                                                      {
                                                                          dm += "there are insertion at, " + "I " + m + ".4" + " " + ds[n + 4] + "\r\n";
                                                                          m = m - 1;
                                                                          n = n + 3;
                                                                          different_counts = different_counts + 4;
                                                                          insertion_counts = insertion_counts + 4;
                                                                      }
                                                                      else if (n + 13 < array2_counts && m + 7 < array1_counts)
                                                                      {
                                                                          if ((ds[n + 6] == dt[m]) && (ds[n + 7] == dt[m + 1]) && (ds[n + 8] == dt[m + 2]) && (ds[n + 9] == dt[m + 3]) && (ds[n + 10] == dt[m + 4]) && (ds[n + 11] == dt[m + 5]) && (ds[n + 12] == dt[m + 6]) && (ds[n + 13] == dt[m + 7]))
                                                                          {
                                                                              dm += "there are insertion at, " + "I " + m + ".6" + " " + ds[n + 6] + "\r\n";
                                                                              m = m - 1;
                                                                              n = n + 5;
                                                                              different_counts = different_counts + 6;
                                                                              insertion_counts = insertion_counts + 6;
                                                                          }
                                                                          else if (n + 13 < array2_counts && m + 7 < array1_counts)
                                                                          {
                                                                              if ((ds[n + 6] == dt[m]) && (ds[n + 7] == dt[m + 1]) && (ds[n + 8] == dt[m + 2]) && (ds[n + 9] == dt[m + 3]) && (ds[n + 10] == dt[m + 4]) && (ds[n + 11] == dt[m + 5]) && (ds[n + 12] == dt[m + 6]) && (ds[n + 13] == dt[m + 7]))
                                                                              {
                                                                                  dm += "there are insertion at, " + "I " + m + ".6" + " " + ds[n + 6] + "\r\n";
                                                                                  m = m - 1;
                                                                                  n = n + 5;
                                                                                  different_counts = different_counts + 6;
                                                                                  insertion_counts = insertion_counts + 6;
                                                                              }
                                                                              else if (n + 2 < array2_counts && m + 1 < array1_counts)
                                                                              {
                                                                                  if ((ds[n + 1] == dt[m]) && (ds[n + 2] == dt[m + 1]))
                                                                                  {
                                                                                      dm += "there are insertion at, " + "I" + m + ".1" + " " + ds[n + 1] + "\r\n";
                                                                                      m = m - 1;
                                                                                      different_counts++;
                                                                                      insertion_counts++;
                                                                                  }
                                                                                  else if (n + 1 < array2_counts && m < array1_counts)
                                                                                  {
                                                                                      if ((ds[n + 1] == dt[m]))
                                                                                      {
                                                                                          dm += "there are insertion at, " + "I " + m + ".1" + " " + ds[n + 1] + "\r\n";
                                                                                          m = m - 1;
                                                                                          different_counts++;
                                                                                          insertion_counts++;
                                                                                      }
                                                                                      else if (n + 1 < array2_counts && m + 1 < array1_counts)
                                                                                      {
                                                                                          if (ds[n + 1] == dt[m + 1])
                                                                                          {
      
                                                                                              dm += "ada substitusi diposisi, " + dt[m] + " " + (m + 1) + " " + ds[n] + "\r\n";
                                                                                              different_counts++;
                                                                                          }
                                                                                          else
                                                                                          {
                                                                                              dm += "there are substitution at, " + dt[m] + " " + (m + 1) + " " + ds[n] + "\r\n";
                                                                                              different_counts++;
                                                                                          }
                                                                                      }
                                                                                  }
                                                                              }
                                                                          }
                                                                      }
                                                                  }
                                                              }
                                                          }
                                                      }
                                                  }
                                              }
                                          }
                                      }
                                  }
                              }
                              n++;
                              m++;

      Comment

      • sridhard2406
        New Member
        • Dec 2007
        • 52

        #4
        if you know the total index of two arrays, you can compare the each and every index and print the output, which one is getting differed until end of the index.

        Comment

        • Dheeraj Joshi
          Recognized Expert Top Contributor
          • Jul 2009
          • 1129

          #5
          Find out the size of 2 arrays.
          If sizes are not same no need to do further calculation. You can exit.
          Is sizes are same, traverse through each element of array and compare them. If character at position "n" are not same exit. You can also count the position of difference(whic h you can calculate using loop index).

          Regards
          Dheeraj Joshi

          Comment

          • abz89
            New Member
            • Mar 2010
            • 21

            #6
            yupz, my algorithm do..
            it compare the array length to find out the insertion/deletion

            after the array length are the same, continue to substitution compare..
            but my algorithm have some error on some data..

            can somebody suggest the effective algorithm?? please
            thank you.. :)

            Comment

            • Dheeraj Joshi
              Recognized Expert Top Contributor
              • Jul 2009
              • 1129

              #7
              Effective in time and space?
              Some algorithm experts may help you.
              If the input set is small, almost all algorithms works ok. But as the input set grows the complexity changes.

              Regards
              Dheeraj Joshi

              Comment

              • jkmyoung
                Recognized Expert Top Contributor
                • Mar 2006
                • 2057

                #8
                By any chance do you have the code in pseudocode? You've haven't really commented your code so it's hard to see what's going on.

                Also, let's say you have
                bnanan
                banana

                Is this considered an insertion of an a, then a deletion of a n,
                OR a deletion of an n, then an insertion of an a?

                Comment

                • abz89
                  New Member
                  • Mar 2010
                  • 21

                  #9
                  @dheerajjoshim
                  hmm, my algo is just conditional if, which prior to insertion/deletion then substitution..

                  @jkmyoung
                  owh i cannot commented the code well, my english was bad, so i can't descripting it fine :(

                  at your example, it will insertion of an a, then a deletion of a n
                  because the array 1 is a standard which to be fixed (may not changed or reassembly)

                  thx a lot everybody :)

                  Comment

                  • jkmyoung
                    Recognized Expert Top Contributor
                    • Mar 2006
                    • 2057

                    #10
                    Ok, so you prefer insertion over deletion.

                    Where are you setting n and m? If not set they probably have the old values in them. Actually looking at the current algorithm, it is painful.

                    offhand pseudo algorithm: bound checking excluded.
                    Code:
                    Comparing ds to dt
                    m, n set to 0
                    mmax = ds.length - 1
                    nmax = dt.length - 1
                    while (m <= mmax and n <= nmax)
                      if ds[m] = dt[n], m++, n++, continue
                      //did not match. 
                      invalidChars = 1
                      // try pure insertion first
                      for (insert = invalidChars; insert >= 0; insert --){
                        if dt[n+insert] = ds[m+invalidChars - insert]{
                          ***print off list of inserted and invalid characters
                          n += 1 + insert
                          m += 1 + invalidChars - insert
                          break;
                          }
                      }
                    To explain this: for (insert = invalidChars; insert >= 0; insert --){
                    Lets say you have
                    abcde
                    fcghi

                    We list the comparisons in order below:
                    compare a -> f, no
                    invalidChars = 1, insert = 1
                    a -> c? no, insert = 0
                    b -> f? no.
                    invalidChars = 2, insert = 2
                    a -> g? no. insert = 1
                    b-> c? no. insert = 0
                    c -> f? no.
                    invalidChars = 3, insert = 3
                    a -> h? no. insert = 2
                    b-> g? no. insert = 1
                    c -> c? MATCH!
                    So,
                    a deleted
                    b deleted
                    f inserted.
                    ===
                    This biggest problem with this is that it is iterative. Eg if you have
                    abcbde
                    acbde
                    the algorithm will match the wrong b's and say:
                    c inserted at 2
                    c deleted at 3
                    b deleted at 4.
                    As opposed to the simple:
                    b deleted at 2

                    If you want to avoid that, you need to implement saving states and backtracking with some sort of recursion.

                    Comment

                    • abz89
                      New Member
                      • Mar 2010
                      • 21

                      #11
                      my mean is not prefer insertion over deletion..
                      at your example, if i prefer deletion over insertion, the array1 lenght will changed :)

                      the algo have not changed the array1 but array2, cause array1 act as standard, so the array2 have to matched with array1

                      at my algo, i make the 'fragmen' to compare
                      cause the real array can up to thousands,

                      this algo will create a fragments with 13char (array) for each
                      to make possible char matching process go to the end

                      here are the links, my algo with a little comment

                      Pastebin.com is the number one paste tool since 2002. Pastebin is a website where you can store text online for a set period of time.

                      sorry cause it too ugly to post here
                      my english was bad too :(

                      Comment

                      • abz89
                        New Member
                        • Mar 2010
                        • 21

                        #12
                        Originally posted by jkmyoung
                        Ok, so you prefer insertion over deletion.

                        Where are you setting n and m? If not set they probably have the old values in them. Actually looking at the current algorithm, it is painful.

                        offhand pseudo algorithm: bound checking excluded.
                        Code:
                        Comparing ds to dt
                        m, n set to 0
                        mmax = ds.length - 1
                        nmax = dt.length - 1
                        while (m <= mmax and n <= nmax)
                          if ds[m] = dt[n], m++, n++, continue
                          //did not match. 
                          invalidChars = 1
                          // try pure insertion first
                          for (insert = invalidChars; insert >= 0; insert --){
                            if dt[n+insert] = ds[m+invalidChars - insert]{
                              ***print off list of inserted and invalid characters
                              n += 1 + insert
                              m += 1 + invalidChars - insert
                              break;
                              }
                          }
                        To explain this: for (insert = invalidChars; insert >= 0; insert --){
                        Lets say you have
                        abcde
                        fcghi

                        We list the comparisons in order below:
                        compare a -> f, no
                        invalidChars = 1, insert = 1
                        a -> c? no, insert = 0
                        b -> f? no.
                        invalidChars = 2, insert = 2
                        a -> g? no. insert = 1
                        b-> c? no. insert = 0
                        c -> f? no.
                        invalidChars = 3, insert = 3
                        a -> h? no. insert = 2
                        b-> g? no. insert = 1
                        c -> c? MATCH!
                        So,
                        a deleted
                        b deleted
                        f inserted.
                        ===
                        This biggest problem with this is that it is iterative. Eg if you have
                        abcbde
                        acbde
                        the algorithm will match the wrong b's and say:
                        c inserted at 2
                        c deleted at 3
                        b deleted at 4.
                        As opposed to the simple:
                        b deleted at 2

                        If you want to avoid that, you need to implement saving states and backtracking with some sort of recursion.
                        no not like this dude..
                        insertion is the insertion data to array1 so the array1 will changed

                        eg:

                        array1 : abcde < 5 char
                        array2 : abcfde < 6 char

                        for deletion:

                        array1 : abcde < 5 char
                        array2 : abde < 4 char, d deleted

                        so substitution will executed after the lenght of both array are same..

                        here is a insertion and subt case:

                        array1 : abcdef < 6 char
                        array2 : abcghef < 7 char

                        at array2 : g will deleted to reduce the array length

                        array1 : abcdef
                        array2 : abchef

                        the insertion/deletion have higher prior than subs, so after lenght data matched the subs comparison will executed..

                        thx a lot :D

                        Comment

                        • jkmyoung
                          Recognized Expert Top Contributor
                          • Mar 2006
                          • 2057

                          #13
                          I know what you mean, but what I mean is coding up the algorithm smartly will not be so simple: Take the core of this example again:
                          bcb
                          cb
                          To the human eye, we can easily see that we just need to delete b, to get the string. To the computer's eye, it's not that easy. It only looks at what we tell it, so it might match the wrong b's first. If we tell it to look for deletion first, we get the reverse problem. eg:
                          cb
                          bcb
                          Again the wrong b's are matched up.
                          ===
                          To do this smartly, you need to save the results of multiple methods, recursively.
                          public void algorithma(stri ng dt, string ds, ref string dm, int n, int m, int array2_counts, int array1_counts, string filename)
                          Is this the only method declaration you are allowed to use?

                          The algorithm would be more like:
                          Code:
                          Algorithm(..arguments...){
                            while matching, increment m and n
                              once we get to our first mismatch, break;
                          
                            If we reached the end of one of the strings, finish results and return them.
                          
                            Otherwise:
                              Recursively find all possible solutions from here on.
                              Pick the best solution.
                            return the best solution
                          }
                          In order to not go through every case, have a variable, which holds the best results so far, passed in to the function. Once we reach that number, we end output and return null.

                          Eg a saved state might look something like

                          Code:
                           class State
                              {
                                  String mismatchOutputString;
                                  int errors;
                          
                                  public State()        {
                                      errors = 0;
                                      mismatchOutputString = "";
                                  }
                          
                                  public void Insert(int i, char c)
                                  {
                                      mismatchOutputString += "inserted at " + i + ":" + c + "\n";
                                      errors++;
                                  }
                          }
                          Then when you recurse, you check the number of errors. If worse than your best case so far, throw it away.

                          Comment

                          • abz89
                            New Member
                            • Mar 2010
                            • 21

                            #14
                            thx a lot to the responses

                            i've found the algorithm!
                            what i need is just "levensthei n distance"

                            :D

                            now i'm trying on it..
                            hope i can :p

                            Comment

                            • jkmyoung
                              Recognized Expert Top Contributor
                              • Mar 2006
                              • 2057

                              #15
                              interesting! Note, that the algorithm uses replacement as a single operation, but you can easily modify it.

                              Comment

                              Working...