All possible combinations for seqence

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • Alex72
    New Member
    • May 2007
    • 7

    All possible combinations for seqence

    I'm programming with C on Linux to get all possible combinations of particular sequence related to biology. But I don't know how I implement it because it seems to be very complicated to make rule. I'm going to use those combination to make different sequences to test.

    The sequence is like following.

    info_table[0]: W S N Y N Y N N W S
    info_table[1]: 2 2 4 2 4 2 4 4 2 2 : the number of replaceable letters

    Each letter should be replaced with proper other letters.
    For example,
    W->A or T,
    S->C or G,
    Y->C or T,
    N-> A or C or G or T
    Total the number of combinations is up to 16,384(=2*2*4*2 *4*2*4*4*2*2).

    So final results are like below:
    case 1: A C A C A C A A A C
    case 2: T - - - - - - - - - (-: same as case 1)
    case 3: - G - - - - - - - -
    case 4: - - C - - - - - - -
    case 5: - - G - - - - - - -
    case 6: - - T - - - - - - -
    case 7: - - - T - - - - - -
    case 8: - - - - C - - - - -
    case 9: - - - - G - - - - -
    case10: - - - - T - - - - -
    case11: - - - - - T - - - -
    case12: - - - - - - C - - -
    case13: - - - - - - G - - -
    case14: - - - - - - T - - -
    case15: - - - - - - - C - -
    case16: - - - - - - - G - -
    case17: - - - - - - - T - -
    case18: - - - - - - - - T -
    case19: - - - - - - - - - G
    (replace two letters)
    case20: T G - - - - - - - -
    case21: T - C - - - - - - -
    case22: T - G - - - - - - -
    case23: T - T - - - - - - -
    case24: T - - T - - - - - -
    case25: T - - - C - - - - -
    case26: T - - - G - - - - -
    case27: T - - - T - - - - -
    ...
    ...
    (replace three letters)
    case n: T G C - - - - - - -
    casen1: T G G - - - - - - -
    casen2: T G T - - - - - - -
    casen3: T G - T - - - - - -
    casen4: T G - - C - - - - -
    casen5: T G - - G - - - - -
    casen6: T G - - T - - - - -
    ...
    ...
    casem1: T - C T - - - - - -
    casem2: T - C - C - - - - -
    casem2: T - C - G - - - - -
    casem2: T - C - T - - - - -
    ...
    ...
    casel1: T - - T C - - - - -
    casel2: T - - T G - - - - -
    ...
    ...

    (replace four letters)
    casek1: T G C T - - - - - -
    ...

    What I'd like to know is how to make all possible combinations with those sequence to make it different sequence from original one. I think there are lots of patterns to extract letters from different positions. By the way, the length of sequence is up to 100 approximately.

    For example, I need to choose one letter to replace it.

    W
    S
    N
    Y
    N
    W
    S

    I need to choose two letters.

    WS WN WY WN WY WN WN WW WS: start with W at first position
    SN SY SN SY SN SN SW SS: start with S at second position
    NY NN NY NN NN NW NS: start with N at third position
    ....
    ....
    WS: start with W at second position from last

    Next, I need to choose three letters. It's lost of possible to make combinations

    WSN WSY WSN WSY WSN WSN WSW WSS: case n
    WNY WNN WNY WNN WNN WNW WNS: case m
    ...
    ...

    Next, five, six and so on...

    Thank you for your help in advance.
  • sicarie
    Recognized Expert Specialist
    • Nov 2006
    • 4677

    #2
    I'm sorry, was there a coding question in there?

    Perhaps this might help you.

    Comment

    • kky2k
      New Member
      • May 2007
      • 34

      #3
      so far what u tried abt this..if u come up with some try..then evryone will help u out..

      Comment

      • Alex72
        New Member
        • May 2007
        • 7

        #4
        Originally posted by sicarie
        I'm sorry, was there a coding question in there?

        Perhaps this might help you.
        I'm trying to think about some algorithms before starting code.
        But I only finished up with pseudo code in case of one letter chosen.
        I've been stock in case of more than 2 letters chosen.

        Code:
        char *info_table[]; 
        char *basic_seq, *temp_change;
        
        info_table[0]="WSNYNYNNWS";
        info_table[1]="2242424422";
        
        strcpy(basic_seq, info_table[0]);
        
        convert_basetype(basic_seq);  // replace contents in info_table[0] with A,C,G or T
        // before:WSNYNYNNWS, after: ACACACAAAC
        
        for(i=1; i<length; i++)
        {
          for(j=1; j<info_table[1][i]-1; j++)
          {
            strcpy(temp_change,basic_seq);  // copy basic sequence into temp_change
            temp_change[i] = getAltLetters(info_table[0][i], j);  // return replaceable letter like A,C,G or T
            strcpy(out_table[cnt++],temp_change); 
          }
        }
        If you need anything else to help, please let me know.
        Thank you.

        Comment

        • sicarie
          Recognized Expert Specialist
          • Nov 2006
          • 4677

          #5
          So you have values you want to compute all the different possible combinations of, of strings of length 100 chars. Have you decided on a first or base string? Is there more than one possible base string?

          Comment

          • sicarie
            Recognized Expert Specialist
            • Nov 2006
            • 4677

            #6
            Right now I'm thinking a recursive call to a choosing function might be a good way to do this (I know I did this a bit with trees before), but the N (four possible options) might be tricky... I'd also recommend the singleton design pattern for your result set.

            Comment

            • Alex72
              New Member
              • May 2007
              • 7

              #7
              Originally posted by sicarie
              So you have values you want to compute all the different possible combinations of, of strings of length 100 chars. Have you decided on a first or base string? Is there more than one possible base string?
              I'm sorry I don't understand exactly because I'm not good at English.
              But I think I need to explain the procedure of running of this program.
              If this does not meet your question, please rephrase your question.

              Step1: User input sequence of nucleotides.
              For examples: WSNYTNTGGYTNCCN WSNGRGCNA.... (size is up to 200 or 300)
              This sequence will be different.

              Step2: My program will extract ambiguous letters from input.
              Ambiguous letters are all letters of input other than A,C,G and T
              so info_table[0] will contain WSNYNYNNWS... (size is up to 100)

              Step3: Create all the different possible combinations from info_table[0]
              For examples:
              ACACACAAAC
              T----------
              -G--------
              ....

              W: A,T
              S: C,G
              N: A,C,G,T
              Y: C,T
              N: A,C,G,T
              Y: C,T
              N: A,C,G,T
              N: A,C,G,T
              W: A,T
              S: C,G

              I need algorithms to make all different possible combination with replaceable letters like above,

              Step4: Modify original sequence with result from Step3 - I think I can do this
              For examples:
              ACACTATGGCTACCA ACAGAGCAA....: with only A,C,G and T
              TCACTATGGCTACCA ACAGAGCAA....
              ....

              Thank you.

              Comment

              • blazedaces
                Contributor
                • May 2007
                • 284

                #8
                Originally posted by Alex72
                I'm sorry I don't understand exactly because I'm not good at English.
                But I think I need to explain the procedure of running of this program.
                If this does not meet your question, please rephrase your question.

                Step1: User input sequence of nucleotides.
                For examples: WSNYTNTGGYTNCCN WSNGRGCNA.... (size is up to 200 or 300)
                This sequence will be different.

                Step2: My program will extract ambiguous letters from input.
                Ambiguous letters are all letters of input other than A,C,G and T
                so info_table[0] will contain WSNYNYNNWS... (size is up to 100)

                Step3: Create all the different possible combinations from info_table[0]
                For examples:
                ACACACAAAC
                T----------
                -G--------
                ....

                W: A,T
                S: C,G
                N: A,C,G,T
                Y: C,T
                N: A,C,G,T
                Y: C,T
                N: A,C,G,T
                N: A,C,G,T
                W: A,T
                S: C,G

                I need algorithms to make all different possible combination with replaceable letters like above,

                Step4: Modify original sequence with result from Step3 - I think I can do this
                For examples:
                ACACTATGGCTACCA ACAGAGCAA....: with only A,C,G and T
                TCACTATGGCTACCA ACAGAGCAA....
                ....

                Thank you.
                Listen, I want to try my best not to do your work for you (ethically, not to mention the forum guidelines), but I'll try and present you with the logic that works and try and help lead you to the correct solution to the problem.

                Ok, so you even said what you actually have to do: store/print (why do you want to print 16,000+ combinations?) all combinations of a sequence.

                Here's what you know:
                -The sequence has characters in it that can be replaced (These can be represented either by WNYS... or basically all characters not equal to A,C,G, or T)
                -These replaceable characters have specific characters that can replace them:
                "W: A,T
                S: C,G
                N: A,C,G,T
                Y: C,T" (Question: Why are you repeating this over and over? why not write it only once for W,N,Y,S ?)

                -To store/create EVERY combination (you said this up top with the case example) you would have to go through every replaceable character, so like, change the first one, same as the rest, next, next, then change the first one and the second, the same for the rest, change the second again, etc. (you already said that, it's exactly what you need to do)

                -So things you'll probably need to keep in mind if you did this by hand (I always ask that, what would I need to do this by hand, after all, the computer needs to know the same things you would):
                -The length of the inputed string (it could be 200 or 300 or whatever, but you need to know that number)
                -The number of replaceable characters - sift through the string and count every one (use a for loop, go through every single character one at a time if it's one of the replaceable characters simply count++ or whatever)
                -The number of possible combinations (you may not need this, but for the way I'm imagining it, you would)- you calculated it up there, you know your statistics ;)

                I hope that helps. Use that and perhaps get started, come up with something, and come back for more help.

                Good luck,
                -blazed

                Comment

                • Alex72
                  New Member
                  • May 2007
                  • 7

                  #9
                  Hi blazed,

                  Thank you for helping me.

                  But I have a question in your answer.

                  -To store/create EVERY combination (you said this up top with the case example) you would have to go through every replaceable character, so like, change the first one, same as the rest, next, next, then change the first one and the second, the same for the rest, change the second again, etc. (you already said that, it's exactly what you need to do)

                  I'm afraid there may need more complicate consideration in there.
                  If I need to change 3 or more letters in given sequence, there are lots of possibilities to make different values from original one.

                  For examples:
                  Supposing there are only 10 characters in the sequence.
                  And if I want to change different values with five replaceable letters.

                  [HTML]
                  WSNYNYNNWS
                  vvvvv :pattern1
                  vvvvv
                  vvvvv
                  vvvvv
                  .....
                  vvvv v :pattern2
                  vvvv v
                  vvvv v
                  ...
                  vvv vv :pattern3
                  vvv vv
                  vvv vv
                  ...
                  vvv v v :pattern4
                  vvv v v
                  vvv v v
                  ...
                  vv vvv :pattern5
                  vv vv v :pattern6
                  vv v vv :pattern7
                  vv v v v :pattern8
                  v vvvv :pattern9
                  v v vvv :pattern10
                  v v v vv :pattern11
                  v v v v v :pattern12
                  v vv vv :pattern13
                  v vv v v :pattern14
                  ...
                  ...
                  [/HTML]

                  Mark 'v' means that marked position can be replaced with replaceable letters.
                  Those patterns can shift to right.
                  I don't know how to find out those all patterns.

                  I'd like to explain what I want clearly, but my English doesn't help me to do that.

                  Thank you.

                  Comment

                  • blazedaces
                    Contributor
                    • May 2007
                    • 284

                    #10
                    Originally posted by Alex72
                    Hi blazed,

                    Thank you for helping me.

                    But I have a question in your answer.

                    -To store/create EVERY combination (you said this up top with the case example) you would have to go through every replaceable character, so like, change the first one, same as the rest, next, next, then change the first one and the second, the same for the rest, change the second again, etc. (you already said that, it's exactly what you need to do)

                    I'm afraid there may need more complicate consideration in there.
                    If I need to change 3 or more letters in given sequence, there are lots of possibilities to make different values from original one.

                    For examples:
                    Supposing there are only 10 characters in the sequence.
                    And if I want to change different values with five replaceable letters.

                    [HTML]
                    WSNYNYNNWS
                    vvvvv :pattern1
                    vvvvv
                    vvvvv
                    vvvvv
                    .....
                    vvvv v :pattern2
                    vvvv v
                    vvvv v
                    ...
                    vvv vv :pattern3
                    vvv vv
                    vvv vv
                    ...
                    vvv v v :pattern4
                    vvv v v
                    vvv v v
                    ...
                    vv vvv :pattern5
                    vv vv v :pattern6
                    vv v vv :pattern7
                    vv v v v :pattern8
                    v vvvv :pattern9
                    v v vvv :pattern10
                    v v v vv :pattern11
                    v v v v v :pattern12
                    v vv vv :pattern13
                    v vv v v :pattern14
                    ...
                    ...
                    [/HTML]

                    Mark 'v' means that marked position can be replaced with replaceable letters.
                    Those patterns can shift to right.
                    I don't know how to find out those all patterns.

                    I'd like to explain what I want clearly, but my English doesn't help me to do that.

                    Thank you.
                    First of all, may I ask what language you do speak fluently? Just curious.

                    Alright, well let me ask you two yes/no questions, maybe they will help me understand better:

                    Are you saying that the program asks how many replaceable characters you want changed?

                    For example, could this be a computer output:

                    Computer Output: What sequence of DNA am I spitting out combinations for?

                    User Input: WNYSACGTWNYS

                    Computer Output: Good, now how many characters can I replace?

                    User Input: 2

                    Computer Output: The combinations have been stored, they are:

                    AAYSACGTWNYS
                    ACYSACGTWNYS
                    AGYSACGTWNYS
                    ATYSACGTWNYS

                    ANCSACGTWNYS
                    ANTSACGTWNYS

                    ANYCACGTWNYS
                    ANYGACGTWNYS

                    ANYSACGTANYS
                    ANYSACGTTNYS

                    ANYSACGTWAYS
                    ANYSACGTWCYS
                    ANYSACGTWGYS
                    ANYSACGTWTYS

                    ANYSACGTWNCS
                    ANYSACGTWNTS

                    ANYSACGTWNYC
                    ANYSACGTWNYG

                    TAYSACGTWNYS
                    TCYSACGTWNYS
                    TGYSACGTWNYS
                    TTYSACGTWNYS

                    .
                    .
                    .

                    OR

                    Are you saying that the possible combinations can include characters that stay the same, another words are not replaced?

                    Now, let me just say no matter what, this can be done, just approach it calmly. After all, if you had just paper and as much time as you needed, you could do them all by hand right? That means you can tell a computer how to do it.

                    -blazed

                    Comment

                    • Alex72
                      New Member
                      • May 2007
                      • 7

                      #11
                      Originally posted by blazedaces
                      First of all, may I ask what language you do speak fluently? Just curious.

                      Alright, well let me ask you two yes/no questions, maybe they will help me understand better:

                      Are you saying that the program asks how many replaceable characters you want changed?

                      For example, could this be a computer output:

                      Computer Output: What sequence of DNA am I spitting out combinations for?

                      User Input: WNYSACGTWNYS

                      Computer Output: Good, now how many characters can I replace?

                      User Input: 2

                      Computer Output: The combinations have been stored, they are:

                      AAYSACGTWNYS
                      ACYSACGTWNYS
                      AGYSACGTWNYS
                      ATYSACGTWNYS

                      ANCSACGTWNYS
                      ANTSACGTWNYS

                      ANYCACGTWNYS
                      ANYGACGTWNYS

                      ANYSACGTANYS
                      ANYSACGTTNYS

                      ANYSACGTWAYS
                      ANYSACGTWCYS
                      ANYSACGTWGYS
                      ANYSACGTWTYS

                      ANYSACGTWNCS
                      ANYSACGTWNTS

                      ANYSACGTWNYC
                      ANYSACGTWNYG

                      TAYSACGTWNYS
                      TCYSACGTWNYS
                      TGYSACGTWNYS
                      TTYSACGTWNYS

                      .
                      .
                      .

                      OR

                      Are you saying that the possible combinations can include characters that stay the same, another words are not replaced?

                      Now, let me just say no matter what, this can be done, just approach it calmly. After all, if you had just paper and as much time as you needed, you could do them all by hand right? That means you can tell a computer how to do it.

                      -blazed
                      First of all, may I ask what language you do speak fluently? Just curious.
                      =>My mother language is Korean.

                      Are you saying that the program asks how many replaceable characters you want changed?
                      =>No (At present), But I guess I need it later.

                      Are you saying that the possible combinations can include characters that stay the same, another words are not replaced?
                      =>Actually output values must consist of A,C,G and T not ambiguous letters such as W,N,S and etc. That's why I replaced ambiguous letters with A,C,G or T first and then I planed to replace them.
                      For examples:
                      before:WSNYNYNN WS,
                      after: ACACACAAAC

                      you could do them all by hand right?
                      =>No, I could not do that by hand because the number of possibilities is up to 16.000 when size of input is only 10.

                      Computer Output: Good, now how many characters can I replace?
                      User Input: 2
                      =>How about 5 or 6? I think I can make some logic for 1 or 2, but I don't think I'm able to make some logic for more than 3.

                      I will use all output values as input in other program.

                      I hope this can help you understand more.

                      Thank you for your asking.
                      Alex.

                      Comment

                      • blazedaces
                        Contributor
                        • May 2007
                        • 284

                        #12
                        Originally posted by Alex72
                        First of all, may I ask what language you do speak fluently? Just curious.
                        =>My mother language is Korean.

                        Are you saying that the program asks how many replaceable characters you want changed?
                        =>No (At present), But I guess I need it later.

                        Are you saying that the possible combinations can include characters that stay the same, another words are not replaced?
                        =>Actually output values must consist of A,C,G and T not ambiguous letters such as W,N,S and etc. That's why I replaced ambiguous letters with A,C,G or T first and then I planed to replace them.
                        For examples:
                        before:WSNYNYNN WS,
                        after: ACACACAAAC

                        you could do them all by hand right?
                        =>No, I could not do that by hand because the number of possibilities is up to 16.000 when size of input is only 10.

                        Computer Output: Good, now how many characters can I replace?
                        User Input: 2
                        =>How about 5 or 6? I think I can make some logic for 1 or 2, but I don't think I'm able to make some logic for more than 3.

                        I will use all output values as input in other program.

                        I hope this can help you understand more.

                        Thank you for your asking.
                        Alex.
                        Well then, if I'm understanding correctly, the input is only of characters W, N, Y, or S and CANNOT be A, C, G, or T

                        Another example, you tell me how close to true this is (I'll use very small numbers)

                        Computer Output: What sequence of DNA am I spitting out combinations for?

                        User Input: WNYS

                        Computer Output: The combinations have been stored (32 Possible combinations), they are:

                        AACC
                        AACG

                        AATC
                        AATG

                        ACCC
                        ACCG
                        ACTC
                        ACTG

                        AGCC
                        AGCG
                        AGTC
                        AGTG

                        ATCC
                        ATCG
                        ATTC
                        ATTG


                        TACC
                        TACG

                        TATC
                        TATG

                        TCCC
                        TCCG
                        TCTC
                        TCTG

                        TGCC
                        TGCG
                        TGTC
                        TGTG

                        TTCC
                        TTCG
                        TTTC
                        TTTG

                        Is this the correct way of "doing it by hand" exactly with that input, would this be the output?

                        -blazed

                        Comment

                        • Alex72
                          New Member
                          • May 2007
                          • 7

                          #13
                          Originally posted by blazedaces
                          Well then, if I'm understanding correctly, the input is only of characters W, N, Y, or S and CANNOT be A, C, G, or T

                          Another example, you tell me how close to true this is (I'll use very small numbers)

                          Computer Output: What sequence of DNA am I spitting out combinations for?

                          User Input: WNYS

                          Computer Output: The combinations have been stored (32 Possible combinations), they are:

                          AACC
                          AACG

                          AATC
                          AATG

                          ACCC
                          ACCG
                          ACTC
                          ACTG

                          AGCC
                          AGCG
                          AGTC
                          AGTG

                          ATCC
                          ATCG
                          ATTC
                          ATTG


                          TACC
                          TACG

                          TATC
                          TATG

                          TCCC
                          TCCG
                          TCTC
                          TCTG

                          TGCC
                          TGCG
                          TGTC
                          TGTG

                          TTCC
                          TTCG
                          TTTC
                          TTTG

                          Is this the correct way of "doing it by hand" exactly with that input, would this be the output?

                          -blazed
                          Is this the correct way of "doing it by hand" exactly with that input, would this be the output?
                          =>Yes, you're right.
                          So can you make logic for that in case size of input data is up to 100?

                          if I'm understanding correctly, the input is only of characters W, N, Y, or S and CANNOT be A, C, G, or T
                          =>Actually, Input data consists of all letters but you can assume like that. Because I'll extract only letters from input data except A,C,G and T, then I'll make new values by combining with original input data.

                          Even though I've been spending two days on this problem I still don't know how I approach it.

                          Thank you.
                          Alex

                          Comment

                          • blazedaces
                            Contributor
                            • May 2007
                            • 284

                            #14
                            Originally posted by Alex72
                            Is this the correct way of "doing it by hand" exactly with that input, would this be the output?
                            =>Yes, you're right.
                            So can you make logic for that in case size of input data is up to 100?

                            if I'm understanding correctly, the input is only of characters W, N, Y, or S and CANNOT be A, C, G, or T
                            =>Actually, Input data consists of all letters but you can assume like that. Because I'll extract only letters from input data except A,C,G and T, then I'll make new values by combining with original input data.

                            Even though I've been spending two days on this problem I still don't know how I approach it.

                            Thank you.
                            Alex
                            Well, now that I know what you're doing, yes, I've noticed a pattern. Here, let me try and show you:

                            While I was typing out the pattern of those letter output combinations, did you notice I had spaces between certain places? Here, take a look:
                            Code:
                            AACC
                            AACG
                                           <--One Here
                            AATC
                            AATG
                                           <--One Here
                            ACCC
                            ACCG
                            ACTC
                            ACTG
                                           <--One Here
                            AGCC
                            AGCG
                            AGTC
                            AGTG
                                           <--One Here
                            ATCC
                            ATCG
                            ATTC
                            ATTG
                                           <--Two Here
                                           <--Two Here
                            TACC
                            TACG
                                           <--One Here
                            TATC
                            TATG
                                           <--One Here
                            TCCC
                            TCCG
                            TCTC
                            TCTG
                                           <--One Here
                            TGCC
                            TGCG
                            TGTC
                            TGTG
                                           <--One Here
                            TTCC
                            TTCG
                            TTTC
                            TTTG
                            I did that naturally, without noticing it... till I took a good look. Why you might ask? Well, I didn't want to type out EVERY SINGLE LETTER. I'm lazy, so I copied and pasted parts, but to keep track of course I put spaces between parts I copied. Try and guess what parts I copied first (it'll help you to understand the logic).

                            Now, do you start to see a small pattern like I do? I didn't just randomly change letters one at a time till I luckily got 32 combinations, there was a pattern, yes?

                            Let's take another look, this time I'll point out what kind of pattern I can see.

                            Here, check it out:

                            Code:
                            AACC
                            AACG   <-- Same thing as last one, just change the 4th letter
                            
                            AATC     <
                            AATG     <These two are the same as above, but only 3rd letter has changed
                            
                            ACCC    <
                            ACCG    <
                            ACTC     <
                            ACTG     <These 4 are the same as above 4, but 2nd letter has changed
                            
                            AGCC    <
                            AGCG    <
                            AGTC     <
                            AGTG     <Again, for another possible combo of 2nd letter
                            
                            ATCC     <
                            ATCG     <
                            ATTC      <
                            ATTG      <Again, for another possible combo of 2nd letter
                             
                                            NOW, everything is the same as above except first letter has changed
                            TACC   
                            TACG
                            
                            TATC
                            TATG
                            
                            TCCC
                            TCCG
                            TCTC
                            TCTG
                            
                            TGCC
                            TGCG
                            TGTC
                            TGTG
                            
                            TTCC
                            TTCG
                            TTTC
                            TTTG
                            Tell me if you see the same pattern. If so, if this is the way I did it (by copying and pasting parts, and changing others) do you see a possible solution forming?

                            If so, please try and put together some logic or pseudocode. I hope we're getting somewhere...

                            -blazed

                            Comment

                            • blazedaces
                              Contributor
                              • May 2007
                              • 284

                              #15
                              Just wanted to say, not to leave you hanging, I have to leave the computer and won't be back till many hours later tonight.

                              Good luck, I'll try to help you when I get back if you don't finish...

                              -blazed

                              Comment

                              Working...