Iterating over a string

  • moconno5
    New Member
    • Jul 2007
    • 19

    Iterating over a string

    Hi everybody,

    Does anyone know if the adict.has_key(k) method can be used to match a string against a dictionary key? I'm trying to append a value from my dictionary to a string when its key is found in that string.

    String example:

    browser details Musmusculuslet-7g 21 1 21 21 100.0% 22 + 46884872 46884892 21
    browser details Musmusculuslet-7i 21 1 21 21 100.0% 5 + 50605174 50605194 21

    Dictionary example:

    'Musmusculuslet-7g': 'UGAGGUAGUAGUUUGUACAGU'
    'Musmusculuslet-7i': 'UGAGGUAGUAGUUUGUGCUGU'

    What I want:

    browser details Musmusculuslet-7g UGAGGUAGUAGUUUGUACAGU 21 1 21 21 100.0% 22 + 46884872 46884892 21
    etc.

    Thanks,

    Mark
  • ilikepython
    Recognized Expert Contributor
    • Feb 2007
    • 844

    #2
    I'm not sure if this is exactly what you need:
    [code=python]
    import re

    # adict is the dictionary from the question (key -> sequence)
    patt = re.compile("Musmusculuslet-..")

    teststr = "browser details Musmusculuslet-7g 21 1 21 21 100.0% 22 + 46884872 46884892 21"
    match = patt.findall(teststr)

    if match:
        if adict.has_key(match[0]):
            # position just past the end of the matched key
            end = teststr.index(match[0]) + len(match[0])
            # insert the value, preceded by a space, right after the key
            finalstring = "%s %s%s" % (teststr[:end], adict[match[0]], teststr[end:])
    [/code]


    • bvdet
      Recognized Expert Specialist
      • Oct 2006
      • 2851

      #3
      Following are a couple of ways:
      [code=Python]
      # dd is the dictionary from the question (key -> sequence)
      print dd
      import re

      s1 = "browser details Musmusculuslet-7g 21 1 21 21 100.0% 22 + 46884872 46884892 21\nbrowser details Musmusculuslet-7i 21 1 21 21 100.0% 5 + 50605174 50605194 21"
      patt = re.compile(r'Musmusculuslet-[0-9a-z]+|MusmusculusmiR-\d+')
      strList = patt.findall(s1)

      # Way 1: find candidate keys with a regex, then look each one up
      s2 = s1
      for item in strList:
          if dd.has_key(item):
              s2 = s2.replace(item, '%s %s' % (item, dd[item]))

      print s2

      print

      # Way 2: test every dictionary key for membership in the string
      s3 = s1
      for key in dd:
          if key in s3:
              s3 = s3.replace(key, '%s %s' % (key, dd[key]))

      print s3
      [/code]
      Output:
      >>> {'MusmusculusmiR-1': 'UGGAAUGUAAAGAAGUAUGUA', 'Musmusculuslet-7i': 'UGAGGUAGUAGUUUGUGCUGU', 'Musmusculuslet-7g': 'UGAGGUAGUAGUUUGUACAGU'}
      browser details Musmusculuslet-7g UGAGGUAGUAGUUUGUACAGU 21 1 21 21 100.0% 22 + 46884872 46884892 21
      browser details Musmusculuslet-7i UGAGGUAGUAGUUUGUGCUGU 21 1 21 21 100.0% 5 + 50605174 50605194 21

      browser details Musmusculuslet-7g UGAGGUAGUAGUUUGUACAGU 21 1 21 21 100.0% 22 + 46884872 46884892 21
      browser details Musmusculuslet-7i UGAGGUAGUAGUUUGUGCUGU 21 1 21 21 100.0% 5 + 50605174 50605194 21
      >>>
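For anyone reading this on Python 3: dict.has_key() was removed there, and the "in" operator performs the same exact-key test. The following is only a minimal sketch under that assumption; the helper name lookup is made up for illustration and is not part of the code posted above. It also shows that "in" on a plain string is a substring test, which behaves differently from the dictionary test.

```python
# Python 3 sketch: dict.has_key() is gone, so "in" is used for the key test.
dd = {
    "Musmusculuslet-7g": "UGAGGUAGUAGUUUGUACAGU",
    "Musmusculuslet-7i": "UGAGGUAGUAGUUUGUGCUGU",
}

def lookup(key, table):
    """Return the value stored under key, or None if the key is absent."""
    if key in table:  # exact-key test, replaces table.has_key(key)
        return table[key]
    return None

line = "browser details Musmusculuslet-7g 21 1 21 21 100.0% 22 + 46884872 46884892 21"

print(lookup("Musmusculuslet-7g", dd))  # an exact key is found
print(lookup("Musmusculuslet-7", dd))   # a mere prefix of a key is not
print("Musmusculuslet-7" in line)       # on a str, "in" tests for a substring
```

The dictionary test only succeeds for a complete key, while the string test succeeds for any fragment, so the two should not be treated as interchangeable.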


      • moconno5
        New Member
        • Jul 2007
        • 19

        #4
        Thanks for the help, ilikepython and bvdet. I'm running into only one problem: I am getting multiple matches for certain strings. For example, the key MusmusculusmiR-1 also matches MusmusculusmiR-146b, so I get the following output:

        browser details MusmusculusmiR-1 UGGAAUGUAAAGAAGUAUGUA46b UGAGAACUGAAUUCCAUAGGCU 22 1 22 22 100.0% 26 - 20924724 20924745 22

        when the original string is:

        browser details MusmusculusmiR-146b 22 1 22 22 100.0% 26 - 20924724 20924745 22

        Is there a way to prevent this?

        My full list of keys is:
        ['MusmusculusmiR-106a', 'MusmusculusmiR-433-3p', 'MusmusculusmiR-126-5p', 'MusmusculusmiR-106b', 'MusmusculusmiR-216a', 'MusmusculusmiR-324-5p', 'MusmusculusmiR-762', 'MusmusculusmiR-7121', 'MusmusculusmiR-760', 'MusmusculusmiR-200b', 'MusmusculusmiR-200c', 'MusmusculusmiR-200a', 'MusmusculusmiR-241', 'MusmusculusmiR-30a-5p', 'MusmusculusmiR-802', 'MusmusculusmiR-801', 'MusmusculusmiR-805', 'MusmusculusmiR-804', 'MusmusculusmiR-216b', 'MusmusculusmiR-667', 'MusmusculusmiR-666', 'MusmusculusmiR-665', 'MusmusculusmiR-741', 'MusmusculusmiR-742', 'MusmusculusmiR-668', 'MusmusculusmiR-744', 'MusmusculusmiR-1401', 'MusmusculusmiR-34a', 'MusmusculusmiR-34b', 'MusmusculusmiR-34c', 'MusmusculusmiR-592', 'MusmusculusmiR-455-5p', 'MusmusculusmiR-698', 'MusmusculusmiR-376a1', 'MusmusculusmiR-344', 'MusmusculusmiR-697', 'MusmusculusmiR-694', 'MusmusculusmiR-695', 'MusmusculusmiR-340', 'MusmusculusmiR-341', 'MusmusculusmiR-342', 'MusmusculusmiR-691', 'MusmusculusmiR-542-5p', 'MusmusculusmiR-764-5p', 'MusmusculusmiR-122a', 'MusmusculusmiR-142-5p', 'MusmusculusmiR-449', 'MusmusculusmiR-448', 'MusmusculusmiR-23a', 'MusmusculusmiR-23b', 'MusmusculusmiR-6741', 'MusmusculusmiR-135b', 'MusmusculusmiR-135a', 'MusmusculusmiR-301b', 'MusmusculusmiR-129-5p', 'MusmusculusmiR-30b', 'MusmusculusmiR-30c', 'MusmusculusmiR-30d', 'MusmusculusmiR-30e', 'MusmusculusmiR-292-3p', 'MusmusculusmiR-713', 'MusmusculusmiR-499', 'MusmusculusmiR-711', 'MusmusculusmiR-710', 'MusmusculusmiR-717', 'MusmusculusmiR-715', 'MusmusculusmiR-714', 'MusmusculusmiR-490', 'MusmusculusmiR-491', 'MusmusculusmiR-719', 'MusmusculusmiR-718', 'MusmusculusmiR-494', 'MusmusculusmiR-495', 'MusmusculusmiR-496', 'MusmusculusmiR-497', 'MusmusculusmiR-297b', 'MusmusculusmiR-485-5p', 'MusmusculusmiR-300', 'MusmusculusmiR-301', 'MusmusculusmiR-302', 'MusmusculusmiR-422b', 'MusmusculusmiR-33', 'MusmusculusmiR-32', 'MusmusculusmiR-31', 'MusmusculusmiR-181d', 'MusmusculusmiR-27a', 'MusmusculusmiR-27b', 'MusmusculusmiR-450b1', 
'MusmusculusmiR-551b', 'MusmusculusmiR-302b1', 'MusmusculusmiR-155', 'MusmusculusmiR-154', 'MusmusculusmiR-151', 'MusmusculusmiR-150', 'MusmusculusmiR-153', 'MusmusculusmiR-152', 'MusmusculusmiR-409', 'MusmusculusmiR-470', 'MusmusculusmiR-471', 'MusmusculusmiR-15a', 'MusmusculusmiR-15b', 'MusmusculusmiR-675-3p', 'MusmusculusmiR-712', 'MusmusculusmiR-199a', 'MusmusculusmiR-199b', 'MusmusculusmiR-148b', 'MusmusculusmiR-148a', 'MusmusculusmiR-615', 'MusmusculusmiR-759', 'MusmusculusmiR-758', 'MusmusculusmiR-30e1', 'MusmusculusmiR-374-3p', 'MusmusculusmiR-291a-5p', 'MusmusculusmiR-488', 'MusmusculusmiR-689', 'MusmusculusmiR-688', 'MusmusculusmiR-685', 'MusmusculusmiR-684', 'MusmusculusmiR-687', 'MusmusculusmiR-686', 'MusmusculusmiR-681', 'MusmusculusmiR-680', 'MusmusculusmiR-683', 'MusmusculusmiR-682', 'MusmusculusmiR-351', 'MusmusculusmiR-350', 'MusmusculusmiR-720', 'MusmusculusmiR-721', 'MusmusculusmiR-4671', 'MusmusculusmiR-181a1', 'MusmusculusmiR-7b', 'MusmusculusmiR-130a', 'MusmusculusmiR-130b', 'MusmusculusmiR-4881', 'MusmusculusmiR-380-5p', 'MusmusculusmiR-127', 'MusmusculusmiR-467b', 'MusmusculusmiR-467a', 'MusmusculusmiR-431', 'MusmusculusmiR-291b-5p', 'MusmusculusmiR-532', 'MusmusculusmiR-539', 'MusmusculusmiR-128a', 'MusmusculusmiR-128b', 'MusmusculusmiR-543', 'MusmusculusmiR-540', 'MusmusculusmiR-542-3p', 'MusmusculusmiR-546', 'MusmusculusmiR-547', 'MusmusculusmiR-223', 'MusmusculusmiR-222', 'MusmusculusmiR-693-5p', 'MusmusculusmiR-224', 'MusmusculusmiR-91', 'MusmusculusmiR-93', 'MusmusculusmiR-92', 'MusmusculusmiR-96', 'MusmusculusmiR-98', 'MusmusculusmiR-99b', 'MusmusculusmiR-17-5p', 'MusmusculusmiR-434-3p', 'MusmusculusmiR-770-3p', 'MusmusculusmiR-763', 'MusmusculusmiR-489', 'MusmusculusmiR-761', 'MusmusculusmiR-486', 'MusmusculusmiR-484', 'MusmusculusmiR-483', 'MusmusculusmiR-652', 'MusmusculusmiR-21', 'MusmusculusmiR-22', 'MusmusculusmiR-24', 'MusmusculusmiR-25', 'MusmusculusmiR-146b', 'MusmusculusmiR-28', 'MusmusculusmiR-362', 'MusmusculusmiR-363', 
'MusmusculusmiR-361', 'MusmusculusmiR-367', 'MusmusculusmiR-365', 'MusmusculusmiR-302c1', 'MusmusculusmiR-692', 'MusmusculusmiR-182', 'MusmusculusmiR-183', 'MusmusculusmiR-186', 'MusmusculusmiR-187', 'MusmusculusmiR-184', 'MusmusculusmiR-185', 'MusmusculusmiR-324-3p', 'MusmusculusmiR-188', 'MusmusculusmiR-124a', 'MusmusculusmiR-463', 'MusmusculusmiR-464', 'MusmusculusmiR-466', 'MusmusculusmiR-469', 'MusmusculusmiR-468', 'MusmusculusmiR-505', 'MusmusculusmiR-503', 'MusmusculusmiR-500', 'MusmusculusmiR-501', 'MusmusculusmiR-212', 'MusmusculusmiR-210', 'MusmusculusmiR-211', 'MusmusculusmiR-26b', 'MusmusculusmiR-26a', 'MusmusculusmiR-215', 'MusmusculusmiR-218', 'MusmusculusmiR-219', 'MusmusculusmiR-465-3p', 'MusmusculusmiR-376a', 'MusmusculusmiR-376b', 'MusmusculusmiR-376c', 'MusmusculusmiR-369-5p', 'MusmusculusmiR-133a', 'MusmusculusmiR-133b', 'MusmusculusmiR-6761', 'MusmusculusmiR-9', 'MusmusculusmiR-129-3p', 'MusmusculusmiR-1', 'MusmusculusmiR-7', 'MusmusculusmiR-675-5p', 'MusmusculusmiR-101a', 'MusmusculusmiR-101b', 'MusmusculusmiR-217', 'MusmusculusmiR-214', 'MusmusculusmiR-699', 'MusmusculusmiR-326', 'MusmusculusmiR-696', 'MusmusculusmiR-325', 'MusmusculusmiR-322', 'MusmusculusmiR-323', 'MusmusculusmiR-320', 'MusmusculusmiR-345', 'MusmusculusmiR-346', 'MusmusculusmiR-328', 'MusmusculusmiR-329', 'MusmusculusmiR-18', 'MusmusculusmiR-764-3p', 'MusmusculusmiR-16', 'MusmusculusmiR-690', 'MusmusculusmiR-429', 'MusmusculusmiR-425', 'MusmusculusmiR-424', 'MusmusculusmiR-423', 'MusmusculusmiR-132', 'MusmusculusmiR-137', 'MusmusculusmiR-136', 'MusmusculusmiR-134', 'MusmusculusmiR-139', 'MusmusculusmiR-138', 'MusmusculusmiR-30a-3p', 'MusmusculusmiR-541', 'MusmusculusmiR-199a1', 'MusmusculusmiR-291b-3p', 'MusmusculusmiR-221', 'MusmusculusmiR-292-5p', 'MusmusculusmiR-450b', 'MusmusculusmiR-455-3p', 'MusmusculusmiR-181b', 'MusmusculusmiR-708', 'MusmusculusmiR-709', 'MusmusculusmiR-704', 'MusmusculusmiR-705', 'MusmusculusmiR-376b1', 'MusmusculusmiR-706', 
'MusmusculusmiR-291a-3p', 'MusmusculusmiR-700', 'MusmusculusmiR-701', 'MusmusculusmiR-485-3p', 'MusmusculusmiR-678', 'MusmusculusmiR-679', 'MusmusculusmiR-674', 'MusmusculusmiR-676', 'MusmusculusmiR-677', 'MusmusculusmiR-670', 'MusmusculusmiR-671', 'MusmusculusmiR-672', 'MusmusculusmiR-673', 'MusmusculusmiR-19a', 'MusmusculusmiR-19b', 'MusmusculusmiR-379', 'MusmusculusmiR-378', 'MusmusculusmiR-29b', 'MusmusculusmiR-370', 'MusmusculusmiR-29a', 'MusmusculusmiR-375', 'MusmusculusmiR-377', 'MusmusculusmiR-10b', 'MusmusculusmiR-10a', 'MusmusculusmiR-487b', 'MusmusculusmiR-702', 'MusmusculusmiR-191', 'MusmusculusmiR-190', 'MusmusculusmiR-193', 'MusmusculusmiR-192', 'MusmusculusmiR-195', 'MusmusculusmiR-194', 'MusmusculusmiR-380-3p', 'MusmusculusmiR-450', 'MusmusculusmiR-451', 'MusmusculusmiR-452', 'MusmusculusmiR-126-3p', 'MusmusculusmiR-103', 'MusmusculusmiR-100', 'MusmusculusmiR-107', 'MusmusculusmiR-133a1', 'MusmusculusmiR-298', 'MusmusculusmiR-299', 'MusmusculusmiR-293', 'MusmusculusmiR-290', 'MusmusculusmiR-296', 'MusmusculusmiR-297', 'MusmusculusmiR-294', 'MusmusculusmiR-295', 'MusmusculusmiR-743', 'MusmusculusmiR-201', 'MusmusculusmiR-203', 'MusmusculusmiR-202', 'MusmusculusmiR-205', 'MusmusculusmiR-204', 'MusmusculusmiR-207', 'MusmusculusmiR-206', 'MusmusculusmiR-208', 'MusmusculusmiR-433-5p', 'MusmusculusmiR-693-3p', 'Musmusculuslet-7d1', 'MusmusculusmiR-125b', 'MusmusculusmiR-125a', 'MusmusculusmiR-381', 'MusmusculusmiR-99a', 'MusmusculusmiR-434-5p', 'MusmusculusmiR-17-3p', 'MusmusculusmiR-5011', 'MusmusculusmiR-374-5p', 'MusmusculusmiR-465-5p', 'MusmusculusmiR-142-3p', 'MusmusculusmiR-20a', 'MusmusculusmiR-20b', 'MusmusculusmiR-146', 'MusmusculusmiR-144', 'MusmusculusmiR-335', 'MusmusculusmiR-181a', 'MusmusculusmiR-337', 'MusmusculusmiR-181c', 'MusmusculusmiR-331', 'MusmusculusmiR-330', 'MusmusculusmiR-669c', 'MusmusculusmiR-669b', 'MusmusculusmiR-669a', 'MusmusculusmiR-707', 'MusmusculusmiR-339', 'MusmusculusmiR-338', 'MusmusculusmiR-369-3p', 
'MusmusculusmiR-703', 'MusmusculusmiR-302c', 'MusmusculusmiR-302b', 'MusmusculusmiR-141', 'MusmusculusmiR-302d', 'Musmusculuslet-7b', 'Musmusculuslet-7c', 'Musmusculuslet-7a', 'Musmusculuslet-7f', 'Musmusculuslet-7g', 'Musmusculuslet-7d', 'Musmusculuslet-7e', 'Musmusculuslet-7i', 'MusmusculusmiR-449b', 'MusmusculusmiR-382', 'MusmusculusmiR-383', 'MusmusculusmiR-384', 'MusmusculusmiR-410', 'MusmusculusmiR-411', 'MusmusculusmiR-412', 'MusmusculusmiR-145', 'MusmusculusmiR-143', 'MusmusculusmiR-140', 'MusmusculusmiR-29c', 'MusmusculusmiR-196a', 'MusmusculusmiR-196b', 'MusmusculusmiR-149'

        thanks,

        Mark


        • ilikepython
          Recognized Expert Contributor
          • Feb 2007
          • 844

          #5
          This is similar to Bv's second way:
          [code=python]
          teststr = "browser details MusmusculusmiR-146b 22 1 22 22 100.0% 26 - 20924724 20924745 22"
          words = teststr.split()

          key = words[2]  # will the key always be the third word?
          if key in adict:
              finalstring = teststr.replace(key, "%s %s" % (key, adict[key]))
          [/code]
          If the key is not always the third word, you could check every word, provided there is only one key per string.
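Checking every word could look roughly like the sketch below. It is written for Python 3 and assumes at most one key per line; the function name annotate and the two-entry dictionary are illustrative only. Because whole words are compared against the dictionary, a short key such as MusmusculusmiR-1 cannot match inside MusmusculusmiR-146b.

```python
def annotate(line, table):
    """Insert table[word] after the first whitespace-separated word that is a key."""
    words = line.split()
    for i, word in enumerate(words):
        if word in table:  # exact comparison against the dictionary keys
            words.insert(i + 1, table[word])
            break  # assumption: at most one key per line
    return " ".join(words)

# Illustrative subset of the dictionary, using values quoted in this thread.
adict = {
    "MusmusculusmiR-1": "UGGAAUGUAAAGAAGUAUGUA",
    "MusmusculusmiR-146b": "UGAGAACUGAAUUCCAUAGGCU",
}
teststr = "browser details MusmusculusmiR-146b 22 1 22 22 100.0% 26 - 20924724 20924745 22"
print(annotate(teststr, adict))
```

Note that split() followed by " ".join() collapses any run of whitespace into a single space, which is fine for the lines shown in this thread but would change tab-separated input.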


          • moconno5
            New Member
            • Jul 2007
            • 19

            #6
            I tried your suggestion but received the same result. Is there a statement I could write that checks each line for a capital A, T, C, or G? If I could put that into an 'if' statement, then maybe it wouldn't re-format a line that has already been formatted. Of course, there would still be the question of whether it replaced the key with Mus..R-1 or with Mus..R-106a, etc. Is there an order, or is it random because I am using a dictionary?

            Mark


            • ilikepython
              Recognized Expert Contributor
              • Feb 2007
              • 844

              #7
              I'm not really sure what you mean. Are you checking each string more than once? Every time you finish formatting a string, you can append it to a list, and the next time, if it is in the list, don't format it again. I don't think you should have a problem with matching the wrong key. Could you post the code you used?


              • bvdet
                Recognized Expert Specialist
                • Oct 2006
                • 2851

                #8
                Try this regex solution to see if it works for you. The \b in the pattern matches the empty string at the beginning or end of a word. The string is then split on the space character, so a replacement should happen only on a full match:
                [code=Python]
                print dd

                import re

                s1 = "browser details Musmusculuslet-7g 21 1 21 21 100.0% 22 + 46884872 46884892 21\nbrowser details Musmusculuslet-7i 21 1 21 21 100.0% 5 + 50605174 50605194 21\nbrowser details MusmusculusmiR-314-5p 21 1 21 21 100.0% 22 + 46884872 46884892 21\nbrowser details MusmusculusmiR-31 21 1 21 21 100.0% 22 + 46884872 46884892 21"
                patt = re.compile(r'''
                    \bMusmusculuslet-[0-9a-z]+\b      # Matches "Musmusculuslet-" followed by alphanumeric
                                                      # characters at word borderlines
                    |\bMusmusculusmiR-[0-9a-z\-]+\b   # Matches "MusmusculusmiR-" followed by alphanumeric
                                                      # characters or dashes at word borderlines
                    ''', re.VERBOSE)

                strList = patt.findall(s1)
                s2 = s1
                for item in strList:
                    if dd.has_key(item):
                        s2List = s2.split(' ')
                        idx = s2List.index(item)
                        s2List[idx] = '%s %s' % (item, dd[item])
                        s2 = ' '.join(s2List)

                print s2
                [/code]
                Output:
                >>> {'MusmusculusmiR-1': 'UGGAAUGUAAAGAAGUAUGUA', 'MusmusculusmiR-314-5p': 'UGAGGUAGUAGUUUGUACAGU', 'Musmusculuslet-7i': 'UGAGGUAGUAGUUUGUGCUGU', 'Musmusculuslet-7g': 'UGAGGUAGUAGUUUGUACAGU', 'MusmusculusmiR-31': 'UGGAAUGUAAAGAAGUAUGUA'}
                browser details Musmusculuslet-7g UGAGGUAGUAGUUUGUACAGU 21 1 21 21 100.0% 22 + 46884872 46884892 21
                browser details Musmusculuslet-7i UGAGGUAGUAGUUUGUGCUGU 21 1 21 21 100.0% 5 + 50605174 50605194 21
                browser details MusmusculusmiR-314-5p UGAGGUAGUAGUUUGUACAGU 21 1 21 21 100.0% 22 + 46884872 46884892 21
                browser details MusmusculusmiR-31 UGGAAUGUAAAGAAGUAUGUA 21 1 21 21 100.0% 22 + 46884872 46884892 21

                >>>


                • moconno5
                  New Member
                  • Jul 2007
                  • 19

                  #9
                  This is the code I am currently using; I am still getting the same problem:

                  Code:
                  def EditFile ( s1, dd ):
                      print dd
                      import re
                          patt = re.compile(r'''\bMusmusculuslet-[0-9a-z]+\b+|\bMusmusculusmiR-[0-9a-z\-]+\b''', re.VERBOSE)
                      strList = patt.findall(s1)
                      s2 = s1
                      for item in strList:
                          if dd.has_key(item):
                              s2List = s2.split(' ')
                              idx = s2List.index(item)
                              s2List[idx] = '%s %s' % (item, dd[item]))
                              s2 = ' '.join(s2List)
                      print s2
                  ##      print
                  ##    s3 = s1
                  ##    words = s3.split()
                  ##    key = words[2]
                  ##    for key in dd:
                  ##        if key in s3:
                  ##            s3 = s3.replace(key, '%s %s' % (key, dd[key]))
                  ##    print s3
                      f = open('editted BLAT Search Results-Mouse.txt', 'w')
                      f.writelines(s2)
                      f.close()
                      return s2
                  It seems to choke on the following matches:

                  MusmusculusmiR-1 is matched when the program reads MusmusculusmiR-124a, so two separate values from two separate keys get written:

                  UGGAAUGUAAAGAAGUAUGUA24a
                  followed by UAAGGCACGCGGUGAAUGCC

                  The first is the value for the key MusmusculusmiR-1 (with the leftover 24a stuck on the end); the second is the value for the key MusmusculusmiR-124a.

                  It is also still choking on the following matches:

                  MusmusculusmiR-126-5p (weird, since it doesn't mind MusmusculusmiR-126-3p)
                  MusmusculusmiR-127, MusmusculusmiR-128a, MusmusculusmiR-130, MusmusculusmiR-129-5p
                  and MusmusculusmiR-324-3p because there is a MusmusculusmiR-32.

                  I ran the above code and got the same results I did with the previous code, which is strange. Did I miss something in my transcription? What exactly does the \b do in your code?

                  Mark


                  • moconno5
                    New Member
                    • Jul 2007
                    • 19

                    #10
                    Okay, now I am getting a new error:

                    Traceback (most recent call last):
                      File "<pyshell#28>", line 1, in <module>
                        newfile = EditFile ( data, mouse )
                      File "BatchEditor.py", line 45, in EditFile
                        patt = re.compile(r'''\bMusmusculuslet-[0-9a-z]+\b+|\bMusmusculusmiR-[0-9a-z\-]+\b''', re.VERBOSE)
                      File "C:\Python25\lib\re.py", line 180, in compile
                        return _compile(pattern, flags)
                      File "C:\Python25\lib\re.py", line 233, in _compile
                        raise error, v # invalid expression
                    error: nothing to repeat

                    I edited the code, so it is now like this:

                    Code:
                    def EditFile ( s1, dd ):
                        
                        #print dd
                        import re
                        patt = re.compile(r'''\bMusmusculuslet-[0-9a-z]+\b+|\bMusmusculusmiR-[0-9a-z\-]+\b''', re.VERBOSE)
                        strList = patt.findall(s1)
                        s2 = s1
                        print strList
                        for item in strList:
                            if dd.has_key(item):
                                s2List = s2.split(' ')
                                idx = s2List.index(item)
                                s2List[idx] = '%s %s' % (item, dd[item])
                                s2 = ' '.join(s2List)
                        print s2
                    ##      print
                    ##    s3 = s1
                    ##    words = s3.split()
                    ##    key = words[2]
                    ##    for key in dd:
                    ##        if key in s3:
                    ##            s3 = s3.replace(key, '%s %s' % (key, dd[key]))
                    ##    print s3
                        f = open('editted BLAT Search Results-Mouse.txt', 'w')
                        f.writelines(s2)
                        f.close()
                        return s2
                    I should ask: is that a single quote followed by a double quote at the beginning and end of the re.compile statement? I had it set as three single quotes and then realized that is probably wrong.

                    Mark


                    • bvdet
                      Recognized Expert Specialist
                      • Oct 2006
                      • 2851

                      #11
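                      The error you received is caused by the extra '+' character after the first '\b'. Since '\b' is a zero-width assertion that matches the empty position at a word boundary and consumes no characters, there is nothing to repeat.

                      Three single quotes or three double quotes would be correct. Either form lets the pattern span several lines, which is what makes room for the comments allowed under re.VERBOSE.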
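To make the two points above concrete, here is a small Python 3 sketch. The shortened names (miR-1, miR-146b) stand in for the real keys and are for illustration only. It shows what \b changes in a match, that a quantifier placed directly on \b is rejected at compile time, and how a triple-quoted pattern combines with re.VERBOSE.

```python
import re

# \b matches a position at a word boundary, not a character.
print(re.findall(r"\bmiR-1\b", "miR-1 miR-146b"))  # only the standalone token
print(re.findall(r"miR-1", "miR-1 miR-146b"))      # also matches inside miR-146b

# A quantifier directly after \b is rejected when the pattern is compiled.
try:
    re.compile(r"\bmiR-[0-9a-z\-]+\b+")
    compiled = True
except re.error:
    compiled = False
print(compiled)

# Triple quotes let the pattern span lines; re.VERBOSE then ignores the
# whitespace and the "#" comments inside it.
patt = re.compile(r'''
    \bmiR-[0-9a-z\-]+\b   # "miR-" plus letters, digits or dashes, as a whole word
    ''', re.VERBOSE)
print(patt.findall("miR-1 miR-146b miR-324-3p"))
```

Without the boundaries, the short name is found twice because it is a prefix of the longer one; with them, each token is matched only as a whole.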


                      • moconno5
                        New Member
                        • Jul 2007
                        • 19

                        #12
                        I re-copied and re-pasted the code, and it is working much better now. The program is no longer splitting the keys, but it is pasting multiple values back-to-back instead of next to the key for multiple matches:

                        browser details MusmusculusmiR-450b1 AUUGGGAACAUUUUGCAUGCAU AUUGGGAACAUUUUGCAUGCAU 20 1 22 22 95.5% Un.003.104 - 440337 440358 22
                        browser details MusmusculusmiR-450b1 20 1 22 22 95.5% Un.003.104 - 440652 440673 22

                        This is something I can live with, unless there is some easy way to fix it. I am going to import the whole thing into Access for a database when I am through.

                        Thanks again,
                        Mark


                        • bvdet
                          Recognized Expert Specialist
                          • Oct 2006
                          • 2851

                          #13
                          Do you have multiple occurrences of the key 'MusmusculusmiR-450b1' in the string? That would explain the double values. Try this, which handles the string one line at a time:
                          [code=Python]
                          print dd

                          import re

                          s1 = "browser details Musmusculuslet-7g 21 1 21 21 100.0% 22 + 46884872 46884892 21\nbrowser details Musmusculuslet-7i 21 1 21 21 100.0% 5 + 50605174 50605194 21\nbrowser details MusmusculusmiR-314-5p 21 1 21 21 100.0% 22 + 46884872 46884892 21\nbrowser details MusmusculusmiR-31 21 1 21 21 100.0% 22 + 46884872 46884892 21\nbrowser details MusmusculusmiR-31 21 1 21 21 100.0% 22 + 46884872 46884892 21"
                          patt = re.compile(r'''
                              \bMusmusculuslet-[0-9a-z]+\b      # Matches "Musmusculuslet-" followed by alphanumeric
                                                                # characters at word borderlines
                              |\bMusmusculusmiR-[0-9a-z\-]+\b   # Matches "MusmusculusmiR-" followed by alphanumeric
                                                                # characters or dashes at word borderlines
                              ''', re.VERBOSE)

                          sList = s1.split('\n')
                          outList = []
                          for item in sList:
                              tem = patt.search(item)
                              if tem:
                                  if dd.has_key(tem.group(0)):
                                      item = item.replace(tem.group(0), '%s %s' % (tem.group(0), dd[tem.group(0)]))
                              outList.append(item)

                          s2 = '\n'.join(outList)
                          print s2
                          [/code]
                          Output:
                          >>> {'MusmusculusmiR-1': 'UGGAAUGUAAAGAAGUAUGUA', 'MusmusculusmiR-314-5p': 'UGAGGUAGUAGUUUGUACAGU', 'Musmusculuslet-7i': 'UGAGGUAGUAGUUUGUGCUGU', 'Musmusculuslet-7g': 'UGAGGUAGUAGUUUGUACAGU', 'MusmusculusmiR-31': 'UGGAAUGUAAAGAAGUAUGUA'}
                          browser details Musmusculuslet-7g UGAGGUAGUAGUUUGUACAGU 21 1 21 21 100.0% 22 + 46884872 46884892 21
                          browser details Musmusculuslet-7i UGAGGUAGUAGUUUGUGCUGU 21 1 21 21 100.0% 5 + 50605174 50605194 21
                          browser details MusmusculusmiR-314-5p UGAGGUAGUAGUUUGUACAGU 21 1 21 21 100.0% 22 + 46884872 46884892 21
                          browser details MusmusculusmiR-31 UGGAAUGUAAAGAAGUAUGUA 21 1 21 21 100.0% 22 + 46884872 46884892 21
                          browser details MusmusculusmiR-31 UGGAAUGUAAAGAAGUAUGUA 21 1 21 21 100.0% 22 + 46884872 46884892 21
                          >>>
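A possible alternative for Python 3, offered only as a sketch and not as the method used in this thread, is to let re.sub do the scan and pass it a replacement function. Each whole-word match is then visited exactly once and the replaced text is never rescanned, so repeated keys do not pick up a second value. The names KEY_RE and annotate_all, the shortened sample text, and the merged single-line pattern are assumptions made for this example.

```python
import re

# One pattern covering both the "let-" and "miR-" names, bounded by \b.
KEY_RE = re.compile(r"\bMusmusculus(?:let|miR)-[0-9a-z\-]+\b")

def annotate_all(text, table):
    """Append table[key] after every whole-word key found in text."""
    def repl(match):
        key = match.group(0)
        if key in table:
            return "%s %s" % (key, table[key])
        return key  # name not in the dictionary: leave it unchanged
    return KEY_RE.sub(repl, text)

# Dictionary values as quoted in the post above; the lines are shortened.
dd = {
    "MusmusculusmiR-1": "UGGAAUGUAAAGAAGUAUGUA",
    "MusmusculusmiR-314-5p": "UGAGGUAGUAGUUUGUACAGU",
    "MusmusculusmiR-31": "UGGAAUGUAAAGAAGUAUGUA",
}
text = (
    "browser details MusmusculusmiR-314-5p 21\n"
    "browser details MusmusculusmiR-31 21\n"
    "browser details MusmusculusmiR-31 21"
)
print(annotate_all(text, dd))
```

Unlike the split-and-join approach, this leaves the original whitespace and line breaks untouched.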


                          • moconno5
                            New Member
                            • Jul 2007
                            • 19

                            #14
                            That has done the trick! Thanks for all of the help, I didn't even know about Python having a regex module. Still wondering why the triple quotes, but I will go to python.org and read up on it.

                            Mark
