Splitting a file into multiple files based on a pattern

  • rokstar24
    New Member
    • Aug 2014
    • 7

    The text file looks like this:
    shfhsgfkshkjg
    gkjsfkgkjfgkfg
    model1
    lkgjhllghfjgh
    kjfgkjjfghg
    endmodel
    model2
    jfhkjhcgbkcjg
    xhbgkxfkgh
    endmodel

    I want the text between each model and endmodel line to be written to a new file, with file names like model1, model2, and so on. There may be 100 or more models. Please help me.
  • sicarie
    Recognized Expert Specialist
    • Nov 2006
    • 4677

    #2
    It looks like every block starts with a "model" line and ends with an "endmodel" line. You could either write a regex or parse the file line by line and look for those markers.

    • rokstar24
      New Member
      • Aug 2014
      • 7

      #3
      But how do I create a new text file each time?

      • bvdet
        Recognized Expert Specialist
        • Oct 2006
        • 2851

        #4
        The file could be written something like:
        Code:
        fn = "model1"
        # "with" closes the file automatically, even if write() fails
        with open(fn, "w") as f:
            f.write("\n".join(content))
        "content" would be a list of the text items between "modelX" and "endmodel".

        A possible regex pattern could be:
        Code:
        import re
        patt = re.compile(r"(model\d+)\n([0-9a-zA-Z\n]+?)endmodel",  re.MULTILINE)
        Then, with the text of the file read into a string "s", the file names and content could be extracted to a list of tuples:
        Code:
        contents = patt.findall(s)
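
As a hedged illustration of what findall returns here, the sketch below runs the pattern from this post over the sample text from the question. The variable name "sample" is an assumption made for illustration, and re.MULTILINE is left out because the pattern uses no ^ or $ anchors.

```python
import re

# Sample text copied from the original question (assumed file contents).
sample = (
    "shfhsgfkshkjg\n"
    "gkjsfkgkjfgkfg\n"
    "model1\n"
    "lkgjhllghfjgh\n"
    "kjfgkjjfghg\n"
    "endmodel\n"
    "model2\n"
    "jfhkjhcgbkcjg\n"
    "xhbgkxfkgh\n"
    "endmodel\n"
)

patt = re.compile(r"(model\d+)\n([0-9a-zA-Z\n]+?)endmodel")

# With two groups in the pattern, findall returns a list of (name, body) tuples.
contents = patt.findall(sample)
for name, body in contents:
    print(name, body.splitlines())
```

Each tuple pairs a file name with the text for that file, so one loop over "contents" is enough no matter how many models the input holds.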

        • rokstar24
          New Member
          • Aug 2014
          • 7

          #5
          Will it produce 100 files if there are 100 models?

          • bvdet
            Recognized Expert Specialist
            • Oct 2006
            • 2851

            #6
            rokstar24,

            Yes, if the models are uniquely identified.

            • rokstar24
              New Member
              • Aug 2014
              • 7

              #7
              Can anyone give me complete code that will run directly? I have been trying, but I am not getting it.

              • rokstar24
                New Member
                • Aug 2014
                • 7

                #8
                Code:
                f=open("E:\pra.txt","r+")
                
                
                import re,sys
                k= re.findall(r"(model\d+)\n([0-9a-zA-Z\n]+?)endmodel",f.read())
                
                count = 1
                fwrite = open("filename%s" %(count), 'w')
                for line in f:
                    if k in line:
                        # close open file object, increment count, open new file object
                        
                        count += 1
                        fwrite = open("filename%s" %(count), 'w')
                        fwrite.write(k)
                fwrite.close()
                f.close()
                I have written this. Can you tell me where I am going wrong?
                Last edited by bvdet; Aug 12 '14, 12:23 PM. Reason: Add code tags

                • rokstar24
                  New Member
                  • Aug 2014
                  • 7

                  #9
                  How do I write k to a new file? It gives an error saying k is not a string.

                  • bvdet
                    Recognized Expert Specialist
                    • Oct 2006
                    • 2851

                    #10
                    You are right - 'k' is not a string. It is a list of tuples, and you have to get the individual elements of each tuple, which are strings, and write them to disk. 'k' may look something like this:
                    Code:
                    [('model1', 'lkgjhllghfjgh\nkjfgkjjfghg\n'), ('model2', 'jfhkjhcgbkcjg\nxhbgkxfkgh\n')]
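
One way to write those elements to disk is sketched below, assuming k has the shape shown above. The names "out_dir" and "written" are hypothetical, and a temporary directory is used only so the sketch does not touch real files.

```python
import os
import tempfile

# Example value of k, as shown in the post above.
k = [
    ("model1", "lkgjhllghfjgh\nkjfgkjjfghg\n"),
    ("model2", "jfhkjhcgbkcjg\nxhbgkxfkgh\n"),
]

# Hypothetical output folder; a temporary directory keeps the sketch harmless.
out_dir = tempfile.mkdtemp()

written = []
for name, body in k:
    # name and body are both strings, so body can be passed to write().
    path = os.path.join(out_dir, name)
    with open(path, "w") as fwrite:
        fwrite.write(body)
    written.append(path)

print(written)
```

Unpacking each tuple into "name" and "body" avoids passing the whole list to write(), which is what raised the "not a string" error.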

                    • dwblas
                      Recognized Expert Contributor
                      • May 2008
                      • 626

                      #11
                      k= re.findall(r"(model\d+)\n([0-9a-zA-Z\n]+?)endmodel",f.read())

                      and
                      if k in line:

                      are redundant. Forget the findall and just use if "model" in line, or, since the line starts with "model" and we don't want "endmodel",
                      if line.startswith("model"):

                      Code:
                      test_data = """the text file is like:
                       shfhsgfkshkjg
                       gkjsfkgkjfgkfg
                       model1
                       lkgjhllghfjgh
                       kjfgkjjfghg
                       endmodel
                       model2
                       jfhkjhcgbkcjg
                       xhbgkxfkgh
                       endmodel"""
                      
                      file_input = test_data.split("\n")
                      model_list = []
                      ctr = 0
                      for line in file_input:
                          line = line.strip()
                          if line.startswith("model"):
                              if ctr:     ## first group is junk & not copied
                                  with open(model_list[0], "w") as fp_out:
                                      for rec in model_list:
                                          fp_out.write("%s \n" % (rec))
                              ctr += 1
                              model_list = []
                          model_list.append(line)
                      
                      ## final list
                      with open(model_list[0], "w") as fp_out:
                          for rec in model_list:
                              fp_out.write("%s \n" % (rec))
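
The code above also copies the "modelX" and "endmodel" lines into each output file. If only the lines strictly between the two markers are wanted, a variant could look like the sketch below. The function name "split_models" and the dictionary it returns are assumptions for illustration, not part of the code above.

```python
def split_models(lines):
    """Return {model_name: [lines between the model line and 'endmodel']}."""
    models = {}
    current = None  # name of the block being collected, or None outside a block
    for line in lines:
        line = line.strip()
        if line == "endmodel":
            current = None
        elif line.startswith("model"):
            current = line
            models[current] = []
        elif current is not None:
            models[current].append(line)
    return models


test_data = """shfhsgfkshkjg
gkjsfkgkjfgkfg
model1
lkgjhllghfjgh
kjfgkjjfghg
endmodel
model2
jfhkjhcgbkcjg
xhbgkxfkgh
endmodel"""

result = split_models(test_data.split("\n"))
for name, body in result.items():
    print(name, body)
```

Each entry can then be written with open(name, "w") as in the code above. Lines before the first model line and after an "endmodel" line are skipped because "current" is None at that point.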

                      • rokstar24
                        New Member
                        • Aug 2014
                        • 7

                        #12
                        Can anyone tell me how to write the if condition for a line like
                        Code:
                        model        1
                        where there are exactly 8 spaces between model and 1? What condition should I use to find it?
                        Last edited by bvdet; Aug 13 '14, 11:44 AM. Reason: Add code tags so spaces will display

                        • bvdet
                          Recognized Expert Specialist
                          • Oct 2006
                          • 2851

                          #13
                          Assuming you want to use the regex solution:
                          Code:
                          patt = re.compile(r"(model *\d+)\n([0-9a-zA-Z\n]+?)endmodel",  re.MULTILINE)
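
Note that " *" matches any number of spaces, including none. If exactly 8 spaces must be required, a counted quantifier such as " {8}" is one option. The sketch below compares the two on made-up sample text; the names "text", "any_spaces" and "eight_spaces" are assumptions for illustration.

```python
import re

# Made-up sample: one block with exactly 8 spaces, one block with a single space.
text = (
    "model" + " " * 8 + "1\n"
    "abc\n"
    "def\n"
    "endmodel\n"
    "model 2\n"
    "ghi\n"
    "endmodel\n"
)

# " *" accepts any number of spaces (including none) before the digits.
any_spaces = re.compile(r"(model *\d+)\n([0-9a-zA-Z\n]+?)endmodel")
# " {8}" accepts exactly eight spaces before the digits.
eight_spaces = re.compile(r"(model {8}\d+)\n([0-9a-zA-Z\n]+?)endmodel")

names_any = [name for name, _ in any_spaces.findall(text)]
names_eight = [name for name, _ in eight_spaces.findall(text)]

print(names_any)
print(names_eight)
```

The captured name still contains the spaces, so it may be worth removing them before using it as a file name.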

                          • dwblas
                            Recognized Expert Contributor
                            • May 2008
                            • 626

                            #14
                            Or use split and join which is also a "standard" way.
                            Code:
                            test_string = "model        1"
                            print(test_string)
                            # split() with no arguments splits on any run of whitespace
                            print("becomes", "".join(test_string.split()))
