Printing duplicates from a text file

  • pystarter
    New Member
    • May 2012
    • 6

    If I have a text file 'test.txt' containing:

    1
    1
    1
    2
    2
    3

    I want it to print all the lines that are duplicates/matches.

    So I would expect an output of

    1
    1
    1
    2
    2

    I was thinking something along the lines of:
    Code:
    test = open("test.txt", 'r')
    
    for x in test:
         for y in test:
             if x==y:
                print x
                print y
    But it doesn't work.

    Please can someone advise?
    Last edited by bvdet; May 22 '12, 02:18 PM. Reason: Add code tags
  • bvdet
    Recognized Expert Specialist
    • Oct 2006
    • 2851

    #2
    Read the file and compile a list of the items in the file.
    Iterate on the list.
    Use list method count to determine if there are duplicates.

    Example:
    Code:
    >>> seq = ['1', '1', '2', '2', '2', '3']
    >>> for item in seq:
    ... 	if seq.count(item) > 1:
    ... 		print item
    ... 		
    1
    1
    2
    2
    2
    >>>
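The three steps above can be sketched against an actual file. This is only an illustration: the function name `find_duplicates` and the temporary file standing in for 'test.txt' are assumptions, not part of the original post, and `print(...)` is written with parentheses so the same code runs on Python 2 and Python 3.

```python
import os
import tempfile


def find_duplicates(path):
    """Return every line of the file that occurs more than once, in file order."""
    with open(path) as handle:
        # Strip the newline so a final line without one still compares equal.
        seq = [line.strip() for line in handle]
    # list.count rescans the whole list for each item, which is fine for small files.
    return [item for item in seq if seq.count(item) > 1]


# Build a throwaway copy of the sample data from the question (hypothetical stand-in).
fd, sample_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as handle:
    handle.write("1\n1\n1\n2\n2\n3\n")

duplicates = find_duplicates(sample_path)
os.remove(sample_path)

for item in duplicates:
    print(item)
```

For the sample data this prints 1, 1, 1, 2, 2, which matches the output asked for in the question.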


    • dwblas
      Recognized Expert Contributor
      • May 2008
      • 626

      #3
      For larger files, store the records found in a set so you only iterate once.
      Code:
      x_set=set()
      for x in test:
          x=x.strip()
          if x in x_set:
              print x
          else:
              x_set.add(x)
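Note that the set-based loop prints a record only from its second occurrence onward, so the sample data would give 1, 1, 2 rather than 1, 1, 1, 2, 2. If the first occurrence should be printed too, one possible variation (a sketch, not the method from this reply) is to count every record first with `collections.Counter` and then filter. The function name `lines_with_duplicates` and the in-memory sample list are assumptions for illustration.

```python
from collections import Counter


def lines_with_duplicates(lines):
    """Return every record that occurs more than once, keeping the original order."""
    records = [line.strip() for line in lines]
    counts = Counter(records)  # one pass to count each distinct record
    # Second pass over the stored records; each lookup in counts is constant time.
    return [rec for rec in records if counts[rec] > 1]


# Sample data from the question, as the lines a file would yield.
sample = ["1\n", "1\n", "1\n", "2\n", "2\n", "3\n"]
result = lines_with_duplicates(sample)

for rec in result:
    print(rec)
```

This keeps the work roughly linear in the number of records, at the cost of holding all of them in memory.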
      The approach in the code you posted compares each record with every record after it, so it iterates over the list once per record. Written so that it works, it would be:
      Code:
      test = open("test.txt", 'r')
      
      all_recs=test.readlines()
      test.close()
      
      for ctr in range(len(all_recs)):
          x = all_recs[ctr].strip()
          for y in range(ctr+1, len(all_recs)):  ## start with the next record
              if x==all_recs[y].strip():
                  print y, all_recs[y]
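As for why the nested loop in the question "doesn't work": a file object is an iterator, so the inner `for y in test` consumes all the remaining lines and the outer loop stops after its first pass. The sketch below shows this with `io.StringIO` standing in for the opened file (an assumption made so the example needs no file on disk); the names `outer_passes` and `pairs` are illustrative only.

```python
import io

# io.StringIO stands in for open("test.txt", 'r'); both hand out lines one at a time.
test = io.StringIO(u"1\n1\n1\n2\n2\n3\n")

outer_passes = 0
pairs = []
for x in test:
    outer_passes += 1
    for y in test:  # drains every line the outer loop has not seen yet
        pairs.append((x.strip(), y.strip()))

print(outer_passes)  # the outer loop only gets the first line
print(len(pairs))    # the first line is paired with the five lines after it
```

Reading the lines into a list first, as in the code above, avoids this because a list can be iterated any number of times.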


      • pystarter
        New Member
        • May 2012
        • 6

        #4
        Thanks guys - great help!
