Let's say an input text file "input_msg.txt" (file size is 70,000 KB) contains the following records:
Jan 1 02:32:40 hello welcome to python world
Jan 1 02:32:40 hello welcome to python world
Mar 31 23:31:55 learn python
Mar 31 23:31:55 learn python be smart
Mar 31 23:31:56 python is good scripting language
Jan 1 00:00:01 hello welcome to python world
Jan 1 00:00:02 hello welcome to python world
Mar 31 23:31:55 learn python
Mar 31 23:31:56 python is good scripting language
The expected output file (let's say outputfile.txt) should contain the records below:
Jan 1 02:32:40 hello welcome to python world
Jan 1 02:32:40 hello welcome to python world
Mar 31 23:31:55 learn python
Mar 31 23:31:55 learn python be smart
Mar 31 23:31:56 python is good scripting language
Jan 1 00:00:01 hello welcome to python world
Jan 1 00:00:02 hello welcome to python world
Note: I need to keep all records (including duplicates) that start with "Jan 1", and I want to drop duplicate records that do not start with "Jan 1".
I have tried the program below (see "Code:"), but it deletes all duplicate records, including the "Jan 1" ones.
The output of my program is below:
Jan 1 02:32:40 hello welcome to python world
Mar 31 23:31:55 learn python
Mar 31 23:31:55 learn python be smart
Mar 31 23:31:56 python is good scripting language
Jan 1 00:00:01 hello welcome to python world
Your help would be appreciated!!!
Code:
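from collections import OrderedDict

def remove_Duplicate_Lines(inputfile, outputfile):
    with open(inputfile) as fin, open(outputfile, 'w') as out:
        # Strip trailing whitespace from each line
        lines = (line.rstrip() for line in fin)
        # Skip blank lines; OrderedDict keeps only the first occurrence of
        # each line, so every duplicate is dropped, "Jan 1" records included
        unique_lines = OrderedDict.fromkeys(line for line in lines if line)
        out.write("\n".join(unique_lines))
    return 0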
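For comparison, here is a minimal sketch of the rule described in the note: records starting with "Jan 1" are always written, and any other record is written only the first time it is seen. The function name remove_duplicates_except_prefix and the keep_prefix parameter are made up for illustration. The default prefix "Jan 1 " (with a trailing space, so that "Jan 10" does not match) assumes the date is separated by single spaces, as in the sample records above. This is one possible approach, not a tested solution for the real file.

```python
def remove_duplicates_except_prefix(inputfile, outputfile, keep_prefix="Jan 1 "):
    """Copy inputfile to outputfile, dropping repeated lines unless they
    start with keep_prefix (hypothetical parameter, assumed "Jan 1 ")."""
    seen = set()  # non-prefix lines that have already been written
    with open(inputfile) as fin, open(outputfile, "w") as out:
        for raw in fin:  # read line by line instead of loading the whole file
            line = raw.rstrip()
            if not line:
                continue  # skip blank lines, as the original program does
            if line.startswith(keep_prefix):
                # "Jan 1" records are always kept, duplicates included
                out.write(line + "\n")
            elif line not in seen:
                # any other record is kept only on its first occurrence
                seen.add(line)
                out.write(line + "\n")
```

It could be called as remove_duplicates_except_prefix("input_msg.txt", "outputfile.txt"). Because only the distinct non-"Jan 1" lines are held in the set, memory use should stay modest for a file of around 70,000 KB, assuming most of that size is not unique lines.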