Cron job to remove logically redundant entries in Postgres SQL

Abhilash28coder

New Member

Join Date: Jun 2019

Posts: 1
#1

Cron job to remove logically redundant entries in Postgres SQL

Jun 14 '19, 01:18 PM

I have a requirement to delete records from a Postgres SQL table which has more than 200 million records. The table is not having any primary key.

The sample table (Bookmark is the name of table) content is as below:

Code:

systemId filename mindatetime maxdatetime 70277 monitor_1.dat 2019-04-21 08:00:00 AM 2019-04-21 03:10:00 PM 10006 monitor_2.dat 2019-04-25 10:00:00 AM 2019-04-25 11:30:00 AM 10006 monitor_3.dat 2019-04-28 08:00:00 AM 2019-04-28 10:00:00 AM 10006 monitor_3.dat 2019-04-28 09:00:00 AM 2019-04-28 11:00:00 AM 10006 monitor_3.dat 2019-04-28 07:00:00 AM 2019-04-28 04:00:00 PM 8368 monitor_1.dat 2019-05-21 11:00:00 AM 2019-05-21 11:30:00 AM 8368 monitor_7.dat 2019-05-21 06:00:00 AM 2019-05-21 11:00:00 AM 8368 monitor_5.dat 2019-05-23 08:00:00 AM 2019-05-23 10:00:00 AM

The cron job should run on a given schedule to delete the records which are logically redundant.

To explain this let's take the case of systemId '10006' where filename is 'monitor_3.dat' having 3 entries with min and max date timestamp of the same day.

Logically we can delete the entries having mindatetime 08:00:00 AM and 09:00:00 AM, maxdatetime 10:00:00 AM, 11:00:00 AM as that interval is being covered by the other entry which has mindatetime as 7 AM and maxdatetime as 4 PM.

So those entries would fall under this interval and the job should identify such entries in the entire table and delete them.

My resultant output table content in this case should be:

Code:

systemId filename mindatetime maxdatetime 70277 monitor_1.dat 2019-04-21 08:00:00 AM 2019-04-21 03:10:00 PM 10006 monitor_2.dat 2019-04-25 10:00:00 AM 2019-04-25 11:30:00 AM 10006 monitor_3.dat 2019-04-28 07:00:00 AM 2019-04-28 04:00:00 PM 8368 monitor_1.dat 2019-05-21 11:00:00 AM 2019-05-21 11:30:00 AM 8368 monitor_7.dat 2019-05-21 06:00:00 AM 2019-05-21 11:00:00 AM 8368 monitor_5.dat 2019-05-23 08:00:00 AM 2019-05-23 10:00:00 AM

The table size is more than 20Gb on disk so I was exploring writing a sql procedure or job to achieve this but not able to make much progress. Any ideas or suggestions for overcoming this complex scenario?

Last edited by zmbd; Jun 14 '19, 03:41 PM. Reason: [z{Added [CODE/] format to hold the example table spacing :-) }]
Tags: None
Rabbit

Recognized Expert MVP

Join Date: Jan 2007

Posts: 12517
#2

Jun 14 '19, 08:10 PM

What happens if they only partially overlap? What happens if they fully overlap but over two different entries? What happens if they overlap but the start and end are the same? You need to clearly define the requirements otherwise you're going to run into trouble down the road.

Whatever the case may be, the answer will probably be to join the table to itself to find overlapping entries. How you formulate that join will depend on what you need to happen in the scenarios above.
Comment

Cron job to remove logically redundant entries in Postgres SQL

Cron job to remove logically redundant entries in Postgres SQL

Comment