Monday, April 26, 2010

Find and delete duplicate rows by Analytic Function

The intuitive way will be create a temp table, with Min(RowID) and Count(*)>1, then join it back to target table to do the delete.

You can get duplicate rows by Analytic SQL:
SELECT rid, deptno, job, rn
  FROM
  (SELECT /*x parallel(a) */
        ROWID rid, deptno, job,
        ROW_NUMBER () OVER (PARTITION BY deptno, job ORDER BY empno) rn
   FROM scott.emp a
  )
WHERE rn <> 1;

Get duplicate row count with Count(*) > 0.

SELECT /*x parallel(a,8) */
 MAX(ROWID) rid, deptno, job, COUNT(*)
FROM scott.emp a
GROUP BY deptno, job
HAVING COUNT(*) > 1;

To delete them:
DELETE FROM scott.emp
WHERE ROWID IN
 (
  SELECT rid
    FROM (SELECT /*x parallel(a) */
                 ROWID rid, deptno, job,
                 ROW_NUMBER () OVER (PARTITION BY deptno, job ORDER BY empno) rn
            FROM scott.emp a)
  WHERE rn <> 1
);

No comments: