LinuxQuestions.org
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

Old 04-10-2019, 04:36 AM   #1
sjpoole12
LQ Newbie
 
Registered: Jan 2019
Posts: 8

Rep: Reputation: Disabled
Newbie - How To ...


I know that this is truly a newbie question, but here goes...

We have created several files that insert large amounts of data into Oracle Database 12c tables. For one table, we are getting many error messages during the load about duplicate rows.

I need a method to identify these duplicate rows and delete them before the load, in case there are other errors.

Is there a way to do this using a bash script?

Sandy
 
Old 04-10-2019, 05:00 AM   #2
berndbausch
LQ Addict
 
Registered: Nov 2013
Location: Tokyo
Distribution: Mostly Ubuntu and Centos
Posts: 6,316

Rep: Reputation: 2002
I don’t know how you perform the insertion, but if you can list all rows, a pipeline consisting of the sort and uniq commands is the easiest solution. For example, to find the duplicates:
Code:
sort FILE-WITH-ROWS | uniq -d >FILE-WITH-DUPS
To delete the duplicates, you don’t even need uniq:
Code:
sort -u FILE-WITH-ROWS > FILE-WITHOUT-DUPS
As a side-effect, FILE-WITHOUT-DUPS contains the rows sorted alphanumerically. If that is not acceptable, heavier weapons are required.
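One of those heavier weapons is a standard awk idiom (not mentioned in the thread, but widely used): print each line only the first time it is seen, which removes duplicates while preserving the original order. A minimal sketch, with a made-up file name:

```shell
# Sample input with duplicates out of order.
printf 'b\na\nb\nc\na\n' > rows.txt

# seen[$0]++ is 0 (false) the first time a line appears, non-zero afterwards,
# so only first occurrences are printed -- original order is kept.
awk '!seen[$0]++' rows.txt > rows-without-dups.txt

cat rows-without-dups.txt   # -> b, a, c (each line once, original order kept)
```

Unlike `sort -u`, this keeps the rows in the order they appeared in the input file.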

Last edited by berndbausch; 04-10-2019 at 05:04 AM.
 
1 members found this post helpful.
Old 04-10-2019, 10:57 AM   #3
scasey
LQ Veteran
 
Registered: Feb 2013
Location: Tucson, AZ, USA
Distribution: CentOS 7.9.2009
Posts: 5,727

Rep: Reputation: 2211
If you're getting errors about duplicate rows, the duplicates won't be inserted into the database. BUT "duplicate" to the database means duplicate keys, so the rejected rows may contain new information that should be updated but isn't.
 
Old 04-10-2019, 03:36 PM   #4
rnturn
Senior Member
 
Registered: Jan 2003
Location: Illinois (SW Chicago 'burbs)
Distribution: openSUSE, Raspbian, Slackware. Previous: MacOS, Red Hat, Coherent, Consensys SVR4.2, Tru64, Solaris
Posts: 2,803

Rep: Reputation: 550
Quote:
Originally Posted by sjpoole12 View Post
I know that this is truly a newbie question, but here goes...

We have created several files that insert large amounts of data into Oracle Database 12c tables. For one table, we are getting many error messages during the load about duplicate rows.
berndbausch's answer will avoid the problem if your errors are about duplicate rows, i.e., where the entire row has been duplicated. Using those tips, you merely clean the data before you try to insert it.

Things get more complicated if you're inserting multiple values and only one value turns out to be duplicated in some rows. Example: you have two records, each consisting of a person's Social Security number and their name.
Code:
nnn-nn-nnnn James Smith
nnn-nn-nnnn Jim Smith
The SSN value/column would almost certainly be defined in the database as needing to be unique, and attempting to add the second row would result in an error. The previously mentioned checks at the shell level wouldn't catch these potential errors, because the two rows differ as whole lines. Likewise, if you merely chucked everything in each file record into a single column, you wouldn't catch them either.

I seem to recall that Oracle lets you disable constraint checking during a load, which, in the above example, would allow both rows to load even though you would wind up with duplicate values in a column that shouldn't have them. Database administrators tend to get ulcers when someone proposes this.
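To catch that kind of duplicate before the load, you can apply the same sort/uniq or awk tricks to the key field alone instead of the whole line. A sketch, assuming whitespace-separated records with the SSN in the first field (the file name and sample values are made up):

```shell
# Sample data: same SSN, different names -- the second row would violate
# a unique constraint even though the full lines differ.
printf '123-45-6789 James Smith\n123-45-6789 Jim Smith\n987-65-4321 Ann Lee\n' > rows.txt

# List SSNs that occur more than once.
awk '{print $1}' rows.txt | sort | uniq -d

# Keep only the first row for each SSN, preserving input order.
awk '!seen[$1]++' rows.txt > rows-clean.txt
```

Which of the conflicting rows to keep (and what to do with the discarded one) is a business decision, not something the shell can answer for you.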
 
1 members found this post helpful.
Old 04-11-2019, 09:29 AM   #5
sjpoole12
LQ Newbie
 
Registered: Jan 2019
Posts: 8

Original Poster
Rep: Reputation: Disabled
Newbie - How To ...

Outstanding!
I tried the two sort suggestions and they provided exactly what I needed.
I'm a contractor, and even though the database would have loaded correctly, having those errors would not have looked good.
I will keep these suggestions on my cheat sheet for future problems.
Thank You!
Sandy
 