Linux - Newbie: This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-tos, this is the place!
I know that this is truly a newbie question, but here goes...
We have created several files that insert large amounts of data into Oracle DB 12c tables. For one table, we are getting many error messages about duplicate rows during the load.
I need a method to identify these duplicate rows and delete them, so that any other errors are easier to spot.
I don’t know how you perform the insertion, but if you can list all rows, a pipeline consisting of the sort and uniq commands is the easiest solution. For example, to find the duplicates:
Code:
sort FILE-WITH-ROWS | uniq -d > FILE-WITH-DUPS
To delete the duplicates, you don’t even need uniq:
Code:
sort -u FILE-WITH-ROWS > FILE-WITHOUT-DUPS
As a side-effect, FILE-WITHOUT-DUPS contains the rows sorted alphanumerically. If that is not acceptable, heavier weapons are required.
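One of those heavier weapons, if the original row order must be preserved, is the classic awk idiom that prints each whole line only the first time it appears. This is a generic shell trick, not anything Oracle-specific; FILE-WITH-ROWS and FILE-WITHOUT-DUPS are the same placeholders as above:
Code:
awk '!seen[$0]++' FILE-WITH-ROWS > FILE-WITHOUT-DUPS
The seen array counts how many times each line has occurred; the pattern is true only on the first occurrence, so only that copy is printed and the input order is kept.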
If you're getting errors about duplicate rows, the duplicates won't be inserted into the database. BUT "duplicate" to the database means duplicate keys, so the rejected rows may contain new information that should be applied as an update but isn't.
Quote: Originally Posted by sjpoole12
I know that this is truly a newbie question, but here goes...
We have created several files that insert large amounts of data into Oracle DB 12c tables. For one table, we are getting many error messages about duplicate rows during the load.
berndbausch's answer will avoid the problem if your errors are about duplicate rows, i.e., where the entire row has been duplicated. Using those tips, you simply clean the data before you try to insert it.
Things get more complicated if you're inserting multiple values per row and only one value turns out to be duplicated in some rows. Example: you have two records, each consisting of a person's social security number and their name.
Code:
nnn-nn-nnnn James Smith
nnn-nn-nnnn Jim Smith
The SSN value/column would almost certainly be defined in the database as needing to be unique, and attempting to add the second record would result in an error. The previously mentioned checks at the shell level wouldn't catch these potential errors, because the rows as a whole differ. Likewise, if you merely chucked everything in each file record into a single column, you wouldn't catch them either. I seem to recall that Oracle would allow you to turn off constraint checking during a load, which, in the above example, would allow both rows to load even though you would wind up with duplicate values in a column that shouldn't have them. Database administrators tend to get ulcers when someone proposes this. See the sketch below for one way to check the key column on its own.
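That said, you can still catch duplicate key values at the shell level if you compare only the key column rather than the whole row. A minimal sketch, assuming whitespace-separated fields with the SSN in the first column (FILE-WITH-ROWS and DUP-KEYS are placeholders; adjust the field number and separator to match your actual files):
Code:
awk '{print $1}' FILE-WITH-ROWS | sort | uniq -d > DUP-KEYS
This lists every SSN that appears more than once, regardless of whether the rest of the row differs. Deciding which of the conflicting rows to keep is still a manual, or at least application-level, decision.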
Outstanding!
I took the two sort suggestions, and their results provided exactly what I needed.
I'm a contractor, and even though the database would have loaded correctly, having those errors in the log would not have looked good.
I will keep this information and these suggestions on my cheat sheet for future problems.
Thank You!
Sandy