I bcp'ed a table out to a text file. It has 8 GB of data, almost 48 million rows.
In each row I need to replace one 8-character text, delimited by |, with a 6-character text.
Once a line has been replaced, I don't need to recheck that line for further replacements.
I found the following command, which takes almost 4 minutes for each replacement,
so it will take me almost 10 hours to run the sed command in a loop or something.
I have 400 one-to-one mappings to replace.
One thing I was thinking of is to remove each line once it has been replaced and copy it to a different file, so that each pass will take less and less time.
But I am not able to figure out how to do that.
Any help is appreciated.
sed -i 's/JAM/BUTTER/g;s/BREAD/CRACKER/g;s/SCOOP/FORK/g;s/SPREAD/SPLAT/g' test.txt
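For what it's worth, one way to avoid scanning the file once per mapping is a single awk pass with a lookup table. This is only a sketch under assumptions the thread does not confirm: the mappings live in a hypothetical file with one OLD|NEW pair per line, and the 8-character code sits in a known |-delimited column. The function name replace_field is made up for illustration.

```shell
# replace_field MAPFILE DATAFILE COL   (hypothetical helper)
# Rewrites column COL of a |-delimited file in one pass, using OLD|NEW
# pairs read from MAPFILE. The result goes to stdout.
replace_field() {
    awk -F'|' -v OFS='|' -v col="$3" '
        NR == FNR { map[$1] = $2; next }     # first file: load the mapping table
        ($col in map) { $col = map[$col] }   # data file: swap the field if it is mapped
        { print }                            # unmapped rows pass through unchanged
    ' "$1" "$2"
}
```

Usage would look like `replace_field map.txt test.txt 3 > test.new`. Each row is looked up once in the table, so the cost should not grow with the number of mappings the way repeated sed passes do. The NR == FNR idiom assumes the mapping file is not empty.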
You might be able to do this more easily in LibreOffice Base. Databases in applications like Base can have tables, and you can perform operations like what you are describing on those tables. You might be able to start with your original table, then import it into a new database, then work with it in the program interface. I have never used Base, but I have used Micro$oft Access, and those two programs are similar...not the same, but similar. When I was using Access for work I remember how once I got used to using the program I was able to do all kinds of stuff with it.
My database is in Sybase. Even with all indexes in place, each update still takes 15 minutes, which is far more than sed (3 minutes),
so the database is not giving the good performance one would assume for this.
bcp out takes 15 minutes and bcp in will take another 1.5 hours; even so, I will save a lot of time with file processing.
If we can figure out how to eliminate the already-processed rows from the file, that would give very good timing.
It's not a duplicate per se, as I am just replacing 8 characters. If, after replacing, I can move that line to a different file, that would help.
But I am not able to figure out how to achieve that.
Last edited by fundoo.code; 09-22-2015 at 08:46 PM.
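The idea of moving each replaced line to a different file, so that later passes scan less data, could be sketched roughly like this with awk. The helper names shrink_pass and replace_all, the OLD|NEW mapping file layout, and the assumption that the code always appears as |OLD| (never as the first or last field) are all made up for illustration and not taken from the thread.

```shell
# shrink_pass OLD NEW INFILE DONEFILE RESTFILE   (hypothetical helper)
# Lines containing |OLD| get it replaced by |NEW| and are appended to DONEFILE;
# all other lines are written to RESTFILE for the next pass.
shrink_pass() {
    : > "$5"    # make sure RESTFILE exists even if every line matched
    awk -v old="|$1|" -v new="|$2|" -v donefile="$4" -v restfile="$5" '
        {
            i = index($0, old)               # plain substring search, no regex
            if (i > 0) {
                line = substr($0, 1, i - 1) new substr($0, i + length(old))
                print line >> donefile       # finished row, never scanned again
            } else {
                print > restfile             # still unprocessed
            }
        }
    ' "$3"
}

# replace_all MAPFILE DATAFILE DONEFILE   (hypothetical driver)
# Runs one shrinking pass per OLD|NEW pair in MAPFILE.
replace_all() {
    cp "$2" "$2.rest" && : > "$3"
    while IFS='|' read -r old new; do
        shrink_pass "$old" "$new" "$2.rest" "$3" "$2.tmp" && mv "$2.tmp" "$2.rest"
    done < "$1"
    cat "$2.rest" >> "$3"    # rows that matched no mapping at all
    rm -f "$2.rest"
}
```

Note that the rows come out grouped by mapping rather than in their original order, which may or may not matter for the bcp in. The initial cp is there only to leave the original file untouched.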
You can then parallel process the sed commands.
Of course, if you can arrange the data in a known order or in groups, you would then not have to match all the seds against all the files.
Alternatively, you could write a program in, e.g., Perl that pulls out a subset and does the substitution at the same time (Perl regexes are fast). Run multiple copies to parallelize the work.
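The split-and-parallel-sed suggestion might look something like the sketch below. It assumes the substitutions have been put into a sed script file (one s/OLD/NEW/ command per line) and that the chunk size is picked so the number of chunks roughly matches the available cores; parallel_sed is a made-up name, not an existing tool.

```shell
# parallel_sed SEDSCRIPT DATAFILE LINES_PER_CHUNK OUTFILE   (hypothetical helper)
# Splits DATAFILE into chunks, runs sed on every chunk in the background,
# then stitches the results back together in the original order.
parallel_sed() {
    dir=$(mktemp -d) || return 1
    split -l "$3" "$2" "$dir/chunk."          # creates chunk.aa, chunk.ab, ...
    for f in "$dir"/chunk.*; do
        sed -f "$1" "$f" > "$f.out" &         # one background sed per chunk
    done
    wait                                      # block until every sed has finished
    cat "$dir"/chunk.*.out > "$4"             # the glob sorts, so row order is kept
    rm -rf "$dir"
}
```

For example, `parallel_sed map.sed test.txt 6000000 test.new` would cut 48 million rows into 8 chunks. There is no limit on the number of background jobs here, so a small chunk size on a big file would start a lot of sed processes at once.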
That awk did not do what I was expecting. I had to kill it after 1 hour for 4 entries.
I am thinking of trying to split and merge the files to see how that performs.
If you guys have something similar in shell script, sed, or awk, please share. For some funny reason our production servers do not have Perl installed.
Might I suggest that you look into gsar? It is not installed by default, and MAY not be in your base repository, but it is worth looking up. Insanely fast, and it has never failed me!
In that case, yes, do the split and parallel process the seds.
Re Perl: if this is a one-off, then there are 2 options:
1. If you have workstations that can remotely access the DB directly, you could do it that way.
2. You could also download the file, then split and parallel process it in Perl (if you find Perl easier than sed - I would).
If this is going to be a regular requirement, consider writing a program in the locally approved language, e.g. C, that can run on the DB server.