Linux - Newbie: This Linux forum is for members that are new to Linux. Just starting out and have a question? If it is not in the man pages or the how-to's, this is the place!
04-27-2017, 03:39 AM | #1
LQ Newbie | Registered: Apr 2017 | Posts: 4
Execution of the bash script is too slow
I have used the following sed commands to replace characters
Code:
sed 's/"/\"/g' $CSVDIR/$tableName.csv > $CSVDIR/$tableName.tmp && mv $CSVDIR/$tableName.tmp $CSVDIR/$tableName.csv
sed 's/\~\^/"/g' $CSVDIR/$tableName.csv > $CSVDIR/$tableName.tmp && mv $CSVDIR/$tableName.tmp $CSVDIR/$tableName.csv
sed 's/,""/,"\\N"/g' $CSVDIR/$tableName.csv > $CSVDIR/$tableName.tmp && mv $CSVDIR/$tableName.tmp $CSVDIR/$tableName.csv
sed 's/"[ \t]*/"/g' $CSVDIR/$tableName.csv > $CSVDIR/$tableName.tmp && mv $CSVDIR/$tableName.tmp $CSVDIR/$tableName.csv
sed 's/,"N"/,"\\N"/g' $CSVDIR/$tableName.csv > $CSVDIR/$tableName.tmp && mv $CSVDIR/$tableName.tmp $CSVDIR/$tableName.csv
in my script, but it takes a long time when there is a bulk amount of data in the CSV file.
How can I optimize the execution speed of this script?
04-27-2017, 04:03 AM | #2
LQ Guru | Registered: Apr 2005 | Distribution: Linux Mint, Devuan, OpenBSD | Posts: 7,718
sed can do several actions in one pass, in several ways. Here are two of them:
Code:
sed -e '...; ...; ...; ...;' file.txt > newfile.txt
sed -e '...;' -e '...;' -e '...;' -e '...;' file.txt > newfile.txt
So, try combining your expressions so that the rename only has to happen once.
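For instance, the single-pass form applied to two of the substitutions from the question might look like this (the file name and sample contents here are invented purely for illustration):

```shell
# Hypothetical sample data; two transformations done in ONE sed pass.
printf '%s\n' 'a~^b,""c' > sample.csv

# Expressions separated by semicolons in one invocation:
#   1) turn ~^ into a double quote
#   2) turn ,"" into ,"\N"
sed 's/\~\^/"/g; s/,""/,"\\N"/g' sample.csv > sample.tmp && mv sample.tmp sample.csv

cat sample.csv   # a"b,"\N"c
```

The file is read and rewritten once instead of once per substitution, which is where the savings come from.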
04-27-2017, 04:27 AM | #3
LQ Newbie | Registered: Apr 2017 | Posts: 4 | Original Poster
Quote:
Originally Posted by Turbocapitalist
sed can do several actions in one pass, in several ways. Here are two of them:
Code:
sed -e '...; ...; ...; ...;' file.txt > newfile.txt
sed -e '...;' -e '...;' -e '...;' -e '...;' file.txt > newfile.txt
So, try combining your expressions so that the rename only has to happen once.
I have combined these into a single line:
Code:
sed 's/"/\\"/g;s/\~\^/"/g;s/,""/,"\\N"/g;s/"[ \t]*/"/g;s/,"N"/,"\\N"/g' $CSVDIR/$tableName.csv > $CSVDIR/$tableName.tmp && mv $CSVDIR/$tableName.tmp $CSVDIR/$tableName.csv
but it still takes time because the CSV holds a bulk amount of data.
Is there any other way to do this operation faster?
04-27-2017, 06:27 AM | #4
LQ Guru | Registered: Sep 2009 | Location: Perth | Distribution: Arch | Posts: 10,036
I am not sure that combining is going to help much if the data is so large that it is slowing things down, but another alternative could be to put the commands in a file and call that.
You can also do away with the mv by using sed's -i switch (in testing I would also pass a backup suffix):
Code:
$ cat changes
s/"/\\"/g
s/~\^/"/g
s/(,""|,"N")/,"\\N"/g
s/"[ \t]*/"/g
$ sed -r -i.bak -f changes "$CSVDIR/$tableName.csv"
Two things to note:
1. I changed some of your commands, as the previous ones were not doing what you expected (namely, you need to escape \ with \ to get a single \ in the output, and commands in a -f script file are not shell-quoted).
2. On completion of the above you will see no output, but you will now have two files: "$CSVDIR/$tableName.csv" will be the one with all the changes in it, and "$CSVDIR/$tableName.csv.bak" will be a backup of the original file (just in case something went wrong)
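A self-contained sketch of the script-file approach, with invented file names and sample data (GNU sed is assumed for -i, which matters later in this thread):

```shell
# Commands kept in a separate file; note they are NOT shell-quoted there.
cat > changes.sed <<'EOF'
s/~\^/"/g
s/,""/,"\\N"/g
EOF

printf '%s\n' 'x~^y,""z' > data.csv

# -i.bak edits in place and keeps data.csv.bak as a backup (GNU sed).
sed -i.bak -f changes.sed data.csv

cat data.csv       # x"y,"\N"z
cat data.csv.bak   # x~^y,""z
```

This gives one read, one write, and no explicit mv.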
04-27-2017, 06:59 AM | #5
Senior Member | Registered: Mar 2004 | Location: UK | Distribution: CentOS 6/7 | Posts: 1,375
There seems to be something fundamentally wrong here. Why are you using sed to write to a separate .tmp file and then overwriting the original file, as opposed to doing in-place updates?
Code:
man sed
...
-i[SUFFIX], --in-place[=SUFFIX]
edit files in place (makes backup if SUFFIX supplied)
...
Not knowing your system, it is hard to make recommendations, but perhaps you could drop new records into a .new file. When you need to update, rotate the .new file out, process it with sed, and use >> to append to the end of the existing .csv.
Realistically, the best option would probably be to use an actual database system like MySQL or PostgreSQL and capture/change the characters using a stored procedure or trigger, but that may be a far bigger change than wanted.
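A tiny sketch of the rotate-and-append idea (file names and contents invented for illustration): only the small batch of new records is cleaned, and the big CSV is never re-read.

```shell
printf '%s\n' 'old,"\N",row' > table.csv    # already-clean existing data
printf '%s\n' 'new~^row,""' > table.new     # raw incoming records

# Clean only the small .new file, then append it to the big CSV:
sed 's/\~\^/"/g; s/,""/,"\\N"/g' table.new >> table.csv
rm table.new

wc -l table.csv   # 2 table.csv
```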
04-27-2017, 07:29 AM | #6
LQ Newbie | Registered: Apr 2017 | Posts: 4 | Original Poster
Quote:
Originally Posted by r3sistance
There seems to be something fundamentally wrong here. Why are you using sed to write to a separate .tmp file and then overwriting the original file, as opposed to doing in-place updates?
Code:
man sed
...
-i[SUFFIX], --in-place[=SUFFIX]
edit files in place (makes backup if SUFFIX supplied)
...
Not knowing your system, it is hard to make recommendations, but perhaps you could drop new records into a .new file. When you need to update, rotate the .new file out, process it with sed, and use >> to append to the end of the existing .csv.
Realistically, the best option would probably be to use an actual database system like MySQL or PostgreSQL and capture/change the characters using a stored procedure or trigger, but that may be a far bigger change than wanted.
This script runs on a Solaris system, and on Solaris sed's -i is not available; that's why I have to store the output in a temp file.
Is there any alternate solution to make it faster?
Last edited by sunil21oct; 04-27-2017 at 07:30 AM.
04-27-2017, 07:30 AM | #7
LQ Newbie | Registered: Apr 2017 | Posts: 4 | Original Poster
Is there any alternate solution to make it faster?
04-27-2017, 07:41 AM | #8
LQ Guru | Registered: Feb 2004 | Location: SE Tennessee, USA | Distribution: Gentoo, LFS | Posts: 11,149
Quote:
Originally Posted by sunil21oct
Is there any alternate solution to make it faster?
Sure! Use a real programming language!
I'm writing this off the top of my head, but in PHP, for instance, it would look something like this sketch:
Code:
#!/usr/bin/php
# The preceding ("#!shebang") line tells Bash that this "shell script"
# is written in PHP.
# It should be the first line in the file.
# Slurp the entire file into a string ...
$str = file_get_contents($filename);
$str = preg_replace('/\~\^/', "\"", $str);
$str = preg_replace('/,"N"/', ",\"\\\\N\"", $str);  # yields ,"\N"
# etc.
# Write it all out from the string.
file_put_contents($filename, $str);
rename( ... );
One "gotcha" to be aware of in some languages is that you might need to use double quotes to enclose the string if you want "interpolation" (of things like \n) to take place. Therefore you must "escape" any double-quote literals by preceding them with a backslash. (The LQ forum software apparently won't let me show you an example.)
Nevertheless: you are doing all of the string-twiddling in memory, then writing out the file-content once. Right now, you are laboriously reading the entire file, over and over and over again, just to do one thing to it.
If the file is too large to read into memory all at once (fairly unlikely, these days ...) you can also process the file "line by line" (into a different file), but once again applying all of the string-manipulations to each line all-at-once.
- - - - -
Generally speaking: While "bash scripting" is a sort-of-okay thing to do now and then, IMHO it usually isn't the right way to do "real" work. That scripting tool was designed for "knock-off work," at best. And, through the #!shebang feature, Bash allows you to write your scripts in any language of your choice. In Linux, you have "an embarrassment of riches™" of languages to choose from.
Last edited by sundialsvcs; 04-27-2017 at 07:49 AM.
04-27-2017, 08:28 AM | #9
LQ Guru | Registered: Sep 2009 | Location: Perth | Distribution: Arch | Posts: 10,036
Have to agree with sundialsvcs. Perl would have been my thought as it too is really good at processing large amounts of data.
04-27-2017, 08:37 AM | #10
LQ Guru | Registered: Apr 2005 | Distribution: Linux Mint, Devuan, OpenBSD | Posts: 7,718
Perl can be run as a formal script or as a one-liner. The -i option can do in-place editing. The -p option wraps a loop around the code you put in with the -e option. See the manual page(s) for details.
Code:
man perlrun
man perlre
man perlfunc
So it could look like this:
Code:
perl -p -e 's/a/b/g; s/c/d/g; ...' file.txt > newfile.txt
perl -p -i.orig -e 's/a/b/g; s/c/d/g; ...' file.txt
However, perl also has proper modules for processing CSV and similar flat-files, such as Text::CSV_XS.
04-27-2017, 08:37 AM | #11
Member | Registered: Jan 2017 | Location: Manhattan, NYC NY | Distribution: Mac OS X, iOS, Solaris | Posts: 508
Quote:
Originally Posted by sundialsvcs
Sure! Use a real programming language!
I'm writing this off the top of my head, but in PHP for instance, something like this sketch:
Yep, but I'd suggest Perl over PHP. It was MADE for stuff like this (Perl = "Practical Extraction and Report Language").
04-27-2017, 08:46 AM | #12
LQ Veteran | Registered: Aug 2003 | Location: Australia | Distribution: Lots ... | Posts: 21,363
I fail to see how simple substitution like this would be any faster in perl.
04-27-2017, 08:50 AM | #13
Senior Member | Registered: Mar 2004 | Location: UK | Distribution: CentOS 6/7 | Posts: 1,375
The answer, as I think I advised above in different words, is that the data should be pre-processed prior to being added to the CSV. It could be done with any language, though; it just requires writing the data to a different location and then processing it. A Perl cron job would be better than bash for this: while it could be done with bash, Perl should be more consistent and cross-platform.
04-27-2017, 09:19 AM | #14
Member | Registered: Jan 2017 | Location: Manhattan, NYC NY | Distribution: Mac OS X, iOS, Solaris | Posts: 508
Quote:
Originally Posted by syg00
I fail to see how simple substitution like this would be any faster in perl.
First, Perl is more efficient at regular expressions and, probably more importantly, the script would run in one process. You wouldn't have to fork() for every call to sed. You wouldn't need mv, since with Perl you can just save the altered data in a new directory and optionally delete the original with unlink().
Last edited by Laserbeak; 04-27-2017 at 09:21 AM.
04-27-2017, 09:42 AM | #15
LQ Guru | Registered: Apr 2010 | Location: Continental USA | Distribution: Debian, Ubuntu, RedHat, DSL, Puppy, CentOS, Knoppix, Mint-DE, Sparky, VSIDO, tinycore, Q4OS, Manjaro | Posts: 6,131
To extend upon the good advice above:
You are not using bash to do the substitutions. You are using bash to call sed MULTIPLE times to do the grunt work.
There is one disk read for the script, then one for every sed command, and one for every reference to the file. This adds up to a lot of I/O.
Engines like Perl do not need to make external calls to outside programs, so they only load from disk ONCE. If the data can all fit in memory, they also need only one massive read and one massive write for all of the file I/O. This reduces the total I/O delay greatly. While both sed and Perl are highly optimized for this, Perl is the more general and efficient choice for this particular case.
Even if you have to work on a block (or line) at a time, the principle of doing one read, making all of the substitutions on the buffer, then writing that out and reading the next would speed things up.
Other general languages with similar characteristics abound, but the principle is to use each tool for what it is best at. This case is just not optimal for bash and sed together.
1 member found this post helpful.