Old 04-27-2017, 03:39 AM   #1
sunil21oct
LQ Newbie
 
Registered: Apr 2017
Posts: 4

Rep: Reputation: Disabled
Execution of the bash script is too slow


I have used the following sed commands to replace characters:

Code:
sed 's/"/\"/g' $CSVDIR/$tableName.csv > $CSVDIR/$tableName.tmp && mv $CSVDIR/$tableName.tmp $CSVDIR/$tableName.csv
sed 's/\~\^/"/g' $CSVDIR/$tableName.csv > $CSVDIR/$tableName.tmp && mv $CSVDIR/$tableName.tmp $CSVDIR/$tableName.csv
sed 's/,""/,"\\N"/g' $CSVDIR/$tableName.csv > $CSVDIR/$tableName.tmp && mv $CSVDIR/$tableName.tmp $CSVDIR/$tableName.csv
sed 's/"[ \t]*/"/g' $CSVDIR/$tableName.csv > $CSVDIR/$tableName.tmp && mv $CSVDIR/$tableName.tmp $CSVDIR/$tableName.csv
sed 's/,"N"/,"\\N"/g' $CSVDIR/$tableName.csv > $CSVDIR/$tableName.tmp && mv $CSVDIR/$tableName.tmp $CSVDIR/$tableName.csv

These run in my script, but they take a long time when the CSV file contains a bulk amount of data.

How can I optimize the execution speed of this script?
 
Old 04-27-2017, 04:03 AM   #2
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,307
Blog Entries: 3

Rep: Reputation: 3721
sed can do several actions in one pass, in several ways. Here are two of them:

Code:
sed -e '...; ...; ...; ...;' file.txt > newfile.txt
sed -e '...;' -e '...;' -e '...;' -e '...;' file.txt > newfile.txt
So, try combining your formulas so that the rename only has to happen once.
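
With the substitutions from post #1 that might look something like this (an untested sketch; note that the order matters, since the first rule escapes the quotes that are already present before the later rules introduce new ones):

Code:
sed 's/"/\\"/g;s/\~\^/"/g;s/,""/,"\\N"/g;s/"[ \t]*/"/g;s/,"N"/,"\\N"/g' "$CSVDIR/$tableName.csv" > "$CSVDIR/$tableName.tmp" && mv "$CSVDIR/$tableName.tmp" "$CSVDIR/$tableName.csv"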
 
Old 04-27-2017, 04:27 AM   #3
sunil21oct
LQ Newbie
 
Registered: Apr 2017
Posts: 4

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by Turbocapitalist View Post
sed can do several actions in one pass, in several ways. Here are two of them:

Code:
sed -e '...; ...; ...; ...;' file.txt > newfile.txt
sed -e '...;' -e '...;' -e '...;' -e '...;' file.txt > newfile.txt
So, try combining your formulas so that the rename only has to happen once.

I have combined these into a single line:

Code:
sed 's/"/\\"/g;s/\~\^/"/g;s/,""/,"\\N"/g;s/"[ \t]*/"/g;s/,"N"/,"\\N"/g' $CSVDIR/$tableName.csv > $CSVDIR/$tableName.tmp && mv $CSVDIR/$tableName.tmp $CSVDIR/$tableName.csv

But it is still taking time because the CSV has a bulk amount of data.
Is there any other way to do this operation faster?
 
Old 04-27-2017, 06:27 AM   #4
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3191
I am not sure that combining is going to help much if the data is so large that it is slowing the transaction down, but another alternative could be to put the commands in a file and call that.
You can also do away with the mv by using sed's -i switch (in testing I would also pass a backup name option):
Code:
$ cat changes
s/"/\\"/g
s/~\^/"/g
s/(,""|,"N")/,"\\N"/g
s/"[ \t]*/"/g
$ sed -r -i.bak -f changes "$CSVDIR/$tableName.csv"
Two things to note:
1. I changed some of your commands, as the previous ones were not doing what you expected (namely, you need to escape \ with \ to have a single \ appear, and a script file read with -f must contain bare sed commands with no shell quoting around them)

2. On completion of the above you will see no output, but you will now have two files: "$CSVDIR/$tableName.csv" will be the one with all the changes in it, and "$CSVDIR/$tableName.csv.bak" will be a backup of the original file (just in case something went wrong)
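
A quick way to spot-check the result against the backup afterwards (a sketch; diff and head are standard tools):

Code:
$ diff "$CSVDIR/$tableName.csv.bak" "$CSVDIR/$tableName.csv" | head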
 
Old 04-27-2017, 06:59 AM   #5
r3sistance
Senior Member
 
Registered: Mar 2004
Location: UK
Distribution: CentOS 6/7
Posts: 1,375

Rep: Reputation: 217
There seems to be something fundamentally wrong here. Why are you using sed to write a separate file called .tmp and then overwriting the original file, as opposed to doing in-place updates?

Code:
man sed
...
       -i[SUFFIX], --in-place[=SUFFIX]

              edit files in place (makes backup if SUFFIX supplied)
...
Not knowing your system, it is hard to make recommendations, but perhaps you could drop new records into a .new file. When you need to update, rotate the .new file out, process it with sed, and use >> to append the result to the end of the existing .csv.
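
A rough sketch of that rotation idea, reusing the substitutions from post #1 (the .new and .work file names are just assumptions about your layout):

Code:
# rotate the accumulated new records out of the way
mv "$CSVDIR/$tableName.new" "$CSVDIR/$tableName.work"
# clean only the new records, then append them to the existing CSV
sed 's/"/\\"/g;s/\~\^/"/g;s/,""/,"\\N"/g;s/"[ \t]*/"/g;s/,"N"/,"\\N"/g' "$CSVDIR/$tableName.work" >> "$CSVDIR/$tableName.csv"
rm "$CSVDIR/$tableName.work"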

Realistically, the best option would probably be to use an actual database system like MySQL or Postgres and capture/change the characters using a stored procedure or trigger, but that may be a far bigger change than wanted.
 
Old 04-27-2017, 07:29 AM   #6
sunil21oct
LQ Newbie
 
Registered: Apr 2017
Posts: 4

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by r3sistance View Post
There seems to be something fundamentally wrong here. Why are you using sed to write a separate file called .tmp and then overwriting the original file, as opposed to doing in-place updates?

Code:
man sed
...
       -i[SUFFIX], --in-place[=SUFFIX]

              edit files in place (makes backup if SUFFIX supplied)
...
Not knowing your system, it is hard to make recommendations, but perhaps you could drop new records into a .new file. When you need to update, rotate the .new file out, process it with sed, and use >> to append the result to the end of the existing .csv.

Realistically, the best option would probably be to use an actual database system like MySQL or Postgres and capture/change the characters using a stored procedure or trigger, but that may be a far bigger change than wanted.

This script runs on a Solaris system, and in Solaris sed the -i option does not work; that is why I have to store the output in a temp file.

Is there any alternate solution to make it faster?

Last edited by sunil21oct; 04-27-2017 at 07:30 AM.
 
Old 04-27-2017, 07:30 AM   #7
sunil21oct
LQ Newbie
 
Registered: Apr 2017
Posts: 4

Original Poster
Rep: Reputation: Disabled
Is there any alternate solution to make it faster?
 
Old 04-27-2017, 07:41 AM   #8
sundialsvcs
LQ Guru
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 10,659
Blog Entries: 4

Rep: Reputation: 3939
Quote:
Originally Posted by sunil21oct View Post
Is there any alternate solution to make it faster?
Sure! Use a real programming language!

I'm writing this off the top of my head, but in PHP, for instance, something like this sketch:

Code:
#!/usr/bin/php
<?php

# The preceding ("#!shebang") line tells the system that this "shell script"
#    is written in PHP. It should be the first line in the file, and the
#    PHP code itself must still sit inside a <?php tag.

# Path to the CSV file, passed as the first command-line argument.
$filename = $argv[1];

# Slurp the entire file into a string ...
$str = file_get_contents($filename);

$str = preg_replace('/\~\^/', "\"", $str);
# To emit a literal backslash, preg_replace needs a doubled backslash,
# which in a double-quoted PHP string is written as four backslashes.
$str = preg_replace('/,"N"/', ",\"\\\\N\"", $str);
# etc.

# Write it all out from the string.
file_put_contents($filename, $str);

rename( ... );
One "gotcha" to be aware of in some languages is that you might need to use double quotes to enclose the string if you want "interpolation" (of things like \n) to take place. Therefore you must "escape" any double-quote literals by preceding them with a backslash. (The LQ forum software apparently won't let me show you an example.)

Nevertheless: you are doing all of the string-twiddling in memory, then writing out the file content once. Right now, you are laboriously reading the entire file, over and over and over again, just to do one thing to it each time.

If the file is too large to read into memory all at once (fairly unlikely, these days ...) you can also process the file "line by line" (into a different file), but once again applying all of the string-manipulations to each line all-at-once.

- - - - -

Generally speaking: While "bash scripting" is a sort-of-okay thing to do now and then, IMHO it usually isn't the right way to do "real" work. That scripting tool was designed for "knock-off work," at best. And, through the #!shebang feature, Bash allows you to write your scripts in any language of your choice. In Linux, you have "an embarrassment of riches™" of languages to choose from.

Last edited by sundialsvcs; 04-27-2017 at 07:49 AM.
 
Old 04-27-2017, 08:28 AM   #9
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3191
Have to agree with sundialsvcs. Perl would have been my thought as it too is really good at processing large amounts of data.
 
Old 04-27-2017, 08:37 AM   #10
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,307
Blog Entries: 3

Rep: Reputation: 3721
Perl can be run as a formal script or as a one-liner. The -i option can do in-place editing. The -p option wraps a loop around the code you pass in with the -e option. See the manual pages for details.

Code:
man perlrun
man perlre
man perlfunc
So it could look like this:

Code:
perl -p -e 's/a/b/g; s/c/d/g; ...' file.txt > newfile.txt

perl -p -i.orig -e 's/a/b/g; s/c/d/g; ...' file.txt
However, perl also has proper modules for processing CSV and similar flat-files, such as Text::CSV_XS.
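
Applied to the substitutions from post #1, the one-liner form might look like this (an untested sketch; perl implements -i itself, so it works even where the system sed lacks that option):

Code:
perl -p -i.orig -e 's/"/\\"/g; s/~\^/"/g; s/,""/,"\\N"/g; s/"[ \t]*/"/g; s/,"N"/,"\\N"/g' "$CSVDIR/$tableName.csv"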
 
Old 04-27-2017, 08:37 AM   #11
Laserbeak
Member
 
Registered: Jan 2017
Location: Manhattan, NYC NY
Distribution: Mac OS X, iOS, Solaris
Posts: 508

Rep: Reputation: 143
Quote:
Originally Posted by sundialsvcs View Post
Sure! Use a real programming language!

I'm writing this off the top of my head, but in PHP for instance, something like this sketch:
Yep, but I'd suggest Perl over PHP. It was MADE for stuff like this (Perl = "Practical Extraction and Report Language").
 
Old 04-27-2017, 08:46 AM   #12
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,126

Rep: Reputation: 4120
I fail to see how simple substitution like this would be any faster in perl.
 
Old 04-27-2017, 08:50 AM   #13
r3sistance
Senior Member
 
Registered: Mar 2004
Location: UK
Distribution: CentOS 6/7
Posts: 1,375

Rep: Reputation: 217
The answer, as I think I advised above in different words, is that the data should be pre-processed before being added to the CSV. It could be done with any language, though; it just requires writing the data to a different location and then processing it, and a perl cronjob would be better than bash for this. While it could be done with bash, perl should be more consistent and cross-platform.
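
For instance, a hypothetical crontab entry that pre-processes incoming records every five minutes (the script path is an assumption):

Code:
# run the (hypothetical) pre-processing perl script every five minutes
*/5 * * * * /usr/local/bin/preprocess_csv.pl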
 
Old 04-27-2017, 09:19 AM   #14
Laserbeak
Member
 
Registered: Jan 2017
Location: Manhattan, NYC NY
Distribution: Mac OS X, iOS, Solaris
Posts: 508

Rep: Reputation: 143
Quote:
Originally Posted by syg00 View Post
I fail to see how simple substitution like this would be any faster in perl.
First, Perl is more efficient at regular expressions, and, probably more importantly, the script would run in one process: you wouldn't have to fork() for every call to sed. You also wouldn't need mv, since with Perl you can just save the altered data in a new directory and optionally delete the original with unlink().
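
One rough way to see the cost is to time both approaches on the same file (a sketch; five_seds.sh and big.csv are hypothetical stand-ins for the original five-command script and the data file):

Code:
time sh five_seds.sh                                            # five fork/exec cycles, five full reads
time perl -p -e 's/.../.../g; s/.../.../g' big.csv > /dev/null  # one process, one pass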

Last edited by Laserbeak; 04-27-2017 at 09:21 AM.
 
Old 04-27-2017, 09:42 AM   #15
wpeckham
LQ Guru
 
Registered: Apr 2010
Location: Continental USA
Distribution: Debian, Ubuntu, RedHat, DSL, Puppy, CentOS, Knoppix, Mint-DE, Sparky, VSIDO, tinycore, Q4OS,Manjaro
Posts: 5,620

Rep: Reputation: 2695
To extend upon the good advice above:
You are not using bash to do the substitutions.
There is one disk read for the script, then one for every sed command, and one for every reference to the file. This adds up to a lot of I/O. You are using bash to call sed MULTIPLE times to do the grunt work.

Engines like Perl do not need to make external calls to outside programs, so they load from disk only ONCE. If the data can all fit in memory, they also need only one massive read and one massive write for all of the file I/O. This reduces the total I/O delay greatly. While both sed and Perl are highly optimized for this, Perl is the more general and efficient choice for this particular case.

Even if you have to work on a block (or a line) at a time, the principle of doing one read, making all of the substitutions on the buffer, then writing that out and reading the next would speed things up.

Other general languages with similar characteristics abound, but the principle is to use each tool for what it is best at. This case is just not optimal for bash and sed together.
 
1 member found this post helpful.
  

