LinuxQuestions.org


Drigo 03-03-2010 10:53 AM

Insert a string for different files....complete a sentence only!
 
I have the following bash script:

#Add extension into the name
for FILE in $(ls $SOURCE*$1*);
do echo $FILE;
mv $FILE $2_"${FILE//$1}";
echo "to...." $2_"${FILE//$1}";
done

So this adds $2 at the beginning and removes $1 from each filename. Now my question is: how can I put $2 in the place where $1 was, or insert it somewhere in the middle of the filename?

Thanks

David the H. 03-03-2010 11:00 AM

I'm sorry, I don't follow your request at all. Can you give an example or two of the before and after of what you want?

Also, please enclose your code in [code][/code] tags, to preserve formatting.

Drigo 03-03-2010 11:04 AM

Before

123_oldtxt_32.xxx
124_oldtxt_33.xxx
125_oldtxt_33.xxx
126_oldtxt_34.xxx
127_oldtxt_33.xxx

to....


123_newname_32.xxx
124_newname_33.xxx
125_newname_33.xxx
126_newname_34.xxx
127_newname_33.xxx

instead of :


newname_123_32.xxx
newname_124_33.xxx
newname_125_33.xxx
newname_126_34.xxx
newname_127_33.xxx

which is what my code currently does. I don't want the new name inserted at the beginning of the string; I want it to take the place of the string I am replacing...

David the H. 03-03-2010 11:24 AM

Probably the most reliable way to do it is to use a sed regex to modify the filename and save it to a new variable. Then use that to do the renaming. Something like this.

Code:

NEWNAME=$(echo "$FILE" | sed -r "s/([0-9]+_)[^_]+(_*)/\1$2\2/")

Another option would be to extract substrings. This has the advantage of using only bash built-ins.
Code:

PRE=${FILE%%_*}
POST=${FILE##*_}
# Braces are needed: $PRE_ would be read as a variable named PRE_
NEWNAME=${PRE}_$2_$POST

Edit: D'oh! Actually, since it appears that $1 holds the old text and $2 the new text, it should be as easy as:
Code:

mv "$FILE" "${FILE/$1/$2}"
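
For anyone who wants to try that last form in a full loop, here is a minimal sketch. It is only an illustration, not the original script: OLD and NEW are made-up stand-ins for $1 and $2, and it works in a scratch directory created with mktemp so no real files are touched. It also lets the shell expand the glob directly rather than parsing the output of ls.

```shell
#!/bin/bash
# OLD and NEW are hypothetical stand-ins for the script's $1 and $2.
OLD=oldtxt
NEW=newname

# Work in a scratch directory so nothing real gets renamed.
WORKDIR=$(mktemp -d)
touch "$WORKDIR/123_oldtxt_32.xxx" "$WORKDIR/124_oldtxt_33.xxx"

# The glob is expanded by the shell itself; no ls needed.
for FILE in "$WORKDIR"/*"$OLD"*; do
    [ -e "$FILE" ] || continue          # the glob matched nothing
    NAME=${FILE##*/}                    # strip the directory part
    mv -- "$FILE" "$WORKDIR/${NAME/$OLD/$NEW}"
done

RESULT=$(ls "$WORKDIR")
echo "$RESULT"
rm -r "$WORKDIR"
```

Note that ${NAME/$OLD/$NEW} only replaces the first occurrence of the old text, and that the old text is treated as a shell pattern, so characters like * or ? in it would act as wildcards.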

Drigo 03-03-2010 11:54 AM

David,
That's great info... I would like to learn more about the command you showed me:

NEWNAME=$(echo $FILE | sed -r "s/([0-9]+_)[^_]+(_*)/\1$2\2/"

specifically sed -r "s/([0-9]+_)[^_]+(_*)/\1$2\2/
Can you elaborate on this?
[0-9]+_ ?
[^_] ?
\1$2\2/ ?

schneidz 03-03-2010 12:02 PM

assuming uniform filenames:
Code:

filename=123_oldtxt_32.xxx
echo $filename | awk -F _ '{print $1 "_newname_" $3}'


Drigo 03-03-2010 12:10 PM

More difficult:
(Real files)
This is what i have:
11420_07_001_004_20090604.nii_Corrected.nii.gz 11420_17_001_003_20090928.nii_Corrected.nii.gz
11420_07_001_004_20090604.nii_Corrected.nii.gz_mask.nii.gz 11420_17_001_003_20090928.nii_Corrected.nii.gz_mask.nii.gz
11420_08_001_003_20090806.nii_Corrected.nii.gz 11420_17_001_004_20090928.nii_Corrected.nii.gz
11420_08_001_003_20090806.nii_Corrected.nii.gz_mask.nii.gz 11420_17_001_004_20090928.nii_Corrected.nii.gz_mask.nii.gz
11420_08_001_004_20090806.nii_Corrected.nii.gz 11420_20_001_003_20100104.nii_Corrected.nii.gz
11420_08_001_004_20090806.nii_Corrected.nii.gz_mask.nii.gz 11420_20_001_003_20100104.nii_Corrected.nii.gz_mask.nii.gz
11420_09_001_003_20090813.nii_Corrected.nii.gz 11420_20_001_004_20100104.nii_Corrected.nii.gz
11420_09_001_003_20090813.nii_Corrected.nii.gz_mask.nii.gz 11420_20_001_004_20100104.nii_Corrected.nii.gz_mask.nii.gz


I would like to get rid of the part like 20090604.nii_Corrected.nii.gz, which is in the middle of each filename in all these files. The names following 2009 ARE NOT THE SAME.
Thanks in advance! This will help me a lot.

chrism01 03-03-2010 07:13 PM

You can extend the awk soln above, but note that the string you specify goes to the end of the filename in many cases, not just 'in the middle'. Are you sure that's what you want?

David the H. 03-04-2010 12:32 AM

The sed command uses a regular expression (regex), which if you don't know is an advanced pattern-matching language. I highly recommend learning at least the basics of it if you plan on doing much scripting. There are plenty of tutorials available on the net.

Here's a breakdown of the command I gave you.

sed -r "s/([0-9]+_)[^_]+(_*)/\1$2\2/"

The -r option turns on extended regular expressions, so that +, parentheses and {} work without backslashes. Note also that the expression is enclosed in double-quotes, since you need to be able to expand the shell variable in it.

([0-9]+_)

The brackets will match a range of characters, in this case "0-9". The plus sign means "match one or more of the previous character", letting it match any string of numbers. Then I added the underscore. Then I put the whole thing in parentheses to save it for the output. More on that later.

[^_]+

One difficulty with regex is that it's greedy. If you use a simple wildcard, it will keep going until it finds the last matching character available, so if your string has multiple underscores, it wouldn't stop until it hits the last one. So instead I use a negation. A ^ at the start of range brackets means "not", so in this case it says "match one or more characters that are not an underscore", which stops at the next underscore. We don't need to save this part, so no parentheses here.
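
The difference is easy to see on a name with more than two underscores. This is just a small sketch; the sample name and the X placeholder are made up for the demonstration.

```shell
#!/bin/bash
NAME=11420_07_001_004.nii

# Greedy: .* runs on to the LAST underscore, swallowing "07_001".
GREEDY=$(echo "$NAME" | sed -r 's/_.*_/_X_/')

# Negated class: [^_]+ cannot cross an underscore, so only "07" is matched.
BOUNDED=$(echo "$NAME" | sed -r 's/_[^_]+_/_X_/')

echo "$GREEDY"     # 11420_X_004.nii
echo "$BOUNDED"    # 11420_X_001_004.nii
```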

(_*) --fixed to--> (_.*)

Actually, this is a mistake on my part. There should be an additional period here.
When the expression reaches the next underscore, save it, and everything else following it, in a second parentheses. The period is a wildcard meaning "any character at all", and * is similar to +, but will match zero or more of the previous character. So in regex ".*" means match everything.

\1$2\2

The replacement function is easy. Each set of parentheses in the matching expression is referred to by a \n. So \1 outputs the contents of whatever the first parentheses matches, and \2 outputs the second parentheses. Then just put your variable in the middle, and you have your final string.
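
To see what each set of parentheses actually captures, it can help to print the groups on their own before doing the real substitution. A small sketch, using the corrected (_.*) form and a made-up NEW variable in place of $2:

```shell
#!/bin/bash
FILE=123_oldtxt_32.xxx
NEW=newname    # stands in for $2 in the original script

# Show the two saved groups; "oldtxt" is matched by [^_]+ but not saved.
PARTS=$(echo "$FILE" | sed -r 's/([0-9]+_)[^_]+(_.*)/first=[\1] second=[\2]/')
echo "$PARTS"      # first=[123_] second=[_32.xxx]

# The real substitution: group 1, the new text, group 2.
NEWNAME=$(echo "$FILE" | sed -r "s/([0-9]+_)[^_]+(_.*)/\1$NEW\2/")
echo "$NEWNAME"    # 123_newname_32.xxx
```

One caveat with the double-quoted version: if the new text contains a / or an &, sed will treat it as part of the command, so this is only safe for plain names.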


Now your real-life example might actually be a bit easier to do, since all you want to do is remove a fixed pattern from the string. Assuming the pattern is fixed, of course. That's the problem with pattern matching, if there's no common pattern, there's no easy way to set up a matching expression :).

Code:

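echo "$FILE" | sed -r "s/_[0-9]{8}\.nii_Corrected\.nii\.gz//"

({8} is a regex quantifier giving the exact number of times to match the previous character, 8 in this case. The periods are escaped with a backslash because an unescaped period matches any character.)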

Or using bash parameter substitution (which uses shell glob patterns, so there is no {8} repeat count and each digit needs its own range bracket).

Code:

echo "${FILE/_[1-2][0-9][0-9][0-9][0-1][0-9][0-3][0-9].nii_Corrected.nii.gz/}"

Notice how I customized the number ranges to match only the ones that appear in dates. You can add in $2 or whatever in the replacement side of the expression, of course, since removing the pattern from the shorter of the examples you gave leaves the files with no ending part.
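
Here is a quick sketch of both approaches run against two of the filenames posted above, one with the _mask part and one without. The variable names are made up for the example, and the bash version uses plain [0-9] for every digit to keep the line short.

```shell
#!/bin/bash
LONG=11420_07_001_004_20090604.nii_Corrected.nii.gz_mask.nii.gz
SHORT=11420_07_001_004_20090604.nii_Corrected.nii.gz

# sed with an extended regex; the periods are escaped so they match literally.
VIA_SED=$(echo "$LONG" | sed -r 's/_[0-9]{8}\.nii_Corrected\.nii\.gz//')

# Pure bash; in a glob pattern a period is already literal.
VIA_BASH=${LONG/_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9].nii_Corrected.nii.gz/}

# The shorter name loses its whole ending, as mentioned above.
SHORT_RESULT=${SHORT/_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9].nii_Corrected.nii.gz/}

echo "$VIA_SED"        # 11420_07_001_004_mask.nii.gz
echo "$VIA_BASH"       # 11420_07_001_004_mask.nii.gz
echo "$SHORT_RESULT"   # 11420_07_001_004
```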

schneidz 03-04-2010 08:50 AM

in general:
1. determine what the field delimiter is.
2. determine what fields you want to keep.
3. determine what new fields you want to add.
4. assemble the output from the previous 3 steps.

it seems like awk would be the best tool for the job.
good luck.
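
Applied to the first example in this thread, those four steps could look roughly like this. It is only a sketch assuming uniform names with exactly three underscore-separated fields; NEW is a made-up variable for the replacement text.

```shell
#!/bin/bash
FILE=123_oldtxt_32.xxx
NEW=newname    # hypothetical replacement for the second field

# 1. the delimiter is "_"           (-F _ for input, OFS for output)
# 2. keep fields 1 and 3
# 3. the new field goes in position 2
# 4. print reassembles the fields using OFS
NEWNAME=$(echo "$FILE" | awk -F _ -v new="$NEW" 'BEGIN { OFS = "_" } { $2 = new; print }')

echo "$NEWNAME"    # 123_newname_32.xxx
```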

Drigo 03-04-2010 10:54 AM

Thanks all for all your help....I'm starting to understand these. :)

Drigo 03-04-2010 11:34 AM

So I managed to do everything except adding a string in the middle. This is what I tried:

echo $FILE | sed -r "s/(.*$FLAGS)($FLAGS*.)/\1$2\2/" # where FLAGS is the string just before the point where I want to add text... I'm very confused :S :S Can I also use mv instead?

in other words:
Input

abc.xx
abc1.xx
abc2.xx
abc3.xx

$FLAGS = ab

Output:

abDRIGOc.xx
abDRIGOc1.xx
abDRIGOc2.xx
abDRIGOc3.xx

The usage instructions I've written for myself:

Command options:
Add string in the middle: -addmid <flag of files> <expression to add>
The script will then ask where to add the string... (this will be called FLAGS)

Drigo 03-04-2010 11:49 AM

Never mind... I used parameter substitution to add the string in the middle.
This is what i did:


elif [ "$1" = "-addmid" ]; then
# echo "in process addmid, script not finished"
# mv $FILE $FILE$3;
# echo $FILE | sed -r "s/($FLAGS)/\1$3/"
#echo $FILE | sed -r "s/(.*$FLAGS)($FLAGS*.)/\1$2\2/"
# PRE is everything before $3, PRO is everything after it
PRE=${FILE//$3*}
PRO=${FILE//*$3}
# echo $PRE
mv "$FILE" "$PRE$3$4$PRO"

I'm done and happy! (Thanks David!)
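
For reference, the same split-and-rejoin idea can be sketched as a small function using the % and # forms of parameter substitution. This is an illustration rather than the script above: insert_after and its argument names are made up, and it always splits at the first occurrence of the marker.

```shell
#!/bin/bash
# insert_after NAME MARKER TEXT  (hypothetical helper)
# Prints NAME with TEXT inserted right after the first MARKER.
insert_after() {
    local name=$1 marker=$2 text=$3
    case $name in
        *"$marker"*) ;;                         # marker present, carry on
        *) printf '%s\n' "$name"; return ;;     # no marker: leave name alone
    esac
    local pre=${name%%"$marker"*}               # everything before the marker
    local post=${name#*"$marker"}               # everything after the marker
    printf '%s\n' "$pre$marker$text$post"
}

OUT1=$(insert_after abc.xx ab DRIGO)
OUT2=$(insert_after abc1.xx ab DRIGO)
OUT3=$(insert_after xyz.xx ab DRIGO)

echo "$OUT1"    # abDRIGOc.xx
echo "$OUT2"    # abDRIGOc1.xx
echo "$OUT3"    # xyz.xx
```

Quoting the marker inside the expansion makes it match literally, so a marker containing * or ? will not act as a wildcard. The result can then be passed to mv, for example mv "$FILE" "$(insert_after "$FILE" ab DRIGO)".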


All times are GMT -5.