Old 09-23-2014, 07:48 AM   #1
jonnybinthemix
Member
 
Registered: May 2014
Location: Bristol, United Kingdom
Distribution: RHEL 5 & 6
Posts: 169

Rep: Reputation: Disabled
Understanding a sed command...


Hey Guys,

I have been working on some scripts for some time now, and I've had a lot of help from the forum, so many thanks for that.

I'm pretty much there now.. I didn't know bash scripting before I started so it's been a learning curve.. a very enjoyable learning curve at that.

I understand everything I have, as I've been careful to make sure I fully understand something before it ends up in my script, however I have one command that has slipped through the grid and I don't fully understand it. I wonder if someone has a minute to explain it for me?

The command is:

Code:
find . -name "MF_BAT_BB*$D*" -exec sh -c 'a=$(echo {} | sed -r "s/([^.]*)\$/\L\1/"); [ "$a" != "{}" ] && mv "{}" "$a" ' \;
So, I understand the find command of course, which is finding all files in the current directory with MF_BAT_BB in the filename along with today's date, which is stored in $D.

Then once we find the files, exec carries out a sed command... I am pretty sure it's renaming the files.. but I don't recall how and what it's renaming them as.

It's a little more difficult to dissect what it's doing as the script is quite large..

I would be eternally grateful if someone could walk me through the steps happening after the find command?

Thanks in advance,

Jon
 
Old 09-23-2014, 08:17 AM   #2
business_kid
LQ Guru
 
Registered: Jan 2006
Location: Ireland
Distribution: Slackware & Android
Posts: 7,912

Rep: Reputation: 775
The sed command is this
Code:
 sed -r "s/([^.]*)\$/\L\1/");
It has to stop there, because a semi-colon ends it. Sed is a stream editor - it pipes stuff through and, in this case, changes it on the way.

The straight form of that is:
Code:
sed 's/Expression 1/Expression2/'
and exp 1 is swapped for exp 2. In your case they look like Posix regexes, where the backslash preceding a character indicates the ordinary meaning for the next character.
exp1 = ([^.]*)\$
exp2 = \L\1

They look a little odd - the thing to watch is the full stop, which means basically anything, while \. means a full stop. I have not seen '1' escaped before, but my regexes are weak.

As for directions, stuff being found by the find is processed and sent to the next part.
 
Old 09-23-2014, 09:22 AM   #3
jonnybinthemix
Member
 
Registered: May 2014
Location: Bristol, United Kingdom
Distribution: RHEL 5 & 6
Posts: 169

Original Poster
Rep: Reputation: Disabled
So, as I currently understand it:

Code:
find . -name "MF_BAT_BB*$D*" -exec sh -c 'a=$(echo {} | sed -r "s/([^.]*)\$/\L\1/"); [ "$a" != "{}" ] && mv "{}" "$a" ' \;
find . -name "MF_BAT_BB*$D*" - Finds all files according to the name in the current location

-exec sh -c - Once it finds the file, execute something...

'a=$(echo {} - Execute this.. making $a nothing? (empty?)

sed -r "s/([^.]*)\$/\L\1/"); - Pipe the output to the SED command (not sure what it does).

[ "$a" != "{}" ] && mv "{}" "$a" ' \; - Then do another SED command which appears to ask if $a is not equal to {} (nothing?) and then move {} (nothing?) to $a.. ie, rename nothing with the value of $a.

That's my understanding of it from just looking at what it's doing with the knowledge I have.. but I am a little stuck in two areas. What is the first SED command doing? Renaming something? If so, how? And the second SED command appears to be actually carrying out the rename right? So maybe the first SED command is manipulating a filename and then the second is writing that filename using the mv command.

So, I think I've got a basic theoretical understanding of what it's doing.. but if possible I'd like to understand why it's doing what it's doing so that I could use variations of it elsewhere should I need to.

Thanks
Jon
 
Old 09-23-2014, 09:45 AM   #4
jonnybinthemix
Member
 
Registered: May 2014
Location: Bristol, United Kingdom
Distribution: RHEL 5 & 6
Posts: 169

Original Poster
Rep: Reputation: Disabled
The more I stare at it the more things click...

$a contains the result of echo {} | sed -r "s/([^.]*)\$/\L\1/") - still no clearer as to what the sed command does, but in turn the second command seems to be nothing to do with sed and is just a shell command.

Little closer to understanding, but still a long way off lol.
 
Old 09-23-2014, 10:49 AM   #5
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: CentOS
Posts: 3,399

Rep: Reputation: 1486
The sed command matches as many characters at the end of the line as it can without encountering a literal "." and changes them to lower case. The result of that substitution is assigned to variable a.

The find command will have replaced every instance of "{}" with the path that it found, so the test
Code:
[ "$a" != "{}" ]
checks whether the variable a is different from the original path (i.e., that sed actually changed something). If the test is true, the mv command renames the file to the changed name.

The purpose is to force the extension to lower case, e.g., rename "XyZZy.JPG" to "XyZZy.jpg".
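To see the substitution on its own, you can feed the sed part a sample path by hand. This is only a small sketch, assuming GNU sed (the \L escape is a GNU extension); the helper name lower_ext is made up for the demonstration and the path is just the example name from above.

```shell
# Hypothetical helper: lowercase everything after the last "." (GNU sed).
lower_ext() {
    printf '%s\n' "$1" | sed -r 's/([^.]*)$/\L\1/'
}

lower_ext ./XyZZy.JPG      # prints ./XyZZy.jpg
lower_ext ./XyZZy.jpg      # already lower case, printed unchanged
```

Because the second call prints the path unchanged, the [ "$a" != "{}" ] test would be false for that file and no mv would be attempted.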

There is a bug that changes the whole name to lower case if there is no "." in it. If such a file is in a subdirectory, that renaming attempt would extend to the directory name as well (and probably fail).
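That bug is easy to reproduce without touching any real files. A sketch, again assuming GNU sed; the helper name lower_tail and the README and SubDir names are invented for the example. The only "." in these paths is the leading one from find, so everything after it is lowercased.

```shell
# Same substitution as in the original command (GNU sed).
lower_tail() {
    printf '%s\n' "$1" | sed -r 's/([^.]*)$/\L\1/'
}

lower_tail ./README            # prints ./readme
lower_tail ./SubDir/README     # prints ./subdir/readme
```

In the second case the mv would be asked to move the file into ./subdir, which normally does not exist, hence the likely failure.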

Note: I have not tested any of the above -- it's just my reading of the command.
 
Old 09-24-2014, 02:58 AM   #6
jonnybinthemix
Member
 
Registered: May 2014
Location: Bristol, United Kingdom
Distribution: RHEL 5 & 6
Posts: 169

Original Poster
Rep: Reputation: Disabled
Wow that's awesome.. Great explanation and that makes perfect sense

One other question though, and I know this is probably going to be a minefield, so apologies and feel free to say "Go read a book" lol.. (And I intend to read)

But how does the sed command do that? I assume it's regular expressions but how would I construct that for something else if I needed to?
 
Old 09-24-2014, 04:36 AM   #7
jonnybinthemix
Member
 
Registered: May 2014
Location: Bristol, United Kingdom
Distribution: RHEL 5 & 6
Posts: 169

Original Poster
Rep: Reputation: Disabled
I have been reading all morning.. I did begin to remember quite a bit from the courses I've done about regular expressions which is good.. and I've written some notes as to my findings as I go.. I'm determined to fully understand this.. I don't like the idea of using a command/string in a script that I don't fully understand, so I'll be forever grateful if you could take a look at my notes and let me know if I'm on the right track.

Apologies if this seems a bit basic, but I'm still learning

Code:
find . -name "MF_BAT_BB*$D*" -exec sh -c 'a=$(echo {} | sed -r "s/([^.]*)\$/\L\1/"); [ "$a" != "{}" ] && mv "{}" "$a" ' \;

sed -r "s/([^.]*)\$/\L\1/");

"s/([^.]*)\$/\L\1/")
"s/( = The bracket is clearly sectioning off some expressions.. (I guess) - but the s/ is a mystery.. assuming it's something to do with SED?

[^.] = Exclude "." in the search? (^ inside brackets negates the expression) ^ outside []'s represents start of string? "." could mean any character? But not sure about within []'s

* = Exclude all "."'s? (* matches when the preceding character occurs 0 or more times)

\$ = $ is look only at the end of the string, but \ seems to be escape the character. This tells me it's searching for $ as a literal character but we know this is not true.

/ = Unable to find any information about /'s in regular expressions.

\L = Here I'm starting to think Regular Expressions is ending, and something new is starting.. because from what I'm reading \L should escape the L character (if it were a special character, which it doesn't look like it is). Knowing now what the command does, I guess it's something to do with lower case?

\1 = As above, not really sure.

")= This bit is confusing only because of the misplacement of the "'s and )'s.. why are they overlapping?
 
Old 09-24-2014, 09:27 AM   #8
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: CentOS
Posts: 3,399

Rep: Reputation: 1486
sed has a language all its own, and regular expressions are just a small part of it. The manpage for sed has a brief synopsis of the sed commands. If the info command is installed on your system, you can run "info sed" and get a fuller description, or you can go to http://www.gnu.org/software/sed/. While the language is typically used for filtering text, it is actually Turing-complete. Someone actually wrote a sed script that emulates the bc arbitrary precision calculator.

I'll answer some of your questions:

The "s/regexp/replacement/" is the "substitute" command in sed, as it is in vi and some other editors. The part of the line matched by the regexp is replaced by the replacement text.
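A minimal illustration of the substitute command, using a made-up input line. By default only the first match on each line is replaced; the g flag replaces all of them.

```shell
# s/regexp/replacement/ replaces the first match on each line.
printf '%s\n' "old name, old habits" | sed 's/old/new/'     # new name, old habits

# With the g flag every match on the line is replaced.
printf '%s\n' "old name, old habits" | sed 's/old/new/g'    # new name, new habits
```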

Since sed was given the "-r" option, it is using extended regular expressions. The parentheses within the regexp are marking a subexpression that is subsequently referenced by the "\1" back reference. (\n, where n is a single digit, refers to the n-th parenthesized subexpression of the regular expression.)
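A small sketch of back references on an invented file name. With -r the parentheses group on their own; without -r (basic regular expressions) they have to be written as \( and \).

```shell
# Extended syntax (-r): \1 is the text before the last ".", \2 the text after it.
printf '%s\n' "report.csv" | sed -r 's/(.*)\.(.*)/\2 file named \1/'

# The same substitution in basic syntax, with escaped parentheses.
printf '%s\n' "report.csv" | sed 's/\(.*\)\.\(.*\)/\2 file named \1/'
```

Both commands print "csv file named report".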

The "$" character is special to the shell, and needs to be escaped by a backslash in order to be passed literally to the sed command. You always need to be aware of how the shell handles various special characters. You can use "set -x" in the shell to see what is actually passed to each invoked command. (Use "set +x" to cancel that.)
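You can check what the shell does to the quoted string by printing it instead of handing it to sed. In the original command the double-quoted string sits inside the single-quoted sh -c script, so it is the inner sh that strips the backslash. A quick sketch:

```shell
# Inside double quotes the shell turns \$ into a plain $ but leaves
# \L and \1 alone, so this prints the text that sed would receive.
printf '%s\n' "s/([^.]*)\$/\L\1/"

# Inside single quotes the shell touches nothing, so the $ needs no backslash.
printf '%s\n' 's/([^.]*)$/\L\1/'
```

Both lines print s/([^.]*)$/\L\1/, which is the sed script as sed actually sees it.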

The "\L" in the replacement text is part of the sed language and causes the subsequent part of the replacement to be converted to lower case.
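A short sketch of \L on an invented input, assuming GNU sed (these case-conversion escapes are GNU extensions). \L stays in effect for the rest of the replacement unless it is switched off with \E.

```shell
# \L applies to everything that follows it in the replacement.
printf '%s\n' "HELLO WORLD" | sed -r 's/(HELLO) (WORLD)/\L\1 \2/'      # hello world

# \E ends the conversion, so the second group keeps its case.
printf '%s\n' "HELLO WORLD" | sed -r 's/(HELLO) (WORLD)/\L\1\E \2/'    # hello WORLD
```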

As for that final "), that parenthesis is not part of the sed command, it is the end of the "a=$( ... )" shell variable assignment.
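Putting the pieces together, here is a sketch of the same rename run in a scratch directory. One thing in it is a swapped-in technique and not part of the original command: the path is handed to sh as the positional parameter $1 instead of letting find paste {} into the script text, which tends to be more robust with unusual file names. It assumes GNU sed, and the file name and the value of D are made up for the demonstration.

```shell
# Work in a throwaway directory so nothing real gets renamed.
tmp=$(mktemp -d)
touch "$tmp/MF_BAT_BB_20140924.CSV"
D=20140924    # stand-in for the date variable used in the script

find "$tmp" -name "MF_BAT_BB*$D*" -exec sh -c '
    # $1 is the path that find matched; a is that path with a lower-case extension
    a=$(printf "%s\n" "$1" | sed -r "s/([^.]*)\$/\L\1/")
    # rename only if sed actually changed something
    [ "$a" != "$1" ] && mv "$1" "$a"
' sh {} \;

result=$(ls "$tmp")
printf '%s\n' "$result"    # lists the renamed file
rm -rf "$tmp"
```

The extra "sh" after the closing quote fills $0 of the inner shell, so the path from {} ends up in $1.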

Last edited by rknichols; 09-24-2014 at 09:35 AM. Reason: add "As for that final ..."
 
Old 09-24-2014, 09:54 AM   #9
jonnybinthemix
Member
 
Registered: May 2014
Location: Bristol, United Kingdom
Distribution: RHEL 5 & 6
Posts: 169

Original Poster
Rep: Reputation: Disabled
Ah ok, thanks for that.

It's starting to make a little more sense.. and gives me something to work on. You're a gentleman for spending the time to explain that to me, it actually all made sense too... so I credit your explanation over my ability to understand as it normally takes a while for things to sink in.

I'll take a look at the pages you've recommended and have a play around with the command to see if I can tweak it.

At the moment (I've tested it), it renames something like xxxxxx.CSV.PGP to xxxxxx.CSV.pgp. I'd actually like it to set the CSV to lower case too, which I imagine is possible by modifying the same command?

Thanks
Jon
 
Old 09-24-2014, 10:46 AM   #10
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: CentOS
Posts: 3,399

Rep: Reputation: 1486
Quote:
Originally Posted by jonnybinthemix
At the moment (I've tested it), it renames something like xxxxxx.CSV.PGP to xxxxxx.CSV.pgp. I'd actually like it to set the CSV to lower case too, which I imagine is possible by modifying the same command?
Just add to the regexp:
Code:
sed -r "s/([^.]*\.[^.]*)\$/\L\1/"
Now it matches any number of characters that are not a ".", followed by a literal ".", followed by any number of characters that are not a ".", all occurring at the end of the line. Actually, I'd change those asterisks to "+" signs to insist on matching at least one non-"." character in each place:
Code:
sed -r "s/([^.]+\.[^.]+)\$/\L\1/"
It probably makes no difference, but when doing things like that I like to make my matches as specific as I can.
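A quick check of the "+" version on the kind of name from the question. This is a sketch assuming GNU sed; the helper name lower_two is invented for the example.

```shell
# Hypothetical helper: lowercase the last two "."-separated parts (GNU sed).
lower_two() {
    printf '%s\n' "$1" | sed -r 's/([^.]+\.[^.]+)$/\L\1/'
}

lower_two ./xxxxxx.CSV.PGP    # prints ./xxxxxx.csv.pgp
```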

All of the above misbehave on files that do not have two extensions, so let's restrict the action to files that do have two extensions in the final path component:
Code:
sed -r "/[^/]+\.[^.]+\.[^.]+\$/s/([^.]+\.[^.]+)\$/\L\1/"
The address in front of the s command, /[^/]+\.[^.]+\.[^.]+\$/ (shown in red in the original post), selects only those lines that end with one or more characters that are not "/", followed by ".", followed by the two "."-separated extensions that we want to change.
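The effect of the address can be tried on two invented paths. This is a sketch assuming GNU sed, with one deliberate difference from the command above: the address is written with "," as its delimiter (\,regexp,) so that the "/" inside the bracket expression cannot be taken for the end of the address. That is a precaution added here, since some sed implementations are strict about it. The helper name lower_two_guarded is made up.

```shell
# Hypothetical helper: lowercase the last two extensions, but only when the
# final path component really has two of them (GNU sed).
lower_two_guarded() {
    printf '%s\n' "$1" |
        sed -r '\,[^/]+\.[^.]+\.[^.]+$,s/([^.]+\.[^.]+)$/\L\1/'
}

lower_two_guarded ./dir/Report.CSV.PGP   # prints ./dir/Report.csv.pgp
lower_two_guarded ./dir/Report.PGP       # only one extension, printed unchanged
```

Since the second path comes back unchanged, the comparison against the original path is false and no rename would happen for that file.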

Last edited by rknichols; 09-24-2014 at 11:22 AM. Reason: Add "All of the above misbehave ..."
 
Old 09-24-2014, 11:21 AM   #11
business_kid
LQ Guru
 
Registered: Jan 2006
Location: Ireland
Distribution: Slackware & Android
Posts: 7,912

Rep: Reputation: 775
It's always the same - by the time you have it working you're an expert, but it's like you crammed for an exam.
Usually, you then get knowledge bulimia: Learn it for the job, forget it afterward:-P.
 
Old 09-25-2014, 02:00 AM   #12
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.9, Centos 7.3
Posts: 17,356

Rep: Reputation: 2367
This is also a popular how-sed-works-by-example guide, if a bit old: http://www.grymoire.com/Unix/Sed.html
 
  

