LinuxQuestions.org


kcorkran 01-07-2008 12:55 PM

using sed to parse dir output
 
Hello Linux Professionals:

I am trying to parse the output of a Windows dir command so that it looks like the 'After' example below. I just want to remove the extra stuff, including the per-directory headers from a recursive listing.

Before:
Volume in drive \\Scandocs_vs\scandocs is SCANDOCS
Volume Serial Number is C0A8-579C

Directory of \\Scandocs_vs\scandocs\archives_webfiles\arcmaps\pdfs

03/04/2004 12:39p <DIR> .
03/04/2004 12:39p <DIR> ..
03/19/2004 01:15p 24,364,073 10315.pdf

After:
03/19/2004 01:15p 24,364,073 10315.pdf

Any help appreciated!
Keith

cjcox 01-07-2008 01:09 PM

Remove the first 7 lines (?).

dir | sed '1,7d'

Just a stab in the dark...

kcorkran 01-07-2008 02:42 PM

I think that would work if I did not have to dir recursively. [dir /s]
I was thinking that if I could remove all lines that did not match '.pdf' in the string it would work.
-Keith

********************
03/19/2004 01:15p 24,364,073 10315.pdf (keep)

Directory of \\Scandocs_vs\scandocs\archives_webfiles\arcmaps\pdfs (discard)
********************
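
The "drop every line that does not contain .pdf" idea can also be sketched with sed itself, as in the thread title. This is only a sketch: the sample lines are copied from the listing above, the temporary file stands in for the saved dir output, and a file saved on Windows may carry CR line endings, which is why the pattern is not anchored with $.

```shell
# Recreate a few lines of the dir listing in a scratch file (illustrative only).
sample=$(mktemp)
printf '%s\n' \
  ' Volume in drive \\Scandocs_vs\scandocs is SCANDOCS' \
  ' Directory of \\Scandocs_vs\scandocs\archives_webfiles\arcmaps\pdfs' \
  '03/04/2004 12:39p <DIR> .' \
  '03/04/2004 12:39p <DIR> ..' \
  '03/19/2004 01:15p 24,364,073 10315.pdf' > "$sample"

# -n suppresses automatic printing; p prints only lines containing a literal ".pdf".
# The equivalent "delete everything else" form is: sed '/\.pdf/!d'
sed -n '/\.pdf/p' "$sample"
# prints: 03/19/2004 01:15p 24,364,073 10315.pdf
```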

Poetics 01-07-2008 02:44 PM

Why don't you just use grep? You can search for ".pdf" and only include those lines that have a .pdf file on them (if so named). There are a variety of ways to do this, all equally valid, but I leave their discovery as an exercise for the reader.

kcorkran 01-08-2008 09:10 AM

grep did it.
I was using it like this: (cygwin by the way)
c:> grep -i '.pdf' dir_pdfs
But the result was not returning what I expected (directories were still listed) so I was luckily able to modify the statement to:
c:> grep -i '[0-9].pdf' dir_pdfs
and it returned the results I wanted. So problem solved!

As a matter of curiosity, do you know why it did not seem to treat the '.' in the first example as a literal dot?
Thanks,
Keith

colucix 01-08-2008 01:35 PM

The dot '.' has a special meaning in a regular expression: it matches any single character, not just the dot itself. When you use a dot or any other special character in the pattern, grep interprets it as a regular expression and you can get an unexpected result.

On the other hand, to match a dot literally you can enclose it in square brackets, e.g.
Code:

grep '[.]pdf' dir_pdfs
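
To see why the directory lines slipped through, here is a small sketch using two lines from the listing in the first post (the scratch file is just an illustration). In the "Directory of" line, the unescaped dot matches the backslash in front of "pdfs".

```shell
# Two lines from the dir output: a directory header and a real PDF entry.
sample=$(mktemp)
printf '%s\n' \
  ' Directory of \\Scandocs_vs\scandocs\archives_webfiles\arcmaps\pdfs' \
  '03/19/2004 01:15p 24,364,073 10315.pdf' > "$sample"

grep -c '.pdf' "$sample"     # 2: "." matches any character, including the "\" in "\pdfs"
grep -c '[.]pdf' "$sample"   # 1: bracket expression, literal dot only
grep -c '\.pdf' "$sample"    # 1: backslash escape, literal dot only
grep -cF '.pdf' "$sample"    # 1: -F treats the whole pattern as a fixed string
```

Single quotes around the pattern keep the shell from touching the brackets or the backslash before grep sees them.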

kcorkran 01-08-2008 11:32 PM

That explains it. Thanks much!

pixellany 01-08-2008 11:45 PM

Quote:

Originally Posted by colucix (Post 3015818)
The dot '.' has a special meaning in a regular expression: it matches any single character, not just the dot itself. When you use a dot or any other special character in the pattern, grep interprets it as a regular expression and you can get an unexpected result.

On the other hand, to match a dot literally you can enclose it in square brackets, e.g.
Code:

grep '[.]pdf' dir_pdfs

Every day I learn that you learn something new every day!!

I had learned that the "normal" way to change the meaning of certain characters was the "escape"---as in:
grep "\." filename
The square bracket I never saw before---does it also work in SED?

Yes...

ghostdog74 01-09-2008 12:09 AM

Quote:

Originally Posted by pixellany (Post 3016250)
The square bracket I never saw before---does it also work in SED?

Yes...

It's used in regexps to specify a range or a set of single characters, e.g. [a-z], [abc].
From wiki
Quote:

[ ] A bracket expression. Matches a single character that is contained within the brackets. For example, [abc] matches "a", "b", or "c". [a-z] specifies a range which matches any lowercase letter from "a" to "z". These forms can be mixed: [abcx-z] matches "a", "b", "c", "x", "y", and "z", as does [a-cx-z].

The - character is treated as a literal character if it is the last or the first character within the brackets, or if it is escaped with a backslash: [abc-], [-abc], or [a\-bc].
If the sed you are using supports this syntax, then yes, it can be used in sed.
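
A quick sketch of the same trick in sed, using a substitution so the difference is visible. The file name is the one from the first post; the replacement character is arbitrary.

```shell
name='10315.pdf'

# Unescaped dot: matches the first character of the line, whatever it is.
printf '%s\n' "$name" | sed 's/./_/'     # _0315.pdf

# Dot inside a bracket expression: matches only a literal dot.
printf '%s\n' "$name" | sed 's/[.]/_/'   # 10315_pdf

# Backslash-escaped dot: same result as the bracket form.
printf '%s\n' "$name" | sed 's/\./_/'    # 10315_pdf
```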

pixellany 01-09-2008 06:58 AM

Light goes on....
I knew bracket expressions, but had never considered that a "special" character would cease to be special inside one. The books typically don't talk about the use of brackets in lieu of escaping---but it obviously works.

So, is there a way to pass in as a variable the string to go inside [ ]?

colucix 01-09-2008 07:49 AM

Quote:

Originally Posted by pixellany (Post 3016553)
The books typically don't talk about the use of brackets in lieu of escaping---but it obviously works.

Yes... not really used as an escape, but as a way to match single characters, as ghostdog reported. Anyway, very useful for "escaping" in some cases!
Quote:

So, is there a way to pass in as a variable the string to go inside [ ]?
I think this can be done in the usual way. For example, consider a text file with these two lines:
Code:

$ cat testfile
line with a dot . inside
line with a dot at the end.

You can do
Code:

$ my_var='.$'
$ grep "[$my_var]" testfile
line with a dot . inside
line with a dot at the end.

whereas if you want to retain the special meaning of $ you have to add it outside the brackets.
Code:

$ grep "[$my_var]$" testfile
line with a dot at the end.
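
It may help to look at the pattern grep actually receives. The sketch below recreates the two-line test file in a scratch location (an assumption for illustration) and builds the pattern in a variable first. The shell expands $my_var before grep runs, and the double quotes keep the shell from also treating the brackets as a filename glob.

```shell
# Recreate the two-line test file from the post above.
testfile=$(mktemp)
printf '%s\n' 'line with a dot . inside' 'line with a dot at the end.' > "$testfile"

my_var='.$'               # single quotes: the shell stores the two characters . and $
pattern="[$my_var]"       # double quotes: $my_var is expanded, brackets are kept as-is
printf '%s\n' "$pattern"  # [.$]  <- this is what grep is given

grep -c "$pattern" "$testfile"        # 2: both lines contain a dot
grep -c "${pattern}\$" "$testfile"    # 1: [.$]$ needs the dot (or a $) at end of line
```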

Cheers! :)

pixellany 01-10-2008 07:37 AM

OK---special meaning as you use it means "at the end of the line". But inside the [ ], the "$" clearly still has its more general special meaning to the shell, i.e. "the value of", so you would have to use [\$] to look for a literal "$".

What other characters are special by default inside of [ ]? E.g. "r[^ab]" means "r, followed by a character that is neither a nor b".

colucix 01-10-2008 01:17 PM

Correct, except when you put $ at the end of the character list: if it is not followed by any other character, the shell has nothing to expand as a variable and leaves the $ alone. How many nuances the shell has!!
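
On the question of what stays special inside [ ]: as far as grep itself is concerned (with the pattern in single quotes so the shell is out of the picture), POSIX bracket expressions give a special role only to ^ in first position, ] unless it comes first, and - between two characters. A small sketch with made-up sample lines:

```shell
sample=$(mktemp)
printf '%s\n' 'ra' 'rb' 'rc' 'r' 'a-b' 'a]b' 'a^b' > "$sample"

grep 'r[^ab]' "$sample"   # rc   (^ first negates; the bare "r" line has nothing after r)
grep '[x^]' "$sample"     # a^b  (^ is literal when it is not first)
grep '[]]' "$sample"      # a]b  (] is literal when it comes first)
grep '[x-]' "$sample"     # a-b  (- is literal when it is last)
```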

