using sed to parse dir output

kcorkran · 01-07-2008, 12:55 PM

Hello Linux Professionals:

I am trying to parse the output of a Windows dir command so that it looks like the 'After' example below. I just want to remove the extra lines, including the directory headers that a recursive listing adds.

Before:
Volume in drive \\Scandocs_vs\scandocs is SCANDOCS
Volume Serial Number is C0A8-579C

Directory of \\Scandocs_vs\scandocs\archives_webfiles\arcmaps\pdfs

03/04/2004 12:39p <DIR> .
03/04/2004 12:39p <DIR> ..
03/19/2004 01:15p 24,364,073 10315.pdf

After:
03/19/2004 01:15p 24,364,073 10315.pdf

Any help appreciated!
Keith

cjcox · 01-07-2008, 01:09 PM

Remove the first 7 lines (?).

dir | sed '1,7d'

Just a stab in the dark...

kcorkran · 01-07-2008, 02:42 PM

I think that would work if I did not have to dir recursively. [dir /s]
I was thinking that if I could remove all lines that did not match '.pdf' in the string it would work.
-Keith

********************
03/19/2004 01:15p 24,364,073 10315.pdf (keep)

Directory of \\Scandocs_vs\scandocs\archives_webfiles\arcmaps\pdfs (discard)
********************
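Since the thread title asks about sed, here is a minimal sketch of the "remove every line that does not match .pdf" idea. The sample text is copied from the listing above and the variable names are made up for illustration. It assumes lowercase extensions and Unix line endings; a listing saved on Windows may carry carriage returns, which this sketch does not handle.

```shell
# Sample "dir" output, copied from the post above.
listing='Volume in drive \\Scandocs_vs\scandocs is SCANDOCS
Volume Serial Number is C0A8-579C

Directory of \\Scandocs_vs\scandocs\archives_webfiles\arcmaps\pdfs

03/04/2004 12:39p <DIR> .
03/04/2004 12:39p <DIR> ..
03/19/2004 01:15p 24,364,073 10315.pdf'

# /\.pdf/!d deletes every line that does NOT contain a literal ".pdf".
# The backslash makes the dot literal; a bare dot would match any character.
pdf_lines=$(printf '%s\n' "$listing" | sed '/\.pdf/!d')

printf '%s\n' "$pdf_lines"
```

With this sample, only the 10315.pdf line should survive.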

Poetics · 01-07-2008, 02:44 PM

Why don't you just use grep? You can search for ".pdf" and only include those lines that have a .pdf file on them (if so named). There are a variety of ways to do this, all equally valid, but I leave their discovery as an exercise for the reader.

kcorkran · 01-08-2008, 09:10 AM

grep did it.
I was using it like this: (cygwin by the way)
c:> grep -i '.pdf' dir_pdfs
But the result was not returning what I expected (directories were still listed) so I was luckily able to modify the statement to:
c:> grep -i '[0-9].pdf' dir_pdfs
and it returned the results I wanted. So problem solved!

As a matter of curiosity, do you know why it did not seem to apply the '.' in the first pattern?
Thanks,
Keith

colucix · 01-08-2008, 01:35 PM

The dot '.' has a special meaning in a regular expression: it matches any single character, not just the dot itself. grep interprets the pattern as a regular expression, so an unescaped dot or other special character can give an unexpected result.

On the other hand, to match a dot literally you can enclose it in square brackets (quoting the pattern so the shell does not treat the brackets as a glob), e.g.

Code:

grep '[.]pdf' dir_pdfs
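A small sketch of the difference, using two lines from the listing earlier in the thread. The variable names are illustrative only. `grep -c` prints the number of matching lines, and `-F` is the standard option for fixed-string matching.

```shell
# Two sample lines from the thread: a directory header and a file entry.
sample='Directory of \\Scandocs_vs\scandocs\archives_webfiles\arcmaps\pdfs
03/19/2004 01:15p 24,364,073 10315.pdf'

# Unescaped dot: matches ANY character before "pdf", including the
# backslash in "\pdfs", so the directory line is counted too.
any_char=$(printf '%s\n' "$sample" | grep -c '.pdf')

# Bracket expression and backslash escape: both match only a literal dot.
bracketed=$(printf '%s\n' "$sample" | grep -c '[.]pdf')
escaped=$(printf '%s\n' "$sample" | grep -c '\.pdf')

# -F treats the whole pattern as a fixed string, so no escaping is needed.
fixed=$(printf '%s\n' "$sample" | grep -cF '.pdf')

echo "any_char=$any_char bracketed=$bracketed escaped=$escaped fixed=$fixed"
```

The first count should be 2 and the other three 1, which is why the directory lines showed up with the plain '.pdf' pattern.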

kcorkran · 01-08-2008, 11:32 PM

That explains it. Thanks much!

pixellany · 01-08-2008, 11:45 PM

Quote:

Originally Posted by colucix

The dot '.' has a special meaning in a regular expression: it matches any single character, not just the dot itself. grep interprets the pattern as a regular expression, so an unescaped dot or other special character can give an unexpected result.

On the other hand, to match a dot literally you can enclose it in square brackets (quoting the pattern so the shell does not treat the brackets as a glob), e.g.

Code:

grep '[.]pdf' dir_pdfs

Every day I learn that you learn something new every day!!

I had learned that the "normal" way to change the meaning of certain characters was the "escape"---as in:
grep "\." filename
The square bracket I never saw before---does it also work in SED?

Yes...

ghostdog74 · 01-09-2008, 12:09 AM

Quote:

Originally Posted by pixellany

The square bracket I never saw before---does it also work in SED?

Yes...

It's used in regular expressions to specify a range or a set of single characters, e.g. [a-z], [abc].
From wiki:

Quote:

[ ] A bracket expression. Matches a single character that is contained within the brackets. For example, [abc] matches "a", "b", or "c". [a-z] specifies a range which matches any lowercase letter from "a" to "z". These forms can be mixed: [abcx-z] matches "a", "b", "c", "x", "y", and "z", as does [a-cx-z].

The - character is treated as a literal character if it is the last or the first character within the brackets, or if it is escaped with a backslash: [abc-], [-abc], or [a\-bc].

If the sed you are using supports this syntax, then yes, it can be used in sed.
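For what it's worth, bracket expressions are part of the basic regular expressions that sed uses, so the same trick should carry over. A minimal sketch, with made-up file names as input:

```shell
# Hypothetical sample file names, one per line.
names='report.pdf
reportXpdf
archive.pdf.bak'

# [.] inside the sed address matches only a literal dot, so "reportXpdf"
# is not selected; $ anchors the match to the end of the line.
ends_in_pdf=$(printf '%s\n' "$names" | sed -n '/[.]pdf$/p')

# The same bracket expression works in the pattern half of s///.
# In the replacement half a dot is already literal and needs no brackets.
renamed=$(printf '%s\n' "$names" | sed 's/[.]pdf$/.PDF/')

printf '%s\n' "$ends_in_pdf"
printf '%s\n' "$renamed"
```

Only report.pdf ends in a literal ".pdf", so it is the only line selected and the only one renamed.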

pixellany · 01-09-2008, 06:58 AM

Light goes on....
I knew bracket expressions, but had never considered that a "special" character would cease to be special inside one. The books typically don't talk about the use of brackets in lieu of escaping---but it obviously works.

So, is there a way to pass in as a variable the string to go inside [ ]?

colucix · 01-09-2008, 07:49 AM

Quote:

Originally Posted by pixellany

The books typically don't talk about the use of brackets in lieu of escaping---but it obviously works.

Yes... not really used as an escape, but as a way to match single characters, as ghostdog reported. Anyway, very useful for "escaping" in some cases!

Quote:

So, is there a way to pass in as a variable the string to go inside [ ]?

I think this can be done in the usual way, with ordinary shell variable expansion. For example, consider a text file with these two lines

Code:

$ cat testfile
line with a dot . inside
line with a dot at the end.

You can do

Code:

$ my_var='.$'
$ grep "[$my_var]" testfile
line with a dot . inside
line with a dot at the end.

whereas if you want to retain the special meaning of $ you have to add it outside the brackets.

Code:

$ grep "[$my_var]$" testfile
line with a dot at the end.

Cheers!
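The same idea can be wrapped in a small function. This is only a sketch: `count_lines_with_any` is a hypothetical helper name, the sample lines are partly invented, and it assumes the variable holds no "]", no leading "^" and no "-" between two other characters, since those are special inside brackets.

```shell
# Hypothetical helper: count the lines on stdin that contain at least one of
# the characters in $1. Double quotes let the shell expand the variable
# while stopping it from treating [ ] as a filename glob.
count_lines_with_any() {
    chars=$1
    grep -c "[$chars]"
}

sample='line with a dot . inside
line with a dot at the end.
price in dollars: 5$
nothing special here'

# '.$' in single quotes reaches the function unchanged, giving grep '[.$]'.
dot_or_dollar=$(printf '%s\n' "$sample" | count_lines_with_any '.$')
dollar_only=$(printf '%s\n' "$sample" | count_lines_with_any '$')

echo "dot_or_dollar=$dot_or_dollar dollar_only=$dollar_only"
```

Three of the four sample lines contain a dot or a dollar sign, and one contains a dollar sign.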

pixellany · 01-10-2008, 07:37 AM

OK---special meaning as you use it means "at the end of the line". But inside the [ ], the "$" clearly has its more general shell meaning, i.e. "the value of", so you would have to use [\$] to look for a literal "$".

What other characters are special by default inside of [ ]? E.g. "r[^ab]" means "r followed by any one character other than a or b".

colucix · 01-10-2008, 01:17 PM

Correct. The exception is a $ at the end of the character list: if it is not followed by anything that could be a variable name, the shell has nothing to expand and leaves it as a literal $. How many nuances the shell has!!
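On the open question of what else is special inside [ ]: in POSIX grep and sed a leading ^ negates the set, a - between two characters forms a range, and ] closes the set unless it comes first. A backslash inside the brackets is an ordinary character there, so single-quoted '[\$]' would match a backslash as well as a dollar sign. A rough sketch with invented sample lines; the double-quoted case relies on how bash and other common shells treat a $ that is not followed by a name.

```shell
sample='r1
ra
rb
a-b
x^y
cost$'

# A leading ^ negates the set: "r" followed by one character that is
# neither "a" nor "b". Only "r1" qualifies.
negated=$(printf '%s\n' "$sample" | grep 'r[^ab]')

# A ^ that is not first in the set is an ordinary character.
caret=$(printf '%s\n' "$sample" | grep '[ab^]y')

# A - that is last (or first) in the set is an ordinary character.
hyphen=$(printf '%s\n' "$sample" | grep '[ab-]b')

# Inside single quotes the shell never expands $, so [$] reaches grep
# unchanged and matches a literal dollar sign.
dollar_single=$(printf '%s\n' "$sample" | grep '[$]')

# Inside double quotes, bash and other common shells leave a $ alone when
# it is not followed by a name, so grep sees the same [$] pattern.
dollar_double=$(printf '%s\n' "$sample" | grep "[$]")

printf '%s\n' "$negated" "$caret" "$hyphen" "$dollar_single" "$dollar_double"
```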