LinuxQuestions.org
Old 03-23-2018, 03:07 PM   #1
dedec0
Senior Member
 
Registered: May 2007
Posts: 1,372

Rep: Reputation: 51
regex (for grep): a dot inside a bracket expression, and a few more questions


Hello (:

Although I am familiar with fairly complex regexes, I have a small doubt now: if I use a dot inside a bracket expression, does it still mean "almost any character", as it does elsewhere in a regex?

Reading '$ man grep # GNU grep 2.27', there is no occurrence of the word "dot"! A bit strange. But there is a section "Character Classes and Bracket Expressions", which mentions [:alpha:] and others, and also says that "Most meta-characters lose their special meaning inside bracket expressions". I want to know which of them do and which do not! Normally I would tell other people to look in the man page for that...

Further, I want to match everything but a closing bracket, *including* newlines! How can I do that in a file that may include binary data before that "]"?

One of my first good tries is:

Code:
$ grep 'C\[[PA][av][^]]\+\]' file # does *not* grep newlines
Please advise: I am unsure whether I need '-a' with this kind of file, which may contain binary data (or characters in any encoding) mixed with the brackets and plain ASCII characters that (roughly speaking) make up its structure.
 
Old 03-23-2018, 03:14 PM   #2
AwesomeMachine
LQ Guru
 
Registered: Jan 2005
Location: USA and Italy
Distribution: Debian testing/sid; OpenSuSE; Fedora; Mint
Posts: 5,524

Rep: Reputation: 1015
Grep has a switch to treat as a binary file. If you want to give special characters their special meaning inside brackets, escape them with '\'. I think newline is '\n'.
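For reference, the switch is -a (--text). Below is a minimal sketch of how it might be combined with the C locale for files of unknown encoding. The sample function and its bytes are made up for illustration, and LC_ALL=C is an assumption of this sketch, not something suggested in the thread. With GNU grep in the C locale every byte counts as one character, so [^]] can also match bytes that are not valid in the current locale's encoding.

```shell
# Hypothetical sample: an SGF-like comment holding two bytes (0xFF 0xFE)
# that are not valid UTF-8, followed by an ordinary property.
sample() { printf 'C[Pa \377\376 note]\nB[dd]\n'; }

# -a: treat the input as text even if grep would classify it as binary.
# LC_ALL=C: match byte by byte, so [^]] accepts any byte except ']'.
# -c: print the number of matching lines instead of the lines themselves.
count=$(sample | LC_ALL=C grep -a -c 'C\[[PA][av][^]]\+\]')
echo "$count"   # prints 1
```

Without -a, GNU grep may print only a "binary file matches" notice for such input instead of the matching line, depending on the locale and on the bytes in the file.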
 
Old 03-23-2018, 03:28 PM   #3
dedec0
Senior Member
 
Registered: May 2007
Posts: 1,372

Original Poster
Rep: Reputation: 51

Quote:
Originally Posted by AwesomeMachine
Grep has a switch to treat as a binary file.
For the kind of file I described, a text file with mixed characters in any encoding, should I use -a?

Quote:
Originally Posted by AwesomeMachine
If you want to give special characters their special meaning inside brackets, escape them with '\'. I think newline is '\n'.
My question was:

Quote:
Further, I want to match everything but a closing bracket, *including* newlines! How can I do that in a file that may include binary data before that "]"?
Note that the "everything" I wrote includes characters in any encoding! So listing all of them is not an option. My attempt at that part of the regex is:

Code:
[^]]
But it does *not* match this part of my files, which is what I want:

Code:
C[Palkajsdalsk
laskjdasld
dlaksjdsaldjas]
If any of you would rather discard my attempt and suggest a completely new one, that is no problem.

Last edited by dedec0; 03-23-2018 at 03:40 PM.
 
Old 03-23-2018, 03:33 PM   #4
dedec0
Senior Member
 
Registered: May 2007
Posts: 1,372

Original Poster
Rep: Reputation: 51
Should I use -a?

From what I said in #1, and from

Code:
$ man grep # GNU grep 2.27
where we read

Code:
       -a, --text
              Process a binary file as if it were text; this is equivalent to
              the --binary-files=text option.
should I use -a for this task?
 
Old 03-23-2018, 03:48 PM   #5
dedec0
Senior Member
 
Registered: May 2007
Posts: 1,372

Original Poster
Rep: Reputation: 51
Just things I am trying... the first 2 posts are more important

'-a' does not change the result for one file, but should I use it?

Code:
$  grep -a 'C\[[PA][av][^]]\+\]' file | md5sum
9621b629612e01686a6a6af3564a62e7  -

$  grep 'C\[[PA][av][^]]\+\]' file | md5sum
9621b629612e01686a6a6af3564a62e7  -
 
Old 03-23-2018, 03:53 PM   #6
keefaz
LQ Guru
 
Registered: Mar 2004
Distribution: Slackware
Posts: 6,552

Rep: Reputation: 872
For grepping multiple lines, maybe try
-P (perlre)
-z (lines terminated with NULL instead of newline)
-o (print only matched)
Code:
grep -Poz 'C\[Pa[^]]+' file
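A rough sketch of what the three options do together, run on the three-line comment from post #3. The sample function is made up for illustration, and the trailing tr is only there to turn the NUL byte that -z writes after each match into a newline for display. -P needs a grep built with PCRE support.

```shell
# Hypothetical sample: the multi-line comment, followed by another property.
sample() { printf 'C[Palkajsdalsk\nlaskjdasld\ndlaksjdsaldjas]\nB[dd]\n'; }

# -z: the whole input is one NUL-terminated "line", so a match may span newlines.
# -o: print only the matched part, not the whole record.
# -P: Perl syntax, where + needs no backslash.
match=$(sample | grep -Poz 'C\[Pa[^]]+' | tr '\0' '\n')
echo "$match"
```

Since the pattern stops before the closing bracket, the ']' itself is not part of the output.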
 
Old 03-23-2018, 04:51 PM   #7
dedec0
Senior Member
 
Registered: May 2007
Posts: 1,372

Original Poster
Rep: Reputation: 51
Quote:
Originally Posted by keefaz
For grepping multiple lines, maybe try
-P (perlre)
-z (lines terminated with NULL instead of newline)
-o (print only matched)
Code:
grep -Poz 'C\[Pa[^]]+' file
It is not absolutely clear to me, please explain:

- Why is the '-z' option there? Because of the "any encoding" characters?

- Why is the '-o' option (or why might it be) relevant to what I need? Maybe I have not thought of that possibility for my problem files.

Based on my latest attempts and on what you suggest, I ran:

Quote:
grep -Poz 'C\[[PA][av][[:print:]\n]\+\]' file
That did not work. Empty result! Maybe I need to rewrite that regex as a Perl expression. Could you help me with it?

One more try:

Code:
grep -oz 'C\[[PA][av][[:print:]\n]\+\]' f
This command has a logic error: '[[:print:]\n]\+' matches too much. It runs past the closing bracket that should end the expression and goes on to the last one in the file (I think; at least further than it should, as I saw in my test).

A simple question about this first attempt: does it consider only characters that are valid in the locale where I run the command? Could it fail on files that are valid but use a different encoding?
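The overmatching is easy to reproduce on a single line. In the sketch below the sample line is made up. The point is that ']' is itself a printable character, so [[:print:]\n] matches it and the greedy \+ only stops at the last ']' it can reach. In grep's basic syntax, \n inside brackets is just a backslash and the letter n, so it adds nothing for newlines. The -P attempt most likely came back empty for a different reason: in Perl syntax \+ is a literal plus sign, and the repetition operator is a bare +.

```shell
# Hypothetical single-line sample with two comments and one other property.
line='C[Pa first comment] B[dd] C[Av second]'

# ']' is printable, so the greedy \+ runs on to the last ']' on the line.
greedy=$(printf '%s\n' "$line" | grep -o 'C\[[PA][av][[:print:]\n]\+\]')
echo "$greedy"    # prints the whole line

# [^]] cannot cross a ']', so each comment is matched on its own.
bounded=$(printf '%s\n' "$line" | grep -o 'C\[[PA][av][^]]\+\]')
echo "$bounded"   # prints the two comments, one per line
```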
 
Old 03-23-2018, 05:34 PM   #8
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: Rocky Linux
Posts: 4,780

Rep: Reputation: 2213
Quote:
Originally Posted by dedec0
Reading '$ man grep # GNU grep 2.27', there is no occurrence of the word "dot"! A bit strange. But there is a section "Character Classes and Bracket Expressions", which mentions [:alpha:] and others, and also says that "Most meta-characters lose their special meaning inside bracket expressions". I want to know which of them do and which do not! Normally I would tell other people to look in the man page for that...
The only meta-characters which keep their special meaning are those mentioned in the three sentences that follow in that paragraph:
"To include a literal ] place it first in the list. Similarly, to include a literal ^ place it anywhere but first. Finally, to include a literal - place it last."
All other characters, and that includes backslash and ".", are taken literally.
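A small sketch to check those rules by hand. The input strings are made up, and each command keeps only the lines that match.

```shell
# '.' inside brackets is a literal dot: "axb" does not match.
dot_only=$(printf 'a.b\naxb\n' | grep 'a[.]b')
echo "$dot_only"   # prints a.b

# ']' placed first in the list (after ^) is literal,
# so [^]] means "any character except ]".
no_close=$(printf 'x]y\nxzy\n' | grep 'x[^]]y')
echo "$no_close"   # prints xzy

# A backslash is literal too: [\n] matches a backslash or the letter n.
bs_or_n=$(printf 'a\\b\nanb\naxb\n' | grep 'a[\n]b')
echo "$bs_or_n"    # prints a\b and anb, one per line
```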
 
Old 03-23-2018, 05:39 PM   #9
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,131

Rep: Reputation: 4121
This has wandered around, but to answer the main question, you cannot grep for \n.
Since the input is read as a stream of lines, the newline is stripped before you get the record(s). You can infer where the newline was and reinsert it if needed (common with sed). Perl has an option to retain the newline in the input, effectively slurping the entire file as a single record.
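This is easy to see with a two-line sample (made up here): in the default line-by-line mode each line is tested on its own, so a pattern that needs text from both lines finds nothing.

```shell
# Hypothetical sample: the comment opens on the first line and closes on the second.
sample() { printf 'C[Pa one\ntwo]\n'; }

# Neither line contains both "C[Pa" and "]", so no line matches.
# grep exits with status 1 when nothing matches, hence the "|| true".
lines=$(sample | grep -c 'C\[Pa[^]]\+\]' || true)
echo "$lines"   # prints 0
```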
 
Old 03-23-2018, 06:03 PM   #10
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: Rocky Linux
Posts: 4,780

Rep: Reputation: 2213
Quote:
Originally Posted by syg00
This has wandered around, but to answer the main question, you cannot grep for \n.
Since the input is read as a stream of lines, the newline is stripped before you get the record(s).
You can do it with the "-z" option:
"-z, --null-data
Treat the input as a set of lines, each terminated by a zero byte (the ASCII NUL character) instead of a newline."
There doesn't seem to be any way to include a literal newline within a bracket expression, but a "." (outside of a bracket expression) or a literal newline (in a single-quoted string to get it past the shell) will match a newline in the input.
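A sketch along those lines, with made-up input. The first command is essentially the example from the GNU grep manual: the class [[:space:]] does include newline, which gets around the bracket limitation. The second relies on a negated list such as [^]] also matching newline under -z. That holds for the recent GNU grep versions I am aware of, but treat it as an assumption for older ones.

```shell
# With -z the input is one NUL-terminated record, so "foo\nbar" can match.
hits=$(printf 'foo\nbar\n' | grep -z -c 'foo[[:space:]]\+bar')
echo "$hits"   # prints 1

# The pattern from post #1, now able to cross newlines.
# tr turns the NUL that -z writes after each match into a newline.
comment=$(printf 'C[Palka\nlask\ndlak]\nB[dd]\n' |
  grep -oz 'C\[[PA][av][^]]\+\]' | tr '\0' '\n')
echo "$comment"
```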

Trying to test that is really hard.
 
Old 03-23-2018, 06:25 PM   #11
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,131

Rep: Reputation: 4121
Especially with binary - possibility of lots of nulls.
 
Old 03-23-2018, 08:02 PM   #12
AwesomeMachine
LQ Guru
 
Registered: Jan 2005
Location: USA and Italy
Distribution: Debian testing/sid; OpenSuSE; Fedora; Mint
Posts: 5,524

Rep: Reputation: 1015
Grep doesn't work with every possible collection of bytes. It only works with certain characters and nonprinting characters. If the file contains binary data you can convert it to text, but the result would be meaningless. If you want to find everything grep will find besides ']', then you would
Code:
$ grep -v '\]' file
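For what it is worth, here is a quick sketch of what -v does with a made-up sample: it drops every line that contains ']' and prints the rest, so it works on whole lines rather than on the text between a particular "C[" and its "]".

```shell
# Hypothetical sample: a two-line comment, another property, and a plain line.
sample() { printf 'C[Pa one\ntwo]\nB[dd]\nplain\n'; }

# -v inverts the match: only lines WITHOUT a ']' are printed.
kept=$(sample | grep -v '\]')
echo "$kept"   # prints "C[Pa one" and "plain", one per line
```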
 
Old 03-23-2018, 08:38 PM   #13
dedec0
Senior Member
 
Registered: May 2007
Posts: 1,372

Original Poster
Rep: Reputation: 51
Quote:
Originally Posted by AwesomeMachine
Grep doesn't work with every possible collection of bytes. It only works with certain characters and nonprinting characters. If the file contains binary data you can convert it to text, but the result would be meaningless. If you want to find everything grep will find besides ']', then you would
Code:
$ grep -v '\]' file
"All possible bytes" may be a hard requirement. The files are SGF files for weiqi games, but the move comments inside them may be in any encoding, because of how they are generated. I have no idea which bytes this situation rules out, if any.

Searching with -v (which inverts the match)... nice idea. But I cannot use only that. I need to find the start that belongs to each specific ']' found (the "C[" prefix from the longest regexes in my attempts). Two piped greps? I will think about this idea for some time...
 
Old 03-23-2018, 08:43 PM   #14
dedec0
Senior Member
 
Registered: May 2007
Posts: 1,372

Original Poster
Rep: Reputation: 51
Quote:
Originally Posted by syg00
Especially with binary - possibility of lots of nulls.
I know that ordinary binary files often contain lots of zero bytes. But I do not think that is the case here. The files are SGF, as I said (with more details) in the previous post.
 
Old 03-23-2018, 09:01 PM   #15
AwesomeMachine
LQ Guru
 
Registered: Jan 2005
Location: USA and Italy
Distribution: Debian testing/sid; OpenSuSE; Fedora; Mint
Posts: 5,524

Rep: Reputation: 1015
OK, I looked at the SGF format. There is no binary data used in them. There are nonprinting characters, which are not the same as binary data. I suggest you just tell the community exactly what you want to do, post part of the file inside quote tags, and let everyone look at what you have. I'm quite sure grep is not the tool to use for your project.
 
  



Tags
grep regex gnu



All times are GMT -5.