LinuxQuestions.org
Old 04-16-2017, 02:32 PM   #1
dedec0
Senior Member
 
Registered: May 2007
Posts: 1,129

Rep: Reputation: 42
How to grep specific byte values?


I have a file in iso-latin1 encoding (no doubt with that!) and a shell script that needs to grep lines from such a file.

I have read about grep --binary. I have also read about doing grep "\x##". But it does not work for me. Check the test I have built now and a few comments I make here about some steps.

Below are commands and their output copied from my terminal. Lines starting with $ are commands, others are output:

Code:
$ # an iso-latin1 file with a few lines and two chars in this encoding:
$ # ó (0xf3) and ú (0xfa)
$ cat arq

<td align=left nowrap><a href="...">
aaóaa grep this line
</a>
só do not grep this line
número neither this
<br>


$ # the file *is* in iso-latin1 encoding. There is no doubt!
$ # hd and hexdump are the same file, just different output
$ hd arq  # it shows an hexdump with a bit better default than command 'hexdump'

00000000  3c 74 64 20 61 6c 69 67  6e 3d 6c 65 66 74 20 6e  |<td align=left n|
00000010  6f 77 72 61 70 3e 3c 61  20 68 72 65 66 3d 22 2e  |owrap><a href=".|
00000020  2e 2e 22 3e 0a 61 61 f3  61 61 20 67 72 65 70 20  |..">.aa.aa grep |
00000030  74 68 69 73 20 6c 69 6e  65 0a 3c 2f 61 3e 0a 73  |this line.</a>.s|
00000040  f3 20 64 6f 20 6e 6f 74  20 67 72 65 70 20 74 68  |. do not grep th|
00000050  69 73 20 6c 69 6e 65 0a  6e fa 6d 65 72 6f 20 6e  |is line.n.mero n|
00000060  65 69 74 68 65 72 20 74  68 69 73 0a 3c 62 72 3e  |either this.<br>|
00000070  0a                                                |.|
00000071


$ # The terminal I am using is UTF-8, no doubt on that too.

$ # This should output the wanted line with string "aaóaa"
$ cat arq |grep --binary "aa\xf3aa"
$ # No output! I need to type "ó" in iso-latin1 encoding! Grep doesn't have it? Just with Perl regex?? :-/

$ cat arq |grep --binary "aaóaa" # does not work because terminal is UTF-8? Seems so.
$

$ # The "ó" is not correctly shown for this command because it spits an iso-latin1 char in UTF-8 term
$ cat arq |grep --binary -P "aa\xf3aa"

aa�aa grep this line

$
I have seen a question in stackoverflow about this, but the checked answer uses --binary together with --text, which seems nonsense to me.

Is the \x notation not available in grep regexes? That answer also uses -P/--perl-regexp... if the regex can be written without the -P flag, that solves my problem here!

No way but -P?? I am disappointed, if that is true.
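
For anyone who wants to reproduce the test: copying the listing above out of a browser will likely turn the two latin1 bytes into UTF-8. Below is a minimal sketch that rebuilds the file byte-for-byte from the hexdump, assuming a POSIX shell with printf, mktemp, tr and od available. The temporary path is an arbitrary choice for illustration, not part of the original test.

```shell
# Rebuild the 113-byte (0x71) test file from the hexdump above.
# Octal escapes are used because POSIX printf guarantees \ooo but not \xHH:
# \363 is byte 0xf3 (latin1 "ó"), \372 is byte 0xfa (latin1 "ú").
arq=$(mktemp)   # hypothetical location; the thread simply calls the file "arq"
{
  printf '%s\n' '<td align=left nowrap><a href="...">'
  printf 'aa\363aa grep this line\n'
  printf '%s\n' '</a>'
  printf 's\363 do not grep this line\n'
  printf 'n\372mero neither this\n'
  printf '%s\n' '<br>'
} > "$arq"

# Sanity checks: total size, and how many bytes fall outside ASCII.
size=$(wc -c < "$arq")
high=$(LC_ALL=C tr -d '\000-\177' < "$arq" | wc -c)
echo "size=$size non-ascii=$high"

# Byte-level view, comparable to the hd output in the post.
od -A x -t x1 "$arq"
```

The size should come out as 113 bytes with 3 non-ASCII bytes, matching the dump.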

Last edited by dedec0; 04-17-2017 at 07:58 AM.
 
Old 04-16-2017, 03:17 PM   #2
ondoho
LQ Addict
 
Registered: Dec 2013
Posts: 17,237
Blog Entries: 10

Rep: Reputation: 5160
Quote:
Originally Posted by dedec0
I have seen a question in stackoverflow about this, but the checked answer uses --binary together with --text, which seems nonsense to me.
i see
Code:
grep -obUaP
which seems to be the key to this - whatever it means.
if i were you i'd start with
Code:
man grep
 
Old 04-16-2017, 03:56 PM   #3
dedec0
Senior Member
 
Registered: May 2007
Posts: 1,129

Original Poster
Rep: Reputation: 42

Quote:
Originally Posted by ondoho
i see
Code:
grep -obUaP
which seems to be the key to this - whatever it means.
if i were you i'd start with
Code:
man grep
:-/ You assumed that I did not try anything before making this thread, which is wrong.

Option by option:

Code:
grep -o  # Return only matching part. This makes no difference for my test, removed
grep -b  # same as --byte-offset (show byte offset of match before it); also removed
grep -U  # same as --binary, but much less readable
grep -a  # same as --text - nonsense to me, since -U is given
grep -P  # as said above, same as --perl-regexp

# What are we left with? My question and my test, right?

Last edited by dedec0; 04-17-2017 at 08:51 AM.
 
Old 04-16-2017, 04:43 PM   #4
TB0ne
LQ Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 23,806

Rep: Reputation: 6973
Quote:
Originally Posted by dedec0
:-/ You assumed that I did not try anything before making this thread, which is wrong.
We cannot know what you've done/tried, unless you actually tell us what you did. You only said "Check the test I have built"...but don't tell us what that test is, only hint at it. The sample you provided doesn't show the character (at least not on my screen), that you say it should look for, and the grep you posted doesn't have any of the options...
Quote:
Option by option:
Code:
grep -o  # Return only matching part. This makes no difference for my test, removed
grep -b  # same as --byte-offset (show byte offset of match before it); also removed
grep -U  # same as --binary, but much less readable
grep -a  # same as --text - nonsense to me, since -U is given
grep -P  # as said above, same as --perl-regexp
What are we left with? My question and my test, right?
...you posted here. Read the "Question Guidelines" link in my posting signature. Please post a valid sample of the file you're looking at (please, not a copy/paste of a command-output...the actual line(s) from the file), along with what you're typing in to look for it.
 
1 members found this post helpful.
Old 04-16-2017, 05:12 PM   #5
hydrurga
LQ Guru
 
Registered: Nov 2008
Location: Pictland
Distribution: Linux Mint 20 MATE
Posts: 8,048
Blog Entries: 5

Rep: Reputation: 2916
One solution is:

Code:
xxd -p arq | grep 'f3'
If you want simply to know if the character is present, use grep's -c option.
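
A caveat with the hex approach: xxd -p prints the bytes as one continuous run of hex digits, so a search for 'f3' can also match across a byte boundary (the bytes 6f 3c read as ...6f3c...). Here is a sketch that puts one byte per line before grepping. It uses od instead of xxd only because od is specified by POSIX; that is my substitution, not what the post above uses. The sample data and the temporary file are made up for illustration.

```shell
f=$(mktemp)
# Two lines from the thread's sample; \363 (octal) is byte 0xf3.
printf 'aa\363aa grep this line\ns\363 do not grep this line\n' > "$f"

# od -An -v -tx1 prints every byte as two hex digits separated by blanks;
# tr puts each byte on its own line, so the anchors ^ and $ match whole bytes.
count=$(od -An -v -tx1 "$f" | tr -s ' ' '\n' | grep -c '^f3$')
echo "$count"   # 2 for this sample: one 0xf3 on each line
```

This says whether the byte is present and how often, but not which text line it is on; for that, grep has to read the original file.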
 
Old 04-16-2017, 05:53 PM   #6
dedec0
Senior Member
 
Registered: May 2007
Posts: 1,129

Original Poster
Rep: Reputation: 42

Quote:
Originally Posted by TB0ne
We cannot know what you've done/tried, unless you actually tell us what you did. You only said "Check the test I have built"...but don't tell us what that test is, only hint at it. The sample you provided doesn't show the character (at least not on my screen), that you say it should look for, and the grep you posted doesn't have any of the options...

...you posted here. Read the "Question Guidelines" link in my posting signature. Please post a valid sample of the file you're looking at (please, not a copy/paste of a command-output...the actual line(s) from the file), along with what you're typing in to look for it.
I am sorry, there is a misunderstanding here. I fully showed the test, it is just a few commands. I forgot to mention that the code I put on the first post was several lines copied from my terminal. This was "naturally obvious" to me, I had just used [codes] for terminal lines in a few posts I wrote just before.

I have edited my first post to fix this detail and added a few more comments about the test. Is it better now?
 
Old 04-16-2017, 06:17 PM   #7
dedec0
Senior Member
 
Registered: May 2007
Posts: 1,129

Original Poster
Rep: Reputation: 42

Quote:
Originally Posted by hydrurga
One solution is:

Code:
xxd -p arq | grep 'f3'
If you want simply to know if the character is present, use grep's -c option.
Ermmm... xxd? :-/ It can also convert in the reverse direction, but I need the full line where that "ó" appears (plus a few more chars; it is a string in iso-latin1). Is that possible? I cannot do that with xxd. Can you?

The full story is:

- I download an HTML page which is encoded in iso-latin1

- I want to process and get some information in a few lines of it

- I tried to use the magic . and the apparently workable [^o] (which should match anything but the normal "o")... neither worked as I imagined (or wanted).

My basic pipe commands in this step of the whole script are:

Code:
cat $file |
grep "$stringWithIsoLatin1Chars" |
grep -o -e "$somethingWithRegex" |
sed -r 's/ [somethingMore] /\1M/g'
I assign this pipe to a variable (backquotes) and process it. The difficulty is the latin1 char for the first grep. Isn't there another way but the "Perl highly experimental regex" of "\x##"? No normal regex for that in grep? The manpage says

Quote:
-P, --perl-regexp
Interpret PATTERN as a Perl regular expression. This is highly
experimental and grep -P may warn of unimplemented features.
which made me think of something else existing. But I could not find that.
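
A possible reason the "." attempt failed: in a UTF-8 locale a lone 0xf3 byte is not a valid character, and grep's "." is not required to match it. In the C locale every byte counts as one character, so "." does match. Below is a sketch of the pipeline above run entirely under LC_ALL=C. The sample data, the two patterns, and the sed expression are placeholders made up for illustration, not the real script.

```shell
f=$(mktemp)
printf 'aa\363aa grep this line\ns\363 do not grep this line\nn\372mero neither this\n' > "$f"

# Run every stage in the C locale. The command substitution is a subshell,
# so the exported setting does not leak into the rest of the script.
result=$(
  export LC_ALL=C
  grep -a 'aa.aa grep' "$f" |        # "." stands in for the latin1 byte
    grep -a -o 'grep [a-z]* line' |  # keep only the part of interest
    sed 's/grep \(.*\) line/\1/'     # placeholder for the real sed step
)
echo "$result"
```

The -a flag is there only so grep does not decide the input is binary and suppress the matching text.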

Last edited by dedec0; 04-16-2017 at 06:20 PM.
 
Old 04-16-2017, 06:21 PM   #8
astrogeek
Moderator
 
Registered: Oct 2008
Distribution: Slackware [64]-X.{0|1|2|37|-current} ::12<=X<=14, FreeBSD_12{.0|.1}
Posts: 5,616
Blog Entries: 11

Rep: Reputation: 3655
It is really not clear what is in your file, or what you expect to get in your grep results.

Here is my best guess, but let us know if this does not lead to the desired result.

First, what is actually in your file? You have posted a hexdump, but it is not clear whether you want to grep the file, or the hexdump of the file. (Did you hexdump this on a Windows machine or Linux? "hd" is not a utility on my Linux system.)

Reversing the hexdump, I would guess the file contents look like this...

Code:
  <td align=left nowrap><a href="...">
  aaóaa grep this
  </a>
  só grep this too
  número
  <br>
Now, if you want to grep this file, grep can handle the Unicode itself, so something like this will work:

Code:
grep '[óú]' example.bin
  aaóaa grep this
  só grep this too
  número
If you actually want to grep the hexdump of that file you need to make sure that the character encodings as stored on disk are as expected (system dependent!). On my system, the two characters are identified as f3 and fa digraphs in Vim as expected, but the binary unicode representation on disk (hex and byte order) appear as follows:

Code:
0000000: 20 20 3c 74 64 20 61 6c 69 67 6e 3d 6c 65 66 74    <td align=left
0000010: 20 6e 6f 77 72 61 70 3e 3c 61 20 68 72 65 66 3d   nowrap><a href=
0000020: 22 2e 2e 2e 22 3e 0a 20 20 61 61 c3 b3 61 61 20  "...">.  aa..aa
0000030: 67 72 65 70 20 74 68 69 73 0a 20 20 3c 2f 61 3e  grep this.  </a>
0000040: 0a 20 20 73 c3 b3 20 67 72 65 70 20 74 68 69 73  .  s.. grep this
0000050: 20 74 6f 6f 0a 20 20 6e c3 ba 6d 65 72 6f 0a 20   too.  n..mero.
0000060: 20 3c 62 72 3e 0a                                 <br>.
The UTF-8 representation of these characters is always multi-byte: the first byte of a two-byte character (which these are) always has its two high bits set and the next bit clear, so it starts with 'c' or 'd', hence c3 b3 and c3 ba. UTF-8 itself has no byte-order variants.

So, to grep the hex output itself I would do something like...

Code:
xxd -g 1 example.bin |grep 'c3'
0000020: 22 2e 2e 2e 22 3e 0a 20 20 61 61 c3 b3 61 61 20  "...">.  aa..aa
0000040: 0a 20 20 73 c3 b3 20 67 72 65 70 20 74 68 69 73  .  s.. grep this
0000050: 20 74 6f 6f 0a 20 20 6e c3 ba 6d 65 72 6f 0a 20   too.  n..mero.
Now, it has been a while since I worked fully through the twists of Unicode character storage, so take the above as direction, maybe not destination.

But using the above as guide, see if you can describe for us more completely what you are trying to do, the contents of the file you are working on, on what platform (important) and what you expect as a result.

Hope this helps!
 
Old 04-16-2017, 06:41 PM   #9
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 19,700

Rep: Reputation: 3548
The perl regex support in grep is fine for your needs. All you need is
Code:
cat $arq |grep -aP "\xf3"
 
1 members found this post helpful.
Old 04-16-2017, 11:28 PM   #10
astrogeek
Moderator
 
Registered: Oct 2008
Distribution: Slackware [64]-X.{0|1|2|37|-current} ::12<=X<=14, FreeBSD_12{.0|.1}
Posts: 5,616
Blog Entries: 11

Rep: Reputation: 3655
You updated your info as was typing my earlier post, so a quick catch-up...

My concerns with Unicode were that you stated your terminal was UTF-8 and I was not confident that the hexdump was actually the data you were trying to grep. The 'hd' utility for hexdump appears to be for Windows, RedHat up to 7.3 and old SCO machines, so I was concerned that it might be translating between iso-latin1 and/or Windows codepages and/or UTF-8, and/or... just a confidence thing. But I think not a concern now.

You are now very clear (I think) that the file is in fact iso-latin1, and that what you want is the actual text lines, not the hex dumped lines.

As syg00 says, grep -P is up to the task. I don't think the binary flag is necessary, but use -a if -P alone doesn't work on your system. (By the way, what is the actual system you are doing this on?).

Here is my own working test case. The file data in iso-latin1 text (with UTF-8 substitutions for this post) and the hexdump are...

Code:
cat iso-latin1.bin
<td align=left nowrap><a href="...">
  aaóaa grep this
  </a>
  só grep this too
  número
  <br>

xxd -g 1 iso-latin1.bin
0000000: 20 20 3c 74 64 20 61 6c 69 67 6e 3d 6c 65 66 74    <td align=left
0000010: 20 6e 6f 77 72 61 70 3e 3c 61 20 68 72 65 66 3d   nowrap><a href=
0000020: 22 2e 2e 2e 22 3e 0a 20 20 61 61 f3 61 61 20 67  "...">.  aa.aa g
0000030: 72 65 70 20 74 68 69 73 0a 20 20 3c 2f 61 3e 0a  rep this.  </a>.
0000040: 20 20 73 f3 20 67 72 65 70 20 74 68 69 73 20 74    s. grep this t
0000050: 6f 6f 0a 20 20 6e fa 6d 65 72 6f 0a 20 20 3c 62  oo.  n.mero.  <b
0000060: 72 3e 0a                                         r>.
And the grep (without UTF-8 substitutions):

Code:
grep -P '\xf3|\xfa' iso-latin1.bin
  aa�aa grep this
  s� grep this too
  n�mero
On my UTF-8 locale the characters appear changed on screen, as they may on yours, but the bytes are fine (cat used only to match your case)...

Code:
cat iso-latin1.bin |grep -P '\xf3|\xfa' >iso-latin1.out

xxd -g 1 iso-latin1.out
0000000: 20 20 61 61 f3 61 61 20 67 72 65 70 20 74 68 69    aa.aa grep thi
0000010: 73 0a 20 20 73 f3 20 67 72 65 70 20 74 68 69 73  s.  s. grep this
0000020: 20 74 6f 6f 0a 20 20 6e fa 6d 65 72 6f 0a         too.  n.mero.
Send that down your pipeline and you should be OK.

The only additional caution I would offer is that if you open the file in Vi on a UTF-8 enabled machine, it will want to automatically convert those characters to Unicode by default. To prevent that, open with the -b option. Other utilities in the pipeline "may" want to make them Unicode on a UTF-8 locale, as indicated by your UTF-8 terminal comment.
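
If a conversion to Unicode is acceptable, it can also be done deliberately at the start of the pipeline with iconv, after which every later stage sees valid UTF-8. Here is a sketch under that assumption. The sample data and temporary file are made up, and the pattern is written with octal escapes (\303\263 is the UTF-8 encoding of ó) only so that the snippet itself stays pure ASCII.

```shell
f=$(mktemp)
# latin1 sample: \363 (octal) is byte 0xf3.
printf 'aa\363aa grep this line\ns\363 do not grep this line\n' > "$f"

# UTF-8 form of the search string "aaóaa" (ó is c3 b3 in UTF-8).
pattern=$(printf 'aa\303\263aa')

# Convert latin1 to UTF-8 first, then grep. -a only guards against grep
# deciding the stream is binary.
line=$(iconv -f ISO-8859-1 -t UTF-8 "$f" | grep -a -- "$pattern")
printf '%s\n' "$line"
```

In a UTF-8 terminal the pattern can simply be typed as 'aaóaa'. The trade-off is that the output is now UTF-8 as well, which matters if a later step expects the original bytes.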
 
Old 04-17-2017, 03:33 AM   #11
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 19,700

Rep: Reputation: 3548
Quote:
Originally Posted by astrogeek
I don't think the binary flag is necessary
Yeah, my bad. I was testing using a binary for convenience.
 
Old 04-17-2017, 08:46 AM   #12
dedec0
Senior Member
 
Registered: May 2007
Posts: 1,129

Original Poster
Rep: Reputation: 42

There have been three replies since my last post. I will try to make all relevant or needed comments in this one, now. Reading these replies in the order they appear.

Quote:
astrogeek: It is really not clear what is in your file, or what you expect to get in your grep results.
I imagined that 'hd' is something common. 'man hd' shows me a page that starts with
Quote:
HEXDUMP(1) BSD General Commands Manual HEXDUMP(1)
NAME
hexdump, hd ASCII, decimal, hexadecimal, octal dump
I have known and used it for years. My OS now is an old Ubuntu 10.04. I have not installed hd, I guess it exists in all "debianesques" distributions.

Curious to remember (or to find) why I prefer to use hd, I tested both: the output is different! Check it here. I have done these commands:

Code:
$ whereis hd hexdump

hd: /usr/bin/hd /usr/share/man/man4/hd.4.gz /usr/share/man/man1/hd.1.gz
hexdump: /usr/bin/hexdump /usr/share/man/man1/hexdump.1.gz

$ md5sum /usr/bin/hd /usr/bin/hexdump 

479b4fffba22958dd25f15d42fd50774  /usr/bin/hd
479b4fffba22958dd25f15d42fd50774  /usr/bin/hexdump

$ hd arq 

00000000  3c 74 64 20 61 6c 69 67  6e 3d 6c 65 66 74 20 6e  |<td align=left n|
00000010  6f 77 72 61 70 3e 3c 61  20 68 72 65 66 3d 22 2e  |owrap><a href=".|
00000020  2e 2e 22 3e 0a 61 61 f3  61 61 20 67 72 65 70 20  |..">.aa.aa grep |
00000030  74 68 69 73 20 6c 69 6e  65 0a 3c 2f 61 3e 0a 73  |this line.</a>.s|
00000040  f3 20 64 6f 20 6e 6f 74  20 67 72 65 70 20 74 68  |. do not grep th|
00000050  69 73 20 6c 69 6e 65 0a  6e fa 6d 65 72 6f 20 6e  |is line.n.mero n|
00000060  65 69 74 68 65 72 20 74  68 69 73 0a 3c 62 72 3e  |either this.<br>|
00000070  0a                                                |.|
00000071

$ hexdump arq
 
0000000 743c 2064 6c61 6769 3d6e 656c 7466 6e20
0000010 776f 6172 3e70 613c 6820 6572 3d66 2e22
0000020 2e2e 3e22 610a f361 6161 6720 6572 2070
0000030 6874 7369 6c20 6e69 0a65 2f3c 3e61 730a
0000040 20f3 6f64 6e20 746f 6720 6572 2070 6874
0000050 7369 6c20 6e69 0a65 fa6e 656d 6f72 6e20
0000060 6965 6874 7265 7420 6968 0a73 623c 3e72
0000070 000a                                   
0000071

$
I have now added a comment about 'hexdump' vs. 'hd' to the first post. They differ only in their default output, but some people do not have hd on their system, which surprises me: it has existed on every Linux where I have tried to use it (for many years!). I have also added the plain file contents to the first post (but be careful: copying them into a test file may lead to different byte values for the special chars).

No way but -P and "\x##" ?? I am disappointed, if that is true.

Quote:
Originally Posted by astrogeek
Now, if you want to grep this file, grep can handle the Unicode itself, so something like this will work:

Code:
grep '[óú]' example.bin
  aaóaa grep this
  só grep this too
  número
No, this is wrong. It does *not* work unless the terminal has the same encoding of the file. I have added another grep command to show this in the first post.

@astrogeek: you grep'ed from the hexdump just the byte for "ó" in iso-latin1. I need to grep a full string, and I want the whole line containing that string passed to the next command in the pipeline. I cannot imagine a way to do this with hd/hexdump/xxd. You ask me to show a more complete situation. My answer: it is not necessary. The complete file would be pointlessly big to put here. The only problem I have with that script is: terminal in UTF-8 (my choice); file in iso-latin1 (server choice); shell script written by me, being built up from test commands in the terminal. And maybe the problem can be reduced to how to use (a form of) "\x##" without the -P grep flag.


Quote:
Originally Posted by syg00
The perl regex support in grep is fine for your needs. All you need is
Code:
cat $arq |grep -aP "\xf3"
Is it the *only* way?

-------------------------------------

@astrogeek for the #10 post: Does not hurt to repeat yet again: terminal in UTF-8, file in iso-latin1. You also did the test of:

Quote:
$ cat iso-latin1.bin |grep -P '\xf3|\xfa' > iso-latin1.out
This test works because cat does *not* change any byte before putting it to the pipe! See:

Code:
$ cat iso-latin1.file |grep -P '\xf3|\xfa' > out

$ hd out # or hexdump! What matters here are the "ó" and "ú" latin1 bytes

00000000  61 61 f3 61 61 20 67 72  65 70 20 74 68 69 73 20  |aa.aa grep this |
00000010  6c 69 6e 65 0a 73 f3 20  64 6f 20 6e 6f 74 20 67  |line.s. do not g|
00000020  72 65 70 20 74 68 69 73  20 6c 69 6e 65 0a 6e fa  |rep this line.n.|
00000030  6d 65 72 6f 20 6e 65 69  74 68 65 72 20 74 68 69  |mero neither thi|
00000040  73 0a                                             |s.|
00000042


$ # Remember: term UTF-8, file in iso-latin1! This justifies the "?" chars here instead of [óú] :
$ cat out

aa�aa grep this line
s� do not grep this line
n�mero neither this
Finally, regarding astrogeek's post #10: I am used to editing files in different encodings, with different line breaks (Win, *nix, ...) and probably different byte orders (for multibyte chars in Unicode encodings). Vim (not vi!) is my favorite editor, and it deals gracefully with such situations. (:

To end this post, yet another repetition of the main detail of my problem, as it looks now:

Is there a way to pass byte values to grep without that experimental -P flag? There are 3 other matchers to choose! :-/
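
One way that avoids -P, sketched under the assumption of GNU grep and a POSIX shell: let the shell produce the raw byte and hand it to grep as an ordinary literal, with LC_ALL=C so that grep treats each byte as one character. The temporary file and sample lines below stand in for the real data.

```shell
f=$(mktemp)
printf 'aa\363aa grep this line\ns\363 do not grep this line\nn\372mero neither this\n' > "$f"

# Build the pattern from raw bytes; \363 (octal) is byte 0xf3.
pattern=$(printf 'aa\363aa')

# Plain basic-regex grep, no -P. -a keeps grep from treating the input as binary.
count=$(LC_ALL=C grep -a -c -- "$pattern" "$f")
echo "$count"   # 1: only the "aa<f3>aa" line matches

# The matching line keeps its original latin1 bytes for the next pipe stage.
LC_ALL=C grep -a -- "$pattern" "$f" | od -An -tx1
```

In bash, ksh, or zsh the pattern can be written inline as $'aa\xf3aa' instead of going through printf.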
 
Old 04-17-2017, 10:34 AM   #13
pan64
LQ Guru
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 16,487

Rep: Reputation: 5530
Quote:
No, this is wrong. It does *not* work unless the terminal has the same encoding of the file. I have added another grep command to show this in the first post.
based on man grep:
Code:
The locale for category LC_foo is specified by examining the three environment variables LC_ALL, LC_foo, LANG, in that order.
The first of these variables that is set specifies the locale.
For example, if LC_ALL is not set, but LC_MESSAGES is set to pt_BR, then the Brazilian Portuguese locale is used for the LC_MESSAGES category.
The C locale is used if none of these environment variables are set, if the locale catalog is not installed, or if grep was not compiled with national language support ( NLS ).
So this is the way to specify the locale, nothing else. So I do not really understand why you need another solution. Furthermore:

http://stackoverflow.com/questions/3...iles-and-utf16

And finally do not use
Code:
cat file | grep pattern
# but:
grep pattern file
if you know you need a special locale, you can use:
Code:
LC_ALL=my_locale grep pattern file
 
Old 04-17-2017, 01:26 PM   #14
dedec0
Senior Member
 
Registered: May 2007
Posts: 1,129

Original Poster
Rep: Reputation: 42
About the previous post, please do not put copied man pages in code tags. The lines will not wrap, which makes them hard to read. Could you please edit it?

Use a font like "Courier New", instead:

The locale for category LC_foo is specified by examining the three environment variables LC_ALL, LC_foo, LANG, in that order.
The first of these variables that is set specifies the locale.
For example, if LC_ALL is not set, but LC_MESSAGES is set to pt_BR, then the Brazilian Portuguese locale is used for the LC_MESSAGES category.
The C locale is used if none of these environment variables are set, if the locale catalog is not installed, or if grep was not compiled with national language support ( NLS ).


Now we can read it without any horizontal scrolling. (:

Last edited by dedec0; 04-17-2017 at 01:46 PM.
 
Old 04-17-2017, 01:40 PM   #15
dedec0
Senior Member
 
Registered: May 2007
Posts: 1,129

Original Poster
Rep: Reputation: 42

Quote:
Originally Posted by pan64
based on man grep:
Code:
The locale for category LC_foo is specified by examining the three environment variables LC_ALL, LC_foo, LANG, in that order.
The first of these variables that is set specifies the locale.
For example, if LC_ALL is not set, but LC_MESSAGES is set to pt_BR, then the Brazilian Portuguese locale is used for the LC_MESSAGES category.
The C locale is used if none of these environment variables are set, if the locale catalog is not installed, or if grep was not compiled with national language support ( NLS ).
So this is the way to specify the locale, nothing else. So I do not really understand why you need another solution. Furthermore:

http://stackoverflow.com/questions/3...iles-and-utf16

And finally do not use
Code:
cat file | grep pattern
# but:
grep pattern file
if you know you need a special locale, you can use:
Code:
LC_ALL=my_locale grep pattern file
I do not understand why you are talking about setting the locale and the related variables. For grep, wouldn't LC_MESSAGES affect only the messages it prints? I think it is not related to the encoding of files and how they are interpreted by grep (and most other programs) or written to its standard output.

I am using a UTF-8 terminal.

I have only the LC_CTYPE=C variable. My $LANG = pt_BR.utf8.

And I need to grep iso-latin1 chars from an iso-latin1 encoded file.

And I want to avoid using -P switch, if that is possible.

And you meant "LC_ALL=my_locale; grep pattern file" instead of "LC_ALL=my_locale grep pattern file" ? But this is out of my problem, I think.
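
On that last point, a note on shell syntax: "LC_ALL=my_locale grep pattern file" is valid as written. A variable assignment placed directly before a command applies only to that command's environment, whereas "LC_ALL=my_locale; grep ..." sets a shell variable, which grep sees only if LC_ALL is already exported. Also, the man page excerpt mentions LC_MESSAGES only as an example; the category that decides how bytes are grouped into characters is LC_CTYPE, which LC_ALL overrides. A small sketch of the scoping, using sh -c as a stand-in for grep:

```shell
# Remember what the shell had before (or "unset" if nothing).
before=${LC_ALL-unset}

# The assignment prefix applies only to this one child process.
inside=$(LC_ALL=C sh -c 'printf %s "$LC_ALL"')

# The calling shell is unaffected.
after=${LC_ALL-unset}

echo "inside the command: $inside"
echo "shell before/after: $before / $after"
```

The child process sees C, while the surrounding shell keeps whatever value it had.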

Last edited by dedec0; 04-17-2017 at 01:41 PM.
 
  

