LinuxQuestions.org
04-23-2021, 08:18 PM   #1
lucmove
Senior Member
 
Registered: Aug 2005
Location: Brazil
Distribution: Debian
Posts: 1,432

How do I identify this character in a file?


I have a file that contains a list of pairs with separators. I copied the list (Ctrl+C) from a Windows Registry key, pasted it into Notepad, saved the file with ANSI encoding, and brought it over to Linux.

I cannot identify the separator characters, but I have this:

Code:
$ od -c file.txt | head

0000000   a   l   t   e   r   a   c 343   o 001   a   l   t   e   r   a
0000020 347 343   o 002   a   l   t   e   r   a   c 365   e   s 001   a
0000040   l   t   e   r   a 347 365   e   s 002   b   r   a   z   i   l
0000060 001   B   r   a   z   i   l 002   c   a   t   a   l   g   o 001
0000100   c   a   t 341   l   o   g   o 002   c   i   r   c   u   n   t
0000120   a   n   c   i   a   s 001   c   i   r   c   u   n   s   t 342
0000140   n   c   i   a   s 002   c   l   e   n   t   e 001   c   l   i
0000160   e   n   t   e 002   c   o   n   f   i   g   u   r 347 343   o
0000200 001   c   o   n   f   i   g   u   r   a 347 343   o 002   c   o
0000220   n   f   i   g   u   r   a   c 343   o 001   c   o   n   f   i
The two separators are obviously 001 and 002. I'm sure of it because I can easily tell the words (in Portuguese) that they are separating.

Now I need to know what characters those separators are so I can produce them in a script and build a new list to be pasted back into the source.

My questions: what characters are those (give me the fish) and how does one find out exactly what characters they are in situations like this (teach me how to fish)?

TIA

Last edited by lucmove; 04-23-2021 at 08:19 PM.
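One way to check that reading of the dump is to swap the two separators for a tab and a newline, so that each pair lands on its own line. The sketch below assumes that Notepad's "ANSI" means Windows-1252 (the thread does not confirm this) and that `tr` and `iconv` are installed; `unpack_list` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: turn the packed list into one
# "wrong<TAB>right" line per pair, converted to UTF-8.
unpack_list() {
    # \001 and \002 are octal escapes for the two separator bytes.
    # WINDOWS-1252 is an assumption about what "ANSI" meant here.
    tr '\001\002' '\t\n' | iconv -f WINDOWS-1252 -t UTF-8
}

# Demo on two pairs taken from the od dump (plain ASCII ones).
printf 'brazil\001Brazil\002clente\001cliente\002' | unpack_list
```

On the real data this would be run as `unpack_list < file.txt`. If the accented letters come out wrong, the encoding guess is the first thing to revisit.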
 
04-24-2021, 04:08 AM   #2
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,848

001 and 002 are not displayable characters; they are control characters. What is even worse: this means the file is not plain text, but binary.
You can of course still put these characters in a file; see for example: https://www.unix.com/shell-programmi...cter-bash.html
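As a rough sketch of the idea, and not necessarily what the linked page shows: POSIX `printf` accepts octal escapes in its format string, so `\001` and `\002` can be written directly. The names `emit_pair` and `build_list` are hypothetical, and the tab-separated input format is an assumption made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: emit one "wrong SOH right STX" record,
# mirroring the layout seen in the od dump.
emit_pair() {
    # %s prints each word verbatim; \001 and \002 in the format
    # string are octal escapes for the two separator bytes.
    printf '%s\001%s\002' "$1" "$2"
}

# Read "wrong<TAB>right" lines on stdin and emit the packed list.
# The tab-separated input format is an assumption, not from the thread.
build_list() {
    while IFS="$(printf '\t')" read -r wrong right; do
        emit_pair "$wrong" "$right"
    done
}

# Demo: dump the result with od to confirm the separators are there.
emit_pair brazil Brazil | od -An -c
```

The output keeps whatever encoding the input had. If the pairs were edited as UTF-8, piping the result through `iconv -f UTF-8 -t WINDOWS-1252` would probably be needed before pasting back into Windows, assuming the original really was Windows-1252.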
 
04-24-2021, 06:14 AM   #3
shruggy
Senior Member
 
Registered: Mar 2020
Posts: 3,670

Quote:
Originally Posted by lucmove
The list was Ctrl + c copied from a Windows Registry key. I pasted the list into Notepad, saved the file with ANSI encoding and brought it to Linux.
I'm not sure this would preserve the original encoding intact. I'd rather run reg save on Windows, then copy the resulting file to Linux and work with it there.

Quote:
Originally Posted by lucmove
what characters are those
Code:
$ ascii 0o1 0o2
ASCII 0/1 is decimal 001, hex 01, octal 001, bits 00000001: called ^A, SOH
Official name: Start Of Header

ASCII 0/2 is decimal 002, hex 02, octal 002, bits 00000010: called ^B, STX
Official name: Start of Text

Last edited by shruggy; 04-24-2021 at 06:49 AM.
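Where the `ascii` utility is not installed, the same lookup can be approximated in plain shell. The sketch below is an illustration, not the `ascii` tool's own logic: `ctrl_name` and `C0_NAMES` are hypothetical names, and the function expects the byte value in octal, as `od -c` prints it.

```shell
#!/bin/sh
# Abbreviations of the 32 C0 control characters, in code order 0..31.
C0_NAMES="NUL SOH STX ETX EOT ENQ ACK BEL BS HT LF VT FF CR SO SI DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM SUB ESC FS GS RS US"

# Hypothetical helper: describe a control byte given in octal digits.
ctrl_name() {
    code=$(( 0$1 ))              # the leading 0 makes the shell read octal
    if [ "$code" -gt 31 ]; then
        echo "octal $1 is not a C0 control character" >&2
        return 1
    fi
    # Caret notation: the control code plus 64 is the character after ^.
    caret=$(printf "\\$(printf '%03o' $(( code + 64 )))")
    set -- $C0_NAMES             # split the table into $1..$32
    shift "$code"                # now $1 is the name for this code
    printf 'octal %03o = hex %02X = ^%s %s\n' "$code" "$code" "$caret" "$1"
}

ctrl_name 001
ctrl_name 002
```

The general method is the same either way: dump the bytes with `od -c`, then look the octal values up in an ASCII table such as the one in `man ascii`. With GNU coreutils, `cat -v file.txt` should also show these bytes in caret notation.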
 
04-24-2021, 02:52 PM   #4
scasey
LQ Veteran
 
Registered: Feb 2013
Location: Tucson, AZ, USA
Distribution: CentOS 7.9.2009
Posts: 5,727

Quote:
Originally Posted by shruggy
Code:
$ ascii 0o1 0o2
ASCII 0/1 is decimal 001, hex 01, octal 001, bits 00000001: called ^A, SOH
Official name: Start Of Header

ASCII 0/2 is decimal 002, hex 02, octal 002, bits 00000010: called ^B, STX
Official name: Start of Text
!! This is a flashback to debugging data flows over dedicated modems by analyzing the bits...
 
  

