LinuxQuestions.org
Linux - Software This forum is for Software issues.
Old 04-21-2011, 05:02 AM   #1
mailbox-1691
LQ Newbie
 
Registered: Aug 2009
Location: Sweden
Posts: 13

Rep: Reputation: 0
data processing


Dear users,
I have data like this:
"x\x\xxxxxxxxxx\x\xxyyyyyyyyyxxxxxxxx\xxxyyyxxxxxxx\xx
xxx\A,(minus or plus)floating point,(minus or plus)floating point,(minus or plus)floatingpoint\B,... , .... , ..\C,.. , ...., ..\....\\@"

I need to extract it like this, e.g.:

A -2.300 0. 5.03
B -2.300 2.34 0.00
...........
..........

Put simply, the final results are xyz coordinates. Can anyone tell me how to do this in Perl, sed, awk, or any other program?

Thanks a lot
 
Old 04-21-2011, 08:56 AM   #2
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Rep: Reputation: 2037
A picture, or in this case a sample, is worth a thousand words.

IOW, I do not fully understand the line format or your requirements as you've explained them. Could you please give us an actual representative sample of the text you're using, highlighting exactly what you want to extract from it and how you want the output to appear?

Are we talking multiple lines, or just one? How high do you expect the ABC lettering to go? Are there any variations in the text that might cause problems?

Give us some background so we understand your requirements.

Finally, please enclose everything in [code][/code] tags, to preserve formatting and to improve readability.

Last edited by David the H.; 04-21-2011 at 08:57 AM. Reason: minor wording change
 
Old 04-22-2011, 01:21 AM   #3
mailbox-1691
LQ Newbie
 
Registered: Aug 2009
Location: Sweden
Posts: 13

Original Poster
Rep: Reputation: 0

Okay, thanks for the reply. I understand; I have attached a copy of the text.


It looks cryptic, but I did find a pattern and tried to match it with the following regular expression in vim, so that I could then reuse it in sed or Perl:


Code:
/\W\w,\d.,\A\d\+.\d\+,\A\d\+.\d\+
but it matches only a few of the entries, and I was not able to generalize it to match all of them. The result should look like this:
Quote:
C 0. 6.1325527512 0.6911442287
C 0. 4.9424684093 1.4312934211
..........
..........
H 0. 7.0464046059 -5.5938128729
I have many files like this, and in each one I need to match the three floating-point numbers for x, y, z together with the corresponding label. Any help is appreciated. Thanks.
Attached Files
File Type: txt sing.txt (3.1 KB, 18 views)

Last edited by mailbox-1691; 04-22-2011 at 01:25 AM. Reason: attachment
 
Old 04-22-2011, 06:58 AM   #4
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Rep: Reputation: 2037
THIS is a perfect example of why providing actual sample text is important.

Is the file really formatted like that, with newlines scattered at random and a space at the beginning of each line? Because that really causes headaches in processing. Having to work across lines and remove spaces means several times the work. Ugh!

Anyway, I think I have something for you. Instead of trying to directly match the desired strings with a regex, I went a slightly different route.
Code:
tr -d '[:space:]' <sing.txt | awk 'BEGIN{ RS="\\" } /^[[:upper:]],/ { gsub(","," "); print }'
The tr command at the beginning is there simply to clean up the file format. It removes all spaces and newlines, so that the whole file is turned into one single unbroken line.

This is piped into awk, which breaks it back up into one record per "\"-delimited field. Then if a record starts with an upper-case letter followed by a comma, it replaces the commas with spaces and prints it.

(Note that gsub requires gawk, nawk, or another POSIX-compliant awk; the original "old" awk does not have it.)

The initial cleanup could certainly also be done by awk, but it's simpler this way, IMO.
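
For anyone who would rather script this, the same four steps can be sketched in Python. This is only a rough equivalent of the one-liner above: the function name `extract_records` and the sample text are made up for illustration (the sample is assembled from the coordinates quoted earlier in the thread, not copied from sing.txt), and `[A-Z]` stands in for awk's `[[:upper:]]`.

```python
import re


def extract_records(text):
    """Return 'LABEL x y z' strings from wrapped, backslash-delimited text."""
    # Step 1 (the tr stage): drop every whitespace character, newlines included.
    flat = re.sub(r"\s+", "", text)
    rows = []
    # Step 2 (RS="\\"): treat each backslash-delimited chunk as one record.
    for record in flat.split("\\"):
        # Step 3 (the /^[[:upper:]],/ test): keep records that start with "X,".
        if re.match(r"[A-Z],", record):
            # Step 4 (gsub): turn the comma separators into spaces.
            rows.append(record.replace(",", " "))
    return rows


# Hypothetical sample: lines wrapped mid-number, each with a leading space.
sample = "\n".join([
    r" 1\1\title\\0,1\C,0.,6.13255275",
    r" 12,0.6911442287\C,0.,4.9424684093,1.43129",
    r" 34211\H,0.,7.0464046059,-5.5938128729\\@",
])

for row in extract_records(sample):
    print(row)
```

Because the whitespace is removed before the text is split, a number that was broken across two lines is glued back together before it is examined.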
 
Old 04-22-2011, 09:33 AM   #5
kurumi
Member
 
Registered: Apr 2010
Posts: 228

Rep: Reputation: 53
Here's a Ruby command you can try:

Code:
$ ruby -0777 -ne '$_.scan(/,(-?[0-9.]+),([0-9.]+)\\([A-Z])/).each{|x| print "#{x[-1]},#{x[1]},#{x[0]}\n" }' file
C,0.6911442287,6.1325527512
C,0.7165833967,3.7095474353
C,1.426622055,2.4652112046
C,0.71899088,1.2389133759
C,0.71899088,-1.2389133759
C,1.426622055,-2.4652112046
C,0.7165833967,-3.7095474353
C,0.6911442287,-6.1325527512
C,5.6966058882,2.4498989676
C,3.5617348632,1.2346328478
C,2.86197007,2.463592843
C,5.6891543825,0.
C,2.8528109717,0.
C,5.0091360672,-1.2279289318
C,5.0224928115,-3.6797065617
C,2.86197007,-2.463592843
C,5.048424537,-6.1065799124
C,3.6464412319,-6.1266798118
C,2.8941073542,-4.9440117491
C,5.048424537,6.1065799124
C,2.8941073542,4.9440117491
H,3.1510513472,-7.0902553374
H,6.8182586514,-4.887465462
H,6.7840514357,-2.443376658
H,6.7765480345,0.
H,6.7840514357,2.443376658
H,5.5938128729,7.0464046059
H,1.1981389566,7.0902192127
Q,0.,0.
 
1 members found this post helpful.
Old 04-22-2011, 10:43 AM   #6
mailbox-1691
LQ Newbie
 
Registered: Aug 2009
Location: Sweden
Posts: 13

Original Poster
Rep: Reputation: 0
David,
Thanks. The newlines look random, but they always fall at the 71st character of each line, not counting the whitespace at the beginning. I checked your command against other similar files and it works. So far, at least, every record after the delimiter (\) starts with an upper-case letter. Could you please explain how to change the awk command if a label can also have two letters (an upper-case letter followed by a lower-case one)?
e.g.:
Quote:
C 3.3508756125 1.4163640333 0.
Fe 2.30000 3.2496341 0.
Kurumi,
Your command misses one of the floating-point numbers, but I suppose that would not be difficult to fix with minor modifications. Thanks.
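
One possible shape for that modification, sketched in Python rather than Ruby: anchor on the backslash in front of the label and capture all three numbers that follow it. The names `NUMBER`, `RECORD` and `find_xyz` and the sample text are invented for illustration, and the pattern assumes every label is a single upper-case letter followed by exactly three comma-separated numbers.

```python
import re

# Optional sign, digits, optional decimal point and fraction ("0." is allowed).
NUMBER = r"[-+]?\d+\.?\d*"
# A backslash, the label, three numbers, and a closing backslash (not consumed,
# so it can also serve as the opening backslash of the next record).
RECORD = re.compile(rf"\\([A-Z]),({NUMBER}),({NUMBER}),({NUMBER})(?=\\)")


def find_xyz(text):
    """Return (label, x, y, z) tuples found in wrapped, backslash-delimited text."""
    flat = "".join(text.split())  # undo the line wrapping first
    return [
        (label, float(x), float(y), float(z))
        for label, x, y, z in RECORD.findall(flat)
    ]


# Hypothetical sample built from the coordinates quoted in this thread.
sample = "\n".join([
    r" 1\1\title\\0,1\C,0.,6.13255275",
    r" 12,0.6911442287\C,0.,4.9424684093,1.43129",
    r" 34211\H,0.,7.0464046059,-5.5938128729\\@",
])

for label, x, y, z in find_xyz(sample):
    print(label, x, y, z)
```

Converting to float drops the original formatting (for instance "0." becomes 0.0); keeping the captured strings instead would preserve it.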
 
Old 04-22-2011, 11:45 AM   #7
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Rep: Reputation: 2037
I've updated it so that the entire thing is done in awk. It was so stupidly simple I should've seen it before. I'd failed to realize earlier that with RS set to backslash only, newlines can be treated just like any other character. So all we need to do is add a second gsub command.

A ? in regex means "zero or one" of the previous character (or expression), so to match an optional lowercase letter simply expand it to this:
Code:
awk 'BEGIN{ RS="\\" } {gsub("[[:space:]]","") } /^[[:upper:]][[:lower:]]?,/ { gsub(","," "); print }' sing.txt
The nice thing with this is that the main regex only has to match a partial string for it to print. You simply need to be able to differentiate the wanted from the unwanted fields in the file.

And what I meant by "random" was that the file wraps in such a way that newlines or spaces can appear pretty much anywhere inside the actual data. That's a hard thing to deal with when you're trying to extract regular patterns.
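
The same logic, written out as a short Python sketch for comparison. It assumes the record layout described in this thread; `extract_atoms` and the sample text are hypothetical, with the two coordinate rows taken from the C and Fe example quoted above.

```python
import re

# One upper-case letter, an optional lower-case letter, then a comma.
LABEL = re.compile(r"[A-Z][a-z]?,")


def extract_atoms(text):
    """Return 'LABEL x y z' strings, allowing labels such as C or Fe."""
    rows = []
    for record in text.split("\\"):         # like RS="\\"
        record = re.sub(r"\s", "", record)  # like gsub("[[:space:]]","")
        if LABEL.match(record):             # like /^[[:upper:]][[:lower:]]?,/
            rows.append(record.replace(",", " "))
    return rows


# Hypothetical sample: the first coordinate row is wrapped across two lines.
sample = "\n".join([
    r" 1\1\title\\0,1\C,3.3508756125,1.41636",
    r" 40333,0.\Fe,2.30000,3.2496341,0.\\@",
])

for row in extract_atoms(sample):
    print(row)
```

As with the awk version, any other backslash-delimited field that happens to start with a capital letter and a comma would be printed too, so it is worth checking the output against one file by hand.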
 
1 members found this post helpful.
  


All times are GMT -5.
