LinuxQuestions.org


shakes82 07-07-2010 03:50 AM

Help using awk,sed and grep
 
Hi. I have some financial data in the following format:

20090302 18:02:03 1.5 1.6

I want to change this to the following format using grep awk and sed:

20090302180203,1.5,1.6,SYM

Please suggest how I can use the commands to get this formatting.

Thank you.

druuna 07-07-2010 03:56 AM

Hi,

Something like this?

awk -F"[ :]" '{ print $1 $2 $3 $4","$5","$6",SYM" }' infile

Hope this helps.

shakes82 07-07-2010 04:10 AM

Thanks Druuna but it didn't work. I am getting the following output with your suggestion:

,,,sym02 180203 1.5 1.6

Any other suggestions? Thanks for your time.

colucix 07-07-2010 04:26 AM

Quote:

Originally Posted by shakes82 (Post 4025990)
,,,sym02 180203 1.5 1.6

Hmm.. this means that the input you've used is not what you've shown in the original post, since there is no way for the string "sym02" to appear from the command suggested by druuna (if copied exactly). Also which version of awk are you running and which Linux/Unix release?

Anyway, I suspect the fields in the input file are not separated by blank spaces. Maybe tabs?
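
A minimal sketch of one way to check for tabs or other invisible characters, assuming a hypothetical file named sample.txt (not the poster's actual data). od -c prints every byte, so a tab shows up as \t and a carriage return as \r:

```shell
# Hypothetical sample: a tab between the first two fields and a DOS line ending
printf '20090302\t18:02:03 1.5 1.6\r\n' > sample.txt

# Dump each byte as a printable character or an escape sequence
od -c sample.txt
```

If the dump shows only plain spaces and a single \n at the end of each line, the separators are not the problem.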

druuna 07-07-2010 04:26 AM

Hi,

It works on my side.
Code:

$ cat infile
20090302 18:02:03 1.5 1.6

$ awk -F"[ :]" '{ print $1 $2 $3 $4","$5","$6",SYM" }' infile
20090302180203,1.5,1.6,SYM

I just noticed I'm using mawk.

Tried it with awk: Works as well.

Code:

awk --version
GNU Awk 3.1.6

Which distro and which awk version are you using?

shakes82 07-07-2010 04:52 AM

@colucix: The fields are separated by a space. I think I am getting sym02 because it is taking 02 from 20090302 and adding sym before.
I am using fedora 13. awk version: 3.1.7

@druuna: I am using fedora 13. awk version: 3.1.7
I tried it again exactly as:
awk -F"[ :]" '{ print $1 $2 $3 $4","$5","$6",SYM" }' infile
with the same spacing and everything but I am getting the following output:
,sym0302180203,1.5,1.6
I think it is adding ,sym in front, and that is why 2009 is replaced by ,sym

Can you please suggest what I can do to make it right?

Thank you

druuna 07-07-2010 04:57 AM

Hi,

Could you post a relevant example? As colucix already said, it looks like your input file is not the same as the example you posted in post #1.

The awk command I posted works with the example you posted (shown by me and confirmed by colucix), so the command itself is not the problem.

I can think of one thing that could be causing this: Is the infile a unix or a dos file?

shakes82 07-07-2010 05:00 AM

Following are a few lines of the data:

20090102 18:03:03 1.280550 1.281550
20090102 18:23:20 1.280570 1.281570
20090102 18:23:24 1.280270 1.281270
20090102 18:53:53 1.279970 1.280970
20090102 18:54:10 1.279810 1.280810

It is a *.txt (text) file.

druuna 07-07-2010 05:04 AM

Hi,

Unix/Linux works differently than Windows. The fact that the file has a .txt extension doesn't say anything about its contents.

What does the following command show you: file infile.txt

shakes82 07-07-2010 05:06 AM

It shows me:
fx.txt: ASCII text, with CRLF line terminators

Thanks

druuna 07-07-2010 05:15 AM

Hi,

That is a file with dos/windows terminators (CRLF) and it is also the reason why it doesn't work.

Most (all?) unix/linux tools do not work too well with dos/windows files.

Here is a link that gives a few examples of how to change a dos file to a unix file (do make a backup of the original before trying them out!!).

HowTo: UNIX / Linux Convert DOS Newlines CR-LF to Unix/Linux Format
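
To spell out what happens: with CRLF endings the carriage return stays attached to the last field, so awk prints it just before ",SYM". The terminal then jumps back to the start of the line and ",SYM" overwrites the first four characters. Below is a minimal sketch of one common fix, assuming hypothetical file names dos_sample.txt and unix_sample.txt; it is not necessarily one of the methods from the linked article:

```shell
# Hypothetical sample file with DOS (CRLF) line endings
printf '20090302 18:02:03 1.5 1.6\r\n' > dos_sample.txt

# Delete every carriage return and write the result to a new file
tr -d '\r' < dos_sample.txt > unix_sample.txt

# The original awk command now sees clean fields
awk -F"[ :]" '{ print $1 $2 $3 $4","$5","$6",SYM" }' unix_sample.txt
# prints 20090302180203,1.5,1.6,SYM
```

The original file is left untouched, so it doubles as the backup.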

Hope this helps.

shakes82 07-07-2010 05:16 AM

Thanks Druuna. I will try it and let you know how it goes. Thanks again.

druuna 07-07-2010 05:18 AM

You're welcome :)

shakes82 07-07-2010 05:19 AM

Thanks a lot Druuna. It worked perfectly. I appreciate your help. Thanks for your time.

shakes82 07-07-2010 05:31 AM

Since I am new to unix, can you please suggest how I can save the changes to the file after using the command? Thanks

druuna 07-07-2010 05:40 AM

Hi,

Assuming you mean the awk command:

awk -F"[ :]" '{ print $1 $2 $3 $4","$5","$6",SYM" }' infile.txt > newfile.txt

This leaves the original (infile.txt) as is and puts all changed entries in newfile.txt.

If you do need the output to be in the original file (after checking if all is ok with the above given command): mv newfile.txt infile.txt

BTW: Do not use the same name for the output and input file (i.e. awk '{ ... }' infile > infile). The shell empties infile before awk gets to read it, so you will end up with an empty file!!
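
A minimal sketch of the two steps chained together, assuming a hypothetical one-line infile.txt. The && makes sure the original is only replaced if awk finished without an error:

```shell
# Hypothetical one-line input file
printf '20090302 18:02:03 1.5 1.6\n' > infile.txt

# Write to a second file first; replace the original only if awk succeeded
awk -F"[ :]" '{ print $1 $2 $3 $4","$5","$6",SYM" }' infile.txt > newfile.txt &&
    mv newfile.txt infile.txt

cat infile.txt
# prints 20090302180203,1.5,1.6,SYM
```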

Hope this helps.

grail 07-07-2010 06:17 AM

You will need to redirect the output of your command to another file:
Code:

awk '{ ... }' file1 > file2

shakes82 07-07-2010 08:36 AM

Thank you Druuna and Grail.

grail 07-07-2010 08:46 AM

No probs ... don't forget to mark as SOLVED :)

shakes82 07-07-2010 05:47 PM

Hi Druuna, grail, colucix ....I need some more help....
They gave me some more data in a slightly different format:
01/02/09 18:03:03 1.280550 1.281550
01/02/09 18:23:20 1.280570 1.281570
01/02/09 18:23:24 1.280270 1.281270
01/02/09 18:53:53 1.279970 1.280970
01/02/09 18:54:10 1.279810 1.280810
01/02/09 18:54:11 1.279780 1.280780
01/02/09 18:54:11 1.279770 1.280770
01/02/09 19:04:45 1.279500 1.280500
01/02/09 19:05:22 1.279500 1.280500
01/02/09 19:05:58 1.279500 1.280500

So now this data has the '/' character too in the first column. If I use:
awk -F"[ :]" '{ print $1 $2 $3 $4","$5","$6",SYM" }' infile.txt > newfile.txt
Then it only removes the ':' and not the '/' character. I tried something like:
awk -F"[ :,/]" '{ print $1 $2 $3 $4","$5","$6",SYM" }' infile.txt > newfile.txt, but it is not working.

Please let me know how I can get the above data in the following format:
090102180303,1.280550,1.281550,SYM

So in the first column I need to get rid of the '/' character and get the year first 09 then the month 01 and then the date 02. Then I need to remove both the separators, remove the space between first and second column. Eg.
01/02/09 18:03:03 1.280550 1.281550
to:
090102180303,1.280550,1.281550,SYM

Hoping to hear back soon. Thank you for your help.

pixellany 07-07-2010 05:57 PM

Code:

sed -e 's/\///g' -e 's/ //' -e 's/://g' -e 's/ /, /g' filename > newfilename

shakes82 07-07-2010 06:01 PM

Thanks pixellany, but I need to add "SYM" at the end too and change the year format in the beginning. Example:

01/02/09 18:03:03 1.280550 1.281550
to:
090102180303,1.280550,1.281550,SYM

pixellany 07-07-2010 06:05 PM

Code:

sed 's/$/, SYM/'
"$" means "at the end of the line"

or just another command string with "-e"

shakes82 07-07-2010 06:09 PM

Thanks again pixellany, but does this change the year from 01/02/09 to 090102 also?

pixellany 07-07-2010 06:09 PM

OOPs--I just saw that you wanted to change the order of the date terms--my code does not do that.

AWK is ideal for changing the order of something (just change the order of the print statements), but first you'd have to isolate the date string.
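
A minimal sketch of isolating and reordering the date with awk's split() function, assuming a hypothetical file dates.txt and the MM/DD/YY layout described above:

```shell
# Hypothetical sample line in the MM/DD/YY format
printf '01/02/09 18:03:03 1.280550 1.281550\n' > dates.txt

# split() cuts the first field on "/": parts[1]=MM, parts[2]=DD, parts[3]=YY
awk '{ split($1, parts, "/"); print parts[3] parts[1] parts[2] }' dates.txt
# prints 090102
```

This only handles the date column; the time and the remaining columns still need the separator changes discussed earlier.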

shakes82 07-07-2010 06:11 PM

Can you please suggest how exactly I can achieve that? I am very new to unix/linux but need to get this done soon.
Thank you for your help and time.

pixellany 07-07-2010 06:31 PM

Code:

awk -F" |/" '{print $3 $1 $2 $4" "$5" "$6}' filename | sed -e 's/://g' -e 's/ /, /g' -e 's/$/, SYM/' > newfilename

shakes82 07-07-2010 06:45 PM

Thanks a lot pixellany, it works well but there is just a slight problem. I am getting a space after each ','. So I am getting something like:
090102180303, 1.280550, 1.281550, SYM
instead of:
090102180303,1.280550,1.281550,SYM
Can you please suggest how I can take care of this issue?

Thanks again.

colucix 07-07-2010 06:48 PM

...by means of awk only:
Code:

awk -F"[ /]" 'BEGIN{OFS=","}{gsub(/:/,"",$4); print $3 $1 $2 $4, $5, $6, "SYM"}' file
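
The same command written out over several lines with comments, as a sketch for following what each part does. The file name slash_dates.txt and its single line are only an assumed sample:

```shell
# Hypothetical sample line in the MM/DD/YY format
printf '01/02/09 18:03:03 1.280550 1.281550\n' > slash_dates.txt

awk -F"[ /]" 'BEGIN { OFS = "," }     # print puts a comma between its arguments
{
    # Fields: $1=MM $2=DD $3=YY $4=HH:MM:SS, $5 and $6 = the two numeric columns
    gsub(/:/, "", $4)                 # remove the colons from the time field only
    print $3 $1 $2 $4, $5, $6, "SYM"  # YYMMDDHHMMSS first, then the rest
}' slash_dates.txt
# prints 090102180303,1.280550,1.281550,SYM
```

Items separated by a space in the print statement are glued together, while items separated by a comma are joined with OFS.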

grail 07-07-2010 06:54 PM

Hi shakes ... whilst it is obvious that you are new to the likes of awk and sed, you really are supposed to try and learn from the information given, otherwise you will just be back when you get stuck again.

Anyhoo, if you look at the sed provided by pixellany you will see a space after each comma that is replacing items like a space or end of line.
Personally I would just leave it all in awk:
Code:

awk -F" |/" '{print $3 $1 $2 $4","$5","$6",SYM"}' filename > newfilename
As you can see this was not that different from the first solution presented to you.

Good luck

Edit: My bad ... go with colucix's solution as mine doesn't replace the ':'

colucix 07-07-2010 06:57 PM

Quote:

Originally Posted by shakes82 (Post 4026776)
I am getting a space after each ','

There is a typo in pixellany's code:
Code:

-e 's/ /, /g'
should be
Code:

-e 's/ /,/g'
Anyway, at this point you could try to figure it out, if you understand what these sed and awk statements do exactly. A little advice: always pay attention to the suggested code before actually running it, and read the related documentation carefully to fully understand the algorithms. You can safely trust code posted by pixellany, druuna or me and many many other members here at LQ... but you never know! :)

Edit: ...and many many other members... grail being one of them. ;)
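
For reference, a sketch of the full pipeline with the spaces removed, run against an assumed one-line sample.txt. Note that the last sed expression, 's/$/, SYM/', carries the same extra space, so it is changed here as well:

```shell
# Hypothetical sample line in the MM/DD/YY format
printf '01/02/09 18:03:03 1.280550 1.281550\n' > sample.txt

# awk reorders the date; sed strips the colons, turns spaces into commas, appends ,SYM
awk -F" |/" '{print $3 $1 $2 $4" "$5" "$6}' sample.txt |
    sed -e 's/://g' -e 's/ /,/g' -e 's/$/,SYM/'
# prints 090102180303,1.280550,1.281550,SYM
```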

shakes82 07-07-2010 07:08 PM

Thanks grail. I understand what you are saying and I will learn this soon. I just needed to complete this work, that is why I needed help. Very soon I will be answering questions on this forum.
I tried colucix command but it is giving me an error:
awk: ^syntax error. I copied the exact command. Please help. Thanks

shakes82 07-07-2010 07:14 PM

Thanks everyone for your time and help. I have got the solution. Colucix, as I told grail....very soon you will find me answering queries from newbies. I just needed to get this work done soon. Thanks again.

grail 07-07-2010 07:49 PM

Well just something to cut your teeth on later, the following should be good for both formats you have shown us:
Code:

awk -F"[ /]" '{gsub(/:/,"");if(NF==4){var=$1;d=2}else var="20"$3 $1 $2;print var $(4-d),$(5-d),$(6-d),"SYM"}' OFS="," inputfile > outputfile
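
A sketch of the same idea written out with comments and tried on both formats at once. The file name mixed.txt, the variable names, and the per-line reset of the column offset are assumptions added for illustration. Like the one-liner, it assumes every two-digit year belongs to the 2000s:

```shell
# Hypothetical file containing one line of each format
printf '20090302 18:02:03 1.5 1.6\n01/02/09 18:03:03 1.280550 1.281550\n' > mixed.txt

awk -F"[ /]" -v OFS="," '
{
    gsub(/:/, "")                 # changing $0 makes awk split the fields again
    if (NF == 4) {                # YYYYMMDD HHMMSS value value
        stamp = $1 $2
        first = 3
    } else {                      # MM DD YY HHMMSS value value
        stamp = "20" $3 $1 $2 $4
        first = 5
    }
    print stamp, $first, $(first + 1), "SYM"
}' mixed.txt
# prints 20090302180203,1.5,1.6,SYM
# prints 20090102180303,1.280550,1.281550,SYM
```

Both formats come out with a four-digit year, which differs from the 090102... output asked for earlier.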

shakes82 07-07-2010 11:12 PM

thanks grail!!!!!


All times are GMT -5.