Help using awk, sed and grep
Hi. I have some financial data in the following format:
20090302 18:02:03 1.5 1.6
I want to change this to the following format using grep, awk and sed:
20090302180203,1.5,1.6,SYM
Please suggest how I can use the commands to get this formatting. Thank you. |
Hi,
Something like this? awk -F"[ :]" '{ print $1 $2 $3 $4","$5","$6",SYM" }' infile Hope this helps. |
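For reference, a minimal sketch of what that one-liner does with the sample line from the first post. The input is fed in with printf instead of being read from a file, and the variable name `result` is only there for illustration.

```shell
# Split on either a space or a colon, so "20090302 18:02:03 1.5 1.6"
# becomes six fields: 20090302, 18, 02, 03, 1.5 and 1.6.
result=$(printf '20090302 18:02:03 1.5 1.6\n' |
  awk -F"[ :]" '{ print $1 $2 $3 $4 "," $5 "," $6 ",SYM" }')
echo "$result"
```

The first four fields are printed back to back, which is what glues the date and time together.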
Thanks Druuna but it didn't work. I am getting the following output with your suggestion:
,,,sym02 180203 1.5 1.6 Any other suggestions? Thanks for your time. |
Quote:
Anyway, I suspect the fields in the input file are not separated by blank spaces. Maybe tabs? |
Hi,
It works on my side. Code:
$ cat infile Tried it with awk: Works as well. Code:
awk --version |
@colucix: The fields are separated by a space. I think I am getting sym02 because it is taking 02 from 20090302 and adding sym before.
I am using Fedora 13. awk version: 3.1.7
@druuna: I tried it again exactly as:
awk -F"[ :]" '{ print $1 $2 $3 $4","$5","$6",SYM" }' infile
with the same spacing and everything, but I am getting the following output:
,sym0302180203,1.5,1.6
I think it is adding ,sym in front, and that is why 2009 is replaced by ,sym. Can you please suggest what I can do to make it right? Thank you |
Hi,
Could you post a relevant example? Like colucix already said it looks like your input file is not the same as your example posted in post #1. The problem is not the awk command I posted with the example posted by you (shown by me and confirmed by colucix). I can come up with one thing that could be causing this: Is the infile a unix or a dos file? |
Following are a few lines of the data:
20090102 18:03:03 1.280550 1.281550
20090102 18:23:20 1.280570 1.281570
20090102 18:23:24 1.280270 1.281270
20090102 18:53:53 1.279970 1.280970
20090102 18:54:10 1.279810 1.280810
It is a *.txt (text) file. |
Hi,
Unix/Linux works differently than Windows. The fact that the file has a .txt extension doesn't say anything at all. What does the following command show you:
file infile.txt |
It shows me:
fx.txt: ASCII text, with CRLF line terminators Thanks |
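A small sketch of why CRLF endings can produce that scrambled output, assuming a sample line built with printf. The carriage return stays attached to the last field, so a terminal jumps back to the start of the line before ",SYM" is printed and overwrites the first characters. Piping the result through od -c makes the hidden character visible.

```shell
# Build one line with a DOS ending (\r\n) and run the awk command on it.
out=$(printf '20090302 18:02:03 1.5 1.6\r\n' |
  awk -F"[ :]" '{ print $1 $2 $3 $4 "," $5 "," $6 ",SYM" }')

# od -c prints control characters symbolically; the \r shows up right
# after 1.6 and before ,SYM.
printf '%s\n' "$out" | od -c
```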
Hi,
That is a file with dos/windows line terminators (CRLF), and that is the reason why it doesn't work. Most (all?) unix/linux tools do not work too well with dos/windows files. Here is a link that gives a few examples of how to change a dos file to a unix file (do make a backup of the original before trying them out!!). HowTo: UNIX / Linux Convert DOS Newlines CR-LF to Unix/Linux Format Hope this helps. |
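Two common ways to do the conversion, as a sketch. The file names dos.txt and unix.txt are placeholders created in a temporary directory, not the real data files. tr -d '\r' removes every carriage return; the awk variant strips one trailing carriage return per line inside the same command that does the reformatting.

```shell
tmpdir=$(mktemp -d)
printf '20090102 18:03:03 1.280550 1.281550\r\n' > "$tmpdir/dos.txt"

# Option 1: delete all carriage returns, writing a new file.
tr -d '\r' < "$tmpdir/dos.txt" > "$tmpdir/unix.txt"

# Option 2: leave the file alone and drop the trailing \r inside awk.
converted=$(awk -F"[ :]" '{ sub(/\r$/, ""); print $1 $2 $3 $4 "," $5 "," $6 ",SYM" }' "$tmpdir/dos.txt")
echo "$converted"
```

Running file on the converted copy should then report plain ASCII text without the CRLF note.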
Thanks Druuna. I will try it and let you know how it goes. Thanks again.
|
You're welcome :)
|
Thanks a lot Druuna. It worked perfectly. I appreciate your help. Thanks for your time.
|
Since I am new with unix, can you please suggest how I can save the changes to the file after using the command. Thanks
|
Hi,
Assuming you mean the awk command: awk -F"[ :]" '{ print $1 $2 $3 $4","$5","$6",SYM" }' infile.txt > newfile.txt This leaves the original (infile.txt) as is and puts all changed entries in newfile.txt. If you do need the output to be in the original file (after checking if all is ok with the above given command): mv newfile.txt infile.txt BTW: Do not use the same name for the output and input file (i.e. awk '{ ... }' infile > infile), you will end up with an empty file!! Hope this helps. |
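The redirect-then-rename steps can be chained so that the original is only replaced if awk succeeded. A sketch, using throwaway files made with mktemp in place of infile.txt and newfile.txt:

```shell
infile=$(mktemp)
printf '20090102 18:03:03 1.280550 1.281550\n' > "$infile"

# Write to a separate temporary file first; redirecting straight back to
# "$infile" would truncate it before awk reads anything.
tmp=$(mktemp)
awk -F"[ :]" '{ print $1 $2 $3 $4 "," $5 "," $6 ",SYM" }' "$infile" > "$tmp" &&
  mv "$tmp" "$infile"

cat "$infile"
```

Keeping a backup copy of the original before the mv is still a good idea while testing.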
You will need to redirect the output of your command to another file:
Code:
awk '' file1 > file2 |
Thank you Druuna and Grail.
|
No probs ... don't forget to mark as SOLVED :)
|
Hi Druuna, grail, colucix ....I need some more help....
They gave me some more data in a slightly different format:
01/02/09 18:03:03 1.280550 1.281550
01/02/09 18:23:20 1.280570 1.281570
01/02/09 18:23:24 1.280270 1.281270
01/02/09 18:53:53 1.279970 1.280970
01/02/09 18:54:10 1.279810 1.280810
01/02/09 18:54:11 1.279780 1.280780
01/02/09 18:54:11 1.279770 1.280770
01/02/09 19:04:45 1.279500 1.280500
01/02/09 19:05:22 1.279500 1.280500
01/02/09 19:05:58 1.279500 1.280500
So now this data has the '/' character too in the first column. If I use:
awk -F"[ :]" '{ print $1 $2 $3 $4","$5","$6",SYM" }' infile.txt > newfile.txt
then it only removes the ':' and not the '/' character. I tried something like:
awk -F"[ :,/]" '{ print $1 $2 $3 $4","$5","$6",SYM" }' infile.txt > newfile.txt
but it is not working. Please let me know how I can get the above data in the following format:
090102180303,1.280550,1.281550,SYM
So in the first column I need to get rid of the '/' character and put the year first (09), then the month (01) and then the day (02). Then I need to remove both separators and the space between the first and second columns. E.g.
01/02/09 18:03:03 1.280550 1.281550
to:
090102180303,1.280550,1.281550,SYM
Hoping to hear back soon. Thank you for your help. |
Code:
sed -e 's/\///g' -e 's/ //' -e 's/://g' -e 's/ /, /g' filename > newfilename |
Thanks pixellany, but I need to add "SYM" at the end too and change the year format at the beginning. Example:
01/02/09 18:03:03 1.280550 1.281550
to:
090102180303,1.280550,1.281550,SYM |
Code:
sed 's/$/, SYM/' or just another command string with "-e" |
Thanks again pixellany, but does this change the year from 01/02/09 to 090102 also?
|
OOPs--I just saw that you wanted to change the order of the date terms--my code does not do that.
AWK is ideal for changing the order of something (just change the order of the print statements), but first you'd have to isolate the date string. |
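One way to isolate the date string, as a sketch: keep awk's default whitespace splitting and use split() on the first two fields. The array names d and t are arbitrary.

```shell
result=$(printf '01/02/09 18:03:03 1.280550 1.281550\n' |
  awk '{
    split($1, d, "/")    # d[1]=month, d[2]=day, d[3]=year
    split($2, t, ":")    # t[1]=hour, t[2]=minute, t[3]=second
    print d[3] d[1] d[2] t[1] t[2] t[3] "," $3 "," $4 ",SYM"
  }')
echo "$result"
```

Changing the order of d[3], d[1] and d[2] in the print statement is all it takes to get a different date layout.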
Can you please suggest how I can achieve that exactly. I am very new with unix/linux but need to get this done soon.
Thank you for your help and time. |
Code:
awk -F" |/" '{print $3 $1 $2 $4" "$5" "$6}' filename | sed -e 's/://g' -e 's/ /, /g' -e 's/$/, SYM/' > newfilename |
Thanks a lot pixellany, it works well but there is just a slight problem. I am getting a space after each ','. So I am getting something like:
090102180303, 1.280550, 1.281550, SYM
instead of:
090102180303,1.280550,1.281550,SYM
Can you please suggest how I can take care of this issue? Thanks again. |
...by means of awk only:
Code:
awk -F"[ /]" 'BEGIN{OFS=","}{gsub(/:/,"",$4); print $3 $1 $2 $4, $5, $6, "SYM"}' file |
Hi shakes ... whilst it is obvious that you are new to the likes of awk and sed, you really are supposed to try and learn from the information given, otherwise you will just be back when you get stuck again. Anyhoo, if you look at the sed provided by pixellany you will see a space after each comma that is replacing items like a space or end of line. Personally I would just leave it all in awk: Code:
awk -F" |/" '{print $3 $1 $2 $4","$5","$6",SYM"}' filename > newfilename
Good luck
Edit: My bad ... go with colucix's solution as mine doesn't replace the ':' |
Quote:
Code:
-e 's/ /, /g'
Code:
-e 's/ /,/g'
Edit: ...and many many other members... grail being one of them. ;) |
Thanks grail. I understand what you are saying and I will learn this soon. I just needed to complete this work, that is why I needed help. Very soon I will be answering questions on this forum.
I tried colucix's command but it is giving me an error: awk: ^syntax error. I copied the exact command. Please help. Thanks |
Thanks everyone for your time and help. I have got the solution. Colucix, as I told grail....very soon you will find me answering queries from newbies. I just needed to get this work done soon. Thanks again.
|
Well just something to cut your teeth on later, the following should be good for both formats you have shown us:
Code:
awk -F"[ /]" '{ gsub(/:/,""); if(NF==4){ var=$1; d=2 } else { var="20"$3 $1 $2; d=0 } print var $(4-d),$(5-d),$(6-d),"SYM" }' OFS="," inputfile > outputfile |
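The same idea spelled out over several lines, as a sketch, not a drop-in replacement. It assumes the only two layouts are the ones shown in this thread and that two-digit years belong to 20xx.

```shell
result=$(printf '20090102 18:03:03 1.280550 1.281550\n01/02/09 18:03:03 1.280550 1.281550\n' |
  awk -F"[ /]" -v OFS="," '{
    gsub(/:/, "")                 # strip colons; awk re-splits the fields
    if (NF == 4) {                # YYYYMMDD layout: date is already in order
      print $1 $2, $3, $4, "SYM"
    } else {                      # MM/DD/YY layout: reorder to YYYYMMDD
      print "20" $3 $1 $2 $4, $5, $6, "SYM"
    }
  }')
echo "$result"
```

Branching on NF per line means a file that mixes both layouts is handled line by line.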
thanks grail!!!!!
|