LinuxQuestions.org

CSV | Text manipulation (https://www.linuxquestions.org/questions/programming-9/csv-%7C-text-manipulation-659220/)

burschik 07-31-2008 02:45 AM

I agree that sed is cumbersome for large tasks, but this is a very small task. And the sed solution does seem to be quite a bit more concise than the others.

radoulov 07-31-2008 02:46 AM

Quote:

Originally Posted by burschik (Post 3231754)
I agree that sed is cumbersome for large tasks, but this is a very small task. And the sed solution does seem to be quite a bit more concise than the others.


Sure, but consider what happens if you have to add more patterns like this ...

Quote:

If field starts with ...

lmedland 07-31-2008 02:56 AM

Quote:

Originally Posted by radoulov (Post 3230799)
This is for GNU Awk (with other awks you should escape the newlines inside the ternary operator):

Code:

awk>new.csv -F', *' 'BEGIN {
  n = split("N C L", t, OFS)
  while (++i <= n) tt[t[i]] = sprintf("%02d", i)
  c = "01"
  }
NR == 1 { $(NF + 1) = "New code"; print; next }
{ $(NF + 1) = $NF ~ /^[NCL].*/ ?
    tt[substr($NF, 1, 1)] substr($NF, 2, 3) c :
      $NF }
1'  OFS=', ' filename


Sorry, for some reason I missed this post!!!

I tried this, but it puts a newline in front of the new field. How do I correct this?

Here is the output:

Code:

Patient No, Balance, Payor ID

 ,New code
8388, 13, NBUP

, 01BUP01
8526, 315, 8526

, 8526

8550, 464.65, NBUP

, 01BUP01

Thank you

radoulov 07-31-2008 03:14 AM

Hm, is this MS Windows?
Try changing the FS:

from:

Code:

-F', *'
to

Code:

-F', '

gnashley 07-31-2008 03:25 AM

This whole exercise looks a lot like homework to me...

lmedland 07-31-2008 03:26 AM

Quote:

Originally Posted by radoulov (Post 3231777)
Hm, is this MS Windows?
Try changing the FS:

from:

Code:

-F', *'
to

Code:

-F', '

Nope, Ubuntu. I'll give it a go, thanks again.

lmedland 07-31-2008 03:42 AM

Quote:

Originally Posted by gnashley (Post 3231786)
This whole exercise looks a lot like homework to me...

I can assure you it's not. I'm 27 and work in an IT department of just one: me, doing everything!

I needed to manipulate some data extracts from legacy systems as we can't afford to do a data cleanse.

lmedland 07-31-2008 03:44 AM

Quote:

Originally Posted by radoulov (Post 3231777)
Hm, is this MS Windows?
Try changing the FS:

from:

Code:

-F', *'
to

Code:

-F', '

Sorry, my mistake: the original CSV file was generated by Excel, so you're right, it was Windows.

I have since loaded it into OpenOffice and exported again and the routine above works. Thank you.
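
For anyone who hits the same symptom: files written by Excel on Windows normally end each line with CR LF, and awk on Linux keeps the CR as part of the last field, so anything appended after it appears to start on a new line. Re-exporting works, but the carriage returns can also be stripped directly. A minimal sketch, assuming that is the only problem with the file (the function name strip_cr and the sample data are made up for illustration):

```shell
# Strip a trailing carriage return from every line (reads stdin or named files).
strip_cr() {
  awk '{ sub(/\r$/, "") } 1' "$@"
}

# Hypothetical sample with Windows (CRLF) line endings, as Excel would write it.
clean=$(printf 'Patient No, Balance, Payor ID\r\n8388, 13, NBUP\r\n' | strip_cr)
printf '%s\n' "$clean"
```

Something like `strip_cr original.csv > unix.csv` would then produce a file that the awk routine above can process without the stray line breaks.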

burschik 07-31-2008 04:33 AM

Quote:

Originally Posted by radoulov (Post 3231755)
Sure, but consider what happens if you have to add more patterns like this ...

Well, of course the thought had occurred to me, but the OP did not state that the number of patterns to process might increase. So I decided to go for a one-liner rather than a complete, modular, extensible solution. And for a one-liner, sed is perfectly valid, possibly even optimal.

radoulov 07-31-2008 05:01 AM

I agree and I must confess that usually I have the same approach (quick and dirty, but efficient).

ghostdog74 07-31-2008 09:56 AM

I like to work with fields instead of regexp, especially for structured data.
Code:

awk 'BEGIN{OFS=FS=", "
 c["N"]="01"
 c["C"]="02"
 c["L"]="03"
}
NR==1{print}
NR>1{
 print $1,$2, c[substr($3,1,1)] substr($3,2) "01"
}
' file

Tell me honestly, would you rather read this, or one long line of regexp? :)
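
One caveat with the field-based version: it assumes every data row has an N, C or L code in the third field. A row such as `8526, 315, 8526` from the sample output earlier in the thread would come out as `8526, 315, 52601`. A minimal sketch of a guarded variant, assuming such rows should pass through unchanged (the function name recode is hypothetical):

```shell
# recode: same lookup table as above, but rows whose third field does not
# start with N, C or L are printed unchanged (an assumption about the goal).
recode() {
  awk 'BEGIN { OFS = FS = ", "
    c["N"] = "01"; c["C"] = "02"; c["L"] = "03"
  }
  NR == 1 { print; next }            # header line passes through
  {
    p = substr($3, 1, 1)             # first character of the code
    if (p in c) $3 = c[p] substr($3, 2) "01"
    print
  }' "$@"
}

out=$(printf 'Patient No, Balance, Payor ID\n8388, 13, NBUP\n8526, 315, 8526\n' | recode)
printf '%s\n' "$out"
```

The `p in c` test checks for the key without creating an empty array entry, which a plain `c[p]` lookup would do.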

burschik 07-31-2008 10:44 AM

Quote:

Originally Posted by ghostdog74 (Post 3232164)
I like to work with fields instead of regexp, especially for structured data.
Code:

awk 'BEGIN{OFS=FS=", "
 c["N"]="01"
 c["C"]="02"
 c["L"]="03"
}
NR==1{print}
NR>1{
 print $1,$2, c[substr($3,1,1)] substr($3,2) "01"
}
' file

Tell me honestly, would you rather read this, or one long line of regexp? :)

I'm not denying your program is more readable and more maintainable. I merely object to your claim that sed is unsuitable for the task. Moreover, "one long line of regexp" could also be written like this:

Code:

s/, N\([A-Z]\+\)/\0, 01\101/
s/, C\([A-Z]\+\)/\0, 02\101/
s/, L\([A-Z]\+\)/\0, 03\101/

Now, that also looks pretty readable to me.
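
To run the three lines as they stand, they can be saved in a file (say recode.sed, a made-up name) and applied with `sed -f recode.sed file`. Note that `\+` and `\0` are not part of POSIX sed, so this relies on GNU sed. A sketch of a more portable spelling, assuming the codes consist of uppercase letters only, with `&` for the whole match and `[A-Z][A-Z]*` for one or more letters (the function name recode_sed is hypothetical):

```shell
# Portable spelling of the three substitutions: & instead of \0, and
# [A-Z][A-Z]* instead of the GNU-only \+ quantifier.
recode_sed() {
  sed -e 's/, N\([A-Z][A-Z]*\)/&, 01\101/' \
      -e 's/, C\([A-Z][A-Z]*\)/&, 02\101/' \
      -e 's/, L\([A-Z][A-Z]*\)/&, 03\101/' "$@"
}

out=$(printf '8388, 13, NBUP\n8526, 315, 8526\n' | recode_sed)
printf '%s\n' "$out"
```

In the replacement, `\101` is read as back-reference `\1` followed by the literal text `01`, since sed back-references are a single digit.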


All times are GMT -5.