LinuxQuestions.org
Share your knowledge at the LQ Wiki.
Go Back   LinuxQuestions.org > Forums > Linux Forums > Linux - Newbie
User Name
Password
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

Notices

Reply
 
Search this Thread
Old 12-13-2011, 09:48 PM   #1
ivn
LQ Newbie
 
Registered: Sep 2011
Posts: 10

Rep: Reputation: Disabled
How-to cut specific text from cell and paste into new column


I have a CSV file with data in a column that contains the product name, size, color (but sometimes it may or may not contain the size and or color). I need to get the size and color (if available) into their own column.

I'm using a shell script to manipulate the file. I tried using sed and a file that consists of lines like: "s/\bBLACK\b/,Black,/g" in order to create a new column, but that screws up the file because not all items contain color and/or size information.

Right now I added two empty columns to the original file. If I had a list of colors and sizes (which I do have), whats the best way to cut or move the color matches into one of the new columns, and the size matches into the other?

What is the best approach, any recommendations?

Sample cell:
HELMET GP-TECH BLACK XL

Sample Result:
HELMET GP-TECH,BLACK,XL
 
Old 12-13-2011, 09:56 PM   #2
Dark_Helmet
Senior Member
 
Registered: Jan 2003
Posts: 2,786

Rep: Reputation: 369Reputation: 369Reputation: 369Reputation: 369
You may need to be more specific about your data. For example, this sed script works with the example data you provided:
Code:
user@localhost:~$ echo "HELMET GP-TECH BLACK XL" | sed 's@\(.*\) \(.*\) \(.*\)@\1,\2,\3@'
HELMET GP-TECH,BLACK,XL
user@localhost:~$
The limitations are:
1. The line must contain at least two spaces
2. The last two spaces are used to separate the color and size information

This obviously will not work if your color or size options contain spaces themselves.

EDIT:
In other words, it would be useful to see some real, full-fledged sample data (any confidential info "scrubbed" if necessary). Otherwise, we can't test a solution on our end to make sure it matches what you need.

EDIT2:
Just as another example, this might work (which shows additional data that could be part of the CSV):
Code:
user@localhost:~$ echo "HELMET GP-TECH BLACK XL,dummy data 1,dummy data 2" | sed 's@\([^,]*\) \([^,]*\) \([^,]*\)@\1,\2,\3@'
HELMET GP-TECH,BLACK,XL,dummy data 1,dummy data 2
user@localhost:~$

Last edited by Dark_Helmet; 12-13-2011 at 10:04 PM.
 
Old 12-14-2011, 08:35 PM   #3
ivn
LQ Newbie
 
Registered: Sep 2011
Posts: 10

Original Poster
Rep: Reputation: Disabled
Here is some sample data. The problem is that the column I'm working with, has no set pattern. Colors and Sizes are not necessarily in the same order, and colors can be combos (blk/wht) and not necessarily spelled the same, some items have sizes only and no colors, and so on and so forth. I'd also like to rename the colors and sizes to something standard. So like I mentioned before, I have a file with all the possible color and size variables that I'm looking for.

Thanks for your help!

Code:
dummydata1,dummydata2,HELMET GP-TECH BLACK S,dummydata3,dummydata4
dummydata1,dummydata2,HELMET GP-TECH BLACK M,dummydata3,dummydata4
dummydata1,dummydata2,HELMET GP-TECH BLACK L,dummydata3,dummydata4
dummydata1,dummydata2,HELMET GP-TECH 5-CONT SM,dummydata3,dummydata4
dummydata1,dummydata2,HELMET GP-TECH 5-CONT MD,dummydata3,dummydata4
dummydata1,dummydata2,HELMET GP-TECH 5-CONT LG,dummydata3,dummydata4
dummydata1,dummydata2,T2 LEVER BUCKLE-PAIR,dummydata3,dummydata4
dummydata1,dummydata2,TECH 4 BUCKLE SET  BLACK,dummydata3,dummydata4
dummydata1,dummydata2,TECH 4 BUCKLE SET BLK/WHT,dummydata3,dummydata4
dummydata1,dummydata2,TECH6S BUCKLE SET-BLACK,dummydata3,dummydata4
dummydata1,dummydata2,TECH6S BUCKLE SET-WHITE,dummydata3,dummydata4
dummydata1,dummydata2,T-6 BUCKL/STRAP SET BK/BK,dummydata3,dummydata4
dummydata1,dummydata2,T-6 BUCKL/STRAP SET BK/WT,dummydata3,dummydata4
dummydata1,dummydata2,ST-1 GLV.BLK/BLK L,dummydata3,dummydata4
dummydata1,dummydata2,ST-1 GLV.BLK/BLK M,dummydata3,dummydata4
dummydata1,dummydata2,ST-1 GLV.BLK/BLK S,dummydata3,dummydata4
dummydata1,dummydata2,ST-1 GLV.BLK/BLK XL,dummydata3,dummydata4
dummydata1,dummydata2,ST-1 GLV.BLK/BLK XXL,dummydata3,dummydata4
Here is some data from the other columns:
Code:
0101-1234,HELMET GP-TECH BLACK M,AGV SPA - ITALY,AGV,59995,55,0
0101-1235,HELMET GP-TECH BLACK L,AGV SPA - ITALY,AGV,59995,55,0
0101-1236,HELMET GP-TECH BLACK XL,AGV SPA - ITALY,AGV,59995,55,7
0101-1237,HELMET GP-TECH BLACK XXL,AGV SPA - ITALY,AGV,59995,54,5
0101-1238,HELMET GP-TECH BLACK 3XL,AGV SPA - ITALY,AGV,59995,54,0

Last edited by ivn; 12-14-2011 at 08:54 PM.
 
Old 12-14-2011, 09:12 PM   #4
Dark_Helmet
Senior Member
 
Registered: Jan 2003
Posts: 2,786

Rep: Reputation: 369Reputation: 369Reputation: 369Reputation: 369
If I understand what you're describing, then the typical automated text tools (grep, cut, sed, etc.) won't handle your situation. You'll probably need something like (g)awk.

So, just to make sure, you're saying:
1. "Colors and Sizes are not necessarily in the same order" --> meaning line 5 might need column 2 placed after column 4, but line 8 might need no column adjustment
2. "colors can be combos (blk/wht) and not necessarily spelled the same" --> meaning there is no consistency among column "values" (which would prevent an easy identification that a specific cell was a size/color/something else)
3. "some items have sizes only and no colors" --> meaning some lines might require adding empty "placeholders" to have each line contain a consistent number of cells

So, if that's correct, then I have good news and bad news.

The bad news is:
(g)awk is probably the best approach if you want to handle all the manipulations in one shot. The reason that is bad news is because (1) I'm not that experienced with it and wouldn't be much help and (2) that there's a bit of a learning curve especially if you come to it from a non-programming background. For reference, here's The GNU Awk User's Guide. It's a "light" read... Though, there are some (g)awk gurus lurking the forums.

The good news is:
I can help you tackle it piece-meal if you like. Each component above should be possible to fix one-at-a-time. Doing so might take a while going back-and-forth. So we might have to take the discussion off the forum an over to email.
 
Old 12-14-2011, 10:49 PM   #5
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,485

Rep: Reputation: 1890Reputation: 1890Reputation: 1890Reputation: 1890Reputation: 1890Reputation: 1890Reputation: 1890Reputation: 1890Reputation: 1890Reputation: 1890Reputation: 1890
Using the data from post #3, would you please also provide (I realise manually) the output of that data?

Also, are you able to confirm if the section of data in the csv we are looking at is always in the same column?
I ask as the 2 examples in post #3 have it in differing columns, ie first data in column 3 but second in column 2.

I agree with DarkH that awk is probably one of the easier solutions. If we use first data in post #3 we would be looking
at something like:
Code:
awk -F, '{n=split($3, arr, " ")}' file.csv
On it's own this does nothing but it will place all the items we are interested in into an array called 'arr' which then can
be easily worked on to see what is in each item of the array.

I will await feedback to provide more to the solution
 
Old 12-17-2011, 08:53 PM   #6
ivn
LQ Newbie
 
Registered: Sep 2011
Posts: 10

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by grail View Post
Using the data from post #3, would you please also provide (I realise manually) the output of that data?
I need to add two columns, one for color, and another for size. In this example, if I find a color or size in the 3rd column, move it to the respective column. If nothing is found/matched, the two columns would remain with no data in them.
Code:
dummydata1,dummydata2,HELMET GP-TECH,BLACK,S,dummydata3,dummydata4
dummydata1,dummydata2,HELMET GP-TECH,BLACK,M,dummydata3,dummydata4
dummydata1,dummydata2,HELMET GP-TECH,BLACK,L,dummydata3,dummydata4
dummydata1,dummydata2,HELMET GP-TECH 5-CONT,,SM,dummydata3,dummydata4
dummydata1,dummydata2,HELMET GP-TECH 5-CONT,,MD,dummydata3,dummydata4
dummydata1,dummydata2,HELMET GP-TECH 5-CONT,,LG,dummydata3,dummydata4
dummydata1,dummydata2,T2 LEVER BUCKLE-PAIR,,,dummydata3,dummydata4
dummydata1,dummydata2,TECH 4 BUCKLE SET,BLACK,,dummydata3,dummydata4
dummydata1,dummydata2,TECH 4 BUCKLE SET,BLK/WHT,,dummydata3,dummydata4
dummydata1,dummydata2,TECH6S BUCKLE SET-,BLACK,,dummydata3,dummydata4
dummydata1,dummydata2,TECH6S BUCKLE SET-,WHITE,,dummydata3,dummydata4
dummydata1,dummydata2,T-6 BUCKL/STRAP SET,BK/BK,,dummydata3,dummydata4
dummydata1,dummydata2,T-6 BUCKL/STRAP SET,BK/WT,,dummydata3,dummydata4
dummydata1,dummydata2,ST-1 GLV.,BLK/BLK,L,dummydata3,dummydata4
dummydata1,dummydata2,ST-1 GLV.,BLK/BLK,M,dummydata3,dummydata4
dummydata1,dummydata2,ST-1 GLV.,BLK/BLK,S,dummydata3,dummydata4
dummydata1,dummydata2,ST-1 GLV.,BLK/BLK,XL,dummydata3,dummydata4
dummydata1,dummydata2,ST-1 GLV.,BLK/BLK,XXL,dummydata3,dummydata4
Quote:
Originally Posted by grail View Post
Also, are you able to confirm if the section of data in the csv we are looking at is always in the same column?
I ask as the 2 examples in post #3 have it in differing columns, ie first data in column 3 but second in column 2.
Yes, data will always be in the same column; sorry - wasn't consistent.

Quote:
Originally Posted by grail View Post
I agree with DarkH that awk is probably one of the easier solutions. If we use first data in post #3 we would be looking
at something like:
Code:
awk -F, '{n=split($3, arr, " ")}' file.csv
On it's own this does nothing but it will place all the items we are interested in into an array called 'arr' which then can
be easily worked on to see what is in each item of the array.

I will await feedback to provide more to the solution
What would be the next step(s)?
 
  


Reply


Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
[SOLVED] Subtracting the value of a specific cell to the whole column udiubu Programming 13 11-11-2011 10:50 AM
[SOLVED] Bash command to 'cut' text into another text file & modifying text. velgasius Programming 4 10-17-2011 04:55 AM
Read text file column by column RVF16 Programming 11 05-31-2009 07:16 AM
Cut and paste from text editors causes layout problems redandwhitestripes Linux - General 4 03-23-2009 11:01 PM
cut column ust Linux - Newbie 10 01-09-2008 03:28 AM


All times are GMT -5. The time now is 06:53 AM.

Main Menu
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
identi.ca: @linuxquestions
Facebook: linuxquestions Google+: linuxquestions
Open Source Consulting | Domain Registration