LinuxQuestions.org
Old 05-01-2010, 02:11 AM   #1
cgcamal
Member
 
Registered: Nov 2008
Location: Tegucigalpa
Posts: 72

Rep: Reputation: 16
Change to capital first letter of every word over specific column


Hi guys,

I'm trying to change the first letter of every word in a specific column to upper case.

The source file is as follows:
Code:
PRODUCT No.|SCIENCE BOOKS|DESCRIPTION
Product 1|PHILOSOPHIAE NATURALIS PRINCIPIA MATHEMATICA (1687)|Blah blah blah
Product 2|Dialogue concerning the two chief world systems (1632)|blah blah blah
Product 3|De Revolutionibus Orbium Coelestium (1543)|blah blah blah
Product 4|the voyage of the beagle (1845)|blah blah blah
The desired output is:
Code:
PRODUCT No.|SCIENCE BOOKS|DESCRIPTION
Product 1|Philosophiae Naturalis Principia Mathematica (1687)|Blah blah blah
Product 2|Dialogue Concerning The Two Chief World Systems (1632)|blah blah blah
Product 3|De Revolutionibus Orbium Coelestium (1543)|blah blah blah
Product 4|The Voyage Of The Beagle (1845)|blah blah blah
I know that something similar can be done with sed using:
Code:
sed -e 's/.*/\L&/' -e 's/\<./\u&/g' file
where:
sed 's/.*/\L&/' file   --> changes the whole file content to lower case
sed 's/\<./\u&/g' file --> changes only the first letter of each word to upper case
But sed works over all columns, and I don't know how to restrict it to a specific column. For this sample file, the task would apply to column 2.
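
To illustrate the limitation, here is a minimal sketch that runs the same two expressions on one sample line. It assumes GNU sed, since \L, \u and \< are GNU extensions, and the function name titlecase_line is made up for this illustration.

```shell
# Hypothetical helper: title-case every word of every line read from stdin.
# Assumes GNU sed (\L, \u and \< are GNU extensions).
titlecase_line() {
    sed -e 's/.*/\L&/' -e 's/\<./\u&/g'
}

printf '%s\n' 'Product 4|the voyage of the beagle (1845)|blah blah blah' | titlecase_line
# prints: Product 4|The Voyage Of The Beagle (1845)|Blah Blah Blah
```

Note that column 3 is title-cased as well, because the expressions know nothing about the "|" separator.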

I've also tried awk, with the following script:
Code:
awk 'BEGIN{FS=OFS="|"} NR>1{$2=tolower($2);$2=gensub(/\<[A-Za-z]/,"X","g",$2)} {print $0}' file
PRODUCT No.|SCIENCE BOOKS|DESCRIPTION
Product 1|Xhilosophiae Xaturalis Xrincipia Xathematica (1687)|Blah blah blah
Product 2|Xialogue Xoncerning Xhe Xwo Xhief Xorld Xystems (1632)|blah blah blah
Product 3|Xe Xevolutionibus Xrbium Xoelestium (1543)|blah blah blah
Product 4|Xhe Xoyage Xf Xhe Xeagle (1845)|blah blah blah
But this script is only a test: within column 2 it replaces the first letter of every word with a constant "X", and I don't know how to make gensub replace the matched text with the same text in upper case.

Maybe somebody could give a suggestion.

Thanks in advance.
 
Old 05-01-2010, 05:18 AM   #2
catkin
LQ 5k Club
 
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Servers: Debian Squeeze and Wheezy. Desktop: Slackware64 14.0. Netbook: Slackware 13.37
Posts: 8,557
Blog Entries: 28

Rep: Reputation: 1178
Here's code that does the core of what you want but it
  • Assumes the last word does not need title-casing (is always the year).
  • Does not yield true title case ("the", "of" etc. should not be capitalised).
  • Assumes there are only three "|"-separated fields.
Code:
#!/bin/bash

# Simulate reading file
line[0]="Product 1|PHILOSOPHIAE NATURALIS PRINCIPIA MATHEMATICA (1687)|Blah blah blah"
line[1]="Product 2|Dialogue concerning the two chief world systems (1632)|blah blah blah"
line[2]="Product 3|De Revolutionibus Orbium Coelestium (1543)|blah blah blah"
line[3]="Product 4|the voyage of the beagle (1845)|blah blah blah"

IFS='|'
for (( i=0; i<${#line[*]}; i++ ))
do
    array=( ${line[i]} )
    #echo "${array[1]}" | sed 's/.[^[:space:]]* /XX /g'
    #echo "${array[1]}" | sed 's/\(.\)[^[:space:]]* /\1XX /g'
    #echo "${array[1]}" | sed 's/\(.\)[^[:space:]]* /\u\1XX /g'
    #echo "${array[1]}" | sed 's/\(.\)\([^[:space:]]* \)/\u\1\2/g'
    #echo "${array[1]}" | sed 's/\(.\)\([^[:space:]]* \)/\u\1\L\2/g'
    echo "${array[0]}|$( echo "${array[1]}" | sed 's/\(.\)\([^[:space:]]* \)/\u\1\L\2/g' )|${array[2]}"
done
unset IFS
The experiments used to determine the correct sed command are shown, commented out, in case they are instructive.
 
Old 05-01-2010, 07:16 AM   #3
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,648

Rep: Reputation: 1961
Hmm ... Well I am sure someone else would be able to work out where my slip up is, but the following will set ALL first letters to capital:

Code:
sed -r -e '2,$s/\|([^|]*)/\|\L\1/' -e 's@(\b[a-z])@\u\1@g' in.txt
Edit: I was keen to find an awk solution. Again, we may need a guru to tidy it up, but it does produce the desired result:
Code:
awk 'BEGIN{OFS=FS="|"}
     NR>1{$2=tolower($2);split($2,arr," "); $2=""; i=asorti(arr,arr2); 
          for(x=1;x<=i;x++){$2=$2toupper(substr(arr[arr2[x]],1,1))substr(arr[arr2[x]],2)" "}
         }1
    ' in.txt

Last edited by grail; 05-01-2010 at 07:46 AM.
 
Old 05-01-2010, 08:15 AM   #4
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,648

Rep: Reputation: 1961
Well this one may be a little more readable:
Code:
awk 'BEGIN{OFS=FS="|"}
     NR>1{$2 = tolower($2);
          split($2,arr," ");
          for(x in arr)
              sub(arr[x],toupper(substr(arr[x],1,1))substr(arr[x],2),$2)
         }1' in.txt
 
Old 05-01-2010, 04:17 PM   #5
cgcamal
Member
 
Registered: Nov 2008
Location: Tegucigalpa
Posts: 72

Original Poster
Rep: Reputation: 16
Hi catkin,

Many thanks for your help. I tested your solution and it works; the issue is that the real files have 40 columns. But I will certainly try to learn from the regexps you used in the sed commands!

Hi grail,

Your awk solutions work! Thanks again for your help.

The sed script works over the complete line, not only over column 2.

Well, your solutions raise several questions for me, as always, sorry :-)

Could you help me with these doubts?
I've recently learned that "\1", "\2" .. "\9" is the way to refer back to matched groups, but:

1-) What do these parts of the sed script mean?
1.1) "sed -r -e '2,$s/..."?

1.2) "s@(.." and "..\1@g.."?

1.3) It looks like it is not possible to tell sed to work on a specific column given a
particular field separator, other than by using a regexp, right?

1.4) Do you know of any Unix-style regexp tester (for sed or awk syntax) that runs
on Windows?

2) The first awk script works nicely, but I've been trying without success to remove the
trailing space character that is added to column 2 after processing.

3) Similarly, in the 2nd awk script, I've been trying without success to remove
the extra "(" and ")" that are added around the years.

Many thanks in advance, and thanks to you both.

Regards,
 
Old 05-01-2010, 08:46 PM   #6
MTK358
LQ 5k Club
 
Registered: Sep 2009
Posts: 6,443
Blog Entries: 3

Rep: Reputation: 714
#!/bin/sh

# Sketch only: "convert first letter to uppercase" stands for a command still to be written
while IFS= read -r line
do
    echo "$(echo "$line" | cut -d'|' -f1)|$(echo "$line" | cut -d'|' -f2 | convert first letter to uppercase)|$(echo "$line" | cut -d'|' -f3)"
done
 
Old 05-02-2010, 12:42 AM   #7
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,648

Rep: Reputation: 1961
Quote:
Could you help me with these doubts?
I've recently learned that "\1", "\2" .. "\9" is the way to refer back to matched groups, but:

1-) What do these parts of the sed script mean?
1.1) "sed -r -e '2,$s/..."?

1.2) "s@(.." and "..\1@g.."?

1.3) It looks like it is not possible to tell sed to work on a specific column given a
particular field separator, other than by using a regexp, right?

1.4) Do you know of any Unix-style regexp tester (for sed or awk syntax) that runs
on Windows?

2) The first awk script works nicely, but I've been trying without success to remove the
trailing space character that is added to column 2 after processing.

3) Similarly, in the 2nd awk script, I've been trying without success to remove
the extra "(" and ")" that are added around the years.
1.1) I presume your issue is with "2,$": this is an address range that tells sed to apply the command only to lines 2 through the end of the file.

1.2) Are you confused by the "@" symbol here? If so, you can use pretty much any delimiter you like. The norm is 's///', whereas I used 's@@@'. If the expression contains
lots of slashes or backslashes, i.e. "/" or "\", then I sometimes use this symbol instead.

1.3) No, it is possible; I just haven't worked out the kinks to get the capitalisation to work only within the delimited boundary.

2) I might need more information here, as based on the example I get no extra spaces?

3) Sorry about that one, my bad ... this fixes it:
Code:
awk 'BEGIN{OFS=FS="|"}
     NR>1{$2 = tolower($2);split($2,arr," ");
          for(x in arr)
              if(arr[x] ~ /^[a-z]/)
                  sub(arr[x],toupper(substr(arr[x],1,1))substr(arr[x],2),$2)
         }1' in.txt
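
A note for later readers: sub() treats each word as a regular expression, which is why "(1687)" gained extra parentheses before the "if" was added, and a short word can also match inside a longer one. A minimal sketch that avoids regex substitution altogether might walk the column one character at a time. It assumes any POSIX awk; the function name titlecase_col and its column argument are made up for this illustration.

```shell
# Hypothetical helper: title-case column $1 (default 2) of "|"-separated stdin,
# leaving the header line and all other columns untouched.
titlecase_col() {
    awk -v col="${1:-2}" '
    BEGIN { FS = OFS = "|" }
    NR > 1 {
        s = tolower($col); out = ""; at_start = 1
        for (i = 1; i <= length(s); i++) {
            c = substr(s, i, 1)
            if (c == " " || c == "\t") {
                at_start = 1                  # next character starts a word
            } else if (at_start) {
                c = toupper(c); at_start = 0  # first character of a word
            }
            out = out c
        }
        $col = out                            # rebuilds the record using OFS
    }
    { print }'
}
```

It would be used as titlecase_col 2 < in.txt. Because the field is rebuilt character by character, the original spacing is kept, no trailing space appears, and the number of columns does not matter.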
 
Old 05-02-2010, 01:49 AM   #8
cgcamal
Member
 
Registered: Nov 2008
Location: Tegucigalpa
Posts: 72

Original Poster
Rep: Reputation: 16
Hi MTK358,

Thanks for your help. I've tried to execute your script but just don't know how: I put the input file name at the end, as in SCRIPT inputfile, but it doesn't work. How should I run it?

grail,

Again and again thanks.
Quote:
1.1) I presume your issue is with "2,$": this is an address range that tells sed to apply the command only to lines 2 through the end of the file.
I was close on this one; now it's clear. Thanks!
Quote:
1.2) Are you confused by the "@" symbol here? If so, you can use pretty much any delimiter you like. The norm is 's///', whereas I used 's@@@'. If the expression contains lots of slashes or backslashes, i.e. "/" or "\", then I sometimes use this symbol instead.
Great explanation and a great tip. I had no idea about this sed feature. Thanks.
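
For reference, the two features can be seen in isolation in the small sketch below. The sample text is made up, and nothing beyond standard sed is assumed.

```shell
# "2,$" restricts the command to lines 2 through the last one; line 1 is left alone.
# "@" replaces the usual "/" delimiter, so the "/" in the pattern needs no escaping.
printf '%s\n' 'usr/bin' 'usr/bin' 'usr/bin' | sed '2,$s@/@ -> @'
# prints:
# usr/bin
# usr -> bin
# usr -> bin
```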
Quote:
1.3) No, it is possible; I just haven't worked out the kinks to get the capitalisation to work only within the delimited boundary.
Ok.
Quote:
2) I might need more information here, as based on the example I get no extra spaces?
This is a minor detail, just for learning: when I run the first awk script, the output contains an extra space as the last character of column 2 in every line. Compare the Product 2 line below:
Code:
Product 2|Dialogue Concerning The Two Chief World Systems (1632) |Blah blah blah| --> (with extra space at the end of column 2)
Product 2|Dialogue Concerning The Two Chief World Systems (1632)|Blah blah blah| --> (correct output)
And the last question:
In your awk scripts, what does the "1" at the end mean?
Code:
awk '... ...}1' in.txt
Thanks for all your help.
 
Old 05-02-2010, 02:44 AM   #9
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,648

Rep: Reputation: 1961
Quote:
the first awk script, the output contains an extra space as the last character of column 2 in every line
Again, my bad: I dropped that one after I came up with the other solution. Placing the "if" inside the "for" statement, as I showed in my
last post, should remedy this.

Quote:
In your awk scripts, what does the "1" at the end mean?
When a pattern has no action, awk's default action is to print the record. Try these two examples to see the difference:
Code:
awk '0' inputfile

awk '1' inputfile
Note: any non-zero number will work in the last example.
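
A small sketch of the same idea with made-up two-line input, so the effect is visible without a file:

```shell
# A pattern with no action prints the record whenever the pattern is true.
printf '%s\n' one two | awk '1'            # prints both lines
printf '%s\n' one two | awk '0'            # prints nothing
printf '%s\n' one two | awk '1 { print }'  # same as awk '1', with the action spelled out
printf '%s\n' one two | awk 'NR > 1'       # any expression can be the pattern: prints only "two"
```

So the trailing "1" in the scripts above is simply a shorthand for an unconditional { print }.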

As for MTK358's example:
Quote:
How is the way to run it?
Use this as the last line instead of the current "done"
Code:
done<inputfile
 
Old 05-02-2010, 03:21 AM   #10
cgcamal
Member
 
Registered: Nov 2008
Location: Tegucigalpa
Posts: 72

Original Poster
Rep: Reputation: 16
Quote:
When a pattern has no action, awk's default action is to print the record. Try these two examples to see the difference:
Code:
awk '0' inputfile

awk '1' inputfile
Note: any non-zero number will work in the last example.
So 0 or nothing acts like "don't print", and any non-zero number acts like "print". I learn a lot with every answer! Many thanks, grail.

Your help is really appreciated.

Best regards.
 
Old 05-02-2010, 08:22 AM   #11
MTK358
LQ 5k Club
 
Registered: Sep 2009
Posts: 6,443
Blog Entries: 3

Rep: Reputation: 714
My script was NOT a fully working program.

The part in bold ("convert first letter to uppercase") had to be replaced with something I didn't know how to do, and the script would be used like this:

Code:
SCRIPT < file.txt
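
For completeness, one possible way to fill in that placeholder is sketched below. The helper names titlecase and titlecase_col2 are made up, any POSIX awk is assumed for the conversion step, and every line is assumed to have at least three "|"-separated fields.

```shell
#!/bin/sh
# Hypothetical filter: lower-case stdin, then capitalise the first character
# of every whitespace-separated word (runs of spaces collapse to one space).
titlecase() {
    awk '{
        n = split(tolower($0), w, " ")
        out = ""; sep = ""
        for (i = 1; i <= n; i++) {
            out = out sep toupper(substr(w[i], 1, 1)) substr(w[i], 2)
            sep = " "
        }
        print out
    }'
}

# Rebuild each line as field 1, title-cased field 2, and everything from field 3 on.
titlecase_col2() {
    while IFS= read -r line
    do
        f1=$(printf '%s\n' "$line" | cut -d'|' -f1)
        f2=$(printf '%s\n' "$line" | cut -d'|' -f2 | titlecase)
        rest=$(printf '%s\n' "$line" | cut -d'|' -f3-)
        printf '%s|%s|%s\n' "$f1" "$f2" "$rest"
    done
}
```

With titlecase_col2 called on the last line of the script, it would be run as SCRIPT < file.txt. Unlike the awk answers, this version also title-cases the header line, and it starts several processes per input line, so it is likely to be slow on large files.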
 
  

