08-20-2024, 01:00 PM
|
#1
|
LQ Newbie
Registered: Aug 2023
Posts: 14
Rep:
|
warning: regexp escape sequence `\"' is not a known regexp operator
Hello, and sorry to bother you, but I'm trying to run this script and for some reason I can't, because of this error:
$ bash task1.sh accounts.csv
awk: cmd. line:31: warning: regexp escape sequence `\"' is not a known regexp operator
To me, that line looks "empty".
This is the script. If this is not the right place for it, please forgive me and let me know where I should ask this question.
Code:
#!/bin/bash
# Exit if path to accounts.csv file not provided as argument
if [ $# -lt 1 ]
then
echo "Usage: ./task1.sh /path/to/accounts.csv"
exit 0
fi
file=$1
# Exit if provided file doesn't exist
if [ ! -f $file ]
then
echo "File $file doesn't exist"
exit 1
fi
# Extract directory from file
path=$(dirname $file)
# Processing csv file with awk
awk '
# Set Field Separator and Output Field Separators
BEGIN { FS=","; OFS=",";}
# Skip first row, as it contains only column names
NR == 1 {
print
}
# First pass through file to check for uniqueness of emails
NR == FNR {
# 3rd field contains name
# splitting name to first name and last name
split($3, name, / /)
email = substr(name[1], 1, 1) name[2]
email = tolower(email)
++counter[email]
}
# Second pass through file, skipping first line
NR > FNR && FNR != 1{
# Create an array from fields as the default FS processes
# quoted commas incorrectly
j=0
inside_quotes=0
for(i=1;i<=NF;i++) {
# Opening quote, save to new field,
# set inside_quotes to true
if($i ~ /^\"/) {
inside_quotes=1
j++
fields[j] = $i
}
# Closing quote, append to last field, set inside_quote to false
else if($i ~ /\"$/) {
inside_quotes=0
fields[j] = fields[j] OFS $i
}
# middle of quoted text, append to last field
else if (inside_quotes==1) {
fields[j] = fields[j] OFS $i
}
# outside of quotes, save to new field
else {
j++
fields[j] = $i
}
}
# fields[3] contains name
# Split name by space
split(fields[3], name, / /)
# Change the first character to uppercase, all other characters to lower case
name[1] = toupper(substr(name[1], 1, 1)) tolower(substr(name[1], 2))
name[2] = toupper(substr(name[2], 1, 1)) tolower(substr(name[2], 2))
# Change the 3rd field to new value
fields[3] = name[1] " " name[2]
# email format: flast_name@abc.com
email = substr(name[1], 1, 1) name[2]
email = tolower(email)
# if the email is not unique, append location id
if (counter[email] > 1) email=email fields[2]
fields[5] = email "@abc.com"
# Set new values for all 6 columns
NF=6
for(i=1;i<=NF;i++) $i=fields[i]
print
}
' $file $file > $path/accounts_new.csv
|
|
|
08-20-2024, 02:12 PM
|
#2
|
Member
Registered: Oct 2021
Posts: 62
Rep:
|
Try
Code:
if($i ~ /^"/) {
and
Code:
else if($i ~ /"$/) {
instead of
Code:
if($i ~ /^\"/) {
and
Code:
else if($i ~ /\"$/) {
|
|
08-20-2024, 02:20 PM
|
#3
|
LQ Newbie
Registered: Aug 2023
Posts: 14
Original Poster
Rep:
|
Thank you! But now there is another issue...
So we have this initial .csv file:
Code:
id,location_id,name,title,email,department
1,1,Susan houston,Director of Services,,
2,1,Christina Gonzalez,Director,,
3,2,Brenda brown,"Director, Second Career Services",,
4,3,Howard Lader,"Manager, Senior Counseling",,
5,4,Kimberly Pesavento,Commercial director,,
6,5,Joe Bloom,Financial Empowerment Programs Program Director,,
7,6,peter Olson,Director,,
8,6,Bart charlow,Executive Director,,
9,7,Bart Charlow,Executive Director,,
10,7,Barbara Kalt,Director,,
11,8,Marilyn Baker-Venturini,Director,,
12,8,Graciela Hernandez,Assistant Manager,,
13,8,Julie avelino,Assessment Specialist,,
14,9,Dave Genesy,Library Director,,
15,9,maria kramer,Library Divisions Manager,,
16,10,Dave Genesy,Tester,,
17,10,Maria kramer,Library Division Manager,,
18,11,Dave Genesy,Head of office,,
19,11,Elizabeth Meeks,Branch Manager,,
20,12,Kathy Endaya,Director,,
21,13,dave genesy,Library Director,,
22,14,Andres Espinoza,"Manager, Commanding Officer",,
23,15,Jack Phillips,Administrator,,
24,16,James Lee,Commanding Officer,,
25,17,Kenneth Gibson,Tester,,
26,18,Sharon Petersen,Administrator,,
27,19,Sharon Petersen,Administrator,,
28,21,Moncef Salah,Tester,,Office of Innovation
29,22,Suzanne Badenhoop,Tester,suzanne@example.com,Referrals
30,20,Sean Houston,Director of new Services,,
31,8,David Genesy,Account Manager,,
32,8,Elizabeth Feeney,CEO,e.feeney@foobar.org,Operations
33,8,Erika Meeks,Tester,e.meeks@foobar.org,Operations
I've updated the script to:
Code:
#!/bin/bash
# Exit if path to accounts.csv file not provided as argument
if [ $# -lt 1 ]; then
echo "Usage: ./task1.sh /path/to/accounts.csv"
exit 0
fi
file=$1
# Exit if provided file doesn't exist
if [ ! -f "$file" ]; then
echo "File $file doesn't exist"
exit 1
fi
# Extract directory from file
path=$(dirname "$file")
# Processing csv file with awk
awk '
BEGIN { FS=","; OFS=","; }
NR == 1 {
print
next
}
NR == FNR {
split($3, name, / /)
email = substr(name[1], 1, 1) name[2]
email = tolower(email)
++counter[email]
next
}
FNR > 1 {
j = 0
inside_quotes = 0
for(i = 1; i <= NF; i++) {
if($i ~ /^"/) {
inside_quotes = 1
j++
fields[j] = substr($i, 2)
} else if($i ~ /"$/) {
inside_quotes = 0
fields[j] = fields[j] OFS substr($i, 1, length($i) - 1)
} else if (inside_quotes == 1) {
fields[j] = fields[j] OFS $i
} else {
j++
fields[j] = $i
}
}
split(fields[3], name, / /)
name[1] = toupper(substr(name[1], 1, 1)) tolower(substr(name[1], 2))
name[2] = toupper(substr(name[2], 1, 1)) tolower(substr(name[2], 2))
fields[3] = name[1] " " name[2]
email = substr(name[1], 1, 1) name[2]
email = tolower(email)
if (counter[email] > 1) email = email fields[2]
fields[5] = email "@abc.com"
NF = 6
for(i = 1; i <= NF; i++) $i = fields[i]
print
}
' "$file" "$file" > "$path/accounts_new.csv"
Now it's almost right. Output:
Code:
id,location_id,name,title,email,department
1,1,Susan Houston,Director of Services,shouston1@abc.com,
2,1,Christina Gonzalez,Director,cgonzalez@abc.com,
3,2,Brenda Brown,Director, Second Career Services,bbrown@abc.com,
4,3,Howard Lader,Manager, Senior Counseling,hlader@abc.com,
5,4,Kimberly Pesavento,Commercial director,kpesavento@abc.com,
6,5,Joe Bloom,Financial Empowerment Programs Program Director,jbloom@abc.com,
7,6,Peter Olson,Director,polson@abc.com,
8,6,Bart Charlow,Executive Director,bcharlow6@abc.com,
9,7,Bart Charlow,Executive Director,bcharlow7@abc.com,
10,7,Barbara Kalt,Director,bkalt@abc.com,
11,8,Marilyn Baker-venturini,Director,mbaker-venturini@abc.com,
12,8,Graciela Hernandez,Assistant Manager,ghernandez@abc.com,
13,8,Julie Avelino,Assessment Specialist,javelino@abc.com,
14,9,Dave Genesy,Library Director,dgenesy9@abc.com,
15,9,Maria Kramer,Library Divisions Manager,mkramer9@abc.com,
16,10,Dave Genesy,Tester,dgenesy10@abc.com,
17,10,Maria Kramer,Library Division Manager,mkramer10@abc.com,
18,11,Dave Genesy,Head of office,dgenesy11@abc.com,
19,11,Elizabeth Meeks,Branch Manager,emeeks11@abc.com,
20,12,Kathy Endaya,Director,kendaya@abc.com,
21,13,Dave Genesy,Library Director,dgenesy13@abc.com,
22,14,Andres Espinoza,Manager, Commanding Officer,aespinoza@abc.com,
23,15,Jack Phillips,Administrator,jphillips@abc.com,
24,16,James Lee,Commanding Officer,jlee@abc.com,
25,17,Kenneth Gibson,Tester,kgibson@abc.com,
26,18,Sharon Petersen,Administrator,spetersen18@abc.com,
27,19,Sharon Petersen,Administrator,spetersen19@abc.com,
28,21,Moncef Salah,Tester,msalah@abc.com,Office of Innovation
29,22,Suzanne Badenhoop,Tester,sbadenhoop@abc.com,Referrals
30,20,Sean Houston,Director of new Services,shouston20@abc.com,
31,8,David Genesy,Account Manager,dgenesy8@abc.com,
32,8,Elizabeth Feeney,CEO,efeeney@abc.com,Operations
33,8,Erika Meeks,Tester,emeeks8@abc.com,Operations
It should be "Baker-Venturini", not with a lowercase "v". But thank you!
|
|
|
08-20-2024, 04:03 PM
|
#4
|
Member
Registered: Oct 2021
Posts: 62
Rep:
|
If the capitalization in the original csv is right, why are you doing this?
Code:
split(fields[3], name, / /)
name[1] = toupper(substr(name[1], 1, 1)) tolower(substr(name[1], 2))
name[2] = toupper(substr(name[2], 1, 1)) tolower(substr(name[2], 2))
fields[3] = name[1] " " name[2]
If I were you, I would trust the capitalization of the original data in the .csv file. If you don't, you will have to foresee a huge number of situations:
Is there always a capital letter after a "-"?
What about compound names like "María del Carmen" or "d'Alembert"?
Did someone forget a space after the "." in J.F. Kennedy?
Do tolower and toupper behave properly with characters like á à ñ ü ç...?
...
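To make the first two points concrete, here is a small sketch of the script's per-word rule (uppercase the first character, lowercase the rest) applied to a few ASCII names. The helper name cap_word and the sample names are assumptions for illustration only:

```shell
# cap_word: uppercase the first character of each line and lowercase the rest.
# Hypothetical helper name; ASCII-only sample names chosen for illustration.
cap_word() {
    awk '{ print toupper(substr($0, 1, 1)) tolower(substr($0, 2)) }'
}

printf '%s\n' "houston" "Baker-Venturini" "d'Alembert" "McDonald" | cap_word
# prints:
#   Houston
#   Baker-venturini
#   D'alembert
#   Mcdonald
```

Only the first name comes out the way a person would write it; the other three were correct in the input and are damaged by the rule.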
|
|
|
08-20-2024, 07:17 PM
|
#5
|
LQ Newbie
Registered: Aug 2023
Posts: 14
Original Poster
Rep:
|
I've managed to solve the "requirements" using this script:
Code:
#!/bin/bash
# Exit if path to accounts.csv file not provided as argument
if [ $# -lt 1 ]; then
    echo "Usage: ./task1.sh /path/to/accounts.csv"
    exit 0
fi
file=$1
# Exit if provided file doesn't exist
if [ ! -f "$file" ]; then
    echo "File $file doesn't exist"
    exit 1
fi
# Extract directory from file
path=$(dirname "$file")
# Processing csv file with awk
awk '
BEGIN { FS=","; OFS=","; }
NR == 1 {
    print
    next
}
NR == FNR {
    split($3, name, / /)
    email = substr(name[1], 1, 1) name[2]
    email = tolower(email)
    ++counter[email]
    next
}
FNR > 1 {
    j = 0
    inside_quotes = 0
    for(i = 1; i <= NF; i++) {
        if($i ~ /^"/) {
            inside_quotes = 1
            j++
            fields[j] = substr($i, 2)
        } else if($i ~ /"$/) {
            inside_quotes = 0
            fields[j] = fields[j] OFS substr($i, 1, length($i) - 1)
        } else if (inside_quotes == 1) {
            fields[j] = fields[j] OFS $i
        } else {
            j++
            fields[j] = $i
        }
    }
    split(fields[3], name, / /)
    name[1] = toupper(substr(name[1], 1, 1)) tolower(substr(name[1], 2))
    name[2] = toupper(substr(name[2], 1, 1)) tolower(substr(name[2], 2))
    # Capitalize after hyphen in the last name
    if (name[2] ~ /-/) {
        split(name[2], hyphenated, "-")
        for (k in hyphenated) {
            hyphenated[k] = toupper(substr(hyphenated[k], 1, 1)) tolower(substr(hyphenated[k], 2))
        }
        name[2] = hyphenated[1] "-" hyphenated[2]
    }
    fields[3] = name[1] " " name[2]
    email = substr(name[1], 1, 1) name[2]
    email = tolower(email)
    if (counter[email] > 1) email = email fields[2]
    fields[5] = email "@abc.com"
    # Enclose title in quotes if it contains a comma
    if (fields[4] ~ /,/) {
        fields[4] = "\"" fields[4] "\""
    }
    NF = 6
    for(i = 1; i <= NF; i++) $i = fields[i]
    print
}
' "$file" "$file" > "$path/accounts_new.csv"
# Log completion
echo "Processing complete. Output saved to: $path/accounts_new.csv"
I'm not as knowledgeable as you... maybe one day. Thank you very much!
|
|
|
08-21-2024, 07:46 AM
|
#6
|
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Arch
Posts: 10,017
|
Ok, so it would seem you want to be using an awk script but feel it necessary to use bash to wrap it??
As with all scripts, the shebang can be changed to suit your needs, so the first thing I would do is change the following:
Code:
#!/bin/bash
--To
#!/usr/bin/awk -f
I will leave you to look up how to write the "Usage" and file tests.
Looking at your awk code:
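NR = total number of records read so far, across all input files
FNR = number of records read so far in the current file
If you read a single file once, any comparison between the two is pointless, as they will always be the same; they only differ here because the script passes the same file twice.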
And running over the file twice seems pointless if we are simply making the email column (correct me if I am wrong)
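A note on the two counters: NR keeps counting across every file named on the command line, while FNR restarts at 1 for each file. When the same file is passed twice, as in the scripts earlier in the thread, NR == FNR therefore holds only during the first pass. A small sketch with a throwaway temp file (the helper name show_counters and the file contents are invented for illustration):

```shell
# Throwaway two-line file, invented for illustration.
tmp=$(mktemp)
printf '%s\n' 'first' 'second' > "$tmp"

# show_counters: pass the same file twice and print both counters per record.
# Hypothetical helper name.
show_counters() {
    awk '{ print "NR=" NR, "FNR=" FNR, (NR == FNR ? "pass1" : "pass2") }' "$1" "$1"
}

show_counters "$tmp"
# prints:
#   NR=1 FNR=1 pass1
#   NR=2 FNR=2 pass1
#   NR=3 FNR=1 pass2
#   NR=4 FNR=2 pass2
rm -f "$tmp"
```

This is what lets a two-pass script count duplicates on the first read and use the counts on the second.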
So as a quick throw together, but without the file testing you need to implement:
Code:
#!/usr/bin/awk -f
BEGIN {
    FPAT = "([^,]+)|(\"[^\"]+\")"
    OFS = ","
}
NR > 1 && !($5 ~ /@/) {
    split($3, name, / /)
    $5 = tolower(substr(name[1], 1, 1) name[2] "@abc.com")
}
$1 = $1
Let me know if you need me to explain any of it?
|
|
|
08-21-2024, 12:43 PM
|
#7
|
Senior Member
Registered: Dec 2011
Location: Simplicity
Posts: 2,915
|
The following capitalizes after each hyphen or space.
Code:
...
# Capitalize words separated by [hyphen or space]
doupper = 1
cname = fields[3]
len = length(cname)
for (pos = 1; pos <= len; pos++) {
    char = substr(cname, pos, 1)
    if (doupper) {
        cname = ( substr(cname, 1, pos-1) toupper(char) substr(cname, pos+1) )
        doupper = 0
    }
    if (char ~ /[- ]/) { doupper = 1 }
}
fields[3] = cname
split($3, name, / /)
email = substr(name[1], 1, 1) name[2]
email = tolower(email)
...
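For anyone who wants to try the loop above on its own, here is a sketch that wraps it in an awk function and feeds it whole names on standard input. The names cap_after_sep and capitalize are hypothetical, and the sketch keeps the loop's behaviour of leaving every other character untouched (it does not lowercase anything):

```shell
# cap_after_sep: uppercase the first character and every character that
# follows a hyphen or space; other characters are left as they are.
# Hypothetical wrapper name around the loop shown above.
cap_after_sep() {
    awk '
        function capitalize(s,    pos, len, char, doupper) {
            doupper = 1
            len = length(s)
            for (pos = 1; pos <= len; pos++) {
                char = substr(s, pos, 1)
                if (doupper) {
                    s = substr(s, 1, pos - 1) toupper(char) substr(s, pos + 1)
                    doupper = 0
                }
                if (char ~ /[- ]/) doupper = 1
            }
            return s
        }
        { print capitalize($0) }
    '
}

printf '%s\n' 'marilyn baker-venturini' 'bart charlow' | cap_after_sep
# prints:
#   Marilyn Baker-Venturini
#   Bart Charlow
```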
|
|
|
08-23-2024, 08:06 AM
|
#8
|
LQ Guru
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 10,820
|
For what it may be worth, long ago a guy named Larry Wall once decided that "awk" wasn't powerful enough for him. So, he invented perl, and kept it similar to "awk." You may wish to explore this language – and its vast(!) "CPAN" library of contributed code – as a possible power-tool where you are now using "awk." For example, there's probably a library-module out there that you can simply ... use.
@MadeInGermany: Please note that your proposed solution [erroneously ...] assumes that the first character in the string should be uppercased. Instead, you should test the first character when setting the initial value of the doupper flag.
Last edited by sundialsvcs; 08-23-2024 at 08:10 AM.
|
|
|
08-23-2024, 12:29 PM
|
#9
|
Senior Member
Registered: Dec 2011
Location: Simplicity
Posts: 2,915
|
Test for what and how?
I think all names begin with a capital letter.
|
|
|
08-25-2024, 04:47 AM
|
#10
|
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Arch
Posts: 10,017
|
@OP --
I was having a further play with this. Are you always going to have input data that is inconsistent?
I am referring to the fact that not every row has the same number of columns in it.
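On the column count, one way to check is to count separators while ignoring commas inside double quotes. The sketch below does that in plain awk; the helper name count_csv_fields is hypothetical, the three rows are copied from the sample file earlier in the thread, and quoted newlines are not handled. Counted this way, each row has six fields, some of them empty. A field pattern that requires at least one character (such as [^,]+) skips empty fields, which may be why the rows look uneven.

```shell
# count_csv_fields: print "<record number>: <field count>" for each line,
# treating commas inside double quotes as data. Hypothetical helper name;
# quoted newlines are not handled.
count_csv_fields() {
    awk '
        function nfields(line,    i, c, n, inq) {
            n = 1; inq = 0
            for (i = 1; i <= length(line); i++) {
                c = substr(line, i, 1)
                if (c == "\"") inq = !inq          # toggle quoted state
                else if (c == "," && !inq) n++     # separator outside quotes
            }
            return n
        }
        { print NR ": " nfields($0) }
    '
}

printf '%s\n' \
    '1,1,Susan houston,Director of Services,,' \
    '3,2,Brenda brown,"Director, Second Career Services",,' \
    '28,21,Moncef Salah,Tester,,Office of Innovation' | count_csv_fields
# prints:
#   1: 6
#   2: 6
#   3: 6
```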
|
|
|
08-25-2024, 05:25 AM
|
#11
|
LQ Guru
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,494
|
Which distro is this for? Recent versions of AWK include CSV support. Check the manual page for AWK and see if --csv is mentioned.
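As far as I know, GNU awk gained --csv in version 5.3, so older distro packages and other awk implementations may not have it. Rather than relying on the version number, a script can probe the behaviour directly. A minimal sketch, with a hypothetical helper name (awk_has_csv) and an invented one-line probe:

```shell
# awk_has_csv: succeed only if the awk on PATH parses quoted commas with --csv.
# Hypothetical helper name; probes behaviour instead of trusting the version.
awk_has_csv() {
    out=$(printf '%s\n' '"a,b",c' | awk --csv '{ print NF }' 2>/dev/null)
    [ "$out" = "2" ]
}

if awk_has_csv; then
    echo "this awk supports --csv"
else
    echo "no --csv support here; fall back to manual quote handling"
fi
```

The probe line has two fields only when the comma inside the quotes is treated as data, so an awk that rejects or ignores the option fails the check.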
Quote:
Originally Posted by sundialsvcs
For what it may be worth, long ago a guy named Larry Wall once decided that "awk" wasn't powerful enough for him. So, he invented perl, and kept it similar to "awk." You may wish to explore this language – and its vast(!) "CPAN" library of contributed code – as a possible power-tool where you are now using "awk." For example, there's probably a library-module out there that you can simply ... use.
|
Otherwise, Perl is the right idea. See CPAN's Text::CSV_XS.
|
|
|