LinuxQuestions.org
08-20-2024, 01:00 PM   #1
anothernoname
warning: regexp escape sequence `\"' is not a known regexp operator


Hello, and sorry to bother you, but I'm trying to run this script and for some reason I can't because of this error:
$ bash task1.sh accounts.csv
awk: cmd. line:31: warning: regexp escape sequence `\"' is not a known regexp operator

To me, that line looks "empty".
This is the script. If this is not where it should be, please forgive me and let me know where I should ask this question.

Code:
#!/bin/bash

# Exit if path to accounts.csv file not provided as argument
if [ $# -lt 1 ]
then
    echo "Usage: ./task1.sh /path/to/accounts.csv"
    exit 0
fi

file=$1

# Exit if provided file doesn't exist
if [ ! -f $file ]
then
    echo "File $file doesn't exist"
    exit 1
fi

# Extract directory from file
path=$(dirname $file)

# Processing csv file with awk
awk '
    # Set Field Separator and Output Field Separators
    BEGIN { FS=","; OFS=",";}

    # Skip first row, as it contains only column names
    NR == 1 {
        print
    }

    # First pass through file to check for uniqueness of emails
    NR == FNR {
        # 3rd field contains name
        # splitting name to first name and last name
        split($3, name, / /)
        email = substr(name[1], 1, 1) name[2]
        email = tolower(email)
        ++counter[email]
    }

    # Second pass through file, skipping first line
    NR > FNR && FNR != 1{

        # Create an array from fields as the default FS processes
        # quoted commas incorrectly

        j=0
        inside_quotes=0
        for(i=1;i<=NF;i++) {
            # Opening quote, save to new field,
            # set inside_quotes to true
            if($i ~ /^\"/) {
                inside_quotes=1
                j++
                fields[j] = $i
            }
            # Closing quote, append to last field, set inside_quote to false
            else if($i ~ /\"$/)  {
                inside_quotes=0
                fields[j] = fields[j] OFS $i
            }
            # middle of quoted text, append to last field
            else if (inside_quotes==1) {
                fields[j] = fields[j] OFS $i
            }
            # outside of quotes, save to new field
            else {
                j++
                fields[j] = $i
            }
        }
        # fields[3] contains name 
        # Split name by space
        split(fields[3], name, / /)
        # Change the first character to uppercase, all other characters to lower case
        name[1] = toupper(substr(name[1], 1, 1)) tolower(substr(name[1], 2))
        name[2] = toupper(substr(name[2], 1, 1)) tolower(substr(name[2], 2))
        # Change the 3rd field to new value
        fields[3] = name[1] " " name[2]

        # email format: flast_name@abc.com
        email = substr(name[1], 1, 1) name[2]
        email = tolower(email)

        # if the email is not unique, append location id
        if (counter[email] > 1) email=email fields[2]
        fields[5] = email "@abc.com"

        # Set new values for all 6 columns
        NF=6
        for(i=1;i<=NF;i++) $i=fields[i]
        print
    }
' $file $file > $path/accounts_new.csv
 
08-20-2024, 02:12 PM   #2
Racho
Try
Code:
            if($i ~ /^"/) {
and
Code:
            else if($i ~ /"$/)  {

instead of
Code:
            if($i ~ /^\"/) {
and
Code:
            else if($i ~ /\"$/)  {
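Why the change helps: inside an awk regexp constant (`/.../`), a double quote is an ordinary character, so `/^"/` already matches a field that starts with a quote; a backslash before `"` is only needed inside a string literal such as `"\""`. Also, gawk's `cmd. line:31` counts lines of the program text that starts right after `awk '`, not lines of task1.sh. That is why line 31 of the script file looked empty: line 31 of the awk program is `if($i ~ /^\"/) {`. The snippet is a minimal sketch of the fixed tests in isolation; the function name `classify_fields` is made up for illustration.

```shell
#!/bin/sh
# classify_fields (hypothetical helper): split one line on commas and report,
# for each piece, whether it opens a quoted value, closes one, or is plain.
# Inside /.../ the double quote needs no backslash.
classify_fields() {
    printf '%s\n' "$1" | awk 'BEGIN { FS = "," }
    {
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^"/)      print "open:", $i
            else if ($i ~ /"$/) print "close:", $i
            else                print "plain:", $i
        }
    }'
}

# The quoted title is split at its embedded comma into two pieces
classify_fields '3,2,Brenda brown,"Director, Second Career Services",,'
```

With the sample title, the piece `"Director` is reported as opening a quoted value and ` Second Career Services"` as closing it.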
 
08-20-2024, 02:20 PM   #3
anothernoname
Original Poster
Thank you! But now there's another issue...
So we have this initial .csv file:

Code:
id,location_id,name,title,email,department
1,1,Susan houston,Director of Services,,
2,1,Christina Gonzalez,Director,,
3,2,Brenda brown,"Director, Second Career Services",,
4,3,Howard Lader,"Manager, Senior Counseling",,
5,4,Kimberly Pesavento,Commercial director,,
6,5,Joe Bloom,Financial Empowerment Programs Program Director,,
7,6,peter Olson,Director,,
8,6,Bart charlow,Executive Director,,
9,7,Bart Charlow,Executive Director,,
10,7,Barbara Kalt,Director,,
11,8,Marilyn Baker-Venturini,Director,,
12,8,Graciela Hernandez,Assistant Manager,,
13,8,Julie avelino,Assessment Specialist,,
14,9,Dave Genesy,Library Director,,
15,9,maria kramer,Library Divisions Manager,,
16,10,Dave Genesy,Tester,,
17,10,Maria kramer,Library Division Manager,,
18,11,Dave Genesy,Head of office,,
19,11,Elizabeth Meeks,Branch Manager,,
20,12,Kathy Endaya,Director,,
21,13,dave genesy,Library Director,,
22,14,Andres Espinoza,"Manager, Commanding Officer",,
23,15,Jack Phillips,Administrator,,
24,16,James Lee,Commanding Officer,,
25,17,Kenneth Gibson,Tester,,
26,18,Sharon Petersen,Administrator,,
27,19,Sharon Petersen,Administrator,,
28,21,Moncef Salah,Tester,,Office of Innovation
29,22,Suzanne Badenhoop,Tester,suzanne@example.com,Referrals
30,20,Sean Houston,Director of new Services,,
31,8,David Genesy,Account Manager,,
32,8,Elizabeth Feeney,CEO,e.feeney@foobar.org,Operations
33,8,Erika Meeks,Tester,e.meeks@foobar.org,Operations
I've updated the script to:
Code:
#!/bin/bash

# Exit if path to accounts.csv file not provided as argument
if [ $# -lt 1 ]; then
    echo "Usage: ./task1.sh /path/to/accounts.csv"
    exit 0
fi

file=$1

# Exit if provided file doesn't exist
if [ ! -f "$file" ]; then
    echo "File $file doesn't exist"
    exit 1
fi

# Extract directory from file
path=$(dirname "$file")

# Processing csv file with awk
awk '
    BEGIN { FS=","; OFS=","; }

    NR == 1 {
        print
        next
    }

    NR == FNR {
        split($3, name, / /)
        email = substr(name[1], 1, 1) name[2]
        email = tolower(email)
        ++counter[email]
        next
    }

    FNR > 1 {
        j = 0
        inside_quotes = 0
        for(i = 1; i <= NF; i++) {
            if($i ~ /^"/) {
                inside_quotes = 1
                j++
                fields[j] = substr($i, 2)
            } else if($i ~ /"$/)  {
                inside_quotes = 0
                fields[j] = fields[j] OFS substr($i, 1, length($i) - 1)
            } else if (inside_quotes == 1) {
                fields[j] = fields[j] OFS $i
            } else {
                j++
                fields[j] = $i
            }
        }
        split(fields[3], name, / /)
        name[1] = toupper(substr(name[1], 1, 1)) tolower(substr(name[1], 2))
        name[2] = toupper(substr(name[2], 1, 1)) tolower(substr(name[2], 2))
        fields[3] = name[1] " " name[2]

        email = substr(name[1], 1, 1) name[2]
        email = tolower(email)

        if (counter[email] > 1) email = email fields[2]
        fields[5] = email "@abc.com"

        NF = 6
        for(i = 1; i <= NF; i++) $i = fields[i]
        print
    }
' "$file" "$file" > "$path/accounts_new.csv"
Now it's almost right. Output:

Code:
id,location_id,name,title,email,department
1,1,Susan Houston,Director of Services,shouston1@abc.com,
2,1,Christina Gonzalez,Director,cgonzalez@abc.com,
3,2,Brenda Brown,Director, Second Career Services,bbrown@abc.com,
4,3,Howard Lader,Manager, Senior Counseling,hlader@abc.com,
5,4,Kimberly Pesavento,Commercial director,kpesavento@abc.com,
6,5,Joe Bloom,Financial Empowerment Programs Program Director,jbloom@abc.com,
7,6,Peter Olson,Director,polson@abc.com,
8,6,Bart Charlow,Executive Director,bcharlow6@abc.com,
9,7,Bart Charlow,Executive Director,bcharlow7@abc.com,
10,7,Barbara Kalt,Director,bkalt@abc.com,
11,8,Marilyn Baker-venturini,Director,mbaker-venturini@abc.com, 
12,8,Graciela Hernandez,Assistant Manager,ghernandez@abc.com,
13,8,Julie Avelino,Assessment Specialist,javelino@abc.com,
14,9,Dave Genesy,Library Director,dgenesy9@abc.com,
15,9,Maria Kramer,Library Divisions Manager,mkramer9@abc.com,
16,10,Dave Genesy,Tester,dgenesy10@abc.com,
17,10,Maria Kramer,Library Division Manager,mkramer10@abc.com,
18,11,Dave Genesy,Head of office,dgenesy11@abc.com,
19,11,Elizabeth Meeks,Branch Manager,emeeks11@abc.com,
20,12,Kathy Endaya,Director,kendaya@abc.com,
21,13,Dave Genesy,Library Director,dgenesy13@abc.com,
22,14,Andres Espinoza,Manager, Commanding Officer,aespinoza@abc.com,
23,15,Jack Phillips,Administrator,jphillips@abc.com,
24,16,James Lee,Commanding Officer,jlee@abc.com,
25,17,Kenneth Gibson,Tester,kgibson@abc.com,
26,18,Sharon Petersen,Administrator,spetersen18@abc.com,
27,19,Sharon Petersen,Administrator,spetersen19@abc.com,
28,21,Moncef Salah,Tester,msalah@abc.com,Office of Innovation
29,22,Suzanne Badenhoop,Tester,sbadenhoop@abc.com,Referrals
30,20,Sean Houston,Director of new Services,shouston20@abc.com,
31,8,David Genesy,Account Manager,dgenesy8@abc.com,
32,8,Elizabeth Feeney,CEO,efeeney@abc.com,Operations
33,8,Erika Meeks,Tester,emeeks8@abc.com,Operations
It should be "-Venturini", not with a lowercase "v". But thank you!
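Besides the capitalization, the rows with ids 3, 4 and 22 in this output have a second problem: the script strips the quotes around titles such as `Director, Second Career Services`, so those rows now have seven comma-separated fields instead of six. A field that contains a comma (or a double quote) has to be quoted again on output, and RFC 4180 also doubles any quote inside it. A minimal sketch, assuming tab-separated values as input purely for the demonstration; `quote_fields` and `csv_quote` are hypothetical names:

```shell
#!/bin/sh
# quote_fields (hypothetical): read tab-separated values on stdin and print
# them as CSV, quoting a field only when it contains a comma or a quote.
quote_fields() {
    awk 'BEGIN { FS = "\t"; OFS = "," }
    # csv_quote: wrap s in double quotes if needed, doubling embedded quotes
    function csv_quote(s) {
        if (s ~ /[",]/) {
            gsub(/"/, "\"\"", s)
            s = "\"" s "\""
        }
        return s
    }
    {
        # Assigning to $i rebuilds the record with OFS (a comma)
        for (i = 1; i <= NF; i++) $i = csv_quote($i)
        print
    }'
}

printf '3\t2\tBrenda Brown\tDirector, Second Career Services\tbbrown@abc.com\t\n' | quote_fields
```

The example line prints `3,2,Brenda Brown,"Director, Second Career Services",bbrown@abc.com,`, which has six fields again.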
 
08-20-2024, 04:03 PM   #4
Racho
If the capitalization in the original csv is right, why are you doing this?
Code:
        split(fields[3], name, / /)
        name[1] = toupper(substr(name[1], 1, 1)) tolower(substr(name[1], 2))
        name[2] = toupper(substr(name[2], 1, 1)) tolower(substr(name[2], 2))
        fields[3] = name[1] " " name[2]
If I were you, I would trust the capitalization of the original data in the .csv file. If you don't, you will have to foresee a huge number of situations:
A capital letter after "-", always?
What about compound names like "María del Carmen" or "d'Alembert"?
Did someone forget a space after the "." in J.F. Kennedy?
Are tolower and toupper behaving properly with characters like á à ñ ü ç...?
...
 
08-20-2024, 07:17 PM   #5
anothernoname
Original Poster
I've managed to solve the "requirements" using this script:
Code:
#!/bin/bash

# Exit if path to accounts.csv file not provided as argument
if [ $# -lt 1 ]; then
    echo "Usage: ./task1.sh /path/to/accounts.csv"
    exit 0
fi

file=$1

# Exit if provided file doesn't exist
if [ ! -f "$file" ]; then
    echo "File $file doesn't exist"
    exit 1
fi

# Extract directory from file
path=$(dirname "$file")

# Processing csv file with awk
awk '
    BEGIN { FS=","; OFS=","; }

    NR == 1 {
        print
        next
    }

    NR == FNR {
        split($3, name, / /)
        email = substr(name[1], 1, 1) name[2]
        email = tolower(email)
        ++counter[email]
        next
    }

    FNR > 1 {
        j = 0
        inside_quotes = 0
        for(i = 1; i <= NF; i++) {
            if($i ~ /^"/) {
                inside_quotes = 1
                j++
                fields[j] = substr($i, 2)
            } else if($i ~ /"$/)  {
                inside_quotes = 0
                fields[j] = fields[j] OFS substr($i, 1, length($i) - 1)
            } else if (inside_quotes == 1) {
                fields[j] = fields[j] OFS $i
            } else {
                j++
                fields[j] = $i
            }
        }

        split(fields[3], name, / /)
        name[1] = toupper(substr(name[1], 1, 1)) tolower(substr(name[1], 2))
        name[2] = toupper(substr(name[2], 1, 1)) tolower(substr(name[2], 2))

        # Capitalize after hyphen in the last name
        if (name[2] ~ /-/) {
            split(name[2], hyphenated, "-")
            for (k in hyphenated) {
                hyphenated[k] = toupper(substr(hyphenated[k], 1, 1)) tolower(substr(hyphenated[k], 2))
            }
            name[2] = hyphenated[1] "-" hyphenated[2]
        }

        fields[3] = name[1] " " name[2]

        email = substr(name[1], 1, 1) name[2]
        email = tolower(email)

        if (counter[email] > 1) email = email fields[2]
        fields[5] = email "@abc.com"

        # Enclose title in quotes if it contains a comma
        if (fields[4] ~ /,/) {
            fields[4] = "\"" fields[4] "\""
        }

        NF = 6
        for(i = 1; i <= NF; i++) $i = fields[i]
        print
    }
' "$file" "$file" > "$path/accounts_new.csv"

# Log completion
echo "Processing complete. Output saved to: $path/accounts_new.csv"
I'm not as knowledgeable as you... maybe one day. Thank you very much!
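One remaining limitation: the hyphen block in this script capitalizes every part in its `for (k in hyphenated)` loop, but rebuilds the surname from `hyphenated[1]` and `hyphenated[2]` only, so a surname with two hyphens would lose its last part. Below is a sketch of a variant that uses the count returned by `split()` to handle any number of space- and hyphen-separated parts; `cap_name` and `cap_parts` are hypothetical names. Like the original, it lowercases everything after the first letter of each part, so Racho's caveats (names such as "d'Alembert", for example) still apply.

```shell
#!/bin/sh
# cap_name (hypothetical): capitalize each space- and hyphen-separated part of a name.
cap_name() {
    printf '%s\n' "$1" | awk '
    # cap_parts: split s on sep, capitalize each part, and rejoin with sep
    function cap_parts(s, sep,    parts, n, k, out) {
        n = split(s, parts, sep)
        out = ""
        for (k = 1; k <= n; k++) {
            if (k > 1) out = out sep
            out = out toupper(substr(parts[k], 1, 1)) tolower(substr(parts[k], 2))
        }
        return out
    }
    {
        # Split into words on spaces first, then each word on hyphens
        n = split($0, words, " ")
        line = ""
        for (w = 1; w <= n; w++) {
            if (w > 1) line = line " "
            line = line cap_parts(words[w], "-")
        }
        print line
    }'
}

cap_name 'marilyn baker-venturini'
```

The example prints `Marilyn Baker-Venturini`.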
 
08-21-2024, 07:46 AM   #6
grail
Ok, so it would seem you want to be using an awk script but feel it necessary to use bash to wrap it??

As with all scripts, the shebang can be changed to suit your needs, so the first thing I would do is change the following:
Code:
#!/bin/bash

--To

#!/usr/bin/awk -f
I will leave you to look up how to write the "Usage" and file tests.
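For reference, here is a minimal sketch of how those checks could live in the awk program itself, in a `BEGIN` rule. `ARGC` counts the command-line arguments plus one for the program name (`ARGV[0]`), and `getline var < file` returns -1 when the file cannot be opened. The program text is kept in a shell variable only so the sketch can be run directly; in a `#!/usr/bin/awk -f` script it would be the file body. The name `task1.awk` in the usage message is hypothetical.

```shell
#!/bin/sh
# Program text of a hypothetical task1.awk, kept in a variable for easy testing.
check_prog='
BEGIN {
    # ARGV[0] is the awk program name, so a missing file argument means ARGC < 2
    if (ARGC < 2) {
        print "Usage: task1.awk /path/to/accounts.csv" > "/dev/stderr"
        exit 1
    }
    # getline returns -1 if the file cannot be opened for reading
    if ((getline line < ARGV[1]) < 0) {
        print "File " ARGV[1] " does not exist or is not readable" > "/dev/stderr"
        exit 1
    }
    close(ARGV[1])
}
# Placeholder for the real processing: print each record unchanged
{ print }
'

awk "$check_prog" /nonexistent/accounts.csv || echo "exit status: $?"
```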

Looking at your awk code:

NR = number of records read so far, across all input files
FNR = record number within the current input file

With a single input file, any comparison between the two is pointless, as they will always be the same.
And running over the file twice seems pointless if we are simply making the email column (correct me if I am wrong).
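To see the difference concretely: when the same file is named twice on the command line, as the OP's scripts do with `"$file" "$file"`, `NR` keeps counting into the second copy while `FNR` restarts at 1, so `NR == FNR` holds only during the first pass. A small demonstration; the function name `show_passes` is made up for illustration:

```shell
#!/bin/sh
# show_passes (hypothetical): read one file twice and label each record's pass.
show_passes() {
    awk '{
        pass = (NR == FNR) ? "first" : "second"
        print NR, FNR, pass, $0
    }' "$1" "$1"
}

tmp=$(mktemp)
printf 'a\nb\n' > "$tmp"
show_passes "$tmp"
rm -f "$tmp"
```

For this two-line file it prints `1 1 first a`, `2 2 first b`, `3 1 second a` and `4 2 second b`.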

So as a quick throw together, but without the file testing you need to implement:
Code:
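#!/usr/bin/awk -f
# FPAT is a gawk extension, so this needs gawk as the awk

BEGIN{
	# A field is either a (possibly empty) run of non-commas or a double-quoted string
	FPAT = "([^,]*)|(\"[^\"]+\")"
	OFS = ","
}

# Skip the header; fill in the email only if column 5 has no address yet
NR>1 && !($5 ~ /@/){
	split($3,name,/ /)
	$5 = tolower(substr(name[1],1,1)name[2]"@abc.com")
}

# Rebuild each record with OFS; the value is true for any non-empty, non-zero $1, so the record is printed
$1=$1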
Let me know if you need me to explain any of it?
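Note that `FPAT` is a gawk extension; other awks such as mawk treat it as an ordinary variable, so the script above needs gawk. Where gawk isn't available, a quote-aware split can be written as a plain awk function. A minimal sketch, assuming RFC 4180-style input (fields optionally wrapped in double quotes, `""` inside a quoted field standing for one quote, no newlines inside fields); `csv_field` and `csv_split` are hypothetical names:

```shell
#!/bin/sh
# csv_field (hypothetical): print field N of one CSV line, honoring double quotes.
csv_field() {
    printf '%s\n' "$1" | awk -v want="$2" '
    # csv_split: split CSV text s into arr[1..n] and return n
    function csv_split(s, arr,    n, i, c, field, inq) {
        split("", arr)          # empty the array first
        n = 0; field = ""; inq = 0
        for (i = 1; i <= length(s); i++) {
            c = substr(s, i, 1)
            if (inq) {
                if (c == "\"") {
                    # A doubled quote inside a quoted field is one literal quote
                    if (substr(s, i + 1, 1) == "\"") { field = field "\""; i++ }
                    else inq = 0
                } else field = field c
            } else if (c == "\"") inq = 1
            else if (c == ",") { arr[++n] = field; field = "" }
            else field = field c
        }
        arr[++n] = field        # the last field, possibly empty
        return n
    }
    {
        csv_split($0, f)
        print f[want]
    }'
}

csv_field '3,2,Brenda brown,"Director, Second Career Services",,' 4
```

The example prints `Director, Second Career Services`, and empty fields such as the trailing `,,` are kept as separate (empty) fields.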
 
08-21-2024, 12:43 PM   #7
MadeInGermany
The following capitalizes after each hyphen or space.
Code:
...

        # Capitalize words separated by [hyphen or space]
        doupper = 1
        cname = fields[3]
        len = length(cname)
        for (pos = 1; pos <= len; pos++) {
            char = substr(cname, pos, 1)
            if (doupper) {
                cname = ( substr(cname, 1, pos-1) toupper(char) substr(cname, pos+1) )
                doupper=0
            }
            if (char ~ /[- ]/) { doupper = 1 }
        }

        fields[3] = cname

        split($3, name, / /)
        email = substr(name[1], 1, 1) name[2]
        email = tolower(email)

...
 
08-23-2024, 08:06 AM   #8
sundialsvcs
For what it may be worth, long ago a guy named Larry Wall once decided that "awk" wasn't powerful enough for him. So, he invented perl, and kept it similar to "awk." You may wish to explore this language – and its vast(!) "CPAN" library of contributed code – as a possible power-tool where you are now using "awk." For example, there's probably a library-module out there that you can simply ... use.

@MadeInGermany: Please note that your proposed solution [erroneously ...] assumes that the first character in the string should be uppercased. Instead, you should test the first character when setting the initial value of the doupper flag.

Last edited by sundialsvcs; 08-23-2024 at 08:10 AM.
 
08-23-2024, 12:29 PM   #9
MadeInGermany
Test for what and how?
I think all names begin with a capital letter.
 
08-25-2024, 04:47 AM   #10
grail
@OP --
I was having a further play with this. Are you always going to have input data that is inconsistent?
I am referring to the fact that not every row has the same number of columns.
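A quick way to check this with any awk is to compare each row's field count with the header's. With a plain comma `FS`, empty fields still count (`a,,` has three fields), so in the sample data posted earlier every row has six fields except those whose title contains a quoted comma (ids 3, 4 and 22). A minimal sketch; `report_field_counts` is a hypothetical name:

```shell
#!/bin/sh
# report_field_counts (hypothetical): list lines whose comma-separated field
# count differs from the header line's.
report_field_counts() {
    awk -F, '
    NR == 1 { expected = NF; next }
    NF != expected { print "line " NR ": " NF " fields (expected " expected ")" }
    ' "$1"
}
```

Run as `report_field_counts accounts.csv`; for the sample file it lists lines 4, 5 and 23 (ids 3, 4 and 22) with 7 fields.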
 
08-25-2024, 05:25 AM   #11
Turbocapitalist
Which distro is this for? Recent versions of AWK (for example, gawk 5.3 and later) include CSV support. Check the manual page for AWK and see if --csv is mentioned.

Quote:
Originally Posted by sundialsvcs
For what it may be worth, long ago a guy named Larry Wall once decided that "awk" wasn't powerful enough for him. So, he invented perl, and kept it similar to "awk." You may wish to explore this language – and its vast(!) "CPAN" library of contributed code – as a possible power-tool where you are now using "awk." For example, there's probably a library-module out there that you can simply ... use.
Otherwise, Perl is the right idea. See CPAN's Text::CSV_XS.
 
  


All times are GMT -5.