Clean CSV data as instructed

rbees · 01-31-2019, 10:51 AM

As always THANKS for helping us less experienced people get things working.

I am trying to get some "clean data" in a csv file. Columns 2 and 3 can be as much as 3 digit numbers, not sure that maters. Sample Data:

Code:

1 Chronicles,1,1,1, ､**,*,*,*,*
1 Chronicles,1,1,2,אָדָ֥ם,Adam,’ā·ḏām,121,Noun
1 Chronicles,1,1,3, ､**,*,*,*,*
1 Chronicles,1,1,4,שֵׁ֖ת,Sheth,šêṯ,8352,Noun
1 Chronicles,1,1,5, ､**,*,*,*,*
1 Chronicles,1,1,6,אֱנֽוֹשׁ׃,Enosh,’ĕ·nō·wōš.,583,Noun
1 Chronicles,1,2,1, ､**,*,*,*,*
1 Chronicles,1,2,2,קֵינָ֥ן,Kenan,qê·nān,7018,Noun
1 Chronicles,1,2,3, ､**,*,*,*,*
1 Chronicles,1,2,4,מַהֲלַלְאֵ֖ל,Mahalaleel,ma·hă·lal·’êl,4111,Noun
1 Chronicles,1,2,5, ､**,*,*,*,*
1 Chronicles,1,2,6,יָֽרֶד׃,Jered,yā·reḏ.,3382,Noun
1 Chronicles,1,3,1, ､**,*,*,*,*
1 Chronicles,1,3,2,חֲנ֥וֹךְ,Henoch,ḥă·nō·wḵ,2585,Noun
1 Chronicles,1,3,3, ､**,*,*,*,*
1 Chronicles,1,3,4,מְתוּשֶׁ֖לַח,Methuselah,mə·ṯū·še·laḥ,4968,Noun
1 Chronicles,1,3,5, ､**,*,*,*,*
1 Chronicles,1,3,6,לָֽמֶךְ׃,Lamech,lā·meḵ.,3929,Noun
1 Chronicles,1,4,1, ､**,*,*,*,*
1 Chronicles,1,4,2,נֹ֥חַ,Noah,nō·aḥ,5146,Noun
1 Chronicles,1,4,3, ､**,*,*,*,*
1 Chronicles,1,4,4,שֵׁ֖ם,Shem,šêm,8035,Noun
1 Chronicles,1,4,5, ､**,*,*,*,*
1 Chronicles,1,4,6,חָ֥ם,Ham,ḥām,2526,Noun
1 Chronicles,1,4,7,  .**,*,*,*,*
1 Chronicles,1,4,8,וָיָֽפֶת׃,and*Japheth,wā·yā·p̄eṯ.,3315,Noun
1 Chronicles,1,4,9,ס,*-*,s,*,*
1 Chronicles,1,5,1,בְּנֵ֣י,The*sons,bə·nê,1121,Noun
1 Chronicles,1,5,2, –**,*,*,*,*

In this sample data every other line needs to have columns 5,6,7,8,9 deleted while keeping the chapter, verse, and wordnr count in columns 2,3,4 correct by deleting them from the end. For instance verse one contains 6 lines. Lines 1, 3, and 5 need columns 5,6,7,8,and 9 deleted and lines 4, 5, and 6 need to have columns 1,2,3, and 4 deleted so that I end up with 3 lines instead of 6 lines for verse 1. Most of the rest of the data will not have so many close edits needed.

Debug output:

Code:

+ '[' awk -F, '$5==""' 5:1 ']'
./dataCleaner.sh: line 45: [: too many arguments
+ for line in $editLines
+ '[' awk -F, '$5==""' Chronicles,1,1,5, ']'
./dataCleaner.sh: line 45: [: too many arguments
+ for line in $editLines
+ '[' awk -F, '$5==""' ､**,*,*,*,* ']'
./dataCleaner.sh: line 45: [: too many arguments
+ for line in $editLines
+ '[' awk -F, '$5==""' 6:1 ']'
./dataCleaner.sh: line 45: [: too many arguments
+ for line in $editLines
+ '[' awk -F, '$5==""' Chronicles,1,1,6,אֱנֽוֹשׁ׃,Enosh,’ĕ·nō·wōš.,583,Noun ']'
./dataCleaner.sh: line 45: [: too many arguments

My bash code:

Code:

#!/bin/bash

# Script to correct hebrew.csv file
# Delete existing column 1 as it is not relevant
# Remove non-hebrew characters in column 5 (old 6) from Hebrew csv text file.
# Update column 4 (old 5) to correct number of words per verse.


# csv format
# id	book	chapter	verse	wordnr	word    concordance	   translit	strongs	lemma
# 1	2 Samuel   1	1	1	וַיְהִ֗י	Now*it*came*to*pass	way·hî,	1961	Verb
set -x

workDir=~/TheKing/tmp

# delete column 1
# while read p; do
#    cut -d, -f1 --complement $workDir/$p.csv > $workDir/$p.csv; 
# done < books.csv


# Characters in column 5 that trigger delete
#        Unicode character	Oct	Dec	Hex	HTML
# ､ halfwidth ideographic comma	0177544	65380	0xFF64	､
# .	full stop	        056	46	0x2E	.
# \x{A0}  no-break space	0240	160	0xA0	&nbsp;

for filename in "$workDir"/*.csv; do
    # Get number of chapters
    # cat New1Chronicles.csv  | cut -f2 -d, | uniq
    # where {(column 2=current chapter)
    #declare -a ChapterCount
    ChapterCount=$(cut -f2 -d, "$filename" | uniq)
    echo $ChapterCount
    for chapter in $ChapterCount; do
        # need specific line range
        # (column 3=current verse)
        editLines=$(grep -n $chapter $filename)
        echo $editLines
        for line in $editLines; do
            # if column 6 in line# is empty
            if [ awk -F, '$5==""' $line ]; then
                # (delete columns 5,6,7,8,9 from line) & (delete columns 1,2,3,4 from the last line)
                [ cut -d, -f5,6,7,8,9 --complement $line ] & [ cut -d, f1,2,3,4 --complement ${#line[@]} ]
            # remove last element in array
            unset 'line[-1]'
            fi
        done
    done
done

I have tried no (()) [] [[]] '' "" and in every configuration I think will work with no joy. shellcheck.net has not helped.

Code:

$ shellcheck myscript
 
Line 45:
            if [ awk -F, '$5==""' $line ]; then
            ^-- SC1009: The mentioned syntax error was in this if expression.
               ^-- SC1073: Couldn't parse this test expression. Fix to allow more checks.
                 ^-- SC1014: Use 'if cmd; then ..' to check exit code, or 'if [[ $(cmd) == .. ]]' to check output.
                       ^-- SC1035: You are missing a required space here.
                         ^-- SC1072: Expected test to end here (don't wrap commands in []/[[]]). Fix any mentioned problems and try again.
$

Google has not helped, or I am not able to make the connection between what I find and how to make my script work. Probably the latter. Currently two days into this. Thanks record breaking cold weather in the Midwest.

Thanks again.

Turbocapitalist · 01-31-2019, 11:07 AM

Can you just delete the lines containing " ､**,*,*,*,*" there?

Perhaps if you provide a sample of what your output should look like then it will be more clear.

l0f4r0 · 01-31-2019, 11:17 AM

Quote:

Originally Posted by rbees

if [ awk -F, '$5==""' $line ]

awk doesn't work that way.
You can just do:

Code:

awk -F, '$5==""{ some_actions }' <<<"$line"

NB: I'm pretty sure you don't need custom shell scripting to accomplish your task. However, I really cannot understand what you want here exactly.
Please give us an expected output sample as suggested by Turbocapitalist.

TB0ne · 01-31-2019, 11:47 AM

From this thread: https://www.linuxquestions.org/quest...ic-4175647055/

Rbees, I still don't know why you're doing this, at all. You are taking a sqlite DATABASE file, and trying to sort/add things to it as a CSV File. This is akin to disassembling your car, putting the parts in a backpack and walking somewhere, only to reassemble it; a whole ton of effort and time for no payback, when (if you used it as it was intended), you could save yourself both. Gave you a VERY simple method of importing this data into MySQL, where you can sort/order/extract based on a HUGE number of criteria/operations. Which, based on your other thread, is what you want when you do the front-end. Which you STILL won't get with a CSV file, no matter what you do.

You can easily add columns, up to a huge degree...I doubt you'd run out of space, as there are many multi-terabyte databases running just fine, with less-than-one-second response time. Further, a MySQL database can use different character sets, so accommodating the 'special' characters is easy. Hundreds of columns? Not a problem. And instead of using a single 'flat file' architecture, you import the data into ONE table, with the prime numbers in another, and use the primary key/record ID to do associations. None of this is particularly hard, and instead of trying to learn the VERY (for this) error-prone method of awk, you'd have actual useful data, ready to interface. Even via web browser.

Sorry to go back to the database solution, but for your end goal, its what you need. And it's faster to accomplish.

l0f4r0 · 01-31-2019, 11:55 AM

Just for the sake of technical challenge (so nothing to do if it's a good idea or not, please refer to what TB0ne said instead), is the following helping?

Code:

awk 'BEGIN{FS=OFS=","} { if(verse==$3){print $5,$6,$7,$8,$9} else{print $1,$2,$3":"; print $5,$6,$7,$8,$9; verse=$3} }' hebrew.csv | sed '/\*,\*,\*,\*/d'

It's really hard for me to help you more because hebrew characters are not supported from my side and you didn't provide us with any output sample...

rbees · 01-31-2019, 01:56 PM

Turbocapitalist

No. Chapter 1 verse 1 has three words. The data base has six lines. If I just delete lines 1,3,5 I will end up with wordnr's 2,4,6 not 1,2,3.

This is the same sample data:

Code:

1 Chronicles	1	1	1	אָדָ֥ם	Adam	’ā·ḏām	121	Noun
1 Chronicles	1	1	2	שֵׁ֖ת	Sheth	šêṯ	8352	Noun
1 Chronicles	1	1	3	אֱנֽוֹשׁ׃	Enosh	’ĕ·nō·wōš.	583	Noun
1 Chronicles	1	2	1	קֵינָ֥ן	Kenan	qê·nān	7018	Noun
1 Chronicles	1	2	2	מַהֲלַלְאֵ֖ל	Mahalaleel	ma·hă·lal·’êl	4111	Noun
1 Chronicles	1	2	3	יָֽרֶד׃	Jered	yā·reḏ.	3382	Noun
1 Chronicles	1	3	1	חֲנ֥וֹךְ	Henoch	ḥă·nō·wḵ	2585	Noun
1 Chronicles	1	3	2	מְתוּשֶׁ֖לַח	Methuselah	mə·ṯū·še·laḥ	4968	Noun
1 Chronicles	1	3	3	לָֽמֶךְ׃	Lamech	lā·meḵ.	3929	Noun
1 Chronicles	1	4	1	נֹ֥חַ	Noah	nō·aḥ	5146	Noun
1 Chronicles	1	4	2	שֵׁ֖ם	Shem	šêm	8035	Noun
1 Chronicles	1	4	3	חָ֥ם	Ham	ḥām	2526	Noun
1 Chronicles	1	4	4	וָיָֽפֶת׃	and*Japheth	wā·yā·p̄eṯ.	3315	Noun
1 Chronicles	1	4	5	ס	*-*	s	*	*

The last 5 fields of lines 1, 3, and 5 deleted AND the first 4 fields of lines 4, 5, and 6.

Another data sample:

Code:

Esther	2	8	1	 ､**	*	*	*	*
Esther	2	8	2	וַיְהִ֗י	so*it*came*to*pass	way·hî,	1961	Verb
Esther	2	8	3	 ､**	*	*	*	*
Esther	2	8	4	בְּהִשָּׁמַ֤ע	was*heard	bə·hiš·šā·ma‘	8085	Verb
Esther	2	8	5	דְּבַר־	command	də·ḇar-	1697	Noun
Esther	2	8	6	הַמֶּ֙לֶךְ֙	of*when*the*king	ham·me·leḵ	4428	Noun
Esther	2	8	7	וְדָת֔וֹ	and*his*decree	wə·ḏā·ṯōw,	1881	Noun
Esther	2	8	8	וּֽבְהִקָּבֵ֞ץ	and*were*gathered*together	ū·ḇə·hiq·qā·ḇêṣ	6908	Verb
Esther	2	8	9	נְעָר֥וֹת	maidens	nə·‘ā·rō·wṯ	5291	Noun
Esther	2	8	10	רַבּ֛וֹת	when*many	rab·bō·wṯ	7227	Adj
Esther	2	8	11	אֶל־	unto	’el-	413	Prep
Esther	2	8	12	שׁוּשַׁ֥ן	Shushan	šū·šan	7800	Noun
Esther	2	8	13	 ､**	*	*	*	*
Esther	2	8	14	הַבִּירָ֖ה	the*palace	hab·bî·rāh	1002	Noun
Esther	2	8	15	אֶל־	unto	’el-	413	Prep
Esther	2	8	16	יַ֣ד	the*custody	yaḏ	3027	Noun
Esther	2	8	17	 ､**	*	*	*	*
Esther	2	8	18	הֵגָ֑י	of*Hegai	hê·ḡāy;	1896	Noun
Esther	2	8	19	וַתִּלָּקַ֤ח	that*was*brought	wat·til·lā·qaḥ	3947	Verb
Esther	2	8	20	אֶסְתֵּר֙	Esther	’es·têr	635	Noun
Esther	2	8	21	אֶל־	also*unto	’el-	413	Prep
Esther	2	8	22	 ､**	*	*	*	*
Esther	2	8	23	בֵּ֣ית	house	bêṯ	1004	Noun
Esther	2	8	24	הַמֶּ֔לֶךְ	of*the*king	ham·me·leḵ,	4428	Noun
Esther	2	8	25	אֶל־	unto	’el-	413	Prep
Esther	2	8	26	יַ֥ד	the*custody	yaḏ	3027	Noun
Esther	2	8	27	 ､**	*	*	*	*
Esther	2	8	28	הֵגַ֖י	of*Hegai	hê·ḡay	1896	Noun
Esther	2	8	29	שֹׁמֵ֥ר	keeper	šō·mêr	8104	Verb
Esther	2	8	30	  .**	*	*	*	*
Esther	2	8	31	הַנָּשִֽׁים׃	of*the*women	han·nā·šîm.	802	Noun

Corrected data sample deleting 7 half lines from the right side and 7 half lines from the left side.

Code:

Esther	2	8	1	וַיְהִ֗י	so*it*came*to*pass	way·hî,	1961	Verb
Esther	2	8	2	בְּהִשָּׁמַ֤ע	was*heard	bə·hiš·šā·ma‘	8085	Verb
Esther	2	8	3	דְּבַר־	command	də·ḇar-	1697	Noun
Esther	2	8	4	הַמֶּ֙לֶךְ֙	of*when*the*king	ham·me·leḵ	4428	Noun
Esther	2	8	5	וְדָת֔וֹ	and*his*decree	wə·ḏā·ṯōw,	1881	Noun
Esther	2	8	6	וּֽבְהִקָּבֵ֞ץ	and*were*gathered*together	ū·ḇə·hiq·qā·ḇêṣ	6908	Verb
Esther	2	8	7	נְעָר֥וֹת	maidens	nə·‘ā·rō·wṯ	5291	Noun
Esther	2	8	8	רַבּ֛וֹת	when*many	rab·bō·wṯ	7227	Adj
Esther	2	8	9	אֶל־	unto	’el-	413	Prep
Esther	2	8	10	שׁוּשַׁ֥ן	Shushan	šū·šan	7800	Noun
Esther	2	8	11	הַבִּירָ֖ה	the*palace	hab·bî·rāh	1002	Noun
Esther	2	8	12	אֶל־	unto	’el-	413	Prep
Esther	2	8	13	יַ֣ד	the*custody	yaḏ	3027	Noun
Esther	2	8	14	הֵגָ֑י	of*Hegai	hê·ḡāy;	1896	Noun
Esther	2	8	15	וַתִּלָּקַ֤ח	that*was*brought	wat·til·lā·qaḥ	3947	Verb
Esther	2	8	16	אֶסְתֵּר֙	Esther	’es·têr	635	Noun
Esther	2	8	17	אֶל־	also*unto	’el-	413	Prep
Esther	2	8	18	בֵּ֣ית	house	bêṯ	1004	Noun
Esther	2	8	19	הַמֶּ֔לֶךְ	of*the*king	ham·me·leḵ,	4428	Noun
Esther	2	8	20	אֶל־	unto	’el-	413	Prep
Esther	2	8	21	יַ֥ד	the*custody	yaḏ	3027	Noun
Esther	2	8	22	הֵגַ֖י	of*Hegai	hê·ḡay	1896	Noun
Esther	2	8	23	שֹׁמֵ֥ר	keeper	šō·mêr	8104	Verb
Esther	2	8	24	הַנָּשִֽׁים׃	of*the*women	han·nā·šîm.	802	Noun

This example does not seam to be affected by using grep to extract it verses awk. Note wordnr 20/16.

TBOne, From what I have read about scripting these types changes to a database means that I would have to spend the next 6 months learning a language that I don't have time to learn. I typically get three or four hours after work that I could work on learning it, IF I am able to actually concentrate and apply my old brain to it without falling asleep. So please drop it as not helpful to me. Thank you.

Even now @ early afternoon I am struggling to concentrate, and today was another easy "to cold, no work" day. My way may not be the most efficient and only grade school, but it is how I know.

l0f4r0, Looks like it might be if I can get my head around what it is doing.

awk 'BEGIN{FS=OFS=","} { if(verse==$3){print $5,$6,$7,$8,$9} else{print $1,$2,$3":"; print $5,$6,$7,$8,$9; verse=$3} }' hebrew.csv | sed '/\*,\*,\*,\*/d'

To break it down the way I understand it. And I know I am missing things......
I assume that {FS=OFS=","} is setting the field separator.
if(verse==$3) do this or that. How is $3 filled?

This
cut -d, -f1,2,3,4 --complement
makes more sense to me than the sed above

So on to my own work. If I remember right, been awhile, in bash variable data (
$someData ) is readable by a child process but not changeable unless it is exported, True?

I changed the failing awk line to

Code:

awk -F, '$5==""{cut -d, -f5,6,7,8,9 --complement $line }' <<< $line

but it occurs to me that it will be better to start a counter for each verse rotation to store the number of right side deletions in so that when it gets done with the right side I can just delete that many left sides from the bottom up to align things correctly again.

And the more I think about it the way I have it, it will not process the way it needs to. So some rework is required.

Thanks

l0f4r0 · 01-31-2019, 02:17 PM

Quote:

Originally Posted by rbees

I assume that {FS=OFS=","} is setting the field separator.

Yes. FS=Field Separator. OFS=Output Field Separator.

Quote:

Originally Posted by rbees

if(verse==$3) do this or that. How is $3 filled?

It's automatic. $3 is the third field (delimited by FS) from each line read by awk.

Quote:

Originally Posted by rbees

This

Code:

cut -d, -f1,2,3,4 --complement

makes more sense to me than the sed above

It's not equivalent. If I'm not wrong your cut keeps the line but takes only other fields than 1 to 4. My sed deletes the whole lines containing "*,*,*,*".

Quote:

Originally Posted by rbees

If I remember right, been awhile, in bash variable data (
$someData ) is readable by a child process but not changeable unless it is exported, True?

No, a child process can never change a parent variable. Any change inside child will be lost after child exits.
Exporting only allows child processes to be aware of the parent variables.

rbees · 01-31-2019, 02:49 PM

Quote:

It's not equivalent. If I'm not wrong your cut keeps the line but takes only other fields than 1 to 4. My sed deletes the whole lines containing "*,*,*,*".

I do have to take "A whole line worth", but it needs to be half "the part that contains all the blank fields (5-9) of the "current wordnr line"" and half "the first 4 fields of the last line of the "current verse range"". Maintaining the correct word count per verse is the goal.

Whole lines or whole columns are easy. Lines made of parts, not so easy. I have the logic down, it is getting the commands to do said logic is my issue.

Quote:

Exporting only allows child processes to be aware of the parent variables.

More rethinking required.

Thanks

TB0ne · 01-31-2019, 02:50 PM

Quote:

Originally Posted by rbees

TBOne, From what I have read about scripting these types changes to a database means that I would have to spend the next 6 months learning a language that I don't have time to learn. I typically get three or four hours after work that I could work on learning it, IF I am able to actually concentrate and apply my old brain to it without falling asleep. So please drop it as not helpful to me. Thank you.

Sorry, no. The ENTIRE COMMAND to add a column to a database is

Code:

alter database <database name> add column <colum name> <type>
(say your database is torah and you're adding a column called book that has text)
alter database torah add column book text

That's it...certainly won't take six months to type that in.

Quote:

Even now @ early afternoon I am struggling to concentrate, and today was another easy "to cold, no work" day. My way may not be the most efficient and only grade school, but it is how I know.

Then you should stop now, because you're missing the point. This:

Quote:

Originally Posted by rbees

And the more I think about it the way I have it, it will not process the way it needs to. So some rework is required.

..indicates you're ALREADY hitting the limitations of what you can do with the tools you're using. And from your other thread, you said:

Quote:

Originally Posted by rbees

I have considered Perl for later in the project as I would like to have a nice user interface to the eventual data output. But I don't have any experience with anything but Bash. Getting this database fixed and expandable is the first step.

...and...

Quote:

Originally Posted by rbees

This csv file is some 300,000 lines long and will eventually grow to a million plus.

If you want a 'nice user interface', you're not going to get far trying to interface with a text file. You specifically mention a database as well. How long does it take you to pull up the 300,000 line file in an editor now? Multiply that by four, which is how long your project/program is going to take to load EACH TIME you run it at a million lines. How long does it take you to search for a term in that 300,000 lines now?? Again: x4, just for a SIMPLE SEARCH. How usable is your program/project going to be in reality, if you continue down the path you're on??? Do you expect your users (even if it's just you), to wait 15-30 minutes for a result, especially if the same result could be had in under a second??

Using a relational database is the *ONLY WAY* you are going to get your project/program working; and your data is ALREADY IN A DATABASE FORMAT. Adding a column to a database is trivial. Loading your existing sqlite database to a more scalable one (MySQL/MariaDB) is also trivial. You could have had ALL YOUR DATA already usable, searchable and working within 30 minutes at most. Use it or not, that's obviously your call, but it's certainly not wrong to suggest that route.

rbees · 01-31-2019, 03:12 PM

Quote:

The ENTIRE COMMAND to add a column to a database is

I don't need to add a column right now. I need the data to be clean.

Quote:

If you want a 'nice user interface', you're not going to get far trying to interface with a text file.

I need "CLEAN and ACCURATE data" first. The user interface is pointless without it.

So please explain how I am suppose to

Quote:

I do have to take "A whole line worth", but it needs to be half "the part that contains all the blank fields (5-9) of the "current wordnr line"" and half "the first 4 fields of the last line of the "current verse range"". Maintaining the correct word count per verse is the goal.

Without spending a supbstansial amount of time learning something new. You may know just how to do that. I don't. Logic is still logic. Whether it is done with bash/sed/awk/grep or php/html/mysql.

So please post a mysql that will process said logic.

TB0ne · 01-31-2019, 03:47 PM

Quote:

Originally Posted by rbees

I don't need to add a column right now. I need the data to be clean.

...and...

Quote:

I need "CLEAN and ACCURATE data" first. The user interface is pointless without it.

Right; which is what you already have.

Quote:

So please explain how I am suppose to Without spending a supbstansial amount of time learning something new. You may know just how to do that. I don't. Logic is still logic. Whether it is done with bash/sed/awk/grep or php/html/mysql. So please post a mysql that will process said logic.

Did you read your other thread, or are just missing what's being said:

Your data is **ALREADY CLEAN**, it is YOU that is getting it in a poor state with the repeated awk/sed/whatever-commands. Your data is ALREADY in a sqlite database file, which is parsed and loadable. There is **ZERO NEED** to do anything else to it.
You were given exact commands and a link in your other thread that tells you EXACTLY what to do, and how to do it.

If you honestly think you're going to be able to code this in bash and get data relations done on a million-plus lines with what you want, you are sadly mistaken. This is going to require that you learn how to write real code; be it in python, perl, C, or whatever language that can do such things. Bash was never designed to handle data of this size. Again, your user interface won't be able to interface well (if at ALL) with a text file, not to mention the huge time-lags you'll have trying to do anything with such a big text file.

You said that step one is getting clean data; you already have it, since you downloaded a sqlite database file which was already clean. Your step one is already done, but you don't want to use it. Step two is adding the columns you want; you were ALSO given a command on how to do that. Beyond that is steps 3, 4, etc....and you are going to have to perform those steps. And when you say:

Quote:

Originally Posted by rbees

I do have to take "A whole line worth", but it needs to be half "the part that contains all the blank fields (5-9) of the "current wordnr line"" and half "the first 4 fields of the last line of the "current verse range"". Maintaining the correct word count per verse is the goal.

This is where *YOUR* program comes in. A database gives you lookup values and ways to manipulate the data, nothing more. Your program interfaces with it...so read the records one at a time...look at the field, perform your tests on it, and act accordingly. Whether that's putting ANOTHER value into a different place in the database, telling the user something, adding something, or printing something out, that is what YOU write. And you're going to *HAVE TO* write this code in something other than bash, period.

If you want to learn nothing new and continue with an error-filled text file, that gives you an unusable program when you're done, keep going. Not much more to tell you; you asked folks who are experienced for their advice, but you seem to want to ignore it. Your call.

l0f4r0 · 01-31-2019, 03:55 PM

Ahhhh, okay I think I get it now.
Is the following what you want?

Code:

awk 'BEGIN{FS=OFS=","} !/\*,\*,\*,\*/{ if(verse==$3){print $1,$2,$3,++increment,$5,$6,$7,$8,$9} else{increment=1;print $1,$2,$3,increment,$5,$6,$7,$8,$9; verse=$3} }' hebrew.csv

Turbocapitalist · 02-01-2019, 04:19 AM

It's a bit silly but if you are familiar with spreadsheets, you could import the file into LibreOffice (or Calligra) and work with it there. Based on the one data sample it looks like you are just sliding 'cells' upward as the ones above them are deleted. It can be done in AWK or Perl, but is a bit of fiddle.

rbees · 02-01-2019, 06:41 AM

l0f4r0; no joy. It removes a lot of entries in other columns randomly.

Turbocapitalist; That is actually how I cleaned up the data I posted. If I was a spreadsheet guru I could write a macro to do it, but sadly I am not and never will be.

Truth be that I will never get beyond grade school bash. I know me.

You posted this

Code:

awk '$2=="2 Samuel"' FS=',' < file > newfile

in another thread. Will that work with a 3 digit number? The reason I ask is that column 3 (chapter) could have up to a 3 digit number.

I don't have much time this morning for testing with a "fresh brain" and will not be able to get back to it for a few days with Shabbat and short work hours that I need to make up.

Thanks

Turbocapitalist · 02-01-2019, 06:50 AM

Thanks.

Quote:

Originally Posted by rbees

You posted this

Code:

awk '$2=="2 Samuel"' FS=',' < file > newfile

in another thread. Will that work with a 3 digit number? The reason I ask is that column 3 (chapter) could have up to a 3 digit number.

That prints out the whole row regardless of the size of any of the numbers in any of the fields, but it does that only if the second column matches what's in the quotes. You could make it more advanced and more complex such that it sorts the different books into separate files. However, since both SQL and AWK (and by extension perl) seem to be out of consideration that may leave only manual intervention using a spreadsheet as the last remaining option.

If I understand the question based on the sample in #6 above you are just moving fields forward one or more rows to fill in earlier empty fields from later filled fields. I'm not sure how that retains the integrity of the data set since the basic unit of data is a record (line) and moving some fields from one record (line) to another breaks that. Sorry to be obtuse but perhaps you could provide a second short data sample with matching output to supplements #6?