LinuxQuestions.org
Welcome to the most active Linux Forum on the web.
Home Forums Tutorials Articles Register
Go Back   LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
User Name
Password
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Notices


Reply
  Search this Thread
Old 01-31-2019, 10:51 AM   #1
rbees
Member
 
Registered: Mar 2004
Location: northern michigan usa
Distribution: Debian Squeeze, Whezzy, Jessie
Posts: 921

Rep: Reputation: 46
Clean CSV data as instructed


As always THANKS for helping us less experienced people get things working.

I am trying to get some "clean data" in a csv file. Columns 2 and 3 can be as much as 3 digit numbers, not sure that maters. Sample Data:

Code:
1 Chronicles,1,1,1, 、**,*,*,*,*
1 Chronicles,1,1,2,אָדָ֥ם,Adam,’ā·ḏām,121,Noun
1 Chronicles,1,1,3, 、**,*,*,*,*
1 Chronicles,1,1,4,שֵׁ֖ת,Sheth,šêṯ,8352,Noun
1 Chronicles,1,1,5, 、**,*,*,*,*
1 Chronicles,1,1,6,אֱנֽוֹשׁ׃,Enosh,’ĕ·nō·wōš.,583,Noun
1 Chronicles,1,2,1, 、**,*,*,*,*
1 Chronicles,1,2,2,קֵינָ֥ן,Kenan,qê·nān,7018,Noun
1 Chronicles,1,2,3, 、**,*,*,*,*
1 Chronicles,1,2,4,מַהֲלַלְאֵ֖ל,Mahalaleel,ma·hă·lal·’êl,4111,Noun
1 Chronicles,1,2,5, 、**,*,*,*,*
1 Chronicles,1,2,6,יָֽרֶד׃,Jered,yā·reḏ.,3382,Noun
1 Chronicles,1,3,1, 、**,*,*,*,*
1 Chronicles,1,3,2,חֲנ֥וֹךְ,Henoch,ḥă·nō·wḵ,2585,Noun
1 Chronicles,1,3,3, 、**,*,*,*,*
1 Chronicles,1,3,4,מְתוּשֶׁ֖לַח,Methuselah,mə·ṯū·še·laḥ,4968,Noun
1 Chronicles,1,3,5, 、**,*,*,*,*
1 Chronicles,1,3,6,לָֽמֶךְ׃,Lamech,lā·meḵ.,3929,Noun
1 Chronicles,1,4,1, 、**,*,*,*,*
1 Chronicles,1,4,2,נֹ֥חַ,Noah,nō·aḥ,5146,Noun
1 Chronicles,1,4,3, 、**,*,*,*,*
1 Chronicles,1,4,4,שֵׁ֖ם,Shem,šêm,8035,Noun
1 Chronicles,1,4,5, 、**,*,*,*,*
1 Chronicles,1,4,6,חָ֥ם,Ham,ḥām,2526,Noun
1 Chronicles,1,4,7,  .**,*,*,*,*
1 Chronicles,1,4,8,וָיָֽפֶת׃,and*Japheth,wā·yā·p̄eṯ.,3315,Noun
1 Chronicles,1,4,9,ס,*-*,s,*,*
1 Chronicles,1,5,1,בְּנֵ֣י,The*sons,bə·nê,1121,Noun
1 Chronicles,1,5,2, –**,*,*,*,*
In this sample data every other line needs to have columns 5,6,7,8,9 deleted while keeping the chapter, verse, and wordnr count in columns 2,3,4 correct by deleting them from the end. For instance verse one contains 6 lines. Lines 1, 3, and 5 need columns 5,6,7,8,and 9 deleted and lines 4, 5, and 6 need to have columns 1,2,3, and 4 deleted so that I end up with 3 lines instead of 6 lines for verse 1. Most of the rest of the data will not have so many close edits needed.

Debug output:
Code:
+ '[' awk -F, '$5==""' 5:1 ']'
./dataCleaner.sh: line 45: [: too many arguments
+ for line in $editLines
+ '[' awk -F, '$5==""' Chronicles,1,1,5, ']'
./dataCleaner.sh: line 45: [: too many arguments
+ for line in $editLines
+ '[' awk -F, '$5==""' 、**,*,*,*,* ']'
./dataCleaner.sh: line 45: [: too many arguments
+ for line in $editLines
+ '[' awk -F, '$5==""' 6:1 ']'
./dataCleaner.sh: line 45: [: too many arguments
+ for line in $editLines
+ '[' awk -F, '$5==""' Chronicles,1,1,6,אֱנֽוֹשׁ׃,Enosh,’ĕ·nō·wōš.,583,Noun ']'
./dataCleaner.sh: line 45: [: too many arguments
My bash code:
Code:
#!/bin/bash

# Script to correct hebrew.csv file
# Delete existing column 1 as it is not relevant
# Remove non-hebrew characters in column 5 (old 6) from Hebrew csv text file.
# Update column 4 (old 5) to correct number of words per verse.


# csv format
# id	book	chapter	verse	wordnr	word    concordance	   translit	strongs	lemma
# 1	2 Samuel   1	1	1	וַיְהִ֗י	Now*it*came*to*pass	way·hî,	1961	Verb
set -x

workDir=~/TheKing/tmp

# delete column 1
# while read p; do
#    cut -d, -f1 --complement $workDir/$p.csv > $workDir/$p.csv; 
# done < books.csv


# Characters in column 5 that trigger delete
#        Unicode character	Oct	Dec	Hex	HTML
# 、 halfwidth ideographic comma	0177544	65380	0xFF64	、
# .	full stop	        056	46	0x2E	.
# \x{A0}  no-break space	0240	160	0xA0	&nbsp;

for filename in "$workDir"/*.csv; do
    # Get number of chapters
    # cat New1Chronicles.csv  | cut -f2 -d, | uniq
    # where {(column 2=current chapter)
    #declare -a ChapterCount
    ChapterCount=$(cut -f2 -d, "$filename" | uniq)
    echo $ChapterCount
    for chapter in $ChapterCount; do
        # need specific line range
        # (column 3=current verse)
        editLines=$(grep -n $chapter $filename)
        echo $editLines
        for line in $editLines; do
            # if column 6 in line# is empty
            if [ awk -F, '$5==""' $line ]; then
                # (delete columns 5,6,7,8,9 from line) & (delete columns 1,2,3,4 from the last line)
                [ cut -d, -f5,6,7,8,9 --complement $line ] & [ cut -d, f1,2,3,4 --complement ${#line[@]} ]
            # remove last element in array
            unset 'line[-1]'
            fi
        done
    done
done

I have tried no (()) [] [[]] '' "" and in every configuration I think will work with no joy. shellcheck.net has not helped.

Code:
$ shellcheck myscript
 
Line 45:
            if [ awk -F, '$5==""' $line ]; then
            ^-- SC1009: The mentioned syntax error was in this if expression.
               ^-- SC1073: Couldn't parse this test expression. Fix to allow more checks.
                 ^-- SC1014: Use 'if cmd; then ..' to check exit code, or 'if [[ $(cmd) == .. ]]' to check output.
                       ^-- SC1035: You are missing a required space here.
                         ^-- SC1072: Expected test to end here (don't wrap commands in []/[[]]). Fix any mentioned problems and try again.
$
Google has not helped, or I am not able to make the connection between what I find and how to make my script work. Probably the latter. Currently two days into this. Thanks record breaking cold weather in the Midwest.

Thanks again.
 
Old 01-31-2019, 11:07 AM   #2
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,328
Blog Entries: 3

Rep: Reputation: 3726Reputation: 3726Reputation: 3726Reputation: 3726Reputation: 3726Reputation: 3726Reputation: 3726Reputation: 3726Reputation: 3726Reputation: 3726Reputation: 3726
Can you just delete the lines containing " 、**,*,*,*,*" there?

Perhaps if you provide a sample of what your output should look like then it will be more clear.
 
Old 01-31-2019, 11:17 AM   #3
l0f4r0
Member
 
Registered: Jul 2018
Location: Paris
Distribution: Debian
Posts: 900

Rep: Reputation: 290Reputation: 290Reputation: 290
Quote:
Originally Posted by rbees View Post
if [ awk -F, '$5==""' $line ]
awk doesn't work that way.
You can just do:
Code:
awk -F, '$5==""{ some_actions }' <<<"$line"
NB: I'm pretty sure you don't need custom shell scripting to accomplish your task. However, I really cannot understand what you want here exactly.
Please give us an expected output sample as suggested by Turbocapitalist.

Last edited by l0f4r0; 01-31-2019 at 11:25 AM.
 
Old 01-31-2019, 11:47 AM   #4
TB0ne
LQ Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 26,675

Rep: Reputation: 7970Reputation: 7970Reputation: 7970Reputation: 7970Reputation: 7970Reputation: 7970Reputation: 7970Reputation: 7970Reputation: 7970Reputation: 7970Reputation: 7970
From this thread: https://www.linuxquestions.org/quest...ic-4175647055/

Rbees, I still don't know why you're doing this, at all. You are taking a sqlite DATABASE file, and trying to sort/add things to it as a CSV File. This is akin to disassembling your car, putting the parts in a backpack and walking somewhere, only to reassemble it; a whole ton of effort and time for no payback, when (if you used it as it was intended), you could save yourself both. Gave you a VERY simple method of importing this data into MySQL, where you can sort/order/extract based on a HUGE number of criteria/operations. Which, based on your other thread, is what you want when you do the front-end. Which you STILL won't get with a CSV file, no matter what you do.

You can easily add columns, up to a huge degree...I doubt you'd run out of space, as there are many multi-terabyte databases running just fine, with less-than-one-second response time. Further, a MySQL database can use different character sets, so accommodating the 'special' characters is easy. Hundreds of columns? Not a problem. And instead of using a single 'flat file' architecture, you import the data into ONE table, with the prime numbers in another, and use the primary key/record ID to do associations. None of this is particularly hard, and instead of trying to learn the VERY (for this) error-prone method of awk, you'd have actual useful data, ready to interface. Even via web browser.

Sorry to go back to the database solution, but for your end goal, its what you need. And it's faster to accomplish.
 
Old 01-31-2019, 11:55 AM   #5
l0f4r0
Member
 
Registered: Jul 2018
Location: Paris
Distribution: Debian
Posts: 900

Rep: Reputation: 290Reputation: 290Reputation: 290
Just for the sake of technical challenge (so nothing to do if it's a good idea or not, please refer to what TB0ne said instead), is the following helping?
Code:
awk 'BEGIN{FS=OFS=","} { if(verse==$3){print $5,$6,$7,$8,$9} else{print $1,$2,$3":"; print $5,$6,$7,$8,$9; verse=$3} }' hebrew.csv | sed '/\*,\*,\*,\*/d'
It's really hard for me to help you more because hebrew characters are not supported from my side and you didn't provide us with any output sample...

Last edited by l0f4r0; 01-31-2019 at 11:57 AM.
 
1 members found this post helpful.
Old 01-31-2019, 01:56 PM   #6
rbees
Member
 
Registered: Mar 2004
Location: northern michigan usa
Distribution: Debian Squeeze, Whezzy, Jessie
Posts: 921

Original Poster
Rep: Reputation: 46
Turbocapitalist

No. Chapter 1 verse 1 has three words. The data base has six lines. If I just delete lines 1,3,5 I will end up with wordnr's 2,4,6 not 1,2,3.

This is the same sample data:
Code:
1 Chronicles	1	1	1	אָדָ֥ם	Adam	’ā·ḏām	121	Noun
1 Chronicles	1	1	2	שֵׁ֖ת	Sheth	šêṯ	8352	Noun
1 Chronicles	1	1	3	אֱנֽוֹשׁ׃	Enosh	’ĕ·nō·wōš.	583	Noun
1 Chronicles	1	2	1	קֵינָ֥ן	Kenan	qê·nān	7018	Noun
1 Chronicles	1	2	2	מַהֲלַלְאֵ֖ל	Mahalaleel	ma·hă·lal·’êl	4111	Noun
1 Chronicles	1	2	3	יָֽרֶד׃	Jered	yā·reḏ.	3382	Noun
1 Chronicles	1	3	1	חֲנ֥וֹךְ	Henoch	ḥă·nō·wḵ	2585	Noun
1 Chronicles	1	3	2	מְתוּשֶׁ֖לַח	Methuselah	mə·ṯū·še·laḥ	4968	Noun
1 Chronicles	1	3	3	לָֽמֶךְ׃	Lamech	lā·meḵ.	3929	Noun
1 Chronicles	1	4	1	נֹ֥חַ	Noah	nō·aḥ	5146	Noun
1 Chronicles	1	4	2	שֵׁ֖ם	Shem	šêm	8035	Noun
1 Chronicles	1	4	3	חָ֥ם	Ham	ḥām	2526	Noun
1 Chronicles	1	4	4	וָיָֽפֶת׃	and*Japheth	wā·yā·p̄eṯ.	3315	Noun
1 Chronicles	1	4	5	ס	*-*	s	*	*
The last 5 fields of lines 1, 3, and 5 deleted AND the first 4 fields of lines 4, 5, and 6.

Another data sample:
Code:
Esther	2	8	1	 、**	*	*	*	*
Esther	2	8	2	וַיְהִ֗י	so*it*came*to*pass	way·hî,	1961	Verb
Esther	2	8	3	 、**	*	*	*	*
Esther	2	8	4	בְּהִשָּׁמַ֤ע	was*heard	bə·hiš·šā·ma‘	8085	Verb
Esther	2	8	5	דְּבַר־	command	də·ḇar-	1697	Noun
Esther	2	8	6	הַמֶּ֙לֶךְ֙	of*when*the*king	ham·me·leḵ	4428	Noun
Esther	2	8	7	וְדָת֔וֹ	and*his*decree	wə·ḏā·ṯōw,	1881	Noun
Esther	2	8	8	וּֽבְהִקָּבֵ֞ץ	and*were*gathered*together	ū·ḇə·hiq·qā·ḇêṣ	6908	Verb
Esther	2	8	9	נְעָר֥וֹת	maidens	nə·‘ā·rō·wṯ	5291	Noun
Esther	2	8	10	רַבּ֛וֹת	when*many	rab·bō·wṯ	7227	Adj
Esther	2	8	11	אֶל־	unto	’el-	413	Prep
Esther	2	8	12	שׁוּשַׁ֥ן	Shushan	šū·šan	7800	Noun
Esther	2	8	13	 、**	*	*	*	*
Esther	2	8	14	הַבִּירָ֖ה	the*palace	hab·bî·rāh	1002	Noun
Esther	2	8	15	אֶל־	unto	’el-	413	Prep
Esther	2	8	16	יַ֣ד	the*custody	yaḏ	3027	Noun
Esther	2	8	17	 、**	*	*	*	*
Esther	2	8	18	הֵגָ֑י	of*Hegai	hê·ḡāy;	1896	Noun
Esther	2	8	19	וַתִּלָּקַ֤ח	that*was*brought	wat·til·lā·qaḥ	3947	Verb
Esther	2	8	20	אֶסְתֵּר֙	Esther	’es·têr	635	Noun
Esther	2	8	21	אֶל־	also*unto	’el-	413	Prep
Esther	2	8	22	 、**	*	*	*	*
Esther	2	8	23	בֵּ֣ית	house	bêṯ	1004	Noun
Esther	2	8	24	הַמֶּ֔לֶךְ	of*the*king	ham·me·leḵ,	4428	Noun
Esther	2	8	25	אֶל־	unto	’el-	413	Prep
Esther	2	8	26	יַ֥ד	the*custody	yaḏ	3027	Noun
Esther	2	8	27	 、**	*	*	*	*
Esther	2	8	28	הֵגַ֖י	of*Hegai	hê·ḡay	1896	Noun
Esther	2	8	29	שֹׁמֵ֥ר	keeper	šō·mêr	8104	Verb
Esther	2	8	30	  .**	*	*	*	*
Esther	2	8	31	הַנָּשִֽׁים׃	of*the*women	han·nā·šîm.	802	Noun
Corrected data sample deleting 7 half lines from the right side and 7 half lines from the left side.

Code:
Esther	2	8	1	וַיְהִ֗י	so*it*came*to*pass	way·hî,	1961	Verb
Esther	2	8	2	בְּהִשָּׁמַ֤ע	was*heard	bə·hiš·šā·ma‘	8085	Verb
Esther	2	8	3	דְּבַר־	command	də·ḇar-	1697	Noun
Esther	2	8	4	הַמֶּ֙לֶךְ֙	of*when*the*king	ham·me·leḵ	4428	Noun
Esther	2	8	5	וְדָת֔וֹ	and*his*decree	wə·ḏā·ṯōw,	1881	Noun
Esther	2	8	6	וּֽבְהִקָּבֵ֞ץ	and*were*gathered*together	ū·ḇə·hiq·qā·ḇêṣ	6908	Verb
Esther	2	8	7	נְעָר֥וֹת	maidens	nə·‘ā·rō·wṯ	5291	Noun
Esther	2	8	8	רַבּ֛וֹת	when*many	rab·bō·wṯ	7227	Adj
Esther	2	8	9	אֶל־	unto	’el-	413	Prep
Esther	2	8	10	שׁוּשַׁ֥ן	Shushan	šū·šan	7800	Noun
Esther	2	8	11	הַבִּירָ֖ה	the*palace	hab·bî·rāh	1002	Noun
Esther	2	8	12	אֶל־	unto	’el-	413	Prep
Esther	2	8	13	יַ֣ד	the*custody	yaḏ	3027	Noun
Esther	2	8	14	הֵגָ֑י	of*Hegai	hê·ḡāy;	1896	Noun
Esther	2	8	15	וַתִּלָּקַ֤ח	that*was*brought	wat·til·lā·qaḥ	3947	Verb
Esther	2	8	16	אֶסְתֵּר֙	Esther	’es·têr	635	Noun
Esther	2	8	17	אֶל־	also*unto	’el-	413	Prep
Esther	2	8	18	בֵּ֣ית	house	bêṯ	1004	Noun
Esther	2	8	19	הַמֶּ֔לֶךְ	of*the*king	ham·me·leḵ,	4428	Noun
Esther	2	8	20	אֶל־	unto	’el-	413	Prep
Esther	2	8	21	יַ֥ד	the*custody	yaḏ	3027	Noun
Esther	2	8	22	הֵגַ֖י	of*Hegai	hê·ḡay	1896	Noun
Esther	2	8	23	שֹׁמֵ֥ר	keeper	šō·mêr	8104	Verb
Esther	2	8	24	הַנָּשִֽׁים׃	of*the*women	han·nā·šîm.	802	Noun
This example does not seam to be affected by using grep to extract it verses awk. Note wordnr 20/16.

TBOne, From what I have read about scripting these types changes to a database means that I would have to spend the next 6 months learning a language that I don't have time to learn. I typically get three or four hours after work that I could work on learning it, IF I am able to actually concentrate and apply my old brain to it without falling asleep. So please drop it as not helpful to me. Thank you.

Even now @ early afternoon I am struggling to concentrate, and today was another easy "to cold, no work" day. My way may not be the most efficient and only grade school, but it is how I know.

l0f4r0, Looks like it might be if I can get my head around what it is doing.

awk 'BEGIN{FS=OFS=","} { if(verse==$3){print $5,$6,$7,$8,$9} else{print $1,$2,$3":"; print $5,$6,$7,$8,$9; verse=$3} }' hebrew.csv | sed '/\*,\*,\*,\*/d'

To break it down the way I understand it. And I know I am missing things......
I assume that {FS=OFS=","} is setting the field separator.
if(verse==$3) do this or that. How is $3 filled?

This
cut -d, -f1,2,3,4 --complement
makes more sense to me than the sed above

So on to my own work. If I remember right, been awhile, in bash variable data (
$someData ) is readable by a child process but not changeable unless it is exported, True?

I changed the failing awk line to
Code:
awk -F, '$5==""{cut -d, -f5,6,7,8,9 --complement $line }' <<< $line
but it occurs to me that it will be better to start a counter for each verse rotation to store the number of right side deletions in so that when it gets done with the right side I can just delete that many left sides from the bottom up to align things correctly again.

And the more I think about it the way I have it, it will not process the way it needs to. So some rework is required.

Thanks
 
Old 01-31-2019, 02:17 PM   #7
l0f4r0
Member
 
Registered: Jul 2018
Location: Paris
Distribution: Debian
Posts: 900

Rep: Reputation: 290Reputation: 290Reputation: 290
Quote:
Originally Posted by rbees View Post
I assume that {FS=OFS=","} is setting the field separator.
Yes. FS=Field Separator. OFS=Output Field Separator.

Quote:
Originally Posted by rbees View Post
if(verse==$3) do this or that. How is $3 filled?
It's automatic. $3 is the third field (delimited by FS) from each line read by awk.

Quote:
Originally Posted by rbees View Post
This
Code:
cut -d, -f1,2,3,4 --complement
makes more sense to me than the sed above
It's not equivalent. If I'm not wrong your cut keeps the line but takes only other fields than 1 to 4. My sed deletes the whole lines containing "*,*,*,*".

Quote:
Originally Posted by rbees View Post
If I remember right, been awhile, in bash variable data (
$someData ) is readable by a child process but not changeable unless it is exported, True?
No, a child process can never change a parent variable. Any change inside child will be lost after child exits.
Exporting only allows child processes to be aware of the parent variables.

Last edited by l0f4r0; 01-31-2019 at 02:21 PM.
 
Old 01-31-2019, 02:49 PM   #8
rbees
Member
 
Registered: Mar 2004
Location: northern michigan usa
Distribution: Debian Squeeze, Whezzy, Jessie
Posts: 921

Original Poster
Rep: Reputation: 46
Quote:
It's not equivalent. If I'm not wrong your cut keeps the line but takes only other fields than 1 to 4. My sed deletes the whole lines containing "*,*,*,*".
I do have to take "A whole line worth", but it needs to be half "the part that contains all the blank fields (5-9) of the "current wordnr line"" and half "the first 4 fields of the last line of the "current verse range"". Maintaining the correct word count per verse is the goal.

Whole lines or whole columns are easy. Lines made of parts, not so easy. I have the logic down, it is getting the commands to do said logic is my issue.

Quote:
Exporting only allows child processes to be aware of the parent variables.
More rethinking required.

Thanks
 
Old 01-31-2019, 02:50 PM   #9
TB0ne
LQ Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 26,675

Rep: Reputation: 7970Reputation: 7970Reputation: 7970Reputation: 7970Reputation: 7970Reputation: 7970Reputation: 7970Reputation: 7970Reputation: 7970Reputation: 7970Reputation: 7970
Quote:
Originally Posted by rbees View Post
TBOne, From what I have read about scripting these types changes to a database means that I would have to spend the next 6 months learning a language that I don't have time to learn. I typically get three or four hours after work that I could work on learning it, IF I am able to actually concentrate and apply my old brain to it without falling asleep. So please drop it as not helpful to me. Thank you.
Sorry, no. The ENTIRE COMMAND to add a column to a database is
Code:
alter database <database name> add column <colum name> <type>
(say your database is torah and you're adding a column called book that has text)
alter database torah add column book text
That's it...certainly won't take six months to type that in.
Quote:
Even now @ early afternoon I am struggling to concentrate, and today was another easy "to cold, no work" day. My way may not be the most efficient and only grade school, but it is how I know.
Then you should stop now, because you're missing the point. This:
Quote:
Originally Posted by rbees
And the more I think about it the way I have it, it will not process the way it needs to. So some rework is required.
..indicates you're ALREADY hitting the limitations of what you can do with the tools you're using. And from your other thread, you said:
Quote:
Originally Posted by rbees
I have considered Perl for later in the project as I would like to have a nice user interface to the eventual data output. But I don't have any experience with anything but Bash. Getting this database fixed and expandable is the first step.
...and...
Quote:
Originally Posted by rbees
This csv file is some 300,000 lines long and will eventually grow to a million plus.
If you want a 'nice user interface', you're not going to get far trying to interface with a text file. You specifically mention a database as well. How long does it take you to pull up the 300,000 line file in an editor now? Multiply that by four, which is how long your project/program is going to take to load EACH TIME you run it at a million lines. How long does it take you to search for a term in that 300,000 lines now?? Again: x4, just for a SIMPLE SEARCH. How usable is your program/project going to be in reality, if you continue down the path you're on??? Do you expect your users (even if it's just you), to wait 15-30 minutes for a result, especially if the same result could be had in under a second??

Using a relational database is the *ONLY WAY* you are going to get your project/program working; and your data is ALREADY IN A DATABASE FORMAT. Adding a column to a database is trivial. Loading your existing sqlite database to a more scalable one (MySQL/MariaDB) is also trivial. You could have had ALL YOUR DATA already usable, searchable and working within 30 minutes at most. Use it or not, that's obviously your call, but it's certainly not wrong to suggest that route.

Last edited by TB0ne; 01-31-2019 at 02:52 PM.
 
Old 01-31-2019, 03:12 PM   #10
rbees
Member
 
Registered: Mar 2004
Location: northern michigan usa
Distribution: Debian Squeeze, Whezzy, Jessie
Posts: 921

Original Poster
Rep: Reputation: 46
Quote:
The ENTIRE COMMAND to add a column to a database is
I don't need to add a column right now. I need the data to be clean.

Quote:
If you want a 'nice user interface', you're not going to get far trying to interface with a text file.
I need "CLEAN and ACCURATE data" first. The user interface is pointless without it.

So please explain how I am suppose to
Quote:
I do have to take "A whole line worth", but it needs to be half "the part that contains all the blank fields (5-9) of the "current wordnr line"" and half "the first 4 fields of the last line of the "current verse range"". Maintaining the correct word count per verse is the goal.
Without spending a supbstansial amount of time learning something new. You may know just how to do that. I don't. Logic is still logic. Whether it is done with bash/sed/awk/grep or php/html/mysql.

So please post a mysql that will process said logic.
 
Old 01-31-2019, 03:47 PM   #11
TB0ne
LQ Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 26,675

Rep: Reputation: 7970Reputation: 7970Reputation: 7970Reputation: 7970Reputation: 7970Reputation: 7970Reputation: 7970Reputation: 7970Reputation: 7970Reputation: 7970Reputation: 7970
Quote:
Originally Posted by rbees View Post
I don't need to add a column right now. I need the data to be clean.
...and...
Quote:
I need "CLEAN and ACCURATE data" first. The user interface is pointless without it.
Right; which is what you already have.
Quote:
So please explain how I am suppose to Without spending a supbstansial amount of time learning something new. You may know just how to do that. I don't. Logic is still logic. Whether it is done with bash/sed/awk/grep or php/html/mysql. So please post a mysql that will process said logic.
Did you read your other thread, or are just missing what's being said:
  • Your data is **ALREADY CLEAN**, it is YOU that is getting it in a poor state with the repeated awk/sed/whatever-commands. Your data is ALREADY in a sqlite database file, which is parsed and loadable. There is **ZERO NEED** to do anything else to it.
  • You were given exact commands and a link in your other thread that tells you EXACTLY what to do, and how to do it.
If you honestly think you're going to be able to code this in bash and get data relations done on a million-plus lines with what you want, you are sadly mistaken. This is going to require that you learn how to write real code; be it in python, perl, C, or whatever language that can do such things. Bash was never designed to handle data of this size. Again, your user interface won't be able to interface well (if at ALL) with a text file, not to mention the huge time-lags you'll have trying to do anything with such a big text file.

You said that step one is getting clean data; you already have it, since you downloaded a sqlite database file which was already clean. Your step one is already done, but you don't want to use it. Step two is adding the columns you want; you were ALSO given a command on how to do that. Beyond that is steps 3, 4, etc....and you are going to have to perform those steps. And when you say:
Quote:
Originally Posted by rbees
I do have to take "A whole line worth", but it needs to be half "the part that contains all the blank fields (5-9) of the "current wordnr line"" and half "the first 4 fields of the last line of the "current verse range"". Maintaining the correct word count per verse is the goal.
This is where *YOUR* program comes in. A database gives you lookup values and ways to manipulate the data, nothing more. Your program interfaces with it...so read the records one at a time...look at the field, perform your tests on it, and act accordingly. Whether that's putting ANOTHER value into a different place in the database, telling the user something, adding something, or printing something out, that is what YOU write. And you're going to *HAVE TO* write this code in something other than bash, period.

If you want to learn nothing new and continue with an error-filled text file, that gives you an unusable program when you're done, keep going. Not much more to tell you; you asked folks who are experienced for their advice, but you seem to want to ignore it. Your call.

Last edited by TB0ne; 01-31-2019 at 03:52 PM.
 
Old 01-31-2019, 03:55 PM   #12
l0f4r0
Member
 
Registered: Jul 2018
Location: Paris
Distribution: Debian
Posts: 900

Rep: Reputation: 290Reputation: 290Reputation: 290
Ahhhh, okay I think I get it now.
Is the following what you want?
Code:
awk 'BEGIN{FS=OFS=","} !/\*,\*,\*,\*/{ if(verse==$3){print $1,$2,$3,++increment,$5,$6,$7,$8,$9} else{increment=1;print $1,$2,$3,increment,$5,$6,$7,$8,$9; verse=$3} }' hebrew.csv
 
Old 02-01-2019, 04:19 AM   #13
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,328
Blog Entries: 3

Rep: Reputation: 3726Reputation: 3726Reputation: 3726Reputation: 3726Reputation: 3726Reputation: 3726Reputation: 3726Reputation: 3726Reputation: 3726Reputation: 3726Reputation: 3726
It's a bit silly but if you are familiar with spreadsheets, you could import the file into LibreOffice (or Calligra) and work with it there. Based on the one data sample it looks like you are just sliding 'cells' upward as the ones above them are deleted. It can be done in AWK or Perl, but is a bit of fiddle.
 
Old 02-01-2019, 06:41 AM   #14
rbees
Member
 
Registered: Mar 2004
Location: northern michigan usa
Distribution: Debian Squeeze, Whezzy, Jessie
Posts: 921

Original Poster
Rep: Reputation: 46
l0f4r0; no joy. It removes a lot of entries in other columns randomly.

Turbocapitalist; That is actually how I cleaned up the data I posted. If I was a spreadsheet guru I could write a macro to do it, but sadly I am not and never will be.

Truth be that I will never get beyond grade school bash. I know me.

You posted this
Code:
awk '$2=="2 Samuel"' FS=',' < file > newfile
in another thread. Will that work with a 3 digit number? The reason I ask is that column 3 (chapter) could have up to a 3 digit number.

I don't have much time this morning for testing with a "fresh brain" and will not be able to get back to it for a few days with Shabbat and short work hours that I need to make up.

Thanks
 
Old 02-01-2019, 06:50 AM   #15
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,328
Blog Entries: 3

Rep: Reputation: 3726Reputation: 3726Reputation: 3726Reputation: 3726Reputation: 3726Reputation: 3726Reputation: 3726Reputation: 3726Reputation: 3726Reputation: 3726Reputation: 3726
Thanks.

Quote:
Originally Posted by rbees View Post
You posted this
Code:
awk '$2=="2 Samuel"' FS=',' < file > newfile
in another thread. Will that work with a 3 digit number? The reason I ask is that column 3 (chapter) could have up to a 3 digit number.
That prints out the whole row regardless of the size of any of the numbers in any of the fields, but it does that only if the second column matches what's in the quotes. You could make it more advanced and more complex such that it sorts the different books into separate files. However, since both SQL and AWK (and by extension perl) seem to be out of consideration that may leave only manual intervention using a spreadsheet as the last remaining option.

If I understand the question based on the sample in #6 above you are just moving fields forward one or more rows to fill in earlier empty fields from later filled fields. I'm not sure how that retains the integrity of the data set since the basic unit of data is a record (line) and moving some fields from one record (line) to another breaks that. Sorry to be obtuse but perhaps you could provide a second short data sample with matching output to supplements #6?
 
  


Reply



Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off



Similar Threads
Thread Thread Starter Forum Replies Last Post
How to print lines in csv file if 1 csv column field = "text". There are 10 column (;) in csv file nexuslinux Linux - Newbie 9 04-22-2016 11:35 PM
[SOLVED] How to script csv editing? Remove rows from csv file that do not contain certain text ingram87 Linux - Software 9 08-03-2012 12:45 PM
Map 1 CSV's columns to matching columns in another CSV 2legit2quit Programming 7 10-27-2011 08:53 AM
[SOLVED] Upon start up, I am instructed to create a PRINTER environment variable... BMan8577 Linux - Newbie 3 04-17-2011 12:00 AM
Comparing two csv files and write different record in third CSV file irfanb146 Linux - Newbie 3 06-30-2008 09:15 PM

LinuxQuestions.org > Forums > Non-*NIX Forums > Programming

All times are GMT -5. The time now is 05:47 PM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Open Source Consulting | Domain Registration