Programming
This forum is for all programming questions. The question does not have to be directly related to Linux, and any language is fair game.
In this sample data, every other line needs to have columns 5, 6, 7, 8, and 9 deleted, while keeping the chapter, verse, and wordnr counts in columns 2, 3, and 4 correct by deleting from the end. For instance, verse one contains 6 lines. Lines 1, 3, and 5 need columns 5, 6, 7, 8, and 9 deleted, and lines 2, 4, and 6 need to have columns 1, 2, 3, and 4 deleted, so that I end up with 3 lines instead of 6 for verse 1. Most of the rest of the data will not need so many close edits.
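[Editor's note] If the data really does alternate strictly (one line carrying the book/chapter/verse/wordnr fields, the next carrying the word fields), the merge described above can be sketched in a single awk pass. This is only a sketch under that assumption; the file names are hypothetical:

```shell
# Sketch: merge each pair of consecutive lines into one.
# Odd lines contribute fields 1-4, even lines fields 5-9.
# Assumes a strict odd/even alternation in the input.
awk -F, '
NR % 2 { head = $1 FS $2 FS $3 FS $4; next }
       { print head FS $5 FS $6 FS $7 FS $8 FS $9 }
' hebrew.csv > merged.csv
```

The `next` skips to the following input line after saving fields 1-4, so each printed line combines one odd line with the even line after it.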
Debug output:
Code:
+ '[' awk -F, '$5==""' 5:1 ']'
./dataCleaner.sh: line 45: [: too many arguments
+ for line in $editLines
+ '[' awk -F, '$5==""' Chronicles,1,1,5, ']'
./dataCleaner.sh: line 45: [: too many arguments
+ for line in $editLines
+ '[' awk -F, '$5==""' 、**,*,*,*,* ']'
./dataCleaner.sh: line 45: [: too many arguments
+ for line in $editLines
+ '[' awk -F, '$5==""' 6:1 ']'
./dataCleaner.sh: line 45: [: too many arguments
+ for line in $editLines
+ '[' awk -F, '$5==""' Chronicles,1,1,6,אֱנֽוֹשׁ׃,Enosh,’ĕ·nō·wōš.,583,Noun ']'
./dataCleaner.sh: line 45: [: too many arguments
My bash code:
Code:
#!/bin/bash
# Script to correct hebrew.csv file
# Delete existing column 1 as it is not relevant
# Remove non-hebrew characters in column 5 (old 6) from Hebrew csv text file.
# Update column 4 (old 5) to correct number of words per verse.
# csv format
# id book chapter verse wordnr word concordance translit strongs lemma
# 1 2 Samuel 1 1 1 וַיְהִ֗י Now*it*came*to*pass way·hî, 1961 Verb
set -x
workDir=~/TheKing/tmp
# delete column 1
# while read p; do
# cut -d, -f1 --complement $workDir/$p.csv > $workDir/$p.csv;
# done < books.csv
# Characters in column 5 that trigger delete
# Unicode character Oct Dec Hex HTML
# 、 halfwidth ideographic comma 0177544 65380 0xFF64 、
# . full stop 056 46 0x2E .
# \x{A0} no-break space 0240 160 0xA0
for filename in "$workDir"/*.csv; do
# Get number of chapters
# cat New1Chronicles.csv | cut -f2 -d, | uniq
# where {(column 2=current chapter)
#declare -a ChapterCount
ChapterCount=$(cut -f2 -d, "$filename" | uniq)
echo $ChapterCount
for chapter in $ChapterCount; do
# need specific line range
# (column 3=current verse)
editLines=$(grep -n $chapter $filename)
echo $editLines
for line in $editLines; do
# if column 6 in line# is empty
if [ awk -F, '$5==""' $line ]; then
# (delete columns 5,6,7,8,9 from line) & (delete columns 1,2,3,4 from the last line)
[ cut -d, -f5,6,7,8,9 --complement $line ] & [ cut -d, f1,2,3,4 --complement ${#line[@]} ]
# remove last element in array
unset 'line[-1]'
fi
done
done
done
I have tried (( )), [ ], [[ ]], '', "", and none at all, in every configuration I think will work, with no joy. shellcheck.net has not helped.
Code:
$ shellcheck myscript
Line 45:
if [ awk -F, '$5==""' $line ]; then
^-- SC1009: The mentioned syntax error was in this if expression.
^-- SC1073: Couldn't parse this test expression. Fix to allow more checks.
^-- SC1014: Use 'if cmd; then ..' to check exit code, or 'if [[ $(cmd) == .. ]]' to check output.
^-- SC1035: You are missing a required space here.
^-- SC1072: Expected test to end here (don't wrap commands in []/[[]]). Fix any mentioned problems and try again.
$
Google has not helped, or I am not able to make the connection between what I find and how to make my script work. Probably the latter. Currently two days into this, thanks to the record-breaking cold weather in the Midwest.
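[Editor's note] shellcheck's SC1014 hint points at the core problem on line 45: `[ ]` is the `test` command, and another command such as awk cannot go inside it. Run the command and test its output (or let its exit status drive the `if`) instead. A minimal sketch, assuming `$line` holds one comma-separated record:

```shell
line='Chronicles,1,1,5,'   # sample record; field 5 is empty

# Extract field 5 with awk, then test the *output* with [ -z ]:
if [ -z "$(printf '%s\n' "$line" | awk -F, '{ print $5 }')" ]; then
    echo "field 5 is empty"
fi

# Or let awk's exit status drive the if directly (0 = true):
if printf '%s\n' "$line" | awk -F, '{ exit ($5 == "") ? 0 : 1 }'; then
    echo "field 5 is empty (exit-status form)"
fi
```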
NB: I'm pretty sure you don't need custom shell scripting to accomplish your task. However, I really cannot understand what you want here exactly.
Please give us an expected output sample as suggested by Turbocapitalist.
Rbees, I still don't know why you're doing this, at all. You are taking a sqlite DATABASE file and trying to sort/add things to it as a CSV file. This is akin to disassembling your car, putting the parts in a backpack and walking somewhere, only to reassemble it: a whole ton of effort and time for no payback, when (if you used it as it was intended) you could save yourself both. I gave you a VERY simple method of importing this data into MySQL, where you can sort/order/extract based on a HUGE number of criteria/operations. Which, based on your other thread, is what you want when you do the front-end. Which you STILL won't get with a CSV file, no matter what you do.
You can easily add columns, up to a huge degree...I doubt you'd run out of space, as there are many multi-terabyte databases running just fine, with less-than-one-second response time. Further, a MySQL database can use different character sets, so accommodating the 'special' characters is easy. Hundreds of columns? Not a problem. And instead of using a single 'flat file' architecture, you import the data into ONE table, with the prime numbers in another, and use the primary key/record ID to do associations. None of this is particularly hard, and instead of trying to learn the VERY (for this) error-prone method of awk, you'd have actual useful data, ready to interface. Even via web browser.
Sorry to go back to the database solution, but for your end goal, it's what you need. And it's faster to accomplish.
Just for the sake of the technical challenge (so nothing to do with whether it's a good idea or not; please refer to what TB0ne said instead), does the following help?
This example does not seem to be affected by using grep to extract it versus awk. Note wordnr 20/16.
TBOne, from what I have read, scripting these types of changes to a database means that I would have to spend the next 6 months learning a language that I don't have time to learn. I typically get three or four hours after work that I could spend learning it, IF I am able to actually concentrate and apply my old brain to it without falling asleep. So please drop it as not helpful to me. Thank you.
Even now, in the early afternoon, I am struggling to concentrate, and today was another easy "too cold, no work" day. My way may not be the most efficient and is only grade school, but it is how I know.
l0f4r0, looks like it might work, if I can get my head around what it is doing.
To break it down the way I understand it (and I know I am missing things):
I assume that {FS=OFS=","} is setting the field separator.
if(verse==$3) do this or that. How is $3 filled?
This
cut -d, -f1,2,3,4 --complement
makes more sense to me than the sed above
So on to my own work. If I remember right (it's been a while), in bash, variable data ($someData) is readable by a child process but not changeable unless it is exported. True?
But it occurs to me that it would be better to keep a counter for each verse, storing the number of right-side deletions, so that when the right side is done I can delete that many left sides from the bottom up to align things correctly again.
And the more I think about it the way I have it, it will not process the way it needs to. So some rework is required.
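[Editor's note] The per-verse counter idea above is easy to keep inside awk itself rather than in bash. A sketch, assuming (as elsewhere in the thread) that chapter and verse are fields 2 and 3 and that a "right-side deletion" line is one whose field 5 is empty; the file name is hypothetical:

```shell
# Count the lines with an empty field 5, grouped by chapter:verse.
awk -F, '
$5 == "" { n[$2 ":" $3]++ }
END      { for (k in n) print k, n[k] }
' hebrew.csv
```

The counts printed at END tell you how many bottom-up left-side deletions each verse needs.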
Quote:
Originally Posted by rbees
I assume that {FS=OFS=","} is setting the field separator.
Yes. FS=Field Separator. OFS=Output Field Separator.
Quote:
Originally Posted by rbees
if(verse==$3) do this or that. How is $3 filled?
It's automatic. $3 is the third field (delimited by FS) from each line read by awk.
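[Editor's note] This can be checked with a one-liner: `-F,` sets FS, and `$3` is then whatever sits in the third comma-separated field of each input line:

```shell
printf 'Genesis,1,1\nGenesis,1,2\n' | awk -F, '{ print $3 }'
# prints:
# 1
# 2
```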
Quote:
Originally Posted by rbees
This
Code:
cut -d, -f1,2,3,4 --complement
makes more sense to me than the sed above
It's not equivalent. If I'm not wrong, your cut keeps the line but outputs only the fields other than 1 to 4. My sed deletes the whole lines containing "*,*,*,*".
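[Editor's note] The difference is easy to see side by side. A small sketch (the sed pattern here is illustrative, since the original sed command isn't shown in this excerpt; `--complement` is GNU cut, as used elsewhere in this thread):

```shell
printf 'a,b,c,d,e,f\nx,*,*,*,*\n' > /tmp/sample.csv

# cut keeps every line but drops fields 1-4 from each:
cut -d, -f1,2,3,4 --complement /tmp/sample.csv
# prints:
# e,f
# *

# sed instead deletes whole matching lines and keeps the rest intact:
sed '/\*,\*,\*,\*/d' /tmp/sample.csv
# prints:
# a,b,c,d,e,f
```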
Quote:
Originally Posted by rbees
If I remember right, been awhile, in bash variable data (
$someData ) is readable by a child process but not changeable unless it is exported, True?
No, a child process can never change a parent variable. Any change inside child will be lost after child exits.
Exporting only allows child processes to be aware of the parent's variables.
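[Editor's note] A quick terminal demonstration of both points (the variable name is arbitrary):

```shell
greeting="hello"

# Not exported: a real child process does not see it at all.
sh -c 'echo "child sees: [$greeting]"'     # child sees: []

export greeting

# Exported: the child sees it, but its change dies with the child.
sh -c 'greeting="changed"'
echo "parent still has: $greeting"         # parent still has: hello
```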
Quote:
Originally Posted by l0f4r0
It's not equivalent. If I'm not wrong, your cut keeps the line but outputs only the fields other than 1 to 4. My sed deletes the whole lines containing "*,*,*,*".
I do have to take "A whole line worth", but it needs to be half "the part that contains all the blank fields (5-9) of the "current wordnr line"" and half "the first 4 fields of the last line of the "current verse range"". Maintaining the correct word count per verse is the goal.
Whole lines or whole columns are easy. Lines made of parts, not so easy. I have the logic down; getting the commands to do said logic is my issue.
Quote:
Originally Posted by rbees
TBOne, from what I have read, scripting these types of changes to a database means that I would have to spend the next 6 months learning a language that I don't have time to learn. I typically get three or four hours after work that I could spend learning it, IF I am able to actually concentrate and apply my old brain to it without falling asleep. So please drop it as not helpful to me. Thank you.
Sorry, no. The ENTIRE COMMAND to add a column to a database is
Code:
alter table <table name> add column <column name> <type>
(say your table is torah and you're adding a column called book that holds text)
alter table torah add column book text
That's it...certainly won't take six months to type that in.
Quote:
Even now, in the early afternoon, I am struggling to concentrate, and today was another easy "too cold, no work" day. My way may not be the most efficient and is only grade school, but it is how I know.
Then you should stop now, because you're missing the point. This:
Quote:
Originally Posted by rbees
And the more I think about it the way I have it, it will not process the way it needs to. So some rework is required.
..indicates you're ALREADY hitting the limitations of what you can do with the tools you're using. And from your other thread, you said:
Quote:
Originally Posted by rbees
I have considered Perl for later in the project as I would like to have a nice user interface to the eventual data output. But I don't have any experience with anything but Bash. Getting this database fixed and expandable is the first step.
...and...
Quote:
Originally Posted by rbees
This csv file is some 300,000 lines long and will eventually grow to a million plus.
If you want a 'nice user interface', you're not going to get far trying to interface with a text file. You specifically mention a database as well. How long does it take you to pull up the 300,000 line file in an editor now? Multiply that by four, which is how long your project/program is going to take to load EACH TIME you run it at a million lines. How long does it take you to search for a term in that 300,000 lines now?? Again: x4, just for a SIMPLE SEARCH. How usable is your program/project going to be in reality, if you continue down the path you're on??? Do you expect your users (even if it's just you), to wait 15-30 minutes for a result, especially if the same result could be had in under a second??
Using a relational database is the *ONLY WAY* you are going to get your project/program working; and your data is ALREADY IN A DATABASE FORMAT. Adding a column to a database is trivial. Loading your existing sqlite database to a more scalable one (MySQL/MariaDB) is also trivial. You could have had ALL YOUR DATA already usable, searchable and working within 30 minutes at most. Use it or not, that's obviously your call, but it's certainly not wrong to suggest that route.
Quote:
Originally Posted by TB0ne
The ENTIRE COMMAND to add a column to a database is
I don't need to add a column right now. I need the data to be clean.
Quote:
If you want a 'nice user interface', you're not going to get far trying to interface with a text file.
I need "CLEAN and ACCURATE data" first. The user interface is pointless without it.
So please explain how I am supposed to
Quote:
I do have to take "A whole line worth", but it needs to be half "the part that contains all the blank fields (5-9) of the "current wordnr line"" and half "the first 4 fields of the last line of the "current verse range"". Maintaining the correct word count per verse is the goal.
without spending a substantial amount of time learning something new. You may know just how to do that. I don't. Logic is still logic, whether it is done with bash/sed/awk/grep or php/html/mysql.
So please post a mysql that will process said logic.
Quote:
Originally Posted by rbees
I don't need to add a column right now. I need the data to be clean.
...and...
Quote:
I need "CLEAN and ACCURATE data" first. The user interface is pointless without it.
Right; which is what you already have.
Quote:
So please explain how I am supposed to ... without spending a substantial amount of time learning something new. You may know just how to do that. I don't. Logic is still logic, whether it is done with bash/sed/awk/grep or php/html/mysql. So please post a mysql that will process said logic.
Did you read your other thread, or are you just missing what's being said?
Your data is **ALREADY CLEAN**; it is YOU who is getting it into a poor state with the repeated awk/sed/whatever commands. Your data is ALREADY in a sqlite database file, which is parsed and loadable. There is **ZERO NEED** to do anything else to it.
You were given exact commands and a link in your other thread that tells you EXACTLY what to do, and how to do it.
If you honestly think you're going to be able to code this in bash and get data relations done on a million-plus lines with what you want, you are sadly mistaken. This is going to require that you learn how to write real code; be it in python, perl, C, or whatever language that can do such things. Bash was never designed to handle data of this size. Again, your user interface won't be able to interface well (if at ALL) with a text file, not to mention the huge time-lags you'll have trying to do anything with such a big text file.
You said that step one is getting clean data; you already have it, since you downloaded a sqlite database file which was already clean. Your step one is already done, but you don't want to use it. Step two is adding the columns you want; you were ALSO given a command on how to do that. Beyond that is steps 3, 4, etc....and you are going to have to perform those steps. And when you say:
Quote:
Originally Posted by rbees
I do have to take "A whole line worth", but it needs to be half "the part that contains all the blank fields (5-9) of the "current wordnr line"" and half "the first 4 fields of the last line of the "current verse range"". Maintaining the correct word count per verse is the goal.
This is where *YOUR* program comes in. A database gives you lookup values and ways to manipulate the data, nothing more. Your program interfaces with it...so read the records one at a time...look at the field, perform your tests on it, and act accordingly. Whether that's putting ANOTHER value into a different place in the database, telling the user something, adding something, or printing something out, that is what YOU write. And you're going to *HAVE TO* write this code in something other than bash, period.
If you want to learn nothing new and continue with an error-filled text file, that gives you an unusable program when you're done, keep going. Not much more to tell you; you asked folks who are experienced for their advice, but you seem to want to ignore it. Your call.
It's a bit silly but if you are familiar with spreadsheets, you could import the file into LibreOffice (or Calligra) and work with it there. Based on the one data sample it looks like you are just sliding 'cells' upward as the ones above them are deleted. It can be done in AWK or Perl, but is a bit of fiddle.
l0f4r0; no joy. It removes a lot of entries in other columns randomly.
Turbocapitalist; That is actually how I cleaned up the data I posted. If I was a spreadsheet guru I could write a macro to do it, but sadly I am not and never will be.
Truth be told, I will never get beyond grade-school bash. I know me.
You posted this
Code:
awk '$2=="2 Samuel"' FS=',' < file > newfile
in another thread. Will that work with a 3 digit number? The reason I ask is that column 3 (chapter) could have up to a 3 digit number.
I don't have much time this morning for testing with a "fresh brain" and will not be able to get back to it for a few days with Shabbat and short work hours that I need to make up.
Quote:
Originally Posted by rbees
in another thread. Will that work with a 3 digit number? The reason I ask is that column 3 (chapter) could have up to a 3 digit number.
That prints out the whole row regardless of the size of any of the numbers in any of the fields, but it does that only if the second column matches what's in the quotes. You could make it more advanced and more complex such that it sorts the different books into separate files. However, since both SQL and AWK (and by extension perl) seem to be out of consideration that may leave only manual intervention using a spreadsheet as the last remaining option.
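[Editor's note] The "more advanced" variant mentioned above is only a little longer. A sketch that splits a CSV into one file per book, named after field 2 (assumes field 2 never contains `/` or other characters unsafe in file names; `file` is a placeholder):

```shell
# Append each row to <book>.csv, where the book name is field 2.
# awk opens each output file once per run; the first print
# truncates/creates it, later prints to the same name append.
awk -F, '{ print > ($2 ".csv") }' file
```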
If I understand the question, then based on the sample in #6 above you are just moving fields forward one or more rows, filling earlier empty fields from later filled fields. I'm not sure how that retains the integrity of the data set, since the basic unit of data is a record (line), and moving some fields from one record (line) to another breaks that. Sorry to be obtuse, but perhaps you could provide a second short data sample with matching output to supplement #6?