l0f4r0; no joy. It removes a lot of entries in other columns randomly.
Nothing in IT is random. A computer is not human: it doesn't make subjective choices or mistakes, it isn't lazy, and so on...
If something doesn't work as expected then, bugs aside, it's because it was badly designed or because the specifications were not very clear.
I did my best, but I can't understand what you want. If you'd like more help from me, please post another exact input sample and the matching output.
Quote:
Originally Posted by rbees
You posted this
Code:
awk '$2=="2 Samuel"' FS=',' < file > newfile
in another thread. Will that work with a 3 digit number? The reason I ask is that column 3 (chapter) could have up to a 3 digit number.
I'm answering for Turbocapitalist while I'm at it: yes, it's gonna work, because this instruction doesn't deal with the 3rd column at all.
It just says: for each line in file, if column #2 is equal to "2 Samuel", then print the whole line to newfile.
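To make that concrete, here is a small sketch on made-up data. The four sample rows and the id,book,chapter,verse layout are assumptions for illustration only, not the OP's real file. It uses -F',' instead of the trailing FS=',' assignment, which should behave the same here, and shows that a 3-digit chapter makes no difference because only column 2 is tested.

```shell
# Hypothetical sample rows (id,book,chapter,verse); not the OP's real data
workdir=$(mktemp -d)
cat > "$workdir/file" <<'EOF'
1,1 Samuel,31,13
2,2 Samuel,1,1
3,2 Samuel,24,25
4,Psalms,119,176
EOF

# Print every line whose 2nd comma-separated field is exactly "2 Samuel"
awk -F',' '$2 == "2 Samuel"' "$workdir/file" > "$workdir/newfile"
cat "$workdir/newfile"
# 2,2 Samuel,1,1
# 3,2 Samuel,24,25

# Same test with the book name passed in as a variable; the 3-digit
# chapter in column 3 is never looked at
awk -F',' -v book="Psalms" '$2 == book' "$workdir/file" > "$workdir/psalms"
cat "$workdir/psalms"
# 4,Psalms,119,176
```

The comparison is an exact string match on the whole field, so "1 Samuel" is not picked up by the "2 Samuel" filter.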
It's a bit silly, but if you are familiar with spreadsheets, you could import the file into LibreOffice (or Calligra) and work with it there. Based on the one data sample, it looks like you are just sliding 'cells' upward as the ones above them are deleted. It can be done in AWK or Perl, but it is a bit of a fiddle.
Yep; which is why it makes more sense to manipulate everything in a database.
Out of curiosity I imported and converted the sqlite file to MySQL/MariaDB; took approximately 2 minutes, and all the data was 100% correct, readable, and in correct cells in the database table.
Here is a simple Perl program to access the database, with basic logic to do things. It assumes a default MySQL/MariaDB install, with localhost root access and no password.
Code:
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Set database variables here, and connect to the database.
my $dsn  = "DBI:mysql:Torah;host=127.0.0.1";
my $user = "root";
my $pass = "";
my $dbh  = DBI->connect($dsn, $user, $pass, { RaiseError => 1 })
    or die "Could not connect to database! $DBI::errstr";

# Select things based on criteria and get them into variables.
# The table is 'content' inside the 'Torah' database (see the CREATE TABLE below).
my $torahrecord = $dbh->prepare("select id,book,chapter,verse from content where book='Genesis'")
    or die "Couldn't prepare statement: " . $dbh->errstr;
$torahrecord->execute()    # Execute the query
    or die "Couldn't execute statement: " . $torahrecord->errstr;

my @params = ();    # Bind values for any ? placeholders; none are used below

while (my @TSEL = $torahrecord->fetchrow_array()) {
    my $TORAHID   = $TSEL[0];    # Record ID
    my $TORAHBOOK = $TSEL[1];    # Book
    my $TORAHCHAP = $TSEL[2];    # Chapter
    my $TORAHVER  = $TSEL[3];    # Verse
    # <define whatever else you want to look at here from available data>

    # Unrelated; shown for syntax. One-liner to select something from DB directly into a variable, no array loop needed
    my $SOMEVAR = $dbh->selectrow_array("SELECT book from content where id='<something>'", undef, @params);

    # Logic; check the variables; do things to the new columns/existing. Put more in for more checks/sets
    if ($TORAHBOOK eq "Genesis") {
        # The id is passed as a bind value instead of being interpolated into the SQL
        $dbh->do("update content set newcolumn='<whatever>' where id=?", undef, $TORAHID);
    }
}
That's it. That's the 'six months' of learning you need to manipulate the data with Perl and MySQL, for all 405,000+ rows of data. This will loop through ALL of the records, and modify them according to whatever you write. Simple if statements.
Running the two commands I gave you in your other thread converts the database, but you will have to make a small edit to the input 'file.sql' file before running the import. Right before all of the lines starting with INSERT, there's a header. Remove the whole header, and replace it with
Code:
CREATE DATABASE Torah;
use Torah;
CREATE TABLE content (
id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
book TEXT NOT NULL,
chapter INT NOT NULL,
`verse` INT NOT NULL,
`wordnr` INT NOT NULL,
`word` TEXT NOT NULL,
`concordance` TEXT NOT NULL,
`translit` TEXT NOT NULL,
`strongs` TEXT NOT NULL,
`lemma` TEXT NOT NULL
);
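If you'd rather not do that header swap by hand in an editor, it can be sketched in shell roughly like this. The tiny file.sql built here is a made-up stand-in (the real dump's header will look different), and header.sql, body.sql and file.fixed.sql are names chosen just for this example. The only thing it relies on is what the post says: the header is everything before the first line starting with INSERT.

```shell
# Made-up stand-in for the converted dump; the real file.sql is far larger
workdir=$(mktemp -d)
cat > "$workdir/file.sql" <<'EOF'
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE content (id INTEGER PRIMARY KEY, book TEXT);
INSERT INTO content VALUES(1,'Genesis',1,1,1,'w','c','t','s','l');
INSERT INTO content VALUES(2,'Genesis',1,1,2,'w','c','t','s','l');
EOF

# Replacement header, copied from the post above
cat > "$workdir/header.sql" <<'EOF'
CREATE DATABASE Torah;
use Torah;
CREATE TABLE content (
id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
book TEXT NOT NULL,
chapter INT NOT NULL,
`verse` INT NOT NULL,
`wordnr` INT NOT NULL,
`word` TEXT NOT NULL,
`concordance` TEXT NOT NULL,
`translit` TEXT NOT NULL,
`strongs` TEXT NOT NULL,
`lemma` TEXT NOT NULL
);
EOF

# Drop everything before the first INSERT line, keep the rest untouched
awk '/^INSERT/ { found = 1 } found' "$workdir/file.sql" > "$workdir/body.sql"

# New header first, then the INSERT lines
cat "$workdir/header.sql" "$workdir/body.sql" > "$workdir/file.fixed.sql"
head -n 2 "$workdir/file.fixed.sql"
# CREATE DATABASE Torah;
# use Torah;
```

Anything that follows the INSERT lines in the real dump is kept as-is, so check the tail of the result before importing it in place of file.sql.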
Then run "mysql < file.sql", as root. A couple of minutes later, your entire MySQL database is built, ready for use. Add columns as you were told before with a simple 'alter table' statement.
And since you ALSO don't want to learn anything about LibreOffice, and you claim this data is going to grow to more than 1 million rows...you are aware that LibreOffice Calc has a row limit of 1,048,576, right? That's just over 1 million rows.
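For the 'alter table' step, a minimal sketch could look like the following. The column name newcolumn is taken from the Perl example; its TEXT type and the add_column.sql file name are assumptions for illustration. The block only writes the statement to a file, so nothing touches a database until you feed it to mysql yourself.

```shell
# Write the statement to a file so it can be reviewed before it is applied
workdir=$(mktemp -d)
cat > "$workdir/add_column.sql" <<'EOF'
USE Torah;
ALTER TABLE content ADD COLUMN newcolumn TEXT;
EOF

cat "$workdir/add_column.sql"

# To apply it against the real database (not run here):
#   mysql < "$workdir/add_column.sql"
```

The new column starts out NULL in every existing row, and is then filled in by whatever update logic you write.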
You have spent many, MANY hours of wasted effort trying to get clean data, when you already have it, and you haven't even begun to sort/append/modify it. Good luck.
Looks good + nice of you to have provided this thorough solution!
But are you really speaking to me? Ha ha
Well, only peripherally, but enjoy.
And not a thorough solution, just something that took about 15 minutes. None of the Perl is anything that couldn't have been looked up easily, with ABUNDANT examples. The sqlite to MySQL script was downloaded and didn't work cleanly with the new version of MariaDB/MySQL, but that, too, was a trivial lookup/fix.
Still not sure why the OP wants to take data FROM a database and manipulate it AS TEXT, which will most certainly fail miserably, no matter how cleverly the awk/sed/bash script is written. The data is already ordered and clean, and the resulting database even includes the correct Hebrew characters, with no problems at all. It's trivial to compare variables with simple 'if's, as shown, and act accordingly.
Was hesitant to post it, since the OP is absolutely ADAMANT about not wanting to learn anything, and puts a 'six month' time estimate on doing so, before any actual research was done. And I'm sure this will also be 'not helpful', because it isn't bash.
No worries, I clicked on "Did you find this post helpful?" for you at least
Joking aside, let's wait and see what the OP says...
From my side, it's more out of curiosity than anything else that I am providing bash answers to the OP. Just to train myself ^^
But I really would like to understand what kind of output the OP wants. I've provided 2 so far and both were apparently off beam. It's still a mystery to me...
Indeed.
Yes, I'm having a hard time figuring out exactly what's needed as well, aside from the need to 'clean' the data. Past that, I think it's sorting the data based on criteria, to add columns based on the cell values. Which takes us full-circle back to "that's what databases are for", and a not-very-complex program to do these adds/inserts.
That, again, accomplishes the OP's steps 1 and 2, and leaves them a working database to interface with for their step 3, which is DOING something with the data. It doesn't seem to sink in, either, that going through 1 million lines of text with a bash script (which is what they'll HAVE TO write their end-code in, since learning something new is apparently off the table) is going to take forever. Their end-application is going to be unusable, rendering the entire exercise pointless.