Programming: This forum is for all programming questions. The question does not have to be directly related to Linux and any language is fair game.
Please use [CODE][/CODE] tags around your code and example files to make them more readable.
You can make your awk snippet a bit more readable:
Code:
awk -F ',' -v pattern='field2' '#
(NR == 1) {
    for (i = 1; i <= NF; i++)
        if ($i ~ pattern)
            field[++fields] = i
    next
}
{
    if (fields > 0) {
        printf("%s", $(field[1]))
        for (i = 2; i <= fields; i++)
            printf(" %s", $(field[i]))
        printf("\n")
    }
}' input-file(s).. > output-file
As you can see, you need to save the regular expression in a variable first (use the -v var=value option); then you can use the regular expression match operator ~ with it.
Trying to insert the exact string into the awk program from the command line is quite hazardous: consider what would happen if the pattern specified was ./) system("rm -rf") # or something along the same lines.
Using a variable like in the example above avoids the risks, because then awk treats the string as a regular expression, whatever it might contain. You cannot "escape" from the regular expression by crafting a nasty pattern.
Other than that, I use the next statement to skip the header line once the field names have been read, and use one-based indexing for the array, since awk arrays are normally indexed that way. Less risk of confusion.
If you want to use the second positional shell parameter as the pattern, just use -v pattern="$2" instead.
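To make the safety point concrete, here is a small self-contained demonstration (the sample data and the pattern 'field[0-9]+' are made up for illustration): because the pattern arrives via -v, awk only ever sees it as a regular expression to match against, never as program text.

```shell
# Hypothetical demo: the pattern comes in through -v, so awk treats it
# purely as an ERE, never as part of the program.
pattern='field[0-9]+'
printf 'field1,field2,other\n1,2,3\n' | awk -F ',' -v pattern="$pattern" '
(NR == 1) {
    for (i = 1; i <= NF; i++)
        if ($i ~ pattern)
            field[++fields] = i
    next
}
{
    for (i = 1; i <= fields; i++)
        printf "%s%s", $(field[i]), (i < fields ? " " : "\n")
}'
# prints: 1 2
```

Only the header columns field1 and field2 match the pattern, so only the first two data columns are printed.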
Last edited by Nominal Animal; 05-18-2012 at 09:57 PM.
Many thanks! Yes, I forgot to add -v pattern="$2" to my script.
But let me explain the whole idea: I'm trying to make my script work like MySQL.
I have a file like this:
user_table.txt
1st_name,last_name,birth,gender,field
john,ntcho,1990,m,MBA
louna,hail,1980,f,ITIL
XXX,XXX,XXX,X,XXX
so my script should handle a query like this:
select 1st_name from user_table where field eq mba
So I started with awk to select the fields under 1st_name. Did I start the right way? I'm a little confused about how to begin.
You're pretty much on track. You can always add the interface afterwards, as long as you have the functionality first.
For now, consider only operations of the form SELECT output FROM file(s) WHERE input op value.
You can do that operation using awk: awk -v output="output" -v input="input" -v op="op" -v value="value" '...' file(s)
To prove that you are on track, consider the following. I like to set the record separator to any newline convention, and both field and record separators to consume extra whitespace.
Code:
awk -v output='1st_name' -v input='field' -v op='eq' -v value='mba' '#
BEGIN {
    # Accept any newline convention; remove leading and trailing whitespace on lines.
    RS = "[\t\v\f ]*(\r\n|\n\r|\r|\n)[\t\v\f ]*"
    # Comma-separated values. Remove leading and trailing whitespace.
    FS = "[\t\v\f ]*,[\t\v\f ]*"
    # Output lines use the Linux newline convention (\n only).
    ORS = "\n"
    # Output fields are separated by a comma followed by a space.
    OFS = ", "
    # To compare case-insensitively, we convert to uppercase.
    outputup = toupper(output)
    inputup = toupper(input)
    opup = toupper(op)
    valueup = toupper(value)
    # Convert output to array 'outname', with 'outputs' entries.
    if (output == "" || output == "*") {
        # All!
        outputs = 0
        split("", outname)
        split("", outnameup)
    } else {
        outputs = split(output, outname)
        outputs = split(outputup, outnameup)
    }
}
# First line of each file contains the field names.
(FNR == 1) {
    # New file. Clear the field name arrays.
    split("", fieldup)
    split("", field)
    # Save the field names (and their uppercase forms) with their column numbers:
    for (i = 1; i <= NF; i++) {
        field[$i] = i
        fieldup[toupper($i)] = i
    }
    # Which field (number) is the input field?
    if (inputup in fieldup)
        inputnum = fieldup[inputup]
    else
        inputnum = (NF + 1)
    # Which fields to output?
    for (i = 1; i <= outputs; i++)
        if (outnameup[i] in fieldup)
            outputnum[i] = fieldup[outnameup[i]]
        else
            outputnum[i] = (NF + 1)
    # Note: If a field is not found, we use the
    # field after the last one, $(NF + 1).
    # It should be empty.
    # This line has no data, so skip the other rule.
    next
}
( (opup == "EQ" && toupper($(inputnum)) == valueup) ||
  (opup == "LT" && toupper($(inputnum)) <  valueup) ||
  (opup == "LE" && toupper($(inputnum)) <= valueup) ||
  (opup == "GT" && toupper($(inputnum)) >  valueup) ||
  (opup == "GE" && toupper($(inputnum)) >= valueup) ||
  (opup == "NE" && toupper($(inputnum)) != valueup) ||
  (opup == "CONTAINS"   && index(toupper($(inputnum)), valueup) > 0) ||
  (opup == "NOTCONTAIN" && index(toupper($(inputnum)), valueup) < 1) ||
  (opup == "MATCH"   && $(inputnum) ~  value) ||
  (opup == "NOMATCH" && $(inputnum) !~ value) ) {
    # Output this record.
    if (outputs > 0) {
        # Only the selected fields.
        printf("%s", $(outputnum[1]))
        for (i = 2; i <= outputs; i++)
            printf("%s%s", OFS, $(outputnum[i]))
        printf("%s", ORS)
    } else {
        # All fields.
        printf("%s", $1)
        for (i = 2; i <= NF; i++)
            printf("%s%s", OFS, $i)
        printf("%s", ORS)
    }
}
' user_table.txt
Given the user_table.txt content, it outputs john.
If you want to be able to parse a subset of SQL queries given as text, just write a frontend in e.g. Bash, and only run the awk script once you have parsed out the output, input, op, value, and file(s) fields. (It would be more efficient to use a separate awk script for each comparison operator, but considering this will be rather slow anyway, I think the one-script approach is good enough.)
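A minimal (hypothetical) sketch of such a frontend: it only splits a fixed-shape query on whitespace, with no quoting or error checking, just to show how the pieces would reach the awk script's -v variables.

```shell
#!/bin/bash
# Hypothetical frontend sketch: split a query of the exact shape
#   select OUTPUT from FILE where INPUT OP VALUE
# into the variables the awk script expects. No quoting, no validation.
query='select 1st_name from user_table.txt where field eq mba'
read -r _ output _ file _ input op value <<< "$query"
echo "output=$output file=$file input=$input op=$op value=$value"
# ...and then hand the pieces to awk:
# awk -v output="$output" -v input="$input" -v op="$op" -v value="$value" '...' "$file"
```

Anything beyond this fixed shape (quoted values, AND/OR clauses) needs a real parser.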
Note that the above implementation is a very crude example. It does not support quoted values, embedded commas or newlines: it is definitely not suited for all possible data. There are ways to fix those, but the best solution depends on whether the input file has to be CSV, or if you can use your own format (or at least escape sequences).
Oooh, that's great, it's exactly what I'm looking for, many thanks!
You discuss almost all the cases I could face. I will try to fit this script together with my own, and my next step is to remove the selected row from the table, like this:
remove from "name_of_table" where "field" eq "value"
so I have to use "sed" with "awk", right?
I'm sorry Grail, I don't understand how to use my data in post #3!
And another question, please: how do I group all the results of a "for loop" and get the output in one argument outside the loop, not just the last argument?
For example:
Code:
for (( i=0; i<=$(( $col_num - 1 )); i++ )); do
    echo -n "${col_num[$i]}"
done
echo -n "${col_num[@]}" # it will just print the last argument
remove from "name_of_table" where "field" eq "value"
To manipulate the file itself, you need to make some changes. I recommend writing a different awk snippet for each way of manipulating the database. (I'd say one awk snippet for reading/scanning the database, one for removing data from the database, and one for adding data to the database.)
For deletion, the approach is: output all records that are NOT to be deleted, exactly as they appear in the input file. (Just ignore all records that are to be deleted.)
In the shell script which runs your awk commands -- the "interface" part --, first create a safe, automatically removed temporary directory to put your temporary files into:
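The code block itself appears to be missing from the post; based on the description that follows (a trap-cleaned temporary directory, with noclobber disabled on the last line), it likely looked something like this (the WORK variable name is a guess):

```shell
# Create a private temporary directory, removed automatically when the script exits.
WORK="$(mktemp -d)" || exit 1
trap "rm -rf '$WORK'" EXIT
# Allow redirections to overwrite existing files (disable noclobber).
set +o noclobber
```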
The last line above disables the noclobber option. If it were enabled, Bash would abort if you tried to redirect output to an existing file. Redirecting to an existing file is useful, because that way you can replace only the file contents, while keeping the file metadata (like owner, group, access mode, extended attributes, security context if using SELinux, and so on) intact.
Instead of trying to edit the database file directly, create an awk snippet that outputs the input records fully and exactly (using printf("%s\n", $0)) if they are not to be removed. This also means you must invert the matching logic, since it needs to match only the lines that are kept. None of the output-selection functionality is needed, so this awk snippet should be quite a bit shorter, too.
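As a rough sketch of that inversion (handling only the "eq" operator, and assuming the same user_table.txt layout as above), the deletion snippet could look like:

```shell
# Hypothetical deletion sketch: keep the header line and every record whose
# 'field' column does NOT equal the value (compared case-insensitively).
awk -F ',' -v input='field' -v value='mba' '
BEGIN {
    inputup = toupper(input)
    valueup = toupper(value)
}
(FNR == 1) {
    # Locate the input column in the header, and keep the header itself.
    for (i = 1; i <= NF; i++)
        if (toupper($i) == inputup)
            inputnum = i
    printf("%s\n", $0)
    next
}
# Inverted match: output only the records that are NOT to be deleted.
(toupper($(inputnum)) != valueup) { printf("%s\n", $0) }
' user_table.txt
```

With the sample data, this drops john's row (field MBA) and passes the header and every other record through unchanged.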
Use the database file as the input, but redirect the output to a temporary file. Only if successful (no errors), replace the input file with the temporary file. In Bash:
Code:
if cp --preserve=all -f "$DATABASEFILE" "$WORK/tempbase" && \
   awk ... '#
       ...
   ' "$DATABASEFILE" > "$WORK/tempbase" && \
   mv -f "$WORK/tempbase" "$DATABASEFILE" ; then
    : # Successfully edited "$DATABASEFILE"
else
    : # Failed to edit "$DATABASEFILE"!
fi
The purpose of the cp --preserve=all -f command is NOT to copy the data (it does, but whatever data it copies will be overwritten by the awk command anyway), but to copy the metadata. This way the temporary file has the exact same access restrictions, security context, and so on as the original file. This is why you need to disable the noclobber option I mentioned earlier: the awk output will be redirected to an existing file.
The -f flag tells cp to overwrite the tempbase file, if it happens to already exist. This way you don't need to remove the temporary database files from the working directory. You can just rely on the trap to remove all of them at once, when the script finally exits.
The && tells Bash to run the right side only if the left side succeeded; you can read it as "and if that succeeded, then". Thus, the awk bit is only run if cp succeeds, and mv only if awk succeeds. (If awk is not run, it cannot succeed. So, mv is only run if cp first succeeds, and then awk also succeeds.)
The above scheme may look overly complicated, but it turns out to be extremely robust in practice. Even though it replaces the database files, it keeps the metadata, even extended attributes, intact. It only replaces the database file if the editing was successful; read or write errors (like running out of disk space) will not corrupt your database file.