Linux - Server
This forum is for the discussion of Linux software used in a server-related context.
See this thread. It's not what you're looking for, but may give you some ideas. This transformation is usually called pivoting or reshaping data long-to-wide. Searching for these terms will give you plenty of results.
Unfortunately, it's a restricted environment and I cannot install any additional command-line tools like miller. But thanks a lot for the thread link. I did find a few useful pointers there and will be trying them out.
Unfortunately, it's a restricted environment and I cannot install any additional command-line tools like miller.
In which case let's hope you have gawk - I have found the extensions built in by GNU make a lesser awk such a pain to use that I no longer even try to be POSIX compliant.
Last edited by syg00; 11-25-2021 at 03:08 AM.
Reason: cleanup
Well, it largely depends on the structure of your input data. As long as it is as simple as in your example above, you can easily process it with awk. But CSV is a tricky format: if your data may contain embedded commas and/or double quotes, using Perl or Python with a CSV-parsing module would be preferable. gawk may offer a remedy in the form of FPAT/patsplit, but it won't account for all possible CSV quirks.
Another option is spreadsheet software. Most programs support pivot tables and will import from / export to CSV.
The input file is actually generated by another script. So I do have some control over the structure of the data.
Anyway, I have made some progress by combining a few awk examples I came across.
Just for fun. (Called with awk -F\, -f in.awk -- in.txt).
Code:
BEGIN {
    # For array subscripts
    x = "A"
    y = "B"
    z = "C"
}
{
    # Read the file
    # Collect unique entries
    a[$1]++
    b[$2]++
    # Collect data in an array
    d[NR, x] = $1
    d[NR, x, y] = $2
    d[NR, x, y, z] = $3
}
END {
    # Print column headers
    ORS = ""
    print "App"
    for (j in b) {
        print "," j
    }
    print "\n"
    # Print data in transformed format
    for (i in a) {
        print i
        for (j in b) {
            for (n = 1; n <= NR; n++) {
                if (d[n, x] == i && d[n, x, y] == j) {
                    print "," d[n, x, y, z]
                    # Save running totals
                    t[j] += d[n, x, y, z]
                }
            }
        }
        print "\n"
    }
    # Print totals
    print "Sum"
    for (j in t) {
        print "," t[j]
    }
    print "\n"
    # Print 100 - totals
    print "Bal"
    for (j in t) {
        print "," 100 - t[j]
    }
    print "\n"
}
The order in which elements of the array are accessed by this statement is determined by the internal arrangement of the array elements within awk and in standard awk cannot be controlled or changed.
Here are the results of running your script (with the change above applied):
So I do have some control over the structure of the data.
It would make sense to generate input data sorted in lexicographical order then. I.e. cpu,disk,mem rather than cpu,mem,disk. This way the gawk script could be greatly simplified:
Code:
#!/usr/bin/gawk -f
# Assuming the input data are already sorted (sort -t,)
BEGIN { FS = ","; PROCINFO["sorted_in"] = "@ind_str_asc" }
{
    stats[$1] = stats[$1] FS $3
    totals[$2] += $3
}
END {
    print "App,cpu,disk,mem"
    for (app in stats) print app stats[app]
    printf "Bal"
    for (t in totals) printf ",%s", 100 - totals[t]
    print ""
}
Speaking of fun: Python, of course, has the standard csv module, but reshaping table data with pandas or agate is definitely more fun. Here it is with agate:
Code:
#!/usr/bin/python3
import agate

c_names = ('App', 'stat', 'val')
table = agate.Table.from_csv('in.csv',
                             column_names=c_names,
                             header=None)
result = table.denormalize(*c_names)
totals = result.aggregate([
    (1, agate.Sum('cpu')),
    (2, agate.Sum('mem')),
    (3, agate.Sum('disk'))
]).values()
result.print_csv()
print('Bal', *[100 - x for x in totals], sep=',')