How to find a specific data block in a huge file and then do algebra on them?
Hello folks,
First I need to locate a specific data block in a huge file (hundreds of thousands of lines). The block starts with "Frequency" in the first row and ends with "End" in the last row. I need the data between these two rows, one value per row.
Then I need to compute the product of all the values in that block. How can I write a script to do this?
1) Where are the words "Frequency" and "End" located in the lines?
2) Is the case of the words significant?
3) Do those words occur in other places in the data stream?
4) What format is used for the data values of which you wish to compute the product? (Shell script arithmetic is, basically, integer only.)
5) Do any "frequency" numbers occur on the same line(s) as the key words?
For my own curiosity, why do you need the product of the frequencies? If the "frequencies" were, for example, event probabilities and those events were independent, then you'd be computing the probability of all those events occurring at the same time. (Although the assumption of independence is seldom justified.) If, instead, you're looking at radiation (sound, light, power) frequencies, I can't think of anything to which the product would relate.
Hi,
I need the data below the line "Phonon frequencies:" and above the line "end". "Phonon frequencies:" occurs only once. "end" occurs several times in the file, but the "end" that closes this block also marks the end of the file.
I need to compute the product of these phonon frequencies, which are real numbers.
The file looks like below:
...
Phonon frequencies:
+8168468677723.75879
+8173254220737.11621
...
+22835577550655.71484
end
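The layout described above can be extracted with a short awk filter. This is only a sketch: `extract_block` is a hypothetical helper name, and it assumes the two marker words start their lines (possibly after leading blanks), which matches the sample but was not confirmed in the thread.

```shell
# Hypothetical helper: print the lines between the "Phonon frequencies:"
# line and the next line starting with "end" (both markers excluded).
# Assumption: each marker begins its line, optionally after blanks.
extract_block() {
  awk '
    /^[ \t]*Phonon frequencies:/ { inblock = 1; next }  # block starts after this line
    inblock && /^[ \t]*end/      { exit }               # stop at the first "end" inside the block
    inblock                      { print }              # one value per line
  ' "$1"
}
```

Called as `extract_block bigfile.txt > freqs.txt`, where both file names are placeholders. Any "end" lines that appear before the header are ignored, because nothing is printed or matched as a terminator until the header has been seen.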
mcgao07:
If the huge file is a simple text file, you can do it yourself. Read some tutorials on bash, cat, and piping commands. Search Google for "Bash Scripting". The answer is well within your reach.
If you have made a start and need to refine your script, please post it here so that everyone can help. Above all, post your entire objective, with the parameters and methods you want applied in the formula, not just the product.
The product of numbers of that magnitude will require a very large number of digits. Here's a shell script that does it, but there is no overflow check made in bash arithmetic, and all numbers must be integers, so the answer you'll get is meaningless.
Code:
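#!/bin/bash
# Grab everything from the "Phonon frequencies" line through the next "end" line.
data=$(sed -n '/Phonon frequencies/,/end/p' "$1")
p=1
for d in $data; do
  # Skip any word containing characters other than '+', digits or '.' (the marker lines).
  echo "$d" | grep -q '[^+0-9.]' && continue
  # Strip '+' and '.' so bash sees an integer (this also rescales the value).
  v=$(echo "$d" | sed 's/[+.]//g')
  p=$((p * v))
done
echo "Product: $p"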
Perhaps you'd like to re-phrase your question so that a solution could use, e.g., Octave or some other language that can handle very large numbers without loss of precision.
The above program could be modified to extract the numbers you want from the file:
Code:
#!/bin/bash
# Print each number in the block, one per line, skipping the marker lines.
data=$(sed -n '/Phonon frequencies/,/end/p' "$1")
for d in $data; do
  echo "$d" | grep -q '[^+0-9.]' && continue
  echo "$d"
done
which you could use as an input file to your product computing program.
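One possible shape for such a product-computing program, staying within the shell, is to let awk add up base-10 logarithms instead of multiplying, so the running value cannot overflow. This is a sketch of a swapped-in technique, not something proposed earlier in the thread: `log_product` is a hypothetical name, it assumes every value is positive, and it reports log10 of the product in double precision rather than the exact digits.

```shell
# Hypothetical helper: read one number per line on stdin and print
# log10 of their product, computed as a sum of logarithms.
# Assumption: all values are positive (the log is undefined otherwise).
log_product() {
  awk '
    NF == 0 { next }                      # ignore blank lines
    {
      x = $1 + 0                          # a leading "+" is accepted here
      if (x <= 0) {
        print "line " NR ": value must be positive" > "/dev/stderr"
        bad = 1
        exit                              # jumps to the END rule
      }
      sum += log(x) / log(10)             # awk log() is the natural logarithm
      n++
    }
    END {
      if (bad || n == 0) exit 1
      printf "%d values, log10(product) = %.6f\n", n, sum
    }
  '
}
```

It would be fed the one-number-per-line output of the extraction script, for example `log_product < freqs.txt`, where `freqs.txt` is a placeholder name. The product itself is then 10 raised to the reported value. If exact digits are needed, an arbitrary-precision tool is still the better choice.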