LinuxQuestions.org

mcgao07 03-15-2009 10:06 AM

How to find a specific data block in a huge file and then do algebra on them?
 
Hello folks,

First I need to locate a specific data block in a huge file (hundreds of thousands of lines). The block starts with "Frequency" in its first row and ends with "End" in its last row. I need the data between these two rows, one value per row.

Then I need to compute the product of all the values in that block. How can I write a script to do this?

Thank you so much!

Michael

onebuck 03-15-2009 11:14 AM

Hi,

What have you done to date?

You could look at the 'Advanced Bash-Scripting Guide'.

This link and others are available from 'Slackware-Links'. More than just Slackware® links!

PTrenholme 03-15-2009 12:15 PM

Can you provide more details?

1) Where are the words "Frequency" and "End" located in the lines?
2) Is the case of the words significant?
3) Do those words occur in other places in the data stream?
4) What format is used for the data values of which you wish to compute the product? (Shell script arithmetic is, basically, integer only.)
5) Do any "frequency" numbers occur on the same line(s) as the key words?
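
To illustrate point 4, here is a minimal sketch of what integer-only shell arithmetic does with values of this size. The helper name `int_product` is hypothetical, and the 64-bit integer width is an assumption about typical shell builds, not something confirmed in this thread.

```shell
#!/bin/bash
# int_product is a hypothetical helper: multiply two integers
# using the shell's built-in arithmetic expansion.
int_product() {
  echo $(( $1 * $2 ))
}

# Small values behave as expected.
int_product 3 4

# Assumption: shell integers are 64-bit signed, as on typical builds.
# Two 13-digit factors give a 26-digit product, which does not fit,
# so the result wraps around silently instead of raising an error.
int_product 8168468677723 8173254220737
```

The second call completes without any warning, which is why the format of the data values matters before choosing a tool.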

For my own curiosity, why do you need the product of the frequencies? If the "frequencies" were, for example, event probabilities and those events were independent, then you'd be computing the probability of all those events occurring at the same time. (Although the assumption of independence is seldom justified.) If, instead, you're looking at radiation (sound, light, power) frequencies, I can't think of anything to which the product would relate.

mcgao07 03-16-2009 11:23 AM

Hi,

I need the data between the line "Phonon frequencies:" and the line "end". "Phonon frequencies:" occurs only once. "end" occurs several times in the file, and the "end" I mean is the one that marks the end of the file.
I need to compute the product of these phonon frequencies, which are real numbers.

The file looks like below:
...
Phonon frequencies:
+8168468677723.75879
+8173254220737.11621
...
+22835577550655.71484
end

Thank you.

Michael

pixellany 03-16-2009 11:33 AM

Is this homework?

As requested by onebuck, please show what work you have done and tell us specifically where you are stuck.

It's also helpful to post a sample of the data, and a sample of the desired output.

malekmustaq 03-16-2009 12:21 PM

mcgao07:

If the huge file is a simple text file, you can do it yourself. Read some tutorials on bash, cat, and piping commands. Try Google and read about "Bash Scripting". The answer is within your reach.

If you have done some initial work and need to refine your script, please post it here so that everyone can help you. Most of all, post your entire objective, with the parameters and methods you want applied in the formula, not just the product.
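
As a rough illustration of the kind of pipeline those tutorials cover, the sketch below chains sed and grep to pull the numeric lines out of a block. The markers "Frequency" and "End" come from the original question; the helper name `extract_block` is hypothetical, and the file is assumed to hold one value per line.

```shell
#!/bin/bash
# extract_block is a hypothetical helper name.
# sed prints the lines from the first marker through the second;
# grep then keeps only lines that consist of a signed decimal number,
# which drops the marker lines themselves.
extract_block() {
  sed -n '/Frequency/,/End/p' "$1" | grep -E '^[+-]?[0-9]+(\.[0-9]+)?$'
}

# Usage: extract_block data.txt | wc -l    # count the values in the block
```

This only extracts the values; the arithmetic would still need a separate step.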

Good luck.

PTrenholme 03-16-2009 07:21 PM

The product of numbers of that magnitude will require a very large number of digits. Here's a shell script that does it, but there is no overflow check made in bash arithmetic, and all numbers must be integers, so the answer you'll get is meaningless.
Code:

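#!/bin/bash
# Grab the block from "Phonon frequencies" through the next "end".
data=$(sed -n '/Phonon frequencies/,/end/p' "$1")
p=1
for d in $data; do
  # Skip any word containing characters other than +, digits, and "."
  echo "$d" | grep -q '[^+0-9.]' && continue
  # Bash arithmetic is integer only, so strip the sign and decimal point.
  v=$(echo "$d" | sed 's/[+.]//g')
  # No overflow check is made here, so the result is not meaningful.
  p=$((p * v))
done
echo "Product: $p"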

Perhaps you'd like to rephrase your question so that someone can offer a solution using, e.g., Octave or some other language that can handle very large numbers without loss of precision.

The above program could be modified to extract the numbers you want from the file:
Code:

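#!/bin/bash
# Print only the numeric values from the block, one per line.
data=$(sed -n '/Phonon frequencies/,/end/p' "$1")
for d in $data; do
  # Skip any word containing characters other than +, digits, and "."
  echo "$d" | grep -q '[^+0-9.]' && continue
  echo "$d"
done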

which you could use as an input file to your product computing program.
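
One possible form for that product-computing step is sketched below with awk rather than pure shell arithmetic. awk works in double-precision floating point, so it accepts the decimal values directly, and summing base-10 logarithms keeps a long product from overflowing. The helper name `log_product` is hypothetical, the input is assumed to be one positive number per line, and the result carries only a limited number of significant digits, so this is an approximation and not the exact product.

```shell
#!/bin/bash
# log_product is a hypothetical helper: reads one positive number per line
# on stdin and prints their product in scientific notation.
log_product() {
  awk '
    # Only consider lines that are a plain decimal number, optionally with "+".
    /^[+]?[0-9]+(\.[0-9]+)?$/ {
      s += log($1) / log(10)      # accumulate log10 of each value
      n++
    }
    END {
      if (n == 0) { print "no values"; exit 1 }
      e = int(s); if (s < e) e--  # floor of the log10 sum is the exponent
      m = 10 ^ (s - e)            # mantissa in [1, 10)
      if (m >= 9.9999995) { m /= 10; e++ }  # avoid printing 10.000000
      printf "%.6fe%+d\n", m, e
    }
  '
}
```

For example, `printf '2\n3\n' | log_product` prints the product of 2 and 3 in that notation. The logarithm form was chosen because the exponent of a product of hundreds of values near 10^13 would far exceed what a double can hold directly.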