LinuxQuestions.org


aristosv 06-03-2019 03:19 PM

bash remove random text from command output
 
I have a command output as shown below

Code:

8:15am  random text here 99629360 random text here
9:00am  random text here 99799779 random text here
10:00am  random text here 99102831 random text here
11:45am  random text here 99629320 random text here
12:30pm  random text here 96678497 random text here
2:30pm  random text here 99762314 random text here
3:00pm  random text here 99833711 random text here
5:15pm  random text here 99305212 random text here
6:00pm  random text here 96500528 random text here
7:00pm  random text here 99711372 random text here

This is 2 columns. One shows a time and the other some random text and some random numbers.

I need to remove all random text from the second column. I need to be left with the first column showing times and the second column showing only numbers. No random text.

How can I do this?

Thanks

wpeckham 06-03-2019 05:17 PM

Using BASH and the cut command? (man cut)
There is also some pattern matching ability within bash that might serve. (man bash)

Or, if you want to combine your bash with something other than cut or the internal pattern matching and string handling:
With a simple PERL script?

I would not use AWK/NAWK, but only because I like PERL better. It is certainly up to the task.

What exactly are your limits, and what are the characteristics of the text you are calling random?
I seriously doubt if the text is random, as generating truly random text would be seriously challenging.

Does any of the random text that you want stripped contain digits? If not, there is a clue as to how to craft your pattern matching to pick out the desired fields for display!

What exactly have you tried so far? How did that work for you?

syg00 06-03-2019 06:09 PM

Add sed to the list. If feeling particularly masochistic, grep with perlre ...

The trick is to use regex to keep what you want, rather than go through contortions trying to define the (multiple) "random text" to delete.

BW-userx 06-03-2019 06:26 PM

Something as simple as cut works.
Code:

#!/bin/bash

while read -r f
do
echo $f
echo $(echo "$f" |  cut -d" " -f1-6)
done < data

considering it is all like data.
Code:

7:00pm random text here 99711372 random text here
7:00pm random text here 99711372

if that is what you are looking for.

aristosv 06-03-2019 10:59 PM

OK, let's not focus on removing the random text then.
The similarities in the numbers are that they all start with 99, 96 or 95.
Would this help in keeping just the numbers, instead of focusing on removing the random text?
Also the numbers are always 8 digits.

aristosv 06-03-2019 11:03 PM

So this works

Code:

sed 's/[^0-9]*//g'
But I need to apply it only to the second column
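
One way to confine that substitution to everything after the first field is to let awk set the time aside first. This is only a sketch: `keep_digits_after_time` is a made-up name, it assumes the time is the first whitespace-separated field, and any stray digits in the surrounding text would end up glued onto the number.

```shell
# Hypothetical helper: keep field 1 (the time) untouched and strip every
# non-digit character from the rest of the line.
keep_digits_after_time() {
    awk '{
        time = $1              # remember the first column
        $1 = ""                # blank it so $0 holds only the rest
        gsub(/[^0-9]/, "")     # delete all non-digits from what is left
        print time, $0
    }'
}

printf '%s\n' '8:15am  random text here 99629360 random text here' |
    keep_digits_after_time
# prints: 8:15am 99629360
```

In practice the function would sit at the end of the pipeline, as in `<your command> | keep_digits_after_time`.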

syg00 06-03-2019 11:26 PM

As I say, concentrate on what you need, not what you don't need. How good are your regex skills - do you understand back references? For example, this will select only 8 consecutive digits - the same regex will work in sed.
Code:

grep -Eo "[0-9]{8}" your.file
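
Building on the detail that the numbers always start with 99, 96 or 95, the pattern can be narrowed a little. A sketch only: `extract_numbers` is a hypothetical name, and like the command above it prints just the numbers, not the times.

```shell
# Hypothetical helper: print only 8-digit numbers that start with 99, 96 or 95.
# -o prints just the matched text; -w rejects matches that sit inside a
# longer run of digits or letters.
extract_numbers() {
    grep -Eow '9[965][0-9]{6}'
}

printf '%s\n' '8:15am  random text here 99629360 random text here' |
    extract_numbers
# prints: 99629360
```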

tofino_surfer 06-04-2019 12:02 AM

Quote:

Something as simple as cut works.
#!/bin/bash

while read -r f
do
echo $f
echo $(echo "$f" | cut -d" " -f1-6)
done < data

considering it is all like data.
7:00pm random text here 99711372 random text here
7:00pm random text here 99711372
I highly doubt that the random text referred to by the OP are the three actual words "random text here" which wouldn't really be random. The number of fields in actual random text is unknown.

BW-userx 06-04-2019 06:41 AM

Quote:

Originally Posted by tofino_surfer (Post 6001893)
I highly doubt that the random text referred to by the OP are the three actual words "random text here" which wouldn't really be random. The number of fields in actual random text is unknown.

do you always take everything out of context to try and prove a point, not paying attention to detail, or even trying to?

You should not take what I did out of context, (or any one for that matter) to try and prove your point, that is being dishonest by obscuring the truth of the matter. You removed my final statement, and the beginning of the OP's statement to try and prove your point. my final statement, "if that is what you are looking for." Which clearly means what?

and the OP clearly stated, "I need to remove all random text from the second column."

yes one then needs to guess what the second column really is, seeing that the keys words used here is random text, and second column , that is where I'd start, on the second random and remove all of it from there, because he too use the word ALL in the start of his sentence.

the use of the words "remove all" means what in conjunction with the rest of the sentence "random text from the second column"?

the guessing part is, is it is really random text?


If it is to be what you are trying to imply then a less confusing sentence for you would then be to say. "I need to just remove the second word random from the string." Which is no doubt more explicit.

if the OP actually means just remove the second word random from the string, then he or she really needs to work on there English more as well as Linux anything. Which that part is conjecture and the person in question is not here to comment on this.

syg00 06-04-2019 06:50 AM

I tend to be a reluctant user of "pure" bash, but further testing reveals its regex matching and BASH_REMATCH[] actually works very well here. Without the need for any external program like sed/awk/perl ...

Off the top of my head, here is one solution I came up with.
Code:

while read line ; do if [[ $line =~ ^([^ ]+)[[:space:]].*[[:space:]]([0-9]+)[[:space:]].* ]] ; then echo ${BASH_REMATCH[1]}" "${BASH_REMATCH[2}} ; fi ; done < your.file
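
As typed, the last expansion reads `${BASH_REMATCH[2}}`, with the closing bracket and brace swapped, which bash rejects as a bad substitution. Below is a sketch of the same idea with that fixed. The function name `extract_time_and_number` is made up, and keeping the pattern in a variable is a stylistic choice rather than part of the original one-liner. Like the original, it only matches when the number is followed by more text.

```shell
#!/bin/bash
# Hypothetical helper: read lines on stdin, print "<time> <number>".
extract_time_and_number() {
    # Group 1: first field (the time). Group 2: a run of digits that has
    # whitespace on both sides.
    local re='^([^ ]+)[[:space:]].*[[:space:]]([0-9]+)[[:space:]]'
    local line
    while IFS= read -r line; do
        if [[ $line =~ $re ]]; then
            echo "${BASH_REMATCH[1]} ${BASH_REMATCH[2]}"
        fi
    done
}

printf '%s\n' '8:15am  random text here 99629360 random text here' |
    extract_time_and_number
# prints: 8:15am 99629360
```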

BW-userx 06-04-2019 07:06 AM

Quote:

Originally Posted by aristosv (Post 6001876)
OK, let's not focus on removing the random text then.
The similarities in the numbers are that they all start with 99, 96 or 95.
Would this help in keeping just the numbers, instead of focusing on removing the random text?
Also the numbers are always 8 digits.

so you do not really need to remove the random text, but are needing to remove the eight digit numbers from the complete string?
Or are you now wanting to remove everything but the numbers?
Or you just need to remove everything after the 8 digit numbers?

as I stated that if your data is always the same then cut as I did will always work.


what is the actual criteria(s) that someone might have told you that you need to do to complete this task?

Verbatim please.


--- mod: now seeing what others have (just?) done as I was posting this. --- let me go see what they think you are now trying to say.

that BASH_REMATCH one-liner gets me this:
Code:

$ ./bashremove
./bashremove: line 1: ${BASH_REMATCH[1]}" "${BASH_REMATCH[2}}: bad substitution

this gets the second occurrence of the WORD random removed from the strings.
Code:

$ sed 's@random@@2' data
8:15am  random text here 99629360  text here
9:00am  random text here 99799779  text here
10:00am  random text here 99102831  text here
11:45am  random text here 99629320  text here
12:30pm  random text here 96678497  text here
2:30pm  random text here 99762314  text here
3:00pm  random text here 99833711  text here
5:15pm  random text here 99305212  text here
6:00pm  random text here 96500528  text here
7:00pm  random text here 99711372  text here

Again, what exactly are the results you are looking for?
Just keeping the 8 digit numbers, or now removing from the start of the 8 digits, and where exactly is the second column starting?

Code:

                column
row 0 1       2      3    4    5         6      7    8
    1 7:00pm  random text here 99711372  random text here
    2

is that a correct assessment?

Try showing us the final product you are looking for, so we all know what we are trying to get to. A picture is worth a thousand words.

aristosv 06-04-2019 08:25 AM

I really didn't mean to cause a whole discussion over this. I thought my question was clear, but it seems I was wrong. When I said "random text here" I didn't actually mean that the words "random text here" were in the command output. Random text could be anything. It could be a name, place or food. Sorry if I wasn't clear.

The point is, like I originally said, I need to be left with the first column showing the times and the second column showing only numbers. So no need to touch the first column. I just want to modify the second column so that it only shows the numbers. I have to remove all the text, only from the second column.

BW-userx 06-04-2019 08:30 AM

Quote:

Originally Posted by aristosv (Post 6002025)
I really didn't mean to cause a whole discussion over this. I thought my question was clear, but it seems I was wrong. When I said "random text here" I didn't actually mean that the words "random text here" were in the command output. Random text could be anything. It could be a name, place or food. Sorry if I wasn't clear.

The point is, like I originally said, I need to be left with the first column showing the times and the second column showing only numbers. So no need to touch the first column. I just want to modify the second column so that it only shows the numbers. I have to remove all the text, only from the second column.

this is how you show an example of what you're looking for.
example #1
Code:

#before
7:00pm  random text here 99711372 random text here
#after
7:00pm 99711372

removing all random text within the string.

this is what you want?
or this
example #2
Code:

#before
7:00pm  random text here 99711372 random text here
#after
7:00pm 99711372  random text here


aristosv 06-04-2019 08:40 AM

Example 1 is the correct one.

BW-userx 06-04-2019 09:09 AM

Quote:

Originally Posted by aristosv (Post 6002030)
Example 1 is the correct one.

run this and see what you think
Code:

#!/bin/bash

#array of strings

data=(
"8:15am  random text here 99629360 random text here"
"9:00am  random text here 99799779 random text here"
"10:00am  random text here 99102831 random text here"
"11:45am  random text here 99629320 random text here"
"12:30pm  random text here 96678497 random text here"
"2:30pm  random text here 99762314 random text here"
"3:00pm  random text here 99833711 random text here"
"5:15pm  random text here 99305212 random text here"
"6:00pm  random text here 96500528 random text here"
)


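for ((i=0; i<${#data[@]}; i++))
do
    # first attempts: strip the letters, then squeeze the spaces or pick fields
    part1=$( echo "${data[$i]}" | sed 's/[A-Za-z]*//g' | fmt -u )
    part2=$( echo "${data[$i]}" | sed 's/[A-Za-z]*//g' | awk '{print $1 " " $2 " " $5}' )

    echo "p1 $part1"
    echo "p2 $part2"
    echo

    # split the string to keep the am or pm on the leading part of string.
    part3=${data[$i]%% *}

    # strip the non-digits from the rest of the line only, so the digits
    # of the time do not end up glued onto the number
    rest=${data[$i]#* }
    part4=$( echo "$rest" | sed 's/[^0-9]*//g' )

    echo
    echo "p3 $part3"
    echo "p4 $part4"
    echo "
final product is:
$part3 $part4
"
done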


allend 06-04-2019 10:59 AM

Quote:

I have a command output as shown below
Not the bash solution requested, but maybe pipe the command output to sed
Code:

<your command> | sed -E 's/^(.*:.{5}).*([[:digit:]]{8}).*/\1\2/'

tofino_surfer 06-04-2019 03:19 PM

BW-userx have a look at the solution posted by allend

Quote:

Not the bash solution requested, but maybe pipe the command output to sed
Since sed is more powerful for text processing there is no need to apologize.

Code:

<your command> | sed -E 's/^(.*:.{5}).*([[:digit:]]{8}).*/\1\2/'
This sed one liner does exactly what the OP requested which was extract the time at the start of the line and the 8-digit number and discard everything else.

Code:

$ echo 8:15am  random text here 99629360 random text here | sed -E 's/^(.*:.{5}).*([[:digit:]]{8}).*/\1\2/'
8:15am 99629360


allend understood the problem fully and did in one line what you failed to do in five very long and sometimes angry posts with convoluted scripts.

Quote:

and the OP clearly stated, "I need to remove all random text from the second column."

yes one then needs to guess what the second column really is, seeing that the keys words used here is random text, and second column , that is where I'd start, on the second random and remove all of it from there, because he too use the word ALL in the start of his sentence.

the use of the words "remove all" means what in conjunction with the rest of the sentence "random text from the second column"?

the guessing part is, is it is really random text?


If it is to be what you are trying to imply then a less confusing sentence for you would then be to say. "I need to just remove the second word random from the string." Which is no doubt more explicit.

BW-userx 06-04-2019 03:28 PM

Quote:

Originally Posted by tofino_surfer (Post 6002148)
BW-userx have a look at the solution posted by allend



Since sed is more powerful for text processing there is no need to apologize.

Code:

<your command> | sed -E 's/^(.*:.{5}).*([[:digit:]]{8}).*/\1\2/'
allend understood the problem fully and did in one line what you failed to do in five very long and sometimes angry posts with convoluted scripts.

This sed one liner does exactly what the OP requested.

and this quip at me with the use of someone else's work no less, a different solution than mine or others, after many have already tried to basically understand what OP is actually saying, is because why?

a good program is one that works, so its no real sweat off my @(#@s :p
You really need to properly work on your self esteem.

tofino_surfer 06-04-2019 03:39 PM

Quote:

and this quip at me with the use of someone else's work no less, a different solution than mine or others, after many have already tried to basically understand what OP is actually saying, is because why?
Everyone else did seem to understand what the OP was saying.

Quote:

if the OP actually means just remove the second word random from the string, then he or she really needs to work on there English more as well as Linux anything
Don't you mean their English ?

BW-userx 06-04-2019 03:43 PM

Quote:

Originally Posted by tofino_surfer (Post 6002151)
Everyone else did seem to understand what the OP was saying.



Don't you mean their English ?

now you're just trying to start an argument to prove yourself right and me wrong, to feed your ego.
puts you on permanent ignore.

crts 06-05-2019 12:23 AM

Pure Bash solution
 
If your random text does not contain any digits then you can also use this:
Code:

$ shopt -s extglob
$ <your command>|while read line;do echo "${line// *([^0-9])/ }";done

If there are digits in random text then it will break. I consider allend's 'sed' solution as more robust, if the second number always consists of exactly eight digits. It is an easy fix, though, if it does not.

I am posting this solution mainly because I like its simplicity - no RegEx backreferencing and no convoluted loops, KISS principle.

MadeInGermany 06-05-2019 03:38 AM

The following prints column #1 and the first column with a big number
Code:

awk '{
for (i=2; i<=NF && $i+0<=90000000; i++);
print $1,$i
}' inputfile


allend 06-05-2019 10:58 AM

As the OP has not returned and we are now playing code golf, I will say that I was expecting to be slapped for my sed, as it will fail if the string represented by "random text here" contains a colon character, due to the greedy nature of matching in sed.
A better solution tightens the regular expression that matches the time:
Code:

sed -E "s/^([0-9]+:[0-5][0-9][ap]m).*([0-9]{8}).*/\1 \2/"
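
To see the difference, here is a small sketch that runs both expressions over a line whose text contains a colon. The function names `loose_extract` and `strict_extract` and the sample line are invented for the illustration.

```shell
# Original expression: ".*:" is greedy, so it runs up to the LAST colon.
loose_extract() {
    sed -E 's/^(.*:.{5}).*([[:digit:]]{8}).*/\1\2/'
}

# Tightened expression: the first group can only match a 12-hour time.
strict_extract() {
    sed -E 's/^([0-9]+:[0-5][0-9][ap]m).*([0-9]{8}).*/\1 \2/'
}

line='8:15am  note: call 99629360 now'

printf '%s\n' "$line" | loose_extract
# prints: 8:15am  note: call99629360

printf '%s\n' "$line" | strict_extract
# prints: 8:15am 99629360
```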

aristosv 06-05-2019 11:00 AM

Quote:

Originally Posted by allend (Post 6002066)
Code:

<your command> | sed -E 's/^(.*:.{5}).*([[:digit:]]{8}).*/\1\2/'

Eventually I used this code. Thanks everyone for your help.

@allend if you have the time, it would be interesting to know the logic behind this code. I can only understand a few parts of it, but not everything.

Thanks

aristosv 06-05-2019 11:03 AM

Thanks for the better solution. The second column now became the 5th column. How can I change your code to represent this change?

Thanks

aristosv 06-05-2019 11:04 AM

I forgot to mention, now there is no am, pm in the output. It's all military time. So no need to account for that.
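
For 24-hour times, a possible adaptation of the tightened expression is sketched below. It assumes the time still starts the line in H:MM or HH:MM form and that the number is still exactly eight digits; since the new layout is not shown, the column change is left aside. `extract_24h` is a made-up name.

```shell
# Hypothetical helper: keep a leading 24-hour time and the 8-digit number.
extract_24h() {
    sed -E 's/^([0-9]{1,2}:[0-5][0-9]).*([0-9]{8}).*/\1 \2/'
}

printf '%s\n' '14:30  random text here 99762314 random text here' |
    extract_24h
# prints: 14:30 99762314
```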

BW-userx 06-05-2019 11:09 AM

Getting the quick answer is nice, but have you tried playing around with sed, awk, and regular expressions (regex)? There is a vast amount of information to help you learn as well.

if you look at that link it should help you figure out that sed command ..

allend 06-05-2019 11:41 AM

Quote:

sed -E 's/^(.*:.{5}).*([[:digit:]]{8}).*/\1\2/'
sed is shorthand for stream editor. It operates on the stream from command output.
The -E option calls for interpretation of extended regular expressions. The s/<what to match>/<what to replace>/ construction is the sed substitute command.
You wanted a match on eight consecutive digit characters, so ([[:digit:]]{8}). This can also be written ([0-9]{8}) as suggested by syg00.
You also wanted a match on the time at the start of a line, so ^, the anchor for the start of a line, then .*:.{5} to match zero or more characters, followed by a colon, followed by five more characters. The parentheses around the expressions tell sed to keep those as back references.
The back references are used in the <what to replace> part of the substitute command, hence the \1 (first back reference) and \2 (second back reference).
The ' characters protect the substitute command from interpretation by the shell where the command is run.

'info sed' will provide a more complete explanation.
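
To make the back references visible, this sketch prints each captured group on its own, wrapped in square brackets. `show_group` is a hypothetical helper, not part of sed. Note the trailing space inside the first group: it is the fifth character after the colon, which is why `\1\2` needs no space between the two references.

```shell
# Hypothetical helper: show what back reference $1 (1 or 2) captured.
show_group() {
    # "\\$1" becomes \1 or \2 once the shell has expanded the argument.
    sed -E "s/^(.*:.{5}).*([[:digit:]]{8}).*/[\\$1]/"
}

line='8:15am  random text here 99629360 random text here'

printf '%s\n' "$line" | show_group 1
# prints: [8:15am ]

printf '%s\n' "$line" | show_group 2
# prints: [99629360]
```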

aristosv 06-05-2019 11:46 AM

Thank you very much


All times are GMT -5.