[SOLVED] bash remove random text from command output

aristosv · 06-03-2019, 03:19 PM

I have a command output as shown below

Code:

8:15am  random text here 99629360 random text here
9:00am  random text here 99799779 random text here
10:00am  random text here 99102831 random text here
11:45am  random text here 99629320 random text here
12:30pm  random text here 96678497 random text here
2:30pm  random text here 99762314 random text here
3:00pm  random text here 99833711 random text here
5:15pm  random text here 99305212 random text here
6:00pm  random text here 96500528 random text here
7:00pm  random text here 99711372 random text here

This is 2 columns. One shows a time and the other some random text and some random numbers.

I need to remove all random text from the second column. I need to be left with the first column showing times and the second column showing only numbers. No random text.

How can I do this?

Thanks

wpeckham · 06-03-2019, 05:17 PM

Using BASH and the cut command? (man cut)
There is also some pattern matching ability within bash that might serve. (man bash)

Or, if you want to combine your bash with something other than cut or the internal pattern matching and string handling:
With a simple PERL script?

I would not use AWK/NAWK, but only because I like PERL better. It is certainly up to the task.

What exactly are your limits, and what are the characteristics of the text you are calling random?
I seriously doubt if the text is random, as generating truly random text would be seriously challenging.

Does any of the random text that you want stripped contain digits? If not, there is a clue as to how to craft your pattern matching to pick out the desired fields for display!

What exactly have you tried so far? How did that work for you?

syg00 · 06-03-2019, 06:09 PM

Add sed to the list. If feeling particularly masochistic, grep with perlre ...

The trick is to use regex to keep what you want, rather than go through contortions trying to define the (multiple) "random text" to delete.

BW-userx · 06-03-2019, 06:26 PM

Something as simple as cut works.

Code:

#!/bin/bash

while read -r f
do
echo $f
echo $(echo "$f" |  cut -d" " -f1-6)
done < data

considering it is all like data.

Code:

7:00pm random text here 99711372 random text here
7:00pm random text here 99711372

if that is what you are looking for.

aristosv · 06-03-2019, 10:59 PM

OK, let's not focus on removing the random text then.
The similarities in the numbers are that they all start with 99, 96 or 95.
Would this help in keeping just the numbers, instead of focusing on removing the random text?
Also the numbers are always 8 digits.
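
Those two facts (a fixed prefix and a fixed length) are enough to describe what to keep. A minimal sketch, assuming the time is the first whitespace-separated field and the number stands alone as its own field; `first_matching_field` is a hypothetical helper name, not something from the thread:

```shell
# Hypothetical helper: print the first field plus the first later field
# that is exactly 8 digits and starts with 99, 96 or 95.
first_matching_field() {
    awk '{
        for (i = 2; i <= NF; i++)
            # six explicit [0-9] instead of {6}, so older awks accept it
            if ($i ~ /^9[956][0-9][0-9][0-9][0-9][0-9][0-9]$/) {
                print $1, $i
                break
            }
    }'
}

# usage (assumption: your command writes the lines to stdout):
#   your_command | first_matching_field
```

Because it scans fields rather than counting them, the amount of text before or after the number does not matter.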

aristosv · 06-03-2019, 11:03 PM

So this works

Code:

sed 's/[^0-9]*//g'

But I need to apply it only to the second column
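
One way to limit that substitution to the second column is to split off the time first and run the same sed expression on the remainder only. A rough sketch, assuming the line starts with the time, columns are separated by whitespace, and the text after the time holds no digits other than the wanted number; `digits_after_first_field` is a made-up name:

```shell
# Hypothetical helper: leave the first field alone, strip every
# non-digit from the rest of the line.
digits_after_first_field() {
    while IFS= read -r line; do
        [ -n "$line" ] || continue          # skip empty lines
        time=${line%%[[:space:]]*}          # first column (up to first blank)
        rest=${line#"$time"}                # everything after the first column
        number=$(printf '%s\n' "$rest" | sed 's/[^0-9]*//g')
        printf '%s %s\n' "$time" "$number"
    done
}

# usage: your_command | digits_after_first_field
```

This starts one sed per line, which is fine for a handful of lines but slow for large inputs.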

syg00 · 06-03-2019, 11:26 PM

As I say, concentrate on what you need, not what you don't need. How good are your regex skills? Do you understand back references? For example, this will select only 8 consecutive digits, and the same regex will work in sed.

Code:

grep -Eo "[0-9]{8}" your.file
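
The grep above prints only the number. To keep the time as well, the same `[0-9]{8}` can go into a sed substitution with back references. This is a sketch under the assumption that the time is the first field and the number is a run of exactly 8 digits; `time_and_eight_digits` is a hypothetical name:

```shell
# Hypothetical helper: \1 is the first field, \3 is the 8-digit run.
# -n plus the p flag prints only lines where the pattern matched.
time_and_eight_digits() {
    sed -nE 's/^([^[:space:]]+)[[:space:]]+(.*[^0-9])?([0-9]{8})([^0-9].*)?$/\1 \3/p'
}

# usage: time_and_eight_digits < your.file
```

The `[^0-9]` guards on both sides stop the pattern from picking 8 digits out of a longer run of digits.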

tofino_surfer · 06-04-2019, 12:02 AM

Quote:

Something as simple as cut works.
#!/bin/bash

while read -r f
do
echo $f
echo $(echo "$f" | cut -d" " -f1-6)
done < data

considering it is all like data.
7:00pm random text here 99711372 random text here
7:00pm random text here 99711372

I highly doubt that the random text referred to by the OP is the three actual words "random text here", which wouldn't really be random. The number of fields in actual random text is unknown.

BW-userx · 06-04-2019, 06:41 AM

Quote:

Originally Posted by tofino_surfer

I highly doubt that the random text referred to by the OP is the three actual words "random text here", which wouldn't really be random. The number of fields in actual random text is unknown.

Do you always take everything out of context to try and prove a point, not paying attention to detail, or even trying to?

You should not take what I did out of context (or anyone's, for that matter) to try and prove your point; that is being dishonest by obscuring the truth of the matter. You removed my final statement, and the beginning of the OP's statement, to try and prove your point. My final statement was "if that is what you are looking for." Which clearly means what?

And the OP clearly stated, "I need to remove all random text from the second column."

Yes, one then needs to guess what the second column really is. Seeing that the key words used here are "random text" and "second column", that is where I'd start: on the second "random", and remove all of it from there, because he too used the word ALL at the start of his sentence.

The use of the words "remove all" means what, in conjunction with the rest of the sentence, "random text from the second column"?

The guessing part is: is it really random text?

If it is to be what you are trying to imply, then a less confusing sentence would be, "I need to just remove the second word random from the string." Which is no doubt more explicit.

If the OP actually means just remove the second word random from the string, then he or she really needs to work on their English as well as Linux. That part is conjecture, and the person in question is not here to comment on this.

syg00 · 06-04-2019, 06:50 AM

I tend to be a reluctant user of "pure" bash, but further testing reveals its regex matching and BASH_REMATCH[] actually works very well here. Without the need for any external program like sed/awk/perl ...

Off the top of my head, here is one solution I came up with.

Code:

while read line ; do if [[ $line =~ ^([^ ]+)[[:space:]].*[[:space:]]([0-9]+)[[:space:]].* ]] ; then echo ${BASH_REMATCH[1]}" "${BASH_REMATCH[2}} ; fi ; done < your.file

BW-userx · 06-04-2019, 07:06 AM

Quote:

Originally Posted by aristosv

OK, let's not focus on removing the random text then.
The similarities in the numbers are that they all start with 99, 96 or 95.
Would this help in keeping just the numbers, instead of focusing on removing the random text?
Also the numbers are always 8 digits.

so you do not really need to remove the random text, but are needing to remove the eight digit numbers from the complete string?
Or are you now wanting to remove everything but the numbers?
Or you just need to remove everything after the 8 digit numbers?

As I stated, if your data always has the same layout, then cut as I used it will always work.

What are the actual criteria that someone might have told you that you need to meet to complete this task?

Verbatim please.

--- mod: now seeing what others have (just?) done as I was posting this. --- let me go see what they think you are now trying to say.

That BASH_REMATCH one-liner gives me this:

Code:

$ ./bashremove
./bashremove: line 1: ${BASH_REMATCH[1]}" "${BASH_REMATCH[2}}: bad substitution
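
The "bad substitution" comes from the closing `}}` in `${BASH_REMATCH[2}}`, which should read `]}`. A corrected sketch of the same loop follows; the pattern is the one posted above, moved into a variable (`re`, a name chosen here) so the shell does not have to parse it inline, and wrapped in a hypothetical function `print_time_and_number`:

```shell
#!/bin/bash
# Same pattern as the posted one-liner: group 1 is the first field,
# group 2 is a run of digits with whitespace on both sides.
re='^([^ ]+)[[:space:]].*[[:space:]]([0-9]+)[[:space:]].*'

print_time_and_number() {
    local line
    while IFS= read -r line; do
        if [[ $line =~ $re ]]; then
            echo "${BASH_REMATCH[1]} ${BASH_REMATCH[2]}"
        fi
    done
}

# usage: print_time_and_number < your.file
```

Note that this pattern needs at least one whitespace character after the number, so a line that ends right after the digits is skipped.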

this gets the second occurrence of the WORD random removed from the strings.

Code:

$ sed 's@random@@2' data
8:15am  random text here 99629360  text here
9:00am  random text here 99799779  text here
10:00am  random text here 99102831  text here
11:45am  random text here 99629320  text here
12:30pm  random text here 96678497  text here
2:30pm  random text here 99762314  text here
3:00pm  random text here 99833711  text here
5:15pm  random text here 99305212  text here
6:00pm  random text here 96500528  text here
7:00pm  random text here 99711372  text here

Again, what exactly are the results you are looking for?
Just keeping the 8 digit numbers, or now removing from the start of the 8 digits? And where exactly is the second column starting?

Code:

                 column
row  0  1        2     3   4     5        6      7    8
     1 7:00pm  random text here 99711372  random text here
     2

is that a correct assessment?

Try showing us the final product you are looking for, so we all know what we are trying to produce. A picture speaks a thousand words.

aristosv · 06-04-2019, 08:25 AM

I really didn't mean to cause a whole discussion for this. I thought my question was clear, but it seems I was wrong. When I said "random text here" I didn't actually mean that the words "random text here" were in the command output. Random text could be anything. It could be a name, place or food. Sorry if I wasn't clear.

The point is, like I originally said, I need to be left with the first column showing the times and the second column showing only numbers. So no need to touch the first column. I just want to modify the second column so that it only shows the numbers. I have to remove all the text, only from the second column.

BW-userx · 06-04-2019, 08:30 AM

Quote:

Originally Posted by aristosv

I really didn't mean to cause a whole discussion for this. I thought my question was clear, but it seems I was wrong. When I said "random text here" I didn't actually mean that the words "random text here" were in the command output. Random text could be anything. It could be a name, place or food. Sorry if I wasn't clear.

The point is, like I originally said, I need to be left with the first column showing the times and the second column showing only numbers. So no need to touch the first column. I just want to modify the second column so that it only shows the numbers. I have to remove all the text, only from the second column.

this is how you show an example of what you're looking for.
example #1

Code:

#before
7:00pm  random text here 99711372 random text here
#after
7:00pm 99711372

removing all random text within the string.

this is what you want?
or this
example #2

Code:

#before
7:00pm  random text here 99711372 random text here
#after
7:00pm 99711372  random text here

aristosv · 06-04-2019, 08:40 AM

The correct one is example 1.

BW-userx · 06-04-2019, 09:09 AM

Quote:

Originally Posted by aristosv

The correct one is example 1.

run this and see what you think

Code:

#!/bin/bash

#array of strings

data=(
"8:15am  random text here 99629360 random text here"
"9:00am  random text here 99799779 random text here"
"10:00am  random text here 99102831 random text here"
"11:45am  random text here 99629320 random text here"
"12:30pm  random text here 96678497 random text here"
"2:30pm  random text here 99762314 random text here"
"3:00pm  random text here 99833711 random text here"
"5:15pm  random text here 99305212 random text here"
"6:00pm  random text here 96500528 random text here"
)


for ((i=0;i<${#data[@]};i++))
do


 part1=$( echo ${data[$i]} | sed 's/[A-Za-z]*//g' | fmt -u )
 part2=$( echo ${data[$i]} | sed 's/[A-Za-z]*//g' |  awk '{print $1 " " $2 " " $5}' )
 
 echo "p1 $part1"
 echo "p2 $part2"
 echo
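#split the string to keep the am or pm on the leading part of string.

part3=${data[$i]%% *}
#drop the time field before stripping non-digits, otherwise the digits
#of the time (e.g. 815) end up glued onto the front of the number
part4=$(echo "${data[$i]#* }" | sed 's/[^0-9]*//g')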
echo
echo "p3 $part3"
echo "p4 $part4"
echo "
final product is:
$part3 $part4
"

done