LinuxQuestions.org

sunlinux 07-12-2020 08:02 AM

extract string shell script
 
A file contains the entries below:

[1] http://live.bb.com/sys/diag/viewrun.jsp?id=1600810147
[2] http://live.bb.com/sys/orgs/org/?id=1342907248
[3] http://live.bb.com/sys/orgs/org/acc/?id=1342908940
[4] http://live.bb.com/sys/diag/viewrun.jsp?id=1600810147
[5] http://live.bb.com/sys/orgs/org/?id=1342907248
[6] http://live.bb.com/sys/orgs/org/acc/?id=1342908940
[7] http://live.bb.com/sys/diag/viewrun.jsp?id=1600810122
[8] http://live.bb.com/sys/orgs/org/?id=1342907248
[9] http://live.bb.com/sys/orgs/org/acc/?id=1342908940
[10] http://live.bb.com/sys/diag/viewrun.jsp?id=1600810122
[11] http://live.bb.com/sys/orgs/org/?id=1342907248
[12] http://live.bb.com/sys/orgs/org/acc/?id=1342908940
[13] http://live.bb.com/sys/diag/viewrun.jsp?id=1600809868
[14] http://live.bb.com/sys/orgs/org/?id=1342907248
[15] http://live.bb.com/sys/orgs/org/acc/?id=1342908940
[16] http://live.bb.com/sys/diag/viewrun.jsp?id=1600809955
[17] http://live.bb.com/sys/orgs/org/?id=1342907248

I want to extract only the values after jsp?id=. For example, from 'jsp?id=1600809955' I want to extract 1600809955 and print it. Any duplicate entries should be removed when printing.

wpeckham 07-12-2020 08:15 AM

Cool. And easy. What have you tried so far?

shruggy 07-12-2020 08:42 AM

I'd suggest using awk, but any script language providing arrays (e.g. bash) will do it easily.

individual 07-12-2020 08:55 AM

There are a lot of ways to accomplish this. A couple have already been suggested, but I would add cut and sort. Check the man pages for each of those.

TB0ne 07-12-2020 10:53 AM

This thread has some great tips: https://www.linuxquestions.org/quest...le-4175598104/

...since you asked about extracting strings three years ago. And that thread contains links to several OTHER of your posts over the years, asking for scripts and similar things. Can't you apply what you've been told many times previously, and make it work for you?

You've been here FOURTEEN YEARS now, so you should be well familiar (especially since you've been told many times) with the "Question Guidelines" about doing your own research, and showing your own efforts.

teckk 07-12-2020 12:25 PM

Code:

list="
[1] http://live.bb.com/sys/diag/viewrun.jsp?id=1600810147
[2] http://live.bb.com/sys/orgs/org/?id=1342907248
[3] http://live.bb.com/sys/orgs/org/acc/?id=1342908940
"
cut -d "=" -f2 <<< "$list"

1600810147
1342907248
1342908940

Look at:
man sed
man awk
man grep
man cut
man sort

shruggy 07-12-2020 12:35 PM

@teckk. That's not enough. As individual suggested above, the output of cut should be fed to sort -u. I'd also use the -s / --only-delimited option to cut here, just in case.
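
A minimal sketch of that pipeline, assuming the list arrives on standard input (the function name unique_ids is made up for illustration). Note that -s only drops lines with no "=" in them, so the ids from the non-jsp URLs are still printed:

```shell
#!/bin/sh
# unique_ids is a hypothetical helper name, not something from the thread.
# It reads the URL list on stdin and prints each distinct value after "=".
unique_ids() {
    # -s skips lines that contain no "=" delimiter (blank lines, for example);
    # sort -u sorts the remaining values and drops duplicates.
    cut -s -d= -f2 | sort -u
}
```

Usage would be something like: unique_ids < file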

Besides,
Quote:

Originally Posted by sunlinux (Post 6144535)
I want to extract only values after jsp?id=

Please note that not all URLs from the top post include jsp, but this may be a misrepresentation on the part of OP.

So, to not repeat you here, the same using fex:
Code:

fex '//\.jsp\?id/=2' <<<"$list"|sort -u

MadeInGermany 07-13-2020 12:07 AM

With awk, split on the search pattern and print the right-hand side if it has not been seen yet:
Code:

awk -F'[.]jsp[?]id=' 'NF>=2 && !($2 in s) { s[$2]; print $2 }' file

shruggy 07-13-2020 01:49 AM

@MadeInGermany. Nice.

To sum it up,

Bash:
Code:

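#!/bin/bash
# Assumes $file holds the path to the list and that the ids are numeric.
# The ids serve as array indices, so duplicates collapse and
# ${!a[@]} lists each id once, in ascending order.
while IFS='=' read -r url id
do [[ $url == *.jsp\?id ]] && [[ -n $id ]] && a["$id"]=
done <"$file"
printf '%s\n' "${!a[@]}"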

POSIX shell:
Code:

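#!/bin/sh
# Assumes $file holds the path to the list.
# The positional parameters act as the list of ids seen so far.
set --
while IFS='=' read -r url id
do case $url in *.jsp\?id)
  [ -n "$id" ] && {
    new=true
    for i
    do [ "$i" = "$id" ] && { new=false; break;}
    done
    $new && set -- "$id" "$@"
  }
  esac
done <"$file"
# Print one id per line, most recently found first.
IFS='
'; echo "$*"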


pan64 07-13-2020 02:07 AM

Code:

sort -t= -u -k2n "$file" | grep -oP '(?<=jsp\?id=)\d*$'
But post #8 is probably better.

shruggy 07-13-2020 02:21 AM

Why not the other way round? Seems a bit easier to me:
Code:

grep -Po '\.jsp\?id=\K\d+' "$file"|sort -u
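
Not every grep supports -P (the Perl-compatible regex mode that \K needs), so here is a rough equivalent with sed for systems without it. The function name jsp_ids is only for illustration, and the sketch assumes the ids are purely numeric, as in the sample:

```shell
#!/bin/sh
# jsp_ids is a hypothetical helper name; it reads the list on stdin.
jsp_ids() {
    # In a basic regular expression "?" is a literal character,
    # so only the "." needs escaping.
    # -n plus the p flag prints only lines where the substitution succeeded.
    sed -n 's/.*\.jsp?id=\([0-9][0-9]*\).*/\1/p' | sort -u
}
```

Usage would be something like: jsp_ids < "$file"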

grail 07-13-2020 03:23 AM

If we can assume the data is as presented:
Code:

awk -F= '!_[$2]++{print $2}' file

shruggy 07-13-2020 03:37 AM

As I said in #7, I'm not sure whether the OP really meant what they said, but if so, this should be:
Code:

awk -F= '$1~/[.]jsp[?]id$/ && !_[$2]++ {print $2}' file

Skaperen 07-14-2020 07:18 PM

Since this is a programming forum, I assume you are coding this rather than looking for a command line to do it. Which language are you using? awk? bash? C? C++? Go? Java? Lua? Perl? Python? Rust? Something else? And what do you want to happen to lines without "jsp" (such as line 2)?
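
One possible answer to that last question, sketched in shell with awk: skip the lines without ".jsp?id=" but report how many were skipped. This is only an assumption about what the OP wants; the function name report_ids and the wording of the message are made up for illustration.

```shell
#!/bin/sh
# report_ids is a hypothetical helper name; it reads the list on stdin.
# Assumption: lines without ".jsp?id=" are skipped, not treated as errors.
report_ids() {
    awk -F= '
        # A line qualifies when the text before "=" ends in ".jsp?id"
        # and something follows the "=".
        $1 ~ /[.]jsp[?]id$/ && $2 != "" {
            if (!seen[$2]++) print $2   # first occurrence only, input order kept
            next
        }
        { skipped++ }                   # every other line is just counted
        END {
            # Send the summary to stderr so it does not mix with the ids.
            printf "skipped %d line(s) without .jsp?id=\n", skipped | "cat 1>&2"
        }
    '
}
```

Usage would be something like: report_ids < file
Unlike the sort -u variants, this prints the ids in the order they first appear.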

rnturn 07-15-2020 06:12 AM

Code:

grep '\.jsp' file | cut -d= -f2 | sort -u
returns
Code:

1600809868
1600809955
1600810122
1600810147



All times are GMT -5.