LinuxQuestions.org - Help to cut a string


Hello,
I have the following file list (sample):
/data1/TEST_LOAD_DATA1.ASC
/data1/TEST_LOAD/TEST_LOAD_MESH_DATA1000.ASC
/data1/TEST_LOAD/Results/TEST_LOAD_V_DATA1000.ASC
/data1/TEST_LOAD/Results/LARGE_LOAD_V_DATA2000.ASC
/data1/TEST_LOAD/Results/SMALL_LOAD_DATA30.ASC
/data1/TEST_LOAD/Results/TEST_LOAD_DATA100.ASC

For each file name I want to extract a few strings and assign them to variables.
For example, for /data1/TEST_LOAD/TEST_LOAD_MESH_DATA1000.ASC I want to store:
FNAME= TEST_LOAD_MESH_
TYPE= DATA
NUMBER = 1000

And for /data1/TEST_LOAD/Results/SMALL_LOAD_DATA30.ASC I want to store:
FNAME= SMALL_LOAD_
TYPE= DATA
NUMBER = 30

Hi,

This should work:

Code:

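#!/bin/bash

inFile="$1"

# Read the input file one line at a time (no cat needed)
while IFS= read -r THISLINE
do
  fullFileName=${THISLINE##*/}   # strip the directory part
  FNAME=${fullFileName%_*}_      # everything before the last _, plus the _
  tmpPart=${fullFileName##*_}    # everything after the last _
  restPart=${tmpPart%\.*}        # drop the extension
  TYPE=${restPart/%[0-9]*/}      # remove the trailing digits
  NUMBER=${restPart/*[A-Z]/}     # remove the leading capitals

  echo "-------------------------------------"
  echo "FNAME  : $FNAME"
  echo "TYPE  : $TYPE"
  echo "NUMBER : $NUMBER"
done < "$inFile"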

Test run:

Quote:

cat infile
/data1/TEST_LOAD_DATA1.ASC
/data1/TEST_LOAD/TEST_LOAD_MESH_DATA1000.ASC
/data1/TEST_LOAD/Results/TEST_LOAD_V_DATA1000.ASC
/data1/TEST_LOAD/Results/LARGE_LOAD_V_DATA2000.ASC
/data1/TEST_LOAD/Results/SMALL_LOAD_DATA30.ASC
/data1/TEST_LOAD/Results/TEST_LOAD_DATA100.ASC

./tets infile
-------------------------------------
FNAME : TEST_LOAD_
TYPE : DATA
NUMBER : 1
-------------------------------------
FNAME : TEST_LOAD_MESH_
TYPE : DATA
NUMBER : 1000
-------------------------------------
FNAME : TEST_LOAD_V_
TYPE : DATA
NUMBER : 1000
-------------------------------------
FNAME : LARGE_LOAD_V_
TYPE : DATA
NUMBER : 2000
-------------------------------------
FNAME : SMALL_LOAD_
TYPE : DATA
NUMBER : 30
-------------------------------------
FNAME : TEST_LOAD_
TYPE : DATA
NUMBER : 100

I did assume the following:

1) DATA can be anything, but is always in CAPS,
2) The number part is 1 or larger,
3) the trailing _ (FNAME output) is wanted/needed.
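
Under those three assumptions, the same splitting logic can be wrapped in a small function and checked against any of the sample paths. This is only a sketch: the function name split_name is made up for illustration and is not part of the script above.

```bash
#!/bin/bash

# split_name is a hypothetical helper; it sets the globals FNAME, TYPE and NUMBER.
split_name() {
  local fullFileName=${1##*/}           # drop the directory part
  local restPart=${fullFileName##*_}    # part after the last underscore
  restPart=${restPart%.*}               # drop the extension
  FNAME=${fullFileName%_*}_             # keep the trailing _ (assumption 3)
  TYPE=${restPart/%[0-9]*/}             # capitals only (assumption 1)
  NUMBER=${restPart/*[A-Z]/}            # digits only (assumption 2)
}

split_name "/data1/TEST_LOAD/Results/LARGE_LOAD_V_DATA2000.ASC"
echo "$FNAME $TYPE $NUMBER"             # LARGE_LOAD_V_ DATA 2000
```

If one of the assumptions does not hold for your real file names, the patterns would need to be adjusted.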

Hope this helps.

Hi Druuna,
Thank you very much, it was perfect! :)
Any chance you could explain the syntax? Basically, how does it work?

Hi,

My solution makes use of bash internals (parameter expansions), which makes it (a lot) faster than using external commands (sed, awk, cut, ...).

The while loop takes one line at a time from the input file. The THISLINE variable holds that complete line (/data1/TEST_LOAD/TEST_LOAD_MESH_DATA1000.ASC for example).
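
A minimal sketch of such a line-by-line loop, assuming a temporary sample file (the two paths are taken from post #1; the count and last variables are only there for illustration). Redirecting the file into the loop keeps the loop in the current shell, so variables set inside it are still available afterwards.

```bash
#!/bin/bash

# Temporary sample input, for illustration only.
inFile=$(mktemp)
printf '%s\n' '/data1/TEST_LOAD_DATA1.ASC' \
              '/data1/TEST_LOAD/Results/SMALL_LOAD_DATA30.ASC' > "$inFile"

count=0
last=""
# IFS= keeps leading/trailing whitespace, -r keeps backslashes literal.
while IFS= read -r THISLINE
do
  count=$((count + 1))
  last=$THISLINE
  echo "line $count: $THISLINE"
done < "$inFile"

rm -f "$inFile"
```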

fullFileName=${THISLINE##*/} -> Strips off everything up to and including the rightmost /. This is done by the ##*/ part. The double ## tells bash to be greedy and remove the longest match (with a single #, only the leading / would be stripped). The */ is the pattern: * (everything) up to and including the /.
This: /data1/TEST_LOAD/TEST_LOAD_MESH_DATA1000.ASC becomes: TEST_LOAD_MESH_DATA1000.ASC and is placed in fullFileName.
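
To see the difference between # and ## on this path, something like the following can be tried (the variable names shortest and longest are just for illustration):

```bash
#!/bin/bash

THISLINE="/data1/TEST_LOAD/TEST_LOAD_MESH_DATA1000.ASC"

shortest=${THISLINE#*/}    # shortest match of */ is only the leading /
longest=${THISLINE##*/}    # longest match of */ runs up to the last /

echo "$shortest"   # data1/TEST_LOAD/TEST_LOAD_MESH_DATA1000.ASC
echo "$longest"    # TEST_LOAD_MESH_DATA1000.ASC
```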

FNAME=${fullFileName%_*}_ -> Basically the same as the ## one, but this one strips from right to left (# strips from left to right). There's no need to be greedy, so a single % is used. The _* is the pattern: the rightmost _ and everything after it.

This: TEST_LOAD_MESH_DATA1000.ASC becomes: TEST_LOAD_MESH. The extra _ after the curly bracket is there to honor your specs in post #1. FNAME is filled with: TEST_LOAD_MESH_
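
A short sketch comparing % and %% on the same file name, to show why the non-greedy form is the one wanted here (shortest and longest are illustrative names only):

```bash
#!/bin/bash

fullFileName="TEST_LOAD_MESH_DATA1000.ASC"

shortest=${fullFileName%_*}    # removes _DATA1000.ASC
longest=${fullFileName%%_*}    # removes _LOAD_MESH_DATA1000.ASC
FNAME=${shortest}_             # put the trailing _ back

echo "$shortest"   # TEST_LOAD_MESH
echo "$longest"    # TEST
echo "$FNAME"      # TEST_LOAD_MESH_
```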

The next 2 work the same as the previous 2, the only difference being the pattern that is looked for.
tmpPart=${fullFileName##*_} -> TEST_LOAD_MESH_DATA1000.ASC becomes: DATA1000.ASC
restPart=${tmpPart%\.*} -> DATA1000.ASC becomes: DATA1000
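
These two steps can be tried on their own. The sketch below also shows that the backslash before the dot is optional, since a dot has no special meaning in a shell pattern (the variable unescaped is only for illustration):

```bash
#!/bin/bash

fullFileName="TEST_LOAD_MESH_DATA1000.ASC"

tmpPart=${fullFileName##*_}   # everything after the last _
restPart=${tmpPart%\.*}       # drop the extension
unescaped=${tmpPart%.*}       # same pattern without the backslash

echo "$tmpPart"     # DATA1000.ASC
echo "$restPart"    # DATA1000
echo "$unescaped"   # DATA1000
```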

The following two split the content of restPart (DATA1000 in this example). This looks a bit like a sed replace (sed 's/x/y/'). The syntax has a search pattern between the first 2 slashes (*[A-Z] in the NUMBER line) and a replacement after the last slash (which is empty in this case).
TYPE=${restPart/%[0-9]*/} -> Strip the numbers from the end: [0-9]* is a digit followed by anything, and the leading % anchors the match at the end. DATA1000 becomes: DATA
NUMBER=${restPart/*[A-Z]/} -> Strip everything up to and including the last capital (*[A-Z]). The leading * makes the match start at the beginning, so no extra anchor is needed. DATA1000 becomes: 1000
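
The two substitutions can be checked in isolation. As a side note that is not from the script above, the same split should also be possible with the %% and ## forms used earlier; TYPE2 and NUMBER2 are illustrative names for that variant.

```bash
#!/bin/bash

restPart="DATA1000"

# Pattern substitution with an empty replacement, as in the script
TYPE=${restPart/%[0-9]*/}      # delete the match anchored at the end
NUMBER=${restPart/*[A-Z]/}     # delete the longest match of *[A-Z]

# Alternative using suffix/prefix removal
TYPE2=${restPart%%[0-9]*}      # longest suffix starting with a digit
NUMBER2=${restPart##*[A-Z]}    # longest prefix ending with a capital

echo "$TYPE $NUMBER"     # DATA 1000
echo "$TYPE2 $NUMBER2"   # DATA 1000
```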

I hope this clears things up a bit. Bash internals are not always easy to understand, and I would suggest playing around with them and taking a look at the bash manpage (search for: Parameter Expansion). Once you understand them a bit, they will make programming a lot of fun :-)

A few simple examples:

Code:

$ foo="/home/druuna/test.01.txt"
$ echo ${foo##*/}
test.01.txt
$ echo ${foo#*/}
home/druuna/test.01.txt
$ echo ${foo%%/*}

$ echo ${foo%/*}
/home/druuna
$ echo ${foo%.*}
/home/druuna/test.01
$ echo ${foo%%.*}
/home/druuna/test

Quote:

Originally Posted by druuna (Post 3771630)

Hi,

My solution makes use of bash internals (parameter expansions), which makes it (a lot) faster than using external commands (sed, awk, cut, ...).

Bash's while loop is MUCH slower than external tools like awk when it comes to big files.
@OP, you can try gawk:

Code:

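awk -F"." '{
 gsub(/.*\//,"")    # drop the directory part
 # gensub() is gawk-specific: capture the name prefix, the type and the number
 b=gensub("(.*_)(.*[^0-9])([0-9]+).*", "\\1,\\2,\\3","g",$0)
 m=split(b,parts,",")
 print "FILENAME: "parts[1]
 print "TYPE: "parts[2]
 print "NUM: "parts[3]
}' file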
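
For comparison, the same capture-group idea can be sketched in bash itself with the [[ =~ ]] operator and the BASH_REMATCH array. This is a different technique from both posts above, shown here only as an illustration; the regex mirrors the three groups of the gensub() call, and the sample path is taken from post #1.

```bash
#!/bin/bash

path="/data1/TEST_LOAD/TEST_LOAD_MESH_DATA1000.ASC"
name=${path##*/}               # drop the directory part

# Three groups: prefix up to the last _, type, number
re='^(.*_)(.*[^0-9])([0-9]+)'

if [[ $name =~ $re ]]; then
  FNAME=${BASH_REMATCH[1]}
  TYPE=${BASH_REMATCH[2]}
  NUMBER=${BASH_REMATCH[3]}
  echo "FNAME=$FNAME TYPE=$TYPE NUMBER=$NUMBER"
fi
```

Keeping the regex in a variable avoids quoting problems on the right-hand side of =~.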

Thank you very much... !