LinuxQuestions.org
11-26-2009, 04:52 AM   #1
deepudixit
LQ Newbie
 
Registered: Nov 2009
Posts: 3

Rep: Reputation: 0
Help to cut a string


Hello,
I have the following file list:
sample:
/data1/TEST_LOAD_DATA1.ASC
/data1/TEST_LOAD/TEST_LOAD_MESH_DATA1000.ASC
/data1/TEST_LOAD/Results/TEST_LOAD_V_DATA1000.ASC
/data1/TEST_LOAD/Results/LARGE_LOAD_V_DATA2000.ASC
/data1/TEST_LOAD/Results/SMALL_LOAD_DATA30.ASC
/data1/TEST_LOAD/Results/TEST_LOAD_DATA100.ASC

For each file name I want to extract a few strings and assign them to variables.
For example: /data1/TEST_LOAD/TEST_LOAD_MESH_DATA1000.ASC
What to extract from this file name and what to store:
FNAME = TEST_LOAD_MESH_
TYPE = DATA
NUMBER = 1000

For example: /data1/TEST_LOAD/Results/SMALL_LOAD_DATA30.ASC
What to extract from this file name and what to store:
FNAME = SMALL_LOAD_
TYPE = DATA
NUMBER = 30
 
11-26-2009, 05:50 AM   #2
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2394
Hi,

This should work:
Code:
#!/bin/bash

inFile="$1"

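# Read the list line by line; the redirect after "done" replaces the cat pipe.
while IFS= read -r THISLINE
do
  fullFileName=${THISLINE##*/}     # strip the directory part
  FNAME=${fullFileName%_*}_        # everything before the last _, plus the _
  tmpPart=${fullFileName##*_}      # e.g. DATA1000.ASC
  restPart=${tmpPart%\.*}          # drop the extension
  TYPE=${restPart/%[0-9]*/}        # drop the trailing number
  NUMBER=${restPart/*[A-Z]/}       # drop the leading capitals

  echo "-------------------------------------"
  echo "FNAME  : $FNAME"
  echo "TYPE   : $TYPE"
  echo "NUMBER : $NUMBER"

done < "$inFile"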
Test run:
Quote:
cat infile
/data1/TEST_LOAD_DATA1.ASC
/data1/TEST_LOAD/TEST_LOAD_MESH_DATA1000.ASC
/data1/TEST_LOAD/Results/TEST_LOAD_V_DATA1000.ASC
/data1/TEST_LOAD/Results/LARGE_LOAD_V_DATA2000.ASC
/data1/TEST_LOAD/Results/SMALL_LOAD_DATA30.ASC
/data1/TEST_LOAD/Results/TEST_LOAD_DATA100.ASC

./tets infile
-------------------------------------
FNAME : TEST_LOAD_
TYPE : DATA
NUMBER : 1
-------------------------------------
FNAME : TEST_LOAD_MESH_
TYPE : DATA
NUMBER : 1000
-------------------------------------
FNAME : TEST_LOAD_V_
TYPE : DATA
NUMBER : 1000
-------------------------------------
FNAME : LARGE_LOAD_V_
TYPE : DATA
NUMBER : 2000
-------------------------------------
FNAME : SMALL_LOAD_
TYPE : DATA
NUMBER : 30
-------------------------------------
FNAME : TEST_LOAD_
TYPE : DATA
NUMBER : 100
I did assume the following:

1) DATA can be anything, but is always in CAPS,
2) The number part is 1 or larger,
3) the trailing _ (FNAME output) is wanted/needed.

Hope this helps.

Last edited by druuna; 11-26-2009 at 06:39 AM.
 
11-27-2009, 12:43 PM   #3
deepudixit
LQ Newbie
 
Registered: Nov 2009
Posts: 3

Original Poster
Rep: Reputation: 0
Hi Druuna...
Thank you very much .. it was perfect..!
Is there any chance you could explain the syntax..?
Basically, how it works..
 
11-27-2009, 01:20 PM   #4
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2394
Hi,

My solution makes use of bash internals (parameter expansions), which makes it (a lot) faster than using external commands (sed, awk, cut, ....).

The while loop takes one line at a time from the input file. The THISLINE variable holds that complete line (/data1/TEST_LOAD/TEST_LOAD_MESH_DATA1000.ASC for example).
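
A small sketch of the same kind of loop, fed by a redirect (here a here-document with two of the sample lines) rather than a pipe. The count and last variables are only there for illustration. In bash, a loop at the end of a pipe normally runs in a subshell, so variables set inside it are lost afterwards; with a redirect they survive:

```shell
#!/bin/bash
# Read lines one at a time; IFS= and -r keep each line exactly as written.
count=0
last=""
while IFS= read -r THISLINE
do
  count=$((count + 1))   # illustrative counter, not part of the original script
  last=$THISLINE
done <<'EOF'
/data1/TEST_LOAD_DATA1.ASC
/data1/TEST_LOAD/Results/SMALL_LOAD_DATA30.ASC
EOF

# Both variables are still set here because the loop ran in the current shell.
echo "$count lines read, last one: $last"
```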

fullFileName=${THISLINE##*/} -> Strips off everything up to and including the rightmost /. This is done by the ##*/ part. The double ## tells bash to be greedy (with a single # only the leading / would be stripped), the */ is the pattern: * (everything) up to and including the /.
This: /data1/TEST_LOAD/TEST_LOAD_MESH_DATA1000.ASC becomes: TEST_LOAD_MESH_DATA1000.ASC and is placed in fullFileName.
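
To see the difference between # and ## on the sample path, something like this can be tried (the variable names shortest and longest are just made up for the comparison):

```shell
#!/bin/bash
# Sample path from post #1.
THISLINE="/data1/TEST_LOAD/TEST_LOAD_MESH_DATA1000.ASC"

shortest=${THISLINE#*/}    # shortest match of */ from the left: just the leading /
longest=${THISLINE##*/}    # longest match of */ from the left: the whole directory part

echo "$shortest"   # data1/TEST_LOAD/TEST_LOAD_MESH_DATA1000.ASC
echo "$longest"    # TEST_LOAD_MESH_DATA1000.ASC
```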

FNAME=${fullFileName%_*}_ -> Basically the same as the ## one, but this one strips from right to left (# strips from left to right). There's no need to be greedy, so a single % is used. The _* is the pattern: (from right to left) everything up to and including the _.

This: TEST_LOAD_MESH_DATA1000.ASC becomes: TEST_LOAD_MESH (the extra _ after the curly bracket is there to honor your specs in post #1). FNAME is filled with: TEST_LOAD_MESH_
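
The same comparison for % and %%, again with made-up variable names. It shows why the single % is the right choice here: the greedy form would eat everything from the first _ onwards.

```shell
#!/bin/bash
fullFileName="TEST_LOAD_MESH_DATA1000.ASC"

shortest=${fullFileName%_*}    # shortest match of _* from the right
longest=${fullFileName%%_*}    # longest match of _* from the right
FNAME=${fullFileName%_*}_      # shortest match, with the _ put back

echo "$shortest"   # TEST_LOAD_MESH
echo "$longest"    # TEST
echo "$FNAME"      # TEST_LOAD_MESH_
```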

The next 2 work the same as the previous 2, the only difference being the pattern that is looked for.
tmpPart=${fullFileName##*_} -> TEST_LOAD_MESH_DATA1000.ASC becomes: DATA1000.ASC
restPart=${tmpPart%\.*} -> DATA1000.ASC becomes: DATA1000
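
These two steps can be run on their own like this. The variable named unescaped is only there to show that the backslash before the dot is optional: a dot has no special meaning in a shell pattern, so both forms give the same result.

```shell
#!/bin/bash
fullFileName="TEST_LOAD_MESH_DATA1000.ASC"

tmpPart=${fullFileName##*_}   # everything after the last _
restPart=${tmpPart%\.*}       # drop the extension (escaped dot, as in the script)
unescaped=${tmpPart%.*}       # same thing without the backslash

echo "$tmpPart"     # DATA1000.ASC
echo "$restPart"    # DATA1000
echo "$unescaped"   # DATA1000
```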

The following two split the content of restPart (DATA1000 in this example). This looks a bit like a sed replace (sed 's/x/y/'), except that the pattern is a shell glob, not a regular expression. The syntax has a search pattern between the first 2 slashes (/*[A-Z]/) and a replace part after the last slash (which is empty in this case).
TYPE=${restPart/%[0-9]*/} -> The leading % anchors the match at the end of the string; [0-9]* matches a digit followed by anything, so everything from the first digit onwards is stripped. DATA1000 becomes: DATA
NUMBER=${restPart/*[A-Z]/} -> Strips everything up to and including the last capital (*[A-Z]). The leading * makes the match start at the beginning, so no extra anchor is needed. DATA1000 becomes: 1000
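
A quick way to check both substitutions. The TYPE_ALT and NUMBER_ALT lines are my own alternative spelling with %% and ##, not part of the script above; for this input they should give the same result. One caveat: depending on the locale, [A-Z] in bash may also match lowercase letters, so this sketch assumes upper-case names as in the sample list.

```shell
#!/bin/bash
restPart="DATA1000"

TYPE=${restPart/%[0-9]*/}        # replace the trailing digit run with nothing
NUMBER=${restPart/*[A-Z]/}       # replace everything through the last capital with nothing

TYPE_ALT=${restPart%%[0-9]*}     # alternative: longest suffix starting with a digit
NUMBER_ALT=${restPart##*[A-Z]}   # alternative: longest prefix ending in a capital

echo "$TYPE $NUMBER"             # DATA 1000
echo "$TYPE_ALT $NUMBER_ALT"     # DATA 1000
```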

I hope this clears things up a bit. Bash internals are not always easy to understand, so I would suggest playing around with them and taking a look at the bash manpage (search for: Parameter Expansion). Once you understand them a bit, they will make programming a lot of fun :-)

A few simple examples:
Code:
$ foo="/home/druuna/test.01.txt"
$ echo ${foo##*/}
test.01.txt
$ echo ${foo#*/}
home/druuna/test.01.txt
$ echo ${foo%%/*}

$ echo ${foo%/*}
/home/druuna
$ echo ${foo%.*}
/home/druuna/test.01
$ echo ${foo%%.*}
/home/druuna/test
 
11-27-2009, 06:48 PM   #5
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 244
Quote:
Originally Posted by druuna
Hi,

My solution makes use of bash internals (parameter expansions), which makes it (a lot) faster than using external commands (sed, awk, cut, ....).

Bash's while loop is MUCH slower than external tools like awk when it comes to big files.
@OP, you can try gawk:
Code:
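gawk '{
 gsub(/.*\//,"")   # strip the directory part from $0
 # gensub is a gawk extension: capture prefix (through the last _), type and number
 b=gensub("(.*_)(.*[^0-9])([0-9]+).*", "\\1,\\2,\\3","g",$0)
 split(b,parts,",")
 print "FILENAME: "parts[1]
 print "TYPE: "parts[2]
 print "NUM: "parts[3]
}' file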
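
For comparison, the same regex idea can be sketched in bash itself with [[ =~ ]] and BASH_REMATCH. The parse_name function and the regular expression are my own illustration, not something from the posts above, and they assume names of the form PREFIX_TYPEnumber.EXT as in the sample list:

```shell
#!/bin/bash
# parse_name is a hypothetical helper: it sets FNAME, TYPE and NUMBER
# for one path, or returns 1 if the name does not fit the assumed layout.
parse_name() {
  local base=${1##*/}                            # strip the directory part
  local re='^(.*_)([^0-9_]*)([0-9]+)\.[^.]*$'    # prefix_, type, number, extension
  if [[ $base =~ $re ]]; then
    FNAME=${BASH_REMATCH[1]}
    TYPE=${BASH_REMATCH[2]}
    NUMBER=${BASH_REMATCH[3]}
  else
    return 1
  fi
}

if parse_name "/data1/TEST_LOAD/Results/SMALL_LOAD_DATA30.ASC"; then
  echo "FNAME  : $FNAME"
  echo "TYPE   : $TYPE"
  echo "NUMBER : $NUMBER"
fi
```

This does nothing to settle the speed question for big files; it only shows that the capture-group approach is also available without an external tool.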

Last edited by ghostdog74; 11-27-2009 at 07:44 PM.
 
11-28-2009, 09:21 AM   #6
deepudixit
LQ Newbie
 
Registered: Nov 2009
Posts: 3

Original Poster
Rep: Reputation: 0
Thank you very much... !
 
  

