LinuxQuestions.org
Old 05-11-2009, 03:24 AM   #1
crimson08
LQ Newbie
 
Registered: May 2009
Location: Smallville
Distribution: Ubuntu Hardy
Posts: 8

Rep: Reputation: 0
How to parse strings in bash script


Hi Scripting Masters,

This is my first ever post in this forum.

I just want to ask: is there an easier way to parse a string?
Say, for example, ant-1.3.5-1 is the string. I want to parse this variable into this:

artifactId=ant
version=1.3.5-1

Note: The string varies in length and content, because it comes from a directory that also holds a lot of other artifacts. So the first thing that comes to mind is a loop that places all the artifacts in an array. Inside the loop, each name is parsed and the values are assigned to the correct fields.

Thanks in advance!

Last edited by crimson08; 05-11-2009 at 03:34 AM.
 
Old 05-11-2009, 03:40 AM   #2
acid_kewpie
Moderator
 
Registered: Jun 2001
Location: UK
Distribution: Gentoo, RHEL, Fedora, Centos
Posts: 43,378

Rep: Reputation: 1963
Per the LQ Rules, please do not post homework assignments verbatim. We're happy to assist if you have specific questions or have hit a stumbling point, however. Let us know what you've already tried and what references you have used (including class notes, books, and Google searches) and we'll do our best to help. Also, keep in mind that your instructor might also be an LQ member.
 
Old 05-11-2009, 03:44 AM   #3
jschiwal
Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 654
Sorry, posted a message as the mod posted his. Please ignore.
 
Old 05-11-2009, 03:55 AM   #4
crimson08
LQ Newbie
 
Registered: May 2009
Location: Smallville
Distribution: Ubuntu Hardy
Posts: 8

Original Poster
Rep: Reputation: 0
Sorry, this is not homework. I'm doing this to migrate my Maven 1 jars to a Maven 2 repository.

This is what I have done so far...

#!/bin/bash

dir=~/.maven/repository
i=1

for file in "$dir"/*/jars/*.jar
do
    jarfile[$i]=$file

    #echo ${jarfile[$i]}

    #replace '/' with a white space
    var=$(echo "${jarfile[$i]}" | tr '/' ' ')
    #echo $var

    #get the groupId (5th word, which assumes $dir is /home/<user>/.maven/repository)
    gtemp=$(echo $var | awk '{print $5}')
    glength=$(echo -n $gtemp | wc -c)
    groupId=$(echo $gtemp | cut -c 1-$glength)
    echo $groupId

    #get the artifactId (7th word is the file name; cut off the 4 characters of ".jar")
    artemp=$(echo $var | awk '{print $7}')
    #echo $artemp
    arlength=$(echo -n $artemp | wc -c)
    artifactId=$(echo $artemp | cut -c 1-$((arlength-4)))
    #echo $artifactId

    #extract the version of the jar from the artifactId
    #args is the number of dashes, so there are args + 1 dash-separated fields
    args=$(echo $artifactId | perl -lne '$c++ while /-/g; END {print $c; }')
    #start again from the first field, with an empty array, for every jar
    ctr=1
    unset tempartifact
    while [ $ctr -le $(expr $args + 1) ]
    do
        temp=$(echo $artifactId | cut -d'-' -f $ctr)
        #strip one leading digit; if the field changes, it started with a digit
        numseries=$(echo $temp | sed -e 's/^[0-9]//')
        if [ -z "$numseries" ]
        then
            echo "null numseries"
        else
            if [ "$temp" != "$numseries" ]
            then
                echo "$temp is a number!"
            else
                tempartifact[$ctr]=$temp
                echo "artifact: ${tempartifact[$ctr]}"
            fi
        fi
        ctr=$(expr $ctr + 1)
    done

    echo "artifacts: ${tempartifact[*]}"

    #mvn install:install-file -Dfile=${jarfile[$i]} -DgroupId=$groupId -DartifactId=$artifactId -Dversion=$ver -Dpackaging=jar

    i=$(expr $i + 1)
done
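
The path handling in the script above could probably also be done with shell parameter expansion instead of tr, awk, wc and cut. This is only a minimal sketch, assuming every jar sits at <repository>/<groupId>/jars/<name>.jar as in the loop above. The function name parse_jar_path and the sample path are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: split a Maven 1 jar path into groupId and base name.
# Assumes the layout <repository>/<groupId>/jars/<name>.jar.
parse_jar_path() {
    path=$1
    file=${path##*/}          # strip everything up to the last '/'
    base=${file%.jar}         # strip the ".jar" suffix
    groupdir=${path%/jars/*}  # drop the trailing "/jars/<file>"
    groupId=${groupdir##*/}   # last remaining path component
}

# Illustrative path only; it does not depend on how deep the home directory is.
parse_jar_path "/home/user/.maven/repository/ant/jars/ant-1.3.5-1.jar"
echo "groupId=$groupId"
echo "base=$base"
```

Unlike counting words with awk, this does not depend on the number of directories in front of the repository. The base name still has to be split into artifact id and version.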

Last edited by crimson08; 05-11-2009 at 03:59 AM.
 
Old 05-11-2009, 03:59 AM   #5
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,695
Blog Entries: 5

Rep: Reputation: 240Reputation: 240Reputation: 240
If you have Python:
Code:
#!/usr/bin/env python
thestring="ant-1.3.5-1"
artifactid,version = thestring.split("-",1)
print(artifactid,version)
This assumes that anything before the first dash is your artifact id.
 
Old 05-11-2009, 04:07 AM   #6
acid_kewpie
Moderator
 
Registered: Jun 2001
Location: UK
Distribution: Gentoo, RHEL, Fedora, Centos
Posts: 43,378

Rep: Reputation: 1963
Hmm, not homework, yet you quote something explaining the sort of thought process you should adopt when tackling it? It's a pretty simple one-liner in many forms, potentially just a single bash substitution: http://tldp.org/LDP/abs/html/string-manipulation.html Not sure if that will fit in with what the teacher wants, though.
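
For the example in the first post, such a substitution might look like the sketch below. It assumes that everything before the first dash is the artifact id, which only holds for names like ant-1.3.5-1. The variable names are illustrative.

```shell
#!/bin/sh
s="ant-1.3.5-1"

artifactId=${s%%-*}   # remove the longest suffix that starts at a dash
version=${s#*-}       # remove the shortest prefix that ends at a dash

echo "artifactId=$artifactId"
echo "version=$version"
```

Both expansions are plain pattern removal, so no external command is started.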
 
Old 05-11-2009, 04:11 AM   #7
crimson08
LQ Newbie
 
Registered: May 2009
Location: Smallville
Distribution: Ubuntu Hardy
Posts: 8

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by ghostdog74
If you have Python:
Code:
#!/usr/bin/env python
thestring="ant-1.3.5-1"
artifactid,version = thestring.split("-",1)
print(artifactid,version)
This assumes that anything before the first dash is your artifact id.
Many Thanks!
 
Old 05-11-2009, 04:14 AM   #8
crimson08
LQ Newbie
 
Registered: May 2009
Location: Smallville
Distribution: Ubuntu Hardy
Posts: 8

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by acid_kewpie
Hmm, not homework, yet you quote something explaining the sort of thought process you should adopt when tackling it? It's a pretty simple one-liner in many forms, potentially just a single bash substitution: http://tldp.org/LDP/abs/html/string-manipulation.html Not sure if that will fit in with what the teacher wants, though.
Sir, I appreciate your help, but I know how to manipulate strings. The problem is that the artifact id is not always a single word, so I cannot assume that the first dash is the delimiter. Example: commons-logging-1.1.8

The versions can also look like this: jslt-13.4.5-84.1
 
Old 05-11-2009, 04:25 AM   #9
Tinkster
Moderator
 
Registered: Apr 2002
Location: in a fallen world
Distribution: slackware by choice, others too :} ... android.
Posts: 22,965
Blog Entries: 11

Rep: Reputation: 865
Well ... as long as the "word" part of your artifactid
doesn't contain numbers, and the versions have no alpha
components it's still trivial.

Code:
[tink:~]$ echo -e "commons-logging-1.1.8\njslt-13.4.5-84.1" | sed -r 's/^([-a-z\.]+)-.+/\1/'
commons-logging
jslt
[tink:~]$ echo -e "commons-logging-1.1.8\njslt-13.4.5-84.1" | sed -r 's/^[-a-z\.]+-(.+)/\1/'
1.1.8
13.4.5-84.1

If the conditions above DON'T apply I'd say you're
screwed unless you have a dictionary of your IDs,
since a parsing by lexicographic rules w/o language
knowledge is impossible.


cheers,
Tink

Last edited by Tinkster; 05-11-2009 at 04:27 AM.
 
Old 05-11-2009, 04:27 AM   #10
i92guboj
Gentoo support team
 
Registered: May 2008
Location: Lucena, Córdoba (Spain)
Distribution: Gentoo
Posts: 4,036

Rep: Reputation: 372
In other words, both atoms can contain dashes inside of them. Then you are going to have to filter by contents. For anything that's not basic string mangling you are going to have to use something more advanced, like awk or sed.
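
For the two sample names in this thread, filtering by contents can be sketched with shell pattern matching alone. This is an illustration under one assumption, not a general parser: the version is taken to begin at the first dash that is immediately followed by a digit. The function name split_artifact is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: split "name-version" at the first dash that is
# immediately followed by a digit. Assumes the id never contains "-<digit>".
split_artifact() {
    name=$1
    artifactId=${name%%-[0-9]*}         # drop from the first "-<digit>" onward
    if [ "$artifactId" = "$name" ]; then
        version=                         # no "-<digit>" found: no version part
    else
        version=${name#"$artifactId"-}   # drop the id and the separating dash
    fi
}

for n in commons-logging-1.1.8 jslt-13.4.5-84.1; do
    split_artifact "$n"
    echo "artifactId=$artifactId"
    echo "version=$version"
done
```

An id that contains a dash followed by a digit would still be split in the wrong place, so a name list or something more advanced is needed for such cases.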
 
Old 05-11-2009, 04:30 AM   #11
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,695
Blog Entries: 5

Rep: Reputation: 240
Depending on how each package is named: I am going to assume that everything from the first digit encountered onward is the version number.
Code:
#!/usr/bin/env python
import sys
thestring = sys.argv[1]
# find the index of the first digit in the name
ind = None
for n,c in enumerate(thestring):
    if c.isdigit():
        ind=n
        break
if ind is None:
    sys.exit("no digit found in " + thestring)
# ind-1 also drops the dash that separates the id from the version
artifactid,version = thestring[:ind-1], thestring[ind:]
print("artifactid: ",artifactid)
print("version: " ,version)
output
Code:
# ./test.py jslt-13.4.5-84.1
artifactid:  jslt
version:  13.4.5-84.1

# ./test.py commons-logging-1.1.8
artifactid:  commons-logging
version:  1.1.8
 
Old 05-11-2009, 04:34 AM   #12
crimson08
LQ Newbie
 
Registered: May 2009
Location: Smallville
Distribution: Ubuntu Hardy
Posts: 8

Original Poster
Rep: Reputation: 0
Thanks, sirs! All replies are appreciated. Thanks for all the ideas... I'm just making the code more dynamic.
 
Old 05-11-2009, 04:39 AM   #13
crimson08
LQ Newbie
 
Registered: May 2009
Location: Smallville
Distribution: Ubuntu Hardy
Posts: 8

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by Tinkster
Well ... as long as the "word" part of your artifactid
doesn't contain numbers, and the versions have no alpha
components it's still trivial.

Code:
[tink:~]$ echo -e "commons-logging-1.1.8\njslt-13.4.5-84.1" | sed -r 's/^([-a-z\.]+)-.+/\1/'
commons-logging
jslt
[tink:~]$ echo -e "commons-logging-1.1.8\njslt-13.4.5-84.1" | sed -r 's/^[-a-z\.]+-(.+)/\1/'
1.1.8
13.4.5-84.1

If the conditions above DON'T apply I'd say you're
screwed unless you have a dictionary of your IDs,
since a parsing by lexicographic rules w/o language
knowledge is impossible.


cheers,
Tink

Sir, one last question: how about getting the version from the given artifact? What do I need to change in the regex?
 
Old 05-11-2009, 04:44 AM   #14
Tinkster
Moderator
 
Registered: Apr 2002
Location: in a fallen world
Distribution: slackware by choice, others too :} ... android.
Posts: 22,965
Blog Entries: 11

Rep: Reputation: 865
Code:
echo -e "commons-logging-1.1.8\njslt-13.4.5-84.1" | sed -r 's/^([-a-z\.]+)-(.+)/\1 \2/'|while read arti vers;do echo $arti;echo $vers ;echo "";done
commons-logging
1.1.8

jslt
13.4.5-84.1
Like that?
 
Old 05-11-2009, 04:59 AM   #15
crimson08
LQ Newbie
 
Registered: May 2009
Location: Smallville
Distribution: Ubuntu Hardy
Posts: 8

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by Tinkster
Code:
echo -e "commons-logging-1.1.8\njslt-13.4.5-84.1" | sed -r 's/^([-a-z\.]+)-(.+)/\1 \2/'|while read arti vers;do echo $arti;echo $vers ;echo "";done
commons-logging
1.1.8

jslt
13.4.5-84.1
Like that?
Sir,
is it possible to use this without the while loop?
The output should look like this:

artifactId=commons-logging
version=1.1.8
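
One possible way to get this output without a while loop is sketched below. It reuses the regex from the quoted post and feeds the sed result to a single read through a here-document. It assumes a sed that accepts -E for extended regular expressions and a name without spaces. The variable name "name" is illustrative.

```shell
#!/bin/sh
name="commons-logging-1.1.8"

# sed puts a space between the two captured parts; a single read, fed by a
# here-document, stores them in two variables without any loop.
read -r artifactId version <<EOF
$(printf '%s\n' "$name" | sed -E 's/^([-a-z.]+)-(.+)$/\1 \2/')
EOF

echo "artifactId=$artifactId"
echo "version=$version"
```

Because read runs in the current shell here, both variables stay available to the rest of the script, for example for the mvn install:install-file call.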
 
  

