LinuxQuestions.org

crimson08 05-11-2009 03:24 AM

How to parse strings in bash script
 
Hi Scripting Masters,

This is my first ever post in this forum.

I just want to ask: is there an easier way to parse a string?
For example, say ant-1.3.5-1 is the string. I want to parse this variable into this:

artifactId=ant
version=1.3.5-1

Note: The string's length and value may vary, because it comes from a directory that also contains a lot of other artifacts. So the first thing that comes to mind is a loop that places all the artifacts in an array. Inside the loop, each string is parsed and the correct values are assigned to the correct fields.

Thanks in advance!

acid_kewpie 05-11-2009 03:40 AM

Per the LQ Rules, please do not post homework assignments verbatim. We're happy to assist if you have specific questions or have hit a stumbling point, however. Let us know what you've already tried and what references you have used (including class notes, books, and Google searches) and we'll do our best to help. Also, keep in mind that your instructor might also be an LQ member.

jschiwal 05-11-2009 03:44 AM

Sorry, posted a message as the mod posted his. Please ignore.

crimson08 05-11-2009 03:55 AM

Sorry, this is not homework. I'm doing this to migrate my Maven 1 jars to a Maven 2 repository.

This is what i have done so far...

#!/bin/bash

dir=~/.maven/repository
i=1

for file in "$dir"/*/jars/*.jar
do
    jarfile[$i]=$file
    ctr=1              # restart the field counter for every jar
    tempartifact=()    # forget the pieces collected for the previous jar

    #echo ${jarfile[$i]}

    len=${#file}

    #replace '/' with a white space
    var=$(echo "${jarfile[$i]}" | tr '/' ' ')
    #echo $var

    #get the groupId
    gtemp=$(echo $var | awk '{print $5}')
    glength=$(echo -n "$gtemp" | wc -c)
    groupId=$(echo "$gtemp" | cut -c 1-$glength)
    echo "$groupId"

    #get the artifactId (cut off the 4-character ".jar" suffix)
    artemp=$(echo $var | awk '{print $7}')
    #echo $artemp
    arlength=$(echo -n "$artemp" | wc -c)
    artifactId=$(echo "$artemp" | cut -c 1-$((arlength-4)))
    #echo $artifactId

    #extract the version of the jar from the artifactId
    args=$(echo "$artifactId" | perl -lne '$c++ while /-/g; END {print $c; }')
    while [ $ctr -le $(expr $args + 1) ]
    do
        temp=$(echo "$artifactId" | cut -d'-' -f $ctr)
        numseries=$(echo "$temp" | sed -e 's/^[0-9]//')
        if [ -z "$numseries" ]
        then
            echo "null numseries"
        else
            if [ "$temp" != "$numseries" ]
            then
                echo "$temp is a number!"
            else
                tempartifact[$ctr]=$temp
                echo "artifact: ${tempartifact[$ctr]}"
            fi
        fi
        ctr=$(expr $ctr + 1)
    done

    echo "artifacts: ${tempartifact[*]}"

    #mvn install:install-file -Dfile=${jarfile[$i]} -DgroupId=$groupId -DartifactId=$artifactId -Dversion=$ver -Dpackaging=jar

    i=$(expr $i + 1)
done

ghostdog74 05-11-2009 03:59 AM

if you have Python,
Code:

#!/usr/bin/env python3
thestring = "ant-1.3.5-1"
# split once, at the first dash
artifactid, version = thestring.split("-", 1)
print(artifactid, version)

This assumes that anything before the first dash is your artifact id.

acid_kewpie 05-11-2009 04:07 AM

Hmm, not homework, yet you quote some text explaining the sort of thought process you should adopt when tackling it?? It's a pretty simple one-liner in many forms, potentially just a single bash substitution - http://tldp.org/LDP/abs/html/string-manipulation.html Not sure if that will fit in with what the teacher wants though. :)
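
For the simple case where the artifact id itself contains no dash, the single bash substitution mentioned above might look like the sketch below. The function name split_first_dash is made up for illustration; ${s%%-*} and ${s#*-} are standard shell parameter expansions.

```bash
#!/bin/bash
# Hypothetical helper (not from the thread): split at the FIRST dash.
# Assumes the artifact id itself contains no dash.
split_first_dash() {
    local s=$1
    echo "artifactId=${s%%-*}"   # drop the longest suffix that starts at a dash
    echo "version=${s#*-}"       # drop the shortest prefix that ends at a dash
}

split_first_dash "ant-1.3.5-1"
```

This breaks on a name such as commons-logging-1.1.8, where it would report commons as the artifact id.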

crimson08 05-11-2009 04:11 AM

Quote:

Originally Posted by ghostdog74 (Post 3536788)

Many Thanks!

crimson08 05-11-2009 04:14 AM

Quote:

Originally Posted by acid_kewpie (Post 3536792)

Sir, I appreciate your help, but I know how to manipulate strings. The problem is that the artifact id is not always a single word, so I cannot assume that the first dash is the delimiter. Example: commons-logging-1.1.8

And the versions can also look like this: jslt-13.4.5-84.1

Tinkster 05-11-2009 04:25 AM

Well ... as long as the "word" part of your artifactid
doesn't contain numbers, and the versions have no alpha
components it's still trivial.

Code:

[tink:~]$ echo -e "commons-logging-1.1.8\njslt-13.4.5-84.1" | sed -r 's/^([-a-z\.]+)-.+/\1/'
commons-logging
jslt
[tink:~]$ echo -e "commons-logging-1.1.8\njslt-13.4.5-84.1" | sed -r 's/^[-a-z\.]+-(.+)/\1/'
1.1.8
13.4.5-84.1


If the conditions above DON'T apply I'd say you're
screwed unless you have a dictionary of your IDs,
since a parsing by lexicographic rules w/o language
knowledge is impossible.


cheers,
Tink
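
Before relying on the sed expressions above, it may help to test whether a given name meets both conditions. The following is only a sketch: the function name fits_simple_rule and the sample names log4j-1.2.8 and foo-1.0-beta are made up for illustration, and the pattern is one possible reading of the two conditions.

```bash
#!/bin/bash
# Hypothetical check (not from the thread): does a name meet both conditions?
#   - the id part holds only lowercase letters, dots and dashes (no digits)
#   - the version part holds only digits, dots and dashes (no letters)
fits_simple_rule() {
    local re='^[-a-z.]+-[0-9][-0-9.]*$'
    [[ $1 =~ $re ]]
}

for name in commons-logging-1.1.8 jslt-13.4.5-84.1 log4j-1.2.8 foo-1.0-beta; do
    if fits_simple_rule "$name"; then
        echo "ok:   $name"
    else
        echo "skip: $name (needs manual handling)"
    fi
done
```

Names that are skipped are the ones that would need the dictionary of IDs described above.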

i92guboj 05-11-2009 04:27 AM

In other words, both atoms can contain dashes inside of them. Then you are going to have to filter by contents. For anything that's not basic string mangling you are going to have to use something more advanced, like awk or sed.

ghostdog74 05-11-2009 04:30 AM

Depending on how each package is named, I am going to assume that everything from the first digit encountered onward is the version number.
Code:

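#!/usr/bin/env python3
import sys

thestring = sys.argv[1]
# find the index of the first digit; the version starts there
ind = None
for n, c in enumerate(thestring):
    if c.isdigit():
        ind = n
        break
if not ind:
    # no digit at all, or the name starts with a digit: nothing to split
    sys.exit("cannot split: " + thestring)
# ind-1 drops the dash that separates the id from the version
artifactid, version = thestring[:ind-1], thestring[ind:]
print("artifactid: ", artifactid)
print("version: ", version)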

output
Code:

# ./test.py jslt-13.4.5-84.1
artifactid:  jslt
version:  13.4.5-84.1

# ./test.py commons-logging-1.1.8
artifactid:  commons-logging
version:  1.1.8
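
Since the question was about bash, a similar rule can be sketched with parameter expansion alone. Note the rule here is slightly different and is an assumption of this sketch: it splits at the first dash that is immediately followed by a digit, rather than at the first digit. The function name split_at_version is made up for illustration.

```bash
#!/bin/bash
# Hypothetical helper (not from the thread): split at the first dash that is
# immediately followed by a digit, using only parameter expansion.
split_at_version() {
    local s=$1
    local id="${s%%-[0-9]*}"     # drop the longest suffix of the form -<digit>...
    local ver=
    if [ "$id" != "$s" ]; then   # a -<digit> boundary was found
        ver="${s#"$id"-}"        # what remains after "<id>-"
    fi
    echo "artifactid: $id"
    echo "version: $ver"
}

split_at_version "jslt-13.4.5-84.1"
split_at_version "commons-logging-1.1.8"
```

With this rule a name that has a digit inside the id, such as a hypothetical log4j-1.2.8, would still come out as log4j and 1.2.8.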


crimson08 05-11-2009 04:34 AM

Thanks, sirs!!! All replies are appreciated. Thanks for all the ideas... I'm just making the code more dynamic...

crimson08 05-11-2009 04:39 AM

Quote:

Originally Posted by Tinkster (Post 3536808)

Sir, one last question: how about getting the version from the given artifact? What do I need to change in the regex?

Tinkster 05-11-2009 04:44 AM

Code:

echo -e "commons-logging-1.1.8\njslt-13.4.5-84.1" | sed -r 's/^([-a-z\.]+)-(.+)/\1 \2/'|while read arti vers;do echo $arti;echo $vers ;echo "";done
commons-logging
1.1.8

jslt
13.4.5-84.1

Like that?

crimson08 05-11-2009 04:59 AM

Quote:

Originally Posted by Tinkster (Post 3536832)

Sir,
is it possible to use this without the while loop?
The output should be like this:

artifactId=commons-logging
version=1.1.8
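
One way to get that output without a while loop is bash's own regex matching, which fills the BASH_REMATCH array. This is a minimal sketch, not a confirmed answer from the thread: the function name parse_artifact is made up, and the pattern is Tinkster's with one assumed tightening (the version must start with a digit).

```bash
#!/bin/bash
# Hypothetical helper (not from the thread). The pattern is Tinkster's, with
# one assumed tightening: the version must start with a digit.
# On success it sets the variables artifactId and version.
parse_artifact() {
    local re='^([-a-z.]+)-([0-9].*)$'
    if [[ $1 =~ $re ]]; then
        artifactId=${BASH_REMATCH[1]}
        version=${BASH_REMATCH[2]}
    else
        return 1    # the name does not fit the pattern
    fi
}

if parse_artifact "commons-logging-1.1.8"; then
    echo "artifactId=$artifactId"
    echo "version=$version"
fi
```

Because the function sets plain shell variables, it could be called once per jar inside the for loop from the script earlier in the thread.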


All times are GMT -5.