LinuxQuestions.org
LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
05-04-2010, 10:01 AM   #16
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Rep: Reputation: 2037

Quote:
Originally Posted by catkin
grail suggested it in the second post of the thread.

Why, so he did. In fact, it looks like you could even skip that step entirely and go straight to:

Code:
if [[ ${lineoftext:0:1} == "0" ]]; then
   # ...do all the other stuff...
fi
I see other things that can be simplified too. For one thing, there are far too many sed statements; I'm sure they could be condensed to extract the desired strings in a single step. And what's the use of "category=2"? It seems to have no purpose at all in the script.
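A minimal sketch of that first-character test, assuming every line starts with its level number as in the thread's sample data. The function names are hypothetical, and the glob form (`== 0*`) is an alternative added here for comparison; it is not from the original posts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the line's level number is 0.
is_level0() {
    local line=$1
    # Substring expansion: compare only the first character.
    [[ ${line:0:1} == "0" ]]
}

# Equivalent glob form: an unquoted right-hand side of == is a pattern.
is_level0_glob() {
    [[ $1 == 0* ]]
}

for l in '0 @I1039@ INDI' '1 BIRT' '2 DATE 1850'; do
    if is_level0 "$l"; then
        echo "new record: $l"    # prints only for the level-0 line
    fi
done
```

Both forms run entirely inside bash, so no external command is started for each line.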
 
05-05-2010, 03:55 AM   #17
LUB997
Member
 
Registered: Jul 2003
Distribution: openSUSE Linux, Apple Darwin UNIX
Posts: 66

Original Poster
Rep: Reputation: 15
Yes, those carriage returns sure are annoying... At least I'll certainly remember them in the future when converting files from Windows to Linux. My goal is not to have to, and to use only Linux programs, but unfortunately there are still a couple of programs that don't have good Linux equivalents, such as Family Tree Maker, AnyDVD, and Netflix Watch Now. I think that will change, though, given all the progress Linux has made over just the past few years. Five years ago I dreamed of eliminating Windows and using only Linux, but didn't see it as a reality; today Linux has come so far that I run nothing but Linux on all my computers and keep Windows XP safely contained in a virtual machine, where it can't harm anything, to run those few programs. Who knows how long it will take before I can delete the virtual machine entirely, but until then, you can bet I'll remember to check for carriage returns.
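For checking a file for carriage returns before processing it, a minimal sketch along these lines should work with bash and GNU grep. The file name `family.ged` and the function name `has_cr` are placeholders, not anything from the thread.

```shell
#!/usr/bin/env bash
# Check a file for DOS-style carriage returns (CR, shown as ^M by cat -v).
# "family.ged" is only a placeholder; pass the real file name as $1.
file=${1:-family.ged}

has_cr() {
    # $'\r' is bash ANSI-C quoting for a literal carriage-return character.
    grep -q $'\r' "$1"
}

if [[ -f $file ]]; then
    if has_cr "$file"; then
        # grep -c counts the lines that contain at least one CR.
        echo "$file: $(grep -c $'\r' "$file") line(s) contain carriage returns"
    else
        echo "$file: no carriage returns"
    fi
fi
```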

Thanks to everyone for all the suggestions! I also really appreciate the comments on simplifying the code. As I said before, although I'm experienced with C#, I'm definitely not experienced with shell scripting, so things that would be obvious to me in C# aren't common knowledge to me yet in shell scripting. I've had to go with the examples I find on the internet and learn as I go, so any comments about how to do things more simply are very useful. The code has evolved quite a lot over the past few days and is now producing useful output, though it still has a ways to go before it does exactly what I want. I've really been noticing how nice it would be to simplify it, so I will keep all of your suggestions in mind.

I believe David asked what the reason for category 2 is... David, you are right: category 2 did not have a purpose YET in the code as I posted it. However, I was looking ahead and knew it was going to have one, which is why it was there. If you look at the example of the file format I'm dealing with, you might notice that each line begins with 0, 1, 2, or very rarely 3. I am never going to use 3 for my purposes, so 3 is not there.

The way the file format works (in general) is that category 0 lines represent something new, such as a new individual or a new family. Category 1 lines identify some type of event about the thing introduced by category 0, such as its birth or death. Category 2 lines then give more specific details about that category 1 event, such as its place or date. I will definitely need category 2 lines to work out where people were in each census year: to sort out who was in a given location in a given year, I first have to read that information in, and that means reading category 2 lines. Not sure who came up with such an odd file format, but I like it because it wasn't too difficult to figure out how it works.

For anyone curious about how family tree files work: if you look at each individual, you might also notice a line that says FAMC. FAMC identifies the family the individual was born into. Some individuals also have one or more FAMS lines. FAMS means the individual is the father or mother of that family unit. That means an individual can have multiple FAMS lines, but only one FAMC line.

Since I've figured out how all this works, maybe at some point I'll make a decent Linux family tree program, but it would have to use Mono since I'm best at C#. I guess that's OK, though, now that Mono has come as far as it has; it does almost everything the .NET environment on Windows does at this point.
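The level scheme described above maps naturally onto a `while read` loop that splits each line into its level number and the rest. Below is a minimal sketch, assuming lines of the form `LEVEL [@ID@] TAG [VALUE]` like the sample `0 @I1039@ INDI`; the function name, the sample records, and the output format are hypothetical. It remembers the current individual (level 0) and the current event (level 1), then prints each level-2 detail and each FAMC/FAMS link.

```shell
#!/usr/bin/env bash
# Hypothetical parser sketch for the level 0/1/2 structure described above.
# Reads a family-tree file on stdin and prints one line per level-2 detail
# and per FAMC/FAMS link of an individual.
parse_tree() {
    local level tag rest indi="" event=""
    # read splits on whitespace: 1st word -> level, 2nd -> tag, rest -> rest
    while read -r level tag rest; do
        tag=${tag%$'\r'}     # tolerate DOS line endings
        rest=${rest%$'\r'}
        case $level in
            0)
                # "0 @I1039@ INDI": the second field is the record ID.
                if [[ $rest == INDI ]]; then
                    indi=${tag//@/}
                else
                    indi=""      # a family or other record, not a person
                fi
                event=""
                ;;
            1)
                event=$tag       # e.g. BIRT, DEAT, FAMC, FAMS
                if [[ -n $indi && ( $tag == FAMC || $tag == FAMS ) ]]; then
                    printf '%s %s %s\n' "$indi" "$tag" "${rest//@/}"
                fi
                ;;
            2)
                # Details (e.g. DATE, PLAC) of the current level-1 event.
                if [[ -n $indi && -n $event ]]; then
                    printf '%s %s %s %s\n' "$indi" "$event" "$tag" "$rest"
                fi
                ;;
        esac
    done
}

parse_tree <<'EOF'
0 @I1039@ INDI
1 NAME John /Smith/
1 BIRT
2 DATE 1850
2 PLAC Ohio
1 FAMC @F12@
1 FAMS @F20@
0 @F12@ FAM
1 HUSB @I1001@
EOF
```

For the sample records this prints `I1039 BIRT DATE 1850`, `I1039 BIRT PLAC Ohio`, `I1039 FAMC F12` and `I1039 FAMS F20`. Using `case` on the level keeps each category's handling in one place, which should make it easier to add census-year logic to the level-2 branch later.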
There are already some genealogy programs for Linux, but none of them are anywhere near as professionally done as Family Tree Maker. The only one that comes close is Gramps, and I've used it and didn't like the layout of the GUI at all. If I do make a program like that, then we just need AnyDVD and Netflix Watch Now on Linux, and I'll be all set to delete my Windows virtual machine. Lots of people say those things won't happen, but 5 years ago lots of people said we wouldn't be doing all of the things we are doing on Linux today. Anyway, it's coming along well now, and thanks for all the suggestions that everyone gave!
 
05-05-2010, 04:40 AM   #18
catkin
LQ 5k Club
 
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Debian
Posts: 8,578
Blog Entries: 31

Rep: Reputation: 1208
Glad you found the replies useful, and thank you for sharing what the code is for.
 
05-05-2010, 07:24 AM   #19
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Rep: Reputation: 2037
Yes, thank you for the explanation. I thought you might be planning ahead for further additions.

I've been trying to figure out exactly what this section is supposed to do, and how to simplify it. As best I can tell, you're trying to extract the number from the line, correct?

Code:
lineoftext=`echo $lineoftext | sed "s/0\ \@I// g"`;
lineoftext=`echo $lineoftext | sed "s/\@\ INDI\r// g"`;
lineoftext="$(echo $lineoftext | sed 's/0*//')";
individual=$lineoftext;
First of all, sed can apply multiple expressions at once using the -e option.
Code:
lineoftext=$(echo "$lineoftext"|sed -e 's/0\ \@I//' -e 's/\@\ INDI\r//' -e 's/0*//')
Note that the "g" flag is unnecessary here, since each pattern occurs only once per line, and the third expression appears to be superfluous unless you want to strip off all leading zeroes.

A simpler version would be something like this:
Code:
lineoftext=$(echo "$lineoftext"|sed -r 's/.*@I([0-9]+)@.*/\1/')
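To check the single-expression version against a line that still has its trailing CR, here is a small sketch. It uses `sed -E`, which GNU sed accepts as a synonym for `-r`. The function name is hypothetical and the sample line is modelled on the thread's data.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the number from a line like "0 @I1039@ INDI\r".
extract_with_sed() {
    # .* eats everything before "@I"; ([0-9]+) captures the digits;
    # the trailing .* swallows "@ INDI" and any CR, so only \1 remains.
    printf '%s\n' "$1" | sed -E 's/.*@I([0-9]+)@.*/\1/'
}

extract_with_sed $'0 @I1039@ INDI\r'   # prints 1039
```

One caveat: if a line does not match the pattern, sed prints it unchanged, so this should only be called on lines already known to be individual records.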
But I've figured out how to strip everything down to just the number using only bash's built-in parameter expansion. Using built-ins instead of external commands usually improves efficiency, since it avoids starting a subshell and a new process for every line.

Taking your sample text above and converting the file to DOS line endings, I did this:
Code:
$ line=$(head -n1 file.txt)  #read the first line from the dos-encoded file.
$ cat -v <<<$line            #displays non-printing characters.
0 @I1039@ INDI^M

$ line="${line:0:$((${#line}-1))}"   #strip the last character (the cr) from the line.  This is the tricky part, and actually unnecessary, since you'll be stripping off the ending below anyway. ;)
$ cat -v <<<$line
0 @I1039@ INDI

$ line=${line#*@I}   #strip everything up to and including @I.
$ line=${line%%@*}   #strip everything from the @ to the end.
$ cat -v <<<$line
1039
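The same steps can be wrapped in a function for use inside a read loop. This is a minimal sketch: the name `extract_id` is hypothetical, and `${line%$'\r'}` is used here as a shorter alternative to the substring arithmetic for dropping a trailing CR.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pull the number out of "0 @I1039@ INDI" using only
# parameter expansion (no subshells or external commands).
extract_id() {
    local line=$1
    line=${line%$'\r'}     # drop a trailing CR, if present
    line=${line#*@I}       # remove the shortest prefix ending in "@I"
    line=${line%%@*}       # remove the longest suffix starting with "@"
    printf '%s\n' "$line"
}

extract_id $'0 @I1039@ INDI\r'    # prints 1039
```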
If you can guarantee that the line will always have only the pattern above, and the desired part is always a number, you can make it even easier.
Code:
line=${line//[^0-9]}
line=${line:1:${#line}}  #or even just "line=${line#0}"
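If the line format can't be guaranteed, another built-in option (not used in the thread) is bash's `=~` operator, which both checks the pattern and captures the number in `BASH_REMATCH`. A minimal sketch, assuming the `0 @I<digits>@ INDI` layout; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the individual number only when the line
# really is an individual record; return non-zero otherwise.
indi_number() {
    # Keeping the regex in a variable avoids quoting problems with spaces.
    local re='^0 @I([0-9]+)@ INDI'
    if [[ $1 =~ $re ]]; then
        printf '%s\n' "${BASH_REMATCH[1]}"
    else
        return 1
    fi
}

indi_number $'0 @I1039@ INDI\r'    # prints 1039 (the regex is not anchored at the end)
indi_number '1 FAMC @F12@' || echo "not an individual record"
```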

Last edited by David the H.; 05-05-2010 at 07:34 AM. Reason: tpyo
 
05-05-2010, 08:35 AM   #20
petrus4
LQ Newbie
 
Registered: May 2010
Posts: 9

Rep: Reputation: 1
I don't consider the data format particularly well designed myself, although it is workable.

If there's still interest, I will post my own solution to this problem, although it will likely be reasonably long, and focus initially on cleaning up the data format, so I've got something better to work with. I won't be using arrays, though.
 
05-05-2010, 09:02 AM   #21
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,005

Rep: Reputation: 3191
@OP - just thought I'd mention a good rule of thumb I follow when dealing with M$ files: run them through dos2unix to get rid of any nasties.
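Where dos2unix isn't installed, the same cleanup can be sketched with standard tools. The function name and file names are placeholders. The `sed` form relies on GNU sed understanding `\r` and removes only a CR at the end of a line; `tr -d '\r'` deletes every CR anywhere in the file.

```shell
#!/usr/bin/env bash
# Convert CRLF line endings to LF without dos2unix.
# Usage (hypothetical names): crlf_to_lf in.ged out.ged
crlf_to_lf() {
    # Remove a CR only when it is the last character before the newline.
    sed 's/\r$//' "$1" > "$2"
}

# Alternative: delete every CR anywhere in the file.
#   tr -d '\r' < in.ged > out.ged
```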
 
All times are GMT -5.