Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question?
If it is not in the man pages or the how-to's this is the place! |
06-30-2017, 04:29 AM
|
#1
|
LQ Newbie
Registered: Jun 2017
Posts: 2
Rep:
|
help with extracting entries from a multiple entry columns in a file
Hi All,
I have a tab-separated file in which every column from the 2nd onwards holds multiple entries separated by a ;.
I would like to get only the 1st entry of each column. I have 8000+ columns, so I am not showing them all here.
Here is a sample:
rs1
AG;0.79780;0.132;0.204;487;923
GG;0.79780;0.115;0.161;213;457
AG;0.79780;0.095;0.152;375;835
I would like to have
rs1
AG
GG
AG
Appreciate any help.
Thank you,
Davina
Last edited by DavinaP; 06-30-2017 at 04:36 AM.
|
|
|
06-30-2017, 04:39 AM
|
#2
|
LQ Veteran
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,264
|
Trivial in awk - nominate ";" as field separator.
|
|
|
06-30-2017, 04:40 AM
|
#3
|
LQ Guru
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,565
|
That should be pretty easy with awk or perl, and there are several ways to approach the problem in either. Which one are you trying and can you show how far you have gotten?
|
|
|
06-30-2017, 04:43 AM
|
#4
|
LQ Guru
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,565
|
Quote:
Originally Posted by syg00
Trivial in awk - nominate ";" as field separator.
|
The sample above has only one column. The other columns are apparently separated by tabs.
So I'd keep tabs as the separator, but use gsub() to zap everything starting with the first semicolon in each field. But is there a way to do that or otherwise get the same result without needing a loop to go through the fields in each row?
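A minimal sketch of the loop-based version of that idea, assuming the fields really are tab-separated and that nothing before the first semicolon needs to be dropped. The sample rows and file names are made up for illustration. It uses sub() rather than gsub(), since a single match of `;.*` already reaches the end of the field.

```shell
# Hypothetical sample: tab-separated columns, each holding ;-separated entries.
workdir=$(mktemp -d)
printf 'rs1\tAG;0.79780;0.132\tGG;0.79780;0.115\n' >  "$workdir/sample.tsv"
printf 'rs2\tAA;0.50000;0.200\tAG;0.61000;0.095\n' >> "$workdir/sample.tsv"

# Split on tabs, then cut each field off at its first semicolon.
# Assigning to $i makes awk rebuild the record with OFS, so OFS is set to a tab too.
awk -F'\t' -v OFS='\t' '{ for (i = 1; i <= NF; i++) sub(/;.*/, "", $i) } 1' \
    "$workdir/sample.tsv" > "$workdir/first_entries.tsv"

cat "$workdir/first_entries.tsv"
```

The loop runs once per field, so the number of columns does not need to be known in advance.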
|
|
|
06-30-2017, 04:45 AM
|
#5
|
LQ Newbie
Registered: Jun 2017
Posts: 2
Original Poster
Rep:
|
Quote:
Originally Posted by Turbocapitalist
That should be pretty easy with awk or perl, and there are several ways to approach the problem in either. Which one are you trying and can you show how far you have gotten?
|
Thanks, I have not gotten anywhere much except trying this command:
tr -s '; ' '\t' < "file name".
However that splits each column into multiple columns at the points where ; occurs.
I just want the first entries of each column (remember I have thousands of columns).
Last edited by DavinaP; 06-30-2017 at 04:46 AM.
|
|
|
06-30-2017, 04:51 AM
|
#6
|
LQ Veteran
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,264
|
Oops - didn't read that too well did I. Sorry about that. I'll be back.
|
|
|
06-30-2017, 04:52 AM
|
#7
|
LQ Guru
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,565
|
Ok. Try escalating to awk then. Be sure to see the manual page.
But that is a reference (actually the reference) only and though you should use it a lot, it might not be the best place to start with awk. So also see this site:
http://www.grymoire.com/Unix/Awk.html
It is a very thorough introduction.
|
|
1 members found this post helpful.
|
06-30-2017, 05:45 AM
|
#8
|
LQ Veteran
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,264
|
Quote:
Originally Posted by Turbocapitalist
So I'd keep tabs as the separator, but use gsub() to zap everything starting with the first semicolon in each field. But is there a way to do that or otherwise get the same result without needing a loop to go through the fields in each row?
|
gensub maybe - that way you can use back-references.
Personally I'd use sed - same/similar regex.
|
|
|
06-30-2017, 06:31 AM
|
#9
|
LQ Veteran
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,264
|
How's your regex-fu, Davina?
Your data (for this discussion) can be defined as "a bunch of non-semicolon characters (that you want to keep), followed by a bunch of non-whitespace characters (that you want to remove)". Define that in regex, and make the substitution global.
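One possible reading of that description as a sed command, offered as a sketch rather than the only answer. It assumes the entries themselves contain no spaces or tabs; the sample row and file names are invented for the example.

```shell
workdir=$(mktemp -d)
printf 'rs1\tAG;0.79780;0.132\tGG;0.79780;0.115\n' > "$workdir/sample.tsv"

# ([^;[:space:]]+)  a run of characters that are neither ';' nor whitespace (kept as \1)
# ;[^[:space:]]*    the first ';' and everything up to the next whitespace (dropped)
# The trailing g repeats the substitution for every column on the line.
sed -E 's/([^;[:space:]]+);[^[:space:]]*/\1/g' "$workdir/sample.tsv" > "$workdir/trimmed.tsv"

cat "$workdir/trimmed.tsv"
```

A field with no semicolon, such as rs1 here, does not match the pattern and is left untouched.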
|
|
|
07-01-2017, 12:41 AM
|
#10
|
LQ Guru
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,565
|
Quote:
Originally Posted by syg00
Personally I'd use sed - same/similar regex.
|
Yes. If one thinks about the lines as a single unit, then sed is a good idea. I had been thinking about the line as a record with fields and thus gravitated to awk. Either will work. sed is a little terse, though, while awk is a little more complicated.
DavinaP, the substitution command in sed is what to look at:
Code:
sed -e 's/old/new/g;' < oldfile.txt > newfile.txt
The greater than > and less than < signs are IO redirects in the shell.
|
|
|
07-01-2017, 07:40 AM
|
#11
|
LQ Guru
Registered: Sep 2013
Location: Somewhere in my head.
Distribution: Slackware (15 current), Slack15, Ubuntu studio, MX Linux, FreeBSD 13.1, WIn10
Posts: 10,342
|
I do not use awk but try this,
Code:
userx%slackwhere ⚡ testDIR ⚡> awk -F\; '{print $1}' fileDirLit
rs1
AG
GG
AG
http://cs.canisius.edu/ONLINESTUFF/P...K/awk.examples
or to keep it handy
Code:
userx%slackwhere ⚡ testDIR ⚡> awk -F\; '{print $1}' fileDirLit > results
userx%slackwhere ⚡ testDIR ⚡> cat results
rs1
AG
GG
AG
to skip that first line
Code:
userx%slackwhere ⚡ testDIR ⚡> awk -F\; 'NR > 1 {print $1}' fileDirLit
AG
GG
AG
Last edited by BW-userx; 07-01-2017 at 08:07 AM.
|
|
2 members found this post helpful.
|
07-01-2017, 01:02 PM
|
#12
|
LQ Guru
Registered: Jan 2005
Location: USA and Italy
Distribution: Debian testing/sid; OpenSuSE; Fedora; Mint
Posts: 5,524
|
You can use:
Code:
$ tr ';' '\t' < file > file2
$ awk '{print $1}' file2 > file3
That is untested, but I think it will work. What you're doing is changing the semicolons to tabs, which are white space, and then selecting the column before the first white space.
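A quick check of what that does to a row with more than one column, using a made-up two-column row. Once the semicolons become tabs the original column boundaries are gone, so `$1` is only the first entry of the first column.

```shell
# Hypothetical row: two tab-separated columns with three ;-separated entries each.
row=$(printf 'AG;0.79780;0.132\tGG;0.79780;0.115')

# After tr, awk sees one flat list of whitespace-separated values.
first_only=$(printf '%s\n' "$row" | tr ';' '\t' | awk '{print $1}')

# Count the tab-separated fields after the translation.
field_count=$(printf '%s\n' "$row" | tr ';' '\t' | awk -F'\t' '{print NF}')

echo "$first_only"
echo "$field_count"
```

Here the first command yields only AG and the field count is 6, which is the same column-splitting effect the original poster reported with tr.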
|
|
|
07-03-2017, 08:05 AM
|
#13
|
LQ 5k Club
Registered: Oct 2003
Location: Melbourne
Distribution: Slackware64-15.0
Posts: 6,460
|
My feeling is that the example given has been confusing.
If the data format is tab separated columns with semicolon delimiters within columns, such as
Code:
AG;0.79780;0.132;0.204;487;923 AG;0.79780;0.132;0.204;487;923 AG;0.79780;0.132;0.204;487;923
GG;0.79780;0.115;0.161;213;457 GG;0.79780;0.115;0.161;213;457 GG;0.79780;0.115;0.161;213;457
AG;0.79780;0.095;0.152;375;835 AG;0.79780;0.095;0.152;375;835 AG;0.79780;0.095;0.152;375;835
then I suggest using awk
Code:
awk -F ';[^\t]+' '{for (i=1; i<NF; i++) printf "%s", $i; printf "\n"}' inputfile
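To see why this works, here is a small run on an invented three-column row (the file name and values are placeholders). The field separator swallows each semicolon together with everything up to the next tab, so the tabs stay attached to the following fields and the output remains tab-separated.

```shell
workdir=$(mktemp -d)
printf 'AG;0.79780;0.132\tGG;0.79780;0.115\tAG;0.79780;0.095\n' > "$workdir/cols.tsv"

# Each ";..." run up to the next tab counts as one separator, so the fields are
# "AG", "<tab>GG", "<tab>AG" and a trailing empty field that i<NF skips.
trimmed=$(awk -F ';[^\t]+' '{for (i=1; i<NF; i++) printf "%s", $i; printf "\n"}' "$workdir/cols.tsv")
printf '%s\n' "$trimmed"

# Caveat: a row whose last column has no semicolon has no trailing empty field,
# so i<NF drops that last column instead.
plain_last=$(printf 'AG;0.79780\tGG\n' | awk -F ';[^\t]+' '{for (i=1; i<NF; i++) printf "%s", $i; printf "\n"}')
printf '%s\n' "$plain_last"
```

The second check suggests that a header row, or any row ending in a plain value, may need separate handling with this approach.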
Last edited by allend; 07-03-2017 at 08:14 AM.
|
|
1 members found this post helpful.
|
07-03-2017, 08:11 AM
|
#14
|
LQ Guru
Registered: Sep 2013
Location: Somewhere in my head.
Distribution: Slackware (15 current), Slack15, Ubuntu studio, MX Linux, FreeBSD 13.1, WIn10
Posts: 10,342
|
In the original post, the OP says all that is wanted is the very first column, which is all of the AG, GG, AG, etc.,
which this actually gives:
Code:
awk -F\; '{print $1}' fileToLooKAt > results
Last edited by BW-userx; 07-03-2017 at 08:12 AM.
|
|
|
07-03-2017, 10:32 AM
|
#15
|
LQ Guru
Registered: Jan 2005
Location: USA and Italy
Distribution: Debian testing/sid; OpenSuSE; Fedora; Mint
Posts: 5,524
|
But the OP doesn't show any tabs, so $1 is the whole row.
Sorry, didn't see the "-F".
Last edited by AwesomeMachine; 07-03-2017 at 10:34 AM.
|
|
|
All times are GMT -5. The time now is 03:04 AM.