LinuxQuestions.org
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

Old 06-30-2017, 04:29 AM   #1
DavinaP
LQ Newbie
 
Registered: Jun 2017
Posts: 2

Rep: Reputation: Disabled
Help with extracting entries from multiple-entry columns in a file


Hi All,
I have a tab-separated file where the entries from the 2nd column onwards each hold multiple values separated by a ;.
I would like to keep only the 1st value in each column. I have 8000+ columns, so I am not giving them all here.
Here is a sample:

rs1
AG;0.79780;0.132;0.204;487;923
GG;0.79780;0.115;0.161;213;457
AG;0.79780;0.095;0.152;375;835

I would like to have

rs1
AG
GG
AG

Appreciate any help.
Thank you,
Davina

Last edited by DavinaP; 06-30-2017 at 04:36 AM.
 
Old 06-30-2017, 04:39 AM   #2
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,264

Rep: Reputation: 4163
Trivial in awk - nominate ";" as field separator.
 
Old 06-30-2017, 04:40 AM   #3
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,565
Blog Entries: 4

Rep: Reputation: 3861
That should be pretty easy with awk or perl, and there are several ways to approach the problem in either. Which one are you trying and can you show how far you have gotten?
 
Old 06-30-2017, 04:43 AM   #4
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,565
Blog Entries: 4

Rep: Reputation: 3861
Quote:
Originally Posted by syg00
Trivial in awk - nominate ";" as field separator.
The sample above has only one column. The other columns are apparently separated by tabs.

So I'd keep tabs as the separator, but use gsub() to zap everything starting with the first semicolon in each field. But is there a way to do that or otherwise get the same result without needing a loop to go through the fields in each row?
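The approach described above (tabs stay the field separator, and everything from the first semicolon onward is removed from each field) could be sketched roughly like this. It still uses the loop over the fields. The sample data and the function name first_entries are made up for illustration, and sub() is used rather than gsub() because the pattern runs to the end of the field, so it can match only once per field.

```shell
# Hypothetical sample: an ID column plus two data columns, tab separated.
sample=$(mktemp)
printf 'rs1\tAG;0.79780;0.132\tGG;0.79780;0.115\n' >  "$sample"
printf 'rs2\tGG;0.79780;0.095\tAG;0.79780;0.152\n' >> "$sample"

# Tabs are both the input and the output separator. In every field,
# delete everything from the first semicolon to the end of that field.
first_entries() {
    awk 'BEGIN { FS = OFS = "\t" }
         {
             for (i = 1; i <= NF; i++)
                 sub(/;.*/, "", $i)
             print
         }' "$1"
}

first_entries "$sample"
```

Each output line keeps one short value per column, still separated by tabs. Fields without a semicolon (such as the ID column in this sample) pass through unchanged.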
 
Old 06-30-2017, 04:45 AM   #5
DavinaP
LQ Newbie
 
Registered: Jun 2017
Posts: 2

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by Turbocapitalist
That should be pretty easy with awk or perl, and there are several ways to approach the problem in either. Which one are you trying and can you show how far you have gotten?
Thanks, I have not gotten anywhere much except trying this command:
tr -s '; ' '\t' < "file name".
However that splits each column into multiple columns at the points where ; occurs.
I just want the first entries of each column (remember I have thousands of columns).

Last edited by DavinaP; 06-30-2017 at 04:46 AM.
 
Old 06-30-2017, 04:51 AM   #6
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,264

Rep: Reputation: 4163
Oops - didn't read that too well did I. Sorry about that. I'll be back.
 
Old 06-30-2017, 04:52 AM   #7
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,565
Blog Entries: 4

Rep: Reputation: 3861
Ok. Try escalating to awk then. Be sure to see the manual page.

Code:
man awk
But that is a reference (actually the reference) only and though you should use it a lot, it might not be the best place to start with awk. So also see this site:

http://www.grymoire.com/Unix/Awk.html

It is a very thorough introduction.
 
1 members found this post helpful.
Old 06-30-2017, 05:45 AM   #8
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,264

Rep: Reputation: 4163
Quote:
Originally Posted by Turbocapitalist
So I'd keep tabs as the separator, but use gsub() to zap everything starting with the first semicolon in each field. But is there a way to do that or otherwise get the same result without needing a loop to go through the fields in each row?
gensub maybe - that way you can use back-references.
Personally I'd use sed - same/similar regex.
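One possible way to avoid the per-field loop without gensub() (which is a gawk extension) is to run gsub() once on the whole line and remove each semicolon together with the non-tab characters that follow it. This is only a sketch, not something posted in the thread: the function name strip_extra and the sample data are invented, and it assumes no field contains a semicolon that should be kept.

```shell
# Hypothetical one-line sample with three tab-separated columns.
sample=$(mktemp)
printf 'AG;0.79780;0.132\tGG;0.79780;0.115\tAG;0.79780;0.095\n' > "$sample"

# With two arguments gsub() works on the whole record ($0), so no loop
# over the fields is needed. The tabs themselves are never matched.
strip_extra() {
    awk '{ gsub(/;[^\t]*/, ""); print }' "$1"
}

strip_extra "$sample"
```

Because the pattern stops at the next tab, the column layout of the line is preserved.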
 
Old 06-30-2017, 06:31 AM   #9
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,264

Rep: Reputation: 4163
How's your regex-fu, Davina?
Your data (for this discussion) can be defined as "a bunch of non-semicolon characters (that you want to keep), followed by a bunch of non-whitespace characters (that you want to remove)". Define that in regex, and make the substitution global.
 
Old 07-01-2017, 12:41 AM   #10
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,565
Blog Entries: 4

Rep: Reputation: 3861
Quote:
Originally Posted by syg00
Personally I'd use sed - same/similar regex.
Yes. If one thinks about the lines as a single unit, then sed is a good idea. I had been thinking about the line as a record with fields and thus gravitated to awk. Either will work. The language sed is a little terse while awk is a little more complicated, though.

DavinaP, the substitution command in sed is what to look at:

Code:
sed -e 's/old/new/g;' < oldfile.txt > newfile.txt
The greater than > and less than < signs are IO redirects in the shell.
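As a sketch of how the old and new placeholders might be filled in for this data: "old" becomes a semicolon followed by any run of non-tab characters, and "new" is left empty, so the match is simply deleted. The function name first_entries_sed and the sample file are hypothetical. The literal tab is built with printf because \t inside a bracket expression is not guaranteed to work in every sed.

```shell
# Hypothetical sample: two lines, two tab-separated columns each.
sample=$(mktemp)
printf 'AG;0.79780;0.132\tGG;0.79780;0.115\n' >  "$sample"
printf 'GG;0.79780;0.095\tAG;0.79780;0.152\n' >> "$sample"

# A literal tab character for use inside the bracket expression.
tab=$(printf '\t')

# old = ";" plus everything up to (not including) the next tab
# new = nothing; the g flag repeats this for every column on the line
first_entries_sed() {
    sed -e "s/;[^${tab}]*//g" < "$1"
}

first_entries_sed "$sample"
```

To keep the result, the output can be redirected to a new file in the same way as in the command above.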
 
Old 07-01-2017, 07:40 AM   #11
BW-userx
LQ Guru
 
Registered: Sep 2013
Location: Somewhere in my head.
Distribution: Slackware (15 current), Slack15, Ubuntu studio, MX Linux, FreeBSD 13.1, WIn10
Posts: 10,342

Rep: Reputation: 2242
I do not use awk but try this,
Code:
userx%slackwhere ⚡ testDIR ⚡> awk -F\;  '{print $1}' fileDirLit
rs1
AG
GG
AG
http://cs.canisius.edu/ONLINESTUFF/P...K/awk.examples

or to keep it handy
Code:
userx%slackwhere ⚡ testDIR ⚡> awk -F\;  '{print $1}' fileDirLit > results       
userx%slackwhere ⚡ testDIR ⚡> cat results
rs1
AG
GG
AG
to skip that first line
Code:
userx%slackwhere ⚡ testDIR ⚡> awk -F\; 'NR > 1 {print $1}' fileDirLit 
AG
GG
AG
GG
AG

Last edited by BW-userx; 07-01-2017 at 08:07 AM.
 
2 members found this post helpful.
Old 07-01-2017, 01:02 PM   #12
AwesomeMachine
LQ Guru
 
Registered: Jan 2005
Location: USA and Italy
Distribution: Debian testing/sid; OpenSuSE; Fedora; Mint
Posts: 5,524

Rep: Reputation: 1015
You can use:
Code:
$ tr ';' '\t' < file > file2
$ awk '{print $1}' file2 > file3
That is untested, but I think it will work. What you're doing is changing the semicolons to tabs, which are white space, and then selecting the column before the first white space.
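The same idea can be sketched as a single pipeline without the intermediate files. The sample data is taken from the first post, and the variable names are made up. The \t is quoted here because an unquoted \t reaches tr as a plain letter t.

```shell
# The single-column sample from the first post.
sample=$(mktemp)
printf 'rs1\nAG;0.79780;0.132;0.204;487;923\nGG;0.79780;0.115;0.161;213;457\n' > "$sample"

# Turn semicolons into tabs, then print the first whitespace-separated field.
result=$(tr ';' '\t' < "$sample" | awk '{ print $1 }')
printf '%s\n' "$result"

# Caveat: on a line with several tab-separated columns, only the first
# value of the first column survives, because $1 is a single field.
wide_result=$(printf 'AG;1;2\tGG;3;4\n' | tr ';' '\t' | awk '{ print $1 }')
printf '%s\n' "$wide_result"
```

So this fits the one-column sample, but it would not keep the first entry of every column in a file with thousands of columns.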
 
Old 07-03-2017, 08:05 AM   #13
allend
LQ 5k Club
 
Registered: Oct 2003
Location: Melbourne
Distribution: Slackware64-15.0
Posts: 6,460

Rep: Reputation: 2796
My feeling is that the example given has been confusing.
If the data format is tab separated columns with semicolon delimiters within columns, such as
Code:
AG;0.79780;0.132;0.204;487;923	AG;0.79780;0.132;0.204;487;923	AG;0.79780;0.132;0.204;487;923
GG;0.79780;0.115;0.161;213;457	GG;0.79780;0.115;0.161;213;457	GG;0.79780;0.115;0.161;213;457
AG;0.79780;0.095;0.152;375;835	AG;0.79780;0.095;0.152;375;835	AG;0.79780;0.095;0.152;375;835
then I suggest using awk
Code:
awk -F ";[^\t]+" '{for (i = 1; i < NF; i++) printf "%s", $i; printf "\n"}' inputfile

Last edited by allend; 07-03-2017 at 08:14 AM.
 
1 members found this post helpful.
Old 07-03-2017, 08:11 AM   #14
BW-userx
LQ Guru
 
Registered: Sep 2013
Location: Somewhere in my head.
Distribution: Slackware (15 current), Slack15, Ubuntu studio, MX Linux, FreeBSD 13.1, WIn10
Posts: 10,342

Rep: Reputation: 2242
In the original post, the OP says all that is wanted is the very first column, which is all of the AG, GG, AG, etc.,
which this actually gives:
Code:
awk -F\;  '{print $1}' fileToLooKAt > results

Last edited by BW-userx; 07-03-2017 at 08:12 AM.
 
Old 07-03-2017, 10:32 AM   #15
AwesomeMachine
LQ Guru
 
Registered: Jan 2005
Location: USA and Italy
Distribution: Debian testing/sid; OpenSuSE; Fedora; Mint
Posts: 5,524

Rep: Reputation: 1015
But the OP doesn't show any tabs, so $1 is the whole row.

Sorry, didn't see the "-F".

Last edited by AwesomeMachine; 07-03-2017 at 10:34 AM.
 
  


All times are GMT -5. The time now is 03:04 AM.
