How to find unique characters within each column in a .txt file in Linux?
> please ignore the first column
Not possible. Let's assume this:
Code:
080 001 100 020
002 111 025
003
Well, pick a programming language (script or otherwise, but 64-bit capable), and do the following:
1. read everything into memory; arrange data column-wise
2. for every column: sort and drop duplicates
3. create the output
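The three steps above can be sketched with standard tools. This is only one possible reading of them, not the poster's own code: instead of holding everything in memory, it streams (column, value) pairs through sort -u, which does the "sort and drop duplicates" step for every column at once. The function name unique_per_column and the "column N:" output format are made up for illustration.

```shell
# unique_per_column [FILE...]: hypothetical helper; reads stdin when no file is given.
unique_per_column() {
    # Step 1: emit one "column value" pair per field, so the data is arranged by column.
    awk '{ for (j = 1; j <= NF; j++) print j, $j }' "$@" |
    # Step 2: sort by column number, then by value as text, dropping duplicate pairs.
    LC_ALL=C sort -u -k1,1n -k2,2 |
    # Step 3: create the output, one line per column.
    awk '
        $1 != col { if (NR > 1) printf "\n"; printf "column %s:", $1; col = $1 }
                  { printf " %s", $2 }
        END       { if (NR > 0) printf "\n" }
    '
}
```

Fed the three sample rows above, this should print "column 1: 002 003 080", "column 2: 001 111", "column 3: 025 100" and "column 4: 020", each on its own line. Rows with fewer fields simply contribute nothing to the later columns.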
Thank you, but how? I am just a beginner and do not know how. I used this command in Linux:
awk < input.txt '{print $1}' | sort | uniq > output.txt
But it gave me the answer only for the first column. I am wondering how to change this command to get the answers for all columns at the same time.
Can you guide me, please?
And for the previous example, for the first column I would only have had one row with 080.
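One way to stretch that single-column command over every column is to run it in a loop, once per column. The following is a rough sketch, assuming whitespace-separated input; the function name split_unique_columns and the output file naming are invented for the example, and sort -u is used as shorthand for sort | uniq.

```shell
# split_unique_columns FILE PREFIX: hypothetical helper that writes
# PREFIX1.txt, PREFIX2.txt, ... holding the unique values of each column.
split_unique_columns() {
    in=$1
    prefix=$2
    # Largest number of fields on any row, since rows may differ in length.
    ncols=$(awk 'NF > max { max = NF } END { print max + 0 }' "$in")
    col=1
    while [ "$col" -le "$ncols" ]; do
        # The single-column pipeline, with the column number passed in as c.
        # Rows too short to have column c are skipped.
        awk -v c="$col" 'NF >= c { print $c }' "$in" | LC_ALL=C sort -u > "$prefix$col.txt"
        col=$((col + 1))
    done
}
```

Calling split_unique_columns input.txt column would leave one file per column (column1.txt, column2.txt, and so on). It rereads the input once per column, which is simple but slow on very wide files.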
Would you be able to explain how you got this output?
You have moved 003 from the 4th row up to the 3rd
Left a gap in column 2 / row 3 which you have not done previously
And left 2 gaps in the 4th row
You have previously mentioned that not all rows may have the same number of columns, but you're now saying you also do not wish to lose any columns ... is this correct?
Or in another way, if a row starts with 4 columns it will always have 4 columns but some may now be empty?
You say that you would have only one row with 080 ... so are you now saying that the first column will never be repeated?
Are there other rows that this may also occur with?
As you can see, this is not an easy problem, and it is only made harder by the information you omit.
I just want to extract the unique values within each column, which means that if a value is repeated more than once in a column, I want to see it only once in my new data.
In column 1 The unique values are:
123
232
In column 2 The unique values are:
000
123
In column 3 The unique values are:
111
123
Daniel B. Martin
Thank you! But I want my new data file to have the same number of columns as the original one (exactly the same structure, but without duplication). What I mean is that I want it to look like this:
new file:
Code:
123 000 111
232 123 123
I can also send you a part of my original data if you would like to see.
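Code:
awk '{if (NF>n) n=NF; for (j=1;j<=NF;j++) if (!a[j","$j]++) b[j]=b[j]" "$j}  # b[j]: unique values of column j
 END{for (j=1;j<=n;j++) print b[j]}' "$InFile" \
|awk '{for (j=1;j<=NF;j++) a[j]=a[j]" "$j}  # transpose: each column line becomes a column again
 END {j=1; while (j in a) {print a[j];j++}}' >"$OutFile"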
... produced this OutFile ...
Code:
123 000 111
232 123 123
Daniel B. Martin
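A possible single-pass variant of the same idea, offered as a sketch rather than as a replacement for the code above. Because the second awk in a two-stage pipeline re-splits on whitespace, values can slide left when the columns end up with different numbers of unique values; this version instead remembers which row and column each unique value belongs to and writes tab-separated output, so a shorter column leaves empty cells. The function name unique_columns_table is hypothetical, and values are kept in first-seen order, not sorted.

```shell
# unique_columns_table [FILE...]: hypothetical helper; reads stdin when no file is given.
unique_columns_table() {
    awk '
        {
            if (NF > ncols) ncols = NF              # widest row seen so far
            for (j = 1; j <= NF; j++)
                if (!seen[j, $j]++) {               # first time this value appears in column j
                    count[j]++
                    cell[count[j], j] = $j          # store it in the next free row of column j
                    if (count[j] > nrows) nrows = count[j]
                }
        }
        END {
            for (i = 1; i <= nrows; i++) {
                line = ""
                for (j = 1; j <= ncols; j++)        # missing cells print as empty strings
                    line = line (j > 1 ? "\t" : "") cell[i, j]
                print line
            }
        }
    ' "$@"
}
```

Usage would look like unique_columns_table hap.txt > uniqhap.txt, with the file names taken from later in this thread. If the data can itself contain tabs, a different output separator would be needed.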
With this code I got the following error when I tried to run it in Linux:
awk: fatal: cannot open file ` ' for reading (No such file or directory)
even though I am in the correct directory and I changed my input file name to hap.txt.
Could you help me solve the problem?
... but that won't work on your computer because I don't know the names of your input and output files.
My preference is to have the program and data files in the same directory. Many people follow a different convention. Regardless of this distinction the awk code should work if you correctly identify the files.
On my machine the program is named dbm1484.bin; the InFile is dbm1484inp.txt; the OutFile is dbm1484out.txt.
Suggestion: get help from someone at your location.
Daniel B. Martin
Last edited by danielbmartin; 09-08-2015 at 05:25 PM.
Reason: Cosmetic improvements
My input file is hap.txt and my output file is uniqhap.txt. I changed the names to these, but I get this error:
skarimi@signal[19:20][~]$ cd mkhap
skarimi@signal[19:26][~/mkhap]$ awk '{for (j=1;j<=NF;j++) !a[j","$j]++?b[j]=b[j]" "$j:0;} END{for (j=1;j<=NF;j++) print b[j]}' hap.txt \ |awk '{for (j=1;j<=NF;j++) a[j]=a[j]" "$j} END {j=1; while (j in a) {print a[j];j++}}' > uniqhap.txt
awk: cmd. line:1: fatal: cannot open file ` ' for reading (No such file or directory)
skarimi@signal[19:27][~/mkhap]$
Should I add a program to my file? Is this the problem? Can you please guide me?
This means you have not correctly identified the InFile, so awk says "there is no InFile so I can't execute."
Again, you need help from someone at your location.
Daniel B. Martin
But it works with the previous command that you wrote, and it does not give any error. Look:
skarimi@signal[19:41][~/mkhap]$ awk '{for (j=1;j<=NF;j++) !b[j","$j]++?b[j]=b[j]"\n"$j:0;} END{for (j=1;j<=NF;j++) print "\nIn column",j,
> "The unique values are:"b[j]}' hap.txt > uniq.txt
skarimi@signal[19:46][~/mkhap]$
It runs without any error and I can see the result. But the last command you sent me does not work. Are you sure nothing is wrong with the command? I really appreciate your help!
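Judging only from the command as pasted above, a likely cause of the "cannot open file ` '" error is the backslash left in the middle of the line. A backslash at the very end of a line continues the command onto the next line, but once the two lines are joined, the backslash sits in front of a space, and the shell then passes that single space to awk as an extra file name. The sketch below shows the difference with a made-up helper, count_args, and then the same pipeline on one line with the backslash removed; the wrapper name unique_hap is also hypothetical.

```shell
# count_args: hypothetical helper that prints how many arguments it received.
count_args() { echo "$#"; }

# Backslash as the last character of the line: a line continuation, one argument.
count_args hap.txt \
    | cat

# Backslash followed by a space: an escaped space, so " " becomes a second argument.
count_args hap.txt \ | cat

# The posted pipeline on a single line, unchanged apart from the removed backslash.
# Usage (file names from the post): unique_hap hap.txt uniqhap.txt
unique_hap() {
    awk '{for (j=1;j<=NF;j++) !a[j","$j]++?b[j]=b[j]" "$j:0;} END{for (j=1;j<=NF;j++) print b[j]}' "$1" | awk '{for (j=1;j<=NF;j++) a[j]=a[j]" "$j} END {j=1; while (j in a) {print a[j];j++}}' > "$2"
}
```

The first count_args call prints 1 and the second prints 2. The earlier single-awk command had no backslash on its first line, which would explain why it ran without complaint.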