LinuxQuestions.org


Jason7449 03-04-2010 01:28 PM

Question about operating files
 
Hi guys, I am totally new to linux, and got a question with shell:

if there's a tab-delimited file under /usr/desktop, how can I determine the number of rows and columns of the file in shell?

And, if I am told that the 3rd column of the file contains only numerical values and that all values in the 5th column are unique, how can I verify this in shell?

Thank you very much!!!

rweaver 03-04-2010 01:47 PM

You can use wc (word count) to get some of the information, for example the number of rows (this would count the header row if there is one):

Code:

wc -l /usr/desktop/filename
If you want to count cols, you can use tr to translate the tabs on a single line into newlines and then count the lines...

Code:

tail -1 /usr/desktop/filename | tr '\t' '\n' | wc -l
Edit: (interestingly, this gives the correct # of columns: a line with N columns has only N-1 tabs, but the line's own trailing newline gets counted too, so you don't have to add one)
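
For reference, a minimal sketch of the same row and column count, assuming the file really is tab-delimited with Unix (LF) line endings. The sample file and the function names count_rows and count_cols are made up for illustration; they are not from the thread.

```shell
# Build a small tab-delimited sample file (hypothetical stand-in for the real file).
sample=$(mktemp "${TMPDIR:-/tmp}/sample.XXXXXX")
printf '1\t11\ta\tI\tab\n2\t22\tb\tII\tbc\n3\t33\tc\tIII\tcd\n' > "$sample"

# Rows: read from stdin so wc prints only the number, not the file name.
# BSD wc pads the number with spaces, so strip them.
count_rows() {
    wc -l < "$1" | tr -d ' '
}

# Columns: split the first line on tabs and report the number of fields.
count_cols() {
    awk -F'\t' 'NR == 1 { print NF; exit }' "$1"
}

count_rows "$sample"
count_cols "$sample"
```

On this three-line, five-column sample the two calls print 3 and 5. Counting fields with awk avoids relying on the trailing newline of the last line.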

Code:

for i in $(cat /usr/desktop/filename | cut -f3 -d'  ' | grep -v "\D"); do grep -i $i /usr/desktop/filename; done
(There is a literal tab, typed as ctrl-v then [tab], between the ''.) This is meant to show you all lines where field 3 is not numeric.


Code:

cat /usr/desktop/filename | cut -f5 -d'  ' | uniq -c | grep -v "\w*1"
(There is a literal tab, typed as ctrl-v then [tab], between the ''.) This is meant to show you all non-unique fields in col5 and how many occurrences there are... you could toss it into a for loop and extract the full lines if you wanted to.

Also keep in mind that you could perform most of these operations just as easily with perl, awk, sed, etc.
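
As one possible awk-based variant of the two checks, here is a hedged sketch. It assumes "numeric" means an unsigned integer (no sign, no decimals), which the original question does not specify. It also sorts before calling uniq, because uniq only compares adjacent lines, and it avoids "\D", a Perl-style class that plain grep may not understand. The sample data and function names are invented for illustration.

```shell
# Hypothetical sample: row 2 has a non-numeric 3rd field, and "ab" repeats in column 5.
sample=$(mktemp "${TMPDIR:-/tmp}/sample.XXXXXX")
printf '1\t11\t10\tI\tab\n2\t22\tx7\tII\tbc\n3\t33\t30\tIII\tab\n' > "$sample"

# Print every line whose 3rd field is not made up entirely of digits.
non_numeric_col3() {
    awk -F'\t' '$3 !~ /^[0-9]+$/' "$1"
}

# Print each value that occurs more than once in the 5th field.
# cut splits on tab by default; sort puts equal values next to each other for uniq.
duplicates_col5() {
    cut -f5 "$1" | sort | uniq -d
}

non_numeric_col3 "$sample"
duplicates_col5 "$sample"
```

No output from either function would mean the corresponding claim about the file holds.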

Jason7449 03-04-2010 02:07 PM

Thank you very much for your help!!! Learning!

Jason7449 03-05-2010 11:56 AM

Hi, maybe I am doing something incorrectly, but these commands are not working as expected:

The first one:
Code:

wc -l /usr/desktop/filename
It always gives 0, not the number of rows.

The Second:
Code:

tail -1 /usr/desktop/filename | tr '\t' '\n' | wc -l
It gives the number of rows, though I thought it was meant to give the number of columns.

The third:
Code:

for i in $(cat /usr/desktop/filename | cut -f3 -d'  ' | grep -v "\D"); do grep -i $i /usr/desktop/filename; done
This only prints the last row repeatedly, but does not show the rows where the 3rd column is not numeric.

The last one:
Code:

cat /usr/desktop/filename | cut -f5 -d'  ' | uniq -c | grep -v "\w*1"
This does not show anything after I press enter...

I am running this in the Mac Terminal, could this be the problem? Does this only work in linux?

Thank you sooooooo much!!!

schneidz 03-05-2010 01:48 PM

my experience is that mac doesn't have all the default command line utilities you would normally find in a posix system.
can you please post the output of:
Code:

which wc tail tr grep uniq cut cat
assuming which exists on mac.

also post the contents of the file and the exact error message you are receiving.

rweaver 03-05-2010 02:47 PM

You are replacing /usr/desktop/filename with the full path and name of the file you're attempting to examine, correct?

On a mac the syntax may be slightly different, although I've not got one handy to test on at the moment. For the first one, for example, try:

Code:

cat /usr/desktop/filename | wc -l
As the previous poster stated, paste the exact command you're using and the errors it's returning, along with the output of the which command. The mac commands tend to be more bsd-like than linux-like, so some of those probably aren't going to work, thus causing the chain of commands to fail.

rweaver 03-05-2010 02:56 PM

Quote:

Originally Posted by schneidz (Post 3887312)
can you please post the output of:
Code:

which wc tail tr grep uniq cut cat
assuming which exists on mac.

Which does exist.

aszurom 03-05-2010 03:38 PM

Are you hitting CTRL+V then TAB, or CMD+V? On osx we're so wired to replace CTRL with CMD that we do it subconsciously sometimes.

CTRL+V then TAB will insert a literal [TAB] keypress into the field. So if you're cutting and pasting his code into your terminal, you have to edit those ' ' spots to have a literal [TAB] keystroke inside of them.
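
If typing the literal tab is the sticking point, here is a small sketch of two ways around it. It assumes a POSIX-style shell; the sample file and function names are hypothetical.

```shell
# Hypothetical two-line, three-column tab-delimited sample.
sample=$(mktemp "${TMPDIR:-/tmp}/sample.XXXXXX")
printf 'a\tb\tc\nd\te\tf\n' > "$sample"

# cut already splits on tab by default, so -d can simply be omitted.
third_default() {
    cut -f3 "$1"
}

# If an explicit delimiter is wanted, let printf produce the tab character
# instead of typing it by hand.
tab=$(printf '\t')
third_explicit() {
    cut -d "$tab" -f3 "$1"
}

third_default "$sample"
third_explicit "$sample"
```

Both calls print the same third column, so nothing has to be pasted or typed between the quotes.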

Tinkster 03-05-2010 03:40 PM

My understanding of mac text-files is that they use yet another
convention for line endings than Linux/Unix or DOS do.

Code:

Linux = LF    (\n)
DOS   = CR LF (\r\n)
Mac   = CR    (\r)

Which might explain why Unix tools only see one line when you
believe there are many.

Maybe try
Code:

tr '\r' '\n' < /usr/desktop/filename | wc -l

Cheers,
Tink
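
One way to test this theory before converting anything is to count the two terminator characters separately. The sketch below assumes a file with CR-only line endings, which would also make wc -l report 0; the sample file and function names are made up.

```shell
# Hypothetical sample written with CR-only line endings (no LF anywhere).
crfile=$(mktemp "${TMPDIR:-/tmp}/crfile.XXXXXX")
printf '1\t11\ta\r2\t22\tb\r3\t33\tc\r' > "$crfile"

# Delete everything except CR (or LF) and count the bytes that are left.
count_cr() {
    tr -cd '\r' < "$1" | wc -c | tr -d ' '
}
count_lf() {
    tr -cd '\n' < "$1" | wc -c | tr -d ' '
}

# Convert CR to LF on the fly, then count rows.
rows_after_fix() {
    tr '\r' '\n' < "$1" | wc -l | tr -d ' '
}

count_cr "$crfile"
count_lf "$crfile"
rows_after_fix "$crfile"
```

For this sample the three calls print 3, 0 and 3. Note that a DOS file (CR LF) would come out with an empty line after every row if converted this way, so for those it is safer to delete the CR with tr -d '\r' instead.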

MTK358 03-05-2010 04:30 PM

I thought that Mac OS X uses Linux (Unix) newlines, because it is Unix based.

rweaver 03-05-2010 04:47 PM

In this case it really doesn't matter; it's translating tabs to newlines, not doing dos2unix or mac2dos or whatever.

frieza 03-05-2010 06:24 PM

Quote:

Originally Posted by rweaver (Post 3887549)
In this case it really doesn't matter, it's translating tabs to newlines. not dos2unix or mac2dos or whatever.

OS X yes, but not pre-OS X.

Mac OS X uses a BSD/Mach kernel, which was originally the base of the NeXTSTEP/OPENSTEP OS that was developed by Steve Jobs after he left Apple, then incorporated into Mac OS X when Apple bought out NeXT and Jobs became part of Apple again.


And it is possible that even then Apple did something custom.

Jason7449 03-05-2010 10:59 PM

Here are the outputs:
For the code
Code:

which wc tail tr grep uniq cut cat
the output is
Code:

$ which wc tail tr grep uniq cut cat
/usr/bin/wc
/usr/bin/tail
/usr/bin/tr
/usr/bin/grep
/usr/bin/uniq
/usr/bin/cut
/bin/cat

The file I used is just a simple, nonsense tab-delimited file I made up. Here it is:
Code:

1        11        a        I        ab
2        22        b        II        bc
3        33        c        III        cd
4        33        d        IV        de
5        55        e        V        ef
6        66        f        VI        fg
7        77        g        VII        gh
8        88        h        VIII        hi

Output for the first code:
Code:

0 /users/**/desktop/test.txt
, where ** is my user name.

2nd:
Code:

32
3rd:
Code:

8        88        h        VIII    hi
8        88        h        VIII    hi
8        88        h        VIII    hi
8        88        h        VIII    hi
8        88        h        VIII    hi
8        88        h        VIII    hi
8        88        h        VIII    hi
8        88        h        VIII    hi
8        88        h        VIII    hi
8        88        h        VIII    hi
8        88        h        VIII    hi
8        88        h        VIII    hi
8        88        h        VIII    hi
8        88        h        VIII    hi
8        88        h        VIII    hi
8        88        h        VIII    hi
8        88        h        VIII    hi
8        88        h        VIII    hi
8        88        h        VIII    hi
8        88        h        VIII    hi
8        88        h        VIII    hi
8        88        h        VIII    hi
8        88        h        VIII    hi
8        88        h        VIII    hi
8        88        h        VIII    hi
8        88        h        VIII    hi
8        88        h        VIII    hi
8        88        h        VIII    hi
8        88        h        VIII    hi
8        88        h        VIII    hi
8        88        h        VIII    hi
8        88        h        VIII    hi
8        88        h        VIII    hi

4th code shows nothing.

schneidz 03-05-2010 11:53 PM

most of these work for me:
Code:

[liveuser@localhost ~]$ uname -a -m -p
Linux localhost.localdomain 2.6.29.4-167.fc11.x86_64 #1 SMP Wed May 27 17:27:08 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
[liveuser@localhost ~]$ cat test.txt
1        11        a        I        ab
2        22        b        II        bc
3        33        c        III        cd
4        33        d        IV        de
5        55        e        V        ef
6        66        f        VI        fg
7        77        g        VII        gh
8        88        h        VIII        hi
[liveuser@localhost ~]$ wc -l test.txt
8 test.txt
[liveuser@localhost ~]$ tail -1 test.txt | tr '\t' '\n' | wc -l
5
[liveuser@localhost ~]$ for i in $(cat test.txt | cut -f3 -d'  ' | grep -v "\D"); do grep -i $i test.txt; done
cut: the delimiter must be a single character
Try `cut --help' for more information.
[liveuser@localhost ~]$ cat test.txt | cut -f5 -d'  ' | uniq -c | grep -v "\w*1"
cut: the delimiter must be a single character
Try `cut --help' for more information.

is this homework? if your instructor knows you are using a mac, maybe they know the correct syntax for those commands.
not sure what the last 2 commands are supposed to do; as for the original post:
Code:

[liveuser@localhost ~]$ wc -l test.txt
8 test.txt
[liveuser@localhost ~]$ awk '{print "current record number = " NR " : number of fields = " NF}' test.txt
current record number = 1 : number of fields = 5
current record number = 2 : number of fields = 5
current record number = 3 : number of fields = 5
current record number = 4 : number of fields = 5
current record number = 5 : number of fields = 5
current record number = 6 : number of fields = 5
current record number = 7 : number of fields = 5
current record number = 8 : number of fields = 5

man awk for more info (this seems like a decent page: http://wolfram.schneider.org/bsd/7th...awk/awk-1.html ).
for identifying numbers:
man test
google(regex)
and for unique columns:
man uniq

you will need to know some basic bash to assemble it all together.
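
A rough sketch of how the pieces could be assembled into one check is below. It assumes a POSIX shell and awk, treats "numeric" as unsigned integers only, and normalises CR or CR LF endings first in case the file came from an older Mac or DOS editor. The function name check_file and the sample data are hypothetical.

```shell
# Report rows, columns, non-numeric values in column 3 and repeats in column 5.
check_file() {
    clean=$(mktemp "${TMPDIR:-/tmp}/clean.XXXXXX")
    # Turn every CR into LF, then drop the empty lines a CR LF file would leave behind.
    tr '\r' '\n' < "$1" | grep -v '^$' > "$clean"
    awk -F'\t' '
        NR == 1          { cols = NF }   # column count taken from the first row
        $3 !~ /^[0-9]+$/ { bad3++ }      # 3rd field is not all digits
        seen[$5]++       { dup5++ }      # 5th field has been seen before
        END {
            print "rows=" NR
            print "cols=" cols
            print "non_numeric_col3=" (bad3 + 0)
            print "repeated_col5=" (dup5 + 0)
        }' "$clean"
    rm -f "$clean"
}

# Hypothetical sample: three rows, CR-only line endings.
macfile=$(mktemp "${TMPDIR:-/tmp}/macfile.XXXXXX")
printf '1\t11\ta\tI\tab\r2\t22\tb\tII\tbc\r3\t33\tc\tIII\tcd\r' > "$macfile"
check_file "$macfile"
```

For this sample the report is rows=3, cols=5, non_numeric_col3=3 (the third column holds letters) and repeated_col5=0. Genuinely empty lines in the input are dropped too, which is a trade-off of this simple normalisation.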

