LinuxQuestions.org
Old 03-26-2014, 01:59 PM   #1
oraenthu@live.com
LQ Newbie
 
Registered: Sep 2012
Posts: 26

Rep: Reputation: Disabled
head and tail don't seem to work


Hi,

I have a 6G file.
Code:
2.6.32-400.21.1.el5uek #1 SMP Wed Feb 20 01:35:01 PST 2013 x86_64 x86_64 x86_64 GNU/Linux

file insert.txt 
insert.txt: UTF-8 Unicode English text, with very long lines, with CR, LF line terminators
Code:
head -10 insert.txt
does not show 10 lines, instead it shows the whole file.

Your thoughts please
Thank you.
 
Old 03-26-2014, 02:12 PM   #2
rtmistler
Moderator
 
Registered: Mar 2011
Location: Sutton, MA. USA
Distribution: MINT Debian, Angstrom, SUSE, Ubuntu
Posts: 4,087
Blog Entries: 10

Rep: Reputation: 1521
The proper syntax is:
Code:
head -n 10 insert.txt
What is your complaint with tail? The same? Use the -n argument there as well; it is equivalent to --lines=10, but I prefer the briefer form.
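For readers following along: as far as I know, GNU head still accepts the legacy -NUM form as well as -n NUM, so the option syntax alone probably does not explain the symptom. A minimal sketch to check this on your own system, using a throwaway file (the path is hypothetical, not the poster's insert.txt):

```shell
# Throwaway 20-line file; the path is hypothetical, not the poster's insert.txt.
sample=$(mktemp)
seq 1 20 > "$sample"

old_form=$(head -10 "$sample")     # legacy -NUM form
new_form=$(head -n 10 "$sample")   # -n NUM form

# Compare what the two invocations printed.
[ "$old_form" = "$new_form" ] && echo "both forms print the same 10 lines"
rm -f "$sample"
```

If the two forms agree on an ordinary file, the problem is more likely in the input file than in the command.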
 
Old 03-26-2014, 02:17 PM   #3
oraenthu@live.com
LQ Newbie
 
Registered: Sep 2012
Posts: 26

Original Poster
Rep: Reputation: Disabled
That does not seem to work either.
The head command will be sufficient if it works; I have no complaints about, or need for, tail.
 
Old 03-26-2014, 02:25 PM   #4
rtmistler
Moderator
 
Registered: Mar 2011
Location: Sutton, MA. USA
Distribution: MINT Debian, Angstrom, SUSE, Ubuntu
Posts: 4,087
Blog Entries: 10

Rep: Reputation: 1521
Do you have something crazy like BusyBox that is giving you a lousy version of head and tail?

Code:
$ which head
/usr/bin/head
$ ls -l /usr/bin/head
-rwxr-xr-x 1 root root 42600 2010-09-21 14:33 /usr/bin/head
$ head --version
head (GNU coreutils) 7.4
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by David MacKenzie and Jim Meyering.
If head is a symbolic link pointing to busybox, then that's your problem, and you can try to install the real head command. If you have the real command, see what version it is. I just checked on a file longer than 10 lines, and the command form I suggested:
Code:
head -n 10 test-file.c
works perfectly fine.
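The symlink check described above can be sketched in a few lines. This assumes a readlink that supports -f (GNU coreutils and BusyBox both do, as far as I know); the variable names are only for illustration:

```shell
# Find the 'head' the shell would run, then follow any symlinks to the real file.
head_path=$(command -v head)
resolved=$(readlink -f "$head_path")
echo "head is at:  $head_path"
echo "resolves to: $resolved"

# If the link target is a busybox binary, 'head' is a BusyBox applet, not coreutils.
case "$(basename "$resolved")" in
  busybox) echo "head is provided by BusyBox" ;;
  *)       echo "head does not resolve to a busybox binary" ;;
esac
```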
 
Old 03-26-2014, 02:47 PM   #5
oraenthu@live.com
LQ Newbie
 
Registered: Sep 2012
Posts: 26

Original Poster
Rep: Reputation: Disabled
Code:
head -3 marketing_campaign.ctl 
LOAD DATA
CHARACTERSET UTF16
INFILE '/u02/Elq/Control_data.txt'
The head command works fine with any other file.

Code:
head (GNU coreutils) 5.97
Copyright (C) 2006 Free Software Foundation, Inc.
This is free software.  You may redistribute copies of it under the terms of
the GNU General Public License <http://www.gnu.org/licenses/gpl.html>.
There is NO WARRANTY, to the extent permitted by law.

Written by David MacKenzie and Jim Meyering.
This has something to do with how unusual the file is.
Does
Code:
head
work with Control-M (^M) as a line ending? That's what I see when I run 'cat -v'.
It is a Mac OS X file, if that helps.
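To illustrate the question: head and wc -l count line feeds (LF), so a file whose "lines" end only in carriage returns (^M) looks like one very long line. A small sketch with made-up data, not the poster's file:

```shell
# Three "lines" that end in a bare carriage return (^M), with no line feed at all.
cr_only=$(mktemp)
printf 'one\rtwo\rthree\r' > "$cr_only"

# wc -l counts line feeds only, so it finds none in this file.
lf_count=$(wc -l < "$cr_only")
echo "line feeds found: $lf_count"

# head -n 1 therefore passes everything through; cat -v makes the ^M visible.
shown=$(head -n 1 "$cr_only" | cat -v)
echo "head -n 1 printed: $shown"

# od -c shows the raw bytes, which helps tell \r, \n and \r\n apart.
od -c "$cr_only" | head -n 2
```

On a multi-gigabyte file, piping the first few kilobytes through od (for example with head -c) is a cheaper way to look at the line endings than opening the whole file.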

Last edited by oraenthu@live.com; 03-26-2014 at 02:56 PM.
 
Old 03-26-2014, 02:54 PM   #6
rtmistler
Moderator
 
Registered: Mar 2011
Location: Sutton, MA. USA
Distribution: MINT Debian, Angstrom, SUSE, Ubuntu
Posts: 4,087
Blog Entries: 10

Rep: Reputation: 1521
Hmm ... yeah. Run dos2unix on that. Work on a copy if you're concerned about corrupting or losing the file. Not sure, but it's something to consider. Lousy DOS file types!
 
Old 03-26-2014, 03:06 PM   #7
oraenthu@live.com
LQ Newbie
 
Registered: Sep 2012
Posts: 26

Original Poster
Rep: Reputation: Disabled
Code:
dos2unix insert.txt updated.txt
dos2unix: converting file insert.txt to UNIX format ...
dos2unix: converting file updated.txt to UNIX format ...
dos2unix: problems converting file updated.txt

The dos2unix command has almost never been useful to me, probably because I have never encountered the niche purpose it serves.

The last time I got a file with weird @s and @A control characters I used iconv to convert it from UTF-16 to UTF-8.
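For reference, a minimal sketch of that kind of iconv conversion. It assumes the stray characters were NUL bytes, which cat -v prints as ^@ and which are typical of UTF-16 text read as single bytes; the file names are temporary stand-ins:

```shell
# Temporary stand-in files; none of these names come from the thread.
utf8_in=$(mktemp); utf16=$(mktemp); utf8_out=$(mktemp)
printf 'hello\n' > "$utf8_in"

# Make a UTF-16 (little-endian) copy; every ASCII character gains a NUL byte.
iconv -f UTF-8 -t UTF-16LE "$utf8_in" > "$utf16"
cat -v "$utf16"; echo    # the NUL bytes show up as ^@

# Convert back to UTF-8, as described in the post.
iconv -f UTF-16LE -t UTF-8 "$utf16" > "$utf8_out"
roundtrip=$(cat "$utf8_out")
echo "$roundtrip"
```

Note that this changes the character encoding only; it does nothing about carriage returns.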

This is not working. Head command is not working.

Last edited by oraenthu@live.com; 03-26-2014 at 03:10 PM.
 
Old 03-26-2014, 03:12 PM   #8
jpollard
Senior Member
 
Registered: Dec 2012
Location: Washington DC area
Distribution: Fedora, CentOS, Slackware
Posts: 4,602

Rep: Reputation: 1241
That almost seems like what can happen when converting a Microsoft Word formatted file to text.

Only paragraphs have line terminators... and even then, it is possible to get the entire file as a single line...
 
Old 03-26-2014, 03:25 PM   #9
rtmistler
Moderator
 
Registered: Mar 2011
Location: Sutton, MA. USA
Distribution: MINT Debian, Angstrom, SUSE, Ubuntu
Posts: 4,087
Blog Entries: 10

Rep: Reputation: 1521
Any chance you use GNU Emacs? In there I can see the ^M characters. I have to copy one because they're untypeable, or at least I don't know the key sequence to enter them; if I set the mark and move with the right arrow, the two-character caret-M sequence is traversed by a single right arrow. Either way, in Emacs I can select one of those, copy it, and then run a global replace-string command, pasting the copied text at the first prompt (what to replace) and entering nothing at the second prompt (what to replace it with).

Unfortunately it's not much of a bother to me. I run into it when someone edits a small configuration file (20 lines or so) using WordPad or Notepad (I forget which one causes the change) and then saves it. My parsing code was designed to cope with both types of line terminator, so I don't bother to fix it, although I may if I'm viewing the file in Emacs and the line endings are a nuisance.

Sounds like a big file, so GNU Emacs will likely gripe and warn you about the size, but ultimately let you open it. I don't know if gedit shows the different line terminators.
 
Old 03-26-2014, 03:26 PM   #10
oraenthu@live.com
LQ Newbie
 
Registered: Sep 2012
Posts: 26

Original Poster
Rep: Reputation: Disabled
Please refer to the post below.

Last edited by oraenthu@live.com; 03-26-2014 at 03:31 PM.
 
Old 03-26-2014, 03:28 PM   #11
oraenthu@live.com
LQ Newbie
 
Registered: Sep 2012
Posts: 26

Original Poster
Rep: Reputation: Disabled
Here is the actual file excerpt
Code:
edited out
This file has only ^M control characters that I want to get rid of.
I don't mind their presence if they let 'head' work.

Last edited by oraenthu@live.com; 03-26-2014 at 04:06 PM.
 
Old 03-26-2014, 03:30 PM   #12
suicidaleggroll
LQ Guru
 
Registered: Nov 2010
Location: Colorado
Distribution: OpenSUSE, CentOS
Posts: 5,258

Rep: Reputation: 1947
Quote:
Originally Posted by oraenthu@live.com
The dos2unix command has almost never been useful to me, probably because I have never encountered the niche purpose it serves.
The "niche" purpose it serves is to convert from DOS line terminators to UNIX line terminators. If you never deal with Windows, or text files from Windows users, then you likely have never needed it. Personally, I use it all the time, and have never encountered an error with it. It does what it's supposed to do very quickly and easily. Your file must not use standard DOS line terminators either, since dos2unix failed. What is the output of the "file" command?

Last edited by suicidaleggroll; 03-26-2014 at 03:31 PM.
 
Old 03-26-2014, 03:31 PM   #13
rtmistler
Moderator
 
Registered: Mar 2011
Location: Sutton, MA. USA
Distribution: MINT Debian, Angstrom, SUSE, Ubuntu
Posts: 4,087
Blog Entries: 10

Rep: Reputation: 1521
Same recommendation: figure out a way to edit the file and remove the ^M sequences.

I recommend you edit your post and remove that quoted file section; it has phone numbers, emails, names, and contact info in it.
 
Old 03-26-2014, 03:40 PM   #14
szboardstretcher
Senior Member
 
Registered: Aug 2006
Location: Detroit, MI
Distribution: GNU/Linux systemd
Posts: 3,774
Blog Entries: 1

Rep: Reputation: 1339
You can use head/tail with the -r/-R parameter to show raw control characters.

edit: well, there used to be a -r switch anyway. Meant raw-characters.

edit: workaround...

Code:
cat -v somefile | head 

or

cat -v somefile | tail

Last edited by szboardstretcher; 03-26-2014 at 03:45 PM.
 
1 members found this post helpful.
Old 03-26-2014, 03:53 PM   #15
oraenthu@live.com
LQ Newbie
 
Registered: Sep 2012
Posts: 26

Original Poster
Rep: Reputation: Disabled
The emails and phone numbers have been changed.
The _changed in each email is something I added after manually removing a random number of letters.
The same goes for the phone numbers.
The output of the "file" command is in the first post.
Pasting it here again:
insert.txt: UTF-8 Unicode English text, with very long lines, with CR, LF line terminators
I have come up with cat -v filename > file2name
It is not a utility, just a workaround that will work for now.
The output looks okay, though.
"wc -l" is the same for the output file and the input file.
Code:
mac2unix
helps
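For anyone without mac2unix installed, the core of a classic Mac to Unix conversion (turning each CR into an LF) can be approximated with tr. This is a sketch under the assumption that the records are separated by bare carriage returns; the data and file names are made up:

```shell
# Hypothetical stand-in for insert.txt: five records separated by bare CRs.
src=$(mktemp)
dst=$(mktemp)
printf 'r1\rr2\rr3\rr4\rr5\r' > "$src"

# Translate each carriage return into a line feed, writing a new file
# so that the original stays untouched.
tr '\r' '\n' < "$src" > "$dst"

# head can now find the line boundaries.
converted_lines=$(wc -l < "$dst")
first_two=$(head -n 2 "$dst")
echo "lines after conversion: $converted_lines"
printf '%s\n' "$first_two"
```

One caveat: if the file also contains CR+LF pairs, this plain tr turns each pair into two line feeds and so introduces blank lines, which is worth checking on a small sample before converting a multi-gigabyte file.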

Last edited by oraenthu@live.com; 03-27-2014 at 01:41 PM.
 
  


All times are GMT -5.
