LinuxQuestions.org
LinuxQuestions.org > Forums > Linux Forums > Linux - Newbie
Linux - Newbie: This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-tos, this is the place!

03-07-2019, 06:53 AM #1
dani1234
LQ Newbie
Registered: Sep 2014
Posts: 7
Reputation: Disabled
read third string from second line of each file


Hi,
I have 100K files and want to read the third and fourth strings of each file and store them in a single file.

See the sample file content below. From this file, I want 00002 20190228 in the output file.

Code:
VUP gfppgdht
GHT HT 00002 20190228 20190227 100122 N 0
GHT CVD JYT 20190228 359 001 1
For the next file, shown below, only 00010 20190228 should appear in the output file.
Code:
VGT HT_TXTGHP
HTR HT 00010 20190228 20190227 100122 Y 
HTR BHY JYT 20190228 359 001 1
The same goes for all 100K files, with all of the output in a single file.

Can AWK or SED do this?
 
03-07-2019, 06:56 AM #2
hydrurga
LQ Guru
Registered: Nov 2008
Location: Pictland
Distribution: Linux Mint 21 MATE
Posts: 8,048
Blog Entries: 5
Reputation: 2925
Just a note that for both these examples, according to your code, the highlighted strings are the 5th and 6th strings in their respective files, not the 3rd and 4th.
 
03-07-2019, 07:02 AM #3
joe_2000
Senior Member
Registered: Jul 2012
Location: Aachen, Germany
Distribution: Void, Debian
Posts: 1,016
Reputation: 308
There are many ways to achieve this, but when you iterate over all files,
Code:
head -n 2
will give you the top two lines of a file.
Code:
tail -n 1
can be used to obtain the bottom line of the two.
Use cut on the result to get the third and fourth strings.
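A minimal sketch of that head/tail/cut idea, assuming bash and fields separated by single spaces as in the sample files. The function name extract_with_cut is hypothetical. Because cut treats every single space as a delimiter, runs of spaces would shift the field numbers, and a file with only one line would have fields from that first line printed instead.

```shell
#!/usr/bin/env bash
# extract_with_cut is a hypothetical helper name.
# For every file given as an argument, print fields 3 and 4 of its second line.
extract_with_cut() {
    local f
    for f in "$@"; do
        # head keeps lines 1-2, tail keeps the last of those (line 2),
        # cut splits on single spaces and keeps fields 3 and 4
        head -n 2 -- "$f" | tail -n 1 | cut -d ' ' -f 3,4
    done
}

# Example usage: redirect once, after the loop, and write the result
# outside the input directory so a re-run does not pick it up.
# extract_with_cut ./* > ../output.txt
```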

@hydrurga: According to the title this is about the second line of each file.
 
1 member found this post helpful.
03-07-2019, 07:09 AM #4
hydrurga
LQ Guru
Registered: Nov 2008
Location: Pictland
Distribution: Linux Mint 21 MATE
Posts: 8,048
Blog Entries: 5
Reputation: 2925
Quote:
Originally Posted by joe_2000
@hydrurga: According to the title this is about the second line of each file.
Ah, I didn't see the title (which appears not to give the complete story, anyway). I was just going by the contents of the opening post. My bad. Thanks, Joe.
 
03-07-2019, 07:29 AM #5
dani1234
LQ Newbie
Registered: Sep 2014
Posts: 7
Original Poster
Reputation: Disabled
I just pasted the header of the file. How can we do this using AWK or SED, combined with a loop over the 100K files?
 
03-07-2019, 07:32 AM #6
hydrurga
LQ Guru
Registered: Nov 2008
Location: Pictland
Distribution: Linux Mint 21 MATE
Posts: 8,048
Blog Entries: 5
Reputation: 2925
Quote:
Originally Posted by dani1234
I just pasted the header of the file. How can we do this using AWK or SED, combined with a loop over the 100K files?
You may get someone who gives you the code straight out, but usually how it works here is that you are asked to have a go at it yourself and are then helped to get it right. So, what have you tried so far? Can you simplify the problem, for example by printing just the second line of each file to stdout? If you get that right, then you can build on it.
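Following that suggestion, a first step that only prints the second line of each file to stdout might look like this. It is a sketch: print_second_lines is a hypothetical name, and it assumes GNU sed (the `2{p;q;}` script prints line 2 and then quits, so the rest of each file is never read).

```shell
#!/usr/bin/env bash
# print_second_lines is a hypothetical helper name.
# Prints the complete second line of every file given as an argument.
print_second_lines() {
    local f
    for f in "$@"; do
        # -n: no automatic printing; on line 2: print it, then quit
        sed -n '2{p;q;}' "$f"
    done
}

# Example usage: print_second_lines ./*
```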
 
1 member found this post helpful.
03-07-2019, 07:33 AM #7
joe_2000
Senior Member
Registered: Jul 2012
Location: Aachen, Germany
Distribution: Void, Debian
Posts: 1,016
Reputation: 308
Quote:
Originally Posted by dani1234
I just pasted the header of the file. How can we do this using AWK or SED, combined with a loop over the 100K files?
Why do you insist on using awk or sed?
Did you look at my previous post?
What have you tried for yourself?
 
2 members found this post helpful.
03-07-2019, 08:08 AM #8
pan64
LQ Addict
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,784
Reputation: 7304
You can use awk easily: you only need to check the line number and print the 3rd and 4th fields of every file passed (no sed required). Obviously you can do it in sed too, though.
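One way to read that suggestion, sketched under the assumption of any POSIX awk: a single awk process reads all the files, and the per-file line counter FNR (which restarts at 1 for each file, unlike NR, which keeps counting across files) selects the second line of each. awk_fields is a hypothetical name.

```shell
#!/usr/bin/env bash
# awk_fields is a hypothetical helper name.
awk_fields() {
    # Without file arguments awk would wait on stdin, so return early.
    [ "$#" -gt 0 ] || return 0
    # FNR == 2 is true on the second line of each file; NR == 2 would
    # only match the second line of the first file.
    awk 'FNR == 2 { print $3, $4 }' "$@"
}

# Example usage: awk_fields ./* > ../output.txt
```

Unlike cut, awk splits fields on runs of blanks by default, so extra spaces between fields do not shift the numbering.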
 
2 members found this post helpful.
03-07-2019, 08:15 AM #9
rtmistler
Moderator
Registered: Mar 2011
Location: USA
Distribution: MINT Debian, Angstrom, SUSE, Ubuntu, Debian
Posts: 9,879
Blog Entries: 13
Reputation: 4930
@dani1234,

Given your span of time on the LQ site, I feel you should understand how LQ works and how best to pose a technical question. Asking for others to provide you with code is not the best way to proceed. Please review the site guidelines about how to ask questions and offer some additional background for your question, such as illustrating what you have tried already to solve your problem.
 
03-07-2019, 03:59 PM #10
teckk
LQ Guru
Registered: Oct 2004
Distribution: Arch
Posts: 5,132
Blog Entries: 6
Reputation: 1826
Code:
var="VUP gfppgdht
GHT HT 00002 20190228 20190227 100122 N 0
GHT CVD JYT 20190228 359 001 1"

awk 'FNR==2{print $3, $4}' <<< "$var"

sed -sn 2p <<< "$var" | cut -d " " -f3,4

head -n2 <<<"$var" | tail -n1 | cut -d " " -f3,4
Code:
#array of file names
list=(file1 file2 file3)

#loop through them and do something
for i in "${list[@]}"; do
    awk 'FNR==2{print $3, $4}' "$i"
done
 
03-08-2019, 12:59 AM #11
pan64
LQ Addict
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,784
Reputation: 7304
Quote:
Originally Posted by teckk
Code:
sed -sn 2p <<< "$var" | cut -d " " -f3,4

head -n2 <<<"$var" | tail -n1 | cut -d " " -f3,4
It would be better to solve this with a single awk or sed command; there is no need for pipes.
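A sketch of the single-sed version, with no pipe into cut. It assumes GNU sed, whose -s option makes line addresses restart for every input file, and fields separated by single spaces; sed_fields is a hypothetical name.

```shell
#!/usr/bin/env bash
# sed_fields is a hypothetical helper name. Requires GNU sed (-s).
sed_fields() {
    [ "$#" -gt 0 ] || return 0   # avoid sed waiting on stdin
    # -s: treat files separately, so address 2 is line 2 of each file
    # -n: print nothing by default; the p flag prints only when the
    #     substitution matched. The two \( \) groups capture fields 3 and 4.
    sed -s -n '2s/^[^ ]* [^ ]* \([^ ]*\) \([^ ]*\).*/\1 \2/p' "$@"
}

# Example usage: sed_fields ./* > ../output.txt
```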
Quote:
Originally Posted by teckk
Code:
#array of file names
list=(file1 file2 file3)

#loop through them and do something
for i in "${list[@]}"; do
    awk 'FNR==2{print $3, $4}' "$i"
done
Actually, the for loop is superfluous here:
Code:
awk 'FNR==2{print $3, $4}' "${list[@]}"
should do the same, as long as we don't hit an "Argument list too long" error.
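To avoid that error with 100K files altogether, find can hand the names to awk in batches. This is a sketch assuming find with `-exec ... {} +` (POSIX) and an awk that supports nextfile (gawk, BSD awk and mawk 1.3.4 do); on an awk without it, drop `; nextfile` and the output stays the same, awk just reads each file to the end. find_fields and /path/to/files are hypothetical. nextfile is used instead of exit because one awk process handles many files here, and exit would stop it after the first file. The output follows find's traversal order, which is not necessarily alphabetical.

```shell
#!/usr/bin/env bash
# find_fields is a hypothetical helper name.
find_fields() {
    # "{} +" makes find pack as many names into each awk call as the
    # system allows and start further awk calls as needed, so the
    # argument list never gets too long.
    # nextfile skips the rest of the current file once line 2 is printed;
    # unlike exit, it lets awk go on with the remaining files.
    find "${1:-.}" -type f -exec awk 'FNR == 2 { print $3, $4; nextfile }' {} +
}

# Example usage (hypothetical path; keep the output file outside it):
# find_fields /path/to/files > ~/output.txt
```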
 
03-08-2019, 01:56 AM #12
joe_2000
Senior Member
Registered: Jul 2012
Location: Aachen, Germany
Distribution: Void, Debian
Posts: 1,016
Reputation: 308
Building on pan64's excellent comments:

If you are going to use a "for" loop, it's better to do it this way:
Code:
for filename in ./*; do
    # do stuff such as
    echo "$filename"
done
The advantage is that it is simpler and doesn't break when filenames contain whitespace.
If you don't have all files in one directory, use find.
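If you still want a loop, one way to combine it with find is sketched below. It assumes bash (read -d '' and process substitution are bash features) and a find that supports -print0 (GNU and BSD find do); loop_with_find and /path/to/files are hypothetical. NUL-separated names survive spaces and even newlines in filenames.

```shell
#!/usr/bin/env bash
# loop_with_find is a hypothetical helper name. Requires bash.
loop_with_find() {
    local f
    # find -print0 ends each name with a NUL byte, which cannot appear in a
    # file name; read -d '' splits on exactly that byte.
    while IFS= read -r -d '' f; do
        # exit: stop reading this file once line 2 has been handled
        awk 'FNR == 2 { print $3, $4; exit }' "$f"
    done < <(find "${1:-.}" -type f -print0)
}

# Example usage: loop_with_find /path/to/files > ~/output.txt
```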
 
03-08-2019, 06:48 AM #13
allend
LQ 5k Club
Registered: Oct 2003
Location: Melbourne
Distribution: Slackware64-15.0
Posts: 6,367
Reputation: 2747
If you use a bash shell and have files more than one subdirectory deep, you can set the globstar shell option. When wrapped in parentheses, the commands are run in a subshell, so that the parent shell stays the same when the subshell exits.
Code:
(shopt -s globstar; for f in **/*; do awk 'FNR==2{print $3,$4;exit}' "$f"; done)
I prefer this construction as it will handle filenames containing spaces, as noted above.
I have added an exit statement to the awk for efficiency, which might make a measurable difference for 100K files.
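A variant of the globstar loop, sketched as a function: it also skips directories (which **/* matches as well) and prefixes each name with ./ so that names like -x or a=b are not taken by awk as an option or a variable assignment. It assumes bash 4 or later; globstar_fields and /path/to/files are hypothetical.

```shell
#!/usr/bin/env bash
# globstar_fields is a hypothetical helper name. Requires bash 4+.
globstar_fields() {
    (   # subshell: shopt and cd do not affect the calling shell
        shopt -s globstar nullglob
        cd -- "${1:-.}" || exit 1
        for f in **/*; do
            # **/* also matches directories; only regular files go to awk
            [[ -f $f ]] || continue
            awk 'FNR == 2 { print $3, $4; exit }' "./$f"
        done
    )
}

# Example usage: globstar_fields /path/to/files > ~/output.txt
```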

Last edited by allend; 03-08-2019 at 06:50 AM.
 
  


All times are GMT -5.
