LinuxQuestions.org
Old 10-15-2018, 04:57 AM   #1
iammike2
LQ Newbie
 
Registered: Oct 2018
Distribution: Synology DSM
Posts: 28

Rep: Reputation: 1
Shell program how to find a string in a file line by line


Guys,


question please


I have a long text file with many lines, and I want to loop through the file and find a "string of text" in it.



For example this line



<a href="http://website.com/happy.php?id=12345"><strong>aassasaSaaassajjas as sajsan asjasjjasas</strong></a>


I want to find: <a href="http://website.com/happy.php?id=



After it has been found, search in the same line for this <strong>



After they have been found the search should continue for other occurrences.



Any pointers on how to do that ??


Thx in advance
 
Old 10-15-2018, 05:01 AM   #2
l0f4r0
Member
 
Registered: Jul 2018
Location: Paris
Distribution: Debian
Posts: 854

Rep: Reputation: 286
Have you investigated around the grep or sed commands?

Last edited by l0f4r0; 10-15-2018 at 05:04 AM. Reason: sed suggestion as well
 
Old 10-15-2018, 05:07 AM   #3
iammike2
LQ Newbie
 
Registered: Oct 2018
Distribution: Synology DSM
Posts: 28

Original Poster
Rep: Reputation: 1
Yep, but it doesn't make sense to me!

I used it with | (pipe), but I was also struggling to match the " characters (chr(34)) that are part of the string to search for.

In VB.net I use

Code:
if instr(string to search, string to find)
but in BASH it doesn't seem to work

Not much of a BASH programmer I admit
 
Old 10-15-2018, 05:16 AM   #4
iammike2
LQ Newbie
 
Registered: Oct 2018
Distribution: Synology DSM
Posts: 28

Original Poster
Rep: Reputation: 1
Code:
FILE=$filename

while read line; do
	if echo $line | grep $startstring
then
	if echo $line | grep $andstring
	then
       echo $LINE
	fi
fi
done < $FILE
This also shows lines that have $startstring but not $andstring. No idea why.

By my logic, it should only print lines that have BOTH startstring and andstring, so my logic must be wrong somewhere.

Edit: Solved it I guess, put in -q in the first GREP
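
For anyone following along, here is a minimal sketch of that loop with -q on both greps, so that only the final print produces output. It also uses $line consistently: shell variable names are case sensitive, so $LINE in the version above is a different (empty) variable. The sample file contents and the two search strings are made up for illustration.

```shell
# Sample input; the contents are made up for illustration.
file=$(mktemp)
cat > "$file" <<'EOF'
<a href="http://website.com/happy.php?id=12345"><strong>first title</strong></a>
<a href="http://website.com/happy.php?id=67890">no strong tag here</a>
<p>unrelated line</p>
EOF

startstring='<a href="http://website.com/happy.php?id='
andstring='<strong>'

# -q keeps grep silent, so only the explicit printf prints a line.
# -F treats the search strings as fixed text, not as regular expressions.
result=$(
    while IFS= read -r line; do
        if printf '%s\n' "$line" | grep -qF -- "$startstring"; then
            if printf '%s\n' "$line" | grep -qF -- "$andstring"; then
                printf '%s\n' "$line"
            fi
        fi
    done < "$file"
)
printf '%s\n' "$result"
rm -f "$file"
```

Quoting "$line" and the search strings matters here, because the strings contain spaces and double quotes.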

Last edited by iammike2; 10-15-2018 at 05:19 AM.
 
Old 10-15-2018, 05:20 AM   #5
l0f4r0
Member
 
Registered: Jul 2018
Location: Paris
Distribution: Debian
Posts: 854

Rep: Reputation: 286
Quote:
Originally Posted by iammike2
Yep, but it doesn't make sense to me!
I used it with | (pipe), but I was also struggling to match the " characters (chr(34)) that are part of the string to search for.
In VB.net I use
Code:
if instr(string to search, string to find)
but in BASH it doesn't seem to work
If you want to do it with bash, I think you need to familiarize yourself with it first (syntax, use cases, basic commands...). There are many tutorials on the Internet; just try googling around.
Regarding grep specifically, you can get more details on how it works (as with most commands) by entering:
Code:
man grep
 
Old 10-15-2018, 05:24 AM   #6
iammike2
LQ Newbie
 
Registered: Oct 2018
Distribution: Synology DSM
Posts: 28

Original Poster
Rep: Reputation: 1
Thanks @l0f4r0

But I think I already made a start. See Post #4
 
Old 10-15-2018, 05:24 AM   #7
l0f4r0
Member
 
Registered: Jul 2018
Location: Paris
Distribution: Debian
Posts: 854

Rep: Reputation: 286
Quote:
Originally Posted by iammike2
Code:
FILE=$filename

while read line; do
	if echo $line | grep $startstring
then
	if echo $line | grep $andstring
	then
       echo $LINE
	fi
fi
done < $FILE
You are complicating things.
You can use grep straight from the command line without a loop like "while read line": grep works line by line by design, and what you are trying to achieve is not complex enough to require a custom script!
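
As a rough sketch of what that could look like (the sample file and its contents are made up for illustration), piping one grep into another keeps only the lines that contain both strings:

```shell
# Sample file; the contents are made up for illustration.
file=$(mktemp)
cat > "$file" <<'EOF'
<a href="http://website.com/happy.php?id=12345"><strong>first title</strong></a>
<a href="http://website.com/happy.php?id=67890">no strong tag here</a>
<p><strong>bold text, but no link</strong></p>
EOF

# Single quotes keep the embedded double quotes literal.
# -F searches for fixed text, so ? and . are not treated as regex characters.
both=$(grep -F -- '<a href="http://website.com/happy.php?id=' "$file" | grep -F -- '<strong>')
printf '%s\n' "$both"
rm -f "$file"
```

The first grep reads the whole file by itself, so no "while read line" loop is needed.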
 
Old 10-15-2018, 05:28 AM   #8
l0f4r0
Member
 
Registered: Jul 2018
Location: Paris
Distribution: Debian
Posts: 854

Rep: Reputation: 286
Advice: solve the problem step by step. First, search for the "a href" strings or the URL syntax.
After that, search for the "<strong>" strings.
Post your attempts here and we'll try to help you find your way.
 
Old 10-15-2018, 05:33 AM   #9
iammike2
LQ Newbie
 
Registered: Oct 2018
Distribution: Synology DSM
Posts: 28

Original Poster
Rep: Reputation: 1
Thx !!

The reason I use a loop is that there are more occurrences and more strings to be found in the SAME file, so the script should run from the top to the bottom of the file.

So, let's assume there are four occurrences where all the strings are found.

After all those strings have been found, I need to clean up those lines.

But that is something for the future.

Thx again, really appreciated.


Edit: By clean up I mean:

Start is this: <a href="http://website.com/happy.php?id=12345"><strong>aassasaSaaassajjas as sajsan asjasjjasas</strong></a>

End result is this
1 - 12345
2 - aassasaSaaassajjas as sajsan asjasjjasas
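
A minimal sketch of that clean-up step with sed, assuming every matching line has exactly the shape of the example above (one link, with the id made of digits only). The variable names are hypothetical.

```shell
line='<a href="http://website.com/happy.php?id=12345"><strong>aassasaSaaassajjas as sajsan asjasjjasas</strong></a>'

# \(...\) captures the id digits and the text between the strong tags.
# The backslash at the end of the first script line puts a newline in the output.
# -n plus the p flag prints only lines where the substitution happened.
cleaned=$(printf '%s\n' "$line" | sed -n 's|.*<a href="http://website\.com/happy\.php?id=\([0-9]*\)"><strong>\(.*\)</strong></a>.*|1 - \1\
2 - \2|p')
printf '%s\n' "$cleaned"
```

The same sed command can read the file directly instead of a single line, in which case non-matching lines are simply skipped.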

Last edited by iammike2; 10-15-2018 at 05:35 AM.
 
Old 10-15-2018, 05:35 AM   #10
pan64
LQ Guru
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 13,073

Rep: Reputation: 4136
Yes, it is a good start. I would suggest reading some tutorials/examples on the net about using grep.
You will need something like this:
Code:
grep 'pattern1.*pattern2' filename
(yes, one single line would be sufficient)
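
Filled in with the strings from the first post, that pattern might look like the sketch below. The sample file contents are made up; the second line is there to show that the order of the two strings matters with this form.

```shell
# Sample file; the contents are made up for illustration.
file=$(mktemp)
cat > "$file" <<'EOF'
<a href="http://website.com/happy.php?id=12345"><strong>first title</strong></a>
<strong>bold first</strong> <a href="http://website.com/happy.php?id=222">plain link</a>
<a href="http://website.com/happy.php?id=67890">no strong tag here</a>
EOF

# Single quotes protect the double quotes inside the pattern.
# Dots are escaped so they match literally; in grep's default (basic) syntax
# an unescaped ? is already an ordinary character.
ordered=$(grep '<a href="http://website\.com/happy\.php?id=.*<strong>' "$file")
printf '%s\n' "$ordered"
rm -f "$file"
```

Only the first line is printed: the second has both strings but in the wrong order, and the third has no strong tag.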
 
Old 10-15-2018, 05:35 AM   #11
Turbocapitalist
Senior Member
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 4,166
Blog Entries: 3

Rep: Reputation: 2061
Quote:
Originally Posted by iammike2
Any pointers on how to do that ??
Yes. In order to save yourself many hours or days of ineffectual efforts, please look into using an HTML parser. If the HTML is actually well-formed XML then you have a wider range of options and could use any of a good number of XML parsers. But if you are dealing with messy HTML, then my generic recommendation is perl 5 with either HTML::Parser or HTML::TokeParser from CPAN, or else run the document through HTML Tidy first to error correct it and convert it to XML.

However, there are many standalone applications that work just fine with XPath, such as xmlstarlet:

Code:
xmlstarlet select -N x=http://www.w3.org/1999/xhtml \
        -t -m "//x:a[@href[contains(.,'happy.php')]]/x:strong" -c . -n x.xml \
        | sed -e 's/ xmlns="[^"]*"//'
See https://www.w3.org/TR/2011/WD-html5-...amespaces.html
 
Old 10-15-2018, 05:39 AM   #12
iammike2
LQ Newbie
 
Registered: Oct 2018
Distribution: Synology DSM
Posts: 28

Original Poster
Rep: Reputation: 1
Quote:
Originally Posted by Turbocapitalist
Yes. In order to save yourself many hours or days of ineffectual efforts, please look into using an HTML parser. If the HTML is actually well-formed XML then you have a wider range of options and could use any of a good number of XML parsers. But if you are dealing with messy HTML, then my generic recommendation is perl 5 with either HTML::Parser or HTML::TokeParser from CPAN, or else run the document through HTML Tidy first to error correct it and convert it to XML.

However, there are many standalone applications that work just fine with XPath, such as xmlstarlet:

Code:
xmlstarlet select -N x=http://www.w3.org/1999/xhtml \
        -t -m "//x:a[@href[contains(.,'happy.php')]]/x:strong" -c . -n x.xml \
        | sed -e 's/ xmlns="[^"]*"//'
See https://www.w3.org/TR/2011/WD-html5-...amespaces.html

Thx for the reply, but unfortunately my distro (Synology DSM) doesn't have that, so I have to work with the standard stuff (awk, grep, etc.).


PS: No idea how to install extra stuff on there, and I'd rather not break it.
 
Old 10-15-2018, 05:44 AM   #13
iammike2
LQ Newbie
 
Registered: Oct 2018
Distribution: Synology DSM
Posts: 28

Original Poster
Rep: Reputation: 1
To make it one step harder: finding the next string.

Would something like this work ??

Code:
while read line; do
	if echo $line | grep -q $startstring
then
	if echo $line | grep -q $andstring
	then
	until echo $line | grep $stopstring
	do
       echo $LINE
	done
	fi
	
fi
done < $FILE
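
A note on the loop above: $line never changes inside the until loop, so if the current line does not contain $stopstring the loop never ends. One possible way around that is to keep reading in the outer loop and remember in a flag whether we are between a start line and a stop line. This is only a sketch; the sample data, the '<hr>' stop string and the variable name printing are made up for illustration.

```shell
# Sample input; the contents are made up for illustration.
file=$(mktemp)
cat > "$file" <<'EOF'
<p>intro</p>
<a href="http://website.com/happy.php?id=12345"><strong>first title</strong></a>
<p>detail one</p>
<hr>
<p>skipped</p>
EOF

startstring='happy.php?id='
andstring='<strong>'
stopstring='<hr>'

printing=0   # 1 while we are between a start line and the stop line
result=$(
    while IFS= read -r line; do
        if [ "$printing" -eq 1 ]; then
            case $line in
                *"$stopstring"*) printing=0 ;;           # stop line: stop printing
                *) printf '%s\n' "$line" ;;
            esac
        else
            case $line in
                # quoted variables inside a case pattern are matched literally
                *"$startstring"*"$andstring"*) printing=1; printf '%s\n' "$line" ;;
            esac
        fi
    done < "$file"
)
printf '%s\n' "$result"
rm -f "$file"
```

Using case for the substring test also avoids starting a grep process for every line.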
 
Old 10-15-2018, 05:48 AM   #14
iammike2
LQ Newbie
 
Registered: Oct 2018
Distribution: Synology DSM
Posts: 28

Original Poster
Rep: Reputation: 1
Quote:
Originally Posted by pan64
Yes, it is a good start. I would suggest reading some tutorials/examples on the net about using grep.
You will need something like this:
Code:
grep 'pattern1.*pattern2' filename
(yes, one single line would be sufficient)

Thx, but unfortunately I need to find more strings in the same file.


If string 1 has been found, search for string 2.
If string 2 has been found, search for string 3 until you hit <stop string>,


then search again for a new occurrence of string 1
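
The steps above could be sketched with awk, which is part of the standard tool set. awk's index() does a plain substring search, much like InStr in VB.net. The sample data, the 'price:' string 3 and the '<hr>' stop string are made up, and the sketch assumes strings 2 and 3 may sit on the same line as string 1 or on a later one.

```shell
# Sample input; the contents are made up for illustration.
file=$(mktemp)
cat > "$file" <<'EOF'
<a href="http://website.com/happy.php?id=12345"><strong>first title</strong></a>
<p>colour: red</p>
<p>price: 10</p>
<hr>
<p>price: 99</p>
<a href="http://website.com/happy.php?id=67890"><strong>second title</strong></a>
<p>price: 20</p>
<hr>
EOF

found=$(awk -v s1='happy.php?id=' -v s2='<strong>' -v s3='price:' -v stop='<hr>' '
    BEGIN { state = 0 }
    index($0, stop)             { state = 0; next }  # stop string: start over
    state == 0 && index($0, s1) { state = 1 }        # string 1 found
    state == 1 && index($0, s2) { state = 2 }        # string 2 found
    state == 2 && index($0, s3) { print }            # string 3 found
' "$file")
printf '%s\n' "$found"
rm -f "$file"
```

The "price: 99" line is not printed, because at that point the search is back to waiting for string 1.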



This is just a hobby project I am trying to put on my NAS, and it keeps my brain busy (hahahahahaah)
 
Old 10-15-2018, 05:51 AM   #15
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 18,133

Rep: Reputation: 2925
bash (the shell you are probably using) supports regex matching without having to call external programs like grep; it uses the "=~" operator. Something like this:
Code:
if [[ $line =~ $startstring.*$stopstring ]] ...
That will find both (in order) in the same line separated by anything (or nothing). Note that the regex engine is not the same as .NET, so there may be a slight learning curve depending on complexity.
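
As a small sketch of how that could be taken one step further: bash stores whatever the parenthesised groups matched in the BASH_REMATCH array, so the id and the text can be picked out in the same test. This needs bash itself (not a plain sh), and the sample line and the variable names re, id and text are made up for illustration.

```shell
#!/usr/bin/env bash
line='<a href="http://website.com/happy.php?id=12345"><strong>some title</strong></a>'

# Keeping the pattern in a variable and leaving it unquoted on the right of =~
# makes bash treat it as an extended regular expression.
# \. and \? match a literal dot and question mark.
re='<a href="http://website\.com/happy\.php\?id=([0-9]+)"><strong>(.*)</strong></a>'

if [[ $line =~ $re ]]; then
    id=${BASH_REMATCH[1]}     # first (...) group: the digits after id=
    text=${BASH_REMATCH[2]}   # second (...) group: text between the strong tags
    printf '1 - %s\n2 - %s\n' "$id" "$text"
fi
```

Note that a search string containing ? or . would also be read as a regex when used unquoted after =~, so such characters need escaping as in re above.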

Edit: Ahhh - posts crossed; that introduces a different wrinkle.

Last edited by syg00; 10-15-2018 at 05:58 AM. Reason: typos
 
All times are GMT -5.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Facebook: linuxquestions Google+: linuxquestions
Open Source Consulting | Domain Registration