LinuxQuestions.org
Linux - Newbie This Linux forum is for members that are new to Linux.
Old 10-15-2018, 05:55 AM   #16
l0f4r0
Member
 
Registered: Jul 2018
Location: Paris
Distribution: Debian
Posts: 854

Rep: Reputation: 286

I think the solutions already provided should do the trick, but to be sure, would you mind posting a dummy input file (1 line is not enough) and what your output should look like?

Last edited by l0f4r0; 10-15-2018 at 05:57 AM.
 
1 members found this post helpful.
Old 10-15-2018, 05:58 AM   #17
iammike2
LQ Newbie
 
Registered: Oct 2018
Distribution: Synology DSM
Posts: 28

Original Poster
Rep: Reputation: 1
Quote:
Originally Posted by syg00
<snip>

Edit: Ahhh - posts crossed; that introduces a different wrinkle.

Yeah, sorry about that! In my opening post I wanted to first sort out the issue I was having and NOT post the FINAL result I hope to achieve, as that may be WAY out of my league.


Sorry about the confusion.
 
Old 10-15-2018, 06:00 AM   #18
iammike2
LQ Newbie
 
Registered: Oct 2018
Distribution: Synology DSM
Posts: 28

Original Poster
Rep: Reputation: 1
Quote:
Originally Posted by l0f4r0
I think the solutions already provided should do the trick but really to be sure, do you mind posting a bogus input file (1 line is not enough) and what your output should look like please?

Will do, I will post something tomorrow.

Thanks for the help and comments everyone, really appreciated!!
 
Old 10-15-2018, 08:25 AM   #19
individual
Member
 
Registered: Jul 2018
Posts: 234

Rep: Reputation: 177
It would be helpful to have a test file, but I added your test line to a few places in the HTML of this site and got the correct result. Note that it only allows for a small number of lines (at most five, because of -A5) between the <a> and the <strong>.
Code:
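#!/usr/bin/env bash

file='lq.html'
s='http://website.com/happy.php?id='

# Print each matching <a href=...> line plus the 5 lines after it (-A5),
# then keep only the text between <strong> and </strong>.
# -P needs a grep built with PCRE support (e.g. GNU grep).
grep -A5 "<a href=\".*$s" "$file" | grep -Po '(?<=<strong>)(.+)(?=</strong>)'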

Last edited by individual; 10-15-2018 at 08:56 AM. Reason: .
 
1 members found this post helpful.
Old 10-15-2018, 08:29 AM   #20
pan64
LQ Guru
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 13,103

Rep: Reputation: 4145
Quote:
Originally Posted by iammike2
Thx, but unfortunately I need to find more strings in the same file.

If string 1 has been found, search for string 2.
If string 2 has been found, search for string 3 until you hit <stop string>,
then search again for a new occurrence of string 1.

I would rather try awk (or perl/python) instead of grep.
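
To make the idea concrete, here is a minimal Python sketch of that kind of sequential search. The function name find_groups, the needle strings and the sample lines are all made up for illustration, and it assumes each string sits on its own line. It should run on Python 2.7 as well as 3.

```python
def find_groups(lines, needles, stop):
    """Yield one list of matching lines each time all needles are found in order.

    Hypothetical helper: the needles are searched one after another; hitting
    the stop string before the group is complete discards it and starts over.
    """
    found = []
    for line in lines:
        if stop in line:
            found = []          # stop string reached: restart at needle 1
            continue
        if needles[len(found)] in line:
            found.append(line.strip())
            if len(found) == len(needles):
                yield found     # complete group
                found = []      # look for a new occurrence of needle 1


sample = [
    "string1 a", "noise", "string2 b", "string3 c", "STOP",
    "string1 d", "STOP", "string2 e",
]
groups = list(find_groups(sample, ["string1", "string2", "string3"], "STOP"))
print(groups)
```

Only the first group is complete here; the second one is cut off by the stop string before string 2 shows up, so it is dropped.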
 
1 members found this post helpful.
Old 10-15-2018, 09:11 PM   #21
iammike2
LQ Newbie
 
Registered: Oct 2018
Distribution: Synology DSM
Posts: 28

Original Poster
Rep: Reputation: 1
Here is the sample file.

I put it in code tags; I don't know if that is correct.

Clarification

Test file: 66 lines in total (the real one has 3148).
From the file I want lines 1-21 (see theoneiwant.gif).
From the file I DON'T want lines 23-43 (see theoneidontwant.gif).
From the file I want lines 45-65 (see theoneiwant.gif).
etc. etc.

The output I want is:
The ID number from the line of text, so 12345 (lines 1-21) and 21222 (lines 45-65),
and the description: (lines 1-21) this is a description of this file that I want 1
(lines 45-65) this is a description of this file that I want 2
etc. etc.

Edit: If the description proves hard to do, then the ID alone would be enough!

Please note: there are more lines in the real file for each sample, but that is just noise and I removed it. Also, the number of lines between the description and the image file varies.
But each section is between <tr> and </tr> tags.

Any questions, just shout.

Code:
	<tr>
			<td align="center" style="width: 62px; height: 48px;" class="unsortable2">
				<a href="http://website.com/happy.php?category=26">/></a>
			</td>
			<td valign="top" align="left">
				
				<div class="tooltip-target" id="port-target-1">
					<a href="http://website.com/happy.php?id=12345"><strong>this is a description of this file that I want 1</strong></a>
				</div>
				
				<div class="tooltip-content" id="port-content-1" style="width: 400;">
					<img src="http://website.com/sometxt/images/12345.jpg" border="0" alt="" title="" width="400" height="286" />
					
				</div>
				<div>
					<span style="float: right;">
						<img src="http://website.com/include/templates/default/images/sametxt_flags/theoneiwant.gif" border="0" class="inlineimg" /> 
					</span>
				</div>
			</td>							
		</tr>
		
		<tr>
			<td align="center" style="width: 62px; height: 48px;" class="unsortable2">
				<a href="http://website.com/happy.php?category=26">/></a>
			</td>
			<td valign="top" align="left">
				
				<div class="tooltip-target" id="port-target-1">
					<a href="http://website.com/happy.php?id=12346"><strong>this is a description of this file that I dont want</strong></a>
				</div>
				
				<div class="tooltip-content" id="port-content-1" style="width: 400;">
					<img src="http://website.com/sometxt/images/12346.jpg" border="0" alt="" title="" width="400" height="286" />
					
				</div>
				<div>
					<span style="float: right;">
						<img src="http://website.com/include/templates/default/images/sametxt_flags/theoneidontwant.gif" border="0" class="inlineimg" /> 
					</span>					
				</div>
			</td>							
		</tr>
		
			<tr>
			<td align="center" style="width: 62px; height: 48px;" class="unsortable2">
				<a href="http://website.com/happy.php?category=26">/></a>
			</td>
			<td valign="top" align="left">
				
				<div class="tooltip-target" id="port-target-1">
					<a href="http://website.com/happy.php?id=21222"><strong>this is a description of this file that I want 2</strong></a>
				</div>
				
				<div class="tooltip-content" id="port-content-1" style="width: 400;">
					<img src="http://website.com/sometxt/images/21222.jpg" border="0" alt="" title="" width="400" height="286" />
					
				</div>
				<div>
					<span style="float: right;">
						<img src="http://website.com/include/templates/default/images/sametxt_flags/theoneiwant.gif" border="0" class="inlineimg" /> 
					</span>
				</div>
			</td>							
		</tr>

Last edited by iammike2; 10-16-2018 at 01:24 AM.
 
Old 10-15-2018, 09:28 PM   #22
iammike2
LQ Newbie
 
Registered: Oct 2018
Distribution: Synology DSM
Posts: 28

Original Poster
Rep: Reputation: 1
Quote:
Originally Posted by pan64
I would rather try awk (or perl/python) instead of grep

I have AWK and Python (2.7.12), but I just checked and I don't have Perl.

I checked whether I can install Perl: yes, I can. It's listed in the Package Center under Devtools.

I wish I could do it in VB.Net, as I have already done it on my PC, but Mono (which I can install) doesn't come with VBNC, only CSC.

I want to run this on my NAS, as that one is running 24/7 anyway.
 
Old 10-16-2018, 01:00 AM   #23
pan64
LQ Guru
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 13,103

Rep: Reputation: 4145
(It would be nice if you edited your post and marked the text you are looking for, with bold or a different color or ...)

OK, let's have a try.
To catch lines like
Code:
<a href="http://website.com/happy.php?id=12345"><strong>this is a description of this file that I want 1</strong></a>
you can write a regexp for grep (or perl/python/whatever). As long as the structure of this file stays the same, it may work.
Code:
grep 'a href=.*id=.*</strong></a>'
or something similar can be used. You can use a syntax like
Code:
grep -Po '(?<=<strong>)(.+)(?=</strong>)'
if you wish to print only part of the matched expression (search for "lookahead" and "lookbehind" regexps).
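
For comparison, the same lookbehind/lookahead idea can be tried with Python's re module, which supports both constructs as long as the lookbehind has a fixed width. This is only a sketch using the sample line from above; the variable names are my own.

```python
import re

line = ('<a href="http://website.com/happy.php?id=12345">'
        '<strong>this is a description of this file that I want 1</strong></a>')

# (?<=...) is a lookbehind and (?=...) a lookahead: both must match,
# but neither becomes part of the returned text.
description = re.search(r'(?<=<strong>).+?(?=</strong>)', line).group(0)

# The same trick pulls out the digits that follow "id=".
file_id = re.search(r'(?<=id=)\d+', line).group(0)

print(file_id)
print(description)
```

The non-greedy .+? stops at the first </strong>, which matters if a line ever holds more than one <strong> pair.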

I don't really know if this is what you want, but I think you will tell us.
 
1 members found this post helpful.
Old 10-16-2018, 01:33 AM   #24
iammike2
LQ Newbie
 
Registered: Oct 2018
Distribution: Synology DSM
Posts: 28

Original Poster
Rep: Reputation: 1
Quote:
Originally Posted by pan64
(would be nice to edit your post and mark the text you are looking for - with bold or different color or ...)

ok, let's have a try:
to catch lines like
Code:
<a href="http://website.com/happy.php?id=12345"><strong>this is a description of this file that I want 1</strong></a>
you can write a regexp for grep (perl/python/whatever). So as long as the structure of this file is the same it may work.
Code:
grep 'a href=.*id=.*</strong></a>'

Thanks, I added bold to the ones I want and RED to the one I don't want.

I only want that ID (line 8 of the example) if the .GIF (line 17 of the example) reads "theoneiwant".

So I don't want the ID in line 30, because the .GIF reads "theoneidontwant",
etc. etc.

I made something in VB.net that kind of works, but that means I have to keep my PC running 24/7 and I'd rather not do that.

I don't want to burden you guys with this because it's nothing major (just a hobby project).

Unfortunately, I have no experience at all with Perl or Python.

Thank you guys for the time put into this!! Really appreciated.
 
Old 10-16-2018, 01:40 AM   #25
Turbocapitalist
Senior Member
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 4,177
Blog Entries: 3

Rep: Reputation: 2067
Quote:
Originally Posted by iammike2
Unfortunately no experience at all with Perl or Python.
Perl is quite flexible and easy to pick up. Its main power is in its pattern matching, but there are modules for nearly anything via the Comprehensive Perl Archive Network (CPAN), many of them already in your distro's repository.
 
1 members found this post helpful.
Old 10-16-2018, 01:46 AM   #26
iammike2
LQ Newbie
 
Registered: Oct 2018
Distribution: Synology DSM
Posts: 28

Original Poster
Rep: Reputation: 1
Quote:
Originally Posted by Turbocapitalist
Perl is quite flexible and easy to pick up. Its main power is in its pattern matching but there are modules for nearly anything via The Comprehensive Perl Archive Network (CPAN) already in your distro's repository.

Thanks. What I like about VB.net is that it has a step-by-step debugger. Does something like this exist for Perl?

Edit: Already found something: http://www.enginsite.com/Perl.htm

Ideally, I could develop it on my PC (Windows) and have it still work on my NAS (but maybe that is too much to ask).

Thx again

Last edited by iammike2; 10-16-2018 at 01:48 AM.
 
Old 10-16-2018, 02:00 AM   #27
pan64
LQ Guru
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 13,103

Rep: Reputation: 4145
Both perl and python have a debugger.
Perl and python are mostly platform independent, so scripts should work on Windows, Linux and also on a NAS without any problem. But obviously you need to check that.
I still don't really understand how you decide whether you need a line or not.
 
1 members found this post helpful.
Old 10-16-2018, 02:00 AM   #28
Turbocapitalist
Senior Member
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 4,177
Blog Entries: 3

Rep: Reputation: 2067
Perhaps this debugger would be similar:

https://perldoc.perl.org/perldebug.html
https://perldoc.perl.org/perldebtut.html

Perl is quite portable. As for running it on Windows, it can be done, but since I don't use or condone Windows, I'd have to guess at the details. One name that I have seen on the mailing lists over the years is Strawberry Perl. As long as both machines have the same CPAN modules, the scripts will run just fine for the most part. However, if you are using system-level functions, be sure to read the documentation, because every once in a while there are differences between Windows and the rest of the world.

My griping about legacy operating systems aside, Perl is really useful and fast for working with text. That's what it's designed for. The speed refers to both running and writing, though with the latter you need to focus on clarity rather than cleverness.

https://perldoc.perl.org/perlre.html

There is a bit of overlap with Python these days, though Perl is hands-down better at patterns. So if one does not feel good then there is the other.
 
1 members found this post helpful.
Old 10-16-2018, 02:08 AM   #29
iammike2
LQ Newbie
 
Registered: Oct 2018
Distribution: Synology DSM
Posts: 28

Original Poster
Rep: Reputation: 1
Quote:
Originally Posted by pan64
<snip>
I still don't really understand how can you decide if you need or don't need a line.

If the .GIF is named "theoneiwant" then I want it; if the .GIF is named "theoneidontwant" then I don't want it.

OK, so there are 3 <tr> </tr> sections in that test file.

Section 1: <tr> the .GIF is called "theoneiwant", so I want the ID that is in that section </tr>

Section 2: <tr> the .GIF is called "theoneidontwant", so I DON'T want the ID that is in that section </tr>

and finally

Section 3: <tr> the .GIF is called "theoneiwant", so I want the ID that is in that section </tr>

So from that test file I would get 2 IDs back!

I hope it's clearer now.
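
One way to express that rule is to cut the page into <tr>...</tr> blocks and test each block as a whole. Below is a rough Python sketch, assuming the blocks are not nested and the whole file fits in memory. The shortened sample text and the function name wanted_ids are invented for the example.

```python
import re

html = """
<tr>
  <a href="http://website.com/happy.php?id=12345"><strong>want 1</strong></a>
  <img src="http://website.com/images/sametxt_flags/theoneiwant.gif" />
</tr>
<tr>
  <a href="http://website.com/happy.php?id=12346"><strong>dont want</strong></a>
  <img src="http://website.com/images/sametxt_flags/theoneidontwant.gif" />
</tr>
<tr>
  <a href="http://website.com/happy.php?id=21222"><strong>want 2</strong></a>
  <img src="http://website.com/images/sametxt_flags/theoneiwant.gif" />
</tr>
"""


def wanted_ids(text, marker="theoneiwant.gif"):
    """Return (id, description) for every <tr> block that contains marker."""
    results = []
    # re.S lets '.' span newlines; '.*?' keeps each match to a single block.
    for block in re.findall(r'<tr>.*?</tr>', text, flags=re.S | re.I):
        if marker not in block:
            continue
        m = re.search(r'happy\.php\?id=(\d+)"><strong>(.*?)</strong>', block)
        if m:
            results.append((m.group(1), m.group(2)))
    return results


print(wanted_ids(html))
```

Because the decision is made per block, it does not matter how many lines of noise sit between the description and the image.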

Last edited by iammike2; 10-16-2018 at 02:12 AM.
 
Old 10-16-2018, 02:24 AM   #30
iammike2
LQ Newbie
 
Registered: Oct 2018
Distribution: Synology DSM
Posts: 28

Original Poster
Rep: Reputation: 1
The logic is the following:

Search for <a href="http://website.com/happy.php?id=

If found, check whether the SAME line contains <strong>
(if so, store the ID in a variable).

Go on searching for theoneiwant.

If found, write the ID to a text file.

If theoneiwant is NOT found, stop when reaching </tr> and then start searching the next <tr></tr> section.
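
These steps map almost one-to-one onto a small line-by-line loop. Here is a hedged Python sketch of them; the function name collect_ids and the shortened sample lines are made up, and it assumes, as in the sample file, that the ID line comes before the .gif line inside each section.

```python
import re

# The ID line: the happy.php?id= link and <strong> on the same line.
ID_RE = re.compile(r'<a href="http://website\.com/happy\.php\?id=(\d+)".*<strong>')


def collect_ids(lines, marker="theoneiwant"):
    """Walk the lines once and return the IDs of sections containing marker."""
    ids = []
    current_id = None                 # ID remembered for the current section
    for line in lines:
        m = ID_RE.search(line)
        if m:
            current_id = m.group(1)   # store the ID in a variable
        elif current_id and marker in line:
            ids.append(current_id)    # marker found: keep this ID
            current_id = None
        elif '</tr>' in line.lower():
            current_id = None         # end of section: forget the ID
    return ids


sample = [
    '<tr>',
    '<a href="http://website.com/happy.php?id=12345"><strong>x</strong></a>',
    '<img src="theoneiwant.gif" />',
    '</tr>',
    '<tr>',
    '<a href="http://website.com/happy.php?id=12346"><strong>y</strong></a>',
    '<img src="theoneidontwant.gif" />',
    '</tr>',
]
print(collect_ids(sample))
```

For the real file, an open file object can be passed instead of the sample list, and the returned IDs written out one per line to a text file.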

Last edited by iammike2; 10-16-2018 at 02:33 AM.
 
  


All times are GMT -5. The time now is 01:37 PM.
