Expand/transform a URL that sits inside a login-protected site

dedec0 · 08-01-2021, 09:22 AM

Where I study, there is a site that all students use for things related to the institution. Within this site there is also access to a subsystem running Moodle, which provides a separate area for each subject in each term.

Inside that Moodle, a URL is sometimes shared with students. But the real URL is hidden, i.e., what we see is a URL within the school domain:

https:// virtual. school. br/2021/mod/url/view. php?id=13531

But if we visit it, we are redirected to (possibly) any external site. And there is the problem: sometimes teachers share bad URLs that I *do not* want to visit with the browser profile I use to access the school and other "serious" things.

Online services that expand short/redirected URLs do not work here, because they do not have my login session, so they always give, as the result, the URL

https:// systems. school. br/idp/login. jsp

I searched for a browser extension (for Firefox 52, in this case) that could expand this URL using my existing session cookies, so that it gets the correct result. I did not find one (neither for newer Firefox versions nor for other browsers).

This may also be solvable with a bit of simple JavaScript, or with the browser's developer tools. I even thought about opening this thread in the Programming or General forums. Well, I think this is a good bet.

Ideas? Solutions?

Turbocapitalist · 08-01-2021, 10:40 AM

Can you watch the response codes using wget or a home made perl or python script?

Code:

wget -S -O /dev/null https:// virtual. school. br/2021/mod/url/view. php?id=13531

You'll have to add in the authentication options as well, since you are logging in to Moodle.

It should show some response codes in the 300 range. If so, you have something you can work with when you write your own browser plug-in.
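A minimal Python sketch of the same single-step check, using only the standard library's `http.client`, which (unlike wget) never follows redirects by itself. It assumes you paste the Moodle session cookie value copied from the browser; the host `virtual.school.br`, the path and the cookie name `MoodleSession2021` are placeholders taken from the anonymized wget output posted further down, not real values. (With wget itself, `--header='Cookie: ...'` together with `--max-redirect=0` should come close to the same thing.)

```python
#!/usr/bin/env python3
"""Peek at one HTTP response without following any redirect.

Sketch only: the host, path and cookie name below are placeholders.
"""
import sys
from http.client import HTTPSConnection

REDIRECT_CODES = {301, 302, 303, 307, 308}


def cookie_header(cookies):
    """Build a Cookie request header value from a {name: value} dict."""
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


def peek(host, path, cookies):
    """Send a single GET and return (status, reason, Location or None)."""
    conn = HTTPSConnection(host, timeout=15)
    try:
        conn.request("GET", path, headers={"Cookie": cookie_header(cookies)})
        resp = conn.getresponse()
        resp.read()  # drain the body; only the status and headers matter here
        return resp.status, resp.reason, resp.getheader("Location")
    finally:
        conn.close()


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: peek.py <session-cookie-value-copied-from-the-browser>
    status, reason, location = peek(
        "virtual.school.br",                 # placeholder host
        "/2021/mod/url/view.php?id=13531",   # placeholder path
        {"MoodleSession2021": sys.argv[1]},  # placeholder cookie name
    )
    print(status, reason)
    if status in REDIRECT_CODES:
        print("Location:", location)
```

Saved as, say, `peek.py` (a hypothetical name), it would be run as `python3 peek.py 'value-from-browser'`. If it answers `303` with a `Location` ending in `/2021/login/index.php`, as in the wget output below, the cookie was most likely missing or not accepted.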

dedec0 · 08-01-2021, 02:26 PM

Quote:

Originally Posted by Turbocapitalist

Can you watch the response codes using wget or a home made perl or python script?

Sure. I even have a good Python setup now, and Perl should already be OK too. I use at least one command that is written purely in Perl (maybe there are more; I do not examine everything).

Quote:

Originally Posted by Turbocapitalist

Code:

wget -S -O /dev/null https:// virtual. school. br/2021/mod/url/view. php?id=13531

I ran the above command and formatted its output, editing out all the possibly sensitive data I saw. The output is below. I forgot to run it with the POSIX locale (which would have made all of wget's messages English), so they were originally printed in Portuguese. I think that is acceptable in this case, but if you want, ask me to redo it.

Code:

$ wget -S -O /dev/null https:// virtual. school.
br/2021/mod/url/view. php?id=13531

--2021-08-01 15:30:07--  https://virtual. school
.br/2021/mod/url/view. php?id=13531

Resolving virtual. school. br (virtual. school. br)...
123.456.789.11

Connecting to virtual. school. br (virtual. school.
br)|123.456.789.11|:443... connected.

HTTP request sent, awaiting response... 

  HTTP/1.1 303 See Other

  Server: nginx/1.10.1

  Date: Sun, 01 Aug 2021 18:30:07 GMT

  Content-Type: text/html; charset=utf-8

  Transfer-Encoding: chunked

  Connection: keep-alive

  Set-Cookie: MoodleSession2021=and8ejbnayejinvalidkwqwert;
  path=/2021/; domain=school. br; secure; HttpOnly

  Expires: Thu, 19 Nov 1981 08:76:54 GMT

  Cache-Control: no-store, no-cache, must-revalidate

  Pragma: no-cache

  X-Redirect-By: Moodle

  Location: https://virtual. school. br/2021/login/index. php

  Content-Language: pt-br

  Strict-Transport-Security: max-age=13579000; includeSubDomains;
  preload;

  X-Content-Type-Options: nosniff

  X-XSS-Protection: 1; mode=block

  X-Robots-Tag: none

  X-Download-Options: noopen

  X-Permitted-Cross-Domain-Policies: none

  X-Frame-Options: SAMEORIGIN

  Referrer-Policy: no-referrer

Location: https://virtual. school. br/2021/login/index. php
[following]

--2021-08-01 15:30:07--  https://virtual. school.
br/2021/login/index. php

Reusing existing connection to virtual. school. br:443.

HTTP request sent, awaiting response... 

  HTTP/1.1 303 See Other

  Server: nginx/1.10.1

  Date: Sun, 01 Aug 2021 18:30:07 GMT

  Content-Type: text/html; charset=utf-8

  Transfer-Encoding: chunked

  Connection: keep-alive

  Expires: Thu, 19 Nov 1981 08:52:00 GMT

  Cache-Control: no-store, no-cache, must-revalidate

  Pragma: no-cache

  X-Redirect-By: Moodle

  Location: https://virtual. school. br/2021/auth/shibboleth/

  Content-Language: pt-br

  Strict-Transport-Security: max-age=15768000; includeSubDomains;
  preload;

  X-Content-Type-Options: nosniff

  X-XSS-Protection: 1; mode=block

  X-Robots-Tag: none

  X-Download-Options: noopen

  X-Permitted-Cross-Domain-Policies: none

  X-Frame-Options: SAMEORIGIN

  Referrer-Policy: no-referrer

Location: https://virtual. school. br/2021/auth/shibboleth/
[following]

--2021-08-01 15:30:07--  https://virtual. school.
br/2021/auth/shibboleth/

Reusing existing connection to virtual. school. br:443.

HTTP request sent, awaiting response... 

  HTTP/1.1 302 Moved Temporarily

  Server: nginx/1.10.1

  Date: Sun, 01 Aug 2021 18:30:07 GMT

  Content-Type: text/html

  Transfer-Encoding: chunked

  Connection: keep-alive

  Location: https://systems. school.
  br/idp/profile/SAML2/Redirect/SSO?SAMLRequest={value edited:
  besides a lot of unintelligible parts with [a-zA-Z0-9], it
  contained several percent-encoded characters like %2F %2B %3A %3D }

  Expires: Wed, 01 Jan 1987 12:34:56 GMT

  Cache-Control: private,no-store,no-cache,max-age=0

  Set-Cookie: _opensaml_req_ss%3A{ [a-zA-Z0-9%]\+ }; path=/;
  domain=.school. br; secure; HttpOnly;; SameSite=None

  Strict-Transport-Security: max-age=12345678; includeSubDomains;
  preload;

  X-Content-Type-Options: nosniff

  X-XSS-Protection: 1; mode=block

  X-Robots-Tag: none

  X-Download-Options: noopen

  X-Permitted-Cross-Domain-Policies: none

  X-Frame-Options: SAMEORIGIN

  Referrer-Policy: no-referrer

Syntax error in Set-Cookie: _opensaml_req_ss%3A{
    [a-zA-Z0-9%]\+ }; path=/; domain=.school. br; secure;
    HttpOnly;; SameSite=None at position 167.

Location: https://systems. school.
br/idp/profile/SAML2/Redirect/SSO?SAMLRequest={ [a-zA-Z0-9%]\+
# with "RelayState" inside } [following]

--2021-08-01 15:30:07--  https://systems. school.
br/idp/profile/SAML2/Redirect/SSO?SAMLRequest={ [a-zA-Z0-9%]\+
# with "RelayState" inside }

Resolving systems. school. br (systems. school. br)...
123.456.789.5

Connecting to systems. school. br (systems. school.
br)|123.456.789.5|:443... connected.

HTTP request sent, awaiting response... 

  HTTP/1.1 302 Found

  Date: Sun, 01 Aug 2021 18:30:07 GMT

  Content-Length: 0

  Expires: Thu, 01 Dec 1994 12:34:00 GMT

  Cache-Control: no-cache="set-cookie, set-cookie2"

  Strict-Transport-Security: max-age=15552000

  Location: https://systems. school. br/idp/login.jsp

  Set-Cookie:
  WASReqURL=https:///idp/profile/SAML2/Redirect/SSO?SAMLRequest={
      [a-zA-Z0-9%]\+ # with "RelayState" inside }; path=/;
      secure; HttpOnly

  Keep-Alive: timeout=30, max=250

  Connection: Keep-Alive

  Content-Language: en-US

Location: https://systems. school. br/idp/login.jsp
[following]

--2021-08-01 15:30:07--  https://systems. school.
br/idp/login.jsp

Reusing existing connection to systems. school. br:443.

HTTP request sent, awaiting response... 

  HTTP/1.1 200 OK

  Date: Sun, 01 Aug 2021 18:30:07 GMT

  Expires: 0

  Cache-Control: no-cache, no-store, must-revalidate, max-age=0

  Pragma: no-cache

  Strict-Transport-Security: max-age=15552000

  Set-Cookie: JSESSIONID_CL01={ [a-zA-Z0-9%]\+ }:{ [a-zA-Z0-9%]\+
  }; Path=/; Secure

  Set-Cookie:
  WASReqURL=https:///idp/profile/SAML2/Redirect/SSO?SAMLRequest={
      [a-zA-Z0-9%]+ # with "RelayState" inside }; Path=/

  Set-Cookie:
  WASReqURL=https:///idp/profile/SAML2/Redirect/SSO?SAMLRequest={
      [a-zA-Z0-9%]+ # with "RelayState" inside }; Expires=Thu,
      01-Dec-94 12:34:56 GMT; Path=/; Domain=.school. br

  Keep-Alive: timeout=30, max=249

  Connection: Keep-Alive

  Transfer-Encoding: chunked

  Content-Type: text/html;charset=ISO-8859-1

  Content-Language: en-US

Length: unspecified [text/html]

Saving to: ‘/dev/null’


/dev/null               [ <=>                ]   2.87K  --.-KB/s
in 0.03s   


2021-08-01 15:30:07 (85.2 KB/s) - ‘/dev/null’ saved [2941]

Quote:

Originally Posted by Turbocapitalist

You'll have to add in the authentication options as well, since you are logging in to Moodle.

It should show some response codes in the 300 range. If so, you have something you can work with when you write your own browser plug-in.

Indeed, there are 303 responses. But the problem is the cookies, which exist and are valid in the browser but not in a script. There is also the way the system works: Moodle is on one subdomain, but we do not log in to it directly. We log in through yet another subdomain (like "my. school. br") and are redirected all around. Does this mean it is complicated? Or not necessarily?

Would it help to sniff what my browser sends to and receives from the network? Getting the cookie names, values and properties from the browser itself is very easy, but...
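Since getting the cookies out of the browser is easy, one way to reuse them in a script is to export them to a Netscape-format `cookies.txt` file (various browser extensions and tools can do this) and load that file with Python's `http.cookiejar.MozillaCookieJar`. The sketch below assumes such an export exists; it lets `urllib` send the matching cookies and makes it stop at the first redirect instead of following it. The file name and URL are placeholders.

```python
#!/usr/bin/env python3
"""Reuse browser cookies (exported to cookies.txt); stop at the first redirect.

Sketch under assumptions: the cookies were exported from the browser in
the Netscape "cookies.txt" format; file name and URL are placeholders.
"""
import sys
import urllib.error
import urllib.request
from http.cookiejar import MozillaCookieJar


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Refuse every redirect, so the 3xx response is raised as HTTPError."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def load_jar(path):
    """Load a Netscape-format cookie file, keeping session cookies too."""
    jar = MozillaCookieJar(path)
    jar.load(ignore_discard=True, ignore_expires=True)
    return jar


def first_hop(url, jar):
    """Return (status, Location or None) for url, without following it."""
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar), NoRedirect)
    try:
        with opener.open(url, timeout=15) as resp:
            return resp.status, None
    except urllib.error.HTTPError as err:
        # Refused 3xx redirects (and any 4xx/5xx errors) end up here.
        return err.code, err.headers.get("Location")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: first_hop.py cookies.txt 'https://virtual.school.br/2021/...'
    print(*first_hop(sys.argv[2], load_jar(sys.argv[1])))
```

Unlike a hand-written `Cookie:` header, the jar only sends each cookie to the domain and path it was set for, which matters here because the chain jumps between subdomains of the school.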

dedec0 · 08-01-2021, 03:40 PM

Quote:

Originally Posted by Turbocapitalist

Can you watch the response codes using wget or a home made perl or python script?

Code:

wget -S -O /dev/null https:// virtual. school. br/2021/mod/url/view. php?id=13531

Another thought: I do not want to download the whole response content, as this command does (even though it throws it away). It shows the network "path" that was taken to get the content, fine. But what I really want is to check the URL before each step is taken. There are URLs I will not even visit before editing them (like those with visit or share IDs, and similar).

So, about this new thought: can browser add-ons deal with such low-level details of network operations?
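For the URLs that need editing before a visit, the query string can be cleaned with `urllib.parse` before any request is made. A minimal sketch; the names in `DROP_EXACT` and `DROP_PREFIXES` are only examples of common tracking/share parameters, not a list taken from the school's site, so adjust them to the IDs you actually see.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Example lists only: adjust to the "visit"/"share" IDs you actually meet.
DROP_EXACT = {"fbclid", "gclid", "si", "igshid"}
DROP_PREFIXES = ("utm_",)


def clean_url(url):
    """Return url without the query parameters listed above."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in DROP_EXACT and not key.startswith(DROP_PREFIXES)
    ]
    # Rebuild the URL; scheme, host, path and fragment are left untouched.
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(clean_url("https://example.com/v?id=5&utm_source=x&fbclid=abc"))
```

One side effect of this design: `urlencode` re-encodes the kept pairs, so their escaping may differ slightly from the original (a `%20` comes back as `+`, for instance).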

Turbocapitalist · 08-01-2021, 10:31 PM

You'd have to write your own script in perl or python or similar to make the request to the web server and check what it returns each time and then allow you to choose whether to follow the next stage in the request or not. The above wget is not a solution. It will, however, show you all the stages in the request along with their response codes. That will tell you what kind of possibilities you have for writing your script.
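The step-by-step approach described above could be sketched like this with the standard library alone: each hop is one request with no automatic redirect following, and the next URL is shown and only requested if you answer `y`. The names `fetch`, `ask` and `walk` are made up for this sketch. Cookies are left out; to send some, you could pass `functools.partial(fetch, headers={"Cookie": "..."})` as `fetch`, keeping in mind that the chain in the wget output crosses from `virtual.school.br` to `systems.school.br`, so one fixed Cookie header for every host is a simplification.

```python
#!/usr/bin/env python3
"""Follow a redirect chain one hop at a time, asking before each hop.

Sketch: fetch() and ask() are parameters of walk(), so the network part
and the prompt can be swapped out (or faked for testing).
"""
from http.client import HTTPConnection, HTTPSConnection
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def fetch(url, headers=None):
    """One GET, no redirect following; returns (status, Location or None)."""
    parts = urlsplit(url)
    conn_class = HTTPSConnection if parts.scheme == "https" else HTTPConnection
    conn = conn_class(parts.netloc, timeout=15)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request("GET", path, headers=headers or {})
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def ask(url):
    """Show the next URL and let the user decide whether to request it."""
    return input(f"Next hop: {url}\nFollow? [y/N] ").strip().lower() == "y"


def walk(url, fetch=fetch, ask=ask, max_hops=10):
    """Return the (url, status) pairs seen; status None means not requested."""
    visited = []
    for _ in range(max_hops):
        status, location = fetch(url)
        visited.append((url, status))
        if status not in REDIRECT_CODES or not location:
            break
        url = urljoin(url, location)  # Location may be relative
        if not ask(url):
            visited.append((url, None))  # shown to the user, never requested
            break
    return visited
```

Called as `walk("https://virtual.school.br/2021/mod/url/view.php?id=13531")` (placeholder URL), it would print each `Location` and wait for an answer before touching it.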

dedec0 · 08-02-2021, 07:06 AM

Quote:

Originally Posted by Turbocapitalist

You'd have to write your own script in perl or python or similar to make the request to the web server and check what it returns each time and then allow you to choose whether to follow the next stage in the request or not. The above wget is not a solution. It will, however, show you all the stages in the request along with their response codes. That will tell you what kind of possibilities you have for writing your script.

So, the core of what I need is an HTTP library, right? For example, in Python:

https://thumbs2.imgbox.com/43/be/R3899Bxf_t.png

Or is an HTTP request/response parser what I need? Look:

https://thumbs2.imgbox.com/34/38/YXtHpiQn_t.png

I am not sure about the difference between these 2 results. What do you think?
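On the library-versus-parser question, one way to see the difference: a client library such as `http.client` opens the connection, does TLS, sends the request and then parses the reply for you, while a parser on its own only turns bytes you already have into fields. Python's standard library exposes its header parser as `http.client.parse_headers`, so a minimal sketch of the parser-only case could look like this. The sample is an abbreviated copy of the first reply in the wget output above, with the placeholder host written without spaces.

```python
import io
from http.client import parse_headers


def parse_raw_response(raw):
    """Split a raw HTTP/1.x response head into (status, reason, headers).

    Assumes the status line carries a reason phrase, e.g. "303 See Other".
    """
    stream = io.BytesIO(raw)
    status_line = stream.readline().decode("iso-8859-1").rstrip("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = parse_headers(stream)  # reads up to the blank line
    return int(code), reason, headers


SAMPLE = (
    b"HTTP/1.1 303 See Other\r\n"
    b"Server: nginx/1.10.1\r\n"
    b"X-Redirect-By: Moodle\r\n"
    b"Location: https://virtual.school.br/2021/login/index.php\r\n"
    b"\r\n"
)

if __name__ == "__main__":
    code, reason, headers = parse_raw_response(SAMPLE)
    print(code, reason, headers["Location"])
```

For the goal in this thread, the script has to make the requests itself, so the library is the part that is actually needed; a bare parser would only help with captures that were already saved.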

Turbocapitalist · 08-02-2021, 07:08 AM

Can you please post the text here inside [code] [/code] tags?

dedec0 · 08-02-2021, 07:19 AM

Quote:

Originally Posted by Turbocapitalist

Can you please post the text here inside [code] [/code] tags?

I did not write any code, Turbo. The images I sent just show the names and descriptions of the libraries I am unsure about choosing between. They are screenshots of the Synaptic window.

teckk · 08-02-2021, 07:24 AM

I'm not sure what you are wanting. Why are there spaces in your urls like that?

Code:

#!/usr/bin/python

from http.client import HTTPSConnection
from time import sleep

#Example list, some good, some bad, on purpose.
u = ('/questions/linux-newbie-8/', 'questions/linux-newbie-8/', 
       '/questions/linux-software-2/', 'questions/linux-software-2/')
       
url = 'linuxquestions.org'

for i in u:
    a = HTTPSConnection(url)
    a.request('GET', i)
    b = a.getresponse()
    print('\n', i)
    print(b.status, b.reason)
    data = b.read().decode('utf-8', errors='ignore')
    a.close()
    sleep(2)

dedec0 · 08-02-2021, 07:43 AM

Quote:

Originally Posted by teckk

I'm not sure what you are wanting. Why are there spaces in your urls like that?

Code:

#!/usr/bin/python

from http.client import HTTPSConnection
from time import sleep

#Example list, some good, some bad, on purpose.
u = ('/questions/linux-newbie-8/', 'questions/linux-newbie-8/', 
       '/questions/linux-software-2/', 'questions/linux-software-2/')
       
url = 'linuxquestions.org'

for i in u:
    a = HTTPSConnection(url)
    a.request('GET', i)
    b = a.getresponse()
    print('\n', i)
    print(b.status, b.reason)
    data = b.read().decode('utf-8', errors='ignore')
    a.close()
    sleep(2)

I wrote spaces in the URLs just to avoid LinuxQuestions automatic URL transforming, which destroyed the output in some parts. So, instead of disabling it (impossible now), I added spaces in a way that still lets us read things easily. After that, I separated each line with an empty line and broke the long ones to fit a medium screen width, so anyone reading my whole output can simply scroll down without losing anything. And the domains, subdomains and other unique details were replaced with placeholders that represent what I see in them, without revealing their real values, for privacy reasons.

Thank you for the Python example, teckk. But I think I need to deal with more complex details of a request. In post #3 (https://www.linuxquestions.org/quest...4/#post6271483), I show the requests and responses that happened in the browser. I do not want to make all of them, but I guess I will need to make pretty similar requests. The variable values (cookies) I can either copy from the browser, or I will have to send requests starting from my school's first login page (and probably do a robot browser's job?).

teckk · 08-02-2021, 07:52 AM

https://docs.python.org/3/library/http.client.html

Edit:
https://docs.python.org/3/library/urllib.html
https://docs.python-requests.org/en/master/index.html

Turbocapitalist · 08-02-2021, 07:52 AM

Quote:

Originally Posted by dedec0

They are screenshots of synaptic window.

Then please embed them here so they can be viewed in the context of your question both now and in the future after imgbox goes to the great bit bucket in the sky. Few will click on links to dodgy sites but embedding them here means that they are vetted to a substantial extent.

dedec0 · 08-02-2021, 09:02 AM

Quote:

Originally Posted by Turbocapitalist

Then please embed them here so they can be viewed in the context of your question both now and in the future after imgbox goes to the great bit bucket in the sky. Few will click on links to dodgy sites but embedding them here means that they are vetted to a substantial extent.

imgbox is not a dodgy site. It is pretty safe; it is funded by ads but provides a great service to its users, whether registered (for free) or not. It is also very flexible, letting me choose how to share each image. In this thread, I used the BBCode it provides (yes, LQ is not BB, but since it is the same for a few other tags, why not for this KEY one?).

Embed? How?? I just read the tooltips of all the post editing buttons, and none is about images. And if you are talking about message attachments, we have a limited quota of them in LQ (or this changed, and nobody told me). In imgbox, there is no limit on how many images or galleries I can have (I keep things very organized there, which is great), and there is no bandwidth limit (except for abuse, of course).

boughtonp · 08-02-2021, 10:09 AM

Quote:

Originally Posted by dedec0

I wrote spaces in the URLs just to avoid LinuxQuestions automatic URL transforming, which destroyed the output in some parts. So, instead of disabling it (impossible now)

Uh? Untick "Automatically parse links in text" under the message box. This can be done both at post-time and edit-time.

Quote:

if you are talking about message attachments, we have a limited quota of them in LQ

The quota is 35MB. The total size of the two images you linked is ~6KB - you could attach several thousand such images without exceeding the quota.

dedec0 · 08-02-2021, 10:30 AM

Quote:

Originally Posted by boughtonp

Uh? Untick "Automatically parse links in text" under the message box. This can be done both at post-time and edit-time.

Haha... indeed. I did not see that, although I checked that spot on the compose page. For some mysterious reason, I did not recognize it. I would like an option to never parse URLs automatically, i.e., to have it always off, and to always disable smileys in text; I hate that option. I also dislike that, when we quote messages, the tags in the quoted text are not on separate lines. This makes it harder to quote separate paragraphs using Linux's select-to-copy, which I do a lot. There is also a bad "feature" where code tags *always* add an empty line below our code, even if we write a single line of code and leave both tags on the same line. I reported (suggested?) this problem and found out that I was not the only person who had noticed it. That was long ago; nothing ever changed or came of it.

Quote:

Originally Posted by boughtonp

The quota is 35MB. The total size of the two images you linked is ~6KB - you could attach several thousand such images without exceeding the quota.

These 2 images are small, but they are the exception. The quota here would fill up quickly if I started using it. I prefer not to worry about size, and to worry only about showing the right parts.