LinuxQuestions.org


Skaperen 02-05-2021 01:53 PM

why do YouTube video codes have _ and - ?
 
does anyone know why YouTube uses _ and - in the video codes of their URLs?

teckk 02-05-2021 02:13 PM

Do you mean like this?
https://m.youtube.com/watch?v=abc123de-fg
https://m.youtube.com/watch?v=ABcDE_12f45

It's an 11-character identifier. Apparently _ and - are allowed characters.

boughtonp 02-05-2021 05:24 PM


 
It's most likely a variant of Base64 using URL-friendly characters, as defined in sections 4/5 of RFC 4648.

(I don't know why they use Base64 though.)


Skaperen 02-05-2021 09:49 PM

Quote:

Originally Posted by teckk (Post 6216623)
Do you mean like this?
https://m.youtube.com/watch?v=abc123de-fg
https://m.youtube.com/watch?v=ABcDE_12f45

It's an 11-character identifier. Apparently _ and - are allowed characters.

yeah. those. i figured out it is 11 characters and that _ and - are allowed characters. but i also read that it encodes a 64-bit integer, and _ and - are not needed for that.

Skaperen 02-05-2021 10:09 PM

Quote:

Originally Posted by boughtonp (Post 6216704)
It's most likely a variant of Base64 using URL-friendly characters, as defined in sections 4/5 of RFC 4648.

(I don't know why they use Base64 though.)


most likely, whatever data this encodes is of no value to users. but i am curious enough to try to figure out what they are doing. so maybe it's standard base64 with _ and - instead of + and /. but i would use base57 for this, eliminating the 2 extra characters entirely and eliminating 5 more for better human interface. see 3.4 in RFC 4648.

or maybe they are encoding a 66-bit number.
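The 64-bit versus 66-bit question comes down to arithmetic: 11 base64 characters carry 11 x 6 = 66 bits, so a 64-bit value fits with 2 bits to spare. Below is a minimal Python sketch of that idea. It assumes, without any confirmation from YouTube, that an ID is a big-endian unsigned 64-bit integer encoded with the URL-safe alphabet of RFC 4648 and with the trailing padding removed. The names encode_id and decode_id are hypothetical.

```python
import base64
import struct


def encode_id(n: int) -> str:
    """Encode an unsigned 64-bit integer as 11 base64url characters."""
    raw = struct.pack(">Q", n)  # 8 bytes, big-endian (assumed byte order)
    # 8 bytes encode to 12 characters, the last being "=" padding; drop it.
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def decode_id(s: str) -> int:
    """Reverse encode_id: restore the padding, decode, unpack the integer."""
    raw = base64.urlsafe_b64decode(s + "=")  # 11 chars + "=" is a multiple of 4
    return struct.unpack(">Q", raw)[0]


if __name__ == "__main__":
    for value in (0, 123456789, 2**64 - 1):
        text = encode_id(value)
        print(value, text, len(text), decode_id(text) == value)
```

Every 64-bit value comes out as exactly 11 characters drawn from A-Z, a-z, 0-9, - and _, which matches the shape of the IDs discussed here but does not prove this is how they are produced.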

ondoho 02-06-2021 01:52 PM

I don't think YT video IDs are encoded or checksums or anything.
Considering how many videos there have ever been on youtube, cumulatively...
They are probably simple sequential IDs with all alphanumeric characters + (under)score.

Skaperen 02-06-2021 03:11 PM

i would not expect them to be checksums. but that would not be impossible. it would still have to be an effective ID since that is all that varies in the URL to know which video is requested. the remaining question is whether the data is 64 bits or 66 bits. if it is 64 bits and i designed it, there would only be 57 alphanumerics.

boughtonp 02-06-2021 03:15 PM

Quote:

Originally Posted by ondoho (Post 6217074)
They are probably simple sequential IDs

Unlikely - the predictability of straight sequential IDs could be a security/privacy issue.


Quote:

I don't think YT video IDs are encoded or checksums or anything.

I see no reason to include hyphens if they're not, and a search revealed a thorough explanation of how it's likely that video IDs are 64-bit integers and channel/playlist IDs are 128-bit integers, and representing those in base64 requires strings 11 and 22 characters long respectively. (The latter have prefixes to make them 24 characters long.)

It also points out that, due to the nature of base64, a video ID will always end in one of these characters [048AEIMQUYcgkosw] whilst a channel ID is more constrained and will end in one of only four characters [AQgw]. (An ID that doesn't end in any of those characters would be evidence against base64 being used, if you can find one?)
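The final-character constraint follows from the spare bits: if the unused low bits of the last character are always zero, only every 4th (64-bit) or every 16th (128-bit) alphabet entry can appear there. A short Python sketch that derives both sets, assuming the RFC 4648 URL-safe alphabet and zero-filled spare bits (possible_last_chars is a hypothetical helper name):

```python
import string

# RFC 4648 URL-safe base64 alphabet, in index order 0..63.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "-_"


def possible_last_chars(payload_bits: int) -> str:
    """Return the characters that can end an unpadded base64url string
    carrying payload_bits bits, assuming the spare bits are zero."""
    n_chars = -(-payload_bits // 6)          # ceiling division
    spare = n_chars * 6 - payload_bits       # unused low bits in the last char
    step = 1 << spare                        # last index is a multiple of this
    return "".join(ALPHABET[i] for i in range(0, 64, step))


video_endings = possible_last_chars(64)      # 11 chars, 2 spare bits
channel_endings = possible_last_chars(128)   # 22 chars, 4 spare bits

print("".join(sorted(video_endings)))    # 048AEIMQUYcgkosw
print("".join(sorted(channel_endings)))  # AQgw
```

Checking the last character of a collection of real IDs against these sets is the test suggested above: a single counterexample would count against the base64 hypothesis.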

Quote:

Considering how many videos there have ever been on youtube, cumulatively...

A 64-bit integer allows enough for over a hundred million videos per person, for every person who has ever been on this planet.
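The per-person figure can be checked with one division. The head count below is an assumption that does not come from this thread: demographic estimates of everyone who has ever lived are usually quoted at roughly 100 to 120 billion, and the sketch uses a value near the top of that range.

```python
ID_SPACE = 2**64                  # number of distinct 64-bit values
PEOPLE_EVER = 117_000_000_000     # assumed estimate, not from the thread

per_person = ID_SPACE // PEOPLE_EVER
print(f"{per_person:,} IDs per person")
print("over a hundred million:", per_person > 100_000_000)
```

Any estimate in that range leaves the result above one hundred million.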


Skaperen 02-09-2021 03:13 PM

i doubt they are sequential IDs. i have seen videos posted the same day with radically different IDs and other videos posted long ago with an ID between them. they could be picked randomly with a, hopefully strong, random number generator. just how strong it needs to be could be a debate topic.
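For illustration, here is a minimal Python sketch of the random-ID idea. It is not a description of what YouTube does. It relies on the standard library's secrets module, which draws from the operating system's cryptographically strong generator; new_video_id is a hypothetical name.

```python
import secrets


def new_video_id() -> str:
    """Return a random 64-bit ID as 11 URL-safe base64 characters."""
    # token_urlsafe(8) takes 8 random bytes (64 bits) and returns them
    # base64url-encoded with the "=" padding stripped.
    return secrets.token_urlsafe(8)


if __name__ == "__main__":
    for _ in range(3):
        print(new_video_id())
```

A real service would still have to check each new ID against the ones already issued, since random draws can collide.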

i do suspect they are 64-bit integers and not 66-bit because that is so many that 66 would have little advantage over 64. i'd then go with 128-bit.

11 characters allow encoding 64 bits in a base as small as 57 (56**11 < 2**64 < 57**11). that base allows an alphabet with 7 fewer characters than base64. i'd get rid of the 2 special characters first. then i'd get rid of 5 characters that humans could mistake for another character, such as the letter O.

my alphabet57:
Code:

ABCDEFGHJKLMNPQRSTUVWXYZabcdefghjkmnpqrstuvwxyz0123456789
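A short Python sketch can check the claim and show the alphabet in use. It verifies that the alphabet above has 57 distinct characters, that 57 is the smallest base whose 11-digit strings cover 64 bits, and then does a fixed-width base-57 conversion. The function names encode57 and decode57 are hypothetical, and the most-significant-digit-first order is an arbitrary choice.

```python
ALPHABET57 = "ABCDEFGHJKLMNPQRSTUVWXYZabcdefghjkmnpqrstuvwxyz0123456789"
BASE = len(ALPHABET57)
WIDTH = 11

# Sanity checks on the alphabet and on the size argument.
assert BASE == 57 and len(set(ALPHABET57)) == 57
assert 56**WIDTH < 2**64 < 57**WIDTH


def encode57(n: int) -> str:
    """Encode an unsigned 64-bit integer as exactly 11 base-57 digits."""
    if not 0 <= n < 2**64:
        raise ValueError("value does not fit in 64 bits")
    digits = []
    for _ in range(WIDTH):
        n, r = divmod(n, BASE)
        digits.append(ALPHABET57[r])
    return "".join(reversed(digits))  # most significant digit first


def decode57(s: str) -> int:
    """Reverse encode57."""
    n = 0
    for ch in s:
        n = n * BASE + ALPHABET57.index(ch)  # raises ValueError on bad chars
    return n


if __name__ == "__main__":
    for value in (0, 123456789, 2**64 - 1):
        text = encode57(value)
        print(value, text, decode57(text) == value)
```

Unlike base64, base 57 is not a power of two, so the conversion needs big-integer division rather than simple bit shifting. That is one plausible reason a large service might prefer base64.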

Skaperen 02-09-2021 03:17 PM

can we find even one user who has posted over a hundred million videos even if just auto-posting junk?

