LinuxQuestions.org
Old 02-05-2021, 02:53 PM   #1
Skaperen
Senior Member
 
Registered: May 2009
Location: WV, USA
Distribution: Xubuntu, Ubuntu, Slackware, Amazon Linux, OpenBSD, LFS (on Sparc_32 and i386)
Posts: 2,230
Blog Entries: 21

why do YouTube video codes have _ and - ?


does anyone know why YouTube uses _ and - in the video codes of their URLs?
 
Old 02-05-2021, 03:13 PM   #2
teckk
Senior Member
 
Registered: Oct 2004
Distribution: FreeBSD Arch
Posts: 3,359

Do you mean like this?
https://m.youtube.com/watch?v=abc123de-fg
https://m.youtube.com/watch?v=ABcDE_12f45

It's an 11-character identifier. Apparently _ and - are allowed characters.
 
Old 02-05-2021, 06:24 PM   #3
boughtonp
Senior Member
 
Registered: Feb 2007
Location: UK
Distribution: Debian
Posts: 1,066


It's most likely a variant of Base64 using URL-friendly characters, as defined in sections 4/5 of RFC 4648.

(I don't know why they use Base64 though.)
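
A minimal sketch of the difference between the two alphabets, using Python's standard `base64` module. The three input bytes are arbitrary, picked only so that both special characters appear in the output; they are not taken from any real video ID.

```python
import base64

# Arbitrary bytes chosen so every 6-bit group maps to index 62 or 63,
# the only two positions where the alphabets differ.
raw = bytes([0xFB, 0xEF, 0xFF])

standard = base64.b64encode(raw).decode("ascii")          # RFC 4648 section 4
url_safe = base64.urlsafe_b64encode(raw).decode("ascii")  # RFC 4648 section 5

print(standard)  # uses + and /
print(url_safe)  # uses - and _ in the same positions
```

Both `+` and `/` have special meanings inside URLs, which is the usual motivation for swapping them out.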

 
Old 02-05-2021, 10:49 PM   #4
Skaperen
Senior Member
 
Registered: May 2009
Location: WV, USA
Distribution: Xubuntu, Ubuntu, Slackware, Amazon Linux, OpenBSD, LFS (on Sparc_32 and i386)
Posts: 2,230

Original Poster
Blog Entries: 21

Quote:
Originally Posted by teckk
Do you mean like this?
https://m.youtube.com/watch?v=abc123de-fg
https://m.youtube.com/watch?v=ABcDE_12f45

It's an 11-character identifier. Apparently _ and - are allowed characters.

yeah. those. i figured out it is 11 characters and that _ and - are allowed characters. but i also read that it encodes a 64-bit integer. but _ and - are not needed for that.
 
Old 02-05-2021, 11:09 PM   #5
Skaperen
Senior Member
 
Registered: May 2009
Location: WV, USA
Distribution: Xubuntu, Ubuntu, Slackware, Amazon Linux, OpenBSD, LFS (on Sparc_32 and i386)
Posts: 2,230

Original Poster
Blog Entries: 21

Quote:
Originally Posted by boughtonp
It's most likely a variant of Base64 using URL-friendly characters, as defined in sections 4/5 of RFC 4648.

(I don't know why they use Base64 though.)

most likely, whatever data this encodes is of no value to users. but i am curious enough to try to figure out what they are doing. so maybe it's standard base64 with _ and - instead of + and /. but i would use base57 for this, eliminating the 2 extra characters entirely and eliminating 5 more for better human interface. see 3.4 in RFC 4648.

or maybe they are encoding a 66-bit number.
 
Old 02-06-2021, 02:52 PM   #6
ondoho
LQ Addict
 
Registered: Dec 2013
Posts: 16,858
Blog Entries: 10

I don't think YT video IDs are encoded or checksums or anything.
Considering how many videos there have ever been on youtube, cumulatively...
They are probably simple sequential IDs with all alphanumeric characters + (under)score.
 
Old 02-06-2021, 04:11 PM   #7
Skaperen
Senior Member
 
Registered: May 2009
Location: WV, USA
Distribution: Xubuntu, Ubuntu, Slackware, Amazon Linux, OpenBSD, LFS (on Sparc_32 and i386)
Posts: 2,230

Original Poster
Blog Entries: 21

i would not expect them to be checksums. but that would not be impossible. it would still have to be an effective ID since that is all that varies in the URL to know which video is requested. the remaining question is whether the data is 64 bits or 66 bits. if it is 64 bits and i designed it, there would only be 57 alphanumerics.
 
Old 02-06-2021, 04:15 PM   #8
boughtonp
Senior Member
 
Registered: Feb 2007
Location: UK
Distribution: Debian
Posts: 1,066

Quote:
Originally Posted by ondoho
They are probably simple sequential IDs
Unlikely - the predictability of straight sequential IDs could be a security/privacy issue.


Quote:
I don't think YT video IDs are encoded or checksums or anything.
I see no reason to include hyphens if they're not, and a search revealed a thorough explanation of how it's likely that video IDs are 64-bit integers and channel/playlist IDs are 128-bit integers, and representing those in base64 requires strings 11 and 22 characters long respectively. (The latter have prefixes to make them 24 characters long.)

It also points out that, due to the nature of base64, a video ID will always end in one of these characters [048AEIMQUYcgkosw] whilst a channel ID is more constrained and will end in one of only four characters [AQgw]. (An ID that doesn't end in any of those characters would be evidence against base64 being used, if you can find one?)
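
The length and final-character claims can be checked with a short sketch. It assumes the IDs are unsigned integers serialized big-endian and encoded as unpadded URL-safe base64; the byte order and the helper names `encode_id` and `possible_last_chars` are illustrative assumptions, not anything YouTube has documented.

```python
import base64
import string

# URL-safe alphabet from RFC 4648 section 5, in index order.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "-_"

def encode_id(value: int, n_bytes: int) -> str:
    """Encode an unsigned integer as unpadded URL-safe base64 (big-endian assumed)."""
    raw = value.to_bytes(n_bytes, "big")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")

def possible_last_chars(n_bytes: int) -> str:
    """Characters the final position can take when the unused low bits are zero."""
    total_bits = n_bytes * 8
    n_chars = -(-total_bits // 6)          # ceiling division: 6 bits per character
    pad_bits = n_chars * 6 - total_bits    # zero bits that fill out the last character
    step = 1 << pad_bits                   # last index must be a multiple of this
    return "".join(sorted(ALPHABET[i] for i in range(0, 64, step)))

print(len(encode_id(2**64 - 1, 8)), possible_last_chars(8))     # 64-bit video ID
print(len(encode_id(2**128 - 1, 16)), possible_last_chars(16))  # 128-bit channel ID
```

With 64 bits spread over 11 characters (66 bits of capacity), two low bits of the last character are always zero, leaving 16 possible final characters; with 128 bits over 22 characters, four bits are zero, leaving 4.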

Quote:
Considering how many videos there have ever been on youtube, cumulatively...
A 64-bit integer allows enough for over a hundred million videos per person, for every person who has ever been on this planet.
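
A rough check of that arithmetic. The figure of about 117 billion people ever born is an outside estimate assumed here for illustration; it is not stated in the thread, and other estimates differ.

```python
ids = 2**64                      # number of distinct 64-bit values
people_ever = 117_000_000_000    # ASSUMPTION: rough estimate of humans ever born

per_person = ids // people_ever
print(f"{ids:,} IDs, roughly {per_person:,} per person")
```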

 
Old 02-09-2021, 04:13 PM   #9
Skaperen
Senior Member
 
Registered: May 2009
Location: WV, USA
Distribution: Xubuntu, Ubuntu, Slackware, Amazon Linux, OpenBSD, LFS (on Sparc_32 and i386)
Posts: 2,230

Original Poster
Blog Entries: 21

i doubt they are sequential IDs. i have seen videos posted the same day with radically different IDs and other videos posted long ago with an ID between them. they could be picked randomly with a, hopefully strong, random number generator. just how strong it needs to be could be a debate topic.

i do suspect they are 64-bit integers and not 66-bit, because 64 bits already gives so many IDs that 66 would have little advantage over 64. if more were needed, i'd go with 128-bit.

11 characters allows encoding 64 bits in a base as small as 57 (2**64 < 57**11). that base allows an alphabet with 7 fewer characters. i'd get rid of the 2 special characters, first. then, i'd get rid of 5 characters that could look like another to humans, such as the letter O.

my alphabet57:
Code:
ABCDEFGHJKLMNPQRSTUVWXYZabcdefghjkmnpqrstuvwxyz0123456789
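
A minimal sketch of how such a scheme could work, using the alphabet above. The function names `smallest_base`, `encode57` and `decode57` are hypothetical, and the fixed 11-character, most-significant-digit-first layout is an assumption made for illustration.

```python
ALPHABET57 = "ABCDEFGHJKLMNPQRSTUVWXYZabcdefghjkmnpqrstuvwxyz0123456789"
ID_LENGTH = 11

def smallest_base(bits: int, length: int) -> int:
    """Smallest base whose `length`-digit strings cover every `bits`-bit value."""
    base = 2
    while base**length < 2**bits:
        base += 1
    return base

def encode57(value: int) -> str:
    """Encode a 64-bit unsigned integer as a fixed-width 11-character string."""
    if not 0 <= value < 2**64:
        raise ValueError("value must fit in 64 bits")
    digits = []
    for _ in range(ID_LENGTH):
        value, rem = divmod(value, 57)
        digits.append(ALPHABET57[rem])
    return "".join(reversed(digits))  # most significant digit first

def decode57(text: str) -> int:
    """Inverse of encode57."""
    value = 0
    for ch in text:
        value = value * 57 + ALPHABET57.index(ch)
    return value

print(smallest_base(64, 11))
print(encode57(2**64 - 1))
```

Unlike base64, 57 is not a power of two, so encoding needs repeated division rather than simple bit slicing; that extra cost is one plausible reason a large service might stick with base64.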
 
Old 02-09-2021, 04:17 PM   #10
Skaperen
Senior Member
 
Registered: May 2009
Location: WV, USA
Distribution: Xubuntu, Ubuntu, Slackware, Amazon Linux, OpenBSD, LFS (on Sparc_32 and i386)
Posts: 2,230

Original Poster
Blog Entries: 21

can we find even one user who has posted over a hundred million videos even if just auto-posting junk?
 
  

