Linux - Software: This forum is for Software issues.
Having a problem installing a new program? Want to know which application is best for the job? Post your question in this forum.
Funny, I was a professional sound engineer for 7+ years. It dawns on me that what I see as impossible may not be that apparent to people with less experience in this field.
don't get me - us - wrong: wav-to-midi software exists, but it's pretty useless if you have more than one instrument, and even with only one, the results are questionable.
i'm sure there's a difference between the crappy windows freeware i tried and some professional version, and i'm sure the software will get better with time, much better, but to be able to separate everything, just like "unmixing" a master track, that's stuff for sci-fi novels.
another analogy:
there are programs able to convert JPG images to SVG.
now imagine you have an SVG of a filled circle on a plain background. my guess: the SVG file contains about 10 lines.
convert to JPG - no problem.
convert back to SVG - how much code will the SVG file contain now?
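To make the JPG/SVG analogy concrete, here is a toy sketch (plain Python; the minimal SVG markup is my own example, not from the thread) showing just how small the hand-authored vector file is:

```python
# A filled circle on a plain background: the hand-written SVG really
# is only a handful of lines.
minimal_svg = """\
<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
  <rect width="100" height="100" fill="white"/>
  <circle cx="50" cy="50" r="30" fill="black"/>
</svg>"""

print(len(minimal_svg.splitlines()), "lines,", len(minimal_svg), "bytes")
```

An auto-tracer working backward from the rasterised JPG has no idea a circle was ever there; it can only approximate the (now noisy) edge with many path segments, so the recovered SVG balloons - the same kind of information loss the thread describes for unmixing audio.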
My argument however is that musical instruments, and voices, are such rich sources that it *will* be possible to separate them out.
On the simplest level, take the number 11. You are told that it consists of 2 numbers added together and asked to find those numbers. In the absence of any more data, it's impossible. However in music you have both far more complex sources, generating multiple data, and also the ability to look over the entire timeline of the music in order to determine what frequencies and volumes are being generated by the various instruments over the piece. You will also have templates - the usual frequencies generated by a trumpet, say, with minimum and maximum bounds and probabilities.
If we humans can do it (I have for example a very talented friend who can listen to a piece of music and then transcribe the music for any instrument from it), computers will be able to do it. It may take some time however. :-)
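The number-11 point above can be made runnable. A toy sketch (my own illustration): unconstrained, the decomposition is ambiguous, but add a "template" bounding one part - as suggested for a trumpet's plausible frequency range - and the candidates collapse:

```python
# All non-negative integer pairs (a, b) with a + b == 11: ambiguous.
unconstrained = [(a, 11 - a) for a in range(12)]

# Add a "template": suppose we know a lies between 3 and 5 (like knowing
# a trumpet's usual frequency range) - far fewer candidates remain.
constrained = [(a, b) for (a, b) in unconstrained if 3 <= a <= 5]

print(len(unconstrained), len(constrained))  # 12 candidates shrink to 3
```

Real separation stacks many such constraints (spectral templates, continuity over the timeline), which is exactly why the extra data makes the problem less hopeless than "11 = a + b".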
Distribution: Debian Sid AMD64, Raspbian Wheezy, various VMs
Posts: 7,680
Rep:
Quote:
Originally Posted by hydrurga
My argument however is that musical instruments, and voices, are such rich sources that it *will* be possible to separate them out.
On the simplest level, take the number 11. You are told that it consists of 2 numbers added together and asked to find those numbers. In the absence of any more data, it's impossible. However in music you have both far more complex sources, generating multiple data, and also the ability to look over the entire timeline of the music in order to determine what frequencies and volumes are being generated by the various instruments over the piece. You will also have templates - the usual frequencies generated by a trumpet, say, with minimum and maximum bounds and probabilities.
If we humans can do it (I have for example a very talented friend who can listen to a piece of music and then transcribe the music for any instrument from it), computers will be able to do it. It may take some time however. :-)
So, what is the mathematical difference between a Mini Moog and a 2600 playing the same thing?
That it may be possible to create a file containing virtual instruments which sound, to the human ear, identical to the original is not in dispute -- it's called MP3.
Quote:
Originally Posted by jamison20000e
.ogg
Well, yes - perhaps a better version of the whole "how many perfect synths would it take to make this?" question.
My point being that MIDI is sample-based, so it's just a list of samples which is never, ever going to sound like the original, and it's going to be almost - or actually - impossible to tell which "real" instruments were used to produce a given piece of music.
By the way, people cannot tell either; they just know that there's a pool of instruments that could be pulled from, and they work it out from that.
Last edited by 273; 07-14-2016 at 01:38 PM.
Reason: "from" not "form" and "pool" not "poll"
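For reference (my own aside, not from the posts above): a MIDI stream stores only note events; timbre comes entirely from whatever sample library or synth renders them, which is why renditions of the same file can sound so different. A minimal sketch of the raw bytes:

```python
# A MIDI "note on" channel message: status byte 0x90 (note on, channel 1),
# then note number (middle C = 60) and velocity (0-127).
note_on  = bytes([0x90, 60, 100])  # start middle C at velocity 100
note_off = bytes([0x80, 60, 0])    # release it

# A MIDI track is essentially a timed list of such 3-byte events;
# no waveform, no timbre - the renderer supplies the actual sound.
print(note_on.hex(), note_off.hex())
```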
Quote:
Originally Posted by hydrurga
My argument however is that musical instruments, and voices, are such rich sources that it *will* be possible to separate them out.
On the simplest level, take the number 11. You are told that it consists of 2 numbers added together and asked to find those numbers. In the absence of any more data, it's impossible. However in music you have both far more complex sources, generating multiple data, and also the ability to look over the entire timeline of the music in order to determine what frequencies and volumes are being generated by the various instruments over the piece. You will also have templates - the usual frequencies generated by a trumpet, say, with minimum and maximum bounds and probabilities.
If we humans can do it (I have for example a very talented friend who can listen to a piece of music and then transcribe the music for any instrument from it), computers will be able to do it. It may take some time however. :-)
Yes, we can be sure a computer will someday do it. The fact that a human can do it means that the information is there. It's just a matter of extracting and sorting it out.
I recently stumbled across a book about human hearing and how it works. It's really complicated. I have also been developing some artificial hearing software but still have much to do and much more to learn.
Quote:
Originally Posted by Beryllos
Yes, we can be sure a computer will someday do it. The fact that a human can do it means that the information is there. It's just a matter of extracting and sorting it out.
I recently stumbled across a book about human hearing and how it works. It's really complicated. I have also been developing some artificial hearing software but still have much to do and much more to learn.
As I mentioned, a human can't do it -- humans just compare what they hear to a database they've built up over the years, as a computer would. I guarantee that there are instruments that 90% of people haven't heard, and even the more common ones can be confused with one another. Again, I'll mention Tom Morello's guitar and go on to say that there are synth-written guitar parts which can be indistinguishable from "the real thing".
Then there's MIDI being simply a load of samples from a very finite library.
So, I suppose, yes, you could have a program which turns any sound into a very poor MIDI rendition using built-in software synthesizers, but you would find that the majority don't sound close to the original at all.
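The frequencies-to-notes half of such a converter is at least well defined. A sketch, assuming equal temperament with A4 = 440 Hz (the standard MIDI tuning):

```python
import math

def freq_to_midi(freq_hz: float) -> int:
    """Map a frequency to the nearest MIDI note number
    (equal temperament, A4 = 440 Hz = note 69)."""
    return round(69 + 12 * math.log2(freq_hz / 440.0))

print(freq_to_midi(440.0))   # 69 (A4)
print(freq_to_midi(261.63))  # 60 (middle C)
```

Everything else - deciding how many notes are sounding at once, and which instrument each belongs to - is the hard, unsolved part the thread is arguing about.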
Quote:
Originally Posted by 273
As I mentioned, a human can't do it -- humans just compare what they hear to a database they've built up over the years, as a computer would. I guarantee that there are instruments that 90% of people haven't heard, and even the more common ones can be confused with one another. Again, I'll mention Tom Morello's guitar and go on to say that there are synth-written guitar parts which can be indistinguishable from "the real thing".
Then there's MIDI being simply a load of samples from a very finite library.
So, I suppose, yes, you could have a program which turns any sound into a very poor MIDI rendition using built-in software synthesizers, but you would find that the majority don't sound close to the original at all.
With all due respect to the OP, which was about MIDI, I wasn't talking about MIDI, and I'm aware of its limitations. Rather, hydrurga and I were speculating about whether a machine can, in principle, perform as well as a human (or better) at analyzing music and describing it in terms that would allow instruments to be separated, accurately reproduced, edited, retuned, remixed, transcribed, or whatever the user requires. At the moment, it's science fiction... kind of like cell phones used to be...
BIG FAT WARNING: THIS IS OT; I'M BECOMING PHILOSOPHICAL!
Quote:
Originally Posted by hydrurga
My argument however is that musical instruments, and voices, are such rich sources that it *will* be possible to separate them out.
On the simplest level, take the number 11. You are told that it consists of 2 numbers added together and asked to find those numbers. In the absence of any more data, it's impossible. However in music you have both far more complex sources, generating multiple data, and also the ability to look over the entire timeline of the music in order to determine what frequencies and volumes are being generated by the various instruments over the piece. You will also have templates - the usual frequencies generated by a trumpet, say, with minimum and maximum bounds and probabilities.
If we humans can do it (I have for example a very talented friend who can listen to a piece of music and then transcribe the music for any instrument from it), computers will be able to do it. It may take some time however. :-)
this and most other posts seem to assume a certain type of music: studio recordings, mostly, with a finite number of tracks (=instruments=voices?) and known instruments.
my understanding is a little different; what if you have a campfire song where you can hardly distinguish guitar & singer from other sounds, like crackling fire, crickets, clinking beer bottles, giggling girls - a human being can still distinguish the song, and be able to hear & play it after that.
actually that's an example where i can imagine a computer being somewhat successful in separating out guitar and human voice, but it leads me to another point:
what if all the noise is actually intended to be part of the music?
or, noise and music kind of flow into each other, and become indistinguishable?
or, it becomes an important part of the message of the piece whether you use real strings or synth strings?
or, what about the sometimes uncanny capability of people to recognize the slightest accent in the speech of others?
i think computers & software are still very, very far away from dealing with something like that.
Sorry, yes: computers can, to a degree, distinguish between known instruments, and will be able to do so even better than people. Human pattern recognition sometimes gets side-tracked, and the human grasp of probability is shaky enough that the two conspire to make human recognition of edge cases more problematic.
If all you need is a conversion of frequencies (.wav) to musical events (.mid), you can do that now. You'll end up with a piano score of sorts (not to imply it's playable by human hands on a piano), but every instrument will be converted to a single MIDI instrument/track. With human intervention you can break the notes out into parts for multiple instruments.

It's not exactly an original score, as the key signatures and meters might be omitted or vary drastically from the original work, but the notes will for the most part be there - including wrong notes generated during the performance, and any incidental sounds (harmonics) created when more than one instrument plays in tune. It's not an exact science, but you can extract the "gist" of it programmatically.

Now, if you have a studio master and each instrument was mic'd individually, the accuracy improves greatly. But it is by no means an automated process, and most of the conversion tools in Linux still suck for the most part. Although some of the ones that convert images to music are interesting.
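The single-track frequency-to-event conversion described above can be sketched for the easiest possible case, a pure monophonic tone. This is a toy illustration of my own (one FFT frame, NumPy, no real pitch tracker), and it breaks down exactly where the post says real tools struggle: polyphony, noise, and harmonics:

```python
import numpy as np

SR = 44100  # sample rate in Hz

def dominant_midi_note(samples: np.ndarray, sr: int = SR) -> int:
    """Pick the loudest frequency bin in one FFT frame and map it
    to the nearest MIDI note (equal temperament, A4 = 440 Hz = note 69)."""
    spectrum = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sr)
    peak_hz = freqs[np.argmax(spectrum[1:]) + 1]  # skip the DC bin
    return int(round(69 + 12 * np.log2(peak_hz / 440.0)))

# "Record" one second of a pure 440 Hz sine - the trivial case.
t = np.arange(SR) / SR
tone = np.sin(2 * np.pi * 440.0 * t)

print(dominant_midi_note(tone))  # 69 (A4)
```

A real converter would slide this over short windows to get note timings, and would immediately face the problems discussed above: several instruments sharing bins, harmonics masquerading as notes, and noise with no note at all.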