IMHO, in this case you are "comparing apples and oranges." Instead of pursuing that line of reasoning, which is more-or-less fruitless, I suggest that you look for an alternative metric which is "clearly measurable, and clearly relevant to what I want to do."
"DISK IOPS," quite clearly, is a metric that depends entirely upon all of the (many ...) differences that you list in your original post. It is tightly bound to every one of them, so a "DISK IOPS" (or whatever ...) measurement taken from any one of these scenarios is pretty well incomparable to a like-named measurement taken from any other.
So, you need to step back and ask: "What else can I measure?" And, to answer that, you need to ask: "What is it, really, that I need (not 'to know,' but ...) to decide?"
Re-frame your question in terms of the business requirement: what this system, ultimately, must be able to do. (Consider not only "raw performance, when everything's working perfectly," but also, "what if this-or-that disk drive throws a piston?") Instead of anchoring your inquiry upon "an abstract physical metric (such as DISK IOPS)," consider the requirements of the application first, and then weigh the various "abstract implementation-specific(!) physical metrics" against them. "The cart pays the bills. The horse is just a means to an end."