To make a long story short, I have two physical servers hosting two almost identical VMs, but one of these scales very badly in some workloads. And only some, not always in all workloads and even the problematic workloads work e.g. always after restarts. It starts to not work anymore after some time only. I'm unable to reproduce this problem in the other VM and am now comparing differences of the output of "sysctl -a". One of those differences addresses the task/kernel/cpu scheduler.
So I'm wandering where those different values come from initially?
E.g. if they are calculated, which facts they depend on, maybe the VM-host, if they change on runtime automatically etc. Estimations about if these concrete differences in my values could have any reasonable impact on overall scaling of a system most likely are welcome as well.
Good vs. Bad VM:
Code:
--- C:/Users/tschoening/Desktop/Good VM.txt Mi 18. Apr 19:24:47 2018
+++ C:/Users/tschoening/Desktop/Bad VM.txt Mi 18. Apr 19:24:44 2018
@@ -8,3 +8,3 @@ kernel.sched_domain.cpu0.domain0.imbalance_pct = 1
-kernel.sched_domain.cpu0.domain0.max_interval = 4
-kernel.sched_domain.cpu0.domain0.max_newidle_lb_cost = 75519
-kernel.sched_domain.cpu0.domain0.min_interval = 2
+kernel.sched_domain.cpu0.domain0.max_interval = 16
+kernel.sched_domain.cpu0.domain0.max_newidle_lb_cost = 155384
+kernel.sched_domain.cpu0.domain0.min_interval = 8
@@ -15 +15 @@ kernel.sched_domain.cpu0.domain0.wake_idx = 0
-kernel.sched_latency_ns = 12000000
+kernel.sched_latency_ns = 24000000
@@ -17 +17 @@ kernel.sched_migration_cost_ns = 500000
-kernel.sched_min_granularity_ns = 1500000
+kernel.sched_min_granularity_ns = 3000000
@@ -25 +25 @@ kernel.sched_tunable_scaling = 1
-kernel.sched_wakeup_granularity_ns = 2000000
+kernel.sched_wakeup_granularity_ns = 4000000
Good VM:
Code:
kernel.sched_domain.cpu0.domain0.busy_factor = 32
kernel.sched_domain.cpu0.domain0.busy_idx = 2
kernel.sched_domain.cpu0.domain0.cache_nice_tries = 1
kernel.sched_domain.cpu0.domain0.flags = 4143
kernel.sched_domain.cpu0.domain0.forkexec_idx = 0
kernel.sched_domain.cpu0.domain0.idle_idx = 1
kernel.sched_domain.cpu0.domain0.imbalance_pct = 125
kernel.sched_domain.cpu0.domain0.max_interval = 4
kernel.sched_domain.cpu0.domain0.max_newidle_lb_cost = 75519
kernel.sched_domain.cpu0.domain0.min_interval = 2
kernel.sched_domain.cpu0.domain0.name = DIE
kernel.sched_domain.cpu0.domain0.newidle_idx = 0
kernel.sched_domain.cpu0.domain0.wake_idx = 0
kernel.sched_latency_ns = 12000000
kernel.sched_migration_cost_ns = 500000
kernel.sched_min_granularity_ns = 1500000
kernel.sched_nr_migrate = 32
kernel.sched_rr_timeslice_ms = 25
kernel.sched_rt_period_us = 1000000
kernel.sched_rt_runtime_us = 950000
kernel.sched_shares_window_ns = 10000000
kernel.sched_time_avg_ms = 1000
kernel.sched_tunable_scaling = 1
kernel.sched_wakeup_granularity_ns = 2000000
Bad VM:
Code:
kernel.sched_domain.cpu0.domain0.busy_factor = 32
kernel.sched_domain.cpu0.domain0.busy_idx = 2
kernel.sched_domain.cpu0.domain0.cache_nice_tries = 1
kernel.sched_domain.cpu0.domain0.flags = 4143
kernel.sched_domain.cpu0.domain0.forkexec_idx = 0
kernel.sched_domain.cpu0.domain0.idle_idx = 1
kernel.sched_domain.cpu0.domain0.imbalance_pct = 125
kernel.sched_domain.cpu0.domain0.max_interval = 16
kernel.sched_domain.cpu0.domain0.max_newidle_lb_cost = 155384
kernel.sched_domain.cpu0.domain0.min_interval = 8
kernel.sched_domain.cpu0.domain0.name = DIE
kernel.sched_domain.cpu0.domain0.newidle_idx = 0
kernel.sched_domain.cpu0.domain0.wake_idx = 0
kernel.sched_latency_ns = 24000000
kernel.sched_migration_cost_ns = 500000
kernel.sched_min_granularity_ns = 3000000
kernel.sched_nr_migrate = 32
kernel.sched_rr_timeslice_ms = 25
kernel.sched_rt_period_us = 1000000
kernel.sched_rt_runtime_us = 950000
kernel.sched_shares_window_ns = 10000000
kernel.sched_time_avg_ms = 1000
kernel.sched_tunable_scaling = 1
kernel.sched_wakeup_granularity_ns = 4000000