A remediation planner compares three actions: patch a VPN service, segment an engineering subnet, and rotate a shared local administrator password. The greedy optimizer recommends rotation first because it yields the highest BRS reduction per unit cost in the current graph. A team lead argues that the first greedy choice proves the whole plan is globally optimal and should be executed without review. Evidence packet: patching costs 2 and reduces average BRS by 1.0 in isolation; segmentation costs 5 and reduces average BRS by 2.6 in isolation; credential rotation costs 1 and reduces average BRS by 1.1 in isolation. When segmentation and rotation are combined, the second action has diminishing returns because both affect the same path family. The change board requires a rollback owner for any first action. Select all recommendations that should survive review.
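The per-unit-cost comparison in the evidence packet can be checked with a short sketch. The names `Action`, `rank_by_ratio`, and the action labels are hypothetical, chosen only for illustration; the costs and isolated reductions are the ones stated above. The sketch ranks actions by their isolated ratio only, so it says nothing about combined effects or about whether the full plan is globally optimal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """One remediation action with figures from the evidence packet."""
    name: str             # hypothetical label, not from the source
    cost: float           # cost units as given in the packet
    brs_reduction: float  # average BRS reduction in isolation

    @property
    def reduction_per_cost(self) -> float:
        return self.brs_reduction / self.cost


ACTIONS = [
    Action("patch_vpn", cost=2, brs_reduction=1.0),
    Action("segment_subnet", cost=5, brs_reduction=2.6),
    Action("rotate_admin_password", cost=1, brs_reduction=1.1),
]


def rank_by_ratio(actions):
    """Sort by isolated BRS reduction per unit cost, best first."""
    return sorted(actions, key=lambda a: a.reduction_per_cost, reverse=True)


ranked = rank_by_ratio(ACTIONS)
for action in ranked:
    print(f"{action.name}: {action.reduction_per_cost:.2f}")
```

Rotation comes out first on the isolated ratio, which matches the greedy recommendation. Because segmentation and rotation affect the same path family, the isolated figures cannot simply be added when judging the combined plan, so the ranking of later steps would need to be recomputed on the updated graph.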
An MDP state vector for a training environment tracks compromised hosts, accessible hosts, discovered services, discovered vulnerabilities, credential store, current host, step count, and total reward. A proposed policy assigns a large positive reward to `exfiltrate_proof` whenever a critical host is reachable, even if the host is not compromised and the proof action's preconditions are false. The team argues that reward shaping will teach the agent to find a legal path eventually. Evidence packet: `exfiltrate_proof` has preconditions `host_compromised=true`, `proof_allowed=true`, and `target_in_scope=true`; the proposed policy sets only `target_reachable=true`. The current state has `accessible_hosts={H7}`, `compromised_hosts={}`, and `critical_hosts={H7}`. The team must decide whether to solve this in reward shaping, action masking, or state representation. Select all recommendations that should survive review.
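A minimal sketch of a precondition check, of the kind an action mask might apply, is shown here for the state in the evidence packet. The function names `unmet_preconditions` and `is_action_legal` are hypothetical, and the sketch assumes that any flag the proposed policy does not set counts as unmet. It is one possible way to express the check, not the environment's actual implementation.

```python
# Preconditions for exfiltrate_proof, as listed in the evidence packet.
EXFILTRATE_PROOF_PRECONDITIONS = {
    "host_compromised": True,
    "proof_allowed": True,
    "target_in_scope": True,
}

# Current state from the evidence packet.
state = {
    "accessible_hosts": {"H7"},
    "compromised_hosts": set(),
    "critical_hosts": {"H7"},
}


def unmet_preconditions(flags, preconditions):
    """Return the sorted names of preconditions the flags do not satisfy.

    Assumption: a flag that is absent is treated as unmet.
    """
    return sorted(
        name for name, required in preconditions.items()
        if flags.get(name) != required
    )


def is_action_legal(flags, preconditions):
    """An action is legal only when every precondition holds."""
    return not unmet_preconditions(flags, preconditions)


target = "H7"
flags = {
    # The only flag the proposed policy sets.
    "target_reachable": target in state["accessible_hosts"],
    # Derived from the state vector rather than from the policy.
    "host_compromised": target in state["compromised_hosts"],
}

unmet = unmet_preconditions(flags, EXFILTRATE_PROOF_PRECONDITIONS)
legal = is_action_legal(flags, EXFILTRATE_PROOF_PRECONDITIONS)
print("unmet:", unmet)
print("legal:", legal)
```

Under these assumptions the action is not legal for H7, since reachability alone satisfies none of the three listed preconditions.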