Maintaining subjects' freedom to decide imposes structure and constraints on systems that seek to help inform those decisions. Two natural sources for learning to make good decisions are past experience and the advice of others. Both are affected by subjects' freedom to ultimately act as they wish, giving rise to learning-theoretic and game-theoretic repercussions, respectively.
To study the first, we extend the standard bandit setting: after the algorithm picks an action, the subject may carry out a different one, which is then observed along with the reward. Algorithms whose chosen actions are mediated by a subject can gain from awareness of the subject's actual actions, which we term compliance awareness. We prove that algorithms without access to compliance information can have unbounded regret relative to those with it when action spaces are growing, or when actions' rewards regularly shift. We present algorithms that take advantage of compliance awareness while maintaining worst-case regret bounds up to multiplicative constants, and we study their empirical finite-sample performance on synthetic and real data from clinical trials. We draw connections to a broader machine learning literature on "learning under privileged information", and show how its natural bandit instantiation is a generalization of the compliance-aware bandit, which we term bandits with hindsight: instead of a compliance variable that maps to the action space, arbitrary variables observed after the algorithm has chosen its action can be taken into account.
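The update rule this setting suggests can be sketched as follows. This is a minimal illustration, not the paper's algorithm: all names are hypothetical, and it assumes a finite action set with an epsilon-greedy learner. The only point is the `update` signature, which credits the reward to the subject's actual action rather than the recommended one.

```python
import random

class ComplianceAwareEpsilonGreedy:
    """Hypothetical sketch of a compliance-aware bandit learner:
    rewards are credited to the action the subject actually took,
    not merely the action the algorithm recommended."""

    def __init__(self, n_actions, epsilon=0.1):
        self.n_actions = n_actions
        self.epsilon = epsilon
        self.counts = [0] * n_actions    # times each action was actually taken
        self.values = [0.0] * n_actions  # running mean reward per actual action

    def recommend(self):
        # Explore with probability epsilon, otherwise exploit.
        if random.random() < self.epsilon:
            return random.randrange(self.n_actions)
        return max(range(self.n_actions), key=lambda a: self.values[a])

    def update(self, recommended, actual, reward):
        # A compliance-unaware learner would update `recommended` here;
        # the compliance-aware variant updates the observed `actual` action.
        self.counts[actual] += 1
        n = self.counts[actual]
        self.values[actual] += (reward - self.values[actual]) / n
```

A compliance-unaware baseline is recovered by passing `recommended` wherever `actual` is used, which is one way to compare the two regimes empirically.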
To study the advice of others, we consider the literature on incentivizing multiple experts by a decision maker who will take an action and receive a reward about which the experts may have information. Existing mechanisms for multiple experts are known not to be truthful, even in the limited sense of myopic incentive compatibility, unless the decision maker renounces the ability to always take the best ex-post action and commits to a randomized strategy with full support. We present a new class of algorithms based on second-price auctions that maintain subjects' freedom. Experts submit their private information, and the algorithm auctions off the rights to a share of the subject's reward, allowing the winner to select an action; the market is voided if the decision maker does not take that action. We show several situations in which existing mechanisms fail and this one succeeds. We also consider strategic limitations of this mechanism beyond the myopic setting, as well as practical considerations for its implementation in real institutions.
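The mechanics of this auction can be sketched in a few lines. This is a hedged illustration under simplifying assumptions (sealed one-shot bids, a single winner, no tie-breaking subtleties); the function name, arguments, and fixed reward share are all hypothetical, not the paper's specification.

```python
def reward_share_auction(bids, chosen_actions, executed_action):
    """Hypothetical sketch of the mechanism described above: experts bid
    for a share of the subject's reward; the highest bidder wins, pays
    the second-highest bid, and names an action. If the decision maker
    does not carry out that action, the market is voided.

    bids: dict mapping expert -> bid for the reward share
    chosen_actions: dict mapping expert -> action that expert would select
    executed_action: the action actually taken by the decision maker
    Returns (winner, price, voided).
    """
    ranked = sorted(bids, key=bids.get, reverse=True)
    winner = ranked[0]
    # Second-price rule: the winner pays the runner-up's bid.
    price = bids[ranked[1]] if len(ranked) > 1 else 0.0
    if chosen_actions[winner] != executed_action:
        # Market voided: no payment, no reward share changes hands.
        return winner, 0.0, True
    return winner, price, False
```

The void-on-noncompliance clause is what preserves the decision maker's freedom: committing only to honor trades whose action is actually taken, rather than committing to a full-support randomized policy.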