Direct Preference Optimization (DPO)