@itea1001
Contributor

Added DefaultGenerationContinuous and DefaultUpdateContinuous.
New reward: $\frac{1}{1 + \mathrm{MSE}}$ + exploration term
Criterion for a "wrong example": |pred - actual_value| > a given threshold
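
For readers following along, here is a minimal sketch of the reward and the "wrong example" check described above. The function names (`mse`, `reward_v1`, `is_wrong_example`) and the way the exploration bonus is passed in are illustrative assumptions, not the code in this PR.

```python
def mse(preds, targets):
    """Mean squared error over paired predictions and targets."""
    if len(preds) != len(targets) or not preds:
        raise ValueError("preds and targets must be non-empty and equal length")
    return sum((y - y_hat) ** 2 for y_hat, y in zip(preds, targets)) / len(preds)


def reward_v1(preds, targets, exploration):
    """Reward = 1 / (1 + MSE) + exploration.

    `exploration` is a precomputed bonus (hypothetical parameter); the
    comment above does not say how it is computed.
    """
    return 1.0 / (1.0 + mse(preds, targets)) + exploration


def is_wrong_example(pred, actual_value, threshold):
    """An example counts as wrong when the absolute error exceeds the threshold."""
    return abs(pred - actual_value) > threshold
```

With a perfect fit the first term is 1, and it shrinks toward 0 as the MSE grows, so the exploitation part of this reward always lies in (0, 1].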

@itea1001
Contributor Author

Reward changed to:
$\frac{1}{n} \sum_{i=1}^{n} \left( reward_a - reward_b \, (y_i - \hat{y}_i)^2 \right) + \alpha \sqrt{\frac{\log t}{n}}$

($reward_a$ and $reward_b$ still need to be set for DefaultGenerationContinuous and DefaultUpdateContinuous)
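
A rough sketch of the updated formula, for illustration only. The function name `continuous_reward` and its signature are hypothetical. It also assumes that $n$ is the number of examples scored so far (the same $n$ in the average and in the exploration term) and that $t \ge 1$ is a step counter; the comment above does not define either symbol.

```python
import math


def continuous_reward(preds, targets, reward_a, reward_b, alpha, t):
    """Average of (reward_a - reward_b * squared error) plus an exploration bonus.

    Assumptions (not confirmed by the PR): n = len(preds), and t >= 1 is a
    step counter so that log(t) >= 0.
    """
    n = len(preds)
    if n == 0 or n != len(targets):
        raise ValueError("preds and targets must be non-empty and equal length")
    if t < 1:
        raise ValueError("t must be >= 1 so that log(t) is non-negative")

    # Exploitation term: (1/n) * sum(reward_a - reward_b * (y_i - y_hat_i)^2)
    exploit = sum(
        reward_a - reward_b * (y - y_hat) ** 2 for y_hat, y in zip(preds, targets)
    ) / n

    # Exploration term: alpha * sqrt(log(t) / n)
    explore = alpha * math.sqrt(math.log(t) / n)
    return exploit + explore
```

Since $reward_a$ and $reward_b$ are constants, the first term equals $reward_a - reward_b \cdot \mathrm{MSE}$. Unlike $\frac{1}{1 + \mathrm{MSE}}$, it is unbounded below and goes negative once the MSE exceeds $reward_a / reward_b$ (for positive $reward_b$), which may matter when choosing the two constants.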


4 participants