Are you actually using the learned intrinsic reward for the agent?

Hi,

I can only see that you optimize the intrinsic loss in your code. Can you point me to the line where you add the intrinsic rewards to the actual environment/extrinsic rewards?

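In several places in your code I can see comments like
`# total reward = int reward`
which would be wrong according to the original paper, no? As far as I understand it, the intrinsic reward should be added to the extrinsic reward, not replace it.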
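
To make clear what I was expecting to find, here is a minimal sketch. It assumes the agent is trained on a weighted sum of the extrinsic and intrinsic rewards; the function name `combine_rewards` and the coefficients `ext_coef` and `int_coef` are hypothetical and not taken from your code or from the paper.

```python
def combine_rewards(ext_rewards, int_rewards, ext_coef=1.0, int_coef=0.01):
    """Return per-step total rewards as a weighted sum of both signals.

    Hypothetical helper for illustration only; the default coefficients
    are placeholder values, not ones from the repository or the paper.
    """
    if len(ext_rewards) != len(int_rewards):
        raise ValueError("reward sequences must have the same length")
    # total reward = ext_coef * extrinsic + int_coef * intrinsic, per time step
    return [
        ext_coef * r_ex + int_coef * r_in
        for r_ex, r_in in zip(ext_rewards, int_rewards)
    ]


# Example: two environment steps with an intrinsic bonus scaled by 0.1
total = combine_rewards([1.0, 0.0], [0.5, 2.0], int_coef=0.1)
```

If the returns used for the policy update are computed from something like `total` above, I would expect the comment to read `# total reward = ext reward + int reward` instead.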


Thank you.