From Machine Learning
[R] Literature on optimizing models with thumbs-up/thumbs-down user feedback?
I am working on a project where I have a dataset of model responses that users tagged with "thumbs up" or "thumbs down". That is all the information I have. I cannot show new generations to users, so I can only use this existing dataset.
Is there any literature on the best ways to evaluate the model that generated those responses and/or fine-tune it?
The most obvious approaches I can think of are computing the percentage of responses that received a thumbs up as a performance metric, and, for fine-tuning, training a reward model on my dataset and then applying RLHF to the model.
Are there any publications exploring better ways of doing this?
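A minimal sketch of the two baselines mentioned in the post, assuming a hypothetical `FeedbackRecord` schema (prompt, response, boolean thumbs-up). It reports the thumbs-up rate together with a Wilson score interval, since a bare percentage hides how few ratings it may rest on. It then fits a pointwise reward model as logistic regression on the binary labels. The hashed bag-of-words `featurize` function is a toy stand-in for the language-model encoder a real reward model would use. All names and hyperparameters here are illustrative assumptions, not an established method.

```python
import math
import re
import zlib
from dataclasses import dataclass


@dataclass
class FeedbackRecord:
    # Hypothetical schema: one logged model response and the user's rating.
    prompt: str
    response: str
    thumbs_up: bool


def thumbs_up_rate(records, z=1.96):
    """Thumbs-up fraction plus a Wilson score interval (about 95% for z=1.96)."""
    n = len(records)
    if n == 0:
        raise ValueError("no records")
    p = sum(r.thumbs_up for r in records) / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return p, (center - half, center + half)


def featurize(text, dim=4096):
    # Toy encoder (assumption): hashed bag of lowercase words, stored sparsely.
    # crc32 is used instead of hash() so features are stable across runs.
    counts = {}
    for word in re.findall(r"\w+", text.lower()):
        idx = zlib.crc32(word.encode("utf-8")) % dim
        counts[idx] = counts.get(idx, 0.0) + 1.0
    return counts


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def train_reward_model(records, dim=4096, lr=0.1, epochs=100):
    """Pointwise reward model: logistic regression predicting P(thumbs up)."""
    w = [0.0] * dim
    b = 0.0
    data = [(featurize(r.prompt + " " + r.response, dim), 1.0 if r.thumbs_up else 0.0)
            for r in records]
    for _ in range(epochs):
        for x, y in data:
            logit = b + sum(w[i] * v for i, v in x.items())
            grad = sigmoid(logit) - y  # gradient of binary cross-entropy w.r.t. the logit
            b -= lr * grad
            for i, v in x.items():
                w[i] -= lr * grad * v

    def reward(prompt, response):
        # The raw logit serves as the scalar reward.
        x = featurize(prompt + " " + response, dim)
        return b + sum(w[i] * v for i, v in x.items())

    return reward


# Tiny illustrative dataset (made up for the example, not real feedback).
records = [
    FeedbackRecord("How do I reverse a list in Python?", "Use my_list[::-1] or my_list.reverse().", True),
    FeedbackRecord("How do I reverse a list in Python?", "I am not sure.", False),
    FeedbackRecord("What is 2 + 2?", "2 + 2 = 4.", True),
    FeedbackRecord("What is 2 + 2?", "I cannot help with that.", False),
]
rate, (low, high) = thumbs_up_rate(records)
reward = train_reward_model(records)
print(f"thumbs-up rate {rate:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
print(reward("What is 2 + 2?", "2 + 2 = 4."), reward("What is 2 + 2?", "I cannot help with that."))
```

In a real pipeline the reward model would be a fine-tuned language model rather than a linear model over hashed words, and its score would then feed an RLHF step that is not shown here. The Wilson interval is one common choice for a binomial proportion. Comparing thumbs-up rates between models is only meaningful if both were rated on a similar distribution of prompts.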