From Machine Learning

[D] How come Muon is only being used for Transformers?

Muon has quickly been adopted in LLM training, yet we don't see it being talked about in other contexts. Searches for Muon on ConvNets turn up basically no results, despite its announcement including a new training speed record for CIFAR-10. In my experience, faster training usually comes with better final models, so what's the deal? Does it not actually scale? Have I missed papers?
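For context on the question's premise: Muon works on 2-D weight matrices, accumulating momentum and then approximately orthogonalizing the update via a Newton-Schulz iteration before applying it. Below is a minimal NumPy sketch of that idea; the quintic coefficients are the ones from the public Muon write-up, but the function names, default hyperparameters, and normalization are my own illustrative assumptions, not a reference implementation.

```python
import numpy as np

def newton_schulz_orthogonalize(G, steps=5, eps=1e-7):
    """Approximately map G to a (semi-)orthogonal matrix of the same shape.

    Uses the quintic Newton-Schulz iteration with the coefficients
    published in the Muon write-up; coefficient choice is an assumption here.
    """
    a, b, c = 3.4445, -4.7750, 2.0315
    # Frobenius normalization bounds the spectral norm by 1 (assumption: good enough for a sketch).
    X = G / (np.linalg.norm(G) + eps)
    transposed = X.shape[0] > X.shape[1]
    if transposed:
        X = X.T  # keep the Gram matrix A small: work with the wide orientation
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * (A @ A)) @ X
    if transposed:
        X = X.T
    return X

def muon_update(W, grad, momentum, lr=0.02, beta=0.95):
    """One Muon-style step for a single 2-D weight matrix (illustrative sketch)."""
    momentum = beta * momentum + grad          # momentum accumulation
    update = newton_schulz_orthogonalize(momentum)  # orthogonalize the direction
    return W - lr * update, momentum
```

The 2-D assumption is exactly what makes the ConvNet case non-obvious: a conv filter of shape (out, in, kh, kw) has to be flattened to a matrix (e.g. out × in·kh·kw) before the orthogonalization step applies, and how best to do that reshaping is part of the open question.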

submitted by /u/lukeiy


Tagged with

#Muon
#Transformers
#LLM training
#ConvNets
#training speed
#CIFAR-10
#faster training
#final models
#scaling
#adoption
#model performance
#speed record
#research papers
#machine learning
#data efficiency