Scaling Neural DNA to GPT-2
354 parameters wire a language model. 99,970:1 compression. Beats GPT-2 on 3 benchmarks.
machine-learning, neural-architecture, sparsity, gpt2, transformer, pytorch
A compact genome for growing a neural network's architecture
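The headline numbers imply the 354-entry genome is decoded into a network roughly five orders of magnitude larger (354 × 99,970 ≈ 35.4M weights). The post's actual decoding scheme isn't shown here, so below is a minimal PyTorch sketch of one generic indirect encoding, a fixed random projection from a shared genome to each layer's weights; `GenomicLinear`, `GENOME_SIZE`, and the `decoder` buffer are illustrative names, not the author's API.

```python
import torch
import torch.nn as nn

GENOME_SIZE = 354  # the post's genome size: 354 trainable parameters total


class GenomicLinear(nn.Module):
    """Linear layer whose weight matrix is decoded from a shared genome.

    The decoder is a fixed (non-trainable) random projection, so the only
    trainable parameters in the whole model are the 354 genome entries.
    A generic indirect-encoding sketch, not the post's exact scheme.
    """

    def __init__(self, genome: nn.Parameter, in_features: int, out_features: int):
        super().__init__()
        self.genome = genome  # shared across layers; the sole trainable tensor
        self.in_features, self.out_features = in_features, out_features
        # Fixed decoder: maps the 354-dim genome to this layer's weights.
        self.register_buffer(
            "decoder",
            torch.randn(out_features * in_features, GENOME_SIZE) / GENOME_SIZE**0.5,
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Decode this layer's weights from the genome on the fly.
        weight = (self.decoder @ self.genome).view(self.out_features, self.in_features)
        return x @ weight.t()


genome = nn.Parameter(torch.randn(GENOME_SIZE))
model = nn.Sequential(
    GenomicLinear(genome, 64, 256),
    nn.GELU(),
    GenomicLinear(genome, 256, 64),
)
# Shared parameters are deduplicated, so this prints 354.
print(sum(p.numel() for p in model.parameters() if p.requires_grad))
```

In the full post the decoded network is GPT-2-scale; the two small layers above just make the bottleneck visible: no matter how many weights are decoded, the model exposes only the 354 genome entries to the optimizer.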