
A strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token.
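
For intuition on how an MoE layer activates only a fraction of its parameters per token, here is a minimal top-k routing sketch in PyTorch. All sizes and names are toy assumptions for illustration, not the model's real architecture or configuration: a router scores the experts, only the top-k run for each token, and their outputs are combined with the normalized router weights.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    """Toy MoE layer: only top_k of n_experts run per token (all sizes hypothetical)."""

    def __init__(self, d_model=64, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)  # scores each expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        scores = self.router(x)                         # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # keep only the top-k experts
        weights = F.softmax(weights, dim=-1)            # normalize over the selected experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                   # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out

x = torch.randn(10, 64)
print(ToyMoELayer()(x).shape)  # torch.Size([10, 64])
```

Because each token passes through only `top_k` experts, the compute (and the "activated" parameter count) per token is a small fraction of the layer's total parameters, which is how a 671B-parameter model can run with only 37B parameters active per token.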
