efficient attention-based model, memory efficiency, mixture of experts, sequence modeling, sparse attention, transformer.