9.8. Further Reading

  1. A Distributed Graph-Theoretic Framework for Automatic Parallelization in Multi-Core Systems. https://proceedings.mlsys.org/paper/2021/file/a5e00132373a7031000fd987a3c9f87b-Paper.pdf

  2. SCOP: Scientific Control for Reliable Neural Network Pruning. https://arxiv.org/abs/2010.10732

  3. Searching for Low-Bit Weights in Quantized Neural Networks. https://arxiv.org/abs/2009.08695

  4. GhostNet: More Features from Cheap Operations. https://arxiv.org/abs/1911.11907

  5. AdderNet: Do We Really Need Multiplications in Deep Learning? https://arxiv.org/abs/1912.13200

  6. Blockwise Parallel Decoding for Deep Autoregressive Models. https://arxiv.org/abs/1811.03115

  7. Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads. https://www.together.ai/blog/medusa

  8. FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning. https://arxiv.org/abs/2307.08691