Further Reading
===============

1. A Distributed Graph-Theoretic Framework for Automatic Parallelization in Multi-Core Systems [1]_
2. SCOP: Scientific Control for Reliable Neural Network Pruning [2]_
3. Searching for Low-Bit Weights in Quantized Neural Networks [3]_
4. GhostNet: More Features from Cheap Operations [4]_
5. AdderNet: Do We Really Need Multiplications in Deep Learning? [5]_
6. Blockwise Parallel Decoding for Deep Autoregressive Models [6]_
7. Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads [7]_
8. FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning [8]_

.. [1] https://proceedings.mlsys.org/paper/2021/file/a5e00132373a7031000fd987a3c9f87b-Paper.pdf
.. [2] https://arxiv.org/abs/2010.10732
.. [3] https://arxiv.org/abs/2009.08695
.. [4] https://arxiv.org/abs/1911.11907
.. [5] https://arxiv.org/abs/1912.13200
.. [6] https://arxiv.org/abs/1811.03115
.. [7] https://www.together.ai/blog/medusa
.. [8] https://arxiv.org/abs/2307.08691