Further Reading
===============

1. A Distributed Graph-Theoretic Framework for Automatic Parallelization in Multi-Core Systems [1]_
2. SCOP: Scientific Control for Reliable Neural Network Pruning [2]_
3. Searching for Low-Bit Weights in Quantized Neural Networks [3]_
4. GhostNet: More Features from Cheap Operations [4]_
5. AdderNet: Do We Really Need Multiplications in Deep Learning? [5]_
6. Blockwise Parallel Decoding for Deep Autoregressive Models [6]_
7. Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads [7]_
8. FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning [8]_

.. [1] https://proceedings.mlsys.org/paper/2021/file/a5e00132373a7031000fd987a3c9f87b-Paper.pdf
.. [2] https://arxiv.org/abs/2010.10732
.. [3] https://arxiv.org/abs/2009.08695
.. [4] https://arxiv.org/abs/1911.11907
.. [5] https://arxiv.org/abs/1912.13200
.. [6] https://arxiv.org/abs/1811.03115
.. [7] https://www.together.ai/blog/medusa
.. [8] https://arxiv.org/abs/2307.08691