This paper discusses techniques for optimizing code on platforms running Linux on Alpha processors. It draws on four years of experience with the Alpha architecture and uses many real-world examples to illustrate the practicality and importance of the techniques. The main lesson from this experience is that, for many applications, the memory system, not the processor itself, is the primary bottleneck. Most of the techniques therefore aim at avoiding the memory-system bottleneck. Because the gap between processor speed and memory-system speed is large, these techniques can achieve performance improvements of up to 1700%. Although the focus is on the Alpha architecture, many of the techniques covered apply readily to other RISC processors and even to modern CISC CPUs.