8 Conclusion

The trend towards larger DRAM devices exacerbates the processor/memory bottleneck, requiring costly cache hierarchies to effectively support high-performance microprocessors. A viable alternative is to move the processor closer to the memory by integrating it onto the DRAM chip. Processor/memory integration is advantageous, even if it requires the use of a simpler processor. It was shown that a conventional, single scalar processor with a small cache, integrated with a 256 Mbit DRAM array, can form a self-contained, general-purpose processing element with competitive performance that can approach that of high-end superscalar processors with large, multilevel caches. Small (8k) direct-mapped instruction caches with long lines (512 bytes) perform surprisingly well with a