Lok T. Liu and David E. Culler. Evaluation of the Intel Paragon on active message communication. In Proceedings of Intel Supercomputer Users Group Conference, 1995.


This paper is cited in the following contexts:
Fine-Grain Distributed Shared Memory on Clusters of Workstations - Schoinas (1997)   (3 citations)

....program to communicate without kernel intervention and is supported in recent low latency communication hardware [BCF 95, BLA 94, Hor95, OZH 96]. Fast access to the hardware by itself is not enough to achieve low latencies. As the experience from message passing multicomputers has shown [vECGS92, PLC95, vEBBV95], traditional messaging interfaces have not been able to realize the hardware performance due to fixed software overheads associated with sending and receiving messages. Therefore, a newer generation of low overhead messaging interfaces attacked the software overheads. Some software ....

....associated with sending and receiving messages. Therefore, a newer generation of low overhead messaging interfaces attacked the software overheads. Some software architectures originated in the networking community [DP93, Osb94] while others arose from the multicomputer community [vECGS92, PLC95, HGDG94]. The emergence of system area networks and networks of workstations has blurred the distinction. In general, however, the latter have been more preoccupied with low latencies than the former. Among the key proposals that emerged from the multicomputer community have been the Berkeley ....

[Article contains additional citation context not shown here]



Design and Evaluation of Network Interfaces for System Area.. - Mukherjee (1998)

....cache. I expect that each user process will negotiate at least two CQs with the CNI, one to send messages and the other to receive messages from the CNI. A key advantage of CQs is that they simplify the reuse handshake and amortize its overhead over the entire queue of blocks. Liu and Culler [70] used cachable queues to communicate small messages and control information between the compute processor and message processor in the Intel Paragon. This section focuses on how a single user processor can use CQs to communicate messages directly from a network interface device. I first describe ....

....The pseudo code for enqueue and dequeue with the message valid bit would be as follows: Message valid bits are not new. The T NG network interface [22] supports uncached message valid bits. Liu and Culler [70] used cached message valid bits in their Paragon Active Message implementation. The Scalable Coherent Interface [116] optionally supports a primitive called QOLB (queue on lock bit) directly in the coherence protocol. This lock bit (per cache block) could be used as a cached message valid bit. ....

Lok Tin Liu and David E. Culler. Evaluation of the Intel Paragon on Active Message Communication. In Proceedings of Intel Supercomputer Users Group Conference, June 1995.


Address Translation Mechanisms in Network Interfaces - Schoinas, Hill (1998)   (9 citations)

.... of information technology [17]. Even today, studies have shown that network protocols spend a significant amount of time simply copying data [49]. Therefore, many designs have attempted to avoid redundant copying at the application interface [13,42,56], the OS [25], and the network interface [36,11,26,1]. To push the envelope of possibilities, we ask whether it is possible to efficiently implement messaging with no extra copying, where message data are only copied out of the sender's data structures into the sender's NI and from the receiver's NI to the receiver's data structures (the data should ....

....proportional to the number of entries in the cache set that we must examine sequentially to find a match. Alternatively, we can consider hardware support for the lookup, as in designs with network coprocessors that include their own memory management unit and address translation hardware (TLBs) [1,26]. Such hardware structures should have high associativity with relatively few entries (tens) and zero lookup overhead. When a mapping is invalidated (paging activity, process termination) throughout the system, the host CPU must flush the entry out of the NI page tables. This is an operation that ....

[Article contains additional citation context not shown here]



Mechanisms for Efficient, Protected Messaging - Lee

....architecture like the M Machine, the timeout value must be chosen carefully, so that a thread is unlikely to be interrupted prematurely just because of the normal variabilities in its execution timings. Flexibility is a major factor for the many implementations of the Active Message interface [35, 57, 23]. Robustness, however, is not. The protocol uses integer tags to match up messages with their intended destination nodes. This reduces the likelihood of an inadvertently misdelivered message being accepted at the destination, but can give no guarantees due to the unprotected integer tags. In any ....

Lok T. Liu, David E. Culler, "Evaluation of the Intel Paragon on Active Message Communication", in Proceedings of Intel Supercomputer Users Group Conference, 1995.


Coherent Network Interfaces for Fine-Grain Communication - Mukherjee, Falsafi, et al. (1996)   (25 citations)

....managed as a queue. CQs are a general mechanism that can be used to communicate messages between two processor caches or a processor cache and a device cache. A key advantage of CQs is that they simplify the reuse handshake and amortize its overhead over the entire queue of blocks. Liu and Culler [31] used cachable queues to communicate small messages and control information between the compute processor and message processor in the Intel Paragon. We show how CQs can be used to communicate directly between a processor and a network interface device. We first describe the basic queue operation, ....

....tail gets invalidated. Thus in the worst case, each message arrival causes a cache miss on tail. Instead, we use message valid bits stored either as a single bit in the message header or in a separate word to allow the receiver to detect message arrivals without ever checking the tail pointer [10, 31]. The valid bits indicate whether a cache block contains a valid message or not. On a poll attempt, the receiver simply examines the first message in the queue (i.e., the one pointed to by head); if it's invalid, the queue is empty. Thus no bus traffic normally occurs in this case. When a valid ....
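
The polling scheme described in this excerpt can be illustrated with a minimal sketch. It is not the code from the cited papers: the names `Slot`, `ValidBitQueue`, `enqueue`, and `poll` are hypothetical, the queue lives in ordinary Python memory rather than in coherent cache blocks, and the assumption that the receiver clears the valid bit to hand a slot back to the sender is a simplification chosen for this example.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Slot:
    """One queue entry, standing in for a cache block (assumption)."""
    valid: bool = False   # the "message valid bit"
    payload: Any = None


class ValidBitQueue:
    """Fixed-size ring in which the receiver never reads the tail pointer."""

    def __init__(self, capacity: int) -> None:
        self.slots: List[Slot] = [Slot() for _ in range(capacity)]
        self.head = 0  # touched only by the receiver
        self.tail = 0  # touched only by the sender

    def enqueue(self, payload: Any) -> bool:
        """Sender side: returns False when the next slot is still occupied."""
        slot = self.slots[self.tail]
        if slot.valid:
            return False  # receiver has not consumed this slot yet
        slot.payload = payload
        slot.valid = True  # set the valid bit last, after the payload
        self.tail = (self.tail + 1) % len(self.slots)
        return True

    def poll(self) -> Optional[Any]:
        """Receiver side: inspect only the slot at head.

        Returns None when the head slot is invalid, i.e. the queue is empty.
        (A payload of None is therefore not distinguishable from "empty".)
        """
        slot = self.slots[self.head]
        if not slot.valid:
            return None
        payload = slot.payload
        slot.payload = None
        slot.valid = False  # clearing the bit frees the slot for reuse
        self.head = (self.head + 1) % len(self.slots)
        return payload
```

In this sketch an empty poll reads one slot and nothing else, which mirrors the excerpt's point that the receiver detects arrivals without checking the tail pointer.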

