Results 1 - 10 of 38,685
Table 2. Complexity Classes of Resource Allocation (n = size of task set, m = size of operation set)
"... In PAGE 14: ... We define different categories of difficulties of the problem and present complexity results for them. Table2 summarizes these complexity results. To address these formalized problems, we define the notion of Dynamic Distributed Constraint Satisfaction Problem (DyDCSP) and present a generalized mapping from distributed resource allocation to DyDCSP.... ..."
Table 1: Resource allocation used
2004
"... In PAGE 40: ...2 Syntax For the moment, agent programming takes the form of a set of annotations that result in assembly-level transformations. These transformations are not yet implemented in a compiler, they are performed by hand, but the corresponding static transformations AP Annotation Semantic // AP divide Divide agent (before loops and procedure calls) // AP shared Atomic access to variable (before statement using variable) // AP reduction Variable for storing the result of a reduction (before statement using variable) Table1 : Annotations for agent programming. are straightforward and can be automated.... In PAGE 60: ... Con- Fetch queue size 64 Branch predictor comb. of bimodal and 2-level Bimodal predictor size 2048 Level 1 predictor 1024 entries, history 10 Level 2 predictor 4096 entries BTB size 2048 sets, 2-way Branch mispredict penalty at least 12 cycles Fetch width 8 (across up to 2 basic blocks) Dispatch and commit width 16 Issue queue size 15 per cluster (int and fp, each) Register file size 30 per cluster (int and fp, each) Re-order Buffer size 480 Integer ALUs/mult-div 1/1 (in each cluster) FP ALUs/mult-div 1/1 (in each cluster) L1 I-cache 32KB 2-way L1 D-cache 32KB 2-way set-associative, 6 cycles, 4-way word-interleaved L2 unified cache 2MB 8-way, 25 cycles I and D TLB 128 entries, 8KB page size Memory latency 160 cycles for the first chunk Table1 . Simplescalar simulator parameters.... In PAGE 66: ...ith in this paper. Using 0.13 micron technology, we have assumed features such as a clock rate of 3 GHz, a relatively large L2 cache, and other parameters scaled up accordingly. Our parameters for a single-context superscalar processor, SMT processor, and two CMP configurations are shown in Table1 . Our models assume the same parameters for SMT as the single- context superscalar and CMP resources are mostly ... In PAGE 67: ... Table1 . Processor parameters.... In PAGE 81: ...13a5 0.13 a5 Machine Width 4 wide fetch, 4 wide issue, 4 wide commit Window Size 128 entry RUU 64 entry RUU 64 entry load/store queue 32 entry load/store queue Branch Misprediction Latency 19 cycles 12 cycles L1 Icache 16K, 4-way 16K, 4-way 32 byte lines 32 byte lines 2 cycle latency 3 cycle latency L1 Data Cache 8K, 4-way 16 K, 4-way 32 byte lines 32 byte lines 2 cycle latency 3 cycle latency L2 Combined 512K, 8-way 512K, 8-way (Shared) 128 byte lines 128 byte lines 10 cycle latency 7 cycle latency Memory 128 bit wide 128 bit wide 92 cycle latency 41 cycle latency BTB 4096 entry, 4-way set-associative 512 entry, 4-way set-associative 32 entry return address stack 32 entry return address stack TLB 128 entry (I), 128 entry (D) 64 entry (I), 64 entry (D) 4-way set-associative 4-way set-associative 30 cycle miss latency 30 cycle miss latency Functional Units and 2 Int ALU (1/1), 1 Int Mult (2/2) / Div(2/2) 1 Int ALU (1/1), 1 Int Mult (2/2) / Div(2/2) Latency (total/issue) 4 Load/Store (2/1), 1 FP Add (5/3) 2 Load/Store (2/1), 1 FP Add (5/3) 1 FP Mult (6/5) / Div (6/5) / Sqrt (6/5) 1 FP Mult (6/5) / Div (6/5) / Sqrt (6/5) Table1 : Simulation Parameters 5 ms. Processing is returned to Core 1 and the pro- cess repeats itself.... In PAGE 89: ... Data shown in Table 1 from a previous study [Cameron99Tutorial] indicates that for some processors, counters are essentially asymptotically accurate, reaching a steady-state value which is close to actual event counts at large granularities. 
Results such as those in Table1 are sufficient for coarse-grain profiling or averaging, but would introduce a large amount of noise into a phase detection system for optimization or adaptation. Another use of performance counters is in software testing, particularly for isolation of performance bugs.... In PAGE 89: ... [Zagha96SC] Event Generator Event Generator Event Generator Event Generator Central Event Collector (Monitor Unit) Custom Routing Figure 1: Current performance monitoring. Table1 : Counter inaccuracy. [Cameron99Tutorial] 99,054 100,097 100,000 10,055 9,997 10,000 54 950 1,000 54 957 100 53 956 10 Meas.... In PAGE 99: ...Previous Stream Previous Stream Current Address hash Address Indexed Table Path Indexed Table Decoded Address Tag Length Next Stream Hysteresis Tag Length Next Stream Hysteresis Decoded Length Decoded Valid (2nd Level) (1st Level) Figure 2. The cascaded design of the next stream predictor Table1 . Configuration of the simulated processors 4-wide processor 8-wide processor fetch width 4 instructions 8 instructions rename/commit width 4 instructions 8 instructions integer issue width 4 instructions 8 instructions floating point issue width 4 instructions 8 instructions load/store issue width 2 instructions 4 instructions fetch target queue 4 entries 4 entries integer issue queue 32 entries 64 entries floating point issue queue 32 entries 64 entries load/store issue queue 32 entries 64 entries reorder buffer 128 entries 256 entries integer registers 96 160 floating point registers 96 160 L1 instruction cache 64 Kbytes, 2-way associative, (4*fetch width) byte block, 3 cycle latency L1 data cache 64 Kbytes, 2-way associative, 64 byte block, 3 cycle latency L2 unified cache 1 Mbyte, 4-way associative, 128 byte block, 16 cycle latency main memory latency 350 cycles 1024 entry, 4-way associative first level next stream predictor 4096 entry, 4-way associative second level... In PAGE 100: ... We simulate two processor setups, a 4-wide and an 8- wide superscalar processor, both having a 20-stage pipeline. The main values of these setups are shown in Table1 . Our first level instruction cache uses wide cache lines, that is, four times the processor fetch width, as described in [12].... In PAGE 107: ... For an overriding perceptron, all partial sums in ight in the pipeline need to be checkpointed. See Table1 for the formulas used to determine the amount of state to be checkpointed. Since the partial sums are distributed accross the whole predictor in pipeline latches, the checkpointing tables and associated circuitry must also be distributed.... In PAGE 114: ... The wire length, L, is a function of the number of func- tional units being bypassed while the resistance (Rmetal) and capacitance (Cmetal) per unit length remain constant. Plugging in their parameter values and wire length estimations into this equation produces delays for various bypass widths, shown in Table1 . Using their assumption of non-scalability, we use this as the bypass delay at 180nm and 90nm.... In PAGE 114: ... From these numbers, it is evident that the length term is dominant as the delay grows exponen- tially with more bypassed units. Table1 . Calculated bypass delays for various processor widths.... ..."
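The delay equation referenced in the PAGE 114 excerpt is not reproduced there. As a hedged illustration of why the length term dominates, the sketch below uses the standard distributed-RC wire-delay approximation, delay ≈ 0.38 * Rmetal * Cmetal * L^2; the parameter values (R_METAL, C_METAL, UNIT_PITCH) are placeholders, not the cited paper's numbers.

```python
# Illustrative sketch only: bypass-network delay under the standard
# distributed-RC wire model, delay ~= 0.38 * R * C * L^2.
# All parameter values below are assumptions for illustration.

R_METAL = 36e3       # ohms per meter of wire (assumed)
C_METAL = 270e-12    # farads per meter of wire (assumed)
UNIT_PITCH = 400e-6  # meters of bypass wire per functional unit (assumed)

def bypass_delay(num_units):
    """Distributed-RC delay of a bypass wire spanning num_units FUs."""
    length = num_units * UNIT_PITCH
    return 0.38 * R_METAL * C_METAL * length ** 2

for width in (2, 4, 8, 16):
    print(f"{width:2d} bypassed units: {bypass_delay(width) * 1e12:6.1f} ps")
```

Because the delay is quadratic in wire length, doubling the number of bypassed units roughly quadruples the delay, which is the scaling behavior the excerpt is pointing at.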
Table 8. Some results of the timing evaluation with static allocation of hardware resources.
"... In PAGE 19: ... NVP: Yc = YE + Yd = max{E1, E2, E3, E4} + Yd . NVP-TB: Yc = Yc1 = YF1 + Yd , with probability p1NVP- TB Yc2 = YF2 + Yd , YF1 + 2Yd , bracelefttbt with probability (1 - p1NVP- TB )d with probability (1 - p1NVP- TB )(1 - d ) bracelefttbt Figure 9 and Table8 show the results of the numerical example analyzed. nvp-tb rb scop nvp (a) time for one execution (in milliseconds) = 1/50 l w (b) nvp nvp-tb rb scop time for one execution (in milliseconds) = 1/5 l w time for one execution (in milliseconds) nvp nvp-tb scop rb (c) = 2 l W Figure 9 Distribution of Yc under static allocation of hardware resources:... ..."
Cited by 1
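The NVP-TB expression above is a probability mixture, so its distribution is easy to explore numerically. A minimal Monte Carlo sketch follows; the exponential timing distributions and every parameter value are assumptions for illustration (the excerpt does not give the paper's distributions; the λ_w rates in the figure caption merely suggest exponential timing).

```python
# Monte Carlo sketch of the NVP-TB completion-time mixture above.
# All distributions and parameter values are assumed, not the paper's.
import random

P1 = 0.9      # prob. the first variant's result is accepted (assumed)
D = 0.5       # branch probability within the complement (assumed)
LAM_F1, LAM_F2, LAM_D = 1 / 5.0, 1 / 5.0, 1 / 1.0  # exp. rates (assumed)

def sample_yc():
    yf1 = random.expovariate(LAM_F1)
    yf2 = random.expovariate(LAM_F2)
    yd = random.expovariate(LAM_D)
    if random.random() < P1:
        return yf1 + yd                      # Yc1 = YF1 + Yd
    if random.random() < D:
        return yf2 + yd                      # Yc2 = YF2 + Yd
    return yf1 + 2 * yd                      # Yc2 = YF1 + 2 Yd

samples = [sample_yc() for _ in range(100_000)]
print("estimated mean Yc =", sum(samples) / len(samples))
```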
Table 2 displays a decrease in multiplexer area when large resource bags are used. This gain over the dual-port memory mapping technique [1] is due to the use of more banks: since the allocated variables were distributed over more memories, fewer multiplexers were needed at memory inputs. Thus, the best results appeared with larger resource bags, reaching up to a 30% gain in multiplexer area. Because the distributed technique allocates fewer variables per memory and uses more memories, more multiplexers are expected at the inputs of the functional units; accordingly, with the smallest resource bags, the results show a loss (up to 72%) in total multiplexer area. With more functional units available, more operations execute in parallel and fewer multiplexers are required.
"... In PAGE 8: ...Table 1: Gain over Register Binding Using Left Edge Algorithm + x # of Memories MUX Area Total Area Register Area MUX Area Total Area MUX gain total gain Dual Port Memory Mapping [1] Left Edge Register Binding % Area Gain 1 1 20 400 1160 320 880 1960 55 41 4 16 20 752 12552 320 800 12920 6 3 Distributed Mapping (32 memories) Left Edge Register Binding % Area Gain 1 1 32 688 1448 320 880 1960 22 26 4 16 32 608 12408 320 800 12920 24 4 Single R/W port Memory Mapping Left Edge Register Binding % Area Gain 1 1 20 496 1256 320 880 1960 44 36 4 16 36 656 12456 320 800 12920 18 4 Multi R/W Memory Mapping (5 R/W ports) Left Edge Register Binding % Area Gain 1 1 4 288 1048 320 880 1960 67 47 1 16 8 288 11998 320 864 12894 67 7 4 16 8 688 12488 320 800 12920 14 3 Multi Read Multi Write Port Memory Mapping (3 Read / 2 Write ports) Left Edge Register Binding % Area Gain 1 1 10 304 1064 320 880 1960 65 46 4 16 10 784 12584 320 800 12920 2 3 Table2 : Dual Port vs. Distributed Dual Port Memory Mapping ... ..."
Table 1: Resource Allocation Types
Table 2: Resource query in superscheduling systems
2006
"... In PAGE 8: ...f resources available. The distributed ocking is based on the P2P query mechanism. Once the job is migrated to the remote pool, basic matchmaking [89] mechanism is applied for resource allocation. In Table2 , we present RLQ and RUQ queries in some well-known superscheduling systems. 3.... ..."
Cited by 2
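The matchmaking mechanism the excerpt cites is Condor-style: resource offers and job requests each carry attributes, and a match pairs a job with a resource that satisfies its requirements. A much-simplified sketch of that idea follows; the attribute names and the rank rule are hypothetical, not Condor's actual ClassAd syntax.

```python
# Simplified matchmaking sketch in the spirit of Condor ClassAds:
# a job matches a resource when the resource satisfies the job's
# requirements; ties are broken by a rank rule. All attribute names
# here are hypothetical.
def match(job, resources):
    candidates = [r for r in resources
                  if r["mem_mb"] >= job["min_mem_mb"]
                  and r["arch"] == job["arch"]]
    # Prefer the fastest satisfying resource (hypothetical rank rule).
    return max(candidates, key=lambda r: r["mips"], default=None)

resources = [
    {"name": "nodeA", "arch": "x86_64", "mem_mb": 4096, "mips": 900},
    {"name": "nodeB", "arch": "x86_64", "mem_mb": 8192, "mips": 1200},
]
job = {"arch": "x86_64", "min_mem_mb": 6000}
print(match(job, resources)["name"])  # -> nodeB
```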
Table 2: Resource query in superscheduling systems
2007
"... In PAGE 9: ...f resources available. The distributed flocking is based on the P2P query mechanism. Once the job is migrated to the remote pool, basic matchmaking [94] mechanism is applied for resource allocation. In Table2 , we present RLQ and RUQ queries in some well-known superscheduling systems. 2.... ..."
Table 3: Resource Allocation for Self-Limiting Applications with N_src^sim = 1. "... to O(1) in star networks. In contrast, the shared reservation style has an advantage of n^2 in all networks with acyclic distribution meshes. Observe also that the results for the Shared and Independent reservation styles are consistent with the intuition that the resource requirements of Independent scale as O(nL) whereas those of Shared scale as O(L)."
1994
Cited by 18
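The O(nL)-versus-O(L) intuition can be made concrete with a toy count. This is a sketch under a paraphrased model, not the paper's exact definitions: Independent keeps one reservation per source on every link of that source's path, while Shared keeps a single reservation per link of the distribution mesh.

```python
# Toy illustration of the scaling claim: with n sources sharing a
# distribution mesh of L links, Independent-style reservation state
# grows as O(n * L) while Shared-style state grows as O(L).
def independent_cost(n_sources, links_per_path):
    return n_sources * links_per_path   # one reservation per source per link

def shared_cost(links_in_mesh):
    return links_in_mesh                # one shared reservation per link

L = 20  # links (assumed example size)
for n in (1, 10, 100):
    print(f"n={n:3d}: independent={independent_cost(n, L):5d}, "
          f"shared={shared_cost(L)}")
```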
Table 2: Resource query in superscheduling systems
2006
"... In PAGE 9: ...esources available. The distributed flocking is based on the P2P query mechanism. Once the job is migrated to the remote pool, basic matchmaking [90] mechanism is applied for resource allocation. In Table2 , we present RLQ and RUQ queries in some well-known superscheduling systems. 3.... ..."
Cited by 2
Table 7. Resource Allocation Constraints
"... In PAGE 9: ... However, they are often named differently, as Table 6 shows. Table7 shows the constraints for the Resource Allocation models of Staffware, FileNet and FLOWer. Table 6.... ..."