60 citations found.
M. Asada, S. Noda, S. Tawaratsumida, and K. Hosoda, "Purposive behavior acquisition for a real robot by vision-based reinforcement learning," Machine Learning, vol. 23, no. 2-3, pp. 279--303, 1996.


This paper is cited in the following contexts:


Effective Reinforcement Learning for Mobile Robots - Smart, Kaelbling (2002)   (6 citations)

....based only on LWA, and to be robust across a number of test domains [6]. In all of the work presented here, we use Hedger as part of our Q-learning implementation. Previous work has generally solved this problem either by using domain knowledge to create a good discretization of the state space [9] or by hierarchically decomposing the problem by hand to make the learning task easier [10]. The other main problem is that of incorporating prior knowledge into the learning system. The only way that Q-learning can find out information about its environment is to take actions and observe their ....

Minoru Asada, Shoichi Noda, Sukoya Tawaratsumida, and Koh Hosoda, "Purposive behavior acquisition for a real robot by vision-based reinforcement learning," Machine Learning, vol. 23, pp. 279--303, 1996.


The CMUnited-98 Champion Small-Robot Team - Veloso, Bowling, Stone (1999)   (6 citations)

....view of the world state. This setup may simplify the sharing of information among multiple agents, but it also presents a challenge for reliable and real-time processing of the movement of multiple mobile objects, namely the ball, five robots on our team, and five robots on the opponent's team [3, 4, 1]. This article presents the main technical contributions of our CMUnited-98 small-robot team. It focuses on the problems of motion control and the robots' strategy. The vision processing for the team is the same system used .... [Footnote: see http://www.robocup.org/RoboCup] [Figure 2: The CMUnited-98 robots.]

Minoru Asada, Shoichi Noda, Sukoya Tawaratumida, and Koh Hosoda. Purposive behavior acquisition for a real robot by vision-based reinforcement learning. Machine Learning, 23:279--303, 1996.


Reactive Visual Control of Multiple Non-Holonomic Robotic Agents - Han, Veloso (1998)   (3 citations)

....with on-board and off-board types have appeared in recent years. All have found that the reactiveness of soccer robots requires a vision system with a high processing cycle time. However, due to the rich visual input, researchers have found that dedicated processors or even DSPs are often needed [2, 7]. Our current system uses a frame grabber with frame-rate transfer from a 3CCD camera. A relatively slow processor (166MHz Pentium) was at the heart of the vision system, performing all vision computations. The RoboCup rules specify well-defined colors for different objects in the field and ....

Minoru Asada, Shoichi Noda, Sukoya Tawaratumida, and Koh Hosoda. Purposive behavior acquisition for a real robot by vision-based reinforcement learning. Machine Learning, 23:279--303, 1996.


A Multi-Agent Architecture Integrating Learning.. - Busquets..

....values so that RL can detect progress in moving toward (or away from) the goal. If the actions operate at a finer grain than the features can represent, most actions appear to leave the state unchanged, and learning becomes impossible. This has been termed the state-action deviation problem [2]. It can be fixed by using more levels of discrete values for each feature, but this slows learning by requiring that the robot perform more interactions with the environment. Asada et al. suggest another solution in which self-loops (cases where the state is unchanged) are ignored in the ....

M. Asada, S. Noda, S. Tawaratsumida, and K. Hosoda. Purposive behavior acquisition for a real robot by vision-based reinforcement learning. Machine Learning, 23:279--303, 1996.


A Neural Field Approach to Topological Reinforcement.. - Gross, Stephan, Krabbes (1998)   (3 citations)

[Figure caption fragment:] ....the mean values of seven independent Q-learning experiments, each of which consists of 35 docking trials. [Main text:] .... trials that are necessary to learn the docking behavior is really an encouraging result, since this type of sequential decision-making problem theoretically has exponential time complexity [2]. Therefore, it is a common practice to modify the learning scheme in such a way that the robot is started in very easy initial situations close to the goal. Later on, the initial situations are shifted into more and more difficult ones. Asada called this simplifying scheme Learning from Easy ....

....it is a common practice to modify the learning scheme in such a way that the robot is started in very easy initial situations close to the goal. Later on, the initial situations are shifted into more and more difficult ones. Asada called this simplifying scheme Learning from Easy Missions (LEM) [2]. For a couple of reasons (e.g. real-world applicability) we did not apply this kind of biased Q-learning in our experiments. That we nevertheless could achieve such learning results can be explained only by the generalization ability of our topological Q-learning. This learning is localized and ....

[Article contains additional citation context not shown here]

M. Asada, Sh. Noda, K. Hosoda. Purposive Behavior Acquisition for a Real Robot by Vision-Based Reinforcement Learning, Machine Learning, 23 (1996) 2&3, 279-303.


Reinforcement Learning for Landmark-based Robot Navigation - Busquets..

....reinforcement learning can detect progress in moving toward (or away from) the goal. If the actions operate at a finer grain than the features can represent, most actions appear to leave the state unchanged, and learning becomes impossible. This has been termed the state-action deviation problem [2]. It can be fixed by using more levels of discrete values for each feature, but this slows learning by requiring that the robot perform more interactions with the environment. Asada et al. suggest another solution in which self-loops (cases where the state is unchanged) are ignored in the ....

Minoru Asada, Shoichi Noda, Sukoya Tawaratsumida, and Koh Hosoda. Purposive behavior acquisition for a real robot by vision-based reinforcement learning. Machine Learning, 23:279--303, 1996.


Reinforcement Learning for Landmark-based Robot.. - Busquets..

....reinforcement learning can detect progress in moving toward (or away from) the goal. If the actions operate at a finer grain than the features can represent, most actions appear to leave the state unchanged, and learning becomes impossible. This has been termed the state-action deviation problem [2]. It can be fixed by using more levels of discrete values for each feature, but this slows learning by requiring that the robot perform more interactions with the environment. Asada et al. suggest another solution in which self-loops (cases where the state is unchanged) are ignored in the ....

Minoru Asada, Shoichi Noda, Sukoya Tawaratsumida, and Koh Hosoda. Purposive behavior acquisition for a real robot by vision-based reinforcement learning. Machine Learning, 23:279--303, 1996.


Multi-Agent Systems by Incremental Gradient Reinforcement .. - Dutech, Buffet.. (2001)   (1 citation)

.... [Slide residue: feature list with number of values] dir(agent): 4; dir(yellow bloc): 8; dir(blue bloc): 8; near(yellow bloc): 2; near(blue bloc): 2; total: 1024 (MDP with 2 agents and 2 cubes for an 8 x 8 world: 15.248.024 states). 1st step: 2 agents and 2 cubes. Problem's reduction [ANTH96]. Goal: to begin learning with simple cases (size-reduced environments, typical situations). Task progression: [table with columns "Start", "n moves", "N tries"; extracted values: 6, 1500, 10, 1500, 100, 150]. [Figure: Results (2a2c); axis label: number of merges (for ....]

M. Asada, S. Noda, S. Tawaratsumida, and K. Hosoda. Purposive behavior acquisition for a real robot by vision-based reinforcement learning. Machine Learning, 23:279--303, 1996.


Learning a Door-Reaching Behavior Using Visual Information - Cicirelli, D'Orazio.. (2000)

....sonar sensor, odometry and proximity sensors have been used to solve elementary behaviors, however limited only to local tasks. Visual sensors, instead, can be more useful since they are able to detect distant goals and permit the acquisition of suitable behaviors for more global goal-directed tasks [2,7]. In our work we have considered a goal-reaching task: the working environment is a closed environment without obstacles and with a red door. The robot has to move towards the door from every point in the environment until it is located adjacent to the door. The door-reaching behavior has been ....

M. Asada, S. Noda, S. Tawaratsumida, and K. Hosoda. Purposive behavior acquisition for a real robot by vision-based reinforcement learning. Machine Learning, page 279, May/June 1996.


CMUnited: A Team of Robotic Soccer Agents Collaborating in.. - Veloso, Han, Achim (1997)   (3 citations)

....with on-board and off-board types have appeared in recent years. All have found that the reactiveness of soccer robots requires real-time vision processing. However, due to rich visual input, researchers have found that dedicated processors or even DSPs are often needed [Sahota et al. 1995; Asada et al. 1996]. Our approach to such a problem is more simplistic. We have acquired a fast framegrabber board that can perform visual captures at frame rate. By engineering the input scene in an appropriate manner, we found that a fast general-purpose processor (a 166MHz Pentium processor) is adequate for ....

M. Asada, S. Noda, S. Tawaratumida, and K. Hosoda. Purposive behavior acquisition for a real robot by vision-based reinforcement learning. Machine Learning, 23:279--303, 1996.


Learning Actions From Vision-Based Positioning in.. - Cicirelli.. (1998)   (1 citation)

....allow a robot to learn in unknown environments, to adapt dynamically to changes and to choose actions based on the perceived state. However, these techniques have generally been used to produce simulation results [4, 5, 6]; only in a few experiments have they been applied on real robot platforms [7, 8, 9]. This is due to some difficulties: in real complex situations the convergence conditions cannot be met, the size of the state space can be too large, the action space can be continuous, and then learning can take too much time to be practical. A solution to these problems has been to integrate ....

....up the learning phase, the starting positions are chosen in order, so that their distances from the goal form a non-decreasing sequence. In this way the system can learn in easy situations at the early stages (near the goal) and later on in difficult situations (e.g. behind obstacles), as suggested in [9] with LEM. Figure 2 shows the map learned in simulation after five iterations on all the starting points, while Fig. 3 shows the number of steps decreasing from a maximum of 24 after the first iteration to 13 after the fifth. The number of steps represents the number of actions ....
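
The easy-to-hard ordering of starting positions described in this excerpt can be sketched as follows. This is a minimal illustration in Python, not the citing authors' code: the function name `order_starts_easy_first`, the 2-D coordinates, and the use of straight-line distance as the measure of difficulty are all assumptions made for the example.

```python
import math


def order_starts_easy_first(starts, goal):
    """Return start positions sorted so distance to the goal never decreases.

    Hypothetical helper: Euclidean distance stands in for task difficulty.
    """
    return sorted(starts, key=lambda p: math.dist(p, goal))


# Hypothetical start positions on a 2-D floor plan, with the goal at the origin.
starts = [(5.0, 5.0), (1.0, 0.0), (3.0, 4.0)]
schedule = order_starts_easy_first(starts, goal=(0.0, 0.0))
print(schedule)  # nearest start first, farthest last
```

A learner would then run its trials in the order given by `schedule`, so that the early trials begin close to the goal.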

M. Asada, S. Noda, S. Tawaratsumida, and K. Hosoda. Purposive behavior acquisition for a real robot by vision-based reinforcement learning. Machine Learning, page 279, May/June 1996.


Repeatability of Real World Training Experiments: A Case Study - Hougen, Rybski, Gini (1998)   (1 citation)

....learning systems that go beyond simulation. In these cases simulation may be used to develop a learning system capable of learning on a real robot (e.g. [Benbrahim et al. 1992]) or the system may do its learning in simulation and use the learned solution to perform a task in the real world [Asada et al. 1996]. Because this emphasis is quite recent, potential pitfalls of these methods are largely unknown. Nonetheless, problem areas are beginning to be uncovered. One such problem that has been previously uncovered is the effect that differences in the physics of the simulation and real world can have ....

....has been previously uncovered is the effect that differences in the physics of the simulation and real world can have on the solutions learned. If the robot learns in simulation and acts in the real world, performance on the real robot may not be as proficient as it is in the simulated environment [Asada et al. 1996]. This is quite similar to the effect encountered in systems that do not learn but are programmed based on simulation models. If the differences between the simulation and the real world are too great, the solution learned in simulation may not be worthwhile and learning must take place on board ....

Asada, M., S. Noda, S. Tawaratsumida, and K. Hosoda: 1996, `Purposive Behavior Acquisition for a Real Robot by Vision-Based Reinforcement Learning'. Machine Learning pp. 279-303.


Module Based Reinforcement Learning for a Real Robot - Kalmár, Szepesvári, Lorincz   (1 citation)

....should then be employed to learn both the low-level controllers and the switching controller possibly located simultaneously. Asada et al. considered many aspects of mobile robot learning. They applied a vision-based state estimation approach and defined macro actions similar to our controllers [1]. In one of their papers they describe a goal-shooting problem in which a mobile robot shot a goal while avoiding another robot [5]. First, the robot learned two behaviours separately: the shot and avoid behaviours. Then the two behaviours were synthesised by a handcrafted rule and later ....

M. Asada, S. Noda, S. Tawaratsumida, and K. Hosoda. Purposive behavior acquisition for a real robot by vision-based reinforcement learning. Machine Learning, 23:279--303, 1996.


Direct-Vision-Based Reinforcement Learning in "Going to a Target" Task.. - Shibata

....learning has been focused recently [1]. Some algorithms to realize reinforcement learning, such as TD learning [2] and Q-learning [3], have been proposed. Visual sensory signals, which carry the most information among the many kinds of sensory signals, have been used in learning by Asada et al. [4]. There, the visual signals were pre-processed and the present state of the robot was assigned to one of several discrete states in the state space. The mapping from the states to motions was then trained by Q-learning. Accordingly, it is impossible for the robot to generate a continuous mapping ....

Asada, M., Noda, S., Tawaratsumida, S. and Hosoda, K., "Purposive Behavior Acquisition for a Real Robot by Vision-Based Reinforcement Learning", Machine Learning, Vol. 23, pp. 279-303, 1996


Continuous Valued Q-learning for Vision-Guided Behavior.. - Takahashi, Takeda, Asada (1999)   (3 citations)  Self-citation (Asada)

....and adaptive behaviors through such interactions [1]. Asada et al. have presented a series of works on soccer robot agents which chase and shoot a ball into the goal or pass it to another agent. In their reinforcement learning methods, the state and action spaces are quantized by the designer [2, 3] or constructed through the learning process [4, 5, 6] in order to make Q-learning, a most widely used reinforcement learning method [7], applicable. That is, well-defined and quantized state and action spaces are needed to apply Q-learning to real robot tasks. This causes two kinds of problems: ....

....sufficient to cause one state transition. Updating the Q value just after taking a physical action without a state transition causes an underestimate of the Q value for the state-action pair, and the learner cannot acquire any appropriate policy. Asada et al. called this the state-action deviation problem [2] and redefined one action as a series of physical action primitives of one kind which causes one state transition. That is, one physical action primitive is repeated until a state transition. To avoid this problem, we update the Q value using eq. (9) not at every physical fixed time step, but at sampled time ....
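
The redefinition of an action as a repeated primitive can be illustrated with a short sketch. It is written in Python under stated assumptions: the `ToyCorridor` world, its cell width, and the helper `run_until_transition` are hypothetical and are not taken from Asada et al. or from the citing paper.

```python
class ToyCorridor:
    """Hypothetical 1-D world: fine-grained position, coarse observed state."""

    CELL_WIDTH = 10  # assumed width of one discrete state, in position units

    def __init__(self):
        self.position = 0

    def observe(self):
        # The learner only sees which coarse cell the robot is in.
        return self.position // self.CELL_WIDTH

    def step(self, primitive):
        # One physical action primitive moves the robot by a small amount.
        self.position += primitive
        return 0.0  # no reward in this toy example


def run_until_transition(world, primitive, max_repeats=100):
    """Repeat one primitive until the observed state changes.

    Returns (new_state, accumulated_reward, number_of_repeats).
    """
    start = world.observe()
    total_reward = 0.0
    for repeats in range(1, max_repeats + 1):
        total_reward += world.step(primitive)
        state = world.observe()
        if state != start:
            return state, total_reward, repeats
    # Give up after max_repeats; the state is still the starting one.
    return start, total_reward, max_repeats


world = ToyCorridor()
new_state, reward, repeats = run_until_transition(world, primitive=3)
print(new_state, repeats)
```

In a learner built this way, one Q-value update would be made per call to `run_until_transition`, so a primitive that leaves the coarse state unchanged does not by itself trigger an update.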

[Article contains additional citation context not shown here]

M. Asada, S. Noda, S. Tawaratumida, and K. Hosoda. Purposive behavior acquisition for a real robot by vision-based reinforcement learning. Machine Learning, 23:279-303, 1996.


Vision-based Behavior Learning and Development for.. - Asada, Hosoda, Suzuki (1996)   (1 citation)  Self-citation (Asada Hosoda)

....indicates the order of the division: the darker is the earlier. Labels F and B indicate the motions of forward and backward, respectively, and the subscript shows the number of state transitions towards the goal. Grid lines indicate the boundaries divided by the programmer in the previous work [8]. The remainder of the state space in Figure 5 corresponds to infeasible situations, such as when the goal and the ball are observed at the center of the image and the size of the goal is large but that of the ball is small, although we had not recognized such a meaningless state in the previous work. As ....

....the goal is large, but that of the ball is small although we had not recognized such a meaningless state in the previous work. As we can see, the sensor space categorization by the proposed method is quite different from the one designed by the programmer (rectangular grids) in the previous work [8]. [Figure 6: The robot succeeded in finding and shooting a ball into the goal] [Figure 7: Images taken by the robot during the task execution] We applied the method to a real robot environment. The success ratio is worse than the simulation because of the disturbances due to several causes such as ....

[Article contains additional citation context not shown here]

M. Asada, S. Noda, S. Tawaratumida, and K. Hosoda. Purposive behavior acquisition for a real robot by vision-based reinforcement learning. Machine Learning, 23:279--303, 1996.


Cooperative Behavior Acquisition for Mobile Robots in.. - Asada, Uchibe, Hosoda (1999)   (10 citations)  Self-citation (Asada Hosoda)

....little or no a priori knowledge and that has a higher capability of reactive and adaptive behaviors [9]. There have been few works published on reinforcement learning with vision and action. Whitehead and Ballard proposed an active vision system [27] involving a computer simulation. Asada et al. [6] applied vision-based reinforcement learning to a real robot task. In these methods, the environment does not include independently moving agents; therefore, the complexity of the environment is not as great as one including other agents. In the case of a multi-robot environment, the internal ....

M. Asada, S. Noda, S. Tawaratumida, and K. Hosoda. Purposive behavior acquisition for a real robot by vision-based reinforcement learning. Machine Learning, 23:279-303, 1996.


RoboCup: The Robot World Cup Initiative - Kitano, Asada, Kuniyoshi, Noda.. (1995)   (146 citations)  Self-citation (Asada Noda)

.... increased attention with little or no a priori knowledge giving higher capability of reactive and adaptive behaviors [Connel and Mahadevan 93b]. However, almost all of the existing applications have been done only with computer simulations in a virtual world; real robot applications are very few [Asada et al. 94a, Connel and Mahadevan 93a]. Since the prominence of the reinforcement learning role is largely determined by the extent to which it can be scaled to larger and complex robot learning tasks, the RoboCup seems a very good platform. At the primary stage of the RoboCup tournament, one-to-one ....

M. Asada, S. Noda, S. Tawaratsumida, and K. Hosoda. "Purposive behavior acquisition on a real robot by vision-based reinforcement learning". In Proc. of MLC-COLT (Machine Learning Conference and Computer Learning Theory) Workshop on Robot Learning, pages 1--9, 1994.


Learning and Using Models of Kicking Motions for Legged Robots - Sonia Chernova And

No context found.

M. Asada, S. Noda, S. Tawaratsumida, and K. Hosoda, "Purposive behavior acquisition for a real robot by vision-based reinforcement learning," Machine Learning, vol. 23, no. 2-3, pp. 279--303, 1996.


Explicit Knowledge Distribution in an.. - Menegatti..

No context found.

M. Asada, S. Noda, S. Tawaratsumida, and K. Hosoda, "Purposive behavior acquisition for a real robot by vision-based reinforcement learning," Machine Learning, vol. 23, pp. 279--303, 1996.


Multi-Agent Systems by Incremental Gradient Reinforcement .. - Buffet, Dutech.. (2001)

No context found.

M. Asada, S. Noda, S. Tawaratsumida, and K. Hosoda. Purposive behavior acquisition for a real robot by vision-based reinforcement learning. Machine Learning, 23:279--303, 1996.


Reinforcement Learning by Policy Search - Peshkin (2001)   (7 citations)

No context found.

Minoru Asada, Takayuki Nakamura, and Koh Hosoda, Purposive behavior acquisition for a real robot by vision-based reinforcement learning, Machine Learning 23 (1996), 1-40.


Design of Intelligent Mechatronical Systems - With High-Level Petri

No context found.

M. Asada, S. Noda, S. Tawaratsumida, K. Hosoda, Purposive behavior acquisition for a real robot by vision-based reinforcement learning in Machine Learning, Kluwer Academic Publishers, Boston, 1996


Between MDPs and Semi-MDPs: A Framework for Temporal.. - Sutton, Precup, Singh (1999)   (73 citations)

No context found.

Asada, M., Noda, S., Tawaratsumida, S., Hosoda, K. (1996). Purposive behavior acquisition for a real robot by vision-based reinforcement learning. Machine Learning 23:279--303.


Training and Delayed Reinforcements in Q-Learning Agents - Caironi, al. (1994)   (3 citations)

No context found.

M. Asada, S. Noda, S. Tawaratsumida and K. Hosoda, "Purposive Behavior Acquisition for a Real Robot by Vision-Based Reinforcement Learning," Machine Learning, 23, 279--303, (1996).


CiteSeer.IST - Copyright Penn State and NEC