MAKING COMPLEX DECISIONS
Artificial Intelligence: A Modern Approach, Stuart Russell and Peter Norvig, Prentice Hall, 2003, pages 462~488
p613
4 × 3 +1 -1
p614

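Figure 1: The 4 × 3 environment. Each action achieves the intended move with probability 0.8; with probability 0.2 the agent moves at right angles to the intended direction. The two terminal states have rewards +1 and -1, and every other state has reward -0.04.
In a deterministic environment the solution would be the sequence [Up, Up, Right, Right, Right]. Here actions are unreliable: from (1, 1), Up reaches (1, 2) with probability 0.8, slips right to (2, 1) with probability 0.1, and bumps into the wall and stays in (1, 1) with probability 0.1. The sequence [Up, Up, Right, Right, Right] therefore reaches (4, 3) with probability only 0.8^5 + 0.1^4 × 0.8 = 0.32776.
The transition model T(s, a, s') is the probability of reaching state s' when action a is taken in state s. Transitions are Markovian: the probability of reaching s' from s depends only on s, not on earlier states. T(s, a, s') can be viewed as a three-dimensional table of probabilities.
In each state s the agent receives a reward R(s): -0.04 in every state except the terminal states, which have rewards +1 and -1.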
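The figure 0.32776 can be checked by pushing a probability distribution through the transition model one action at a time. The sketch below is a minimal illustration, assuming the usual layout of the 4 × 3 world (an obstacle at (2, 2), terminal states at (4, 3) and (4, 2), and 0.1 probability for each sideways slip); the names `step`, `transition`, and `run_sequence` are illustrative and do not come from the source.

```python
from collections import defaultdict

# Assumed layout: columns 1..4, rows 1..3, obstacle at (2, 2).
WALL = (2, 2)
TERMINALS = {(4, 3), (4, 2)}          # +1 and -1 states, treated as absorbing
MOVES = {"Up": (0, 1), "Down": (0, -1), "Left": (-1, 0), "Right": (1, 0)}
# The two directions at right angles to each intended move.
SIDEWAYS = {"Up": ("Left", "Right"), "Down": ("Left", "Right"),
            "Left": ("Up", "Down"), "Right": ("Up", "Down")}


def step(state, direction):
    """Move one square; bumping into the obstacle or the edge leaves the state unchanged."""
    x, y = state
    dx, dy = MOVES[direction]
    nxt = (x + dx, y + dy)
    if nxt == WALL or not (1 <= nxt[0] <= 4 and 1 <= nxt[1] <= 3):
        return state
    return nxt


def transition(state, action):
    """Return {s': T(s, a, s')} for one state-action pair."""
    dist = defaultdict(float)
    if state in TERMINALS:
        dist[state] = 1.0
        return dist
    dist[step(state, action)] += 0.8
    for side in SIDEWAYS[action]:
        dist[step(state, side)] += 0.1
    return dist


def run_sequence(start, actions):
    """Distribution over states after executing a fixed action sequence."""
    belief = {start: 1.0}
    for a in actions:
        nxt = defaultdict(float)
        for s, p in belief.items():
            for s2, q in transition(s, a).items():
                nxt[s2] += p * q
        belief = nxt
    return belief


final = run_sequence((1, 1), ["Up", "Up", "Right", "Right", "Right"])
print(round(final[(4, 3)], 5))  # 0.32776
```

The two contributions are the intended route (0.8^5) and the accidental route around the other side of the obstacle (0.1^4 × 0.8).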
p615
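If the agent reaches the +1 state after 10 steps, its total utility is 0.6 (ten rewards of -0.04 plus the final +1).

A Markov decision process (MDP) is specified by:
Transition model: T(s, a, s')
Reward function: R(s)
A policy π specifies what the agent should do in every state it might reach; π(s) is the action recommended by π for state s.
An optimal policy π* is a policy that yields the highest expected utility; π*(s) is its recommended action for state s.
Because a step costs little compared with the penalty for ending up in (4, 2) by accident, the optimal policy for (3, 1) is conservative: it takes the long way round rather than risk entering (4, 2).
The balance of risk and reward depends on R(s) for the nonterminal states. When R(s) ≤ -1.6284, the agent heads straight for the nearest exit, even if that exit is worth -1. When -0.4278 ≤ R(s) ≤ -0.0850, the agent takes the shortest route to +1 and accepts the risk of falling into -1; in particular it takes the shortcut from (3, 1). When life is only slightly dreary (-0.0221 < R(s) < 0), the optimal policy takes no risks at all: in (4, 1) and (3, 2) the agent heads directly away from the -1 state.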
p616

Figure 2: (a) An optimal policy for R(s) = -0.04 in the nonterminal states. (b) Optimal policies for other ranges of R(s).
-1 R(s) > 0 (4, 1), (3, 2), (4, 2) R(s)

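With a finite horizon there is a fixed time N after which nothing matters.
A state sequence extended by k > 0 further steps then has the same utility as its first N steps. Suppose the agent starts at (3, 1) in the 4 × 3 world.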
p617
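With N = 3, the agent must head directly for the +1 state to have any chance of reaching it, so the optimal action is Up. With N = 100, there is plenty of time to take the safe route.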

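1. Additive rewards: $U_h([s_0, s_1, s_2, \ldots]) = R(s_0) + R(s_1) + R(s_2) + \cdots$
The 4 × 3 world uses additive rewards.
2. Discounted rewards: $U_h([s_0, s_1, s_2, \ldots]) = R(s_0) + \gamma R(s_1) + \gamma^2 R(s_2) + \cdots$
The discount factor γ is a number between 0 and 1.
When γ is close to 0, rewards in the distant future count for little.
When γ is 1, discounted rewards are exactly additive rewards.
A discount factor of γ corresponds to an interest rate of (1/γ) - 1.
Additive rewards are the special case γ = 1.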
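The effect of the discount factor is easy to see numerically. The sketch below is a minimal illustration; the function name `discounted_utility` and the choice γ = 0.9 are assumptions made for the example, not values from the source. It sums a finite reward sequence with weights 1, γ, γ^2, and so on.

```python
def discounted_utility(rewards, gamma):
    """Sum of gamma**t * R(s_t) over a finite state sequence."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


# Ten nonterminal rewards of -0.04 followed by the +1 terminal reward.
rewards = [-0.04] * 10 + [1.0]

additive = discounted_utility(rewards, 1.0)    # gamma = 1: plain sum, about 0.6
discounted = discounted_utility(rewards, 0.9)  # gamma < 1: the late +1 counts for less
print(additive, discounted)
```

With γ = 1 the result matches the total utility of 0.6 for reaching +1 after 10 steps; with γ below 1 the same sequence is worth less because the +1 arrives late.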
p618
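With an infinite horizon and no terminal state, undiscounted utilities are generally infinite. A sequence with utility +∞ is better than one with -∞, but two sequences that both have utility +∞ cannot be ranked. There are three ways out:
1. With discounted rewards, the utility of an infinite sequence is finite. If rewards are bounded by $R_{\max}$ and $\gamma < 1$:

$U_h([s_0, s_1, s_2, \ldots]) = \sum_{t=0}^{\infty} \gamma^t R(s_t) \le \sum_{t=0}^{\infty} \gamma^t R_{\max} = R_{\max}/(1 - \gamma)$
(1)
2. If the environment has terminal states and the agent is guaranteed to reach one eventually (a proper policy), infinite sequences never need to be compared.
3. Compare infinite sequences by average reward per time step. For example, if (1, 1) in the 4 × 3 world had a higher reward than the other nonterminal states, a policy that stays in (1, 1) would have the higher average reward.
An optimal policy π* is a policy π that maximizes the expected discounted sum of rewards:
$\pi^* = \operatorname{argmax}_{\pi} E\left[\sum_{t=0}^{\infty} \gamma^t R(s_t) \mid \pi\right]$
(2)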
p619
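The utility of a state s under a policy π is the expected discounted reward obtained by starting in s and following π:
$U^{\pi}(s) = E\left[\sum_{t=0}^{\infty} \gamma^t R(s_t) \mid \pi, s_0 = s\right]$, where $s_t$ is the state reached after executing π for t steps.
(3)
The true utility of a state is $U(s) = U^{\pi^*}(s)$.
U(s) and R(s) are different quantities: R(s) is the short-term reward for being in s, whereas U(s) is the long-term total reward from s onward.
In the 4 × 3 world, the utility of a state s is higher the closer it is to the +1 exit.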

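Figure 3: The utilities of the states in the 4 × 3 world, with R(s) = -0.04 for nonterminal states.
Given U(s), the agent chooses the action that maximizes the expected utility of the next state:
$\pi^*(s) = \operatorname{argmax}_a \sum_{s'} T(s, a, s') U(s')$
(4)
The utility of a state is its immediate reward plus the expected discounted utility of the next state under the best action (the Bellman equation):
$U(s) = R(s) + \gamma \max_a \sum_{s'} T(s, a, s') U(s')$
(5)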
p620
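For the 4 × 3 world, the Bellman equation for state (1, 1) is
$U(1,1) = -0.04 + \gamma \max\{\, 0.8U(1,2) + 0.1U(2,1) + 0.1U(1,1),\; 0.9U(1,1) + 0.1U(1,2),\; 0.9U(1,1) + 0.1U(2,1),\; 0.8U(2,1) + 0.1U(1,2) + 0.1U(1,1) \,\}$
where the four terms correspond to Up, Left, Down, and Right.
With n states there are n Bellman equations in n unknowns.
Because of the max operator the n equations are nonlinear, so they are solved iteratively with the Bellman update:
$U_{i+1}(s) \leftarrow R(s) + \gamma \max_a \sum_{s'} T(s, a, s') U_i(s')$
(6)
Applied repeatedly to the 4 × 3 world, the update converges to the utilities of the states.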
p621
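function VALUE-ITERATION(mdp, ε) returns a utility function
  inputs: mdp, an MDP with states S, transition model T, reward function R, discount γ
          ε, the maximum error allowed in the utility of any state
  local variables: U, U', vectors of utilities for states in S, initially zero
                   δ, the maximum change in the utility of any state in an iteration
  repeat
    U ← U' ; δ ← 0
    for each state s in S do
      U'[s] ← R[s] + γ max_a Σ_s' T(s, a, s') U[s']
      if |U'[s] - U[s]| > δ then δ ← |U'[s] - U[s]|
  until δ < ε(1 - γ)/γ
  return U
Figure 4: The value iteration algorithm for calculating the utilities of states.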
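A runnable version of value iteration on the 4 × 3 world might look like the following. This is a sketch under stated assumptions: the grid layout (obstacle at (2, 2), terminals at (4, 3) and (4, 2)), the discount γ = 0.9, and the tolerance ε = 1e-4 are choices made for the example, and all helper names are illustrative. A discount below 1 is used because the stopping test δ < ε(1 - γ)/γ can never succeed when γ = 1.

```python
import itertools

# Assumed encoding of the 4 x 3 world (not given explicitly in the source).
WALL, TERMINALS = (2, 2), {(4, 3): 1.0, (4, 2): -1.0}
STATES = [s for s in itertools.product(range(1, 5), range(1, 4)) if s != WALL]
MOVES = {"Up": (0, 1), "Down": (0, -1), "Left": (-1, 0), "Right": (1, 0)}
SIDEWAYS = {"Up": ("Left", "Right"), "Down": ("Left", "Right"),
            "Left": ("Up", "Down"), "Right": ("Up", "Down")}


def R(s):
    """Reward: +1 / -1 in terminal states, -0.04 elsewhere."""
    return TERMINALS.get(s, -0.04)


def step(s, d):
    """Move one square; blocked moves leave the state unchanged."""
    nxt = (s[0] + MOVES[d][0], s[1] + MOVES[d][1])
    return nxt if nxt in STATES else s


def T(s, a):
    """List of (probability, s') pairs for taking action a in state s."""
    return [(0.8, step(s, a))] + [(0.1, step(s, d)) for d in SIDEWAYS[a]]


def value_iteration(gamma=0.9, epsilon=1e-4):
    U1 = {s: 0.0 for s in STATES}
    while True:
        U, delta = dict(U1), 0.0           # U <- U'; delta <- 0
        for s in STATES:
            if s in TERMINALS:
                U1[s] = R(s)               # no further actions in a terminal state
            else:
                U1[s] = R(s) + gamma * max(
                    sum(p * U[s2] for p, s2 in T(s, a)) for a in MOVES)
            delta = max(delta, abs(U1[s] - U[s]))
        if delta < epsilon * (1 - gamma) / gamma:
            return U


def best_policy(U):
    """Greedy one-step look-ahead policy for a utility table U."""
    return {s: max(MOVES, key=lambda a: sum(p * U[s2] for p, s2 in T(s, a)))
            for s in STATES if s not in TERMINALS}


U = value_iteration()
pi = best_policy(U)
print(pi[(3, 3)], round(U[(3, 3)], 3))
```

Each sweep copies the previous table before updating, which mirrors the U ← U' step of the pseudocode; an in-place variant would also converge but would not match the figure line for line.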

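Figure 5: (a) Evolution of the utilities of selected states under value iteration. (b) The number of value iterations k required to guarantee an error of at most ε = c · R_max, for different values of c, as a function of the discount factor γ.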
p622




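$\lVert B U_i - B U'_i \rVert \le \gamma \lVert U_i - U'_i \rVert$
(7)
Here B is the Bellman update operator and the norm is the max norm. The true utility function U is a fixed point of B, that is, BU = U, so $\lVert B U_i - U \rVert \le \gamma \lVert U_i - U \rVert$.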

N


N
N

p623
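if $\lVert U_{i+1} - U_i \rVert < \epsilon(1 - \gamma)/\gamma$ then $\lVert U_{i+1} - U \rVert < \epsilon$
(8)
Let $\pi_i$ be the policy obtained by one-step look-ahead on the estimate $U_i$.
The policy loss $\lVert U^{\pi_i} - U \rVert$ is the most utility the agent can lose in any state s by executing $\pi_i$ instead of the optimal policy π*:
if $\lVert U_i - U \rVert < \epsilon$
then $\lVert U^{\pi_i} - U \rVert < 2\epsilon\gamma/(1 - \gamma)$
(9)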
4 × 3



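Figure 6: The maximum error of the utility estimates and the policy loss, as a function of the number of iterations of value iteration.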

p624



s
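function POLICY-ITERATION(mdp) returns a policy
  inputs: mdp, an MDP with states S, transition model T
  local variables: U, U', vectors of utilities for states in S, initially zero
                   π, a policy vector indexed by state, initially random
  repeat
    U ← POLICY-EVALUATION(π, U, mdp)
    unchanged? ← true
    for each state s in S do
      if max_a Σ_s' T(s, a, s') U[s'] > Σ_s' T(s, π[s], s') U[s'] then
        π[s] ← argmax_a Σ_s' T(s, a, s') U[s']
        unchanged? ← false
  until unchanged?
  return π
Figure 7: The policy iteration algorithm for calculating an optimal policy.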
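The policy iteration loop can be sketched in a few lines of Python. The example below assumes the same 4 × 3 layout as the earlier figures (obstacle at (2, 2), terminals at (4, 3) and (4, 2)) and a discount of γ = 0.9; both are assumptions for illustration. Policy evaluation here is done by iterating the fixed-policy update to convergence rather than by solving the linear system directly, which is a substitution made to keep the example dependency-free.

```python
import itertools

# Assumed encoding of the 4 x 3 world (not given explicitly in the source).
WALL, TERMINALS = (2, 2), {(4, 3): 1.0, (4, 2): -1.0}
STATES = [s for s in itertools.product(range(1, 5), range(1, 4)) if s != WALL]
MOVES = {"Up": (0, 1), "Down": (0, -1), "Left": (-1, 0), "Right": (1, 0)}
SIDEWAYS = {"Up": ("Left", "Right"), "Down": ("Left", "Right"),
            "Left": ("Up", "Down"), "Right": ("Up", "Down")}
GAMMA = 0.9  # assumed discount factor


def R(s):
    return TERMINALS.get(s, -0.04)


def step(s, d):
    nxt = (s[0] + MOVES[d][0], s[1] + MOVES[d][1])
    return nxt if nxt in STATES else s


def expected(U, s, a):
    """Expected utility of the successor state: sum over s' of T(s, a, s') U(s')."""
    outcomes = [(0.8, step(s, a))] + [(0.1, step(s, d)) for d in SIDEWAYS[a]]
    return sum(p * U[s2] for p, s2 in outcomes)


def policy_evaluation(pi, U, tol=1e-12):
    """Iterate the fixed-policy update (no max operator) until it stops changing."""
    while True:
        delta = 0.0
        for s in STATES:
            new = R(s) if s in TERMINALS else R(s) + GAMMA * expected(U, s, pi[s])
            delta = max(delta, abs(new - U[s]))
            U[s] = new
        if delta < tol:
            return U


def policy_iteration():
    U = {s: 0.0 for s in STATES}
    pi = {s: "Up" for s in STATES if s not in TERMINALS}  # arbitrary initial policy
    while True:
        U = policy_evaluation(pi, U)
        unchanged = True
        for s in pi:
            best = max(MOVES, key=lambda a: expected(U, s, a))
            # Small tolerance so floating-point ties do not cause endless switching.
            if expected(U, s, best) > expected(U, s, pi[s]) + 1e-9:
                pi[s], unchanged = best, False
        if unchanged:
            return pi, U


pi, U = policy_iteration()
print(pi[(3, 3)], pi[(1, 1)])
```

The utility table is reused between rounds, as in the POLICY-EVALUATION(π, U, mdp) call of the pseudocode, so later evaluations start close to their fixed point.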
p625
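For the policy $\pi_i$, the utility of each state s satisfies a simplified Bellman equation with no max operator:
$U_i(s) = R(s) + \gamma \sum_{s'} T(s, \pi_i(s), s') U_i(s')$
(10)


For n states these are n linear equations in n unknowns, which can be solved exactly in O(n^3) time.


Alternatively, modified policy iteration performs k simplified value iteration steps to approximate the utilities.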
p626
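In a fully observable MDP the optimal action π(s) depends only on the current state s. In a partially observable environment the agent does not know which state s it is in, so it cannot simply execute the action recommended for s.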
4 × 3 +1 77.5 % 81.8 %

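Figure 8
A partially observable MDP (POMDP) has the same transition model T(s, a, s') and reward function R(s) as an MDP, plus an observation model O(s, o) that gives the probability of perceiving observation o in state s.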

p627
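A belief state b is a probability distribution over all states; b(s) is the probability that b assigns to state s. If the previous belief state is b and the agent performs action a and perceives observation o, the new belief state is
$b'(s') = \alpha \, O(s', o) \sum_{s} T(s, a, s') \, b(s)$
(11)
where α is a normalizing constant that makes the belief state sum to 1.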
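The belief update in equation (11) has the standard filtering form: predict with T(s, a, s'), weight by O(s', o), then normalize. The sketch below assumes that form, and shows it on a hypothetical two-state example; the states, the 0.9 "stay put" probability, and the 0.8 sensor accuracy are invented for illustration and do not come from the source.

```python
def belief_update(b, a, o, T, O, states):
    """Return b' with b'(s') = alpha * O(s', o) * sum_s T(s, a, s') * b(s)."""
    unnormalized = {
        s2: O(s2, o) * sum(T(s, a, s2) * b[s] for s in states)
        for s2 in states
    }
    total = sum(unnormalized.values())   # 1 / alpha
    if total == 0:
        raise ValueError("observation has zero probability under this belief")
    return {s2: v / total for s2, v in unnormalized.items()}


# Hypothetical toy model: two states, one action that usually keeps the state.
STATES = ("left", "right")


def T(s, a, s2):
    return 0.9 if s == s2 else 0.1       # the action is ignored in this toy model


def O(s, o):
    return 0.8 if s == o else 0.2        # the sensor reports the true state 80% of the time


b0 = {"left": 0.5, "right": 0.5}
b1 = belief_update(b0, "stay", "right", T, O, STATES)
print(b1)
```

Starting from a uniform belief, one "right" observation moves the belief to roughly 0.2 for left and 0.8 for right, which is the sensor accuracy as expected.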

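1. Given the current belief state b, execute the action recommended for b.
2. Receive the observation o.
3. Update the belief state from b, the action, and o, and repeat.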
4 × 3
b b'
a
b' o a b
s'

b' b
a 




The optimal policy π(b) maps each belief state b to an action.
4 × 3

+1 +1 86.6 %
p629
t
t
t
O(s, o)
t
t

Figure 9: The generic structure of a dynamic decision network. The current time is t.
p630

Figure 10
U

t
p631
d
E 
p632
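1. Two-finger Morra: two players, O and E, simultaneously display one or two fingers. Let the total number of fingers be f. If f is odd, O collects f dollars from E; if f is even, E collects f dollars from O.
Games with n players (n > 2) are also common, but the examples here use two players such as O and E.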

            | O : one          | O : two          |
E : one     | E = 2, O = -2    | E = -3, O = 3    |
E : two     | E = -3, O = 3    | E = 4, O = -4    |
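When O chooses two and E also chooses two, the payoff is 4 for E and -4 for O.
A mixed strategy that chooses action a with probability p and action b otherwise is written [p : a ; (1 - p) : b]; for example, [0.5 : one ; 0.5 : two].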
p633
               | B : testify       | B : refuse        |
A : testify    | A = -5, B = -5    | A = 0, B = -10    |
A : refuse     | A = -10, B = 0    | A = -1, B = -1    |
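A strategy s for player p strongly dominates strategy s' if the outcome for s is better for p than the outcome for s', for every choice of strategies by the other players.
A strategy s for player p weakly dominates s' if s is better than s' on at least one strategy profile and no worse on any other.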
p634
(-1, -1) 
(-1, -1)
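The dominance test can be applied mechanically to the payoff table above. The sketch below is illustrative: the action names "testify" and "refuse" and the assignment of rows to player A are assumptions (the labels did not survive in these notes); only the payoff numbers are taken from the table.

```python
ACTIONS = ("testify", "refuse")  # assumed action names

# PAYOFF[(A's action, B's action)] = (payoff to A, payoff to B)
PAYOFF = {
    ("testify", "testify"): (-5, -5),
    ("testify", "refuse"): (0, -10),
    ("refuse", "testify"): (-10, 0),
    ("refuse", "refuse"): (-1, -1),
}


def strongly_dominates(player, s, s_alt):
    """True if s is strictly better than s_alt for player (0 = A, 1 = B)
    against every action of the other player."""
    for other in ACTIONS:
        with_s = (s, other) if player == 0 else (other, s)
        with_alt = (s_alt, other) if player == 0 else (other, s_alt)
        if PAYOFF[with_s][player] <= PAYOFF[with_alt][player]:
            return False
    return True


a_dominant = strongly_dominates(0, "testify", "refuse")
b_dominant = strongly_dominates(1, "testify", "refuse")
print(a_dominant, b_dominant)  # True True
```

Under these assumptions both players have the same strongly dominant strategy, which leads to (-5, -5) even though both would be better off at (-1, -1).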


            | B : dvd          | B : cd           |
A : dvd     | A = 9, B = 9     | A = -3, B = -1   |
A : cd      | A = -4, B = -1   | A = 5, B = 5     |
This game has two Nash equilibria: (dvd, dvd) and (cd, cd).
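The two equilibria can be found by checking every pure strategy profile for profitable unilateral deviations. The sketch below is a minimal illustration; the mapping of table cells to (A's action, B's action) is an assumption about the flattened table, although the set of equilibria is the same if the two off-diagonal cells are swapped.

```python
from itertools import product

ACTIONS = ("dvd", "cd")

# PAYOFF[(A's action, B's action)] = (payoff to A, payoff to B); cell layout assumed.
PAYOFF = {
    ("dvd", "dvd"): (9, 9),
    ("dvd", "cd"): (-3, -1),
    ("cd", "dvd"): (-4, -1),
    ("cd", "cd"): (5, 5),
}


def pure_nash_equilibria(payoff, actions):
    """Profiles where neither player gains by changing only their own action."""
    equilibria = []
    for a, b in product(actions, repeat=2):
        a_ok = all(payoff[(a, b)][0] >= payoff[(alt, b)][0] for alt in actions)
        b_ok = all(payoff[(a, b)][1] >= payoff[(a, alt)][1] for alt in actions)
        if a_ok and b_ok:
            equilibria.append((a, b))
    return equilibria


equilibria = pure_nash_equilibria(PAYOFF, ACTIONS)
print(equilibria)  # [('dvd', 'dvd'), ('cd', 'cd')]
```

Neither player has a dominant strategy here, so the dominance test alone would not settle the game; the deviation check is what identifies both equilibria.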
p635
(dvd, dvd) (dvd, dvd)O E
E
E E e O
o
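Suppose E reveals its strategy first and O responds.
With pure strategies, O can always pick the reply that costs E the most, so the value of this game is U_{E,O}
= -3.
Now suppose O goes first and E responds.
E then picks the best reply, and the value of this game is U_{O,E}
= +2, so the true utility U of the original game satisfies -3 ≤ U ≤ +2.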
p636
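To pin down U, turn to mixed strategies.

Suppose E goes first and commits to the mixed strategy [p : one ; (1 - p) : two].

Once E has revealed [p : one ; (1 - p) : two], O knows p and replies with a pure strategy. If O chooses one, E's expected payoff is 2p - 3(1 - p) = 5p - 3; if O chooses two, it is -3p + 4(1 - p) = 4 - 7p. Plotted with p on the x-axis these are two straight lines, and O always picks the lower one, so the best E can do is choose p where the lines cross: 5p - 3 = 4 - 7p, giving p = 7/12 and U_{E,O} = -1/12.

Now suppose O goes first with the mixed strategy [q : one ; (1 - q) : two]. If E chooses one, E's expected payoff is 2q - 3(1 - q) = 5q - 3; if E chooses two, it is -3q + 4(1 - q) = 4 - 7q. E picks the higher line, so O does best where the lines cross: q = 7/12 and U_{O,E} = -1/12.

The true utility therefore lies between -1/12 and -1/12, so it is exactly -1/12. Both O and E should play [7/12 : one ; 5/12 : two], and the value of the game is -1/12.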
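The crossing point can be computed exactly with rational arithmetic. The sketch below assumes, as in the derivation above, that the best mixed strategy is the one that makes the opponent indifferent between its two pure replies; this holds for two-finger Morra but not for every 2 × 2 game. The function names are illustrative.

```python
from fractions import Fraction

# E's payoff in two-finger Morra, indexed as PAYOFF_E[E's action][O's action].
PAYOFF_E = {"one": {"one": 2, "two": -3}, "two": {"one": -3, "two": 4}}


def e_payoff_vs_pure(p, o_action):
    """E's expected payoff when E plays [p : one ; (1 - p) : two] and O plays o_action."""
    return p * PAYOFF_E["one"][o_action] + (1 - p) * PAYOFF_E["two"][o_action]


def equalizing_probability():
    """Solve 5p - 3 = 4 - 7p, written in terms of the payoff table."""
    slope1 = PAYOFF_E["one"]["one"] - PAYOFF_E["two"]["one"]   # 5, line for O = one
    icpt1 = PAYOFF_E["two"]["one"]                             # -3
    slope2 = PAYOFF_E["one"]["two"] - PAYOFF_E["two"]["two"]   # -7, line for O = two
    icpt2 = PAYOFF_E["two"]["two"]                             # 4
    return Fraction(icpt2 - icpt1, slope1 - slope2)


p = equalizing_probability()
value = e_payoff_vs_pure(p, "one")
print(p, value)  # 7/12 -1/12
```

Evaluating the payoff against O's other pure reply gives the same -1/12, which confirms that O is indifferent at p = 7/12.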
p637

Figure 11: Minimax game trees for two-finger Morra.
n
p638
[7/12 : one ; 5/12 : two] -1/12 E
-1/12 E

p639




n m
> n
p641
U

p642

i



p643

$p_i = \mathrm{LENGTH}(\text{path with } c_i = \infty) - \mathrm{LENGTH}(\text{path with } c_i = 0)$
n 