1
00:00:11,780 --> 00:00:17,560
In this lecture we are going to define the environment class. We're going to call it MultiStockEnv.

2
00:00:18,710 --> 00:00:19,540
At the top here,

3
00:00:19,550 --> 00:00:26,630
we just have some comments describing both the state and the action, since these are non-trivial.

4
00:00:26,640 --> 00:00:31,170
So, as you recall, the state consists of the number of shares of each stock that we own,

5
00:00:31,170 --> 00:00:34,890
the price of each stock, and the cash that we have uninvested.

6
00:00:35,280 --> 00:00:45,350
The actions can be different combinations of sell, hold, or buy for the three different stocks.

7
00:00:45,380 --> 00:00:52,430
Next we have the constructor. It takes in two arguments: a two-dimensional data array containing a time

8
00:00:52,430 --> 00:00:58,430
series of stock prices, and the initial investment amount, with a default value of twenty thousand

9
00:00:58,430 --> 00:01:00,370
dollars.

10
00:01:00,400 --> 00:01:05,710
Next we set the attribute stock_price_history to the data we passed in.

11
00:01:05,710 --> 00:01:12,670
Next we set the attributes n_step and n_stock, which we obtain from the shape of the data array.

12
00:01:12,680 --> 00:01:19,100
Next we set the attribute initial investment to the initial investment we passed in.

13
00:01:19,170 --> 00:01:22,230
Next we initialize a few attributes to None,

14
00:01:22,230 --> 00:01:29,160
although we'll be setting these shortly. Next we set the attribute action_space, which is equal to the

15
00:01:29,220 --> 00:01:32,430
integers from 0 up to, but not including, 3 to the power n_stock.

16
00:01:37,510 --> 00:01:42,030
Next, for convenience, we'll want to have a corresponding action list.

17
00:01:42,280 --> 00:01:47,550
Since we have twenty-seven possible actions, this will be a list of length twenty-seven.

18
00:01:47,800 --> 00:01:55,870
So the action space only tells us the integers from zero up to twenty-six, whereas this list

19
00:01:55,870 --> 00:02:00,060
actually gives us some examples of what we can have.

20
00:02:00,090 --> 00:02:08,490
So for example, [0, 0, 0] means sell all your stock; [0, 0, 1] means sell the first two stocks but do nothing for

21
00:02:08,490 --> 00:02:10,000
the third stock.

22
00:02:10,020 --> 00:02:15,630
Now, this code may look complicated, but you can try it in your terminal or in a notebook to confirm that

23
00:02:15,780 --> 00:02:16,570
this is the result.

24
00:02:16,600 --> 00:02:25,650
So you're gonna get an array consisting of values just like these.
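
As a rough sketch of what you would see in a notebook: the snippet below assumes the combinations are enumerated with itertools.product, which the narration does not name, and it uses the names n_stock, action_space, and action_list as they are spoken in the lecture rather than as confirmed code. The encoding 0 = sell, 1 = hold, 2 = buy follows the examples given here.

```python
import itertools

n_stock = 3  # assumption: three stocks, as in the lecture

# The action space is just the integers 0 .. 3**n_stock - 1.
action_space = list(range(3 ** n_stock))

# Each integer maps to one per-stock decision vector (0 = sell, 1 = hold, 2 = buy).
action_list = [list(combo) for combo in itertools.product([0, 1, 2], repeat=n_stock)]

print(len(action_list))   # 27
print(action_list[0])     # [0, 0, 0] -> sell everything
print(action_list[1])     # [0, 0, 1] -> sell the first two, hold the third
print(action_list[26])    # [2, 2, 2] -> buy all three
```

Because itertools.product varies the last position fastest, the integer action is simply the base-3 reading of the vector.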

25
00:02:25,820 --> 00:02:30,650
Next we set the attribute state_dim, which will tell us the size of the state vector.

26
00:02:30,860 --> 00:02:33,120
As mentioned earlier, this is the number of stocks

27
00:02:33,140 --> 00:02:35,700
times two, plus one.

28
00:02:35,900 --> 00:02:41,240
Finally, we call the reset function, which initializes a few more attributes and returns the initial

29
00:02:41,240 --> 00:02:41,570
state.

30
00:02:47,430 --> 00:02:49,930
So let's look at the reset function.

31
00:02:49,950 --> 00:02:53,250
Here we set the attribute current step equal to zero.

32
00:02:53,250 --> 00:02:58,350
This means that we point to the first day of stock prices in our dataset.

33
00:02:58,350 --> 00:03:03,480
Next we set the attribute stock owned to an array of all zeros.

34
00:03:03,510 --> 00:03:06,870
This array will tell us how many shares of each stock we own.

35
00:03:06,960 --> 00:03:11,300
So when we start we're not going to own any stock.

36
00:03:11,320 --> 00:03:17,820
Next we have the stock price which tells us the current price of each stock on the current day.

37
00:03:17,870 --> 00:03:22,920
This is, of course, just the time series indexed by the current step.

38
00:03:22,940 --> 00:03:28,040
Next we have the cash in hand attribute, which to begin with is our initial investment,

39
00:03:28,070 --> 00:03:31,370
since we've not yet bought any stocks.

40
00:03:31,370 --> 00:03:36,790
Finally, we want to return the state vector, which is done by using the function get_obs.

41
00:03:36,860 --> 00:03:38,810
We'll see what's inside this function shortly.

42
00:03:43,770 --> 00:03:49,020
Next we have the step function, which performs an action in the environment and returns the next state

43
00:03:49,050 --> 00:03:52,140
and reward, among other things.

44
00:03:52,170 --> 00:03:57,430
First we check that the action that we passed in exists in the action space.

45
00:03:57,550 --> 00:04:03,990
Next we get the current value of our portfolio and assign this to the variable prev_val.

46
00:04:04,030 --> 00:04:11,270
Next we increment the current step pointer and also update the current stock price attribute, so the day goes

47
00:04:11,270 --> 00:04:17,300
up to the next day and the price is updated to the price for the next day.

48
00:04:17,540 --> 00:04:19,550
Next we perform the trade.

49
00:04:19,580 --> 00:04:21,880
This is another function we'll be discussing later,

50
00:04:22,010 --> 00:04:31,540
but you can assume it involves some combination of buying, selling, or holding the stocks we are considering.

51
00:04:31,810 --> 00:04:38,040
Next we get the value again and assign this to the cur_val variable.

52
00:04:38,070 --> 00:04:43,470
Next we can calculate the reward, which is just the difference between the current value of our portfolio

53
00:04:43,950 --> 00:04:45,330
and the value of our portfolio

54
00:04:45,360 --> 00:04:52,450
before making the trade. Next we set the done flag, which is true only if we've reached the end of the

55
00:04:52,450 --> 00:05:00,160
time series. Next we set the info dictionary, which allows us to return the current value of our portfolio,

56
00:05:00,640 --> 00:05:04,960
since this is part of neither the state nor the reward.

57
00:05:04,980 --> 00:05:12,260
Lastly, we return the next state, the reward, the done flag, and the info dictionary, just like the OpenAI Gym API.
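
To make the reward calculation concrete, here is a small sketch of one step in which every stock is simply held. The helper names portfolio_value and step_hold, the plain-list data layout, and the exact done condition (being on the last row of the price history) are assumptions made for illustration; the lecture only says that done is true at the end of the time series.

```python
def portfolio_value(stock_owned, stock_price, cash_in_hand):
    # Shares times price, summed over stocks, plus uninvested cash.
    return sum(n * p for n, p in zip(stock_owned, stock_price)) + cash_in_hand


def step_hold(price_history, cur_step, stock_owned, cash_in_hand):
    """One environment step where the action is 'hold everything' (hypothetical helper)."""
    # Value before the day advances.
    prev_val = portfolio_value(stock_owned, price_history[cur_step], cash_in_hand)

    # Move to the next day and pick up that day's prices.
    cur_step += 1
    stock_price = price_history[cur_step]

    # A real step would perform the trade here; holding changes nothing.

    cur_val = portfolio_value(stock_owned, stock_price, cash_in_hand)
    reward = cur_val - prev_val
    done = cur_step == len(price_history) - 1  # assumed end-of-series test
    info = {"cur_val": cur_val}
    return cur_step, reward, done, info


price_history = [[100, 50], [105, 48], [110, 51]]  # made-up prices for two stocks
step1 = step_hold(price_history, 0, [10, 5], 50)
step2 = step_hold(price_history, step1[0], [10, 5], 50)
print(step1)  # (1, 40, False, {'cur_val': 1340})
print(step2)  # (2, 65, True, {'cur_val': 1405})
```

With no transaction costs, the reward is just the shares owned times the change in price.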

58
00:05:17,360 --> 00:05:20,890
So that's it for the public functions of this class.

59
00:05:20,900 --> 00:05:24,460
Next let's look at the functions we made use of earlier.

60
00:05:24,570 --> 00:05:31,690
First we have get_obs. This returns the state. By the way, sometimes, like right now, we are going to use

61
00:05:31,690 --> 00:05:35,440
the terms state and observation interchangeably.

62
00:05:35,500 --> 00:05:41,230
There are cases where we might not want to do this such as when the state is some transformation of

63
00:05:41,230 --> 00:05:47,320
the observation or it could even be multiple past observations stacked up together.

64
00:05:47,320 --> 00:05:52,760
For this script, however, we'll assume that they are one and the same. As mentioned earlier,

65
00:05:52,890 --> 00:05:56,220
this is going to be a vector with three components.

66
00:05:56,220 --> 00:05:59,650
The first component is the number of shares of each stock that we own.

67
00:05:59,700 --> 00:06:01,900
So that's stock_owned.

68
00:06:02,010 --> 00:06:05,940
This should be a list of size three.

69
00:06:06,020 --> 00:06:08,980
The second component is the value of the stock.

70
00:06:08,990 --> 00:06:12,080
This should also be a list of size three.

71
00:06:12,090 --> 00:06:18,690
Finally we add the cash in hand attribute to our observation at the last index and then we return our

72
00:06:18,690 --> 00:06:23,680
observation.

73
00:06:23,720 --> 00:06:25,790
Next we have the get_val function.

74
00:06:26,030 --> 00:06:29,160
This returns the current value of our portfolio.

75
00:06:29,210 --> 00:06:34,940
This is equal to the number of shares we own of each stock multiplied by the price of each stock, summed up and

76
00:06:35,000 --> 00:06:36,170
added to the cash we have.
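
The two helpers just described can be sketched as plain functions. This is only an illustration with made-up numbers; the standalone signatures below are an assumption, since in the lecture these are methods that read the attributes stock owned, stock price, and cash in hand from the environment.

```python
def get_obs(stock_owned, stock_price, cash_in_hand):
    # State vector: shares owned, then prices, then cash in the last slot.
    return list(stock_owned) + list(stock_price) + [cash_in_hand]


def get_val(stock_owned, stock_price, cash_in_hand):
    # Portfolio value: sum of shares times price, plus cash.
    return sum(n * p for n, p in zip(stock_owned, stock_price)) + cash_in_hand


stock_owned = [3, 0, 2]            # made-up holdings
stock_price = [10.0, 20.0, 30.0]   # made-up prices
cash_in_hand = 40.0

obs = get_obs(stock_owned, stock_price, cash_in_hand)
val = get_val(stock_owned, stock_price, cash_in_hand)
print(obs)       # [3, 0, 2, 10.0, 20.0, 30.0, 40.0]
print(len(obs))  # 7, i.e. 2 * 3 + 1
print(val)       # 130.0
```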

77
00:06:40,530 --> 00:06:46,790
Lastly, we have the trade function, which is probably the most complex function in this class. To start,

78
00:06:47,130 --> 00:06:53,820
we need to grab the action vector from our action list, which is indexed by the action argument, which is

79
00:06:53,820 --> 00:06:56,260
just an integer.

80
00:06:56,330 --> 00:07:02,900
So this is going to return a vector of size 3, which will tell us, for each of the three stocks, whether

81
00:07:02,900 --> 00:07:05,920
to sell, buy, or hold.

82
00:07:06,270 --> 00:07:09,550
So let's just do a simple example to explain this more clearly.

83
00:07:09,960 --> 00:07:16,350
First, the action will be an integer from 0 up to twenty-six, and we're going to use that to tell

84
00:07:16,350 --> 00:07:20,580
us which actions we actually need to perform for each stock.

85
00:07:20,580 --> 00:07:24,970
So for example, if we retrieve the vector [2, 1, 0],

86
00:07:25,350 --> 00:07:30,160
that means buy the first stock, hold the second stock, and sell the third stock.

87
00:07:32,290 --> 00:07:38,860
Next, since we want to sell everything we want to sell before we buy anything, we have to figure out what

88
00:07:38,860 --> 00:07:41,370
needs to be bought and what needs to be sold.

89
00:07:41,380 --> 00:07:45,550
So we are going to create these two lists, sell_index and buy_index.

90
00:07:45,880 --> 00:07:52,080
These will contain the indices of the stocks that we want to buy or sell. To populate these lists,

91
00:07:52,080 --> 00:07:58,020
we're going to loop through action_vec, then we check if the value is equal to zero.

92
00:07:58,290 --> 00:07:58,950
If it's zero,

93
00:07:58,950 --> 00:08:08,340
that means we want to sell this stock, and if it's 2, that means we want to buy this stock.

94
00:08:08,520 --> 00:08:12,000
Next we're going to sell all the stocks we want to sell.

95
00:08:12,000 --> 00:08:17,090
As mentioned, this problem is greatly simplified because it's all or nothing. For each stock

96
00:08:17,100 --> 00:08:18,210
we want to sell,

97
00:08:18,210 --> 00:08:20,430
we're going to sell every share we own.

98
00:08:20,820 --> 00:08:24,290
So we loop through each index in the sell_index list.

99
00:08:24,570 --> 00:08:31,940
Then we increment our cash in hand by the total value of the shares that we previously owned, so that

100
00:08:31,940 --> 00:08:38,210
would be stock price times the number of shares sold, and then we set the number of shares we own to

101
00:08:38,210 --> 00:08:42,250
zero.

102
00:08:42,290 --> 00:08:44,320
Next we have our buy loop.

103
00:08:44,330 --> 00:08:46,660
Again this is extremely simplified.

104
00:08:46,820 --> 00:08:51,950
We're just going to loop through each stock one at a time and buy one share of that stock until we run

105
00:08:51,950 --> 00:08:53,690
out of cash.

106
00:08:53,690 --> 00:08:59,510
This is, of course, a great simplification. Let's say we have nine dollars, but the share we want

107
00:08:59,510 --> 00:09:01,910
to buy next is ten dollars.

108
00:09:01,910 --> 00:09:08,210
We can't buy it, but maybe the next stock is only five dollars, so we could have afforded one share, but

109
00:09:08,210 --> 00:09:09,200
we would never see this,

110
00:09:09,260 --> 00:09:13,320
since we set the can_buy flag to False and exit the loop.

111
00:09:13,540 --> 00:09:16,060
Of course this only happens on the boundary of the stock.

112
00:09:16,060 --> 00:09:19,930
So you're looking at stock 3 and going back to stock 1.

113
00:09:19,980 --> 00:09:25,050
In any case, inside the loop, first we check if we have enough cash to afford one share of the current

114
00:09:25,050 --> 00:09:29,010
stock, so cash in hand must be greater than the stock price.

115
00:09:29,130 --> 00:09:35,030
Then we increment the stock owned by one and subtract the value of the stock from the cash in hand.
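
Putting the sell loop and the buy loop together, a trade step along these lines could look roughly like the sketch below. It is written as a standalone function with hypothetical arguments rather than as the class method from the lecture, and two details are assumptions: the buy loop is skipped entirely when there is nothing to buy, and the pass over the buy list is allowed to finish after the flag is cleared, which matches the remark that the missed purchase only happens when wrapping from the last stock back to the first.

```python
def trade(action_vec, stock_owned, stock_price, cash_in_hand):
    """Sell-then-buy sketch; 0 = sell, 1 = hold, 2 = buy (hypothetical helper)."""
    stock_owned = list(stock_owned)  # work on a copy

    # Work out which stocks to sell and which to buy.
    sell_index = [i for i, a in enumerate(action_vec) if a == 0]
    buy_index = [i for i, a in enumerate(action_vec) if a == 2]

    # Sell every share of each stock marked for selling.
    for i in sell_index:
        cash_in_hand += stock_price[i] * stock_owned[i]
        stock_owned[i] = 0

    # Buy one share at a time, round-robin, until a purchase fails.
    can_buy = bool(buy_index)  # assumption: nothing to buy means no loop at all
    while can_buy:
        for i in buy_index:
            if cash_in_hand > stock_price[i]:
                stock_owned[i] += 1
                cash_in_hand -= stock_price[i]
            else:
                can_buy = False
    return stock_owned, cash_in_hand


# Action [2, 1, 0]: buy the first stock, hold the second, sell the third.
owned, cash = trade([2, 1, 0], [0, 5, 4], [10, 20, 30], 25)
print(owned, cash)  # [14, 5, 0] 5
```

In this made-up example the portfolio is worth 245 both before and after the trade, since trading at the current price with no fees only moves value between cash and shares.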
