WEBVTT

1
00:00:00.000 --> 00:00:08.480
In this chapter, we'll take a deeper dive into the rich text and chat functionality the API has to offer.

2
00:00:08.480 --> 00:00:08.480


3
00:00:08.480 --> 00:00:08.560


4
00:00:08.560 --> 00:00:15.640
So far, we've used the Completions endpoint to answer questions, but the model's capabilities go far beyond this.

5
00:00:15.640 --> 00:00:15.640


6
00:00:15.640 --> 00:00:16.840


7
00:00:16.840 --> 00:00:24.440
To understand where these capabilities come from, let's take a step back and discuss how text completion works.

8
00:00:24.440 --> 00:00:24.440


9
00:00:24.440 --> 00:00:24.480


10
00:00:24.480 --> 00:00:30.120
When we send a prompt to the Completions endpoint, the model returns the text that it believes

11
00:00:30.120 --> 00:00:36.800
is most likely to complete the prompt, which it infers from the data it was trained on.

12
00:00:36.800 --> 00:00:36.800


13
00:00:36.800 --> 00:00:36.800


14
00:00:36.800 --> 00:00:44.880
If we send "Life is like a box of chocolates" to the model, it correctly completes the quote with high probability.

15
00:00:44.880 --> 00:00:48.200
We say high probability here because the model results are

16
00:00:48.200 --> 00:00:54.520
non-deterministic, so the model may only correctly complete the quote 98 times out of 100.

17
00:00:54.520 --> 00:00:55.720


18
00:00:55.720 --> 00:00:55.720


19
00:00:55.720 --> 00:01:01.280
There are many use cases where randomness is undesirable; think of a customer service

20
00:01:01.280 --> 00:01:08.280
chatbot: we wouldn't want it to give different guidance to customers with the same issue.

21
00:01:08.280 --> 00:01:11.560
However, we would like the model to be flexible to different

22
00:01:11.560 --> 00:01:15.440
inputs, so there's often a trade-off in the amount of randomness.

23
00:01:15.440 --> 00:01:16.680


24
00:01:16.680 --> 00:01:16.680


25
00:01:16.680 --> 00:01:21.520
We can control the amount of randomness in the response using the temperature parameter.

26
00:01:21.520 --> 00:01:28.440
temperature is set to one by default, but can range from zero to two,

27
00:01:28.440 --> 00:01:34.480
where zero is almost entirely deterministic and two is extremely random.

28
00:01:34.480 --> 00:01:34.480

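NOTE
A toy sketch of how a temperature setting can reshape next-text probabilities.
It is not the API's actual implementation: the candidate continuations and
their scores in LOGITS are invented, and treating temperature 0 as a plain
greedy pick is a simplifying assumption.
```python
import math
import random
# Made-up scores (logits) for three possible continuations of the prompt.
LOGITS = {"you never know": 4.0, "sweet and short": 1.0, "full of nuts": 0.5}
def token_probabilities(logits, temperature):
    """Softmax over logits divided by temperature; temperature 0 picks the top score."""
    if temperature == 0:
        best = max(logits, key=logits.get)
        return {tok: 1.0 if tok == best else 0.0 for tok in logits}
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    weights = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(weights.values())
    return {tok: w / total for tok, w in weights.items()}
def sample(logits, temperature, rng):
    """Draw one continuation from the temperature-adjusted distribution."""
    probs = token_probabilities(logits, temperature)
    return rng.choices(list(probs), weights=list(probs.values()), k=1)[0]
rng = random.Random(0)  # fixed seed so the demo is repeatable
for t in (0, 1, 2):
    probs = token_probabilities(LOGITS, t)
    print(t, {tok: round(p, 3) for tok, p in probs.items()}, sample(LOGITS, t, rng))
```
Raising the temperature flattens the distribution, so less likely continuations are drawn more often.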

29
00:01:34.480 --> 00:01:34.480


30
00:01:34.480 --> 00:01:38.360
If we add a temperature of two here, we can see the model completes

31
00:01:38.360 --> 00:01:42.760
the prompt by putting its own bizarre spin on Forrest Gump's famous quote.

32
00:01:42.760 --> 00:01:42.760


33
00:01:42.760 --> 00:01:43.840


34
00:01:43.840 --> 00:01:49.160
Because the text completion model returns the most likely text to follow the prompt, it can be used to

35
00:01:49.160 --> 00:01:56.600
solve a number of tasks besides answering questions, including text content generation and transformation.

36
00:01:56.600 --> 00:01:56.600


37
00:01:56.600 --> 00:01:56.600


38
00:01:56.600 --> 00:02:00.440
Text transformation involves changing text

39
00:02:00.440 --> 00:02:08.200
based on an instruction, and examples include find and replace, summarization, and copyediting.

40
00:02:08.200 --> 00:02:08.200


41
00:02:08.200 --> 00:02:08.200


42
00:02:08.200 --> 00:02:15.920
For example, we can use the API to update the name, pronouns, and job title in a bio.

43
00:02:15.920 --> 00:02:15.920


44
00:02:15.920 --> 00:02:15.920


45
00:02:15.920 --> 00:02:21.720
Notice that the prompt starts with the instruction, then the text to transform.

46
00:02:21.720 --> 00:02:29.560
We've also used triple quotes to define a multi-line prompt for ease of readability and processing.

47
00:02:29.560 --> 00:02:29.560

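NOTE
A minimal sketch of the prompt layout just described: instruction first, then
the text to transform, written with triple quotes. The bio, the replacement
details, and the helper name build_transform_prompt are all invented for
illustration and are not the prompt shown on the slide.
```python
# Hypothetical bio; substitute the text you want to transform.
BIO = """Jane Doe is a data analyst. She joined the company in 2019,
and her dashboards are used across the business."""
def build_transform_prompt(instruction, text):
    """Place the instruction first, followed by the text to transform."""
    return f"""{instruction}
Text:
{text}"""
prompt = build_transform_prompt(
    "Update the name to John Smith, the pronouns to he/him, and the job title to data engineer.",
    BIO,
)
print(prompt)
```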

48
00:02:29.560 --> 00:02:29.560


49
00:02:29.560 --> 00:02:37.120
Then, as before, we send this prompt to the Completions endpoint of the API using Completion.create.

50
00:02:37.120 --> 00:02:37.120


51
00:02:37.120 --> 00:02:37.120

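NOTE
A hedged sketch of sending a prompt to the Completions endpoint. The narration
uses the older module-level Completion.create call; this version assumes the
newer client object from openai>=1.0, and the model name is an assumption that
may need changing. The helper name complete is hypothetical.
```python
def complete(client, prompt, model="gpt-3.5-turbo-instruct", **params):
    """Send a prompt to the Completions endpoint and return the generated text.
    `client` is assumed to be an openai.OpenAI() instance (openai>=1.0).
    Extra keyword arguments such as temperature or max_tokens are passed through.
    """
    response = client.completions.create(model=model, prompt=prompt, **params)
    return response.choices[0].text.strip()
# Usage, assuming the openai package is installed and OPENAI_API_KEY is set:
#   from openai import OpenAI
#   print(complete(OpenAI(), "Life is like a box of chocolates", temperature=0))
```
Passing the client in as an argument keeps the helper easy to try with a stand-in object before spending any tokens.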

52
00:02:37.120 --> 00:02:39.040
Voilà!

53
00:02:39.040 --> 00:02:41.360
We have our updated text.

54
00:02:41.360 --> 00:02:47.520
Even with a find and replace tool, this task would normally require us to specify every word to update.

55
00:02:47.520 --> 00:02:47.520


56
00:02:47.520 --> 00:02:47.520


57
00:02:47.520 --> 00:02:54.640
Text completions are also used to generate new text content from a prompt providing an instruction.

58
00:02:54.640 --> 00:02:54.640


59
00:02:54.640 --> 00:02:55.680


60
00:02:55.680 --> 00:03:00.400
For example, we can create a request to generate a tagline for a new

61
00:03:00.400 --> 00:03:05.640
hot dog stand. The API does a good job and even includes a subtle pun!

62
00:03:05.640 --> 00:03:06.680


63
00:03:06.680 --> 00:03:06.680


64
00:03:06.680 --> 00:03:13.160
By default, the response from the API is quite short, which may be unsuitable for many use cases.

65
00:03:13.160 --> 00:03:13.160


66
00:03:13.160 --> 00:03:13.160


67
00:03:13.160 --> 00:03:19.120
The max_tokens parameter can be used to control the maximum length of the response.

68
00:03:19.120 --> 00:03:19.120


69
00:03:19.120 --> 00:03:19.120


70
00:03:19.120 --> 00:03:27.400
A token is a unit of one or more characters that language models use to understand and interpret text.

71
00:03:27.400 --> 00:03:27.400


72
00:03:27.400 --> 00:03:27.400


73
00:03:27.400 --> 00:03:36.760
In English, one token corresponds to about four characters, and 100 tokens to about 75 words, so if

74
00:03:36.760 --> 00:03:44.240
our use case requires no more than around 150 words, a max_tokens of 200 would be a good choice.

75
00:03:44.240 --> 00:03:44.240

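NOTE
The rule of thumb above can be turned into a quick estimate. This sketch uses
only the 100-tokens-per-75-words ratio from the narration; real token counts
vary with the text and the tokenizer, and the helper name is hypothetical.
```python
import math
TOKENS = 100  # rule of thumb: about 100 tokens ...
WORDS = 75    # ... for every 75 English words
def max_tokens_for_words(word_limit):
    """Estimate a max_tokens value that leaves room for about `word_limit` words."""
    return math.ceil(word_limit * TOKENS / WORDS)
print(max_tokens_for_words(150))  # prints 200
```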

76
00:03:44.240 --> 00:03:44.240


77
00:03:44.240 --> 00:03:50.920
Increasing max_tokens will likely also increase the usage cost for each request.

78
00:03:50.920 --> 00:03:50.920


79
00:03:50.920 --> 00:03:50.920


80
00:03:50.920 --> 00:03:57.120
Recall that usage costs depend on the model used and the amount of generated text.

81
00:03:57.120 --> 00:04:04.320
Each model is actually priced per 1,000 tokens, where input tokens, the

82
00:04:04.320 --> 00:04:10.520
tokens used in the prompt, and output tokens, the generated text, can be priced differently.

83
00:04:10.520 --> 00:04:10.520

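NOTE
A small sketch of per-request pricing with separate input and output rates.
The prices below are placeholders, not real rates for any model; check the
provider's pricing page for current figures.
```python
def request_cost(input_tokens, output_tokens, input_price_per_1k, output_price_per_1k):
    """Cost of one request when input and output tokens are priced per 1,000 tokens."""
    return (input_tokens / 1000) * input_price_per_1k + (output_tokens / 1000) * output_price_per_1k
# Hypothetical prices in dollars per 1,000 tokens.
cost = request_cost(input_tokens=500, output_tokens=200, input_price_per_1k=0.0015, output_price_per_1k=0.002)
print(f"${cost:.5f} per request")
```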

84
00:04:10.520 --> 00:04:10.520


85
00:04:10.520 --> 00:04:15.960
When scoping the potential cost of a new AI feature, the first step is

86
00:04:15.960 --> 00:04:20.800
often a back-of-the-envelope calculation to determine the cost per unit time.

87
00:04:20.800 --> 00:04:20.800

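NOTE
One way to do the back-of-the-envelope calculation for cost per unit time.
The traffic, token counts, and prices are all invented for illustration, and
a 30-day month is assumed.
```python
def monthly_cost(requests_per_day, avg_input_tokens, avg_output_tokens,
                 input_price_per_1k, output_price_per_1k, days=30):
    """Rough monthly spend from average traffic and average token counts."""
    per_request = (avg_input_tokens * input_price_per_1k
                   + avg_output_tokens * output_price_per_1k) / 1000
    return requests_per_day * days * per_request
# Hypothetical feature: 10,000 requests a day, 500 prompt tokens and 200 generated tokens each.
estimate = monthly_cost(requests_per_day=10_000, avg_input_tokens=500, avg_output_tokens=200,
                        input_price_per_1k=0.0015, output_price_per_1k=0.002)
print(f"about ${estimate:,.2f} per month")
```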

88
00:04:20.800 --> 00:04:20.800


89
00:04:20.800 --> 00:04:24.720
Onward to the exercises!

