1
00:00:00,000 --> 00:00:08,100
Welcome to this video about using natural language to create data visualizations with

2
00:00:08,100 --> 00:00:11,239
built-in agents in LangChain.

3
00:00:11,239 --> 00:00:13,399
After watching this video, you'll be able to:

4
00:00:13,399 --> 00:00:16,700
Identify when to use the LangChain Pandas agent

5
00:00:16,700 --> 00:00:20,360
Explain how the LangChain Pandas agent differs from other agents

6
00:00:20,360 --> 00:00:25,780
Describe how to set up the LangChain Pandas agent with an IBM watsonx.ai model

7
00:00:25,780 --> 00:00:31,219
Explain how to use natural language to analyze and visualize data using a Pandas DataFrame

8
00:00:31,219 --> 00:00:35,860
Summarize how the agent-generated Python code connects to the data outputs

9
00:00:35,860 --> 00:00:42,740
And identify best practices for safely prompting and using AI tools for data analysis tasks

10
00:00:42,740 --> 00:00:47,459
Because the agent generates and runs Python code on the fly, it is ideal for exploration and rapid prototyping, but

11
00:00:47,459 --> 00:00:52,180
is not recommended for production environments unless comprehensive safeguards are in place.

12
00:00:52,180 --> 00:00:56,659
These features are currently available in LangChain's langchain-experimental

13
00:00:56,659 --> 00:00:57,659
package.

14
00:00:57,659 --> 00:00:59,939
Let's get started.

15
00:00:59,939 --> 00:01:06,419
The create_pandas_dataframe_agent works just like other LangChain agents, with a few

16
00:01:06,419 --> 00:01:08,419
key differences.

17
00:01:08,419 --> 00:01:15,419
First, the create_pandas_dataframe_agent uses a preconfigured set of functions

18
00:01:15,419 --> 00:01:17,980
and prompts, saving time and effort.

19
00:01:17,980 --> 00:01:22,900
Next, this agent operates on the existing Pandas DataFrame you provide.

20
00:01:22,900 --> 00:01:27,300
The user inputs a natural language prompt, and then the agent responds with the appropriate

21
00:01:27,300 --> 00:01:31,580
answer, whether that answer is a value, a summary, or a visualization.

22
00:01:31,580 --> 00:01:35,580
Next, explore how to set up and use this built-in agent.

23
00:01:35,580 --> 00:01:38,580
First, import pandas as pd.

24
00:01:38,580 --> 00:01:42,300
Next, load the data into a DataFrame object, df.

25
00:01:42,300 --> 00:01:47,339
In this example, you'll use the CSV-formatted Student Alcohol Consumption dataset

26
00:01:47,339 --> 00:01:50,360
by UCI Machine Learning.

27
00:01:50,360 --> 00:01:52,519
You can display the dataset as a table.

28
00:01:52,519 --> 00:01:58,220
Here, you can use the df.head() method to display the column headings and the first

29
00:01:58,220 --> 00:02:00,180
five rows of data.

30
00:02:00,180 --> 00:02:03,580
Your analysis will focus on the sex and age columns.

31
00:02:03,580 --> 00:02:08,619
In this dataset, sex is used synonymously with gender and is represented by M for male and F for

32
00:02:08,660 --> 00:02:14,220
female, and the students' ages are numeric values between 15 and 22.
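As a rough sketch of the loading step, the snippet below builds a tiny stand-in for the dataset from an inline CSV string so it runs without downloading anything. The rows are hypothetical and only include the sex and age columns; they are not the real data. With the actual file you would call pd.read_csv on its path instead, and the filename shown in the comment is an assumption.

```python
import io

import pandas as pd

# Hypothetical stand-in rows: NOT the real dataset, only the two columns analyzed here.
SAMPLE_CSV = """sex,age
F,18
F,17
M,15
M,18
F,16
M,22
"""

# With the real dataset you would read the file instead, for example:
# df = pd.read_csv("student-mat.csv")  # assumed filename; check the file's delimiter too
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Display the column headings and the first five rows.
print(df.head())
```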

33
00:02:14,220 --> 00:02:19,699
Next, set up your credentials, including the model ID and generation parameters.

34
00:02:19,699 --> 00:02:27,380
From ibm_watsonx_ai.metanames, import GenTextParamsMetaNames as GenParams

35
00:02:27,380 --> 00:02:31,979
and create a dictionary to store credential information.

36
00:02:31,979 --> 00:02:37,179
Specify a model ID for the Llama 3 70B model, define the generation parameters to

37
00:02:37,179 --> 00:02:43,059
initialize the model, and use the model via IBM watsonx.ai for text generation.

38
00:02:43,059 --> 00:02:46,899
You can also include additional configuration settings, such as the maximum number of new

39
00:02:46,899 --> 00:02:48,460
tokens to generate.

40
00:02:48,460 --> 00:02:53,220
An important note: the models available on watsonx.ai can change.

41
00:02:53,220 --> 00:02:58,339
Always check the watsonx.ai API, or your LLM host's API, for the latest available models.
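The credential and parameter setup can be sketched as a plain dictionary builder. This is a minimal sketch under several assumptions: the us-south endpoint URL, the placeholder project ID, and the model ID for Llama 3 70B are illustrative values you should replace, and the model ID in particular may no longer be offered. It uses plain string keys so it runs without ibm_watsonx_ai installed; in the video's code those keys come from GenTextParamsMetaNames (imported as GenParams), whose constants such as GenParams.MAX_NEW_TOKENS should map to the same strings.

```python
def build_watsonx_config(api_key=None, project_id="YOUR_PROJECT_ID", space_id=None):
    """Collect the settings a watsonx.ai model needs (placeholder values only)."""
    # Assumed regional endpoint; use the URL for your own watsonx.ai region.
    credentials = {"url": "https://us-south.ml.cloud.ibm.com"}
    if api_key:
        credentials["apikey"] = api_key

    # Assumed ID for Llama 3 70B; model IDs on watsonx.ai change, so check the current list.
    model_id = "meta-llama/llama-3-70b-instruct"

    # String keys intended to match GenParams.DECODING_METHOD and GenParams.MAX_NEW_TOKENS.
    params = {
        "decoding_method": "greedy",
        "max_new_tokens": 256,
    }
    return {
        "credentials": credentials,
        "model_id": model_id,
        "params": params,
        "project_id": project_id,
        "space_id": space_id,
    }


config = build_watsonx_config()
print(config["model_id"], config["params"])
```

Keeping everything in one dictionary makes it easy to hand the same settings to the model constructor in the next step.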

42
00:02:58,339 --> 00:03:04,639
Next, let's load a watsonx LLM and connect it to LangChain.

43
00:03:04,639 --> 00:03:10,080
To load and connect the model to LangChain, set up the watsonx LLM by importing

44
00:03:10,080 --> 00:03:16,160
the model class from IBM's Foundation Models and watsonx LLM from LangChain's watsonx

45
00:03:16,160 --> 00:03:17,160
extension.

46
00:03:17,160 --> 00:03:21,320
Next, provide the credentials you configured earlier.

47
00:03:21,320 --> 00:03:26,199
Here you see the model ID, parameters, project ID, and space ID.

48
00:03:26,199 --> 00:03:30,160
This step connects your model instance with the required configurations.

49
00:03:30,160 --> 00:03:33,800
Then specify the watsonx LLM as the LLM.

50
00:03:33,800 --> 00:03:39,119
This step integrates the watsonx LLM with LangChain so that you can build chatbots, chains, and tools

51
00:03:39,119 --> 00:03:41,360
using LangChain's features.

52
00:03:41,360 --> 00:03:46,440
Always check the latest LangChain documentation, because the specific syntax can change with product

53
00:03:46,440 --> 00:03:48,199
updates.
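The loading step might look roughly like the function below. It is a sketch, not the video's exact code: it assumes the Model class from ibm_watsonx_ai.foundation_models and the WatsonxLLM wrapper from that package's LangChain extension, both of which have been reorganized across releases (newer setups may use the WatsonxLLM class from the separate langchain-ibm package instead). The imports sit inside the function so the file can be loaded without the package installed; actually calling it requires ibm-watsonx-ai and valid credentials.

```python
def build_watsonx_llm(config):
    """Create a watsonx.ai model and wrap it for LangChain (sketch; needs ibm-watsonx-ai)."""
    # Imported lazily so this definition works even where the package is missing.
    from ibm_watsonx_ai.foundation_models import Model
    from ibm_watsonx_ai.foundation_models.extensions.langchain import WatsonxLLM

    # Model ID, parameters, credentials, and project/space IDs from the earlier config step.
    model = Model(
        model_id=config["model_id"],
        params=config["params"],
        credentials=config["credentials"],
        project_id=config.get("project_id"),
        space_id=config.get("space_id"),
    )
    # Wrap the IBM model so LangChain components can treat it as an LLM.
    return WatsonxLLM(model=model)
```

In a configured environment you would call something like llm = build_watsonx_llm(config) with the dictionary from the previous step.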

54
00:03:48,199 --> 00:03:51,360
Now set up the Pandas DataFrame agent from LangChain.

55
00:03:51,360 --> 00:03:55,960
First, import create_pandas_dataframe_agent.

56
00:03:55,960 --> 00:04:01,360
Next, create a DataFrame agent that connects the LLM to a DataFrame to answer data questions

57
00:04:01,360 --> 00:04:03,179
using natural language.

58
00:04:03,179 --> 00:04:09,020
Then pass in the LLM and the DataFrame, df, the standard Pandas object you created earlier.

59
00:04:09,020 --> 00:04:13,080
Set verbose=True if you want to see more details while the code runs.

60
00:04:13,080 --> 00:04:18,179
Set return_intermediate_steps=True to view the code generated

61
00:04:18,179 --> 00:04:19,480
along the way.

62
00:04:19,480 --> 00:04:21,820
This is great for debugging or understanding the logic.
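Creating the agent with the two options just described could be sketched as follows. This assumes create_pandas_dataframe_agent is importable from langchain_experimental.agents.agent_toolkits. It also passes allow_dangerous_code=True, which, as far as we know, recent langchain-experimental releases require before the agent will execute generated code; if your installed version does not accept that argument, drop it. As before, the import is deferred so the definition loads without the package.

```python
def build_pandas_agent(llm, df):
    """Connect an LLM to a DataFrame through LangChain's Pandas agent (sketch)."""
    # Lazy import: requires the langchain-experimental package when called.
    from langchain_experimental.agents.agent_toolkits import create_pandas_dataframe_agent

    return create_pandas_dataframe_agent(
        llm,
        df,
        verbose=True,  # print the agent's reasoning while it runs
        return_intermediate_steps=True,  # include the generated code in each response
        allow_dangerous_code=True,  # opt-in to code execution; only use in a sandbox
    )
```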

63
00:04:21,820 --> 00:04:24,760
Now let's see how to query your data using natural language.

64
00:04:24,760 --> 00:04:29,220
Using the invoke method, you can ask a plain-language question, such as "How many rows are in this file?"

65
00:04:29,220 --> 00:04:35,040
The Pandas agent quickly returns the answer: 395 rows.

66
00:04:35,040 --> 00:04:37,859
If you're curious, you can view the code the agent ran.

67
00:04:37,859 --> 00:04:42,299
You can inspect these steps in the intermediate_steps entry of the response, which shows

68
00:04:42,299 --> 00:04:46,859
the generated code, including DataFrame filters and aggregation commands.

69
00:04:46,859 --> 00:04:51,720
You'll see the exact code the LLM used, such as len(df).
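One way to ask a question and pull out the generated code is a small helper like the one below. It assumes the response is a dictionary whose "intermediate_steps" entry holds (action, observation) pairs, with the code in action.tool_input, sometimes wrapped as {"query": ...}; that matches our understanding of LangChain agent output but should be checked against your version. FakeAgent is a hypothetical stand-in with a hard-coded reply so the sketch runs without an LLM.

```python
from types import SimpleNamespace


def ask(agent, question):
    """Invoke the agent and return (answer, list of generated code snippets)."""
    response = agent.invoke(question)
    snippets = []
    # Each intermediate step pairs the tool call (action) with its result (observation).
    for action, _observation in response.get("intermediate_steps", []):
        tool_input = action.tool_input
        if isinstance(tool_input, dict):  # some versions wrap the code as {"query": ...}
            tool_input = tool_input.get("query", "")
        snippets.append(tool_input)
    return response["output"], snippets


class FakeAgent:
    """Hypothetical stand-in that mimics the response shape; no LLM involved."""

    def invoke(self, question):
        action = SimpleNamespace(tool="python_repl_ast", tool_input="len(df)")
        return {"input": question, "output": "There are 395 rows.", "intermediate_steps": [(action, 395)]}


answer, code = ask(FakeAgent(), "How many rows are in this file?")
print(answer)
print(code)
```

With a real agent built in the previous step, you would call ask(agent, "How many rows are in this file?") in the same way.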

70
00:04:51,720 --> 00:04:55,339
This example asks the agent to output an answer to the question,

71
00:04:55,339 --> 00:04:57,339
"How many students are 18 years old?"

72
00:04:57,339 --> 00:05:01,100
The agent responds, "There are 82 students who are 18 years old."

73
00:05:01,100 --> 00:05:05,220
To see the code that produced this answer, you can view the intermediate_steps

74
00:05:05,220 --> 00:05:06,220
entry in the response.

75
00:05:06,220 --> 00:05:11,019
You will see that the code filtered the DataFrame for rows with an age of 18, and then counted

76
00:05:11,019 --> 00:05:13,079
the number of matching rows.
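The filter-then-count logic described above looks roughly like this in plain pandas. The rows are the same hypothetical toy values used earlier, not the real dataset, so the count differs from the 82 the agent reports; the agent's own generated code may be phrased differently.

```python
import pandas as pd

# Hypothetical toy rows, not the real dataset.
df = pd.DataFrame({"sex": ["F", "F", "M", "M", "F", "M"], "age": [18, 17, 15, 18, 16, 22]})

# Keep only the rows where age equals 18, then count how many remain.
eighteen = df[df["age"] == 18]
count = len(eighteen)
print(f"There are {count} students who are 18 years old.")
```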

77
00:05:13,079 --> 00:05:17,100
You can also generate visualizations using natural language.

78
00:05:17,100 --> 00:05:21,299
Speak or type a natural language prompt that requests the information you want, such

79
00:05:21,299 --> 00:05:23,899
as "Plot the gender count with bars."

80
00:05:23,940 --> 00:05:28,220
The LangChain Pandas agent then generates clear, insightful charts in seconds.

81
00:05:28,220 --> 00:05:31,019
No coding is required from you or any other user.

82
00:05:31,019 --> 00:05:33,739
You just use simple, natural instructions.

83
00:05:33,739 --> 00:05:37,619
Although you used the word gender in your request, the LLM understood that you were

84
00:05:37,619 --> 00:05:42,619
requesting data from the column labeled sex and located the information in the dataset.
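For comparison, a bar chart of the sex column can be drawn by hand with pandas and matplotlib, which is roughly the kind of code the agent generates for this prompt (its exact code may differ). The toy rows are hypothetical, and the Agg backend is chosen only so the sketch also runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy rows, not the real dataset.
df = pd.DataFrame({"sex": ["F", "F", "M", "M", "F", "M"], "age": [18, 17, 15, 18, 16, 22]})

# Count students per value of the sex column and draw one bar per value.
counts = df["sex"].value_counts()
ax = counts.plot(kind="bar")
ax.set_xlabel("sex")
ax.set_ylabel("count")
ax.set_title("Student count by sex")
plt.tight_layout()
# In a notebook the chart appears inline; elsewhere, call plt.savefig(...) to keep it.
```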

85
00:05:42,619 --> 00:05:46,380
Now that you've seen the LangChain Pandas agent in action, here are some practices to

86
00:05:46,380 --> 00:05:50,059
help you safely obtain the best results.

88
00:05:53,119 --> 00:05:57,559
Always use sandboxed environments to prevent unintended modifications to live data and

89
00:05:57,559 --> 00:06:01,839
to limit the damage from prompt injection attacks that could run malicious code.
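A small, partial safeguard is to hand the agent a deep copy of your data rather than the original object. This is not a sandbox: generated code can still import modules and touch the file system, so real isolation needs a separate process or container. The copy only guarantees that in-place changes made through the agent's DataFrame do not alter yours. The mutating statement below is a hypothetical example of careless generated code.

```python
import pandas as pd

# Hypothetical toy rows, not the real dataset.
df = pd.DataFrame({"sex": ["F", "F", "M", "M", "F", "M"], "age": [18, 17, 15, 18, 16, 22]})

# Give the agent a deep copy; pass agent_df, not df, when creating the agent.
agent_df = df.copy(deep=True)

# Simulate generated code that modifies its DataFrame in place.
agent_df.drop(columns=["age"], inplace=True)

print(list(df.columns))        # the original still has both columns
print(list(agent_df.columns))  # only the copy was changed
```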

90
00:06:01,839 --> 00:06:06,640
To effectively and safely prompt and use AI tools for data analysis, design clear and

91
00:06:06,640 --> 00:06:09,720
specific prompts to avoid ambiguous responses.

92
00:06:09,720 --> 00:06:14,839
Then combine LLM analysis with human expertise to validate the results, for example by checking

93
00:06:14,839 --> 00:06:19,440
that the LLM analyzed the correct data and returned correct results. Finally, iteratively

94
00:06:19,440 --> 00:06:21,559
refine your prompts and analysis.
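One concrete way to validate a numeric answer is to recompute it yourself with pandas and compare. The helpers below are hypothetical: they take the first integer in the agent's reply, which works for answers phrased like the one in this video but would mislead if the reply mentions another number first (for example, "18-year-olds: 2"). The toy rows are again illustrative only.

```python
import re

import pandas as pd

# Hypothetical toy rows, not the real dataset.
df = pd.DataFrame({"sex": ["F", "F", "M", "M", "F", "M"], "age": [18, 17, 15, 18, 16, 22]})


def extract_first_int(text):
    """Return the first integer that appears in text, or None if there is none."""
    match = re.search(r"-?\d+", text)
    return int(match.group()) if match else None


def check_count_answer(agent_output, expected):
    """True if the agent's stated count matches an independently computed value."""
    return extract_first_int(agent_output) == expected


# Independent check, computed directly with pandas.
expected = int((df["age"] == 18).sum())
print(check_count_answer("There are 2 students who are 18 years old.", expected))
```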

95
00:06:21,559 --> 00:06:24,359
Now let's recap what you've learned.

96
00:06:24,359 --> 00:06:29,299
You now know that using the LangChain Pandas agent is ideal for exploration and rapid prototyping

97
00:06:29,299 --> 00:06:35,040
but is not recommended for production environments unless comprehensive safeguards are in place.

98
00:06:35,040 --> 00:06:40,359
The LangChain Pandas agent is different because it uses preconfigured functions and prompts,

99
00:06:40,359 --> 00:06:45,640
operates on your existing Pandas DataFrame, accepts prompt inputs from users, and responds

100
00:06:45,640 --> 00:06:47,920
to prompts with answers or visuals.

101
00:06:47,959 --> 00:06:53,480
You can set up the LangChain Pandas agent to work with an external LLM, such as an IBM watsonx.ai

102
00:06:53,480 --> 00:06:54,480
model.

103
00:06:54,480 --> 00:06:59,519
To set up the LangChain Pandas agent with an IBM watsonx.ai model, first initialize

104
00:06:59,519 --> 00:07:08,399
your model with watsonx LLM, then connect it to a Pandas DataFrame using the create_pandas_dataframe_agent function.

105
00:07:08,399 --> 00:07:12,920
You can analyze and visualize data by asking natural language questions to the Pandas DataFrame

106
00:07:12,920 --> 00:07:17,440
agent, which instantly generates code and returns clear data insights.

107
00:07:17,480 --> 00:07:23,279
The agent-generated Python code directly interacts with your DataFrame, filtering, aggregating,

108
00:07:23,279 --> 00:07:27,160
and visualizing data based on your natural language prompts.

109
00:07:27,160 --> 00:07:32,320
And always use sandboxed environments, design clear prompts, validate LLM analysis with

110
00:07:32,320 --> 00:07:37,399
human expertise, and iteratively refine your queries for safe and effective AI-driven data

111
00:07:37,399 --> 00:07:38,200
analysis.