YOLO Pose Estimation and Key Points Detection with Python

Pose estimation is a task that involves identifying the location of specific points in an image, usually referred to as keypoints. The keypoints can represent various parts of the object such as joints, landmarks, or other distinctive features. The locations of the keypoints are usually represented as a set of 2D [x, y] or 3D [x, y, visible] coordinates. The output of a pose estimation model is a set of points that represent the keypoints on an object in the image, usually along with the confidence scores for each point. Pose estimation is a good choice when you need to identify specific parts of an object in a scene, and their location in relation to each other.
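To make the idea of keypoints "in relation to each other" concrete, here is a minimal sketch that computes the angle at a joint from three 2D keypoints. The function name `joint_angle` and the sample coordinates are illustrative assumptions, not part of any library.

```python
import math

def joint_angle(a, b, c):
    """Return the angle in degrees at point b, between segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0:
        raise ValueError("keypoints a or c coincide with b; angle is undefined")
    cos_t = (v1[0] * v2[0] + v1[1] * v2[1]) / norm
    # Clamp to [-1, 1] to guard against floating-point drift before acos
    cos_t = max(-1.0, min(1.0, cos_t))
    return math.degrees(math.acos(cos_t))

# Hypothetical pixel coordinates for a shoulder, elbow, and wrist
shoulder, elbow, wrist = (100.0, 100.0), (100.0, 200.0), (200.0, 200.0)
print(joint_angle(shoulder, elbow, wrist))  # a right angle at the elbow
```

The same helper works for any triple of [x, y] points, so it can be applied to the coordinates a pose model returns.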

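YOLOv8 pose estimation models use the -pose suffix (for example, yolov8n-pose.pt) and are pretrained on the COCO keypoints dataset, which annotates 17 keypoints for a single class, 'person'. For reference, the full list of 80 COCO classes below applies to the YOLOv8 detection models, not to the pose models: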

['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush']
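The COCO keypoints annotations used to pretrain these pose models define 17 keypoints per person, in a fixed order. The sketch below lists them in that order so an index in the model output can be mapped to a body part; the variable name `COCO_KEYPOINT_NAMES` is an illustrative choice, not an Ultralytics identifier.

```python
# The 17 COCO person keypoints, in annotation order (index 0 to 16)
COCO_KEYPOINT_NAMES = [
    "nose",
    "left_eye", "right_eye",
    "left_ear", "right_ear",
    "left_shoulder", "right_shoulder",
    "left_elbow", "right_elbow",
    "left_wrist", "right_wrist",
    "left_hip", "right_hip",
    "left_knee", "right_knee",
    "left_ankle", "right_ankle",
]

# Look up a keypoint name by index, and an index by name
print(COCO_KEYPOINT_NAMES[0])
print(COCO_KEYPOINT_NAMES.index("left_wrist"))
```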

YOLOv8 pretrained pose estimation models come in five sizes, ordered by number of parameters (nano, small, medium, large, and extra large), as shown in the table below:

Set Up Ultralytics for YOLOv8

%pip install ultralytics
import ultralytics
ultralytics.checks()

Load YOLOv8 for Pose Estimation

from ultralytics import YOLO

# Load a model
# You can use different YOLOv8 variants (yolov8n-pose, yolov8s-pose, yolov8m-pose, yolov8l-pose, yolov8x-pose)
model = YOLO('yolov8n-pose.pt')  # load a pretrained model

# Use the model
results = model('https://ultralytics.com/images/zidane.jpg')  # predict on an image
# Save the annotated output image (the /content path assumes Google Colab)
results[0].save('/content/output_pose.jpg')
# Print the class names the model was trained on (pose models have a single class, 'person')
print(model.names.values())

Input Image

Display the Output Image after Pose Estimation

from IPython.display import Image

# Display the image in Google Colab
Image(filename='/content/output_pose.jpg')

Output Predicted Image

Print Detected Key Points

for r in results:
    print(r.keypoints)
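Printing the keypoints object shows everything at once; in practice you often want only the points the model is reasonably sure about. The sketch below assumes a recent Ultralytics release in which `r.keypoints.xy` holds per-person [x, y] pixel coordinates and `r.keypoints.conf` holds per-keypoint confidences (it may be None). The helper `confident_keypoints` and the 0.5 threshold are hypothetical, and the sample values are made up so the block runs without a model.

```python
def confident_keypoints(xy, conf, threshold=0.5):
    """Keep (index, x, y) for each keypoint whose confidence meets the threshold.

    xy:   nested list shaped [people][keypoints][2]
    conf: nested list shaped [people][keypoints], or None if unavailable
    """
    people = []
    for p, points in enumerate(xy):
        kept = []
        for i, (x, y) in enumerate(points):
            # With no confidence information, keep every keypoint
            if conf is None or conf[p][i] >= threshold:
                kept.append((i, x, y))
        people.append(kept)
    return people

# Assumed usage with Ultralytics results (not executed here):
#   xy = r.keypoints.xy.tolist()
#   conf = r.keypoints.conf.tolist() if r.keypoints.conf is not None else None

# Made-up sample: one person, three keypoints
sample_xy = [[[10.0, 20.0], [0.0, 0.0], [30.0, 40.0]]]
sample_conf = [[0.9, 0.1, 0.7]]
print(confident_keypoints(sample_xy, sample_conf))
```

Each kept index corresponds to a position in the 17-point COCO keypoint order, so the low-confidence second point is dropped while the first and third are retained.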