GroundingDINO: Bridging Language and Vision for Open-Set Object Detection
Grounding DINO can detect arbitrary objects from human inputs such as category names or referring expressions. Its key idea is to introduce language into a closed-set detector, DINO, enabling open-set concept generalization.
You can find the official GitHub repository here:
https://github.com/IDEA-Research/GroundingDINO
Steps in this Tutorial
In this tutorial, we are going to cover:
Before you start
Install Grounding DINO 🦕
Download Grounding DINO Weights 🏋️
Download Example Data
Load Grounding DINO Model
Grounding DINO Demo
Let’s begin!
Before you start
We are using Google Colab for this demo, so if you are following along, first make sure that you have access to a GPU. The nvidia-smi command will confirm this. If it fails, navigate to Edit -> Notebook settings -> Hardware accelerator, set it to GPU, and click Save.
!nvidia-smi
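If you prefer checking from Python rather than the shell, a small helper (my own sketch, not part of the tutorial) can test whether the NVIDIA driver tooling is on the PATH. This is only a quick proxy; `torch.cuda.is_available()` is the definitive check once PyTorch is installed.

```python
import shutil

def gpu_driver_available() -> bool:
    # Proxy check: True if the `nvidia-smi` CLI is on the PATH.
    return shutil.which("nvidia-smi") is not None

print("GPU driver tooling found:", gpu_driver_available())
```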
# ---- Install libraries
!pip -q install transformers scipy
# ---- Install GroundingDINO and download its weights
!git clone https://github.com/IDEA-Research/GroundingDINO.git
%cd GroundingDINO
!pip -q install -e .
%mkdir weights
%cd weights
!wget -q https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
%cd ..
- The code clones the GroundingDINO GitHub repository using git clone.
- It then navigates into the GroundingDINO directory with %cd GroundingDINO.
- pip -q install -e . installs the project as an editable Python package, so the repository's code can be imported from other Python scripts or notebooks.
- %mkdir weights creates a "weights" directory where the pre-trained model weights will be stored.
- %cd weights moves into that directory.
- wget -q downloads the pre-trained weight file, groundingdino_swint_ogc.pth, which will be loaded by the model later.
- %cd .. returns to the repository root.
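Before moving on, it can help to confirm the weight file actually downloaded. This small helper is an assumption of mine (not part of the repository); it just checks that the file exists and is non-empty, since a failed wget typically leaves a missing or zero-byte file.

```python
import os

# Hypothetical sanity check for the downloaded checkpoint.
WEIGHTS_PATH = "weights/groundingdino_swint_ogc.pth"

def download_ok(path: str) -> bool:
    # A failed or truncated download leaves a missing or zero-byte file.
    return os.path.isfile(path) and os.path.getsize(path) > 0

print("weights ready:", download_ok(WEIGHTS_PATH))
```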
This is the source image we are using:
# ----GroundingDINO
from groundingdino.util.inference import load_model, load_image, predict, annotate
from groundingdino.util import box_ops
# ----Extra Libraries
from PIL import Image
import torch
import cv2
import matplotlib.pyplot as plt
import numpy as np
device = "cuda"
img_path = '/content/image2.jpg'
src, img = load_image(img_path)
# ----Grounding DINO
groundingdino_model = load_model("groundingdino/config/GroundingDINO_SwinT_OGC.py", "weights/groundingdino_swint_ogc.pth")
# ---- The text prompt identifies the target objects in the image
TEXT_PROMPT = "train and car and person"
BOX_THRESHOLD = 0.3
TEXT_THRESHOLD = 0.25
boxes, logits, phrases = predict(
    model=groundingdino_model,
    image=img,
    caption=TEXT_PROMPT,
    box_threshold=BOX_THRESHOLD,
    text_threshold=TEXT_THRESHOLD
)
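The boxes returned by predict are in normalized (cx, cy, w, h) format, which is why box_ops was imported earlier: its box_cxcywh_to_xyxy helper (as I understand the repo's utilities, inherited from DETR) converts them to corner format. The conversion itself is simple; here is a plain-Python sketch of what it does for a single box, scaling to pixel coordinates with the image width and height:

```python
def cxcywh_to_xyxy(box, img_w, img_h):
    # box = (cx, cy, w, h), all values normalized to [0, 1]
    cx, cy, w, h = box
    return ((cx - w / 2) * img_w, (cy - h / 2) * img_h,
            (cx + w / 2) * img_w, (cy + h / 2) * img_h)

# A box centered in a 640x480 image, covering half of each dimension:
print(cxcywh_to_xyxy((0.5, 0.5, 0.5, 0.5), 640, 480))
# → (160.0, 120.0, 480.0, 360.0)
```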
img_annotated = annotate(image_source=src, boxes=boxes, logits=logits, phrases=phrases)[..., ::-1]
fig, axes = plt.subplots(1, 2, figsize=(30, 20))
axes[0].imshow(src)
axes[0].axis('off')
axes[1].imshow(img_annotated)
axes[1].set_title("Annotated Image with Text", fontsize=30)
axes[1].axis('off')
plt.show()