GitHub Repository: huggingface/notebooks
Path: blob/main/examples/paligemma/Fine_tuned_Model_Inference.ipynb
Kernel: Python 3

Fine-tuned PaliGemma Inference

In this notebook we will see how to run inference with a fine-tuned PaliGemma model (using 🤗 transformers).

We need the latest version of the transformers library.

!pip install -q -U transformers
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 9.1/9.1 MB 23.0 MB/s eta 0:00:00

Let's login to Hugging Face.

from huggingface_hub import notebook_login

notebook_login()
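If you are running this outside a notebook, huggingface_hub also offers a programmatic login. A minimal sketch, assuming your access token is stored in the HF_TOKEN environment variable (an assumption, not part of the original notebook):

import os
from huggingface_hub import login

# Reads a user access token from the environment instead of the notebook widget.
login(token=os.environ["HF_TOKEN"])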

Let's load the model.

from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

model_id = "merve/paligemma_vqav2"
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained("google/paligemma-3b-pt-224")
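This loads the model in full precision on CPU. If a GPU is available, you can optionally load it in half precision and let 🤗 Accelerate place it on the device. A minimal sketch; the bfloat16 dtype and device_map="auto" are choices made here, not part of the original notebook, and inputs must then be moved to model.device before generation:

import torch

# Optional: half-precision weights dispatched to the available device(s).
# Requires the `accelerate` package for device_map="auto".
model = PaliGemmaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)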

We have fine-tuned the model on visual question answering (VQAv2), so we will pass an image to the model and ask a question about it. Below is a rather challenging image for vision language models: the pretrained PaliGemma responds to the image and question below with "antique".

from PIL import Image
import requests

prompt = "What is behind the cat?"
image_file = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/cat.png?download=true"
raw_image = Image.open(requests.get(image_file, stream=True).raw)

inputs = processor(text=prompt, images=raw_image.convert("RGB"), return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=20)
print(processor.decode(output[0], skip_special_tokens=True)[len(prompt):])
gramophone
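
To ask more questions, the same steps can be wrapped in a small helper. A minimal sketch; the function name, the prompt-stripping logic, and the example question are illustrative additions, not part of the original notebook:

def answer_question(image, question, max_new_tokens=20):
    """Ask the fine-tuned PaliGemma model a question about a PIL image."""
    inputs = processor(text=question, images=image.convert("RGB"), return_tensors="pt")
    inputs = inputs.to(model.device)  # no-op on CPU, required if the model was placed on GPU
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    decoded = processor.decode(output[0], skip_special_tokens=True)
    # The decoded sequence echoes the prompt, so strip it to keep only the answer.
    return decoded[len(question):].strip()

print(answer_question(raw_image, "What color is the cat?"))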