Sampling#


An example of how to load a Gemma model and run inference with it.

The Gemma library has 3 ways to prompt a model:

  • gm.text.ChatSampler: Easiest to use: simply talk to the model and get an answer. Supports multi-turn conversations out of the box.

  • gm.text.Sampler: Lower level, but gives more control. The chat state has to be handled manually for multi-turn conversations.

  • model.apply: Directly calls the model and predicts only a single token.

!pip install -q gemma
# Common imports
import os
import jax
import jax.numpy as jnp

# Gemma imports
from gemma import gm

By default, JAX does not use the full GPU memory, but this can be overridden. See GPU memory allocation:

os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"]="1.00"

Load the model and the params. Here we load the instruction-tuned version of the model.

model = gm.nn.Gemma3_4B()

params = gm.ckpts.load_params(gm.ckpts.CheckpointPath.GEMMA3_4B_IT)

Multi-turn conversations#

The easiest way to chat with Gemma is to use the gm.text.ChatSampler. It hides the boilerplate of the conversation cache, as well as the <start_of_turn> / <end_of_turn> tokens used to format the conversation.

Here, we set multi_turn=True when creating the gm.text.ChatSampler (by default, the ChatSampler starts a new conversation every time).

In multi-turn mode, you can erase the previous conversation state by passing sampler.chat(..., multi_turn=False).

sampler = gm.text.ChatSampler(
    model=model,
    params=params,
    multi_turn=True,
    print_stream=True,  # Print output as it is generated.
)

turn0 = sampler.chat('Share one metaphor linking "shadow" and "laughter".')
Okay, here's a metaphor linking "shadow" and "laughter," aiming for a slightly evocative and layered feel:

**"Laughter is the fleeting shadow of joy, dancing across a face that’s often hidden in the long shadow of sorrow."**

---

**Here's a breakdown of why this works:**

*   **"Shadow"** represents sadness, pain, or a past experience that lingers. It’s not necessarily a dark shadow, but a persistent presence.
*   **"Laughter"** is presented as a brief, bright appearance – a momentary flash of happiness.
*   **"Dancing across a face that’s often hidden"** emphasizes that the joy isn't constant, and the underlying sadness is still there, obscuring it.

---

Would you like me to:

*   Try a different type of metaphor?
*   Expand on this one with a short story snippet?
turn1 = sampler.chat('Expand it in a haiku.')
Okay, here’s a haiku based on the metaphor:

Shadow stretches long,
Laughter’s brief, bright, dancing grace,
Joy hides in the dark. 

---

Would you like me to try another haiku, or perhaps a different poetic form?

Note: By default (multi_turn=False), the conversation state is reset every time, but you can still continue the previous conversation by passing sampler.chat(..., multi_turn=True).
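
For example, since the sampler above was created with multi_turn=True, you can reset the state for a single call. A minimal sketch of this per-call override (the prompt here is arbitrary):

# Start a fresh conversation for this call only, discarding the
# previous multi-turn state.
turn2 = sampler.chat('Write a two-line poem about rain.', multi_turn=False)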

By default, greedy decoding is used. You can pass a custom sampling= method as a kwarg (see the sketch after this list):

  • gm.text.Greedy(): (default) Greedy decoding

  • gm.text.RandomSampling(): Simple random sampling with temperature, for more variety
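
For example, a minimal sketch of sampling with temperature for more varied answers (the temperature value here is arbitrary):

sampler = gm.text.ChatSampler(
    model=model,
    params=params,
    sampling=gm.text.RandomSampling(temperature=0.7),  # Arbitrary temperature value.
)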

Sample a prompt#

For more control, we also provide gm.text.Sampler, which still performs efficient sampling (with kv-caching, early stopping, …).

Prompting the sampler requires correctly formatting the prompt with the <start_of_turn> / <end_of_turn> tokens (see the custom tokens section of the tokenizer doc).

sampler = gm.text.Sampler(
    model=model,
    params=params,
)

prompt = """<start_of_turn>user
Give me a list of inspirational quotes.<end_of_turn>
<start_of_turn>model
"""

out = sampler.sample(prompt, max_new_tokens=1000)
print(out)
Okay, here's a list of inspirational quotes, categorized a little to give you a variety:

**On Perseverance & Resilience:**

*   “The only way to do great work is to love what you do.” – Steve Jobs
*   “Fall seven times, stand up eight.” – Japanese Proverb
*   “The difference between ordinary and extraordinary is that little extra.” – Jimmy Johnson
*   “Success is not final, failure is not fatal: It is the courage to continue that counts.” – Winston Churchill
*   “Don’t watch the clock; do what it does. Keep going.” – Sam Levenson
*   “When the going gets tough, the tough get going.” – Theodore Roosevelt


**On Self-Love & Confidence:**

*   “You are enough.” – Brené Brown
*   “Believe you can and you’re halfway there.” – Theodore Roosevelt
*   “You must be the change you wish to see in the world.” – Mahatma Gandhi
*   “The best is yet to come.” – Frank Sinatra
*   “Be the energy you want to attract.” – Tony Gaskins
*   “Don’t be defined by your past. Define your future.” – Unknown


**On Dreams & Goals:**

*   “If you can dream it, you can do it.” – Walt Disney
*   “The future belongs to those who believe in the beauty of their dreams.” – Eleanor Roosevelt
*   “Shoot for the moon. Even if you miss, you’ll land among the stars.” – Les Brown
*   “Start where you are. Use what you have. Do what you can.” – Arthur Ashe
*   “Life begins at the end of your comfort zone.” – Unknown


**On Happiness & Perspective:**

*   “Happiness is not something readymade. It comes from your own actions.” – Dalai Lama
*   “It’s not the triumph that matters, it’s the effort.” – Winston Churchill
*   “Don’t wait for the perfect moment, take the moment and make it perfect.” – Oscar Wilde
*   “Be present. Be grateful. Be you.” – Unknown
*   “The only way out is through.” – Robert Frost



**Short & Powerful:**

*   “Be the change.” – Mahatma Gandhi
*   “Just breathe.”
*   “Keep going.”
*   “You got this.”
*   “Dream big.”

---

**Resources for More Quotes:**

*   **BrainyQuote:** [https://www.brainyquote.com/](https://www.brainyquote.com/)
*   **Goodreads:** [https://www.goodreads.com/quotes](https://www.goodreads.com/quotes)
*   **Quote Garden:** [https://quotegarden.com/](https://quotegarden.com/)

To help me give you even more relevant quotes, could you tell me:

*   **What kind of inspiration are you looking for?** (e.g., motivation for work, overcoming challenges, self-love, etc.)
*   **Is there a particular theme or topic you'd like quotes about?**<end_of_turn>
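
Because gm.text.Sampler is stateless, multi-turn conversations require carrying the formatted history yourself. A minimal sketch, assuming the sampler output ends with <end_of_turn> as above (the follow-up question is arbitrary):

# Append the model answer and a new user turn to the previous prompt.
prompt2 = (
    prompt  # Original formatted prompt.
    + out   # Model answer, ending with <end_of_turn>.
    + '\n<start_of_turn>user\n'
    + 'Pick your favorite one.<end_of_turn>\n'
    + '<start_of_turn>model\n'
)
out2 = sampler.sample(prompt2, max_new_tokens=200)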

Use the model directly#

Here’s an example of predicting a single token, directly calling the model.

The model expects encoded tokens as input, so we first need to encode the prompt with our tokenizer. See our tokenizer documentation for more information on using the tokenizer.

tokenizer = gm.text.Gemma3Tokenizer()

Note: When encoding the prompt, don’t forget to add the beginning-of-string token with add_bos=True. All prompts fed to the model should start with this token.

prompt = tokenizer.encode('One word to describe Paris: \n\n', add_bos=True)
prompt = jnp.asarray(prompt)

We can then call the model and get the predicted logits.

# Run the model
out = model.apply(
    {'params': params},
    tokens=prompt,
    return_last_only=True,  # Only predict the last token
)


# Sample a token from the predicted logits
next_token = jax.random.categorical(
    jax.random.key(1),
    out.logits
)
tokenizer.decode(next_token)
'Romantic'
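
For deterministic (greedy) decoding, you could instead take the argmax of the logits. A minimal sketch:

# Greedy alternative: deterministically pick the most likely next token.
next_token = jnp.argmax(out.logits, axis=-1)
tokenizer.decode(next_token)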

You can also display the next-token probabilities.

tokenizer.plot_logits(out.logits)
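
To inspect the distribution numerically instead, you can apply a softmax to the logits yourself. A small sketch, assuming the logits are a 1-D array over the vocabulary (as with return_last_only=True above):

# Convert logits to probabilities and show the 5 most likely next tokens.
probs = jax.nn.softmax(out.logits)
top5 = jnp.argsort(probs)[-5:][::-1]
for tok_id in top5:
    print(tokenizer.decode(int(tok_id)), float(probs[tok_id]))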

Next steps#

  • See our multimodal example to query the model with images.

  • See our finetuning example to train Gemma on your custom task.