gm.data.AddSeq2SeqFields

class gemma.gm.data.AddSeq2SeqFields(*, in_prompt: Annotated[Any, _KeyToken], in_response: Annotated[Any, _KeyToken], out_input: Annotated[Any, _KeyToken], out_target: Annotated[Any, _KeyToken], out_target_mask: Annotated[Any, _KeyToken])

Bases: grain._src.core.transforms.Map

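Adds the model input, target, and target mask (the loss mask).

Given the prompt and response token IDs, builds the model input, the next-token target, and a target mask that is 1 only on positions whose target token comes from the response.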

Example:

# Input:
{
    'prompt': [10, 11, 12, 13],
    'response': [20, 21, 1],  # Here, response ends with EOS token.
}
# Output:
{
    'input':       [10, 11, 12, 13, 20, 21],
    'target':      [11, 12, 13, 20, 21,  1],
    'target_mask': [ 0,  0,  0,  1,  1,  1],
}
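The example above can be reproduced with a minimal NumPy sketch of the same computation. It assumes 1-D lists or arrays of token IDs, and the helper name `add_seq2seq_fields` is hypothetical; this is not the library's implementation, which may handle batching, padding, or dtypes differently.

```python
import numpy as np


def add_seq2seq_fields(prompt, response):
  """Hypothetical helper mirroring the documented input/target/mask layout."""
  prompt = np.asarray(prompt, dtype=np.int32)
  response = np.asarray(response, dtype=np.int32)

  # Full sequence: prompt tokens followed by response tokens.
  tokens = np.concatenate([prompt, response])
  # 0 for prompt positions, 1 for response positions.
  mask = np.concatenate([np.zeros_like(prompt), np.ones_like(response)])

  return {
      # Drop the last token: nothing follows it, so it has no target.
      'input': tokens[:-1],
      # Shift left by one: target[i] is the token that follows input[i].
      'target': tokens[1:],
      # Align the mask with the target so only response tokens are scored.
      'target_mask': mask[1:],
  }


out = add_seq2seq_fields([10, 11, 12, 13], [20, 21, 1])
print(out['input'].tolist())        # [10, 11, 12, 13, 20, 21]
print(out['target'].tolist())       # [11, 12, 13, 20, 21, 1]
print(out['target_mask'].tolist())  # [0, 0, 0, 1, 1, 1]
```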

Note

  • Input and target are the same token sequence offset by one position: target[i] is the token that follows input[i].

  • The last token of the full sequence is dropped from the input, since there is no next token for it to predict.
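A common way to consume these fields is a per-token cross-entropy averaged only over positions where the target mask is 1, so the model is trained on the response but not on the prompt. The sketch below assumes NumPy logits of shape `[seq_len, vocab_size]`; the function name `masked_cross_entropy` is hypothetical, and the loss actually used with Gemma may differ.

```python
import numpy as np


def masked_cross_entropy(logits, target, target_mask):
  """Hypothetical: mean next-token cross-entropy where target_mask == 1."""
  logits = np.asarray(logits, dtype=np.float64)
  # Numerically stable log-softmax over the vocabulary axis.
  shifted = logits - logits.max(axis=-1, keepdims=True)
  log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
  # Log-probability assigned to the correct next token at each position.
  idx = np.asarray(target)[:, None]
  token_ll = np.take_along_axis(log_probs, idx, axis=-1)[:, 0]
  mask = np.asarray(target_mask, dtype=np.float64)
  # Prompt positions (mask 0) contribute nothing to the loss.
  return -(token_ll * mask).sum() / max(mask.sum(), 1.0)


target = np.array([11, 12, 13, 20, 21, 1])
target_mask = np.array([0, 0, 0, 1, 1, 1])
logits = np.zeros((6, 32))  # uniform predictions over a 32-token vocabulary
print(masked_cross_entropy(logits, target, target_mask))  # log(32), about 3.4657
```

Because prompt positions are multiplied by 0, changing the logits at those positions leaves the loss unchanged.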

in_prompt

Input key of the prompt token IDs.

Type:

Any

in_response

Input key of the response token IDs.

Type:

Any

out_input

Output key for the model input tokens (will be added to the example dict).

Type:

Any

out_target

Output key for the target tokens (will be added to the example dict).

Type:

Any

out_target_mask

Output key for the target mask (will be added to the example dict).

Type:

Any

map(element)

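Maps a single element: reads the prompt and response from the example dict and adds the input, target, and target mask fields.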