# gm.data

[[[Source]]](https://github.com/google-deepmind/gemma/tree/main/gemma/gm/data/__init__.py)

```{eval-rst}
.. automodule:: gemma.gm.data
  :no-members:
```

## Symbols


### Class

|  |  |
--- | ---
[gm.data.AddSeq2SeqFields](AddSeq2SeqFields) | Adds the model `input`, `target` and `loss_mask`.
[gm.data.ContrastiveTask](ContrastiveTask) | Creates the contrastive model inputs for DPO-like loss.
[gm.data.DecodeBytes](DecodeBytes) | Decode `bytes` to `str`.
[gm.data.FormatText](FormatText) | Equivalent to `template.format(text=my_string)`.
[gm.data.MapInts](MapInts) | Replace each int by a new value.
[gm.data.Pad](Pad) | Add zeros to the end of the sequence to reach the max length.
[gm.data.Parquet](Parquet) | Data source reading Parquet file(s) from `path`. Keyword-only arguments; `shuffle` and `path` are required. See the class page for the full signature.
[gm.data.Seq2SeqTask](Seq2SeqTask) | Sequence-to-sequence task.
[gm.data.Tokenize](Tokenize) | Tokenize a string to ids.

### Function

|  |  |
--- | ---
[gm.data.make_seq2seq_fields](make_seq2seq_fields) | Create the model `input`, `target` and `loss_mask`.
[gm.data.pad](pad) | Add zeros to the end of the sequence to reach the max length.
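
To illustrate the padding behavior described for `gm.data.pad` and `gm.data.Pad`, here is a minimal NumPy sketch. It does not call the Gemma API: `pad_to_length` and its `max_length` parameter are hypothetical names chosen for this example, and the real function's signature and options may differ.

```python
import numpy as np


def pad_to_length(tokens, max_length):
    """Hypothetical helper: right-pad a 1-D sequence with zeros up to `max_length`."""
    tokens = np.asarray(tokens)
    if tokens.shape[0] > max_length:
        # Assumption: this sketch rejects over-long inputs instead of truncating.
        raise ValueError(
            f"Sequence length {tokens.shape[0]} exceeds max_length={max_length}."
        )
    # `np.pad` defaults to constant mode with a fill value of 0.
    return np.pad(tokens, (0, max_length - tokens.shape[0]))


padded = pad_to_length([5, 8, 2], max_length=6)
print(padded)  # [5 8 2 0 0 0]
```

Padding every example to the same length lets sequences of different sizes be stacked into a single fixed-shape batch.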

```{toctree}
:hidden:

AddSeq2SeqFields
ContrastiveTask
DecodeBytes
FormatText
MapInts
Pad
Parquet
Seq2SeqTask
Tokenize
make_seq2seq_fields
pad
```