gm.data

Data pipeline ops.

Symbols

Class

gm.data.AddSeq2SeqFields

Adds the model input, target and loss_mask.

gm.data.ContrastiveTask

Creates the contrastive model inputs for DPO-like loss.

gm.data.DecodeBytes

Decode bytes to str.

gm.data.FormatText

Equivalent to template.format(text=my_string).

gm.data.MapInts

Replace each int with a new value.

gm.data.Pad

Add zeros to the end of the sequence to reach the max length.

gm.data.Parquet

Parquet(*, _fake_refs: 'type[_FakeRefsUnset] …

gm.data.Seq2SeqTask

Sequence-to-sequence task.

gm.data.Tokenize

Tokenize a string to ids.
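
The simpler element transforms listed above (DecodeBytes, FormatText, MapInts) can be pictured in plain Python. This is a minimal sketch that does not call gm.data: the function names are hypothetical stand-ins, the UTF-8 encoding and the "keep unmapped ints unchanged" behaviour are assumptions, and the real classes may take different arguments.

```python
def decode_bytes(value: bytes) -> str:
    """Decode bytes to str (assumes UTF-8; the real op may differ)."""
    return value.decode("utf-8")


def format_text(template: str, text: str) -> str:
    """Insert `text` into `template`, i.e. template.format(text=text)."""
    return template.format(text=text)


def map_ints(ids: list[int], old_to_new: dict[int, int]) -> list[int]:
    """Replace each int found in `old_to_new`.

    Assumption: ints without a mapping are left unchanged.
    """
    return [old_to_new.get(i, i) for i in ids]


# Chain the steps: raw bytes -> str -> formatted prompt.
prompt = format_text("Translate to French: {text}", decode_bytes(b"Hello"))

# Remap token id 5 to 0, leaving the other ids alone.
remapped = map_ints([1, 5, 5, 2], {5: 0})
```

In a data pipeline these steps would normally be applied to a named field of each example and composed in order, which is why they are exposed as separate classes.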

Function#

gm.data.make_seq2seq_fields

Create the model input, target and loss_mask.

gm.data.pad

Add zeros to the end of the sequence to reach the max length.
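
To make the two function descriptions concrete, here is a hedged pure-Python sketch of what padding and seq2seq field creation typically involve. It is not the gm.data implementation: the names `pad_sketch` and `make_seq2seq_fields_sketch` are hypothetical, the next-token shift and the prompt-masking convention are assumptions about a standard seq2seq setup, and raising on over-long input is also an assumption.

```python
def pad_sketch(ids: list[int], max_length: int) -> list[int]:
    """Append zeros until the sequence has `max_length` items."""
    if len(ids) > max_length:
        # Assumption: over-long sequences are an error in this sketch.
        raise ValueError("sequence is longer than max_length")
    return ids + [0] * (max_length - len(ids))


def make_seq2seq_fields_sketch(
    prompt: list[int], response: list[int]
) -> tuple[list[int], list[int], list[int]]:
    """Build (input, target, loss_mask) for next-token prediction.

    Assumed convention: the target is the full sequence shifted left by
    one token, and the loss is computed only on response tokens.
    """
    sequence = prompt + response
    inputs = sequence[:-1]   # the model sees every token except the last
    targets = sequence[1:]   # and predicts the following token
    # 0 for prompt positions, 1 for response positions, shifted like targets.
    loss_mask = ([0] * len(prompt) + [1] * len(response))[1:]
    return inputs, targets, loss_mask


inputs, targets, loss_mask = make_seq2seq_fields_sketch(
    prompt=[10, 11, 12], response=[20, 21]
)
padded_inputs = pad_sketch(inputs, max_length=6)
```

With this convention, the mask is 1 exactly where the target token belongs to the response, so prompt tokens do not contribute to the loss.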