gm.nn.SigLiPFromPatches

class gemma.gm.nn.SigLiPFromPatches(
siglip_encoder: gemma.gm.nn.vision._vision_utils.ViTModel = <factory>,
siglip_exit: gemma.gm.nn.vision._vision.VisionExit = <factory>,
num_mm_tokens_per_image_prepool: int = 4096,
num_mm_tokens_per_image: int = 256,
image_height: int = 896,
image_width: int = 896,
image_channels: int = 3,
apply_stop_gradient: bool = True,
parent: flax.linen.module.Module | flax.core.scope.Scope | flax.linen.module._Sentinel | None = <flax.linen.module._Sentinel object>,
name: str | None = None,
)

Bases: flax.linen.module.Module

Runs the SigLIP vision encoder forward pass on PatchifiedMedia inputs, that is, media that has already been split into patches.
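
The default values in the signature are consistent with one another, which a little arithmetic makes visible. The sketch below is not part of the library. It assumes square patches laid out on a square grid and a square pooling window between the pre-pool and post-pool token counts; neither assumption is stated on this page, and the helper `grid_side` is hypothetical.

```python
import math


def grid_side(num_tokens: int) -> int:
    """Side length of a square token grid (hypothetical helper)."""
    side = math.isqrt(num_tokens)
    if side * side != num_tokens:
        raise ValueError(f"{num_tokens} tokens do not form a square grid")
    return side


# Defaults copied from the class signature.
image_height = image_width = 896
num_mm_tokens_per_image_prepool = 4096
num_mm_tokens_per_image = 256

# Assumption: one token per patch before pooling, on a square grid.
prepool_side = grid_side(num_mm_tokens_per_image_prepool)
pooled_side = grid_side(num_mm_tokens_per_image)

# Patch edge in pixels implied by the defaults, under the assumptions above.
patch_size = image_height // prepool_side

# Pooling window edge that would reduce the pre-pool grid to the pooled grid.
pool_window = prepool_side // pooled_side

print(patch_size, pool_window)
```

Under these assumptions the defaults imply 14 x 14 pixel patches and a 4 x 4 pooling window. If the encoder is configured differently, the numbers change accordingly.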

siglip_encoder: gemma.gm.nn.vision._vision_utils.ViTModel
siglip_exit: gemma.gm.nn.vision._vision.VisionExit
num_mm_tokens_per_image_prepool: int = 4096
num_mm_tokens_per_image: int = 256
image_height: int = 896
image_width: int = 896
image_channels: int = 3
apply_stop_gradient: bool = True
patchify_images(
images: kauldron.ktyping.array_type_meta.Float['*B H W C'],
) → kauldron.ktyping.array_type_meta.Float['*B P D']

Patchify images.

Parameters:

images – The images to patchify.

Returns:

The image patches, with shape (*batch, num_patches, patch_size * patch_size * channels).
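
To illustrate the shape contract described above, here is a minimal NumPy sketch of patchification. It is not the library's implementation: the function name `patchify_sketch`, the default `patch_size=14`, and the ordering of values inside each flattened patch (rows, then columns, then channels) are all assumptions made for illustration. The same `reshape` and `swapaxes` calls exist in `jax.numpy`.

```python
import numpy as np


def patchify_sketch(images: np.ndarray, patch_size: int = 14) -> np.ndarray:
    """Split (*B, H, W, C) images into (*B, P, D) non-overlapping patches.

    Hypothetical stand-in for `patchify_images`; patch_size and the
    within-patch ordering are assumptions, not taken from the library.
    """
    *batch, height, width, channels = images.shape
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by patch_size")
    grid_h, grid_w = height // patch_size, width // patch_size
    n = len(batch)

    # (*B, grid_h, patch, grid_w, patch, C)
    x = images.reshape((*batch, grid_h, patch_size, grid_w, patch_size, channels))
    # Bring the two grid axes together: (*B, grid_h, grid_w, patch, patch, C)
    x = np.swapaxes(x, n + 1, n + 2)
    # Flatten the grid into P and each patch into D.
    return x.reshape((*batch, grid_h * grid_w, patch_size * patch_size * channels))


# A 4x4 single-channel image split into 2x2 patches.
tiny = np.arange(16).reshape(4, 4, 1)
print(patchify_sketch(tiny, patch_size=2)[0])
```

With the default 896 x 896 x 3 input and the assumed patch size of 14, this yields 4096 patches of 588 values each, which matches the default `num_mm_tokens_per_image_prepool`.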

name: str | None = None
parent: flax.linen.module.Module | flax.core.scope.Scope | flax.linen.module._Sentinel | None = None
scope: flax.core.scope.Scope | None = None