gm.nn.SigLiPFromPatches
- class gemma.gm.nn.SigLiPFromPatches(
- siglip_encoder: gemma.gm.nn.vision._vision_utils.ViTModel = <factory>,
- siglip_exit: gemma.gm.nn.vision._vision.VisionExit = <factory>,
- num_mm_tokens_per_image_prepool: int = 4096,
- num_mm_tokens_per_image: int = 256,
- image_height: int = 896,
- image_width: int = 896,
- image_channels: int = 3,
- apply_stop_gradient: bool = True,
- parent: flax.linen.module.Module | flax.core.scope.Scope | flax.linen.module._Sentinel | None = <flax.linen.module._Sentinel object>,
- name: str | None = None,
)
Bases: flax.linen.module.Module
SigLIP vision encoder forward pass from PatchifiedMedia.
- siglip_encoder: gemma.gm.nn.vision._vision_utils.ViTModel
- siglip_exit: gemma.gm.nn.vision._vision.VisionExit
- num_mm_tokens_per_image_prepool: int = 4096
- num_mm_tokens_per_image: int = 256
- image_height: int = 896
- image_width: int = 896
- image_channels: int = 3
- apply_stop_gradient: bool = True
- patchify_images(
- images: kauldron.ktyping.array_type_meta.Float['*B H W C'],
)
Patchify images.
- Parameters:
images – The images to patchify.
- Returns:
The image patches, with shape (*batch, num_patches, patch_size * patch_size * channels).
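To make the documented return shape concrete, here is a minimal sketch of patchification in JAX. It is not the library's implementation: the function name `patchify_sketch` is hypothetical, the patch size of 14 is an assumption that does not appear on this page, and the real method may order the elements within each patch differently. Under that assumed patch size, an 896 x 896 x 3 image yields 64 * 64 = 4096 patches, which happens to equal the default `num_mm_tokens_per_image_prepool`.

```python
import jax.numpy as jnp


def patchify_sketch(images: jnp.ndarray, patch_size: int = 14) -> jnp.ndarray:
    """Split (*B, H, W, C) images into flattened, non-overlapping patches.

    Hypothetical helper for illustration; patch_size=14 is an assumption.
    """
    *batch, height, width, channels = images.shape
    if height % patch_size or width % patch_size:
        raise ValueError("height and width must be divisible by patch_size")
    rows, cols = height // patch_size, width // patch_size

    # Split H into (rows, patch) and W into (cols, patch):
    # (*B, rows, patch, cols, patch, C)
    x = images.reshape(*batch, rows, patch_size, cols, patch_size, channels)

    # Bring the two grid axes together: (*B, rows, cols, patch, patch, C)
    n = len(batch)
    x = jnp.moveaxis(x, n + 2, n + 1)

    # Flatten the grid into num_patches and each patch into one vector.
    return x.reshape(*batch, rows * cols, patch_size * patch_size * channels)


images = jnp.zeros((2, 896, 896, 3))
patches = patchify_sketch(images)
print(patches.shape)  # (2, 4096, 588)
```

The reshape-then-move-axis approach avoids explicit loops and works for any number of leading batch dimensions, matching the `*B` in the documented input shape.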
- name: str | None = None
- parent: flax.linen.module.Module | flax.core.scope.Scope | flax.linen.module._Sentinel | None = None
- scope: flax.core.scope.Scope | None = None