Checks

A check is the fundamental unit of evaluation. Each check answers a specific, bounded question about an image and returns a structured CheckResult.

Available Checks

VLM-Powered Checks

These use a vision-language model (Claude or GPT-4.1) as a judge.

PromptAdherence

Does the image match what was asked for?

from evalmedia.checks.image import PromptAdherence

check = PromptAdherence(threshold=0.6)  # default threshold
result = check.run(image="output.png", prompt="a cat on a table")

Evaluates: subject presence, spatial relationships, colors, styles, contradictions.

FaceArtifacts

Detects distorted faces, wrong eye counts, and melted features.

from evalmedia.checks.image import FaceArtifacts

check = FaceArtifacts(threshold=0.6)
result = check.run(image="portrait.png", prompt="portrait of a woman")

Looks for: wrong eye count, asymmetry, melted regions, uncanny valley effects. Returns a score of 1.0 if no faces are present.

HandArtifacts

Detects extra or missing fingers and distorted hands.

from evalmedia.checks.image import HandArtifacts

check = HandArtifacts(threshold=0.6)
result = check.run(image="output.png", prompt="person waving")

Counts the fingers on each visible hand. Returns 1.0 if no hands are present.

TextLegibility

Is text in the image readable and correctly spelled?

from evalmedia.checks.image import TextLegibility

check = TextLegibility(threshold=0.5)
result = check.run(image="banner.png", prompt="sale banner with 50% off")

Identifies all text elements and checks their spelling and readability. Returns 1.0 if no text is present.

AestheticQuality

Evaluates composition, lighting, color harmony, and overall appeal.

from evalmedia.checks.image import AestheticQuality

check = AestheticQuality(threshold=0.5)
result = check.run(image="output.png", prompt="landscape photo")

StyleConsistency

Does the image match a reference image's style?

from evalmedia.checks.image import StyleConsistency

check = StyleConsistency(reference="reference.png", threshold=0.5)
result = check.run(image="output.png", prompt="portrait in oil painting style")

Note

Requires a reference image. Returns SKIPPED if no reference is provided.

Classical Checks

These use traditional CV/ML metrics and need no VLM judge.

CLIPSimilarity

CLIP cosine similarity between the prompt text and image.

from evalmedia.checks.image import CLIPSimilarity

check = CLIPSimilarity(threshold=0.25, model_name="ViT-B-32")
result = check.run(image="output.png", prompt="a red car")

Info

Requires the classical extra, which provides PyTorch and open-clip-torch: pip install evalmedia[classical]
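For intuition about what the score measures: cosine similarity is the dot product of two embedding vectors divided by the product of their lengths. The sketch below computes it on toy vectors in plain Python. It is not evalmedia's implementation; the `cosine_similarity` helper and the vector values are made up for illustration, and real CLIP embeddings have hundreds of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy stand-ins for a text embedding and an image embedding (hypothetical values).
text_embedding = [1.0, 0.0, 1.0]
image_embedding = [1.0, 1.0, 0.0]

similarity = cosine_similarity(text_embedding, image_embedding)
print(round(similarity, 3))  # 0.5
```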

ResolutionAdequacy

Is the image resolution sufficient?

from evalmedia.checks.image import ResolutionAdequacy

# Custom dimensions
check = ResolutionAdequacy(min_width=1024, min_height=1024)

# Or use a preset
check = ResolutionAdequacy(target="hd")     # 1920x1080
check = ResolutionAdequacy(target="4k")     # 3840x2160
check = ResolutionAdequacy(target="social") # 1080x1080

No API key is needed; this is a pure PIL-based check.
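The comparison behind a check like this can be sketched as follows. The preset sizes come from the comments above; the `PRESETS` dict, the `is_resolution_adequate` function, and the rule that both dimensions must meet their minimum are assumptions for illustration, not the library's code.

```python
from typing import Optional

# Preset sizes as listed in the examples above (width, height).
# The dict itself is hypothetical.
PRESETS = {
    "hd": (1920, 1080),
    "4k": (3840, 2160),
    "social": (1080, 1080),
}


def is_resolution_adequate(
    width: int,
    height: int,
    target: Optional[str] = None,
    min_width: int = 0,
    min_height: int = 0,
) -> bool:
    """Return True if the image is at least as large as the required size.

    Assumes both width and height must meet their minimum.
    """
    if target is not None:
        min_width, min_height = PRESETS[target]
    return width >= min_width and height >= min_height


# With PIL, the dimensions would come from Image.open(path).size,
# which returns a (width, height) tuple.
print(is_resolution_adequate(1920, 1080, target="hd"))                 # True
print(is_resolution_adequate(1280, 720, target="hd"))                  # False
print(is_resolution_adequate(1024, 768, min_width=1024, min_height=1024))  # False
```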

CheckResult

Every check returns a CheckResult:

result = check.run(image="output.png", prompt="...")

result.name          # "face_artifacts"
result.status        # CheckStatus.PASSED | FAILED | ERROR | SKIPPED
result.passed        # True/False
result.score         # 0.0 to 1.0
result.confidence    # 0.0 to 1.0 (judge's confidence)
result.reasoning     # "No facial artifacts detected..."
result.metadata      # check-specific extra data
result.threshold     # the threshold used for pass/fail
result.duration_ms   # how long the check took
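A sketch of how calling code might branch on the four statuses. To stay runnable without evalmedia installed, it defines minimal stand-ins for `CheckStatus` and `CheckResult` that mirror only some of the fields listed above. The enum values and the `describe` helper are assumptions, not the library's definitions.

```python
from dataclasses import dataclass
from enum import Enum


class CheckStatus(Enum):
    # Stand-in enum; the member names follow the list above, the values are assumed.
    PASSED = "passed"
    FAILED = "failed"
    ERROR = "error"
    SKIPPED = "skipped"


@dataclass
class CheckResult:
    # Stand-in carrying a subset of the documented fields.
    name: str
    status: CheckStatus
    score: float
    threshold: float
    reasoning: str = ""


def describe(result: CheckResult) -> str:
    """Render a one-line summary, handling each status explicitly."""
    if result.status is CheckStatus.PASSED:
        return f"{result.name}: passed ({result.score:.2f} >= {result.threshold:.2f})"
    if result.status is CheckStatus.FAILED:
        return (
            f"{result.name}: failed ({result.score:.2f} < {result.threshold:.2f}): "
            f"{result.reasoning}"
        )
    if result.status is CheckStatus.SKIPPED:
        return f"{result.name}: skipped: {result.reasoning}"
    return f"{result.name}: error: {result.reasoning}"


ok = CheckResult("face_artifacts", CheckStatus.PASSED, 0.92, 0.6)
skipped = CheckResult(
    "style_consistency", CheckStatus.SKIPPED, 0.0, 0.5, "no reference image"
)
print(describe(ok))       # face_artifacts: passed (0.92 >= 0.60)
print(describe(skipped))  # style_consistency: skipped: no reference image
```

Treating SKIPPED and ERROR separately from FAILED keeps a missing reference image or a judge outage from being counted as a quality failure.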

Custom Thresholds

Every check accepts a threshold parameter. The check passes if score >= threshold:

from evalmedia.checks.image import FaceArtifacts

# Strict: only pass if the score is very high
check = FaceArtifacts(threshold=0.9)

# Lenient: pass even with some issues
check = FaceArtifacts(threshold=0.3)
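The pass rule can be written out directly. In this minimal sketch, `passes` is a hypothetical helper rather than part of evalmedia; it shows the same score passing a lenient threshold and failing a strict one, and that a score exactly equal to the threshold passes.

```python
def passes(score: float, threshold: float) -> bool:
    """Pass rule as stated above: the check passes if score >= threshold."""
    return score >= threshold


score = 0.7  # example score, chosen for illustration

strict = passes(score, 0.9)    # False: 0.7 is below the strict threshold
lenient = passes(score, 0.3)   # True: 0.7 clears the lenient threshold
boundary = passes(0.6, 0.6)    # True: equality counts as a pass

print(strict, lenient, boundary)  # False True True
```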

Writing a Custom Check

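To add your own VLM-powered check, subclass VLMCheck, set the class attributes, and implement get_check_prompt to build the question for the judge:

from evalmedia.checks.base import VLMCheck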

class BrandColorMatch(VLMCheck):
    name = "brand_color_match"
    display_name = "Brand Color Match"
    description = "Checks if the image uses brand colors."
    default_threshold = 0.6

    PROMPT_TEMPLATE = """
    Does this image use these brand colors: {colors}?
    The generation prompt was: "{prompt}"
    """

    def __init__(self, palette: list[str], **kwargs):
        super().__init__(**kwargs)
        self.palette = palette

    def get_check_prompt(self, prompt: str, **kwargs) -> str:
        return self.PROMPT_TEMPLATE.format(
            colors=", ".join(self.palette),
            prompt=prompt,
        )
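To see what the judge would receive, the template formatting can be run on its own. The sketch below copies the template from the class above and renders it outside of evalmedia; the `render_prompt` helper, the palette, and the prompt are made-up examples. It also uses `textwrap.dedent` to strip the indentation that the triple-quoted string carries, which the class above does not do.

```python
import textwrap

# Same template text as in BrandColorMatch, including its leading indentation.
PROMPT_TEMPLATE = """
    Does this image use these brand colors: {colors}?
    The generation prompt was: "{prompt}"
    """


def render_prompt(palette: list[str], prompt: str) -> str:
    """Fill in the template, then remove the common indentation and outer blank lines."""
    text = PROMPT_TEMPLATE.format(colors=", ".join(palette), prompt=prompt)
    return textwrap.dedent(text).strip()


rendered = render_prompt(["#FF6600", "#003366"], "product shot of a sneaker")
print(rendered)
# Does this image use these brand colors: #FF6600, #003366?
# The generation prompt was: "product shot of a sneaker"
```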