API Reference

Core

ImageEval

evalmedia.eval.ImageEval

Run image quality evaluations.

arun(image, prompt='', checks=None, rubric=None, judge=None) async staticmethod

Async evaluation entry point. Runs all checks concurrently.

run(image, prompt='', checks=None, rubric=None, judge=None) staticmethod

Synchronous evaluation entry point.
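A minimal usage sketch based on the signatures above. This is a non-runnable illustration: the check identifiers and the shape of the returned result are assumptions, not part of this reference.

```python
# Non-runnable sketch; check names below are hypothetical identifiers.
from evalmedia.eval import ImageEval

result = ImageEval.run(
    "outputs/portrait.png",
    prompt="a watercolor portrait of an astronaut",
    checks=["prompt_adherence", "face_artifacts"],
)
print(result.summary())  # human-readable one-line summary (see EvalResult)
```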

CheckResult

evalmedia.core.CheckResult

Bases: BaseModel

Result of a single check evaluation.

EvalResult

evalmedia.core.EvalResult

Bases: BaseModel

Result of a full evaluation (multiple checks).

summary()

Return a human-readable one-line summary.

to_dict()

Serialize to a dict suitable for agent consumption.

CompareResult

evalmedia.core.CompareResult

Bases: BaseModel

Result of comparing multiple images.

best()

Return the top-ranked (label, result) pair.

CheckStatus

evalmedia.core.CheckStatus

Bases: str, Enum

Status of a check evaluation.
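Subclassing both `str` and `Enum` means each status member is itself a string, so results serialize to JSON without a custom encoder. A self-contained sketch of that pattern; the member names and values here are illustrative assumptions, not the library's actual ones.

```python
import json
from enum import Enum

class CheckStatus(str, Enum):
    # Illustrative members; the real enum's names/values are not shown above.
    PASSED = "passed"
    FAILED = "failed"
    ERROR = "error"

# Because each member is also a str, json.dumps handles it directly:
payload = json.dumps({"status": CheckStatus.PASSED})
print(payload)  # {"status": "passed"}
```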


Checks

BaseCheck

evalmedia.checks.base.BaseCheck

Bases: ABC

Abstract base class for all checks.

arun(image, prompt='', judge=None) async

Async entry point that handles image loading and timing.

evaluate(image, prompt, judge=None) abstractmethod async

Run the check on an image. Subclasses implement the logic.

run(image, prompt='', **kwargs)

Synchronous entry point.

VLMCheck

evalmedia.checks.base.VLMCheck

Bases: BaseCheck

Base for checks powered by a vision-language model judge.

evaluate(image, prompt, judge=None) async

Standard VLM evaluation flow: build prompt -> call judge -> parse.

get_check_prompt(prompt, **kwargs) abstractmethod

Return the VLM evaluation prompt for this check.

ClassicalCheck

evalmedia.checks.base.ClassicalCheck

Bases: BaseCheck

Base for checks using traditional CV/ML metrics. No judge needed.
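The hierarchy above follows the template-method pattern: `BaseCheck` owns the shared lifecycle, `VLMCheck` owns the judge round-trip (build prompt, call judge, parse), and a concrete check only supplies its prompt. A self-contained sketch of that shape, with simplified signatures; this is not the library's actual code.

```python
from abc import ABC, abstractmethod

class BaseCheck(ABC):
    @abstractmethod
    async def evaluate(self, image, prompt, judge=None):
        """Subclasses implement the check logic."""

class VLMCheck(BaseCheck):
    @abstractmethod
    def get_check_prompt(self, prompt, **kwargs) -> str:
        """Return the VLM evaluation prompt for this check."""

    async def evaluate(self, image, prompt, judge=None):
        # Standard VLM flow: build prompt -> call judge -> (parse response).
        check_prompt = self.get_check_prompt(prompt)
        return await judge(image, check_prompt)

class PromptAdherence(VLMCheck):
    def get_check_prompt(self, prompt, **kwargs) -> str:
        return f"Does this image match the intent of the prompt: {prompt!r}?"
```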


Image Checks

PromptAdherence

evalmedia.checks.image.prompt_adherence.PromptAdherence

Bases: VLMCheck

Evaluates whether the generated image matches the intent of the prompt.

FaceArtifacts

evalmedia.checks.image.face_artifacts.FaceArtifacts

Bases: VLMCheck

Detects distorted faces, wrong features, and uncanny valley effects.

HandArtifacts

evalmedia.checks.image.hand_artifacts.HandArtifacts

Bases: VLMCheck

Detects extra/missing fingers, distorted hands, and impossible poses.

TextLegibility

evalmedia.checks.image.text_legibility.TextLegibility

Bases: VLMCheck

Evaluates whether text in the image is readable, correctly spelled, and coherent.

AestheticQuality

evalmedia.checks.image.aesthetic_quality.AestheticQuality

Bases: VLMCheck

Evaluates composition, lighting, color harmony, and overall visual appeal.

StyleConsistency

evalmedia.checks.image.style_consistency.StyleConsistency

Bases: VLMCheck

Evaluates whether the image matches the style of a reference image.

evaluate(image, prompt, judge=None) async

Evaluate style consistency; a reference image is required.

CLIPSimilarity

evalmedia.checks.image.clip_similarity.CLIPSimilarity

Bases: ClassicalCheck

Computes CLIP cosine similarity between prompt text and image.

evaluate(image, prompt, judge=None) async

Compute CLIP cosine similarity between the image and prompt.

ResolutionAdequacy

evalmedia.checks.image.resolution_adequacy.ResolutionAdequacy

Bases: ClassicalCheck

Checks whether the image resolution meets minimum requirements.

evaluate(image, prompt, judge=None) async

Check image dimensions against minimum requirements.


Judges

Judge Protocol

evalmedia.judges.base.Judge

Bases: Protocol

Protocol that all judge backends must implement.
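Because `Judge` is a `typing.Protocol`, any object with the right method shape satisfies it structurally; no inheritance from the library is needed. A sketch of that idea with a hypothetical method name (`complete` is an assumption; the protocol's real signature is not shown in this reference).

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class Judge(Protocol):
    # Hypothetical method; evalmedia's actual protocol methods may differ.
    def complete(self, image_bytes: bytes, prompt: str) -> str: ...

class StubJudge:
    """Satisfies the protocol structurally, without subclassing it."""
    def complete(self, image_bytes: bytes, prompt: str) -> str:
        return '{"score": 5, "reasoning": "stub"}'

assert isinstance(StubJudge(), Judge)  # works thanks to @runtime_checkable
```

A stub like this is handy for testing checks without spending real VLM calls.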

JudgeResponse

evalmedia.judges.base.JudgeResponse

Bases: BaseModel

Structured response from a VLM judge.


Rubrics

Rubric

evalmedia.rubrics.base.Rubric

Bases: BaseModel

A named collection of weighted checks with a pass/fail threshold.

compute_result(check_results)

Compute weighted overall score and pass/fail from individual check results.

from_dict(data) classmethod

Create a rubric from a dictionary (e.g., parsed YAML).

from_yaml(path) classmethod

Load a rubric from a YAML file.
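Presumably `compute_result` reduces per-check scores to a weighted average and compares it against the rubric's threshold. A self-contained sketch of that arithmetic (the function name and field layout are assumptions, not evalmedia's API):

```python
def compute_weighted(check_scores: dict[str, float],
                     weights: dict[str, float],
                     threshold: float) -> tuple[float, bool]:
    """Weighted average of check scores, then pass/fail against a threshold."""
    total_weight = sum(weights.values())
    overall = sum(check_scores[name] * w for name, w in weights.items()) / total_weight
    return overall, overall >= threshold

score, passed = compute_weighted(
    {"prompt_adherence": 0.9, "face_artifacts": 0.5},
    {"prompt_adherence": 2.0, "face_artifacts": 1.0},
    threshold=0.7,
)
# (0.9*2.0 + 0.5*1.0) / 3.0 = ~0.767, which clears a 0.7 threshold
```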

WeightedCheck

evalmedia.rubrics.base.WeightedCheck

Bases: BaseModel

A check with an associated weight for rubric scoring.


Configuration

set_judge

evalmedia.config.set_judge(name, **kwargs)

Set the default judge backend.

Parameters:

Name        Type     Description                                    Default
name        str      Judge name (e.g. "claude", "openai").          required
**kwargs    object   Additional config overrides (e.g. api_key).    {}
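A configuration sketch based on the signature above; whether further kwargs beyond `api_key` are accepted is an assumption.

```python
from evalmedia.config import set_judge

# Applies to all subsequent evaluations that don't pass an explicit judge.
set_judge("claude", api_key="...")
```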

compare

evalmedia.eval.compare(images, prompt, checks=None, rubric=None, judge=None, labels=None) async

Evaluate multiple images and rank them by score.


Integrations

openai_tool_schema

evalmedia.integrations.openai_tools.openai_tool_schema()

Return a tool definition compatible with OpenAI's function calling format.

Usage

tools = [openai_tool_schema()]
response = client.chat.completions.create(..., tools=tools)

anthropic_tool_schema

evalmedia.integrations.anthropic_tools.anthropic_tool_schema()

Return a tool definition compatible with Anthropic's tool_use format.

Usage

tools = [anthropic_tool_schema()]
response = client.messages.create(..., tools=tools)