tlwolford-code/structured-spatial-geometry-l2i

Project Overview

This project explores a Structured Prompt Engineering methodology to solve common spatial errors in Generative AI (specifically the "Nano Banana" model family).

By replacing natural language descriptions with a strict Cartesian (x, y, z) Coordinate System, this framework eliminates:

- Spatial Hallucinations: Objects floating or appearing in the wrong depth plane.
- Mirroring Errors: The model confusing "Subject's Left" with "Stage Left."
- Camera Drift: Inconsistent framing across generation attempts.

Methodology: The "Anchor" System

Standard prompting relies on relative terms ("next to," "left of"). This project instead defines an Absolute Origin (0,0,0) to anchor the scene.

The "Red Ball" Calibration Experiment

  1. The Base JSON Structure (The "Control")

This sets up an infinite white room (like the "Construct" in The Matrix) so that depth and position are the only variables.
{
  "experiment_id": "spatial_integrity_test_001",
  "environment": {
    "type": "Infinite White Studio",
    "floor": {
      "material": "White matte with 1-meter grid lines",
      "grid_color": "Light gray (hex #D3D3D3)",
      "purpose": "Visual reference for depth and scale"
    },
    "lighting": "Softbox overhead (neutral white)",
    "camera_rig": {
      "position": [0, -3, 1.5],
      "target": [0, 0, 0.5],
      "note": "Camera is fixed, 3 m in front of the origin and 1.5 m above the floor. It captures the 'stage' from the front center."
    }
  },
  "subject": {
    "entity": "Sphere",
    "properties": {
      "color": "Matte Red",
      "diameter": "0.5 meters",
      "physics": "Rigid body"
    },
    "spatial_state": {
      "coordinate_system": "Cartesian (x, y, z), z up",
      "origin": "Center of floor grid (0,0,0)",
      "CURRENT_POSITION": [0, 0, 0.25],
      "note": "Change the CURRENT_POSITION vector to move the ball."
    }
  }
}
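Since each test case only swaps the position vector, the variation can be generated programmatically. The following is a minimal Python sketch, not part of the original project: the helper names `resting_position`, `with_position`, and `to_prompt` are hypothetical, the subject block is a trimmed copy of the structure above (environment and camera omitted), and z is assumed to be the vertical axis, as the resting height of 0.25 implies.

```python
import copy
import json

# Trimmed copy of the "subject" block from the base structure.
# The environment and camera sections are omitted here for brevity.
BASE_SUBJECT = {
    "entity": "Sphere",
    "properties": {
        "color": "Matte Red",
        "diameter": "0.5 meters",
        "physics": "Rigid body",
    },
    "spatial_state": {
        "coordinate_system": "Cartesian (x, y, z)",
        "origin": "Center of floor grid (0,0,0)",
        "CURRENT_POSITION": [0, 0, 0.25],
    },
}


def resting_position(x, y, diameter_m):
    """Center of a sphere resting on the floor at (x, y), assuming z is up."""
    return [x, y, diameter_m / 2]


def with_position(subject, position):
    """Return a copy of `subject` with CURRENT_POSITION replaced."""
    if len(position) != 3:
        raise ValueError("position must be an [x, y, z] vector")
    updated = copy.deepcopy(subject)  # leave the control structure untouched
    updated["spatial_state"]["CURRENT_POSITION"] = list(position)
    return updated


def to_prompt(subject):
    """Serialize the subject block as JSON text for pasting into a prompt."""
    return json.dumps({"subject": subject}, indent=2)


# Illustrative variation: same ball, moved to x = 1, still resting on the floor.
moved = with_position(BASE_SUBJECT, resting_position(1, 0, 0.5))
print(to_prompt(moved))
```

Copying the base structure before editing keeps the control unchanged, so every variation differs from it only in the position vector.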
  2. The Test Cases (The "Variables")

Test A: The Origin (Center Stage)

Vector: [0, 0, 0.25]

Result: The ball sits horizontally centered in the image, resting on the floor. With a diameter of 0.5 m the radius is 0.25 m, so the ball's center is at z = 0.25.

Verification: The ball is equidistant from the left and right frame edges, because the camera, its target, and the ball all lie on x = 0.
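The equidistance criterion for Test A can be checked on a generated image rather than judged by eye. The sketch below is an assumption about how such a check might look, not the project's own tooling: it takes a binary mask (rows of 0/1, where 1 marks a ball pixel) and compares the left and right margins. The function names and the 1-pixel tolerance are hypothetical, and producing the mask (for example by thresholding red pixels) is left out.

```python
def horizontal_margins(mask):
    """Return (left, right) margins in pixels between the subject and the frame edges."""
    width = len(mask[0])
    # Collect the column index of every subject pixel in every row.
    columns = [x for row in mask for x, value in enumerate(row) if value]
    if not columns:
        raise ValueError("mask contains no subject pixels")
    left = min(columns)
    right = width - 1 - max(columns)
    return left, right


def is_horizontally_centered(mask, tolerance_px=1):
    """True if the left and right margins differ by at most `tolerance_px`."""
    left, right = horizontal_margins(mask)
    return abs(left - right) <= tolerance_px


# Tiny synthetic masks standing in for real renders (9 pixels wide).
centered = [
    [0, 0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0, 0],
]
shifted = [
    [0, 0, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 1, 0, 0],
]

print(horizontal_margins(centered), is_horizontally_centered(centered))
print(horizontal_margins(shifted), is_horizontally_centered(shifted))
```

A small tolerance allows for the rounding of an even-width subject in an odd-width frame; a larger mismatch would indicate that the model did not honor the x = 0 placement.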

About

Structured JSON prompting for deterministic 3D Cartesian spatial control in text-to-image models
