Skip to content

OCR does not support fuzzy search in Chinese, but it can achieve fuzzy search in English. #23507

@pyccl

Description

@pyccl

I have searched the existing issues, both open and closed, to make sure this is not a duplicate report.

  • Yes

The bug

For example, after checking the database, among the five Chinese characters "中关村延庆", only searching for "中关村延庆" can find it, and searching only for "中关村" or "延庆" will not yield any results. At the same time, there is' Open Olympic Games Clea Olypic Games', and searching for 'Game' will result in a normal search。
在检查数据库后,在五个汉字“中关村延庆”中,只有搜索“中关村延庆”才能找到它,而仅搜索“中关村”或“延庆”不会产生任何结果。同时,还有“Open Olympic Games Clea Olypic Games”,搜索“Game”将则可以正常搜索。

Image

经过测试,发现数据库中的每个字段都使用空格进行分段。在搜索两个空格之间的所有字符时,可以正常搜索。但是,如果只输入了部分字符,则不会获得任何结果。

Image Image Image

The OS that Immich Server is running on

Windows 11 25H2,DockerDesktop 4.49.0

Version of Immich Server

V2.2.1

Version of Immich Mobile App

V2.2.1

Platform with the issue

  • Server
  • Web
  • Mobile

Device make and model

Lenovo Notebook L450, Xiaomi Phone 13

Your docker-compose.yml content

#
# WARNING: To install Immich, follow our guide: https://docs.immich.app/install/docker-compose
#
# Make sure to use the docker-compose.yml of the current release:
#
# https://github.com/immich-app/immich/releases/latest/download/docker-compose.yml
#
# The compose file on main may not be compatible with the latest release.

name: immich

services:
  immich-server:
    container_name: immich_server
    image: ghcr.io/immich-app/immich-server:${IMMICH_VERSION:-release}
    # extends:
    #   file: hwaccel.transcoding.yml
    #   service: cpu # set to one of [nvenc, quicksync, rkmpp, vaapi, vaapi-wsl] for accelerated transcoding
    volumes:
      # Do not edit the next line. If you want to change the media storage location on your system, edit the value of UPLOAD_LOCATION in the .env file
      - ${UPLOAD_LOCATION}:/data
      - /etc/localtime:/etc/localtime:ro
      - "E:/My photo:/mnt/media/myphoto"
      - ./geodata:/build/geodata
      - ./i18n-iso-countries/langs:/usr/src/app/server/node_modules/i18n-iso-countries/langs
    env_file:
      - .env
    ports:
      - '2283:2283'
    depends_on:
      - redis
      - database
    restart: always
    healthcheck:
      disable: false

  immich-machine-learning:
    container_name: immich_machine_learning
    # For hardware acceleration, add one of -[armnn, cuda, rocm, openvino, rknn] to the image tag.
    # Example tag: ${IMMICH_VERSION:-release}-cuda
    image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}
    # extends: # uncomment this section for hardware acceleration - see https://docs.immich.app/features/ml-hardware-acceleration
    #   file: hwaccel.ml.yml
    #   service: cpu # set to one of [armnn, cuda, rocm, openvino, openvino-wsl, rknn] for accelerated inference - use the `-wsl` version for WSL2 where applicable
    volumes:
      - model-cache:/cache
     # - "E:/immich/cache:/cache"
    env_file:
      - .env
    restart: always
    healthcheck:
      disable: false

  redis:
    container_name: immich_redis
    image: docker.io/valkey/valkey:8@sha256:81db6d39e1bba3b3ff32bd3a1b19a6d69690f94a3954ec131277b9a26b95b3aa
    healthcheck:
      test: redis-cli ping || exit 1
    restart: always

  database:
    container_name: immich_postgres
    image: ghcr.io/immich-app/postgres:14-vectorchord0.4.3-pgvectors0.2.0@sha256:bcf63357191b76a916ae5eb93464d65c07511da41e3bf7a8416db519b40b1c23
    environment:
      POSTGRES_PASSWORD: ${DB_PASSWORD}
      POSTGRES_USER: ${DB_USERNAME}
      POSTGRES_DB: ${DB_DATABASE_NAME}
      POSTGRES_INITDB_ARGS: '--data-checksums'
      # Uncomment the DB_STORAGE_TYPE: 'HDD' var if your database isn't stored on SSDs
      # DB_STORAGE_TYPE: 'HDD'
    volumes:
      # Do not edit the next line. If you want to change the database storage location on your system, edit the value of DB_DATA_LOCATION in the .env file
      - ${DB_DATA_LOCATION}:/var/lib/postgresql/data
      - pgdata:/var/lib/postgresql/data
    shm_size: 128mb
    ports:
      - 5432:5432
    restart: always

  power-tools:
    container_name: immich_power_tools
    image: ghcr.io/varun-raj/immich-power-tools:latest
    ports:
      - "8001:3000"
    env_file:
      - .env
volumes:
  model-cache:
  pgdata:

Your .env content

# You can find documentation for all the supported env variables at https://docs.immich.app/install/environment-variables

# The location where your uploaded files are stored
UPLOAD_LOCATION=./library

# The location where your database files are stored. Network shares are not supported for the database
DB_DATA_LOCATION=./postgres

# To set a timezone, uncomment the next line and change Etc/UTC to a TZ identifier from this list: https://en.wikipedia.org/wiki/List_of_tz_database_time_zones#List
TZ=Asia/Shanghai

# The Immich version to use. You can pin this to a specific version like "v2.1.0"
IMMICH_VERSION=release

# Connection secret for postgres. You should change it to a random password
# Please use only the characters `A-Za-z0-9`, without special characters or spaces
DB_PASSWORD=postgres

# The values below this line do not need to be changed
###################################################################################
DB_USERNAME=postgres
DB_DATABASE_NAME=immich

Reproduction steps

  1. Search for keywords in Immich.
  2. Fuzzy search does not yield results.
  3. Use database tools to view the content recognized by OCR in the database.
  4. Testing using content between two spaces can search normally, but fuzzy search cannot.
  5. The English fuzzy search test is normal.

Relevant log output

Additional information

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    Status

    To triage

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions