图片分析工具

众所周知，我不太喜欢起那么惊天动地的标题，所以若是你看到了这个页面，那你就看到了一个非常有用的工具。

这是一个用于图像标注和分析的 Python 脚本的详细分析，该脚本名为 image_annotator_ollama_fixed_v2.py。它是一个使用 PyQt6 构建的图形用户界面（GUI）工具，核心功能是利用本地运行的 Ollama 平台上的 llava:13b 视觉-语言模型来批量分析图像，并支持结果的存储、搜索、统计和多语言翻译。

🎨 图像分析工具：`image_annotator_ollama_fixed_v2.py` 深度解析

这个 Python 脚本是一个功能强大的、高度本地化的图像批处理工具。它巧妙地结合了最新的本地 AI 模型（Ollama/LLaVA）、稳健的并发处理和用户友好的 GUI 界面，实现了图片分析的全程离线操作。

🌟 核心特性概览

真正的离线 AI (True Local AI): 所有图像分析请求都通过 localhost:11434 发送给本地运行的 Ollama 服务，确保数据安全和离线可用性。
并发处理优化: 使用 线程池（ThreadPoolExecutor） 进行图像预处理（生成 Base64 编码和缩略图），并使用 单消费者队列（Single-Consumer Queue） 严格控制 GPU 推理，以防止并发模型调用导致的显存溢出（OOM）。
多语言支持: 集成了 Helsinki-NLP/opus-mt 翻译模型（基于 transformers 库），或在不可用时回退到 Ollama 文本模型（gemma:2b）进行中英文双向翻译。
持久化存储: 使用 SQLite 数据库 存储分析结果，包括原始路径、英文/中文关键词和描述。
状态管理: 支持分析进度的保存和加载（analysis_state.json），方便中断和恢复任务。
GUI 界面: 基于 PyQt6 框架，提供了一个包含“分析”、“搜索”和“统计”三个页面的用户界面。

⚙️ 技术架构与关键组件分析

脚本的结构清晰，主要分为配置、辅助函数、数据库、工作线程和 GUI 界面五大部分。

1. 配置与依赖 (Configuration & Imports)

模型与 API:
- OLLAMA_API_URL: http://localhost:11434/api/generate，指向本地 Ollama API。
- OLLAMA_MODEL: llava:13b，用于图像分析的主力模型。
- OLLAMA_TIMEOUT/OLLAMA_RETRIES: 设置了请求超时时间（120 秒）和重试次数（2 次），增强了网络通信的健壮性。
翻译模型:
- HF_TRANSLATE_MODEL: Helsinki-NLP/opus-mt-en-zh，首选的本地翻译模型。
文件与并发:
- DB_FILE/STATE_FILE: 定义了 SQLite 数据库文件和会话状态文件的路径。
- MAX_PREPROCESS_THREADS: 设置了 6 个线程用于图像的并发预处理，平衡了 CPU 负载。
库依赖: 引入了 requests, PIL (Pillow), sqlite3, threading, PyQt6, matplotlib, 以及可选的 transformers。

2. 图像与文本处理 (Image & Text Helpers)

clean_ollama_text(text): 这是一个关键的启发式（Heuristic）清理函数。由于 Ollama 模型输出可能包含不需要的 JSON 片段或元数据，该函数尝试通过查找并截断 { 符号来清洗文本，确保只保留自然语言描述。
make_thumbnail(path): 使用 PIL 库生成 PNG 格式的缩略图，并将其转换为 QPixmap 供 PyQt6 界面显示。
image_to_b64(path, max_dim=1024): 加载图像，将其最大边限制在 1024 像素，然后编码为 Base64 字符串，这是 LLaVA 模型 API 接受的输入格式。

3. 数据库与数据层 (Database Layer)

init_db(path): 初始化 SQLite 数据库，并创建 images 表。该表使用了兼容多语言的旧 Schema：
- rel_path (UNIQUE): 相对路径（用于索引和冲突解决）。
- key/key_zh: 英文/中文关键词。
- descript/descript_zh: 英文/中文描述。
upsert_record(...): 使用 ON CONFLICT(rel_path) DO UPDATE SET... 实现**插入或更新（Upsert）**操作，确保相同文件只会有一条最新的记录。
query_records_by_term(term): 在四个字段（key, key_zh, descript, descript_zh）上执行模糊查询 (LIKE %term%)，实现搜索功能。
get_keyword_rank(limit=50): 读取所有关键词，进行逗号分割和计数，生成关键词排名列表，供统计图表使用。

4. 翻译器 (Translator Class)

Translator 类封装了翻译逻辑。
首选方案: 使用 transformers 库加载 Helsinki-NLP/opus-mt-en-zh 模型进行本地翻译。
回退方案: 如果 HF 模型加载失败，则向 Ollama 发送一个纯文本请求，使用另一个通用文本模型（如 gemma:2b）来执行翻译任务。

5. Ollama 通信与工作线程

call_ollama_model(...): 负责与 Ollama API 进行 HTTP 通信。它处理了 Base64 图像数据的封装、多重尝试（Retries）以及对模型返回结果的健壮性解析，尝试从 response, output, text 等常用键中提取文本。
OllamaWorker (单消费者): 这是脚本的核心，它继承自 threading.Thread。
- 它从任务队列 (self.q) 中获取任务（rel, b64, prompt）。
- 执行高耗时的 call_ollama_model 调用。
- 结果解析（最关键的逻辑）: 在 run 方法内部，它首先使用 clean_ollama_text 清理原始输出，然后使用不区分大小写的 rfind('Keywords:') 逻辑来精确分割描述（descript_en）和关键词（keywords_en_str），并限制关键词数量（最多 8 个）。
- 调用 self.translator.translate_en_to_zh 完成中英文互译。
- 通过 result_callback 将结果传递回主 GUI 线程。
AnalyzerThread (主分析流程): 继承自 QThread，负责任务的调度和进度管理。
- 文件遍历: 递归遍历根目录，并检查数据库以跳过已完成的文件（实现断点续传/Resume）。
- 并发预处理: 使用 ThreadPoolExecutor 并发地将所有待处理图像转换为 Base64。
- 任务入队: 为每个预处理成功的图像生成一个详细的 Prompt（提示词），并将其放入 OllamaWorker 的任务队列中。
- 使用 sig_progress 和 sig_update 信号与主线程进行通信，更新进度条和列表状态。

🖥️ GUI 界面分析 (MainWindow)

MainWindow 类是整个应用程序的入口。

1. Analysis Tab (分析页)

目录选择与状态管理: 允许用户选择根目录，并自动尝试加载 analysis_state.json 以恢复上一次的会话。
列表视图 (QListWidget): 显示根目录下所有支持的图像文件，并实时显示它们的状态（Pending、Queued、Done、Failed）。
控制按钮: Start Analysis、Stop、Save Progress、Clear DB（清除所有记录）。
进度条 (QProgressBar): 显示分析进度的百分比。
on_worker_result: 从 OllamaWorker 接收结果后，执行数据库 upsert 操作，并将状态更新信号发回主线程。

2. Search Tab (搜索页)

搜索框 (QLineEdit): 接受用户输入的关键词（支持中/英文）。
结果表格 (QTableWidget): 用于显示搜索结果，列头清晰：Thumbnail、Filename、Keywords (EN/ZH)、Description (EN)、Description (ZH)。
交互优化: * 缩略图显示: 在第一列显示图像缩略图，增强了视觉效果。
- 列宽设置: 缩略图、文件名、关键词采用固定或自适应内容模式，而中英文描述列采用 拉伸（Stretch） 模式，最大化利用窗口空间。
- 双击打开文件: 通过 open_image_from_row 方法，双击表格行可以调用系统默认程序打开对应的图像文件，极大地提高了易用性。

3. Stats Tab (统计页)

控制: 允许用户通过 QSpinBox 设置要统计的 Top N 关键词数量，并提供 Refresh Stats 和 Export CSV 按钮。
可视化: 使用 Matplotlib 绘制关键词的水平条形图（Horizontal Bar Chart）。水平条形图避免了在关键词过多时标签重叠的问题，增强了可读性。
get_keyword_rank 确保了统计数据来源于数据库中所有已处理的关键词。

⚠️ 潜在问题与注意事项

Ollama 依赖: 脚本完全依赖于本地 Ollama 服务及其配置的模型（llava:13b）。如果 Ollama 未运行、模型未拉取或 API 地址错误，脚本将无法工作。
翻译模型体积: 如果启用了 transformers 的本地翻译，初次运行时下载模型（Helsinki-NLP/opus-mt-en-zh）会占用大量时间和磁盘空间。
解析健壮性: 虽然脚本包含 clean_ollama_text 和精确的关键词分割逻辑，但如果 LLaVA 模型输出格式发生重大变化（例如不包含 "Keywords:" 标签），解析逻辑仍可能失败，导致关键词缺失。
跨平台文件打开: _open_file_in_system 方法使用了跨平台逻辑（os.startfile for Windows, open for macOS, xdg-open for Linux），但对于某些不常见的 Linux 发行版，xdg-open 可能需要确保已安装。

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
image_annotator_ollama_fixed_v2.py
GUI (PyQt6) tool to analyze images using local Ollama llava:13b model.
Features:
- True local (offline) usage via localhost:11434 (Ollama) — no external network calls.
- Preprocessing (thread pool) for base64/thumb generation.
- Single-consumer queue for GPU inference (to avoid concurrent model OOM).
- Robust parsing and improved prompt for rich, detailed descriptions.
- Save/Load analysis progress (analysis_state.json in script dir).
- SQLite DB to store results (Original Schema: rel_path, key, key_zh, descript, descript_zh, extra_json, updated_at).
- Search and Stats tabs (with matplotlib bar chart for keyword ranking).
- Stop/Resume, Save progress button.
- Integrated local translation (Helsinki opus-mt or Ollama fallback).
"""

import os
import subprocess
import sys
import io
import json
import time
import base64
import sqlite3
import threading
import traceback
import platform
from queue import Queue, Empty
from concurrent.futures import ThreadPoolExecutor, as_completed
from datetime import datetime
from collections import Counter

import requests
from PIL import Image, ImageFilter, ImageStat

# --- 可选：本地翻译（使用 transformers） ---
try:
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    HF_TRANSLATOR_AVAILABLE = True
except Exception:
    HF_TRANSLATOR_AVAILABLE = False

# PyQt6
from PyQt6.QtWidgets import (
    QApplication, QMainWindow, QWidget, QVBoxLayout, QHBoxLayout, QLabel,
    QPushButton, QFileDialog, QListWidget, QListWidgetItem, QProgressBar,
    QTabWidget, QLineEdit, QTableWidget, QTableWidgetItem, QHeaderView,
    QMessageBox, QSpinBox, QSizePolicy
)
from PyQt6.QtCore import Qt, QSize, QThread, pyqtSignal, QTimer
from PyQt6.QtGui import QPixmap

# Matplotlib for stats
from matplotlib.figure import Figure
import numpy as np

# NOTE: PyQt6 uses Qt6, but matplotlib often uses qt5agg backend in examples.
# We'll stick to FigureCanvasQTAgg which typically auto-selects if running outside a custom environment.
# Since the original new code used qt5agg, we'll try to keep it, but note the potential issue on pure PyQt6 environments.
try:
    from matplotlib.backends.backend_qt6agg import FigureCanvasQTAgg as FigureCanvas
except ImportError:
    from matplotlib.backends.backend_qt5agg import FigureCanvasQTAgg as FigureCanvas

# ---------------- Configuration ----------------
OLLAMA_API_URL = "http://localhost:11434/api/generate"
OLLAMA_MODEL = "llava:13b"  # your model
OLLAMA_TIMEOUT = 120  # seconds per request
OLLAMA_RETRIES = 2
HF_TRANSLATE_MODEL = "Helsinki-NLP/opus-mt-en-zh"

DB_FILE = os.path.join(os.path.dirname(__file__), "image_annotations.db")
STATE_FILE = os.path.join(os.path.dirname(__file__), "analysis_state.json")
THUMB_SIZE = (160, 90)
SUPPORTED_EXT = {'.jpg', '.jpeg', '.png', '.bmp', '.gif', '.webp', '.tiff'}
MAX_PREPROCESS_THREADS = 6  # adjust for your CPU


# ------------------------------------------------
# ------------------------------------------------
# ... (Configuration section)
# ------------------------------------------------

# ---------- Text Cleaning Helper ----------
def clean_ollama_text(text):
    """
    Attempts to clean up raw Ollama text output that might contain JSON or
    unwanted boilerplate/metadata, aiming to return only the natural language part.
    """
    if not isinstance(text, str):
        return ""

    # 1. Remove common Ollama metadata/JSON snippets that might not be stripped by call_ollama_model
    # This specifically targets the unwanted JSON that seems to be polluting the output
    # Look for the start of a dictionary and try to remove everything from it onwards
    # Note: This is a heuristic and might need adjustment if models change their output format.

    cleaned_text = text

    # Find the first occurrence of '{' that is not at the very start (which might be the description itself)
    # This assumes metadata/JSON starts later in the string
    first_brace = cleaned_text.find('{', 10)
    if first_brace > -1 and first_brace < len(
            cleaned_text) / 2:  # Heuristic: if JSON starts too early, it might be part of the description
        cleaned_text = cleaned_text[:first_brace].strip()

    # 2. Remove common trailing model info if the above fails to catch it
    if 'Keywords:' in cleaned_text:
        # If keywords are present, clean up the description part up to keywords
        parts = cleaned_text.split("Keywords:", 1)
        description_part = parts[0].strip()
        # Look for JSON at the end of the description part
        first_brace_desc = description_part.find('{', len(description_part) // 2)
        if first_brace_desc > -1:
            description_part = description_part[:first_brace_desc].strip()

        cleaned_text = description_part + (f" Keywords:{parts[1]}" if len(parts) > 1 else "")

    # Simple final cleanup (e.g., if there's trailing garbage)
    cleaned_text = cleaned_text.strip()

    # Remove leading/trailing quotes if they look like simple string delimiters
    if cleaned_text.startswith('"') and cleaned_text.endswith('"'):
        cleaned_text = cleaned_text.strip('"')

    return cleaned_text


# ------------------------------------------------
# ---------- Database (FIXED to old schema) ----------
def init_db(path=DB_FILE):
    conn = sqlite3.connect(path, check_same_thread=False)
    cur = conn.cursor()
    # Reverting to old schema for compatibility and multilingual support
    cur.execute('''
                CREATE TABLE IF NOT EXISTS images
                (
                    id
                    INTEGER
                    PRIMARY
                    KEY
                    AUTOINCREMENT,
                    rel_path
                    TEXT
                    UNIQUE,
                    key
                    TEXT,
                    key_zh
                    TEXT,
                    descript
                    TEXT,
                    descript_zh
                    TEXT,
                    extra_json
                    TEXT,
                    updated_at
                    TEXT
                )
                ''')
    conn.commit()
    return conn


DB_CONN = init_db()


# NOTE: Arguments adjusted to match the new schema structure for 'new' data, but inserts into 'old' schema.
# rel_path: relative path
# descript_en: The rich English description (was 'result_text' in new) -> maps to 'descript'
# keywords_en_str: The comma-separated English keywords (was 'keywords' in new) -> maps to 'key'
def upsert_record(rel_path, descript_en, keywords_en_str, key_zh, descript_zh, extra_json_dict=None):
    now = datetime.utcnow().isoformat()
    cur = DB_CONN.cursor()
    cur.execute('''
                INSERT INTO images(rel_path, key, key_zh, descript, descript_zh, extra_json, updated_at)
                VALUES (?, ?, ?, ?, ?, ?, ?) ON CONFLICT(rel_path) DO
                UPDATE SET
                    key =excluded.key,
                    key_zh=excluded.key_zh,
                    descript=excluded.descript,
                    descript_zh=excluded.descript_zh,
                    extra_json=excluded.extra_json,
                    updated_at=excluded.updated_at
                ''',
                (rel_path, keywords_en_str, key_zh, descript_en, descript_zh, json.dumps(extra_json_dict or {}), now))
    DB_CONN.commit()


# Query adjusted to search old schema fields (key/key_zh/descript/descript_zh)
def query_records_by_term(term):
    cur = DB_CONN.cursor()
    like = f"%{term}%"
    # Select: rel_path, key, key_zh, descript, descript_zh
    cur.execute(
        'SELECT rel_path, key, key_zh, descript, descript_zh FROM images WHERE key LIKE ? OR key_zh LIKE ? OR descript LIKE ? OR descript_zh LIKE ?',
        (like, like, like, like))
    return cur.fetchall()


# Keyword ranking adjusted to read from old schema (key)
def get_keyword_rank(limit=50):
    cur = DB_CONN.cursor()
    cur.execute('SELECT key FROM images')
    rows = cur.fetchall()
    counter = Counter()
    total = 0
    for (k,) in rows:
        if not k:
            continue
        total += 1
        # k is a comma-separated string of keywords
        for part in k.split(','):
            w = part.strip().lower()
            if w:
                counter[w] += 1
    items = counter.most_common(limit)
    return items, total


# ---------- Image helpers (Keep as is from new code) ----------
def make_thumbnail(path, size=THUMB_SIZE):
    try:
        img = Image.open(path).convert("RGBA")
        # Ensure we use LANCZOS for quality resizing
        img.thumbnail(size, Image.Resampling.LANCZOS)
        buf = io.BytesIO()
        img.save(buf, format='PNG')
        buf.seek(0)
        pix = QPixmap()
        pix.loadFromData(buf.read(), "PNG")
        return pix
    except Exception:
        # traceback.print_exc()
        return QPixmap()


def image_to_b64(path, max_dim=1024):
    img = Image.open(path).convert("RGB")
    w, h = img.size
    max_side = max(w, h)
    if max_side > max_dim:
        scale = max_dim / max_side
        img = img.resize((int(w * scale), int(h * scale)), Image.Resampling.LANCZOS)
    buf = io.BytesIO()
    img.save(buf, format='JPEG', quality=85)
    return base64.b64encode(buf.getvalue()).decode('utf-8')


# Removed: get_basic_meta as it was unused in the core loop

# ---------- Translator (from old code) ----------
class Translator:
    def __init__(self):
        self.hf_available = HF_TRANSLATOR_AVAILABLE
        self.model = None
        self.tokenizer = None
        if self.hf_available:
            try:
                print("Loading HF translator model (this may take time)...")
                self.tokenizer = AutoTokenizer.from_pretrained(HF_TRANSLATE_MODEL)
                self.model = AutoModelForSeq2SeqLM.from_pretrained(HF_TRANSLATE_MODEL)
                # You can add model.to("cuda") if you need GPU acceleration and have it configured
            except Exception as e:
                print("HF translation model load failed, will fallback to Ollama. Error:", e)
                self.hf_available = False

    def translate_en_to_zh(self, text):
        if not text:
            return ""
        if self.hf_available and self.model:
            try:
                inputs = self.tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
                out = self.model.generate(**inputs, max_new_tokens=256)
                zh = self.tokenizer.decode(out[0], skip_special_tokens=True)
                return zh
            except Exception as e:
                print("HF translate failed, falling back to Ollama:", e)
                # fallthrough to ollama

        # fallback: ask Ollama to translate (note: requires a non-llava model like gemma:2b if llava isn't good at pure text)
        prompt = f"Translate the following English text to Chinese (Simplified) preserving meaning and punctuation. Return only the translation.\n\nText:\n{text}\n\nTranslation:"
        # Use a simplified call structure for pure text generation
        payload = {
            "model": "gemma:2b",  # Use a common non-vision model for better text quality
            "prompt": prompt,
            "stream": False,
        }
        try:
            resp = requests.post(OLLAMA_API_URL, json=payload, timeout=20)  # shorter timeout for text
            resp.raise_for_status()
            j = resp.json()
            # Try common keys for text output
            for k in ('output', 'text', 'result', 'response'):
                if k in j and isinstance(j[k], str):
                    return j[k].strip().splitlines()[0]  # Take first line
            return str(j).strip()
        except Exception as e:
            print("Ollama fallback translate failed:", e)
            return ""


# ---------- Ollama calling (single consumer) ----------
def call_ollama_model(base64_image, prompt_text, model=OLLAMA_MODEL, timeout=OLLAMA_TIMEOUT, retries=OLLAMA_RETRIES):
    """
    Call local Ollama server with image base64 and prompt. Retries internally.
    Returns (success:bool, text:str)
    """
    # Removed the initial environment variable check for full offline usage

    payload = {
        "model": model,
        "prompt": prompt_text,
        "stream": False,
        "images": [base64_image]
    }
    for attempt in range(retries + 1):
        try:
            resp = requests.post(OLLAMA_API_URL, json=payload, timeout=timeout)
            resp.raise_for_status()

            # --- Robust parsing logic from new code ---
            try:
                j = resp.json()
            except Exception:
                # sometimes Ollama returns raw text
                txt = resp.text
                return True, txt

            # response structure may vary; attempt common keys
            if isinstance(j, dict):
                # possible keys: 'output','text','result','choices'
                # Ollama 'generate' API typically uses 'response' for non-streaming output
                for k in ('response', 'output', 'text', 'result'):
                    if k in j and isinstance(j[k], str):
                        return True, j[k]
                if 'choices' in j and isinstance(j['choices'], list) and j['choices']:
                    ch = j['choices'][0]
                    if isinstance(ch, dict) and 'message' in ch:
                        return True, ch['message'].get('content', str(ch))
                    else:
                        return True, str(ch)
                # fallback to full json string
                return True, json.dumps(j, ensure_ascii=False)
            else:
                return True, str(j)

        except Exception as e:
            last_err = str(e)
            time.sleep(1 + attempt * 2)
            continue
    return False, f"Ollama request failed after {retries + 1} attempts: {last_err}"


# ---------- Worker thread and queue ----------
class OllamaWorker(threading.Thread):
    def __init__(self, task_queue, result_callback, translator: Translator):
        super().__init__(daemon=True)
        self.q = task_queue
        self._stop = threading.Event()
        self.result_callback = result_callback  # fn(rel, success, text, keywords_str, key_zh, descript_zh)
        self.translator = translator

    def stop(self):
        self._stop.set()

    def run(self):
        while not self._stop.is_set():
            try:
                task = self.q.get(timeout=0.5)
            except Empty:
                continue

            rel, b64, prompt = task
            success, text = call_ollama_model(b64, prompt)

            # --- Extract Keywords (keywords_en_str) and Description (descript_en) ---
            keywords_en_str = ""
            descript_en = ""
            key_zh = ""
            descript_zh = ""

            # 文件: image_annotator_ollama_fixed_v2.py

            # ... (inside OllamaWorker.run method)

            if success:
                # --- NEW: Clean the raw text before parsing ---
                cleaned_text = clean_ollama_text(text)
                # 打印前200字符，用于调试，确认模型输出是否包含关键词标签
                print(f"1:cleaned text (before split): {cleaned_text[:200]}...")

                # 初始化为空，以防解析失败
                descript_en = ""
                keywords_en_str = ""

                # 我们需要一个对 "Keywords:" 标签大小写不敏感的健壮分割方法
                keywords_delimiter = "Keywords:"
                index = -1

                # 查找分隔符，使用 rfind 确保找到的是最后一个 "Keywords:" (如果模型错误输出了多个)
                lower_cleaned_text = cleaned_text.lower()
                index = lower_cleaned_text.rfind(keywords_delimiter.lower())

                if index != -1:
                    # **情况一：找到分隔符。正常提取描述和关键词。**

                    # 描述是分隔符之前的所有内容
                    descript_en = cleaned_text[:index].strip()
                    # 关键词是分隔符之后的所有内容
                    keywords_raw = cleaned_text[index + len(keywords_delimiter):].strip()

                    # 确保描述部分不为空（防止模型在分隔符前没有输出任何内容）
                    if not descript_en and keywords_raw:
                        # 如果描述为空，但关键词有内容，这是一个异常情况，我们保留关键词，并设置描述为 "N/A"
                        descript_en = "N/A: Model failed to provide rich description."

                    # 过滤和清理关键词 (最多保留 8 个)
                    kws = [t.strip() for t in keywords_raw.split(',') if 2 <= len(t.strip()) <= 30]
                    keywords_en_str = ",".join(kws[:8])

                    print(f"2:descript (after split): {descript_en[:50]}...")
                    print(f"3:keywords (after split): {keywords_en_str[:50]}...")

                else:
                    # **情况二：未找到分隔符。假定整个输出都是描述。**
                    descript_en = cleaned_text.strip()
                    keywords_en_str = ""
                    print(
                        f"Warning: Keywords delimiter '{keywords_delimiter}' not found in model output for {rel}. Keywords will be empty. (Model failure to follow prompt)")

                # --- Translation (Integrated from old code) ---
                # 无论关键词是否为空，都执行翻译逻辑。如果 keywords_en_str 为空，则 key_zh 也为空。
                if keywords_en_str:
                    key_zh = self.translator.translate_en_to_zh(keywords_en_str)
                if descript_en:
                    # Note: Translating the rich, multi-sentence description might take longer
                    descript_zh = self.translator.translate_en_to_zh(descript_en)
            # ... (rest of the OllamaWorker.run method)

            # callback (rel, success, descript_en, keywords_en_str, key_zh, descript_zh)
            try:
                self.result_callback(rel, success, descript_en, keywords_en_str, key_zh, descript_zh)
            except Exception:
                traceback.print_exc()
            self.q.task_done()


# ---------- GUI Thread for analysis progression (AnalyzerThread) ----------
class AnalyzerThread(QThread):
    sig_update = pyqtSignal(str, str)  # rel, status
    sig_progress = pyqtSignal(int, int)  # processed, total
    sig_done = pyqtSignal()

    def __init__(self, root_dir, task_queue, main_window_ref):
        super().__init__()
        self.root_dir = root_dir
        self._stop = False
        self.task_queue = task_queue
        self.main_window_ref = main_window_ref  # To update UI from worker result

    def request_stop(self):
        self._stop = True

    def run(self):
        # build file list preserving order
        all_files = []
        for dp, dns, fns in os.walk(self.root_dir):
            for fn in fns:
                if os.path.splitext(fn)[1].lower() in SUPPORTED_EXT:
                    abs_path = os.path.join(dp, fn)
                    rel = os.path.relpath(abs_path, self.root_dir)
                    # Check if already processed (to handle resume logic better)
                    cur = DB_CONN.cursor()
                    cur.execute('SELECT 1 FROM images WHERE rel_path=?', (rel,))
                    if cur.fetchone() is None:
                        all_files.append((abs_path, rel))

        total_remaining = len(all_files)
        processed_count = 0

        # Preprocess in threadpool: prepare base64 (parallel)
        preprocess_results = {}
        # Max_workers adjusted for CPU (6 is the max_preprocess_threads)
        with ThreadPoolExecutor(max_workers=MAX_PREPROCESS_THREADS) as ex:
            future_to_rel = {ex.submit(image_to_b64, abs_path): rel for abs_path, rel in all_files}
            for fut in as_completed(future_to_rel):
                if self._stop:
                    break
                rel = future_to_rel[fut]
                try:
                    b64 = fut.result()
                    preprocess_results[rel] = b64
                except Exception as e:
                    print(f"Preprocess failed for {rel}: {e}")
                    preprocess_results[rel] = None

        # If stopped during preprocess
        if self._stop:
            self.sig_done.emit()
            return

        # push tasks to queue (but worker will consume serially)
        for abs_path, rel in all_files:
            if self._stop:
                self.sig_update.emit(rel, "Stopped")
                break

            # Skip if preprocess failed
            b64 = preprocess_results.get(rel)
            if not b64:
                self.sig_update.emit(rel, "Preprocess failed")
                processed_count += 1
                self.sig_progress.emit(processed_count, total_remaining)
                continue

            self.sig_update.emit(rel, "Queued")

            # --- IMPROVED PROMPT (from new code) ---
            prompt = (
                "You are a detailed image analyst. For the supplied image, produce a multi-sentence, "
                "rich natural-language description that includes: people (approx. number, age group, clothing, pose, expression), "
                "objects, background/setting, lighting, colors, mood, and any notable details. "
                "Then provide a short comma-separated list of 5-10 keywords labeled 'Keywords:'. "
                "Return plain text only.\n\nExample:\nA young woman... \nKeywords: woman,smile,indoor,white-sweater,portrait"
            )

            # enqueue
            self.task_queue.put((rel, b64, prompt))

            processed_count += 1
            self.sig_progress.emit(processed_count, total_remaining)

        # wait for queue to be empty or stop
        self.task_queue.join()

        # Final progress update if queue finished
        if not self._stop:
            self.sig_progress.emit(total_remaining, total_remaining)

        self.sig_done.emit()


# ---------- Main Window ----------
class MainWindow(QMainWindow):
    def __init__(self):
        super().__init__()
        self.setWindowTitle("Image Annotator - llava:13b (Local Ollama)")
        self.setMinimumSize(1100, 700)
        self.root_dir = None
        self.translator = Translator()  # Initialize translator

        # queue & worker
        self.task_queue = Queue()
        self.ollama_worker = OllamaWorker(self.task_queue, self.on_worker_result, self.translator)
        self.ollama_worker.start()

        # analyzer thread
        self.analyzer = None

        # UI
        tabs = QTabWidget()
        tabs.addTab(self.build_analysis_tab(), "Analysis")
        tabs.addTab(self.build_search_tab(), "Search")
        tabs.addTab(self.build_stats_tab(), "Stats")
        self.setCentralWidget(tabs)

        # attempt auto-load previous state (from new code)
        self._check_load_state()

    # --- State Management (from new code) ---
    def _check_load_state(self):
        if os.path.exists(STATE_FILE):
            try:
                with open(STATE_FILE, 'r', encoding='utf-8') as f:
                    st = json.load(f)
                if st.get('root_dir'):
                    resp = QMessageBox.question(self, "Resume previous session",
                                                f"Found previous analysis state for directory:\n{st.get('root_dir')}\nLoad it?",
                                                QMessageBox.StandardButton.Yes | QMessageBox.StandardButton.No)
                    if resp == QMessageBox.StandardButton.Yes:
                        self.root_dir = st.get('root_dir')
                        self.dir_label.setText(self.root_dir)
                        self.populate_list_from_dir()
            except Exception:
                pass

    def _save_basic_state(self, running=False):
        st = {"root_dir": self.root_dir, "running": running, "saved_at": datetime.utcnow().isoformat()}
        try:
            with open(STATE_FILE, 'w', encoding='utf-8') as f:
                json.dump(st, f, ensure_ascii=False, indent=2)
        except Exception:
            pass

    def save_state(self):
        # NOTE: Simplified save logic to just save the root_dir (as list item status is derived from DB/Worker now)
        if not self.root_dir:
            QMessageBox.warning(self, "No dir", "Select directory first")
            return
        self._save_basic_state(running=False)
        QMessageBox.information(self, "Saved", f"Basic state saved to {STATE_FILE}")

    def clear_db_prompt(self):
        ok = QMessageBox.question(self, "Clear DB", "This will delete all saved image analysis records. Continue?",
                                  QMessageBox.StandardButton.Yes | QMessageBox.StandardButton.No)
        if ok != QMessageBox.StandardButton.Yes:
            return
        cur = DB_CONN.cursor()
        cur.execute('DELETE FROM images')
        DB_CONN.commit()
        QMessageBox.information(self, "Cleared", "Database cleared.")
        self.populate_list_from_dir()

    # ------ Analysis tab ------
    def build_analysis_tab(self):
        w = QWidget()
        v = QVBoxLayout()
        h = QHBoxLayout()
        self.dir_label = QLabel("No directory selected")
        btn_select = QPushButton("Select Directory")
        btn_select.clicked.connect(self.select_dir)
        self.btn_start = QPushButton("Start Analysis")
        self.btn_start.clicked.connect(self.start_analysis)
        self.btn_stop = QPushButton("Stop")
        self.btn_stop.clicked.connect(self.stop_analysis)
        self.btn_stop.setEnabled(False)
        btn_save = QPushButton("Save Progress")
        btn_save.clicked.connect(self.save_state)
        btn_clear = QPushButton("Clear DB")
        btn_clear.clicked.connect(self.clear_db_prompt)

        h.addWidget(self.dir_label)
        h.addWidget(btn_select)
        h.addWidget(self.btn_start)
        h.addWidget(self.btn_stop)
        h.addWidget(btn_save)
        h.addWidget(btn_clear)
        v.addLayout(h)

        self.progress = QProgressBar()
        v.addWidget(self.progress)

        self.list_widget = QListWidget()
        v.addWidget(self.list_widget, 1)

        w.setLayout(v)
        return w

    def select_dir(self):
        d = QFileDialog.getExistingDirectory(self, "Select root image directory", os.path.expanduser("~"))
        if not d:
            return
        self.root_dir = d
        self.dir_label.setText(d)
        self.populate_list_from_dir()
        self._save_basic_state()

    def populate_list_from_dir(self):
        self.list_widget.clear()
        if not self.root_dir:
            return
        # Find all files and check DB for status
        for dp, dns, fns in os.walk(self.root_dir):
            for fn in fns:
                if os.path.splitext(fn)[1].lower() in SUPPORTED_EXT:
                    rel = os.path.relpath(os.path.join(dp, fn), self.root_dir)
                    cur = DB_CONN.cursor()
                    cur.execute('SELECT 1 FROM images WHERE rel_path=?', (rel,))
                    has = cur.fetchone() is not None
                    status = "Done" if has else "Pending"
                    self.list_widget.addItem(f"{rel} - {status}")

    def start_analysis(self):
        if not self.root_dir:
            QMessageBox.warning(self, "No directory", "Please select an image directory first.")
            return
        # Quick Ollama connectivity test
        try:
            r = requests.get(OLLAMA_API_URL.replace('/api/generate', '/api/tags'), timeout=3)
            r.raise_for_status()
        except Exception as e:
            QMessageBox.critical(self, "Ollama unreachable",
                                 f"Cannot reach Ollama at {OLLAMA_API_URL.replace('/api/generate', '')}.\nStart Ollama server and ensure model {OLLAMA_MODEL} is loaded.\nError: {e}")
            return

        self.btn_start.setEnabled(False)
        self.btn_stop.setEnabled(True)
        # Re-populate list to find all 'Pending' items before starting
        self.populate_list_from_dir()

        # create analyzer thread
        self.analyzer = AnalyzerThread(self.root_dir, self.task_queue, self)
        self.analyzer.sig_update.connect(self.on_analyzer_update)
        self.analyzer.sig_progress.connect(self.on_analyzer_progress)
        self.analyzer.sig_done.connect(self.on_analyzer_done)
        self.analyzer.start()
        self._save_basic_state(running=True)

    def stop_analysis(self):
        if self.analyzer:
            self.analyzer.request_stop()

        # Give a short delay to allow thread to stop gracefully before reenabling buttons
        QTimer.singleShot(500, self._finalize_stop)

    def _finalize_stop(self):
        self.btn_stop.setEnabled(False)
        self.btn_start.setEnabled(True)
        self._save_basic_state(running=False)

    def on_analyzer_update(self, rel, status):
        # update list item
        for i in range(self.list_widget.count()):
            it = self.list_widget.item(i)
            # Match only by relative path prefix (rel_path - status)
            if it.text().startswith(rel + " -"):
                it.setText(f"{rel} - {status}")
                return
        # If not found (e.g., if list wasn't populated first, although it should be)
        self.list_widget.addItem(f"{rel} - {status}")

    def on_analyzer_progress(self, processed, total):
        if total:
            self.progress.setValue(int(processed * 100 / total))
        else:
            self.progress.setValue(0)

        # 文件: image_annotator_ollama_fixed_v2.py

        # ... (在 MainWindow 类中任意位置，例如在 on_worker_result 之后)

    def _open_file_in_system(self, file_path):
        """使用系统默认应用打开文件（跨平台兼容）。"""
        if not os.path.exists(file_path):
            QMessageBox.warning(self, "File Not Found", f"Image file not found at: {file_path}")
            return

        system = platform.system()
        try:
            if system == "Windows":
                os.startfile(file_path)
            elif system == "Darwin":  # macOS
                subprocess.run(["open", file_path], check=True)
            else:  # Linux (使用 xdg-open)
                subprocess.run(["xdg-open", file_path], check=True)
        except Exception as e:
            QMessageBox.critical(self, "Open Failed", f"Could not open file {file_path}. Error: {e}")

    def open_image_from_row(self, row, column):
        """处理表格双击事件，打开图片文件。"""
        # 从 perform_search 中存储的列表中获取相对路径
        if hasattr(self, '_search_results_rel_paths') and row < len(self._search_results_rel_paths):
            rel = self._search_results_rel_paths[row]
            if self.root_dir:
                abs_path = os.path.join(self.root_dir, rel)
                self._open_file_in_system(abs_path)
            else:
                QMessageBox.warning(self, "Error", "根目录未设置 (Root directory is not set).")
        else:
            QMessageBox.warning(self, "Error", "无法获取文件路径 (Could not retrieve file path for this row).")

# ...
    def on_analyzer_done(self):
        QMessageBox.information(self, "Analysis", "Analysis finished or stopped.")
        self.btn_start.setEnabled(True)
        self.btn_stop.setEnabled(False)
        self._save_basic_state(running=False)
        self.progress.setValue(100)
        # refresh list to reflect DB
        self.populate_list_from_dir()

    # NOTE: Worker result now includes translated fields
    def on_worker_result(self, rel, success, descript_en, keywords_en_str, key_zh, descript_zh):
        # when worker finishes analyzing an image, save to DB and update list
        try:
            # Save English results and Chinese translations
            upsert_record(rel, descript_en, keywords_en_str, key_zh, descript_zh,
                          {"raw_ollama": descript_en, "keywords": keywords_en_str.split(',')})
        except Exception as e:
            print("DB save error:", e)
            traceback.print_exc()

        # update UI
        status = "Done" if success else "Failed"
        # We need to post the UI update back to the main thread
        QTimer.singleShot(0, lambda: self.on_analyzer_update(rel, status))

        print(f"[Worker] {rel} -> {'OK' if success else 'ERR'}")

    # ------ Search tab ------
        # 文件: image_annotator_ollama_fixed_v2.py

        # ... (在 MainWindow 类中替换 build_search_tab 方法)

        # ------ Search tab ------
    def build_search_tab(self):
        w = QWidget()
        v = QVBoxLayout()
        h = QHBoxLayout()
        self.search_input = QLineEdit()
        self.search_input.setPlaceholderText("Enter keyword or text (English or Chinese), press Enter to search")
        self.search_input.returnPressed.connect(self.perform_search)  # Connect to Enter key
        btn = QPushButton("Search")
        btn.clicked.connect(self.perform_search)
        h.addWidget(self.search_input)
        h.addWidget(btn)
        v.addLayout(h)

        # 更改列数为 5：Thumbnail, Filename, Keywords (EN/ZH), Description (EN), Description (ZH)
        self.table = QTableWidget(0, 5)
        self.table.setHorizontalHeaderLabels(
            ["Thumbnail", "Filename", "Keywords (EN/ZH)", "Description (EN)", "Description (ZH)"])
        self.table.verticalHeader().setVisible(False)
        self.table.setWordWrap(True)

        # 连接双击信号以打开图片
        self.table.cellDoubleClicked.connect(self.open_image_from_row)

        # --- 列宽度优化 (新逻辑) ---
        self.table.horizontalHeader().setSectionResizeMode(0, QHeaderView.ResizeMode.Fixed)  # Thumbnail (固定宽度)
        self.table.horizontalHeader().setSectionResizeMode(1,
                                                           QHeaderView.ResizeMode.Fixed)  # Filename (适应内容，因此会比较短)
        self.table.horizontalHeader().setSectionResizeMode(2,
                                                           QHeaderView.ResizeMode.Fixed)  # Keywords (适应内容，会比 Filename 略长)
        self.table.horizontalHeader().setSectionResizeMode(3,
                                                           QHeaderView.ResizeMode.Stretch)  # Description EN (自适应拉伸)
        self.table.horizontalHeader().setSectionResizeMode(4,
                                                           QHeaderView.ResizeMode.Stretch)  # Description ZH (自适应拉伸)

        self.table.setColumnWidth(0, 180)  # 固定缩略图宽度
        self.table.horizontalHeader().setStretchLastSection(True)  # 确保最后的列可以拉伸

        v.addWidget(self.table, 1)
        w.setLayout(v)
        return w

    def perform_search(self):
        term = self.search_input.text().strip()
        if not term:
            QMessageBox.information(self, "Empty", "Enter a search term.")
            return

        # Query returns: rel_path, key, key_zh, descript, descript_zh
        rows = query_records_by_term(term)
        self.table.setRowCount(0)

        # 存储相对路径列表，供双击事件使用
        self._search_results_rel_paths = []

        for rel, key_en, key_zh, desc_en, desc_zh in rows:
            self._search_results_rel_paths.append(rel)  # 存储路径

            r = self.table.rowCount()
            self.table.insertRow(r)

            # 0. Thumbnail
            abs_path = os.path.join(self.root_dir or os.path.expanduser("~"), rel)
            pix = make_thumbnail(abs_path)
            lbl = QLabel()
            lbl.setPixmap(pix.scaled(QSize(160, 90), Qt.AspectRatioMode.KeepAspectRatio,
                                     Qt.TransformationMode.SmoothTransformation))
            self.table.setCellWidget(r, 0, lbl)
            self.table.setRowHeight(r, 100)  # 确保行高适应缩略图

            # 1. Filename
            self.table.setItem(r, 1, QTableWidgetItem(os.path.basename(rel)))

            # 2. Keywords (EN/ZH)
            keywords_text = f"EN: {key_en or 'N/A'}\nZH: {key_zh or 'N/A'}"
            item_key = QTableWidgetItem(keywords_text)
            item_key.setTextAlignment(Qt.AlignmentFlag.AlignLeft | Qt.AlignmentFlag.AlignVCenter)
            self.table.setItem(r, 2, item_key)

            # 3. Description (EN) - 独立列
            item_desc_en = QTableWidgetItem(desc_en or 'N/A')
            item_desc_en.setTextAlignment(Qt.AlignmentFlag.AlignLeft | Qt.AlignmentFlag.AlignTop)
            self.table.setItem(r, 3, item_desc_en)

            # 4. Description (ZH) - 独立列
            item_desc_zh = QTableWidgetItem(desc_zh or 'N/A')
            item_desc_zh.setTextAlignment(Qt.AlignmentFlag.AlignLeft | Qt.AlignmentFlag.AlignTop)
            self.table.setItem(r, 4, item_desc_zh)

            # 强制调整行高以适应自适应拉伸的列中的多行文本
            self.table.resizeRowToContents(r)

            # self.table.resizeRowToContents(r) # Let stretch/auto handle it

    # ------ Stats tab (Improved with Matplotlib) ------
    def build_stats_tab(self):
        w = QWidget()
        v = QVBoxLayout()
        h = QHBoxLayout()
        h.addWidget(QLabel("Top N"))
        self.spin_top = QSpinBox()
        self.spin_top.setRange(5, 200)
        self.spin_top.setValue(30)
        btn_refresh = QPushButton("Refresh Stats")
        btn_refresh.clicked.connect(self.refresh_stats)
        btn_export = QPushButton("Export CSV")
        btn_export.clicked.connect(self.export_stats)
        h.addWidget(self.spin_top)
        h.addWidget(btn_refresh)
        h.addWidget(btn_export)
        v.addLayout(h)

        # matplotlib canvas
        self.fig = Figure(figsize=(6, 3))
        # Pass self to FigureCanvas constructor for proper parentage
        self.canvas = FigureCanvas(self.fig)
        v.addWidget(self.canvas, 1)
        w.setLayout(v)
        return w

    def refresh_stats(self):
        topn = self.spin_top.value()
        # items: [(keyword, count), ...], total: count of total images with keywords
        items, total = get_keyword_rank(limit=topn)

        kws = [k for k, c in items]
        counts = [c for k, c in items]

        self.fig.clear()
        ax = self.fig.add_subplot(111)

        # Use horizontal bar chart for better label readability
        y_pos = np.arange(len(kws))
        ax.barh(y_pos, counts)
        ax.set_yticks(y_pos)
        ax.set_yticklabels(kws, fontsize=8)  # Smaller font for many labels
        ax.invert_yaxis()
        ax.set_xlabel("Count")
        ax.set_title(f"Top {len(kws)} keywords (total images indexed: {total})")

        self.fig.tight_layout()
        self.canvas.draw()

    def export_stats(self):
        topn = self.spin_top.value()
        items, total = get_keyword_rank(limit=topn)

        path, _ = QFileDialog.getSaveFileName(self, "Save CSV", os.path.expanduser("~"), "CSV files (*.csv)")
        if not path:
            return

        try:
            with open(path, 'w', encoding='utf-8') as f:
                f.write("keyword,count\n")
                for k, c in items:
                    f.write(f'"{k}",{c}\n')
            QMessageBox.information(self, "Saved", f"Saved to {path}")
        except Exception as e:
            QMessageBox.warning(self, "Export failed", f"Failed: {e}")

    def closeEvent(self, event):
        # Stop worker thread before closing
        self.ollama_worker.stop()
        self.ollama_worker.join()
        if self.analyzer and self.analyzer.isRunning():
            self.analyzer.request_stop()
            self.analyzer.wait()
        DB_CONN.close()
        super().closeEvent(event)


# ---------- Main ----------
def main():
    # Use os.environ to set the PyQt API version explicitly to avoid mixing, especially with Matplotlib
    # os.environ['QT_API'] = 'pyqt6' 
    app = QApplication(sys.argv)
    # Global check for Matplotlib backend (if needed)
    try:
        if FigureCanvas.__name__ == 'FigureCanvasQTAgg':
            print("Using Matplotlib with PyQt6 backend.")
    except Exception:
        print("Matplotlib backend check failed.")

    mw = MainWindow()
    mw.show()
    sys.exit(app.exec())


if __name__ == "__main__":
    # Ensure application is created before any Qt objects requiring it
    main()

图片分析工具

🎨 图像分析工具：image_annotator_ollama_fixed_v2.py 深度解析

🌟 核心特性概览

⚙️ 技术架构与关键组件分析

1. 配置与依赖 (Configuration & Imports)

2. 图像与文本处理 (Image & Text Helpers)

3. 数据库与数据层 (Database Layer)

4. 翻译器 (Translator Class)

5. Ollama 通信与工作线程

🖥️ GUI 界面分析 (MainWindow)

1. Analysis Tab (分析页)

2. Search Tab (搜索页)

3. Stats Tab (统计页)

⚠️ 潜在问题与注意事项

🎨 图像分析工具：`image_annotator_ollama_fixed_v2.py` 深度解析