As everyone knows, I'm not one for earth-shattering titles, so if you've landed on this page, you've found a genuinely useful tool.

This is a detailed analysis of a Python script for image annotation and analysis, named image_annotator_ollama_fixed_v2.py. It is a graphical user interface (GUI) tool built with PyQt6 whose core function is to batch-analyze images using the llava:13b vision-language model running on a local Ollama instance, with support for storing, searching, charting, and translating the results.

🎨 Image Analysis Tool: A Deep Dive into image_annotator_ollama_fixed_v2.py

This Python script is a powerful, fully local image batch-processing tool. It combines a local AI model (Ollama/LLaVA), robust concurrency handling, and a user-friendly GUI to perform image analysis entirely offline.

🌟 Core Features at a Glance

  • True Local AI: every image-analysis request goes to a locally running Ollama service at localhost:11434, keeping data private and fully usable offline.
  • Optimized concurrency: a thread pool (ThreadPoolExecutor) handles image preprocessing (Base64 encoding and thumbnail generation), while a single-consumer queue serializes GPU inference, preventing out-of-memory (OOM) errors from concurrent model calls.
  • Multilingual support: integrates the Helsinki-NLP/opus-mt translation model (via the transformers library), falling back to an Ollama text model (gemma:2b) for English-Chinese translation when it is unavailable.
  • Persistent storage: analysis results (original path, English/Chinese keywords and descriptions) are stored in a SQLite database.
  • State management: analysis progress can be saved to and loaded from analysis_state.json, making it easy to interrupt and resume a run.
  • GUI: built on PyQt6, with three pages: Analysis, Search, and Stats.
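The producer/consumer split above can be reproduced with the standard library alone. A minimal sketch, assuming nothing from the script itself (`preprocess`, `worker`, and the arithmetic stand-ins are illustrative, not the script's actual API):

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

task_queue = queue.Queue()
results = []

def preprocess(item):
    # CPU-bound work (e.g., Base64 encoding) runs in parallel across threads.
    return item * 2

def worker():
    # Single consumer: "inference" requests are handled strictly one at a time.
    while True:
        task = task_queue.get()
        if task is None:          # sentinel to shut the worker down
            task_queue.task_done()
            break
        results.append(task + 1)  # stand-in for the expensive model call
        task_queue.task_done()

t = threading.Thread(target=worker, daemon=True)
t.start()

# Preprocess in parallel, but feed the single consumer serially.
with ThreadPoolExecutor(max_workers=6) as ex:
    for prepared in ex.map(preprocess, range(5)):
        task_queue.put(prepared)

task_queue.put(None)   # signal shutdown
task_queue.join()      # wait until every task has been consumed
```

Because only one thread ever touches the "model", at most one inference is in flight regardless of how many preprocessing threads run.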

⚙️ Technical Architecture and Key Components

The script is cleanly structured into five main parts: configuration, helper functions, database layer, worker threads, and the GUI.

1. Configuration & Imports

  • Model and API:
    • OLLAMA_API_URL: http://localhost:11434/api/generate, the local Ollama API endpoint.
    • OLLAMA_MODEL: llava:13b, the main model used for image analysis.
    • OLLAMA_TIMEOUT/OLLAMA_RETRIES: a 120-second request timeout and 2 retries make the HTTP communication more robust.
  • Translation model:
    • HF_TRANSLATE_MODEL: Helsinki-NLP/opus-mt-en-zh, the preferred local translation model.
  • Files and concurrency:
    • DB_FILE/STATE_FILE: paths for the SQLite database file and the session-state file.
    • MAX_PREPROCESS_THREADS: 6 threads for concurrent image preprocessing, balancing CPU load.
  • Library dependencies: requests, PIL (Pillow), sqlite3, threading, PyQt6, matplotlib, and optionally transformers.

2. Image & Text Helpers

  • clean_ollama_text(text): a key heuristic cleanup function. Because Ollama output may contain unwanted JSON fragments or metadata, it looks for a { character and truncates the text there, keeping only the natural-language description.
  • make_thumbnail(path): uses PIL to generate a PNG thumbnail and converts it to a QPixmap for display in the PyQt6 UI.
  • image_to_b64(path, max_dim=1024): loads an image, caps its longest side at 1024 pixels, and encodes it as a Base64 string, the input format the LLaVA model API expects.

3. Database Layer

  • init_db(path): initializes the SQLite database and creates the images table. The table uses the older, multilingual-friendly schema:
    • rel_path (UNIQUE): relative path, used for indexing and conflict resolution.
    • key/key_zh: English/Chinese keywords.
    • descript/descript_zh: English/Chinese descriptions.
  • upsert_record(...): uses ON CONFLICT(rel_path) DO UPDATE SET... to implement an insert-or-update (upsert), so each file always has exactly one, most recent record.
  • query_records_by_term(term): runs a fuzzy LIKE %term% query across all four text fields (key, key_zh, descript, descript_zh) to power the search feature.
  • get_keyword_rank(limit=50): reads every stored keyword string, splits on commas, counts occurrences, and returns a keyword ranking for the stats chart.
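The upsert pattern is easy to try in isolation with an in-memory database. A minimal sketch of the same ON CONFLICT clause, with the table trimmed down to two columns for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE images (rel_path TEXT UNIQUE, key TEXT)")

def upsert(rel_path, key):
    # A second insert with the same rel_path updates the row instead of failing.
    conn.execute(
        "INSERT INTO images(rel_path, key) VALUES (?, ?) "
        "ON CONFLICT(rel_path) DO UPDATE SET key = excluded.key",
        (rel_path, key),
    )
    conn.commit()

upsert("cats/a.jpg", "cat,sofa")
upsert("cats/a.jpg", "cat,sofa,sunlight")  # overwrites the first record

rows = conn.execute("SELECT rel_path, key FROM images").fetchall()
```

`excluded` refers to the row that would have been inserted, which is what makes the clause a true upsert (SQLite 3.24+).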

4. The Translator Class

  • The Translator class encapsulates the translation logic.
  • Preferred path: load the Helsinki-NLP/opus-mt-en-zh model via the transformers library and translate locally.
  • Fallback path: if the HF model fails to load, send a plain-text request to Ollama and let a general-purpose text model (e.g. gemma:2b) perform the translation.

5. Ollama Communication and Worker Threads

  • call_ollama_model(...): handles HTTP communication with the Ollama API. It wraps the Base64 image data into the payload, retries on failure, and parses the model's response defensively, trying the common keys response, output, and text to extract the generated text.

  • OllamaWorker (single consumer): the heart of the script, subclassing threading.Thread.

    • It pulls tasks (rel, b64, prompt) off the task queue (self.q).
    • It performs the expensive call_ollama_model call.
    • Result parsing (the most critical logic): inside run, it first cleans the raw output with clean_ollama_text, then uses a case-insensitive rfind('Keywords:') to split the description (descript_en) from the keyword string (keywords_en_str), capping the keyword count at 8.
    • It calls self.translator.translate_en_to_zh to produce the Chinese translations.
    • It hands the result back to the main GUI thread via result_callback.
  • AnalyzerThread (main analysis flow): subclasses QThread and handles task scheduling and progress tracking.

    • File walk: recursively walks the root directory and checks the database to skip files that are already done (enabling resume).
    • Concurrent preprocessing: converts all pending images to Base64 in parallel with a ThreadPoolExecutor.
    • Task enqueueing: builds a detailed prompt for each successfully preprocessed image and pushes it onto the OllamaWorker queue.
    • Communicates with the main thread via the sig_progress and sig_update signals to update the progress bar and list statuses.
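The description/keyword split described above is plain string work and easy to verify on its own. A minimal sketch of that parsing step (the function name and sample text are made up for illustration):

```python
def split_description_keywords(text, max_keywords=8):
    """Split model output into (description, 'kw1,kw2,...') at the last 'Keywords:' label."""
    delimiter = "keywords:"
    # Case-insensitive search; rfind picks the LAST occurrence in case the
    # model mistakenly emitted the label more than once.
    index = text.lower().rfind(delimiter)
    if index == -1:
        return text.strip(), ""  # no label: treat everything as description
    description = text[:index].strip()
    raw = text[index + len(delimiter):]
    # Keep only plausible keywords (2-30 chars), capped at max_keywords.
    kws = [t.strip() for t in raw.split(",") if 2 <= len(t.strip()) <= 30]
    return description, ",".join(kws[:max_keywords])

desc, kws = split_description_keywords(
    "A young woman smiles indoors. KEYWORDS: woman, smile, indoor, portrait"
)
```

Note how the lowercase search still slices the original string, so the description keeps its original casing.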

🖥️ GUI Walkthrough (MainWindow)

The MainWindow class is the entry point of the application.

1. Analysis Tab

  • Directory selection and state management: lets the user pick a root directory and automatically offers to load analysis_state.json to restore the previous session.
  • List view (QListWidget): shows every supported image under the root directory along with its live status (Pending, Queued, Done, Failed).
  • Control buttons: Start Analysis, Stop, Save Progress, and Clear DB (deletes all records).
  • Progress bar (QProgressBar): shows the analysis progress as a percentage.
  • on_worker_result: on receiving a result from OllamaWorker, performs the database upsert and posts a status update back to the main thread.

2. Search Tab

  • Search box (QLineEdit): accepts a keyword in English or Chinese.
  • Result table (QTableWidget): shows matches under clear column headers: Thumbnail, Filename, Keywords (EN/ZH), Description (EN), Description (ZH).
  • Interaction niceties:
    • Thumbnails: the first column shows an image thumbnail for quick visual scanning.
    • Column sizing: the thumbnail, filename, and keyword columns use fixed or content-sized modes, while both description columns use Stretch mode to make the most of the window width.
    • Double-click to open: via open_image_from_row, double-clicking a table row opens the image with the system's default application, a big usability win.

3. Stats Tab

  • Controls: a QSpinBox to set the Top N keyword count, plus Refresh Stats and Export CSV buttons.
  • Visualization: Matplotlib renders a horizontal bar chart of keyword frequencies; the horizontal orientation avoids label overlap when many keywords are shown.
  • get_keyword_rank ensures the chart is built from every processed keyword in the database.
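The ranking itself is a simple Counter over comma-separated keyword strings. A minimal sketch of that counting step (the function name and sample rows are illustrative):

```python
from collections import Counter

def keyword_rank(rows, limit=50):
    """rows: iterable of comma-separated keyword strings, as stored in the DB."""
    counter = Counter()
    for k in rows:
        if not k:
            continue  # skip records with no keywords
        for part in k.split(","):
            w = part.strip().lower()  # normalize case and whitespace
            if w:
                counter[w] += 1
    return counter.most_common(limit)

rank = keyword_rank(["cat,sofa", "Cat,garden", "", "sofa,cat"])
```

Lowercasing before counting is what merges "Cat" and "cat" into a single bar on the chart.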

⚠️ Potential Issues and Caveats

  1. Ollama dependency: the script depends entirely on a local Ollama service with the configured model (llava:13b). If Ollama isn't running, the model hasn't been pulled, or the API address is wrong, nothing works.
  2. Translation model size: with transformers-based local translation enabled, the first run downloads Helsinki-NLP/opus-mt-en-zh, which takes noticeable time and disk space.
  3. Parsing robustness: despite clean_ollama_text and the careful keyword split, the parser can still fail if the LLaVA output format changes significantly (e.g. no "Keywords:" label at all), leaving the keywords empty.
  4. Cross-platform file opening: _open_file_in_system dispatches per platform (os.startfile on Windows, open on macOS, xdg-open on Linux), but on some less common Linux distributions xdg-open may need to be installed first.
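The per-platform dispatch boils down to choosing a launcher command. A minimal, testable sketch of just that decision (the helper name `opener_command` is illustrative; the script itself calls os.startfile directly on Windows):

```python
import platform

def opener_command(path, system=None):
    """Return the argv list that opens `path` with the system default app."""
    system = system or platform.system()
    if system == "Windows":
        # The script uses os.startfile(path) here; modeled as a 'start'
        # command purely for illustration.
        return ["cmd", "/c", "start", "", path]
    if system == "Darwin":           # macOS
        return ["open", path]
    return ["xdg-open", path]        # Linux/other POSIX; requires xdg-utils

cmd = opener_command("/tmp/photo.jpg", system="Linux")
```

Factoring the decision out of the subprocess call like this makes the platform logic unit-testable without actually launching anything.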
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
image_annotator_ollama_fixed_v2.py
GUI (PyQt6) tool to analyze images using local Ollama llava:13b model.
Features:
- True local (offline) usage via localhost:11434 (Ollama) — no external network calls.
- Preprocessing (thread pool) for base64/thumb generation.
- Single-consumer queue for GPU inference (to avoid concurrent model OOM).
- Robust parsing and improved prompt for rich, detailed descriptions.
- Save/Load analysis progress (analysis_state.json in script dir).
- SQLite DB to store results (Original Schema: rel_path, key, key_zh, descript, descript_zh, extra_json, updated_at).
- Search and Stats tabs (with matplotlib bar chart for keyword ranking).
- Stop/Resume, Save progress button.
- Integrated local translation (Helsinki opus-mt or Ollama fallback).
"""

import os
import subprocess
import sys
import io
import json
import time
import base64
import sqlite3
import threading
import traceback
import platform
from queue import Queue, Empty
from concurrent.futures import ThreadPoolExecutor, as_completed
from datetime import datetime
from collections import Counter

import requests
from PIL import Image, ImageFilter, ImageStat

# --- Optional: local translation (via transformers) ---
try:
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    HF_TRANSLATOR_AVAILABLE = True
except Exception:
    HF_TRANSLATOR_AVAILABLE = False

# PyQt6
from PyQt6.QtWidgets import (
    QApplication, QMainWindow, QWidget, QVBoxLayout, QHBoxLayout, QLabel,
    QPushButton, QFileDialog, QListWidget, QListWidgetItem, QProgressBar,
    QTabWidget, QLineEdit, QTableWidget, QTableWidgetItem, QHeaderView,
    QMessageBox, QSpinBox, QSizePolicy
)
from PyQt6.QtCore import Qt, QSize, QThread, pyqtSignal, QTimer
from PyQt6.QtGui import QPixmap

# Matplotlib for stats
from matplotlib.figure import Figure
import numpy as np

# NOTE: matplotlib >= 3.5 ships a Qt-agnostic Agg backend (backend_qtagg) that
# supports PyQt6; older releases only provide backend_qt5agg. Try the modern
# backend first and fall back for older matplotlib installs.
try:
    from matplotlib.backends.backend_qtagg import FigureCanvasQTAgg as FigureCanvas
except ImportError:
    from matplotlib.backends.backend_qt5agg import FigureCanvasQTAgg as FigureCanvas

# ---------------- Configuration ----------------
OLLAMA_API_URL = "http://localhost:11434/api/generate"
OLLAMA_MODEL = "llava:13b" # your model
OLLAMA_TIMEOUT = 120 # seconds per request
OLLAMA_RETRIES = 2
HF_TRANSLATE_MODEL = "Helsinki-NLP/opus-mt-en-zh"

DB_FILE = os.path.join(os.path.dirname(__file__), "image_annotations.db")
STATE_FILE = os.path.join(os.path.dirname(__file__), "analysis_state.json")
THUMB_SIZE = (160, 90)
SUPPORTED_EXT = {'.jpg', '.jpeg', '.png', '.bmp', '.gif', '.webp', '.tiff'}
MAX_PREPROCESS_THREADS = 6 # adjust for your CPU


# ------------------------------------------------

# ---------- Text Cleaning Helper ----------
def clean_ollama_text(text):
    """
    Attempts to clean up raw Ollama text output that might contain JSON or
    unwanted boilerplate/metadata, aiming to return only the natural language part.
    """
    if not isinstance(text, str):
        return ""

    # 1. Remove common Ollama metadata/JSON snippets that might not be stripped
    #    by call_ollama_model. This targets unwanted JSON polluting the output:
    #    look for the start of a dictionary and drop everything from it onwards.
    #    Note: this is a heuristic and might need adjustment if models change
    #    their output format.
    cleaned_text = text

    # Find the first '{' that is not at the very start (which might be part of
    # the description itself); assume metadata/JSON starts later in the string.
    first_brace = cleaned_text.find('{', 10)
    # Heuristic: if the JSON starts too early, it might be part of the description.
    if -1 < first_brace < len(cleaned_text) / 2:
        cleaned_text = cleaned_text[:first_brace].strip()

    # 2. Remove common trailing model info if the above fails to catch it.
    if 'Keywords:' in cleaned_text:
        # If keywords are present, clean up the description part up to the keywords.
        parts = cleaned_text.split("Keywords:", 1)
        description_part = parts[0].strip()
        # Look for JSON at the end of the description part.
        first_brace_desc = description_part.find('{', len(description_part) // 2)
        if first_brace_desc > -1:
            description_part = description_part[:first_brace_desc].strip()

        cleaned_text = description_part + (f" Keywords:{parts[1]}" if len(parts) > 1 else "")

    # Simple final cleanup (e.g., trailing garbage).
    cleaned_text = cleaned_text.strip()

    # Remove leading/trailing quotes if they look like simple string delimiters.
    if cleaned_text.startswith('"') and cleaned_text.endswith('"'):
        cleaned_text = cleaned_text.strip('"')

    return cleaned_text


# ------------------------------------------------
# ---------- Database (FIXED to old schema) ----------
def init_db(path=DB_FILE):
    conn = sqlite3.connect(path, check_same_thread=False)
    cur = conn.cursor()
    # Reverting to the old schema for compatibility and multilingual support
    cur.execute('''
        CREATE TABLE IF NOT EXISTS images
        (
            id          INTEGER PRIMARY KEY AUTOINCREMENT,
            rel_path    TEXT UNIQUE,
            key         TEXT,
            key_zh      TEXT,
            descript    TEXT,
            descript_zh TEXT,
            extra_json  TEXT,
            updated_at  TEXT
        )
    ''')
    conn.commit()
    return conn


DB_CONN = init_db()


# NOTE: Arguments adjusted to match the new schema structure for 'new' data, but inserts into the 'old' schema.
# rel_path:        relative path
# descript_en:     the rich English description (was 'result_text' in new) -> maps to 'descript'
# keywords_en_str: the comma-separated English keywords (was 'keywords' in new) -> maps to 'key'
def upsert_record(rel_path, descript_en, keywords_en_str, key_zh, descript_zh, extra_json_dict=None):
    now = datetime.utcnow().isoformat()
    cur = DB_CONN.cursor()
    cur.execute('''
        INSERT INTO images(rel_path, key, key_zh, descript, descript_zh, extra_json, updated_at)
        VALUES (?, ?, ?, ?, ?, ?, ?)
        ON CONFLICT(rel_path) DO UPDATE SET
            key         = excluded.key,
            key_zh      = excluded.key_zh,
            descript    = excluded.descript,
            descript_zh = excluded.descript_zh,
            extra_json  = excluded.extra_json,
            updated_at  = excluded.updated_at
    ''', (rel_path, keywords_en_str, key_zh, descript_en, descript_zh, json.dumps(extra_json_dict or {}), now))
    DB_CONN.commit()


# Query adjusted to search old-schema fields (key/key_zh/descript/descript_zh)
def query_records_by_term(term):
    cur = DB_CONN.cursor()
    like = f"%{term}%"
    # Select: rel_path, key, key_zh, descript, descript_zh
    cur.execute(
        'SELECT rel_path, key, key_zh, descript, descript_zh FROM images '
        'WHERE key LIKE ? OR key_zh LIKE ? OR descript LIKE ? OR descript_zh LIKE ?',
        (like, like, like, like))
    return cur.fetchall()


# Keyword ranking adjusted to read from the old schema (key)
def get_keyword_rank(limit=50):
    cur = DB_CONN.cursor()
    cur.execute('SELECT key FROM images')
    rows = cur.fetchall()
    counter = Counter()
    total = 0
    for (k,) in rows:
        if not k:
            continue
        total += 1
        # k is a comma-separated string of keywords
        for part in k.split(','):
            w = part.strip().lower()
            if w:
                counter[w] += 1
    items = counter.most_common(limit)
    return items, total


# ---------- Image helpers (kept as-is from the new code) ----------
def make_thumbnail(path, size=THUMB_SIZE):
    try:
        img = Image.open(path).convert("RGBA")
        # Use LANCZOS for quality resizing
        img.thumbnail(size, Image.Resampling.LANCZOS)
        buf = io.BytesIO()
        img.save(buf, format='PNG')
        buf.seek(0)
        pix = QPixmap()
        pix.loadFromData(buf.read(), "PNG")
        return pix
    except Exception:
        # traceback.print_exc()
        return QPixmap()


def image_to_b64(path, max_dim=1024):
    img = Image.open(path).convert("RGB")
    w, h = img.size
    max_side = max(w, h)
    if max_side > max_dim:
        scale = max_dim / max_side
        img = img.resize((int(w * scale), int(h * scale)), Image.Resampling.LANCZOS)
    buf = io.BytesIO()
    img.save(buf, format='JPEG', quality=85)
    return base64.b64encode(buf.getvalue()).decode('utf-8')


# Removed: get_basic_meta, as it was unused in the core loop

# ---------- Translator (from old code) ----------
class Translator:
    def __init__(self):
        self.hf_available = HF_TRANSLATOR_AVAILABLE
        self.model = None
        self.tokenizer = None
        if self.hf_available:
            try:
                print("Loading HF translator model (this may take time)...")
                self.tokenizer = AutoTokenizer.from_pretrained(HF_TRANSLATE_MODEL)
                self.model = AutoModelForSeq2SeqLM.from_pretrained(HF_TRANSLATE_MODEL)
                # Add model.to("cuda") here if you have GPU acceleration configured
            except Exception as e:
                print("HF translation model load failed, will fall back to Ollama. Error:", e)
                self.hf_available = False

    def translate_en_to_zh(self, text):
        if not text:
            return ""
        if self.hf_available and self.model:
            try:
                inputs = self.tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
                out = self.model.generate(**inputs, max_new_tokens=256)
                zh = self.tokenizer.decode(out[0], skip_special_tokens=True)
                return zh
            except Exception as e:
                print("HF translate failed, falling back to Ollama:", e)
                # fall through to Ollama

        # Fallback: ask Ollama to translate (note: requires a non-llava model
        # such as gemma:2b, since llava is weaker at pure text tasks)
        prompt = (f"Translate the following English text to Chinese (Simplified) preserving meaning "
                  f"and punctuation. Return only the translation.\n\nText:\n{text}\n\nTranslation:")
        # Use a simplified call structure for pure text generation
        payload = {
            "model": "gemma:2b",  # a common non-vision model for better text quality
            "prompt": prompt,
            "stream": False,
        }
        try:
            resp = requests.post(OLLAMA_API_URL, json=payload, timeout=20)  # shorter timeout for text
            resp.raise_for_status()
            j = resp.json()
            # Try common keys for text output
            for k in ('output', 'text', 'result', 'response'):
                if k in j and isinstance(j[k], str):
                    return j[k].strip().splitlines()[0]  # take the first line
            return str(j).strip()
        except Exception as e:
            print("Ollama fallback translate failed:", e)
            return ""


# ---------- Ollama calling (single consumer) ----------
def call_ollama_model(base64_image, prompt_text, model=OLLAMA_MODEL, timeout=OLLAMA_TIMEOUT, retries=OLLAMA_RETRIES):
    """
    Call the local Ollama server with an image (base64) and a prompt. Retries internally.
    Returns (success: bool, text: str)
    """
    # Removed the initial environment-variable check for full offline usage
    payload = {
        "model": model,
        "prompt": prompt_text,
        "stream": False,
        "images": [base64_image]
    }
    last_err = "unknown error"
    for attempt in range(retries + 1):
        try:
            resp = requests.post(OLLAMA_API_URL, json=payload, timeout=timeout)
            resp.raise_for_status()

            # --- Robust parsing logic ---
            try:
                j = resp.json()
            except Exception:
                # sometimes Ollama returns raw text
                return True, resp.text

            # The response structure may vary; attempt common keys.
            if isinstance(j, dict):
                # The Ollama 'generate' API typically uses 'response' for
                # non-streaming output; try other common keys as well.
                for k in ('response', 'output', 'text', 'result'):
                    if k in j and isinstance(j[k], str):
                        return True, j[k]
                if 'choices' in j and isinstance(j['choices'], list) and j['choices']:
                    ch = j['choices'][0]
                    if isinstance(ch, dict) and 'message' in ch:
                        return True, ch['message'].get('content', str(ch))
                    return True, str(ch)
                # fallback to the full JSON string
                return True, json.dumps(j, ensure_ascii=False)
            return True, str(j)

        except Exception as e:
            last_err = str(e)
            time.sleep(1 + attempt * 2)
    return False, f"Ollama request failed after {retries + 1} attempts: {last_err}"


# ---------- Worker thread and queue ----------
class OllamaWorker(threading.Thread):
    def __init__(self, task_queue, result_callback, translator: Translator):
        super().__init__(daemon=True)
        self.q = task_queue
        self._stop = threading.Event()
        self.result_callback = result_callback  # fn(rel, success, text, keywords_str, key_zh, descript_zh)
        self.translator = translator

    def stop(self):
        self._stop.set()

    def run(self):
        while not self._stop.is_set():
            try:
                task = self.q.get(timeout=0.5)
            except Empty:
                continue

            rel, b64, prompt = task
            success, text = call_ollama_model(b64, prompt)

            # --- Extract Keywords (keywords_en_str) and Description (descript_en) ---
            keywords_en_str = ""
            descript_en = ""
            key_zh = ""
            descript_zh = ""

            if success:
                # --- Clean the raw text before parsing ---
                cleaned_text = clean_ollama_text(text)
                # Print the first 200 chars for debugging, to confirm the model
                # output contains the keyword label.
                print(f"1:cleaned text (before split): {cleaned_text[:200]}...")

                # We need a robust, case-insensitive split on the "Keywords:" label.
                keywords_delimiter = "Keywords:"

                # Use rfind so we match the LAST "Keywords:" in case the model
                # mistakenly emitted several.
                lower_cleaned_text = cleaned_text.lower()
                index = lower_cleaned_text.rfind(keywords_delimiter.lower())

                if index != -1:
                    # Case 1: delimiter found -- extract description and keywords normally.

                    # The description is everything before the delimiter.
                    descript_en = cleaned_text[:index].strip()
                    # The keywords are everything after it.
                    keywords_raw = cleaned_text[index + len(keywords_delimiter):].strip()

                    # Guard against an empty description (model emitted nothing
                    # before the delimiter).
                    if not descript_en and keywords_raw:
                        descript_en = "N/A: Model failed to provide rich description."

                    # Filter and clean keywords (keep at most 8).
                    kws = [t.strip() for t in keywords_raw.split(',') if 2 <= len(t.strip()) <= 30]
                    keywords_en_str = ",".join(kws[:8])

                    print(f"2:descript (after split): {descript_en[:50]}...")
                    print(f"3:keywords (after split): {keywords_en_str[:50]}...")
                else:
                    # Case 2: delimiter not found -- treat the whole output as the description.
                    descript_en = cleaned_text.strip()
                    keywords_en_str = ""
                    print(
                        f"Warning: Keywords delimiter '{keywords_delimiter}' not found in model output for {rel}. "
                        f"Keywords will be empty. (Model failed to follow the prompt)")

                # --- Translation (integrated from the old code) ---
                # Run translation regardless; if keywords_en_str is empty, key_zh stays empty.
                if keywords_en_str:
                    key_zh = self.translator.translate_en_to_zh(keywords_en_str)
                if descript_en:
                    # Note: translating the rich, multi-sentence description may take longer
                    descript_zh = self.translator.translate_en_to_zh(descript_en)

            # callback (rel, success, descript_en, keywords_en_str, key_zh, descript_zh)
            try:
                self.result_callback(rel, success, descript_en, keywords_en_str, key_zh, descript_zh)
            except Exception:
                traceback.print_exc()
            self.q.task_done()


# ---------- GUI thread for analysis progression (AnalyzerThread) ----------
class AnalyzerThread(QThread):
    sig_update = pyqtSignal(str, str)    # rel, status
    sig_progress = pyqtSignal(int, int)  # processed, total
    sig_done = pyqtSignal()

    def __init__(self, root_dir, task_queue, main_window_ref):
        super().__init__()
        self.root_dir = root_dir
        self._stop = False
        self.task_queue = task_queue
        self.main_window_ref = main_window_ref  # to update the UI from worker results

    def request_stop(self):
        self._stop = True

    def run(self):
        # Build the file list, preserving order
        all_files = []
        for dp, dns, fns in os.walk(self.root_dir):
            for fn in fns:
                if os.path.splitext(fn)[1].lower() in SUPPORTED_EXT:
                    abs_path = os.path.join(dp, fn)
                    rel = os.path.relpath(abs_path, self.root_dir)
                    # Skip files that are already in the DB (resume logic)
                    cur = DB_CONN.cursor()
                    cur.execute('SELECT 1 FROM images WHERE rel_path=?', (rel,))
                    if cur.fetchone() is None:
                        all_files.append((abs_path, rel))

        total_remaining = len(all_files)
        processed_count = 0

        # Preprocess in a thread pool: prepare base64 in parallel
        preprocess_results = {}
        with ThreadPoolExecutor(max_workers=MAX_PREPROCESS_THREADS) as ex:
            future_to_rel = {ex.submit(image_to_b64, abs_path): rel for abs_path, rel in all_files}
            for fut in as_completed(future_to_rel):
                if self._stop:
                    break
                rel = future_to_rel[fut]
                try:
                    preprocess_results[rel] = fut.result()
                except Exception as e:
                    print(f"Preprocess failed for {rel}: {e}")
                    preprocess_results[rel] = None

        # If stopped during preprocessing
        if self._stop:
            self.sig_done.emit()
            return

        # Push tasks onto the queue (the worker consumes them serially)
        for abs_path, rel in all_files:
            if self._stop:
                self.sig_update.emit(rel, "Stopped")
                break

            # Skip if preprocessing failed
            b64 = preprocess_results.get(rel)
            if not b64:
                self.sig_update.emit(rel, "Preprocess failed")
                processed_count += 1
                self.sig_progress.emit(processed_count, total_remaining)
                continue

            self.sig_update.emit(rel, "Queued")

            # --- Improved prompt for rich descriptions ---
            prompt = (
                "You are a detailed image analyst. For the supplied image, produce a multi-sentence, "
                "rich natural-language description that includes: people (approx. number, age group, clothing, pose, expression), "
                "objects, background/setting, lighting, colors, mood, and any notable details. "
                "Then provide a short comma-separated list of 5-10 keywords labeled 'Keywords:'. "
                "Return plain text only.\n\nExample:\nA young woman... \nKeywords: woman,smile,indoor,white-sweater,portrait"
            )

            # enqueue
            self.task_queue.put((rel, b64, prompt))

            processed_count += 1
            self.sig_progress.emit(processed_count, total_remaining)

        # Wait for the queue to drain (or stop)
        self.task_queue.join()

        # Final progress update if the queue finished
        if not self._stop:
            self.sig_progress.emit(total_remaining, total_remaining)

        self.sig_done.emit()


# ---------- Main Window ----------
class MainWindow(QMainWindow):
    def __init__(self):
        super().__init__()
        self.setWindowTitle("Image Annotator - llava:13b (Local Ollama)")
        self.setMinimumSize(1100, 700)
        self.root_dir = None
        self.translator = Translator()  # initialize translator

        # queue & worker
        self.task_queue = Queue()
        self.ollama_worker = OllamaWorker(self.task_queue, self.on_worker_result, self.translator)
        self.ollama_worker.start()

        # analyzer thread
        self.analyzer = None

        # UI
        tabs = QTabWidget()
        tabs.addTab(self.build_analysis_tab(), "Analysis")
        tabs.addTab(self.build_search_tab(), "Search")
        tabs.addTab(self.build_stats_tab(), "Stats")
        self.setCentralWidget(tabs)

        # attempt to auto-load the previous state
        self._check_load_state()

    # --- State management ---
    def _check_load_state(self):
        if os.path.exists(STATE_FILE):
            try:
                with open(STATE_FILE, 'r', encoding='utf-8') as f:
                    st = json.load(f)
                if st.get('root_dir'):
                    resp = QMessageBox.question(
                        self, "Resume previous session",
                        f"Found previous analysis state for directory:\n{st.get('root_dir')}\nLoad it?",
                        QMessageBox.StandardButton.Yes | QMessageBox.StandardButton.No)
                    if resp == QMessageBox.StandardButton.Yes:
                        self.root_dir = st.get('root_dir')
                        self.dir_label.setText(self.root_dir)
                        self.populate_list_from_dir()
            except Exception:
                pass

    def _save_basic_state(self, running=False):
        st = {"root_dir": self.root_dir, "running": running, "saved_at": datetime.utcnow().isoformat()}
        try:
            with open(STATE_FILE, 'w', encoding='utf-8') as f:
                json.dump(st, f, ensure_ascii=False, indent=2)
        except Exception:
            pass

    def save_state(self):
        # NOTE: simplified save logic -- only the root_dir is persisted,
        # since list-item status is now derived from the DB/worker.
        if not self.root_dir:
            QMessageBox.warning(self, "No dir", "Select directory first")
            return
        self._save_basic_state(running=False)
        QMessageBox.information(self, "Saved", f"Basic state saved to {STATE_FILE}")

    def clear_db_prompt(self):
        ok = QMessageBox.question(self, "Clear DB", "This will delete all saved image analysis records. Continue?",
                                  QMessageBox.StandardButton.Yes | QMessageBox.StandardButton.No)
        if ok != QMessageBox.StandardButton.Yes:
            return
        cur = DB_CONN.cursor()
        cur.execute('DELETE FROM images')
        DB_CONN.commit()
        QMessageBox.information(self, "Cleared", "Database cleared.")
        self.populate_list_from_dir()

    # ------ Analysis tab ------
    def build_analysis_tab(self):
        w = QWidget()
        v = QVBoxLayout()
        h = QHBoxLayout()
        self.dir_label = QLabel("No directory selected")
        btn_select = QPushButton("Select Directory")
        btn_select.clicked.connect(self.select_dir)
        self.btn_start = QPushButton("Start Analysis")
        self.btn_start.clicked.connect(self.start_analysis)
        self.btn_stop = QPushButton("Stop")
        self.btn_stop.clicked.connect(self.stop_analysis)
        self.btn_stop.setEnabled(False)
        btn_save = QPushButton("Save Progress")
        btn_save.clicked.connect(self.save_state)
        btn_clear = QPushButton("Clear DB")
        btn_clear.clicked.connect(self.clear_db_prompt)

        h.addWidget(self.dir_label)
        h.addWidget(btn_select)
        h.addWidget(self.btn_start)
        h.addWidget(self.btn_stop)
        h.addWidget(btn_save)
        h.addWidget(btn_clear)
        v.addLayout(h)

        self.progress = QProgressBar()
        v.addWidget(self.progress)

        self.list_widget = QListWidget()
        v.addWidget(self.list_widget, 1)

        w.setLayout(v)
        return w

    def select_dir(self):
        d = QFileDialog.getExistingDirectory(self, "Select root image directory", os.path.expanduser("~"))
        if not d:
            return
        self.root_dir = d
        self.dir_label.setText(d)
        self.populate_list_from_dir()
        self._save_basic_state()

    def populate_list_from_dir(self):
        self.list_widget.clear()
        if not self.root_dir:
            return
        # Find all files and check the DB for their status
        for dp, dns, fns in os.walk(self.root_dir):
            for fn in fns:
                if os.path.splitext(fn)[1].lower() in SUPPORTED_EXT:
                    rel = os.path.relpath(os.path.join(dp, fn), self.root_dir)
                    cur = DB_CONN.cursor()
                    cur.execute('SELECT 1 FROM images WHERE rel_path=?', (rel,))
                    has = cur.fetchone() is not None
                    status = "Done" if has else "Pending"
                    self.list_widget.addItem(f"{rel} - {status}")

    def start_analysis(self):
        if not self.root_dir:
            QMessageBox.warning(self, "No directory", "Please select an image directory first.")
            return
        # Quick Ollama connectivity test
        try:
            r = requests.get(OLLAMA_API_URL.replace('/api/generate', '/api/tags'), timeout=3)
            r.raise_for_status()
        except Exception as e:
            QMessageBox.critical(
                self, "Ollama unreachable",
                f"Cannot reach Ollama at {OLLAMA_API_URL.replace('/api/generate', '')}.\n"
                f"Start the Ollama server and ensure model {OLLAMA_MODEL} is loaded.\nError: {e}")
            return

        self.btn_start.setEnabled(False)
        self.btn_stop.setEnabled(True)
        # Re-populate the list to find all 'Pending' items before starting
        self.populate_list_from_dir()

        # create the analyzer thread
        self.analyzer = AnalyzerThread(self.root_dir, self.task_queue, self)
        self.analyzer.sig_update.connect(self.on_analyzer_update)
        self.analyzer.sig_progress.connect(self.on_analyzer_progress)
        self.analyzer.sig_done.connect(self.on_analyzer_done)
        self.analyzer.start()
        self._save_basic_state(running=True)

    def stop_analysis(self):
        if self.analyzer:
            self.analyzer.request_stop()
        # Short delay so the thread can stop gracefully before re-enabling buttons
        QTimer.singleShot(500, self._finalize_stop)

    def _finalize_stop(self):
        self.btn_stop.setEnabled(False)
        self.btn_start.setEnabled(True)
        self._save_basic_state(running=False)

    def on_analyzer_update(self, rel, status):
        # update the matching list item
        for i in range(self.list_widget.count()):
            it = self.list_widget.item(i)
            # Match only by relative-path prefix ("rel_path - status")
            if it.text().startswith(rel + " -"):
                it.setText(f"{rel} - {status}")
                return
        # Not found (e.g., the list wasn't populated first, although it should be)
        self.list_widget.addItem(f"{rel} - {status}")

    def on_analyzer_progress(self, processed, total):
        if total:
            self.progress.setValue(int(processed * 100 / total))
        else:
            self.progress.setValue(0)

    def _open_file_in_system(self, file_path):
        """Open a file with the system default application (cross-platform)."""
        if not os.path.exists(file_path):
            QMessageBox.warning(self, "File Not Found", f"Image file not found at: {file_path}")
            return

        system = platform.system()
        try:
            if system == "Windows":
                os.startfile(file_path)
            elif system == "Darwin":  # macOS
                subprocess.run(["open", file_path], check=True)
            else:  # Linux (use xdg-open)
                subprocess.run(["xdg-open", file_path], check=True)
        except Exception as e:
            QMessageBox.critical(self, "Open Failed", f"Could not open file {file_path}. Error: {e}")

    def open_image_from_row(self, row, column):
        """Handle a table double-click by opening the image file."""
        # Get the relative path from the list stored by perform_search
        if hasattr(self, '_search_results_rel_paths') and row < len(self._search_results_rel_paths):
            rel = self._search_results_rel_paths[row]
            if self.root_dir:
                abs_path = os.path.join(self.root_dir, rel)
                self._open_file_in_system(abs_path)
            else:
                QMessageBox.warning(self, "Error", "Root directory is not set.")
        else:
            QMessageBox.warning(self, "Error", "Could not retrieve the file path for this row.")

    # ...
    def on_analyzer_done(self):
        QMessageBox.information(self, "Analysis", "Analysis finished or stopped.")
        self.btn_start.setEnabled(True)
        self.btn_stop.setEnabled(False)
        self._save_basic_state(running=False)
        self.progress.setValue(100)
        # Refresh the list to reflect the DB
        self.populate_list_from_dir()

    # NOTE: the worker result now includes the translated fields
    def on_worker_result(self, rel, success, descript_en, keywords_en_str, key_zh, descript_zh):
        # When the worker finishes analyzing an image, save to the DB and update the list
        try:
            # Save the English results and the Chinese translations
            upsert_record(rel, descript_en, keywords_en_str, key_zh, descript_zh,
                          {"raw_ollama": descript_en, "keywords": keywords_en_str.split(',')})
        except Exception as e:
            print("DB save error:", e)
            traceback.print_exc()

        # This runs on a worker thread, so the UI update must be posted back to
        # the main thread (a queued Qt signal would be more robust than
        # QTimer.singleShot here)
        status = "Done" if success else "Failed"
        QTimer.singleShot(0, lambda: self.on_analyzer_update(rel, status))

        print(f"[Worker] {rel} -> {'OK' if success else 'ERR'}")
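
# upsert_record() is defined in the script's database layer (covered earlier
# in the article). A minimal sketch consistent with the schema described there
# (rel_path UNIQUE, key/key_zh, descript/descript_zh) can lean on SQLite's
# UPSERT clause; the meta column and the function name are assumptions made
# for illustration only.
import json
import sqlite3

def upsert_record_sketch(conn, rel, descript, key, key_zh, descript_zh, meta):
    """Insert a new analysis record, or update the existing row for rel_path."""
    conn.execute(
        """INSERT INTO images (rel_path, descript, key, key_zh, descript_zh, meta)
           VALUES (?, ?, ?, ?, ?, ?)
           ON CONFLICT(rel_path) DO UPDATE SET
               descript=excluded.descript, key=excluded.key,
               key_zh=excluded.key_zh, descript_zh=excluded.descript_zh,
               meta=excluded.meta""",
        (rel, descript, key, key_zh, descript_zh, json.dumps(meta)))
    conn.commit()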

    # ------ Search tab ------
    # ... (replaces the build_search_tab method in the MainWindow class)
    def build_search_tab(self):
        w = QWidget()
        v = QVBoxLayout()
        h = QHBoxLayout()
        self.search_input = QLineEdit()
        self.search_input.setPlaceholderText("Enter keyword or text (English or Chinese), press Enter to search")
        self.search_input.returnPressed.connect(self.perform_search)  # Trigger search on Enter
        btn = QPushButton("Search")
        btn.clicked.connect(self.perform_search)
        h.addWidget(self.search_input)
        h.addWidget(btn)
        v.addLayout(h)

        # Five columns: Thumbnail, Filename, Keywords (EN/ZH), Description (EN), Description (ZH)
        self.table = QTableWidget(0, 5)
        self.table.setHorizontalHeaderLabels(
            ["Thumbnail", "Filename", "Keywords (EN/ZH)", "Description (EN)", "Description (ZH)"])
        self.table.verticalHeader().setVisible(False)
        self.table.setWordWrap(True)

        # Double-click opens the image in the system viewer
        self.table.cellDoubleClicked.connect(self.open_image_from_row)

        # --- Column-width tuning ---
        header = self.table.horizontalHeader()
        header.setSectionResizeMode(0, QHeaderView.ResizeMode.Fixed)             # Thumbnail (fixed width)
        header.setSectionResizeMode(1, QHeaderView.ResizeMode.ResizeToContents)  # Filename (fits content, stays short)
        header.setSectionResizeMode(2, QHeaderView.ResizeMode.ResizeToContents)  # Keywords (fits content, a bit wider)
        header.setSectionResizeMode(3, QHeaderView.ResizeMode.Stretch)           # Description EN (stretches)
        header.setSectionResizeMode(4, QHeaderView.ResizeMode.Stretch)           # Description ZH (stretches)

        self.table.setColumnWidth(0, 180)   # Fixed thumbnail width
        header.setStretchLastSection(True)  # Make sure the last column can stretch

        v.addWidget(self.table, 1)
        w.setLayout(v)
        return w

    def perform_search(self):
        term = self.search_input.text().strip()
        if not term:
            QMessageBox.information(self, "Empty", "Enter a search term.")
            return

        # Query returns: rel_path, key, key_zh, descript, descript_zh
        rows = query_records_by_term(term)
        self.table.setRowCount(0)

        # Keep the relative paths so the double-click handler can resolve them
        self._search_results_rel_paths = []

        for rel, key_en, key_zh, desc_en, desc_zh in rows:
            self._search_results_rel_paths.append(rel)

            r = self.table.rowCount()
            self.table.insertRow(r)

            # 0. Thumbnail
            abs_path = os.path.join(self.root_dir or os.path.expanduser("~"), rel)
            pix = make_thumbnail(abs_path)
            lbl = QLabel()
            lbl.setPixmap(pix.scaled(QSize(160, 90), Qt.AspectRatioMode.KeepAspectRatio,
                                     Qt.TransformationMode.SmoothTransformation))
            self.table.setCellWidget(r, 0, lbl)
            self.table.setRowHeight(r, 100)  # Make the row tall enough for the thumbnail

            # 1. Filename
            self.table.setItem(r, 1, QTableWidgetItem(os.path.basename(rel)))

            # 2. Keywords (EN/ZH)
            keywords_text = f"EN: {key_en or 'N/A'}\nZH: {key_zh or 'N/A'}"
            item_key = QTableWidgetItem(keywords_text)
            item_key.setTextAlignment(Qt.AlignmentFlag.AlignLeft | Qt.AlignmentFlag.AlignVCenter)
            self.table.setItem(r, 2, item_key)

            # 3. Description (EN), its own column
            item_desc_en = QTableWidgetItem(desc_en or 'N/A')
            item_desc_en.setTextAlignment(Qt.AlignmentFlag.AlignLeft | Qt.AlignmentFlag.AlignTop)
            self.table.setItem(r, 3, item_desc_en)

            # 4. Description (ZH), its own column
            item_desc_zh = QTableWidgetItem(desc_zh or 'N/A')
            item_desc_zh.setTextAlignment(Qt.AlignmentFlag.AlignLeft | Qt.AlignmentFlag.AlignTop)
            self.table.setItem(r, 4, item_desc_zh)

            # Grow the row if the wrapped descriptions need more height
            self.table.resizeRowToContents(r)
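
# query_records_by_term() lives in the script's database layer. A minimal
# sketch matching the row shape consumed above (rel_path, key, key_zh,
# descript, descript_zh) could search all four text columns with LIKE;
# the function name is illustrative only.
import sqlite3

def query_records_by_term_sketch(conn, term):
    """Substring search over English/Chinese keywords and descriptions."""
    like = f"%{term}%"
    return conn.execute(
        """SELECT rel_path, key, key_zh, descript, descript_zh FROM images
           WHERE key LIKE ? OR key_zh LIKE ? OR descript LIKE ? OR descript_zh LIKE ?
           ORDER BY rel_path""",
        (like, like, like, like)).fetchall()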

    # ------ Stats tab (improved with Matplotlib) ------
    def build_stats_tab(self):
        w = QWidget()
        v = QVBoxLayout()
        h = QHBoxLayout()
        h.addWidget(QLabel("Top N"))
        self.spin_top = QSpinBox()
        self.spin_top.setRange(5, 200)
        self.spin_top.setValue(30)
        btn_refresh = QPushButton("Refresh Stats")
        btn_refresh.clicked.connect(self.refresh_stats)
        btn_export = QPushButton("Export CSV")
        btn_export.clicked.connect(self.export_stats)
        h.addWidget(self.spin_top)
        h.addWidget(btn_refresh)
        h.addWidget(btn_export)
        v.addLayout(h)

        # Matplotlib canvas
        self.fig = Figure(figsize=(6, 3))
        self.canvas = FigureCanvas(self.fig)
        v.addWidget(self.canvas, 1)
        w.setLayout(v)
        return w

    def refresh_stats(self):
        topn = self.spin_top.value()
        # items: [(keyword, count), ...]; total: number of images that have keywords
        items, total = get_keyword_rank(limit=topn)

        kws = [k for k, c in items]
        counts = [c for k, c in items]

        self.fig.clear()
        ax = self.fig.add_subplot(111)

        # A horizontal bar chart keeps long keyword labels readable
        y_pos = np.arange(len(kws))
        ax.barh(y_pos, counts)
        ax.set_yticks(y_pos)
        ax.set_yticklabels(kws, fontsize=8)  # Smaller font when there are many labels
        ax.invert_yaxis()
        ax.set_xlabel("Count")
        ax.set_title(f"Top {len(kws)} keywords (total images indexed: {total})")

        self.fig.tight_layout()
        self.canvas.draw()
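
# get_keyword_rank() is another database-layer helper. A minimal sketch that
# matches the (items, total) contract used above, assuming the English 'key'
# column stores comma-separated keywords; the name and details are
# illustrative only.
import sqlite3
from collections import Counter

def get_keyword_rank_sketch(conn, limit=30):
    """Return ([(keyword, count), ...] capped at limit, total images with keywords)."""
    counter = Counter()
    total = 0
    for (key,) in conn.execute("SELECT key FROM images WHERE key IS NOT NULL AND key != ''"):
        total += 1
        for kw in key.split(','):
            kw = kw.strip().lower()
            if kw:
                counter[kw] += 1
    return counter.most_common(limit), total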

    def export_stats(self):
        topn = self.spin_top.value()
        items, total = get_keyword_rank(limit=topn)

        path, _ = QFileDialog.getSaveFileName(self, "Save CSV", os.path.expanduser("~"), "CSV files (*.csv)")
        if not path:
            return

        try:
            with open(path, 'w', encoding='utf-8', newline='') as f:
                f.write("keyword,count\n")
                for k, c in items:
                    safe = k.replace('"', '""')  # escape embedded quotes for CSV
                    f.write(f'"{safe}",{c}\n')
            QMessageBox.information(self, "Saved", f"Saved to {path}")
        except Exception as e:
            QMessageBox.warning(self, "Export failed", f"Failed: {e}")

    def closeEvent(self, event):
        # Stop the worker threads before closing
        if getattr(self, 'ollama_worker', None):
            self.ollama_worker.stop()
            self.ollama_worker.join()
        if self.analyzer and self.analyzer.isRunning():
            self.analyzer.request_stop()
            self.analyzer.wait()
        DB_CONN.close()
        super().closeEvent(event)


# ---------- Main ----------
def main():
# Use os.environ to set the PyQt API version explicitly to avoid mixing, especially with Matplotlib
# os.environ['QT_API'] = 'pyqt6'
app = QApplication(sys.argv)
# Global check for Matplotlib backend (if needed)
try:
if FigureCanvas.__name__ == 'FigureCanvasQTAgg':
print("Using Matplotlib with PyQt6 backend.")
except Exception:
print("Matplotlib backend check failed.")

mw = MainWindow()
mw.show()
sys.exit(app.exec())


if __name__ == "__main__":
# Ensure application is created before any Qt objects requiring it
main()