
# webp
- [x] Similar-image recommendation (migrated)
- [x] Reverse image search (migrated)
- [x] Tag filtering (additional filter conditions)
- [x] WEBP thumbnails

# Data migration
- [ ] Point CDN origin pull directly at the server
- [ ] Static original images
- [ ] Off-site backup
- [ ] Video streaming
- [ ] Image upload API

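# Preference-based recommendation
- [ ] Collect user visit and like records
- [ ] Collect user feedback on recommendations
- [ ] Tune preference-based recommendations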

# OCR
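- [ ] Script that counts the total occurrences of each keyword
- [x] Filters support multi-select: color, style | type, theme, function, material/pattern (multiple values comma-separated)
- [x] Filter by color; a periodic script automatically fills in the three colors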


```sql
-- Add columns (RGB channels of color 0 and color 1)
ALTER TABLE web_images ADD COLUMN color_0_r TINYINT UNSIGNED;
ALTER TABLE web_images ADD COLUMN color_0_g TINYINT UNSIGNED;
ALTER TABLE web_images ADD COLUMN color_0_b TINYINT UNSIGNED;
ALTER TABLE web_images ADD COLUMN color_1_r TINYINT UNSIGNED;
ALTER TABLE web_images ADD COLUMN color_1_g TINYINT UNSIGNED;
ALTER TABLE web_images ADD COLUMN color_1_b TINYINT UNSIGNED;

-- Create an index on each column
CREATE INDEX idx_color_0_r ON web_images(color_0_r);
CREATE INDEX idx_color_0_g ON web_images(color_0_g);
CREATE INDEX idx_color_0_b ON web_images(color_0_b);
CREATE INDEX idx_color_1_r ON web_images(color_1_r);
CREATE INDEX idx_color_1_g ON web_images(color_1_g);
CREATE INDEX idx_color_1_b ON web_images(color_1_b);

-- Full-text indexes
CREATE FULLTEXT INDEX idx_images_desc ON web_images (images_desc);
CREATE FULLTEXT INDEX idx_tags ON web_images (tags);
CREATE FULLTEXT INDEX idx_tags ON web_article (tags);

-- Sync favorites
ALTER TABLE web_praise ADD COLUMN gorse TINYINT UNSIGNED;

```
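The color filter from the OCR list maps onto the indexed RGB columns above. A minimal sketch, assuming a match means every channel of `color_0` (or `color_1`) lies within a ±tolerance box around the requested color; `colorRangeFilter`, `colorDistance2` and the default tolerance of 24 are hypothetical, not part of the project. The range conditions can use the per-column indexes, and final ordering is left to an RGB distance computed in application code.

```javascript
// Hypothetical helper: WHERE fragment matching color slot 0 or 1 of web_images
// when each RGB channel is within ±tolerance of the target color.
function colorRangeFilter(slot, { r, g, b }, tolerance = 24) {
  // Only whitelisted slots reach the SQL string; values go through placeholders
  if (slot !== 0 && slot !== 1) throw new RangeError(`unknown color slot: ${slot}`)
  const clamp = v => Math.min(255, Math.max(0, v)) // TINYINT UNSIGNED range
  const clauses = []
  const params = []
  for (const [channel, value] of [['r', r], ['g', g], ['b', b]]) {
    clauses.push(`color_${slot}_${channel} BETWEEN ? AND ?`)
    params.push(clamp(value - tolerance), clamp(value + tolerance))
  }
  return { sql: clauses.join(' AND '), params }
}

// Squared Euclidean distance in RGB space, for ranking the matched rows
function colorDistance2(a, b) {
  return (a.r - b.r) ** 2 + (a.g - b.g) ** 2 + (a.b - b.b) ** 2
}

// Usage: match either stored color of an image
const target = { r: 200, g: 30, b: 30 }
const c0 = colorRangeFilter(0, target)
const c1 = colorRangeFilter(1, target)
const query = `SELECT id FROM web_images WHERE (${c0.sql}) OR (${c1.sql})`
const params = [...c0.params, ...c1.params]
```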

```bash
# Start the service with pm2
pm2 start ./main --name=main-6002 --watch=./main -- --config=./data/config_test.yaml
```

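### Streaming
Serve video through a streaming-media service to reduce the cost of loading video files and to prevent malicious traffic abuse.
Each video address carries an expiry time; once it expires, the server must issue a new token that authenticates the viewer.
Playback authorization based on the user account or a cookie trust score can be added later.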
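The expiring address can be sketched as an HMAC signature over the path and an expiry time. This is a generic scheme written for illustration, not the video provider's own URL-authentication mechanism; the `expires`/`token` parameter names, the `SECRET` constant and all three functions are assumptions. It runs on Node.js with the built-in `crypto` module.

```javascript
const crypto = require('crypto')

// Hypothetical signing key; load it from server configuration in practice
const SECRET = 'replace-with-a-real-secret'

// HMAC-SHA256 over "path:expires", hex-encoded
function sign(path, expires) {
  return crypto.createHmac('sha256', SECRET).update(`${path}:${expires}`).digest('hex')
}

// Issue a playback address valid until `expires` (Unix time in seconds)
function signVideoUrl(path, expires) {
  return `${path}?expires=${expires}&token=${sign(path, expires)}`
}

// Accept the request only if the link is unexpired and the token matches
function verifyVideoUrl(path, expires, token, now = Math.floor(Date.now() / 1000)) {
  if (!Number.isInteger(expires) || expires < now) return false
  const expected = Buffer.from(sign(path, expires), 'hex')
  const given = Buffer.from(String(token), 'hex')
  // timingSafeEqual throws on length mismatch, so compare lengths first
  return given.length === expected.length && crypto.timingSafeEqual(given, expected)
}

// Usage: a link valid for 10 minutes
const link = signVideoUrl('/video/42.m3u8', Math.floor(Date.now() / 1000) + 600)
```

An expired or altered link fails verification, so the client must ask the server for a fresh address; that request is also where account- or cookie-based checks would fit. `expires` read from a query string is a string and needs `Number()` before verification.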


```javascript
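// Request the video playback URL and play it via HLS.
// Runs inside the player component: `this.src`, `this.autoplay` and `this.player`
// belong to the component, `video` is its <video> element.
import Hls from 'hls.js'
import axios from 'axios'

axios.get('/video?url=' + encodeURIComponent(this.src)).then(res => {
    const cover = res.data.VideoBase?.CoverURL
    if (cover) {
        video.poster = cover
    }
    const m3u8 = res.data.PlayInfoList?.PlayInfo?.find(x => x.Format === 'm3u8')
    if (!m3u8) {
        console.warn('No m3u8 address in the streaming response, playing the original file')
        video.src = this.src
        return
    }
    if (!Hls.isSupported()) {
        // No MSE (e.g. older iOS Safari): fall back to native HLS playback
        video.src = m3u8.PlayURL
        if (this.autoplay) video.play().catch(() => {})
        return
    }
    // With autoStartLoad: false only the manifest is fetched; segments wait for startLoad()
    this.player = new Hls({ maxMaxBufferLength: 5, autoStartLoad: false })
    this.player.loadSource(m3u8.PlayURL)
    this.player.attachMedia(video)
    this.player.on(Hls.Events.MANIFEST_PARSED, () => {
        if (this.autoplay) {
            this.player.startLoad()
            video.play().catch(() => {}) // the browser may block autoplay
        } else {
            // Save bandwidth: start loading segments when the user presses play
            video.addEventListener('play', () => this.player.startLoad(), { once: true })
        }
    })
}).catch(err => {
    console.warn('Failed to get the streaming address', err)
    video.src = this.src
})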
```


```javascript
// GET /webp/{type}-{id}-{version}-{width}-{height}-{fit}.{format}
// @type: image | avatar | article | article_attribute | ad
// @id: int, image ID, article ID or ad ID
// @version: update_time timestamp
// @width: width in pixels; for 1x/2x/3x images pass the actual pixel size directly
// @height: height in pixels; same rule as width
// @fit: crop mode: cover | contain | fill | auto

// GET /img/{type}-{id}.{format}?width=320&height=320&fit=cover
// A more elegant design using query parameters (unfortunately neither the CDN nor OSS supports it)

// endpoint: oss-cn-shanghai-internal.aliyuncs.com

// Get the query parameter
const queryParam = "example query";

// Encode the query parameter safely
const safeParam = encodeURIComponent(queryParam);

// Build a URL that carries the encoded parameter
const url = "http://example.com/tags?param=" + safeParam;

// Send the request with the fetch API
fetch(url)
  .then(response => response.json())
  .then(data => console.log(data))
  .catch(error => console.error('Error:', error));
```
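A client-side helper that builds and parses the `/webp/...` route could look like this. It is a sketch assuming the field order and allowed values listed in the route comments above; `webpUrl`, `parseWebpPath` and the `dpr` parameter (CSS size times device pixel ratio, since the route expects the real pixel size for 2x/3x images) are hypothetical names.

```javascript
// Allowed values, taken from the route description
const TYPES = ['image', 'avatar', 'article', 'article_attribute', 'ad']
const FITS = ['cover', 'contain', 'fill', 'auto']

// Build a thumbnail URL; width/height are CSS pixels scaled by `dpr`
function webpUrl({ type, id, version, width, height, fit = 'cover', format = 'webp', dpr = 1 }) {
  if (!TYPES.includes(type)) throw new RangeError(`unknown type: ${type}`)
  if (!FITS.includes(fit)) throw new RangeError(`unknown fit: ${fit}`)
  const w = Math.round(width * dpr)
  const h = Math.round(height * dpr)
  return `/webp/${type}-${id}-${version}-${w}-${h}-${fit}.${format}`
}

// Types contain "_" but never "-", so "-" can safely separate the fields
const WEBP_PATH = new RegExp(
  `^/webp/(${TYPES.join('|')})-(\\d+)-(\\d+)-(\\d+)-(\\d+)-(${FITS.join('|')})\\.(\\w+)$`
)

// Parse a path back into its fields; returns null if it does not match
function parseWebpPath(path) {
  const m = WEBP_PATH.exec(path)
  if (!m) return null
  const [, type, id, version, width, height, fit, format] = m
  return { type, id: +id, version: +version, width: +width, height: +height, fit, format }
}

console.log(webpUrl({ type: 'image', id: 1234, version: 1700000000, width: 160, height: 120, dpr: 2 }))
// /webp/image-1234-1700000000-320-240-cover.webp
```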


## Update

```bash
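# Build (produces ./main; cgo dependencies such as gosseract mean the binary is not fully static)
go build bin/main.go

# Upload to the server
scp ./main root@47.103.40.152:~/main

# Convert images to vectors (for speed this should run on a GPU server)
python3 api/resnet.py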

```
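Once `api/resnet.py` has turned images into vectors, similar-image lookup is a nearest-neighbor search. The sketch below ranks by brute-force cosine similarity; the `{ id, vector }` record shape and the function names are assumptions, since the README does not say what `resnet.py` stores. For a large library an approximate nearest-neighbor index would replace the linear scan.

```javascript
// Cosine similarity between two equal-length vectors (0 if either is all zeros)
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    na += a[i] * a[i]
    nb += b[i] * b[i]
  }
  return na && nb ? dot / Math.sqrt(na * nb) : 0
}

// Brute-force top-k: `index` is an array of { id, vector } records (hypothetical shape)
function similarImages(query, index, k = 10) {
  return index
    .filter(item => item.id !== query.id) // skip the query image itself
    .map(item => ({ id: item.id, score: cosine(query.vector, item.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
}
```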


## Dev & Install

- [gosseract](https://github.com/otiai10/gosseract/tree/v2.2.1)
- [ocrdoc](https://github.com/tesseract-ocr/tessdoc)
- [tesseract-ocr - Tesseract command line OCR tool (devel)](https://launchpad.net/~alex-p/+archive/ubuntu/tesseract-ocr-devel)

```bash
# Install tesseract-ocr on Ubuntu
sudo add-apt-repository ppa:alex-p/tesseract-ocr-devel
sudo apt update
sudo apt install tesseract-ocr
sudo apt install libleptonica-dev
sudo apt install libtesseract-dev

# Simplified Chinese language data (see https://i.scwy.net/code/2020/091108-ocr/)
sudo apt install tesseract-ocr-chi-sim

# List the installed languages
tesseract --list-langs

# Required dependency (OpenCV)
sudo apt install libopencv-dev
```

To avoid the error `‘ArucoDetector’ in namespace ‘cv::aruco’ does not name a type`, change the installed OpenCV version from 4.2 to 4.7; with 4.7 it builds correctly.

```bash
# E: Unable to locate package libdc1394-22-dev
sudo add-apt-repository "deb http://security.ubuntu.com/ubuntu xenial-security main"
sudo apt update
sudo apt upgrade
sudo apt install libjasper1 libjasper-dev

# Add the main archive server: open sources.list and append the line below
sudo gedit /etc/apt/sources.list
#   deb http://archive.ubuntu.com/ubuntu/ trusty main universe restricted multiverse

sudo apt-get update
sudo apt update
sudo apt upgrade

# Solution suggested by GPT
sudo add-apt-repository universe
sudo apt update
sudo apt install libdc1394-22-dev
apt search libdc1394

# Download manually: https://pkgs.org/download/libdc1394-22
wget http://archive.ubuntu.com/ubuntu/pool/universe/libd/libdc1394-22/libdc1394-22_2.2.5-2.1_amd64.deb
sudo dpkg -i libdc1394-22_2.2.5-2.1_amd64.deb
apt search libdc1394

# After confirming the installation, comment out line 52 of the Makefile
cd gocv
make install

# If downloads time out, route them through a proxy/VPN
```

Torch model files:
- https://d2j0dndfm35trm.cloudfront.net/resnet-50.t7
- https://github.com/facebookarchive/fb.resnet.torch/tree/master/pretrained


## Document

General-purpose weighted model API


### Get the image list (standard RESTful query)

GET /api/images
```javascript
{
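    page: 1,             // current page
    pageSize: 20,        // page size
    next: true,          // whether a next page exists
    list: [{
        id: 1234,        // original image ID
        width: 512,      // original image width
        height: 512,     // original image height
        user: {          // source user (uploader)
            id: 1234,
            user_name: 'LAST',
        },
        article: {       // source article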
            id: 1234,
            title: 'GAMEX',
        }
    }]
}
```


List view (output control):
-------------------------------------------------------------------------------------
| Method | URL                            | Info                            | Status |
| ------ | ------------------------------ | ------------------------------- | ------ |
| GET    | /api/images                    | Standard query in default order | ok     |
| GET    | /api/images?page=1&pageSize=20 | Given page number and page size | ok     |
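
A client that needs the whole list can follow the `next` flag of the response format shown earlier. This is a sketch; `fetchAllImages`, the injected `getJson` function and the `maxPages` safety limit are hypothetical.

```javascript
// Walk /api/images page by page until the response reports next: false.
// `getJson` is injected (e.g. url => fetch(url).then(r => r.json())) so it can be stubbed.
async function fetchAllImages(getJson, pageSize = 20, maxPages = 100) {
  const all = []
  for (let page = 1; page <= maxPages; page++) {
    const params = new URLSearchParams({ page, pageSize })
    const body = await getJson(`/api/images?${params}`)
    all.push(...(body.list ?? []))
    if (!body.next) break // last page reached
  }
  return all
}
```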


Filter rules (data filtering):
-------------------------------------------------------------------------------------
| Method | URL                           | Info                                                  | Status |
| ------ | ----------------------------- | ----------------------------------------------------- | ------ |
| GET    | /api/images?user=1234         | Images posted by the given user                       |        |
| GET    | /api/images?choice=1234       | Images in the given curated collection                |        |
| GET    | /api/images?like=1234         | Images liked by the given user                        |        |
| GET    | /api/images?tag=1234          | Images with the given tag                             |        |
| GET    | /api/images?tag=1234,1235     | Images with any one of the tags (union)               |        |
| GET    | /api/images?tag=1234&tag=1235 | Images with all of the tags (intersection)            |        |
| GET    | /api/images?user=1234&tag=123 | The given user's images with the given tag (intersection) |    |
| GET    | /api/images?date=20220214+    | Date range (after)                                    |        |
| GET    | /api/images?date=20220214-    | Date range (before)                                   |        |
| GET    | /api/images?date=2022~2023    | Date range (between)                                  |        |


Sort rules (weight boosting):
-------------------------------------------------------------------------------------
| Method | URL                      | Info                                                       | Status |
| ------ | ------------------------ | ---------------------------------------------------------- | ------ |
| GET    | /api/images?similar=1234 | Images similar to the given image (image ID)               | ok     |
| GET    | /api/images?sort=date+   | Sort order (ignored for similar-image queries)             |        |
| GET    | /api/images?sort=like    | Recommend by user preference (the given user's)            |        |
| GET    | /api/images?sort=history | Recommend by browsing history (the given user's)           |        |
| GET    | /api/images?sort=choice  | Recommend by curated collection (collection ID, a weight set) |     |

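* Note: multiple filter conditions are combined as an intersection; multiple values within a single condition are combined as a union.
* Weight boosting is a sorting rule, not a filtering rule.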
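
These combination rules (comma-separated values form a union, repeated or different parameters intersect) can be expressed as a small parser. This is a hypothetical sketch: `parseImageQuery`, `matchesGroups` and the result shapes (including `{ on }` for a date with no suffix) are assumptions. It also exposes a practical detail: in a query string `+` decodes to a space, so `date=20220214+` and `sort=date+` must be sent percent-encoded as `%2B`, which `URLSearchParams` and `encodeURIComponent` do automatically.

```javascript
// Parameters whose values are comma-separated ID lists
const FILTER_KEYS = ['user', 'choice', 'like', 'tag']

// "20220214+" -> after, "20220214-" -> before, "2022~2023" -> between
function parseDate(value) {
  if (value.endsWith('+')) return { after: value.slice(0, -1) }
  if (value.endsWith('-')) return { before: value.slice(0, -1) }
  const [from, to] = value.split('~')
  return to === undefined ? { on: from } : { from, to }
}

function parseImageQuery(search) {
  const params = new URLSearchParams(search)
  const filters = {} // key -> array of OR groups; every group must match
  for (const key of FILTER_KEYS) {
    const groups = params.getAll(key).map(v => v.split(',').map(Number))
    if (groups.length > 0) filters[key] = groups
  }
  const date = params.get('date')
  return { filters, date: date === null ? null : parseDate(date) }
}

// True if `ids` (e.g. an image's tag IDs) satisfies every OR group
function matchesGroups(ids, groups) {
  return groups.every(group => group.some(id => ids.includes(id)))
}
```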


### Get the task list (standard query)

GET /api/tasks
```javascript
{
    page: 1,             // current page
    pageSize: 20,        // page size
    next: true,          // whether a next page exists
    list: [{
        id: 'xxxxxx',    // task ID
        type: '',        // task type (inference, training)
        data: {},        // task execution data
        create_time: '', // task creation time
        update_time: ''  // task update time
    }],
}
```

WebSocket /api/tasks/{task_id}
```javascript
{
    // status
    // progress
    // result
}
```

* Listen for task status changes over WebSocket
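
A browser-side listener could look like the sketch below. The message format is not specified, so the JSON fields `status`, `progress` and `result` are assumptions taken from the outline comments, and `watchTask`/`applyTaskMessage` are hypothetical names; `baseUrl` stands for something like `wss://your-host`.

```javascript
// Fields assumed to be pushed by /api/tasks/{task_id}
const TASK_FIELDS = ['status', 'progress', 'result']

// Merge one JSON message into the local task state (returns a new object)
function applyTaskMessage(task, raw) {
  const data = JSON.parse(raw)
  if (data === null || typeof data !== 'object') return task // ignore non-object messages
  const next = { ...task }
  for (const key of TASK_FIELDS) {
    if (key in data) next[key] = data[key]
  }
  return next
}

// Subscribe to one task; returns a function that stops listening
function watchTask(baseUrl, taskId, onUpdate) {
  let task = { id: taskId }
  const ws = new WebSocket(`${baseUrl}/api/tasks/${encodeURIComponent(taskId)}`)
  ws.onmessage = event => {
    task = applyTaskMessage(task, event.data)
    onUpdate(task)
  }
  ws.onclose = () => onUpdate({ ...task, closed: true })
  return () => ws.close()
}
```

Usage: `const stop = watchTask('wss://your-host', 'xxxxxx', t => console.log(t.status, t.progress))`, then call `stop()` when the view is closed.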


### Models (mine | shared | popular | search)
### Images (mine | shared | popular | search)
### Tags


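# Bidirectional recommendation based only on IDs
- User ID
- Image ID

- Given user ID u1 and image ID p1, set u1 2=> p1
- Given user ID u1 and image ID p1, set u1 3=> p1
- Given user ID u1 and image ID p2, set u1 2=> p2
- Given user ID u2 and image ID p2, set u2 2=> p2

Then, given u1: take the images p with the highest weights, infer the user u2 most similar to u1, and recommend to u1 from u2's records. ✗ Too simplistic.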
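
The approach just described (and judged too simplistic) is user-based collaborative filtering. A minimal sketch, assuming repeated user-image records are summed (so `u1 2=> p1` plus `u1 3=> p1` gives weight 5) and that user similarity is the cosine of their image-weight vectors; the function names and the extra record `u2 1=> p3` in the example are hypothetical.

```javascript
// edges: [userId, imageId, weight]; repeated pairs are summed (an assumption)
function buildProfiles(edges) {
  const profiles = new Map() // user -> Map(image -> weight)
  for (const [user, image, weight] of edges) {
    if (!profiles.has(user)) profiles.set(user, new Map())
    const p = profiles.get(user)
    p.set(image, (p.get(image) || 0) + weight)
  }
  return profiles
}

// Cosine similarity between two sparse image -> weight maps
function similarity(a, b) {
  let dot = 0
  for (const [image, w] of a) dot += w * (b.get(image) || 0)
  const norm = m => Math.sqrt([...m.values()].reduce((s, w) => s + w * w, 0))
  const d = norm(a) * norm(b)
  return d ? dot / d : 0
}

// Recommend the most similar user's images that `user` has not seen yet
function recommendFromNeighbor(profiles, user) {
  const mine = profiles.get(user)
  if (!mine) return []
  let best = null
  let bestScore = 0
  for (const [other, theirs] of profiles) {
    if (other === user) continue
    const s = similarity(mine, theirs)
    if (s > bestScore) { best = theirs; bestScore = s }
  }
  if (!best) return []
  return [...best]
    .filter(([image]) => !mine.has(image))
    .sort((x, y) => y[1] - x[1])
    .map(([image]) => image)
}

const profiles = buildProfiles([
  ['u1', 'p1', 2], ['u1', 'p1', 3], ['u1', 'p2', 2],
  ['u2', 'p2', 2],
  ['u2', 'p3', 1], // hypothetical extra record
])
console.log(recommendFromNeighbor(profiles, 'u1')) // [ 'p3' ]
```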


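- Temporal order: if u1 views p1 and then moves on to p2, then p1 +> p2.
- Group order: if u1 is shown p0's recommendations [p1, p2, p3, p4] and opens only p1, then p0's weights for p2, p3 and p4 each drop by 1, and p0 +> p1.
- When u1 goes from p0 to p1, p0 has already been viewed and will not be opened again. So if p1's recommendation group includes p0, the fact that p0 is not visited does not mean its relevance should drop.
- That is, when building a recommendation group, the browsing history should act as a reverse weight.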
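
The temporal and group rules above can be sketched as weight updates on image-to-image edges. Assumptions: `+>` means +1 (the text only fixes the −1 penalty), weights live in a `Map` keyed `"from>to"`, and `history` is the set of images the user has already viewed, which exempts them from the skip penalty. All names are hypothetical.

```javascript
// Image-to-image weights are stored in a Map keyed "from>to"
const edgeKey = (from, to) => `${from}>${to}`

function bump(weights, from, to, delta) {
  const k = edgeKey(from, to)
  weights.set(k, (weights.get(k) || 0) + delta)
}

// Temporal order: the user moved from `from` to `to`, so from +> to
function recordTransition(weights, from, to) {
  bump(weights, from, to, 1)
}

// Group order: `source` showed `shown` and the user opened `opened`.
// Opened images gain weight; skipped ones lose weight unless already in `history`.
function recordGroupFeedback(weights, source, shown, opened, history) {
  const openedSet = new Set(opened)
  for (const image of shown) {
    if (openedSet.has(image)) bump(weights, source, image, 1)
    else if (!history.has(image)) bump(weights, source, image, -1)
  }
}
```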

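- Relevance between a given user and a given image
- Relevance between a given image and a given image
- Relevance between a given user and a given user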