read_book/用户画像设计方案.md at master

Files

1. 实现了用户阅读画像

2. 实现了全局检索功能

2026-01-11 14:40:31 +08:00

5.8 KiB

Raw Permalink Blame History

如何设计离线阅读画像系统

1. 第一阶段：画像模型设计

画像不是简单的计数，而是将原始数据（书名、字数、关键词）转化为认知模型。我们定义以下五个核心指标：

认知深度 (Cognition)：通过统计关键词（Keywords）的重复频次和专业程度。

知识广度 (Breadth)：统计不同书籍领域的分布（基于书名聚类或人工分类）。

实践应用 (Practicality)：识别 occupation 字段，职业相关（Professional）的心得占比。

产出效率 (Output)：计算总生成字数与任务完成率。

国际视野 (Global)：统计英文（en）与中文（zh）任务的比例。

2. 第二阶段：数据库实体定义 (Prisma/TypeORM)

假设你使用常用的本地 ORM。我们需要在数据库中增加一个画像缓存表，避免每次打开页面都进行全量计算。

/**
 * 阅读画像缓存实体
 * 存储计算后的分值，避免高频计算
 */
export interface IReadingPersona {
  id: string; // 固定 ID 如 'current_user_persona'
  cognition: number;      // 认知深度分 (0-100)
  breadth: number;        // 知识广度分 (0-100)
  practicality: number;   // 实践应用分 (0-100)
  output: number;         // 产出效率分 (0-100)
  global: number;         // 国际视野分 (0-100)
  topKeywords: string;    // 存储 Top 10 关键词的 JSON 字符串
  updatedAt: Date;        // 最后计算时间
}

3. 第三阶段：核心计算逻辑 (Main Process)

这是画像系统的“大脑”，负责执行 SQL 并转换数据。

import { IReadingReflectionTaskItem, IReadingReflectionTaskBatch } from '../types'

export class PersonaService {
  /**
   * 从数据库聚合数据并计算画像分值
   */
  async calculatePersona(items: IReadingReflectionTaskItem[], batches: IReadingReflectionTaskBatch[]) {
    // 1. 计算认知深度：根据关键词频次
    const allKeywords = items.flatMap(i => i.keywords || [])
    const keywordMap = new Map<string, number>()
    allKeywords.forEach(k => keywordMap.set(k, (keywordMap.get(k) || 0) + 1))

    // 逻辑：去重后的关键词越多且重复越高，分值越高 (示例算法)
    const cognitionScore = Math.min(100, (keywordMap.size * 2) + (allKeywords.length / 5))

    // 2. 计算知识广度：根据书籍数量
    const breadthScore = Math.min(100, batches.length * 10)

    // 3. 计算产出效率：根据总字数
    const totalWords = items.reduce((sum, i) => sum + (i.content?.length || 0), 0)
    const outputScore = Math.min(100, totalWords / 500) // 每 5万字满分

    // 4. 计算 Top 10 关键词
    const sortedKeywords = [...keywordMap.entries()]
      .sort((a, b) => b[1] - a[1])
      .slice(0, 10)
      .map(entry => entry[0])

    return {
      cognition: Math.round(cognitionScore),
      breadth: Math.round(breadthScore),
      output: Math.round(outputScore),
      practicality: 75, // 可根据 occupation 比例动态计算
      global: 60,       // 可根据 language 比例动态计算
      topKeywords: sortedKeywords
    }
  }
}

4. 第四阶段：前端可视化集成 (Statistics.vue)

使用 ECharts 渲染雷达图，将计算结果展现给用户。


<script setup lang="ts">
  import { onMounted, ref } from 'vue'
  import * as echarts from 'echarts'
  import { trpc } from '@renderer/lib/trpc'

  const chartRef = ref<HTMLElement | null>(null)
  const personaData = ref<any>(null)

  const initChart = (data: any) => {
    const myChart = echarts.init(chartRef.value!)
    const option = {
      radar: {
        indicator: [
          { name: '认知深度', max: 100 },
          { name: '知识广度', max: 100 },
          { name: '实践应用', max: 100 },
          { name: '产出效率', max: 100 },
          { name: '国际视野', max: 100 }
        ],
        shape: 'circle',
        splitNumber: 4,
        axisLine: { lineStyle: { color: '#E5E7EB' } },
        splitLine: { lineStyle: { color: '#E5E7EB' } },
        splitArea: { areaStyle: { color: ['#fff', '#F8F9FB'] } }
      },
      series: [{
        type: 'radar',
        data: [{
          value: [data.cognition, data.breadth, data.practicality, data.output, data.global],
          areaStyle: { color: 'rgba(120, 22, 255, 0.2)' },
          lineStyle: { color: '#7816ff', width: 3 },
          itemStyle: { color: '#7816ff' }
        }]
      }]
    }
    myChart.setOption(option)
  }

  onMounted(async () => {
    // 通过 tRPC 从主进程获取计算好的数据
    const data = await trpc.stats.getPersona.query()
    personaData.value = data
    initChart(data)
  })
</script>

<template>
  <div class="p-8 bg-white rounded-[32px] border border-slate-100 shadow-sm">
    <div class="mb-6">
      <h2 class="text-xl font-black text-slate-800">阅读心流画像</h2>
      <p class="text-xs text-slate-400">基于本地 {{ personaData?.totalBooks }} 本书籍分析得出</p>
    </div>

    <div ref="chartRef" class="w-full h-[400px]"></div>

    <div class="mt-8 flex flex-wrap gap-2">
      <span v-for="tag in personaData?.topKeywords" :key="tag"
            class="px-3 py-1 bg-slate-50 text-slate-500 text-[10px] rounded-full border border-slate-100">
        # {{ tag }}
      </span>
    </div>
  </div>
</template>

🎨 设计心得：为什么这么做？

数据解耦：原始任务数据（Items/Batches）与画像数据分开存储。即使画像计算逻辑升级（比如你想改变评分算法），也不需要修改历史心得数据。

缓存策略：不要在用户每次切换页面时都去扫全表。建议在任务完成时触发一次增量计算，或者每天用户第一次打开应用时刷新一次。

可视化反馈：雷达图的面积代表了用户的“知识疆域”。当用户看着面积一点点变大时，这种离线的成就感是留存的关键。

5.8 KiB Raw Permalink Blame History Unescape Escape