Case Study: Turning Doctor Transcripts into Temporal Medical Record Knowledge Graphs

Posted by alex on November 15, 2024

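Interested in turning doctor/patient medical records and transcripts into a temporal knowledge graph? Doing so lets you ask complex questions across multiple medical histories, time periods, and patients.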


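In this case study, we show how medical transcripts can be turned into a temporal knowledge graph that can be used for retrieval-augmented generation (RAG) and analytics. We show real question answering against this system, as well as the kinds of business outcomes it can support. To the best of our knowledge, the combination of steps described here is a relatively novel way of implementing a knowledge graph.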


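Data used

For data privacy reasons, we used a synthetic dataset of medical transcripts generated from Synthea data. We combined these transcripts with the structured medical records in the Synthea data. We had about 75 transcripts covering 10 patients (5 to 10 transcripts per patient). Below is an example of one of the transcripts used as input for building the knowledge graph: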


[Image: example medical transcript]


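Overview of the novel knowledge graph architecture


Nodes:

We have 5 types of nodes: Patient, Observation, Immunization, Condition, and Encounter Type.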


Triples (sample list):

Patient -> had encounter -> Encounter

Patient -> has condition -> Condition

Patient -> received -> Immunization

Patient -> has measurement -> Observation


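Chunks:

Chunks are standalone text objects. Chunks are linked to triples, and one triple can have multiple chunks linked to it. In this case, the chunks are not the unstructured source text of the triples, but summaries and key points specific to each triple type. We therefore have 6 types of chunks: patient demographic chunks, condition summary chunks, encounter chunks, observation chunks, immunization chunks, and condition detail chunks.


Here is an example of how the different types of chunks are linked to triples: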


1. Patient -> EncounterType
  Triple: (Patient) -[had_encounter]-> (EncounterType)
  - Chunk_ids link to specific visit instances
  - Example Chunk: "Annual physical on 2024-01-15. BP 120/80, routine screenings updated."

2. Patient -> Condition
  Triple: (Patient) -[has_condition]-> (Condition)
  - Chunk_ids link to condition episodes
  - Example Chunk: "Diagnosed with hypertension on 2020-03-10. Status: active. Managed with medication."

3. Patient -> Immunization
  Triple: (Patient) -[received]-> (Immunization)
  - Chunk_ids link to administration records
  - Example Chunk: "Influenza vaccine administered on 2024-01-15."

4. Patient -> Observation
  Triple: (Patient) -[has_measurement]-> (Observation)
  - Chunk_ids link to measurement instances
  - Example Chunk: "2024-01-15: Blood Pressure 120/80 mmHg, Weight 70kg."




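This particular graph architecture lets you attach key points and summaries to triples. An unstructured search can then home in on the right set of triples, after which all the relevant information is pulled in, in a structured way, through the linked chunks.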
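As a rough illustration of the triple-to-chunk linkage described above, here is a minimal in-memory sketch. The class names, ids, and the `chunks_for` helper are hypothetical and chosen for illustration only; this is not WhyHow's data model or SDK.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    chunk_id: str
    text: str  # a summary or key points, not the raw transcript


@dataclass
class Triple:
    head: str
    relation: str
    tail: str
    chunk_ids: list[str] = field(default_factory=list)


# Hypothetical in-memory stores; names and ids are illustrative only.
chunks = {
    "c1": Chunk("c1", "Diagnosed with hypertension on 2020-03-10. Status: active. Managed with medication."),
    "c2": Chunk("c2", "Influenza vaccine administered on 2024-01-15."),
}
triples = [
    Triple("John Smith", "has_condition", "Essential hypertension", ["c1"]),
    Triple("John Smith", "received", "Influenza", ["c2"]),
]


def chunks_for(triple: Triple) -> list[str]:
    """Return the text of every chunk linked to a triple."""
    return [chunks[cid].text for cid in triple.chunk_ids if cid in chunks]


# Once a search has landed on the right triple, its chunks become the context.
context = chunks_for(triples[0])
```

The point of the design is that the triple stays small and searchable, while the detail lives in the chunks and is only pulled in after the right triple has been found.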


What is unique about the WhyHow architecture

Several features of the WhyHow graph infrastructure let us build this architecture in a simple way.


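First, triples are embedded and retrieved through vector search. This avoids a common retrieval problem: having to use Text2Cypher to identify nodes and relationships and then construct a Cypher query to find the right triples. This has been shown to improve retrieval accuracy by up to 3x.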
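To make the idea of retrieving triples by similarity rather than by a generated Cypher query concrete, here is a toy sketch. The bag-of-words `embed` function is a stand-in for a real embedding model, and the triple sentences and function names are assumptions for illustration; WhyHow's actual retrieval pipeline is not shown in this article.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Triples rendered as short sentences so they can be embedded like any other text.
triple_texts = [
    "Patient had encounter Well child visit",
    "Patient has condition Viral sinusitis",
    "Patient received immunization DTaP",
    "Patient has measurement Body Weight",
]
index = [(text, embed(text)) for text in triple_texts]


def search(query: str, k: int = 2) -> list[str]:
    """Return the k triple sentences most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


top = search("which immunization did the patient receive", k=1)
```

No query language is generated at any point: the question is compared directly with the stored triples, which is what removes the Text2Cypher step.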


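Second, in WhyHow, triples are standalone objects to which you can link chunks. This lets you distill the key information you want to retrieve for each triple and bring it straight into context once the right triple has been found. It avoids having to represent key information and context in graph form (which complicates schema construction); instead, the information is brought in, in a structured way, after the initial unstructured vector search. This is similar to the way LinkedIn applied knowledge graphs in its own system, where key information such as "steps to reproduce" is represented and retrieved in a similar manner, with the steps themselves represented as separate "chunks"/"nodes".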


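Third, WhyHow accepts data in JSON format, which lets any extraction framework plug directly into graph creation. In this case, we used Claude to do the initial conversion of the transcript data into the JSON structure required for loading into WhyHow. If your information is already stored as JSON, loading it into WhyHow is much simpler.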


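Fourth, because of the way chunks and the retrieval process are designed in the WhyHow system, you can easily include temporal data that can be used to control how answers are constructed. Temporal data has long been a difficult part of knowledge graphs to model (to the point that leading KG experts often advise against it), yet it is clearly a very important part of the workflow. Existing approaches to modeling temporal data try to build it into the knowledge graph itself and then retrieve it with structured Cypher queries, whereas our architecture instead uses a large language model (LLM) to help filter the temporal data.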
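Because the dates live in the chunk text rather than in the graph schema, they stay easy to extract. The sketch below is not the LLM-based filtering described above; it is a deterministic date-window filter, offered only as one possible way to pre-sort chunks or to sanity-check an LLM's temporal answer. The chunk sentences and helper names are illustrative assumptions, with dates taken from the sample patient summary in this article.

```python
import re
from datetime import date

# Matches ISO-style dates such as 2023-03-06.
DATE_RE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def dates_in(text: str) -> list[date]:
    """Extract every ISO date mentioned in a chunk's text."""
    return [date(int(y), int(m), int(d)) for y, m, d in DATE_RE.findall(text)]


def within(chunk_texts: list[str], start: date, end: date) -> list[tuple[date, str]]:
    """Keep (date, text) pairs whose date falls in [start, end), oldest first."""
    hits = [(d, t) for t in chunk_texts for d in dates_in(t) if start <= d < end]
    return sorted(hits)


records = [
    "Hepatitis A vaccine administered on 2023-11-06.",
    "DTaP vaccine administered on 2022-12-05.",
    "Influenza vaccine administered on 2023-03-06.",
]
dob = date(2022, 8, 29)
# "First year of life" as a half-open window starting at the date of birth.
first_year = within(records, dob, dob.replace(year=dob.year + 1))
```

Here `first_year` keeps the DTaP and Influenza records in chronological order and drops the Hepatitis A dose, which was given after the first birthday.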


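Combining large language models (LLMs) with structured knowledge representations such as knowledge graphs is an important route to business outcomes, and we believe this temporal knowledge graph architecture will unlock considerable business value by successfully putting temporal data to work.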


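Data transformation process used

First, we used Claude to turn the transcript information into a schema-aligned set of information on a per-transcript basis. Combined with information from the structured medical records, the transcripts were turned into a JSON summary like the following: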


PATIENT SUMMARY
Name: Joseph Crona
DOB: 2022-08-29
Age: 2 years
Gender: male
MRN: #dbfbaa
CURRENT MEASUREMENTS (as of 2024-08-05)
Height: 84.1cm (50th percentile)
Weight: 14.5kg (52nd percentile)
ALLERGIES
No known allergies
IMMUNIZATIONS
- DTaP: 2022-12-05, 2023-02-06, 2023-03-06, 2024-02-05
- Hepatitis A: 2023-11-06
- Hepatitis B: 2022-08-29, 2022-10-03, 2023-03-06
- Hib: 2022-12-05, 2023-02-06, 2023-11-06
- Influenza: 2023-03-06, 2024-08-05
- MMR: 2023-11-06
- PCV13: 2022-12-05, 2023-02-06, 2023-03-06, 2023-11-06
- Polio: 2022-12-05, 2023-02-06, 2023-03-06
- Rotavirus: 2022-12-05, 2023-02-06
- Varicella: 2023-11-06
MEDICAL HISTORY
- Viral sinusitis (disorder)
  Onset: 2023-03-13
  Status: resolved
  Outcome: Resolved
GROWTH & DEVELOPMENT
- 2023-11-06: Body Weight: 12.7 kg
- 2024-02-05: Body Height: 79 cm
- 2024-02-05: Body Weight: 13.4 kg
- 2024-08-05: Body Height: 84.1 cm
- 2024-08-05: Body Weight: 14.5 kg
Development: Age-appropriate milestones met
- Gross motor: Age appropriate
- Fine motor: Age appropriate
- Language: Age appropriate
- Social: Age appropriate
PREVENTIVE CARE
Well-Child Visits:
- 2024-08-05: 2yo well visit - Development on track
- 2024-02-05: 1yo well visit - Development on track
- 2023-11-06: 1yo well visit - Development on track
- 2023-08-07: 1yo well visit - Development on track
- 2023-05-08: 9mo well visit - Age appropriate exam completed
- 2023-02-06: 6mo well visit - Age appropriate exam completed
- 2022-12-05: 4mo well visit - Age appropriate exam completed
- 2022-10-03: 2mo well visit - Age appropriate exam completed
- 2022-08-29: Newborn visit - Normal exam
FAMILY HISTORY
Mother: Healthy
Father: Healthy
Siblings: None documented
SOCIAL HISTORY
Living Situation: Lives with parents
Development: Meeting age-appropriate milestones
Sleep: Age-appropriate pattern
Nutrition: Age-appropriate diet


Second, we mapped this JSON schema onto the WhyHow schema and then imported all the information into the WhyHow.AI KG Studio.
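The mapping step might look roughly like the sketch below, which turns one patient's extracted summary into nodes, triples, and chunks and serializes the result as JSON. All field names (`nodes`, `triples`, `chunks`, `head`, `tail`, and so on) and the `to_graph` function are assumptions made for illustration; they are not the WhyHow SDK's actual schema or API.

```python
import json

# Hypothetical extraction output for one patient; field names are assumptions.
summary = {
    "name": "Joseph Crona",
    "gender": "male",
    "immunizations": {"MMR": ["2023-11-06"], "Rotavirus": ["2022-12-05", "2023-02-06"]},
}


def to_graph(summary: dict) -> dict:
    """Map one patient summary to nodes, triples and chunks."""
    patient = summary["name"]
    nodes = [{"name": patient, "label": "Patient", "properties": {"gender": summary["gender"]}}]
    triples, chunks = [], []
    for vaccine, dates in summary["immunizations"].items():
        nodes.append({"name": vaccine, "label": "Immunization", "properties": {}})
        chunk_ids = []
        # One chunk per administration, so each dated event stays retrievable.
        for given_on in dates:
            chunk_id = f"{patient}|{vaccine}|{given_on}"
            chunks.append({"id": chunk_id, "text": f"{vaccine} vaccine administered on {given_on}."})
            chunk_ids.append(chunk_id)
        triples.append({"head": patient, "relation": "received", "tail": vaccine, "chunk_ids": chunk_ids})
    return {"nodes": nodes, "triples": triples, "chunks": chunks}


graph = to_graph(summary)
payload = json.dumps(graph, indent=2)  # JSON document ready to hand to a loader
```

Note that the graph itself stays timeless: there is one `received` triple per vaccine, and the individual dated doses live in the chunks linked to it.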


Below is an example of the knowledge graph structure that was ultimately loaded into WhyHow.


Knowledge Graph Structure (Timeless):

Nodes:
1. Patient Node
  Structure: {
      name: str,         # "John Smith"
      label: "Patient",
      properties: {
          gender: str,   # FHIR gender
          patient_type: str  # "adult" | "pediatric"
      },
      chunk_ids: List[str]  # Links to demographic chunks
  }

2. EncounterType Node
  Structure: {
      name: str,         # "Well-child visit" | "Annual physical"
      label: "EncounterType",
      properties: {
          category: str,  # "preventive" | "acute" | "chronic"
          specialty: str  # "primary_care" | "pediatrics" | "emergency"
      },
      chunk_ids: List[str]  # Links to visit pattern chunks
  }

3. Condition Node
  Structure: {
      name: str,         # "Essential hypertension"
      label: "Condition",
      properties: {
          category: str,     # "chronic" | "acute" | "resolved"
          system: str,       # "respiratory" | "cardiovascular" | etc
          is_primary: bool   # True if primary diagnosis
      },
      chunk_ids: List[str]  # Links to condition history chunks
  }

4. Immunization Node
  Structure: {
      name: str,         # "DTaP" | "MMR"
      label: "Immunization",
      properties: {
          series: str,       # "primary" | "booster"
          target: str        # "tetanus" | "measles" | etc
      },
      chunk_ids: List[str]  # Links to immunization records
  }

5. Observation Node
  Structure: {
      name: str,         # "Blood Pressure" | "Height"
      label: "Observation",
      properties: {
          category: str,     # "vital" | "lab" | "growth"
          unit: str         # "mmHg" | "cm" | etc
      },
      chunk_ids: List[str]  # Links to measurement records
  }

Relations:
1. Patient -> EncounterType
  Triple: (Patient) -[had_encounter]-> (EncounterType)
  - Chunk_ids link to specific visit instances

2. Patient -> Condition
  Triple: (Patient) -[has_condition]-> (Condition)
  - Chunk_ids link to condition episodes

3. Patient -> Immunization
  Triple: (Patient) -[received]-> (Immunization)
  - Chunk_ids link to administration records

4. Patient -> Observation
  Triple: (Patient) -[has_measurement]-> (Observation)
  - Chunk_ids link to measurement instances

5. Condition -> EncounterType
  Triple: (Condition) -[managed_in]-> (EncounterType)
  - Links conditions to typical encounter types

6. Immunization -> EncounterType
  Triple: (Immunization) -[given_during]-> (EncounterType)
  - Links vaccines to visit types


Third, after each natural language query, we run a custom prompt that puts the triples retrieved from the knowledge graph into context.


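An interesting aspect of this architecture is that we can now easily add information about patient visits, treatments, and conditions to the knowledge graph, since doing so is simply a matter of adding more chunks to the existing triples. If a patient develops a new condition, an additional Condition node is attached to the Patient node.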
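A minimal sketch of that update pattern, under assumed names: `record_event` and the dictionary layout are hypothetical, not a WhyHow API. A repeat visit only appends a chunk to an existing triple, while a new condition creates a new triple.

```python
def record_event(graph: dict, head: str, relation: str, tail: str, text: str) -> str:
    """Attach a new chunk to a triple, creating the triple only if it is new."""
    key = (head, relation, tail)
    triple = graph["triples"].setdefault(key, {"chunk_ids": []})
    chunk_id = f"chunk-{len(graph['chunks']) + 1}"
    graph["chunks"][chunk_id] = text
    triple["chunk_ids"].append(chunk_id)
    return chunk_id


graph = {"triples": {}, "chunks": {}}

# Two visits of the same type: one triple, two chunks.
record_event(graph, "Joseph Crona", "had_encounter", "Well child visit",
             "2024-02-05: well visit. Development on track.")
record_event(graph, "Joseph Crona", "had_encounter", "Well child visit",
             "2024-08-05: 2yo well visit. Development on track.")

# A newly diagnosed condition: a new triple (and, in a full graph, a new Condition node).
record_event(graph, "Joseph Crona", "has_condition", "Viral sinusitis",
             "Onset 2023-03-13. Status: resolved.")
```

After these three calls the graph holds two triples and three chunks, so the schema does not grow with the number of visits.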


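This process took 25 development hours, broken down as follows:

  • 2 hours (8%) reviewing and understanding the data (exploratory data analysis)
  • 18 hours (72%) iterating on the schema: deciding which nodes should be in the graph, which nodes should connect to which, which chunks should exist and how they should link to the various triples, then testing the retrieved answers against a set of questions and iterating accordingly
  • 2 hours (8%) writing the code that creates the set of triples to load
  • 3 hours (12%) writing validation checks and output checks to catch any errors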
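The article does not show the validation checks themselves. As an assumption about what such checks could cover, the sketch below verifies that every node uses one of the five known labels and that every triple points at existing nodes and existing chunks. The `validate` function and the dictionary field names are hypothetical.

```python
def validate(nodes: list[dict], triples: list[dict], chunks: dict[str, str]) -> list[str]:
    """Return a list of human-readable problems; an empty list means the graph passed."""
    problems = []
    allowed = {"Patient", "EncounterType", "Condition", "Immunization", "Observation"}
    names = set()
    for node in nodes:
        if node["label"] not in allowed:
            problems.append(f"unknown label: {node['label']}")
        names.add(node["name"])
    for t in triples:
        # Both ends of a triple must be nodes that were actually created.
        for end in (t["head"], t["tail"]):
            if end not in names:
                problems.append(f"triple references missing node: {end}")
        # Every linked chunk id must resolve to a stored chunk.
        for cid in t["chunk_ids"]:
            if cid not in chunks:
                problems.append(f"triple references missing chunk: {cid}")
    return problems


nodes = [
    {"name": "Joseph Crona", "label": "Patient"},
    {"name": "DTaP", "label": "Immunization"},
]
triples = [
    {"head": "Joseph Crona", "relation": "received", "tail": "DTaP", "chunk_ids": ["c1", "c9"]},
    {"head": "Joseph Crona", "relation": "has_condition", "tail": "Viral sinusitis", "chunk_ids": []},
]
chunks = {"c1": "DTaP vaccine administered on 2022-12-05."}
problems = validate(nodes, triples, chunks)
```

In this deliberately broken example, `problems` reports the dangling chunk id `c9` and the missing `Viral sinusitis` node.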


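Questions and answers against the medical record knowledge graph

Below is the prompt used to construct the answer once the WhyHow natural language graph query engine has retrieved the relevant context from the knowledge graph.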


    You are an AI assistant specializing in medical records analysis.
    Use the following information to answer the user's question.
    The information is derived from a knowledge graph of patient medical records.
    Relevant Nodes (these represent patients, encounters, and conditions):
    {node_context}
    Relevant Relationships (these show connections between patients, 
encounters, and conditions):
    {triple_context}
    Relevant Medical Records:
    {chunk_context}
    Answer the question using the following steps:
    1. Analyze the medical information, focusing on the specific topic of the 
question
    2. Identify key medical events, conditions, and temporal relationships
    3. Formulate a clear and concise answer that directly addresses the 
question
    4. If there are any uncertainties in the medical information, mention 
them explicitly
    5. If you don't have enough information to answer confidently, say so
    6. Consider temporal relationships between medical events
    7. It's currently {datetime.now().strftime("%Y-%m-%d")}. Consider this 
when interpreting dates
    8. Maintain patient privacy by using only first names or initials when 
necessary
    Question: {question}
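The prompt above is a template with placeholders for nodes, triples, chunks, the current date, and the question. A sketch of how it might be filled is shown below, using an abbreviated version of the template. The `build_prompt` function, its parameters, and the way context is joined into strings are assumptions for illustration, not the authors' code.

```python
from datetime import datetime

# Abbreviated version of the prompt; the numbered answering steps are omitted here.
PROMPT_TEMPLATE = """Relevant Nodes:
{node_context}
Relevant Relationships:
{triple_context}
Relevant Medical Records:
{chunk_context}
It's currently {today}. Consider this when interpreting dates.
Question: {question}"""


def build_prompt(nodes, triples, chunks, question, now=None):
    """Fill the template; `now` can be injected to make the output reproducible."""
    now = now or datetime.now()
    return PROMPT_TEMPLATE.format(
        node_context="\n".join(nodes),
        triple_context="\n".join(f"({h}) -[{r}]-> ({t})" for h, r, t in triples),
        chunk_context="\n".join(chunks),
        today=now.strftime("%Y-%m-%d"),
        question=question,
    )


prompt = build_prompt(
    nodes=["Joseph Crona (Patient)", "DTaP (Immunization)"],
    triples=[("Joseph Crona", "received", "DTaP")],
    chunks=["DTaP vaccine administered on 2022-12-05."],
    question="Which immunizations did Joseph receive in his first year?",
    now=datetime(2024, 11, 9),
)
```

Passing the current date into the prompt is what lets the model interpret relative expressions such as "first year" or "currently resolved" against the dated chunks.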


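Patient treatment analysis


Historical patient analysis

Question: What types of well-child visits and immunizations did Joseph Crona receive in his first year?


Answer: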


To determine the types of well-child visits and immunizations Joseph Crona 
received in his first year, we need to consider the relevant medical events 
and their timing. Joseph was born in 2022, and it is currently November 9, 2024.
Therefore, his first year would span from 2022 to 2023.
**Well-Child Visits:**
- Joseph had an encounter categorized as a "Well child visit," which falls 
under preventive care in pediatrics. This type of visit is typically part of 
routine check-ups for children, especially in their first year.
**Immunizations:**
During his first year, Joseph received the following immunizations, all of 
which are common for pediatric patients:
1. **Hep B, adolescent or pediatric** - Completed
2. **Hib (PRP-OMP)** - Completed
3. **Rotavirus, monovalent** - Completed
4. **IPV (Inactivated Poliovirus Vaccine)** - Completed
5. **DTaP (Diphtheria, Tetanus, and Pertussis)** - Completed
6. **Pneumococcal conjugate PCV 13** - Completed
These immunizations are typically administered during the first year of life 
as part of the standard immunization schedule for infants.
In summary, Joseph Crona received a well-child visit and completed a series of
standard pediatric immunizations during his first year.


Question: What is the sequence of encounters and observations recorded for patients with otitis media?


Answer:


To address the question regarding the sequence of encounters and observations 
for patients with otitis media, we will focus on the medical records of Miles 
Dickinson, as he is the patient identified with this condition.
1. **Condition Onset:**
- Miles Dickinson was diagnosed with otitis media on June 10, 2024. The 
condition is currently resolved.
2. **Encounters:**
- Miles had two types of encounters recorded:
- **Well Child Visit:** This is categorized as a preventive encounter in 
pediatrics.
- **Encounter for Symptom:** This is categorized as a follow-up encounter in 
primary care.
3. **Observations:**
- Various observations were recorded for Miles, including:
- **Pain Severity:** Recorded multiple times with scores ranging from 0 to 4.
- **Body Weight:** Recorded with values ranging from 3.2 kg to 8.2 kg.
- **Head Occipital-frontal Circumference and Percentile:** Recorded with 
specific values in cm and percentiles.
- **Heart Rate:** Recorded with values ranging from 61 to 100 beats per minute.
- **Respiratory Rate:** Recorded with values ranging from 12 to 16 breaths per
minute.
4. **Temporal Sequence:**
- The otitis media diagnosis occurred on June 10, 2024. Observations and 
encounters before this date may have been related to other conditions or 
routine check-ups. Observations and encounters after this date could be 
related to follow-up care for otitis media or other health assessments.
5. **Conclusion:**
- The sequence of encounters and observations for Miles Dickinson includes 
preventive and follow-up encounters, along with various health observations. 
The specific sequence related to otitis media is not detailed in terms of exact
dates for each observation or encounter, but the diagnosis date provides a 
temporal marker for understanding subsequent medical activities.
If more detailed temporal data on each observation and encounter were 
available, a more precise sequence could be established. However, based on 
the current information, this is the overview of the sequence related to 
otitis media for Miles Dickinson.


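Historical multi-patient analysis

Question: How many patients were diagnosed with acute bronchitis, and what are the typical onset patterns?


Answer: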


Based on the provided medical records, there are two patients diagnosed with 
acute bronchitis: Maximo Swaniawski and Bennie Herman.
### Onset Patterns:
- **Maximo Swaniawski**:
- Onset dates for acute bronchitis:
- October 24, 2014
- November 7, 2015
- November 29, 2018
- **Bennie Herman**:
- Onset dates for acute bronchitis:
- October 24, 2014
- November 7, 2015
- November 29, 2018
### Typical Onset Patterns:
The onset dates for both patients are identical, suggesting a pattern where 
acute bronchitis episodes occurred in late October or November. This could 
indicate a seasonal pattern, possibly related to colder weather or other 
seasonal factors that might contribute to respiratory conditions.
Both patients have had their conditions resolved, indicating that the episodes 
of acute bronchitis were acute and not chronic.


Question: Which patients had both viral sinusitis and bronchitis, and what were their earliest onset dates?


Answer:


To determine which patients had both viral sinusitis and bronchitis, 
we need to analyze the conditions associated with each patient and their 
onset dates.
1. **Maximo Swaniawski**:
- Conditions: Acute bronchitis
- Earliest onset of bronchitis: 2014-10-24
2. **Cassandra Monahan**:
- Conditions: Viral sinusitis
- Earliest onset of sinusitis: 2014-12-12
3. **Bennie Herman**:
- Conditions: Acute bronchitis
- Earliest onset of bronchitis: 2014-10-24
4. **Deangelo Price**:
- Conditions: Viral sinusitis
- Earliest onset of sinusitis: 2014-12-12
5. **Joseph Crona**:
- Conditions: Viral sinusitis
- Earliest onset of sinusitis: 2014-12-12
From the information provided, none of the patients have both viral sinusitis 
and bronchitis. Each patient has either one condition or the other, but not 
both. Therefore, there are no patients with both conditions, and consequently, 
no earliest onset dates for both conditions in a single patient.



Source: https://medium.com/enterprise-rag/case-study-turning-doctor-transcripts-into-temporal-medical-record-knowledge-graphs-cf624d4927eb