Amazon Comprehend is a natural language processing (NLP) service that uses machine learning to find insights in unstructured text. It is very easy to use, with no machine learning experience required. You can customize Comprehend for your specific use case, for example creating custom document classifiers to organize your documents into your own categories, or custom entity types that analyze text for your specific terms. However, medical terminology can be very complex and specific to the healthcare domain.
For this reason, last year we introduced Amazon Comprehend Medical, a HIPAA-eligible natural language processing service that makes it easy to use machine learning to extract relevant medical information from unstructured text. Using Comprehend Medical, you can quickly and accurately gather information, such as medical condition, medication, dosage, strength, and frequency, from a variety of sources like doctors’ notes, clinical trial reports, and patient health records.
Today, we are adding the capability of linking the information extracted by Comprehend Medical to medical ontologies.
An ontology provides a declarative model of a domain that defines and represents the concepts existing in that domain, their attributes, and the relationships between them. It is typically represented as a knowledge base, and made available to applications that need to use or share knowledge. Within health informatics, an ontology is a formal description of a health-related domain.
The ontologies supported by Comprehend Medical are RxNorm, for medications, and ICD-10-CM, for medical conditions.
For each ontology, Comprehend Medical returns a ranked list of potential matches. You can use confidence scores to decide which matches make sense, or what might need further review. Let’s see how this works with an example.
Using Ontology Linking
In the Comprehend Medical console, I start by entering some unstructured doctor's notes as input:
First, I use features that were already available in Comprehend Medical to detect medical and protected health information (PHI) entities.
Among the recognized entities (see this post for more info) there are some symptoms and medications. Medications are recognized as generics or brands. Let’s see how we can connect some of these entities to more specific concepts.
I use the new features to link those entities to RxNorm concepts for medications.
In the text, only the parts mentioning medications are detected. The details of the response give more information. For example, let’s look at one of the detected medications:
To look for medical conditions using ICD-10-CM concepts, I provide a different input:
The idea again is to link the detected entities, like symptoms and diagnoses, to specific concepts.
As expected, diagnoses and symptoms are recognized as entities. In the detailed results those entities are linked to the medical conditions in the ICD-10-CM ontology. For example, the two main diagnoses described in the input text are the top results, and specific concepts in the ontology are inferred by Comprehend Medical, each with its own score.
In production, you can use Comprehend Medical via its API to integrate these features into your processing workflow. All the screenshots above are visual renderings of the structured information returned by the API in JSON format. For example, this is the result of detecting medications (RxNorm concepts):
{
"Entities": [
{
"Id": 0,
"Text": "Clonidine",
"Category": "MEDICATION",
"Type": "GENERIC_NAME",
"Score": 0.9933062195777893,
"BeginOffset": 83,
"EndOffset": 92,
"Attributes": [],
"Traits": [],
"RxNormConcepts": [
{
"Description": "Clonidine",
"Code": "2599",
"Score": 0.9148101806640625
},
{
"Description": "168 HR Clonidine 0.00417 MG/HR Transdermal System",
"Code": "998671",
"Score": 0.8215734958648682
},
{
"Description": "Clonidine Hydrochloride 0.025 MG Oral Tablet",
"Code": "892791",
"Score": 0.7519310116767883
},
{
"Description": "10 ML Clonidine Hydrochloride 0.5 MG/ML Injection",
"Code": "884225",
"Score": 0.7171697020530701
},
{
"Description": "Clonidine Hydrochloride 0.2 MG Oral Tablet",
"Code": "884185",
"Score": 0.6776907444000244
}
]
},
{
"Id": 1,
"Text": "Vyvanse",
"Category": "MEDICATION",
"Type": "BRAND_NAME",
"Score": 0.9995427131652832,
"BeginOffset": 148,
"EndOffset": 155,
"Attributes": [
{
"Type": "DOSAGE",
"Score": 0.9910679459571838,
"RelationshipScore": 0.9999822378158569,
"Id": 2,
"BeginOffset": 156,
"EndOffset": 162,
"Text": "50 mgs",
"Traits": []
},
{
"Type": "ROUTE_OR_MODE",
"Score": 0.9997182488441467,
"RelationshipScore": 0.9993833303451538,
"Id": 3,
"BeginOffset": 163,
"EndOffset": 165,
"Text": "po",
"Traits": []
},
{
"Type": "FREQUENCY",
"Score": 0.983681321144104,
"RelationshipScore": 0.9999642372131348,
"Id": 4,
"BeginOffset": 166,
"EndOffset": 184,
"Text": "at breakfast daily",
"Traits": []
}
],
"Traits": [],
"RxNormConcepts": [
{
"Description": "lisdexamfetamine dimesylate 50 MG Oral Capsule [Vyvanse]",
"Code": "854852",
"Score": 0.8883932828903198
},
{
"Description": "lisdexamfetamine dimesylate 50 MG Chewable Tablet [Vyvanse]",
"Code": "1871469",
"Score": 0.7482635378837585
},
{
"Description": "Vyvanse",
"Code": "711043",
"Score": 0.7041242122650146
},
{
"Description": "lisdexamfetamine dimesylate 70 MG Oral Capsule [Vyvanse]",
"Code": "854844",
"Score": 0.23675969243049622
},
{
"Description": "lisdexamfetamine dimesylate 60 MG Oral Capsule [Vyvanse]",
"Code": "854848",
"Score": 0.14077001810073853
}
]
},
{
"Id": 5,
"Text": "Clonidine",
"Category": "MEDICATION",
"Type": "GENERIC_NAME",
"Score": 0.9982216954231262,
"BeginOffset": 199,
"EndOffset": 208,
"Attributes": [
{
"Type": "STRENGTH",
"Score": 0.7696017026901245,
"RelationshipScore": 0.9999960660934448,
"Id": 6,
"BeginOffset": 209,
"EndOffset": 216,
"Text": "0.2 mgs",
"Traits": []
},
{
"Type": "DOSAGE",
"Score": 0.777644693851471,
"RelationshipScore": 0.9999927282333374,
"Id": 7,
"BeginOffset": 220,
"EndOffset": 236,
"Text": "1 and 1 / 2 tabs",
"Traits": []
},
{
"Type": "ROUTE_OR_MODE",
"Score": 0.9981689453125,
"RelationshipScore": 0.999950647354126,
"Id": 8,
"BeginOffset": 237,
"EndOffset": 239,
"Text": "po",
"Traits": []
},
{
"Type": "FREQUENCY",
"Score": 0.99753737449646,
"RelationshipScore": 0.9999889135360718,
"Id": 9,
"BeginOffset": 240,
"EndOffset": 243,
"Text": "qhs",
"Traits": []
}
],
"Traits": [],
"RxNormConcepts": [
{
"Description": "Clonidine Hydrochloride 0.2 MG Oral Tablet",
"Code": "884185",
"Score": 0.9600071907043457
},
{
"Description": "Clonidine Hydrochloride 0.025 MG Oral Tablet",
"Code": "892791",
"Score": 0.8955953121185303
},
{
"Description": "24 HR Clonidine Hydrochloride 0.2 MG Extended Release Oral Tablet",
"Code": "885880",
"Score": 0.8706559538841248
},
{
"Description": "12 HR Clonidine Hydrochloride 0.2 MG Extended Release Oral Tablet",
"Code": "1013937",
"Score": 0.786146879196167
},
{
"Description": "Chlorthalidone 15 MG / Clonidine Hydrochloride 0.2 MG Oral Tablet",
"Code": "884198",
"Score": 0.601354718208313
}
]
}
],
"ModelVersion": "0.0.0"
}
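As a minimal sketch of how the confidence scores in a response like the one above could drive a review step, the Python below picks the highest-scoring RxNorm concept for each entity and flags anything under a cutoff. The 0.9 threshold and the helper names `top_concept` and `triage` are assumptions made for illustration, not part of the service. The `infer_rx_norm` wrapper calls the boto3 `comprehendmedical` client operation of the same name and needs AWS credentials to run.

```python
from typing import Any, Dict, List, Optional, Tuple

# Assumption: an arbitrary cutoff chosen for illustration only.
REVIEW_THRESHOLD = 0.9


def top_concept(entity: Dict[str, Any],
                concepts_key: str = "RxNormConcepts") -> Optional[Dict[str, Any]]:
    """Return the highest-scoring linked concept for an entity, or None."""
    concepts = entity.get(concepts_key, [])
    if not concepts:
        return None
    return max(concepts, key=lambda concept: concept["Score"])


def triage(response: Dict[str, Any],
           concepts_key: str = "RxNormConcepts",
           threshold: float = REVIEW_THRESHOLD
           ) -> Tuple[List[Tuple[str, str, str]], List[str]]:
    """Split entities into confidently linked ones and ones needing review."""
    accepted: List[Tuple[str, str, str]] = []
    needs_review: List[str] = []
    for entity in response.get("Entities", []):
        best = top_concept(entity, concepts_key)
        if best is not None and best["Score"] >= threshold:
            accepted.append((entity["Text"], best["Code"], best["Description"]))
        else:
            needs_review.append(entity["Text"])
    return accepted, needs_review


def infer_rx_norm(text: str) -> Dict[str, Any]:
    """Call the service; boto3 is imported lazily so the helpers work offline."""
    import boto3

    client = boto3.client("comprehendmedical")
    return client.infer_rx_norm(Text=text)


if __name__ == "__main__":
    # A trimmed copy of the response shown above, so no AWS call is needed.
    sample = {
        "Entities": [
            {"Text": "Clonidine", "RxNormConcepts": [
                {"Description": "Clonidine", "Code": "2599",
                 "Score": 0.9148101806640625}]},
            {"Text": "Vyvanse", "RxNormConcepts": [
                {"Description": "lisdexamfetamine dimesylate 50 MG Oral Capsule [Vyvanse]",
                 "Code": "854852", "Score": 0.8883932828903198}]},
        ]
    }
    accepted, needs_review = triage(sample)
    print("Accepted:", accepted)
    print("Needs review:", needs_review)
```

The right threshold depends on your use case; a stricter cutoff sends more entities to manual review.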
Similarly, this is the output when detecting medical conditions (ICD-10-CM concepts):
{
"Entities": [
{
"Id": 0,
"Text": "coronary artery disease",
"Category": "MEDICAL_CONDITION",
"Type": "DX_NAME",
"Score": 0.9933860898017883,
"BeginOffset": 90,
"EndOffset": 113,
"Attributes": [],
"Traits": [
{
"Name": "DIAGNOSIS",
"Score": 0.9682672023773193
}
],
"ICD10CMConcepts": [
{
"Description": "Atherosclerotic heart disease of native coronary artery without angina pectoris",
"Code": "I25.10",
"Score": 0.8199513554573059
},
{
"Description": "Atherosclerotic heart disease of native coronary artery",
"Code": "I25.1",
"Score": 0.4950370192527771
},
{
"Description": "Old myocardial infarction",
"Code": "I25.2",
"Score": 0.18753206729888916
},
{
"Description": "Atherosclerotic heart disease of native coronary artery with unstable angina pectoris",
"Code": "I25.110",
"Score": 0.16535982489585876
},
{
"Description": "Atherosclerotic heart disease of native coronary artery with unspecified angina pectoris",
"Code": "I25.119",
"Score": 0.15222692489624023
}
]
},
{
"Id": 2,
"Text": "atrial fibrillation",
"Category": "MEDICAL_CONDITION",
"Type": "DX_NAME",
"Score": 0.9923409223556519,
"BeginOffset": 116,
"EndOffset": 135,
"Attributes": [],
"Traits": [
{
"Name": "DIAGNOSIS",
"Score": 0.9708861708641052
}
],
"ICD10CMConcepts": [
{
"Description": "Unspecified atrial fibrillation",
"Code": "I48.91",
"Score": 0.7011875510215759
},
{
"Description": "Chronic atrial fibrillation",
"Code": "I48.2",
"Score": 0.28612759709358215
},
{
"Description": "Paroxysmal atrial fibrillation",
"Code": "I48.0",
"Score": 0.21157972514629364
},
{
"Description": "Persistent atrial fibrillation",
"Code": "I48.1",
"Score": 0.16996538639068604
},
{
"Description": "Atrial premature depolarization",
"Code": "I49.1",
"Score": 0.16715925931930542
}
]
},
{
"Id": 3,
"Text": "hypertension",
"Category": "MEDICAL_CONDITION",
"Type": "DX_NAME",
"Score": 0.9993137121200562,
"BeginOffset": 138,
"EndOffset": 150,
"Attributes": [],
"Traits": [
{
"Name": "DIAGNOSIS",
"Score": 0.9734011888504028
}
],
"ICD10CMConcepts": [
{
"Description": "Essential (primary) hypertension",
"Code": "I10",
"Score": 0.6827990412712097
},
{
"Description": "Hypertensive heart disease without heart failure",
"Code": "I11.9",
"Score": 0.09846580773591995
},
{
"Description": "Hypertensive heart disease with heart failure",
"Code": "I11.0",
"Score": 0.09182810038328171
},
{
"Description": "Pulmonary hypertension, unspecified",
"Code": "I27.20",
"Score": 0.0866364985704422
},
{
"Description": "Primary pulmonary hypertension",
"Code": "I27.0",
"Score": 0.07662317156791687
}
]
},
{
"Id": 4,
"Text": "hyperlipidemia",
"Category": "MEDICAL_CONDITION",
"Type": "DX_NAME",
"Score": 0.9998835325241089,
"BeginOffset": 153,
"EndOffset": 167,
"Attributes": [],
"Traits": [
{
"Name": "DIAGNOSIS",
"Score": 0.9702492356300354
}
],
"ICD10CMConcepts": [
{
"Description": "Hyperlipidemia, unspecified",
"Code": "E78.5",
"Score": 0.8378056883811951
},
{
"Description": "Disorders of lipoprotein metabolism and other lipidemias",
"Code": "E78",
"Score": 0.20186281204223633
},
{
"Description": "Lipid storage disorder, unspecified",
"Code": "E75.6",
"Score": 0.18514418601989746
},
{
"Description": "Pure hyperglyceridemia",
"Code": "E78.1",
"Score": 0.1438658982515335
},
{
"Description": "Other hyperlipidemia",
"Code": "E78.49",
"Score": 0.13983778655529022
}
]
},
{
"Id": 5,
"Text": "chills",
"Category": "MEDICAL_CONDITION",
"Type": "DX_NAME",
"Score": 0.9989762306213379,
"BeginOffset": 211,
"EndOffset": 217,
"Attributes": [],
"Traits": [
{
"Name": "SYMPTOM",
"Score": 0.9510533213615417
}
],
"ICD10CMConcepts": [
{
"Description": "Chills (without fever)",
"Code": "R68.83",
"Score": 0.7460958361625671
},
{
"Description": "Fever, unspecified",
"Code": "R50.9",
"Score": 0.11848161369562149
},
{
"Description": "Typhus fever, unspecified",
"Code": "A75.9",
"Score": 0.07497859001159668
},
{
"Description": "Neutropenia, unspecified",
"Code": "D70.9",
"Score": 0.07332006841897964
},
{
"Description": "Lassa fever",
"Code": "A96.2",
"Score": 0.0721040666103363
}
]
},
{
"Id": 6,
"Text": "nausea",
"Category": "MEDICAL_CONDITION",
"Type": "DX_NAME",
"Score": 0.9993392825126648,
"BeginOffset": 220,
"EndOffset": 226,
"Attributes": [],
"Traits": [
{
"Name": "SYMPTOM",
"Score": 0.9175007939338684
}
],
"ICD10CMConcepts": [
{
"Description": "Nausea",
"Code": "R11.0",
"Score": 0.7333012819290161
},
{
"Description": "Nausea with vomiting, unspecified",
"Code": "R11.2",
"Score": 0.20183530449867249
},
{
"Description": "Hematemesis",
"Code": "K92.0",
"Score": 0.1203150525689125
},
{
"Description": "Vomiting, unspecified",
"Code": "R11.10",
"Score": 0.11658868193626404
},
{
"Description": "Nausea and vomiting",
"Code": "R11",
"Score": 0.11535880714654922
}
]
},
{
"Id": 8,
"Text": "flank pain",
"Category": "MEDICAL_CONDITION",
"Type": "DX_NAME",
"Score": 0.9315784573554993,
"BeginOffset": 235,
"EndOffset": 245,
"Attributes": [
{
"Type": "ACUITY",
"Score": 0.9809532761573792,
"RelationshipScore": 0.9999837875366211,
"Id": 7,
"BeginOffset": 229,
"EndOffset": 234,
"Text": "acute",
"Traits": []
}
],
"Traits": [
{
"Name": "SYMPTOM",
"Score": 0.8182812929153442
}
],
"ICD10CMConcepts": [
{
"Description": "Unspecified abdominal pain",
"Code": "R10.9",
"Score": 0.4959934949874878
},
{
"Description": "Generalized abdominal pain",
"Code": "R10.84",
"Score": 0.12332479655742645
},
{
"Description": "Lower abdominal pain, unspecified",
"Code": "R10.30",
"Score": 0.08319114148616791
},
{
"Description": "Upper abdominal pain, unspecified",
"Code": "R10.10",
"Score": 0.08275411278009415
},
{
"Description": "Jaw pain",
"Code": "R68.84",
"Score": 0.07797083258628845
}
]
},
{
"Id": 10,
"Text": "numbness",
"Category": "MEDICAL_CONDITION",
"Type": "DX_NAME",
"Score": 0.9659366011619568,
"BeginOffset": 255,
"EndOffset": 263,
"Attributes": [
{
"Type": "SYSTEM_ORGAN_SITE",
"Score": 0.9976192116737366,
"RelationshipScore": 0.9999089241027832,
"Id": 11,
"BeginOffset": 271,
"EndOffset": 274,
"Text": "leg",
"Traits": []
}
],
"Traits": [
{
"Name": "SYMPTOM",
"Score": 0.7310190796852112
}
],
"ICD10CMConcepts": [
{
"Description": "Anesthesia of skin",
"Code": "R20.0",
"Score": 0.767346203327179
},
{
"Description": "Paresthesia of skin",
"Code": "R20.2",
"Score": 0.13602739572525024
},
{
"Description": "Other complications of anesthesia",
"Code": "T88.59",
"Score": 0.09990577399730682
},
{
"Description": "Hypothermia following anesthesia",
"Code": "T88.51",
"Score": 0.09953102469444275
},
{
"Description": "Disorder of the skin and subcutaneous tissue, unspecified",
"Code": "L98.9",
"Score": 0.08736388385295868
}
]
}
],
"ModelVersion": "0.0.0"
}
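The traits in this response distinguish diagnoses from symptoms, which can be useful when organizing the linked codes. The sketch below, written under the assumption that each entity should be filed under its highest-scoring trait, groups entities by trait together with their top ICD-10-CM code. The helper name `codes_by_trait` and the `UNSPECIFIED` fallback label are hypothetical. The `infer_icd10_cm` wrapper calls the boto3 `comprehendmedical` client operation of the same name and needs AWS credentials to run.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple


def codes_by_trait(response: Dict[str, Any]) -> Dict[str, List[Tuple[str, str]]]:
    """Group (entity text, top ICD-10-CM code) pairs by the entity's main trait."""
    grouped: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for entity in response.get("Entities", []):
        concepts = entity.get("ICD10CMConcepts", [])
        if not concepts:
            continue  # nothing was linked for this entity
        best = max(concepts, key=lambda concept: concept["Score"])
        traits = entity.get("Traits", [])
        # Assumption: use the highest-scoring trait; label is hypothetical if none.
        if traits:
            trait = max(traits, key=lambda t: t["Score"])["Name"]
        else:
            trait = "UNSPECIFIED"
        grouped[trait].append((entity["Text"], best["Code"]))
    return dict(grouped)


def infer_icd10_cm(text: str) -> Dict[str, Any]:
    """Call the service; boto3 is imported lazily so the helper works offline."""
    import boto3

    client = boto3.client("comprehendmedical")
    return client.infer_icd10_cm(Text=text)


if __name__ == "__main__":
    # A trimmed copy of the response shown above, so no AWS call is needed.
    sample = {
        "Entities": [
            {"Text": "hypertension",
             "Traits": [{"Name": "DIAGNOSIS", "Score": 0.9734011888504028}],
             "ICD10CMConcepts": [
                 {"Description": "Essential (primary) hypertension",
                  "Code": "I10", "Score": 0.6827990412712097}]},
            {"Text": "chills",
             "Traits": [{"Name": "SYMPTOM", "Score": 0.9510533213615417}],
             "ICD10CMConcepts": [
                 {"Description": "Chills (without fever)",
                  "Code": "R68.83", "Score": 0.7460958361625671}]},
        ]
    }
    for trait, pairs in codes_by_trait(sample).items():
        print(trait, pairs)
```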
Available Now
You can use Amazon Comprehend Medical via the console, AWS Command Line Interface (CLI), or AWS SDKs. With Comprehend Medical, you pay only for what you use. You are charged based on the amount of text processed on a monthly basis, depending on the features you use. For more information, please see the Comprehend Medical section on the Comprehend Pricing page. Ontology Linking is available in all regions where Amazon Comprehend Medical is offered, as described in the AWS Regions Table.
The new ontology linking APIs make it easy to detect medications and medical conditions in unstructured clinical text and link them to RxNorm and ICD-10-CM codes respectively. This new feature can help you reduce the cost, time and effort of processing large amounts of unstructured medical text with high accuracy.
— Danilo
Source: AWS News