Wednesday, April 22, 2020

Using word2vec on Cancer trials for similar Treatments, Biomarkers and Indications


This is an attempt to use gensim to find indications, treatments, or biomarkers similar to a given input. The model was trained on a collection of 1,000,000 eligibility criteria from oncology trials.

The aim of this analysis was to explore alternative methods for quickly finding clinical concepts similar to a given input. The workflow is pretty straightforward: create a corpus out of 1 million eligibility criteria, remove stop words, tokenize, then train word2vec. Essentially, word2vec represents every word as a vector in an N-dimensional space; we chose N = 100.
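As a rough illustration of that workflow, here is a minimal sketch rather than the actual code behind this post. The stop-word list, the regex tokenizer, the two sample criteria, and the function names are all made up for the example, and the training call assumes gensim 4.x (where the dimension parameter is called `vector_size`).

```python
import re

# A tiny illustrative stop-word list; a real run would use a fuller one.
STOP_WORDS = {"a", "an", "and", "are", "as", "be", "for", "have", "in",
              "is", "must", "of", "or", "the", "to", "with"}


def tokenize(criterion):
    """Lowercase one eligibility criterion, split it into tokens, drop stop words."""
    tokens = re.findall(r"[a-z0-9][a-z0-9\-]*", criterion.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def build_corpus(criteria):
    """Turn raw criteria strings into the list of token lists that Word2Vec expects."""
    return [tokenize(c) for c in criteria]


def train_word2vec(corpus, dimensions=100):
    """Train a word2vec model with N = 100 dimensions (assumes gensim >= 4.0)."""
    # Imported here so the preprocessing above can be tried without gensim installed.
    from gensim.models import Word2Vec
    return Word2Vec(sentences=corpus, vector_size=dimensions,
                    window=5, min_count=1, workers=1, seed=1)


# Two made-up criteria standing in for the real 1-million-line corpus.
sample_criteria = [
    "Patients must have a confirmed diagnosis of AML or high-risk MDS",
    "Prior treatment with a PD-1 inhibitor is not allowed",
]
corpus = build_corpus(sample_criteria)
```

With gensim installed, `model = train_word2vec(corpus)` would train on the corpus, and `model.wv["aml"]` would return the 100-number vector for that token. Two sentences are far too few to learn anything meaningful; the sample only shows the shape of the data.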

Full code here.

As you can see below, AML is represented with 100 numbers: this is the vector representation of the word 'AML', referred to as its embedding.


As the next step, we can input any clinical term and ask the program for the top 10 terms most similar to it. These 10 terms are computed using the embeddings. Some interesting findings are below.
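Before the examples, here is a rough sketch of how such a ranking can be computed from embeddings: score every other term by cosine similarity to the input and keep the highest-scoring ones. The 3-dimensional vectors below are invented purely for illustration (they are not model output), and the function names are hypothetical. In gensim, the equivalent lookup is `model.wv.most_similar("aml", topn=10)`.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(term, embeddings, topn=10):
    """Rank every other term by cosine similarity to `term`, best first."""
    query = embeddings[term]
    scored = [(other, cosine_similarity(query, vec))
              for other, vec in embeddings.items() if other != term]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


# Invented toy vectors, NOT taken from the trained model.
toy_embeddings = {
    "aml": [0.9, 0.1, 0.0],
    "mds": [0.8, 0.2, 0.1],
    "carboplatin": [0.0, 0.9, 0.3],
    "androgen": [0.0, 0.1, 0.9],
}
neighbours = most_similar("aml", toy_embeddings, topn=2)
```

Here `neighbours` is a list of `(term, score)` pairs, the same shape gensim returns. With the real 100-dimensional embeddings, the ranking reflects which terms appear in similar contexts across the eligibility criteria.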

Example 1: AML

Acute Myeloid Leukemia (AML) is a cancer of the blood and bone marrow with excess immature white blood cells.

AML progresses rapidly, hence the name "acute" myeloid leukemia.
^ This is interesting, as MDS (myelodysplastic syndrome) can be a precursor to AML, and MDS trials are often combined with AML trials.


Example 2: Pembrolizumab

Pembrolizumab is a blockbuster drug from Merck (trade name Keytruda).

Pembrolizumab is a checkpoint inhibitor (CPI) that targets PD-1, blocking its interaction with PD-L1. Interestingly, we can see a few more CPIs in the results, like Nivolumab and Tremelimumab. Additionally, these CPIs are often combined with mAbs like Bevacizumab or chemotherapy like Carboplatin.


Example 3: Enzalutamide

Enzalutamide is a novel anti-androgen drug (NAAD) used for treating indications like castration-resistant prostate cancer (CRPC).

This doesn't align well with expectations. I was hoping to get other NAADs like Apalutamide or Abiraterone. However, the algorithm does identify terms like 'Androgen' as relevant.


Example 4: Estrogen

^ This is again a moderately useful result. Estrogen is a hormone, and the estrogen receptor plays a key role in a type of breast cancer; the other two markers are the progesterone receptor and HER2 (both missing). When all three have negative status, we call it triple-negative breast cancer, which you can see above as TNBC.


Example 5: Sorafenib



