Curriculum vitae

Work experience

PhD Candidate, Oct. 2019 - Present
1. Conduct research on cell-cell communication in its spatial context in cancer, using imaging and sequencing data.
2. Propose STRISH, a novel computational package that identifies cell colocalisation based on ligand-receptor co-expression throughout the tissue, using spatially resolved multi-omics data (spatial transcriptomic and proteomic data).
3. Adapt spatial analyses (e.g. Delaunay triangulation, k-nearest neighbours, Ripley’s K function) and heterogeneity scores (e.g. Shannon entropy, Rao’s quadratic entropy, graph modularity) to identify spatial differences across cancer subtypes.
4. Collaborate with other research groups to develop and conduct analyses of scRNA-seq, ATAC-seq and spatial omics data.
5. Tutor for the course Introduction to Bioinformatics (BINF6000).
Bioinformatics Scientist, Mar. 2019 - Oct. 2019
1. Design workflows to analyse DNA sequencing data for building the largest population-scale biomedical database of Vietnamese genomes.
2. Develop an API to automate data QC and upload metadata to the genomic variant database for the Vietnamese population.
3. Benchmark the accuracy and speed of variant calling platforms, including DRAGEN (Illumina), DeepVariant (Google), and GATK.
AI Engineer, Oct. 2018 - Mar. 2019
1. Develop the backbone of a document analysis solution that uses machine learning (ML) and deep learning (DL) to extract information from contracts.
2. Leverage the state-of-the-art NLP transformer model BERT as an alternative information extraction model, achieving over 90% overall accuracy and an F1 score of 0.86 with limited training data (approx. 100 samples).
Visiting Scholar, Mar. 2018 - Sep. 2018
1. Chromosome conformation capture techniques (e.g. Hi-C) provide a wealth of data on the three-dimensional architecture of genomes.
2. Introduce dimensionality reduction methods (autoencoder, PCA and SVG) and a robust differential analysis for noisy, high-dimensional chromosome 3D maps.
3. The autoencoder achieved a true positive rate 15%-30% higher than PCA, in proportion to the size of the block considered.

Education

Projects

  1. Develop methods to analyse cellular communication using high-resolution images
    • Propose a novel method to detect cell colocalisation throughout the tissue and demonstrate the significance of the results with a statistical test. The results contributed to a publication and a Python package named STRISH.
    • Adapt spatial analyses and heterogeneity scoring metrics to identify cell crosstalk within the tissue microenvironment. The analyses contributed to research manuscripts on spatial omics in cancer; examples are available in Example 1 and Example 2.
  2. Automated information retrieval from legal contracts
    • Participate in developing two products that apply machine learning approaches (e.g. MLP, tf-idf) to retrieve information from legal documents.
    • Develop an alternative approach based on pre-trained BERT that improves on the existing solution, achieving over 90% overall accuracy and an F1 score of 0.86.
  3. Comparison of genome 3D structures with machine learning approaches
    • Inactive genes can interact with active genes through distal enhancer mechanisms, including proximity in 3D space. However, chromosome 3D organisation data are high-dimensional and noisy, so applying dimensionality reduction enables a more robust differential analysis of chromosome 3D maps.
    • Compare preprocessing methods, including an autoencoder, PCA and SVG, for capturing the interaction between inactive and active genes via proximity in 3D space.
  4. Image Haze Removal with Generative Adversarial Networks (GAN)
    • The effect of fog on image quality varies with the depth of the landscape, and traditional haze removal methods struggle with sky regions and scattered-light atmospheres.
    • Adapt a GAN model to the single-image haze removal task, partially resolving these challenges (results are summarised here).

Skills and Technologies

Programming Languages: Python, R, Bash, JavaScript, Java, C/C++, SQL, LaTeX

Developer Tools: Git, Docker, Anaconda, Slurm, PBS

Frameworks and Tools: scikit-learn, pandas, scikit-image, SciPy, SimpleITK, TensorFlow, Keras, QuPath, CellProfiler, napari, Seurat, Scanpy

Publications
