Skip to content

statgenetics/TransCisPredict

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

17 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

TransCisPredict: A Comprehensive Framework for Protein Level Prediction

This repository contains the complete analysis pipeline for the comprehensive framework to predict protein levels that incorporates both cis- and trans- variants to facilitate conducting proteome-wide association studies (PWAS) on a biobank scale. The weights that have been generated from UK Biobank are also available through Synapse ID: 69052240.

Repository Structure

TransCisPredict/
├── README.md                    # This file
├── readme_scripts.md            # Detailed analysis pipeline documentation
├── step1_data_processing/       # Raw data processing and QC
├── step2_covariate_regression/  # Covariate adjustment
├── step3_ld_block_selection/    # Genomic region selection
├── step4_cross_validation/      # CV using BayesR, SuSiE, LASSO, and Elastic Net
├── step5_cv_evaluation/         # CV performance evaluation
├── step6_whole_sample_analysis/ # Weight Estimation using the "optimal" method
├── step7_population_prediction/ # Protein expression level prediction in the target sample
├── step8_pwas_analysis/         # Proteome-wide association analyses (PWAS)
└── utilities/                   # Commonly used functions

Quick Start

a. Setup Environment: Install required R packages (see readme_scripts.md)

b. Configure Paths: Modify placeholder paths in each script's configuration section

c. Obtain Prediction Weights: Either download pre-computed weights from Synapse ID: 69052240, or generate custom weights by running steps 1-6 of the analysis pipeline

d. Predict Protein Levels: Use the obtained weights to predict protein expression levels in your target sample (see step7 scripts)

e. Perform PWAS: Conduct proteome-wide association studies using the predicted protein levels (see step8 script)

Analysis Pipeline

The complete analysis pipeline with comprehensive documentation and usage examples is available in readme_scripts.md. Each step is self-contained with clear input/output specifications.

Applications

This framework enables:

  • Weight Estimation: Generate weight estimation between protein expression level and genetic vriants from the reference sample
  • Protein Expression Prediction: Predict heritable component of protein levels using only genotype data from the target sample
  • PWAS in Target Cohort: Conduct PWAS to identify associations with complex traits

Citation

Please cite the associated manuscript when using these analysis scripts and prediction weights for research purposes:

Dong R, Lamb D, Wang GT, DeWan AT, Leal SM. Leveraging cis- and trans-variants to improve plasma protein level prediction for proteome-wide association studies (2025). Submitted.

License

This repository is intended for academic research and manuscript reproducibility. Please follow Guidance on Researcher Responsibilities for UK Biobank resource.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages