TransCisPredict: A Comprehensive Framework for Protein Level Prediction

This repository contains the complete analysis pipeline for a framework that predicts protein levels using both cis- and trans-variants, making it feasible to conduct proteome-wide association studies (PWAS) at biobank scale. Prediction weights generated from UK Biobank data are also available through Synapse ID: 69052240.

Repository Structure

TransCisPredict/
├── README.md                    # This file
├── readme_scripts.md            # Detailed analysis pipeline documentation
├── step1_data_processing/       # Raw data processing and QC
├── step2_covariate_regression/  # Covariate adjustment
├── step3_ld_block_selection/    # Genomic region selection
├── step4_cross_validation/      # CV using BayesR, SuSiE, LASSO, and Elastic Net
├── step5_cv_evaluation/         # CV performance evaluation
├── step6_whole_sample_analysis/ # Weight estimation using the "optimal" method
├── step7_population_prediction/ # Protein expression level prediction in the target sample
├── step8_pwas_analysis/         # Proteome-wide association analyses (PWAS)
└── utilities/                   # Commonly used functions

Quick Start

a. Setup Environment: Install required R packages (see readme_scripts.md)

b. Configure Paths: Modify placeholder paths in each script's configuration section

c. Obtain Prediction Weights: Either download pre-computed weights from Synapse ID: 69052240, or generate custom weights by running steps 1-6 of the analysis pipeline

d. Predict Protein Levels: Use the obtained weights to predict protein expression levels in your target sample (see step7 scripts)

e. Perform PWAS: Conduct proteome-wide association studies using the predicted protein levels (see step8 script)
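As a rough illustration of step d, the sketch below assumes that the prediction weights are per-variant effect sizes and that a predicted protein level is the weighted sum of an individual's allele dosages (0, 1, or 2 copies of the effect allele). This linear form is a common convention for genetic prediction models and is an assumption here, not something this README confirms. The variant IDs, weights, sample names, and the function `predict_protein_level` are all hypothetical, and Python is used only for illustration; the repository's own step7 scripts are written in R.

```python
from typing import Dict


def predict_protein_level(weights: Dict[str, float], dosages: Dict[str, float]) -> float:
    """Return the weighted sum of allele dosages over variants found in both inputs.

    Variants missing from the genotype data are skipped, i.e. they contribute
    zero. This is a simplification made for the sketch.
    """
    shared = weights.keys() & dosages.keys()
    return sum(weights[v] * dosages[v] for v in shared)


# Hypothetical weights for one protein (variant ID -> effect size).
weights = {
    "chr1:1000:A:G": 0.5,     # e.g. a cis variant near the protein's gene
    "chr7:2000:C:T": -0.25,   # e.g. a trans variant
    "chr12:3000:G:A": 0.125,  # e.g. another trans variant
}

# Hypothetical effect-allele dosages for two individuals in the target sample.
samples = {
    "sample_1": {"chr1:1000:A:G": 2, "chr7:2000:C:T": 1, "chr12:3000:G:A": 0},
    "sample_2": {"chr1:1000:A:G": 0, "chr7:2000:C:T": 2},  # third variant not genotyped
}

predicted = {sid: predict_protein_level(weights, d) for sid, d in samples.items()}
print(predicted)
```

The same calculation would be repeated for every protein with available weights, giving one predicted level per individual and protein.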

Analysis Pipeline

Comprehensive documentation and usage examples for the complete analysis pipeline are available in readme_scripts.md. Each step is self-contained, with clear input/output specifications.

Applications

This framework enables:

Weight Estimation: Estimate weights relating genetic variants to protein expression levels in the reference sample
Protein Expression Prediction: Predict the heritable component of protein levels using only genotype data from the target sample
PWAS in Target Cohort: Conduct PWAS to identify associations between predicted protein levels and complex traits
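To make the PWAS application concrete, the sketch below tests each predicted protein against a trait with a simple least-squares regression and reports the slope and its t-statistic. It is a minimal illustration under stated assumptions: the protein names, data values, and the function `simple_association` are hypothetical, the trait is treated as quantitative, and no covariates are included. The model actually used in the step8 script may differ (see readme_scripts.md), and Python is used only for illustration because the repository's scripts are written in R.

```python
from math import sqrt
from typing import Dict, List, Tuple


def simple_association(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Regress y on x by ordinary least squares; return (slope, t-statistic)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares, then the standard error of the slope.
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = sqrt(rss / (n - 2) / sxx)
    return slope, slope / se


# Hypothetical predicted protein levels for five individuals (protein -> values).
predicted_levels: Dict[str, List[float]] = {
    "PROT_A": [-1.0, -0.5, 0.0, 0.5, 1.0],
    "PROT_B": [0.5, -1.0, 1.0, -0.5, 0.0],
}

# Hypothetical quantitative trait measured in the same five individuals.
trait = [1.0, 2.0, 2.5, 4.0, 4.5]

results = {name: simple_association(levels, trait)
           for name, levels in predicted_levels.items()}

for name, (slope, t_stat) in results.items():
    print(f"{name}: slope={slope:.3f}, t={t_stat:.2f}")
```

In this toy data PROT_A tracks the trait closely and PROT_B does not, so PROT_A receives the larger absolute t-statistic. An analysis across the whole proteome would also need to account for the number of proteins tested.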

Citation

Please cite the associated manuscript when using these analysis scripts and prediction weights for research purposes:

Dong R, Lamb D, Wang GT, DeWan AT, Leal SM. Leveraging cis- and trans-variants to improve plasma protein level prediction for proteome-wide association studies (2025). Submitted.

License

This repository is intended for academic research and manuscript reproducibility. Please follow the Guidance on Researcher Responsibilities for the UK Biobank resource.
