Skat package

7/5/2023

Individual-level disease risk stratification is the foundation of personalized medicine. It is largely dependent on genomics and allows characterization of the molecular differences between individuals towards disease risk prediction. This enables the design of treatment regimens that would ideally prescribe the correct drug at the correct dose for the correct individual. With technologies getting cheaper, high-throughput microarray and next-generation sequencing (NGS) data, including whole-genome and whole-exome data comprising hundreds of thousands to millions of variants, are readily available. However, about 40% of all known variants are of uncertain significance, which challenges their clinical relevance. Hence, the role of these variants with respect to a specific disease etiology needs to be identified and verified before the corresponding genes of interest can be further interrogated. Genome-wide association studies (GWASs) mostly focus on variant-by-variant testing for association of common variants (minor allele frequency (MAF) ≥ 0.05). Utilizing sequence data can potentially better evaluate the genetic burden of low-frequency and rare (MAF < 0.05) variants on disease risk. There is precedent that the disease risk variants at a given locus might include novel, rare, low-frequency, and common genetic variants. Hence, many methods have been proposed that conduct group-wise association tests taking into account the disease burden of both rare and common variants. Table 1 provides a snapshot of a few of the many popular existing software packages, tools, web servers and databases that allow genetic association, functional and annotation analysis at the variant and gene levels. A majority of group-based tests down-weight common variants and up-weight rare variants, an approach that is potentially error-prone, as the relative effect of common and rare variations on the burden of a disease is unknown prior to testing.
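To make the weighting of rare versus common variants concrete, here is a minimal base R sketch. It uses the Beta(1, 25) density of the MAF, which is the default weighting in the SKAT R package; the MAF values are hypothetical and chosen only for illustration, and this is not necessarily how CLIN_SKAT derives its own weights.

```r
# Hypothetical minor allele frequencies, ordered from rare to common
maf <- c(0.001, 0.01, 0.05, 0.2, 0.4)

# Beta(1, 25) density evaluated at each MAF: rare variants receive large
# weights, while common variants receive weights close to zero
w <- dbeta(maf, 1, 25)

# Show each MAF next to its weight
data.frame(maf = maf, weight = round(w, 3))
```

Because the weight falls steeply as MAF increases, a common variant with a real effect contributes very little to the test statistic, which is the risk the paragraph above points out.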
Statistical association analysis is often underpowered due to low sample sizes and high numbers of variants to be tested, limiting detection of causal ones. Therefore, retaining a subset of variants that are biologically meaningful seems to be a more effective strategy for identifying explainable associations while reducing the degrees of freedom. CLIN_SKAT offers users a one-stop R package that identifies disease risk variants with improved power via a series of tailor-made procedures that allow dimension reduction by retaining functionally relevant variants and incorporating ethnicity-based priors. Furthermore, it also eliminates the requirement for high computational resources and bioinformatics expertise.
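The idea of retaining a subset of variants before a gene-level test can be sketched with the underlying SKAT package from CRAN. Everything below is an assumption for illustration: the genotypes, the case-control status and the `relevant` flag are simulated, and CLIN_SKAT's own functions are not shown because their names are not given here.

```r
# Assumes the CRAN package SKAT is installed: install.packages("SKAT")
library(SKAT)

set.seed(1)
n <- 200   # individuals (hypothetical)
p <- 30    # variants in one gene (hypothetical)

# Simulated genotypes coded 0/1/2 (n x p matrix) and a simulated
# case-control status
maf <- runif(p, 0.01, 0.3)
Z <- sapply(maf, function(f) rbinom(n, 2, f))
y <- rbinom(n, 1, 0.5)

# Hypothetical annotation flag marking the variants treated as clinically
# relevant; only these columns are kept, which reduces the number of
# variants entering the test
relevant <- rep(c(TRUE, FALSE), length.out = p)
Z_sub <- Z[, relevant, drop = FALSE]

# Null model for a dichotomous outcome without covariates, followed by
# the gene-level SKAT test on the retained variants
obj <- SKAT_Null_Model(y ~ 1, out_type = "D")
res <- SKAT(Z_sub, obj)
res$p.value
```

Since the phenotype here is simulated independently of the genotypes, the p-value carries no biological meaning; the sketch only shows where the filtering step sits relative to the test.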

Availability of next-generation sequencing data allows low-frequency and rare variants to be studied through strategies other than the commonly used genome-wide association studies (GWAS). Rare variants are important keys towards explaining the heritability of complex diseases that remains unexplained by common variants due to their low effect sizes. However, analysis strategies struggle to keep up with the huge amount of data at disposal, creating a bottleneck. This study describes CLIN_SKAT, an R package that provides users with an easily implemented analysis pipeline with the goal of (i) extracting clinically relevant variants (both rare and common), followed by (ii) gene-based association analysis by grouping the selected variants. CLIN_SKAT offers four simple functions that can be used to obtain clinically relevant variants, map them to genes or gene sets, calculate weights from global healthy populations and conduct weighted case–control analysis. CLIN_SKAT introduces improvements by adding certain pre-analysis steps and customizable features to make the SKAT results clinically more meaningful. Moreover, it offers several plot functions that can be used to obtain visualizations for interpreting the analysis results. CLIN_SKAT is available on Windows/Linux/MacOS and is operative for R version 4.0.4 or later. It can be freely installed from GitHub through devtools::install_github("ShihChingYu/CLIN_SKAT", force=T) and executed by loading the package into R using library(CLIN_SKAT). All outputs (tabular and graphical) can be downloaded in simple, publishable formats.
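The installation steps quoted above can be collected into a short R session. The check for devtools is an added convenience and an assumption about the setup; the install and load calls are the ones given in the text.

```r
# Install devtools from CRAN if it is not already available
# (assumption: a CRAN mirror is configured)
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}

# Install CLIN_SKAT from GitHub; force = TRUE reinstalls even if the
# same version is already present
devtools::install_github("ShihChingYu/CLIN_SKAT", force = TRUE)

# Load the package into the current R session
library(CLIN_SKAT)
```

R version 4.0.4 or later is required, so it may help to check `R.version.string` before installing.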
