Title: | Simulate SNP Matrix, Phenotype and Genotypic Effects |
---|---|
Description: | Simulate genotypes in a SNP (single nucleotide polymorphism) matrix as random numbers from a uniform distribution for diploid organisms (coded 0, 1, 2), Sikorska et al. (2013) <doi:10.1186/1471-2105-14-166>, or simulate a half-sib or full-sib SNP matrix from real or simulated parental SNP data, assuming Mendelian segregation. Simulate phenotypic traits for real or simulated SNP data, controlled by a specified number of quantitative trait loci and their effects, sampled from a Normal or a Uniform distribution, assuming a purely additive model. This is useful for testing association and genomic prediction models or for educational purposes. |
Authors: | Martin Nahuel Garcia [aut, cre] |
Maintainer: | Martin Nahuel Garcia <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.1 |
Built: | 2025-02-23 04:31:20 UTC |
Source: | https://github.com/mngar/simulmgf |
Simulate a SNP matrix of random genotypes coded 0, 1 and 2.
simGeno(Nind, Nmarkers)
Nind |
number of individuals to simulate. |
Nmarkers |
number of SNP markers to generate. |
a matrix of dimensions Nind x Nmarkers.
Martin Nahuel Garcia <orcid:0000-0001-5760-986X>
Wu, R., Ma, C., & Casella, G. (2007). Statistical genetics of quantitative traits: linkage, maps and QTL. Springer Science & Business Media.
# simulate 100 individuals and 1000 SNPs
set.seed(123)
simGeno(100, 1000)
#[1] "simG was generated"
dim(simG); simG[1:5, 1:5]
#[1]  100 1000
#     [,1] [,2] [,3] [,4] [,5]
#[1,]    0    1    0    2    2
#[2,]    2    0    2    0    0
#[3,]    1    1    1    2    2
#[4,]    2    2    1    2    1
#[5,]    2    1    1    1    1
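The Details given for simulN state that genotypes are the rounded values of draws from a uniform distribution on (-.5, 2.5), and that simGeno works the same way. A minimal sketch of that idea follows; `sketch_geno` is a hypothetical helper written for illustration and is not the package's own code.

```r
# Hypothetical helper (not the package's code): draw genotypes by rounding
# uniform(-0.5, 2.5) values, so that 0, 1 and 2 are equally likely.
sketch_geno <- function(n_ind, n_markers) {
  draws <- round(runif(n_ind * n_markers, min = -0.5, max = 2.5))
  matrix(draws, nrow = n_ind, ncol = n_markers)
}

set.seed(1)
g <- sketch_geno(100, 1000)
dim(g)                 # 100 individuals by 1000 markers
table(g) / length(g)   # each code appears with frequency close to 1/3
```

Under this scheme the three genotype codes are equally frequent at every marker, which differs from Hardy-Weinberg proportions for most allele frequencies.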
Simulate a phenotype from a genotype matrix, with QTL effects sampled at random from a Normal distribution.
simPheno(x, Nqtl, Esigma, Pmean, Perror)
x |
SNP matrix coded as 0 (homozygous), 1 (heterozygous) and 2 (homozygous for the other allele) |
Nqtl |
number of QTLs to simulate |
Esigma |
standard deviation of the QTL effects, which are distributed N(0, Esigma^2) |
Pmean |
phenotype mean |
Perror |
standard deviation of error (portion of phenotype not explained by genomic information) |
An object of class list containing the trait, the markers associated and their effects.
pheno |
vector with the trait values simulated. |
QTN |
column indices in the SNP matrix of the associated SNPs. |
Meffects |
effects of the associated SNPs. |
Martin Nahuel Garcia <orcid:0000-0001-5760-986X>
Wu, R., Ma, C., & Casella, G. (2007). Statistical genetics of quantitative traits: linkage, maps and QTL. Springer Science & Business Media.
set.seed(123)
simGeno(100, 1000)
#[1] "simG was generated"
simPheno(simG, 50, .8, 12, .5)
#[1] "simP was generated"
str(simP)
#List of 3
#$ pheno   : num [1:100, 1] 24 20.5 15.6 13.6 18.5 ...
#$ QTN     : int [1:50] 568 474 529 349 45 732 416 51 413 514 ...
#$ Meffects: num [1:50] 0.2396 -0.138 0.906 0.0186 1.0687 ...
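To make the roles of the arguments concrete, the following sketch builds a phenotype under a purely additive model: pick `Nqtl` columns at random, draw their effects from N(0, Esigma^2), and add the phenotype mean and a Normal error. The function `sketch_pheno` and its argument names are hypothetical, and the exact formula used by simPheno is an assumption here.

```r
# Hypothetical sketch of an additive phenotype model; all names are illustrative.
sketch_pheno <- function(x, n_qtl, e_sigma, p_mean, p_error) {
  qtn <- sample(ncol(x), n_qtl)                     # columns acting as QTLs
  effects <- rnorm(n_qtl, mean = 0, sd = e_sigma)   # one effect per QTL
  genetic <- x[, qtn, drop = FALSE] %*% effects     # additive genetic value
  noise <- rnorm(nrow(x), mean = 0, sd = p_error)   # part not explained by markers
  list(pheno = p_mean + genetic + noise, QTN = qtn, Meffects = effects)
}

set.seed(1)
x <- matrix(sample(0:2, 100 * 1000, replace = TRUE), nrow = 100)
p <- sketch_pheno(x, n_qtl = 50, e_sigma = 0.8, p_mean = 12, p_error = 0.5)
str(p)
```

The returned list mirrors the three elements documented above (pheno, QTN, Meffects) so the sketch can be compared side by side with the output of simPheno.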
Simulate full-sib progeny genotypes from the genotypes of the parents (two matrices with the same dimensions). Parents are mated in row order: row i of the first matrix is paired with row i of the second. The organisms are assumed to be diploid.
simulFS(x, y, Nprogeny)
x |
genotype matrix of a set of moms |
y |
genotype matrix of a set of dads |
Nprogeny |
number of progeny genotypes to generate from each pair of parents |
a matrix of dimensions (nrow(x)*Nprogeny) x ncol(x)
Martin Nahuel Garcia <orcid:0000-0001-5760-986X>
Wu, R., Ma, C., & Casella, G. (2007). Statistical genetics of quantitative traits: linkage, maps and QTL. Springer Science & Business Media.
# simulate 100 individuals and 1000 SNPs
set.seed(123)
simGeno(100, 1000)
#[1] "simG was generated"
# simulate the genotype of 5 FS from 3 pairs of parents
simulFS(simG[1:3,], simG[4:6,], 5)
#[1] "simulatedFS was generated"
dim(simulatedFS)
#[1]   15 1000
# The first 5 individuals are progeny of mom 1 and dad 1, the second 5 individuals
# are progeny of mom 2 and dad 2, and so on.
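Mendelian segregation at independent loci can be sketched as follows: a parent carrying g copies of the counted allele (g = 0, 1 or 2) transmits that allele with probability g / 2, and the progeny genotype is the sum of the two transmitted alleles. The helpers `sketch_gamete` and `sketch_full_sibs` are hypothetical and only illustrate the idea; they are not the code behind simulFS.

```r
# Hypothetical helpers illustrating Mendelian segregation at independent loci.
# A parent with genotype g passes on the counted allele with probability g / 2.
sketch_gamete <- function(parent) {
  rbinom(length(parent), size = 1, prob = parent / 2)
}

sketch_full_sibs <- function(moms, dads, n_progeny) {
  stopifnot(identical(dim(moms), dim(dads)))
  # Repeat each pair index so that the sibs of one pair occupy adjacent rows.
  pair <- rep(seq_len(nrow(moms)), each = n_progeny)
  do.call(rbind, lapply(pair, function(i) {
    sketch_gamete(moms[i, ]) + sketch_gamete(dads[i, ])
  }))
}

set.seed(1)
parents <- matrix(sample(0:2, 6 * 10, replace = TRUE), nrow = 6)
fs <- sketch_full_sibs(parents[1:3, ], parents[4:6, ], n_progeny = 5)
dim(fs)  # 15 progeny by 10 markers
```

As a sanity check on the logic, crossing a parent that is 0 at every marker with one that is 2 at every marker must give progeny that are 1 everywhere.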
Simulate half-sib progeny from one genotyped parent, assuming a random genotype for the other parent. The organisms are assumed to be diploid.
simulHS(x, Nprogeny)
x |
genotype matrix of a set of moms |
Nprogeny |
number of progeny genotypes to simulate for each mom |
The function assumes a diploid organism, Mendelian segregation of alleles, and independent segregation between markers.
a matrix of dimensions (nrow(x)*Nprogeny) x ncol(x)
Martin Nahuel Garcia <orcid:0000-0001-5760-986X>
Wu, R., Ma, C., & Casella, G. (2007). Statistical genetics of quantitative traits: linkage, maps and QTL. Springer Science & Business Media.
# simulate 100 individuals and 1000 SNPs
set.seed(123)
simGeno(100, 1000)
#[1] "simG was generated"
# simulate the genotypes of 3 sets of 5 HS (one set per mom)
simulHS(simG[1:3,], 5)
#[1] "simulatedHS was generated"
dim(simulatedHS)
#[1]   15 1000
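A half-sib family can be sketched by keeping the mom fixed and drawing an unknown mate for each progeny. In the sketch below the mate's genotype is drawn as 0, 1 or 2 with equal probability at every marker, and a new mate is drawn for each progeny; both choices are assumptions made for illustration, and `sketch_half_sibs` is a hypothetical helper rather than the code behind simulHS.

```r
# Hypothetical sketch: the unknown parent gets a random genotype at every marker
# (assumed here to be 0, 1 or 2 with equal probability).
sketch_half_sibs <- function(moms, n_progeny) {
  # Transmit the counted allele with probability g / 2 (Mendelian segregation).
  gamete <- function(g) rbinom(length(g), size = 1, prob = g / 2)
  mom_id <- rep(seq_len(nrow(moms)), each = n_progeny)
  do.call(rbind, lapply(mom_id, function(i) {
    dad <- sample(0:2, ncol(moms), replace = TRUE)  # a new random mate per progeny
    gamete(moms[i, ]) + gamete(dad)
  }))
}

set.seed(1)
moms <- matrix(sample(0:2, 3 * 10, replace = TRUE), nrow = 3)
hs <- sketch_half_sibs(moms, n_progeny = 5)
dim(hs)  # 15 progeny by 10 markers
```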
This function simulates a SNP matrix (coded as 0, 1, 2) and a trait controlled by a selected number of QTLs whose effects are sampled from a Normal distribution.
simulN(Nind, Nmarkers, Nqtl, Esigma, Pmean, Perror)
Nind |
number of individuals to simulate. |
Nmarkers |
number of SNP markers to generate. |
Nqtl |
number of QTLs controlling the trait. |
Esigma |
standard deviation of the QTL effects, which are distributed N(0, Esigma^2). |
Pmean |
phenotype mean. |
Perror |
standard deviation of error (portion of phenotype not explained by genomic information). |
Genotypic data are simulated as the rounded values of draws from a uniform distribution on the interval (-.5, 2.5). Phenotypic data are obtained as a linear function of the QTL genotypes and their effects, plus the phenotype mean and a random error term.
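One way to write such a linear function, assuming the purely additive model named in the package description, is sketched below. The symbols are illustrative and not taken from the package: y_i is the phenotype of individual i, \mu corresponds to Pmean, x_{ij} is the genotype (0, 1 or 2) of individual i at the j-th QTL, a_j is the effect of that QTL with \sigma_a corresponding to Esigma, and e_i is the error with \sigma_e corresponding to Perror.

```latex
y_i = \mu + \sum_{j=1}^{N_{\mathrm{qtl}}} x_{ij}\, a_j + e_i,
\qquad a_j \sim N(0, \sigma_a^2),
\qquad e_i \sim N(0, \sigma_e^2)
```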
An object of class list containing the SNP matrix, the trait, the markers associated and their effects.
geno |
SNP matrix generated. |
pheno |
vector with the trait values simulated. |
QTN |
column indices in the SNP matrix of the associated SNPs. |
Meffects |
effects of the associated SNPs. |
The genotypes are simulated in the same way as in the simGeno function. The trait, the QTLs and their effects are simulated in the same way as in the simPheno function.
Martin Nahuel Garcia <orcid:0000-0001-5760-986X>
Wu, R., Ma, C., & Casella, G. (2007). Statistical genetics of quantitative traits: linkage, maps and QTL. Springer Science & Business Media.
simGeno, simPheno, simulU
set.seed(123)
simulN(100, 1000, 50, .9, 12, .5)
#[1] "nsimout was generated"
str(nsimout)
#List of 4
#$ geno    : num [1:100, 1:1000] 0 2 1 2 2 0 1 2 1 1 ...
#$ pheno   : num [1:100, 1] 25.4 21.6 16 13.8 19.4 ...
#$ QTN     : int [1:50] 568 474 529 349 45 732 416 51 413 514 ...
#$ Meffects: num [1:50] 0.2696 -0.1552 1.0192 0.0209 1.2023 ...
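Because the returned list holds the genotypes, the QTL positions and their effects, the share of phenotypic variance due to the simulated QTLs can be checked directly. The sketch below assumes the phenotype is the mean plus `geno[, QTN] %*% Meffects` plus error; `sketch_h2` is a hypothetical helper, and the list `sim` is a stand-in built by hand with the same four element names rather than actual package output.

```r
# Hypothetical helper: share of phenotypic variance explained by the simulated
# QTLs, assuming pheno = mean + geno[, QTN] %*% Meffects + error.
sketch_h2 <- function(sim) {
  genetic <- sim$geno[, sim$QTN, drop = FALSE] %*% sim$Meffects
  as.numeric(var(genetic) / var(sim$pheno))
}

# Stand-in for a simulated object with elements geno, pheno, QTN and Meffects.
set.seed(1)
geno <- matrix(round(runif(200 * 50, -0.5, 2.5)), nrow = 200)
qtn <- sample(ncol(geno), 5)
eff <- rnorm(5, mean = 0, sd = 0.9)
pheno <- 12 + geno[, qtn] %*% eff + rnorm(200, mean = 0, sd = 0.5)
sim <- list(geno = geno, pheno = pheno, QTN = qtn, Meffects = eff)
sketch_h2(sim)
```

Raising Perror relative to Esigma lowers this ratio, which gives a simple way to tune how hard the simulated data are for an association or prediction model.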
This function simulates a SNP matrix (coded as 0, 1, 2) and a trait controlled by a selected number of QTLs whose effects are sampled from a Uniform distribution.
simulU(Nind, Nmarkers, Nqtl, Pmean, Perror)
Nind |
number of individuals to simulate. |
Nmarkers |
number of SNP markers to generate. |
Nqtl |
number of QTLs controlling the trait. |
Pmean |
phenotype mean. |
Perror |
standard deviation of error (portion of phenotype not explained by genomic information). |
An object of class list containing the SNP matrix, the trait, the markers associated and their effects.
geno |
SNP matrix generated. |
pheno |
vector with the trait values simulated. |
QTN |
column indices in the SNP matrix of the associated SNPs. |
Meffects |
effects of the associated SNPs. |
Martin Nahuel Garcia <orcid:0000-0001-5760-986X>
Wu, R., Ma, C., & Casella, G. (2007). Statistical genetics of quantitative traits: linkage, maps and QTL. Springer Science & Business Media.
simGeno, simulN
set.seed(123)
simulU(100, 1000, 50, 12, .5)
#[1] "usimout was generated"
str(usimout)
#List of 4
#$ geno    : num [1:100, 1:1000] 0 2 1 2 2 0 1 2 1 1 ...
#$ pheno   : num [1:100, 1] 10.3 14.7 11.8 10.2 13.1 ...
#$ QTN     : int [1:50] 568 474 529 349 45 732 416 51 413 514 ...
#$ Meffects: num [1:50] 0.2355 0.0158 -0.1369 -0.1246 0.7426 ...