Truncating Animals


  Similar to the previous example, a bash script is outlined below that illustrates how to reduce the memory requirements by truncating animals born 'n' generations ago. This is especially useful if you want to simulate a large number of generations (25+) and as outlined below truncating the data to only include recently born animals does not have a large impact on the genetic trend across generations. To truncate animals for BLUP based ebv prediction the "BLUP_OPTIONS" parameter needs to be included and the first option specifies how many generations back from the current group of selection candidates you want to go. For example, if a value of 1 is specified the selection candidate’s parents and the associated parents progeny will only be included in the analysis. The default value is the number of generations simulated (i.e. all animals are used each generation). The bash script outlined below uses the 'gblup' option to generate breeding values but either uses all the animals, 3 or 1 generations back from the selection candidates to estimate breeding values. A total of 15 replicates were generated. After a scenario is done the directory where the replicates are saved is renamed within the bash script.

##--------------------------------------------------------------------
## First generate parameter file. Put parameters to change at the very
## bottom of the file. That way you can use the 'head' linux command
## for the parameters that change and use the 'echo' linux command to
## add any new parameters.
##--------------------------------------------------------------------
## In case 5_Example.txt is still a file ##
rm -rf Example13.txt || TRUE

# Parameters that are the same #
echo '-| General |-' >> Example13.txt
echo 'SEED: 1500' >> Example13.txt
echo 'NREP: 15' >> Example13.txt
echo '−| Genome & Marker |−' >> Example13.txt
echo 'CHR: 3' >> Example13.txt
echo 'CHR_LENGTH: 150 150 150' >> Example13.txt
echo 'NUM_MARK: 4000 4000 4000' >> Example13.txt
echo 'QTL: 150 150 150' >> Example13.txt
echo '−| Population |−' >> Example13.txt
echo 'FOUNDER_Effective_Size: Ne100_Scen1' >> Example13.txt
echo 'MALE_FEMALE_FOUNDER: 50 400 random 5' >> Example13.txt
echo 'VARIANCE_A: 0.45' >> Example13.txt
echo '−| Selection |−' >> Example13.txt
echo 'GENERATIONS: 20' >> Example13.txt
echo 'INDIVIDUALS: 50 0.2 400 0.2' >> Example13.txt
echo 'PROGENY: 1' >> Example13.txt
echo 'SELECTION: ebv high' >> Example13.txt
echo 'CULLING: ebv 8' >> Example13.txt
echo 'MATING: random125 simu_anneal' >> Example13.txt
echo 'EBV_METHOD: gblup' >> Example13.txt
echo 'G_OPTIONS: VanRaden observed' >> Example13.txt

##--------------------------------------------------------------------
## First do GBLUP with all animals
##--------------------------------------------------------------------
echo 'START: sequence' >> Example13.txt
echo 'BLUP_OPTIONS: 20 pcg cholesky' >> Example13.txt
## Run GenoDiver ##
./GenoDiver Example13.txt
## rename replicate output to reps_20trunc ##
mv ./GenoDiverFiles/replicates ./GenoDiverFiles/reps_20trunc

##--------------------------------------------------------------------
## Then do GBLUP with 3 generations back
##--------------------------------------------------------------------
## Remove parameters that are changing ##
head -n 22 Example13.txt > Example13a.txt
mv ./Example13a.txt ./Example13.txt
## echo in new paramters ##
echo 'START: founder' >> Example13.txt
echo 'BLUP_OPTIONS: 3 pcg cholesky' >> Example13.txt
## Run GenoDiver ##
./GenoDiver Example13.txt
## rename replicate output to reps_3trunc ##
mv ./GenoDiverFiles/replicates ./GenoDiverFiles/reps_3trunc

##--------------------------------------------------------------------
## Then do GBLUP with 1 generation back
##--------------------------------------------------------------------
## Remove parameters that are changing ##
head -n 22 Example13.txt > Example13a.txt
mv ./Example13a.txt ./Example13.txt
## echo in new parameters
echo 'START: founder' >> Example13.txt
echo 'BLUP_OPTIONS: 1 pcg cholesky' >> Example13.txt
## Run GenoDiver ##
./GenoDiver Example13.txt
## rename replicate output to reps_1trunc ##
mv ./GenoDiverFiles/replicates ./GenoDiverFiles/reps_1trunc

Parameter File Summary
  Sequence information is generated for three chromosomes with a length of 150 Megabases. The genome simulated has a high degree of short-range LD (Ne100_Scen1). The SNP panel contains 12,000 markers (i.e. 4,000 markers per chromosome). For each chromosome, 150 randomly placed QTL and no FTL mutations were generated. The quantitative trait simulated has a narrow sense heritability of 0.45 and only additive effects are generated (i.e. no dominance). The phenotypic variance is by default set at 1.0, and therefore the residual variance is 0.55. The founder population consisted of 50 males and 400 females. For each generation, a total of 50 males and 400 females are in the population. A total of 10 and 80 (0.2 replacement rate) male and female parents, respectively, are culled and replaced by new progeny each generation. Random selecton of progeny and culling of parents was conducted for 5 generations. After 5 generations, animals with a high EBV were selected or culled each generation. The EBV are estimated using a genomic-based BLUP utilizing all the animals or truncating the dataset 3 and 1 generations back from the selection candidates. The genomic relationship matrix was constructed based on the observed frequencies for the animals utilized to calculate breeding values. Each mating pair produced one progeny. Parents that had pedigree-based relationships greater than 0.125 were avoided, and this was optimized based on the simulated annealing method.

  Similar to Example 12, the important files are saved within the renamed replicate folder within each scenario. The replicate number is appended to the file name. The bash script above renames the replicate directories to "reps_20trunc", "reps_3trunc" and "reps_1trunc" for the model that uses all animals, 3 generations back or 1 generation back, respectively.

  Utilizing the R code outlined below the following plot was generated to illustrate how to loop through each scenario and generate plots that describe the impact of truncating animals on the genetic trend and the reduction in the number of equations to solve. Also, within the R code it illustrated how you can read in the log file and grab metrics across generations.

R-Code
rm(list=ls()); gc()
library(ggplot2); library(tidyverse)
## Change
wd <- "/Users/jeremyhoward/Documents/39_GenoDiver_C++Code/WebsiteExamples/Example13/"
scen <- c("reps_20trunc","reps_3trunc","reps_1trunc") ## the directory name for each scenario ##
reps <- c(1500:1514) ## Number of replicates simulated ##
##################################################################
## Loop through and grab metric across scenarios and replicates ##
##################################################################
for(i in 1:length(scen))
{
for(j in 1:length(reps))
{
    filename <- paste(wd,scen[i],"/Summary_Statistics_DataFrame_Performance_",reps[j],sep="")
    df <- read_table2(file=filename,col_names = TRUE,col_type = "dcccccc") %>%
        mutate(tgv = as.numeric(matrix(unlist(strsplit(tgv, "[()]")), ncol = 2, byrow = TRUE)[, 1]),
                     Method = unlist(strsplit(scen[i],"_"))[2], Rep = reps[j]) %>%
        select(Generation,Method,Rep,tgv)
    if(j == 1 & i == 1){summarytable <- df}
    if(j > 1 | i > 1){summarytable <- rbind(summarytable,df);}
}
}
## generate mean and sd by generation and method
means <- aggregate(tgv ~ Generation + Method, data=summarytable,FUN=mean)
sds <- aggregate(tgv ~ Generation + Method, data=summarytable,FUN=sd)
#################################################
## Plot Genetic Trend across Different Methods ##
#################################################
plotdf <- cbind(means,sds[,3]); rm(means,sds)
names(plotdf) <- c("Generation","Method","Mean","SD")
pd <- position_dodge(0.20)

ggplot(plotdf, aes(x=as.factor(Generation), y=Mean, group=Method, colour=Method)) +
geom_errorbar(aes(ymin=Mean-SD, ymax=Mean+SD), colour="black", width=.4, size = 0.5, position=pd) +
geom_point(size=2.0) + geom_line(size=0.50) + theme_bw() +
labs(title = "Genetic Trend \n(+/- 1 SD)", x = "Generation", y = "Mean True Breeding Value") +
theme(plot.title = element_text(size = 16,hjust = 0.5),axis.title = element_text(size = 12),
legend.position="bottom",axis.text=element_text(size=10))
##############################################################################
## Loop through and grab size of LHS within log_file.txt for each replicate ##
##############################################################################
for(i in 1:length(scen))
{
for(j in 1:length(reps))
{
      filename <- paste(wd,scen[i],"/log_file_",reps[j],sep="")
      df <- scan(file=filename,sep="\n",what=character(),quiet = TRUE); ## save each line as a string ##
      ## Grab generation ##
      tempa <- df[grep("Begin Generation",df)]; tempa <- tempa[6:length(tempa)-1]
      tempa <- as.numeric(matrix(unlist(strsplit(tempa," ")),ncol = 5,byrow=TRUE)[ ,c(4)])
      ## Grab size of Relationship Matrix and add one for intercept ##
      tempb <- df[grep("Size of Relationship Matrix",df)];
      tempb <- unlist(strsplit(tempb," ")); tempb <- tempb[which(tempb != "")]
      tempb <- matrix(tempb,ncol = 8,byrow=TRUE); tempb <- as.numeric(tempb[c(1:(nrow(tempb)-1)),c(6)]); tempb <- tempb+1;
      ## combine together and add variables
      df <- data.frame(cbind(tempa,tempb),stringsAsFactors = FALSE)
      df$Method <- unlist(strsplit(scen[i],"_"))[2] # save method
      df$Rep <- reps[j]; # save replicate
      names(df) <- c("Generation","LHS_Size","Method","Rep")
      if(j == 1 & i == 1){summarytable <- df;}
      if(j > 1 | i > 1){summarytable <- rbind(summarytable,df);}
}
}
## generate mean and sd by generation and method
summarytable$LHS_Size <- as.numeric(paste(summarytable$LHS_Size))
means <- aggregate(LHS_Size ~ Generation + Method, data=summarytable,FUN=mean)
sds <- aggregate(LHS_Size ~ Generation + Method, data=summarytable,FUN=sd)
#################################################
## Plot LHS Size For Different Truncation Methods##
#################################################
plotdf <- cbind(means,sds[,3]); rm(means,sds)
names(plotdf) <- c("Generation","Method","Mean","SD")
pd <- position_dodge(0.5)
ggplot(plotdf, aes(x=Generation, y=Mean, colour=Method)) +
geom_errorbar(aes(ymin=Mean-SD, ymax=Mean+SD), colour="black", width=.4, size = 0.5, position=pd) +
geom_point(size=2.0) + geom_line(size=0.50) + theme_bw() +
labs(title = "Reduction in Number of Equations", x = "Generation", y = "Left Hand Side Size") +
theme(plot.title = element_text(size = 16,hjust = 0.5),axis.title = element_text(size = 12),
legend.position="bottom",axis.text=element_text(size=10))