--- title: "Simulation from the bivariate negative binomial and bi- and trivariate logarithmic series distribution" author: "Almut E. D. Veraart" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Simulation from the bivariate negative binomial and bi- and trivariate logarithmic series distribution} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} bibliography: trawlpackagebib.bib --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ## The multivariate negative binomial distribution. Recall that a random variable $X$ has (univariate) negative binomial law with parameters $\kappa>0, 00$. We note that the function `stats::rnbinom` can be used to simulate from the univariate negative binomial distribution. The `trawl` package introduces the function `Bivariate_NBsim` which generates samples from the bivariate negative binomial distribution. The simulation algorithm proceeds in two steps: First, we simulate $X_1$ from the univariate negative binomial distribution NB($\kappa$,$p_1/(1-p_2)$). Second, we simulate $X_2|X_1=x_1$ from the univariate negative binomial distribution NB($\kappa+x_1,p_2$), see for instance [@Dunn1967]. ###Example ```{r} set.seed(1) kappa<- 3 p1 <- 0.1 p2 <- 0.85 p <- p1+p2 p0 <-1-p1-p2 N<- 10000 #Simulate from the bivariate negative binomial distribution y <- trawl::Bivariate_NBsim(N,kappa,p1,p2) #Compare the empirical and theoretical mean of the first component base::mean(y[,1]) kappa*p1/(1-p) #Compare the empirical and theoretical variance of the first component stats::var(y[,1]) kappa*p1*(1-p2)/(1-p)^2 #Compare the empirical and theoretical mean of the second component base::mean(y[,2]) kappa*p2/(1-p) #Compare the empirical and theoretical variance of the second component stats::var(y[,2]) kappa*p2*(1-p1)/(1-p)^2 #Compare the empirical and theoretical correlation between the two components stats::cor(y[,1],y[,2]) (p1*p2/(p0+p1)*(p0+p2))^(1/2) ``` ## The multivariate logarithmic series distribution We say that a vector ${\bf X}=(X_1,\dots,X_n)'$ follows the multivariate logarithmic series distribution (LSD), see, e.g., [@GB1967]. ${\bf X} \sim \text{LSD}(p_1,\ldots,p_n)$, where $00$. The simulation proceeds in two steps: First, $X_1$ is simulated from the modified logarithmic distribution with parameters $\tilde p_1=p_1/(1-p_2)$ and $\delta_1=\ln(1-p_2)/\ln(1-p)$. Then we simulate $X_2$ conditional on $X_1$. We note that $X_2|X_1=x_1$ follows the logarithmic series distribution with parameter $p_2$ when $x_1=0$, and the negative binomial distribution with parameters $(x_1,p_2)$ when $x_1>0$. #### Example Next we provide an example of a simulation from the bivariate LSD and we showcase the functions `ModLSD_Mean`, `ModLSD_Var`, `BivLSD_Cor` and `BivLSD_Cov` which compute the mean and the variance of the univariate modified LSD and the correlation and covariance of the bivariate LSD, respectively. ```{r} set.seed(1) p1<-0.15 p2<-0.3 N<-10000 #Simulate N realisations from the bivariate LSD y<-trawl::Bivariate_LSDsim(N, p1, p2) #Compute the empirical and theoretical mean of the first component base::mean(y[,1]) trawl::ModLSD_Mean(base::log(1-p2)/base::log(1-p1-p2),p1/(1-p2)) #Compute the empirical and theoretical mean of the second component base::mean(y[,2]) trawl::ModLSD_Mean(base::log(1-p1)/base::log(1-p1-p2),p2/(1-p1)) #Compute the empirical and theoretical variance of the first component stats::var(y[,1]) trawl::ModLSD_Var(base::log(1-p2)/base::log(1-p1-p2),p1/(1-p2)) #Compute the empirical and theoretical variance of the second component stats::var(y[,2]) trawl::ModLSD_Var(base::log(1-p1)/base::log(1-p1-p2),p2/(1-p1)) ##Compute the empirical and theoretical correlation between the two components stats::cor(y[,1],y[,2]) trawl::BivLSD_Cor(p1,p2) ##Compute the empirical and theoretical covariance between the two components stats::cov(y[,1],y[,2]) trawl::BivLSD_Cov(p1,p2) ``` ### Simulating from the trivariate logarithmic series distribution The function `Trivariate_NBsim` can be used to simulate from the trivariate logarithmic series distribution. The simulation proceeds in two steps: First, $X_1$ is simulated from the modified logarithmic distribution with parameters $\tilde p_1=p_1/(1-p_2-p_3)$ and $\delta_1=\ln(1-p_2-p_3)/\ln(1-p)$. Then we simulate $(X_2,X_3)'$ conditional on $X_1$. We note that $(X_2,X_3)'|X_1=x_1$ follows the bivariate logarithmic series distribution with paramaters $(p_2,p_3)$ when $x_1=0$, and the bivariate negative binomial distribution with parameters $(x_1,p_2,p_3)$ when $x_1>0$. #### Example ```{r} set.seed(1) p1<-0.15 p2<-0.25 p3<-0.55 N<- 10000 #Simulate N realisations from the bivariate LSD y<-trawl::Trivariate_LSDsim(N, p1, p2, p3) #Compute the empirical and theoretical mean of the first component base::mean(y[,1]) trawl::ModLSD_Mean(base::log(1-p2-p3)/base::log(1-p1-p2-p3),p1/(1-p2-p3)) #Compute the empirical and theoretical mean of the second component base::mean(y[,2]) trawl::ModLSD_Mean(base::log(1-p1-p3)/base::log(1-p1-p2-p3),p2/(1-p1-p3)) #Compute the empirical and theoretical mean of the third component base::mean(y[,3]) trawl::ModLSD_Mean(base::log(1-p1-p2)/base::log(1-p1-p2-p3),p3/(1-p1-p2)) #Compute the empirical and theoretical variance of the first component stats::var(y[,1]) trawl::ModLSD_Var(base::log(1-p2-p3)/base::log(1-p1-p2-p3),p1/(1-p2-p3)) #Compute the empirical and theoretical variance of the second component stats::var(y[,2]) trawl::ModLSD_Var(base::log(1-p1-p3)/base::log(1-p1-p2-p3),p2/(1-p1-p3)) #Compute the empirical and theoretical variance of the third component stats::var(y[,3]) trawl::ModLSD_Var(base::log(1-p1-p2)/base::log(1-p1-p2-p3),p3/(1-p1-p2)) #Computing the bivariate covariances and correlations #Cor(X1,X2): delta <- base::log(1-p3)/base::log(1-p1-p2-p3) hatp1 <-p1/(1-p3) hatp2<-p2/(1-p3) stats::cov(y[,1],y[,2]) trawl::BivModLSD_Cov(delta,hatp1,hatp2) stats::cor(y[,1],y[,2]) trawl::BivModLSD_Cor(delta,hatp1,hatp2) #Cor(X1,X3): delta <- log(1-p2)/log(1-p1-p2-p3) hatp1 <-p1/(1-p2) hatp2<-p3/(1-p2) stats::cov(y[,1],y[,3]) trawl::BivModLSD_Cov(delta,hatp1,hatp2) stats::cor(y[,1],y[,3]) trawl::BivModLSD_Cor(delta,hatp1,hatp2) #Cor(X2,X3): delta <- log(1-p1)/log(1-p1-p2-p3) hatp1 <-p2/(1-p1) hatp2<-p3/(1-p1) stats::cov(y[,2],y[,3]) trawl::BivModLSD_Cov(delta,hatp1,hatp2) stats::cor(y[,2],y[,3]) trawl::BivModLSD_Cor(delta,hatp1,hatp2) ``` ##References