---
title: "Simulation from the bivariate negative binomial and bi- and trivariate logarithmic series distribution"
author: "Almut E. D. Veraart"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Simulation from the bivariate negative binomial and bi- and trivariate logarithmic series distribution}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
bibliography: trawlpackagebib.bib
---
```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
## The multivariate negative binomial distribution.
Recall that a random variable $X$ has (univariate) negative binomial law with parameters $\kappa>0, 0
0$.
We note that the function `stats::rnbinom` can be used to simulate from the univariate negative binomial distribution. The `trawl` package introduces the function `Bivariate_NBsim` which generates samples from the bivariate negative binomial distribution. The simulation algorithm proceeds in two steps:
First, we simulate $X_1$ from the univariate negative binomial distribution NB($\kappa$,$p_1/(1-p_2)$).
Second, we simulate $X_2|X_1=x_1$ from the univariate negative binomial distribution NB($\kappa+x_1,p_2$), see for instance [@Dunn1967].
###Example
```{r}
set.seed(1)
kappa<- 3
p1 <- 0.1
p2 <- 0.85
p <- p1+p2
p0 <-1-p1-p2
N<- 10000
#Simulate from the bivariate negative binomial distribution
y <- trawl::Bivariate_NBsim(N,kappa,p1,p2)
#Compare the empirical and theoretical mean of the first component
base::mean(y[,1])
kappa*p1/(1-p)
#Compare the empirical and theoretical variance of the first component
stats::var(y[,1])
kappa*p1*(1-p2)/(1-p)^2
#Compare the empirical and theoretical mean of the second component
base::mean(y[,2])
kappa*p2/(1-p)
#Compare the empirical and theoretical variance of the second component
stats::var(y[,2])
kappa*p2*(1-p1)/(1-p)^2
#Compare the empirical and theoretical correlation between the two components
stats::cor(y[,1],y[,2])
(p1*p2/(p0+p1)*(p0+p2))^(1/2)
```
## The multivariate logarithmic series distribution
We say that a vector ${\bf X}=(X_1,\dots,X_n)'$ follows the multivariate logarithmic series distribution (LSD), see, e.g., [@GB1967].
${\bf X} \sim \text{LSD}(p_1,\ldots,p_n)$, where $00$.
The simulation proceeds in two steps: First, $X_1$ is simulated from the modified logarithmic distribution with parameters $\tilde p_1=p_1/(1-p_2)$ and $\delta_1=\ln(1-p_2)/\ln(1-p)$. Then
we simulate $X_2$ conditional on $X_1$. We note that $X_2|X_1=x_1$ follows the logarithmic series distribution with parameter $p_2$ when $x_1=0$, and the negative binomial
distribution with parameters $(x_1,p_2)$ when $x_1>0$.
#### Example
Next we provide an example of a simulation from the bivariate LSD and we showcase the functions
`ModLSD_Mean`, `ModLSD_Var`, `BivLSD_Cor` and `BivLSD_Cov` which compute the mean and the variance of the univariate modified LSD and the correlation and covariance of the bivariate LSD, respectively.
```{r}
set.seed(1)
p1<-0.15
p2<-0.3
N<-10000
#Simulate N realisations from the bivariate LSD
y<-trawl::Bivariate_LSDsim(N, p1, p2)
#Compute the empirical and theoretical mean of the first component
base::mean(y[,1])
trawl::ModLSD_Mean(base::log(1-p2)/base::log(1-p1-p2),p1/(1-p2))
#Compute the empirical and theoretical mean of the second component
base::mean(y[,2])
trawl::ModLSD_Mean(base::log(1-p1)/base::log(1-p1-p2),p2/(1-p1))
#Compute the empirical and theoretical variance of the first component
stats::var(y[,1])
trawl::ModLSD_Var(base::log(1-p2)/base::log(1-p1-p2),p1/(1-p2))
#Compute the empirical and theoretical variance of the second component
stats::var(y[,2])
trawl::ModLSD_Var(base::log(1-p1)/base::log(1-p1-p2),p2/(1-p1))
##Compute the empirical and theoretical correlation between the two components
stats::cor(y[,1],y[,2])
trawl::BivLSD_Cor(p1,p2)
##Compute the empirical and theoretical covariance between the two components
stats::cov(y[,1],y[,2])
trawl::BivLSD_Cov(p1,p2)
```
### Simulating from the trivariate logarithmic series distribution
The function `Trivariate_NBsim` can be used to simulate from the trivariate logarithmic series distribution.
The simulation proceeds in two steps: First, $X_1$ is simulated from the
modified logarithmic distribution with parameters $\tilde
p_1=p_1/(1-p_2-p_3)$ and $\delta_1=\ln(1-p_2-p_3)/\ln(1-p)$. Then we
simulate $(X_2,X_3)'$ conditional on $X_1$. We note that
$(X_2,X_3)'|X_1=x_1$ follows the bivariate logarithmic series
distribution with paramaters $(p_2,p_3)$ when $x_1=0$, and the
bivariate negative binomial distribution with parameters $(x_1,p_2,p_3)$
when $x_1>0$.
#### Example
```{r}
set.seed(1)
p1<-0.15
p2<-0.25
p3<-0.55
N<- 10000
#Simulate N realisations from the bivariate LSD
y<-trawl::Trivariate_LSDsim(N, p1, p2, p3)
#Compute the empirical and theoretical mean of the first component
base::mean(y[,1])
trawl::ModLSD_Mean(base::log(1-p2-p3)/base::log(1-p1-p2-p3),p1/(1-p2-p3))
#Compute the empirical and theoretical mean of the second component
base::mean(y[,2])
trawl::ModLSD_Mean(base::log(1-p1-p3)/base::log(1-p1-p2-p3),p2/(1-p1-p3))
#Compute the empirical and theoretical mean of the third component
base::mean(y[,3])
trawl::ModLSD_Mean(base::log(1-p1-p2)/base::log(1-p1-p2-p3),p3/(1-p1-p2))
#Compute the empirical and theoretical variance of the first component
stats::var(y[,1])
trawl::ModLSD_Var(base::log(1-p2-p3)/base::log(1-p1-p2-p3),p1/(1-p2-p3))
#Compute the empirical and theoretical variance of the second component
stats::var(y[,2])
trawl::ModLSD_Var(base::log(1-p1-p3)/base::log(1-p1-p2-p3),p2/(1-p1-p3))
#Compute the empirical and theoretical variance of the third component
stats::var(y[,3])
trawl::ModLSD_Var(base::log(1-p1-p2)/base::log(1-p1-p2-p3),p3/(1-p1-p2))
#Computing the bivariate covariances and correlations
#Cor(X1,X2):
delta <- base::log(1-p3)/base::log(1-p1-p2-p3)
hatp1 <-p1/(1-p3)
hatp2<-p2/(1-p3)
stats::cov(y[,1],y[,2])
trawl::BivModLSD_Cov(delta,hatp1,hatp2)
stats::cor(y[,1],y[,2])
trawl::BivModLSD_Cor(delta,hatp1,hatp2)
#Cor(X1,X3):
delta <- log(1-p2)/log(1-p1-p2-p3)
hatp1 <-p1/(1-p2)
hatp2<-p3/(1-p2)
stats::cov(y[,1],y[,3])
trawl::BivModLSD_Cov(delta,hatp1,hatp2)
stats::cor(y[,1],y[,3])
trawl::BivModLSD_Cor(delta,hatp1,hatp2)
#Cor(X2,X3):
delta <- log(1-p1)/log(1-p1-p2-p3)
hatp1 <-p2/(1-p1)
hatp2<-p3/(1-p1)
stats::cov(y[,2],y[,3])
trawl::BivModLSD_Cov(delta,hatp1,hatp2)
stats::cor(y[,2],y[,3])
trawl::BivModLSD_Cor(delta,hatp1,hatp2)
```
##References