---
title: "Simulation from the bivariate negative binomial and bi- and trivariate logarithmic series distribution"
author: "Almut E. D. Veraart"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Simulation from the bivariate negative binomial and bi- and trivariate logarithmic series distribution}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
bibliography: trawlpackagebib.bib
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```
## The multivariate negative binomial distribution.
 Recall that a random variable $X$ has (univariate) negative binomial law with parameters $\kappa>0, 0<p<1$, i.e., $X\sim \text{NB}(\kappa, p)$ if its probability mass function is given by 
$$
P(X=x) = {\kappa+x-1 \choose x} p^x(1-p)^{\kappa}, \quad x \in \{0,1,\ldots\}.
$$

A random vector ${\bf X}=(X_1,\dots,X_n)'$ is said to follow the multivariate negative binomial distribution with parameters  $\kappa, p_1, \dots, p_n$ if its probability mass function is given by
$$
P({\bf X}={\bf x})=\frac{\Gamma(x_1+\cdots+x_n+\kappa)}{x_1! \cdots x_n! \Gamma(\kappa)}p_1^{x_1}\cdots p_n^{x_n}(1-p_1-\cdots-p_n)^{\kappa},
$$
where, for $i=1,\dots,n$, $x_i\in\{0,1,\dots\}$, $0<p_i<1$ such that $\sum_{i=1}^np_i<1$ and $\kappa>0$.

We note that the function `stats::rnbinom` can be used to simulate from the univariate negative binomial distribution. The `trawl` package introduces the function `Bivariate_NBsim` which generates samples from the bivariate negative binomial distribution. The simulation algorithm proceeds in two steps:
First, we simulate $X_1$ from the univariate negative binomial distribution NB($\kappa$,$p_1/(1-p_2)$).
Second, we simulate $X_2|X_1=x_1$ from the univariate negative binomial distribution NB($\kappa+x_1,p_2$), see for instance [@Dunn1967].

###Example
```{r}
set.seed(1)
kappa<- 3
p1 <- 0.1
p2 <- 0.85
p <- p1+p2
p0 <-1-p1-p2

N<- 10000

#Simulate from the bivariate negative binomial distribution
y <- trawl::Bivariate_NBsim(N,kappa,p1,p2)

#Compare the empirical and theoretical mean of the first component
base::mean(y[,1])
kappa*p1/(1-p)

#Compare the empirical and theoretical variance of the first component
stats::var(y[,1])
kappa*p1*(1-p2)/(1-p)^2

#Compare the empirical and theoretical mean of the second component
base::mean(y[,2])
kappa*p2/(1-p)

#Compare the empirical and theoretical variance of the second component
stats::var(y[,2])
kappa*p2*(1-p1)/(1-p)^2

#Compare the empirical and theoretical correlation between the two components
stats::cor(y[,1],y[,2])
(p1*p2/(p0+p1)*(p0+p2))^(1/2)


```

      

## The multivariate logarithmic series distribution

We say that a vector ${\bf X}=(X_1,\dots,X_n)'$ follows  the multivariate logarithmic series distribution (LSD), see, e.g., [@GB1967]. 
${\bf X} \sim \text{LSD}(p_1,\ldots,p_n)$, where $0<p_i<1, p:=\sum_{i=1}^np_i <1$ if for ${\bf x}\in \mathbb{N}_0^n \setminus \{{\bf 0} \}$, if its probability mass function is given by
$$
P({\bf X}={\bf x})=\frac{\Gamma(x_1+\cdots+x_n)}{x_1! \cdots x_n!}\frac{p_1^{x_1}\cdots p_n^{x_n}}{\{-\ln(1-p)\}}.
$$
Note that each component $X_i$ follows the modified univariate logarithmic distribution with parameters $\tilde p_i = p_i/(1-p+p_i)$ and $\delta_i= \ln(1-p+p_i)/\ln(1-p)$, i.e.,
$X_i\sim\text{ModLSD}(\tilde p_i, \delta_i)$ with
$$
P(X_i =x_i) = \left\{ \begin{array}{ll} \delta_i, & \text{for } x_i=0\\
(1-\delta_i) \frac{1}{x_i}\frac{\tilde p_i^{x_i}}{\{-\ln(1-\tilde p_i)\}}, & \text{for } x_i \in \mathbb{N}. \end{array} \right.
$$
Simulations from the univariate LSD can be carried out using the function `Runuran::urlogarithmic`. The `trawl` package implements the functions `Bivariate_LSDsim` and `Trivariate_LSDsim` to simulate from both the bivariate and the trivariate logarithmic series distribution.

### Simulating from the bivariate  logarithmic series distribution
The function `Bivariate_NBsim` can be used to simulate from the bivariate  logarithmic series distribution. To this end, note that 
the probability mass function of a random vector ${\bf X}=(X_1,X_2)'$
following the bivariate logarithmic series distribution with parameters
$0<p_1, p_2<1$ with $p:=p_1+p_2<1$ is given by
$$P(X_1=x_1,X_2=x_2)=\frac{\Gamma(x_1+x_2)}{x_1!x_2!}
\frac{p_1^{x_1}p_2^{x_2}}{\{-\ln(1-p)\}},$$
for $x_1,x_2=0,1,2,\dots$ such
that $x_1+x_2>0$. 

The simulation proceeds in two steps: First, $X_1$ is simulated from the modified logarithmic distribution with parameters $\tilde p_1=p_1/(1-p_2)$ and $\delta_1=\ln(1-p_2)/\ln(1-p)$. Then
we simulate $X_2$ conditional on $X_1$. We note that $X_2|X_1=x_1$ follows the  logarithmic series distribution with parameter $p_2$ when $x_1=0$, and the  negative binomial
distribution with parameters $(x_1,p_2)$ when  $x_1>0$.
  
  
 

#### Example
Next we provide an example of a simulation from the bivariate LSD and we showcase the functions
`ModLSD_Mean`,  `ModLSD_Var`,  `BivLSD_Cor` and `BivLSD_Cov` which compute the mean and the variance of the univariate modified LSD and the correlation and covariance of the bivariate LSD, respectively.

```{r}
set.seed(1)
p1<-0.15
p2<-0.3

N<-10000

#Simulate N realisations from the bivariate LSD 
y<-trawl::Bivariate_LSDsim(N, p1, p2)

#Compute the empirical and theoretical mean of the first component
base::mean(y[,1])
trawl::ModLSD_Mean(base::log(1-p2)/base::log(1-p1-p2),p1/(1-p2))

#Compute the empirical and theoretical mean of the second component
base::mean(y[,2])
trawl::ModLSD_Mean(base::log(1-p1)/base::log(1-p1-p2),p2/(1-p1))

#Compute the empirical and theoretical variance of the first component
stats::var(y[,1])
trawl::ModLSD_Var(base::log(1-p2)/base::log(1-p1-p2),p1/(1-p2))

#Compute the empirical and theoretical variance of the second component
stats::var(y[,2])
trawl::ModLSD_Var(base::log(1-p1)/base::log(1-p1-p2),p2/(1-p1))

##Compute the empirical and theoretical correlation between the two components
stats::cor(y[,1],y[,2])
trawl::BivLSD_Cor(p1,p2)

##Compute the empirical and theoretical covariance between the two components
stats::cov(y[,1],y[,2])
trawl::BivLSD_Cov(p1,p2)

```

### Simulating from the trivariate  logarithmic series distribution
The function `Trivariate_NBsim` can be used to simulate from the trivariate  logarithmic series distribution. 
The simulation proceeds in two steps: First, $X_1$ is simulated from the
  modified logarithmic distribution with parameters $\tilde
  p_1=p_1/(1-p_2-p_3)$ and $\delta_1=\ln(1-p_2-p_3)/\ln(1-p)$. Then we
  simulate $(X_2,X_3)'$ conditional on $X_1$. We note that
  $(X_2,X_3)'|X_1=x_1$ follows the bivariate logarithmic series
  distribution with paramaters $(p_2,p_3)$ when $x_1=0$, and the
  bivariate negative binomial distribution with parameters $(x_1,p_2,p_3)$
  when  $x_1>0$.

#### Example
```{r}
set.seed(1)
p1<-0.15
p2<-0.25
p3<-0.55

N<- 10000

#Simulate N realisations from the bivariate LSD 
y<-trawl::Trivariate_LSDsim(N, p1, p2, p3)

#Compute the empirical and theoretical mean of the first component
base::mean(y[,1])
trawl::ModLSD_Mean(base::log(1-p2-p3)/base::log(1-p1-p2-p3),p1/(1-p2-p3))

#Compute the empirical and theoretical mean of the second component
base::mean(y[,2])
trawl::ModLSD_Mean(base::log(1-p1-p3)/base::log(1-p1-p2-p3),p2/(1-p1-p3))

#Compute the empirical and theoretical mean of the third component
base::mean(y[,3])
trawl::ModLSD_Mean(base::log(1-p1-p2)/base::log(1-p1-p2-p3),p3/(1-p1-p2))

#Compute the empirical and theoretical variance of the first component
stats::var(y[,1])
trawl::ModLSD_Var(base::log(1-p2-p3)/base::log(1-p1-p2-p3),p1/(1-p2-p3))

#Compute the empirical and theoretical variance of the second component
stats::var(y[,2])
trawl::ModLSD_Var(base::log(1-p1-p3)/base::log(1-p1-p2-p3),p2/(1-p1-p3))

#Compute the empirical and theoretical variance of the third component
stats::var(y[,3])
trawl::ModLSD_Var(base::log(1-p1-p2)/base::log(1-p1-p2-p3),p3/(1-p1-p2))

#Computing the bivariate covariances and correlations
#Cor(X1,X2):
delta <- base::log(1-p3)/base::log(1-p1-p2-p3)
hatp1 <-p1/(1-p3)
hatp2<-p2/(1-p3)

stats::cov(y[,1],y[,2])
trawl::BivModLSD_Cov(delta,hatp1,hatp2)

stats::cor(y[,1],y[,2])
trawl::BivModLSD_Cor(delta,hatp1,hatp2)

#Cor(X1,X3):
delta <- log(1-p2)/log(1-p1-p2-p3)
hatp1 <-p1/(1-p2)
hatp2<-p3/(1-p2)

stats::cov(y[,1],y[,3])
trawl::BivModLSD_Cov(delta,hatp1,hatp2)

stats::cor(y[,1],y[,3])
trawl::BivModLSD_Cor(delta,hatp1,hatp2)

#Cor(X2,X3):
delta <- log(1-p1)/log(1-p1-p2-p3)
hatp1 <-p2/(1-p1)
hatp2<-p3/(1-p1)

stats::cov(y[,2],y[,3])
trawl::BivModLSD_Cov(delta,hatp1,hatp2)
stats::cor(y[,2],y[,3])
trawl::BivModLSD_Cor(delta,hatp1,hatp2)
```

##References