Simulation from the bivariate negative binomial and bi- and trivariate logarithmic series distribution

The multivariate negative binomial distribution.

Recall that a random variable X has (univariate) negative binomial law with parameters κ > 0, 0 < p < 1, i.e., X ∼ NB(κ, p) if its probability mass function is given by $$ P(X=x) = {\kappa+x-1 \choose x} p^x(1-p)^{\kappa}, \quad x \in \{0,1,\ldots\}. $$

A random vector ${\bf X}=(X_1,\dots,X_n)'$ is said to follow the multivariate negative binomial distribution with parameters κ, p1, …, pn if its probability mass function is given by $$ P({\bf X}={\bf x})=\frac{\Gamma(x_1+\cdots+x_n+\kappa)}{x_1! \cdots x_n! \Gamma(\kappa)}p_1^{x_1}\cdots p_n^{x_n}(1-p_1-\cdots-p_n)^{\kappa}, $$ where, for i = 1, …, n, xi ∈ {0, 1, …}, 0 < pi < 1 such that $\sum_{i=1}^np_i<1$ and κ > 0.

We note that the function stats::rnbinom can be used to simulate from the univariate negative binomial distribution. The trawl package introduces the function Bivariate_NBsim which generates samples from the bivariate negative binomial distribution. The simulation algorithm proceeds in two steps: First, we simulate X1 from the univariate negative binomial distribution NB(κ,p1/(1 − p2)). Second, we simulate X2|X1 = x1 from the univariate negative binomial distribution NB(κ + x1, p2), see for instance (Dunn 1967).

###Example

set.seed(1)
kappa<- 3
p1 <- 0.1
p2 <- 0.85
p <- p1+p2
p0 <-1-p1-p2

N<- 10000

#Simulate from the bivariate negative binomial distribution
y <- trawl::Bivariate_NBsim(N,kappa,p1,p2)

#Compare the empirical and theoretical mean of the first component
base::mean(y[,1])
#> [1] 5.9653
kappa*p1/(1-p)
#> [1] 6

#Compare the empirical and theoretical variance of the first component
stats::var(y[,1])
#> [1] 18.0047
kappa*p1*(1-p2)/(1-p)^2
#> [1] 18

#Compare the empirical and theoretical mean of the second component
base::mean(y[,2])
#> [1] 50.9742
kappa*p2/(1-p)
#> [1] 51

#Compare the empirical and theoretical variance of the second component
stats::var(y[,2])
#> [1] 926.3358
kappa*p2*(1-p1)/(1-p)^2
#> [1] 918

#Compare the empirical and theoretical correlation between the two components
stats::cor(y[,1],y[,2])
#> [1] 0.7891007
(p1*p2/(p0+p1)*(p0+p2))^(1/2)
#> [1] 0.7141428

The multivariate logarithmic series distribution

We say that a vector ${\bf X}=(X_1,\dots,X_n)'$ follows the multivariate logarithmic series distribution (LSD), see, e.g., (Patil and Bildikar 1967). ${\bf X} \sim \text{LSD}(p_1,\ldots,p_n)$, where $0<p_i<1, p:=\sum_{i=1}^np_i <1$ if for ${\bf x}\in \mathbb{N}_0^n \setminus \{{\bf 0} \}$, if its probability mass function is given by $$ P({\bf X}={\bf x})=\frac{\Gamma(x_1+\cdots+x_n)}{x_1! \cdots x_n!}\frac{p_1^{x_1}\cdots p_n^{x_n}}{\{-\ln(1-p)\}}. $$ Note that each component Xi follows the modified univariate logarithmic distribution with parameters i = pi/(1 − p + pi) and δi = ln (1 − p + pi)/ln (1 − p), i.e., Xi ∼ ModLSD(i, δi) with $$ P(X_i =x_i) = \left\{ \begin{array}{ll} \delta_i, & \text{for } x_i=0\\ (1-\delta_i) \frac{1}{x_i}\frac{\tilde p_i^{x_i}}{\{-\ln(1-\tilde p_i)\}}, & \text{for } x_i \in \mathbb{N}. \end{array} \right. $$ Simulations from the univariate LSD can be carried out using the function Runuran::urlogarithmic. The trawl package implements the functions Bivariate_LSDsim and Trivariate_LSDsim to simulate from both the bivariate and the trivariate logarithmic series distribution.

Simulating from the bivariate logarithmic series distribution

The function Bivariate_NBsim can be used to simulate from the bivariate logarithmic series distribution. To this end, note that the probability mass function of a random vector ${\bf X}=(X_1,X_2)'$ following the bivariate logarithmic series distribution with parameters 0 < p1, p2 < 1 with p := p1 + p2 < 1 is given by $$P(X_1=x_1,X_2=x_2)=\frac{\Gamma(x_1+x_2)}{x_1!x_2!} \frac{p_1^{x_1}p_2^{x_2}}{\{-\ln(1-p)\}},$$ for x1, x2 = 0, 1, 2, … such that x1 + x2 > 0.

The simulation proceeds in two steps: First, X1 is simulated from the modified logarithmic distribution with parameters 1 = p1/(1 − p2) and δ1 = ln (1 − p2)/ln (1 − p). Then we simulate X2 conditional on X1. We note that X2|X1 = x1 follows the logarithmic series distribution with parameter p2 when x1 = 0, and the negative binomial distribution with parameters (x1, p2) when x1 > 0.

Example

Next we provide an example of a simulation from the bivariate LSD and we showcase the functions ModLSD_Mean, ModLSD_Var, BivLSD_Cor and BivLSD_Cov which compute the mean and the variance of the univariate modified LSD and the correlation and covariance of the bivariate LSD, respectively.

set.seed(1)
p1<-0.15
p2<-0.3

N<-10000

#Simulate N realisations from the bivariate LSD 
y<-trawl::Bivariate_LSDsim(N, p1, p2)

#Compute the empirical and theoretical mean of the first component
base::mean(y[,1])
#> [1] 0.4575
trawl::ModLSD_Mean(base::log(1-p2)/base::log(1-p1-p2),p1/(1-p2))
#> [1] 0.45619

#Compute the empirical and theoretical mean of the second component
base::mean(y[,2])
#> [1] 0.8995
trawl::ModLSD_Mean(base::log(1-p1)/base::log(1-p1-p2),p2/(1-p1))
#> [1] 0.91238

#Compute the empirical and theoretical variance of the first component
stats::var(y[,1])
#> [1] 0.3686306
trawl::ModLSD_Var(base::log(1-p2)/base::log(1-p1-p2),p1/(1-p2))
#> [1] 0.3724961

#Compute the empirical and theoretical variance of the second component
stats::var(y[,2])
#> [1] 0.5690567
trawl::ModLSD_Var(base::log(1-p1)/base::log(1-p1-p2),p2/(1-p1))
#> [1] 0.5776045

##Compute the empirical and theoretical correlation between the two components
stats::cor(y[,1],y[,2])
#> [1] -0.3961486
trawl::BivLSD_Cor(p1,p2)
#> [1] -0.3608673

##Compute the empirical and theoretical covariance between the two components
stats::cov(y[,1],y[,2])
#> [1] -0.1814394
trawl::BivLSD_Cov(p1,p2)
#> [1] -0.1673877

Simulating from the trivariate logarithmic series distribution

The function Trivariate_NBsim can be used to simulate from the trivariate logarithmic series distribution. The simulation proceeds in two steps: First, X1 is simulated from the modified logarithmic distribution with parameters 1 = p1/(1 − p2 − p3) and δ1 = ln (1 − p2 − p3)/ln (1 − p). Then we simulate (X2, X3)′ conditional on X1. We note that (X2, X3)′|X1 = x1 follows the bivariate logarithmic series distribution with paramaters (p2, p3) when x1 = 0, and the bivariate negative binomial distribution with parameters (x1, p2, p3) when x1 > 0.

Example

set.seed(1)
p1<-0.15
p2<-0.25
p3<-0.55

N<- 10000

#Simulate N realisations from the bivariate LSD 
y<-trawl::Trivariate_LSDsim(N, p1, p2, p3)

#Compute the empirical and theoretical mean of the first component
base::mean(y[,1])
#> [1] 1.0152
trawl::ModLSD_Mean(base::log(1-p2-p3)/base::log(1-p1-p2-p3),p1/(1-p2-p3))
#> [1] 1.001425

#Compute the empirical and theoretical mean of the second component
base::mean(y[,2])
#> [1] 1.6857
trawl::ModLSD_Mean(base::log(1-p1-p3)/base::log(1-p1-p2-p3),p2/(1-p1-p3))
#> [1] 1.669041

#Compute the empirical and theoretical mean of the third component
base::mean(y[,3])
#> [1] 3.7275
trawl::ModLSD_Mean(base::log(1-p1-p2)/base::log(1-p1-p2-p3),p3/(1-p1-p2))
#> [1] 3.67189

#Compute the empirical and theoretical variance of the first component
stats::var(y[,1])
#> [1] 2.999069
trawl::ModLSD_Var(base::log(1-p2-p3)/base::log(1-p1-p2-p3),p1/(1-p2-p3))
#> [1] 3.002847

#Compute the empirical and theoretical variance of the second component
stats::var(y[,2])
#> [1] 7.255041
trawl::ModLSD_Var(base::log(1-p1-p3)/base::log(1-p1-p2-p3),p2/(1-p1-p3))
#> [1] 7.228548

#Compute the empirical and theoretical variance of the third component
stats::var(y[,3])
#> [1] 30.87613
trawl::ModLSD_Var(base::log(1-p1-p2)/base::log(1-p1-p2-p3),p3/(1-p1-p2))
#> [1] 30.5799

#Computing the bivariate covariances and correlations
#Cor(X1,X2):
delta <- base::log(1-p3)/base::log(1-p1-p2-p3)
hatp1 <-p1/(1-p3)
hatp2<-p2/(1-p3)

stats::cov(y[,1],y[,2])
#> [1] 3.316309
trawl::BivModLSD_Cov(delta,hatp1,hatp2)
#> [1] 3.335704

stats::cor(y[,1],y[,2])
#> [1] 0.7109545
trawl::BivModLSD_Cor(delta,hatp1,hatp2)
#> [1] 0.6041258

#Cor(X1,X3):
delta <- log(1-p2)/log(1-p1-p2-p3)
hatp1 <-p1/(1-p2)
hatp2<-p3/(1-p2)

stats::cov(y[,1],y[,3])
#> [1] 7.287671
trawl::BivModLSD_Cov(delta,hatp1,hatp2)
#> [1] 7.338549

stats::cor(y[,1],y[,3])
#> [1] 0.7573281
trawl::BivModLSD_Cor(delta,hatp1,hatp2)
#> [1] 0.7220044

#Cor(X2,X3):
delta <- log(1-p1)/log(1-p1-p2-p3)
hatp1 <-p2/(1-p1)
hatp2<-p3/(1-p1)

stats::cov(y[,2],y[,3])
#> [1] 12.31899
trawl::BivModLSD_Cov(delta,hatp1,hatp2)
#> [1] 12.23092
stats::cor(y[,2],y[,3])
#> [1] 0.8230829
trawl::BivModLSD_Cor(delta,hatp1,hatp2)
#> [1] 0.7969081

##References

Dunn, James E. 1967. “Characterization of the Bivariate Negative Binomial Distribution.” Journal of the Arkansas Academy of Science 21.
Patil, Ganapati P., and Sheela Bildikar. 1967. “Multivariate Logarithmic Series Distribution as a Probability Model in Population and Community Ecology and Some of Its Statistical Properties.” Journal of the American Statistical Association 62 (318): 655–74.