Recall that a random variable X has (univariate) negative binomial law with parameters κ > 0, 0 < p < 1, i.e., X ∼ NB(κ, p) if its probability mass function is given by $$ P(X=x) = {\kappa+x-1 \choose x} p^x(1-p)^{\kappa}, \quad x \in \{0,1,\ldots\}. $$
A random vector ${\bf X}=(X_1,\dots,X_n)'$ is said to follow the multivariate negative binomial distribution with parameters κ, p1, …, pn if its probability mass function is given by $$ P({\bf X}={\bf x})=\frac{\Gamma(x_1+\cdots+x_n+\kappa)}{x_1! \cdots x_n! \Gamma(\kappa)}p_1^{x_1}\cdots p_n^{x_n}(1-p_1-\cdots-p_n)^{\kappa}, $$ where, for i = 1, …, n, xi ∈ {0, 1, …}, 0 < pi < 1 such that $\sum_{i=1}^np_i<1$ and κ > 0.
We note that the function stats::rnbinom
can be used to
simulate from the univariate negative binomial distribution. The
trawl
package introduces the function
Bivariate_NBsim
which generates samples from the bivariate
negative binomial distribution. The simulation algorithm proceeds in two
steps: First, we simulate X1 from the univariate
negative binomial distribution NB(κ,p1/(1 − p2)).
Second, we simulate X2|X1 = x1
from the univariate negative binomial distribution NB(κ + x1, p2),
see for instance (Dunn 1967).
###Example
set.seed(1)
kappa<- 3
p1 <- 0.1
p2 <- 0.85
p <- p1+p2
p0 <-1-p1-p2
N<- 10000
#Simulate from the bivariate negative binomial distribution
y <- trawl::Bivariate_NBsim(N,kappa,p1,p2)
#Compare the empirical and theoretical mean of the first component
base::mean(y[,1])
#> [1] 5.9653
kappa*p1/(1-p)
#> [1] 6
#Compare the empirical and theoretical variance of the first component
stats::var(y[,1])
#> [1] 18.0047
kappa*p1*(1-p2)/(1-p)^2
#> [1] 18
#Compare the empirical and theoretical mean of the second component
base::mean(y[,2])
#> [1] 50.9742
kappa*p2/(1-p)
#> [1] 51
#Compare the empirical and theoretical variance of the second component
stats::var(y[,2])
#> [1] 926.3358
kappa*p2*(1-p1)/(1-p)^2
#> [1] 918
#Compare the empirical and theoretical correlation between the two components
stats::cor(y[,1],y[,2])
#> [1] 0.7891007
(p1*p2/(p0+p1)*(p0+p2))^(1/2)
#> [1] 0.7141428
We say that a vector ${\bf
X}=(X_1,\dots,X_n)'$ follows the multivariate logarithmic series
distribution (LSD), see, e.g., (Patil and
Bildikar 1967). ${\bf X} \sim
\text{LSD}(p_1,\ldots,p_n)$, where $0<p_i<1, p:=\sum_{i=1}^np_i <1$ if
for ${\bf x}\in \mathbb{N}_0^n \setminus
\{{\bf 0} \}$, if its probability mass function is given by $$
P({\bf X}={\bf x})=\frac{\Gamma(x_1+\cdots+x_n)}{x_1! \cdots
x_n!}\frac{p_1^{x_1}\cdots p_n^{x_n}}{\{-\ln(1-p)\}}.
$$ Note that each component Xi follows the
modified univariate logarithmic distribution with parameters p̃i = pi/(1 − p + pi)
and δi = ln (1 − p + pi)/ln (1 − p),
i.e., Xi ∼ ModLSD(p̃i, δi)
with $$
P(X_i =x_i) = \left\{ \begin{array}{ll} \delta_i, & \text{for }
x_i=0\\
(1-\delta_i) \frac{1}{x_i}\frac{\tilde p_i^{x_i}}{\{-\ln(1-\tilde
p_i)\}}, & \text{for } x_i \in \mathbb{N}. \end{array} \right.
$$ Simulations from the univariate LSD can be carried out using
the function Runuran::urlogarithmic
. The trawl
package implements the functions Bivariate_LSDsim
and
Trivariate_LSDsim
to simulate from both the bivariate and
the trivariate logarithmic series distribution.
The function Bivariate_NBsim
can be used to simulate
from the bivariate logarithmic series distribution. To this end, note
that the probability mass function of a random vector ${\bf X}=(X_1,X_2)'$ following the bivariate
logarithmic series distribution with parameters 0 < p1, p2 < 1
with p := p1 + p2 < 1
is given by $$P(X_1=x_1,X_2=x_2)=\frac{\Gamma(x_1+x_2)}{x_1!x_2!}
\frac{p_1^{x_1}p_2^{x_2}}{\{-\ln(1-p)\}},$$ for x1, x2 = 0, 1, 2, …
such that x1 + x2 > 0.
The simulation proceeds in two steps: First, X1 is simulated from the modified logarithmic distribution with parameters p̃1 = p1/(1 − p2) and δ1 = ln (1 − p2)/ln (1 − p). Then we simulate X2 conditional on X1. We note that X2|X1 = x1 follows the logarithmic series distribution with parameter p2 when x1 = 0, and the negative binomial distribution with parameters (x1, p2) when x1 > 0.
Next we provide an example of a simulation from the bivariate LSD and
we showcase the functions ModLSD_Mean
,
ModLSD_Var
, BivLSD_Cor
and
BivLSD_Cov
which compute the mean and the variance of the
univariate modified LSD and the correlation and covariance of the
bivariate LSD, respectively.
set.seed(1)
p1<-0.15
p2<-0.3
N<-10000
#Simulate N realisations from the bivariate LSD
y<-trawl::Bivariate_LSDsim(N, p1, p2)
#Compute the empirical and theoretical mean of the first component
base::mean(y[,1])
#> [1] 0.4575
trawl::ModLSD_Mean(base::log(1-p2)/base::log(1-p1-p2),p1/(1-p2))
#> [1] 0.45619
#Compute the empirical and theoretical mean of the second component
base::mean(y[,2])
#> [1] 0.8995
trawl::ModLSD_Mean(base::log(1-p1)/base::log(1-p1-p2),p2/(1-p1))
#> [1] 0.91238
#Compute the empirical and theoretical variance of the first component
stats::var(y[,1])
#> [1] 0.3686306
trawl::ModLSD_Var(base::log(1-p2)/base::log(1-p1-p2),p1/(1-p2))
#> [1] 0.3724961
#Compute the empirical and theoretical variance of the second component
stats::var(y[,2])
#> [1] 0.5690567
trawl::ModLSD_Var(base::log(1-p1)/base::log(1-p1-p2),p2/(1-p1))
#> [1] 0.5776045
##Compute the empirical and theoretical correlation between the two components
stats::cor(y[,1],y[,2])
#> [1] -0.3961486
trawl::BivLSD_Cor(p1,p2)
#> [1] -0.3608673
##Compute the empirical and theoretical covariance between the two components
stats::cov(y[,1],y[,2])
#> [1] -0.1814394
trawl::BivLSD_Cov(p1,p2)
#> [1] -0.1673877
The function Trivariate_NBsim
can be used to simulate
from the trivariate logarithmic series distribution. The simulation
proceeds in two steps: First, X1 is simulated from the
modified logarithmic distribution with parameters p̃1 = p1/(1 − p2 − p3)
and δ1 = ln (1 − p2 − p3)/ln (1 − p).
Then we simulate (X2, X3)′
conditional on X1.
We note that (X2, X3)′|X1 = x1
follows the bivariate logarithmic series distribution with paramaters
(p2, p3)
when x1 = 0, and
the bivariate negative binomial distribution with parameters (x1, p2, p3)
when x1 > 0.
set.seed(1)
p1<-0.15
p2<-0.25
p3<-0.55
N<- 10000
#Simulate N realisations from the bivariate LSD
y<-trawl::Trivariate_LSDsim(N, p1, p2, p3)
#Compute the empirical and theoretical mean of the first component
base::mean(y[,1])
#> [1] 1.0152
trawl::ModLSD_Mean(base::log(1-p2-p3)/base::log(1-p1-p2-p3),p1/(1-p2-p3))
#> [1] 1.001425
#Compute the empirical and theoretical mean of the second component
base::mean(y[,2])
#> [1] 1.6857
trawl::ModLSD_Mean(base::log(1-p1-p3)/base::log(1-p1-p2-p3),p2/(1-p1-p3))
#> [1] 1.669041
#Compute the empirical and theoretical mean of the third component
base::mean(y[,3])
#> [1] 3.7275
trawl::ModLSD_Mean(base::log(1-p1-p2)/base::log(1-p1-p2-p3),p3/(1-p1-p2))
#> [1] 3.67189
#Compute the empirical and theoretical variance of the first component
stats::var(y[,1])
#> [1] 2.999069
trawl::ModLSD_Var(base::log(1-p2-p3)/base::log(1-p1-p2-p3),p1/(1-p2-p3))
#> [1] 3.002847
#Compute the empirical and theoretical variance of the second component
stats::var(y[,2])
#> [1] 7.255041
trawl::ModLSD_Var(base::log(1-p1-p3)/base::log(1-p1-p2-p3),p2/(1-p1-p3))
#> [1] 7.228548
#Compute the empirical and theoretical variance of the third component
stats::var(y[,3])
#> [1] 30.87613
trawl::ModLSD_Var(base::log(1-p1-p2)/base::log(1-p1-p2-p3),p3/(1-p1-p2))
#> [1] 30.5799
#Computing the bivariate covariances and correlations
#Cor(X1,X2):
delta <- base::log(1-p3)/base::log(1-p1-p2-p3)
hatp1 <-p1/(1-p3)
hatp2<-p2/(1-p3)
stats::cov(y[,1],y[,2])
#> [1] 3.316309
trawl::BivModLSD_Cov(delta,hatp1,hatp2)
#> [1] 3.335704
stats::cor(y[,1],y[,2])
#> [1] 0.7109545
trawl::BivModLSD_Cor(delta,hatp1,hatp2)
#> [1] 0.6041258
#Cor(X1,X3):
delta <- log(1-p2)/log(1-p1-p2-p3)
hatp1 <-p1/(1-p2)
hatp2<-p3/(1-p2)
stats::cov(y[,1],y[,3])
#> [1] 7.287671
trawl::BivModLSD_Cov(delta,hatp1,hatp2)
#> [1] 7.338549
stats::cor(y[,1],y[,3])
#> [1] 0.7573281
trawl::BivModLSD_Cor(delta,hatp1,hatp2)
#> [1] 0.7220044
#Cor(X2,X3):
delta <- log(1-p1)/log(1-p1-p2-p3)
hatp1 <-p2/(1-p1)
hatp2<-p3/(1-p1)
stats::cov(y[,2],y[,3])
#> [1] 12.31899
trawl::BivModLSD_Cov(delta,hatp1,hatp2)
#> [1] 12.23092
stats::cor(y[,2],y[,3])
#> [1] 0.8230829
trawl::BivModLSD_Cor(delta,hatp1,hatp2)
#> [1] 0.7969081
##References