In [127]:

```
# simulate a fair coin - 2 dimensions
import numpy as np
reps = 20 # repetitions (i.e. coin flips)
res = []
for r in range(reps):
    res.append(np.random.randint(2))
print('The results of %d coin flips : ' %(len(res)) + str(res))
heads = sum(res) # ones are heads
tails = reps - sum(res)
print('Heads: %0.f' %heads)
print('Tails: %0.f' %tails)
sample_mean = (sum(res)/float(reps))
print('The mean of the results is %0.2f' %sample_mean)
```

The null hypothesis $H_0$ is an assumption about a population that is put to the test. Often it is an assumption opposed to the actual working hypothesis (the argument of a devil's advocate).

Here, the null hypothesis is: the coin is fair, and thus the probability of landing on heads is exactly 0.5.

The significance level (often denoted by the Greek letter $\alpha$) is a probability threshold that determines when you reject the null hypothesis. Often $\alpha = 0.05$ (5%) is used to delimit significance.

The $p$-value is the probability of obtaining a result at least as extreme as the observed one, assuming the null hypothesis is true. For example, getting 14 heads and 6 tails from 20 coin flips with a fair coin yields a (two-sided) $p$-value of 0.115. (An alternative is the one-sided $p$-value, where only heads (not heads *and* tails) are considered; in that case the $p$-value is 0.058 (= 0.115/2).)

The $p$-value depends on the number of repetitions (reps) in the experiment.

If the $p$-value is lower than the significance level, the null hypothesis should be rejected in favor of an alternative hypothesis.
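As a quick sanity check, the numbers quoted above can be reproduced with SciPy's `binomtest` (available since SciPy 1.7; older versions provide `binom_test` instead), and the same snippet illustrates how the $p$-value shrinks as the number of repetitions grows for the same observed fraction of heads:

```python
# reproduce the p-values quoted above with scipy
from scipy.stats import binomtest

# 14 heads out of 20 flips of a (hypothetically) fair coin
two_sided = binomtest(14, n=20, p=0.5, alternative='two-sided').pvalue
one_sided = binomtest(14, n=20, p=0.5, alternative='greater').pvalue
print('two-sided p-value: %0.3f' % two_sided) # ~0.115
print('one-sided p-value: %0.3f' % one_sided) # ~0.058

# the same fraction of heads (70%) becomes significant with more repetitions
for k, n in [(14, 20), (70, 100)]:
    p = binomtest(k, n=n, p=0.5).pvalue
    print('%d/%d heads: p = %0.4f' % (k, n, p))
```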

In [4]:

```
# p-value of the fair coin result
from scipy.special import binom
# throwing a fair coin yields a binomial distribution (it is a Bernoulli process)
# the one-sided p-value is the sum of binomial coefficients for all results at least as extreme,
# divided by the total number of possible outcomes (2**reps); the two-sided value is doubled (heads + tails)
sl = 0.05 # 5% significance level
if tails <= heads:
    x = heads
else:
    x = tails
P = []
for k in range(x, reps + 1):
    P.append(binom(reps, k))
p = 2 * sum(P) / (2**reps)
print('The two-sided p-value of getting %d heads and %d tails in %d coin flips is %0.3f' %(heads, tails, reps, p))
# alternatively use the binomtest function from the scipy stats module (SciPy >= 1.7;
# older versions provide binom_test instead)
#import scipy.stats as stats
#bintest = stats.binomtest(heads, reps, 0.5).pvalue # (two-sided test)
#print(bintest)
if p < sl:
    print('%0.3f is smaller than the significance level of %0.3f. The null hypothesis should be rejected.' %(p, sl))
else:
    print('%0.3f is greater than the significance level of %0.3f. Therefore the null hypothesis is not rejected.' %(p, sl))
```

In [5]:

```
# plot a distribution of the results and indicate the significance level
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as stats
# show plots in notebook
%matplotlib inline
# prepare plots
fig = plt.figure(figsize=(12,8))
fig.add_subplot(1,1,1)
sl = 0.05 # 5% significance level
reps = 20
runs = 1000
Ru = range(runs)
runres = []
P = []
for ru in Ru:
    res = []
    for r in range(reps):
        res.append(np.random.randint(2))
    # binomial test (binomtest requires SciPy >= 1.7; older versions: binom_test)
    p = stats.binomtest(sum(res), reps, 0.5).pvalue # (two-sided test)
    if p < sl:
        P.append(0)
    else:
        P.append(1)
    runres.append(np.mean(res))
c = 0 # counter of significant (falsely rejected) runs
for n,x in enumerate(P):
    if x == 0:
        plt.plot(n, runres[n], 'ro')
        c += 1
    else:
        plt.plot(n, runres[n], 'bo')
print('%d samples out of %d runs fall below the significance level' %(c, runs))
plt.plot(Ru, [0.5]*runs, 'r--', linewidth=3, label = '0.5-line')
plt.title('Results of fair coin flips; red dots fall below the %0.f%% significance level' %(sl * 100))
plt.ylim(0, 1)
plt.grid()
plt.legend(loc = 'lower right')
```

Out[5]:

Compare observed and simulated data. How well is simulated data fitting observed data? How good is your model?

In [116]:

```
# generate some data (from a model, but let's pretend this is observed empirical data)
# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
# show plots in notebook
%matplotlib inline
# prepare drawing canvas and plot
fig = plt.figure(figsize=(15,5))
fig.add_subplot(1,2,1)
# growth rates
g = 1.2 # observed
sg = 1.5 # simulated
T = range(100)
dt = 0.1
K = 10.
sigma = 0.1 # strength of noise
# define list to store results, include initial value
OD = [1.]
SD = [1.]
# loop over time range
for t in T:
    odiff = (g * OD[t] * (1 - OD[t]/K)) * dt
    sdiff = (sg * SD[t] * (1 - SD[t]/K)) * dt
    noise = sigma * np.random.normal(0, 1)
    OD.append(OD[t] + odiff + noise)
    SD.append(SD[t] + sdiff)
plt.plot(OD, 'bo', label = 'observed data')
plt.plot(SD, 'r.', label = 'simulated data')
plt.title('Observed and simulated population growth in time')
plt.xlabel('time')
plt.grid()
plt.ylim(1, 11)
plt.legend(loc = 'best')
fig.add_subplot(1,2,2)
plt.plot(OD, SD, 'go')
plt.plot(range(12), range(12), 'r-')
plt.xlim(1, 11)
plt.ylim(1, 11)
plt.xlabel('observed data')
plt.ylabel('simulated data')
plt.title('Observed versus simulated population growth')
plt.grid()
```

The root mean squared deviation, $RMSD = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (O_i - S_i)^2}$, is the square root of the average squared difference between $O$ = observed and $S$ = simulated data.

RMSD expresses the total difference in the same units as used for the measurement.

In [117]:

```
d = []
for n, od in enumerate(OD):
    d.append((od - SD[n])**2)
RMSD = np.sqrt(sum(d)/float(len(OD)))
print('The root mean squared deviation of observed and simulated data is %0.4f ' %RMSD)
```

In [118]:

```
d = []
for n, od in enumerate(OD):
    d.append(np.sqrt((od - SD[n])**2)) # sqrt of a square = absolute value
AD = sum(d)/float(len(OD))
print('The mean absolute difference of observed and simulated data is %0.4f ' %AD)
```

$RMSE = \frac{100}{\overline{O}} \cdot \sqrt{\frac{1}{n}\sum_{i=1}^{n} (O_i - S_i)^2}$

where $\overline{O}$ is the average of the observed data.

The RMSE expresses the total difference as a percentage of the observed mean. An RMSE of less than 10% is commonly considered to indicate a sufficiently accurate model.

In [119]:

```
d = []
for n, od in enumerate(OD):
    d.append((od - SD[n])**2)
RMSE = (100/(np.mean(OD))) * np.sqrt(sum(d)/float(len(OD)))
print('The root mean squared error of observed and simulated data is %0.2f%%' %RMSE)
```

The 95% confidence interval is the range around the mean value of repeatedly observed data within which 95% of the simulated data would be expected to be found, if the model is sufficiently accurate.

In [120]:

```
# generate m (= 3) replicates of observed data and take the mean of them
# prepare drawing canvas and plot
fig = plt.figure(figsize=(8,5))
fig.add_subplot(1,1,1)
Col = ['wo', 'co', 'yo']
sigma = 0.2
m = 3
MD = []
for c in range(m):
    nOD = [1.] # newly observed data
    # loop over time range
    for t in T:
        odiff = (g * nOD[t] * (1 - nOD[t]/K)) * dt
        noise = sigma * np.random.normal(0, 1)
        nOD.append(nOD[t] + odiff + noise)
    plt.plot(nOD, Col[c], label = 'observed data %0i' %(c+1))
    MD.append(nOD)
MOD = [np.mean(x) for x in zip(MD[0], MD[1], MD[2])]
plt.plot(MOD, 'k--', linewidth = 3, label = 'mean of all observed data')
plt.plot(SD, 'r--', linewidth = 3, label = 'simulated data')
plt.title('Data of three independent observations, and their mean')
plt.grid()
plt.legend(loc = 'best')
```

Out[120]:

The 95%-confidence interval can be obtained from the standard error multiplied by Student's $t$ at 95% probability.

**Student's $t$** is a continuous probability distribution that arises when estimating the mean of a normally distributed population from a small sample whose population standard deviation is unknown. Whereas a normal distribution describes a full population, $t$-distributions describe samples drawn from a full population. Accordingly, the $t$-distribution differs for each sample size, and the larger the sample, the more it resembles a normal distribution.

The **standard error**, $SE_i = \sqrt{\frac{\sum_{j=1}^{m} (O_i - O_{ij})^2}{m\,(m-1)}}$, where $O_i$ is the mean of the $i$th measurement, $O_{ij}$ is the $j$th replicate of the $i$th measurement, and $m$ is the total number of replicates of the $i$th measurement. The standard error is the standard deviation of all possible replicate means.
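The code below uses the multiplier 1.96, which is the normal approximation of Student's $t$ at 95%. With only $m = 3$ replicates the exact $t$ multiplier is considerably larger, as this sketch with `scipy.stats.t` shows:

```python
# 95% two-sided multiplier from Student's t versus the normal approximation
from scipy.stats import t, norm

m = 3                           # number of replicates, as in the cells below
t_mult = t.ppf(0.975, df=m - 1) # exact Student's t at 95%, df = m - 1
z_mult = norm.ppf(0.975)        # normal approximation
print('t multiplier (df=%d): %0.3f' % (m - 1, t_mult)) # ~4.303
print('z multiplier: %0.3f' % z_mult)                  # ~1.960
```

With large sample sizes the two multipliers converge, which is why 1.96 is a common shorthand for the 95% interval.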

In [121]:

```
fig = plt.figure(figsize=(15,5))
fig.add_subplot(1,2,1)
# standard error
SE = []
for n in range(len(MD[0])):
    S = []
    for x in range(m):
        S.append((MOD[n] - MD[x][n])**2)
    SE.append(np.sqrt(sum(S) / float(m*(m-1))))
plt.plot(SE)
plt.grid()
plt.ylim(0,1)
plt.title('Standard error of all observed data')
fig.add_subplot(1,2,2)
# ci = mean +/- 1.96 * standard error (1.96: normal approximation of Student's t at 95%)
ci_l = [MOD[n] - 1.96 * x for n,x in enumerate(SE)]
ci_u = [MOD[n] + 1.96 * x for n,x in enumerate(SE)]
# show only most important part
plt.plot(T[20: 40], MOD[20: 40], 'bo', label = 'mean of all observed data')
plt.plot(T[20: 40], SD[20: 40], 'r.', label = 'simulated data')
plt.plot(T[20: 40], ci_l[20: 40], 'g--', linewidth=3, label = '95%-confidence interval')
plt.plot(T[20: 40], ci_u[20: 40], 'g--', linewidth=3)
plt.title('95%-confidence interval around observed and simulated data')
plt.xlabel('time')
plt.grid()
plt.ylim(1, 11)
plt.legend(loc = 'best')
```

Out[121]:

The lack of fit (LOFIT) separates out errors due to variations in the measurements (i.e. in the observations) and considers only errors due to the difference between simulated and measured data. However, actual values of replicated measurements are needed to calculate the LOFIT.

$LOFIT = \sum_{i=1}^n m_i \cdot (O_i - S_i)^2$

where $m_i$ is the number of replicates of the $i$-th measurement, $n$ is the number of simulated and measured pairs being compared, $O_i$ is the mean of the replicates of the $i$-th measurement and $S_i$ is the simulated value.

In [122]:

```
# LOFIT of the above replicates
m = 3 # number of replicates
L = 0
for n,i in enumerate(SD):
    L += m * (MOD[n] - i)**2
print('The LOFIT is %0.2f' %L)
```

An $F$-test assesses the significance of the LOFIT, excluding errors associated with variations in the measurements.

$F = \frac{\sum_{i=1}^n (m_i - 1) \cdot LOFIT}{n \cdot \sum_{i=1}^n \sum_{j=1}^{m_i} ((O_{ij} - S_i) - (O_i - S_i))^2}$

In [123]:

```
# significance of LOFIT
num = []
for n,i in enumerate(SD):
    num.append((m - 1) * L) # (m_i - 1) * LOFIT, here m_i = m for every measurement
Num = sum(num)
denom = [] # denominator
for n,i in enumerate(SD):
    s1 = []
    for x in range(m):
        s1.append(((MD[x][n] - SD[n]) - (MOD[n] - SD[n]))**2)
    denom.append(sum(s1))
Denom = len(SD) * sum(denom)
F = Num / float(Denom)
print('The significance of the LOFIT is %0.2f' %F)
```

The sample correlation coefficient (SCC) determines the degree of association between measured and simulated data.

$SCC = \frac{\sum_{i=1}^{n} (O_i - \overline{O}) \cdot (S_i - \overline{S})}{\sqrt{\sum_{i=1}^{n} (O_i - \overline{O})^2} \cdot \sqrt{\sum_{i=1}^{n} (S_i - \overline{S})^2}}$

The SCC can have any value between -1 and 1. A correlation coefficient of 1 or close to one indicates good positive correlation. The simulated values are strongly associated with the measured values, suggesting that the model is performing well.

A correlation coefficient of -1 or close to it indicates a negative correlation. The simulated values are strongly associated with the measured values, but there is something wrong with the model. Results are coming out the wrong way.

A correlation coefficient of 0 indicates no correlation. The simulated values are not associated with the measured values. The model is performing badly.

In [124]:

```
Om = np.mean(OD) # mean of observed data
Sm = np.mean(SD) # mean of simulated data
s1 = []
for n,x in enumerate(OD):
    s1.append((x - Om) * (SD[n] - Sm))
s2 = []
for n,x in enumerate(OD):
    s2.append((x - Om)**2)
s3 = []
for n,x in enumerate(OD):
    s3.append((SD[n] - Sm)**2)
r = sum(s1) / float(np.sqrt(sum(s2)) * np.sqrt(sum(s3)))
print('The sample correlation coefficient is %0.2f' %r)
```
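The manual formula above can be cross-checked against NumPy's `np.corrcoef`, which returns the Pearson correlation matrix. Since `OD` and `SD` come from random runs above, this sketch uses synthetic stand-in data:

```python
# cross-check the manual Pearson correlation against np.corrcoef on synthetic data
import numpy as np

rng = np.random.default_rng(0)
O = np.linspace(1, 10, 50) + 0.1 * rng.standard_normal(50) # stand-in observed data
S = np.linspace(1, 10, 50)                                 # stand-in simulated data

# manual sample correlation coefficient, as in the cell above
r_manual = np.sum((O - O.mean()) * (S - S.mean())) / (
    np.sqrt(np.sum((O - O.mean())**2)) * np.sqrt(np.sum((S - S.mean())**2)))
r_numpy = np.corrcoef(O, S)[0, 1]
print('manual: %0.6f, numpy: %0.6f' % (r_manual, r_numpy))
```

The two values agree to floating-point precision, so `np.corrcoef(OD, SD)[0, 1]` is a convenient one-line replacement for the explicit loops.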