22-23 October 2012 Pacengo del Garda (VR) - Italy www.caeconference.com
|
Healing the swine flu with modeFRONTIER
One of the hot topics of the winter 2009 that probably will be remembered is the outbreak of the so-called “swine flu”. The new virus A-H1N1 captured the attention of the Italian media, which literally bombarded the population with daily reports on the number of deaths, the severity of this virus and other alarms based on the opinion of some “epidemiology experts”, spreading in this way the fear within the population.
 |
| A photo of the A-H1N1 virus (left) and a swine (right). They do not look so dangerous… |
During the first weeks of autumn some sentences such as “We will have an extraordinary peak of flu diffusion between Christmas and the new year” or “we will be the victim of a new pandemia with many deaths” were pronounced.
How is it possible to predict such an “apocalyptic” scenario so many weeks in advance? The truth is that it is extremely difficult, especially when no previous knowledge on the virus behavior is available. However, in epidemiology some simple mathematical models have been developed and used for many years; they are mainly based on ordinary differential equations (shortly ODEs).
Probably, the most known model is the so-called SIR model, where the population, which is supposed to be large and homogeneous enough, is divided into three groups (Susceptible, Infected and Recovered), according to their status (see [4]). Strong simplifications are present in this model which can be applied as scale level; in some cases it could lead to poor results. For this reason, there is a variety of SIR based models which remove some of these simplifications in an attempt to be closer to reality. In this work, we suggest to add a new category to the standard SIR model in order to consider the fact that unfortunately, some infected people may die. The resulting model can be expressed as:
 |
| Figure 2: The solution of a classical SIR model: the three categories S, I and R are plotted versus time, expressed in weeks. It is clear that the disease has a peak between the first and the second week and that the maximum number of ill people is around 150 over 1000. In this case we adopted the following values for the parameters (β=10, ν=5 and the number of initial infected is 1.98 for 1000 persons). |
This is a non-linear system of first order ODEs; the four categories used to classify the population are S = Susceptible, I = Infected, R = Recovered, D = Dead and they are expressed in percentage terms. For this reason the sum of all the categories has to be always equal to one. The parameters β, ν and δ are constants which determine the evolution of the disease. The results strongly depend on the numerical values of these parameters. Specifically the peak value of the infected and the week of the year when it will appear, which are important information to have in advance, can be really difficult to capture if there is not a rigorous estimation of the above mentioned parameters.
Obviously, it is mandatory to know the initial conditions before solving the system: in other words we have to know the number of susceptible, infected, recovered and dead persons at time zero, when we want to begin our simulation.
The solution of such equations is always done, excluding trivial cases, through numerical techniques which have been expressively defined to tackle this kind of problem. To obtain reliable solutions, the numerical strategies have to consider the nature of the ODE to be solved; in general, ODEs can be really complicated and strongly nonlinear and the independent functions could have sharp variations within time.
 |
| Figure 3: The modeFRONTIER workflow used for the model tuning problem. |
For this reason, many techniques have been developed as it can be easily seen in literature (see [5] and [6] just to have an idea), to minimize the difference between the numerical and the theoretical solution.
The implementation of such techniques in general is not an easy task for many engineers and scientists who probably are more interested in obtaining a reliable solution for their problems rather than in spending time and money in compiling codes.
To partially mitigate this situation, we use a general-purpose and open source platform, Scilab (see [2]) which provides the user with powerful numerical tools to manage different problems and to solve an ODEs system.
In Figure 2 the three categories S, I and R (these quantities are measured with reference to a population of 1000 persons) have been plotted versus time (expressed in weeks). In this case a classical SIR model has been solved: it can be easily seen that the number of infected persons amounts to a maximum of 150 out of 1000 and that it falls between the first and the second week. The model parameters (β=10, ν=5 and the number of initial infected is 1.98 for 1000 persons) have been chosen in this case without any reference to a real disease.
Unfortunately, the model parameters are not known in advance but, usually, they have to be estimated starting from some previously acquired knowledge on the evolution of the disease. Once these parameters have been estimated, it will be possible to predict the spread of the disease.
This is a typical model tuning problem which could be formulated, for example, as a least square problem. Actually, if we knew the number of infected persons and the deaths which can be ascribed to the flu during a given period, we could try to find out the values for the model parameters in order to best fit the known data. The result could give a better insight into the flu evolution, and the possible predicting of the peak of the infection and hence a better understanding of the general evolution of the disease.
To this aim we decided to use the data reported in Table 1, which are provided by the ISS (Istituto Superiore di Sanità) and collected by the author from different Italian media (see [2]). It is obvious that they are not numerous, but they are the only ones available at the end of the 46th week of the year (November 15th). However we would like to predict the swine flu evolution in Italy for the following weeks.
 |
| Figure 4: The evolution of the swine flu in Italy according to our modified SIR model. Note that the logarithmic scale has been adopted for the persons ages. It can be seen that the flu peak falls between the 45th and 46th week (blue curve)and that it concerns about 13.5 persons out of 1000. The flu should practically disappear before the new year. The triangles represent the fitted data, contained in Table 1: it immediately becomes clear that the death rate is slightly overestimated. |
 |
| Table 1: The number of infected persons over 1000 (data source [2], November 15th) and the total deaths due to the swine flu (reported by the italian media) in Italy are reported in this table for some weeks of the year. |
As mentioned above, our aim is now to find out the best values of β, ν and δ in such a way that our modified SIR model is able to best fit the data reported in Table 1. We are building the modeFRONTIER project drawn in Figure 3: the four input variables are represented by the four green nodes in the upper part while the output variables, the number of the infected and the deaths at different weeks, are extracted directly from Scilab through two output vectors (the blue nodes).
Among the many available strategies to adopt for the solution of this problem, we decided to use the following one, which has the desirable feature to lead to a mono-objective minimization problem.
We introduce a target node, involving the computed number of infected people, looking for the best fit:

and a constraint node, involving the number of actual deaths:

For the solution of such a problem, usually, a Levenberg-Marquardt algorithm is recommended, in view of the nature of the function to be minimized. It is well known, however, that this algorithm adopts a penalty oriented approach to manage constraints, which may not be the best in our case: we actually would like to have a very accurate fit also for the death rate, which is involved directly by the constraints.
 |
| Table 2: A comparison between the best solutions found with the two optimization algorithms adopted. It can be seen that the NLPQLP provides a better solution. C44, C45 and C46 represent the value of the constraint as defined in equation (2) expressed for the weeks 44, 45 and 46 respectively. |
The NLPQLP algorithm, which has a completely different approach in the constraint management, has also been tested: it can be seen (from Table 2) that it provides better results than the Levenberg-Marquardt algorithm.
The evolution of the flu is reported in Figure 4, as computed using the best fit parameters by NLPQLP. It is evident that the peak falls between the 45th and 46th week and it concerns about 13.5 persons out of 1000. Moreover, it points out that the flu should practically disappear before the new year. The mortality rate can be estimated by looking at the number of deaths after a long period (let us say after one year from the beginning): in our model this value amounts to 0.0028 out of 1000 persons, which means 0.03% (170 deaths in Italy approximately). This value appears to be very close to analogous quantities computed for other seasonal flues in the past, which usually range between 0.02% and 0.04%. Finally, it has to be mentioned that the deaths are slightly overestimated in our model.
 |
| Figure 5: The modeFRONTIER workflow used for the solution of the sensitivity problem. A Latin-Hypercube technique has been used to generate a DOE in accordance with the probability density functions characterizing the targets. In the project shown in Figure 3, the model tuning problem is solved by modeFRONTIER with a batch call and a Scilab routine which are used to continuously extract the information about the disease. |
However, if the reader visits the web site given in [2], he/she can read that the proposed data could be affected by slight variations, due to some delays in reporting by the surveillance network. Probably, the number of infected persons will not be exactly the same as those reported in Table 1, after November 15th. Hence knowing that the available data at the end of the 46th week may not be accurate, we would like to estimate how our previsions reported above, could change. In other words, we want to understand what is the error rate of our previsions, as the target data may be affected by slight variations.
Here, the first step certainly is to give a probabilistic characterization of the target values; we decided to use an exponential probability density function for each target value of the infected persons. This choice has been driven by the fact that the true values of the infected persons are certainly higher than those reported in Table 1; actually, they are expected to grow. In Table 3 the values of the location and the scale for the four exponential probability functions are collected.
These values have been arbitrary chosen (there is no information on the reliability of data we have) in such a way that values lesser than those reported in the last column of Table 3 have around 90% of probability to appear.
 |
| Figure 6: The peak is plotted versus the peak position. The bubble color is used to represent the mortality. The cloud of points summarizes the simulated scenarious. It can be easily seen that even the worst previsions in terms of peak and mortality rate have nothing in common with a pandemia or a catastrophic outbreak. |
Five thousand simulations have been organized modifying the target values in accordance with the given probability density functions mentioned above and the corresponding peak position, peak intensity and mortality have been computed.
To launch these simulations a new modeFRONTIER project has been organized (see Figure 5): the Latin-Hypercube algorithm has been set up to plan an appropriate DOE and a batch call to the project described before has been applied in order to solve the model tuning problem. A Scilab routine finally extracts the results in correspondence with the best solution found.
In Figure 6 a bubble chart is shown: the peak value is plotted versus the peak position and the bubble color is used to represent the expected mortality. It can be easily seen that we obtain three ranges of existence; we can say that the peak position ranges between 45.29 and 45.60 weeks and that the peak ranges between 13.19 and 13.24 infected for one thousand inhabitants. The mortality rate never passes the 0.00347 over 1000 persons. It is interesting to observe that the resulting cloud of points is not uniformly nor homogeneously distributed, but it has important voids and regions with high densities.
 |
| Table 3: The location and the scale parameters of the exponential probability functions used to characterize the target values of the infected people at different weeks. |
To understand how the probability of the couple peak and peak position is distributed, we have built the diagram plotted in Figure 7. The cloud of points has been divided in a 20 x 20 cells regular array, and we have counted the number of designs inside each cell. These counts have been divided by the total number of computed designs obtaining the relative frequency, which can be reasonably associated with the probability. This plot allows to say that peaks of around 13.35 infected falling between the 45.43 and 45.44 week are the most probable ones. We can conclude that, even if considering uncertainties in the target values, it is possible to estimate the spread of the disease with a reasonable accuracy: it is certainly possible to exclude catastrophic scenarious even if few data are available. During the next weeks we will see if the model presented in this work has been able to correctly predict the spread of the swine flu or, on the contrary, if a terrible outbreak will happen. Let's hope for the best and be optimistic!
 |
| Figure 7: The frequency of the couples peak and peak position. The most probable scenarios are those characterized by a peak of around 13.35 infected falling around the 45.44 week of the year. |
Conclusions
In this work we have shown how it is possible to model the natural spread of a disease, within a population, with relatively simple equations. If some observed or measured data are available it is possible to tune the model and predict the evolution of the disease with sufficient accuracy at macro scale.
The Scilab platform has been used to numerically solve the ODEs system and modeFRONTIER to tune the model and manage the uncertainties on available information. We would like to emphasize that the methodology adopted in this work can be used in the same way, also in other contexts, when a prevision is needed and experimental data are affected by errors.
References
[1] http://www.scilab.org/ to have more information on Scilab.
[2] http://www.iss.it/iflu/ to have more information on the italian sentinel surveillance. The data relative to the infected have been downloaded here.
[3] http://www.ministerosalute.it to have a complete description of the swine flu.
[4] http://www.wikipedia.com to have more information on the SIR model.
[5] P. Blanchard, R. L. Devaney, G. R. Hall, Differential Equations, (2006) Thomson Brooks/Cole, 3nd ed.
[6] K. S. Bhamra, O. R. Bala, Ordinary Differential Equations. An Introductory Treatment with Applications, (2003) Allied Publishers PVT. LTD.
For more information on this document please contact the author: Massimiliano Margonari - EnginSoft S.p.A.
info@enginsoft.it
|