"An apple a day keeps the doctor away"

Keeping a good and balanced diet is fundamental to a healthy life, as it helps avoid food-related illnesses such as diabetes, obesity and cardiovascular diseases. However, the price of food products greatly influences whether individuals buy them or not. Haven't you ever chosen a more expensive brand because you felt it was healthier? This belief is supported by a 2013 study from the Harvard School of Public Health, which found that eating a healthy diet costs about $1.50 more per day per person than eating an unhealthy one. It sounds like pocket change, but it represents an extra $2,200 or so per year for a family of four. So, do people have an equal chance of maintaining a nutritious diet and thus a healthy life?

Here we provide insight into the food consumption discrepancies between the boroughs of Greater London and explore the link between the economic situation of households and their food purchases. The datasets we use contain information about grocery purchases, incomes and child poverty per borough of Greater London.






The tastiest of ingredients: the data

Let's first present the Tesco Grocery 1.0 dataset, from which we get the food purchase data. It consists of a record of 420 M food items purchased by 1.6 M loyalty card owners who shopped at the 411 Tesco stores within the boundaries of Greater London during the whole of 2015. These data are aggregated at different spatial granularities (from Lower Super Output Areas to boroughs) to preserve anonymity.

420M food items · 1.6M customers · 411 Tesco stores


We also use the following datasets:

  • Prevalence of overweight and obese children: fractions of overweight and obese primary school children in Reception class (aged 4 to 5) and Year 6 (aged 10 to 11), sampled across wards. These data were collected by the English National Health Service (NHS) in the 2013–2014 school year.
  • Prevalence of overweight and obese adults: fractions of overweight and obese individuals among a statistical sample of borough residents. These data were collected by the Active People Survey (APS) in 2012.
  • Diabetes prevalence: fraction of adults among those registered at a GP practice in England who are affected by type-2 diabetes. These data were collected by the NHS for the year 2015 at ward level.
  • Earnings by place of residence: gross earnings of employees by place of residence. We only considered the full-time weekly earnings per borough in 2015.
  • Child poverty: numbers and percentages of children in poverty for boroughs and London wards (at 31 August each year). We only considered the children (dependent children under the age of 20) in families receiving child benefit, per borough, in 2015. Therefore, the more aid received, the more precarious the economic situation.
  • London consumer expenditure estimates - Detailed borough base: consumer expenditure data to 2036 broken down by London borough. We converted the food expenditure figures into a percentage of the total expenditure over the year 2015, per borough.
    What is the average diet of a Londoner?


    Is the data really representative of Londoners' food habits? It appears so. Tesco was the biggest grocery retailer in the UK in 2015, with a 28% market share. Moreover, we only take into account the areas where the Tesco purchase data is sufficiently representative (over 10%) of the food purchases of the area's population.
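The representativeness filter described above can be sketched as follows. This is a minimal illustration, assuming each area record carries a representativeness ratio between 0 and 1; the field names `area` and `representativeness` and the sample values are hypothetical, not taken from the dataset.

```python
# Hypothetical area records; the values are made up for illustration.
areas = [
    {"area": "A", "representativeness": 0.04},
    {"area": "B", "representativeness": 0.15},
    {"area": "C", "representativeness": 0.32},
]

MIN_REPRESENTATIVENESS = 0.10  # the 10% cut-off mentioned in the text


def keep_representative(records, threshold=MIN_REPRESENTATIVENESS):
    """Return only the areas whose representativeness exceeds the threshold."""
    return [r for r in records if r["representativeness"] > threshold]


kept = keep_representative(areas)
print([r["area"] for r in kept])
```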

    Let's now have a look at Londoners' food habits!

    It seems that Londoners have a diet rich in fats (especially saturated fats) and carbohydrates (especially sugars). Looking at the most represented food categories, one may at first be pleased to find fruits and vegetables at the top. But sweets account for a large share of the energy intake. Grains and dairy come next, closely followed by ready-made food.

    What constitutes a healthy diet?

    Do Londoners eat healthily?

    The exact make-up of a diversified, balanced and healthy diet will vary depending on individual characteristics (e.g. age, gender, lifestyle and degree of physical activity), cultural context, locally available foods and dietary customs. However, the basic principles of what constitutes a healthy diet remain the same. According to the World Health Organization, a healthy diet includes the following:

  • fruit, vegetables, nuts and whole grains
  • at least 400 g of fruit and vegetables per day
  • less than 10% of total energy intake from free sugars (ideally less than 5%)
  • less than 30% of total energy intake from fats
  • less than 10% of total energy intake from saturated fats
  • less than 1% of total energy intake from trans-fats
  • less than 5 g of salt per day
    Let’s see if Londoners follow the WHO recommendations regarding free sugars, fats and saturated fats…

    Areas exceed this criterion by 17.8% to 26.2%!

    Areas exceed this criterion by 12.8% to 15.1%!

    Areas exceed this criterion by 25.2% to 29.4%!


    The results are clear: Londoners' average diet is far richer in free sugars and fats than it should be.
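A comparison against the WHO limits can be sketched as below. The limits are those listed above; the per-area energy shares are hypothetical, and reporting the excess in percentage points is an assumption about how the gap is measured.

```python
# WHO upper limits as fractions of total energy intake (from the list above).
WHO_LIMITS = {"free_sugars": 0.10, "fats": 0.30, "saturated_fats": 0.10}


def excess_over_who(energy_fractions, limits=WHO_LIMITS):
    """Percentage points by which each nutrient's energy share exceeds its limit.

    Negative values mean the area is within the recommendation.
    """
    return {
        nutrient: round((energy_fractions[nutrient] - limit) * 100, 1)
        for nutrient, limit in limits.items()
    }


# Hypothetical energy shares for one area (not values from the dataset).
example_area = {"free_sugars": 0.22, "fats": 0.43, "saturated_fats": 0.17}
excess = excess_over_who(example_area)
print(excess)
```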

    How does Londoners' diet relate to their health?

    Let’s take the bull by the horns: does this average diet have a direct impact on Londoners' health? We look specifically at the prevalence of obesity and type-2 diabetes, two metabolic conditions strongly linked to food consumption habits. The data collected by the APS in 2012 indicate that 37.4% of Londoners are overweight and 19.8% are obese. So only about 40% of London's population has a “healthy weight”! However, these statistics should be taken with a grain of salt, as they come from a statistical sample rather than the whole population.


    To verify that Londoners' food habits are associated with an increased prevalence of metabolic disorders, we correlate diabetes, obesity and overweight prevalence among adults and children with the different nutrients and food categories that we have seen previously.

    For both food items and nutrients, the Spearman rank correlations are comparable for the obesity and overweight prevalence among children. The same happens for the obesity and overweight prevalence among adults. Diabetes prevalence has its own pattern of correlations.
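For readers unfamiliar with it, a Spearman rank correlation is simply a Pearson correlation computed on ranks. The sketch below uses only the standard library; the per-area values are hypothetical, and this is an illustration of the statistic rather than the code behind the figures.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-area values: energy share from fibre and obesity prevalence.
fibre_energy = [0.018, 0.021, 0.015, 0.025, 0.019]
obesity_prevalence = [0.24, 0.20, 0.27, 0.16, 0.22]
rho = spearman(fibre_energy, obesity_prevalence)
print(rho)
```

In this toy example the two rankings are exact opposites, so the coefficient sits at the negative end of the [-1, 1] range.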

    Regarding nutrients, we can see that the energy coming from fibres and alcohol (didn't you know that Guinness is healthy?) and the entropy of energy from nutrients are strongly negatively correlated with all the metabolic conditions considered. On the other hand, the total energy and the energy from carbs are clearly positively correlated with obesity and overweight prevalence among adults. Finally, diabetes prevalence is correlated with almost all nutrient categories.

    Regarding food items, we again find that alcohol (beer and wine) is negatively correlated with the metabolic disorders. Less surprisingly, fruits and vegetables and dairy are associated with a lower prevalence of the disorders. In contrast, fats and oils are associated with a higher one.


    How do we measure the healthiness of a diet?

    I can already hear you thinking: what’s the point of all this? The point is that, in the end, we can compare the healthiness of Londoners' diet with the economic situation of households. So we need a diet score such that a score of 1 corresponds to a diversified, balanced and healthy diet and, conversely, a score of 0 corresponds to a completely unhealthy one. We found two ways to compute this score:

    Method 1: Fit a linear regression on the overweight and obesity prevalence datasets with the most strongly correlated nutrients as features.

    As we have seen previously, the energy coming from fibres (that is, cereals, nuts, peas, beans, pulses…) and the entropy of energy from nutrients (capturing the diversity of nutrients in the total energy) are associated with a markedly lower obesity and overweight prevalence among the population. On the other hand, the energy from carbohydrates (associated with processed plant-based foods such as sweets, soft drinks, breads and pastas) tends to go with a higher prevalence.
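To make the entropy feature concrete, here is a minimal sketch that assumes it is the Shannon entropy (in bits) of the energy shares across nutrients; the dataset may use a different base or normalisation, and the nutrient breakdowns below are hypothetical.

```python
import math


def nutrient_entropy(energy_by_nutrient):
    """Shannon entropy (bits) of the shares of total energy across nutrients."""
    total = sum(energy_by_nutrient.values())
    shares = [e / total for e in energy_by_nutrient.values() if e > 0]
    return -sum(p * math.log2(p) for p in shares)


# Hypothetical energy breakdowns (kcal) for two diets.
balanced = {"carbs": 25, "fats": 25, "protein": 25, "fibre": 25}
skewed = {"carbs": 85, "fats": 5, "protein": 5, "fibre": 5}

print(nutrient_entropy(balanced), nutrient_entropy(skewed))
```

The more evenly the energy is spread across nutrients, the higher the entropy, which is why it serves as a measure of diversity.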

    Score 1 simply consists of a weighted sum of these three nutrients, with the weights corresponding to the coefficients obtained by running an ordinary least squares regression.
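The construction of such a score can be sketched as follows. The text only specifies a weighted sum with OLS coefficients as weights; the flipping and min-max rescaling used here to land on a 0 to 1 scale (1 = healthiest) are assumptions, as are the standardised feature values and prevalence figures, which are invented for illustration.

```python
def solve_linear_system(a, b):
    """Solve a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(features, target):
    """Ordinary least squares via the normal equations.

    Returns [intercept, w1, w2, ...].
    """
    rows = [[1.0] + list(f) for f in features]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def diet_scores(features, weights):
    """Weighted sum of features, flipped and min-max scaled so 1 = healthiest.

    The flip and rescaling are assumptions made for this sketch.
    """
    raw = [sum(w * v for w, v in zip(weights, f)) for f in features]
    lo, hi = min(raw), max(raw)
    return [(hi - r) / (hi - lo) for r in raw]


# Hypothetical standardised features per area: (fibre, entropy, carbs).
features = [
    (1.0, 0.5, -1.0),
    (0.0, 1.0, 0.5),
    (-1.0, -0.5, 1.5),
    (0.5, -1.0, 0.0),
    (-0.5, 0.0, -1.0),
]
# Hypothetical obesity prevalence per area.
prevalence = [0.17, 0.25, 0.35, 0.255, 0.225]

coefficients = fit_ols(features, prevalence)
scores = diet_scores(features, coefficients[1:])  # drop the intercept
print(coefficients, scores)
```

Because the regression predicts prevalence, a larger weighted sum means a less healthy diet, hence the flip before rescaling.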

    Method 2: Fit a linear regression on the overweight and obesity prevalence datasets with the most consumed food items as features.

    The most consumed categories are fruits and vegetables (27.8%), sweets (16.1%, including cakes and biscuits), grains (15%, including rice, corn and wheat) and dairy (10.5%, such as cheese, milk and yoghurt).

    Score 2 is computed in the same way as score 1, except that the features are the food items most consumed by Londoners: fruits and vegetables, sweets, grains and dairy.

    SCORE 1

    SCORE 2

    To check the consistency of the computed scores, we compare the features of the 25% lowest-scoring and the 25% highest-scoring areas. As you can see just above, the results are realistic. For score 1, the highest-scoring areas show a diet slightly richer in fibres and entropy and poorer in carbohydrates than the lowest-scoring areas. For score 2, the differences are even more pronounced: the highest-scoring areas show a much higher consumption of fruits and vegetables and of dairy, and a lower consumption of sweets and grains.
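This quartile comparison can be sketched in a few lines. The scores and the fruit-and-vegetable shares below are hypothetical, and taking the integer quarter of the area count is one simple way of defining the 25% groups.

```python
def quartile_means(scores, feature):
    """Mean of `feature` over the 25% lowest- and 25% highest-scoring areas."""
    ranked = [f for _, f in sorted(zip(scores, feature))]
    q = max(1, len(ranked) // 4)  # number of areas in each group
    low, high = ranked[:q], ranked[-q:]
    return sum(low) / len(low), sum(high) / len(high)


# Hypothetical diet scores and fruit-and-vegetable shares for eight areas.
scores = [0.1, 0.9, 0.4, 0.7, 0.2, 0.8, 0.5, 0.3]
fruit_veg = [0.20, 0.34, 0.26, 0.30, 0.22, 0.32, 0.27, 0.25]

low_mean, high_mean = quartile_means(scores, fruit_veg)
print(low_mean, high_mean)
```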

    But how can we be sure that the two scores we computed really represent the healthiness of Londoners' diet? We need to validate them using new data, that is...(drum roll)...data on diabetes prevalence, another food-related illness!

    As you can see, the two scores are strongly correlated, which suggests that both capture roughly the same information. This is confirmed by their visualization on the London map: they exhibit only slight differences.




    But what really interests us is the correlation between each score and the estimated diabetes prevalence. Both scores turn out to be strongly correlated with diabetes prevalence and, as expected, the lower the score, the higher the vulnerability to diabetes. Score 1 seems to win, though, so we keep it for the rest of the story!

    What is the proportion of food-related expenditure in each borough? How does it relate to its economic situation?

    Let’s now turn to some economic indicators. In particular, how is food expenditure related to wealth?



    We can see that food is the 4th-highest expenditure category, behind housing, restaurants and hotels.

    We propose here a visualization of the wealth differences between the different boroughs through these three indicators:


    At first sight, earnings and child poverty seem to be strongly correlated across boroughs, as expected. Food expenditure appears to represent a larger part of total expenditure in poorer areas, which is quite intuitive. In fact, this is in accordance with Engel's law, an observation in economics stating that, as income increases, the proportion of income spent on food decreases. This effect is clearly visible in London!
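The Engel's law pattern can be illustrated by computing the food share of total expenditure and reading it in order of increasing earnings. The borough names and figures below are hypothetical, chosen only to show the calculation.

```python
# Hypothetical boroughs: weekly earnings (£) and yearly expenditure (£ million).
boroughs = [
    {"name": "Y", "weekly_earnings": 650, "food": 330, "total": 3300},
    {"name": "X", "weekly_earnings": 520, "food": 300, "total": 2500},
    {"name": "Z", "weekly_earnings": 800, "food": 360, "total": 4500},
]


def food_share(borough):
    """Food expenditure as a percentage of total expenditure."""
    return 100 * borough["food"] / borough["total"]


# Order boroughs from lowest to highest earnings and list their food shares.
by_earnings = sorted(boroughs, key=lambda b: b["weekly_earnings"])
shares = [round(food_share(b), 1) for b in by_earnings]
print([b["name"] for b in by_earnings], shares)
```

Note that absolute food spending rises with earnings in this toy data while its share of the total falls, which is exactly the relationship Engel's law describes.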

    How does a healthy diet relate to the borough's economic situation? Is this connection area-dependent?

    The WHO asserts that the global food price crisis threatens public health, especially the health of low-income families. But how does a healthy diet relate to the economic situation?

    Here again, it clearly appears that the higher the earnings and the lower the child poverty, the lower the proportion of food expenditure. More importantly, the higher the earnings and the lower the child poverty, the higher the healthy diet score. It is therefore reasonable to conclude that, across boroughs, a healthy diet is positively correlated with the economic situation.

    But in the end, the most important thing to remember is this:
    "When health is absent... wealth becomes useless" - Herophilus
    The detailed analysis of the project can be found here.


    Data story produced for the Applied Data Analysis course, EPFL, 2020.