Data Diversity Podcast #5 – Abdulwahab Alshallal

Welcome back to another edition of the Data Diversity Podcast, the Research Data podcast from the University of Cambridge Office of Scholarly Communication (OSC). If this is your first time here, in this podcast I speak to Cambridge Data Champions about their journeys in acquiring and working with data in their research, in the hope of highlighting interesting facets of data work and of academic research in general. In this episode, I spoke to Cambridge PhD student Abdulwahab Alshallal from the MRC Epidemiology Unit, who is part of the Physical Activity Epidemiology research group.

For his PhD, he is currently exploring associations of physical activity, behaviour and fitness with cardiometabolic risk in different global populations. Abdulwahab recently presented at a Data Champion Forum, where he talked about working with datasets from international sources, specifically from non-Western nations, and discussed the barriers to collaboration and differences in the flexibility of institutions regarding data access and sharing. In this episode, we discussed those matters and also went into his aspirations for public health policy making and how his data-driven mindset applies to this endeavour.


I am of the mind that your social and physical environments are a big determinant of your physical activity and your general lifestyle behaviours. For example, it is unfair to compare the UK and India because it is much easier to cycle in streets and walk around in the UK than it is in India, or Mexico or even Kuwait, and the barriers can be different. It could be pedestrian access, it could be heat, in my case it would be humidity. All of these factors matter, and we need to get data to represent those populations and use that data in such studies. – Abdulwahab Alshallal


The overrepresentation of data from Western studies in global understandings of fitness 

LO: Is it true to say that most of the data that is available now is all based on Western data sources and is it problematic then to use that to represent a global understanding of fitness?

AA: I would rephrase that. It is not that the data does not exist; rather, it is that its representation in the literature is absent. The data exists, but when it comes to the data making it into the literature and influencing policy guidelines, this is not yet prevalent. Take for example physical activity guidelines: every few years, data from much of the published literature is gathered and used to make new recommendations for physical activity. It is through these guidelines that it was recommended that people do, for example, 30 to 60 minutes of physical activity per day. Now, the guidelines say that it is 150 minutes of physical activity per week, no matter which day you do it. But the data that influence these policies are mostly from North America, Europe and Australia (because these are the data used in the literature cited for the creation of these guidelines). This implies that we do not think that it matters much to look at data from other places, because humans are humans. But I am of the mind that your social and physical environments are a big determinant of your physical activity and your general lifestyle behaviours. For example, it is unfair to compare the UK and India because it is much easier to cycle in streets and walk around in the UK than it is in India, or Mexico or even Kuwait, and the barriers can be different. It could be pedestrian access, it could be heat, in my case it would be humidity. All of these factors matter, and we need to get data to represent those populations and use that data in such studies.

The data does exist and thankfully I have made an effort to include it in my research. One of the places where you can acquire this data is the World Health Organisation (WHO). That is the most wide-ranging data source, and the few others that I’m using are the South Asia Biobank, which covers four countries in South Asia (India, Sri Lanka, Pakistan and Bangladesh), the biobank from the UAE Healthy Future Study, which would cover the Gulf populations, and the Qatar Biobank.

Data in his research 

LO: What are the research questions that you’re asking, and how is data used, or what data is needed, to answer those questions?

AA: I am interested in physical activity and in asking: are the associations of physical activity in the different ethnic populations different or the same? Does it matter where you live in the world? And we have made progress in this discovery. You would be the first to hear this, actually, but we have finished up our analysis for my first paper, which uses the WHO data. We are close to submitting the manuscript. This is a bit of a segue, but it is worth mentioning because it highlights one of the problems of the literature: this paper touches on one of the controversies in my field. What the paper addresses is that all physical activity is good for you. For some context: there has been a recent phenomenon in the current literature, which uses mostly European data, of viewing occupational and non-occupational physical activity separately. These studies show that non-occupational physical activity is good for you, but occupational physical activity either has no effect or is actually bad for you in terms of mortality outcomes. What alarmed us and prompted us to frame this paper is that in low- and middle-income countries outside of Europe, there is very little concept of non-occupational leisure-time physical activity. Most of your activity is going to be in travel behaviour or activity during your occupation, for example if you are doing heavy manual labour like construction and farming. So, we had to investigate that and I’m glad to report that, at least in terms of our findings, occupational physical activity is not bad for you. Non-occupational physical activity is also good for you and it doesn’t matter what type of activity you do. We were also able to control for the proportion accumulated in either occupational or non-occupational physical activity and, based on what we found, any physical activity, wherever you do it, is good for you.

We need to understand physical activity in different parts of the world. The types of activity you’re doing in one part of the world are going to be different from those in other parts of the world, so one guideline is not going to be appropriate. We currently have one guideline from the WHO for the whole world, which has 150 minutes of moderate to vigorous physical activity as the goal. Does that seem appropriate for the whole world? It might not be in terms of different countries or even different population subgroups, such as young versus old or men versus women, or different occupations or different activity levels. And what really is the barrier between light physical activity and moderate activity? It is going to be relative and likely complex. This is a shift of mindset that hopefully I will be able to contribute to through my research.

The experience of acquiring data for his research from global data banks

LO: What has your experience of acquiring data from different sources been? From what I understand, there are different barriers in place to getting the data. 

AA: Just to put it out there, I think it’s completely understandable that these barriers are in place. The data that these organisations produce is particularly high-quality, high-resolution data. Besides the WHO data, the studies from the biobanks that I have mentioned plan on collecting data every few years from the same participants, so the data really tells you about the health of the population because these cohorts are meant to be representative of the population. To put this data in the hands of researchers that you do not properly vet can be quite a risk, even if it means using anonymised data, so I completely understand the barriers.

In terms of the difficulty of getting that data, it has been different for each source. With regard to the WHO data (and this is not my experience, but that of a postdoc who worked on the same data set before me), a few years back she had to go all the way to Geneva and perform the data analysis there, because they did not have an online infrastructure to allow researchers from abroad to use the data. That has since changed, and I was able to request it through the WHO microdata Repository.

For the South Asia Biobank, after going through the data request, researchers are given a link to the data. The data request process itself is very comprehensive and takes a lot of time, and there is a lot of emphasis on the protocol. They want to make sure that you have a proper protocol to say what you’re authorized to do. If you want to make changes, even small ones, you have to rectify them before submitting the proposal and that can cause delays. In my case it took around six months and we have only just received the data, so we have not had a chance to use it.

For the UAE Healthy Future Study, it is actually a bit more secure than that. You do go through that process of the back and forth over the protocol. In terms of getting the data, from what I understand, you are using it locally. I know this from a researcher that I spoke to who works between Cambridge and the UAE. To work with the UAE Healthy Future Study data, she’s given a laptop by the University (NYU Abu Dhabi), and she must be connected to a VPN. While connected to the VPN, she uses a secure platform called NYU-Box. I believe NYU uses this platform in all of its institutions, such as Shanghai and Abu Dhabi. I have been told that it is very secure and you can use it offline as well.

Regarding the Qatar Biobank, I don’t know much about its data security measures. Through my experiences, I only know about trying to get that data. They are willing to work with foreign institutions, which is good, but the main PI of the project must be based in Qatar and the analysis must be conducted in Qatar. However, I think going through that effort and that process is very much worth it because it has one of the most comprehensive data sources in all of the Middle East that is available in recent times. It was established around 2014 and they now have up to 47,000 participants and counting. 30,000 of them are Qatar nationals and around 17,000 are foreign nationals who are long-term residents. You have people from various populations, including participants who are Indian, Egyptian and Lebanese. So, you get to look at migrant workers, you get to look at other Arabs who are living in a specific environment, meaning that you can parse genetics out of social and physical environments. There is so much you can do and, in addition to that, what makes it special, for my PhD at least, is that they have treadmill data. This is where they put people through a treadmill test and look at their heart rate response to exercise, instead of just going through self-reported physical activity or wearables. The Qatar Biobank is the only study in that region that actually uses heart rate data, so we can definitely estimate fitness in that population. For this reason, it is very much worth the effort of trying to push for it.


One thing I am grappling with at the moment is policy development, which is a bit of a departure from data. On one end, I’m gathering the evidence in order to understand the different populations of the world through physical activity and to look at the different trends in fitness. Then, once we have the physical activity data, how do we know where to allocate resources? Who should we target? Fitness can tell us that in terms of policy. Who needs it the most might not necessarily show in the volume of activity. – Abdulwahab Alshallal


On the difference between self-reported fitness data and objective data

LO: Are self-reported fitness data less valuable than objective data obtained from wearables? 

AA: It is important to understand that for a long time, it was difficult to get objective data. If you had told a researcher 30 or 40 years ago about a cohort study that would be using wearables, they would not have believed you; they would have thought it was not scalable and too expensive, and so self-reported data was the only resource that we had. Also, there are downsides to data from wearables. For example, there is going to be noise and glitches in data obtained from accelerometry. So, I wouldn’t say that self-reported data is useless.

I am a big critic of self-reported data and of the dependence of the literature on it, and my supervisor has made me mellow about it by reminding me that it gives you context. One of the things that we haven’t been able to overcome with accelerometry is knowing what is actually happening. We can tell that people are being active, but what are they doing? When are they doing it? For example, in the questionnaires (that are used to generate self-reported data), we don’t ask people when they leave work or when they start work or commuting; we ask them to estimate their physical strain while doing those things in those specific contexts. This removes from the researcher the burden of trying to estimate what activity is happening.

In terms of accuracy of the numbers and their influence on policy? That is a good question, and I think accelerometry would answer those questions. Using wearables and obtaining objective data, in terms of specific numbers, is much more valuable. But policies in the past were not necessarily based on numbers, and self-reports have benefited us and there is still continued benefit. It is about data points which have a degree of relativity. There are people who are going to misreport because they don’t remember accurately how much activity they were doing, or they might be lying because they feel self-conscious or they want people to think that they are more active; that is, there might be a recall bias or a social desirability bias, which could all lead to misclassifications. We ask for moderate and vigorous activity, but what is moderate to me might be light to you. It’s different and relative. While there are accuracy problems in self-reported data, for the most part it tells us something that is relative to people. Take for example someone who reports 30 minutes of activity throughout the whole week versus someone who is reporting 200 or 300 minutes of physical activity per week. We can tell that the person who is reporting more minutes of activity is more likely to be someone who’s more physically active. It’s going to be aligned more with a better blood profile than the person reporting less activity, and so in a relative sense, it is helpful. Given the resources that we have now and the ability to use wearable data, we should be making a transition towards that, but self-reported data still has value. I think they can complement each other, with self-reports providing context for the type of activity that you’re doing.

On data and policy making

AA: One thing I am grappling with at the moment is policy development, which is a bit of a departure from data. On one end, I’m gathering the evidence in order to understand the different populations of the world through physical activity and to look at the different trends in fitness. Then, once we have the physical activity data, how do we know where to allocate resources? Who should we target? Fitness can tell us that in terms of policy. Who needs it the most might not necessarily show in the volume of activity. For example, we may have some barriers to fitness such as environmental factors like heat and humidity, and infrastructure factors such as pedestrian access and green spaces, and these differ in different parts of the world. But how can we use these data to influence policy development? This is something I’m starting to understand and trying to get a grip on. Soon, I will begin a policy internship so I will hopefully learn more about that. I’ve had some conversations with people in physical activity policy, and I’ve learned that in terms of what would actually influence policy, I should be looking for a shared problem and a shared solution. Take, for example, cycling lanes. Say you want to create more cycling lanes, but then the government says they don’t have enough money for cycling lanes so they decide against it. But then, you also have a congestion problem and you want to achieve net zero, and you also have an obesity problem. You know what can fix that? Cycling lanes. More cycling lanes means more people are going to be actively commuting and fewer cars on the road, so there will be less carbon. Then, they will be interested in getting on board. So it’s about framing it, and that’s what I’ve realized, because framing it in terms of health is not going to take you very far. But framing it in terms of money, or the overall goal, and matching them up is going to be helpful.
And it’s quite a departure from the way that I’ve been doing things which is being driven by data and what is good for health.


We thank Abdulwahab for speaking with us. We are certainly excited to see how he gets on with policy making. It would be comforting to know that there is a data-driven thinker in the world of policy making, especially one who is aware of, and takes into consideration, the contextual, environmental and behavioural differences of people in different communities and parts of the world when integrating data into public health policy decisions.
