Wednesday 22 April 2020

The good, the bad and the ugly of crisis response statistics

It's natural to tabulate data during a disaster or crisis as it unfolds. Such data is also useful to leaders who need facts about events as they occur and about what actions, if any, should be taken in response. With too little information, response options are likely to be overlooked; too much information from limited sources often leads to improper use and faulty conclusions.

Statistics are a valuable tool for developing insight and situational awareness. The data can be presented in multiple forms and visualizations, allowing experts and government agencies to address significant issues as events occur. Doing so requires a critical understanding of what the data can and cannot tell us; applied incorrectly, it can lead to miscalculations and make problems worse rather than better.

It is also natural to make comparisons using data collected from other regions or countries. This is a common approach in the scientific and social-science communities when building forecasting models of environmental conditions, disease, economic activity and so on. The Canadian Broadcasting Corporation (CBC News) has attempted to compare different countries' infection curves, including source references and a significant level of detail. But it would be unfair to judge how well each nation has responded without understanding local conditions, as described below.

Not all environments or population demographics are equal, and they cannot always be adjusted to produce a meaningful average; nor should they be, given the cultural and governance differences that exist throughout the world. Canada's response and infection curve will not produce the same outcomes as other countries'. Comparisons using normalized rates such as cases per hundred thousand are not an accurate reflection of the governance options actually implemented, and such an analysis should not assume that what happens in Country A will occur in Country B.
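
This kind of comparison is easy to compute and easy to over-read. As a minimal sketch of what a per-hundred-thousand figure actually is, here is a short Python example using entirely hypothetical case counts and populations; it shows the arithmetic, and the closing comment notes what the normalized number still leaves out:

```python
# Minimal sketch: normalizing raw case counts to cases per 100,000 people.
# Country labels, case counts and populations are hypothetical.
raw_cases = {"Country A": 35_000, "Country B": 35_000}
population = {"Country A": 37_000_000, "Country B": 330_000_000}

for country, cases in raw_cases.items():
    per_100k = cases / population[country] * 100_000
    print(f"{country}: {per_100k:.1f} cases per 100,000")

# Identical raw counts give very different normalized rates, yet the figure
# still says nothing about testing capacity, demographics, or the governance
# measures each country actually implemented.
```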

Data can easily be abused or misinterpreted, with devastating consequences that generate doubt and conflict in both the scientific and political spheres. If the data is not sufficiently scrutinized, debated and peer reviewed, public mistrust follows. Selectively parsing data to generate divisiveness is becoming a common problem, and the results are easily distributed on the internet. Data abuse has itself spread like a pandemic and is the foundation of most conspiracy theories now circulating on blogs and websites.

Very few organizations generate statistical models that are peer reviewed prior to release. It's one of several reasons some models are open to debate and not taken seriously, while others are given a trustworthy seal of approval based on historical reputation and accuracy. Any statistical model is only as good as its sources and their accuracy. This is particularly true of forecast models, or those that attempt to predict future outcomes using mathematical algorithms. Some models do disclose all the sources used to generate them, as is the case with the Johns Hopkins University Coronavirus dashboard. The team published a thorough article in The Lancet describing how they generated the data behind their visualization of the pandemic. The result is a dashboard of the current Coronavirus pandemic displaying a global picture of the number of infections and fatalities. While it has some limitations of detail, these are not because of how the data is being interpreted or manipulated, but because of the availability and format of the data exported by each host nation or organization.

Image capture of the Johns Hopkins Center for Systems Science and Engineering dashboard, April 22, 2020
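
Because the dashboard's underlying data is published openly, anyone can inspect it. The following is a minimal sketch of loading the time-series data with pandas, assuming the layout of the CSSE GitHub repository at the time of writing (the repository path and column names may change):

```python
# Minimal sketch: loading the public Johns Hopkins CSSE time-series data.
# Assumes the CSSE GitHub repository layout as of April 2020; the path and
# column names may change over time.
import pandas as pd

URL = ("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/"
       "csse_covid_19_data/csse_covid_19_time_series/"
       "time_series_covid19_confirmed_global.csv")

confirmed = pd.read_csv(URL)

# Date columns follow Province/State, Country/Region, Lat and Long.
dates = confirmed.columns[4:]

# Collapse provinces into cumulative country totals.
by_country = confirmed.groupby("Country/Region")[dates].sum()

print(by_country.loc["Canada"].tail())  # last few days of cumulative counts
```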

When attempting to blend baseline data with behaviour models (social activity, movement, employment demographics, culture, etc.), accuracy suffers significantly if the evidence is hypothetical and itself derived from a mathematical formula. For example, let's create a hypothetical COVID-19 transmission model that attempts to explain how the virus can spread within a community. To do so, we need baseline data to generate a profile of the impact zone and a ground-zero picture of what the event looks like now.

We can use the following datasets to generate this baseline:

  • Census Data (often 2 to 8 years old) 
  • Residential Property tax database
  • Voter Registration database
We then use medical information to determine a potential forecast model for the virus:
  • Virus characteristics (what it is, who it can affect, why it spreads, where it can spread, when it incubates / becomes active)
  • Corrective actions necessary (isolation, quarantine, level of medical care required)
We then input local infrastructure data:
  • Transportation system (bus / train terminals and routes, airports and airplanes, personal and commercial logistics vehicles)
  • Business composition (service, industrial, professional, business / corporate, entertainment)
  • Social services (hospitals, elderly care, places of worship)
Almost all of these datasets exist and can be used to generate activity models that display different levels of saturation (e.g. peak hours, congestion levels). When blended with the behaviour-model data, a picture emerges of how the virus could spread, as the sketch below illustrates.
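
A minimal sketch of what such a blend might look like for a single neighbourhood: a census-style population figure, a transit route's peak-hour load, and a crude relative-exposure score. Every figure, field name and the scoring formula below are illustrative assumptions, not real data:

```python
# Minimal sketch: blending baseline population data with transport saturation
# to produce a crude relative-exposure score for one neighbourhood.
# All figures, field names and the scoring formula are hypothetical.

neighbourhood = {
    "population": 12_500,        # from census-style baseline data
    "households": 5_100,         # from a residential property tax database
}

transit_route = {
    "vehicle_capacity": 80,      # seated + standing capacity per vehicle
    "vehicles_per_hour": 10,
    "peak_hour_riders": 950,     # riders observed in the busiest hour
}

# Saturation: how full the route runs at peak (values above 1.0 mean crowding).
saturation = transit_route["peak_hour_riders"] / (
    transit_route["vehicle_capacity"] * transit_route["vehicles_per_hour"]
)

# Share of residents exposed to that crowding during a peak hour.
exposed_share = transit_route["peak_hour_riders"] / neighbourhood["population"]

# Crude relative-exposure score: crowding level times share of residents exposed.
exposure_score = saturation * exposed_share

print(f"Peak-hour saturation: {saturation:.2f}")
print(f"Share of residents exposed: {exposed_share:.1%}")
print(f"Relative exposure score: {exposure_score:.3f}")
```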

Census data is widely available in most countries, and the same is true of tax databases and voter registration counts, along with the other datasets identified above. But many of the forecast models currently being published do not disclose what sources they use or how they arrived at their mathematically generated conclusions. Even when these datasets are properly filtered, the granularity of the detail can produce models with a degree of inaccuracy. For example, how old is the data? Simulation of dynamic data may also be required when using transportation data (peak hours, origin and destination patterns, public facility usage, etc.). In almost all cases, simulations are rarely calculated to this level of granular detail.
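
One small illustration of the data-age problem: a census snapshot can be projected forward with an assumed growth rate, but that projection is itself another assumption layered on top of the model. A minimal sketch with hypothetical figures:

```python
# Minimal sketch: projecting an aging census count forward to the current year.
# The census figure, census year and annual growth rate are all hypothetical.
census_population = 11_800     # neighbourhood population at the last census
census_year = 2016
current_year = 2020
annual_growth_rate = 0.012     # assumed 1.2% growth per year

years_elapsed = current_year - census_year
estimated_population = census_population * (1 + annual_growth_rate) ** years_elapsed

print(f"Estimated {current_year} population: {estimated_population:,.0f}")
# Even this simple adjustment is an assumption, which is exactly why the age
# and provenance of baseline data should be disclosed along with the model.
```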

Maps are also being generated from real-time data in an attempt to explain how the virus is spreading between states. One example was published by the Daily Dot after the annual college spring break, which often centres on the State of Florida. During Spring Break, students head to Fort Lauderdale, Orlando or Miami and then return home or to their respective college or university. Mobile phone signal datasets were plotted and overlaid on a Geographic Information System (GIS) map.

Source: www.dailydot.com


The logic of this heat map based on mobile phone activity makes it a potentially good data source, and it can help explain how COVID-19 is spreading across the Midwest and other regions of the United States. Note that a restriction on unnecessary travel between Canada and the State of Florida was already in place and is accurately reflected in this heat map. But what the map does not do is explain or calculate the level of transmission that actually occurs; yet several government leaders looked at it and began to make decisions on who can and cannot enter their state from Florida or elsewhere. This data source is not a sufficiently scientific or statistical model of evidence to conclusively generate a forecast. It can, however, be a valuable source of data for directing government organizations where to focus prevention measures such as virus screening and testing, sanitation protocols and crowd control. When combined with transportation information (airport datasets, etc.), the prevention evidence begins to build and generates accurate data points for use in preventing the spread of the virus. But this is only true if agencies act upon the information it generates.
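
For context on what such a map actually represents, the aggregation step is simple: devices are counted per grid cell. A minimal sketch using hypothetical ping coordinates and an arbitrary cell size:

```python
# Minimal sketch: binning mobile phone pings into a coarse lat/long grid,
# the basic aggregation step behind a travel heat map.
# The ping coordinates are hypothetical; the 0.5-degree cell size is arbitrary.
from collections import Counter

pings = [
    (26.12, -80.14),  # Fort Lauderdale area
    (28.54, -81.38),  # Orlando area
    (25.76, -80.19),  # Miami area
    (41.88, -87.63),  # Chicago area
    (41.90, -87.65),
]

CELL = 0.5  # grid cell size in degrees


def cell_of(lat, lon, size=CELL):
    """Snap a coordinate to the lower-left corner of its grid cell."""
    return (round(lat // size * size, 2), round(lon // size * size, 2))


density = Counter(cell_of(lat, lon) for lat, lon in pings)

for cell, count in density.most_common():
    print(f"cell {cell}: {count} ping(s)")

# Counting pings per cell shows where devices were, not whether transmission
# occurred -- which is exactly the gap described above.
```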

Source: BBC News, London Tube train, March 23, 2020 - packed, with no social distancing rules in place

Many local and state-level governments are not disclosing the scientific evidence behind their policy decisions on when to re-open or relax lock-down and isolation measures. Nor are they allowing peer review of any analysis they have projected. In some cases, officials have taken a different approach altogether, believing the economy is more important than the crisis the virus has created; this is the case for the States of Florida and Texas. In one case, the Governor of Georgia stated he was not aware that COVID-19 was capable of human-to-human transmission by people who are asymptomatic, four months after the first cases of the virus occurred. The Governor of Rhode Island has banned all non-essential travel into the state by car. The question everyone has raised: is it based on science or fear? No statistical model was used in making this decision. The Mayor of Las Vegas seems to believe in statistics, basing her confidence that the city can get back to work on offering it as a control test location. The city's statistician said no to the idea.

Statistics are valuable sources of information during a crisis if they are collected and interpreted by experts who are provided the right datasets and, most importantly, are used correctly. Without this expertise and detailed analysis, any briefing leaders give will be open to interpretation and to dispute over what the data is telling the public.
