Data Analysis and Comparative Risk

We apply statistics in many of our projects. Three broad categories cover most applications:

  1. We employ design-of-experiments (DOE) methods to develop studies that provide cost-effective, reliable answers to engineering questions.
  2. We have at our disposal many analytical methods to detect trends and relationships among key variables that would otherwise remain hidden in large volumes of data.
  3. Through comparative analysis and statistical modeling, we assess risks posed by technologies, products, procedures, and policies, as well as the likely impact of proposed or recent changes.

Our engineers and scientists scale proposed research to meet the varied needs of cases and projects – from handfuls of data to files containing millions of records. Principles of data analysis remain invariant; practical challenges vary considerably. Whether we have large amounts of data or a set of case studies, we attempt to glean information not observable from merely looking at numbers.

A legal case provides one example of analyzing a small data set. Three independent measurements were made of the interior dimension of a truck. The allegation was that this particular dimension was less than the minimum required by a standard. We analyzed the measurements and showed that the variation among the three measurements was greater than the alleged deficiency in the dimension. To an engineer, therefore, these measurements indicated that there was no deficiency.
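The comparison in this example can be sketched in a few lines of Python. The case figures are not given here, so the measurements, the alleged deficiency, and the function name below are hypothetical and serve only to illustrate the reasoning: if the spread of repeated measurements exceeds the alleged shortfall, the measurements cannot resolve a shortfall of that size.

```python
from statistics import mean, stdev

def compare_spread_to_deficiency(measurements, alleged_deficiency):
    """Summarize repeated measurements and compare their spread to an alleged shortfall."""
    spread = max(measurements) - min(measurements)  # range of the readings
    return {
        "mean": mean(measurements),
        "stdev": stdev(measurements),  # sample standard deviation (n - 1 denominator)
        "range": spread,
        "variation_exceeds_deficiency": spread > alleged_deficiency,
    }

# Hypothetical values in millimetres, for illustration only (not the case data).
measurements_mm = [1496.0, 1503.0, 1499.0]
alleged_deficiency_mm = 4.0

summary = compare_spread_to_deficiency(measurements_mm, alleged_deficiency_mm)
print(summary)
```

With only three readings, the range is used here as the simplest measure of variation; the sample standard deviation is reported alongside it for reference.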

We have used large-scale data analysis to assess societal risks posed by many consumer products and to evaluate regulatory policies promulgated to ensure public safety. Products have ranged from recreational equipment to medical devices to automobiles. Motor vehicle risk analysis has become especially important. Cars, SUVs, light-duty trucks, and passenger vans are among the most regulated, most ingrained products in our technological society. Questions about automotive safety are often at the forefront of public policy debates, currently in conjunction with environmental issues.

Very often, real-world data offer direct answers to questions about actual problems – for instance, whether a vehicle’s design increases the risk of collision or occupant injury. At the same time, the supposed “effect” – a crash or an injury – could have resulted from multiple other causal factors related to drivers, vehicles, or driving environments. Data analysis and statistical modeling are therefore key tools for obtaining reliable answers: they allow us to “mine” data from large crash files maintained by US and state transportation agencies and to transform the findings into useful information.

We observe, as have others, that the statistical methods applied to motor vehicle risk analysis are the same as those developed and widely used for epidemiology, the scientific study of disease in populations. In fact, injuries and deaths in traffic crashes have long been considered a public health problem, not just a “safety problem.” Traffic crashes, with their causes and consequences, are complex events to which several different risk factors can contribute. Therefore, primary aims of statistical analysis include the following:

  1. To identify all relevant risk factors
  2. To assess the statistical significance and the practical importance of alleged “single causes” of safety problems
  3. To estimate the influence of “single causes” relative to the influence of other factors that can produce the same “effect”
  4. To examine, using “what-if” methods of analysis, differences likely produced by engineering (or politically inspired) changes to those variables

A simple definition of risk is how likely events are to occur. Events of interest usually have negative consequences like injury or monetary loss. Risk is the ratio of two numbers: the number of events divided by the number of chances the events had to happen (exposure). Risk analysis is the discipline concerned with quantifying risks. Motor vehicle risk analysis focuses on risks associated with motorized vehicles, most often passenger vehicles (cars, sport utility vehicles, pickup trucks, vans, and motorcycles), but also all-terrain vehicles (ATVs), boats, and snowmobiles. Motor vehicle risks include:

  1. vehicle components failing to perform under actual driving conditions,
  2. involvement in collisions (crash risk), and
  3. injury to occupants of crash-involved vehicles (injury risk).
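The definition of risk given above, events divided by exposure, can be written as a short Python sketch. The counts and the choice of vehicle-years as the exposure measure are hypothetical assumptions made for illustration; in practice the exposure measure (registered vehicle-years, miles travelled, number of crashes) depends on the question being asked.

```python
def risk(events, exposure):
    """Risk as defined in the text: number of events divided by chances to occur."""
    if exposure <= 0:
        raise ValueError("exposure must be positive")
    return events / exposure

# Hypothetical counts, for illustration only.
crashes = 120                 # events: crash involvements
vehicle_years = 4_000_000     # exposure: registered vehicle-years (an assumed measure)

crash_risk = risk(crashes, vehicle_years)
crashes_per_100k_vehicle_years = crash_risk * 100_000
print(f"{crashes_per_100k_vehicle_years:.2f} crashes per 100,000 vehicle-years")
```

Scaling the ratio to a round exposure unit, such as 100,000 vehicle-years, makes small risks easier to read and compare.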

Comparative risk analysis is a commonly applied technique. For example, to assess whether an alleged defect or deficiency in design does indeed increase the risk of crash or injury, analysts use real-world crash data to compare the performance of the vehicles in question with that of other vehicles with different designs. This technique is not limited to automobiles or to large data sets. In the example below, we used comparative risk analysis to show the existence of a defect in a particular manufacturer's catamaran mast.
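One common way to express such a comparison is a relative risk: the risk for the vehicles in question divided by the risk for the comparison vehicles, with a confidence interval computed on the log scale. The Python sketch below uses this standard epidemiological formula; it is not necessarily the method used in any particular project, and all counts and names are hypothetical.

```python
import math

def relative_risk(events_a, exposed_a, events_b, exposed_b, z=1.96):
    """Relative risk of group A versus group B with an approximate 95% interval.

    Uses the usual log-scale standard error for a ratio of two proportions.
    """
    risk_a = events_a / exposed_a
    risk_b = events_b / exposed_b
    rr = risk_a / risk_b
    se_log_rr = math.sqrt(1 / events_a - 1 / exposed_a + 1 / events_b - 1 / exposed_b)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper

# Hypothetical counts, for illustration only.
rr, lower, upper = relative_risk(
    events_a=30, exposed_a=10_000,   # vehicles in question: injuries / crash-involved vehicles
    events_b=60, exposed_b=40_000,   # comparison vehicles
)
print(f"relative risk = {rr:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

An interval that excludes 1 suggests the difference is unlikely to be due to chance alone. It does not by itself account for other factors related to drivers, vehicles, or driving environments, which is why statistical modeling is used alongside simple comparisons.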

A well-known day-charter catamaran in San Francisco suffered a dismasting within its first year of service. The mast was replaced with another one from the same manufacturer, and that one too failed within a year. A third mast, from a different manufacturer, is still operating without trouble. In addition to failure analysis and mechanical engineering analysis, which indicated the original mast was inadequate, we used the US Coast Guard's accident database to determine the probability that a boat of this type would suffer two dismastings in consecutive years. Based on US boat accident data, we calculated that a boat of this type would have to be in service for 7.7 billion years before it could be expected to suffer a second dismasting. Indeed, the subject failures were the only occurrence of two dismastings on the same boat in the Coast Guard data. The real-world performance data support our conclusion that there was a design problem with the original masts.
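The general form of such a calculation can be sketched in Python under simplifying assumptions. The sketch below assumes that dismastings in different years are independent and occur with a constant annual probability for a sound mast; the annual probability shown is hypothetical, is not taken from the Coast Guard data, and is not intended to reproduce the figure quoted above.

```python
def prob_two_consecutive_years(annual_probability):
    """Probability of a dismasting in each of two given consecutive years,
    assuming independent years with a constant annual probability."""
    return annual_probability ** 2

def expected_years_until_two_in_a_row(annual_probability):
    """Expected years of service until the first pair of consecutive-year
    dismastings, for independent years: (1 + p) / p**2."""
    p = annual_probability
    return (1 + p) / p ** 2

# Hypothetical annual dismasting probability for one boat (assumed, not from the data).
p_annual = 1e-4

p_both = prob_two_consecutive_years(p_annual)
years = expected_years_until_two_in_a_row(p_annual)
print(f"P(two consecutive years) = {p_both:.1e}; expected wait = {years:.3e} years")
```

Under these assumptions, a very small annual probability implies an extremely long expected wait, so two failures in consecutive years on one boat point toward a cause specific to that boat rather than chance.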

© Principia, LLC | San Francisco, California | 415.398.3018