Air Quality, Smoking, and Lung Cancer: Predicting Incidence in the US


Lung cancer is a major health concern in the United States, with hundreds of thousands of new diagnoses every year. Researchers continue to look for ways to predict its incidence and to understand the factors that contribute to it. This article summarizes a study by Arnold Kamis and colleagues at Brandeis University, USA, that models lung cancer rates using both regression and machine learning. By analyzing data on smoking habits, environmental quality, and ambient emissions, the researchers aim to clarify the interacting factors that influence lung cancer incidence.

The Multivariate, Multi-method Approach:

The researchers took a multivariate, multi-method approach. They collected public health and ambient emission data from multiple sources covering 2000 to 2013 and used it to model lung cancer rates for 2013 to 2017. Their analysis compared several models built on four groups of predictors: adult smoking rates, state-specific factors, environmental quality index data, and ambient emissions.

Understanding the Predictor Variables:

The predictor variables in this study encompassed various aspects relevant to lung cancer risk. Adult smoking rates serve as a crucial indicator, as smoking is a well-known major risk factor for lung cancer. State-specific factors were also considered, as regional disparities and policies may impact lung cancer incidence. The environmental quality index variables covered five broad domains: air, land, water, socio-demographics, and built environment. These domains help assess the overall environmental conditions and their potential effects on lung cancer rates.

Assessing Ambient Emissions:

Another vital aspect of this study was the evaluation of ambient emissions. These emissions consist of various pollutants released into the air, including cyanide compounds, carbon monoxide, carbon disulfide, diesel exhaust, nitrogen dioxide, tropospheric ozone, coarse particulate matter, fine particulate matter, and sulfur dioxide. These pollutants can significantly impact air quality and may play a role in lung cancer development.

Comparing Models and Findings:

The researchers compared different models to identify the most accurate one for predicting lung cancer incidence. The best regression model explained about 62 percent of the variance in lung cancer rates, whereas the top-performing machine learning model explained 64 percent of the variance with 10 percent less error. Both kinds of model provided insight into the potential contributors to lung cancer incidence.
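To make "variance explained" and "less error" concrete, here is a minimal sketch of how two sets of predictions could be compared. It is not the authors' code: the function names are hypothetical, the numbers are made up for illustration, and root mean squared error is an assumed error measure, since this summary does not say which one the study used.

```python
import math


def r_squared(actual, predicted):
    """Share of the variance in `actual` explained by `predicted`: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1 - ss_res / ss_tot


def rmse(actual, predicted):
    """Root mean squared error (an assumed metric, not confirmed by the study)."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))


def relative_error_reduction(baseline_error, candidate_error):
    """Fraction by which the candidate model's error is lower than the baseline's."""
    return (baseline_error - candidate_error) / baseline_error


# Illustrative, made-up incidence rates per 100,000 people (not study data).
actual = [55.0, 62.0, 48.0, 70.0, 66.0, 51.0]
regression_pred = [58.0, 60.0, 53.0, 64.0, 62.0, 56.0]
ml_pred = [57.0, 61.0, 52.0, 65.0, 63.0, 55.0]

for name, pred in [("regression", regression_pred), ("machine learning", ml_pred)]:
    print(f"{name}: R^2 = {r_squared(actual, pred):.2f}, RMSE = {rmse(actual, pred):.2f}")

reduction = relative_error_reduction(rmse(actual, regression_pred), rmse(actual, ml_pred))
print(f"relative error reduction: {reduction:.0%}")
```

An R^2 of 0.62 means the model accounts for 62 percent of the variation in observed rates, and a relative error reduction of 0.10 corresponds to "10 percent less error" against the baseline model.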

Identifying Hazardous Ambient Emissions:

An essential outcome of the study was the identification of the ambient emissions most strongly associated with lung cancer: coarse particulate matter, fine particulate matter, sulfur dioxide, carbon monoxide, and tropospheric ozone. Understanding the role of these emissions is crucial for developing strategies to improve air quality and reduce lung cancer incidence.

Implications for Air Quality Improvement:

The findings have significant implications for public health and environmental policy. Curtailing the hazardous ambient emissions identified in the study could improve air quality and, consequently, help reduce the incidence of lung cancer. This underscores the importance of measures that tackle pollution and its adverse effects on respiratory health.

Balancing Transparency and Accuracy:

The research also highlighted an important tradeoff between transparency and accuracy in the models used. While simpler regression models offer more transparency, machine learning models can achieve higher accuracy. Striking the right balance between these two factors is crucial for producing reliable predictions and guiding effective public health interventions.

Limitations and Future Directions:

As with any study, there are limitations to consider. The researchers discussed these limitations and proposed future directions to extend and refine their models. Ongoing research and advancements in data analysis techniques will further our understanding of lung cancer risk factors and ultimately aid in preventing and treating this devastating disease.


Predicting lung cancer incidence is a complex task that requires a comprehensive and multi-method approach. This research demonstrates the importance of considering various factors, including smoking habits, environmental quality, and ambient emissions. By understanding the associations between these variables and lung cancer incidence, we can take proactive steps to improve air quality and reduce the burden of lung cancer in the United States. This study represents a crucial step in the ongoing efforts to combat lung cancer and improve public health for all.


DOI: 10.3390/ijerph18116127

Photo by Sam Jotham Sutharson on Unsplash
