by Ian R.H. Rockett
Population Bulletin, Vol. 54, No. 4, December 1999
Table of Contents
Introduction
Auspicious Origins
Demographic and Epidemiologic Transitions
Disease Models
Searching for Cause: Analytic Epidemiology
Integrating Epidemiology
References
This Population Bulletin, published in December 1999, explains the terms, methods, and materials scientists use to study the health of populations, as well as the historical underpinnings of the modern-day science of epidemiology.
Introduction
Most people are concerned about their health. When they are well, they wonder how to remain that way. Will regular exercise decrease their risk of cardiovascular disease later in life? Will beta-carotene or vitamin C reduce their risk of getting cancer? Does living near overhead power lines increase that risk? When they, their families, or friends are ill, they wonder which treatments would be best. Is chemotherapy more effective than surgery and radiation in treating cancer? Is angioplasty more appropriate than heart bypass surgery for treating blocked arteries?
Television, newspapers, and magazines fuel this widespread curiosity about the mysterious world of health risks and hazards. How dangerous is radiation exposure? Which populations face the greatest risks? What are the risks of injury in an automobile crash when driving intoxicated versus driving sober, and how are those risks modified in cars with airbags?
All too often, discussions of these and similar questions are characterized more by ignorance or fear than by scientific knowledge. But the quality of these discussions is improving as scientific research becomes more accessible to the public. The science of epidemiology is a major contributor to this growing body of knowledge about how to prevent and treat disease and injury.
What is epidemiology? It may be formally defined as the "study of the distribution and determinants of health-related states or events in specified populations, and the application of this study to control of health problems."1 In other words, epidemiology is the study of our collective health. Epidemiology offers insight into why disease and injury afflict some people more than others, and why they occur more frequently in some locations and times than in others — knowledge necessary for finding the most effective ways to prevent and treat health problems.
Epidemiology provides a unique way of viewing and investigating disease and injury. The keys to understanding health, injury, and disease are embedded in the language and methods of epidemiology. Many of the basic epidemiologic concepts are familiar to most people, although only superficially understood. They reside in such everyday terms as exposure, risk factor, epidemic, and bias. This Population Bulletin explains the terms, methods, and materials scientists use to study the health of populations, as well as the historical underpinnings of the modern-day science of epidemiology.
Auspicious Origins
Two English physicians, John Snow and William Farr, and a Hungarian physician, Ignaz Semmelweis, can be considered the founders of modern epidemiology because they jointly carried epidemiology beyond description into analysis or explanation. Indeed, the epidemiologic legacies of all three include the crucial concept of hypothesis testing, upon which progress in any science ultimately depends. Each man made seminal contributions to epidemiology, public health, and preventive medicine.
John Snow (1813-1858) defied contemporary medical thinking and succeeded in slowing the spread of cholera in London, which was beset with cholera epidemics in the late 1840s and again in 1853–1854. This disease afflicts victims with violent diarrhea and vomiting, and it can be fatal. Europe had suffered from periodic cholera epidemics since at least the 16th century. During the mid-19th century, most physicians attributed the disease to miasma, or "bad air" believed to be formed from decaying organic matter. Snow, who was also well known as the founder of anesthesiology, held a radically different view: he suspected that the real culprit was drinking water contaminated by fecal waste.
In September 1854, Snow determined that the cholera deaths in a recent outbreak clustered around a popular source of drinking water, the Broad Street pump. He shared this finding with local authorities, along with his hunch as to the cause. His disclosures prompted the removal of the pump handle, and thus shut down the suspected disease source. Shortly thereafter, the Broad Street outbreak subsided. Because cholera fatalities were already declining in London, however, Snow was unable to attribute the end of the outbreak directly to the closing of the pump.
The cholera-water connection remained in doubt only until 1855, when Snow published the results of his carefully controlled test of the hypothesis that sewage in drinking water causes cholera. For this research, Snow obtained information on cholera mortality occurring among 300,000 residents of a specified area of London whose water suppliers could be identified. Because he could link the cholera cases to a population base and because the allocation of the water source to households seemed random, Snow's study has been called a natural experiment. By walking door-to-door, Snow acquired the names of the specific water companies servicing the houses where cholera fatalities had occurred — an approach to data collection that scientists now call shoe-leather epidemiology. Snow's research demonstrated that the cholera fatality rate in households receiving contaminated water was higher than the rate in households getting cleaner water. This finding confirmed his hypothesis.
Snow's results were unacceptable to the medical establishment primarily because they contradicted miasmic theory. Professional resistance to Snow's cholera theory was also related to his inability to identify and specify cholera's disease agent — the essential causal ingredient. It was not until 1883 that this agent, Vibrio cholerae, was isolated under the microscope by the German bacteriologist Robert Koch. Koch — best known for his research on tuberculosis and for confirming that "germs" (or microorganisms) cause infectious disease — filled in the missing piece of the cholera puzzle.2 Snow's efforts showed, however, how epidemiology can play a preventive role even when the specific microorganism responsible for a disease is unknown.
John Snow's contemporary, William Farr (1807-1883), was a leader in developing health and vital statistics records for the Office of the British Registrar General. His many innovations include the refining of life table analysis by relating disease prevention to life expectancy, devising standardized measures to capture occupational and residential differences in mortality, and creating a system to classify disease and injury.3 His classification system was the forerunner of the International Classification of Diseases (ICD), the standard system used throughout the world today to record the causes of mortality and morbidity (or the occurrence of disease).
Like Snow, Farr conducted an exhaustive analysis of cholera. He ascertained that cholera death rates were inversely related to altitude. But, misled by miasmic theory, Farr erred in concluding that altitude was causally connected to water contamination, and therefore to the spread of cholera. Farr provided the mortality data for the more famous Snow study of cholera in London, a testimony to his consummate professionalism. Farr also later confirmed the Snow hypothesis by showing that a specific water company had negligently marketed and supplied the unfiltered water through which cholera bacteria had been transmitted.
Ignaz Semmelweis (1818-1865), the third founder of modern epidemiology, helped revolutionize hospital practices because of his discoveries about the causes of infections. Before the introduction of antibiotics and high standards of personal hygiene, nosocomial (or hospital-acquired) infection was so common that hospitals were hazardous places to seek health care. Medical and hospital hygiene practices were dramatically improved thanks to the work of Semmelweis in the maternity wards at the General Hospital in Vienna.4 Maternal mortality from puerperal (childbirth) fever often reached epidemic heights in Europe between the 17th and 19th centuries. Between 1841 and 1846, puerperal fever at times killed up to 50 percent of the women giving birth in the General Hospital's maternity wards staffed by medical students. The average fatality rate in these wards was about 10 percent in the 1840s — three times higher than the rate in a second set of maternity wards staffed by midwifery students.
While pursuing an obstetrical residency at the General Hospital in the late 1840s, Semmelweis became concerned about the problem of puerperal fever. He was intrigued by the vastly different maternal mortality rates in the two sets of wards. He hypothesized that the differential resulted from the failure of medical students to cleanse their hands after dissecting unrefrigerated cadavers just before examining maternity patients. He believed that puerperal fever was a septicemia, a form of blood poisoning. His belief arose from observing the similarity between symptoms of the mothers who died of puerperal fever and those of a colleague who died of illness associated with a knife wound sustained while performing an autopsy.
Semmelweis reached his conclusion after he logically refuted a series of alternative explanations: soiled bed linen, crowding, atmospheric conditions, poor ventilation, and diet. None of these factors differed between the two sets of maternity wards. This strengthened his original hypothesis that the disease was transmitted through the medical students. To test his hypothesis, Semmelweis insisted that the students and other medical personnel in his wards scrub their hands in soap and water and then soak them in chlorinated lime before conducting pelvic examinations. Within seven months of this controversial intervention, puerperal fever fatalities in these wards plummeted tenfold, from 120 deaths per 1,000 births to 12 deaths per 1,000 births. For the first time, the mortality rate in the wards staffed by medical students dipped below that in the wards of the student midwives.
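The size of that drop can be checked with a short calculation. The sketch below uses only the two rates quoted above (120 and 12 deaths per 1,000 births); the function and variable names are illustrative and are not drawn from Semmelweis's own work.

```python
def relative_reduction(rate_before: float, rate_after: float) -> float:
    """Proportional drop from rate_before to rate_after."""
    if rate_before <= 0:
        raise ValueError("rate_before must be positive")
    return (rate_before - rate_after) / rate_before


# Fatality rates quoted in the text, expressed as deaths per birth.
rate_before = 120 / 1000
rate_after = 12 / 1000

fold_change = rate_before / rate_after
reduction = relative_reduction(rate_before, rate_after)

print(f"{fold_change:.0f}-fold decline, {reduction:.0%} relative reduction")
```

A tenfold decline and a 90 percent relative reduction describe the same change; the second form is often easier to compare across interventions.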
The medical community in Europe and the United States — still heavily invested in miasmic theory — rejected Semmelweis' powerful evidence that puerperal fever was transmitted through direct physical contact between caregiver and patient. The U.S. medical establishment had ignored an earlier warning about the contagious nature of puerperal fever given by Oliver Wendell Holmes Sr., the celebrated physician and author.5 Some support for a miasmic explanation of the disease lingered even after the 1870s, when Louis Pasteur isolated its bacterial agent.6
Demographic and Epidemiologic Transitions
Disease patterns have changed dramatically in the industrialized world since the era of Snow, Farr, and Semmelweis. Chronic diseases, such as cancer and heart disease, displaced communicable diseases as the leading causes of mortality and morbidity in industrialized nations.7
In 1900, the three leading causes of death in the United States were pneumonia, tuberculosis, and diarrhea and enteritis (see Table 1). All are communicable diseases. Collectively they accounted for nearly one-third of all deaths at the beginning of the century. In 1998, the top three causes were all chronic diseases: heart disease, cancer, and stroke. Together they were responsible for 61 percent of all U.S. deaths. These three diseases also numbered among the top 10 killers in 1900, but then they accounted for less than one-sixth of the death toll.
Between 1900 and 1998, life expectancy at birth rose from 47 to 77 years in the United States.8 The decline in communicable disease mortality rates, along with falling birth rates, increased the share of the elderly in the U.S. population. Americans ages 65 or older constituted 4.1 percent of the U.S. population in 1900. By 1998, their share had roughly tripled, to 12.7 percent.9
Disease Models
The most familiar disease model, the epidemiologic triad, depicts a relationship among three key factors in the occurrence of disease or injury: agent, environment, and host (see Figure 1).
An agent is a factor whose presence or absence, excess or deficit, is necessary for a particular disease or injury to occur. General classes of disease agents include chemicals such as benzene, oxygen, and asbestos; microorganisms such as bacteria, viruses, fungi, and protozoa; and physical energy sources such as electricity and radiation. Many diseases and injuries have multiple agents.
People who are not epidemiologists often confuse a disease or injury agent with its intermediary — its vector or vehicle. A vector is a living organism, whereas a vehicle is inanimate. The female of one species of mosquito carries the protozoa that are parasitic agents of malaria. The mosquito is the vector or intermediate host of malaria, but not the agent. Similarly, an activated nuclear bomb functions as a vehicle for burns by conveying one of its agents, ionizing radiation.
The environment includes all external factors, other than the agent, that can influence health. These factors are further categorized according to whether they belong in the social, physical, or biological environments. The social environment encompasses a broad range of factors, including laws about seat belt and helmet use; availability of medical care and health insurance; cultural "dos" and "don'ts" regarding diet; and many other factors pertaining to political, legal, economic, educational, communications, transportation, and health care systems. Physical environmental factors that influence health include climate, terrain, and pollution. Biological environmental influences include disease and injury vectors; soil, humans, and plants serving as reservoirs of infection; and plant and animal sources of drugs and antigens.
The host is the actual or potential recipient or victim of disease or injury. Although the agent and environment combine to "cause" the illness or injury, host susceptibility is affected by personal characteristics such as age, occupation, income, education, personality, behavior, and gender and other genetic traits. Sometimes genes themselves are disease agents, as in hemophilia and sickle cell anemia.
From the perspective of the epidemiologic triad, the host, agent, and environment can coexist fairly harmoniously. Disease and injury occur only when there is interaction or altered equilibrium among them. By the same logic, if an agent, in combination with environmental factors, can act on a susceptible host to create disease, then disruption of any link among these three factors can also prevent disease.
Smallpox was eradicated globally through this kind of disruption.10 Smallpox is almost always spread by human face-to-face contact, but is less contagious than influenza, measles, chickenpox, and some other communicable diseases. Health personnel severed the link between disease agent and host by isolating each smallpox case upon diagnosis and then vaccinating everyone within a three-mile radius. This highly effective method, known as the case-containment and ring-vaccination strategy, proved to be a relatively low-cost way to eradicate smallpox.
Compiling Epidemiologic Evidence
Models are useful in guiding epidemiologic research, but health scientists cannot answer the underlying questions about the causes of disease or injury without appropriate data. Researchers need a myriad of data on the personal and medical backgrounds of individuals to determine, for example, whether physicians are more likely to have hypertension than construction workers — and whether one group is more likely than the other to develop a related disease.
Original data collected by or for an investigator are called primary data. Because primary data collection is expensive and time consuming, it usually is undertaken only when existing data sources — or secondary data — are deficient. Most descriptive epidemiologic studies use secondary data, often data collected for another purpose. Analytic epidemiologic studies usually require primary as well as secondary data.
Finding Patterns: Descriptive Epidemiology
People's lives seem besieged by health risks at any given moment, yet the health environment is relatively benign in most industrialized countries. Nearly two-thirds of U.S. deaths in 1998 were attributed to heart disease, cancer, and stroke, all diseases associated with old age. There is only a small chance that an individual will commit suicide, die in a motor vehicle crash, or be murdered. National-level figures, however, mask much higher risks for certain groups of people. Men ages 75 or older, for example, turn to suicide at a much higher rate than men in other age groups in the United States. This same pattern is found in many other industrialized countries. Japanese and German men, for example, generally have higher suicide rates than U.S. men, but the rates rise at older ages in all three countries. In Canada, reported suicide rates are highest in the young adult years, but the likelihood of suicide rises again in the oldest age group.
Teenagers and young adults, on the other hand, face a higher risk of dying or being injured in an automobile crash than people in other age groups. A Rhode Island study in the 1980s showed, for example, that men ages 15 to 34 and women ages 15 to 24 were much more likely to be hospitalized or killed in an automobile crash than people in other age groups. A male's risk of being a homicide victim is much higher in the United States than in other populous industrialized countries, as shown in Figure 2.
Descriptive epidemiology is a two-step process. The first step involves the rather mechanical task of amassing all the facts about a situation or problem. The second is the more contemplative step of conceiving a plausible explanation for why the situation exists. This second phase, known as hypothesis formulation, involves examining all the facts and asking questions from different perspectives. It is the bridge between descriptive and analytic epidemiology. Analytic epidemiology is responsible for testing the hypotheses — for addressing the question of why certain groups are at higher or lower risk of a particular disease or injury than others. But before testing a hypothesis, researchers must describe the problem in standard terms.
Epidemiologists describe the magnitude of a health problem in two ways: in terms of prevalence and incidence. Prevalence reveals how many cases exist in a population at a given time. The incidence rate records the rate at which new cases are appearing within that population over a specific period.
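The two measures can be sketched in a few lines of Python. This is a minimal illustration rather than a method taken from the Bulletin: the function names and every count are hypothetical, and the incidence denominator is expressed in person-years at risk, a common convention that is assumed here.

```python
def prevalence(existing_cases: int, population: int) -> float:
    """Proportion of a population that has the condition at a given time."""
    if population <= 0:
        raise ValueError("population must be positive")
    return existing_cases / population


def incidence_rate(new_cases: int, person_years_at_risk: float) -> float:
    """New cases per person-year of observation among people at risk."""
    if person_years_at_risk <= 0:
        raise ValueError("person_years_at_risk must be positive")
    return new_cases / person_years_at_risk


# Hypothetical town: 10,000 residents, 200 of whom already have the disease
# on January 1. The other 9,800 are at risk and are followed for one year
# (simplified here to 9,800 person-years), during which 49 new cases appear.
town_prevalence = prevalence(existing_cases=200, population=10_000)
town_incidence = incidence_rate(new_cases=49, person_years_at_risk=9_800)

print(f"Prevalence: {town_prevalence * 1000:.0f} per 1,000 people")
print(f"Incidence rate: {town_incidence * 1000:.0f} per 1,000 person-years")
```

Note that people who already have the disease are excluded from the incidence denominator, because they can no longer become new cases.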
Knowing the magnitude of disease or injury is only the beginning of the epidemiologist's work. The next step is to answer the following three questions: Who has the disease or injury? Where did the cases occur? When did they occur?
Specifying person, place, and time is crucial for identifying risk groups, narrowing the search for risk factors, and targeting and evaluating interventions. People may be identified by sociodemographic characteristics that promote or inhibit susceptibility to disease or injury. They may also be identified by habits or lifestyles that influence the likelihood of harmful or beneficial exposures. Place can be described geographically (for example, by country or state) and institutionally (for example, by type of school or branch of military service). The date or time that disease or injury occurred can help document secular (or long-term) trends, seasonal and other periodic effects, or the presence of epidemics or case clusters.
Searching for Cause: Analytic Epidemiology
The ultimate purpose of epidemiology is the treatment and prevention of health problems that threaten the quality and length of people's lives. To design, target, and implement successful health interventions, scientists need to understand the etiology of specific health problems. This is the domain of analytic epidemiology. Analytic studies test hypotheses about exposure to risk factors and a specific health outcome.
There are two main types of research design for analytic studies: cohort and case-control.
A cohort study tracks the occurrence of a disease (or other health problem) among groups of individuals within a particular population. All the members of the study cohort are assumed to be free of that disease at the beginning of the study. They are then grouped according to their exposure to the risk factor(s) under investigation. The group of individuals exposed to a risk factor (for example, asbestos) is usually compared with an unexposed group. At the end of the study, researchers compare the incidence rate for the disease (for example, lung cancer) in the exposed group with the incidence rate in the unexposed group. The strength of the association between the exposure and a specific health outcome is measured by the rate ratio: the incidence rate in the exposed group divided by the incidence rate in the unexposed group. In this example, the rate ratio indicates how much more (or less) likely those exposed to asbestos are to develop lung cancer than those not exposed.
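As a rough sketch of the comparison described above, the rate ratio can be computed by dividing the incidence rate in the exposed group by the rate in the unexposed group. The case counts and person-years below are invented purely for illustration and do not come from any actual asbestos study; the function names are likewise hypothetical.

```python
def incidence_rate(new_cases: int, person_years: float) -> float:
    """New cases per person-year of follow-up."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return new_cases / person_years


def rate_ratio(exposed_rate: float, unexposed_rate: float) -> float:
    """Incidence rate in the exposed group relative to the unexposed group."""
    if unexposed_rate <= 0:
        raise ValueError("unexposed_rate must be positive")
    return exposed_rate / unexposed_rate


# Hypothetical cohort, all members disease-free at the start of follow-up.
exposed_rate = incidence_rate(new_cases=30, person_years=10_000)
unexposed_rate = incidence_rate(new_cases=6, person_years=10_000)

rr = rate_ratio(exposed_rate, unexposed_rate)
print(f"Rate ratio: {rr:.1f}")
```

A rate ratio above 1 suggests that the exposure is associated with higher risk, a ratio near 1 suggests no association, and a ratio below 1 suggests a protective association.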
Case-control is the second major type of analytic study. In a case-control study, two groups are differentiated by disease status: the group of cases with disease and the group of controls without the disease. Researchers then reconstruct the exposure history of the two groups to determine which factors might explain why one group developed the disease. For example, if a case-control study addressed the question of whether drinking alcohol increases the risk of breast cancer for women, then the alcohol consumption history of women with breast cancer (the cases) would be compared with that of women without cancer (the controls). This approach is the opposite of the cohort approach, which begins with disease-free subjects and follows them forward over time. The strength of the association between the disease and risk factors in a case-control study is measured by the odds ratio or relative odds.
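The odds ratio is conventionally computed from a two-by-two table of disease status against exposure history, using the cross-product of its cells. The sketch below assumes that standard calculation; the counts are hypothetical and are not taken from any real study of alcohol and breast cancer.

```python
def odds_ratio(
    exposed_cases: int,
    unexposed_cases: int,
    exposed_controls: int,
    unexposed_controls: int,
) -> float:
    """Odds of exposure among cases divided by odds of exposure among controls."""
    if unexposed_cases <= 0 or exposed_controls <= 0:
        raise ValueError("unexposed_cases and exposed_controls must be positive")
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


# Hypothetical study: 100 cases (60 with the exposure, 40 without)
# and 100 controls (40 with the exposure, 60 without).
or_estimate = odds_ratio(
    exposed_cases=60,
    unexposed_cases=40,
    exposed_controls=40,
    unexposed_controls=60,
)
print(f"Odds ratio: {or_estimate:.2f}")
```

Because a case-control study starts from disease status rather than following people forward, it cannot yield incidence rates directly, which is why the odds ratio is used in place of the rate ratio.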
Integrating Epidemiology
The 1990s have brought epidemiology into the public spotlight through a proliferation of media stories about epidemiologic studies of risk factors for chronic disease, communicable disease, and injury. Epidemiology's appearance in the spotlight has been accompanied by unprecedented criticism from epidemiologists and from those outside the field.11 This, in turn, has fostered lively debates in health journals and at epidemiology conferences. There have been two primary stimulants. The first has been conflicting and frequently modest epidemiologic findings concerning putative chronic disease risks, especially those for cancer. The second has been the inability of epidemiology to predict and evaluate threats to human health from persisting and growing social inequality and massive global environmental shifts.
Risk factor epidemiology, the predominant form of epidemiology and the focus of this Population Bulletin, has been the target of the criticism. Using the individual as the unit of analysis, risk factor epidemiology occupies the middle ground in the scientific assessment of cause-effect relationships between exposures to health risks and health states. But it is an important point of departure for epidemiologists as they extend the causal search downstream from the individual level to the molecular level and upstream to the societal-environmental level. Scientists label these downstream and upstream domains of epidemiologic analysis microepidemiology and macroepidemiology, respectively.
Operating at the cellular and intracellular levels, microepidemiology encompasses the specialties of molecular epidemiology (also a specialty within toxicology) and genetic epidemiology.12 Its debt to microbiology is profound. The laboratory scientists who perform microepidemiology are investigating biochemical disease mechanisms hitherto hidden in the black box of risk factor epidemiology. When the black box paradigm prevails, epidemiologists are left to infer or reject causal relationships from knowledge largely confined to the box's inputs and outputs.13 Inputs comprise individual study subjects' sociodemographics and measures of their potentially harmful or beneficial exposures. Outputs are measures of their health status; for example, cause-specific incidence and mortality rates.
While microepidemiology is essential for decoding disease processes, risk factor epidemiology helps narrow the search for disease agents. Moreover, it may yield strong circumstantial evidence (such as that linking tobacco smoking in the 1930s with lung cancer in the early 1950s) that can motivate effective and pervasive public health interventions. Modern risk factor epidemiology has revealed health hazards to humans from other exposures entering the body through the respiratory tract, gastrointestinal tract, or skin. These hazards include asbestos, ionizing radiation, and saturated fat.14 Although risk factor epidemiology and microepidemiology can be at odds, they can also operate cohesively and effectively. Examples of this cooperation are the discovery of a causal connection between HIV infection and Kaposi's sarcoma, and another between genes and breast cancer.15
Besides the vagueness of the black box, a second serious deficiency of risk factor epidemiology is its tendency to function in a social, economic, political, and cultural vacuum.16 What, when, and how much people eat and exercise; their sexual and reproductive behavior; their household living arrangements; their modes of work, recreation, and transportation; and their education and health care practices all partially reflect contextual forces that transcend the personal choices they can make. These contextual forces include social-structural factors like racism, residential segregation, poverty, and types of political and economic systems. Responsibility for examining their population health effects falls within the emerging domain of macroepidemiology.
Advocates for macroepidemiology envision complex and dynamic causal webs whose health mysteries will be unlocked only through sophisticated theory construction and model building, with multilevel analyses of data on individuals and context.17 Further complicating the big health picture is rapid population growth that has pushed world population to 6 billion, and the industrialization that continues to exact an enormous toll on such nonrenewable resources as fresh water, stratospheric ozone, oceans, forests, and arable land.18 Rapid population growth and industrialization work together to severely diminish the Earth's biodiversity through the extinction of many plants and animals.19 Unless we better protect our natural resources, there could be substantial reversals in the rising trend in life expectancy that transformed most national populations in the 20th century. These reversals would occur first in the most recent beneficiaries of this rising trend, the less developed countries.
Anthony J. McMichael, an epidemiologist who writes extensively on likely adverse health effects from climatic, ecological, and environmental changes, argues compellingly for macroepidemiology to be proactive.20 Proactive macroepidemiology would contrast with risk factor epidemiology, which typically responds reactively to public and scientific concerns about the safety of various practices and products. To anticipate global hazards and facilitate disease and injury prevention, macroepidemiologists must use mathematical modeling, and incorporate new technologies like digital communications and geographic information systems (or GIS).
The spirited debates of the 1990s over the limitations of risk factor epidemiology have not seriously undermined the credibility and viability of epidemiology as a science. But epidemiology will function optimally as the foundation science of public health and preventive and clinical medicine only if there is complete integration of microepidemiology, risk factor epidemiology, and macroepidemiology.
Ian R.H. Rockett is professor of epidemiology and director of the Bureau of Evaluation, Research, and Service at the University of Tennessee, Knoxville. He is affiliated with the University's Community Health Research Group and Department of Exercise Science and Sport Management. He holds degrees from Brown University, Harvard University, the University of Western Ontario, and the University of Western Australia. Dr. Rockett's research interests and publications focus on mortality and the epidemiology and demography of injury and drug abuse. Among his publications is the Population Bulletin "Injury and Violence: A Public Health Perspective."