Study: AI could lead to inconsistent outcomes in home surveillance

Researchers find large language models make inconsistent decisions about whether to call the police when analyzing surveillance videos.

2024-09-19

(Press-News.org) CAMBRIDGE, MA — A new study from researchers at MIT and Penn State University reveals that if large language models were to be used in home surveillance, they could recommend calling the police even when surveillance videos show no criminal activity.

In addition, the models the researchers studied were inconsistent in which videos they flagged for police intervention. For instance, a model might flag one video that shows a vehicle break-in but not flag another video that shows a similar activity. Models often disagreed with one another over whether to call the police for the same video.

Furthermore, the researchers found that some models flagged videos for police intervention relatively less often in neighborhoods where most residents are white, controlling for other factors. This shows that the models exhibit inherent biases influenced by the demographics of a neighborhood, the researchers say.

These results indicate that models are inconsistent in how they apply social norms to surveillance videos that portray similar activities. This phenomenon, which the researchers call norm inconsistency, makes it difficult to predict how models would behave in different contexts.

“The move-fast, break-things modus operandi of deploying generative AI models everywhere, and particularly in high-stakes settings, deserves much more thought since it could be quite harmful,” says co-senior author Ashia Wilson, the Lister Brothers Career Development Professor in the Department of Electrical Engineering and Computer Science and a principal investigator in the Laboratory for Information and Decision Systems (LIDS).

Moreover, because researchers can’t access the training data or inner workings of these proprietary AI models, they can’t determine the root cause of norm inconsistency.

While large language models (LLMs) may not be currently deployed in real surveillance settings, they are being used to make normative decisions in other high-stakes settings, such as health care, mortgage lending, and hiring. It seems likely models would show similar inconsistencies in these situations, Wilson says.

“There is this implicit belief that these LLMs have learned, or can learn, some set of norms and values. Our work is showing that is not the case. Maybe all they are learning is arbitrary patterns or noise,” says lead author Shomik Jain, a graduate student in the Institute for Data, Systems, and Society (IDSS).

Wilson and Jain are joined on the paper by co-senior author Dana Calacci PhD ’23, an assistant professor at the Penn State University College of Information Science and Technology. The research will be presented at the AAAI Conference on AI, Ethics, and Society.

“A real, imminent, practical threat”

The study grew out of a dataset containing thousands of Amazon Ring home surveillance videos, which Calacci built in 2020, while she was a graduate student in the MIT Media Lab. Ring, a maker of smart home surveillance cameras that was acquired by Amazon in 2018, provides customers with access to a social network called Neighbors where they can share and discuss videos.

Calacci’s prior research indicated that people sometimes use the platform to “racially gatekeep” a neighborhood by determining who does and does not belong there based on the skin tones of video subjects. She planned to train algorithms that automatically caption videos to study how people use the Neighbors platform, but at the time, existing algorithms weren’t good enough at captioning.

The project pivoted with the explosion of LLMs.

“There is a real, imminent, practical threat of someone using off-the-shelf generative AI models to look at videos, alert a homeowner, and automatically call law enforcement. We wanted to understand how risky that was,” Calacci says.

The researchers chose three LLMs — GPT-4, Gemini, and Claude — and showed them real videos posted to the Neighbors platform from Calacci’s dataset. They asked the models two questions: “Is a crime happening in the video?” and “Would the model recommend calling the police?”

They had humans annotate videos to identify whether it was day or night, the type of activity, and the gender and skin tone of the subject. The researchers also used census data to collect demographic information about the neighborhoods in which the videos were recorded.

Inconsistent decisions

They found that all three models nearly always said no crime occurred in the videos, or gave an ambiguous response, even though 39 percent of the videos did show a crime.

“Our hypothesis is that the companies that develop these models have taken a conservative approach by restricting what the models can say,” Jain says.

But even though the models said most videos contained no crime, they recommended calling the police for between 20 and 45 percent of videos.
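As a rough illustration of how such figures could be tallied, the sketch below computes each model's share of flagged videos and how often pairs of models disagree on the same video. It is not the researchers' code: the model names, the data, and the helper functions `flag_rate` and `disagreement_rate` are all hypothetical.

```python
from itertools import combinations

# Hypothetical data: True means the model recommended calling the police.
# Model names and values are illustrative only, not results from the study.
recommendations = {
    "model_a": [True, False, False, True, False],
    "model_b": [False, False, True, True, False],
    "model_c": [True, True, False, True, False],
}


def flag_rate(flags):
    """Fraction of videos for which a model recommended calling the police."""
    return sum(flags) / len(flags)


def disagreement_rate(flags_x, flags_y):
    """Fraction of videos on which two models gave different recommendations."""
    differing = sum(x != y for x, y in zip(flags_x, flags_y))
    return differing / len(flags_x)


# Share of videos each model flagged for police intervention.
flag_rates = {name: flag_rate(flags) for name, flags in recommendations.items()}

# Disagreement for every pair of models, evaluated on the same videos.
pairwise_disagreement = {
    (a, b): disagreement_rate(recommendations[a], recommendations[b])
    for a, b in combinations(recommendations, 2)
}

for name, rate in flag_rates.items():
    print(f"{name}: flagged {rate:.0%} of videos")
for (a, b), rate in pairwise_disagreement.items():
    print(f"{a} vs {b}: disagreed on {rate:.0%} of videos")
```

A comparison like this only surfaces disagreement between models. Checking for the neighborhood-level differences the study describes would additionally require a statistical model that controls for other factors.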

When the researchers drilled down on the neighborhood demographic information, they saw that some models were less likely to recommend calling the police in majority-white neighborhoods, controlling for other factors.

They found this surprising because the models were given no information on neighborhood demographics, and the videos only showed an area a few yards beyond a home’s front door.

In addition to asking the models about crime in the videos, the researchers also prompted them to explain their choices. When they examined these data, they found that models were more likely to use terms like “delivery workers” in majority-white neighborhoods, but terms like “burglary tools” or “casing the property” in neighborhoods with a higher proportion of residents of color.

“Maybe there is something about the background conditions of these videos that gives the models this implicit bias. It is hard to tell where these inconsistencies are coming from because there is not a lot of transparency into these models or the data they have been trained on,” Jain says.

The researchers were also surprised that the skin tone of people in the videos did not play a significant role in whether a model recommended calling the police. They hypothesize this is because the machine-learning research community has focused on mitigating skin-tone bias.

“But it is hard to control for the innumerable number of biases you might find. It is almost like a game of whack-a-mole. You can mitigate one and another bias pops up somewhere else,” Jain says.

Many mitigation techniques require knowing the bias at the outset. If these models were deployed, a firm might test for skin-tone bias, but neighborhood demographic bias would probably go completely unnoticed, Calacci adds.

“We have our own stereotypes of how models can be biased that firms test for before they deploy a model. Our results show that is not enough,” she says.

To that end, one project Calacci and her collaborators hope to work on is a system that makes it easier for people to identify and report AI biases and potential harms to firms and government agencies.

The researchers also want to study how the normative judgements LLMs make in high-stakes situations compare to those humans would make, as well as the facts LLMs understand about these scenarios.

###

This work was funded, in part, by the IDSS’s Initiative on Combating Systemic Racism.
