PRESS-NEWS.org - Press Release Distribution

How good is Google Bard’s visual understanding? An empirical study on open challenges

2023-09-20
(Press-News.org)

Bard, Google’s AI chatbot, based on LaMDA and later PaLM models, was launched with moderate success in March 2023 before expanding globally in May. It is a generative AI that accepts prompts and performs text-based tasks such as providing answers, writing summaries, and creating various forms of text content. On 13 July 2023, Google announced a major update to Bard that allows images to be provided as inputs together with textual prompts. It was claimed that Bard can analyze visual content and provide a description (e.g., image captions) or answer questions using visual information. Notably, although other models such as GPT-4 have been claimed to accept and understand visual inputs as prompts, they are not publicly accessible for experimentation. Access to Bard therefore provides a first opportunity for the computer vision community to assess its soundness and robustness and to understand its existing strengths and limitations. In this empirical study, the researchers’ goal is to analyze Bard’s capability on some of the long-standing problems of computer vision in image comprehension.

This study identifies several interesting scenarios based on computer vision problems for the qualitative evaluation of Bard. Since API-based access to Bard is not yet available, the researchers’ evaluations do not include quantitative results on large-scale benchmarks. Instead, the goal is to identify a number of insightful scenarios and corresponding visual-textual prompts that serve to evaluate the visual understanding capabilities not only of Bard but also of future large multimodal models such as GPT-4. The researchers focus on Bard in particular because of its top performance among all open and closed-source multimodal conversational models (including Bing-Chat, rolled out on 18 July 2023), as demonstrated via LLaVA-Bench.

To assess Bard’s capabilities, such as visual perception and contextual understanding, conditioned on the given text prompts, researchers designed a range of vision-language task scenarios. They then present several illustrative examples drawn from these empirical studies, covering a total of 15 visual question-answering (VQA) scenarios that involve tasks such as object detection and localization, analysis of object attributes, counting, affordances, and fine-grained recognition in natural images. They also experiment with challenging cases such as identifying camouflaged objects and with diverse domains such as medical, underwater, and remote sensing images. The scenarios are summarized below.

Scenario #1 is object attributes. The results suggest that Bard has difficulty identifying attributes that require a deep understanding of each object and its properties.
Scenario #2 is object presence. The results suggest that Bard’s basic understanding of visual content remains limited. Researchers further note that Bard is currently tailored for images without any humans and deletes any visual inputs containing human faces or persons.
Scenario #3 is object location. The results suggest that Bard’s ability to localize visual content can be further enhanced.
Scenario #4 is relationship reasoning. The results indicate that there is room to improve Bard’s ability to reason about relationships.
Scenario #5 is affordance. The results imply that Bard still needs to better capture visual semantics strictly based on the text guidance and to associate these semantics more effectively with recognized objects in a scene.
Scenario #6 is adversarial samples. All outputs from Bard demonstrate that it fails to understand adversarial samples.
Scenario #7 is rainy conditions. The results indicate that Bard does not perform well when the image features rainy conditions.
Scenario #8 is sentiment understanding. When researchers query Bard, it replies with an incorrect response.
Scenario #9 is fine-grained recognition. This task involves identifying specific subcategories within a given object class, which is more complex than general object recognition due to increased intra-class variation, subtle inter-class differences, and the necessity for specialized domain knowledge. Bard gives both right and wrong answers.
Scenario #10 is identifying camouflaged objects. The results suggest that Bard’s capability to parse camouflaged patterns and similar textures could be further enhanced.
Scenario #11 is object counting. Researchers note that although Bard excels at describing a scene, it does not seem adept at understanding high-level content in challenging scenarios.
Scenario #12 is spotting industrial defects. Researchers observe that Bard struggles to identify such unnoticed defects in this challenging scenario and thus provides incorrect responses to users.
Scenario #13 is optical character recognition. Bard struggles in various text recognition scenarios and finds it challenging to understand the text in natural images.
Scenario #14 is analyzing medical data. No meaningful content was output in the experiment.
Scenario #15 is interpreting remote sensing data. Researchers’ findings suggest a tendency for Bard to understand visual scenes holistically, yet it faces challenges in discerning fine-grained visual patterns, particularly when determining the precise count of objects such as the commercial buildings in this case.
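
Because the evaluation was qualitative and run without API access, trials of this kind have to be recorded by hand. The following is a minimal Python sketch of one way to organize such records; it is not the researchers’ tooling. The scenario names come from the list above, while `Trial`, `VERDICTS`, `summarize`, and the verdict labels themselves are hypothetical names introduced only for illustration.

```python
from collections import Counter
from dataclasses import dataclass

# Scenario names in the order given in the press release (#1 to #15).
SCENARIOS = [
    "object attributes",
    "object presence",
    "object location",
    "relationship reasoning",
    "affordance",
    "adversarial samples",
    "rainy conditions",
    "sentiment understanding",
    "fine-grained recognition",
    "identifying camouflaged objects",
    "object counting",
    "spotting industrial defects",
    "optical character recognition",
    "analyzing medical data",
    "interpreting remote sensing data",
]

# Assumption: labels a human reviewer might assign; not taken from the study.
VERDICTS = ("correct", "partially correct", "incorrect", "no answer")


@dataclass
class Trial:
    """One image-plus-prompt query and the reviewer's judgment of the reply."""
    scenario_id: int    # 1-based, matches "Scenario #N"
    image_path: str
    prompt: str
    response: str = ""  # pasted by hand from the chat interface
    verdict: str = ""   # empty until judged, otherwise one of VERDICTS

    def __post_init__(self):
        if not 1 <= self.scenario_id <= len(SCENARIOS):
            raise ValueError(f"unknown scenario #{self.scenario_id}")
        if self.verdict and self.verdict not in VERDICTS:
            raise ValueError(f"unknown verdict {self.verdict!r}")


def summarize(trials):
    """Count verdicts per scenario name, skipping trials not yet judged."""
    summary = {}
    for trial in trials:
        if not trial.verdict:
            continue
        name = SCENARIOS[trial.scenario_id - 1]
        summary.setdefault(name, Counter())[trial.verdict] += 1
    return summary
```

A reviewer would create one `Trial` per query, fill in `response` and `verdict` after reading the reply, and call `summarize` to see how judgments are distributed across scenarios. Any counts produced this way describe only the reviewer’s own trials, not the published study.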

 

The emergence of Google’s Bard in the field of conversational AI has sparked considerable interest due to its remarkable success. Building upon this momentum, this study aims to comprehensively evaluate Bard’s performance across various task scenarios, including general, camouflaged, medical, underwater, and remote sensing images. The investigation shows that while Bard excels in many areas, it still faces challenges in certain vision-based scenarios. This finding highlights the immense potential of Bard in diverse applications and underscores the ample room for growth and improvement in vision-related tasks. The empirical insights from this study are expected to be valuable for future model development, particularly in bridging the gap in vision performance. By addressing the limitations observed in vision scenarios, researchers anticipate that subsequent models will be endowed with stronger visual comprehension capabilities, ultimately driving the advancement of conversational AI to new heights.

See the article:

How Good is Google Bard’s Visual Understanding? An Empirical Study on Open Challenges

http://doi.org/10.1007/s11633-023-1469-x

END



