PRESS-NEWS.org - Press Release Distribution

How good is Google Bard’s visual understanding? An empirical study on open challenges

2023-09-20
(Press-News.org)

Bard, Google’s AI chatbot based on LaMDA and later PaLM models, was launched with moderate success in March 2023 before expanding globally in May. It is a generative AI that accepts prompts and performs text-based tasks such as providing answers and summaries and creating various forms of text content. On 13 July 2023, Google announced a major update to Bard that allows images to be provided as inputs together with textual prompts. It was claimed that Bard can analyze visual content and provide a description (e.g., image captions) or answer questions using visual information. Notably, although other models such as GPT-4 have claimed the capability to accept and understand visual inputs as prompts, they are not publicly accessible for experimentation. Access to Bard therefore gives the computer vision community a first opportunity to assess its soundness and robustness and to understand its existing strengths and limitations. In this empirical study, the researchers’ goal is to analyze Bard’s capability on some of the long-standing problems of computer vision in image comprehension.

 

This study identifies several interesting scenarios based on computer vision problems for the qualitative evaluation of Bard. Since API-based access to Bard is not yet available, the researchers’ evaluations do not include quantitative results on large-scale benchmarks. Instead, the goal is to identify a number of insightful scenarios and corresponding visual-textual prompts that serve to evaluate the visual understanding capabilities not only of Bard but also of future large multimodal models such as GPT-4. Their motivation for focusing on Bard in particular is its top performance among all open and closed-source multimodal conversational models (including Bing-Chat, rolled out on 18 July 2023), as demonstrated via LLaVA-Bench.

 

To assess Bard’s capabilities, such as visual perception and contextual understanding, conditioned on the given text prompts, the researchers designed a range of vision-language task scenarios. They then present several illustrative examples drawn from these empirical studies, covering a total of 15 visual question-answering (VQA) scenarios that involve tasks such as object detection and localization, analysis of object attributes, counting, affordances, and fine-grained recognition in natural images. They also experiment with challenging cases such as identifying camouflaged objects and with diverse domains such as medical, underwater, and remote sensing images. The scenarios are explained below.

 

Scenario #1 is object attributes. The results suggest that Bard has difficulty identifying attributes that require a deep understanding of each object and its properties.
Scenario #2 is object presence. The results suggest that Bard’s basic understanding of visual content remains limited. Researchers further note that Bard is currently tailored for images without any humans and deletes any visual inputs containing human faces or persons.
Scenario #3 is object location. The results suggest that Bard’s ability to localize visual context can be further enhanced.
Scenario #4 is relationship reasoning. The results indicate that there is room to improve Bard’s ability to reason about relationships.
Scenario #5 is affordance. The results imply that Bard still needs to better capture visual semantics strictly based on the text guidance and more effectively associate these semantics with recognized objects in a scene.
Scenario #6 is adversarial samples. All outputs from Bard demonstrate that it fails to understand adversarial samples.
Scenario #7 is rainy conditions. The results indicate that Bard does not perform well when the image features rainy conditions.
Scenario #8 is sentiment understanding. When researchers query Bard, it gives an incorrect response.
Scenario #9 is fine-grained recognition. This task involves identifying specific subcategories within a given object class, which is more complex than general object recognition due to increased intra-class variation, subtle inter-class differences, and the necessity for specialized domain knowledge. Bard gives both right and wrong answers.
Scenario #10 is identifying camouflaged objects. The results suggest that Bard’s capability to parse camouflaged patterns and similar textures could be further enhanced.
Scenario #11 is object counting. Researchers note that although Bard excels at describing a scene, it does not seem adept at understanding high-level content in challenging scenarios.
Scenario #12 is spotting industrial defects. Researchers observe that Bard struggles to identify easily overlooked defects in such a challenging scenario and therefore gives incorrect responses to users.
Scenario #13 is recognizing optical characters. Bard struggles in various text recognition scenarios; the model finds it challenging to understand the text in natural images.
Scenario #14 is analyzing medical data. No meaningful content was output in the experiment.
Scenario #15 is interpreting remote sensing data. The researchers’ findings suggest a tendency for Bard to understand visual scenes holistically, yet it faces challenges in discerning fine-grained visual patterns, particularly when determining the precise count of objects such as the commercial buildings in this case.

 

The emergence of Google’s Bard in the field of conversational AI has sparked considerable interest due to its remarkable success. Building upon this momentum, this study aims to comprehensively evaluate Bard’s performance across various task scenarios, including general, camouflaged, medical, underwater, and remote sensing images. The investigation shows that while Bard excels in many areas, it still faces challenges in certain vision-based scenarios. This finding highlights the immense potential of Bard in diverse applications and underscores the ample room for growth and improvement in vision-related tasks. The empirical insights from this study are expected to be valuable for future model development, particularly in bridging the gap in vision performance. By addressing the limitations observed in vision scenarios, researchers anticipate subsequent models will be endowed with stronger visual comprehension capabilities, ultimately driving the advancement of conversational AI to new heights.

 

See the article:

How Good is Google Bard’s Visual Understanding? An Empirical Study on Open Challenges

http://doi.org/10.1007/s11633-023-1469-x

END


