PRESS-NEWS.org - Press Release Distribution

How good is Google Bard’s visual understanding? An empirical study on open challenges

2023-09-20
(Press-News.org)

Bard, Google’s AI chatbot based on LaMDA and later PaLM models, was launched with moderate success in March 2023 before expanding globally in May. It is a generative AI that accepts prompts and performs text-based tasks such as providing answers and summaries and creating various forms of text content. On 13 July 2023, Google announced a major update to Bard that allowed images to be provided as inputs together with textual prompts. It was claimed that Bard can analyze visual content and provide a description (e.g., image captions) or answer questions using visual information. Notably, although other models such as GPT-4 have been claimed to accept and understand visual inputs as prompts, they are not publicly accessible for experimentation. Access to Bard therefore provides a first opportunity for the computer vision community to assess its soundness and robustness and to understand its existing strengths and limitations. In this empirical study, the researchers’ goal is to analyze the capability of Bard on some of the long-standing problems of computer vision in image comprehension.

 

This study identifies several interesting scenarios based on computer vision problems for the qualitative evaluation of Bard. Since API-based access to Bard is still not available, the researchers’ evaluations do not include quantitative results on large-scale benchmarks. Instead, the goal is to identify a number of insightful scenarios and corresponding visual-textual prompts that serve to evaluate the visual understanding capabilities not only of Bard but also of future large multimodal models such as GPT-4. The researchers focus on Bard in particular because of its top performance among all open and closed-source multimodal conversational models (including Bing-Chat, rolled out on 18 July 2023), as demonstrated via LLaVA-Bench.

 

To assess Bard’s capabilities, such as visual perception and contextual understanding, conditioned on the given text prompts, the researchers designed a range of vision-language task scenarios. They then present several illustrative examples drawn from these experiments, covering a total of 15 visual question-answering (VQA) scenarios that involve tasks such as object detection and localization, analysis of object attributes, counting, affordances, and fine-grained recognition in natural images. They also experiment with challenging cases such as identifying camouflaged objects and with diverse domains such as medical, underwater, and remote sensing images. The scenarios are explained below.

 

Scenario #1 is object attributes. It suggests that Bard appears to have difficulty identifying attributes that require a deep understanding of each object and its properties.
Scenario #2 is object presence. This suggests that Bard’s basic understanding of visual content remains limited. Researchers further note that Bard is currently tailored for images without any humans and deletes any visual inputs containing human faces or persons.
Scenario #3 is object location. It suggests that Bard’s ability to localize visual context can be further enhanced.
Scenario #4 is relationship reasoning. This indicates that there is room to improve Bard’s ability to reason about relationships.
Scenario #5 is affordance. It implies that Bard still needs to better capture visual semantics strictly based on the text guidance and to associate these semantics more effectively with recognized objects in a scene.
Scenario #6 is adversarial samples. All outputs from Bard demonstrate that it fails to understand adversarial samples.
Scenario #7 is rainy conditions. The results indicate that Bard does not perform well when the image features rainy conditions.
Scenario #8 is sentiment understanding. When researchers query Bard, it gives an incorrect response.
Scenario #9 is fine-grained recognition. This task involves identifying specific subcategories within a given object class, which is more complex than general object recognition due to increased intra-class variation, subtle inter-class differences, and the necessity for specialized domain knowledge. Bard gives both right and wrong answers.
Scenario #10 is identifying camouflaged objects. This suggests that Bard’s capability to parse camouflaged patterns and similar textures could be further enhanced.
Scenario #11 is object counting. Researchers note that although Bard excels at describing a scene, it does not seem adept at understanding high-level content in challenging scenarios.
Scenario #12 is spotting industrial defects. Researchers observe that Bard struggles to identify unnoticed defects in such a challenging scenario and thus provides incorrect responses to users.
Scenario #13 is optical character recognition. Bard struggles in various text recognition scenarios and finds it challenging to understand the text in natural images.
Scenario #14 is analyzing medical data. No meaningful content was output in the experiment.
Scenario #15 is interpreting remote sensing data. Researchers’ findings suggest a tendency for Bard to understand visual scenes holistically, yet it faces challenges in discerning fine-grained visual patterns, particularly when determining the precise count of objects such as the commercial buildings in this case.

 

The emergence of Google’s Bard in the field of conversational AI has sparked considerable interest due to its remarkable success. Building upon this momentum, this study aims to comprehensively evaluate Bard’s performance across various task scenarios, including general, camouflaged, medical, underwater, and remote sensing images. The investigation shows that while Bard excels in many areas, it still faces challenges in certain vision-based scenarios. This finding highlights the immense potential of Bard in diverse applications and underscores the ample room for growth and improvement in vision-related tasks. The empirical insights from this study are expected to be valuable for future model development, particularly in bridging the gap in vision performance. By addressing the limitations observed in vision scenarios, researchers anticipate subsequent models will be endowed with stronger visual comprehension capabilities, ultimately driving the advancement of conversational AI to new heights.

 

See the article:

How Good is Google Bard’s Visual Understanding? An Empirical Study on Open Challenges

http://doi.org/10.1007/s11633-023-1469-x

END

