(Press-News.org) It's not hard to tell the difference between the "charge" of a battery and criminal "charges." But for computers, distinguishing between the various meanings of a word is difficult.
For more than 50 years, linguists and computer scientists have tried to get computers to understand human language by programming semantics as software. Driven initially by efforts to translate Russian scientific texts during the Cold War (and more recently by the value of information retrieval and data analysis tools), this work has met with mixed success. IBM's Jeopardy-winning Watson system and Google Translate are high-profile, successful applications of language technologies, but the humorous answers and mistranslations they sometimes produce show how difficult the problem remains.
Our ability to easily distinguish between multiple word meanings is rooted in a lifetime of experience. Drawing on the context in which a word appears, an intrinsic understanding of syntax and logic, and a sense of the speaker's intention, we intuit what another person is telling us.
"In the past, people have tried to hand-code all of this knowledge," explained Katrin Erk, a professor of linguistics at The University of Texas at Austin focusing on lexical semantics. "I think it's fair to say that this hasn't been successful. There are just too many little things that humans know."
Other efforts have tried to use dictionary meanings to train computers to better understand language, but these attempts have also faced obstacles. Dictionaries have their own sense distinctions, which are crystal clear to the dictionary-maker but murky to the dictionary reader. Moreover, no two dictionaries provide the same set of meanings — frustrating, right?
Watching annotators struggle to make sense of conflicting definitions led Erk to try a different tactic. Instead of hard-coding human logic or deciphering dictionaries, why not mine a vast body of texts (which are a reflection of human knowledge) and use the implicit connections between the words to create a weighted map of relationships — a dictionary without a dictionary?
"An intuition for me was that you could visualize the different meanings of a word as points in space," she said. "You could think of them as sometimes far apart, like a battery charge and criminal charges, and sometimes close together, like criminal charges and accusations ("the newspaper published charges..."). The meaning of a word in a particular context is a point in this space. Then we don't have to say how many senses a word has. Instead we say: 'This use of the word is close to this usage in another sentence, but far away from the third use.'"
Creating a model that accurately reproduces this intuitive ability to distinguish word meanings requires a lot of text and a lot of analytical horsepower.
"The lower end for this kind of a research is a text collection of 100 million words," she explained. "If you can give me a few billion words, I'd be much happier. But how can we process all of that information? That's where supercomputers and Hadoop come in."
Applying Computational Horsepower
Erk initially conducted her research on desktop computers, but around 2009 she began using the parallel computing systems at the Texas Advanced Computing Center (TACC). Access to a special Hadoop-optimized subsystem on TACC's Longhorn supercomputer allowed Erk and her collaborators to expand the scope of their research. Hadoop is a software framework that is well suited to text analysis and to mining unstructured data, and it can take advantage of large computer clusters. Computational models that take weeks to run on a desktop computer can run in hours on Longhorn, which opened up new possibilities.
"In a simple case we count how often a word occurs in close proximity to other words. If you're doing this with one billion words, do you have a couple of days to wait to do the computation? It's no fun," Erk said. "With Hadoop on Longhorn, we could get the kind of data that we need to do language processing much faster. That enabled us to use larger amounts of data and develop better models."
Treating words in a relational, non-fixed way corresponds to emerging psychological notions of how the mind deals with language and concepts in general, according to Erk. Instead of rigid definitions, concepts have "fuzzy boundaries" where the meaning, value and limits of the idea can vary considerably according to the context or conditions. Erk takes this idea of language and recreates a model of it from hundreds of thousands of documents.
Say That Another Way
So how can we describe word meanings without a dictionary? One way is to use paraphrases. A good paraphrase is one that is "close to" the word meaning in that high-dimensional space that Erk described.
"We use a gigantic 10,000-dimentional space with all these different points for each word to predict paraphrases," Erk explained. "If I give you a sentence such as, 'This is a bright child,' the model can tell you automatically what are good paraphrases ('an intelligent child') and what are bad paraphrases ('a glaring child'). This is quite useful in language technology."
Language technology already helps millions of people perform practical and valuable tasks every day via web searches and question-answer systems, but it is poised for even more widespread applications.
Automatic information extraction is an application where Erk's paraphrasing research may be critical. Say, for instance, you want to extract a list of diseases, their causes, symptoms and cures from millions of pages of medical information on the web.
"Researchers use slightly different formulations when they talk about diseases, so knowing good paraphrases would help," Erk said.
In a paper to appear in ACM Transactions on Intelligent Systems and Technology, Erk and her collaborators showed that they could achieve state-of-the-art results with their automatic paraphrasing approach.
Recently, Erk and Ray Mooney, a computer science professor also at The University of Texas at Austin, were awarded a grant from the Defense Advanced Research Projects Agency to combine Erk's distributional representation of word meanings in a high-dimensional space with a method of determining the structure of sentences based on Markov logic networks.
"Language is messy," said Mooney. "There is almost nothing that is true all the time. "When we ask, 'How similar is this sentence to another sentence?' our system turns that question into a probabilistic theorem-proving task and that task can be very computationally complex."
In their paper, "Montague Meets Markov: Deep Semantics with Probabilistic Logical Form," presented in June at the Second Joint Conference on Lexical and Computational Semantics (*SEM 2013), Erk, Mooney and colleagues reported their results on a number of challenge problems from the field of artificial intelligence.
In one problem, the system, running on Longhorn, was given a sentence and had to infer whether a second sentence was true based on the first. Using an ensemble of different sentence parsers, word meaning models and Markov logic implementations, Mooney and Erk's system predicted the correct answer with 85% accuracy, close to the top results in this challenge. They continue to work to improve the system.
There is a common saying in the machine-learning world: "There's no data like more data." But while more data helps, making effective use of that data is key.
"We want to get to a point where we don't have to learn a computer language to communicate with a computer. We'll just tell it what to do in natural language," Mooney said. "We're still a long way from having a computer that can understand language as well as a human being does, but we've made definite progress toward that goal."
When will my computer understand me?
Linguists, computer scientists use supercomputers to improve natural language processing
2013-06-10