(Press-News.org) Ancient languages hold a treasure trove of information about the culture, politics and commerce of millennia past. Yet, reconstructing them to reveal clues into human history can require decades of painstaking work. Now, scientists at the University of California, Berkeley, have created an automated "time machine," of sorts, that will greatly accelerate and improve the process of reconstructing hundreds of ancestral languages.
In a compelling example of how "big data" and machine learning are beginning to make a significant impact on all facets of knowledge, researchers from UC Berkeley and the University of British Columbia have created a computer program that can rapidly reconstruct "proto-languages" – the linguistic ancestors from which all modern languages have evolved. These earliest-known languages include Proto-Indo-European, Proto-Afroasiatic and, in this case, Proto-Austronesian, which gave rise to languages spoken in Southeast Asia, parts of continental Asia, Australasia and the Pacific.
"What excites me about this system is that it takes so many of the great ideas that linguists have had about historical reconstruction, and it automates them at a new scale: more data, more words, more languages, but less time," said Dan Klein, an associate professor of computer science at UC Berkeley and co-author of the paper published online today (Feb. 11) in the journal Proceedings of the National Academy of Sciences.
The research team's computational model uses probabilistic reasoning – which draws on logic and statistics to predict an outcome – to reconstruct more than 600 Proto-Austronesian languages from an existing database of more than 140,000 words, replicating with 85 percent accuracy what linguists had done manually. While manual reconstruction is a meticulous process that can take years, this system can perform a large-scale reconstruction in a matter of days or even hours, researchers said.
Not only will this program speed up the ability of linguists to rebuild the world's proto-languages on a large scale, boosting our understanding of ancient civilizations based on their vocabularies, but it can also provide clues to how languages might change years from now.
"Our statistical model can be used to answer scientific questions about languages over time, not only to make inferences about the past, but also to extrapolate how language might change in the future," said Tom Griffiths, associate professor of psychology, director of UC Berkeley's Computational Cognitive Science Lab and another co-author of the paper.
The discovery advances UC Berkeley's mission to make sense of big data and to use new technology to document and maintain endangered languages as critical resources for preserving cultures and knowledge. For example, researchers plan to use the same computational model to reconstruct indigenous North American proto-languages.
Humans' earliest written records date back less than 6,000 years, long after the advent of many proto-languages. While archeologists can catch direct glimpses of ancient languages in written form, linguists typically use what is known as the "comparative method" to probe the past. This method establishes relationships between languages by identifying sounds that change with regularity over time, which helps determine whether the languages share a common mother language.
"To understand how language changes – which sounds are more likely to change and what they will become – requires reconstructing and analyzing massive amounts of ancestral word forms, which is where automatic reconstructions play an important role," said Alexandre Bouchard-Côté, an assistant professor of statistics at the University of British Columbia and lead author of the study, which he started while a graduate student at UC Berkeley.
The UC Berkeley computational model is based on the established linguistic theory that words evolve along the branches of a family tree – much like a genealogical tree – reflecting linguistic relationships that evolve over time, with the roots and nodes representing proto-languages and the leaves representing modern languages.
Using an algorithm known as the Markov chain Monte Carlo sampler, the program sorted through sets of cognates, words in different languages that share a common sound, history and origin, to calculate the odds of which set is derived from which proto-language. At each step, it stored a hypothesized reconstruction for each cognate and each ancestral language.
"Because the sound changes and reconstructions are closely linked, our system uses them to repeatedly improve each other," Klein said. "It first fixes its predicted sound changes and deduces better reconstructions of the ancient forms. It then fixes the reconstructions and re-analyzes the sound changes. These steps are repeated, and both predictions gradually improve as the underlying structure emerges over time."
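The back-and-forth Klein describes can be illustrated with a small sketch. This is not the researchers' program: it swaps their Markov chain Monte Carlo sampler for a simple deterministic alternation, assumes the words are already aligned symbol by symbol, and runs on invented toy data for four hypothetical languages (A to D). All function names are assumptions made for illustration. In the toy data, languages B and D regularly turn "p" into "f", so a plain majority vote misreads the last cognate set until the learned sound changes correct it.

```python
from collections import Counter, defaultdict

# Hypothetical toy data (not from the study): each cognate set maps a
# language name to a pre-aligned, equal-length word form.
COGNATE_SETS = [
    {"A": "pata", "B": "fata", "C": "pata"},
    {"A": "pipu", "B": "fifu", "C": "pipu"},
    {"A": "napi", "C": "napi", "D": "nafi"},
    {"A": "pupa", "C": "pupa", "D": "fufa"},
    {"A": "fita", "B": "fita", "C": "fita", "D": "fita"},
    {"A": "tufi", "B": "tufi", "C": "tufi", "D": "tufi"},
    {"B": "fana", "C": "pana", "D": "fana"},
]
SMOOTHING = 0.1  # small pseudo-count so unseen sound changes keep nonzero probability


def majority_vote(cognates):
    """Initial guess: the most common symbol at each aligned position."""
    forms = list(cognates.values())
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*forms))


def estimate_sound_changes(cognate_sets, ancestors, alphabet):
    """Fix the reconstructions; estimate P(observed | ancestral) per language."""
    counts = defaultdict(Counter)  # (language, ancestral symbol) -> observed symbols
    for cognates, ancestor in zip(cognate_sets, ancestors):
        for lang, form in cognates.items():
            for old, new in zip(ancestor, form):
                counts[lang, old][new] += 1

    def prob(lang, old, new):
        seen = counts[lang, old]
        total = sum(seen.values())
        return (seen[new] + SMOOTHING) / (total + SMOOTHING * len(alphabet))

    return prob


def reconstruct(cognates, prob, alphabet):
    """Fix the sound changes; pick the most probable ancestral symbol per position."""
    length = len(next(iter(cognates.values())))
    result = []
    for i in range(length):
        def score(candidate):
            p = 1.0
            for lang, form in cognates.items():
                p *= prob(lang, candidate, form[i])
            return p
        result.append(max(alphabet, key=score))
    return "".join(result)


def alternate(cognate_sets, max_rounds=10):
    """Repeat the two steps until the reconstructions stop changing."""
    alphabet = sorted({ch for c in cognate_sets for form in c.values() for ch in form})
    ancestors = [majority_vote(c) for c in cognate_sets]
    for _ in range(max_rounds):
        prob = estimate_sound_changes(cognate_sets, ancestors, alphabet)
        updated = [reconstruct(c, prob, alphabet) for c in cognate_sets]
        if updated == ancestors:
            break
        ancestors = updated
    return ancestors


initial = [majority_vote(c) for c in COGNATE_SETS]
final = alternate(COGNATE_SETS)
print(initial[-1], "->", final[-1])  # fana -> pana
```

A sampler like the one used in the study would explore many candidate reconstructions at random rather than always taking the single best guess, which makes it less likely to get stuck on a poor early choice. The sketch keeps only the alternating structure.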
### END