PRESS-NEWS.org - Press Release Distribution

Using digitized books as 'cultural genome,' researchers unveil quantitative approach to humanities

Online tool enables anyone to quantify cultural trends going back centuries

2010-12-17
(Press-News.org) CAMBRIDGE, Mass. -- Researchers have created a powerful new approach to scholarship, using approximately 4 percent of all books ever published as a digital "fossil record" of human culture. By tracking the frequency with which words appear in books over time, scholars can now precisely quantify a wide variety of cultural and historical trends.

The four-year effort, led by Harvard University's Jean-Baptiste Michel and Erez Lieberman Aiden, is described this week in the journal Science.

The team, comprising researchers from Harvard, Google, Encyclopaedia Britannica, and the American Heritage Dictionary, has already used its approach -- dubbed "culturomics," by analogy with genomics -- to gain insight into topics as diverse as humanity's collective memory, the adoption of technology, the dynamics of fame, and the effects of censorship and propaganda.

"Interest in computational approaches to the humanities and social sciences dates to the 1950s," says Michel, a postdoctoral researcher based in Harvard's Department of Psychology and Program for Evolutionary Dynamics. "But attempts to introduce quantitative methods into the study of culture have been hampered by the lack of suitable data. We now have a massive dataset, available through an interface that is user-friendly and freely available to anyone."

Google will release a new online tool to accompany the paper: a simple interface that enables users to type in a word or phrase and immediately see how its usage frequency has changed over the past few centuries.
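As a rough illustration of the measurement behind such a tool, a word's usage frequency in a given year can be taken as its number of occurrences divided by the total number of words published that year. The sketch below is not the team's code: the function name `yearly_frequency`, the whitespace tokenization, and the toy corpus are all assumptions made for illustration.

```python
from collections import defaultdict


def yearly_frequency(books, word):
    """Return {year: occurrences of `word` / total words published that year}.

    `books` is an iterable of (year, text) pairs. Tokenization here is a
    simple lowercase whitespace split, an assumption for this sketch.
    """
    hits = defaultdict(int)    # occurrences of the target word per year
    totals = defaultdict(int)  # total word count per year
    target = word.lower()
    for year, text in books:
        tokens = text.lower().split()
        totals[year] += len(tokens)
        hits[year] += tokens.count(target)
    return {y: hits[y] / totals[y] for y in sorted(totals) if totals[y]}


# Invented toy corpus, for illustration only.
toy_books = [
    (1900, "the telephone rang and the telephone rang again"),
    (1900, "a quiet year on the farm"),
    (1950, "television and the telephone changed the home"),
]
freq = yearly_frequency(toy_books, "telephone")
for year, value in freq.items():
    print(year, round(value, 4))
```

Dividing by the yearly total matters because the number of books published changes over time; raw counts alone would mostly reflect how much was printed in each year.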

"Culturomics extends the boundaries of rigorous quantitative inquiry to a wide array of new phenomena in the social sciences and humanities," says Aiden, a junior fellow in Harvard's Society of Fellows and principal investigator of the Laboratory-at-Large, part of Harvard's School of Engineering and Applied Sciences. "While browsing this cultural record is fascinating for anyone interested in what's mattered to people over time, we hope that scholars of the humanities and social sciences will find this to be a useful and powerful tool."

This dataset, which is available for download, is thousands of times larger than any previous historical corpus. It is based on the full text of about 5.2 million books, with more than 500 billion words in total. About 72 percent of its text is in English, with smaller amounts in French, Spanish, German, Chinese, Russian, and Hebrew.

It is the largest data release in the history of the humanities, the authors note, a sequence of letters 1,000 times longer than the human genome. If written in a straight line, it would reach to the moon and back 10 times over.

"Now that a significant fraction of the world's books have been digitized, it's possible for computer-aided analysis to reveal undiscovered trends in history, culture, language, and thought," says Jon Orwant, engineering manager for Google Books.

The paper describes the development of this new approach and surveys a vast range of applications, focusing on the past two centuries. The team's findings include:

Some 8,500 new words enter the English language annually, fueling a 70 percent growth of the lexicon between 1950 and 2000. But many of these million-plus words can't be found in dictionaries.

"We estimated that 52 percent of the English lexicon -- the majority of words used in English books -- consist of lexical 'dark matter' undocumented in standard references," the researchers write in Science.

Humanity is forgetting its past faster with each passing year. The Harvard-Google team tracked the frequency with which each year from 1875 to 1975 appeared, finding that references to the past decrease much more rapidly now than in the 19th century. References to "1880" didn't fall by half until 1912 -- a lag of 32 years -- but references to "1973" reached half their peak just a decade later, in 1983.

Innovations spread faster than ever. For instance, inventions from the end of the 19th century spread more than twice as fast as those from the early 1800s.

Modern celebrities are younger and more famous than their 19th-century predecessors, but their fame is shorter-lived. Celebrities born in 1950 initially achieved fame at an average age of 29, compared to 43 for celebrities born in 1800. But their fame also disappears faster, with a "half-life" that is increasingly short.

"People are getting more famous than ever before," the researchers write, "but are being forgotten more rapidly than ever."

The most famous actors tend to become famous earlier (around age 30) than the most famous writers (around age 40) and politicians (after age 50). But patience pays off: Top politicians end up much more famous than the best-known actors.

Culturomics is a powerful tool for automatically identifying censorship and propaganda. For example, Jewish artist Marc Chagall was mentioned just once in the entire German corpus from 1936 to 1944, even as his prominence in English-language books grew roughly fivefold. Evidence of similar suppression is seen in Russian with regard to Leon Trotsky; in Chinese with regard to Tiananmen Square; and in the US with regard to the "Hollywood Ten," a group of entertainers blacklisted in 1947.

"Freud" is more deeply engrained in our collective subconscious than "Galileo," "Darwin," or "Einstein."
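The "forgetting" finding above rests on a simple calculation: find the year in which references to a term first drop to half of their peak, then measure the lag from the year being referenced. The sketch below shows one way that could be computed. The function name `year_of_half_decay` and the frequency values are invented for illustration; the numbers are chosen only to mirror the "1973" example and are not data from the study.

```python
def year_of_half_decay(series):
    """Return the first year after the peak with frequency <= half the peak.

    `series` maps year -> relative frequency of a term. Returns None if the
    frequency never falls to half of its peak within the series.
    """
    years = sorted(series)
    peak_year = max(years, key=lambda y: series[y])
    half = series[peak_year] / 2
    for y in years:
        if y > peak_year and series[y] <= half:
            return y
    return None


# Invented values for references to "1973" (not real measurements).
toy_series = {1973: 1.0, 1974: 0.95, 1976: 0.8, 1980: 0.6, 1983: 0.5, 1990: 0.3}
half_year = year_of_half_decay(toy_series)
lag = half_year - 1973  # lag measured from the referenced year
print(half_year, lag)
```

Applied to the figures quoted in the release, the same subtraction gives 1912 - 1880 = 32 years for "1880" and 1983 - 1973 = 10 years for "1973."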

Michel, Aiden, and Orwant's co-authors are Aviva Presser Aiden, Adrian Veres, Steven Pinker, and Martin A. Nowak at Harvard; Google's Matthew K. Gray, Dan Clancy, Peter Norvig, and the Google Books Team; Yuan Kui Shen at the Massachusetts Institute of Technology; Joseph P. Pickett, executive editor of the American Heritage Dictionary; and Dale Hoiberg, editor-in-chief of Encyclopaedia Britannica.

The work was funded by Google, a Foundational Questions in Evolutionary Biology Prize Fellowship, Harvard Medical School, the Harvard Society of Fellows, a Fannie and John Hertz Foundation Graduate Fellowship, a National Defense Science and Engineering Graduate Fellowship, a National Science Foundation Graduate Fellowship, the National Space Biomedical Research Institute, the National Human Genome Research Institute, the Templeton Foundation, the National Institutes of Health, and the Bill and Melinda Gates Foundation.

Links to the data and browser are available at www.culturomics.org.


