Content Mining of the bioscience literature


Thu May 14th 10:00am to 11:00am PDT


Peter Murray-Rust


The ContentMine has developed Open tools for mining the scientific and medical literature (full text, figures, images and supplemental data). We have built a pipeline that covers the whole process of crawling, scraping, normalising and mining articles, and of storing and republishing the results. We now run this pipeline daily.

The ContentMine is funded by a Fellowship to PMR from the Shuttleworth Foundation. Its aims include the creation of subcommunities and the unrestricted dissemination of all materials, code and results (under Apache 2, CC-BY or CC0 as appropriate). We intend to generate and publish 100 million facts per year for use and re-use. The system is designed so that anyone can create pluggable resources (code, vocabularies), making ContentMining easy and accessible to all. Much of our work is done through interactive workshops, and we hope to show participants how to start ContentMining. Our approaches include downloadable virtual machines and a web service.


Dr. Peter Murray-Rust is a chemist currently working at the University of Cambridge. As well as his work in chemistry, Dr. Murray-Rust is known for his support of open access and open data. He leads the team at the ContentMine project, which uses machines to liberate 100,000,000 facts from the scientific literature. After obtaining a Ph.D., he became a lecturer in chemistry at the (new) University of Stirling and was the first warden of Andrew Stewart Hall of Residence. In 1982 he moved to Glaxo Group Research at Greenford to head Molecular Graphics, Computational Chemistry and later protein structure determination. He was Professor of Pharmacy at the University of Nottingham from 1996 to 2000, where he set up the Virtual School of Molecular Sciences. He is now Reader in Molecular Informatics at the University of Cambridge and Senior Research Fellow of Churchill College, Cambridge.

Dr. Murray-Rust's research interests have included the automated analysis of data in scientific publications, the creation of virtual communities (e.g., the Virtual School of Natural Sciences in the Globewide Network Academy), and the Semantic Web. With Henry Rzepa he has extended this work to chemistry through the development of markup languages, especially Chemical Markup Language. He campaigns for open data, particularly in science; he is on the advisory board of the Open Knowledge Foundation and is a co-author of the Panton Principles for Open scientific data. Together with a few other chemists, he was a founder member of the Blue Obelisk movement in 2005.

How to join the webinar:

You must join using both web and audio.
Please include your name and company when joining the meeting.
Web: (Access Code: 2201876)
Audio: 1-866-740-1260 (Access Code: 2201876)
Please test your computer's compatibility before the meeting.


IMPORTANT NOTICE: This ReadyTalk service includes a feature that allows audio and any documents and other materials exchanged or viewed during the session to be recorded. By joining this session, you automatically consent to such recordings. If you do not consent to the recording, discuss your concerns with the meeting host prior to the start of the recording or do not join the session. Please note that any such recordings may be subject to discovery in the event of litigation.