Semantic Technology at The New York Times: Lessons Learned and Future Directions

Abstract

At last year's International Semantic Web Conference, The New York Times Company announced the release of our Linked Open Data Platform: http://data.nytimes.com. In the year since, we have continued our efforts in this space and learned many valuable lessons. In our remarks, we will review these lessons; demonstrate innovative prototypes built on our linked data; explore the future of RDF and RDFa in the news industry; and announce an exciting new milestone in our Linked Data efforts.

The Speaker

Evan Sandhaus is the Lead Architect of Semantic Platforms for The New York Times Company. In this capacity, he directs the development of next-generation metadata management systems. In his four years with The Times, Mr. Sandhaus has directed strategy and technology for The New York Times Linked Open Data Initiative; developed a semantic technology for identifying key concepts in large text datasets; engineered a patent-pending system for purging template text from Web content; and collaborated with The Linguistic Data Consortium to release and promote The New York Times Annotated Corpus, a collection of 1.8 million richly annotated Times articles published from 1987 to 2007. Additionally, Mr. Sandhaus has led the development of a Web-scale crawler, a Google Earth news layer and multiple search engine optimization toolkits.

Before joining the Times Company, Mr. Sandhaus worked at The University of Pennsylvania from 2005 to 2006 and Lockheed Martin from 2002 to 2005.

Mr. Sandhaus holds a bachelor’s degree from Williams College and a master’s degree from Villanova University, and is currently pursuing a doctorate in Computer Science at New York University.

Born and raised in Leawood, Kan., Mr. Sandhaus now resides in Brooklyn, N.Y.

You can follow Mr. Sandhaus on Twitter: @kansandhaus