In autumn 2012, the Royal School of Library and Information Science, Copenhagen, offers this course.
Cultural heritage constitutes a society’s collective memory and is typically curated by libraries, museums, and archives. In recent years, many of these institutions have started digitizing their collections and making them available to the general public. This course covers the entire process, from digitization, cleaning, and error correction to enrichment and efficient access to these cultural heritage collections.
The course will focus on three main areas:
- How can we digitize and automatically enrich our cultural heritage (e.g., data cleaning, NLP for cultural heritage)?
- How can we provide efficient access to digitized cultural heritage material (e.g., search, browsing & tagging, and recommendation)?
- Real-world cases from the literature: how are these issues dealt with in actual institutions?
Content in detail:
1. Digitization, cleaning, and enrichment
o How can cultural heritage material be digitized?
§ Optical Character Recognition (OCR)
§ Digitizing audio & video
o What are the important issues in digitization?
§ Durability of storage media
§ Durability of formats
§ Licensing and copyright
o How can we clean up and correct our digitized material?
§ Spelling correction
§ Database correction
o How can we automatically enrich our digital collections?
§ Speech recognition
§ Entity recognition and linking
2. Access technologies
o Search
§ What are the main retrieval algorithms?
Vector Space model with TF-IDF
Test collections
§ What is the role of the user in search?
o Browsing & tagging
§ Supporting browsing in large collections
§ Social tagging
o Recommendation
§ How do recommender systems work?
3. Real-world examples for each of the above topics