dc:description |
Data repositories have long been an effective mechanism for integrating and managing research resources, and improving the reuse value of those resources is one of their key issues. Recently, publishing resources on the World Wide Web as linked data, with referenced interlinks and rich semantics, has drawn much attention from data repository owners. In this study, we first converted 840,000 XML-formatted, CC-licensed digital resources from the Union Catalog of Digital Archives Taiwan (http://catalog.digitalarchives.tw/) into human-editable CSV tabular data. These tabular data were then transformed into linked data in RDF (Resource Description Framework). The XML-CSV-RDF process ensures that the resources are accessible to both humans and machines. An ontology (voc4odw) was also designed to describe each resource and its provenance. Two types of linked data resources were generated: one is described simply with the 15 Dublin Core elements, and the other is enriched with domain vocabularies from external datasets, including Wikidata, GeoNames, and Encyclopedia of Life. For the second type, several “versions” of a resource may exist, each with different domain vocabularies, to provide more insight into the data. To demonstrate the generated linked data resources, a linked open data (LOD) repository, http://data.odw.tw, was built using CKAN (Comprehensive Knowledge Archive Network). CKAN is an open source data management system with comprehensive functions for publishing, storing, managing, displaying, and using data, including both raw data and metadata. Resources in RDF format were loaded into our CKAN instance through a custom harvesting method, which also makes it possible to export each resource in various linked data formats such as RDF/XML, Turtle, and JSON-LD.
Thanks to CKAN’s high flexibility in metadata customization and data preview methods, we extended CKAN so that users can explore all linked data resources in well-formed table views and find resources by keyword, facet, time range, or spatial extent. In addition, a SPARQL endpoint provided by OpenLink Virtuoso is integrated into CKAN’s interface for advanced use. Ongoing work includes linking to more external datasets, improving the import process, and aggregating existing resources to infer and obtain “new knowledge” using CKAN’s data representation capabilities.
|