The CRIS UNS system is implemented as a web application available at CRIS UNS.
As of 2015, many CRIS systems exist: IST World, HunCRIS, SICRIS, CRIStin, Pure, etc. IST World (Information Society Technology World) is a portal that provides access to scientific-research results from several countries. This portal was developed within an FP6 (Sixth Framework Programme) project. The data model created for that system is an extension of the CERIF data model. HunCRIS and SICRIS are the Hungarian and Slovenian national CRIS systems, both based on the CERIF data model. CRIStin is an information system used by scientific institutions in Norway. Pure is a commercial software product that can be installed and customized for the needs of scientific institutions. It is used by many universities, such as the University of Helsinki and the University of Copenhagen.
Despite the existence of all these systems, we decided to develop our own system because of specific local requirements arising from evaluation and accreditation demands. These requirements are mainly reports prescribed by faculties, universities, the Provincial Secretariat for Science and Technological Development of the Autonomous Province of Vojvodina and the Ministry of Education, Science and Technological Development of the Republic of Serbia. While modeling the system, particular attention was paid to interoperability in order to increase the accessibility of scientific references and thereby the rating of the University of Novi Sad. The informational requirements imposed on our system are:
The system, called CRIS UNS, has been under development since 2008 and is described below.
A data model was created that supports the implementation of all the requirements listed above. The CERIF data model was adopted as the foundation of the model. The CERIF entities intended for storing metadata about scientific results, researchers, institutions and projects are replaced by a MARC 21 entity containing bibliographic (scientific results) and normative (researchers, institutions and projects) records. All entities and attributes prescribed by the CERIF model are mapped to the proposed model. Furthermore, the proposed model preserves the existing references between CERIF entities. The mappings of CERIF entities to MARC 21 formats are presented in paper [1].
The advantages of the proposed data model compared to the CERIF data model are the following:
The CERIF-compatible data model based on the MARC 21 format does not support all data type restrictions defined by the CERIF model, so such restrictions must be enforced by software. For example, the attribute cfResPublDate of the cfResPubl entity is of type date, while in the data model proposed in this section it is mapped to a string attribute. These limitations are imposed by the MARC 21 format. Analogously, in the proposed data model several data items from the CERIF model are stored in a single attribute with a defined format, which means that the values of such attributes must be parsed when the data is read.
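The software-side checks described above can be sketched as follows. This is a minimal illustration only: the date format (YYYY-MM-DD), the semicolon separator and both function names are assumptions made for the example, not the conventions actually used in CRIS UNS.

```python
from datetime import date, datetime
from typing import List


def parse_publication_date(raw: str) -> date:
    """Validate a string-typed date attribute (assumed format: YYYY-MM-DD)."""
    # strptime raises ValueError for malformed or impossible dates, which is
    # the type check that the string-based storage no longer performs.
    return datetime.strptime(raw.strip(), "%Y-%m-%d").date()


def parse_composite(raw: str, separator: str = ";") -> List[str]:
    """Split one attribute that packs several values (assumed separator)."""
    return [part.strip() for part in raw.split(separator) if part.strip()]
```

A caller would run such checks whenever a record is read or updated, and reject or flag values that fail to parse.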
For the purpose of evaluating scientific/research work, an extension of the CRIS UNS data model was created [14]. This extension enables evaluation of scientific-research results according to diverse rule books prescribed by faculties, the University, the Provincial Secretariat for Science and Technological Development of the Autonomous Province of Vojvodina, the Ministry of Education, Science and Technological Development of the Republic of Serbia, as well as some international institutions and organizations. The extension is based on the semantic layer of the CERIF model, which enables classification of entities, as well as of relations between entities, according to a classification scheme. The rule books can base their evaluations of scientific-research outputs on various bibliometric indicators that are stored in the cfMetrics entity of the CERIF model. The main characteristics of the extended model are:
The application has a multi-tier client-server architecture and is built on a collection of Java-based open-source solutions. It can be accessed from different web browsers. Using this application, a researcher who co-authored a published result can enter its metadata. Although the data model is based on the MARC 21 and CERIF standards, end users do not need to be aware of these standards in order to enter data about their published results. Localization is fully supported. The messages shown in the user interface are stored in external files, which allows translation into a new language without changing the source code. Users can select the desired interface language. Currently the interface is in Serbian, but translations into English and the languages of the national minorities living in the Autonomous Province of Vojvodina (Hungarians, Romanians, Slovaks and Rusyns) are in progress.
The client-side application is a standard web browser. All contemporary browsers with HTML 4 and JavaScript support (Mozilla Firefox, Google Chrome, Internet Explorer and others) are supported. The server application runs on the Apache Tomcat application server. JSF (JavaServer Faces), a model-view-controller framework, is used for interaction with users. In order to enrich the user interface and, at the same time, reduce the problems that AJAX technology causes in web browsers, the RichFaces library of AJAX-based JSF components is used.
Apache Lucene, an open-source text search engine library written entirely in Java, is used for text indexing and search.
DTOs (data transfer objects) are objects used for data transfer between application components. A DTO has a set of attributes and only those methods needed to read and modify these attributes. A software component is implemented that converts DTOs to the object representation of a MARC 21 record and vice versa. The user interface is implemented in JSF and uses DTOs for updating and presenting data. A MARC 21 record is first indexed by the Apache Lucene based text server and then stored in a database. This means that the interface component, after receiving an update request, converts the DTO to a MARC 21 record and forwards it to the database components for indexing and storage. Analogously, after a search is completed, the MARC 21 results obtained from the text server are converted to DTOs.
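The conversion between DTOs and MARC 21 records can be illustrated with a small sketch. Everything here is an assumption for illustration: the `JournalPaperDTO` class, the dictionary-based record representation and the choice of fields (245 $a for the title, 700 $a for personal names, 260 $c for the publication date) are not taken from CRIS UNS, whose actual mapping is the one described in paper [1].

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class JournalPaperDTO:
    """Plain data holder used by the user interface (hypothetical fields)."""
    title: str
    authors: List[str]
    year: str


# Simplified stand-in for a MARC 21 record: field tag -> list of
# occurrences, each occurrence a dict of subfield code -> value.
MarcRecord = Dict[str, List[Dict[str, str]]]


def dto_to_marc(dto: JournalPaperDTO) -> MarcRecord:
    """Convert a DTO to the simplified record (assumed field mapping)."""
    return {
        "245": [{"a": dto.title}],
        "700": [{"a": name} for name in dto.authors],
        "260": [{"c": dto.year}],
    }


def marc_to_dto(record: MarcRecord) -> JournalPaperDTO:
    """Convert the simplified record back to a DTO."""
    return JournalPaperDTO(
        title=record["245"][0]["a"],
        authors=[field["a"] for field in record.get("700", [])],
        year=record["260"][0]["c"],
    )
```

Keeping the two directions in one component means the user interface only ever sees DTOs, while indexing and storage only ever see records.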
The component responsible for database operations uses JDBC to access the database. The database management system is MySQL.
The file system component is responsible for controlled download and upload of research results in digital form (.pdf, .doc, .docx, .odt), as well as for extracting their textual content, which is submitted to the text server component for indexing.
The import/export component, which is responsible for the system's interoperability with other systems storing research results, implements various protocols for data exchange.
The search component receives CQL queries, processes them and returns lists of results. This component accepts queries from other computer systems via the SRU/W protocol, as well as queries submitted through the user interface of the CRIS UNS system's web page.
The evaluation component is in charge of applying diverse evaluation rule books to the research results stored in the CRIS UNS database.
The reporting component creates the diverse reports requested by faculties, the University, the Provincial Secretariat for Science and Technological Development of the Autonomous Province of Vojvodina, and the Ministry of Education, Science and Technological Development of the Republic of Serbia. This component is implemented using the FreeMarker package.
The main feature of the system architecture is the independence of components for interaction with system users and those aimed at persisting and retrieving data from the bibliographic records database. This architecture allows an easy transition to other bibliographic standards and easy integration with library systems based on the adopted bibliographic standard.
Researchers, who are not expected to be familiar with either the MARC 21 format or the CERIF data model, enter their scientific results. The types of scientific results that can be entered are: papers published in scientific journals, papers published in conference proceedings, monographs, monograph chapters, theses and PhD dissertations, patents, and products.
Each of these result types has different metadata that researchers are expected to enter. The mapping of these metadata is described in paper [1]. The software module for data acquisition is described in papers [10, 11], while metadata extraction from PDF files is described in paper [12].
The MARC 21 format is rich in metadata and enables detailed description of entities in the CRIS UNS system. A MARC 21 record can store all metadata prescribed by the DC and ETD-MS formats [9]. An information system based on the CERIF-compatible data model can exchange data with other systems using XML documents whose XML schemas are prescribed by the CERIF standard, and it can also exchange data with library information systems based on MARC 21 formats and with institutional repositories based on the DC or ETD-MS format. The integration of CRIS UNS and an OAI-PMH compatible ETD repository is the main subject of paper [13]. An OAI-PMH provider has been implemented that enables the CRIS UNS repository of theses and PhD dissertations to become a member of the NDLTD and DART-Europe networks. Consolidation of the data about theses and PhD dissertations is in progress. Once it is finished, the ETD repository implemented as part of CRIS UNS will become a member of the NDLTD and DART-Europe networks.
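From a harvester's point of view, an OAI-PMH provider is queried with plain HTTP requests. The sketch below builds a ListRecords request; the `verb`, `metadataPrefix` and `from` arguments are defined by the OAI-PMH protocol, while the base URL and the function name are hypothetical and the actual CRIS UNS endpoint is not given here. The `oai_dc` prefix is required of every OAI-PMH repository; which prefix a repository uses for ETD-MS is its own choice.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                     from_date: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        # Optional lower bound for selective (incremental) harvesting.
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint, used only to show the shape of the request.
url = list_records_url("https://repository.example.org/oai",
                       from_date="2012-01-01")
```

A network such as NDLTD or DART-Europe would issue requests of this shape periodically and merge the returned records into its union catalogue.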
The CRIS UNS ontology for describing theses and PhD dissertations is presented in paper [14]. A CRIS UNS ontology for describing other scientific-research results has also been created. This ontology is based on the BIBO, Dublin Core and FOAF ontologies. The next tasks are the implementation of a software component that will export data from the CRIS UNS system to the proposed ontology, and the implementation of a semantic web service that will enable interoperability with other systems.
A module for importing theses and PhD dissertations into CRISs has been implemented [15]. The main feature of the module is its support for various metadata formats of theses and PhD dissertations. The module is extensible with plug-ins that provide support for importing other scientific-research results (papers published in journals, papers published in conference proceedings, monographs, etc.) in various metadata formats (Dublin Core, ETD-MS, CERIF, etc.). The module imports data through an interactive process in which the user consolidates the data.
The search web page provides users with the means to search for and retrieve scientific results. It is located at CRIS UNS. The design of this page was modeled on the search page of the Scopus citation database. Users define search criteria by entering values in the corresponding search fields: Title, Abstract, Author, etc.
It is possible to filter results by the institution owning the result and by result type.
In order to achieve flexible search, a fuzzy search is implemented in which the similarity criterion for two data strings is defined as follows:
The Levenshtein distance (edit distance) between two words in the data strings must be less than or equal to the integer obtained by dividing the length of the longer word by five.
If a data string contains more than five words, then the previous criterion applies to 80% of the words present.
The Cyrillic and Latin alphabets are treated as equivalent.
Since the similarity criterion treats the Cyrillic and Latin alphabets as equivalent, all Cyrillic textual data are converted to their Latin counterparts prior to indexing, and all Cyrillic queries are converted to Latin before searching. This means that Apache Lucene operates only on Latin-script data, while the database stores the data in the script in which the user entered it. The Cyrillic to Latin conversion is unambiguous.
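The similarity criterion above can be sketched in a few lines. This is an illustration under stated assumptions, not the CRIS UNS implementation: the text does not say how words of the two strings are paired, so the sketch assumes that each query word must be similar to some word of the data string; lowercasing before comparison and all function names are likewise assumptions. The transliteration table covers the Serbian Cyrillic alphabet.

```python
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ", "е": "e",
    "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k", "л": "l", "љ": "lj",
    "м": "m", "н": "n", "њ": "nj", "о": "o", "п": "p", "р": "r", "с": "s",
    "т": "t", "ћ": "ć", "у": "u", "ф": "f", "х": "h", "ц": "c", "ч": "č",
    "џ": "dž", "ш": "š",
}


def to_latin(text: str) -> str:
    """Lowercase the text and map Serbian Cyrillic letters to Latin ones."""
    return "".join(CYR_TO_LAT.get(ch, ch) for ch in text.lower())


def levenshtein(a: str, b: str) -> int:
    """Edit distance computed row by row with dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution or match
            ))
        previous = current
    return previous[-1]


def words_similar(w1: str, w2: str) -> bool:
    """Distance must not exceed floor(length of the longer word / 5)."""
    return levenshtein(w1, w2) <= max(len(w1), len(w2)) // 5


def strings_similar(query: str, data: str) -> bool:
    """Apply the word criterion to all words, or to 80% if more than five."""
    query_words = to_latin(query).split()
    data_words = to_latin(data).split()
    if not query_words:
        return False
    matched = sum(
        any(words_similar(q, d) for d in data_words) for q in query_words
    )
    if len(query_words) <= 5:
        return matched == len(query_words)
    return 5 * matched >= 4 * len(query_words)  # at least 80% of the words
```

Note that the threshold is zero for words shorter than five letters, so short words must match exactly, while a ten-letter word tolerates up to two edits.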
An advanced search is also available, which enables users to define their queries using the syntax of the CQL query language. CQL allows new context sets to be defined. This feature is used to create a new CRIS UNS specific context set which defines metadata not available in the existing standardized context sets.
Besides the web page for searching all scientific-research outputs stored in CRIS UNS, a web page for searching PhD dissertations defended at the University of Novi Sad has also been implemented.
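To show what such queries look like, the sketch below assembles a CQL query from index and term pairs. The `dc.title` index belongs to the standard Dublin Core context set; the `crisuns` prefix and the `crisuns.institution` index are hypothetical names chosen for the example, since the actual CRIS UNS context set is not listed here. The helper names are also assumptions.

```python
from typing import Dict


def quote_term(term: str) -> str:
    """Quote a CQL search term, escaping backslashes and double quotes."""
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_cql(criteria: Dict[str, str]) -> str:
    """Join 'index = term' clauses with the CQL boolean operator 'and'."""
    return " and ".join(
        f"{index} = {quote_term(term)}" for index, term in criteria.items()
    )


query = build_cql({
    "dc.title": "research information",
    "crisuns.institution": "Faculty of Sciences",  # hypothetical index
})
```

Here `query` holds the string `dc.title = "research information" and crisuns.institution = "Faculty of Sciences"`, which is the form a user would type into the advanced search field.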
The CRIS UNS data model supports evaluation of research results by applying various rule books defined by scientific institutions. For example, the Faculty of Sciences of the University of Novi Sad has five departments, each with its own rule book for evaluating research results in its specific scientific field (mathematics and informatics; physics; chemistry; biology; geography). Evaluation of results published in scientific journals is based on the journal's impact factor, which is stored in the data model. The evaluation algorithm for scientific results published in journals is described in paper [16]. The CRIS UNS data model and the evaluation algorithm itself also allow journal papers to be evaluated using other bibliometric indicators in addition to the journal's impact factor.
CRIS UNS users can obtain reports on all their research results evaluated in accordance with the various rule books defined within the CRIS UNS system. An extension of CRIS UNS that provides a public service for evaluating journal papers in all scientific areas has been implemented. For that purpose we have collected publicly available data for approximately 40,000 scientific journals. The ultimate goal is an Internet service that enables researchers to evaluate their papers simply by providing the journal title and the publication year, and by selecting an evaluation rule book.
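The shape of such a service (journal title, publication year and rule book in, evaluation out) can be sketched as below. All names, journals, impact factor values, the threshold and the category labels are invented for illustration; the real rule books and the evaluation algorithm are those described in paper [16].

```python
from typing import Callable, Dict, Optional, Tuple

# (journal title, publication year) -> impact factor; made-up sample values.
IMPACT_FACTORS: Dict[Tuple[str, int], float] = {
    ("Example Journal A", 2011): 1.8,
    ("Example Journal B", 2011): 0.4,
}

# A rule book is modeled as a function from impact factor to a category.
RuleBook = Callable[[float], str]


def rule_book_x(impact_factor: float) -> str:
    """Hypothetical rule book: two categories split at an invented threshold."""
    return "category 1" if impact_factor >= 1.0 else "category 2"


RULE_BOOKS: Dict[str, RuleBook] = {"rule-book-x": rule_book_x}


def evaluate_paper(journal: str, year: int, rule_book: str) -> Optional[str]:
    """Return the category, or None when the journal/year pair is unknown."""
    impact_factor = IMPACT_FACTORS.get((journal, year))
    if impact_factor is None:
        return None
    return RULE_BOOKS[rule_book](impact_factor)
```

Modeling rule books as interchangeable functions reflects the idea that the same paper can be evaluated under several rule books without changing the stored data.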
An important function of any research information system is the creation of different kinds of reports about published results. Within CRIS UNS there is a separate software component for reporting. This component is implemented using the FreeMarker package, in which reports are generated as the outputs of appropriate templates [17]. The input to the component is a list of references and a corresponding template, and the output is the resulting report. Adding a new report to the component requires developers to write new templates.
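The template-driven approach can be illustrated with a small sketch. It uses Python's standard `string.Template` as a stand-in for FreeMarker, whose template syntax is different; the entry template, the field names and the `render_report` function are assumptions for the example, not the templates used in CRIS UNS.

```python
from string import Template
from typing import Dict, List

# Hypothetical per-reference template; FreeMarker would use its own syntax.
ENTRY_TEMPLATE = Template("$authors ($year). $title. $journal.")


def render_report(heading: str, references: List[Dict[str, str]],
                  entry_template: Template = ENTRY_TEMPLATE) -> str:
    """Produce a plain-text report: a heading followed by numbered entries."""
    lines = [heading]
    for number, reference in enumerate(references, start=1):
        lines.append(f"{number}. {entry_template.substitute(reference)}")
    return "\n".join(lines)


report = render_report("Publications", [{
    "authors": "Ivanović, D., Surla, D. and Konjović, Z.",
    "year": "2011",
    "title": "CERIF compatible data model based on MARC 21 format",
    "journal": "The Electronic Library",
}])
```

Because the layout lives entirely in the template, a new report format means writing a new template while the code that collects the references stays the same.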
CRIS UNS was verified and tested on data about the scientific/research results of researchers employed at two faculties of the University of Novi Sad: the Faculty of Sciences and the Faculty of Technology. After the migration of existing data about published results to CRIS UNS, researchers continue to supply data about their published results using this application. Populating CRIS UNS with data on scientific/research results from the other thirteen faculties is in progress. So far CRIS UNS stores over 5000 scientific results published in over 1000 journals, over 7000 conference papers and over 3000 scientific results of other types. About 500 out of 3500 researchers from the University of Novi Sad have their research results stored in CRIS UNS.
Besides using the system, the scientific-research community has recognized the importance of the results obtained within this project. More than 30 research publications have been published, including:
Future work on CRIS UNS follows two main directions.
The first one is data acquisition, with the ultimate goal of setting up continuous, regular acquisition of scientific/research results from at least all faculties of the University of Novi Sad.
The second one is the improvement of CRIS UNS services. In the very near future we plan to implement an SRU/W service for querying the CRIS UNS database from external systems. Improvements to the reporting component are also planned, aimed at providing users with a tool for creating reporting templates that match their specific needs. Finally, we plan to develop a service for recording data about scientific/research projects, including linking scientific results to the projects from which they emerged.
[1] Ivanović, D., Surla, D. and Konjović, Z. 2011. CERIF compatible data model based on MARC 21 format. The Electronic Library, 29 (1), 52-70. DOI=10.1108/02640471111111433
[2] Asserson, A., Jeffery, K. and Lopatenko, A. (2002), “CERIF: Past, Present and Future: An Overview”, Proceedings of the 6th International Conference on Current Research Information Systems, University of Kassel, August 29 - 31, 2002, pp. 33-40
[3] Jörg, B., Ferlež, J. and Grabczewski, E. (2005), “Public IST World Deliverable 1.2 – Data Model for Knowledge Organisation”, 18 p., available at: http://ist-world.dfki.de/downloads/deliverables/ISTWorld_D1.2_DataModelF... (accessed February 13, 2012)
[4] Kiryakov, A., Grabczewski, E., Ferlež, J., Uszkoreit, H. and Jörg, B. (2005), “Public IST World Deliverable 1.1 – Definition of the Central Data Structure”, 23 p., available at: http://ist-world.dfki.de/downloads/deliverables/ISTWorld_D1.1_CentralDat... (accessed February 13, 2012)
[5] Ferlež, J. (2005), “Public IST World Deliverable 1.3 – Data Model for Representation of Expertise”, 12 p., available at: http://ist-world.dfki.de/downloads/deliverables/ISTWorld_D1.3_DataModelF... (accessed February 13, 2012)
[6] Jörg, B., Ferlež, J., Grabczewski, E. and Jermol, M. (2006), “IST World: European RTD Information and Service Portal”, 8th International Conference on Current Research Information Systems: Enabling Interaction and Quality: Beyond the Hanseatic League (CRIS 2006), Bergen, Norway, 10 p., available at: http://epubs.cclrc.ac.uk/bitstream/905/ISTWorld01.pdf (accessed February 13, 2012)
[7] Ivanović, D., Surla, D. and Racković, M. 2011. A CERIF data model extension for evaluation and quantitative expression of scientific research results. Scientometrics. 86 (1), 155-172. DOI=10.1007/s11192-010-0228-2
[8] Jörg, B., Krast, O., Jeffery, K. and Grootel, G. (2009b), “CERIF 2008 – 1.0 XML Data Exchange Format Specification”, 33 p., available at: http://www.eurocris.org/fileadmin/cerif-2008/CERIF2008_1.0_XML.pdf (accessed February 13, 2012)
[9] Ivanovic, L., Ivanovic, D. and Surla, D. 2012. A data model of theses and dissertations compatible with CERIF, Dublin Core and EDT-MS. Online Information Review, 36 (4), pp. 568-586, DOI 10.1108/14684521211254068
[10] Ivanović, D., Milosavljević, G., Milosavljević, B. and Surla, D. 2010. A CERIF-compatible research management system based on the MARC 21 format. Program: Electronic library and information systems. 44 (1), 229-251 DOI=10.1108/00330331011064249
[11] Milosavljević, G., Ivanović, D., Surla, D. and Milosavljević, B. 2011. Automated construction of the user interface for a CERIF-compliant research management system. The Electronic Library. 29 (5), 565 – 588. DOI= 10.1108/02640471111177035
[12] Kovačević, A., Ivanovic, D., Milosavljevic, B., Konjovic, Z. and Surla, D. 2011. Automatic extraction of metadata from scientific publications for CRIS systems. Program: electronic library and information systems. 45 (4), 376 – 396. DOI=10.1108/00330331111182094
[13] Ivanovic, L., Ivanovic, D. and Surla, D. (2012), “Integration of a Research Management System and an OAI-PMH Compatible ETDs Repository at the University of Novi Sad, Republic of Serbia”, Library Resources and Technical Services, Vol. 56, No. 2, pp. 104-112
[14] Ivanović, L., Dimić Surla, B., Segedinac, M. and Ivanović, D. (2012), “CRISUNS ontology for theses and dissertations”, ICIST 2012 - 2nd International Conference on Information Society Technology, February 29 – March 03, 2012, Kopaonik, Serbia, pp. 164-169
[15] Ivanović, L. and Surla, D. (2012), “A software module for import of theses and dissertations to CRISs”, Proceedings of the CRIS 2012 Conference, Prague, June 6-9. 2012, pp. 313-322
[16] Ivanović, D., Surla, D. and Racković, M. (2012), “Journal evaluation based on bibliometric indicators and the CERIF data model”, Computer Science and Information Systems, 9(2), pp. 791-811, DOI 10.2298/CSIS110801009I
[17] Dimić-Surla, B. & Ivanović, D. (2012), “Software component for reporting in the CRIS systems”, Proceedings of the CRIS 2012 Conference, Prague, June 6-9. 2012, pp. 61-66