CRIS UNS - Current Research Information System of the University of Novi Sad

About

The CRIS UNS system is implemented as a web application available at CRIS UNS

 

As of 2015, many CRIS systems exist: IST World, HunCRIS, SICRIS, CRIStin, Pure, etc. IST World (Information Society Technology World) is a portal that provides access to scientific research results from several countries. The portal was developed within an FP6 (Sixth Framework Programme) project, and the data model created for it is an extension of the CERIF data model. HunCRIS and SICRIS are the Hungarian and Slovenian national CRIS systems, both based on the CERIF data model. CRIStin is an information system used by scientific institutions in Norway. Pure is commercial software that can be installed and customized for the needs of scientific institutions; it is used by many universities, such as the University of Helsinki and the University of Copenhagen.

Despite the existence of all these systems, we decided to develop our own because of specific local requirements arising from evaluation and accreditation demands. These requirements are mainly reports prescribed by faculties, universities, the Provincial Secretariat for Science and Technological Development of the Autonomous Province of Vojvodina and the Ministry of Education, Science and Technological Development of the Republic of Serbia. While modeling the system, particular attention was paid to interoperability in order to increase the accessibility of scientific references and thereby the rating of the University of Novi Sad. The information requirements imposed on our system are:

  • Access to the application via an arbitrary modern web browser.
  • Researchers entering their references themselves, without needing to be familiar with any standard for describing references.
  • Interoperability of our system with diverse systems containing scientific content such as CRIS systems, institutional repositories, library information systems, digital repositories, etc.
  • Capability to search the database containing scientific results.
  • Capability to evaluate scientific research results following the rule book(s) prescribed by the Ministry of Education, Science and Technological Development of the Republic of Serbia.
  • Reporting for the faculties, the University, the Provincial Secretariat for Science and Technological Development of the Autonomous Province of Vojvodina and the Ministry of Education, Science and Technological Development of the Republic of Serbia.

This system, called CRIS UNS, has been under development since 2008 and is described below.

Data model

A data model was created to support all of the requirements listed above. The CERIF data model was adopted as its foundation. The CERIF entities for storing metadata about scientific results, researchers, institutions and projects are replaced by a MARC 21 entity containing bibliographic records (scientific results) and authority records (researchers, institutions and projects). All entities and attributes prescribed by the CERIF model are mapped to the proposed model. Furthermore, the proposed model preserves the existing references between CERIF entities. The mappings of CERIF entities to MARC 21 formats are presented in paper [1].

The advantages of the proposed data model compared to the CERIF data model are the following:

  • The proposed model contains more metadata than the CERIF model. An information system based on the CRIS UNS data model can exchange data with other CRIS systems using XML documents whose XML schemas are described in [8]. Exchange is also possible with institutional repositories based on the Dublin Core format and with library systems based on the MARC 21 format. The CRIS UNS data model enables membership in networks of digital theses and PhD dissertations such as NDLTD and DART-Europe [9].
  • The MARC 21 format, which is the basis for this model, is rich in metadata and therefore allows a more detailed description of the entities of research management systems. For example, the MARC 21 bibliographic format contains field 510, which holds information about who cited the publication described by a given MARC 21 record, and where.
  • The proposed model is smaller and more modular.
  • The proposed model can be used for storing metadata about publication types that are not supported by the CERIF model, such as outputs from fine arts research, reports, technical documentation, etc. The MARC 21 bibliographic format can store those publication types.
  • The proposed data model is less sensitive to changes in the CERIF model. If a future version of the CERIF data model adds an entity or attributes that can be mapped to the MARC 21 format, the proposed model will remain the same.
  • The proposed data model creates a repository of bibliographic and authority MARC 21 records about the scientific research outputs of researchers and institutions. This repository can be used for generating bibliographies of researchers and institutions, evaluating scientific research outputs, etc. Authority and bibliographic records are linked through unique identifiers, which gives this repository the characteristics of a relational database and thus enables precise reporting.

The CERIF-compatible data model based on the MARC 21 format does not support all of the data type restrictions defined by the CERIF model, so such restrictions must be enforced by software. For example, the attribute cfResPublDate of the cfResPubl entity is of type date, while in the proposed data model this attribute is mapped to an attribute of type string. These limitations are imposed by the MARC 21 format. Similarly, in the proposed data model, several data items from the CERIF model may be stored in a single attribute with a defined format, which means that the values of such attributes must be parsed when the data is read.
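As an illustration of enforcing a data type restriction in software, the following minimal sketch validates a publication date that is stored as a string. The class name, the helper method and the ISO `yyyy-MM-dd` storage format are assumptions made for this example; the source does not state which string format CRIS UNS uses.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.Optional;

class PublicationDateCheck {

    // Hypothetical helper: assumes dates are stored as ISO strings (yyyy-MM-dd).
    // Returns an empty Optional when the stored string is not a valid date.
    static Optional<LocalDate> parsePublicationDate(String raw) {
        if (raw == null) {
            return Optional.empty();
        }
        try {
            return Optional.of(LocalDate.parse(raw.trim()));
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parsePublicationDate("2011-02-15").isPresent());
        System.out.println(parsePublicationDate("15.02.2011").isPresent());
    }
}
```

A check of this kind would run whenever a record is written or read, since the string-typed storage itself cannot reject malformed values.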

To support the evaluation of scientific research work, an extension of the CRIS UNS data model was created [7]. This extension enables evaluation of scientific research results by diverse rule books prescribed by faculties, the University, the Provincial Secretariat for Science and Technological Development of the Autonomous Province of Vojvodina, the Ministry of Education, Science and Technological Development of the Republic of Serbia, as well as some international institutions and organizations. The extension is based on the semantic layer of the CERIF model, which enables classification of entities and of relations between entities according to a classification scheme. The rule books can base their evaluations of scientific research outputs on various bibliometric indicators that are stored in the cfMetrics entity of the CERIF model. The main characteristics of the extended model are:

  • The model is an extension of the CERIF data model, i.e., an extension of a standardized data model for research management systems.
  • The model enables evaluation of a researcher's results according to a national or international rule book.
  • The model can be used to create an arbitrary number of rule books. In addition, the result types defined by a rule book can be decomposed into an arbitrary number of hierarchical levels.
  • Multiple commissions can be defined within each science area. For example, within the Mathematics and natural sciences area, separate commissions for mathematics, physics, chemistry and biology can be appointed.
  • A researcher's scientific research outputs can be evaluated according to different rule books using classifications established by different commissions.

Implementation

The application has a multi-tier, client-server architecture and is built on a collection of Java-based open-source solutions. The application can be accessed from different web browsers. Using the application, any coauthor of a published result can enter its metadata. Although the data model is based on the MARC 21 and CERIF standards, end users are not expected to even be aware of these standards in order to enter data about their published results. Localization is fully supported. The messages shown to users through the user interface are stored in external files, which allows translation into a new language without changing the source code. Users can select the desired interface language. Currently the interface is in Serbian, but translations into English and into the languages of the national minorities living in the Autonomous Province of Vojvodina (Hungarians, Romanians, Slovaks and Rusyns) are in progress.
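The idea of keeping user-visible messages outside the source code can be sketched as follows. This is only an illustration, not the CRIS UNS implementation: the class name, the message key and the translations are hypothetical, and the property text is supplied inline so that the example is self-contained, whereas a deployed application would read one external .properties file per language.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

class UiMessages {
    private final Map<String, ResourceBundle> bundles = new HashMap<>();

    // Registers the messages of one language, given in .properties syntax.
    void register(String language, String propertiesText) {
        try {
            bundles.put(language, new PropertyResourceBundle(new StringReader(propertiesText)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Looks up a message; falls back to the key itself when no translation exists.
    String text(String language, String key) {
        ResourceBundle bundle = bundles.get(language);
        return (bundle != null && bundle.containsKey(key)) ? bundle.getString(key) : key;
    }

    public static void main(String[] args) {
        UiMessages messages = new UiMessages();
        messages.register("sr", "search.title=Pretraga\n");
        messages.register("en", "search.title=Search\n");
        System.out.println(messages.text("sr", "search.title"));
        System.out.println(messages.text("en", "search.title"));
    }
}
```

Adding a language then amounts to registering one more set of messages; no code that calls `text` needs to change.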

 

  Architecture

 

The client-side application is a standard web browser. All contemporary browsers with HTML 4 and JavaScript support (Mozilla Firefox, Google Chrome, Internet Explorer and others) are supported. The server application runs under the Apache Tomcat application server. JSF (JavaServer Faces), a model-view-controller framework, is used for interaction with users. To enrich the user interface and, at the same time, reduce the browser problems caused by AJAX technology, the RichFaces library of AJAX-based JSF components is used.

 

Apache Lucene, an open-source text search engine library written entirely in Java, is used for text indexing and search.

 

DTOs (data transfer objects) are objects used for transferring data between application components. A DTO has a set of attributes and only those methods needed to access and modify these attributes. A software component was implemented that converts DTOs to the object representation of a MARC 21 record and vice versa. JSF is used to implement the interface, which uses DTOs for updating and presenting data. A MARC 21 record is first indexed by the Apache Lucene based text server and then stored in a database. This means that an interface component, after receiving an update request, converts the DTO to a MARC 21 record and forwards it to the database components for indexing and storage. Analogously, after a search is completed, the MARC 21 results obtained from the text server are converted to DTOs.
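The conversion between DTOs and MARC 21 records can be sketched as follows. This is a simplified illustration under stated assumptions: `PaperDto` and `MarcRecord` are hypothetical stand-ins, and only two standard MARC 21 bibliographic subfields are used (100 $a for the personal name of the main entry and 245 $a for the title). The actual CRIS UNS mapping is the one described in paper [1].

```java
import java.util.ArrayList;
import java.util.List;

class DtoMarcConverter {

    // DTO: attributes plus accessors and mutators only.
    static class PaperDto {
        private String title;
        private String author;
        String getTitle() { return title; }
        void setTitle(String title) { this.title = title; }
        String getAuthor() { return author; }
        void setAuthor(String author) { this.author = author; }
    }

    // Simplified stand-in for a MARC 21 record: (tag, subfield code, value) triples.
    static class MarcRecord {
        private final List<String[]> subfields = new ArrayList<>();

        void add(String tag, char code, String value) {
            subfields.add(new String[] { tag, String.valueOf(code), value });
        }

        // Returns the first value stored under the given tag and subfield code, or null.
        String first(String tag, char code) {
            for (String[] s : subfields) {
                if (s[0].equals(tag) && s[1].equals(String.valueOf(code))) {
                    return s[2];
                }
            }
            return null;
        }
    }

    static MarcRecord toMarc(PaperDto dto) {
        MarcRecord record = new MarcRecord();
        record.add("100", 'a', dto.getAuthor()); // main entry, personal name
        record.add("245", 'a', dto.getTitle());  // title statement
        return record;
    }

    static PaperDto toDto(MarcRecord record) {
        PaperDto dto = new PaperDto();
        dto.setAuthor(record.first("100", 'a'));
        dto.setTitle(record.first("245", 'a'));
        return dto;
    }

    public static void main(String[] args) {
        PaperDto dto = new PaperDto();
        dto.setAuthor("Ivanović, D.");
        dto.setTitle("CERIF compatible data model based on MARC 21 format");
        PaperDto roundTrip = toDto(toMarc(dto));
        System.out.println(roundTrip.getAuthor() + ": " + roundTrip.getTitle());
    }
}
```

Keeping the conversion in one component means the user interface only ever sees DTOs, while indexing and storage only ever see MARC 21 records.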

 

The component responsible for database operations uses JDBC to access the database. The database management system is MySQL.

 

The file system component is responsible for the controlled download and upload of research results in digital form (.pdf, .doc, .docx, .odt), as well as for extracting their textual content, which is submitted to the text server component for indexing.

 

The import/export component, which is responsible for the system's interoperability with other systems storing research results, implements various protocols for data exchange.

 

The search component receives CQL queries, processes them and returns result lists. This component accepts queries from other computer systems via the SRU/W protocol, as well as queries submitted through the user interface of the CRIS UNS web page.

 

The evaluation component is in charge of applying the various evaluation rule books to the research results stored in the CRIS UNS database.

 

The reporting component creates the various reports requested by faculties, the University, the Provincial Secretariat for Science and Technological Development of the Autonomous Province of Vojvodina, and the Ministry of Education, Science and Technological Development of the Republic of Serbia. This component is implemented using the FreeMarker package.

 

The main feature of the system architecture is the independence of components for interaction with system users and those aimed at persisting and retrieving data from the bibliographic records database. This architecture allows an easy transition to other bibliographic standards and easy integration with library systems based on the adopted bibliographic standard.

 

Data Acquisition

 

Researchers, who are not expected to be familiar with either the MARC 21 format or the CERIF data model, enter their scientific results themselves. The types of scientific results that can be entered are: papers published in scientific journals, papers published in conference proceedings, monographs, monograph chapters, theses and PhD dissertations, patents, and products.

 

Each of these result types has different metadata that researchers are expected to enter. The mapping of these metadata is described in paper [1]. The software module for data acquisition is described in papers [10, 11], while metadata extraction from PDF files is described in paper [12].

 

Export/Import

 

The MARC 21 format is rich in metadata and enables a detailed description of the entities in the CRIS UNS system. A MARC 21 record can store all metadata prescribed by the DC and ETD-MS formats [9]. An information system based on the CERIF-compatible data model can exchange data with other systems using XML documents whose XML schemas are prescribed by the CERIF standard; it can also exchange data with library information systems based on MARC 21 formats and with institutional repositories based on the DC or ETD-MS format. The integration of CRIS UNS with an OAI-PMH compatible ETD repository is the main subject of paper [13]. An OAI-PMH provider has been implemented that enables the CRIS UNS repository of theses and PhD dissertations to become a member of the NDLTD and DART-Europe networks. Consolidation of the data about theses and PhD dissertations is in progress. Once it is finished, the ETD repository implemented as part of CRIS UNS will become a member of the NDLTD and DART-Europe networks.
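To make the OAI-PMH exchange more concrete, the sketch below builds the kind of request a harvester would send to an OAI-PMH provider. The `ListRecords` verb, the `metadataPrefix` and `from` arguments, and the `oai_dc` prefix come from the OAI-PMH protocol; the class name and the base URL `http://example.org/oai` are placeholders, since the address of the CRIS UNS provider is not given here. `URLEncoder.encode(String, Charset)` requires Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class OaiPmhRequest {

    // Builds a ListRecords request URL; fromDate may be null for a full harvest.
    static String listRecords(String baseUrl, String metadataPrefix, String fromDate) {
        StringBuilder url = new StringBuilder(baseUrl)
                .append("?verb=ListRecords")
                .append("&metadataPrefix=").append(encode(metadataPrefix));
        if (fromDate != null) {
            url.append("&from=").append(encode(fromDate));
        }
        return url.toString();
    }

    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Placeholder base URL, not the real CRIS UNS endpoint.
        System.out.println(listRecords("http://example.org/oai", "oai_dc", "2012-01-01"));
    }
}
```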

 

The CRIS UNS ontology for describing theses and PhD dissertations is presented in paper [14]. A CRIS UNS ontology for describing other scientific research results has also been created. This ontology is based on the BIBO, Dublin Core and FOAF ontologies. The next tasks are to implement a software component that will export data from the CRIS UNS system to the proposed ontology, and to implement a semantic web service that will enable interoperability with other systems.

 

A module for importing theses and PhD dissertations into CRISs has been implemented [15]. The main feature of the module is its support for various metadata formats of theses and PhD dissertations. The module is extensible with plug-ins that provide support for importing other scientific research results (papers published in journals, papers published in conference proceedings, monographs, etc.) in various metadata formats (Dublin Core, ETD-MS, CERIF, etc.). The module imports data through an interactive process in which the user takes part and by which consolidation of the data is achieved.
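A plug-in based import module of this kind could be organized roughly as in the sketch below. Everything in it is hypothetical: the `MetadataImporter` interface, the registry and the toy "key: value" format are invented for illustration and are not the interfaces described in [15]. Real plug-ins would parse Dublin Core, ETD-MS or CERIF documents.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

class ImportPlugins {

    // Hypothetical plug-in contract: one implementation per metadata format.
    interface MetadataImporter {
        String formatName();
        Map<String, String> parse(String rawRecord);
    }

    private final Map<String, MetadataImporter> importers = new HashMap<>();

    void register(MetadataImporter importer) {
        importers.put(importer.formatName(), importer);
    }

    // Dispatches the raw record to the plug-in registered for the given format.
    Map<String, String> importRecord(String format, String rawRecord) {
        MetadataImporter importer = importers.get(format);
        if (importer == null) {
            throw new IllegalArgumentException("No plug-in for format: " + format);
        }
        return importer.parse(rawRecord);
    }

    // Toy plug-in that reads "key: value" lines, used only to exercise the registry.
    static class KeyValueImporter implements MetadataImporter {
        public String formatName() { return "key-value"; }

        public Map<String, String> parse(String rawRecord) {
            Map<String, String> fields = new LinkedHashMap<>();
            for (String line : rawRecord.split("\n")) {
                int colon = line.indexOf(':');
                if (colon > 0) {
                    fields.put(line.substring(0, colon).trim(), line.substring(colon + 1).trim());
                }
            }
            return fields;
        }
    }

    public static void main(String[] args) {
        ImportPlugins module = new ImportPlugins();
        module.register(new KeyValueImporter());
        System.out.println(module.importRecord("key-value", "title: A thesis\nauthor: Doe, J."));
    }
}
```

Supporting a new metadata format then means registering one more `MetadataImporter`, without touching the code that drives the import.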

 

Search

 

The search web page provides users with the means to search for and retrieve scientific results. It is located at CRIS UNS. The search page of the Scopus citation database served as the baseline for this page. Users define search criteria by entering values for the corresponding search fields: Title, Abstract, Author, etc.

 

Results can be filtered by the institution owning the result and by result type.

 

In order to achieve flexible search, a fuzzy search is implemented in which the similarity criterion for two data strings is defined as follows:

 

  • The Levenshtein distance (edit distance) between two words in the data strings must be less than or equal to the integer obtained by dividing the length of the longer word by five.
  • If a data string contains more than five words, then the previous criterion applies to 80% of the words present.
  • The Cyrillic and Latin alphabets are treated as equivalent.
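The first two criteria can be sketched in code as follows. The sketch makes several assumptions that the text does not specify: words are compared position by position, the two strings must have the same number of words, comparison is case-insensitive, and "80% of the words" is rounded up. The class and method names are hypothetical.

```java
import java.util.Locale;

class FuzzyMatch {

    // Classic dynamic-programming Levenshtein distance using two rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Two words are similar if their distance is at most floor(longer length / 5).
    static boolean similarWords(String a, String b) {
        int allowed = Math.max(a.length(), b.length()) / 5;
        return levenshtein(a, b) <= allowed;
    }

    // Assumption: words are compared position by position.
    static boolean similarStrings(String s1, String s2) {
        String[] w1 = s1.trim().toLowerCase(Locale.ROOT).split("\\s+");
        String[] w2 = s2.trim().toLowerCase(Locale.ROOT).split("\\s+");
        if (w1.length != w2.length) {
            return false;
        }
        int matches = 0;
        for (int i = 0; i < w1.length; i++) {
            if (similarWords(w1[i], w2[i])) {
                matches++;
            }
        }
        // More than five words: 80% (rounded up) must match; otherwise all must match.
        int required = w1.length > 5 ? (4 * w1.length + 4) / 5 : w1.length;
        return matches >= required;
    }

    public static void main(String[] args) {
        System.out.println(similarWords("information", "informaton"));
        System.out.println(similarStrings("research information system", "reserch information system"));
    }
}
```

Under this criterion, short words of fewer than five letters must match exactly, because the allowed distance for them is zero.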

Because the similarity criterion treats the Cyrillic and Latin alphabets as equivalent, all Cyrillic text is converted to its Latin counterpart prior to indexing, and all Cyrillic queries are converted to Latin before searching. This means that Apache Lucene operates only on Latin-script data, while the database stores the data in the script in which the user entered it. The Cyrillic-to-Latin conversion is unambiguous.
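A minimal sketch of such a conversion for the Serbian alphabet is shown below. The mapping table is the standard Serbian Cyrillic to Latin correspondence; the class name and the decision to lowercase the text first (a common step before indexing) are assumptions of this example rather than details taken from CRIS UNS.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class CyrillicToLatin {
    private static final Map<Character, String> MAP = new HashMap<>();

    static {
        // Each Serbian Cyrillic letter maps to exactly one Latin letter or digraph.
        String[] cyr = "а б в г д ђ е ж з и ј к л љ м н њ о п р с т ћ у ф х ц ч џ ш".split(" ");
        String[] lat = "a b v g d đ e ž z i j k l lj m n nj o p r s t ć u f h c č dž š".split(" ");
        for (int i = 0; i < cyr.length; i++) {
            MAP.put(cyr[i].charAt(0), lat[i]);
        }
    }

    // Lowercases the text and replaces Cyrillic letters; other characters pass through.
    static String normalize(String text) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toLowerCase(Locale.ROOT).toCharArray()) {
            String latin = MAP.get(c);
            out.append(latin != null ? latin : String.valueOf(c));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("Нови Сад"));
        System.out.println(normalize("Novi Sad"));
    }
}
```

Applying the same normalization to indexed text and to queries is what lets a query typed in one script find records entered in the other.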

 

An advanced search is also available, which enables users to define their queries using the syntax of the CQL query language. CQL allows new context sets to be defined. This feature is used to create a CRIS UNS specific context set that defines metadata not available in the existing standardized context sets.
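For readers unfamiliar with CQL, the sketch below assembles a simple query from index/term pairs joined with the boolean operator `and`. The `dc.title` and `dc.creator` indexes belong to the standard Dublin Core context set; the builder class is hypothetical, and the names of the indexes in the CRIS UNS specific context set are not given in this text, so none are shown.

```java
import java.util.ArrayList;
import java.util.List;

class CqlQueryBuilder {
    private final List<String> clauses = new ArrayList<>();

    // Adds a clause of the form: index = "term", escaping backslashes and quotes in the term.
    CqlQueryBuilder where(String index, String term) {
        String escaped = term.replace("\\", "\\\\").replace("\"", "\\\"");
        clauses.add(index + " = \"" + escaped + "\"");
        return this;
    }

    // Joins all clauses with the CQL boolean operator "and".
    String build() {
        return String.join(" and ", clauses);
    }

    public static void main(String[] args) {
        String query = new CqlQueryBuilder()
                .where("dc.title", "data model")
                .where("dc.creator", "Ivanovic")
                .build();
        System.out.println(query);
    }
}
```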

 

Besides the web page for searching all scientific research outputs stored in CRIS UNS, a web page for searching PhD dissertations defended at the University of Novi Sad has also been implemented.

 

Evaluation

 

The CRIS UNS data model supports the evaluation of research results by applying various rule books as defined by scientific institutions. For example, the Faculty of Sciences of the University of Novi Sad has five departments, each with its own rule book for evaluating research results in its specific scientific field (mathematics and informatics; physics; chemistry; biology; geography). The evaluation of results published in scientific journals is based on the journal's impact factor, which is stored in the data model. The evaluation algorithm for scientific results published in journals is described in paper [16]. The CRIS UNS data model and the evaluation algorithm also allow journal papers to be evaluated using other bibliometric indicators in addition to the journal's impact factor.
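The idea that the same journal paper can receive different categories under different rule books can be illustrated with the following sketch. It is not the algorithm from [16]: the `RuleBook` class, the category labels and the impact factor thresholds are all hypothetical and serve only to show how several rule books can be applied to one indicator value.

```java
import java.util.Map;
import java.util.TreeMap;

class RuleBook {
    private final String name;
    // Minimum impact factor -> category awarded at or above that value.
    private final TreeMap<Double, String> thresholds = new TreeMap<>();

    RuleBook(String name) {
        this.name = name;
    }

    RuleBook atLeast(double minImpactFactor, String category) {
        thresholds.put(minImpactFactor, category);
        return this;
    }

    // Picks the category of the highest threshold not exceeding the impact factor.
    String classify(double impactFactor) {
        Map.Entry<Double, String> entry = thresholds.floorEntry(impactFactor);
        return entry != null ? entry.getValue() : "unclassified";
    }

    String name() {
        return name;
    }

    public static void main(String[] args) {
        // Hypothetical rule books with made-up thresholds.
        RuleBook mathematics = new RuleBook("mathematics").atLeast(0.3, "C").atLeast(0.8, "B").atLeast(1.5, "A");
        RuleBook chemistry = new RuleBook("chemistry").atLeast(1.0, "C").atLeast(2.5, "B").atLeast(4.0, "A");
        double impactFactor = 1.6;
        System.out.println(mathematics.name() + ": " + mathematics.classify(impactFactor));
        System.out.println(chemistry.name() + ": " + chemistry.classify(impactFactor));
    }
}
```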

 

CRIS UNS users can obtain reports on all their research results evaluated in accordance with the various rule books defined within the CRIS UNS system. An extension of CRIS UNS that provides a public service for evaluating journal papers in all scientific areas has been implemented. For that purpose we have collected publicly available data for approximately 40,000 scientific journals. The ultimate goal is an Internet service that enables researchers to evaluate their papers simply by providing the journal title and the year of publication and by selecting an evaluation rule book.

 

Reporting

 

An important function of any research information system is creating different kinds of reports about published results. CRIS UNS has a separate software component for reporting. This component is implemented using the FreeMarker package, in which reports are generated as the output of appropriate templates [17]. The input to the component is a list of references and a corresponding template, and the output is the resulting report. Adding a new report to the component requires developers to write a new template.
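The template-driven approach can be illustrated with the sketch below, which takes a list of references and an item template and produces a report. To stay self-contained it deliberately does not use FreeMarker: plain string replacement stands in for the template engine, and the class name and the `${index}` and `${reference}` placeholders are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class ReportSketch {

    // Stand-in for a template engine: renders a header and one line per reference.
    static String render(String header, String itemTemplate, List<String> references) {
        StringBuilder out = new StringBuilder(header).append('\n');
        int index = 1;
        for (String reference : references) {
            out.append(itemTemplate
                    .replace("${index}", String.valueOf(index++))
                    .replace("${reference}", reference))
               .append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> references = Arrays.asList("First reference", "Second reference");
        System.out.print(render("Bibliography", "${index}. ${reference}", references));
    }
}
```

The design point is the same as with a real template engine: the report layout lives in the template, so a new report needs a new template rather than new rendering code.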

Results

CRIS UNS was verified and tested on data about the scientific research results of researchers employed at two faculties of the University of Novi Sad: the Faculty of Sciences and the Faculty of Technology. After the existing data about published results was migrated to CRIS UNS, researchers have continued to supply data about their published results using this application. Populating CRIS UNS with data on scientific research results from the other thirteen faculties is in progress. So far CRIS UNS stores over 5000 scientific results published in over 1000 journals, over 7000 conference papers and over 3000 scientific results of other types. About 500 of the 3500 researchers from the University of Novi Sad have their research results stored in CRIS UNS.

 

Besides using the system, the scientific research community has recognized the importance of the results obtained within this project. More than 30 research publications have been published, including:

 

  • 3 PhD theses
  • 9 articles published in international journals indexed by WoS
  • 1 monograph, which was awarded as the best research paper in the field of Computer Science in 2011 by the Serbian Informatics Society

Future work

Future work on CRIS UNS follows two main directions.

 

The first is data acquisition, with the ultimate goal of establishing continuous, regular acquisition of scientific research results from at least all faculties of the University of Novi Sad.

 

The second is the improvement of CRIS UNS services. In the very near future we plan to implement an SRU/W service for querying the CRIS UNS database from external systems. Improvements to the reporting component are also planned, aimed at providing users with a tool for creating reporting templates that match their specific needs. Finally, we plan to develop a service for recording data about scientific research projects, including linking scientific results to the projects from which they emerged.

References

[1]     Ivanović, D., Surla, D. and Konjović, Z. 2011. CERIF compatible data model based on MARC 21 format. The Electronic Library, 29 (1), 52-70. DOI=10.1108/02640471111111433

 

[2]     Asserson, A., Jeffery, K. and Lopatenko, A. (2002), “CERIF: Past, Present and Future: An Overview”, Proceedings of the 6th International Conference on Current Research Information Systems, University of Kassel, August 29 - 31, 2002, pp. 33-40

 

[3]     Jörg, B., Ferlež, J. and Grabczewski, E. (2005), “Public IST World Deliverable 1.2 – Data Model for Knowledge Organisation”, 18 p., available at: http://ist-world.dfki.de/downloads/deliverables/ISTWorld_D1.2_DataModelF... (accessed February 13, 2012)

 

[4]     Kiryakov, A., Grabczewski, E., Ferlež, J., Uszkoreit, H. and Jörg, B. (2005), “Public IST World Deliverable 1.1 – Definition of the Central Data Structure”, 23 p., available at: http://ist-world.dfki.de/downloads/deliverables/ISTWorld_D1.1_CentralDat... (accessed February 13, 2012)

 

[5]     Ferlež, J. (2005), “Public IST World Deliverable 1.3 – Data Model for Representation of Expertise”, 12 p., available at: http://ist-world.dfki.de/downloads/deliverables/ISTWorld_D1.3_DataModelF... (accessed February 13, 2012)

 

[6]     Jörg, B., Ferlež, J., Grabczewski, E. and Jermol, M. (2006), “IST World: European RTD Information and Service Portal”, 8th International Conference on Current Research Information Systems: Enabling Interaction and Quality: Beyond the Hanseatic League (CRIS 2006), Bergen, Norway, 10 p., available at: http://epubs.cclrc.ac.uk/bitstream/905/ISTWorld01.pdf (accessed February 13, 2012)

 

[7]     Ivanović, D., Surla, D. and Racković, M. 2011. A CERIF data model extension for evaluation and quantitative expression of scientific research results. Scientometrics. 86 (1), 155-172. DOI=10.1007/s11192-010-0228-2

 

[8]     Jörg, B., Krast, O., Jeffery, K. and Grootel, G. (2009b), “CERIF 2008 – 1.0 XML Data Exchange Format Specification”, 33 p., available at: http://www.eurocris.org/fileadmin/cerif-2008/CERIF2008_1.0_XML.pdf (accessed February 13, 2012)

 

[9]     Ivanovic, L., Ivanovic, D. and Surla, D. 2012. A data model of theses and dissertations compatible with CERIF, Dublin Core and EDT-MS. Online Information Review, 36 (4), pp. 568-586, DOI  10.1108/14684521211254068

 

[10]  Ivanović, D., Milosavljević, G., Milosavljević, B. and Surla, D. 2010. A CERIF-compatible research management system based on the MARC 21 format. Program: Electronic library and information systems. 44 (1), 229-251 DOI=10.1108/00330331011064249

 

[11]  Milosavljević, G., Ivanović, D., Surla, D. and Milosavljević, B. 2011. Automated construction of the user interface for a CERIF-compliant research management system. The Electronic Library. 29 (5), 565 – 588. DOI= 10.1108/02640471111177035

 

[12]  Kovačević, A., Ivanovic, D., Milosavljevic, B., Konjovic, Z. and Surla, D. 2011. Automatic extraction of metadata from scientific publications for CRIS systems. Program: electronic library and information systems. 45 (4), 376 – 396. DOI=10.1108/00330331111182094

 

[13]  Ivanovic, L., Ivanovic, D. and Surla, D. (2012), “Integration of a Research Management System and an OAI-PMH Compatible ETDs Repository at the University of Novi Sad, Republic of Serbia”, Library Resources and Technical Services, Vol. 56, No. 2, pp. 104-112

 

[14]  Ivanović, L., Dimić Surla, B., Segedinac, M. and Ivanović, D. (2012), “CRISUNS ontology for theses and dissertations”, ICIST 2012 - 2nd International Conference on Information Society Technology, February 29 – March 03, 2012, Kopaonik, Serbia, pp. 164-169

 

[15]  Ivanović, L., & Surla, D. (2012), “A software module for import of theses and dissertations to CRISs”, Proceedings of the CRIS 2012 Conference, Prague, June 6-9. 2012, pp. 313-322

 

[16]  Ivanović, D., Surla, D. and Racković, M. (2012), “Journal evaluation based on bibliometric indicators and the CERIF data model”, Computer Science and Information Systems, 9(2), pp. 791-811, DOI  10.2298/CSIS110801009I

 

[17]  Dimić-Surla, B. & Ivanović, D. (2012), “Software component for reporting in the CRIS systems”, Proceedings of the CRIS 2012 Conference, Prague, June 6-9. 2012, pp. 61-66