REVISED WEBSITE

Responsible: Dan Cristea
Contributing Partners: UAIC, FFGZ, DFKI, UIB/Unifob

The ultimate objective of CLARIN is to create a European federation of existing digital repositories that include language-based data, to provide uniform access to the data, wherever it is, and to provide existing language and speech technology tools as web services to retrieve, manipulate, enhance, explore and exploit the data.

www.clarin.eu
The primary target audience is researchers in the humanities and social sciences and the aim is to cover all languages relevant for the user community. The objective of the current CLARIN Preparatory Phase Project (2008-2010) is to lay the technical, linguistic and organisational foundations, to provide and validate specifications for all aspects of the infrastructure (including standards, usage, IPR) and to secure sustainable support from the funding bodies in the (now 23) participating countries for the subsequent construction and exploitation phases beyond 2010. CLARIN-2008-4 2
CLARIN-2008-4
EC FP7 project no. 212230

Revised website
Deliverable: D6C-2.1 - Deadline: 1.7.2008 (postponed to 1.10.2008 due to late start)
Responsible: Dan Cristea
Contributing Partners: UAIC, FFGZ, DFKI, UIB/Unifob
All rights reserved by UAIC on behalf of CLARIN
Scope of the Document

This document describes the technology used to build the CLARIN site, its structure and content.
Contents

1. Introduction
2. Levels of accessibility
3. The DRUPAL technology
4. Site organization
5. Dynamic features
6. CLARIN activities intermediated by the website till now
7. Future developments
References
1. Introduction

The CLARIN website is intended to function as a virtual working space for the CLARIN partners and members, as well as a dissemination space for anybody interested in the project. A protected area records all information on the project's evolution that is needed for internal consultation (set up and maintained in cooperation with WP1C), while a public section is used for dissemination outside the project community.

2. Levels of accessibility

The main criterion in the organization of the CLARIN website was to best satisfy the needs of its very diverse users. There are four categories of users of the CLARIN website: the unauthenticated user, the CLARIN member user, the CLARIN partner user and the CLARIN EB (Executive Board) user. The access rights of these types of users are presented below:

- the unauthenticated user has access to the public pages. These pages contain general-interest information on the project, the events organized by the project's participants, and information on how individuals and institutions can join.
- the authenticated user: there are three types of authenticated users, which differ in the access permissions they are granted:
  - the CLARIN member user type: this type of user corresponds to a person who works for an institution which is a CLARIN member and who is interested in joining one of the CLARIN working groups. As soon as a member user requests membership in one of the working groups and the request is approved by the head of the working group or the CLARIN webmaster, she/he gets read access to all the documents and deliverables produced by the CLARIN working groups. As for write access, this user can only edit the documents that belong to the working group she/he is part of. Every change made by a user is logged by the Drupal system. The documents are also versioned, so changes can be reverted.
  - the CLARIN partner user type: has all the privileges of a CLARIN member, plus access to some documents that are relevant for the CLARIN partners;
  - the CLARIN Executive Board user type: has read and write privileges throughout the CLARIN site; this user can read and edit all the documents in all the working groups and has read/write access to the EB documents and to the EB and CLARIN calendars.

3. The DRUPAL technology

The site is built using Drupal, an open-source modular framework and Content Management System (CMS) written in PHP. It is distributed under the GPL (GNU General Public License) and is maintained and developed by a community of thousands of users and developers. A Content Management System is software that facilitates the creation, organization, manipulation and removal of information in the form of images, documents, scripts, plain text, etc. [1]

Drupal is part of a technology stack that contains the following pieces:

- the server: a computer which provides information or services to other computers on a network;
- the operating system: the software that runs on the server; Drupal can be installed on a variety of operating systems: Windows, Linux, Unix, Mac OS X;
- the database: Drupal uses a database to store most of the content and configuration settings of the site; some content, such as media files, is usually stored in the server's file system;
- the web server: the software component responsible for serving web pages; examples are Apache and Microsoft IIS;
- PHP: a programming language that allows web developers to create dynamic content that interacts with databases;
- Drupal: a framework for building dynamic web sites, offering a broad range of features and services including user administration, publishing workflow, discussion capabilities, news aggregation, metadata functionalities using controlled vocabularies and XML publishing for content sharing purposes. [2]

A Drupal installation generally comprises a mix of core and contributed modules. The Drupal core contains the basic features that are common across different types of CMSs: the ability to register and maintain individual user accounts and to differentiate types of user access, the ability to create and easily manage content (pages, tables, comments, custom content types), and the possibility to create custom webforms (for user subscription, for submitting a resource or a tool, for signing up for an event). The real power and flexibility of Drupal, however, come from the wealth of contributed modules. These modules enhance a Drupal site by offering very task-specific functionality. They can be freely downloaded from the Drupal site [3], installed and used. There are currently over 2500 modules available for Drupal 6.

The main modules used for the CLARIN site are the Organic Groups (OG) module, the Views module, the CCK module and the FCKeditor module.

The Organic Groups (OG) module enables users to create and manage their own 'groups'.
Each group can have subscribers and can maintain a group home page where the group members can communicate amongst themselves. They can do this by posting the Drupal-specific node types: blog, story, page, comments. [4] The groups on the CLARIN website are closely related to the structure of the project: each CLARIN Work Package has its corresponding Organic Group and each Working Group in a work package has a corresponding Organic Subgroup. A person who works for a CLARIN member institution can apply to join a Working Group, which in Drupal terms means requesting membership in one of the Organic Subgroups. The administrator of the Organic Group or the webmaster is in charge of approving or rejecting that person's membership in the group.

The Views module provides a flexible method for Drupal site designers to control how lists of content are presented. This module is essentially a smart query builder that, given enough information, can build a proper SQL query, execute it and display the results. [5] On the CLARIN website the Views module is used for many purposes: to view the LRT resource/tool inventory (the database is queried for all the submitted resources/tools, which are shown in a table sorted in ascending order by name; the view allows the user to re-sort this information by a few given criteria such as type, language, country or institute), to view the events posted on the CLARIN website, to view the members of a Working Group or Work Package, etc.

The Content Construction Kit (CCK) module allows users to add custom fields to nodes using a web browser [6]. This gives CLARIN users the possibility to create new forms for custom content (for example, to create the Resource content type and then allow users to submit the metadata about their resources by filling in the corresponding form). Only EB members and webmasters can create new content types.
All other users are only allowed to create new instances of the custom content types created by EB members or webmasters (this means that a member or partner user can create new resources, but is not allowed to create a new content type, WebService for instance).

The FCKeditor module allows Drupal to replace text-area fields with the FCKeditor [7]. This is an HTML editor built according to the WYSIWYG (What You See Is What You Get) principle. It lets users edit their web pages inside the Drupal system without needing to know any HTML. The editor also has some word-processor features (aligning text, changing the typeface, inserting an image or a hyperlink, cut, paste, undo, redo, etc.). This editor is used throughout the CLARIN site to ease the process of adding new pages to the site.
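
The access rules of Section 2, together with the restriction that only EB members and webmasters may define new content types, can be summarised in a small sketch. The Python below is illustrative only: the site itself relies on Drupal's own role and permission settings, and every name used here (Role, User, can_edit_wg_document and so on) is hypothetical. The sketch also assumes that a partner, like a member, needs one approved working-group membership before reading working-group documents, which the text does not state explicitly.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Role(IntEnum):
    """User categories from Section 2, ordered by increasing privilege."""
    UNAUTHENTICATED = 0
    MEMBER = 1
    PARTNER = 2
    EXECUTIVE_BOARD = 3


@dataclass
class User:
    role: Role
    # Working groups in which membership has been approved (hypothetical labels).
    groups: set = field(default_factory=set)
    # Webmaster status is modelled as a flag, an assumption of this sketch.
    is_webmaster: bool = False


def can_read_wg_documents(user):
    """Read access to the documents of all working groups."""
    if user.role is Role.EXECUTIVE_BOARD:
        return True
    # Members and partners gain read access once one membership is approved.
    return user.role >= Role.MEMBER and len(user.groups) > 0


def can_edit_wg_document(user, group):
    """Write access is limited to the user's own working groups, except for the EB."""
    if user.role is Role.EXECUTIVE_BOARD:
        return True
    return user.role >= Role.MEMBER and group in user.groups


def can_read_partner_documents(user):
    """Partner documents are visible to partners and to the EB."""
    return user.role >= Role.PARTNER


def can_create_content_type(user):
    """Only EB members and webmasters may define new content types."""
    return user.role is Role.EXECUTIVE_BOARD or user.is_webmaster


def can_create_content_instance(user):
    """Any authenticated user may create instances of existing content types."""
    return user.role >= Role.MEMBER
```

For example, a member approved for one working group can edit that group's documents and submit a new resource, but cannot edit another group's documents or define a new content type.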
4. Site organization

The CLARIN website can be navigated through three menus: the principal static menu (positioned horizontally at the top of the screen), the user menu (as secondary links, positioned horizontally in the second line from the top), and the navigation menu (positioned vertically on the left side of the screen).

4.1 The principal menu (static)

This menu consists of a series of tabs at the top of each page that link to different pages of interest inside the site. This menu, called the Primary Links Menu in Drupal terms, offers an overview of the site's most important pages, which present the main activities, events and initiatives of the project to the visitor. Because the menu is static, it offers the visitor a simple and consistent way to navigate through the pages of the site.

4.2 The user menu (user-specific, static)

The user menu provides the user with the most important links needed to manage her/his CLARIN site account. Using these links the user can access and edit her/his profile, create new content and log out. The menu is situated horizontally above the content area of the site. In Drupal terms, this menu is called the Secondary Links Menu.

4.3 The navigation menu (user-specific, dynamic)

The navigation menu is a user-specific menu that allows easy navigation through the parts of the site that signed-in users access most. It is user-specific in the sense that it looks different to different users. For example, users who are part of WG 2.1 and WG 2.2 will have the navigation structure corresponding to these two working groups in their navigation menu (since they are part of those working groups, we assume that they will need to access their documents more frequently than those of other working groups). This is only the default setting of the navigation menu for a user.
If the user is interested in other working groups too, she/he can add or remove working group navigation menus. This way the user can create a menu that is fully suited to the work she/he has to do. The navigation menu is positioned vertically on the left side of the screen.

5. Dynamic features

The first page of the site contains a special banner, positioned at the top of the page, displaying a random image from a given collection. These images are prepared in advance by the CLARIN webmaster team to promote the initiatives and events organized by CLARIN and to draw the attention of the website's visitors to its activities.

Another area of the website where the information changes frequently is the calendars: the CLARIN Calendar, the General Events Calendar and the Executive Board Calendar.

The CLARIN Calendar contains links to all the events organized by the CLARIN partners and members (workshops, conferences, project meetings). This calendar can be accessed by any authenticated user. Events can be added to the calendar by the CLARIN partners/members that organize them.

The General Events Calendar is a public calendar. It contains events that are related to CLARIN or that might be of interest to the CLARIN community, but which are not necessarily organised by CLARIN people. Registered users can submit a form describing an event that might be of interest to the CLARIN community. These submissions are sent to an approval queue and, once verified and approved by the CLARIN webmaster team, are shown in this events calendar. This is a good way to enhance the collaboration between different CLARIN members.

The Executive Board Calendar is dedicated to the organisation of all EB meetings, virtual or face-to-face. It is restricted to EB members.
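
The submission workflow of the General Events Calendar described above can be sketched as a small approval queue. This is a minimal illustration in Python, not the site's actual Drupal configuration; the class, method and parameter names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Event:
    title: str
    submitted_by: str


class GeneralEventsCalendar:
    """Public calendar whose entries must pass a webmaster approval queue."""

    def __init__(self):
        self._pending = []    # submitted, not yet verified
        self._published = []  # approved, visible to every visitor

    def submit(self, event, is_registered):
        # Only registered users may propose an event.
        if not is_registered:
            raise PermissionError("only registered users can submit events")
        self._pending.append(event)

    def approve(self, event, is_webmaster):
        # Only the webmaster team moves an event from the queue to the calendar.
        if not is_webmaster:
            raise PermissionError("only the webmaster team can approve events")
        self._pending.remove(event)  # raises ValueError if the event is not queued
        self._published.append(event)

    def pending_events(self):
        return list(self._pending)

    def visible_events(self):
        return list(self._published)
```

A submitted event stays invisible to the public until approve is called for it, which mirrors the verification step performed by the webmaster team.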
6. CLARIN activities intermediated by the website till now

The CLARIN website has already hosted a number of activities, among which:

- the LRT inventory (a WP5 activity, hosted at http://www.clarin.eu/view_resources);
- the Usage and Workflow Scenario inventory (a WP2/WP5 activity; the submission forms were hosted at http://www.clarin.eu/forms/submit-a-usage-and-workflow-scenario-for-clarin and the activity ended in November 2008; the list of scenarios is hosted at http://www.clarin.eu/wp5/wp5-documents/usage-scenarios/);
- the Call for Proposals for Collaborating with Humanities and Social Science Projects (a WP3 activity, hosted at http://www.clarin.eu/wp3/wp3-documents/call_final-version);
- the dissemination of the CLARIN Newsletter (a WP6 activity, hosted at http://www.clarin.eu/newsletter).

7. Future developments

One significant advantage of Drupal over other website-building environments is that it can also be used as a powerful and versatile tool for developing web interfaces; this can be used to design user interfaces to many of the services promised in the second and third phases of CLARIN. In the immediate future, the WP6 team plans to organise the help-desk and registry interfaces as services rooted in the CLARIN website. The plan is to base help-desk and registry access on the ALPE model (Cristea and Pistol, 2008; Cristea and Pistol, to appear), which will offer a powerful mechanism for mediating NLP requests from researchers in the humanities and social sciences, who are expected to have little knowledge of programming or even of configuring NLP tools (as GATE or UIMA require).

References

Cristea, D., Pistol, I. (2008). Managing Language Resources and Tools Using a Hierarchy of Annotation Schemas. Proceedings of the Workshop on Sustainability of Language Resources, LREC-2008, Marrakech.
Cristea, D., Pistol, I. (2009, submitted). Managing metadata variability within a hierarchy of annotation schemas.

1. Mercer, David, Building powerful and robust websites with Drupal 6, Packt Publishing, Birmingham, 2008
2. Technology stack - http://drupal.org/node/176052
3. http://drupal.org/project/modules
4. http://drupal.org/project/og
5. http://drupal.org/project/views
6. http://drupal.org/project/cck
7. http://drupal.org/project/fckeditor