GSoC 2017 - DBpedia Mappings Front-End Administration
This summer, I was quite fortunate to work on the DBpedia Mappings Front-End Administration project with my mentors Anastasia Dimou, Dimitris Kontokoskas, and Wouter Maroy, for DBpedia organization. I developed the entire web application (frontend and backend), and integrated it with external systems such as the Extraction Framework and Wikipedia. In this post I summarize the status of the project, what I’ve done, and what I’ve learnt from this great experience.
Project Status and Important Links
- Completed: YES. Works as intented, all the required functionalities and tests have been developed.
- Github repository: Link (All the commits are mine).
- Project documentation: Link
- GSoC progress log: Link
- Heroku deployment: Link
- DBpedia deployment: Soon, planned in September 2017.
- Future work: Deploy on DBpedia servers, integrate with pre-commit validations.
This project goal was to create a web application (comprised of a frontend and a backend) that provides a user-friendly interface so the DBpedia community can easily view, create and administrate DBpedia mapping rules using the new RML format. The developed system includes user administration features, help posts, Github mappings synchronization, and rich RML related features such as syntax highlighting, RML code generation from templates, RML validation, extraction and statistics. Part of these features are possible thanks to the interaction with the DBPedia Extraction Framework.
This project fills a missing gap in DBpedia infrastructure: although the DBPedia Extraction Framework was adapted to support RML mappings thanks to a project of last year GSoC, the user interface to create mappings was still done by a MediaWiki installation, not supporting RML mappings and needing expertise on Semantic Web. You can see an Arquitecture Diagram at the bottom of the page: everything inside the darkest box was made within this project.
Although there was an initial list of requirements, some of them where changed, some removed, and some were added while the project was being developed. However, at the end, all the functionalities and goals that were required have been developed, with many functional tests and the approval of the DBpedia community. The system is ready for production deployment. Hence, we can conclude that the project development has been successful.
Here it is a list of the principal features/goals developed:
- User account management.
- User groups and permissions management.
- Help posts management (creation/edition/view/removal) with markdown support.
- Webprotege integration to edit ontology.
- Mapping list with statistics and rich search.
- Mapping view and edition with ACE editor and RDF syntax checker.
- Wikipedia extraction from a mapping.
- RML generation from multiple mapping templates, embedded in mapping editor.
- Mapping history to keep old versions.
- Two-way mapping synchronization with Github repository, with support of pre-commit validations.
- Connection to Extraction Framework to obtain mapping related data and operations.
- Deployed to Continuous Integration system with Travis and Heroku.
Phases of the projects
Here’s a brief description of what was done in the different phases of the project. For a more fine-grained description of tasks and plans, please see my GSoC Progress Page.
During the three bonding weeks, I analyzed the old mapping process, its flow, how the users interacted, and identified possible problems. Also, I studied Semantic web related topics (RML, RDF) and designed the architecture of the system to be developed, including component diagrams, interface mockups and a new requirements list. Finally, I started to practice with React and Redux to feel more confident when coding time arrives.
2. User management, posts and basic functionality
From 1st week to 4th (May 30 - June 25), I worked on the functionality required that was not related to Mappings. This includes modifying Aqua framework in many aspects to meet permissions management requirements, modifying navigation, adapting layout of pages and implementing help posts with Markdown rendering. In addition, multiple tests were created to ensure the proper working of the system, and a Continuous Integration system was built using Heroku and Travis.
This first phase helped very much to get used to Aqua framework, React and Redux. It was very challenging as I did not know anything about the framework or how the tests were created.
3. Mappings functionality
From 5th week to 10th, the core functionality was developed. This includes Webprotege Integration, and all mapping-related functionality: list of mappings, edition, view, adding a mapping template, extraction, among many other features. One functionality that was specially challenging and has been developed successfully is the Github mapping two-way synchronization. It was challenging but finally managed to implement it in a robust and nice way.
As many features depended on Extraction Framework API’s that were not ready yet, they were mocked up at first and connected to the real Extraction Framework as soon as possible.
4. Extraction Framework connection, finish last details
Last two weeks were dedicated to connecting to the different APIs provided by the Extraction Framework and finishing up some details such as the Wikipedia article search, creating some tests that were still missing and fixing some bugs. Also, a Dockerfile was created to ease the deployment, and many documentation was written in the Github repository wiki.
First of all, it is worth to mention that the system will be deployed in the DBpedia servers in September 2017, replacing the current mapping wiki. I will keep contributing to it, pushing fixes, making changes, new functionalities or whatever is required, even though, as I will be studying my Master’s degree, I will have less time than now. A possible future change would be the integration of pre-commit validation (it is already prepared but DBpedia community has to agree on how validations are done).
Besides, Wouter Maroy and I will present a demonstration of the project on the 10th DBpedia Community Meeting in Amsterdam.
Challenges & Learnings
- Getting used to a new code base and language.
- Extracting the mapping creation workflow and users problems.
- Creating an intuitive UI that follows conventions so users do not need help.
- Creating complex React components.
- Integrating external libraries such as Ace editor into the frontend.
- Implementing robust two-way synchronization with Github.
- Deployment on Heroku, where persistent file-system is not available.
- Integrating with WebProtege instance.
- Creating Dockerfile for the Mappings UI.
- React and Redux knowledge.
- Improved NodeJS and ES6 knowledge.
- Deepened my Semantic Web knowledge with concepts such as mappings, RML, transformations, etc.
- Dockerizing applications.
- Improved knowledge of Continuous Integration systems such as Travis and Heroku.
- Learned to better organize and prepare developer Skype meetings.
- Improved organization while working on remote.
- How to integrate and communicate with other members of the organization.
I want to thank my mentors, and specially Anastasia and Wouter, for their great dedication and support. They have been always willing to help me whenever I had a problem, and guided me when I was lost. Thanks also to DBpedia association, for giving me this opportunity, the chance of going to Amsterdam to their 10th meeting, and for being such a great community :) Last but not least, I want to thank Google GSoC team for accepting me into this program, that has changed how I work and how I see the open source world.
Extra: Architecture Diagram
In this image you can see a diagram with the high-level architecture of the system, and the most importan communication between components.