On 1 April 2016 SURFdrive had been operational for two years. During this period, SURFdrive grew to approximately 16,000 users. Over 140 TB of data is stored on SURFdrive in the form of 100 million files. More than 4,400 unique users log in to SURFdrive every day. Time for a look back and a look ahead.
In the autumn of 2013 the research universities (CvDUR) asked SURF to develop a sync-and-share service, like Dropbox and Google Drive, for Dutch higher education and research. Research universities had concerns about storing documents in the public cloud because safe data storage is extremely important within Dutch education and research.
SURFsara and SURFnet responded to this request and started developing a safe cloud storage service together with the research universities. The first step was to choose a software product. A comprehensive package of requirements was drawn up, including requirements arising from the ‘Framework of Legal Standards for Cloud Services in Higher Education’. Another important requirement was federated access via SURFconext so that people could log in using their institution account and password. Eventually, ownCloud was chosen as the software for SURFdrive.
The creation of SURFdrive
The first three months of 2014 focussed on the technical design of the service and the parallel testing of SURFdrive. Users at the various research universities tested the functionality in a test environment at SURFnet. At SURFsara the main focus was on examining the robustness and performance of the technical set-up. The work was not limited to the technical installation, configuration, tuning and testing of the service: SURFnet and SURFsara also put a lot of effort into surrounding the service with user documentation, a website and user support for the institutions.
The 1st of April 2014 was the big day. The creation of SURFdrive was a reality.
For the technical set-up we worked from the assumption that the infrastructure had to be redundant: the failure of a single node must not cause SURFdrive as a whole to fail. Scalability was also examined. The design that we selected had to be intrinsically capable of scaling up the entire environment.
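The redundancy and scalability assumptions can be illustrated with a toy model. The sketch below is not SURFdrive code: the `Pool` class and the node names are hypothetical, and it only shows the general idea that requests are routed to the nodes that are still healthy and that capacity grows by adding nodes.

```python
class Pool:
    """Toy pool of interchangeable nodes (hypothetical, for illustration only)."""

    def __init__(self, nodes):
        self.nodes = list(nodes)        # all known nodes, in rotation order
        self.healthy = set(self.nodes)  # nodes currently able to serve requests
        self._next = 0                  # index of the next node to try

    def mark_down(self, node):
        """Record that a node has failed."""
        self.healthy.discard(node)

    def add_node(self, node):
        """Scale out by adding a new healthy node to the rotation."""
        self.nodes.append(node)
        self.healthy.add(node)

    def pick(self):
        """Return the next healthy node in round-robin order."""
        for _ in range(len(self.nodes)):
            node = self.nodes[self._next]
            self._next = (self._next + 1) % len(self.nodes)
            if node in self.healthy:
                return node
        raise RuntimeError("no healthy nodes left")


pool = Pool(["web1", "web2", "web3"])
pool.mark_down("web2")                     # one node fails
picks = [pool.pick() for _ in range(4)]    # the service keeps answering
print(picks)
```

With one of three nodes marked down, every request still lands on a working node; the service only fails when no healthy node is left.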
Schematically, SURFdrive comprises the following components:
- Web servers
Apache was chosen for the web servers because of the easy integration with SURFconext. The ownCloud software runs on top of this server.
- Load balancing
The renowned open source load balancing software HAProxy, used by Twitter, Instagram, GitHub, Alibaba and Airbnb among others, was also chosen for SURFdrive. Load balancing is important for sharing the load across the available servers.
- Database servers
Galera MariaDB, a multi-master database set-up, was chosen for the database servers. These database servers continuously keep each other synchronised. The ownCloud metadata are stored on the database servers.
- Storage servers
We chose GlusterFS for the storage servers, partly on the advice of ownCloud based on their reference architecture. When SURFdrive was started, ownCloud required a mounted file system as primary storage backend.
GlusterFS is configured in ‘distributed replicated’ mode, which means that all files are stored twice on different storage nodes. In addition, all data on all nodes is protected against disk failure by RAID6. Everything in the network area is also fully redundant.
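The effect of storing every file twice on RAID6-protected nodes can be made concrete with a small sketch. It rests on several assumptions: the hash-based `place` function is a simplified stand-in and not GlusterFS's own placement algorithm, all function and node names are hypothetical, and the node and disk counts in the example are made up, not SURFdrive's actual figures.

```python
import hashlib


def replica_pairs(nodes):
    """Group storage nodes into pairs; each pair holds two copies of the same files."""
    if len(nodes) % 2:
        raise ValueError("a replica count of 2 needs an even number of nodes")
    return [tuple(nodes[i:i + 2]) for i in range(0, len(nodes), 2)]


def place(filename, pairs):
    """Pick the replica pair for a file from a hash of its name (simplified stand-in)."""
    digest = hashlib.sha256(filename.encode("utf-8")).digest()
    return pairs[int.from_bytes(digest[:8], "big") % len(pairs)]


def raid6_data_disks(disks):
    """RAID6 spends the capacity of two disks on parity and needs at least four disks."""
    if disks < 4:
        raise ValueError("RAID6 needs at least 4 disks")
    return disks - 2


def usable_capacity_tb(nodes, disks_per_node, disk_tb, replicas=2):
    """Net capacity after RAID6 parity on every node and replication across nodes."""
    return nodes * raid6_data_disks(disks_per_node) * disk_tb / replicas


pairs = replica_pairs(["node1", "node2", "node3", "node4"])
print(place("report.docx", pairs))     # the two nodes that both store this file
print(usable_capacity_tb(4, 12, 4))    # hypothetical: 4 nodes, 12 disks of 4 TB each
```

In this hypothetical layout, 192 TB of raw disk yields 80 TB of usable space: RAID6 costs two disks per node, and replication halves what remains. In exchange, the data survives both the loss of two disks within a node and the loss of an entire node.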
After an initially tentative start, institutions joined at a rapid pace. This can be seen from the figure below. As of September 2015 almost all research universities are members.
The two figures below show the growth in the amount of data stored and the growth in the number of users.
This growth is reflected in the infrastructure. See the figures below.
SURFdrive has grown not only in size but also in quality of service. The number of outstanding tickets can currently be counted on one hand, even though the number of users has tripled in two years.
The issue of data storage in the public cloud affects not only the Netherlands; it is an international issue too. This is why initiatives like SURFdrive have come into being in many countries. For instance, large ownCloud systems can be found at AARNet in Australia under the name CloudStor+, at CERN (CERNBox), at the research universities in North Rhine-Westphalia in Germany (Sciebo) and at the Swiss NREN SWITCH (SWITCHbox).
We maintain close links with our colleagues abroad and exchange experiences with each other. We meet each other at workshops and operate in close collaboration in our contacts with the software supplier ownCloud.
We have lots of plans for the future. With SURFdrive we want to support collaboration between organisations, and group structures are essential for this. An important item on our roadmap is therefore group functionality, which allows a group account to be created with a specific quota. The storage space for this account can be shared by a specific group of people.