The vast amount of information being generated across all science domains is outstripping the computational power available to handle and process it. Coping with this exponential growth of data is one of the main hurdles for this decade, if not beyond. A common point in the articles describing visions for 2020 computing is that the ability to process huge amounts of data, together with efficient collaboration between scientists, will play a crucial role in future scientific discoveries. Scientific computing is often bound by the limited computing resources at hand. Scientific applications characterized by large problem spaces have to resort to dedicated infrastructures based on clusters, clouds, or desktop computers. Such applications can benefit from a vast computing resource that has hitherto been overlooked and left untapped: the Internet browser.
The idea of using a simple desktop as a computing platform is not new: volunteer computing has for over a decade gathered unprecedented computing power to help solve scientific problems. In 2011, the aggregated computing power provided by the participants in all projects using BOINC (Berkeley Open Infrastructure for Network Computing) came to over 5.2 PetaFLOPS, from 2.2 million registered users, of whom 280,000 were active.
The research work we are developing here is a much simpler and more user-friendly way to achieve volunteer computing, using recent advances in browser technology and the Web 2.0 approach.
The Internet browser is ubiquitous on the Web. One can even say that the browser is synonymous with the Web, as it is usually the first point of contact with the Internet. Every smart device, be it a computer or a mobile device, is equipped with an Internet browser. This remarkable piece of software shapes our daily lives, keeping us in contact with friends and letting us collaborate with colleagues through social media such as Twitter and Facebook, which have become a mainstay of the way humans interact with each other on the Web. The social aspect of the current Web is enough to achieve volunteer computing almost instantly. An emergent computing paradigm known as crowdsourcing uses the intellect of humans to solve problems which are otherwise difficult to automate with known computer algorithms. Such a method of crowdsourcing is employed in SETILive, where users voluntarily filter out signal noise received by radio telescopes. The crowdsourcing success stories prove that people are willing to donate their time to a research cause. It is thus conceivable that people are also willing to volunteer by donating browser computing time to some research effort. This allows small research groups with access to limited computing resources to involve volunteers in their computing efforts through browser computing. Popular websites can offer visiting users the possibility to contribute computing; the Web can thus evolve from a mere content network into a true computing platform.
What makes the Internet browser now capable of being a computing platform?
The answer lies in the remarkable advances in browser technology, specifically the introduction of HTML5. Among many other features, HTML5 introduces the notion of web workers, which are essentially threads within the browser whose sole purpose is to perform data-intensive computing without disrupting the user's experience of the webpage.
A second advance is WebCL, a JavaScript binding that exposes the graphics processing unit (GPU), which paves the way for high-performance computing based on stream processing through the Internet browser without the intricacies of building infrastructures with dedicated GPU hardware. The applications that already benefit from GPU-based processing range from finance to medicine, and they can equally benefit from processing on Internet browsers backed by GPU hardware.
Proof of concept prototype
Challenges and further investigations
We are developing new methods to tackle the challenges and bottlenecks identified in the existing proof of concept, in three areas: scalability, performance, and security.
Scalability: To truly cater for millions of users, we need a scalable backend, possibly with multiple servers for distributing workloads to browsers. Servers need to be set up to share job queues so as to balance the load across the clusters of browsers; each server is then responsible for one cluster of computing browsers.
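The shared-queue idea can be sketched as follows; all names here are hypothetical, and a real deployment would use a distributed queue rather than in-process objects. Each server pulls from the common queue only as long as its own cluster has idle browsers, so load balances itself across clusters.

```javascript
// Shared job queue that several backend servers draw from.
class SharedJobQueue {
  constructor() { this.jobs = []; }
  push(job) { this.jobs.push(job); }
  pull() { return this.jobs.shift(); }
  get size() { return this.jobs.length; }
}

// A server responsible for one cluster of computing browsers.
class Server {
  constructor(name, queue, clusterSize) {
    this.name = name;
    this.queue = queue;
    this.idleBrowsers = clusterSize; // browsers currently idle in this cluster
    this.assigned = [];              // jobs dispatched to this cluster
  }
  // Dispatch jobs to idle browsers while any remain in the shared queue.
  dispatch() {
    while (this.idleBrowsers > 0 && this.queue.size > 0) {
      this.assigned.push(this.queue.pull());
      this.idleBrowsers--;
    }
  }
}

const queue = new SharedJobQueue();
for (let i = 0; i < 10; i++) queue.push({ id: i });

// Two servers with clusters of different sizes share the same queue.
const servers = [new Server('eu', queue, 4), new Server('us', queue, 6)];
servers.forEach((s) => s.dispatch());
```

After dispatching, the 'eu' server holds 4 jobs and 'us' holds 6: work flows to wherever idle browsers are, without any server-to-server coordination beyond the shared queue.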
Performance: With WebCL, the next generation of browsers can compute directly on GPUs, drastically increasing scientific computing performance for the class of applications targeted at GPUs. Our preliminary results show that WebCL is as fast as OpenCL, which was expected. A scientific application or experiment is often composed of several tasks, which together form a workflow. A workflow describes how tasks depend on each other, often through data. To make a distributed computing platform attractive to science, the backend should handle workflow execution, which means piping data from one job to the next and delaying the scheduling of a task until its input data is available.
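The scheduling rule described above, delay a task until every input it depends on is available, then pipe the producers' outputs into it, can be sketched with a minimal workflow engine (the API below is illustrative, not the project's actual backend interface):

```javascript
// Run a workflow: each task names its dependencies; a task becomes
// runnable only once all of its inputs exist, and the outputs of its
// dependencies are piped into it as arguments.
function runWorkflow(tasks) {
  const results = {};
  const pending = new Set(Object.keys(tasks));
  while (pending.size > 0) {
    let progressed = false;
    for (const name of [...pending]) {
      const { deps, run } = tasks[name];
      // Delay scheduling until every input is available.
      if (deps.every((d) => d in results)) {
        results[name] = run(...deps.map((d) => results[d]));
        pending.delete(name);
        progressed = true;
      }
    }
    if (!progressed) throw new Error('cyclic or unsatisfiable workflow');
  }
  return results;
}

// Example: a diamond-shaped workflow a -> (b, c) -> d.
const results = runWorkflow({
  a: { deps: [], run: () => 2 },
  b: { deps: ['a'], run: (a) => a + 1 },
  c: { deps: ['a'], run: (a) => a * 10 },
  d: { deps: ['b', 'c'], run: (b, c) => b + c },
});
```

In a browser ensemble, `run` would be replaced by dispatching the task to a volunteer browser, but the dependency-driven scheduling logic is the same.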
More details about this work can be found in:
- R. Cushing, G.H.H. Putra, S. Koulouzis, A.S.Z. Belloum, M.T. Bubak, and C. de Laat, "Distributed Computing on an Ensemble of Browsers," IEEE Internet Computing, PrePress, doi:10.1109/MIC.2013.3, January 2013. [link to IEEE Internet journal]
- Ganeshwara Herawan Hananda Putra, "Workflow Orchestration on WeevilScout," MSc thesis, University of Amsterdam, March 4, 2013. [pdf]
- "Distributed Computing on an Ensemble of Browsers," poster, 2013. [pdf]