Computing in the Web Browser [WeevilScout]

Motivation

The vast amount of information being generated across all scientific domains is outpacing the computational power available to handle and process it. Coping with this exponential growth of data is one of the main hurdles for this decade, if not beyond. A common point in the articles describing visions for computing in 2020 is that the ability to process huge amounts of data and efficient collaboration between scientists will play a crucial role in future scientific discoveries. Scientific computing is often bound by the limited computing resources at hand. Scientific applications that are characterized by large problem spaces have to resort to dedicated infrastructures based on clusters, clouds, or desktop computers. Such applications can benefit from a vast computing resource that has hitherto been overlooked and left untapped: the Internet browser.

The idea of using a simple desktop as a computing platform is not new. Volunteer computing has for a decade managed to gather unprecedented computing power. In 2011, the aggregated computing power provided by the participants in all projects using BOINC (Berkeley Open Infrastructure for Network Computing) came to over 5.2 petaFLOPS, with 2.2 million registered users, of whom 280,000 were active.

The research work we are developing here is a much simpler and more user-friendly way to achieve volunteer computing, using recent advances in technology and the Web 2.0 approach.

The Internet browser is ubiquitous on the Web. One can even say that the browser is synonymous with the Web, as it is usually the first point of contact with the Internet. Every smart device, be it a computer or a mobile device, is equipped with an Internet browser. This remarkable piece of software shapes our daily lives by keeping us in contact with friends and letting us collaborate with colleagues through social media such as Twitter and Facebook, which have become a mainstay of the way humans interact with each other on the Web. The social aspect of the current Web is enough to achieve volunteer computing almost instantly. An emergent computing paradigm is crowdsourcing, where human intellect is used to solve problems that are otherwise difficult to automate with known computer algorithms. Such a method is employed in SETILive, where users voluntarily filter out signal noise received by radio telescopes. The crowdsourcing success stories prove that people are willing to donate their time to a research cause. It is thus conceivable that people are also willing to volunteer by donating browser computing time to a research effort. This allows small research groups with access to limited computing resources to involve volunteers in their computing efforts through browser computing. Popular websites can offer visiting users the possibility to contribute to computing; thus the Web can evolve from a mere content network into a true computing platform.

What makes the Internet browser now capable of being a computing platform?

The answer lies in the remarkable advances in browser technology, specifically the introduction of HTML5. Among many other features, HTML5 introduces Web Workers, which are essentially threads within the browser whose sole purpose is to perform data-intensive computation without disrupting the user's experience of the webpage.
Although JavaScript programs are undeniably slower than equivalent programs implemented in C or Java, the potential of harnessing millions of browsers overshadows the performance limitations of JavaScript engines. Furthermore, current efforts like WebCL and WebGL, which access the underlying hardware directly from JavaScript, show that these limitations may be a thing of the past in the foreseeable future and that performance can be on par with those languages. With WebCL the browser can compute directly on the graphics processing unit (GPU), which paves the way for high-performance computing based on stream processing through the Internet browser, without the intricacies of building infrastructures with dedicated GPU hardware. The applications that already benefit from GPU-based processing range from finance to medicine and can equally benefit from processing on Internet browsers equipped with underlying GPU hardware.
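
To make the Web Worker idea concrete, here is a minimal sketch of a worker script. It is an illustration only, not the WeevilScout code: the `sumOfSquares` kernel, the message shape `{ id, values }`, and the file name `worker.js` are all assumptions chosen for the example.

```javascript
// Hypothetical compute kernel (an assumption for illustration):
// the sum of squares over an array of numbers.
function sumOfSquares(values) {
  let total = 0;
  for (const v of values) {
    total += v * v;
  }
  return total;
}

// When this file runs inside a worker, answer every incoming message
// with the computed result. The guard keeps the file loadable in
// environments that are not workers.
if (typeof WorkerGlobalScope !== "undefined" && self instanceof WorkerGlobalScope) {
  self.onmessage = (event) => {
    const { id, values } = event.data;
    self.postMessage({ id, result: sumOfSquares(values) });
  };
}

// On the page (main thread), assuming this file is served as "worker.js":
//   const worker = new Worker("worker.js");
//   worker.onmessage = (e) => console.log(e.data.result);
//   worker.postMessage({ id: 1, values: [1, 2, 3] });
```

Because the loop runs in the worker's own thread, the page that created the worker stays responsive while the computation is in progress.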

Proof of concept prototype

Figure 1: Social media plays a crucial part in garnering web browser computing resources. Social media also mediates the trust between the user and the clients asked to join the network. In (1) a user with a distributed application uses social media to get colleagues and friends (2) to join the network by simply sharing a URL. Clients navigate to the URL (3) and start computing in the browser.

By exploiting the Web Workers feature introduced in HTML5 and supported by the new generation of Internet browsers, we have set up a framework in which users visiting our website are assigned scientific computing tasks to perform in the background of their browser. Upon visiting the website, browsers pull JavaScript tasks from the website server (see Figure 1), perform the computation, and send the results back to the server. The cycle repeats until the queue of tasks on the server is exhausted or the user decides to move away from the website. This framework is intrinsically capable of computing on heterogeneous systems, since the browser acts as a virtual machine for executing JavaScript programs.
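
The pull, compute, and report cycle described above can be sketched as follows. This is a hedged illustration and not the actual WeevilScout protocol: the task shape `{ id, source, input }`, the helper names `executeTask` and `runClient`, and the injected `fetchTask` and `submitResult` functions are all assumptions. In a real deployment those two functions would wrap HTTP requests to the server, and the loop would run inside a Web Worker.

```javascript
// Run one task.
// ASSUMPTION: a task is { id, source, input }, where `source` is the body
// of a function that receives `input` and returns the result.
function executeTask(task) {
  const fn = new Function("input", task.source);
  return { id: task.id, output: fn(task.input) };
}

// Pull a task, compute it, report the result, and repeat until the
// server has no more work.
// `fetchTask` resolves to a task, or to null when the queue is exhausted.
// `submitResult` uploads one result and resolves when it is accepted.
async function runClient(fetchTask, submitResult) {
  let completed = 0;
  for (;;) {
    const task = await fetchTask();
    if (task === null) {
      break; // queue exhausted
    }
    await submitResult(executeTask(task));
    completed += 1;
  }
  return completed;
}

// Example with an in-memory stand-in for the server:
//   const queue = [{ id: 1, source: "return input * 2;", input: 21 }];
//   const results = [];
//   runClient(
//     async () => (queue.length ? queue.shift() : null),
//     async (r) => { results.push(r); }
//   ).then((n) => console.log(n, results));
```

Passing the transport in as functions keeps the loop independent of any particular server API, which is not specified in this text.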

Challenges and further investigations

Further work involves developing new methods to tackle the challenges and bottlenecks identified in the existing proof of concept, namely scalability, performance, and security.

Scalability: To truly cater for millions of users we need a scalable backend, possibly with multiple servers distributing workloads to browsers. Servers need to be set up to share job queues so as to balance the load across the clusters of browsers. Each server is thus responsible for one cluster of computing browsers.

Performance: With WebCL, the next generation of browsers can compute directly on GPUs, drastically increasing scientific computing performance for the class of applications that target GPUs. Our preliminary results show that WebCL is as fast as OpenCL, which was expected. A scientific application or experiment is often composed of several tasks, which together form a workflow. A workflow describes how tasks depend on each other, often through data. To make a distributed computing platform attractive to science, the backend should handle workflow execution, which means piping data from one job to the next and delaying the scheduling of tasks until their input data is available.
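
The workflow requirement can be illustrated with a small dependency-driven scheduler. This is a sketch under stated assumptions, not the WeevilScout backend: the workflow format (a map from task name to `{ deps, run }`) and the function name `runWorkflow` are hypothetical, and tasks run locally here, whereas a real backend would dispatch each ready task to a browser.

```javascript
// ASSUMPTION: a workflow maps each task name to
//   { deps: [names of tasks it depends on], run: (inputs) => output }
// where `inputs` holds the outputs of `deps`, in the same order.
function runWorkflow(workflow) {
  const outputs = new Map();
  const pending = new Set(Object.keys(workflow));

  while (pending.size > 0) {
    // A task is ready only once every dependency has produced its output.
    const ready = [...pending].filter((name) =>
      workflow[name].deps.every((dep) => outputs.has(dep))
    );
    if (ready.length === 0) {
      throw new Error("workflow has a cycle or a missing dependency");
    }
    for (const name of ready) {
      // Pipe the outputs of the dependencies into this task.
      const inputs = workflow[name].deps.map((dep) => outputs.get(dep));
      outputs.set(name, workflow[name].run(inputs));
      pending.delete(name);
    }
  }
  return outputs;
}

// Example: load -> square -> sum
//   const out = runWorkflow({
//     load:   { deps: [],         run: () => [1, 2, 3] },
//     square: { deps: ["load"],   run: ([xs]) => xs.map((x) => x * x) },
//     sum:    { deps: ["square"], run: ([xs]) => xs.reduce((a, b) => a + b, 0) },
//   });
//   out.get("sum"); // 14
```

All tasks found ready in the same pass are independent of each other, so they are the ones that could be handed to different browsers in parallel.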

Security: Security is also an issue, and it is not obvious where to draw the line. The JavaScript code running in a browser client cannot be hidden, which raises some security issues. The same holds for the data being sent to the browser. We would like to encourage users to write their own scripts for distributed computing, but at the same time guard against malicious usage. Because scripts are encapsulated within the browser, security against unauthorized access to the computer is as strong as the browser implements it. Simple DoS attacks can be mounted by submitting infinite while-loops, which means that some sort of manual or automated moderation should be implemented.
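
One possible automated safeguard against runaway scripts is a client-side watchdog that terminates a worker when it exceeds a time budget. The sketch below is an assumption about how this could be done, not a mechanism described by this work; `runWithTimeout` is a hypothetical helper, and `worker` is any object with the standard Worker interface (`postMessage`, `terminate`, `onmessage`, `onerror`).

```javascript
// Send `message` to `worker` and resolve with its first reply.
// If no reply arrives within `timeoutMs`, terminate the worker and reject.
function runWithTimeout(worker, message, timeoutMs) {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => {
      // A script stuck in an infinite loop never replies, but it runs in
      // its own thread, so this timer still fires and can stop it.
      worker.terminate();
      reject(new Error("task exceeded " + timeoutMs + " ms and was terminated"));
    }, timeoutMs);

    worker.onmessage = (event) => {
      clearTimeout(timer);
      resolve(event.data);
    };
    worker.onerror = (err) => {
      clearTimeout(timer);
      reject(err);
    };
    worker.postMessage(message);
  });
}

// In a browser, assuming a task script served as "task.js":
//   runWithTimeout(new Worker("task.js"), { values: [1, 2, 3] }, 5000)
//     .then((result) => console.log(result))
//     .catch((err) => console.warn(err.message));
```

A time limit only bounds the damage done to each volunteer; it does not replace moderation of the submitted scripts themselves.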

More details about this work can be found in:

  1. Distributed computing on an Ensemble of Browsers, R. Cushing, G.a Putra, S. Koulouzis, A.S.Z. Belloum, M.T. Bubak, C. de Laat, IEEE Internet Computing, PrePress 10.1109/MIC.2013.3, January 2013.
  2. Workflow Orchestration On WeevilScout, Ganeshwara Herawan Hananda Putra, MSc thesis, University of Amsterdam, March 4, 2013.
  3. Distributed computing on an Ensemble of Browsers, Poster, 2013.