As a program, iMR would replicate all of the MapReduce APIs (application programming interfaces), and it would carry out the similar functionality of MapReduce, namely filtering data and aggregating the results. It would differ in that it could continually produce analysis, based on the latest data.
The researchers built a prototype of iMR, using the Mortar distributed stream processing software. With iMR, the user would specify the range of data being processed in a given analysis, such as all the data gathered in the past 60 seconds. The user would also specify how often results would be produced and transmitted, such as every 15 seconds.
Logothetis admitted that the Web servers may spend most of their resources doing their intended jobs, namely delivering services to users. But iMR could use the spare cycles left over to crunch the log data.
In their paper, the researchers devised a plan where the organization could establish a trade-off between the speed at which they get the analyzed data with the completeness of the resulting analysis. If an organization wants the analysis quickly, then each server may drop individual log entries during times of heavy usage, leading to less conclusive, but still relevant, results. Whereas a thorough analysis of the data would require a longer period of time, and more server resources to complete.
While in-situ MapReduce may not be beneficial for organizations that run only a handful of servers, it could be valuable to larger operations, such as search engines, social networks, and e-commerce sites, Logothetis said.