Welcome

The Networking and Wide-Area Systems Group investigates software and hardware solutions for distributed systems, wireline and wireless networking, operating systems, security and privacy, and technologies and applications for the developing world. Many of our projects embody a strong interdisciplinary, collaborative spirit. We are proud to be part of the NYU WIRELESS center and the CATER group at NYU. We also work closely with the TIER group at UC Berkeley and the PDOS group at MIT.


Front row: Prof. Jinyang Li, Ashlesh Sharma, Aditya Dhananjay, Michael Paik, Meredith Hasson, Christopher Mitchell.
Back row: Jay Chen, Nguyen Tran, Paul Lu, Matt Tierney, Russell Power, Prof. Lakshminarayanan Subramanian, Ariel Nevarez, Sunandan Chakraborty.

All of our group's research projects have been made possible by research grants from the National Science Foundation, Microsoft, Google, IBM, Nokia, and the Sloan Foundation, and by support from New York University.

Hadoop Optimization

For the past month or so, we've been investigating how to optimize Hadoop.  Hadoop is frequently used as a comparison point for the performance of other distributed systems, so having it be as fast as possible makes our job more challenging - and hopefully makes research more interesting.

In the course of our work we found that Hadoop is actually fairly well optimized (not too surprising), but there are a number of simple tweaks that can improve performance for some common research use cases.  For this post, we're concerned with Hadoop's performance on a PageRank dataset of 1 billion pages.  Our initial runs on this dataset took about 9 minutes.  The following changes helped reduce this runtime:

Caching reflected classes

Hadoop uses a number of legacy hooks to create configuration, key and value objects at runtime.  These legacy code paths exist so that both Configuration-based and JobConf-based jobs can work.  Unfortunately, this code path imposes significant overhead.  Simply caching the results of a few reflection operations reduces CPU usage by 20%.
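
As a rough illustration of the idea (not the actual patch), the sketch below caches no-argument constructors in a map so that repeated instantiation of the same class skips the reflective lookup. The class name CachedReflection and its methods are hypothetical and are not part of Hadoop.

```java
import java.lang.reflect.Constructor;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical helper (not a Hadoop class): caches no-arg constructors per class. */
final class CachedReflection {
    // One cached constructor per class; safe for concurrent readers and writers.
    private static final ConcurrentMap<Class<?>, Constructor<?>> CONSTRUCTORS =
            new ConcurrentHashMap<>();

    private CachedReflection() {}

    /** Creates a new instance of type, looking up its constructor only on first use. */
    static <T> T newInstance(Class<T> type) {
        try {
            Constructor<?> ctor = CONSTRUCTORS.get(type);
            if (ctor == null) {
                // Slow path: reflective lookup, performed once per class.
                ctor = type.getDeclaredConstructor();
                ctor.setAccessible(true);
                Constructor<?> previous = CONSTRUCTORS.putIfAbsent(type, ctor);
                if (previous != null) {
                    ctor = previous; // another thread cached it first
                }
            }
            return type.cast(ctor.newInstance());
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + type.getName(), e);
        }
    }

    /** Number of classes whose constructor is currently cached. */
    static int cachedCount() {
        return CONSTRUCTORS.size();
    }
}
```

Calling CachedReflection.newInstance(StringBuilder.class) twice returns two distinct objects but performs the constructor lookup only once.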

 

Speeding object copies

The default mechanism used by Hadoop to clone objects is robust, but slow.  It serializes an object and then deserializes it into the copy.  For a variety of simple objects, this can be improved by creating a Copyable interface which allows for direct copies.
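
A minimal sketch of what such an interface might look like follows, with the serialize-then-deserialize path shown next to it for comparison. The Copyable signature, the PageRankValue type and the Cloner helper are all assumptions made for illustration; the post does not show the real code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical interface: a type that can copy its state straight into another instance. */
interface Copyable<T> {
    void copyTo(T target);
}

/** Illustrative value type with hand-written serialization, loosely in the style of a Hadoop value. */
final class PageRankValue implements Copyable<PageRankValue> {
    long pageId;
    double rank;

    void write(DataOutputStream out) throws IOException {
        out.writeLong(pageId);
        out.writeDouble(rank);
    }

    void readFields(DataInputStream in) throws IOException {
        pageId = in.readLong();
        rank = in.readDouble();
    }

    @Override
    public void copyTo(PageRankValue target) {
        // Direct field copy: no byte buffers and no stream objects.
        target.pageId = pageId;
        target.rank = rank;
    }
}

final class Cloner {
    private Cloner() {}

    /** Slow path: serialize the source, then deserialize the bytes into a fresh object. */
    static PageRankValue copyViaSerialization(PageRankValue src) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(buffer)) {
                src.write(out);
            }
            PageRankValue dst = new PageRankValue();
            try (DataInputStream in =
                    new DataInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                dst.readFields(in);
            }
            return dst;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Fast path: let the object copy itself directly. */
    static PageRankValue copyDirect(PageRankValue src) {
        PageRankValue dst = new PageRankValue();
        src.copyTo(dst);
        return dst;
    }
}
```

Both paths produce an equal copy. The direct path simply avoids allocating and filling an intermediate byte buffer, which is where the saving for simple objects would come from.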

 

Configuration changes

  • dfs.datanode.max.xcievers
    Each datanode has an upper bound on the number of files it will serve at any one time. Raising this limit helps the node cope with heavy I/O load. We set it to 8192 to run PageRank with 1 billion pages and 250 shards.
  • dfs.balance.bandwidthPerSec
    The default rate for HDFS balancing operations is 1MB/s, far too low when gigabit networks are the bottom end.
  • mapred.map.output.compression.codec
    We use Twitter's LZO implementation in place of the original Gzip codec to compress the output data of Map tasks and Reduce tasks.
  • mapred.reduce.parallel.copies
    When many map tasks serve a job, increasing this parameter is a good option, especially when each task is small. The larger the value, the more map outputs are fetched simultaneously during the Shuffle phase.
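
For reference, the settings above might be expressed as Hadoop configuration properties roughly as follows. Only the 8192 value comes from our runs as described here; the balancer bandwidth and parallel-copies values are illustrative assumptions, the codec class name assumes the hadoop-lzo library is installed, and mapred.compress.map.output is included on the assumption that map output compression must be switched on for the codec setting to take effect. In a typical installation the dfs.* properties would go in hdfs-site.xml and the mapred.* properties in mapred-site.xml.

```xml
<configuration>
  <!-- Value used in the PageRank runs described above -->
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>8192</value>
  </property>

  <!-- Bytes per second; 100MB/s is an illustrative value, not a measured one -->
  <property>
    <name>dfs.balance.bandwidthPerSec</name>
    <value>104857600</value>
  </property>

  <!-- Assumed to be required so that the codec below is actually used -->
  <property>
    <name>mapred.compress.map.output</name>
    <value>true</value>
  </property>

  <!-- Codec class as shipped by the hadoop-lzo library (assumed to be on the classpath) -->
  <property>
    <name>mapred.map.output.compression.codec</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
  </property>

  <!-- Illustrative value; tune to the number and size of map tasks -->
  <property>
    <name>mapred.reduce.parallel.copies</name>
    <value>20</value>
  </property>
</configuration>
```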

 

After our changes, we measured a 25% performance improvement - down to 7m30s.  Honestly, not as much as we hoped, but still significant.  We hope to push our code changes to Hadoop upstream in the near future.

 

SOSP 2011

Transactional Storage for Geo-Replicated Systems Presented at SOSP 2011

Russell Power Awarded 2012 Microsoft Research PhD Fellowship

