Building a NAS/HTPC Combo - Part 1: Requirements and Hardware

>> Sunday, February 09, 2014

For some time I had wanted to build a NAS at home, mainly to back up my kid's photos and videos and make them available for viewing on the TV and iPads. I researched some off-the-shelf NAS systems, but their cost, and my enthusiasm to build my own, kept me from buying one. Here I will describe the process I went through, in case someone else is interested in doing the same.

Requirements

I'm not too worried about the electricity cost of running the system (at least for now), but I am really concerned about the cost of building it. Ideally, I should be able to build a system that supports at least 6 disks for less than the cost of a 4-disk off-the-shelf system (at this time, the diskless Synology 412 is about $450-$600).

Software

  1. NO WINDOWS !! :)
  2. system should allow us to access the media across the network and transcode the videos on demand to suit the devices
  3. OS installation and configuration should be a relatively involved task so that I can pick and choose what I want, and later I should be able to change or re-configure it (so, no completely off-the-shelf, minimally configurable software stacks)

Hardware

  1. hardware requirements for the selected OS should not mandate expensive server-grade hardware (so, no FreeNAS)
  2. system should be able to store more than 4TB of data with at least one backup.
  3. system, once setup, should be extremely quiet.
With these requirements in mind, I picked the following components and bought them during the Black Friday and Christmas shopping season in 2013.

Hardware Used

To realize the above requirements, I came up with the following hardware criteria.
  1. the case should be mini-ITX to have the smallest form factor and must have enough fans for cooling
  2. the casing should support at least 6 disks
  3. the motherboard should support 
    1. 6 x 6Gb/s SATA III drives
    2. 1 x PCIe 3.0 slot for expansions
    3. HDMI output (in case, I decide to convert this to a HTPC)
    4. at least 16GB DDR3 memory
    5. Haswell processor
  4. The disks should either have SMART/power-management features so that I can spin down idle disks (one way to do this is sketched below), or be NAS-compatible so that they can manage this on their own
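
As an aside, on a Linux-based NAS one way to spin down an idle disk is with hdparm; the device name and timeout below are only placeholders for illustration, not part of my actual setup.

hdparm -S 242 /dev/sdb   # ask the drive to spin down after ~60 minutes of idle time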

Along these requirements, here is what I bought. 
  1. Fractal Design Node 304 - $39.99 ($49.99 - $10 MIR): This was the best extremely quiet and compact mini-ITX case on the market with 6 disk bays. And for $39.99, it was a steal.
  2. Asus Intel H87I-PLUS Motherboard - $92.99 ($117.99 - $25 MIR): This was the cheapest motherboard from a well-recognized vendor that met all of my requirements. It also has RAID support, but I think it is considered pseudo-RAID (fake RAID) and may not offer any real advantage over software RAID.
  3. Intel Pentium G3220 3.0GHz LGA 1150 Dual-Core - $69.99
  4. Kingston HyperX (2 x 4GB) DDR3-1333MHz memory - $69.99 ($79.99 - $10 MIR)
  5. Ultra LSP Series V2 450-Watt Power Supply - $29.99 ($39.99 - $10 MIR): This PSU has 4 SATA power connectors for hard drives.
  6. SATA 6 Gb/s cables and power cable splitters from monoprice - $10.70
The total cost of the system, without the disks, is $313.65. Well under the budget :). Mind you, the Synology 412 only has 1GB of RAM and can only support 4 disks.

Then, the hard drives.
  1. 2 x 3TB Western Digital Red Drives - $175.98 ( 2 x $139.99 - 2 x $27 MIR - $30 AMEX promotion - $20 V.me promotion)
  2. 1 x 3TB Western Digital Red Drive - $119.99
  3. 1 x 3TB Toshiba SATA III 7200rpm - $79.99 ($109.99 - $30 MIR)
  4. 1 x 2TB Seagate Barracuda - $79.99 ($109.99 - $30): Decided to have a separate disk to keep the OS and the data that doesn't need any backup but needs to be shared across the network.
Cost of NAS drives was $375.96 ($31.33/1TB) and total for all disks was $455.95. Total cost of the system was $770. In part 2, I will discuss how I built the system.

Acknowledgments: a big thank you to Keith for helping with pointers and interesting discussions, and to the Slickdeals community for helping me find the deals that kept the cost down.


Deploying Cassandra Across Multiple Data Centers with Replication

>> Thursday, October 13, 2011

Cassandra provides a highly scalable key/value store that can be used for many applications. When Cassandra is to be used in production, one might consider deploying it across multiple data centers for various reasons. For example, your current architecture may be such that you update data in one data center and all the other data centers should hold a replica of the same data, and you are OK with eventual consistency.

In this blog post I will discuss how one can deploy a Cassandra cluster across three data centers, making sure every data center contains a full copy of the complete data set (this is important because you then don't have to go across data centers to serve the traffic coming into a given data center).

I assume you have already downloaded and configured Cassandra on each of the boxes in your data centers. Since most of the steps described here have to be done on every node in every data center, I encourage you to use a tool like cluster-ssh (it lets you open connections to all the nodes and run commands in parallel).

Goals
Set up a Cassandra cluster on three data centers with four nodes in each data center. Every piece of data will be placed on three nodes (one in each data center); in other words, the replication factor is 3. Let's assume our nodes are named DC<data-center-name>N<node-id>. For example, DC2N3 is the third node in the second data center.

Steps
Note that all these steps, except Step 5, must be followed on EACH AND EVERY node of the cluster. These steps were tested on Cassandra version 0.8.7.

Step 1: Configure cassandra.yaml
Open up $CASSANDRA_HOME/conf/cassandra.yaml in your favorite text editor (did I hear emacs :D).

  1. change cluster_name to a suitable value instead of the boring 'Test Cluster'.
  2. Set the initial_token. The current Cassandra implementation does a very poor job of distributing keys across the cluster, so you need to explicitly assign each node the token range it is responsible for. There are two ways to divide the keys among the nodes.

    1. Distributing Keys Evenly: In this scenario we distribute the range of keys evenly across all the nodes in the cluster. Go here and enter the total number of nodes across all data centers; for our example it is 12. Once the tokens are generated, carefully copy each value into the corresponding node's cassandra.yaml file under initial_token.

      The keys of each node in each data center should look like the following in our example.

      Data Center   Node   Key
      1             1      0
      1             2      14178431955039101857246194831382806528
      1             3      28356863910078203714492389662765613056
      1             4      42535295865117307932921825928971026432
      2             1      56713727820156407428984779325531226112
      2             2      70892159775195516369780698461381853184
      2             3      85070591730234615865843651857942052864
      2             4      99249023685273724806639570993792679936
      3             1      113427455640312814857969558651062452224
      3             2      127605887595351923798765477786913079296
      3             3      141784319550391032739561396922763706368
      3             4      155962751505430122790891384580033478656

    2. Distributing Load Evenly: In this scenario we distribute the load of the cluster evenly across all the nodes. Go here and enter the number of nodes that you have in each data center; for our example it is 4. Copy the generated values into each node's cassandra.yaml file under initial_token in the first data center. Then add one to each of these values and use them on the nodes of the second data center. For the third data center, add two to the tokens of the first data center (or one to the tokens of the second data center). A small sketch of how these tokens can be computed appears at the end of this step.
      The keys of each node in each data center should look like the following in our example.

      Data Center   Node   Key
      1             1      0
      1             2      42535295865117307932921825928971026432
      1             3      85070591730234615865843651857942052864
      1             4      127605887595351923798765477786913079296
      2             1      1
      2             2      42535295865117307932921825928971026433
      2             3      85070591730234615865843651857942052865
      2             4      127605887595351923798765477786913079297
      3             1      2
      3             2      42535295865117307932921825928971026434
      3             3      85070591730234615865843651857942052866
      3             4      127605887595351923798765477786913079298

      Once we loaded data into the cluster, we saw an even distribution of load using this second method; it is also the recommended way for multiple data centers with snitch files.

  3. Point data_file_directories, commitlog_directory and saved_caches_directory to proper locations and make sure those locations exist (otherwise create them).
  4. Set the seeds. It is best to select one node from each data center and list them here. For example: DC1N1, DC2N2, DC3N3.
  5. Assuming your node is properly configured to return the right address when Java calls InetAddress.getLocalHost(), leave listen_address and rpc_address blank. If you are not sure, run hostname on each node and use that value as the address.
  6. Set endpoint_snitch: org.apache.cassandra.locator.PropertyFileSnitch. We will provide a snitch file later (the snitch file lets Cassandra know the layout of our data centers).
That's pretty much all you have to do in cassandra.yaml (assuming you haven't touched any of the other default params).
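
As promised above, here is a minimal sketch (my own illustration, not part of the original setup) of how the initial_token values for both methods can be computed for the RandomPartitioner, whose token space runs from 0 to 2^127. The online token generators do essentially the same arithmetic, so the exact digits may differ slightly depending on rounding.

RING_SIZE = 2 ** 127                      # RandomPartitioner token space

def tokens_even_keys(total_nodes):
    """Method 1: split the whole token range evenly across ALL nodes."""
    return [i * RING_SIZE // total_nodes for i in range(total_nodes)]

def tokens_even_load(nodes_per_dc, num_dcs):
    """Method 2: compute tokens for one data center, then offset each
    additional data center's tokens by +1, +2, ... so they never collide."""
    base = [i * RING_SIZE // nodes_per_dc for i in range(nodes_per_dc)]
    return {dc + 1: [t + dc for t in base] for dc in range(num_dcs)}

print(tokens_even_keys(12))     # 12 tokens: one per node across all 3 data centers
print(tokens_even_load(4, 3))   # 4 tokens per data center, offset per data center

Each generated value goes into the matching node's cassandra.yaml under initial_token.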

Step 2: Configure log4j-server.properties
Find log4j.appender.R.File and point it to a proper location. Make sure you remember this location, because this is the log you will be searching through when things go bad.
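
For example, the entry could end up looking like the following (the path is just an illustration; use whatever location suits your boxes):

log4j.appender.R.File=/var/log/cassandra/system.log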

Step 3: Configure Snitch File
Open cassandra-topology.properties in a text editor and let Cassandra know about your node and data center configuration. For our example, this is how it should look.

# Cassandra Node IP=Data Center:Rack
DC1N1=DC1:RAC1
DC1N2=DC1:RAC1
DC1N3=DC1:RAC1
DC1N4=DC1:RAC1

DC2N1=DC2:RAC1
DC2N2=DC2:RAC1
DC2N3=DC2:RAC1
DC2N4=DC2:RAC1

DC3N1=DC3:RAC1
DC3N2=DC3:RAC1
DC3N3=DC3:RAC1
DC3N4=DC3:RAC1

# default for unknown nodes
default=DC1:RAC1

Step 4: Start Your Cluster.
Go to $CASSANDRA_HOME and type ./bin/cassandra -f to bring up the node. Once you have done this on all the nodes, type ./bin/nodetool -h localhost ring to make sure all the nodes are up and running.

Step 5: Create Data Model with Replication
We are almost there. Now we need to tell Cassandra to use this configuration for our data model. The best way to do this is through cassandra-cli.
Go to $CASSANDRA_HOME/bin and type ./cassandra-cli.

Type connect localhost/9160; to connect to the cluster. Note the semi-colon at the end. If successful you will see Connected to: "<YOUR_CLUSTER_NAME>" on localhost/9160;

Now you need to create the keyspace with proper replication. Assuming your keyspace name is MyCompanyKS type the following.

create keyspace MyCompanyKS with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy' and strategy_options = [{DC1:1,DC2:1,DC3:1}];

and then follow the rest of the steps in the cassandra-cli wiki to create your column families.
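
For instance, a minimal column family (the name below is just a placeholder for illustration) can be created from the same cassandra-cli session with:

use MyCompanyKS;
create column family Users with comparator = UTF8Type;

Any data written to this column family will then be replicated once to each of the three data centers, per the strategy_options above.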

That's it. Now you have an awesome Cassandra cluster spanning across three data centers. Enjoy !!






End of Another Important Chapter ....

>> Saturday, July 30, 2011

When I joined the Axis2 team in August 2004, I had no intentions or plans of going to grad school. I can still remember the discussion I had with my good friend Ruchith Fernando in my small car, where he first managed to convince me to consider grad school. Then, with consistent encouragement and advice from Dr. Sanjiva Weerawarana, I decided to take the challenge.

I won a Fulbright scholarship, but it didn't turn out well in the university selections. I was then admitted to Indiana University to start in Spring 2007, and Prof. Dennis Gannon accepted me into the Extreme Lab right away. I was fortunate to be a part of the Extreme Lab and to have thoughtful discussions with my colleagues, especially during coffee breaks. Even though it was hard at first to survive a harsh Indiana winter I had never experienced before, a 4-month-old kid and a heavy course load, I managed to get used to the system after a while.

During the summer of 2008, with Dr. Gannon moving to Microsoft Research, I decided to join Prof. Beth Plale and her Data to Insight Center in fall 2008. With the guidance of Prof. Plale, I think I transformed from a software developer into a researcher over the next few years. That change in thinking is, I think, one of the most important things I learned during my PhD. Even with her busy schedule she allocated one hour every week for all her PhD students, and I think those meetings really helped me stay focused and get timely feedback on my research. The opportunities to directly impact scientific research, through projects like enabling Vortex2, were also some of the greatest experiences I had in the lab.

Life was never easy with a family and a somewhat small monthly stipend (I don't know why the CS department pays the least compared to other departments, even though most professors are willing to raise student stipends). The three internships that I did, two at Microsoft Research and one on the Google MapReduce team, not only helped me financially but also gave me tons of experience and ideas for my research. From the beginning, my wife and I really liked to go out and see nature, and these three internships really helped us see lots of places in the US. The road trips between Indiana and Washington state, 8 days one way and 7 days the other, with my then three-year-old daughter, were a particular highlight. The only worry was that, because of the road trip, my kid had to celebrate her 3rd birthday somewhere near the Grand Canyon in 105-degree heat.

About 5 years of work bore fruit when I successfully defended my thesis on July 25th, 2011.

Acknowledgements

This PhD would have been impossible without the tremendous support from my advisor, Professor Beth Plale, who has supported and encouraged me throughout my study and research. I also want to thank Dr. Dennis Gannon for taking me into the Extreme Lab, advising me during my initial years and providing me guidance when it was needed. I would like to thank my research committee members, Prof. Geoffrey Fox and Prof. David Leake, for their valuable comments and suggestions throughout the research.

I also owe my gratitude to all the Data to Insight lab members for their support and feedback on the different projects I was involved in, and to all the Extreme Lab members who worked with me during my initial years. I would like to thank the instructors of the courses I took at Indiana University; the knowledge gained in those courses helped me shape my research direction on numerous occasions.

I also want to thank Dr. Sanjiva Weerawarana for introducing me to the Apache Software Foundation, helping me lay a solid foundation in distributed systems and encouraging me to pursue higher studies. I was very fortunate to have such a mentor at the right stage. I would also like to thank the Microsoft Research External Computing group, the Extreme Computing group and the Google MapReduce team for the internship opportunities I got. Those opportunities definitely helped shape my research and put it in perspective.

I should also thank my colleagues on the Axis2 team: Sanjiva, Ajith, Srinath, Chathura, Deepal, Jaliya, Glen and Dims. We were finally able to build something that everyone could make use of, and I'm really proud of it. I should also thank my teachers from my college (Dharmasoka College, Ambalangoda), the lecturers from my undergraduate years in the Computer Science and Engineering department at the University of Moratuwa, and all my friends. Everything I learnt during these interactions helped me become who I am right now.

Finally, I want to dedicate this achievement to my wife Thushari, my daughter Dihini, my parents and my brother for their continuous support, sacrifices and encouragement. When almost everything was going wrong, they stood behind all my decisions and supported me in every way they could. It's a real fortune to have such a family.


My Thesis Defense Presentation




Reversible Quantum Computing: How we can preserve everything in a computation –Beginner Introduction

>> Tuesday, March 01, 2011

(When I was in high school, I was fascinated by Arthur C. Clarke's science fiction, which led me to read up on very basic quantum computing stuff, like the contraction of visible length at light speed. Now, while auditing "Reversible Quantum Computing", and thanks to Amr's excellent explanations, I am trying to understand the beauty of this interesting field. This post and the posts to come introduce my thoughts on the area, especially as applied to distributed systems, and I hope they will also help a non-physicist, non-quantum-mechanics person understand it.)

Disclaimer: As I said earlier, I'm an absolute beginner in this area. These posts are meant to give newcomers at least some idea of the beauty of this field. If you are an expert you will find these posts boring, but you are welcome to comment and fix my errors ;)

Reversibility is an interesting concept even in current scientific experiments. What it means is: when you run your experiment/program now, can you reverse it and get back to the original inputs at any time? This looks trivial, but let's take an example. Take the XOR gate, which outputs 1 when the input is 01 or 10 and outputs 0 when the input is 00 or 11. Once you get the output as, say, 0, can you say what the input was? You cannot, because the input could have been either 00 or 11. So the XOR operation is irreversible. But what if you change your gate to output the input as well? Consider the modified truth table in Table 2: the last two digits of the output contain the input itself, making this gate completely reversible.

Input   Output
00      0
01      1
10      1
11      0
Table 1: XOR Gate

Input   Output
00      000
01      101
10      110
11      011
Table 2: Reversible XOR Gate
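
To make Table 2 concrete, here is a tiny sketch (my own illustration, not from any paper) of that reversible XOR gate: the forward function emits the XOR result together with the original bits, so a second function can always recover the input.

def reversible_xor(a, b):
    """Forward direction of Table 2: (a, b) -> (a XOR b, a, b)."""
    return (a ^ b, a, b)

def reverse(output):
    """Backward direction: drop the XOR result and recover (a, b)."""
    _, a, b = output
    return (a, b)

for a in (0, 1):
    for b in (0, 1):
        out = reversible_xor(a, b)
        assert reverse(out) == (a, b)   # no information was erased
        print((a, b), "->", out)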

Now, what is the benefit of reversibility? As I mentioned earlier, it enables the ability to go back and forth in a program, which is a huge advantage in computing. There is already a lot of literature on this subject (I will discuss some of it in future posts). At any given point in time you have enough information to back-track to a previous state, and maybe branch from there in a new direction. One of the most important advantages of reversible computing is the ability to conserve power: since you are not erasing any information, you do not lose any energy in the system, and ultimately you end up with a system that doesn't waste any power (theoretically, at least).

If you want to know more read the original paper from Toffoli on Reversible Computing.


Flashing Netgear WNR2000 (v2) with DD-WRT

>> Friday, January 07, 2011

Finally I got hold of a Netgear WNR2000 (v2) router and set it up to act as a repeater bridge. I can now connect my desktop in a different room to the main wireless modem. Even though this particular router is capable of acting as a wireless client and/or a repeater bridge, the factory firmware doesn't allow you to do it. Flashing the router with dd-wrt enables this and also opens up lots of other possibilities. Once it's done it's kind of trivial, but there were few or no instructions available on the web for doing it.

(Disclaimer: Fortunately I didn't hit any of the complications mentioned in some of the forums. I'm writing this for my own benefit too, so that I can refer back to it later. You follow these guidelines at your own risk, and I can in no way be held responsible. Also, if things go wrong you will end up bricking your router, yes I'm serious.
If there are any complications, please contact the dd-wrt forum. I have little or no knowledge to fix those complicated errors.)

Step 1: Making Sure
Make sure your WNR2000 router is a v2 and not a v1. Only the v2 is supported by dd-wrt at this time. The version is marked on the router itself. At the store, if the outside box doesn't explicitly mention it, check whether the serial number starts with 23B or 23D; if so, it's a v2 router. Flashing a v1 router (even if it appears to work) can brick it.

Step 2: Getting Firmware
To download the needed firmware, go here and pick the file that ends with "WNR2000v2.chk". When I was writing this, the latest available version could be downloaded from this link.

Before the next set of steps, go to this link and read all the warnings and guidelines found in sections 1 and 2.

(These instructions are copied from here, but adapted to suit WNR2000 router)

Step 3: Reset The Router
Do a hard 30/30/30 reset of the router.

Step 4: Login to the Router
Connect your router to your computer using an Ethernet cable. Make sure your computer is not connected to any other network. Open up a browser and go to http://192.168.1.1. Log in with root as the username and admin as the password. Just glance through the settings to make sure everything is working fine.

Step 5: Upload the Firmware

  1. On the router page click on Maintenance –> Router Upgrade. Remove the check on “Check for New Version Upon Log-in”.
  2. Click on “Choose File” and point to the file you downloaded in step 2. Then click “Upload”. It will take about 2-4 minutes and DO NOT interrupt this process.
  3. At the end of the above process, if everything was successful, refreshing the router page will show the new DD-WRT page. Make sure to set a new username and password. At this point, reset or restart the router, and after that you are done.

Now you can make your router an access point, repeater bridge, wireless client or so much more by following the instructions here.


Usage Patterns to Provision for Scientific Experiments in Clouds

>> Wednesday, December 01, 2010

My paper presentation at CloudCom 2010 on "Usage Patterns to Provision for Scientific Experiments in Clouds".


Abstract
Driven by the need to provision resources on demand, scientists are turning to commercial and research test-bed Cloud computing resources to run their scientific experiments. Job scheduling on cloud computing resources, unlike earlier platforms, is a balance between throughput and cost of executions.
Within this context, we posit that usage patterns can improve the job execution, because these patterns allow a system to plan, stage and optimize scheduling decisions. This paper introduces a novel approach to utilization of user patterns drawn from knowledge-based techniques, to improve execution across a series of active workflows and jobs in cloud computing environments. Using empirical analysis we establish the accuracy of our prediction approach for two different workloads and demonstrate how this knowledge can be used to improve job executions.


WRF 3.2 Installation Notes on Windows using PGI Compiler

>> Monday, October 11, 2010

Yep, being in Computer Science, you might ask why I bother compiling the WRF (Weather Research and Forecasting) model. Lately, though, we have been researching how to enable scientists to run small weather forecasts on their small-scale workstations. With the improvements in Windows HPC, we decided to use Windows HPC as our platform and PGI as our compiler. I followed the steps below and got WRF 3.2 compiled in April 2010. This is just an aggregation of resources into one place, so please post your questions to the respective mailing lists. I'm in no way a WRF or netCDF expert :D


Pre-requisites

1. Install Windows HPC Pack 2008 (including the SDK). Make sure you can run mpiexec and that the MPI headers are available after the installation.
2. Install PGI compiler 10.x or later. The PGI installer will also install Cygwin for you, so there is no need to install it explicitly.

Step 1: Installing netCDF

You can either download a pre-built binary or compile it yourself. Follow the instructions found here.

Step 2: Installing WRF
  • Download WRF 3.2 or higher from the WRF web page. Unzip the tar file and it should create a WRFV3 folder.
  • Go to WRFV3/tools and edit either source_for_pgi_windows.bash or source_for_pgi_windows.csh, depending on the shell you use (by default it will be bash). Include the path to the netCDF installation; a rough example appears after this list. (More info can be found here.)
  • source the proper bash or csh script.
  • Follow the instructions found in WRF online tutorial.
  • The commands I used to compile WRF for real cases (these options depend on the architecture, the compiler you use, nesting options, etc.):
    • ./configure (selected the dmpar and basic options)
    • ./compile em_real
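
As a rough illustration, the edit to source_for_pgi_windows.bash essentially boils down to exporting the netCDF location. The path below is only an example; point it at wherever you installed netCDF in Step 1.

export NETCDF=/cygdrive/c/netcdf
export PATH=$NETCDF/bin:$PATH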
