What is eScience?

>> Sunday, July 20, 2008

(This should help me explain to my friends what I am currently working on ;) )

Disclaimer: this is a basic introduction and might not be sophisticated enough for experts in the field.

In simple terms, eScience is where computer scientists work together with scientists from other fields to solve their problems efficiently. In my view, two things are referred to as eScience these days.

1. Computer scientists apply their algorithms and knowledge to other scientific fields. For example, one could apply methods like neural nets, machine learning, etc., to the medical field to efficiently devise solutions to problems in that area.
Most people don't see this as part of eScience, but having been to a talk by David Heckerman, I agree with him that it is.

2. Some algorithms require a large amount of computational power and time, or act on large amounts of data. For these algorithms to run, or for these terabytes of data to be mined, one might need the help of supercomputers.

- Handling these large amounts of data,
- executing those algorithms on the data, and
- enabling scientists to work with the data through GUIs, workflow engines, etc.
are also regarded as eScience.

This is what most people, and Wikipedia as well, emphasize.

I think I am more into the second area, so I will explain a bit more about it.

Think about the following scenario, related to meteorology, to understand the use case.

A country might have a large number of weather stations reporting various weather conditions to a central location. In the case of the US, IIRC, there are about 144 weather stations. Each weather station sends data, say, once an hour. If the file sent by each weather station is about, say, 1GB (this value will depend on the resolution of the measurements), then we will get about 150GB per hour. There are algorithms that go through this data and mine it to find interesting weather conditions. For example, one algorithm will find, say, a set of storms using this data. Since this first phase acts on each piece of data separately, there has to be another algorithm to aggregate the results: if the first algorithm reports 5 storms, it may be that a few of them are actually the same storm. Likewise, there are different algorithms that can be run on top of this data.
Scientists can either run individual algorithms on this data, or define workflows to run on it. For example, they can design a workflow which will

  1. first mine this data to find interesting conditions
  2. cluster them to identify unique conditions
  3. talk to individual weather stations to get more data, if needed
  4. come up with a scenario explaining the current conditions
  5. predict the path or behaviour of the storm
Since all of this has to be carried out in a timely manner (you don't want to get today's weather forecast tomorrow, right ;) ), and the data sets involved are large, high performance computers are required.
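
To make the first two workflow steps a bit more concrete, here is a minimal Python sketch. It is only an illustration, not how any real meteorological system works: the record fields (`station`, `x_km`, `y_km`, `wind_kmh`), the wind threshold, the merge radius and all function names are made up for this example, and the grouping is a naive greedy pass rather than a proper clustering algorithm. It also redoes the back-of-the-envelope data volume estimate from the scenario above.

```python
from dataclasses import dataclass
from math import hypot

STATIONS = 144        # station count quoted in the post (from memory)
GB_PER_REPORT = 1.0   # assumed size of one hourly report


def hourly_volume_gb(stations=STATIONS, gb_per_report=GB_PER_REPORT):
    """Data arriving at the central location per hour, in GB."""
    return stations * gb_per_report


@dataclass(frozen=True)
class Detection:
    """One 'interesting condition' found in a single station's data."""
    station: str
    x_km: float
    y_km: float


def mine(readings, wind_threshold=90.0):
    """Step 1: look at each station's reading on its own.

    The threshold is a hypothetical stand-in for a real detection algorithm.
    """
    return [
        Detection(r["station"], r["x_km"], r["y_km"])
        for r in readings
        if r["wind_kmh"] >= wind_threshold
    ]


def cluster(detections, radius_km=100.0):
    """Step 2: merge detections that are close together into one storm.

    Naive greedy grouping: a detection joins the first existing group that
    has a member within radius_km, otherwise it starts a new group.
    """
    storms = []
    for d in detections:
        for storm in storms:
            if any(hypot(d.x_km - m.x_km, d.y_km - m.y_km) <= radius_km
                   for m in storm):
                storm.append(d)
                break
        else:
            storms.append([d])
    return storms


if __name__ == "__main__":
    sample = [
        {"station": "A", "x_km": 0, "y_km": 0, "wind_kmh": 120},
        {"station": "B", "x_km": 40, "y_km": 30, "wind_kmh": 110},
        {"station": "C", "x_km": 60, "y_km": 0, "wind_kmh": 95},
        {"station": "D", "x_km": 900, "y_km": 900, "wind_kmh": 100},
        {"station": "E", "x_km": 950, "y_km": 900, "wind_kmh": 130},
        {"station": "F", "x_km": 500, "y_km": 500, "wind_kmh": 20},
    ]
    found = mine(sample)
    print(len(found), "detections,", len(cluster(found)), "storms")
    print(hourly_volume_gb(), "GB per hour")
```

With the sample readings, five stations report a detection, and the grouping step collapses them into two storms. Under the same assumptions, 144 stations at 1GB an hour add up to 144GB per hour, or roughly 3.4TB a day, which is why the later steps need serious computing resources.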

To perform the above-mentioned tasks, there has to be some infrastructure which enables users to
  1. design, execute and monitor workflows
  2. perform data movements from/to computing resources. These movements are not easy, as they involve not only large amounts of data but also working with supercomputers, data centers, etc.
  3. schedule and monitor jobs in high performance computing environments like clusters, grids, etc.
This whole environment can be regarded as an eScience environment. This is just one example, and there are lots of problems like this in bioscience, neuroscience, aerospace, etc.


Places to visit in Washington State - Mt St Helens

>> Friday, July 18, 2008

Location : Johnston Ridge Observatory, at the end of Spirit Lake Memorial Highway, WA

Directions : Google Maps, About 3 hrs from Bellevue, WA

For GPS : 46.276258,-122.216721

Link :
en.wikipedia.org/wiki/Mount_St._Helens

This was one of the more interesting trips I have taken with my family. The road to Mt St Helens was full of fascinating scenery.
After we exited I-5, the road went through a small town, and after that it was full of sharp turns. At one point there was a sign saying it was the last place to get gas. It was 37 miles from that point, but I didn't realize we would be gaining elevation and my car would have to do extra work. (Thanks to the Corolla's fuel efficiency I didn't run out of gas :) )
There are a couple of viewpoints on the way, and most of them were gorgeous. There was one place where you can see the path of the mud and lava flow.
When we got to the Johnston Ridge Visitor Center, the view was great. Since it was a sunny day, we could see the whole mountain without any trouble. There are a couple of trails led by rangers, and one of them goes towards Spirit Lake. The visitor center also had some movies playing inside a theater.

This is a combination of three photos, showing the mighty Mt St Helens and the living crater.


Mt St Helens and the lava and mud flow path



360° view around the Mt St Helens area. If you look at the surrounding mountains, you can still see some burnt trees.



Google thinks I am a "virus"

>> Monday, July 14, 2008

I was searching for a grocery store in my area on Google, and this is the result I got.

"We're sorry .. but your query looks similar to automated requests from a computer virus or spyware application. .... "

It seems someone has messed up the automatic spyware detection. Could this be due to an error in IE?

Update : I just checked a couple more queries, now using Firefox, and I got the same error. So it is something happening beyond my machine. It could be the internal network, or Google is messing up.




Open-source vs source-open

>> Tuesday, July 08, 2008

I asked one of my friends what open source means to him. He said that if he has the source, he is good with it. Is this the meaning of open source software? Where is the community component?

If someone builds software in-house and puts it out with the source, is this open source? I personally think there is something missing.

There seems to be a trend in larger projects for customers to demand the source. Large clients (like governments) in Europe especially tend to lean towards open source software. So most companies are trying to exploit this by putting something out as their source.

What is the meaning of this? In my personal opinion, people should like open source because it is/was a community effort and not the work of a single company. In these sorts of projects, if one contributing company goes out, the clients still have other options. There will also be competition, and better code and a better product through synergy and open discussion. Since users are also involved in this process, the ultimate product will be what users need.
If company A cannot afford to build a piece of software alone, it can create a community around it and build the software that way. This will benefit the company and other people as well.

But if you write your code internally, make all the decisions yourself and then put it out, it is just like the automotive industry in the 1970s. Customers get what the company wants and not what they want. Even if the clients get the source, it will be crappy most of the time :)

Apache has this nice rule where a project needs at least 3 different players to be recognized as a project within Apache. One of the reasons for this is to make sure companies won't dump code and then claim it is open source.


What is Open source software (to me)

>> Monday, July 07, 2008

(Warning: I might be biased by my experience in different Apache projects)

I have been doing some background work on this for a while, asking different people and searching the web to understand what people really expect. I was somewhat pissed off by this definition here.

Why do we contribute to open source projects? Do we need something in return other than the satisfaction? (Sometimes the visibility is what really matters when you apply for higher studies or jobs, but those are secondary.)

I was really happy to see thousands of users posting questions on axis-dev, commons-dev and the other mailing lists of my projects in Apache. We have done something for the betterment of their progress and of the world. Do we need to restrict them? Why do we wanna say that if you use this, you need to make your stuff open source too, or in other words "dance to my rhythm"? Bullshit!!

Are we trying to make the whole world open source, and by doing so create a different and completely secluded camp? What is the point there? We have to be practical and give something to the people out there.

Why do we wanna enforce viral licenses? My main motivation is the satisfaction I get out of it. I am so happy to see the code I've written being used by so many people around the world, without any geographical or language barriers. When I introduce myself as a developer from XX project, people really like to talk with me and my colleagues. Do we need anything else from our contributions?

It is true that I was supported by some organizations while I was contributing to those projects, but those organizations had better and far more efficient models for earning money from our contributions than restricting others.

There are some organizations which are fully closed source but use lots of open source software. For their business, GPL-like licenses are not healthy. Do we wanna restrict them too? Why? Yes, they earn money from our efforts, so what? Those companies are just another set of users from my point of view. Sometimes they give credit to the open source projects they have used. Isn't that enough?
Think about the university research groups using our open source software. They are doing research for the betterment of the world. They also try to make the most of their funding to do something for the world. Do we wanna add barriers for them?

Apache-style licenses add no barriers for the end users of the software. You can do whatever you want with it. Isn't it cool? Isn't it the success behind open source?

There is also another category, which I refer to as "source-open" rather than open-source, that I need to research a bit.
