(This will be helpful for me to explain to my friends what I am currently working on ;) )
Disclaimer: this will be a basic introduction and might not be sophisticated enough for the Achilles of the field.
In simple terms, eScience is where computer scientists blend with scientists from other science fields to solve their problems efficiently. In my view, there are two things that are referred to as eScience these days.
1. Computer scientists apply their algorithms and knowledge to other science fields. For example, one could apply methods like neural nets, machine learning, etc., to the medical field to efficiently devise solutions to problems in that area.
Even though most people don't see this as part of eScience, having been to a talk by David Heckerman, I agree with him that it is.
2. Some algorithms require a large amount of computational power and time, or act on a large amount of data. For these algorithms to work, or for these terabytes of data to be mined, one might need the help of supercomputers. All of the following is also regarded as eScience:
- handling these large amounts of data
- executing those algorithms on the data
- enabling scientists to work with the data through GUIs, workflow engines, etc.
This is what most people, and Wikipedia as well, emphasize.
I think I am also more into the second area, so I will explain a bit more about that.
Think about the following scenario, related to meteorology, to understand the use case.
A country might have a large number of weather stations reporting various weather conditions to a central location. In the case of the US, IIRC, there are about 144 weather stations. Each weather station sends data, say, once an hour. If the file sent by each weather station is about 1GB (the actual size depends on the resolution of the measurements), then we get about 150GB per hour. There are algorithms that go through this data and mine it to find interesting weather conditions. For example, one algorithm might find a set of storms in the data. Since this first phase acts on each station's data separately, there has to be another algorithm to aggregate the results: if the first algorithm reports 5 storms, a few of those reports may actually refer to the same storm. Likewise, there are different algorithms that can be run on top of this data.
Scientists can either run individual algorithms on this data, or they can define workflows to run on it. For example, they can design a workflow which will
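To make the numbers and the two phases a bit more concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: the `Detection` class, the flat x/y coordinates in km, the 50 km threshold and the greedy grouping rule are all made up for this example and are not how real storm-tracking algorithms work.

```python
from dataclasses import dataclass
from math import hypot

STATIONS = 144        # figure quoted above (from memory)
GB_PER_REPORT = 1     # assumed size of one hourly file from one station


def hourly_volume_gb(stations=STATIONS, gb_per_report=GB_PER_REPORT):
    """Data arriving at the central location per hour, in GB."""
    return stations * gb_per_report


@dataclass
class Detection:
    """One storm reported by the first phase (hypothetical structure)."""
    station: str
    x: float  # assumed flat map coordinates, in km
    y: float


def merge_detections(detections, max_distance_km=50.0):
    """Second phase: group detections that probably describe the same storm.

    Greedy rule (an assumption): a detection joins the first group whose
    first member lies within max_distance_km, otherwise it starts a new group.
    """
    storms = []
    for d in detections:
        for storm in storms:
            anchor = storm[0]
            if hypot(d.x - anchor.x, d.y - anchor.y) <= max_distance_km:
                storm.append(d)
                break
        else:
            storms.append([d])
    return storms


# Five reports from the first phase, some of them about the same storm.
reports = [
    Detection("A", 0, 0),
    Detection("B", 10, 10),
    Detection("C", 300, 300),
    Detection("D", 310, 295),
    Detection("E", 800, 0),
]
storms = merge_detections(reports)
print(hourly_volume_gb(), "GB per hour,", hourly_volume_gb() * 24, "GB per day")
print(len(reports), "reports merged into", len(storms), "storms")
```

With these made-up coordinates, the five reports collapse into three storms, and 144 stations at 1GB each give 144GB per hour, which is where the rough 150GB figure comes from.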
- first mine the data to find interesting conditions
- cluster them to identify unique conditions
- talk to individual weather stations to get more data, if needed
- come up with a scenario explaining the current conditions
- predict the path of the storm or its behaviour
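A workflow like this is essentially a chain of steps where each step consumes the output of the previous one. The Python sketch below shows that idea only. Everything in it is hypothetical: the step functions, the `region` field, the 90 km/h threshold and the sample readings are invented for illustration, and it covers just the mining, clustering and describing steps, leaving out contacting the stations and forecasting.

```python
def mine(readings, min_wind_kmh=90):
    """Keep only readings that look interesting (assumed wind threshold)."""
    return [r for r in readings if r["wind_kmh"] >= min_wind_kmh]


def cluster(readings):
    """Group interesting readings by region (a stand-in for real clustering)."""
    groups = {}
    for r in readings:
        groups.setdefault(r["region"], []).append(r)
    return groups


def describe(groups):
    """Summarise each group into a small scenario description."""
    return {
        region: {
            "stations": [r["station"] for r in members],
            "peak_wind_kmh": max(r["wind_kmh"] for r in members),
        }
        for region, members in groups.items()
    }


def run_workflow(steps, data):
    """Run (name, function) steps in order, feeding each result to the next."""
    log = []
    for name, step in steps:
        data = step(data)
        log.append(name)  # a real engine would record status, timing, errors
    return data, log


STEPS = [("mine", mine), ("cluster", cluster), ("describe", describe)]

READINGS = [
    {"station": "S1", "region": "north", "wind_kmh": 120},
    {"station": "S2", "region": "north", "wind_kmh": 110},
    {"station": "S3", "region": "south", "wind_kmh": 20},
    {"station": "S4", "region": "east", "wind_kmh": 95},
]

result, log = run_workflow(STEPS, READINGS)
print(log)
print(result)
```

Keeping the steps as a plain list means a scientist could reorder, drop or add steps without touching the code that runs them, which is roughly what a workflow engine offers, at a much larger scale.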
To perform the above-mentioned tasks, there has to be some infrastructure that enables users to
- design, execute, monitor workflows
- perform data movements to and from computing resources. These movements are not easy, as they involve not only large amounts of data but also working with supercomputers, data centers, etc.
- schedule and monitor jobs in high-performance computing environments like clusters, grids, etc.
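To give a feel for the "schedule and monitor jobs" part, here is a toy Python sketch. It uses a local thread pool from the standard library as a stand-in for a real cluster or grid scheduler, which is a big simplification. The `analyse` function and the station ids are placeholders I made up.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def analyse(station_id):
    """Placeholder for an expensive computation on one station's file."""
    return sum(range(station_id * 1000))


def run_jobs(station_ids, workers=4):
    """Submit one job per station, then collect results as jobs finish."""
    results, failed = {}, []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Map each submitted job back to the station it belongs to.
        futures = {pool.submit(analyse, sid): sid for sid in station_ids}
        for future in as_completed(futures):
            sid = futures[future]
            try:
                results[sid] = future.result()
            except Exception:
                failed.append(sid)  # a real system would retry or alert
    return results, failed


results, failed = run_jobs(range(1, 6))
print("finished:", sorted(results), "failed:", failed)
```

Real HPC environments add queues, resource limits, data staging and much more on top of this, but the basic loop of submit, watch and collect is the same idea.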



