Sufferings and Life
>> Sunday, March 30, 2008
Last week I had to make a rather difficult decision: selecting XXX out of n options. About three weeks earlier I was upset because I didn't have any options. Last week I was upset because I had too many options to choose from and had to abandon all of them except one.
This reminded me of Ven. Ajahn Brahma's talk on "Your way to Happiness". He gave some wonderful examples of how we suffer due to our expectations. For example, when one is not married and is having problems with marriage, he (or she) has the "single person suffering". When he (or she) gets married, he (or she) has the "married person suffering". When he (or she) doesn't have kids, he (or she) has the "no kid suffering". When he (or she) has kids, he (or she) has "sufferings due to kids". So the point is that we will never get away from suffering. We will just exchange one form of suffering for another.
When will we be satisfied with no sufferings? If we think about this for a while, it is true for everything.
CBR and Workflows for e-Science
I was talking with my friend Jaliya about an interesting application of CBR (Case-Based Reasoning) to workflows in e-Science.
Most of the e-Science systems that use workflows run them on supercomputers. These operations can take a considerable amount of time even on supercomputers, and the overhead of the system itself might degrade performance. Sometimes we have to run multiple workflows to get a good decision or output.
If we take weather applications, for example, we run a couple of workflows to reach a final decision. What if we could look at the data, compare it with previous data and results, and then predict which workflows to run? It is true that no two weather events are alike (a quote from my friend Suresh Marru). But this method might give meteorologists really fast clues.
We tried to do this as a class project but failed, as we couldn't find enough data within the system. Still, to me this seems a very promising approach.
During one of our extreme lab brainstorming coffee sessions, we thought about applying CBR to scheduling jobs within a grid environment. If we have data such as process requirements (# of CPUs, time constraints, etc.) and previous information such as wait time, execution time, and affinity for each supercomputer, then we can build an AI system with concepts from CBR. An interesting project to do, if I get some time.
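To make the scheduling idea a bit more concrete, here is a minimal sketch of the "retrieve similar past cases" step of CBR. Everything in it is an assumption made for illustration: the Case fields, the distance measure, the machine names and the numbers are all made up, and a real system would need far richer case descriptions.

```python
from dataclasses import dataclass

@dataclass
class Case:
    """One past job (hypothetical fields, for illustration only)."""
    cpus: int        # number of CPUs requested
    hours: float     # run time requested, in hours
    machine: str     # supercomputer the job ran on
    wait: float      # observed queue wait, in hours
    run: float       # observed execution time, in hours

def distance(case, cpus, hours):
    # Relative differences, so CPUs and hours are on a comparable scale.
    return (abs(case.cpus - cpus) / max(case.cpus, cpus)
            + abs(case.hours - hours) / max(case.hours, hours))

def suggest_machine(cases, cpus, hours, k=3):
    """Retrieve the k most similar past cases and pick the machine
    with the lowest average turnaround (wait + run) among them."""
    nearest = sorted(cases, key=lambda c: distance(c, cpus, hours))[:k]
    turnaround = {}
    for c in nearest:
        turnaround.setdefault(c.machine, []).append(c.wait + c.run)
    return min(turnaround, key=lambda m: sum(turnaround[m]) / len(turnaround[m]))

# Made-up case base: machines "A" and "B" are placeholders.
past_jobs = [
    Case(64, 2.0, "A", wait=5.0, run=2.0),
    Case(64, 2.0, "B", wait=1.0, run=2.5),
    Case(60, 2.0, "B", wait=1.5, run=2.5),
    Case(1024, 48.0, "A", wait=0.5, run=40.0),
]
choice = suggest_machine(past_jobs, cpus=64, hours=2.0)
print(choice)
```

The large 1024-CPU job is ignored here because it is not among the three nearest cases, which is the whole point of retrieving by similarity before deciding.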
Example Based Learning
People who already know about this might disregard this approach as soon as they see it. But it seems to be close to the fundamental way we initially grasp a new spoken language. It is true that there is a limit to how far you can go with this approach. But this method is a good starter.
This method can be applied to translating between languages by learning from given examples. Let me give an idea of how this works and teach you some Sinhala ;).
English : I go to (the) school
Sinhala : Mama pasela ta yami
English : I go to the shop
Sinhala : Mama kade ta yami
Now if you are asked to translate "I go to campus", you can already translate most of it as "Mama xxxx ta yami". You just have to find the Sinhala word for campus and you are done.
This is the simplest form of learning one can use. As you can see, it becomes harder for long and complex sentences, but it is very useful when you start learning a language.
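The example above can be sketched in a few lines of Python. This is only a toy under strong assumptions: the two example sentences must have the same number of words and differ in exactly one position on each side, and I write the English examples with "the" every time so that they line up. The function names are my own, not from any library.

```python
def split_template(sentence_a, sentence_b):
    """Compare two example sentences word by word. Words that agree become
    the fixed template; the single differing position becomes the slot (None)."""
    a, b = sentence_a.split(), sentence_b.split()
    if len(a) != len(b):
        raise ValueError("examples must have the same number of words")
    template = [x if x == y else None for x, y in zip(a, b)]
    slot = template.index(None)  # raises ValueError if the examples are identical
    return template, a[slot], b[slot]

def learn(pair_a, pair_b):
    """Learn a source template, a target template and a tiny lexicon
    from two (English, Sinhala) example pairs."""
    src_tpl, src_a, src_b = split_template(pair_a[0], pair_b[0])
    tgt_tpl, tgt_a, tgt_b = split_template(pair_a[1], pair_b[1])
    return src_tpl, tgt_tpl, {src_a: tgt_a, src_b: tgt_b}

def translate(sentence, src_tpl, tgt_tpl, lexicon, unknown="xxxx"):
    """Translate a sentence that fits the learned template; return None otherwise."""
    words = sentence.split()
    if len(words) != len(src_tpl):
        return None
    slot_word = None
    for word, fixed in zip(words, src_tpl):
        if fixed is None:
            slot_word = word
        elif word != fixed:
            return None
    filled = lexicon.get(slot_word, unknown)
    return " ".join(filled if t is None else t for t in tgt_tpl)

model = learn(("I go to the school", "Mama pasela ta yami"),
              ("I go to the shop", "Mama kade ta yami"))
print(translate("I go to the campus", *model))
```

A word that is not in the learned lexicon is left as the placeholder "xxxx", which is exactly the part you still have to look up.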
Lesk Algorithm
I am taking an NLP (Natural Language Processing) course this time, and there seem to be quite a number of interesting algorithms in it. I will try to blog about some of them here once in a while, for the interest of my blog readers.
My interest in NLP doesn't mean I am going away from my major interest in Distributed computing :)
This time it is about the Lesk algorithm. Ambiguity is one of the challenges in NLP. Words like suit, chair, etc. have multiple meanings depending on the context. There are various ways of disambiguating them, and the Lesk algorithm is one of them.
Let's take the word "suit" as an example. It has meanings like clothes, lawsuit, fit for something, etc.
First we need to find sample sentences for each and every meaning of the word suit. For every meaning, we create a "bag of words" from those sample sentences. Basically, we collect all the words from the sample sentences and assign them to that particular meaning.
When we get a new sentence containing the word to disambiguate, we take all the words in that sentence and look for the best intersection with the bags of words we created earlier. The meaning associated with the largest intersection is considered the meaning of the word in the test sentence.
Let's take an example from the word "suit"
'suit of clothes': (set of garments (usually including a jacket and trousers or skirt) for outerwear all of the same fabric, design and color) "they buried him in his best suit"
'lawsuit': (comprehensive term for any proceeding in a court of law whereby an individual seeks a legal remedy) "the family brought suit against the landlord"
The bag of words for the meaning 'suit of clothes' will consist of {"set", "of", "garments", "usually", "including", ...}
Then we get a test sentence, with the word suit.
{'Mayor', 'William', 'B.', 'Hartsfield', 'filed', 'suit', 'for', 'divorce', 'from', 'his', 'wife', ',', 'Pearl', 'Williams', 'Hartsfield', ',', 'in', 'Fulton', 'Superior', 'Court', 'Friday'}
In the first approach, we remove all the stop words, like 'the' and 'in', from both the test and training sentences.
If we do that, you can see that the closest meaning for the given test sentence will be 'lawsuit'.
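Here is a minimal sketch of this first approach, using only the two meanings and the test sentence shown above. It is not the code from my assignment and does not use NLTK: the stop word list is a tiny hand-picked one (an assumption), and the function names are made up for this example. I also drop the target word "suit" itself, since it appears in every sample.

```python
import re

# A tiny hand-picked stop word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "of", "in", "for", "from", "his", "him",
              "and", "or", "any", "all", "same", "they"}

def tokenize(text):
    # Lowercase and keep alphabetic runs only, so punctuation is dropped.
    return re.findall(r"[a-z]+", text.lower())

def bag_of_words(samples, target):
    """All non-stop words from the sample texts, minus the target word itself."""
    return {w for s in samples for w in tokenize(s)
            if w not in STOP_WORDS and w != target}

def disambiguate(sentence, sense_samples, target):
    """Return the sense whose bag of words overlaps most with the sentence."""
    context = bag_of_words([sentence], target)
    scores = {sense: len(context & bag_of_words(samples, target))
              for sense, samples in sense_samples.items()}
    return max(scores, key=scores.get), scores

sense_samples = {
    "suit of clothes": [
        "set of garments (usually including a jacket and trousers or skirt) "
        "for outerwear all of the same fabric, design and color",
        "they buried him in his best suit",
    ],
    "lawsuit": [
        "comprehensive term for any proceeding in a court of law whereby "
        "an individual seeks a legal remedy",
        "the family brought suit against the landlord",
    ],
}
test_sentence = ("Mayor William B. Hartsfield filed suit for divorce from his wife, "
                 "Pearl Williams Hartsfield, in Fulton Superior Court Friday")

sense, scores = disambiguate(test_sentence, sense_samples, "suit")
print(sense, scores)
```

With these samples the only overlapping content word is "court", which is enough to pick 'lawsuit'. Without the stop word removal, words like "for" and "in" would match both meanings equally.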
There is a problem with identifying the stop words: some of them might not be stop words in certain contexts. To deal with this, the second approach uses an IDF measure instead. We take all the sentences as they are.
We get a set of training sentences and "augment" our bags of words with them. We then weight the overlapping words by their inverse document frequency (IDF). As a crude way to calculate the IDF of a word, we can treat the n files within a given corpus as the documents. For example, if the word brain appears in 12 of the 15 documents, its IDF is log2(15.0/12.0) = 0.3219 (using the base-2 logarithm).
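The IDF weighting can be sketched as follows. Again this is only an illustration, not my assignment code: documents are represented as plain sets of words, the toy documents are made up, and treating a word that appears in no document as if it appeared in one (to avoid dividing by zero) is my own assumption.

```python
import math

def idf(word, documents):
    """Inverse document frequency with a base-2 log: log2(N / df).
    A word seen in no document is treated as df = 1 (an assumption)."""
    df = sum(1 for doc in documents if word in doc)
    return math.log2(len(documents) / max(df, 1))

def weighted_overlap(context, sense_bag, documents):
    """Sum the IDF of every word shared by the test sentence and a sense's bag."""
    return sum(idf(w, documents) for w in context & sense_bag)

# The example from the text: "brain" appears in 12 of 15 documents.
corpus = [{"brain"}] * 12 + [{"other"}] * 3
print(round(idf("brain", corpus), 4))

# A made-up corpus where "the" is everywhere and "court" is rare.
docs = [{"the", "court"}, {"the", "suit"}, {"the", "jacket"}, {"the"}]
score = weighted_overlap({"the", "court"}, {"the", "court", "law"}, docs)
print(score)
```

A word such as "the" that appears in every document gets an IDF of zero, so it stops contributing to the overlap without anyone having to list it as a stop word.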
This method, even though it might be slow, gave me better disambiguation results than the first approach.
I wrote some simple Python code for this (it was one of our assignments). You need to have NLTK installed (installation notes) to run it.
It was amazing to see how well this really works, even though it sometimes seems dumb to me. This is one of the more trivial ways of capturing context in programs.
Interesting Questions
>> Sunday, March 09, 2008
I encountered some interesting questions on the Web, which seem to be common in Google and Microsoft interviews. See if you can answer these. I will publish my answers later.
1. You go one mile north, one mile east, one mile south, and you come back to the same point where you started. How many points like this are there on earth, and how do you find them?
(Hint: I got to know from a different source that if your answer is 1 or 2, you will not be hired; if it is infinity, you will be hired; and infinity * infinity is the correct answer!!)
2. There is a room at the corner of a hallway. It has a light in it and the door is closed. You have three switches (which of course have only on and off positions) at the other end of the hall. You are allowed to go into the room only once. How do you find the switch that controls the bulb in that room?
3. This is a famous question and is considered a Fermi problem.
How much will you charge to wash all the windows in your city?
4. How do you weigh a Boeing 747?
5. There are 5 pirates on an island. They have 100 gold coins. These pirates have seniority rankings: number one is senior to all the others, number two is the second most senior, and so on. The most senior pirate can propose a plan for distributing the coins. If it is accepted with 50% or more of the votes (including his own vote), he can execute that plan. If he loses the vote, he will be killed, and the next most senior pirate gets the chance to propose, and so on. You can assume that all the pirates think logically and all want to live. If you are the most senior pirate, what plan will get you lots of coins without getting you killed by the others?
6. How many times a day do a clock's hour and minute hands overlap?
None of these questions is stupid or has a stupid answer. All of them are logical. I will post my answers to these after some time.
There are a lot more questions like this, and I really like these brain teasers.
More questions : 1, 2, 3
Interesting book on these: How Would You Move Mount Fuji?
What is Facebook ... My Perspective
I joined Facebook recently, and I can still remember asking people what it was all about. The answer was "social networking", but that carried zero information for me at the time. When I log in to the Internet, the first thing I do is check my Gmail account. But I've seen lots of undergrads here for whom the first thing they do, once they log in, is check Facebook.
Later I think I got a grasp of it. It has different meanings for different people. Most students here use it as a way of being with the community: sharing photos, arranging and announcing meetings and parties, expressing themselves, etc.
For a different ethnic community it is a way of showing off. Looking at the photos they publish, the information on their profiles, and how they comment on others and their photos, it is sometimes funny to see how these people behave. Whenever they get something to show off, the first thing they do is put it on Facebook.
Facebook also makes you lose your private life, especially when others publish your drunk, naked (to a greater extent), humiliating photos. I remember Fil once mentioning a presentation he gave to his class, using humiliating photos taken from Facebook.
So whatever some people do, they forget that it is public on Facebook. Some people like this publicity. One day I'm sure they will understand.
Personally, I now like Facebook, mainly because I can be in touch with friends I know. I can also find my old friends. At least I can see photos of them once in a while (of course some of them are fun to see). I like some of the apps Facebook has, but sometimes it is irritating when I get messages like "X wanted to Hug you", "X wanted to compare movie taste with you", "X wanted you to be an angel", etc.
Sometimes it is funny and sometimes it is irritating, isn't it?
New Home
>> Friday, March 07, 2008
I was blogging on bloglines.com for some time, and now I have decided to move to blogspot once again. The main reason was the comments and image support here. I also wanted to link all my accounts in one place, including mail, docs, RSS readers, etc. So here I am.
