Wednesday, December 02, 2009

Search : Past, Present and Future

In my previous post on semantic search I discussed the drawbacks of current search engines and mentioned that it takes, on average, three Google searches to get the desired result.

Recently I read a paper on the evolution of search 3.0. It describes how search has evolved over time. This is what the author has to say in the paper:
"In the coming third decade of the Web, Web 3.0 (2009 - 2019), there will be another shift in the search paradigm. This is a shift from the past to the present, and from the social to the personal, and from the generic to the precise."

In short, the next generation of search will return results based on information supplied by the user. This means that either the user's data has to be available to the search engine, or the user will publish personal information (a virtual card) along with every request they submit. These details will be metadata driven and will be used by search engines to filter the results and tailor them to the user's requirements and expertise level.

What this means is that every piece of content published on the web must also publish metadata describing what the content is. The metadata must contain sufficient detail about the content and must be in a form that search engines can interpret. But metadata is just one side of the story. The search algorithms must be modified to make use of this metadata and to produce results that take the (published) user information into account. While some searches will be location independent, a few will need to be location sensitive, and their results must be valid for the user's current location.
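To make the idea concrete, here is a minimal sketch in Python of how a search engine might combine content metadata with a user's virtual card. Everything in it is an assumption for illustration: the `UserCard` and `Document` classes, the `level` and `location` fields and the `personalised_search` function are hypothetical names, and the keyword match stands in for a real ranking algorithm.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UserCard:
    """Hypothetical 'virtual card' sent along with each search request."""
    expertise: str  # e.g. "beginner" or "expert"
    location: str   # the user's current location


@dataclass
class Document:
    """A piece of content plus the metadata it publishes about itself."""
    title: str
    level: str                      # intended expertise level
    location: Optional[str] = None  # None means location independent


def personalised_search(docs, keyword, card):
    """Keep keyword matches whose metadata fits the user's card."""
    results = []
    for doc in docs:
        if keyword.lower() not in doc.title.lower():
            continue  # plain keyword filter, as in today's search
        if doc.level != card.expertise:
            continue  # wrong expertise level for this user
        if doc.location is not None and doc.location != card.location:
            continue  # location-sensitive content from elsewhere
        results.append(doc)
    return results
```

A location-independent document (no `location` in its metadata) passes the last check for every user, while a location-sensitive one is returned only when it matches the location on the card.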

Until Next Time...

Wednesday, November 11, 2009

Representing Uncertainty

In one of my earlier blog posts, Is AI a Possibility, I discussed the need for a 3rd state. For the past few days I have been thinking about scenarios where just returning a Boolean value, i.e. True or False, is not good enough.

Normally a function evaluates to either "True" or "False" based on whether the attributes of the entity meet the conditions defined in the rule. But at times the entity may not contain the attributes the rule needs in order to be evaluated properly. In that case we need a 3rd (Not Available) and a 4th (Not Applicable) state as the rule outcome. When a function returns Not Available, it implies that the entity does not contain the attribute needed for the rule to execute or process the object. Not Applicable, on the other hand, means the rule does not apply to the type of the entity in context.

So in total we have 4 return values for the function:
- True
- False
- Not Applicable
- Not Available
A Boolean outcome is therefore not a workable solution here, and we need an alternative representation for the function result. Not every function can evaluate to True/False. For those rules (functions) that cannot be evaluated, we need to find out which state applied (Not Applicable or Not Available). When we apply multiple rules to the same object in a sequence (workflow), the outcome is a set of conclusions, and that set must include which rules were evaluated and which ones could not be.

I am still puzzled as to how to represent the two extra states, considering that internally everything is represented as either 1 or 0, which does not seem to leave room for representing uncertainty.
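One possible way to approach it, as a rough sketch rather than a settled answer: the outcome does not have to fit in a single bit. Two bits are enough to tell four states apart, and most languages let us name those states with an enumeration. The Python below assumes entities are plain dictionaries. The `Outcome` enum, the `is_adult` rule and the `run_rules` helper are hypothetical names made up for this illustration.

```python
from enum import Enum


class Outcome(Enum):
    """The four possible results of evaluating a rule."""
    TRUE = "True"
    FALSE = "False"
    NOT_APPLICABLE = "Not Applicable"  # rule does not apply to this entity type
    NOT_AVAILABLE = "Not Available"    # entity lacks a required attribute


def is_adult(entity):
    """Hypothetical rule: applies only to persons and needs an 'age' attribute."""
    if entity.get("type") != "person":
        return Outcome.NOT_APPLICABLE
    if "age" not in entity:
        return Outcome.NOT_AVAILABLE
    return Outcome.TRUE if entity["age"] >= 18 else Outcome.FALSE


def run_rules(entity, rules):
    """Apply each rule in sequence and record its outcome by rule name."""
    return {rule.__name__: rule(entity) for rule in rules}
```

Because `run_rules` keeps one outcome per rule, the resulting set of conclusions shows which rules were evaluated and, for the rest, whether they were Not Applicable or Not Available.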

Until Next Time....

Monday, October 26, 2009

Representing Frequency

While reading about the Tree of Porphyry (as adapted by Ramon Lull in 1272) I learnt about the 10 questions that can be asked of any entity. But one thing missing from this list is a way to represent frequency. Suppose some process A takes place every 2 days. We need a mechanism to represent the repetition and the frequency at which it occurs. Ramon Lull describes When as the question that represents the date and time related attributes of the object, but it does not cover repetition.

What I propose is extending the 10 questions listed in the Tree of Porphyry by adding another question to the list: How Often. The purpose of How Often is to represent the frequency of a repetitive attribute of the object. It will have a few sub-attributes, such as a Value (How Much) and a Unit (What Kind). Together these describe the nature of the repetition.
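As a small sketch of what How Often could look like in code, the Python below pairs a Value with a Unit and uses them to work out when a repeating process occurs next. The `HowOften` class and the `next_occurrences` function are hypothetical names, and the sketch assumes the Unit is one that Python's `timedelta` accepts (for example "hours", "days" or "weeks").

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class HowOften:
    """Frequency of a repetitive attribute: a Value plus a Unit."""
    value: int  # How Much, e.g. 2
    unit: str   # What Kind, e.g. "days"

    def interval(self):
        """Translate Value and Unit into a time interval."""
        return timedelta(**{self.unit: self.value})


def next_occurrences(start, how_often, count):
    """List the next `count` occurrences after `start`."""
    step = how_often.interval()
    return [start + step * i for i in range(1, count + 1)]
```

For the example above, process A would carry `HowOften(2, "days")`, while its When attribute would still hold the date and time of a single occurrence.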

Until Next Time...!!!

Wednesday, October 21, 2009

Finding the Right Data Structure for Knowledge Representation

The most commonly used data structure today is the row-based data structure, in which one row represents the details of one instance of an entity type. But in the real world the representation of an entity is not flat. An object's representation contains several sub-attributes, and those sub-attributes may have sub-attributes of their own, which together make up the entire object. If we go by the row-based representation of the object, we cannot represent the sub-attributes and their relation to the main object.

Consider an example where a Person has FirstName, Surname, Home Address (Street, Suburb, State, Post Code) and Work Address (Company Name, Street, Suburb, State, Post Code). If we use a row-based representation here, our records look like this:

FirstName, Surname, HomeStreet, HomeSuburb, HomeState, HomePostCode, CompanyName, WorkStreet, WorkSuburb, WorkState, WorkPostCode.

The limitation here is that unless we uniquely name the Street, Suburb, State and PostCode attributes for both the Home and the Work Address, we will not be able to distinguish their real meaning. On the other hand, consider a structure like this:
Person
  - Name
    - First Name
    - Surname
  - Home Address
    - Street
    - Suburb
    - State
    - Post Code
  - Work Address
    - Company Name
    - Street
    - Suburb
    - State
    - Post Code

By looking at this structure we can easily tell that the Home Address is made up of 4 sub-attributes and the Work Address is made up of 5 sub-attributes. These in turn can have their own sub-attributes that define them in more detail.

It is evident that the hierarchical data structure provides more flexibility and room to grow than the flat row-based structure for representing a real-world object.
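The same Person can be sketched as a nested structure in Python. This is only an illustration: the sample values are made up, and `flatten` is a hypothetical helper that shows how the flat row, with its uniquely prefixed column names, can be derived from the hierarchy when needed.

```python
# Hierarchical representation: each sub-attribute lives under its parent,
# so "Street" can appear twice without any ambiguity.
person = {
    "Name": {"First Name": "Jane", "Surname": "Doe"},
    "Home Address": {
        "Street": "1 Example Street",
        "Suburb": "Sampleville",
        "State": "NSW",
        "Post Code": "2000",
    },
    "Work Address": {
        "Company Name": "Example Pty Ltd",
        "Street": "99 Placeholder Road",
        "Suburb": "Testtown",
        "State": "NSW",
        "Post Code": "2001",
    },
}


def flatten(node, prefix=""):
    """Collapse the hierarchy into one flat row with prefixed column names."""
    row = {}
    for key, value in node.items():
        name = prefix + key.replace(" ", "")
        if isinstance(value, dict):
            row.update(flatten(value, name))  # descend into sub-attributes
        else:
            row[name] = value
    return row
```

Here `person["Home Address"]["Street"]` and `person["Work Address"]["Street"]` are told apart by where they sit in the structure, whereas the flattened row has to encode that position in names such as `HomeAddressStreet` and `WorkAddressStreet`.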

Until Next Time...!!!

Tuesday, September 15, 2009

How do we achieve Artificial Intelligence?

Artificial Intelligence: I am sure many readers are familiar with this buzzword, created by research groups around the world not so long ago. It got us dreaming about the many things a machine could do that we do in our day-to-day life, making our lives simpler and easier. But what happened to most of those projects? They were either shelved or have very limited usability in our day-to-day life, though a few outcomes did turn out to be useful.

When I read about artificial intelligence, I keep asking one question: where did it all go wrong?

Let's define intelligence. Intelligence is the art of making the best choices based on what we know (or rather don't know). But what determines whether we know something or not? It is our ability to recall something we learnt in the past. Learning is associating facts with a context. Context defines how the entities are linked together. The linking does not have to be static.

So in a nutshell, in order to show some sort of intelligent behavior, we need to build a system that can:
  • Understand the context in which a particular fact is stated.
  • Retrieve the most appropriate rule that can be applied to the available facts.

The retrieve operation, in turn, depends on how the raw data is structured.

In my opinion it all comes down to how the data is structured (represented) and the reasoning mechanism that works on the data.
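A toy sketch in Python of what this could mean: facts are stored against the context in which they were stated, and retrieval picks a rule whose required facts are all known in that context. The `KnowledgeBase` class and its methods are hypothetical, and treating the rule with the most required facts as the "most appropriate" one is purely an assumption made for this example.

```python
from collections import defaultdict


class KnowledgeBase:
    """Facts linked to contexts, plus rules that fire on those facts."""

    def __init__(self):
        self.facts = defaultdict(set)  # context -> facts stated in it
        self.rules = []                # (context, required facts, conclusion)

    def learn(self, context, fact):
        """Learning: associate a fact with a context."""
        self.facts[context].add(fact)

    def add_rule(self, context, required, conclusion):
        self.rules.append((context, frozenset(required), conclusion))

    def best_rule(self, context):
        """Return the conclusion of the most specific applicable rule, or None."""
        known = self.facts.get(context, set())
        candidates = [
            rule for rule in self.rules
            if rule[0] == context and rule[1] <= known  # all required facts known
        ]
        if not candidates:
            return None
        # Assumption: more required facts means a more specific, better rule.
        return max(candidates, key=lambda rule: len(rule[1]))[2]
```

Even in this toy version, how easy the retrieve step is depends entirely on the choice to index the facts by context.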

Until Next Time..