Saturday, October 26, 2013

implementation of dependency click model

background

sometimes we are just not satisfied with the rank when implementing IR system, even it is based on sophisticated rank strategy.
therefore, user feedback is an important part of ranking system.
implementation of click model is one way to achieve this.
this article is based on the paper: http://research.microsoft.com/pubs/73115/multiple-click-model-wsdm09.pdf, which is presented by Microsoft Research.
it is called Dependency Click Model because it is considers dependency on position.


hypothesis

considering a list of query result, we simply assume that user will examine them strictly by order.
and user click these results that seems relevant, and may continue this examining process, then stop somewhere.
DCM is based on what this basic hypothesis implied.
for details on how it works, please read: http://research.microsoft.com/pubs/73115/multiple-click-model-wsdm09.pdf

engineering implementation

to implement such feature, we decompose the model into 4 processes.

a) model data storage
we propose this array-of-counter-pairs structure. each array-of-counter-pairs is an array for counter-pairs, which contain two counters, so it may look like this:
[(alpha, beta), (alpha, beta), (alpha, beta), (alpha, beta), ...]
element index indicates corresponding position.

we need to store global array-of-counter-pairs and array-of-counter-pairs for each keyword. 
this could be done with databases or simple binary file.
it may contain a hash table (key:keyword, value: array-of-counter-pairs). 

b) data recording and collection
we need to record exactly which result did user clicked for a query using specified keyword, and what result are in the list.
therefore, for a particular query, we may have a record like this (say session):
USERID, KEYWORD, SHOWN RESULTS, CLICKED DOCS

c) log analyzation
we need periodic analyze data record to update counter-pairs.
for global array-of-counter-pairs, each alpha indicates how many last click in session occur in corresponding position, while beta indicates how many click occur at corresponding position.
for keyword's array-of-counter-pairs, each alpha indicates how many click in session occur in corresponding position, while beta indicates how many impression occur at corresponding position (if it shows before last clicked position, then it is impressed based on hypothesis).

first update global counter for each recorded session, and then update that for specified keyword.
the relevance of individual document is alpha / beta.

d) rank adjustment
we can simply sort Top-N document by the computing score:
score = 0.3 * relevance + 0.7 * positionScore,
where positionScore is a constant corresponding to original position in list.


No comments:

Post a Comment