Least Significant Bit

Ask Hack Learn Share!

API only provides queries with OR

Recently we had to integrate our system with a label API that provided OR as its unique operator in queries. For us as consumers of the API it meant that the number of labels we needed per object increased at orders of growth of quadratic of the labels of an object. As a consequence the complexity of our code also increased at quadratic orders of growth

The API call

The API call was something like this GET labelsapi.com/video/find.json?labels=romantic,comedy. and the query to find the video is romantic OR comedy

Let’s compare both operators OR and AND to see how they differ and see why the API providing only an OR increases the complexity on the consumers of it.

Tags: OR vs AND

Let’s say you have many videos, and you can add labels to them, e.g. comedy, romantic, drama etc.

Now let’s say you want to find a video that is a romantic comedy. With an AND your query will be: romantic AND comedy.

So, with AND you need a label per subset, the orders of growth is constant O(1).

To use an OR for the same subset, your atomic labels above won’t do, i.e. romantic OR comedy will return all videos that are comedies but also romantic ones even when they are a drama.

So with OR you need one more label for the interception of the subsets romantic and comedy, e.g. you need a label like romantic:comedy. And that’s because you don’t have AND so you need to label the subsets interceptions, which is what the AND does.

There’s one problem of OR: you need a label per subset, plus another per every interception of subsets that you want to query for. Of course the problem gets worse the more subsets you have and the more interceptions of those subsets that you need to label.

The order of growth for number of labels is at best sub quadratic depending of how you combine the atomic labels. E.g. for 2 labels, you need a total of 3 labels, i.e. 1 per atomic label plus another for the combination of them, for 3 labels you’ll need 6, a.s.o. The order is then O(n * (n + 1) / 2) i.e. it has a quadratic behavior.

For your code it means code complexity in orders similar, both when a label is added to or removed from a video.

Let’s see an example: at first your video only is a comedy, i.e. just one label comedy. Then an user labels it as romantic, you not only have to add the atomic romantic but also its’ combinations with other labels that you need to query for, in this case comedy:romantic.

It also increases the complexity of debugging a problem and you have that big amount of labels to deal with.

You task will be then to reduce that number of tags as much as you can. As an example you could sort the labels, then you just need one combination of them, i.e. tags are sorted in the combinations comedy:romantic and not unsorted ones romantic:comedy are allowed because then the amount of labels will double.

So by only providing OR the API meant for us much greater orders of growth for both the algorithm to generate labels, and the number of labels.

Thanks to Charles Bernasconi for detecting a problem in my math.

blog comments powered by Disqus