Ask Hack Learn Share!
The API call was something like this
GET labelsapi.com/video/find.json?labels=romantic,comedy
.
and the query to find the video is romantic OR comedy
Let’s compare both operators OR and AND to see how they differ and see why the API providing only an OR increases the complexity on the consumers of it.
Let’s say you have many videos, and you can add labels to them, e.g. comedy
,
romantic
, drama
etc.
Now let’s say you want to find a video that is a romantic comedy. With an AND
your query will be: romantic AND comedy
.
So, with AND you need a label per subset, the
orders of growth is constant O(1)
.
To use an OR for the same subset, your atomic labels above won’t do, i.e.
romantic OR comedy
will return all videos that are comedies but also romantic
ones even when they are a drama.
So with OR you need one more label for the interception of the subsets
romantic
and comedy
, e.g. you need a label like romantic:comedy
. And
that’s because you don’t have AND so you need to label the subsets
interceptions, which is what the AND does.
There’s one problem of OR: you need a label per subset, plus another per every interception of subsets that you want to query for. Of course the problem gets worse the more subsets you have and the more interceptions of those subsets that you need to label.
The order of growth for number of labels is at best
sub quadratic depending of how you combine the atomic labels.
E.g. for 2 labels, you need a total of 3 labels, i.e. 1 per atomic label plus
another for the combination of them, for 3 labels you’ll need 6, a.s.o. The
order is then O(n * (n + 1) / 2)
i.e. it has a
quadratic behavior.
For your code it means code complexity in orders similar, both when a label is added to or removed from a video.
Let’s see an example: at first your video only is a comedy, i.e. just one label
comedy
. Then an user labels it as romantic, you not only have to add the
atomic romantic
but also its’ combinations with other
labels that you need to query for, in this case comedy:romantic
.
It also increases the complexity of debugging a problem and you have that big amount of labels to deal with.
You task will be then to reduce that number of tags as much as you can. As an
example you could sort the labels, then you just need one combination of them,
i.e. tags are sorted in the combinations comedy:romantic
and not unsorted ones
romantic:comedy
are allowed because then the amount of labels will double.
So by only providing OR the API meant for us much greater orders of growth for both the algorithm to generate labels, and the number of labels.
Thanks to Charles Bernasconi for detecting a problem in my math.