Ask Hack Learn Share!

The API call was something like this
`GET labelsapi.com/video/find.json?labels=romantic,comedy`

.
and the query to find the video is `romantic OR comedy`

Let’s compare both operators OR and AND to see how they differ and see why the API providing only an OR increases the complexity on the consumers of it.

Let’s say you have many videos, and you can add labels to them, e.g. `comedy`

,
`romantic`

, `drama`

etc.

Now let’s say you want to find a video that is a romantic comedy. With an AND
your query will be: `romantic AND comedy`

.

So, with AND you need a label per subset, the
orders of growth is constant `O(1)`

.

To use an OR for the same subset, your atomic labels above won’t do, i.e.
`romantic OR comedy`

will return all videos that are comedies but also romantic
ones even when they are a drama.

So with OR you need one more label for the interception of the subsets
`romantic`

and `comedy`

, e.g. you need a label like `romantic:comedy`

. And
that’s because you don’t have AND so you need to label the subsets
interceptions, which is what the AND does.

There’s one problem of OR: *you need a label per subset, plus another per
every interception of subsets that you want to query for*. Of course the
problem gets worse the more subsets you have and the more interceptions of
those subsets that you need to label.

The order of growth for **number of labels** is at best
sub quadratic depending of how you combine the atomic labels.
E.g. for 2 labels, you need a total of 3 labels, i.e. 1 per atomic label plus
another for the combination of them, for 3 labels you’ll need 6, a.s.o. The
order is then `O(n * (n + 1) / 2)`

i.e. it has a
quadratic behavior.

For your code it means **code complexity** in orders similar, both when a label
is added to or removed from a video.

Let’s see an example: at first your video only is a comedy, i.e. just one label
`comedy`

. Then an user labels it as romantic, you not only have to add the
atomic `romantic`

but also its’ combinations with other
labels that you need to query for, in this case `comedy:romantic`

.

It also increases the **complexity of debugging** a problem and you have that
big amount of labels to deal with.

You task will be then to reduce that number of tags as much as you can. As an
example you could sort the labels, then you just need one combination of them,
i.e. tags are sorted in the combinations `comedy:romantic`

and not unsorted ones
`romantic:comedy`

are allowed because then the amount of labels will double.

So by only providing OR the API meant for us *much greater orders of growth for
both the algorithm to generate labels, and the number of labels*.

Thanks to Charles Bernasconi for detecting a problem in my math.