How do you implement AI the right way, and use it?

Let’s face it: AI seems to be getting smarter and smarter, but this is just an illusion. AI is not smarter; it simply has more computational power behind it. Common sense, so natural to humans, is still something that today’s AI is missing. Neural networks are good at “recognizing” items in an image, but they can’t explain what exactly is happening in the image in a wider context.

When a human sees an image, she will describe what is in the picture and what the recognized items are doing.

When AI takes a look at the picture, the most we can get with some confidence is a set of tags for the items recognized in the image. Let’s take a look at these images:

A human will see a girl hugging a rabbit on a desk. But AI will only “tag” a girl and a rabbit.

Or, let’s take a look at the following picture:

A human will see an elephant walking on the road. But AI will only tag an elephant and trees.

If you present an image to an image recognition system, what you’ll get back is a list of tags, each with a “recognized” probability. When I showed an image of myself to a convolutional neural network pretrained on ImageNet and fine-tuned on 20 images of me, it recognized me. The next tag it presented for me, with a slightly smaller probability, was a rhododendron.
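To make the “list of tags with probabilities” idea concrete, here is a minimal sketch in plain Python. It is not the network I used; the labels and raw scores below are made up for illustration, and `softmax` and `top_tags` are hypothetical helper names. It only shows how raw classifier scores are typically turned into a ranked tag list.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_tags(labels, logits, k=3):
    """Return the k most probable (label, probability) pairs."""
    probs = softmax(logits)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Hypothetical labels and scores, invented for this example.
labels = ["author", "rhododendron", "sofa", "elephant"]
logits = [4.0, 3.5, 0.5, 0.1]

for tag, prob in top_tags(labels, logits):
    print(f"{tag}: {prob:.2f}")
```

Note that the output is only a ranking of labels. Nothing in it says what the tagged items are doing or how they relate to each other, which is exactly the gap described above.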

No matter how smart AI systems seem to be, they still need human supervision.

So, how can we use AI in everyday life and still consider it reliable?

One possible answer is hybrid AI systems, which combine AI with human reasoning and supervision.

One example of how this is done is the following startup:

I tried it with the following image:

And I got the following “recognition”: white three seat sectional sofa with chaise.

One more example:

What the system saw there: Nos Etoiles Contraires by John Green hardcover book.

Another example:

What is shown in the image, according to them: pair of men’s black leather derby shoes.

How are they doing this? Instead of relying on AI alone, they pair an AI system with an army of human captioners. In the beginning, humans do most of the work of describing images, and the AI essentially watches and learns.

Over time, though, the AI starts to take over. As it learns more and more from its human trainers, it relies on them less and less to produce usable output. By the end of the training process, the humans are almost entirely out of the loop. However, they do sometimes remain involved, even once the system is in production.
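I don’t know how the startup actually implements this, but the general human-in-the-loop idea can be sketched as follows. Everything here is an assumption for illustration: the `HybridCaptioner` class, the 0.9 confidence threshold, and the toy `toy_model` and `toy_human` functions are all hypothetical. The AI answers on its own when it is confident; otherwise a human writes the caption, and that caption is saved as a new training example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

@dataclass
class HybridCaptioner:
    # model returns (caption, confidence in [0, 1]); human returns a caption
    model: Callable[[str], Tuple[str, float]]
    human: Callable[[str], str]
    threshold: float = 0.9  # assumed cut-off, would be tuned in practice
    training_examples: List[Tuple[str, str]] = field(default_factory=list)

    def caption(self, image_id: str) -> Tuple[str, str]:
        """Return (caption, source), where source is 'ai' or 'human'."""
        text, confidence = self.model(image_id)
        if confidence >= self.threshold:
            return text, "ai"
        # Low confidence: ask a human and keep the answer for retraining.
        corrected = self.human(image_id)
        self.training_examples.append((image_id, corrected))
        return corrected, "human"

# Toy stand-ins for a real model and a real human captioner.
def toy_model(image_id: str) -> Tuple[str, float]:
    if image_id == "sofa.jpg":
        return "white three seat sectional sofa with chaise", 0.95
    return "object", 0.40

def toy_human(image_id: str) -> str:
    return "pair of men's black leather derby shoes"

captioner = HybridCaptioner(model=toy_model, human=toy_human)
print(captioner.caption("sofa.jpg"))
print(captioner.caption("shoes.jpg"))
print(len(captioner.training_examples))
```

As the model is retrained on the collected examples, more of its answers should clear the threshold, so fewer images reach the humans. That matches the “humans are almost entirely out of the loop” stage described above, while still leaving a path back to a person for the hard cases.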

Quite amazing, for me at least.


