How do you keep control of a being that’s smarter than you? No one actually knows the answer, not yet. The greatest minds available are scrambling to find a solution before it’s too late. There seem to be three basic theoretical approaches: 1) don’t let it get smarter than you in the first place, 2) don’t let it do anything, and 3) make sure it wants to do what you want it to do.
The first approach is the most obvious: if it doesn’t get smarter than you at all, you haven’t got a problem. This approach mostly takes preventative measures, such as making sure you install a kill switch to be used at the first sign that the system’s getting too smart, and always having someone watching what the system’s doing. This is one of the reasons we’re trying to make AI whose inner workings you can inspect, to understand what it’s doing and why. If you can’t look at the AI’s thoughts, it can pretend to be less intelligent than it really is, and it can have hidden goals that you don’t know about until too late.
Unfortunately, there are problems with this approach. Even with as much human oversight as possible, there’s still a chance that things could happen so unexpectedly and so quickly that the whole situation’s out of control before anyone can do anything. Not to mention, it eliminates all the good a superintelligent system could do, because the whole point is to not have a superintelligent system. Right now this is the form of safety that is most heavily relied on by far, but it can’t hold up in the long term.
You’re also safe if you take the AI and, basically, put it in a box. This is the second approach, not letting the superintelligence do anything. If the superintelligence has no access to the internet, it can’t take it over. If the superintelligence can’t move in any way, it can’t build an army of robots. With this approach you ask the superintelligence to give you information, and then you’re very very careful about scouring that information to make sure it’s safe, and then at the end you have yourself some extremely useful information.
There are problems with this approach, too. As long as the superintelligence can transmit information, it can try to manipulate the people around it – and is likely to succeed, at least eventually. If the superintelligence can’t transmit information, then it’s totally useless and also might still figure out some clever way to escape somehow, by doing something you couldn’t have anticipated. You’d be taking the smartest thing in the observable universe and offering it absolutely no distractions from focusing its huge, constantly improving intellect on escaping. Experts on the matter seem to agree this is the weakest approach.
And finally, we have the big kahuna: making it so that the superintelligence only wants to do the things you want it to do. This is the ultimate goal of AI safety, the only way that we could let a superintelligence help us as much as it could, while still being safe. The thing is, figuring out what we actually want is really hard even for us, let alone trying to explain it in terms a machine could understand. The AI can’t take orders from just anyone, because then some psycho would tell it to nuke the world. But we can’t just tie it to a specific person or organization, because then they could use it against us – who would you trust with ultimate power? The AI has to give us what would actually be best for us, not just what we think we want, because everyone knows that humans are terrible at figuring out what would actually make us happy. And then someone has to figure out what actually counts as “best for us”, and the conversation goes on.
This is a very, very difficult problem. Whole institutions are built around solving this one thing. Careers are made of toiling over this, and scholarly papers are written about every new idea. We believe we are making progress – we just have to hope we solve it in time.
For more information I recommend Nick Bostrom’s groundbreaking book, Superintelligence: Paths, Dangers, Strategies.