In part one of this series we looked at some specific ways that AI broke down. What is most notable is that the AI did some frankly impressive things. It managed to find text in an image, recognize the general context of the image, and provide a good translation of Japanese text. However, it failed to do these things simultaneously in a reliable way. It might recognize the context of the image and spit out an entirely made-up phrase based on that context that had nothing to do with the text. The problem is that the AI is "general." Since it is supposed to do everything, and it has not been programmed specifically for any particular task, there is always the real danger that it will do something other than what is wanted, such as "make up something that this image might say, ignoring the text" instead of "give me an accurate translation of the text." If it only did one thing, like copying text out of an image, and we could clearly command it to do that one thing, then it wouldn't make these sorts of mistakes. That is, the "general" nature of the AI is what is causing the problems, which is a real shame considering the cool things it can do when it does work.
That raises the question: why pursue general AI in the first place? If you look at the trendy articles about AI, they are all about things like large language models, where the AI is trained to make associations over extremely large data sets without being programmed for a specific task. What I mean is this: suppose we want to sort a list. The old-fashioned way is to simply choose a sorting strategy, like mergesort or quicksort, and give the computer an explicit list of commands in code to carry out that strategy. Articles now take it for granted that what you really want to do is use some sort of large language model to interpret the user's request, and then train it until it magically sorts the list. Perhaps you throw in a web search as part of the process, so that your AI might find a website that sorts lists for you and use that. Obviously this is going to be much less efficient than the old way, on top of introducing many new ways for errors to show up. So why bother?
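To make the contrast concrete, here is a minimal sketch of the old-fashioned way: a textbook mergesort written out in Python, purely as an illustration and not taken from any particular program discussed here. Every step is spelled out, so there is no room for the computer to decide to do something else.

def merge_sort(items):
    # Explicit instructions: split the list in half, sort each half,
    # then merge the two sorted halves back together.
    if len(items) <= 1:
        return items
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])   # whatever remains in either half is already sorted
    merged.extend(right[j:])
    return merged

print(merge_sort([5, 2, 9, 1]))   # [1, 2, 5, 9]

Twenty-odd lines, no training data, and it never answers with anything other than a sorted list.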
The most charitable reason would be that in some cases we might not be able to get what is needed from one source alone. For example, suppose you want a computer to translate a manga. Japanese text often gives no indication of a person's sex, but in English, in order to have a natural sentence, it's often necessary to pick "he" or "she" (or possibly "it" if we're talking about an object). Now suppose you have a simple sentence in Japanese like "来た!" This basically means "(something) has arrived!" with the "something" unstated. Given the way this is used, "He's here," "She's here," or "It's here" might all be valid translations. Which do we pick? In a manga, the character saying this is probably referring to another character who will be visible in the image, and that tells us. Suppose we really did have a computer program that could extract text from images flawlessly, another that could translate Japanese text, and another that could replace the text in a manga with an English phrase (rephrasing as necessary to fit the space). If we used each of these individually, there would be no way for the computer to determine the proper translation, since no single part can see both the text and the character it refers to. Thus it is argued that we need a "general" AI to fully understand all parts of the manga.
Fair enough, though I have some responses. First of all, even if we only had programs for each part, it would be trivial for a human using the computer to pick the correct translation, even if he did not understand Japanese. That is, perhaps our goal shouldn't be a computer that can do everything, but a computer that does enough to make our lives simpler. Keep in mind that the current state of general AI does not work reliably. From what I have seen, it sometimes will get the correct phrase, identify the context (such as a female character arriving), and use that to arrive at the translation "She's here!" But it could equally well identify a female character, ignore the text, and produce a translation like "She's pretty!" Thus, at least at the current stage, a general AI is going to require a human to check its output. Note that in this case the human will need to understand Japanese, at least at a basic level, since we can no longer rely on the translation the computer gives being accurate. That is, if it says the text means "She's pretty!" and there's a guy looking at a pretty girl, well, that could be what it means if you didn't understand the text at all. You'd have to know that the characters 来た have nothing to do with beauty. That leads me to my next point: how does the computer know what really matters for the translation? There will be many things on the page that have nothing to do with the phrase. For example, maybe the computer says it means "Look at the cat!" because there's a cat two panels later (and this is a very real risk, since in my experience the AIs we have now tend not to follow the right-to-left reading order common in manga). Or maybe it translates differently because the main character is wearing a blue shirt, or because there are 8 panels on this page, or because of the font used. You might say "but clearly none of that matters!" The truth is that the AI can't possibly know that; no one ever told it what matters. It is just able to get good associations of "context" most of the time because of the insane amount of data fed into it (data which had already been interpreted by humans, at that). There is always a risk that this time its methods pull a completely random detail as the appropriate "context" and that causes it to screw up the translation.
The response to that will be that this is just the early stage of the technology. Let's grant that for the sake of argument. The fact remains that general AI is only really needed for specialized applications like comic translation. There's no real need to use this sort of generative AI to do things like sort lists, generate calendars, do web searches based on specific keywords, find files, answer common tech support questions, etc. These can all be programmed using far more efficient methods that won't randomly decide to do something other than what you asked. And they certainly won't run into the "lazy AI" problem (i.e., when you say something like "please sort this list" and it responds "sure thing, I'll do it tomorrow" or "I've done it and posted it to github!" because it is mimicking what it saw in e-mails about work requests rather than actually carrying out the task). Why would we even consider using this sort of AI for problems that do not require it?
As chance would have it, I've already given much of the answer in my AI article from two years ago. In particular, I noted that there are four types of people who flock to "AI":
Let's look at how general AI benefits each of these groups:
What really connects all these things together is that general AI is "magic." I mean, it isn't, but for each of the groups above it feels that way in one way or another. For the normie who knows nothing about it, of course it is magical. But for the CEO, it magically makes existing problems go away. (Really it doesn't, but it makes them appear to go away, and isn't that just as good?) Here is a charmingly titled essay from someone ranting about that attitude. You should really read that article too, but here are some of the main points to keep in mind. First, CEOs are pushing AI hard even while their basic technological infrastructure is falling apart. But fixing infrastructure is hard and unrewarding; at best you get back to where you were. If you can convince yourself that the new AI will not only resolve existing problems but open up new applications, well then it really is a magic fix, isn't it? Next, there is definitely a lot of exaggeration, but everyone ignores it because of the hype. For example, a third of companies polled said that they use AI for key strategic decisions, which is insane and everyone knows it, but because AI is the current big thing people can say that and not only keep a straight face but actually get rewarded for it. The author notes that this kind of fraud is hardly anything new (trade shows for products that don't exist, with entirely fake mock-ups, aren't just a Dilbert joke) and that we must assume a lot of AI "successes" are more of the same. Notably, he makes the same remark I do, that for most applications generative AI is completely unnecessary regardless of how it turns out in the future, but he goes further and notes that even if it does work out, lazily throwing AI onto everything you have now is not going to be a good move (because what we have now doesn't do the job, and if it does work out as well as people claim, the new stuff will be better than anything you have and will be available to everyone anyway). However, remember that the CEO isn't necessarily trying to do what is actually most beneficial for his company; he just wants a magic fix that makes the problems go away. And that's exactly what generative AI claims to do, and as long as the hype remains in place, claiming to fix the problem is as good as fixing it as far as the CEO is concerned.
Not a lot to say about the snake-oil salesmen. They are of course selling this as a magic fix-all, but they did that with every previous technology: "just have your appliances connect to the network," "just make a social media page," "just focus on e-commerce," "just make a website," "just buy a new computer," etc. Now it's "just use AI." Honestly, for them the technology doesn't really matter in the same way that it does for the other groups, because in their minds the purpose of technology is to get sold, not necessarily to work.
But for the true believers, the magic of AI is very, very important. It's hard to ignore that AI has failed to deliver on its promises decade after decade. We were supposed to be getting weather reports a year out by 1970, and the forecast still isn't accurate even ten days ahead. Similarly, image processing and self-driving cars have been "ten years out" for something like half a century. Even the most Pollyanna of programmers has to catch on eventually, so a real paradigm shift is needed. The key thing about generative AI for the true believer is not really what it does or how it works; it's how it was made. You see, programs have traditionally been written directly by people, so maybe the problem is that people are just too dumb to make a true AI? Not understanding the models used by generative AI is a feature, not a bug, for the true believer. What particularly excites them is how the models produced results no one planned for, like a chatbot turning out to be able to solve some mathematics problems or translate languages it wasn't directly trained on. This required insanely high amounts of data and training time, but maybe that's the key? Make things complicated enough and, like magic, it suddenly becomes a "general" AI, even if it was never planned out that way.
I remember a series of papers I read, which unfortunately I cannot find a link for now, where some CS researchers basically stated this. That is, they kept track of how many mathematics problems various iterations of ChatGPT could solve. More advanced ones could solve more, but the key feature they touted was a sharp threshold: between a couple of versions (I forget which), it went from solving practically nothing to solving a huge amount. This was the magic moment when somehow the AI understood mathematics. But another group of researchers repeated the same tests, except that instead of using a "you get this right or you get this wrong" scale for correctness, they gave points for each digit matched. For example, if you asked "what is 2,013,419 + 3,157,219?" the first group of researchers would give no points for 5,170,633, since the last digit is supposed to be 8, while the second group would give 6/7 points since 6 digits match. On this scale the various iterations of the software improved rather smoothly, suggesting that there was no "magic moment" where the computer understood mathematics. Rather, as the training data increased, it had more and more mathematics to work with, leading to answers that were closer. After all, in a large enough text set you're eventually going to see "1+1 = 2," and so if your model is asked "what is 1+1?" you'll get the answer 2, but no one thinks this requires a knowledge of addition. If you asked the same model "what is 104+127?" it might never have seen a sum of two three-digit numbers. The closest output is "1+1=2," so it could output 2, or 204, or 217, or something like that. With a larger data set that included "104+127 = 231," of course it would get the answer correct, but larger sums might still throw it (though it would likely get some digits correct by applying patterns to parts of the sum).
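I can't reproduce the exact scoring those papers used, but the difference between the two grading schemes is easy to sketch. The following is a hypothetical illustration in Python, using the sum from above; the function names and the precise formula are my own guesses at the idea, not the researchers' actual metric.

def exact_score(expected, answer):
    # The first group's all-or-nothing scale: full credit or none.
    return 1.0 if str(expected) == str(answer) else 0.0

def digit_score(expected, answer):
    # The second group's idea, roughly: partial credit for every
    # digit position that matches the correct answer.
    e, a = str(expected), str(answer)
    matches = sum(1 for x, y in zip(e, a) if x == y)
    return matches / len(e)

print(exact_score(5170638, 5170633))   # 0.0
print(digit_score(5170638, 5170633))   # 0.857... (6 of 7 digits match)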
The strange thing to me is that there would be any resistance to this latter interpretation. It fits with how LLMs are supposed to work and how we've observed them to work. But many in the CS world don't like that explanation, since it takes away the "magic moment" where the AI becomes a real boy. If humans are too dumb to make real AI, and you can't simply have it magically come into being by throwing in more junk, what else is there to do? It needs to be magic.
And that gets us back to why we have a quest for general AI even when we really don't need it. People don't want computers that perform useful tasks. Computers have been able to do that for years. Hell, I was giving examples of addition through LLMs, but the built-in Windows 3.1 calculator could handle those calculations no problem, and of course electronic calculators existed long before that. What people want is a magic fix for all their woes, and that is only possible with an entirely new type of thing that we do not and cannot understand. What they want is a god in the machine.
This ties into some more general trends with technology which we will deal with in the next essay.
August 7, 2024