The Quest for General AI: Part Three

Generalized Technology

In the last part of this series we looked at why people specifically want a "general" AI that can do anything, rather than a set of specialized AIs. It is more feasible to build a series of specialized AIs that each do something small very effectively, but these would require human input; a hypothetical general AI wouldn't require anything but an initial prompt. The downside, of course, is that general AIs tend not to work well at specialized tasks. The problem is not so much that they cannot do these tasks, but that, since they have no concept of what a specialized task even is, they will often do something other than what they were asked to do and pretend that they completed the task successfully, such as by making up websites that don't exist when asked to do a web search for specific information.

There are reasons specific to AI that have caused people to desire a general AI, which we went over in the previous essay. However, the idea that technology should be "general" is part of a long-running trend. In marketing terms the idea is the "all-in-one" device. Would you prefer to have a separate printer and scanner, or would you want both in the same device? Perhaps nothing exemplifies this trend more than the smart phone, which is (in theory) a phone, internet browser, camera, computer, music player, etc. In fact, the idea of technology being "all-in-one" has become so ubiquitous that many people struggle to understand why you would want a specialized device at all. How many times have you heard someone ask why anyone would need a digital camera when everyone has a smart phone? Or why bother with a CD player when you can stream from your smart phone? Indeed, the only devices still seen as necessary beyond the smart phone are those that obviously cannot be made to fit into your pocket, like a printer.

Of course, there are obvious upsides to having specialized devices. First, a specialized device can have features that work well for its specialty but would not be cost effective on a general device. It can also be designed in a way that is most convenient for that application, but might not be convenient generally. For example, take a digital camera. It will probably have a telescoping zoom lens to allow for some level of optical zoom. This would be too expensive to put into a smart phone, and would make it harder to carry, so smart phones do not have these. They may have "software zoom," but that is not the same thing (it is generally either a simple magnification of the image file or some form of software interpolation, and either way detail is lost). A dedicated digital camera will be shaped so that it can be held easily with two hands without blocking the screen, which requires it to be thicker than a smart phone usually is. The "take picture" and zoom actions are handled by actual physical buttons on the camera itself, while the smart phone requires virtual buttons on the touch screen. This means that the digital camera can be operated by touch alone, without blocking the preview of the picture being taken, which is not possible on a smart phone. The benefit of a specialized device is even more obvious when using a smart phone for gaming: with a controller or a keyboard you can easily use dozens of buttons by feel alone, without blocking the screen. A smart phone must either sacrifice its already tiny screen space for virtual buttons or handle everything through tap and drag actions on the screen itself, which requires the user to block the screen with his hands. That's not even getting into the superior visual and audio performance of a dedicated device, its increased processing power, etc.

Another advantage of specialization is that parts can be upgraded or replaced without replacing everything. This is why I abandoned all-in-one printers: I had an HP all-in-one which ran out of yellow ink (or at least it claimed to; I have my doubts based on the age of the cartridge). It then not only refused to print in black and white, but it also refused to use its scanner, which caused some major inconveniences for me. With a separate printer and scanner it is impossible for a malfunction in one to prevent the other from working. Separate devices also make upgrades easier. For example, if I have a black and white printer and I want to upgrade to a color printer, I might have problems if I also need it to be a scanner. Perhaps the color printers that meet my standards do not have scanners. Or if I like my printing quality but need higher resolution on my scans, I could be in a fix if there are no printers that also have scanners at that resolution. If I keep both separate, I have more options. With something like a smart phone, there is no way it can do as well as specialized devices in all of its various applications without driving the cost up to thousands and thousands of dollars. Similarly, if I have separate devices and one of them breaks, then I only lose access to that one application. If I have an all-in-one and something breaks, then I have to replace everything. This can be really inconvenient when the broken part is non-essential. For example, suppose that I have a smart phone and its outer camera lens gets damaged, but otherwise the phone is fine. Perhaps I can get it repaired (though that is less and less likely these days), but if not I am left with a dilemma: Do I keep the smart phone and make do, flipping the camera around to take pictures with the selfie lens? Or do I buy an entirely new phone even though the current one works perfectly fine for 90% of what I do? If I have a separate camera that is broken beyond repair, I just replace the camera.

Now there are obviously benefits to an all-in-one device, particularly in terms of convenience. If you have a flip cell phone, an mp3 player, a digital camera, a GPS unit, and something like a 3DS or a Switch, then you've hit most of the applications that a smart phone will give you. But this will be a lot to carry. So you can make a choice to have a smart phone for convenience. However, most people do not consciously make that choice. They choose the all-in-one device simply because that's the way things are done in the future, i.e. we are once again seeing future chic being the default rather than intentional technology use.

You can see a similar trend in how people use the web. In the early days of the web, i.e. the 90's, you would find individual homepages or other specialty pages that each had something interesting. If you needed something new, you would explore by crawling through directories, webrings, etc. As we got into the 00's we started to get more and more "one stop shops" on the web. Directories became "start pages" which offered not only links to other webpages, but also news, movie times and reviews, classifieds, etc. (You can get a real feel for this by looking at the front page of Yahoo! throughout the years.) At the same time e-commerce sites like Amazon and eBay became more and more comprehensive, and we also saw places like Wikipedia and YouTube show up. Now you could spend hours online while visiting no more than a dozen or so sites. This went into overdrive in the late 00's and throughout the 10's with the advent of social media. How many kids spend all day online without ever leaving TikTok? Even though the total amount of content online continued to increase year by year, the effective size of the web for the average user shrank, since the trend was to go to the same sites for everything.

Thus, in retrospect, it makes sense that the average user wants an AI that can do everything. Suppose you asked someone from the 90's or early 00's whether a language translation AI should also be able to give you recommendations on Spanish restaurants or chat about your day. The reaction would probably be "of course not, it's just something to translate languages." But that person would conceive of things that way because he would be used to dedicated language tools: textbooks, electronic translators, specific websites dealing with grammar, etc. Indeed, that's how he would have had to deal with everything. There was no "google it, and accept whatever comes up at the top of the search without even clicking on the link." In the 10's and 20's, however, that is how people generally interact with technology. Ask it a question, hope to get a result immediately, and if you don't, it's because it's fucking broken. Of course a person like that would expect everything to do everything.

In fact, the average user may have already been using the web like it was a general AI even before LLMs existed. This video by Pseudiom goes into detail about the various ways that Google searches have decayed, but the part that I have cued up is about the shift to "interpretative" searches. In the early days of search it was understood that the search engine was a tool. The search sites themselves would give you advice on how to use operators like AND and NEAR to craft good queries, there were many books going into more detail, and if you're my age you were probably taught about this in school. Those tips don't work now, since users tended to ask questions instead. For example, when looking for a recipe for a good cherry pie the classic technique was to search for something like '"cherry pie" AND recipe NOT Warrant' (the last bit to get rid of song lyric pages). But the average user was more apt to search "what is a good recipe for a cherry pie?" or even "hey, I'd like to bake a pie." Phone searches made this even more common; it's impossible to craft a good classic search through a voice command, and people tended to throw in a lot of conversational baggage when they searched in the middle of a conversation. Thus search engines were forced to parse conversation even before LLMs existed. The result was that users were largely pushed to the same sites: Wikipedia, YouTube, TikTok, etc. Search-engine-optimized sites killed off the rest of the results. This creates a feedback loop where even the users who would like to craft a precise search are forced to treat the search engine like a conversational partner, because that's the only thing that sort of works.

I am speaking from personal bias here, but my impression is that the 90's and the 00's had the greatest focus on specialized technology over general technology. Most software through that period had extensive documentation, as discussed here. Any true command line interface is necessarily going to focus on specific tasks, and these were a major focus of the 90's. Single-focus software was common, as mentioned here. These programs still exist, but are usually obscure. For example, I was still able to find this 3D architect software (which I've never used, so I have no idea of its quality). In the early 00's you could find software like this in any computer store. It would be a lot cheaper too; I have something from about '01 which does everything that the platinum version of this software does (plus has a garden planner), but it retailed for about $80 in contrast to the roughly $220 they are asking now. (Though to be fair, I suppose with inflation that $80 from 2001 is about $140 today.) The point is that in 2001, if you wanted to use a computer to plan out a home renovation project, there was software geared directly for that, and it was easy to get. Same thing for finding recipes, learning about classical musicians, sorting your D&D notes, etc. Now if you want to do the same thing you have the obscure piece of software that I linked to, or generalized CAD software which is unusable for the casual user. Hell, in 2020 people might just have told you to draw things out in Paint, and now people will probably tell you to ask an LLM to create an image of your house based off your description (and if that doesn't work, it's because we haven't trained it long enough). The very idea that there would be some sort of easy to use software that works well for what you are trying to do, but doesn't try to hit other applications, is practically unknown.

I feel sorry for the younger generations on this one, because if you didn't experience the age of utility software it is difficult to tell that you are missing something. If your experience with technology is just using a few general purpose video apps on your phone, then even using something like Excel will be difficult. (I have seen people born around 2007 struggle with the very concept of a spreadsheet program.) This creates a higher learning curve for anything beyond those general purpose apps, and so most people decide that it is not worth it. The general purpose stuff might be less reliable, but it can be used immediately. And to circle back to the original topic, this is a big part of why there is such hype for general AI on the user side. Maybe if we have an AI interpreting what I am writing, it WILL be effective (though as we have seen, it won't). Then I won't need anything else.

The only way that I see out of this is for the AI bubble to get built up to the point where basic functionality is locked behind AI interpretation (which most browsers and Windows certainly seem to be trying to do), then for the bubble to burst, with even basic tasks becoming impossible as the AI keeps doing something else (such as when we saw the AI make up stories for pictures rather than translating the text in them). At that point people will be so sick of it that "this does one thing, and one thing only, with no AI interpretation" will become a hot selling point, but until we get there the desire for a general AI is too great for people to see through the hype.

September 2, 2024
