The first ideas of artificial intelligence (AI) were developed in the mid-1950s by a team of researchers led by John McCarthy (Schmidhuber 348-349), with help from Marvin Minsky, Allen Newell, and Herbert Simon. At the time, these men were overly optimistic about the future of AI, predicting that within twenty years machines would be capable of doing anything a human could do, and that AI as a whole would become error-free. In the 1990s, the technology industry began using AI for text mining and data retrieval. The development of natural language processing (NLP) follows a timeline similar to AI’s: the first concepts were generated in the 1950s, but systems were generally not fully functional until the late 1980s. NLP gives machines the ability to understand spoken language by extracting and processing spoken words through semantic indexing. Natural language processing would eventually form the basis of Apple’s Siri and other virtual assistant applications.
Siri, Apple Inc.’s intelligent personal assistant, is widely accepted as the first application to use a natural language user interface to perform actions directed at it. The first concepts of Siri were developed through the U.S. Department of Defense’s DARPA program and SRI International’s Artificial Intelligence Center. Siri was the vision of Norwegian entrepreneur Dag Kittlaus, who developed it along with Adam Cheyer and Tom Gruber (Bosker). In February 2010, Siri’s co-founders were approached by the late Steve Jobs, who wanted to acquire the company and the rights to Siri. Jobs had a great deal of interest in speech recognition and in using it to create a speech interface. Apple acquired Siri in April 2010 and has been constantly fine-tuning and updating the still far-from-perfect application.
The first concepts of AI in language and natural language processing were created in the 1950s by Alan Turing. Turing published the article ‘Computing Machinery and Intelligence,’ which forms the basis of the Turing test as a criterion of intelligence (BBC-History-Alan Turing). After decades of little advancement in NLP, research conducted in the late 1980s started to offer credible results. The introduction of machine learning algorithms for language processing presented a viable method of recognizing spoken words. Decision trees and statistical models were among the earliest machine learning methods used: decision trees were found to produce results similar to hand-written rules, while statistical models make probability-based decisions from the input data. Machine learning algorithms offer many advantages over hand-written rules, such as greater reliability, more focused learning procedures, and considerable time savings.
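As a rough illustration of the statistical approach, the sketch below trains a tiny naive-Bayes-style classifier on a handful of labeled utterances and makes a probability-based decision for new input. The training phrases and category names are invented for the example; real systems train on vastly larger corpora.

```python
import math
from collections import Counter, defaultdict

# Tiny labeled corpus (invented for illustration).
training = [
    ("what is the weather today", "weather"),
    ("will it rain tomorrow", "weather"),
    ("set an alarm for six", "alarm"),
    ("wake me up at seven", "alarm"),
]

# Count words per category and how often each category occurs.
word_counts = defaultdict(Counter)
label_counts = Counter()
for sentence, label in training:
    label_counts[label] += 1
    word_counts[label].update(sentence.split())

vocab_size = len({w for c in word_counts.values() for w in c})

def classify(sentence):
    """Pick the category with the highest log-probability."""
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        # Prior: how common the category is overall.
        score = math.log(label_counts[label] / sum(label_counts.values()))
        total = sum(word_counts[label].values())
        for w in sentence.split():
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log((word_counts[label][w] + 1) / (total + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

print(classify("will it rain today"))  # -> "weather"
```

Unlike a hand-written rule, nothing here says “rain means weather”; the decision falls out of the word counts alone, which is the advantage the statistical methods offered.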
The way that AI operates in language processing is similar to a normal conversation between humans. A speaker must handle intent, generation, and synthesis: knowing what they want to communicate, how they want the communication put together, and how they want it delivered (Talking to Machines). The receiver is concerned with the perception, analysis, and incorporation of what was spoken. Perception, or recognition, is realizing what was said. Analysis is the in-depth part of communication, in which the information must be interpreted; this is done through syntactic interpretation (parsing), semantic interpretation, and pragmatic interpretation to find the meaning of what is being communicated. Finally, incorporation determines the believability of what was said. Humans can communicate effectively because they understand sentence structure and the different meanings of words. NLP systems have a harder time, because machines struggle with the ambiguities found in spoken language (Natural Language Processing). While developers expected this to be an easy process, it has proved far more difficult than originally thought.
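To make the receiver’s side concrete, the sketch below walks a toy utterance through recognition, a trivial subject-verb-object parse, and a semantic interpretation that produces a logic-style predicate. The vocabulary, grammar, and predicate format are all invented for the example; real systems are far more elaborate.

```python
# A toy version of the receiver's pipeline: perception -> analysis -> meaning.
# The lexicon and logical form below are invented for illustration.

LEXICON = {
    "alice": "NOUN", "bob": "NOUN",
    "calls": "VERB", "likes": "VERB",
}

def perceive(utterance):
    """Perception/recognition: split the signal into word tokens."""
    return utterance.lower().split()

def parse(tokens):
    """Syntactic interpretation: accept only subject-verb-object."""
    tags = [LEXICON.get(t) for t in tokens]
    if tags == ["NOUN", "VERB", "NOUN"]:
        return {"subject": tokens[0], "verb": tokens[1], "object": tokens[2]}
    raise ValueError("no parse for: " + " ".join(tokens))

def interpret(tree):
    """Semantic interpretation: map the parse to a logical form."""
    return f'{tree["verb"]}({tree["subject"]}, {tree["object"]})'

print(interpret(parse(perceive("Alice calls Bob"))))  # -> calls(alice, bob)
```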
Since the mid-1950s, AI researchers have developed several programming languages for artificial intelligence. Some of these languages are (AlanTuring.net):
- IPL, developed in 1956, was the first of these languages. It was created for general problem solving.
- Lisp is a practical mathematical notation based on lambda calculus. Developed in 1958, Lisp remains one of the most widely used languages for AI. One of Lisp’s major data structures is the linked list, and Lisp source code is itself made up of lists. Lisp programs can therefore manipulate source code as a data structure, enabling the creation of new syntax and domain-specific languages (a rough Python analogue of this code-as-data idea appears after this list).
- Prolog, a declarative language useful for symbolic reasoning and database/language parsing applications, was developed in 1972, and is still widely used today.
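Lisp’s code-as-data property is easiest to show in Lisp itself, but a rough analogue can be sketched in Python with its ast module: the program below parses an expression into a data structure, rewrites part of it, and runs the result. This is only an analogy; Lisp macros operate on code far more directly.

```python
import ast

# Parse source code into a tree we can inspect, and run it as-is.
source = "1 + 2 * 3"
tree = ast.parse(source, mode="eval")
print(eval(compile(tree, "<orig>", "eval")))  # 7

class DoubleNumbers(ast.NodeTransformer):
    """Rewrite every numeric literal n into the literal n * 2."""
    def visit_Constant(self, node):
        if isinstance(node.value, (int, float)):
            return ast.Constant(value=node.value * 2)
        return node

# Treat the code as data: transform the tree, then run the new program.
new_tree = ast.fix_missing_locations(DoubleNumbers().visit(tree))
print(eval(compile(new_tree, "<rewritten>", "eval")))  # 2 + 4 * 6 = 26
```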
These programming languages, as well as others, are widely used by AI researchers and are consistently being updated and fine-tuned. Yet even though artificial intelligence researchers are constantly updating their programs and applications, many people still think that AI is not necessarily a good thing.
Judith Shulevitz, in her article ‘Siri, We Have to Talk,’ provides a troubling look into the effects of interactions between children and AI. In studies of children’s interactions with robots and virtual assistants, children became overly reliant on the machines and showed diminished emotional responses. Children’s psychological and moral development may also suffer, and they may become selfish as the machines do everything for them. There is also a concern that AI hampers children’s learning by handing them all the answers and reducing the amount of research they must perform themselves.
In 2011, Apple Inc. released Siri, the virtual personal assistant that was supposed to be the magical answer to everything that ailed the AI field before its debut. Siri was supposed to have all of the answers, but like many other projects before it, Siri had many loose ends and limitations of its own. Siri’s voice recognition system is provided by Nuance Communications, a speech technology company that also created the very successful Dragon Dictation system. The words spoken to Siri are recorded and compressed, then sent to Nuance’s speech-to-text engine and Siri’s AI-like language processing engine. From there, Siri has to figure out what was said and, depending on the inquiry, either answers locally via the phone and the apps found on it, or performs queries through the Internet and Siri’s back-end services.
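The dispatch step described above can be sketched as a simple routing decision: handle the request on the device when a local app can answer it, otherwise send it to a back-end service. Every function and intent name here is a hypothetical stand-in, not Apple’s actual API.

```python
# Hypothetical sketch of an assistant's local-vs-remote dispatch.
# None of these names correspond to Apple's real services or APIs.

LOCAL_HANDLERS = {
    "set_alarm": lambda req: f"Alarm set for {req['time']}",
    "call_contact": lambda req: f"Calling {req['name']}...",
}

def query_backend(intent, request):
    # Placeholder for a network round trip to a remote service.
    return f"[remote] answering '{intent}' via back-end services"

def handle_request(intent, request):
    """Answer on-device if possible; otherwise query a back-end service."""
    if intent in LOCAL_HANDLERS:
        return LOCAL_HANDLERS[intent](request)   # local apps: alarms, calls
    return query_backend(intent, request)        # web search, knowledge queries

print(handle_request("set_alarm", {"time": "6:30 AM"}))
print(handle_request("weather_forecast", {"city": "Boston"}))
```

The split matters for the limitations discussed later: everything on the second path fails outright when the phone has no network connection.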
Building Siri’s voice is an arduous process. Scripts are recorded by live voice actors, which can take months to complete. Words and sentences are then analyzed, catalogued, and tagged in a large database, a process performed by both linguists and linguistic software (Vox Technica). Nuance’s text-to-speech engine then looks for the correct pieces of recorded sound and combines them with other recorded sounds to create words and phrases that may never have been spoken by a voice actor, but sound like something the actor might say. This process of voice building is called ‘unit selection’ or ‘concatenative speech synthesis’ (Machine Language). It is similar to cutting the letters out of a sentence and pasting them back together to form different sentences. We learn to speak before we can write, so we rarely think about how we do it, and we certainly do not think about the small fluctuations of stress, intonation, pitch, speed, tongue position, or the relationships between phonemes. All of these factors must be taken into account for a computer to reproduce a human voice.
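A toy version of unit selection is sketched below: stored recordings of small sound units are looked up and spliced together to form a word the voice actor never actually recorded. Real systems join audio waveforms and weigh acoustic context; here, labeled strings stand in for recordings, and the unit inventory is invented.

```python
# Toy concatenative synthesis: build unseen words from recorded units.
# Strings stand in for audio waveforms in this illustration.

RECORDED_UNITS = {
    # phoneme-like unit -> "recording" (a placeholder for audio samples)
    "HH": "h-sound", "EH": "e-sound", "L": "l-sound", "OW": "o-sound",
    "W": "w-sound", "ER": "er-sound", "D": "d-sound",
}

def synthesize(units):
    """Look up each unit's recording and splice them in order."""
    missing = [u for u in units if u not in RECORDED_UNITS]
    if missing:
        raise KeyError(f"no recording for units: {missing}")
    return " + ".join(RECORDED_UNITS[u] for u in units)

# "world" was never recorded as a whole word, but its units were.
print(synthesize(["W", "ER", "L", "D"]))
# -> w-sound + er-sound + l-sound + d-sound
```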
There are many operations and functions that Siri performs well. You can ask Siri for the weather forecast, to make a call, to send messages, or to set alarms or reminders, to name a few. This is where Siri, the virtual assistant, excels. When given commands, and when the information being asked of it is found locally within the phone, Siri provides quick and detailed responses. Siri also works well with the built-in apps on the iPhone, such as music, weather, contacts, reminders, mail, and messages. Siri provides tremendous safety benefits, too, by allowing hands-free texting, calling, and web searching while driving. Siri’s problems, or limitations, are exposed when it needs to go outside the iPhone and submit queries via the Internet or other back-end services.
One of the biggest difficulties that Apple Inc.’s Siri consistently faces is in the area of discourse modeling and semantics. Siri knows the words that are being spoken, as there are thousands of words for it to choose from, but it has a hard time understanding the meaning, or semantics, of language (Behind Apple’s Siri). While Apple depends on Nuance’s speech recognition database, which relies on statistical methods, the actual language models are largely overlooked. As a result, Siri can only retrieve information from particular datasets, while overlooking basic sentence structure and the different meanings of the same words. Siri, like many other voice recognition systems, has trouble resolving the ambiguity inherent in a sentence’s syntax and semantics (Epstein). Natural conversations that humans understand completely are difficult for a computer to process. Translating language into logic has been, and still is, one of the most difficult aspects of voice recognition. Programmers have yet to work out all the complex ambiguities present in language and how to make computers understand them clearly. This may get ironed out in the future, but it is currently one of Siri’s biggest limitations.
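The ambiguity problem can be made concrete with a classic example: “I saw the man with the telescope” has two valid syntactic readings, and nothing in the sentence itself says which is right. The sketch below simply enumerates both structures; choosing between them is the hard, unsolved part.

```python
# Two legitimate parses of the same sentence, shown as nested structures.
sentence = "I saw the man with the telescope"

parses = [
    # Reading 1: the prepositional phrase attaches to the verb
    # (the telescope was used for the seeing).
    ("S", ("NP", "I"),
          ("VP", "saw", ("NP", "the man"), ("PP", "with the telescope"))),
    # Reading 2: the prepositional phrase attaches to the noun
    # (the man is the one holding the telescope).
    ("S", ("NP", "I"),
          ("VP", "saw", ("NP", "the man", ("PP", "with the telescope")))),
]

for i, p in enumerate(parses, 1):
    print(f"parse {i}: {p}")
# Both parses are grammatical; deciding which one the speaker meant
# requires context that the sentence alone does not provide.
```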
Even though Siri is constantly being fine-tuned, refined, and updated, there are still other limitations on how the application operates. One big issue arises when there is little or no wireless or network coverage in the area: if Siri cannot connect to the Internet, detailed search queries are not possible. Another major limitation is Siri’s difficulty understanding different regional dialects and accents. Speech recognition is based on two applications, dictation and command recognition (Talking to Machines), and the many accents and dialects found around the world make both difficult, forcing some people to modulate their voices for Siri to understand them. This makes for an unnatural kind of conversation, which can be very frustrating. Siri also does not understand all of the spelling variations common throughout the English language. Names that sound the same but are spelled differently, like Sarah and Sara, are commonly confused by Siri (HowStuffWorks). Despite its limitations, Apple Inc.’s Siri is still very useful, but Apple is facing serious competition from its rivals.
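The Sarah/Sara confusion stems from homophones: two spellings with one pronunciation. A quick way to see why they collide is a phonetic key such as Soundex, sketched below in simplified form; both names reduce to the same code, so sound alone cannot pick the spelling. This is an illustrative algorithm, not how Siri actually resolves names.

```python
def soundex(name):
    """Simplified Soundex: keep the first letter, encode the rest by sound."""
    codes = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
             **dict.fromkeys("dt", "3"), "l": "4",
             **dict.fromkeys("mn", "5"), "r": "6"}
    name = name.lower()
    encoded, prev = name[0].upper(), codes.get(name[0], "")
    for ch in name[1:]:
        code = codes.get(ch, "")
        if code and code != prev:   # skip vowels and repeated codes
            encoded += code
        prev = code
    return (encoded + "000")[:4]    # pad/truncate to four characters

print(soundex("Sarah"), soundex("Sara"))  # S600 S600 -> identical keys
```

Because both spellings map to the same phonetic key, a recognizer needs outside context, such as which name appears in the user’s contacts, to choose correctly.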
Google’s Google Now and Microsoft’s virtual personal assistant, Cortana, are both very capable of supplanting Siri as the leader of the pack. Cortana seems to be the flashiest and most able of the three at this point. While not yet released, Cortana is said to be able to do everything that Siri can do and more. The biggest difference among the three is the amount of data each can draw from. Apple Inc.’s Siri is limited to its own apps, while Cortana and Google Now will be open to third-party developers and can draw on their companies’ massive search engines and enormous cloud processing power (Hill). This seems to be a normal, healthy rivalry, and one that will go on for years, with each company trying to outdo the others for supremacy.
Natural language processing, AI, and virtual digital assistants have been around for years and are not going away anytime soon. The constant work being put into making them better will persist. There will also always be those who think that AI and computers can do everything a human can do, and more. That debate will not end anytime soon, and I look forward to seeing what researchers come up with.