While the tech industry is enthusiastic about generative artificial intelligence, one giant is holding back: Apple. The company has yet to introduce so much as an AI-generated emoji, and according to a New York Times report today and an earlier report from Bloomberg, it is in preliminary talks with Google about adding the search company’s Gemini AI model to the iPhone.
However, a research paper quietly posted online by Apple engineers on Friday suggests the company is making significant new investments in artificial intelligence that are already bearing fruit. It details the development of a new generative AI model called MM1, which can process both text and images. The researchers show it answering questions about photos and displaying the kind of general knowledge skills demonstrated by chatbots such as ChatGPT. The model’s name is not explained, but it likely stands for MultiModal 1.
MM1 appears to be similar in design and complexity to various AI models recently launched by other tech giants, including Meta’s open-source Llama 2 and Google’s Gemini. Work by Apple’s competitors and by academics suggests that models of this type could be used to power capable chatbots, or to build “agents” that solve tasks by writing code and taking actions, such as using a computer interface or a website. That suggests MM1 may yet find its way into Apple’s products.
“The fact that they are doing this shows that they have the ability to understand how to train and build these models,” said Ruslan Salakhutdinov, a professor at Carnegie Mellon University who led artificial intelligence research at Apple several years ago. “It requires a certain amount of expertise.”
MM1 is a multimodal large language model (MLLM), which means it is trained on both images and text. This enables the model to respond to textual prompts and answer complex questions about specific images.
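To make that architecture concrete, here is a minimal, purely illustrative sketch of the MLLM pattern just described: an image encoder maps pixels into the same embedding space as text tokens, and a single language-model backbone processes the combined sequence. Every module, size, and name below is a hypothetical stand-in, not Apple’s MM1.

```python
import torch
import torch.nn as nn

class ToyMultimodalLM(nn.Module):
    def __init__(self, vocab_size=1000, dim=64):
        super().__init__()
        # Stand-in image encoder: projects a flattened 32x32 RGB image
        # into a single "visual token" embedding.
        self.image_encoder = nn.Linear(32 * 32 * 3, dim)
        self.token_embed = nn.Embedding(vocab_size, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.lm_head = nn.Linear(dim, vocab_size)

    def forward(self, image, token_ids):
        visual = self.image_encoder(image.flatten(1)).unsqueeze(1)  # (B, 1, dim)
        text = self.token_embed(token_ids)                          # (B, T, dim)
        seq = torch.cat([visual, text], dim=1)  # image token precedes the text prompt
        return self.lm_head(self.backbone(seq))  # per-position vocabulary logits

model = ToyMultimodalLM()
logits = model(torch.rand(1, 3, 32, 32), torch.randint(0, 1000, (1, 8)))
print(logits.shape)  # torch.Size([1, 9, 1000]): 1 visual token + 8 text tokens
```

The key idea is that once images live in the same token space as text, one model can answer questions that refer to both, as in the example that follows.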
An example from Apple’s research paper shows what happened when MM1 was given a photo of a sun-dappled dining table with a couple of beer bottles, along with an image of a menu. Asked how much someone would pay for “all the beer on the table,” the model correctly read off the prices and tallied up the cost.
When ChatGPT launched in November 2022, it could only ingest and generate text, but more recently its creator, OpenAI, and others have worked to extend the underlying large language model technology to handle other types of data. When Google launched Gemini, the model that now powers the company’s answer to ChatGPT, last December, it claimed that the model’s multimodal nature opened an important new direction in artificial intelligence. “Following the rise of LLMs, MLLMs are emerging as the next frontier in foundation models,” Apple’s paper states.
MM1 is a relatively small model as measured by its number of “parameters,” the internal variables that are adjusted as a model is trained. Kate Saenko, a professor at Boston University who specializes in computer vision and machine learning, said this could make it easier for Apple’s engineers to try different training methods and refinements, and then scale up when they find something promising.
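For readers unfamiliar with the term, a parameter count is simply the number of trainable values in a model. The snippet below, in the same hypothetical style as the earlier sketch, shows how that count is typically computed.

```python
import torch.nn as nn

def count_parameters(model: nn.Module) -> int:
    # Sum the element counts of every trainable tensor in the model.
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# A single 1024x1024 linear layer already holds about a million parameters;
# large language models stack enough such layers to reach the billions.
layer = nn.Linear(1024, 1024)   # weight: 1024*1024 values, bias: 1024 values
print(count_parameters(layer))  # 1049600
```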
Saenko said that for a corporate publication, the MM1 paper provides an unusual amount of detail about how the model was trained. For example, the engineers behind MM1 describe tricks for improving the model’s performance, including increasing the resolution of images and mixing text and image data during training. Apple is famously secretive, but it has previously shown unusual openness about artificial intelligence research as it tries to attract the talent it needs to compete in the crucial technology.
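One of those recipe choices, mixing text-only and image-text data during training, can be pictured as a weighted sampler deciding where each training batch comes from. The sketch below is a simplified illustration with made-up source names and mixture weights; the actual datasets and ratios are detailed in Apple’s paper.

```python
import random

# Hypothetical mixture of training-data sources; the weights here are
# illustrative placeholders, not the ratios reported in the MM1 paper.
SOURCE_WEIGHTS = {
    "image_caption_pairs": 0.45,
    "interleaved_image_text": 0.45,
    "text_only": 0.10,
}

def next_batch_source(rng: random.Random) -> str:
    # Draw the source of the next training batch according to the mixture.
    return rng.choices(list(SOURCE_WEIGHTS), weights=list(SOURCE_WEIGHTS.values()), k=1)[0]

rng = random.Random(0)
print([next_batch_source(rng) for _ in range(8)])
```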