An Apple Byte: Apple More Responsible In Training Its AI Models?

Following concerns about AI companies scraping data from the web to train their AI models, Apple has sought to present itself as a more ethical AI provider by highlighting how it has trained and developed its own AI models responsibly.

Apple Intelligence 
 
At its 2024 Worldwide Developers Conference, the tech giant introduced Apple Intelligence, a personal intelligence system integrated into iOS 18, iPadOS 18, and macOS Sequoia. Apple has also announced that it will use its own AI models alongside OpenAI’s technology to power its generative AI tools in iOS 18. OpenAI has been accused of scraping data from the web to train its models, e.g. using its ‘GPTBot’ crawler, although it says it curates data from diverse sources (which may include the web) in compliance with legal and ethical guidelines.

Apple has therefore sought to head off accusations and clarify its own training sources in a report it published about its two Apple Foundation Models (AFM), AFM-on-device and AFM-server. Apple does use its own Applebot web crawler to collect data from the web (which websites can opt out of). However, in the report, Apple says it has “created Responsible AI principles to guide how we develop AI tools, as well as the models that underpin them”. It also says its pre-training data comes from “a diverse and high-quality data mixture” (which it says includes no private Apple user data) and stresses its “extensive efforts” to “exclude profanity, unsafe material, and personally identifiable information from publicly available data”, and its “rigorous decontamination” of data.
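For website owners wondering what opting out looks like in practice, crawlers such as Applebot and GPTBot are generally steered through a site's robots.txt file. The sketch below is illustrative only: the robots.txt content, the `crawler_allowed` helper, and the example.com URL are all hypothetical, and the exact user-agent names each company honours should be checked against its own documentation. It uses Python's standard-library robots.txt parser to check which crawlers a given set of rules would turn away.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a site might publish to turn away two AI-related
# crawlers while leaving every other crawler unaffected.
ROBOTS_TXT = """\
User-agent: Applebot
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def crawler_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if these robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/article"  # placeholder URL
    for bot in ("Applebot", "GPTBot", "SomeOtherBot"):
        print(bot, crawler_allowed(ROBOTS_TXT, bot, page))
```

A robots.txt file is a request rather than an enforcement mechanism, so it only works where the crawler's operator chooses to respect it.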