Data input methods—and the people inputting that data—can significantly alter an A.I.’s behavior.
In 2018, researchers at MIT unveiled “Norman,” an A.I. trained to perform image captioning, a deep learning method for generating a textual description of an image. They trained Norman using only image captions from a subreddit known for disturbing content.
When Norman was ready, they tested it against a similar neural network that had been trained on standard data. Researchers fed both systems Rorschach inkblots and asked them to caption what they saw, and the results were striking: where the standard system saw “a black and white photo of a baseball glove,” Norman saw “a man murdered by machine gun in broad daylight.” The point of the experiment was to show that A.I. isn’t inherently biased, but that data input methods, and the people inputting that data, can significantly alter an A.I.’s behavior.
In 2019, new pre-trained systems built for natural language generation were released, and the conversations they learned from were scraped from Reddit and Amazon reviews. This is problematic: both Reddit and Amazon commenters skew white and male, which means that their use of language isn’t representative of everyone. It also illustrates an ongoing challenge within the developer community. It is already difficult to get authentic data from real people to train systems, and with new privacy restrictions, developers may choose to rely more on public (and problematic) data sets.
This trend is part of our section on Artificial Intelligence.