“We need even larger models,” Huang said. “We’re going to train it with multimodality data, not just text on the internet, we’re going to train it on texts and images, graphs and charts, and just as we learned watching TV, there’s going to be a whole bunch of watching video.”