DisplayPort and HDMI Transmit technologies are highly complex mixed-signal designs that require both high-performance analog design (equalizer, PLL, DLL, CDR, etc.) and complicated logic design (HDCP ...
They've become such commonplace audio devices that it'd be odd not to see one speaker (or several) at parties, poolside, at the beach, or even strapped to the front of a mountain bike.
The ESP8266 is fully supported and the most mature target, but the ESP32 is also mostly there, with support for its built-in DAC as well as external ones. For real-time, autonomous speech synthesis, check out ESP8266SAM, a library which ...