The Rise of Multimodal AI: Fusion of Text, Image & Video