68. BLIP: Bootstrapping language-image pre-training for unified vision-language understanding and generation