
Yu Ting (余婷)

Date: 2022-03-22 10:22:18  Source: Discipline Office


1. Supervisor Profile

Name: Yu Ting, Associate Professor, Master's Supervisor

Email: yut@hznu.edu.cn

Supervised programs: Computer Science and Technology, Cyberspace Security, Software Engineering, Electronic Information

2. Research Areas

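Cross-modal analysis: applies artificial intelligence, machine learning, and deep learning algorithms to analyze and process multimodal data such as computer vision and natural language. Core topics include multimodal data fusion, knowledge association, and explainable reasoning, with specific work on video question answering, cross-media retrieval, and vision-language navigation;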

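Cross-modal interdisciplinary applications: focuses on innovative breakthroughs and interdisciplinary integration of frontier cross-media techniques in fields such as medicine and remote sensing;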

Cross-modal security: studies the security and privacy protection of cross-media intelligent systems in real-world deployment.

3. Courses Taught

Computer Networks, Python Programming, Mobile Application Development, and others.

4. Education and Work Experience

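Received a master's degree in Systems Analysis and Integration from Zhejiang University in 2013 and a Ph.D. in Computer Science and Technology from Hangzhou Dianzi University in 2021. Has worked at the School of Information Science and Technology, Hangzhou Normal University, since October 2021.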

5. Academic Profile

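Research focuses on cross-modal learning, visual question answering, and vision-language navigation, applying artificial intelligence and machine learning algorithms to build unified cross-media representations of computer vision and natural language. Currently serves as an executive member for CCF Multimedia Technology and as a member of the CCF technical committees on Natural Language Processing, Intelligent Robots, and Computer Vision. As first or corresponding author, has published more than 10 papers in leading international journals such as TIP, TMM, and TCSVT and at CCF-A international conferences such as CVPR and AAAI. Principal investigator of a National Natural Science Foundation of China (NSFC) Young Scientists Fund project and a Zhejiang Provincial Natural Science Foundation General Program project. Has also participated as a key member in several NSFC Key and General Program projects, Ministry of Education support programs, and the provincial "Jianbing" (尖兵) and "Lingyan" (领雁) R&D programs.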

6. Teaching and Research Projects

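[1] National Natural Science Foundation of China, Young Scientists Fund: Research on Video Question Answering Based on Cross-Media Hierarchical Deep Reasoning (62002314), 2020.01-2023.12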

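[2] Zhejiang Provincial Natural Science Foundation, General Program: Research on Video Question Answering with Cross-Media "Data-Knowledge" Joint Enhancement (LY23F020005), 2023.01-2025.12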

7. Representative Publications

[1] T. Yu, Y. Lin, J. Yu, Z. Lou, Q. Cui, “Vision-Guided Action: Enhancing 3D Human Motion Prediction with Gaze-informed Affordance in 3D Scenes,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025. (CCF A)

[2] T. Yu, Z. Tong, J. Yu, K. Zhang, “Fine-grained Adaptive Visual Prompt for Generative Medical Visual Question Answering,” in Proceedings of the 39th Annual AAAI Conference on Artificial Intelligence (AAAI), 2025. (CCF A)

[3] T. Yu, Y. Wu, Q. Cui, Q. Huang, J. Yu, “MossVLN: Memory-Observation Synergistic System for Continuous Vision-Language Navigation,” in IEEE Transactions on Multimedia (TMM), 2025. (SCI, CAS Q1 TOP)

[4] T. Yu, W. Lu, Y. Yang, W. Han, Q. Huang, J. Yu, "Adapter-Enhanced Hierarchical Cross-Modal Pre-training for Lightweight Medical Report Generation," in IEEE Journal of Biomedical and Health Informatics (JBHI), 2025. (SCI, CAS Q1 TOP)

[5] T. Yu, K. Fu, J. Zhang, Q. Huang, J. Yu, "Multi-Granularity Contrastive Cross-Modal Collaborative Generation for End-to-End Long-Term Video Question Answering," in IEEE Transactions on Image Processing (TIP), vol. 33, pp. 3115-3129, 2024. (CCF A; SCI, CAS Q1 TOP)

[6] T. Yu, K. Fu, S. Wang, Q. Huang, J. Yu, "Prompting Video-Language Foundation Models with Domain-specific Fine-grained Heuristics for Video Question Answering," in IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), 2024. (SCI, CAS Q1 TOP)

[7] T. Yu, X. Lin, S. Wang, W. Sheng, Q. Huang, J. Yu, "A Comprehensive Survey of 3D Dense Captioning: Localizing and Describing Objects in 3D Scenes," in IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), vol. 34, no. 3, pp. 1322-1338, 2024. (SCI, CAS Q1 TOP)

[8] T. Yu, B. Ge, S. Wang, Y. Yang, Q. Huang, J. Yu, "Consistency Conditioned Memory Augmented Dynamic Diagnosis Model for Medical Visual Question Answering," in IEEE Journal of Biomedical and Health Informatics (JBHI), 2024. (SCI, CAS Q1 TOP)

[9] T. Yu, J. Yu, Z. Yu, Q. Huang, Q. Tian, "Long-Term Video Question Answering via Multimodal Hierarchical Memory Attentive Networks," in IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), 2021, 31(3): 1051-8215. (SCI, CAS Q1 TOP)

[10] T. Yu, J. Yu, Z. Yu, D. Tao, "Compositional Attention Networks with Two-Stream Fusion for Video Question Answering," in IEEE Transactions on Image Processing (TIP), 2019, 29(1): 1204-1218. (SCI, CAS Q1 TOP)

[11] X. Dong, J. Zhang, J. Yu, T. Yu*, "3D human pose estimation with multi-hypotheses gated transformer," in Multimedia Systems, 30, 309, 2024. (SCI, JCR Q1)

[12] Y. Yang, J. Yu, Z. Fu, K. Zhang, T. Yu, X. Wang, H. Jiang, J. Lv, Q. Huang, W. Han, "Token-Mixer: Bind Image and Text in One Embedding Space for Medical Image Reporting," in IEEE Transactions on Medical Imaging (TMI), vol. 43, no. 11, pp. 4017-4028, 2024. (SCI, CAS Q1 TOP)

[13] Y. Zhan, J. Yu, T. Yu, D. Tao, "Multi-task Compositional Network for Visual Relationship Detection," in International Journal of Computer Vision (IJCV), 2020, 128: 2146-2165. (SCI, CAS Q1 TOP)

[14] Y. Zhan, J. Yu, T. Yu, D. Tao, "On Exploring Undetermined Relationships for Visual Relationship Detection," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, 2019. (CCF A)

[15] Z. Yu, D. Xu, J. Yu, T. Yu, Z. Zhao, Y. Zhuang, D. Tao, "ActivityNet-QA: A dataset for understanding complex Web videos via question answering," in AAAI Conference on Artificial Intelligence (AAAI), Hawaii, USA, 2019. (CCF A)

8. Granted Patents and Technology Transfer

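[1] A cross-media hierarchical deep reasoning framework for video question answering, 202011499931.2

[2] A multi-granularity contrastive learning collaborative generation method for long-span video question answering, CN202410280286.7

[3] A vision-language navigation method with multi-level cross-media fusion, CN202410915863.5

[4] A 3D generation method based on semantically enhanced hybrid reconstruction, CN202411384389.4

[5] A medical visual question answering reasoning method based on fine-grained visual prompts, CN202411384380.3

[6] A video question answering method with domain-specific fine-grained heuristic prompts, CN202411373766.4

[7] A memory-dynamic medical image question answering classification system and method under consistency constraints, CN202410462542.4

[8] A medical report generation method enhanced by a lightweight spatial adapter, CN202411007351.5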