Industry Internship Projects
BiDexVLA: Hybrid Bimanual Dexterous Grasping Framework
Company: Samsung Research China
Duration: 2025.05 - 2025.10
Project Homepage: https://sevenfo.github.io/BiDexVLA/
Background & Objectives
Developed a hybrid framework combining Vision-Language-Action (VLA) models with traditional planning for bimanual robot grasping.
Key Contributions
- Designed a two-stage architecture: model-based pre-grasp generation followed by a predict-refine diffusion model fusing vision, force feedback, and point cloud data.
- Built a bimanual teleoperation data collection system with automated trajectory generation.
- Fine-tuned multimodal LLMs (e.g., Qwen2.5-VL) for semantic planning.
Key Achievements
- 41.8% increase in grasping success rate and 23.5% reduction in execution time compared to baselines.
- Paper “BiDexVLA: A Hybrid Framework for Fast and Robust Bimanual Dexterous Grasping” submitted to ICRA 2026 (Under Review).
Technical Highlights
- Multimodal perception fusion.
- Hierarchical planning.
- Real-time bimanual coordination.
CARE: Context-Aware Retrieval-Enhanced Reasoning
Company: MetaGPT
Duration: 2024.06 - 2024.12
Project Homepage: https://foundationagents.github.io/CARE/
Background & Objectives
Contributed to research on improving LLM context fidelity through a native retrieval-augmented reasoning framework.
Key Contributions
- Designed the CARE framework to handle context hallucination with native retrieval.
- Implemented two-phase training: supervised fine-tuning and curriculum-based reinforcement learning.
- Developed a reward function for reinforcement learning with Group Relative Policy Optimization (GRPO).
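For readers unfamiliar with GRPO, the following is a minimal sketch of its group-relative advantage step, in which each sampled response is scored against the other responses to the same prompt. It is not the CARE reward function or training code; the function name, the example rewards, and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the other samples in its group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an illustrative choice
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four responses sampled for the same prompt.
advantages = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```

Responses that beat the group mean receive a positive advantage and the rest a negative one, so no separate value network is needed to form a baseline.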
Key Achievements
- +15.29% average F1 improvement over baselines, with higher gains on complex datasets.
- +13.69% improvement on counterfactual benchmarks.
- Paper “Improving Context Fidelity via Native Retrieval-Augmented Reasoning” accepted at EMNLP 2025 (4th author).
Technical Highlights
- Curriculum learning for multi-hop reasoning.
- Evidence retrieval across model scales.
- Performance in long-context QA.
Aviation Big Data Natural Language Modeling Tool POC
Company: MetaGPT
Duration: 2024.06 - 2024.12
Background & Objectives
Participated in POC validation for a natural language tool for aviation data analysis using multi-agent systems.
Key Contributions
- Validated 7+ scenarios such as anomaly detection and path planning.
- Produced 15+ analysis documents.
- Built baselines with statistical, deep learning, and operations research models.
- Developed multi-agent prompt engineering.
Key Achievements
- Demonstrated natural language interface feasibility for aviation analytics.
- Implemented models for pattern recognition and optimization.
- Created an evaluation framework.
Technical Highlights
- Multi-agent collaboration.
- Prompt engineering for aviation tasks.
- Integration of modeling approaches.
Research Projects
Large Model-based Robot Visual Manipulation and Collaboration Methods Research
Institution: Beihang University
Duration: 2024.01 - Present
Funding: Siyuan Alliance Fund
Background & Objectives
Developed a multimodal multi-agent framework and imitation learning models for robot manipulation and collaboration.
Key Contributions
- Designed multi-agent task planning with visual observation, reflection, decision-making, skill management, and feedback.
- Built a simulation environment in Isaac Sim for lunar base scenarios.
- Collected demonstration data and trained skill models using ACT, Diffusion Policy, RT-1, and OpenVLA.
Key Achievements
- Improved manipulation capabilities in simulated environments.
- Optimized imitation learning for skills.
Technical Highlights
- Multimodal fusion.
- Hierarchical planning.
- ROS integration.
Embodied Intelligence Aircraft Online Autonomous Motion Planning Research
Institution: Beihang University
Duration: 2023.09 - 2024.09
Funding: CALT University-Industry Collaboration Fund
Background & Objectives
Implemented LLM-based planning for UAV autonomous motion in open environments.
Key Contributions
- Built point cloud semantic segmentation using OWLv2, SAM, and XMem.
- Applied prompt engineering with DeepSeek-Coder for code generation.
- Integrated the VoxPoser architecture with CoppeliaSim/AirSim/Isaac Sim and ROS.
- Added feedback for dynamic planning.
Key Achievements
- Enhanced adaptability and task success rates.
- Open-sourced components.
Technical Highlights
- Open-vocabulary detection.
- LLM code generation.
- Code Repositories: VLM Pipeline, VoxPoser Extension.
Academic & Early Projects
Distance Metric-based Meta-Learning Few-Shot Classification Method Research
Duration: 2023.01 - 2023.06
Type: Undergraduate Thesis
Background & Objectives
Proposed Graph Prototype Network (GPN) for few-shot classification using graph attention mechanisms.
Key Contributions
- Designed GPN architecture.
- Implemented PyTorch pipeline.
- Conducted experiments and visualization.
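As background on the distance-metric idea, here is a minimal sketch of plain prototype-based few-shot classification: each class prototype is the mean of its support embeddings, and a query takes the label of the nearest prototype. This is not the GPN architecture (it has no graph attention), and the function names and toy 2-D embeddings are made up for illustration.

```python
import math


def class_prototypes(support):
    """Average the support embeddings of each class into one prototype."""
    protos = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        protos[label] = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return protos


def classify(query, protos):
    """Return the label whose prototype is closest in Euclidean distance."""
    return min(protos, key=lambda label: math.dist(query, protos[label]))


# Toy 2-way 2-shot episode; a real pipeline would use encoder outputs here.
support = {
    "cat": [[0.0, 0.0], [0.2, 0.0]],
    "dog": [[1.0, 1.0], [1.0, 0.8]],
}
protos = class_prototypes(support)
prediction = classify([0.1, 0.1], protos)
```

In this toy episode the query sits next to the "cat" prototype, so it is assigned that label.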
Key Achievements
- Over 20% accuracy improvement vs. baselines.
- 69.4% on miniImageNet 5-way 5-shot.
Technical Highlights
- Metric and meta-learning.
- Graph attention networks.
Multimodal Deep Learning-based Bus Driving Safety Evaluation Research
Duration: 2021.07 - 2022.06
Type: Municipal-level University Student Innovation Project
Background & Objectives
Developed a system for detecting safety-related sound events on buses.
Key Contributions
- Designed CRNN model.
- Deployed with ONNX C++.
- Built Qt UI.
Key Achievements
- Rated “Good” on completion.
- Functional detection system.
Technical Highlights
- Audio processing with CRNN.
- Code Repository: HikvionProjectV2.
Hexapod Biomimetic Robot
Duration: 2020.07 - 2021.06
Type: University-level SRTP Project (Project Leader)
Background & Objectives
Led development of a hexapod robot, covering kinematic modeling and ROS-based navigation.
Key Contributions
- Modeled kinematics.
- Implemented ROS navigation.
- Developed Flask web interface.
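To give a sense of what leg kinematics involves, below is a minimal sketch of forward and inverse kinematics for a single 3-DOF leg (coxa, femur, tibia), a layout common on hexapods. It is not the project's code: the link lengths, the joint-angle conventions, and the function names are all assumptions chosen for illustration.

```python
import math

# Hypothetical link lengths in metres (coxa, femur, tibia).
COXA, FEMUR, TIBIA = 0.05, 0.10, 0.15


def leg_fk(yaw, lift, knee):
    """Foot position in the hip frame from joint angles in radians."""
    radial = COXA + FEMUR * math.cos(lift) + TIBIA * math.cos(lift + knee)
    z = FEMUR * math.sin(lift) + TIBIA * math.sin(lift + knee)
    return radial * math.cos(yaw), radial * math.sin(yaw), z


def leg_ik(x, y, z):
    """Joint angles that place the foot at (x, y, z) in the hip frame."""
    yaw = math.atan2(y, x)
    radial = math.hypot(x, y) - COXA  # distance left for femur and tibia
    dist = math.hypot(radial, z)
    if not abs(FEMUR - TIBIA) <= dist <= FEMUR + TIBIA:
        raise ValueError("target is outside the leg's reach")
    cos_knee = (dist**2 - FEMUR**2 - TIBIA**2) / (2 * FEMUR * TIBIA)
    # Negative branch: the tibia bends downward relative to the femur.
    knee = -math.acos(max(-1.0, min(1.0, cos_knee)))
    lift = math.atan2(z, radial) - math.atan2(
        TIBIA * math.sin(knee), FEMUR + TIBIA * math.cos(knee)
    )
    return yaw, lift, knee


# Round trip: solve for a reachable foot target, then recompute its position.
angles = leg_ik(0.15, 0.05, -0.10)
foot = leg_fk(*angles)
```

Feeding the solved angles back through the forward model is a quick consistency check before sending commands to servos.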
Key Achievements
- Rated “Good” on completion.
Technical Highlights
- Kinematic modeling.
- ROS and web development.
Competition Experience
National Smart Car Competition
- 2020: National First Prize
- 2021: National Second Prize
Competition Content: Path planning, perception, and control for intelligent vehicles.
For more project details and code implementations, please visit my GitHub.