Video Discussion Points #
Intelligence Density of Qwen 3.5 Small Models
- Alibaba has released multimodal versions of Qwen 3.5 in sizes as small as 0.8B and 2B parameters.
- The concept of "intelligence density" allows these tiny models to handle reasoning, coding, and vision tasks previously reserved for much larger models.
- Benchmark highlights: The 2B model achieves an MMLU score of 66.5, outperforming the original Llama 2 7B (45.3).
- Models feature a significant 262K context window, enabling the analysis of large PDFs or codebases.
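A 262K-token window is large enough for book-length inputs. As a rough sizing check, a minimal sketch, assuming the common heuristic of roughly 4 characters per English token (real counts depend on the model's tokenizer):

```python
def fits_in_context(text: str,
                    context_tokens: int = 262_144,
                    chars_per_token: float = 4.0) -> bool:
    """Rough estimate of whether `text` fits in the context window.

    Uses the ~4 chars/token heuristic for English prose; actual token
    counts vary with the tokenizer and the language of the text.
    """
    estimated_tokens = len(text) / chars_per_token
    return estimated_tokens <= context_tokens

# A ~500-page book at ~2,000 characters per page:
book = "x" * (500 * 2_000)    # ~1,000,000 chars ≈ 250K estimated tokens
print(fits_in_context(book))  # → True
```

By this estimate, roughly a million characters of prose fits, which is why whole PDFs and mid-sized codebases are plausible inputs.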
Local Coding Performance (MacBook Pro)
- Tested offline using LM Studio and the "Cline" extension in VS Code.
- 0.8B Model: Successfully generated a simple cafe website using HTML/CSS/JS in 1 minute. However, the design was poor, and it attempted to hardcode invalid image URLs.
- 2B Model: Produced a cleaner, more aesthetically relevant design with functional sidebars. It took longer (3 minutes) and suffered from occasional infinite loops during the generation process.
- Conclusion: While impressive for their size, these models are not yet reliable for professional-grade development.
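The LM Studio workflow above can also be scripted: LM Studio serves loaded models over an OpenAI-compatible HTTP endpoint (by default at `http://localhost:1234/v1`), which is what extensions like Cline talk to. A minimal sketch, assuming a model loaded under the hypothetical identifier `qwen3.5-2b`:

```python
import json
import urllib.request

# LM Studio's default local endpoint (OpenAI-compatible).
LM_STUDIO_URL = "http://localhost:1234/v1/chat/completions"

def build_payload(prompt: str, model: str = "qwen3.5-2b") -> dict:
    """Build an OpenAI-style chat completion request.

    The model identifier is hypothetical; use whatever name LM Studio
    shows for the model you actually have loaded.
    """
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a coding assistant."},
            {"role": "user", "content": prompt},
        ],
        "temperature": 0.2,   # low temperature for more deterministic code
        "max_tokens": 1024,
    }

def ask(prompt: str) -> str:
    """Send the request to the local server and return the reply text."""
    req = urllib.request.Request(
        LM_STUDIO_URL,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    # Requires LM Studio running locally with a model loaded.
    print(ask("Write a single-file HTML page for a small cafe website."))
```

Because the server speaks the OpenAI wire format, the same script works unchanged against either the 0.8B or 2B model by swapping the model name.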
Mobile Capabilities on iPhone (Native Implementation)
- Tests were performed on an iPhone 14 Pro using a custom Swift app powered by Apple’s MLX framework (fully offline).
- Inference Speed: The 0.8B model demonstrated near-instant streaming responses, showcasing high efficiency on mobile hardware.
- Reasoning: Both the 0.8B and 2B models passed the "car wash" logic test that many larger models fail.
Multimodal Vision & OCR Testing
- Image Recognition: The models correctly identified objects like bananas but struggled with specific details (e.g., misidentifying a dog breed as a Golden Retriever or Pomeranian).
- OCR & Language Detection: The 0.8B model failed to identify the Latvian language, whereas the 2B model correctly identified it and accurately read the text.
- Hallucination: Visual descriptions were sometimes inaccurate; for example, the 0.8B model claimed a banana was overripe when it was not.
Uncertainty Regarding Qwen’s Future
- Reports indicate a major restructuring of the Qwen team at Alibaba.
- Key engineers and leadership are reportedly leaving to launch independent AI startups.
- The community is concerned that the pace of open-source breakthroughs from Qwen may slow down following the 3.5 release.
Summary #
The Qwen 3.5 small model series (0.8B and 2B parameters) represents a significant leap in "intelligence density," enabling native multimodal capabilities (text, vision, and code) on edge devices like smartphones and older laptops. In local testing, the models proved capable of passing logic tests and building basic websites, though they remain prone to hallucinations and performance loops. Despite their limitations, the ability to run 2B parameter models with high-speed inference on an iPhone 14 Pro marks a milestone for offline AI. However, internal restructuring at Alibaba creates uncertainty about the future development of the Qwen ecosystem.