LongCat 560B Bye Sonnet Why is

LongCat Flash is a 560 billion parameter model developed by Meituan, a Chinese food delivery company. This Mixture of Experts (MoE) model dynamically activates parameters, making it efficient. It beats Sonnet in benchmarks and excels at tool calling. It is currently challenging to run due to high compute requirements and limited provider support, though Shoots.ai offers it. LongCat Flash performs well in various coding and creative tasks, generating good floor plans and SVGs, handling creative coding challenges, and solving general riddles. Its tool-calling capabilities are impressive, exemplified by its quick and mostly one-shot generation of mobile and web applications. The model is praised for its performance, especially as a first-generation model, and its open-source nature. Challenges include lack of official API, quantization, and broader inference provider support.

LongCat Flash Overview #

Developer: Meituan, a Chinese food delivery and local services company.
Parameters: 560 billion total parameters.
Architecture: Mixture of Experts (MoE).
Dynamic Activation: Activates 18 to 31 billion parameters per token (averaging 27 billion) dynamically, improving efficiency.
Status: Free to use on LongCat's site; no account needed.
Upcoming Variant: Working on a "thinking variant" of the model.

Availability and Deployment #

Official API: No official API currently available.
Compute Requirements: Requires significant compute power, estimated at 8 H200 clusters.
Provider Support: Limited support from inference providers.
Shoots.ai: Offers access at $0.19/million tokens for input and $0.80/million tokens for output.
Self-Deployment: Deployable on Lightning AI with 8 H100s.
Frameworks: Official support in SGLANG; VLM has a pending PR.
Dependencies: Uses Flash Infer for speed.
Challenges: Not yet available on Olama; no quantization available.
Call for Action: Hope for broader inference provider support and an official API from LongCat.

Performance Benchmarks #

KingBench Tests: Ran all tests in under a minute.
Leaderboards: Ranks fourth, with performance comparable to DeepSeek GLM.
Comparison to Sonnet: Beats Sonnet in various benchmarks.

Creative Task Performance (KingBench Examples) #

Floor Plan Generation: Generates functional floor plans with visible walls (furniture placement is decent but not perfect).
SVG Creation: Excellent at creating SVGs (e.g., a panda SVG with a burger).
3JS Integration (Pokeball): Did not work.
Chessboard with Autoplay: Generated a functional chessboard with legal but sometimes "dumb" moves; logs of moves provided.
Kandinsky Minecraft Clone: Works but appears "glitchy" due to the style.
Butterfly Image: Resulted in a blank screen.
CLI Tool (Image Conversion): Worked well.
Blender Script (Pokeball): Generated elements but did not resemble a Pokeball.
General Riddles: Solved easily.

Tool Calling Performance (AI Coding) #

AI Coding Testing: Tested extensively for AI coding.
Speed: Amazingly fast when deployed on Lightning AI with 8 H200s.
Movie Tracker Mobile App (Expo): Successfully created a movie tracker app with well-executed tool calling, minimal failures, terminal command execution, and good code generation; mostly a one-shot generation with one error fix.
Movie Tracker Web App (Next.js): Also successfully built a functional web app version, working fast and well.

Overall Assessment and Recommendations #

High Praise: Considered a "really awesome model" and a significant achievement, especially for a first-generation model from a food delivery company.
Comparison to OpenAI: Positively contrasted with GPTOSS for its superior quality.
Recommendation: Highly recommended to check out via Shoots.ai or by deploying on a GPU cloud.
Hopes for Future: Hopes for quantization options, Olama availability, and an official APIプラットフォーム from LongCat.

last updated: 2025-09-03