The speaker, Jakub Mrugalski, aims to show how AI can be useful for everyone, from work to hobbies. He introduces himself as a proponent of practical AI/LLM adoption. The presentation addresses common skepticism and misconceptions about LLMs in the context of IT and everyday tasks.
Skepticism Towards LLMs #
- Misconception 1: LLMs make mistakes and "talk nonsense."
- A mistake is an answer like 2+2=5; "talking nonsense" is entirely irrelevant output (e.g., 2+2 = "go to Wałbrzych").
- This often happens with unoptimized models or free, trial-grade implementations.
- Misconception 2: LLMs are slow.
- Compared to human-written code, LLMs are generally slower.
- Programmers often argue that their hand-written code is better, cheaper, faster, and easier to maintain and debug, and this is usually true.
- The fair comparison point is the task the LLM replaces, not hand-optimized code (e.g., a human taking 5 minutes to categorize a ticket versus the LLM doing it in 13 seconds).
- Misconception 3: Users have to explain things to LLMs, making it time-consuming.
- This is often true, requiring detailed step-by-step instructions and error handling.
- The speaker draws a humorous parallel: these same criticisms (makes mistakes, talks nonsense, slow, needs explaining) apply to a junior-level human employee or intern.
- Misconception 4: It's faster to do it yourself manually.
- Often true for a well-versed experienced professional.
The "Zdzisław" Problem: Automating Human Tasks #
- Identifying Automatable Work: Some work is repetitive (copy-paste, data entry, information retrieval) and can be automated even without AI.
- The Unscalable Human Element: Many IT systems have advanced tech but rely on a human ("Zdzisław") at the end for crucial tasks like approving tickets, assigning categories, or interpreting user intent. Zdzisław doesn't scale.
- AI as an Enabler: AI extends the "Zdzisław boundary," allowing code to perform tasks previously requiring human intervention, thus expanding capabilities.
- LLMs as Support for Human Intelligence: LLMs should augment, not replace, human intelligence, making tasks faster, more efficient, and with fewer errors. Zdzisław becomes a supervisor/verifier instead of a manual executor.
Practical Applications of LLMs #
- Content Moderation:
- Traditional regex/keyword-based moderation fails with nuanced language (e.g., the veiled Polish insult "ty klarnecie jeden", which contains no blacklisted keywords).
- Humans excel at understanding context and sentiment.
- AI can analyze sentiment and context across diverse vocabularies, often performing better than humans due to comprehensive linguistic knowledge.
- Moderation API (OpenAI) offers text and multimodal (image) categorization, often free to use, being faster and more effective than manual or self-coded solutions.
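A minimal sketch of what such a moderation call might look like, using only the standard library. The endpoint URL and model name (`omni-moderation-latest`) follow OpenAI's publicly documented Moderation API, but verify them against the current docs before relying on this; the helper names are my own.

```python
import json
import urllib.request

MODERATION_URL = "https://api.openai.com/v1/moderations"

def build_moderation_request(text: str, api_key: str) -> urllib.request.Request:
    """Build an HTTP request for OpenAI's moderation endpoint."""
    payload = {"model": "omni-moderation-latest", "input": text}
    return urllib.request.Request(
        MODERATION_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )

def is_flagged(response_body: str) -> bool:
    """Parse the moderation response and report whether the input was flagged."""
    result = json.loads(response_body)
    return result["results"][0]["flagged"]

# Actually sending the request needs a real API key:
# with urllib.request.urlopen(build_moderation_request("some comment", KEY)) as r:
#     print(is_flagged(r.read().decode()))
```

The point the speaker makes is that this handful of lines replaces an entire hand-maintained keyword/regex moderation pipeline.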
- Information Retrieval (Vector Databases):
- Traditional search (full-text, fuzzy, exact) struggles with descriptive, natural language queries (e.g., "app that looks like hand-drawn").
- Vector databases (embeddings): Represent text as numerical vectors to capture semantic meaning.
- Easier to use now than historically (lower barrier to entry).
- Process:
- Use an LLM to vectorize text (convert snippets of information into numerical vectors).
- Load vectors into a vector database.
- When a user queries, vectorize their query and compare it to vectors in the database to find semantically similar results.
- Chunking: The critical step of dividing documents into sensible "chunks" for vectorization (e.g., sentences, paragraphs, or logical parts). This ensures connected information isn't split.
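The chunk-then-compare pipeline above can be sketched end to end in plain Python. The bag-of-words "embedding" here is a deliberately crude stand-in for a real embedding model (in practice you would call an embedding API and store the vectors in a vector database); only the chunking and cosine-similarity mechanics are the actual technique.

```python
import math

def chunk_paragraphs(document: str) -> list[str]:
    """Split a document on blank lines so related sentences stay together."""
    return [p.strip() for p in document.split("\n\n") if p.strip()]

def embed(text: str, vocab: list[str]) -> list[float]:
    """Toy bag-of-words 'embedding' -- a stand-in for a real embedding model."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return [float(words.count(w)) for w in vocab]

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: how closely two vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def search(query: str, chunks: list[str], vocab: list[str]) -> str:
    """Return the chunk whose vector is most similar to the query vector."""
    qv = embed(query, vocab)
    return max(chunks, key=lambda c: cosine(embed(c, vocab), qv))
```

With real embeddings, a descriptive query like "app that looks like hand-drawn" matches semantically related chunks even when no keyword overlaps.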
- Anomaly Detection / Experience Capture (Fine-Tuning):
- Automating tasks where human expertise is hard to articulate (e.g., QA team knowing "at a glance" if a report is correct). This is "stealing someone's experience."
- Fine-tuning process:
- Gather positive and negative examples of classified data.
- Use these examples to "fine-tune" a model. This is not re-training but adapting the model's output style or inference based on samples.
- The training data typically uses the chatml format (system, user, assistant roles), stored as compressed JSONL files.
- Hundreds of examples are typically sufficient for new models.
- Example: Speaker fine-tuned a model for 30 cents to replicate his newsletter link selection (taste) and for 8 dollars to replicate his writing style.
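A sketch of assembling such a training set: one chat exchange (system, user, assistant) per JSONL line, roughly the shape OpenAI's chat fine-tuning format expects. The QA-report classification task and label names are illustrative, not from the talk.

```python
import json

def to_training_line(report: str, label: str) -> str:
    """One JSONL line: a complete chat exchange the model should learn to imitate."""
    example = {
        "messages": [
            {"role": "system", "content": "Classify QA reports as OK or SUSPICIOUS."},
            {"role": "user", "content": report},
            {"role": "assistant", "content": label},
        ]
    }
    return json.dumps(example, ensure_ascii=False)

def write_dataset(samples: list[tuple[str, str]], path: str) -> None:
    """Write positive and negative examples, one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for report, label in samples:
            f.write(to_training_line(report, label) + "\n")
```

Each line is one positive or negative example; a few hundred such lines are what "stealing someone's experience" amounts to in practice.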
- Addressing LLM "Hallucinations" and Limited Knowledge:
- Hallucinations: LLMs "hallucinate" by default; it's how they generate text. The goal is for them to hallucinate in the "right" direction (produce factual or desired output).
- Limited/Outdated Knowledge: Many production LLMs have knowledge cut-offs (e.g., September 2023).
- Online LLMs exist but can be prone to "SEO" spam/misinformation.
- Solution: Restrict knowledge source. For professional applications, don't rely on the model's inherent knowledge base. Provide a specific "playbook" or knowledge base and instruct the LLM to only use that. If information is outside the scope, it should state it cannot answer or escalate.
- In-context learning (Prompt Engineering): For tasks like a school shop bot, all relevant information (menu, hours, specific rules) is provided directly within the prompt. The LLM is explicitly told not to add outside information.
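A minimal sketch of such a restricted system prompt for the school shop bot; the marker strings and fallback sentence are my own conventions, not from the talk.

```python
def build_shop_prompt(knowledge: str) -> str:
    """System prompt that restricts the model to the supplied knowledge base."""
    return (
        "You are the school shop assistant.\n"
        "Answer ONLY from the information between the markers below.\n"
        "Do not add any outside information.\n"
        "If the answer is not there, reply exactly: "
        "'I don't know, please ask the staff.'\n"
        "### KNOWLEDGE ###\n"
        f"{knowledge}\n"
        "### END ###"
    )

# Example: the entire menu and opening hours go straight into the prompt.
prompt = build_shop_prompt("Open 8:00-15:00. Menu: sandwiches 5 PLN, juice 3 PLN.")
```

Because the whole knowledge base fits in the prompt, no retrieval step is needed for a task this small.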
- Managing LLM Output and Debugging:
- "Thinking" Process: LLMs "think aloud" (generate tokens sequentially). Giving them space to "think" (e.g., explicitly asking for reasoning steps before the final answer) improves accuracy.
- Parsable Output: Instruct LLMs to return answers in structured formats (e.g., JSON) so they can be easily processed programmatically. This may cost more (more tokens generated) but yields better, more reliable results.
- LLM Memory and Context Window ("Rag"):
- LLM context window limits (e.g., 128k or 1M tokens) mean that entire company documentation won't fit into a single prompt.
- Retrieval Augmented Generation (RAG):
- Vectorize a knowledge base (e.g., documents in a database).
- When a user asks a question, retrieve the most relevant "chunks" of information from the knowledge base using vector similarity.
- Dynamically insert these retrieved chunks into the LLM's prompt.
- The LLM then generates an answer based only on the provided, relevant context, minimizing hallucination.
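The retrieval-and-assembly steps above can be sketched as two small functions. The similarity function is passed in as a parameter (in a real system it would be embedding cosine similarity against a vector database); the prompt wording is my own.

```python
from typing import Callable

def retrieve(query: str, chunks: list[str],
             score: Callable[[str, str], float], k: int = 3) -> list[str]:
    """Rank chunks by a similarity function and keep the top k."""
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:k]

def build_rag_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Insert the retrieved chunks into the prompt; the model answers only from them."""
    context = "\n---\n".join(retrieved_chunks)
    return (
        "Answer the question using ONLY the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Only the handful of relevant chunks enters the prompt, so the knowledge base can be arbitrarily large while the context window stays small.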
General Advice and Mindset #
- LLMs are not Oracles: Even with controlled knowledge and RAG, hallucinations can still occur. Users of LLM-powered tools must remain aware of this.
- Internal First: Start by improving internal company processes/workflows with LLMs before deploying customer-facing solutions.
- Local Models: For privacy or cost, local LLMs can be deployed on less powerful hardware (e.g., a Mac for consumer-grade private use).
- Human Oversight: Don't completely replace humans. Humans should supervise automated processes, verifying outputs and correcting errors.
- Avoid High-Stakes Applications: Don't use LLMs for tasks requiring extreme precision, like medical advice, financial investments, or where legal ramifications are significant.
- Simplicity Wins: The presented applications are relatively "kindergarten-level" for LLMs, but they demonstrate practical, achievable value.
- Don't Fear, Learn: Programmers, especially seniors, might fear LLMs because they represent a "magic" beyond their full comprehension or control. This limits their adoption.
- Focus on Utility, Not Misconceptions:
- If you don't like LLMs for code generation – don't use them for it.
- If you don't like chatbots – don't build chatbots.
- LLMs can create drafts for human review (e.g., automatically drafting email responses for customer support, speeding up reaction time without directly sending AI-generated replies).
- Programmers as a Threat: The speaker ends by humorously flipping the script: programmers are not threatened by LLMs; they are the threat who can automate others' jobs.
Q&A #
- Handling Visual Information (Charts, Diagrams): LLMs struggle directly with images. Best practice is to extract numerical data from charts and represent it textually (e.g., "X-axis is..., Y-axis is..."). Automate this extraction. Focus on indexing only the relevant parts of a document that human decision-makers actually use. Requires data restructuring and preparation by a programmer.
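A sketch of the "represent it textually" step: flattening a chart's data series into sentences that can be chunked and indexed like any other text. The function and its output format are illustrative, not from the talk.

```python
def chart_to_text(title: str, x_label: str, y_label: str,
                  points: list[tuple[str, float]]) -> str:
    """Flatten a chart's data series into sentences an LLM can index and reason over."""
    lines = [f"Chart: {title}. X-axis: {x_label}. Y-axis: {y_label}."]
    for x, y in points:
        lines.append(f"{x_label} {x}: {y_label} = {y}.")
    return " ".join(lines)
```

The extraction of the `points` themselves (from a reporting database or chart source data, not from the rendered image) is the data-preparation work the speaker says falls to the programmer.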