The speaker, Jakub Mrugalski, aims to show how AI can be useful for everyone, from work to hobbies. He introduces himself as a proponent of practical AI/LLM adoption. The presentation addresses common skepticism and misconceptions about LLMs in the context of IT and everyday tasks.
Skepticism Towards LLMs #
- Misconception 1: LLMs make mistakes and "talk nonsense."
- A mistake is an answer like 2+2=5; "talking nonsense" is entirely irrelevant output (e.g., 2+2 = "go to Wałbrzych").
- This often happens with unoptimized models or free, trial-grade implementations.
- Misconception 2: LLMs are slow.
- Compared to human-written code, LLMs are generally slower.
- Programmers often argue that their hand-written code is better, cheaper, faster, and easier to maintain and debug, and this is usually true.
- The fair comparison point is the task the LLM replaces, not hand-optimized code (e.g., a human taking 5 minutes to categorize a ticket versus the LLM doing it in 13 seconds).
- Misconception 3: Users have to explain things to LLMs, making it time-consuming.
- This is often true, requiring detailed step-by-step instructions and error handling.
- The speaker draws a humorous parallel: these same criticisms (makes mistakes, talks nonsense, slow, needs explaining) apply to a junior-level human employee or intern.
- Misconception 4: It's faster to do it yourself manually.
- Often true for a well-versed experienced professional.
The "Zdzisław" Problem: Automating Human Tasks #
- Identifying Automatable Work: Some work is repetitive (copy-paste, data entry, information retrieval) and can be automated even without AI.
- The Unscalable Human Element: Many IT systems have advanced tech but rely on a human ("Zdzisław") at the end for crucial tasks like approving tickets, assigning categories, or interpreting user intent. Zdzisław doesn't scale.
- AI as an Enabler: AI extends the "Zdzisław boundary," allowing code to perform tasks previously requiring human intervention, thus expanding capabilities.
- LLMs as Support for Human Intelligence: LLMs should augment, not replace, human intelligence, making tasks faster, more efficient, and with fewer errors. Zdzisław becomes a supervisor/verifier instead of a manual executor.
Practical Applications of LLMs #
- Content Moderation:
- Traditional regex/keyword-based moderation fails with nuanced language (e.g., the veiled Polish insult "ty klarnecie jeden", which contains no blacklisted keywords).
- Humans excel at understanding context and sentiment.
- AI can analyze sentiment and context across diverse vocabularies, often performing better than humans due to comprehensive linguistic knowledge.
- Moderation API (OpenAI) offers text and multimodal (image) categorization, often free to use, being faster and more effective than manual or self-coded solutions.
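A minimal sketch of what such a moderation call might look like, using only the standard library. The endpoint URL and model name (`omni-moderation-latest`) follow OpenAI's publicly documented Moderation API, but verify them against the current docs before relying on this; the helper names are my own.

```python
import json
import urllib.request

MODERATION_URL = "https://api.openai.com/v1/moderations"

def build_moderation_request(text: str, api_key: str) -> urllib.request.Request:
    """Build an HTTP request for OpenAI's moderation endpoint."""
    payload = {"model": "omni-moderation-latest", "input": text}
    return urllib.request.Request(
        MODERATION_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )

def is_flagged(response_body: str) -> bool:
    """Parse the moderation response and report whether the input was flagged."""
    result = json.loads(response_body)
    return result["results"][0]["flagged"]

# Actually sending the request needs a real API key:
# with urllib.request.urlopen(build_moderation_request("some comment", KEY)) as r:
#     print(is_flagged(r.read().decode()))
```

The point the speaker makes is that this handful of lines replaces an entire hand-maintained keyword/regex moderation pipeline.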
- Information Retrieval (Vector Databases):
- Traditional search (full-text, fuzzy, exact) struggles with descriptive, natural language queries (e.g., "app that looks like hand-drawn").
- Vector databases (embeddings): Represent text as numerical vectors to capture semantic meaning.
- Easier to use now than historically (lower barrier to entry).
- Process:
- Use an LLM to vectorize text (convert snippets of information into numerical vectors).
- Load vectors into a vector database.
- When a user queries, vectorize their query and compare it to vectors in the database to find semantically similar results.
- Chunking: The critical step of dividing documents into sensible "chunks" for vectorization (e.g., sentences, paragraphs, or logical parts). This ensures connected information isn't split.
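The chunk-then-compare pipeline above can be sketched end to end in plain Python. The bag-of-words "embedding" here is a deliberately crude stand-in for a real embedding model (in practice you would call an embedding API and store the vectors in a vector database); only the chunking and cosine-similarity mechanics are the actual technique.

```python
import math

def chunk_paragraphs(document: str) -> list[str]:
    """Split a document on blank lines so related sentences stay together."""
    return [p.strip() for p in document.split("\n\n") if p.strip()]

def embed(text: str, vocab: list[str]) -> list[float]:
    """Toy bag-of-words 'embedding' -- a stand-in for a real embedding model."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return [float(words.count(w)) for w in vocab]

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: how closely two vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def search(query: str, chunks: list[str], vocab: list[str]) -> str:
    """Return the chunk whose vector is most similar to the query vector."""
    qv = embed(query, vocab)
    return max(chunks, key=lambda c: cosine(embed(c, vocab), qv))
```

With real embeddings, a descriptive query like "app that looks like hand-drawn" matches semantically related chunks even when no keyword overlaps.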
- Anomaly Detection / Experience Capture (Fine-Tuning):
- Automating tasks where human expertise is hard to articulate (e.g., QA team knowing "at a glance" if a report is correct). This is "stealing someone's experience."
- Fine-tuning process:
- Gather positive and negative examples of classified data.
- Use these examples to "fine-tune" a model. This is not re-training but adapting the model's output style or inference based on samples.
- The training data typically uses the chatml format (system, user, assistant roles), stored as compressed JSONL files.
- Hundreds of examples are typically sufficient for new models.
- Example: Speaker fine-tuned a model for 30 cents to replicate his newsletter link selection (taste) and for 8 dollars to replicate his writing style.
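A sketch of assembling such a training set: one chat exchange (system, user, assistant) per JSONL line, roughly the shape OpenAI's chat fine-tuning format expects. The QA-report classification task and label names are illustrative, not from the talk.

```python
import json

def to_training_line(report: str, label: str) -> str:
    """One JSONL line: a complete chat exchange the model should learn to imitate."""
    example = {
        "messages": [
            {"role": "system", "content": "Classify QA reports as OK or SUSPICIOUS."},
            {"role": "user", "content": report},
            {"role": "assistant", "content": label},
        ]
    }
    return json.dumps(example, ensure_ascii=False)

def write_dataset(samples: list[tuple[str, str]], path: str) -> None:
    """Write positive and negative examples, one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for report, label in samples:
            f.write(to_training_line(report, label) + "\n")
```

Each line is one positive or negative example; a few hundred such lines are what "stealing someone's experience" amounts to in practice.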
- Addressing LLM "Hallucinations" and Limited Knowledge:
- Hallucinations: LLMs "hallucinate" by default; it's how they generate text. The goal is for them to hallucinate in the "right" direction (produce factual or desired output).
- Limited/Outdated Knowledge: Many production LLMs have knowledge cut-offs (e.g., September 2023).
- Online LLMs exist but can be prone to "SEO" spam/misinformation.
- Solution: Restrict knowledge source. For professional applications, don't rely on the model's inherent knowledge base. Provide a specific "playbook" or knowledge base and instruct the LLM to only use that. If information is outside the scope, it should state it cannot answer or escalate.
- In-context learning (Prompt Engineering): For tasks like a school shop bot, all relevant information (menu, hours, specific rules) is provided directly within the prompt. The LLM is explicitly told not to add outside information.
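A minimal sketch of such a restricted system prompt for the school shop bot; the marker strings and fallback sentence are my own conventions, not from the talk.

```python
def build_shop_prompt(knowledge: str) -> str:
    """System prompt that restricts the model to the supplied knowledge base."""
    return (
        "You are the school shop assistant.\n"
        "Answer ONLY from the information between the markers below.\n"
        "Do not add any outside information.\n"
        "If the answer is not there, reply exactly: "
        "'I don't know, please ask the staff.'\n"
        "### KNOWLEDGE ###\n"
        f"{knowledge}\n"
        "### END ###"
    )

# Example: the entire menu and opening hours go straight into the prompt.
prompt = build_shop_prompt("Open 8:00-15:00. Menu: sandwiches 5 PLN, juice 3 PLN.")
```

Because the whole knowledge base fits in the prompt, no retrieval step is needed for a task this small.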
- Managing LLM Output and Debugging:
- "Thinking" Process: LLMs "think aloud" (generate tokens sequentially). Giving them space to "think" (e.g., explicitly asking for reasoning steps before the final answer) improves accuracy.
- Parsable Output: Instruct LLMs to return answers in structured formats (e.g., JSON) so they can be easily processed programmatically. This may cost more (more tokens generated) but yields better, more reliable results.
- LLM Memory and Context Window ("Rag"):
- LLM context window limits (e.g., 128k or 1M tokens) mean that entire company documentation won't fit into a single prompt.
- Retrieval Augmented Generation (RAG):
- Vectorize a knowledge base (e.g., documents in a database).
- When a user asks a question, retrieve the most relevant "chunks" of information from the knowledge base using vector similarity.
- Dynamically insert these retrieved chunks into the LLM's prompt.
- The LLM then generates an answer based only on the provided, relevant context, minimizing hallucination.
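The retrieval-and-assembly steps above can be sketched as two small functions. The similarity function is passed in as a parameter (in a real system it would be embedding cosine similarity against a vector database); the prompt wording is my own.

```python
from typing import Callable

def retrieve(query: str, chunks: list[str],
             score: Callable[[str, str], float], k: int = 3) -> list[str]:
    """Rank chunks by a similarity function and keep the top k."""
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:k]

def build_rag_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Insert the retrieved chunks into the prompt; the model answers only from them."""
    context = "\n---\n".join(retrieved_chunks)
    return (
        "Answer the question using ONLY the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Only the handful of relevant chunks enters the prompt, so the knowledge base can be arbitrarily large while the context window stays small.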
General Advice and Mindset #
- LLMs are not Oracles: Even with controlled knowledge and RAG, hallucinations can still occur. Users of LLM-powered tools must remain aware of this.
- Internal First: Start by improving internal company processes/workflows with LLMs before deploying customer-facing solutions.
- Local Models: For privacy or cost, local LLMs can be deployed on less powerful hardware (e.g., a Mac for consumer-grade private use).
- Human Oversight: Don't completely replace humans. Humans should supervise automated processes, verifying outputs and correcting errors.
- Avoid High-Stakes Applications: Don't use LLMs for tasks requiring extreme precision, like medical advice, financial investments, or where legal ramifications are significant.
- Simplicity Wins: The presented applications are relatively "kindergarten-level" for LLMs, but they demonstrate practical, achievable value.
- Don't Fear, Learn: Programmers, especially seniors, might fear LLMs because they represent a "magic" beyond their full comprehension or control. This limits their adoption.
- Focus on Utility, Not Misconceptions:
- If you don't like LLMs for code generation – don't use them for it.
- If you don't like chatbots – don't build chatbots.
- LLMs can create drafts for human review (e.g., automatically drafting email responses for customer support, speeding up reaction time without directly sending AI-generated replies).
- Programmers as a Threat: The speaker ends by humorously flipping the script: programmers are not threatened by LLMs; they are the threat who can automate others' jobs.
Q&A #
- Handling Visual Information (Charts, Diagrams): LLMs struggle directly with images. Best practice is to extract numerical data from charts and represent it textually (e.g., "X-axis is..., Y-axis is..."). Automate this extraction. Focus on indexing only the relevant parts of a document that human decision-makers actually use. Requires data restructuring and preparation by a programmer.
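A sketch of the "represent it textually" step: flattening a chart's data series into sentences that can be chunked and indexed like any other text. The function and its output format are illustrative, not from the talk.

```python
def chart_to_text(title: str, x_label: str, y_label: str,
                  points: list[tuple[str, float]]) -> str:
    """Flatten a chart's data series into sentences an LLM can index and reason over."""
    lines = [f"Chart: {title}. X-axis: {x_label}. Y-axis: {y_label}."]
    for x, y in points:
        lines.append(f"{x_label} {x}: {y_label} = {y}.")
    return " ".join(lines)
```

The extraction of the `points` themselves (from a reporting database or chart source data, not from the rendered image) is the data-preparation work the speaker says falls to the programmer.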