AI engineering learning path
A comprehensive AI course to build core machine learning and AI skills

To see why and how I built this course, see my blog post Learning AI with AI.
Table of contents
- 1. Introduction to machine learning
- 2. Mathematical foundations for ML
- 3. Classical machine learning algorithms
- 4. Feature engineering & data preprocessing
- 5. Model evaluation & explainability
- 6. Neural networks & deep learning
- 7. Convolutional neural networks (CNNs)
- 8. Recurrent neural networks & transformers
- 9. Vector embeddings & text representations
- 10. Large language models (LLMs)
- 11. Prompt engineering & retrieval-augmented generation (RAG)
- 12. Generative AI & diffusion models
- 13. Reinforcement learning
- 14. Model deployment & MLOps
- 15. AI ethics & bias mitigation
- 16. Agentic AI & tool-using models
- 17. Cutting-edge research & future trends
1. Introduction to machine learning
Learn what ML is, the different types (supervised, unsupervised, reinforcement learning), and why it’s transforming industries.
Why learn this: Establishes a high-level understanding of ML before diving into technical details.
Project ideas:
- Write an explainer blog post on the different types of ML, applying examples from real-world applications.
1/1. Machine learning introduction through Stanford University on Coursera
Usefulness: 5 / 5, Time needed: 80 hours, read on laptop: Link – manual notes: Do a course (not expensive and should be cool, plus I can add "Machine Learning Stanford" to my resume like this. There are a few other ones here)
The Machine Learning Specialization is a comprehensive program designed to help learners master fundamental AI concepts. Offered by Stanford University and DeepLearning.AI, it provides a structured pathway for individuals seeking to enter the AI and machine learning field through professional training.
For ML/AI engineers, this specialization offers a structured learning path to build core machine learning skills. It covers essential concepts and techniques, making it valuable for engineers looking to strengthen their foundational knowledge or transition into machine learning roles.
Task: Enroll in and complete this specialization to gain a comprehensive introduction to machine learning, covering fundamental AI concepts and techniques.
Output: Publish certificate of completion on resume and LinkedIn profile, and apply learned concepts to future projects
1/2. Machine learning engineering roadmap
Usefulness: 5 / 5, Time needed: 2 hours, read on laptop: Link – manual notes: Learn ML engineering (I've created a roadmap of concepts to learn. Would be a good starting point...)
The roadmap covers essential topics in Machine Learning Engineering, starting from foundational knowledge like Python, mathematics, and data structures, progressing through machine learning concepts, deep learning, model deployment, and advanced topics like MLOps, system design, and emerging technologies.
This roadmap serves as an excellent structured guide for learning ML engineering, offering a clear progression from basics to advanced concepts. It helps learners understand the interconnected skills and technologies needed to become a proficient machine learning engineer.
Task: Read it to understand the overall structure and required skills for machine learning engineering, and use it as a guide to plan your learning path.
Output: Understand the basics of machine learning engineering and create a personalized learning plan
1/3. Google's machine learning crash course
Usefulness: 5 / 5, Time needed: 15 hours, read on laptop: Link – manual notes: Check a free, in-depth ML course from Google
Machine Learning Crash Course is a free online course from Google, featuring 12 modules spanning ML models, data handling, and advanced techniques. The course includes 100+ interactive exercises, video lectures, and practical visualizations, designed to provide a comprehensive introduction to machine learning in approximately 15 hours.
This resource is excellent for engineers looking to build a solid foundation in machine learning. It covers essential topics like regression, classification, neural networks, and data preprocessing, with hands-on exercises that help translate theoretical concepts into practical skills. The modular design allows learners to focus on specific areas of interest.
Task: Complete the course to gain a comprehensive introduction to machine learning, covering essential topics like regression, classification, neural networks, and data preprocessing.
Output: Understand the basics of machine learning
1/4. Introduction to Python programming with AI assistance
Usefulness: 4 / 5, Time needed: 4 hours 10 minutes, read on laptop: Link – manual notes: Do AI Python for beginners course by Andrew Ng (seems free)
AI Python for Beginners is a 4-hour course by Andrew Ng that teaches Python programming fundamentals through hands-on AI-powered projects. Students learn coding basics, data manipulation, and how to create practical applications like recipe generators and travel planners, with real-time AI assistance to help write, debug, and understand code.
For ML/AI engineers, this course offers a foundational Python learning experience that emphasizes practical AI integration. It provides hands-on experience with AI tools, libraries like matplotlib and BeautifulSoup, and demonstrates how to leverage AI for code development and problem-solving across various domains.
Task: Take this course to learn the basics of Python programming and understand how AI can assist in coding and problem-solving. Complete the hands-on projects to gain practical experience with AI tools and libraries.
Output: Complete the course and publish code for the projects, such as the recipe generator and travel planner, to demonstrate understanding of Python programming and AI integration.
1/5. DataCamp AI courses
Usefulness: 5 / 5, Time needed: 4 hours, read on laptop: Link – manual notes: Check out DataCamp courses (DataCamp courses for learning AI!)
DataCamp provides a wide range of AI courses for learners at different levels, with 55 courses in total spanning topics like ChatGPT, prompt engineering, deep learning, and AI ethics. Courses range from 1-4 hours and cover fundamental concepts, practical applications, and emerging AI technologies.
For ML/AI engineers, these courses offer structured pathways to understand cutting-edge AI technologies, from foundational concepts to practical implementation. The courses cover critical areas like LLMs, prompt engineering, ethical AI, and hands-on development using tools like PyTorch and LangChain.
Task: Explore the courses on this page to find relevant AI topics, such as ChatGPT, generative AI, and large language models. Take notes on the courses that interest you and consider taking them to deepen your understanding of AI concepts.
Output: Understand the basics of AI and identify relevant courses for further learning
1/6. Matt Might's blog on technical topics including AI
Usefulness: 4 / 5, Time needed: 30 minutes, read on laptop: Link – manual notes: Check this awesome blog from Matt Might (the articles on AI are especially good)
Matt Might's blog is an extensive collection of technical articles covering programming languages, computer science, academic life, and personal development. The index showcases a diverse range of topics including AI, functional programming, static analysis, language design, and practical coding techniques.
For ML/AI engineers, this blog offers valuable technical insights, particularly through articles like 'Hello, Perceptron: An introduction to neural networks' and 'Make a game in 5 minutes with generative AI'. The site provides deep technical perspectives that can supplement formal learning and offer practical programming and theoretical insights.
Task: Read the relevant articles on AI to gain deep technical perspectives and supplement formal learning. Browse the rest of the blog's index to get a sense of the range of topics covered.
Output: Understand the basics of neural networks and generative AI
1/7. Hyperskill courses for ML/AI skill development
Usefulness: 4 / 5, Time needed: 2 hours, read on laptop: Link – manual notes: Use Hyperskill courses (I have a coupon for 3 months for 50% off from JetBrains: "EXTENDJB50")
Hyperskill offers a diverse range of learning tracks and project-based courses across multiple programming languages like Python, Java, Kotlin, and JavaScript. Each course provides hands-on projects, estimated learning time, and a certificate upon completion, with courses designed to take learners from beginner to advanced skill levels.
For ML/AI engineers, Hyperskill offers targeted learning paths in Python, Data Science, Machine Learning, and related domains. The project-based approach allows learners to gain practical skills by building real-world applications, which is crucial for developing applied machine learning and AI competencies.
Task: Use the Hyperskill courses to gain practical skills in ML/AI by building real-world applications, and take advantage of the coupon EXTENDJB50 for 50% off for 3 months
Output: Publish code for practical ML/AI projects
2. Mathematical foundations for ML
Covers linear algebra, probability, and optimization techniques crucial for understanding ML algorithms.
Why learn this: Strong math foundations enable better understanding of model behavior.
Project ideas:
- Implement gradient descent from scratch in Python and visualize convergence.
- Create a visual interactive demo explaining eigenvalues and PCA.
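The first project idea above can be started with a dozen lines; a minimal sketch, assuming a one-dimensional toy objective f(x) = (x - 3)^2 whose minimum is at x = 3:

```python
# Gradient descent on f(x) = (x - 3)^2, minimized at x = 3.
def grad(x):
    return 2 * (x - 3)  # analytic derivative of f

x = 0.0           # arbitrary starting point
lr = 0.1          # learning rate
history = [x]     # iterates, e.g. for a convergence plot with matplotlib
for _ in range(100):
    x -= lr * grad(x)
    history.append(x)
```

Plotting `history` against the step index shows the geometric convergence toward 3; shrinking or growing `lr` makes the convergence slower or unstable, which is exactly the behavior worth visualizing.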
2/1. Understanding singular value decomposition
Usefulness: 5 / 5, Time needed: 45 minutes, read on laptop: Link – manual notes: Understand SVD
Singular Value Decomposition (SVD) is a powerful matrix factorization technique that can transform any linear transformation into a combination of rotations and scalings. By decomposing a matrix into three component matrices, SVD reveals fundamental geometric properties of linear transformations, showing how any matrix can be understood as stretching, compressing, or reflecting a geometric object.
For ML/AI engineers, understanding SVD is crucial as it's a fundamental technique in dimensionality reduction, data compression, and feature extraction. The article provides an intuitive, geometric approach to learning SVD, making it easier to grasp the underlying mathematical principles that are essential in machine learning algorithms like PCA and various data analysis techniques.
Task: Read this article to gain a deep understanding of SVD, its geometric intuition, and mathematical formalization. This will help in grasping the underlying principles of dimensionality reduction, data compression, and feature extraction in machine learning.
Output: Understand the basics of SVD and its applications in machine learning
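To make the decomposition concrete, here is a quick NumPy sketch (the 2x2 matrix is an arbitrary example):

```python
import numpy as np

# Decompose a small symmetric matrix; its singular values are 4 and 2.
A = np.array([[3.0, 1.0],
              [1.0, 3.0]])
U, S, Vt = np.linalg.svd(A)

# The three factors reconstruct A exactly: A = U @ diag(S) @ Vt.
A_rec = U @ np.diag(S) @ Vt

# Keeping only the largest singular value gives the best rank-1 approximation,
# which is the same idea PCA exploits for dimensionality reduction.
A_rank1 = S[0] * np.outer(U[:, 0], Vt[0, :])
```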
3. Classical machine learning algorithms
Covers regression models, decision trees, SVMs, and clustering techniques.
Why learn this: Fundamental algorithms still power many ML systems today.
Project ideas:
- Build a price prediction model for housing data.
- Implement a clustering algorithm to group customer purchase behavior.
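A minimal sketch of the price-prediction idea, using ordinary least squares on synthetic data (the features and the coefficients 50, 30, 10 are invented for illustration):

```python
import numpy as np

# Synthetic "housing" data: price = 50 * area + 30 * rooms + 10 (no noise).
rng = np.random.default_rng(0)
X = rng.uniform(1.0, 5.0, size=(100, 2))   # columns: area, rooms
y = X @ np.array([50.0, 30.0]) + 10.0

# Append a bias column and fit the linear model via least squares.
Xb = np.hstack([X, np.ones((len(X), 1))])
w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
```

Because the data is noise-free, `w` recovers the generating coefficients; on a real dataset the same code gives the best-fit line in the least-squares sense.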
4. Feature engineering & data preprocessing
Learn how to clean, transform, and optimize data for ML.
Why learn this: Garbage in, garbage out—good data engineering is crucial.
Project ideas:
- Create a pipeline that automatically preprocesses messy data for ML models.
- Write a blog post on the most useful feature engineering techniques.
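A minimal sketch of such a preprocessing pipeline, assuming missing values arrive as NaN: impute with column means, then standardize.

```python
import numpy as np

# Toy "messy" data with one missing value.
X = np.array([[1.0, 200.0],
              [2.0, np.nan],
              [3.0, 400.0]])

# Step 1: impute missing entries with the column mean (ignoring NaNs).
col_mean = np.nanmean(X, axis=0)
X_imputed = np.where(np.isnan(X), col_mean, X)

# Step 2: standardize each column to zero mean and unit variance.
X_clean = (X_imputed - X_imputed.mean(axis=0)) / X_imputed.std(axis=0)
```

In practice the same two steps are usually wrapped in something like scikit-learn's `Pipeline` with `SimpleImputer` and `StandardScaler`, so the transformation fitted on training data is reapplied identically at prediction time.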
5. Model evaluation & explainability
Learn about precision, recall, AUC, and model interpretability techniques.
Why learn this: Understanding model performance prevents misleading results.
Project ideas:
- Build an interactive dashboard that visualizes confusion matrices.
- Use SHAP to analyze feature importance in an ML model.
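The metrics above are worth computing once by hand before reaching for a library; a small worked example with made-up labels:

```python
# Precision and recall computed by hand from toy binary labels.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # true positives
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # false positives
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # false negatives

precision = tp / (tp + fp)  # of everything flagged positive, how much was right
recall = tp / (tp + fn)     # of all actual positives, how many were found
```

These four counts are exactly the cells of the confusion matrix the dashboard project would visualize.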
5/1. Understanding monosemanticity in AI neural networks
Usefulness: 5 / 5, Time needed: 30 minutes, read on laptop: Link – manual notes: Learn about Monosemanticity (article is a good intro that Matt Manela read, then this is a paper that Rik Nauta linked.)
Anthropic researchers have developed a method to peek inside AI neural networks, revealing that AIs use complex strategies like 'superposition' to represent concepts efficiently. By creating an autoencoder that simulates a higher-dimensional representation of an AI's neurons, they discovered that these simulated neurons can be surprisingly interpretable, with some representing specific concepts like 'God' or text genres.
For ML/AI engineers, this paper offers crucial insights into how neural networks actually represent and process information. Understanding superposition and feature representation can help develop more transparent and interpretable AI systems, which is critical for improving AI safety, debugging models, and understanding machine learning at a deeper level.
Task: Read the article to understand the concept of monosemanticity and its relation to AI interpretability, then explore the linked paper for a deeper dive into the research
Output: Understand the basics of monosemanticity and its implications for AI interpretability
6. Neural networks & deep learning
Introduction to multi-layer perceptrons and backpropagation.
Why learn this: Forms the foundation of modern AI techniques.
Project ideas:
- Implement a neural network from scratch in NumPy.
- Train a basic image classifier using PyTorch or TensorFlow.
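A minimal sketch of the first project idea: one hidden layer and hand-written backpropagation on XOR (the layer width, learning rate, and step count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR inputs and targets.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# One hidden layer (4 tanh units) and a sigmoid output unit.
W1 = rng.normal(0, 0.5, (2, 4))
b1 = np.zeros(4)
W2 = rng.normal(0, 0.5, (4, 1))
b2 = np.zeros(1)

lr = 0.5
losses = []
for _ in range(2000):
    # Forward pass.
    h = np.tanh(X @ W1 + b1)
    out = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))
    losses.append(float(np.mean((out - y) ** 2)))
    # Backward pass: chain rule through the sigmoid, then the tanh.
    d_out = 2 * (out - y) / len(X) * out * (1 - out)
    d_h = (d_out @ W2.T) * (1 - h ** 2)
    W2 -= lr * (h.T @ d_out)
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * (X.T @ d_h)
    b1 -= lr * d_h.sum(axis=0)
```

Watching `losses` fall is the whole point: every line of the backward pass corresponds to one application of the chain rule that frameworks like PyTorch automate.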
6/1. Introduction to neural networks with 3Blue1Brown
Usefulness: 5 / 5, Time needed: 25 minutes, watch: Link – manual notes: Learn ML from 3Blue1Brown series (Might be a great starting point)
The video introduces neural networks through a classic example of handwritten digit recognition, explaining how a network with multiple layers can transform pixel inputs into digit classifications. It details the network's structure, including input layers (representing pixel values), hidden layers, and output layers, and describes how neurons are connected through weights and biases.
For ML/AI learners, this resource provides an intuitive, visual explanation of neural network fundamentals, breaking down complex concepts into digestible components. It offers insights into how neural networks process information, the role of weights and biases, and the potential for layered networks to recognize increasingly abstract patterns across different domains.
Task: Watch it to get an intuitive understanding of neural network fundamentals, including their structure, operation, and application in digit recognition.
Output: Understand the basics of neural networks and their application in digit recognition
6/2. History of neural networks and deep learning
Usefulness: 5 / 5, Time needed: 45 minutes, read on laptop: Link – manual notes: Re-read this for history
A detailed historical narrative of neural networks, starting from Frank Rosenblatt's perceptron in the 1950s through the AI winters and resurgence, highlighting pivotal developments like backpropagation, convolutional neural networks, and probabilistic neural models that gradually transformed neural networks from a theoretical curiosity to a powerful machine learning approach.
This article is an excellent primer for understanding the foundational concepts and historical progression of neural networks and deep learning. It provides context for modern machine learning techniques, explaining how fundamental ideas evolved, which helps learners appreciate the theoretical underpinnings and incremental innovations that led to today's AI capabilities.
Task: Read this article to understand the historical development of neural networks from the 1950s to the present, including key milestones and innovations that shaped the field of deep learning.
Output: Understand the historical context of neural networks and deep learning
6/3. Neural network initialization and training dynamics
Usefulness: 5 / 5, Time needed: 1 hour 30 minutes, watch: Link – manual notes: Continue makemore studies with Andrej Karpathy ((Need large chunks of time for this.) See also my notes in [[2023-03-31 Learn ChatGPT]]. I think the ultimate goal here is to train my own foundation model.)
The lecture discusses critical aspects of neural network training, specifically how to initialize weights and layers to ensure stable and effective learning. It covers techniques like careful weight scaling, understanding activation distributions, and introduces batch normalization as a method to control and standardize neural network layer activations.
For ML/AI engineers, this lecture provides deep insights into neural network initialization and training dynamics. It offers practical diagnostic tools for understanding network behavior, explaining how to monitor activation statistics, gradient flows, and parameter updates to ensure robust and efficient neural network training.
Task: Watch this lecture to gain deep insights into neural network initialization and training dynamics, and to learn practical diagnostic tools for understanding network behavior.
Output: Understand the importance of proper neural network initialization and how to apply techniques like weight scaling and batch normalization to ensure stable training.
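The weight-scaling point from the lecture can be demonstrated in a few lines (the layer width and batch size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
fan_in = 512
x = rng.normal(0.0, 1.0, (1000, fan_in))   # batch of unit-variance activations

# Naive init: pre-activation std grows like sqrt(fan_in), saturating tanh units.
a_naive = x @ rng.normal(0.0, 1.0, (fan_in, fan_in))

# Scaled init (divide by sqrt(fan_in)): pre-activation std stays near 1.
a_scaled = x @ (rng.normal(0.0, 1.0, (fan_in, fan_in)) / np.sqrt(fan_in))
```

Printing `a_naive.std()` versus `a_scaled.std()` shows roughly 22 versus 1, which is exactly the activation-statistics diagnostic the lecture recommends monitoring.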
6/4. Neural networks from scratch with Andrej Karpathy's project
Usefulness: 5 / 5, Time needed: 8 hours, write on laptop: Link – manual notes: Go through "Neural Networks: Zero to Hero"
Karpathy's 'Neural Networks: Zero to Hero' is a comprehensive GitHub repository designed to teach neural networks through practical, ground-up implementation. It appears to be a detailed learning journey from fundamental concepts to advanced neural network architectures.
For ML/AI engineers, this repository offers a deep-dive learning path into neural network fundamentals. By walking through implementations from scratch, learners can gain profound insights into the inner workings of neural networks, making it an excellent resource for understanding core ML principles.
Task: Go through the repository and implement the neural networks from scratch to gain a deep understanding of the fundamentals and advanced architectures.
Output: Publish code that implements a neural network from scratch
6/5. Apple's transformer architecture optimized for Apple Silicon
Usefulness: 4 / 5, Time needed: 25 minutes, read on laptop: Link – manual notes: Understand transformer architecture optimized for Apple Silicon
Apple has released a GitHub repository for a Transformer architecture optimized for Apple Silicon, signaling their strategic approach to AI by focusing on efficient, privacy-preserving, on-device machine learning models that can leverage their specialized neural engine hardware.
This resource provides insights into how hardware-specific optimization can dramatically improve machine learning model performance, particularly for edge computing. Engineers can learn about the intersection of specialized chip design, model architecture, and on-device AI inference strategies.
Task: Read it to understand how Apple is approaching AI and on-device inference, and to learn about the intersection of specialized chip design, model architecture, and inference strategies
Output: Understand the basics of the transformer architecture and its optimization for Apple Silicon
6/6. Apple neural engine transformers repository
Usefulness: 4 / 5, Time needed: 30 minutes, read on laptop: Link – manual notes: Check Apple Neural Engine (ANE) Transformers
Apple's ml-ane-transformers is an open-source reference implementation of the Transformer architecture specifically optimized to run efficiently on Apple Neural Engine (ANE), providing developers with a specialized tool for machine learning model deployment on Apple devices.
For ML/AI engineers interested in hardware-specific optimization, this repository offers insights into how Transformer models can be tailored for specialized neural processing units, demonstrating practical approaches to improving inference performance on Apple platforms.
Task: Explore the repository to understand how Transformer models are optimized for Apple Neural Engine, and use the code as a reference for deploying ML models on Apple devices.
Output: Understand the basics of optimizing Transformer models for Apple Neural Engine
6/7. Exploring Transformers.js examples for machine learning in JavaScript
Usefulness: 4 / 5, Time needed: 15 minutes, read on laptop: Link – manual notes: Check Transformers.js examples
Tom Dörr shared a collection of demos and example applications for Transformers.js, showcasing machine learning capabilities like text embeddings, sentiment analysis, and image segmentation across JavaScript environments including Node.js, Deno, and WebGPU.
For ML/AI engineers interested in JavaScript-based machine learning, this resource provides practical examples of applying transformer models directly in web and server-side environments, demonstrating how to implement various AI tasks using a lightweight, versatile library.
Task: Check the Transformers.js examples to understand how to apply transformer models in web and server-side environments, and explore the demos for text embeddings, sentiment analysis, and image segmentation.
Output: Understand how to use Transformers.js for machine learning tasks in JavaScript environments
6/8. Evolution simulation with neural networks and genetic algorithms
Usefulness: 4 / 5, Time needed: 25 minutes, watch, write on laptop: Link – manual notes: Recreate "Simple organisms" demo (made 14 years ago by a guy who now works at Tesla)
This project creates a simulated 'primordial soup' environment where organisms with neural networks evolve survival strategies. The organisms use two antennas to detect food, have neurons that simulate brain activity, and employ a genetic algorithm to select organisms that can survive longer by efficiently finding and consuming food resources.
For ML/AI learners, this project provides an excellent practical demonstration of neural network design, genetic algorithms, and evolutionary computation. It illustrates how complex behaviors can emerge from simple rules, and shows techniques for modeling biological processes in computational systems.
Task: Watch it to understand how neural networks and genetic algorithms can be applied to simulate evolution and complex behaviors in a food-seeking environment. Try to recreate the 'Simple organisms' demo as a project to solidify understanding.
Output: Publish code that recreates the 'Simple organisms' demo, understanding the application of neural networks and genetic algorithms in evolutionary computation.
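The select-and-mutate loop at the heart of such a simulation fits in a short script; this sketch swaps the food-seeking organisms for the classic OneMax toy problem (maximize the number of 1-bits in a genome), with all population sizes invented:

```python
import random

random.seed(0)
GENOME_LEN, POP_SIZE, GENERATIONS = 20, 30, 60

def fitness(genome):
    return sum(genome)  # OneMax: count of 1-bits

pop = [[random.randint(0, 1) for _ in range(GENOME_LEN)] for _ in range(POP_SIZE)]
initial_best = max(fitness(g) for g in pop)

for _ in range(GENERATIONS):
    pop.sort(key=fitness, reverse=True)
    survivors = pop[: POP_SIZE // 2]             # selection: fitter half survives
    children = []
    for parent in survivors:
        child = parent.copy()
        child[random.randrange(GENOME_LEN)] ^= 1  # mutation: flip one random bit
        children.append(child)
    pop = survivors + children                    # next generation

best = max(fitness(g) for g in pop)
```

The organisms demo replaces `fitness` with survival time in the simulated environment and the genome with neural network weights, but the evolutionary loop is the same.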
7. Convolutional neural networks (CNNs)
Understand CNN architectures, from simple convolutions to ResNets.
Why learn this: Essential for computer vision applications.
Project ideas:
- Train a CNN to classify images from the CIFAR-10 dataset.
- Build a real-time webcam-based object detector.
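Before training a full CNN, it is worth implementing the convolution operation once by hand; a minimal "valid" cross-correlation sketch with a made-up edge-detection input:

```python
import numpy as np

def conv2d(image, kernel):
    # "Valid" 2-D cross-correlation: slide the kernel, no padding or stride.
    kh, kw = kernel.shape
    h = image.shape[0] - kh + 1
    w = image.shape[1] - kw + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A vertical-edge detector on a tiny image: dark left half, bright right half.
image = np.concatenate([np.zeros((4, 3)), np.ones((4, 3))], axis=1)
kernel = np.array([[-1.0, 1.0]])  # fires where brightness increases rightward
edges = conv2d(image, kernel)     # nonzero only at the dark-to-bright boundary
```

A CNN layer is this operation with many learned kernels, run over many channels; the loop above is exactly what the optimized library primitives compute.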
7/1. Exploring FaceFusion for advanced face manipulation techniques
Usefulness: 4 / 5, Time needed: 45 minutes, read on laptop: Link – manual notes: Try facefusion
FaceFusion is an industry-leading face manipulation platform hosted on GitHub. The repository provides an open-source solution for advanced face manipulation techniques, likely involving deep learning and computer vision technologies for transforming or synthesizing facial images.
For ML/AI engineers, this repository offers a practical example of applying deep learning to image manipulation. By exploring its code, developers can learn techniques in face detection, generation, and transformation, gaining insights into state-of-the-art face manipulation algorithms and implementation strategies.
Task: Explore the FaceFusion repository to understand its implementation of face manipulation using deep learning and computer vision. Try to replicate or build upon its techniques to gain hands-on experience.
Output: Publish code that applies face manipulation techniques learned from FaceFusion
7/2. MonST3R: a novel approach for estimating geometry in dynamic scenes
Usefulness: 4 / 5, Time needed: 45 minutes, read on laptop: Link – manual notes: Play with Monst3r (that reverse-engineers videos to 4D models: github project, github repo)
MonST3R is a computer vision research project that introduces a novel method for estimating geometry in dynamic scenes. By adapting a pointmap representation and strategically fine-tuning on limited dynamic video datasets, the researchers demonstrate a surprisingly effective approach to 4D scene reconstruction, outperforming prior methods in video depth estimation, camera pose tracking, and scene understanding.
For ML/AI engineers interested in 3D computer vision and scene understanding, this paper offers insights into handling dynamic scenes, demonstrating how transfer learning and strategic fine-tuning can overcome data scarcity. The feed-forward approach and point cloud representation techniques could be valuable for developing robust multi-frame geometric estimation models.
Task: Read the research paper and explore the GitHub repository to understand the MonST3R approach and its applications in 4D scene reconstruction and video depth estimation.
Output: Understand the basics of 4D scene reconstruction and video depth estimation using MonST3R
7/3. Moondream for vision: A tiny, open-source vision language model
Usefulness: 4 / 5, Time needed: 25 minutes, watch: Link – manual notes: Learn about Moondream for vision (– this guy built a gaze detection application with it: https://www.reddit.com/r/LocalLLaMA/comments/1hz5caf/tutorial_run_moondream_2bs_new_gaze_detection_on/)
Moondream is a tiny, open-source vision language model under 2 billion parameters that can run anywhere. Developed as a developer tool, it focuses on accurately understanding images without hallucinating, using synthetic data and carefully processed training techniques to achieve performance comparable to much larger models like LLaVA 1.5.
For ML/AI engineers, this talk offers insights into developing small, efficient vision models, exploring synthetic data generation techniques, and understanding the trade-offs between model size and performance. It provides practical lessons in creating specialized AI tools that prioritize accuracy and developer usability over general intelligence.
Task: Watch it to understand the development and applications of Moondream, a small, open-source vision language model, and explore its potential in creating efficient AI tools
Output: Understand the basics of Moondream and its applications in vision language modeling
7/4. Practical application of computer vision for image background removal
Usefulness: 3 / 5, Time needed: 15 minutes, read on laptop: Link – manual notes: Check remove.bg for private models (Have we ever used BRIA for anything commercially?)
remove.bg is a web service that uses AI to automatically remove backgrounds from images in just 5 seconds. It supports multiple image types like people, products, animals, and cars, offering transparent or new backgrounds with high-quality results and integration capabilities for various software workflows.
For ML/AI engineers, this tool demonstrates practical application of computer vision and image segmentation techniques. It provides insights into how AI can be used for precise image manipulation, potentially offering inspiration for developing similar background removal or image processing algorithms.
Task: Explore remove.bg to understand how AI-powered image background removal works and consider its potential applications in computer vision projects.
Output: Understand the basics of image segmentation and background removal using AI
8. Recurrent neural networks & transformers
Learn about RNNs, LSTMs, and how they evolved into transformers.
Why learn this: Key to understanding modern NLP advancements.
Project ideas:
- Implement a simple RNN for text generation.
- Train a transformer model on a small dataset.
8/1. Understanding the illustrated transformer
Usefulness: 5 / 5, Time needed: 45 minutes, read on laptop: Link – manual notes: Understand "The illustrated transformer" ((Might be a good starting point to understand Transformers.) Beyang: "This dude's entire blog is a fantastic resource")
The Transformer is a neural network architecture introduced in the 'Attention is All You Need' paper, revolutionizing machine translation and sequential data processing. It uses self-attention mechanisms to process input sequences in parallel, enabling more efficient training and improved performance compared to previous sequence-to-sequence models like RNNs.
For ML/AI engineers, this resource provides an intuitive, visual explanation of the Transformer's inner workings. It's especially valuable for understanding the core mechanisms of modern language models, attention techniques, and the architectural innovations that led to breakthroughs in natural language processing and subsequent large language models.
Task: Read it to understand the basics of Transformer architecture and its applications in natural language processing
Output: Understand the basics of Transformer architecture and its applications in natural language processing
8/2. Transformer architecture tutorial
Usefulness: 5 / 5, Time needed: 2 hours, read on laptop: Link – manual notes: A great hands-on primer to Transformers (Might be a good starting point to understand Transformers.)
Brandon Rohrer's tutorial provides an in-depth exploration of transformer architecture, breaking down complex concepts like attention, embedding, and sequence modeling into digestible explanations. Starting from simple Markov chains, the tutorial progressively builds understanding of how transformers process and generate language sequences through matrix multiplications and neural network techniques.
For ML/AI engineers, this resource offers a granular understanding of transformer internals, explaining key mechanisms like multi-head attention, positional encoding, and skip connections. It's particularly valuable for those wanting to move beyond black-box understanding and gain insight into the mathematical and computational foundations of modern language models.
Task: Read it to understand the basics of transformer architecture, attention mechanism, and key components. Use this resource as a starting point to gain a deep understanding of how transformers process and generate language sequences.
Output: Understand the basics of transformer architecture and attention mechanism
8/3. Attention is all you need
Usefulness: 5 / 5, Time needed: 1 hour 30 minutes, read on laptop: Link – manual notes: Read "Attention is all you need" (Might be a good starting point to understand Transformers. 2017-06-12.)
The paper 'Attention Is All You Need' presents the Transformer, a novel neural network architecture that uses only attention mechanisms for sequence transduction tasks. Experiments on machine translation demonstrate superior performance, achieving state-of-the-art BLEU scores on English-to-German and English-to-French translation tasks with less training time and computational resources.
For ML/AI engineers, this paper is crucial as it introduced the Transformer architecture, which became the foundation for modern large language models like BERT, GPT, and many others. Understanding its attention mechanism and architectural design is fundamental to comprehending how contemporary NLP models function.
Task: Read this paper to understand the Transformer architecture and its attention mechanism, which is fundamental to comprehending modern NLP models.
Output: Understand the basics of the Transformer architecture and its attention mechanism
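The paper's central equation, Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, is compact enough to write out directly in NumPy (the shapes below are arbitrary):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.
    d_k = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k))
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))  # 3 query positions, d_k = 4
K = rng.normal(size=(5, 4))  # 5 key positions
V = rng.normal(size=(5, 4))  # one value vector per key
out, weights = attention(Q, K, V)
```

Each row of `weights` is a probability distribution over the keys, so each output is a weighted average of value vectors; multi-head attention runs several such maps in parallel on learned projections of Q, K, and V.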
8/4. Visualizing neural machine translation models
Usefulness: 5 / 5, Time needed: 45 minutes, read on laptop: Link – manual notes: Understand "Visualizing A Neural Machine Translation Model" ((Might be a good starting point to understand Transformers.) Mechanics of Seq2seq Models With Attention)
The blog post provides an in-depth visual explanation of sequence-to-sequence neural machine translation models, detailing how encoders and decoders work together. It introduces the concept of attention mechanisms, which allow models to focus on relevant parts of input sequences, significantly improving translation quality by overcoming the limitations of fixed-context vector representations.
For ML/AI engineers, this resource offers an intuitive visualization of complex neural network architectures, particularly useful for understanding how attention mechanisms work in sequence-to-sequence models. The step-by-step graphical explanations make abstract machine learning concepts more concrete and accessible.
Task: Read this to understand the mechanics of sequence-to-sequence models with attention in neural machine translation, which can serve as a good starting point for understanding transformers.
Output: Understand the basics of sequence-to-sequence models with attention and how they apply to neural machine translation.
8/5. Introduction to RNNs with the RWKV language model
Usefulness: 4 / 5, Time needed: 30 minutes, read on laptop: Link – manual notes: Learn what an RNN is and see this project
RWKV is an open-source, RNN-based language model that achieves transformer-level performance with significantly lower computational requirements. It offers linear scaling to context length, 10-100x lower compute needs, and performs particularly well in multiple languages. The project is sponsored by organizations like Stability AI and EleutherAI, with multiple model versions available from 0.4B to 14B parameters.
For ML/AI engineers, RWKV represents an innovative alternative to traditional transformer models. Understanding its architecture could provide insights into more efficient neural network design, especially for projects with limited computational resources. The model's unique approach to handling context and training makes it a valuable case study in alternative deep learning architectures.
Task: Read it to understand the basics of RNN and its application in the RWKV language model, and explore the project to see how it achieves transformer-level performance with lower computational requirements
Output: Understand the basics of RNN and its application in the RWKV language model
8/6. Understanding the attention mechanism in transformers
Usefulness: 5 / 5, Time needed: 45 minutes, watch: Link – manual notes: Check this video on transformers and attention (Spent some time in the past two weeks digging into transformers... what a miracle!)
The video provides an in-depth explanation of the attention mechanism, a core component of transformer models like those used in large language models. It breaks down how words can dynamically adjust their semantic meaning based on context by using query, key, and value matrices, which allow words to 'attend' to and influence each other's representations in a high-dimensional embedding space.
For ML/AI engineers, understanding the attention mechanism is crucial for comprehending modern neural network architectures, especially in natural language processing. This resource offers a detailed, visual walkthrough of how transformers process text, making complex concepts like multi-headed attention and embedding spaces more accessible.
Task: Watch it to get an in-depth explanation of the attention mechanism and how words can dynamically adjust their semantic meaning based on context
Output: Understand the basics of the attention mechanism in transformers and how it applies to natural language processing
9. Vector embeddings & text representations
Covers word embeddings like Word2Vec, GloVe, and sentence transformers.
Why learn this: Essential for NLP and retrieval-based AI.
Project ideas:
- Build a semantic search engine using embeddings.
- Compare different embedding methods for text similarity.
9/1. Understanding machine learning embeddings
Usefulness: 5 / 5, Time needed: 45 minutes, read on laptop: Link – manual notes: Learn what embeddings are (from this comprehensive doc)
This is an in-depth exploration of machine learning embeddings, created as a comprehensive guide for various audiences including engineers, product managers, and students. The document aims to demystify embeddings by providing a foundational understanding, drawing parallels between mastering art and understanding machine learning techniques.
For ML/AI learners, this resource offers a structured approach to understanding embeddings, with sections tailored to different technical levels. It provides insights into vector representations, recommendation systems, and the fundamental building blocks of natural language processing, making complex concepts more accessible.
Task: Read this comprehensive guide to learn what embeddings are, how they work, and their importance in machine learning, particularly in natural language processing and recommendation systems.
Output: Understand the basics of embeddings and how they are used in machine learning
9/2. Vector embeddings as a multitool in AI
Usefulness: 5 / 5, Time needed: 25 minutes, read on laptop: Link – manual notes: Learn about vector embeddings (Meet AI’s multitool: Vector embeddings)
Vector embeddings are a powerful machine learning technique that transforms various types of data (text, images, products) into points in a multi-dimensional space, enabling semantic similarity searches, recommendation systems, and advanced AI applications with minimal training data.
For ML/AI engineers, this article provides a foundational understanding of embeddings, demonstrating their versatility across domains. It offers practical insights into pre-trained models, embedding creation techniques, and real-world use cases, making it an excellent primer for understanding how semantic representations can enhance machine learning projects.
Task: Read this article to understand the basics of vector embeddings, their applications, and how they can be used in machine learning projects. Take notes on the different techniques for creating embeddings and the real-world use cases presented in the article.
Output: Understand the basics of vector embeddings and their applications in machine learning
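The "points in a multi-dimensional space" idea boils down to comparing vectors by angle. A toy sketch with hand-made 3-d vectors (real embedding models produce hundreds of dimensions; the values here are invented for illustration):

```python
import math

def cosine(u, v):
    # Cosine similarity: the angle between two embedding vectors,
    # independent of their magnitudes. Values near 1 mean "semantically
    # close" under the embedding model.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical embeddings: related concepts end up near each other.
cat = [0.9, 0.1, 0.0]
kitten = [0.85, 0.2, 0.05]
car = [0.0, 0.1, 0.95]

print(cosine(cat, kitten))  # high: semantically close
print(cosine(cat, car))     # low: unrelated
```

Semantic search is then just "embed the query, return the stored vectors with the highest cosine similarity."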
9/3. Coding a vision language model from scratch
Usefulness: 5 / 5, Time needed: 1 hour 30 minutes, write on laptop: Link – manual notes: Code a vision language model from scratch
Umar Jamil has shared a YouTube video tutorial demonstrating how to code a Multimodal (Vision) Language Model from scratch using Python and PyTorch. The video covers detailed implementation steps for creating a vision model, specifically referencing PaliGemma and Gemma models, with a comprehensive explanation of every concept.
This resource is highly valuable for ML/AI engineers looking to understand multimodal model architecture, particularly those interested in vision-language models. The tutorial offers a practical, code-first approach to learning complex ML concepts, making it an excellent hands-on learning resource for understanding model implementation details.
Task: Watch the YouTube video tutorial to understand how to create a Multimodal (Vision) Language Model from scratch using Python and PyTorch, and follow along with the code implementation to gain practical experience.
Output: Publish code that implements a vision language model from scratch
9/4. Efficient vector search with ScaNN
Usefulness: 4 / 5, Time needed: 25 minutes, read on laptop: Link – manual notes: Learn about efficient vector search (Nice animation recommended by a Head of AI candidate for Sourcegraph.)
ScaNN is an open-source vector similarity search library developed by Google Research that introduces anisotropic vector quantization. This technique allows for more efficient and accurate maximum inner-product search (MIPS) by optimizing how database embeddings are compressed and compared, enabling faster semantic search across large datasets.
ML engineers can learn about advanced embedding search techniques, vector quantization strategies, and performance optimization in machine learning. The paper and library demonstrate how algorithmic innovations can significantly improve computational efficiency in large-scale machine learning applications, particularly in information retrieval and recommendation systems.
Task: Read it to understand the basics of efficient vector search and how ScaNN's anisotropic vector quantization technique can improve performance in semantic search and recommendation systems.
Output: Understand the basics of efficient vector search and ScaNN's anisotropic vector quantization technique
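To appreciate what ScaNN accelerates, it helps to see the exact baseline it approximates: brute-force maximum inner-product search, a linear scan over every database vector. A minimal sketch (toy data, not ScaNN's API):

```python
import heapq

def mips(query, database, k=2):
    # Exact maximum inner-product search: score every database vector
    # against the query and keep the top k. ScaNN avoids this full
    # linear scan by searching over anisotropically quantized vectors.
    scored = ((sum(q * d for q, d in zip(query, doc)), i)
              for i, doc in enumerate(database))
    return heapq.nlargest(k, scored)

db = [[1.0, 0.0], [0.7, 0.7], [0.0, 1.0], [-1.0, 0.0]]
print(mips([1.0, 0.2], db))  # (score, index) pairs for the best matches
```

The quantization insight in the paper is that compression error matters more in the direction of the original vector than orthogonal to it, since only the parallel component changes the inner product ranking.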
9/5. Customizing embeddings with OpenAI cookbook
Usefulness: 4 / 5, Time needed: 30 minutes, read on laptop: Link – manual notes: Learn how to customize embeddings (from this OpenAI example)
A Jupyter notebook from the OpenAI cookbook repository demonstrating techniques for customizing embeddings, which are crucial for transforming text data into machine-readable vector representations used in various machine learning and natural language processing tasks.
This resource provides practical insights into embedding customization, which is essential for ML engineers working with text data. By studying this notebook, learners can understand how to adapt and fine-tune embeddings for specific use cases, improving model performance and semantic understanding.
Task: Use this Jupyter notebook as a reference to learn how to customize embeddings for specific use cases, and experiment with the provided code to improve model performance and semantic understanding.
Output: Understand how to customize embeddings for specific NLP tasks
9/6. Embedding models in Ollama for semantic search and retrieval
Usefulness: 4 / 5, Time needed: 25 minutes, read on laptop: Link – manual notes: Play with Ollama embedding models
Ollama now supports embedding models, which generate vector embeddings - long arrays of numbers representing semantic meaning of text. These embeddings can be stored in databases for semantic search and retrieval. The page introduces several embedding models like mxbai-embed-large, provides usage instructions, and includes a detailed Python example of building a RAG application.
For ML/AI engineers, this resource offers practical insights into vector embeddings, demonstrating how to generate, store, and use semantic representations of text. The step-by-step example with Python code provides a hands-on approach to understanding embedding techniques and their integration with tools like ChromaDB and language models.
Task: Read it to understand how to generate, store, and use semantic representations of text with Ollama embedding models, and follow the Python example to build a RAG application
Output: Understand the basics of vector embeddings and how to use them in retrieval augmented generation workflows
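The retrieval half of the RAG flow described on that page can be sketched without any external services. Here the `embed` function is a keyword-counting stand-in for a real embedding model such as mxbai-embed-large served by Ollama; everything else (function names, documents) is invented for illustration:

```python
def embed(text):
    # Stand-in for a real embedding model: count a few keywords.
    # A real RAG app would call an embedding model here instead.
    keywords = ["llama", "paris", "python"]
    return [float(text.lower().count(k)) for k in keywords]

def retrieve(query, docs):
    # RAG retrieval step: embed the query, pick the most similar
    # document by dot product, then stuff it into the prompt that
    # would be sent to the language model.
    q = embed(query)
    best = max(docs, key=lambda d: sum(a * b for a, b in zip(q, embed(d))))
    return f"Using this context: {best}\nAnswer this question: {query}"

docs = ["Llamas live in the Andes.", "Paris is the capital of France."]
print(retrieve("Where do llamas live?", docs))
```

In the real pipeline from the page, the stored vectors live in a database like ChromaDB and similarity search replaces the `max` over raw documents.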
9/7. Rerankers library for improving search and retrieval systems
Usefulness: 4 / 5, Time needed: 30 minutes, read on laptop, write on laptop: Link – manual notes: Learn about rerankers (seems important)
The 'rerankers' repository is a lightweight, low-dependency Python library that offers a unified API to use common reranking and cross-encoder models. It aims to simplify the process of integrating different reranking techniques into machine learning and AI workflows.
For ML/AI engineers, this library provides a practical tool to improve search and retrieval systems by implementing advanced reranking techniques. It could be particularly useful for those working on information retrieval, recommendation systems, or natural language processing projects.
Task: Read the documentation and explore the repository to understand how to use the rerankers library for improving search and retrieval systems in machine learning and AI workflows.
Output: Publish code that integrates the rerankers library into a project for search and retrieval system improvement
9/8. Understanding vector databases and inverted indexes
Usefulness: 4 / 5, Time needed: 25 minutes, read on laptop: Link – manual notes: Learn what an "inverted index" is (Julie: "you can't use an inverted index for vector search..." plus references to https://arxiv.org/abs/2111.08566)
Turbopuffer is a vector database designed to be 10x-100x cheaper than traditional solutions, with serverless vector and full-text search capabilities. Built on object storage, it offers massive scalability, usage-based pricing, and high performance, with production deployments by companies like Cursor, Suno, and Notion.
For ML/AI engineers, this resource provides insights into modern vector database design, cost optimization strategies, and scalable search infrastructure. It demonstrates practical approaches to handling large-scale vector embeddings, caching strategies, and efficient query performance in production environments.
Task: Read the Turbopuffer webpage to understand its architecture and how it relates to vector search and storage. Also, explore the referenced arxiv paper to learn more about the limitations of inverted indexes in vector search.
Output: Understand the basics of vector databases and inverted indexes, and how they are used in vector search and storage.
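An inverted index is simple enough to sketch in a few lines, which also makes the quoted objection concrete: the structure keys on discrete terms, and dense embedding vectors have no such terms to key on. A toy version:

```python
from collections import defaultdict

def build_index(docs):
    # Inverted index: map each term to the set of document ids that
    # contain it. Keyword queries become cheap set operations -- but
    # there is no discrete "term" in a dense vector, which is why this
    # structure doesn't transfer directly to vector search.
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

docs = ["the cat sat", "the dog ran", "cat and dog"]
index = build_index(docs)
print(sorted(index["cat"]))                 # docs containing "cat"
print(sorted(index["cat"] & index["dog"]))  # docs containing both terms
```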
9/9. Multimodal recipe recommender tutorial
Usefulness: 4 / 5, Time needed: 20 minutes, read on laptop: Link – manual notes: Follow ""Multimodal Recipe Recommender"" tutorial (Medium)
The tweet describes a tutorial by Benito Martin on creating a multimodal RAG (Retrieval-Augmented Generation) pipeline that can ingest YouTube video playlists, process video descriptions and frames, and generate recipe recommendations using vector indexing with Qdrant and multimodal AI capabilities.
For ML/AI engineers, this resource provides insights into advanced multimodal AI techniques, demonstrating how to combine different data types (text, video frames) into a coherent recommendation system, and showcasing practical implementation of vector search and generative AI technologies.
Task: Follow the tutorial to learn how to build a multimodal RAG pipeline that can ingest YouTube video playlists, process video descriptions and frames, and generate recipe recommendations using vector indexing with Qdrant and multimodal AI capabilities.
Output: Publish code that implements a multimodal recipe recommender using Qdrant and LlamaIndex
9/10. Exploring Google's Gemma embeddings model
Usefulness: 4 / 5, Time needed: 5 minutes, read on laptop: Link – manual notes: Understand and try this new embeddings model
Google has released GemmaEmbed, a dense-vector embedding model specifically designed for retrieval tasks. As of December 12, 2024, it has achieved the top position on the MTEB leaderboard with a score of 72.72, demonstrating significant performance in embedding technology.
For ML/AI engineers, this tweet provides insights into state-of-the-art embedding models, showcasing how advanced embedding techniques can improve retrieval performance. Understanding such models is crucial for developing efficient machine learning systems that require semantic search or information retrieval capabilities.
Task: Read it to understand the basics of Google's Gemma Embeddings model and its performance on the MTEB leaderboard
Output: Understand the basics of Google's Gemma Embeddings model
9/11. Exploring Moonshine Web for speech recognition
Usefulness: 4 / 5, Time needed: 15 minutes, read on laptop: Link – manual notes: Look into speech recognition options, per lang
Moonshine Web is an innovative web-based speech recognition tool that offers real-time transcription directly in the browser. The technology appears to be a significant advancement in speech-to-text technology, promising improved speed and accuracy compared to existing solutions like OpenAI's Whisper model.
For ML/AI engineers interested in speech recognition and natural language processing, this resource provides insights into cutting-edge browser-based speech recognition techniques. Understanding such technologies can be crucial for developing more efficient and accessible AI-powered communication tools.
Task: Read this post to understand the basics of Moonshine Web and its potential applications in speech recognition. Also, explore the comments and discussions to gain insights from the community.
Output: Understand the basics of Moonshine Web and its comparison to Whisper for speech recognition
10. Large language models (LLMs)
Learn about GPT, BERT, and fine-tuning LLMs.
Why learn this: LLMs are transforming how AI interacts with text.
Project ideas:
- Fine-tune a small LLM on a niche dataset.
- Build a chatbot using an LLM API.
10/1. Comprehensive guide to large language models
Usefulness: 5 / 5, Time needed: 3 hours, read on laptop: Link – manual notes: Check Normcore LLM reads (this is an AMAZING resource for learning everything before and around LLMs!)
A carefully curated 'anti-hype' reading list for understanding Large Language Models, covering foundational concepts, transformer architecture, significant open-source models, training data, pre-training techniques, and alignment methods like RLHF and DPO.
This resource is an excellent roadmap for ML/AI engineers wanting to deeply understand LLMs. It offers academic papers, technical blog posts, and explanatory resources that go beyond marketing hype, providing a structured approach to learning about transformer architectures, model training, and technological nuances.
Task: Read it to understand the technical foundations, evolution, and current state of LLMs, and use the provided resources to gain practical insights and demystify the hype around LLMs.
Output: Understand the basics of LLMs and their underlying technologies
10/2. Stanford lecture on building large language models
Usefulness: 5 / 5, Time needed: 1 hour 30 minutes, watch: Link – manual notes: Watch this Stanford lecture on building LLMs (It was incredibly good, want to write a separate thing about it.)
A comprehensive lecture exploring the technical intricacies of Large Language Models (LLMs), detailing the critical components of training such as pre-training on internet data, tokenization strategies, post-training techniques like supervised fine-tuning and reinforcement learning from human feedback, and the importance of scaling laws in model development.
For ML/AI engineers, this resource offers deep insights into the practical challenges of building LLMs, covering data processing, model alignment, evaluation techniques, and system optimization. The lecture provides a nuanced understanding of how modern AI models are developed beyond surface-level explanations.
Task: Watch this lecture to gain a deep understanding of the technical aspects of building Large Language Models, including pre-training, tokenization strategies, and system optimization. Take notes on key challenges and considerations in LLM development.
Output: Write a blog post or create a detailed notes document summarizing the key points from the lecture, focusing on the technical challenges and solutions in LLM development.
10/3. Minimal GPT implementation in PyTorch
Usefulness: 5 / 5, Time needed: 1 hour, read on laptop: Link – manual notes: Learn AI with minGPT (ChatGPT-like stuff)
MinGPT is a minimal, educational PyTorch implementation of the Generative Pretrained Transformer (GPT) by Andrej Karpathy. It serves as a clean, understandable reference for how GPT models are constructed, with a focus on code simplicity and learning potential for machine learning practitioners.
For ML/AI engineers, this repository is an excellent resource to understand the core mechanics of transformer-based language models. By studying the implementation, developers can gain insights into GPT architecture, learn PyTorch best practices, and see a simplified version of a complex deep learning model.
Task: Read and experiment with the code to understand the GPT architecture and PyTorch best practices
Output: Understand the basics of transformer-based language models and GPT architecture
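One detail worth looking for while reading minGPT is the causal mask, which is what makes the model autoregressive: position t may attend only to positions up to t. A toy sketch of the mask's shape (minGPT itself applies it by filling masked attention scores with -inf before the softmax):

```python
def causal_mask(T):
    # Lower-triangular mask for a sequence of length T: entry [i][j]
    # is 1 where position i is allowed to attend to position j, i.e.
    # only to itself and earlier positions.
    return [[1 if j <= i else 0 for j in range(T)] for i in range(T)]

for row in causal_mask(4):
    print(row)
```

Without this mask the model could peek at future tokens during training, and next-token prediction would be trivial.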
10/4. Infini-attention for large language models
Usefulness: 5 / 5, Time needed: 45 minutes, read on laptop: Link – manual notes: Check LLMs with 1M+ token context windows
The paper introduces 'Infini-attention', a new transformer attention technique that enables large language models to process infinitely long contexts with bounded memory and computation. By incorporating a compressive memory mechanism into standard attention, the approach allows efficient processing of extremely long sequences, demonstrated on tasks involving 1M token context lengths and 500K length book summarization.
For ML/AI engineers, this paper is crucial for understanding cutting-edge techniques in scaling transformer architectures. It provides insights into solving context length limitations in LLMs, which is a critical challenge in developing more capable and flexible language models that can handle extensive contextual information.
Task: Read this paper to understand the 'Infini-attention' technique and its implications for developing more capable language models that can handle extensive contextual information.
Output: Understand the 'Infini-attention' technique and its potential applications in large language models
10/5. Exploring GPT-4's capabilities and potential for artificial general intelligence
Usefulness: 5 / 5, Time needed: 1 hour, read on laptop: Link – manual notes: Read "Sparks of Artificial General Intelligence: Early experiments with GPT-4" (2023-03-22)
A comprehensive research paper investigating GPT-4's advanced capabilities, demonstrating its ability to solve complex tasks across mathematics, coding, vision, medicine, law, and psychology. The researchers argue that GPT-4 exhibits more general intelligence than previous AI models and could be viewed as an early, incomplete version of an Artificial General Intelligence (AGI) system.
This paper is crucial for ML/AI engineers to understand the current state of large language models and their potential for general intelligence. It provides insights into GPT-4's capabilities, limitations, and potential future research directions, offering a deep technical exploration of cutting-edge AI technology.
Task: Read this paper to gain insights into GPT-4's capabilities, limitations, and potential future research directions, offering a deep technical exploration of cutting-edge AI technology.
Output: Understand the current state of large language models and their potential for general intelligence
10/6. Fine-tuning OpenAI models
Usefulness: 5 / 5, Time needed: 45 minutes, read on laptop: Link – manual notes: Learn about OpenAI fine-tuning (for example, preference fine-tuning)
Fine-tuning allows developers to customize AI models for specific tasks by training them on custom datasets. OpenAI supports fine-tuning for several GPT models, including gpt-4o-mini and gpt-3.5-turbo. The process involves preparing a high-quality training dataset, uploading it, creating a fine-tuning job, and then using the resulting specialized model.
This resource is invaluable for ML/AI engineers wanting to understand model customization. It provides detailed, practical guidance on preparing training data, understanding token limits, analyzing training metrics, and implementing fine-tuning strategies that can significantly improve model performance for specific use cases.
Task: Read it to understand the process and technical details of fine-tuning GPT models, and use the guide to implement fine-tuning strategies that can improve model performance for specific use cases.
Output: Understand the basics of fine-tuning OpenAI models and implement fine-tuning strategies to improve model performance
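The "preparing a high-quality training dataset" step means producing a JSONL file where each line is one conversation in the chat messages format. A minimal sketch (the conversation content is made up; consult the guide for the exact per-model requirements):

```python
import json

# Each training example is one JSON object per line ("JSONL"), in the
# chat "messages" format OpenAI's fine-tuning endpoints expect.
examples = [
    {"messages": [
        {"role": "system", "content": "You are a terse support bot."},
        {"role": "user", "content": "How do I reset my password?"},
        {"role": "assistant", "content": "Settings > Security > Reset password."},
    ]},
]

jsonl = "\n".join(json.dumps(ex) for ex in examples)

# Quick validation pass before uploading: every line must parse, and
# each conversation ends with the assistant turn the model will imitate.
for line in jsonl.splitlines():
    ex = json.loads(line)
    assert ex["messages"][-1]["role"] == "assistant"
print(jsonl)
```

In the real workflow this string would be written to a file, uploaded via the Files API, and referenced when creating the fine-tuning job.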
10/7. Microsoft's semantic kernel repository
Usefulness: 5 / 5, Time needed: 45 minutes, read on laptop: Link – manual notes: Learn about Microsoft's Semantic Kernel
Semantic Kernel is an open-source Microsoft project that enables developers to quickly integrate cutting-edge Large Language Model (LLM) technologies into their applications. It provides a flexible framework for building AI-powered software solutions with easy-to-use components and seamless LLM integration.
For ML/AI engineers, this repository offers a practical example of how to structure and implement LLM integration frameworks. By studying the code, documentation, and examples, developers can learn advanced techniques for building AI-powered applications and understand best practices in LLM technology implementation.
Task: Explore the repository to understand how to structure and implement LLM integration frameworks, and study the code, documentation, and examples to learn best practices in LLM technology implementation.
Output: Understand the basics of integrating LLM technology into applications
10/8. Building a chatbot in 87 minutes
Usefulness: 4 / 5, Time needed: 10 minutes, read on laptop: Link – manual notes: Play with AI (Took me 87 minutes to build a chatbot trained on Gumroad's Help Center docs. Fork askmybook, and replace manuscript text with scraped Help Center docs. That's it! Try it here | video tutorial)
Sahil Lavingia shared a concise tutorial for building a chatbot in just 87 minutes by forking a GitHub repository and replacing the training text with help center documentation. He demonstrates a straightforward approach to creating a custom AI-powered support tool.
For ML/AI engineers, this tweet provides a practical example of rapid AI prototype development, showcasing how to quickly train a custom chatbot using existing documentation. It illustrates techniques for knowledge transfer and demonstrates the accessibility of building AI-powered conversational interfaces.
Task: Read it to understand how to quickly build a custom chatbot using existing documentation, and watch the video tutorial to get a step-by-step guide
Output: Publish code that builds a custom chatbot using the askmybook repository and scraped Help Center docs
10/9. OpenPlayground repository for local large language model experimentation
Usefulness: 4 / 5, Time needed: 20 minutes, read on laptop: Link – manual notes: Try nat.dev
OpenPlayground is an open-source LLM (Large Language Model) playground designed to run directly on a personal laptop. The project aims to provide an accessible and local environment for experimenting with language models.
For ML/AI engineers, this repository offers insights into creating local AI model playgrounds, demonstrating practical approaches to making complex AI technologies more accessible and portable. It can serve as a learning resource for understanding LLM deployment and experimentation strategies.
Task: Explore the repository to understand how to create a local AI model playground, and consider implementing the strategies demonstrated for deploying and experimenting with LLMs.
Output: Understand the basics of deploying and experimenting with LLMs locally
10/10. Unifying large language models and knowledge graphs
Usefulness: 5 / 5, Time needed: 45 minutes, read on laptop: Link – manual notes: Read the paper "Unifying Large Language Models and Knowledge Graphs: A Roadmap" (2023-06-14 – Steve: "This paper seems to be a fairly comprehensive (and daunting) survey...")
The paper proposes a comprehensive roadmap for unifying Large Language Models and Knowledge Graphs, outlining three key frameworks: 1) KG-enhanced LLMs, which incorporate external knowledge during LLM training and inference, 2) LLM-augmented KGs, where LLMs help improve knowledge graph tasks, and 3) Synergized LLMs + KGs, where both technologies work mutually to enhance reasoning capabilities.
For ML/AI engineers, this paper provides critical insights into bridging the gap between structured knowledge representations (knowledge graphs) and generative AI models. Understanding these integration strategies can help develop more interpretable, knowledge-aware, and robust AI systems that can leverage both data-driven and knowledge-driven approaches.
Task: Read this paper to understand how to integrate Large Language Models with Knowledge Graphs and explore their synergistic potential across three key frameworks.
Output: Understand the basics of integrating LLMs and Knowledge Graphs
10/11. Advanced AI model capabilities showcased in a tweet
Usefulness: 4 / 5, Time needed: 5 minutes, read on mobile: Link – manual notes: Check another interesting anecdote about the new Claude models
Alex Albert shares a fascinating insight from testing Claude 3 Opus, where the AI model not only found a deliberately inserted 'needle' (a random sentence about pizza toppings) in a corpus of documents but also recognized that the sentence was artificially placed as part of an evaluation test.
This tweet provides an intriguing glimpse into advanced AI model capabilities, specifically around context understanding, meta-awareness, and the nuanced ways language models can detect and interpret deliberately constructed test scenarios. It offers insights into the evolving sophistication of AI language models.
Task: Read it to understand a unique aspect of AI model testing and development, specifically how models like Claude 3 Opus demonstrate meta-awareness and contextual understanding.
Output: Nothing, just read it.
10/12. Multi-needle retrieval benchmark for long-context LLMs
Usefulness: 4 / 5, Time needed: 15 minutes, read on laptop: Link – manual notes: Check "And then on top of multi-needle, there is reasoning + retrieval." (by Beyang)
Lance Martin presents an extended version of the 'Needle in a Haystack' benchmark, testing GPT-4-128k's ability to retrieve and reason about multiple facts in long-context scenarios. His research reveals performance degradation as the number of facts, context window size, and needle placement increase.
This resource provides crucial insights into LLM context retrieval limitations, helping ML engineers understand how large language models handle complex information extraction tasks. The open-source code and detailed analysis make it an excellent practical learning resource for understanding LLM context processing.
Task: Read it to understand the limitations of LLM context retrieval and how to test their performance with multiple context retrievals
Output: Understand the limitations of LLM context retrieval and how to test their performance
10/13. Marlin repository for FP16xINT4 LLM inference kernel
Usefulness: 4 / 5, Time needed: 30 minutes, read on laptop: Link – manual notes: Understand GPTQ (and this would be a good benchmark for my learnings in this area)
Marlin is an open-source project by IST-DASLab that develops a specialized inference kernel for large language models, focusing on achieving near-ideal ~4x speedups for medium batch sizes (16-32 tokens) through FP16xINT4 quantization techniques.
For ML/AI engineers interested in model inference optimization, this repository offers insights into quantization strategies, kernel design, and performance enhancement techniques for large language models, particularly in reducing computational overhead.
Task: Read the repository documentation and explore the code to understand the quantization strategies and kernel design for large language models, focusing on performance enhancement techniques.
Output: Understand the basics of Marlin and its application in optimizing large language model inference
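Marlin's FP16xINT4 setup keeps activations in FP16 while storing weights as 4-bit integers plus a scale. A toy symmetric INT4 quantizer shows the compress/restore round-trip this relies on (illustrative only, not Marlin's actual kernel or GPTQ's error-compensating algorithm):

```python
def quantize_int4(weights):
    # Symmetric INT4 quantization: map floats to integers in [-8, 7]
    # using one shared scale. Storing 4 bits per weight instead of 16
    # is where the memory and bandwidth savings come from.
    scale = max(abs(w) for w in weights) / 7
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    # The kernel reconstructs approximate FP16 weights on the fly
    # before multiplying against the FP16 activations.
    return [qi * scale for qi in q]

w = [0.12, -0.5, 0.33, 0.7]
q, scale = quantize_int4(w)
print(q)                    # 4-bit integer codes
print(dequantize(q, scale)) # approximate reconstruction of w
```

Real schemes like GPTQ quantize group-wise and adjust remaining weights to compensate for each rounding error; this sketch only shows the storage format.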
10/14. Exploring Fish Speech V1.4 on Hugging Face
Usefulness: 4 / 5, Time needed: 30 minutes, read on laptop: Link – manual notes: Learn to use Huggingface (This could be an interesting example)
Fish Speech V1.4 is an advanced multilingual text-to-speech model trained on 700k hours of audio data, supporting 8 languages including English, Chinese, German, Japanese, French, Spanish, Korean, and Arabic. The model is released under a CC-BY-NC-SA-4.0 license and is associated with an arXiv paper detailing its development.
For ML/AI engineers interested in text-to-speech technologies, this model provides an excellent reference for multilingual TTS techniques. Its open-source nature and comprehensive documentation make it a valuable resource for understanding state-of-the-art speech synthesis approaches, especially for those working on voice generation or natural language processing projects.
Task: Read the model page to understand the capabilities and usage of Fish Speech V1.4, and explore the associated arXiv paper for a deeper dive into its development.
Output: Understand the basics of multilingual text-to-speech models and their applications
10/15. Evaluating large language models trained on code
Usefulness: 5 / 5, Time needed: 45 minutes, read on laptop: Link – manual notes: Read "Evaluating Large Language Models Trained on Code" (2021-07-07)
The paper introduces Codex, a GPT language model fine-tuned on GitHub code, which demonstrates impressive Python code generation capabilities. On the HumanEval benchmark, the model solves 28.8% of problems directly, and up to 70.2% with repeated sampling. The researchers also explore the model's limitations and potential broader impacts of code generation technologies.
This paper is crucial for ML/AI engineers interested in code generation and large language models. It provides insights into how transformer models can be adapted for programming tasks, demonstrates evaluation methodologies for code generation, and highlights the potential and challenges of AI-assisted coding tools.
Task: Read this paper to understand the capabilities and limitations of large language models in code generation, and how they can be evaluated and fine-tuned for programming tasks.
Output: Understand the basics of large language models in code generation and their potential applications and limitations.
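The paper's headline numbers (28.8% at one sample, 70.2% with repeated sampling) rest on its unbiased pass@k estimator: draw n samples, count c correct, and estimate the probability that at least one of k samples solves the problem. It is easy to implement directly:

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator from the Codex paper:
    1 - C(n-c, k) / C(n, k), the chance that at least one
    of k samples (out of n, with c correct) passes."""
    if n - c < k:
        return 1.0  # too few incorrect samples to fill k slots
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

print(pass_at_k(n=200, c=20, k=1))  # → 0.1 (exactly 1 - 180/200)
```

Computing it this way, rather than naively as 1 - (1 - c/n)**k, avoids the bias the paper points out when k approaches n.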
10/16. Synergizing reasoning and acting in language models with ReAct
Usefulness: 4 / 5, Time needed: 45 minutes, read on laptop: Link – manual notes: Read "ReAct: Synergizing Reasoning and Acting in Language Models" (2022-10-06)
ReAct is a method for large language models that interleaves reasoning traces and task-specific actions, enabling models to generate more interpretable and effective solutions. By allowing reasoning to help track and update action plans, and actions to interface with external sources, the approach overcomes issues like hallucination and improves performance on question answering, fact verification, and interactive decision-making tasks.
For ML/AI engineers, this paper provides insights into advanced language model techniques that go beyond traditional prompting. It demonstrates how combining reasoning and acting can make language models more robust, interpretable, and capable of handling complex tasks by dynamically updating action plans and interfacing with external knowledge sources.
Task: Read this paper to understand how ReAct combines reasoning and acting capabilities in large language models, and how it can improve model performance on tasks like question answering, fact verification, and interactive decision-making.
Output: Understand the ReAct approach and its potential applications in large language models
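The ReAct pattern itself is simple to sketch: the model alternates free-form thoughts with parseable actions, and tool observations are appended to the transcript before the next step. A toy loop with a stubbed model and a stubbed lookup tool (`fake_llm`, `lookup`, and the `Action: name[arg]` syntax here are illustrative stand-ins, not the paper's exact prompts):

```python
def lookup(term):
    """Toy knowledge-base tool standing in for e.g. a Wikipedia API."""
    kb = {"Paris": "Paris is the capital of France."}
    return kb.get(term, "No result.")

def fake_llm(transcript):
    """Stub model: looks something up first, then finishes."""
    if "Observation" not in transcript:
        return "Thought: I should look up Paris.\nAction: lookup[Paris]"
    return "Thought: I have the answer.\nAction: finish[Paris is the capital of France.]"

def react(question, max_steps=5):
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        step = fake_llm(transcript)
        transcript += "\n" + step
        action = step.rsplit("Action: ", 1)[1]   # parse "name[arg]"
        name, arg = action.split("[", 1)
        arg = arg.rstrip("]")
        if name == "finish":
            return arg
        transcript += f"\nObservation: {lookup(arg)}"  # feed tool result back
    return None

print(react("What is the capital of France?"))  # → Paris is the capital of France.
```

Swapping `fake_llm` for a real model call and `lookup` for real tools gives the basic shape of most ReAct-style agent loops.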
10/17. Understanding in-context learning in large language models
Usefulness: 5 / 5, Time needed: 45 minutes, read on laptop: Link – manual notes: Read "Why Can GPT Learn In-Context? ..." (2022-12-20) (Steve: "really interesting paper about how in-context learning simulates fine-tuning...")
The paper investigates the mechanism behind in-context learning (ICL) in large language models, showing that Transformer attention has a dual form corresponding to gradient descent. The authors argue that GPT models implicitly perform meta-optimization: they produce meta-gradients from the demonstration examples and apply them through attention to build an in-context learning model.
For ML/AI engineers, this paper provides deep insights into the inner workings of large language models, particularly how they can quickly adapt to new tasks. Understanding this meta-optimization perspective can help in designing more flexible and efficient machine learning architectures and improving transfer learning techniques.
Task: Read this paper to gain deep insights into the inner workings of large language models and how they can quickly adapt to new tasks
Output: Understand the mechanism of in-context learning in large language models and how it can be applied to improve transfer learning techniques
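The paper's core identity, roughly summarized from its linear-attention analysis, is a duality between a gradient-descent weight update and attention over the demonstration tokens:

```latex
\underbrace{\Delta W\, q \;=\; \sum_i e_i\, \big(x_i^{\top} q\big)}_{\text{gradient-descent update applied to query } q}
\qquad\Longleftrightarrow\qquad
\underbrace{\mathrm{LinearAttn}(V, K, q) \;=\; \sum_i v_i\, \big(k_i^{\top} q\big)}_{\text{linear attention over demonstrations}}
```

with demonstration values $v_i$ playing the role of error signals $e_i$ and keys $k_i$ the role of training inputs $x_i$: the demonstrations act as implicit training examples whose "meta-gradients" update the model's effective weights, without any parameter actually changing.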
10/18. Improving language models with explicit planning for logical reasoning
Usefulness: 4 / 5, Time needed: 45 minutes, read on laptop: Link – manual notes: Read "Explicit Planning Helps Language Models in Logical Reasoning" (Twitter, 2023-03-28)
LEAP is a novel system that uses language models for multi-step logical reasoning by incorporating explicit planning. The approach enables more informed reasoning decisions by looking ahead and predicting future effects. The system significantly outperforms competing methods across multiple datasets, performing competitively with GPT-3 despite being much smaller.
For ML/AI engineers, this paper offers insights into advanced reasoning techniques for language models. It demonstrates how strategic planning and lookahead mechanisms can improve logical reasoning capabilities, which is crucial for developing more sophisticated AI systems that can handle complex reasoning tasks.
Task: Read this paper to understand how strategic planning and lookahead mechanisms can improve logical reasoning capabilities in language models, and explore its potential applications in developing more sophisticated AI systems.
Output: Understand the basics of LEAP and its applications in language models, and potentially publish a summary or analysis of the paper
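The lookahead idea can be sketched independently of the paper's system: instead of greedily committing to the locally best reasoning step, evaluate each candidate by the best state it can reach one step further on. Everything below (the candidate generator, the scorer, the toy numeric "reasoning" domain) is an illustrative stand-in, not LEAP's actual models:

```python
# Toy lookahead planner in the spirit of LEAP: choose the candidate
# step whose one-step rollout scores best, rather than the candidate
# that merely looks best right now.
def candidates(state):
    return [state + 1, state + 2, state * 2]

def score(state, goal=10):
    return -abs(goal - state)  # closer to the goal is better

def plan(state, goal=10, depth=4):
    path = [state]
    for _ in range(depth):
        if state == goal:
            break
        # Look ahead: rank each candidate by its best reachable successor.
        state = max(
            candidates(state),
            key=lambda c: max(score(n, goal) for n in candidates(c) + [c]),
        )
        path.append(state)
    return path

print(plan(1))  # greedy would pick the doubling move too early; lookahead reaches 10
```

A greedy planner scoring only the immediate step would take different moves here; the one-step lookahead is what lets it line up `5 * 2 = 10` in advance, which is the intuition behind looking ahead in a reasoning chain.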
10/19. Self-refine: Iterative refinement with self-feedback for large language models
Usefulness: 4 / 5, Time needed: 25 minutes, read on laptop: Link – manual notes: Read "Self-Refine: Iterative Refinement with Self-Feedback" (2023-03-30)
Self-Refine is an innovative method for enhancing large language model performance by enabling models like GPT-3.5, ChatGPT, and GPT-4 to iteratively refine their own outputs. By providing self-feedback and iterative refinement, the approach improves task performance by approximately 20% across diverse tasks without requiring additional training or supervised data.
For ML/AI engineers, this paper offers insights into advanced language model techniques that go beyond single-pass generation. It demonstrates how meta-cognitive processes like self-reflection and iterative improvement can be programmatically implemented in AI systems, which could inspire more sophisticated prompt engineering and model interaction strategies.
Task: Read this paper to understand how self-refinement can improve large language model outputs and to gain insights into advanced language model techniques that go beyond single-pass generation.
Output: Understand the self-refine method for improving large language model outputs and its potential applications in prompt engineering and model interaction strategies.
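The loop itself is the whole method and fits in a few lines: generate a draft, ask the same model to critique it, feed the critique back, and stop when the critic approves. The prompt wording, the stop condition, and the `stub` model below are illustrative placeholders, not the paper's exact prompts:

```python
def self_refine(model, task, max_iters=4):
    """Generate → self-critique → refine, until the critic is satisfied."""
    output = model(f"Task: {task}\nAnswer:")
    for _ in range(max_iters):
        feedback = model(f"Task: {task}\nAnswer: {output}\nCritique this answer:")
        if "looks good" in feedback.lower():   # naive stop condition
            break
        output = model(f"Task: {task}\nAnswer: {output}\n"
                       f"Feedback: {feedback}\nImproved answer:")
    return output

# Stub model that "improves" a draft once, then approves it.
def stub(prompt):
    if prompt.endswith("Answer:"):
        return "draft v1"
    if "Critique" in prompt:
        return "Looks good." if "v2" in prompt else "Too short."
    return "draft v2"

print(self_refine(stub, "summarize X"))  # → draft v2
```

The paper's ~20% gains come entirely from this kind of outer loop around an unmodified model, which is why it needs no extra training data.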
10/20. Causal reasoning and large language models
Usefulness: 4 / 5, Time needed: 30 minutes, read on laptop: Link – manual notes: Read "Causal Reasoning and Large Language Models: Opening a New Frontier for Causality" (2023-04-28) – sounds interesting, maybe just read the abstract (sent by Beyang)
The paper investigates the causal reasoning capabilities of large language models (LLMs) across multiple tasks. By conducting a behavioral study, the researchers found that GPT-3.5 and GPT-4 can generate accurate causal arguments, outperforming existing methods in tasks like causal discovery, counterfactual reasoning, and event causality analysis.
For ML/AI engineers, this paper offers insights into the emerging field of causal reasoning with LLMs. It provides a framework for understanding how language models can be used to generate causal graphs, identify causal contexts, and potentially assist domain experts in setting up causal analyses.
Task: Read the abstract and introduction to understand the basics of causal reasoning in LLMs, then decide whether to dive deeper into the full paper based on interest and relevance to current projects.
Output: Understand the potential of LLMs in causal reasoning and how this can be applied to improve AI models, possibly writing a blog post or creating a project that utilizes these insights.
10/21. Are emergent abilities of large language models a mirage?
Usefulness: 4 / 5, Time needed: 30 minutes, read on laptop: Link – manual notes: Read "Are Emergent Abilities of Large Language Models a Mirage?" (2023-04-28) – goes well with the previous one (also by Beyang)
The paper investigates whether emergent abilities in large language models are real or a mirage. The researchers argue that apparent sudden capabilities might actually result from researchers' choice of metrics, demonstrating that nonlinear or discontinuous metrics can produce seemingly abrupt performance changes where linear metrics show smooth, predictable improvements.
For ML/AI engineers, this paper provides critical insight into model evaluation methodologies. It encourages a more nuanced approach to understanding model capabilities, emphasizing the importance of metric selection and statistical analysis when assessing AI model performance and potential emergent behaviors.
Task: Read this paper to understand the potential limitations of emergent abilities in large language models and the importance of metric selection in model evaluation.
Output: Understand the concept of emergent abilities in large language models and the potential pitfalls of model evaluation methodologies.
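The paper's central point is easy to reproduce numerically: if per-token accuracy p improves smoothly with scale, a nonlinear metric like "exact match over an L-token answer" (probability p**L) still looks like a sudden emergent jump, even though nothing discontinuous happened underneath:

```python
# Smooth per-token improvement vs. the apparent "emergence" under
# an all-or-nothing exact-match metric over L tokens.
L = 20
for p in [0.80, 0.90, 0.95, 0.99]:
    print(f"per-token {p:.2f} -> exact-match {p ** L:.3f}")
```

Per-token accuracy climbing gently from 0.80 to 0.99 drives exact match from roughly 1% to over 80%, which on a scale-vs-accuracy plot reads as an abrupt capability appearing out of nowhere.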
10/22. Textbooks are all you need: phi-1 language model for code generation
Usefulness: 4 / 5, Time needed: 30 minutes, read on laptop: Link – manual notes: Read "Textbooks Are All You Need" (2023-06-20, "new high quality codegen model trained on 'textbook-quality' data" by Beyang)
The paper introduces phi-1, a compact Transformer-based language model for code generation trained on carefully curated 'textbook quality' data. Despite its relatively small size of 1.3B parameters, phi-1 achieves high accuracy on coding tasks like HumanEval and MBPP, demonstrating the importance of high-quality training data over sheer model size.
For ML/AI engineers, this paper offers insights into data quality's critical role in model performance. It challenges the assumption that larger models always perform better and highlights how carefully selected, curated training data can lead to more efficient and effective machine learning models, especially in code generation tasks.
Task: Read this paper to understand the importance of high-quality training data in model performance and how it can lead to more efficient and effective machine learning models, especially in code generation tasks.
Output: Understand the role of data quality in model performance and how it can be applied to code generation tasks
10/23. Running large language models locally with dalai
Usefulness: 4 / 5, Time needed: 30 minutes, read on laptop: Link – manual notes: Try dalai llama
Dalai is a GitHub repository that offers an easy-to-use solution for running LLaMA (Large Language Model Meta AI) on local machines. It aims to simplify the process of deploying and interacting with advanced language models without complex setup requirements.
For ML/AI engineers, this repository provides practical insights into local deployment of large language models. It demonstrates how to make complex AI technologies more accessible and can serve as a reference for understanding model deployment strategies and local inference techniques.
Task: Explore the repository to understand how to deploy and interact with LLaMA on a local machine, and consider using it as a reference for model deployment strategies.
Output: Understand how to deploy LLaMA locally and potentially publish code that integrates dalai into a project
10/24. Exploring the Mistral-Small-22B-ArliAI-RPMax-v1.1 model for creative writing and role-playing
Usefulness: 3 / 5, Time needed: 20 minutes, read on laptop: Link – manual notes: Check this role-playing model (came out on 2024-09-28. It's supposed to be great. Can I run this locally?)
The Mistral-Small-22B-ArliAI-RPMax-v1.1 is a 22B parameter language model trained on diverse creative writing datasets with a focus on variety and preventing repetitive outputs. It's part of the RPMax series, which aims to create models capable of understanding and adapting to different characters and situations without getting stuck in specific personality patterns.
For ML/AI engineers interested in language model fine-tuning and creative AI, this model offers insights into dataset curation, training techniques like QLORA, and strategies for reducing model repetitiveness. The detailed training notes and experimental approach make it a valuable case study in specialized model development.
Task: Read the model's page and training notes to understand the approaches used to minimize repetition and enhance creative output. Experiment with the model to see its capabilities in role-playing scenarios and creative writing.
Output: Understand the capabilities and limitations of the Mistral-Small-22B-ArliAI-RPMax-v1.1 model and its potential applications in creative writing and role-playing
10/25. LLM API Showdown comparison tool
Usefulness: 4 / 5, Time needed: 10 minutes, read on laptop: Link – manual notes: Add to tooling under my belt
LLM API Showdown is a web tool designed to help users compare different AI language models across providers, with a focus on pricing and performance metrics. Users can select models, pricing strategies, and input-output ratios to find the most suitable LLM for their needs through an interactive, gradient-styled interface.
For ML/AI engineers, this tool provides a practical resource to understand current LLM market dynamics, API pricing structures, and comparative performance benchmarks. It offers a quick way to evaluate different language models without deep technical diving, helping engineers make informed technology selection decisions.
Task: Use the LLM API Showdown tool to compare different language models across providers, focusing on pricing and performance metrics, to inform technology selection decisions.
Output: Understand the current LLM market dynamics and comparative performance benchmarks
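The comparison the tool performs is simple enough to keep as a back-of-the-envelope script: per-request cost given input/output token counts and per-million-token prices. The model names and prices below are made-up placeholders, not real provider rates:

```python
# USD per 1M tokens: (input price, output price) — illustrative numbers only.
PRICES = {
    "model-a": (0.50, 1.50),
    "model-b": (3.00, 15.00),
}

def request_cost(model, in_tokens, out_tokens):
    """Cost of one request in USD for the given token counts."""
    p_in, p_out = PRICES[model]
    return (in_tokens * p_in + out_tokens * p_out) / 1_000_000

# A typical RAG-style request: large prompt, short completion.
for m in PRICES:
    print(m, round(request_cost(m, 2_000, 500), 6))
```

Note how the input/output ratio dominates the comparison: for prompt-heavy workloads, input pricing matters far more than headline output pricing, which is exactly the knob the tool exposes.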
10/26. Building a command-line chatbot in Go with Cobra and Langchain
Usefulness: 4 / 5, Time needed: 25 minutes, watch: Link – manual notes: Project: #48 Golang - Building a LLM (OpenAI) Command Line Chatbot with Cobra and LangChain (~10 min)
The video tutorial demonstrates how to create a command-line chatbot in Go by leveraging Cobra for CLI functionality and Langchain Go for interfacing with OpenAI's language models. It provides a step-by-step guide covering project initialization, dependency installation, command structure creation, user input processing, and implementing an interactive chat experience with AI-generated responses.
For ML/AI engineers, this tutorial offers practical insights into building conversational AI applications using Go. It showcases how to integrate language models programmatically, handle user interactions, and create scalable CLI tools that can interface with advanced AI services like OpenAI's GPT models.
Task: Watch it to learn how to create a conversational AI application using Go and integrate it with OpenAI's language models.
Output: Understand how to build a command-line chatbot in Go and integrate it with OpenAI's language models.
10/27. Running Meta Llama on Mac
Usefulness: 4 / 5, Time needed: 25 minutes, read on laptop: Link – manual notes: Running Meta Llama on Mac | Llama Everywhere (← https://www.datacamp.com/blog/llama-3-3-70b)
The guide provides a step-by-step walkthrough for setting up and running Meta Llama 3 models locally on a Mac using Ollama. It covers installation, model downloading, and demonstrates different interaction methods including terminal, curl commands, and potential Python scripting approaches.
For ML/AI engineers, this tutorial offers practical insights into local AI model deployment, showcasing how to run large language models on personal hardware. It demonstrates techniques for model interaction, prompt engineering, and highlights the ease of using tools like Ollama for AI experimentation.
Task: Read it to understand how to deploy and interact with large language models locally on a Mac, and use the provided steps for reference while setting up your own model.
Output: Understand the basics of deploying large language models on personal hardware and how to interact with them using different methods.
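Beyond the terminal and curl approaches the guide shows, the same local Ollama server can be scripted. A minimal sketch against Ollama's default `/api/generate` endpoint, assuming `ollama serve` is running and a model such as `llama3` has been pulled:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_payload(model, prompt):
    """Request body for a single non-streaming completion."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model, prompt):
    """Call the locally running Ollama server and return the completion text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Only works with the server up and the model pulled:
# print(generate("llama3", "Explain attention in one sentence."))
```

With `"stream": True` (the API default) the server instead returns newline-delimited JSON chunks, which is what the guide's curl examples show scrolling by.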
10/28. Hugging Face smol course on preference alignment for local LLMs
Usefulness: 4 / 5, Time needed: 2 hours, read on laptop: Link – manual notes: Check out Hugging Face free course on preference alignment for local LLMs ("Hugging Face offers a free 'smol' course on preference alignment for local LLMs, featuring modules like Argilla, distilabel, lighteval, PEFT, and TRL. The course covers seven topics, with 'Instruction Tuning' and 'Preference Alignment' already released, while others like 'Parameter Efficient Fine Tuning' and 'Vision Language Models' are scheduled for future release." – Reddit)
The 'smol-course' is a GitHub repository by Hugging Face that provides a practical course on aligning small language models. It teaches techniques for fine-tuning and improving the performance and behavior of compact AI models.
This resource is valuable for ML/AI engineers interested in model alignment, particularly those working with smaller, resource-constrained models. It offers practical insights into refining model behavior and performance in scenarios with limited computational resources.
Task: Read through the course materials and complete the exercises to understand the techniques for aligning small language models. Explore the GitHub repository and utilize the resources provided to gain practical insights into refining model behavior and performance.
Output: Publish code that demonstrates understanding of model alignment techniques for local LLMs
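The course's preference-alignment module centers on methods like DPO (via TRL's trainers); whether or not that is the exact variant you end up using, the DPO loss itself is worth knowing in plain Python. For one (chosen, rejected) pair with sequence log-probs under the policy and a frozen reference model:

```python
import math

def dpo_loss(pi_logp_w, pi_logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Direct Preference Optimization loss for one preference pair:
    -log sigmoid(beta * (policy-vs-reference margin on chosen minus rejected))."""
    margin = (pi_logp_w - ref_logp_w) - (pi_logp_l - ref_logp_l)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Policy favors the chosen answer more than the reference does -> low loss.
print(dpo_loss(pi_logp_w=-10.0, pi_logp_l=-14.0,
               ref_logp_w=-12.0, ref_logp_l=-12.0))
```

When the policy and reference agree (zero margin) the loss is log 2; it falls as the policy shifts probability toward the chosen answer relative to the reference, with `beta` controlling how hard it is pushed.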
10/29. Byte Latent Transformer introduction
Usefulness: 4 / 5, Time needed: 30 minutes, read on laptop: Link – manual notes: Understand BLT Byte Latent Transformer (2024-12-15)
Meta's research paper introduces the Byte Latent Transformer (BLT), a tokenizer-free approach that processes raw bytes grouped into dynamically sized patches, matching the performance of tokenization-based language models at scales up to 8B parameters. The research suggests a potential paradigm shift in how language models process text, with implications for future AI language understanding.
For ML/AI engineers, this paper represents a cutting-edge exploration of tokenization techniques. Understanding the BLT approach could provide insights into more efficient and nuanced language model architectures, potentially influencing future model design and performance optimization strategies.
Task: Read the thread and the linked paper to understand the basics of BLT and its potential impact on language models
Output: Understand the basics of Byte Latent Transformer and its potential applications in language models
10/30. Exploring SillyTavern for applied AI learning
Usefulness: 4 / 5, Time needed: 30 minutes, write on laptop: Link – manual notes: Try SillyTavern (Would be good to learn applied AI stuff.)
SillyTavern is an open-source LLM (Large Language Model) frontend designed for power users, offering a flexible and customizable interface for interacting with AI language models. The project is hosted on GitHub and provides a rich environment for users to engage with AI technologies.
For ML/AI engineers, this repository offers insights into frontend design for AI interactions, demonstrating how to create user-friendly interfaces for language models. It can help understand practical implementation of AI model integration and frontend development in the AI interaction space.
Task: Try out SillyTavern to learn about applied AI and understand how to create a user-friendly interface for language models
Output: Publish code that integrates SillyTavern with other AI models or writes a blog post about the experience of using SillyTavern for applied AI learning
10/31. Introduction to CerebrasCoder for web development
Usefulness: 4 / 5, Time needed: 15 minutes, read on laptop: Link – manual notes: Maybe try CerebrasCoder
CerebrasCoder is an open-source application that generates websites using Llama 3.3 70B served on Cerebras Systems' inference hardware. It allows users to create websites as quickly as they can type, and is completely free. The tool represents an innovative approach to automated web development leveraging large language models.
For ML/AI engineers, this tool offers insights into practical applications of large language models in web development. It demonstrates how generative AI can be used for rapid prototyping and content generation, showcasing the potential of models like Llama3.3-70b in transforming creative and technical workflows.
Task: Read it to understand how large language models can be used for rapid prototyping and content generation in web development, and explore the potential of models like Llama3.3-70b in transforming creative and technical workflows.
Output: Understand the basics of using large language models in web development
10/32. Community discussion on local large language models
Usefulness: 4 / 5, Time needed: 10 minutes, read on laptop: Link – manual notes: See best locally hosted LLMs (here on Reddit)
A community discussion thread about local Large Language Models (LLMs), where the original poster highlights Qwen2.5 32B as their primary model. The post invites community members to share their own preferred local LLM and provides technical details about model configuration and hardware requirements.
This resource offers practical insights into running local LLMs, including specific configuration details like quantization techniques, context length, and GPU memory management. It provides real-world perspectives from practitioners on selecting and optimizing local AI models for personal or research use.
Task: Read it to understand the community's preferences and experiences with local LLMs, and to gain insights into optimizing model configuration and hardware requirements
Output: Understand the current state of local LLMs and their applications
10/33. Large language model developments in 2024
Usefulness: 5 / 5, Time needed: 25 minutes, read on laptop: Link – manual notes: Check this "2024 summary" for a great recap of LLMs
Simon Willison's article reviews the major developments in Large Language Models during 2024, highlighting the breakdown of the GPT-4 performance barrier, dramatic price reductions, increased model efficiency, and the emergence of multi-modal AI capabilities. The year saw 18 organizations develop models that outperform the original GPT-4, with significant improvements in context length, computational efficiency, and cross-modal understanding.
For ML/AI engineers, this resource provides a comprehensive overview of the current state of LLM technology, offering insights into model performance trends, pricing dynamics, and emerging capabilities. The article's technical yet conversational style makes it an excellent snapshot of the rapidly evolving AI landscape, highlighting key technological breakthroughs and their practical implications.
Task: Read this article to get a comprehensive overview of the current state of LLM technology, including model performance trends, pricing dynamics, and emerging capabilities.
Output: Understand the current state of LLM technology and its practical implications
10/34. Exploring ChatGPT for conversational AI insights
Usefulness: 5 / 5, Time needed: 15 minutes, read on laptop: Link – manual notes: Learn ChatGPT well
ChatGPT is an AI-powered conversational platform that helps users with various tasks like writing, learning, and brainstorming. The landing page features a clean, user-friendly interface with a text input area and minimal instructions, emphasizing the ease of use and accessibility of the AI tool.
For ML/AI engineers, exploring ChatGPT provides insights into large language model design, conversational AI interfaces, and prompt engineering. The platform demonstrates practical applications of transformer-based models and offers a real-world example of how advanced AI can interact with users intuitively.
Task: Use ChatGPT to understand the capabilities and limitations of large language models in conversational settings. Experiment with different prompts to see how the AI responds and learn from its outputs.
Output: Publish a blog post comparing the strengths and weaknesses of different conversational AI platforms, including ChatGPT.
10/35. Exploring Cerebras Coder for instant app generation
Usefulness: 4 / 5, Time needed: 15 minutes, read on laptop: Link – manual notes: Try Cerebras Coder (Thorsten built Eat the presents with it)
Cerebras Coder is an AI-powered platform that allows users to transform ideas into fully functional applications in less than a second. Users can input app concepts like 'Todo App' or 'Weather App' into a simple interface, and the system leverages Llama 3.3 70B running on Cerebras inference hardware to generate complete, functional code instantly.
For ML/AI engineers, this platform showcases advanced language models' capabilities in code generation, demonstrating how large AI models can translate natural language requirements into executable software. It provides insights into prompt engineering, generative AI techniques, and the practical application of large language models in software development.
Task: Try out Cerebras Coder to understand how AI can generate fully functional apps instantly based on user prompts, and explore its potential for rapid software development.
Output: Publish a blog post about the experience and potential applications of using Cerebras Coder for instant app generation.
10/36. Training a large language model with moxin llm 7b
Usefulness: 4 / 5, Time needed: 20 minutes, read on laptop: Link – manual notes: To train my own model: Moxin, a 7B model (This is honestly an incredible resource for anyone trying to train their own model...)
Moxin LLM 7B is a fully open-source 7 billion parameter language model developed under the Model Openness Framework. Trained on SlimPajama and the-stack-dedup datasets, it offers superior zero-shot performance, supports 32k context length, and provides comprehensive access to pre-training code, configurations, and checkpoints.
For ML/AI engineers, this resource offers insights into open-source model development, demonstrating detailed model transparency and comprehensive release strategies. The post provides practical links to GitHub, Hugging Face repositories, and research paper, making it an excellent reference for understanding modern LLM development practices.
Task: Read this post to understand the technical details of Moxin LLM 7B and how to train your own model, and explore the provided links to GitHub, Hugging Face repositories, and research papers for further learning
Output: Publish code that trains a model using Moxin LLM 7B, or write a blog post about the experience of training a large language model
10/37. WebGPU-accelerated reasoning LLMs running 100% locally in the browser
Usefulness: 4 / 5, Time needed: 15 minutes, read on laptop: Link – manual notes: Understand Transformers.js (see e.g. [this])
The post showcases a technical demonstration of running large language models (LLMs) entirely in-browser using WebGPU acceleration and Transformers.js, enabling local AI inference without server-side processing.
For ML/AI engineers, this resource provides insights into cutting-edge web-based machine learning techniques, highlighting browser GPU acceleration, client-side model inference, and the potential of running complex AI models directly in web browsers.
Task: Read it to understand how Transformers.js can be used for browser-based AI capabilities and the potential of running complex AI models directly in web browsers.
Output: Understand the basics of Transformers.js and its application in browser-based AI inference
10/38. Fast LLM inference from scratch
Usefulness: 5 / 5, Time needed: 1 hour, read on laptop: Link – manual notes: Check Andrew Chan's "Fast LLM Inference From Scratch" (Mind-blowing: “pushing single-GPU inference throughput to the edge without libraries”...)
The article provides an in-depth exploration of building a fast LLM inference engine from scratch, demonstrating optimization techniques across CPU and GPU platforms. The author systematically improves performance through techniques like multithreading, weight quantization, SIMD instructions, kernel fusion, and memory access coalescing, ultimately pushing single-GPU inference throughput to impressive speeds.
For ML/AI engineers, this resource offers a comprehensive walkthrough of low-level optimization strategies for LLM inference. It provides practical insights into performance bottlenecks, CUDA kernel design, memory bandwidth constraints, and hardware-specific optimizations that are critical for developing efficient machine learning inference systems.
Task: Read this blog post to understand how to optimize LLM inference performance using low-level techniques such as multithreading, weight quantization, SIMD instructions, kernel fusion, and memory access coalescing.
Output: Implement optimized LLM inference using the techniques described in the post
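Before diving into the article's optimizations, its framing is worth internalizing as arithmetic: single-stream decoding is memory-bandwidth bound, because every generated token must stream essentially all the weights through the memory system. That gives a roofline-style ceiling of bandwidth divided by model bytes (the numbers below are illustrative, not a specific GPU's spec sheet):

```python
def max_tokens_per_sec(params_billion, bytes_per_param, bandwidth_gb_s):
    """Bandwidth-bound upper limit on decode speed:
    memory bandwidth / bytes streamed per token (~= model size)."""
    model_gb = params_billion * bytes_per_param
    return bandwidth_gb_s / model_gb

# 7B model in FP16 on a GPU with ~1 TB/s of memory bandwidth:
print(max_tokens_per_sec(7, 2, 1000))    # ~71 tokens/s ceiling
# Same model quantized to 4 bits (0.5 bytes/param):
print(max_tokens_per_sec(7, 0.5, 1000))  # ~286 tokens/s ceiling
```

This is why quantization shows up alongside kernel fusion and memory-access coalescing in the article: shrinking bytes-per-parameter raises the ceiling itself, while the kernel-level work is about actually reaching it.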
10/39. DeepSeek comparison with GPT-4o on benchmarks
Usefulness: 4 / 5, Time needed: 15 minutes, read on laptop: Link – manual notes: Use DeepSeek for my AI projects rather than OpenAI (DeepSeek is better than 4o on most benchmarks at 10% of the price)
A Reddit discussion in the LocalLLaMA subreddit highlighting DeepSeek's performance, suggesting it outperforms OpenAI's GPT-4o on most benchmarks while costing only 10% of the price. The post includes a visual comparison and has generated significant community interest with 232 comments.
For ML/AI engineers, this resource provides insights into emerging language models, comparative performance metrics, and cost-effectiveness considerations. It offers a practical perspective on how newer AI models are challenging established players like OpenAI, which can help in understanding model selection strategies.
Task: Read it to understand the comparative performance of DeepSeek and GPT-4o, and consider its implications for model selection in AI projects.
Output: Understand the basics of DeepSeek and its potential as an alternative to OpenAI models
10/40. Understanding modern transformer models with BERT and ModernBERT
Usefulness: 4 / 5, Time needed: 25 minutes, read on laptop: Link – manual notes: Check this ChatGPT share about AI/transformers (learn BERT and ModernBERT – "ModernBERT: small new Retriever/Classifier...")
BERT is a transformer-based model introduced in 2018 for bidirectional text understanding. ModernBERT, announced in December 2024, is an advanced encoder model with significant improvements like extended context length (8,192 tokens), architectural enhancements such as Rotary Positional Embeddings, and pre-training on 2 trillion diverse tokens.
For ML/AI engineers, ModernBERT offers insights into modern NLP model design, showcasing advanced techniques in transformer architectures, context handling, and efficient training. It's particularly valuable for understanding encoder model evolution, long-context processing, and hardware-optimized model development.
Task: Read the conversation to gain insights into the architectural and functional improvements of ModernBERT over the original BERT model, and understand how these advancements can be applied to NLP tasks.
Output: Understand the basics of BERT and ModernBERT, and how they can be used for NLP tasks
10/41. Introduction to ModernBERT
Usefulness: 5 / 5, Time needed: 25 minutes, read on laptop: Link – manual notes: Check ModernBERT huggingface blog (Should keep an eye on this when dedicated models start popping up)
ModernBERT is a state-of-the-art encoder model series with two sizes (base: 149M params, large: 395M params) that offers superior performance across retrieval, natural language understanding, and code retrieval tasks. It features a longer context length of 8,192 tokens, improved architectural design, and significantly faster processing compared to previous encoder models.
For ML/AI engineers, ModernBERT represents a critical evolution in encoder models. It demonstrates practical improvements in transformer architecture, attention mechanisms, and training techniques. Understanding its design can help engineers optimize their own encoder models and apply modern machine learning engineering principles to representational models.
Task: Read it to understand the improvements and applications of ModernBERT, a state-of-the-art encoder model series, and how it can be applied to practical problems.
Output: Understand the basics of ModernBERT and its potential applications in NLP tasks
10/42. Training large language models: Comparative analysis of DeepSeek v3 and Meta's Llama 405B
Usefulness: 4 / 5, Time needed: 10 minutes, read on laptop: Link – manual notes: Training large LLMs
A technical X post by researcher 'wh' discussing the details of training a massive 670B-parameter model, providing insights into DeepSeek v3 and comparing its training approach with that of Meta's Llama 405B.
For ML/AI engineers, this resource offers a glimpse into state-of-the-art large language model training techniques, model scaling strategies, and comparative analysis of different model architectures from leading AI research teams.
Task: Read the tweet and linked report to understand the technical details of training large language models, including model scaling strategies and comparative analysis of different model architectures.
Output: Understand the basics of training large language models and comparative analysis of different model architectures
10/43. Exploring SillyTavern: A customizable frontend for large language models
Usefulness: 4 / 5, Time needed: 30 minutes, read on laptop: Link – manual notes: SillyTavern Becomes the AI Playground We Didn't Know We Needed
SillyTavern is an open-source LLM (Large Language Model) frontend designed for power users, offering a flexible and customizable interface for interacting with AI language models. The project is hosted on GitHub and provides a rich environment for users to engage with AI technologies.
For ML/AI engineers, this repository offers insights into frontend design for AI interactions, demonstrating how to create user-friendly interfaces for language models. It can help understand practical implementation of AI model integration and frontend development in the AI interaction space.
Task: Explore the repository to understand how to create a user-friendly interface for AI interactions and learn from the implementation of AI model integration and frontend development in the AI interaction space.
Output: Understand the basics of creating a frontend for large language models and how to integrate AI models into a user-friendly interface
10/44. LangChain State of AI 2024 report
Usefulness: 4 / 5, Time needed: 15 minutes, read on laptop: Link – manual notes: Read LangChain State of AI 2024 report (What LLMs are the most widely used today? etc.)
LangChain's State of AI 2024 report provides insights into the current AI ecosystem, focusing on the most widely used LLMs, common evaluation metrics, and the success of developers in building AI agents. The report is based on data collected through LangSmith and promises to reveal five key insights about the state of AI technology.
This resource offers a snapshot of the current AI landscape, which can be valuable for ML/AI engineers to understand trending technologies, popular language models, and evaluation approaches. The report likely provides practical insights into real-world AI development and deployment strategies.
Task: Read the LangChain State of AI 2024 report to understand the current AI landscape, including the most widely used LLMs, common evaluation metrics, and AI agent development strategies.
Output: Understand the current state of AI technology, including popular LLMs and evaluation approaches.
10/45. Building a custom GPT properly
Usefulness: 4 / 5, Time needed: 15 minutes, read on laptop: Link – manual notes: Learn to use customGPTs (Also search YouTube for guides)
A detailed discussion about creating custom GPTs, focusing on practical questions like document upload strategies, file formats, content preparation, and potential limitations of current GPT training approaches.
This resource provides hands-on, community-sourced insights into custom GPT development. It offers practical advice from practitioners who are actively experimenting with GPT configuration, making it valuable for engineers looking to understand real-world implementation nuances.
Task: Read this thread to understand community-sourced insights into custom GPT development, focusing on practical questions and implementation nuances.
Output: Understand the basics of building custom GPTs and potential limitations
10/46. Files by Google gets a long-awaited Gemini feature to understand PDF contents
Usefulness: 3 / 5, Time needed: 5 minutes, read on mobile: Link – manual notes: Check Files by Google gets a long-awaited Gemini feature ([ask pdf])
Google is rolling out an 'Ask about this PDF' feature in the Files by Google app, powered by Gemini AI. This functionality allows users to get summaries and ask questions about lengthy PDF documents directly within the app, enhancing document comprehension and review efficiency.
For ML/AI engineers, this feature demonstrates practical application of large language models in document understanding. It showcases real-world AI integration, context-aware processing, and how AI can transform mundane tasks like document review through intelligent summarization and question-answering.
Task: Read it to understand how large language models are applied in real-world document understanding tasks
Output: Understand how AI can transform mundane tasks like document review through intelligent summarization and question-answering
11. Prompt engineering & retrieval-augmented generation (RAG)
Learn how to improve LLM responses and integrate external knowledge.
Why learn this: Crucial for making LLMs more useful in real-world applications.
Project ideas:
- Build a RAG-based question-answering system.
- Experiment with different prompt engineering strategies.
11/1. Introduction to prompt engineering basics
Usefulness: 5 / 5, Time needed: 20 minutes, read on laptop: Link – manual notes: Learn prompt engineering (promptengineeringguide.ai is an awesome comprehensive page about it!)
The page provides a foundational overview of prompt engineering, exploring core concepts and techniques for effectively communicating with large language models. It likely covers key strategies for designing prompts that help AI systems understand and respond to queries with greater accuracy and relevance.
For ML/AI engineers, this resource offers critical insights into crafting effective prompts, which is essential for leveraging AI models' capabilities. Understanding these basics can significantly improve interaction quality, model performance, and the ability to extract precise, desired outputs from language models.
Task: Read it to understand the fundamental techniques and principles for effectively interacting with AI language models, which is crucial for leveraging their capabilities and improving model performance.
Output: Understand the basics of prompt engineering and how to apply these principles to improve interactions with large language models.
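The core patterns such guides cover — zero-shot instructions, few-shot examples, and role prompts — can be sketched as small templates. The examples below are illustrative, not taken from the guide itself:

```python
def zero_shot(text: str) -> str:
    # Ask directly, with an explicit instruction and output slot.
    return (f"Classify the sentiment of the text as positive or negative.\n"
            f"Text: {text}\nSentiment:")

def few_shot(text: str) -> str:
    # Provide worked examples so the model can infer the task pattern.
    examples = [
        ("The film was a delight.", "positive"),
        ("I want my money back.", "negative"),
    ]
    shots = "\n".join(f"Text: {t}\nSentiment: {s}" for t, s in examples)
    return f"{shots}\nText: {text}\nSentiment:"

def role_prompt(question: str) -> str:
    # Set a persona and constraints before the actual request.
    return ("You are a senior Python reviewer. Answer concisely, "
            f"citing the relevant PEP when possible.\nQuestion: {question}")

print(zero_shot("Great battery life."))
```

Ending the prompt at the output slot (`Sentiment:`) nudges the model to complete the label rather than restate the task.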
11/2. LlamaIndex documentation for building RAG applications
Usefulness: 5 / 5, Time needed: 1 hour 30 minutes, read on laptop: Link – manual notes: Understand LlamaIndex (and this variant)
LlamaIndex is an open-source data framework for building intelligent applications with Large Language Models (LLMs). It offers comprehensive tools for loading, indexing, querying, and augmenting data across various sources, enabling developers to create advanced RAG (Retrieval Augmented Generation) systems and AI agents.
For ML/AI engineers, this documentation is an excellent resource to understand modern RAG techniques, data ingestion strategies, and advanced AI application development. It covers key concepts like indexing, querying, agents, and workflow design with practical examples and tutorials.
Task: Read the documentation to understand the capabilities and technical details of LlamaIndex, and explore the provided tutorials and examples to learn how to build advanced RAG systems and AI agents.
Output: Understand the basics of LlamaIndex and RAG techniques, and potentially build a simple RAG-based application using the provided tutorials and examples.
11/3. Introduction to LangChain.js
Usefulness: 5 / 5, Time needed: 30 minutes, read on laptop: Link – manual notes: Learn ChatGPT LangChain
LangChain is an open-source framework for developing applications powered by large language models, simplifying the entire LLM application lifecycle from development to deployment. It consists of multiple libraries like @langchain/core, @langchain/community, and integrates with tools like LangSmith and LangGraph.
For ML/AI engineers, LangChain offers a comprehensive toolkit to build sophisticated LLM applications. Its modular architecture allows developers to leverage building blocks for tasks like chaining prompts, creating agents, implementing retrieval strategies, and integrating various AI models and tools.
Task: Read the documentation to understand how to use LangChain's JavaScript library for building LLM applications
Output: Understand how to use LangChain for building LLM applications
11/4. Practical application of RAG for personal knowledge management
Usefulness: 4 / 5, Time needed: 15 minutes, read on laptop: Link – manual notes: A great starting point to understand RAG with a hands-on approach (source was reddit here, with nice comments)
A software professional describes his method of using Retrieval Augmented Generation (RAG) to efficiently process tens of thousands of accumulated technical emails. By exporting emails, creating a text vault, and using local language models, he transforms an overwhelming email collection into a searchable, intelligible knowledge base.
This article provides a practical walkthrough of implementing RAG for personal knowledge management. It demonstrates how ML/AI techniques can be applied to solve real-world information overload problems, showing how engineers can leverage local language models to extract insights from large, unstructured text collections.
Task: Read it to understand how RAG can be used to solve real-world information overload problems and to get a hands-on approach to implementing RAG for personal knowledge management
Output: Understand the basics of RAG and its application in personal knowledge management
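The workflow described above — chunk a text vault, embed the chunks, retrieve the best matches, and assemble them into a prompt — can be sketched end to end. This toy version substitutes a bag-of-words "embedding" for a real local embedding model, and the vault contents are invented:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real setup would use a local
    # embedding model instead.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Rank chunks by similarity to the query, keep the top k.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, chunks: list[str]) -> str:
    # Stuff the retrieved context ahead of the question.
    context = "\n---\n".join(retrieve(query, chunks))
    return f"Answer using only the context below.\n{context}\n\nQuestion: {query}"

vault = [
    "Subject: VPN setup. Use the corporate VPN profile before remote deploys.",
    "Subject: Lunch menu. The cafeteria serves pasta on Fridays.",
    "Subject: Deploy checklist. Run migrations, then restart the API workers.",
]
print(build_prompt("How do I deploy remotely?", vault))
```

The resulting prompt is then handed to the local language model, which is what turns a static archive into a question-answering knowledge base.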
11/5. The future of knowledge assistants
Usefulness: 5 / 5, Time needed: 25 minutes, watch: Link – manual notes: Watch video (Title: "The Future of Knowledge Assistants: Jerry Liu", Length: 17 min)
Jerry Liu discusses the progression of knowledge assistant technologies, moving beyond basic Retrieval-Augmented Generation (RAG) to more sophisticated AI systems. He introduces the concept of advanced data processing, single-agent query flows, and multi-agent task solvers, emphasizing the importance of good data quality, intelligent query understanding, and the potential of agents working together as microservices.
For ML/AI engineers, this talk provides insights into building production-grade AI systems, covering crucial concepts like advanced data parsing, agent reasoning, and multi-agent architectures. It offers practical perspectives on moving from simple RAG implementations to more complex, flexible knowledge assistance technologies that can handle diverse and complex tasks.
Task: Watch it to understand how to build production-grade AI systems using advanced RAG techniques and multi-agent architectures
Output: Understand the concepts of advanced RAG techniques, single-agent and multi-agent architectures, and production-ready AI systems
11/6. Learning LMQL for large language model interactions
Usefulness: 4 / 5, Time needed: 45 minutes, write on laptop: Link – manual notes: Do a mini-project with lmql.ai (Need a large chunk of time for this.)
LMQL is a programming language that enables sophisticated, constrained interactions with Large Language Models. It supports types, templates, and constraints, allowing developers to write more predictable and structured prompts using Python-like syntax. The language is backend-agnostic, supporting multiple LLM providers like OpenAI, Hugging Face Transformers, and llama.cpp.
For ML/AI engineers, LMQL offers a powerful abstraction layer for LLM prompting. By learning LMQL, engineers can develop more precise and controlled LLM interactions, implement complex prompt strategies, and create more reliable AI applications with built-in type checking and generation constraints.
Task: Use the lmql.ai resource to learn about the programming language and its applications in LLM interactions. Do a mini-project with lmql.ai to gain hands-on experience and understand how to develop more precise and controlled LLM interactions.
Output: Publish code that demonstrates the use of lmql for LLM interactions
11/7. DSPy framework for programming language models
Usefulness: 4 / 5, Time needed: 45 minutes, write on laptop: Link – manual notes: Contribute to dspy (Need a large chunk of time for this.)
DSPy is an innovative framework by Stanford NLP that transforms language model interaction from prompt engineering to a more systematic programming approach. It enables developers to create complex AI workflows with improved consistency, modularity, and adaptability across different language models and tasks.
For ML/AI engineers, DSPy offers a sophisticated method to develop language model applications by treating model interactions as programmable components. It provides abstraction layers and optimization techniques that can help create more reliable and reproducible AI systems, making it valuable for those seeking to move beyond traditional prompt engineering.
Task: Contribute to the dspy repository to gain hands-on experience with a programmatic approach to language model interaction, focusing on creating complex AI workflows with improved consistency and modularity.
Output: Publish code contributions to the dspy repository, demonstrating understanding of the framework's capabilities in programming language models.
11/8. Guidance repository for controlling large language models
Usefulness: 4 / 5, Time needed: 45 minutes, read on laptop: Link – manual notes: Check out microsoft/guidance (Need a large chunk of time for this.)
Guidance is an open-source project providing a specialized language for controlling and guiding large language models. It allows developers to create more precise and predictable interactions with AI models by implementing a structured approach to prompting and model interaction.
For ML/AI engineers, this project offers insights into advanced prompt engineering and model control techniques. By studying the guidance library, developers can learn sophisticated methods of constraining and directing language model outputs beyond traditional prompting approaches.
Task: Read and explore the repository to understand the guidance language and its applications in prompt engineering and model control. Use the documentation and examples for reference while building projects that involve large language models.
Output: Understand the basics of prompt engineering and model control using the guidance language. Possibly publish code that applies the guidance library to a large language model.
11/9. Understanding PilotWatch for insight into AI code generation
Usefulness: 4 / 5, Time needed: 25 minutes, read on laptop: Link – manual notes: Try PilotWatch (Twitter, a tool that allows you to intercept, log and inspect all Copilot code generation requests without patching the Copilot plugin.)
PilotWatch is a Node.js proxy tool that intercepts and logs GitHub Copilot's code generation requests and responses in real-time. Created by John Robinson, the tool allows developers to understand Copilot's prompt engineering techniques, revealing how the AI selects context from surrounding code and files to generate accurate code completions.
For ML/AI engineers interested in code generation, PilotWatch offers a deep dive into practical AI model deployment. It demonstrates sophisticated prompt engineering techniques, context selection strategies, and provides insights into how large language models can generate contextually relevant code suggestions in real-world development environments.
Task: Read the article and try PilotWatch to understand how it can be used to intercept and log GitHub Copilot's code generation requests and responses, and gain insights into prompt engineering techniques and context selection strategies.
Output: Understand the basics of PilotWatch and its potential applications in AI code generation and prompt engineering
11/10. ChainForge: a visual programming tool for prompt engineering and evaluating LLM responses
Usefulness: 4 / 5, Time needed: 15 minutes, read on laptop: Link – manual notes: Try ChainForge (The prompt tester tool I wanted to build, but better!)
ChainForge is a visual programming environment designed for prompt engineering and evaluation across multiple large language models. It provides a tool for researchers and developers to systematically test and compare LLM responses through a graphical interface.
For ML/AI engineers, ChainForge represents an important tool for understanding and improving prompt design. It offers a practical way to experiment with different prompting strategies, compare model outputs, and develop more robust language model interactions.
Task: Read the Medium article linked in the tweet to understand how ChainForge works and how it can be used for prompt engineering and LLM evaluation. Try out ChainForge to experiment with different prompting strategies and compare model outputs.
Output: Understand how ChainForge can be used for prompt engineering and LLM evaluation, and try out the tool to experiment with different prompting strategies
11/11. Exploring advanced prompt engineering techniques with Claude 3 Opus
Usefulness: 4 / 5, Time needed: 3 minutes, read on laptop: Link – manual notes: Try out this prompt engineering technique
Alex Albert shares insights about Claude 3 Opus, noting it's the first LLM he's encountered that excels not just in coding, but also in prompt engineering. He suggests he has a specific workflow for using the model effectively.
For ML/AI engineers, this tweet offers a practical perspective on advanced language model capabilities, specifically how cutting-edge AI can assist in refining prompt engineering techniques. It suggests exploring nuanced interaction strategies with large language models.
Task: Read it to understand a practical workflow for using large language models in prompt engineering and explore nuanced interaction strategies with these models.
Output: Try out the suggested prompt engineering technique and document the results
11/12. Apple's ml-superposition-prompting repository
Usefulness: 4 / 5, Time needed: 45 minutes, read on laptop: Link – manual notes: Check GitHub - apple/ml-superposition-prompting
Apple's GitHub repository for its research on superposition prompting, a technique aimed at improving and accelerating retrieval-augmented generation by processing retrieved documents along parallel prompt paths rather than one long concatenated context. The repository contains the code and implementation details accompanying the research.
For ML engineers, this repository provides insight into advanced prompting techniques and Apple's approach to making RAG pipelines faster and more accurate. Exploring the code and documentation can reveal strategies for handling long-context language model interactions.
Task: Explore the repository to understand Apple's approach to superposition prompting and how it can be applied to improve AI model performance. Read the documentation, research papers, and code to gain insights into this novel technique.
Output: Understand the concept of superposition prompting and its potential applications in AI/ML
11/13. Advanced roleplay prompts for large language models
Usefulness: 4 / 5, Time needed: 25 minutes, read on laptop: Link – manual notes: Understand what roleplay prompts work (great source)
A comprehensive guide to creating rich, immersive role-playing experiences with AI, focusing on detailed character and location generation prompts. The author provides systematic approaches to crafting complex characters, introducing narrative conflicts, and maintaining narrative coherence through memory management techniques.
For ML/AI engineers and researchers, this post offers valuable insights into prompt engineering, demonstrating advanced techniques for controlling language model outputs. The detailed prompts showcase how carefully structured instructions can guide models to generate more nuanced, contextually aware, and dynamically interactive responses.
Task: Read this post to understand advanced prompting techniques for role-playing with large language models and to learn how to create rich, immersive experiences with AI.
Output: Understand the role of prompt engineering in controlling language model outputs and apply this knowledge to improve interaction with large language models.
11/14. Gemini 1.5 Flash structured outputs with JSON schema mode
Usefulness: 4 / 5, Time needed: 10 minutes, read on laptop: Link – manual notes: Try Gemini with JSONs (We just shipped Structured Outputs, JSON schema mode, for Gemini 1.5 Flash...)
Logan Kilpatrick from Google announced the launch of Structured Outputs, a JSON schema mode for Gemini 1.5 Flash, which allows developers to generate more predictable and structured AI responses through API and AI Studio.
For ML/AI engineers, this feature is crucial for improving AI output reliability. By using JSON schema mode, developers can define precise output formats, making it easier to integrate AI responses into applications and ensuring more consistent and machine-readable results.
Task: Read it to understand how to use JSON schema mode for generating more predictable and structured AI responses with Gemini 1.5 Flash, and explore its potential applications in improving AI output reliability.
Output: Understand how to utilize Structured Outputs with JSON schema mode in Gemini 1.5 Flash for more reliable AI responses
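One practical payoff of JSON schema mode is that application code can parse replies instead of scraping free text. A sketch of the consuming side, using an invented schema and a mocked reply rather than a live Gemini call:

```python
import json

# Hypothetical schema the application expects; with JSON schema mode the
# API constrains generation to a declared shape like this one.
EXPECTED_KEYS = {"title": str, "sentiment": str, "score": (int, float)}

def parse_reply(raw: str) -> dict:
    # Structured output means we can parse and validate instead of
    # pattern-matching free-form prose.
    data = json.loads(raw)
    for key, typ in EXPECTED_KEYS.items():
        if key not in data or not isinstance(data[key], typ):
            raise ValueError(f"missing or mistyped field: {key}")
    return data

# A mocked model reply; a real one would come back from the Gemini API
# when the request asks for JSON output.
reply = '{"title": "Battery review", "sentiment": "positive", "score": 0.92}'
print(parse_reply(reply))
```

Validating fields at the boundary keeps a malformed or truncated generation from propagating into downstream application state.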
11/15. Advanced RAG techniques for elevating retrieval-augmented generation systems
Usefulness: 4 / 5, Time needed: 45 minutes, read on laptop: Link – manual notes: Check Advanced RAG Techniques: Elevating Your Retrieval-Augmented Generation Systems
RAG_Techniques is a GitHub repository showcasing advanced techniques for Retrieval-Augmented Generation (RAG) systems. These systems combine information retrieval with generative models to create more accurate and contextually rich AI responses, demonstrating cutting-edge approaches in AI language processing.
For ML/AI engineers, this repository offers a practical exploration of RAG techniques, providing insights into how retrieval and generation can be combined effectively. By studying the code and techniques, engineers can understand state-of-the-art methods for improving AI language models' contextual understanding and response generation.
Task: Explore this repository to understand and implement advanced RAG techniques, focusing on how to combine information retrieval with generative models to create more accurate and contextually rich AI responses.
Output: Implement and test advanced RAG techniques in a personal project to understand their application and effectiveness.
11/16. RAG++ in production course
Usefulness: 3 / 5, Time needed: 5 minutes, read on laptop: Link – manual notes: Complete Weights & Biases' RAG++ course (I've enrolled; the first videos looked promising)
The link currently resolves to the Weights & Biases sign-up page, which uses Auth0 for authentication and offers registration via GitHub, Google, or Microsoft accounts, or a work email and password.
Registration is required before the RAG++ course content becomes accessible; the course itself covers retrieval-augmented generation in production environments.
Task: Complete the course to learn about retrieval-augmented generation and its applications in production environments.
Output: Publish code that implements RAG++ in a real-world application
11/17. Using XML tags to control tone in OpenAI models
Usefulness: 4 / 5, Time needed: 5 minutes, read on laptop: Link – manual notes: Try the OpenAI realtime API in the web browser (mixing voice and text. It's nuts, man.)
Ilan Bigio shares a discovery about OpenAI's models: you can use XML tags in the realtime API to precisely control tone in generated content, like adding stage directions (laugh here, cough there). He demonstrated this by experimenting in the OpenAI playground after someone asked about tone control during an OpenAI DevDay.
For ML/AI engineers, this tweet provides insights into advanced prompt engineering techniques. It showcases how XML-like tags can be used to add metadata and control model behavior, which is valuable for creating more nuanced and contextually aware AI-generated content.
Task: Read it to understand how to use XML-like tags to add metadata and control model behavior in OpenAI's realtime API, and try experimenting with this technique in the OpenAI playground
Output: Understand how to use XML tags to control tone in OpenAI models and experiment with this technique in the OpenAI playground
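The trick amounts to simple prompt construction: interleave XML-like stage directions with the text to be spoken. The tag name below is hypothetical, since the tweet does not specify which tags the realtime API responds to:

```python
# Weave XML-like stage directions into spoken text. The <direction> tag is
# an invented example of the pattern, not a documented OpenAI tag.
def with_direction(text: str, direction: str) -> str:
    return f"<direction>{direction}</direction> {text}"

script = " ".join([
    with_direction("Welcome back, everyone!", "warm, upbeat"),
    with_direction("Let's look at the results.", "pause, then serious"),
])
print(script)
```

Because the tags are metadata rather than content, the model can treat them as delivery instructions while speaking only the surrounding text.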
11/18. Chain-of-thought prompting techniques for large language models
Usefulness: 4 / 5, Time needed: 15 minutes, read on laptop: Link – manual notes: Learn Chain-of-thought prompting
A Reddit discussion in the LocalLLaMA subreddit where the original poster shares a comprehensive chain of thought prompting technique. The prompt emphasizes structured reasoning, step tracking, self-evaluation, and adaptive problem-solving through explicit tagging and scoring mechanisms.
This resource offers valuable insights into advanced prompt engineering for large language models. Engineers can learn sophisticated techniques for guiding AI reasoning, including meta-cognitive strategies like self-reflection, step budgeting, and dynamic approach adjustment based on intermediate performance metrics.
Task: Read the Reddit thread to learn about advanced chain-of-thought prompting techniques and how to apply them to improve the performance of large language models.
Output: Understand the basics of chain-of-thought prompting and how to apply it to improve LLM performance
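A system prompt in the spirit of the thread — explicit step tags, a step budget, and self-evaluation scores — might look like the sketch below. The wording is illustrative, not the thread's exact prompt:

```python
def cot_prompt(question: str, step_budget: int = 5) -> str:
    # Assemble a chain-of-thought system prompt with explicit tagging,
    # a step budget, and a self-scoring instruction.
    return (
        "Reason step by step before answering.\n"
        f"Use at most {step_budget} steps, each wrapped in <step> tags.\n"
        "After each step, rate your confidence from 0 to 1 in <score> tags.\n"
        "If a score drops below 0.5, revise your approach before continuing.\n"
        "Give the final answer in <answer> tags.\n\n"
        f"Question: {question}"
    )

print(cot_prompt("A bat and a ball cost $1.10 in total. The bat costs "
                 "$1.00 more than the ball. How much does the ball cost?"))
```

The tags make the reasoning machine-parseable, so an application can extract the `<answer>` span or inspect intermediate `<score>` values programmatically.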
11/19. GeminiCoder: AI-powered app development with prompts
Usefulness: 4 / 5, Time needed: 10 minutes, read on laptop: Link – manual notes: Introducing a holidays pet project: GeminiCoder (Go wild and create apps in seconds just with a prompt)
Omar Sanseviero shared a holiday pet project called GeminiCoder, a tool that enables users to create apps in seconds using just a prompt. The project is hosted on Hugging Face and aims to simplify app development through AI-assisted generation.
For ML/AI engineers, this resource provides insights into AI-powered code generation and app development. It demonstrates how large language models can be applied to create development tools that reduce coding complexity and enable rapid prototyping.
Task: Explore the Hugging Face space and experiment with creating apps using prompts to understand the potential of AI-assisted code generation.
Output: Understand the basics of AI-powered code generation and its applications in app development
11/20. Conversational voice assistant with retrieval-augmented generation
Usefulness: 4 / 5, Time needed: 15 minutes, read on laptop: Link – manual notes: Check AI for Document Processing by llama_index (demonstrated an AI assistant that performs RAG over 1M+ PDFs...)
LlamaIndex demonstrates a technical workflow for creating a conversational voice assistant that can perform Retrieval-Augmented Generation (RAG) over 1M+ PDFs. The solution involves using LlamaCloud for document processing, ElevenLabs for voice conversion, and Fly.io for API deployment.
For ML/AI engineers, this resource provides practical insights into combining multiple AI technologies: document retrieval (LlamaIndex), text-to-speech (ElevenLabs), and cloud deployment (Fly.io). It showcases a real-world example of building an advanced conversational AI system with document processing capabilities.
Task: Read it to understand how to create a conversational AI system with document processing capabilities using RAG technology and integration with LlamaCloud, ElevenLabs, and Fly.io
Output: Understand the basics of retrieval-augmented generation and its applications in conversational AI systems
11/21. AutoRAG framework for optimizing RAG pipeline configurations
Usefulness: 4 / 5, Time needed: 15 minutes, read on laptop: Link – manual notes: Check AutoRAG Framework (@llama_index unveiled AutoRAG. Hybrid retrieval often outperforms pure vector or BM25.)
AutoRAG is a research framework that systematically evaluates different Retrieval-Augmented Generation (RAG) techniques. The study found that hybrid retrieval methods often outperform pure vector or BM25 approaches, and that more complex techniques aren't always better. Key insights include the potential benefits of combining search methods and the importance of testing different configurations.
For ML/AI engineers working with RAG systems, this resource provides valuable empirical insights into RAG pipeline optimization. It offers practical guidance on selecting retrieval techniques, understanding the nuances of query expansion, and the importance of systematic testing across different components of retrieval-augmented generation systems.
Task: Read it to understand the basics of AutoRAG and how to optimize RAG pipeline configurations. Use the insights from the research paper to inform the development of RAG systems.
Output: Understand the basics of AutoRAG and how to apply its insights to optimize RAG pipeline configurations
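The hybrid-retrieval finding can be illustrated with a toy score fusion: a lexical signal (a crude stand-in for BM25) blended with a dense signal (a bag-of-words stand-in for vector cosine similarity) via a weight alpha. Everything here is a simplified sketch, not AutoRAG's implementation:

```python
import math
from collections import Counter

def lexical_score(query: str, doc: str) -> float:
    # Stand-in for BM25: fraction of query terms present in the document.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

def vector_score(query: str, doc: str) -> float:
    # Stand-in for dense retrieval: cosine similarity over token counts.
    qv, dv = Counter(query.lower().split()), Counter(doc.lower().split())
    dot = sum(qv[t] * dv[t] for t in qv)
    nq = math.sqrt(sum(v * v for v in qv.values()))
    nd = math.sqrt(sum(v * v for v in dv.values()))
    return dot / (nq * nd) if nq and nd else 0.0

def hybrid_rank(query: str, docs: list[str], alpha: float = 0.5) -> list[str]:
    # Weighted fusion of the two signals; alpha balances lexical vs dense.
    def score(d: str) -> float:
        return alpha * lexical_score(query, d) + (1 - alpha) * vector_score(query, d)
    return sorted(docs, key=score, reverse=True)

docs = [
    "how to bake sourdough bread at home",
    "bread baking requires patience and a good starter",
    "stock market analysis for beginners",
]
print(hybrid_rank("baking bread", docs)[0])
```

Sweeping alpha over a labeled query set is the kind of systematic configuration testing the report argues for: neither pure signal is best everywhere.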
11/22. Introduction to Writer's RAG tool API
Usefulness: 4 / 5, Time needed: 10 minutes, read on laptop: Link – manual notes: Writer announces a RAG tool API
Writer has launched a predefined RAG tool that allows developers to build production-ready AI applications quickly by autonomously accessing data from a Knowledge Graph through a simple API call. The tool eliminates the need for complex vendor setups and enables seamless integration of accurate, scalable AI workflows.
For ML/AI engineers, this resource provides insights into modern RAG implementation strategies, demonstrating how to simplify complex AI application development using a full-stack platform. The example Hacker News app shows practical implementation techniques and highlights the importance of efficient data retrieval and integration.
Task: Read it to understand how to simplify complex AI application development using a full-stack platform and to learn about modern RAG implementation strategies.
Output: Understand the basics of RAG implementation using Writer's API
11/23. Understanding web consent mechanisms and internationalization
Usefulness: 2 / 5, Time needed: 3 minutes, watch: Link – manual notes: Watch videos for RAG
The link currently resolves to YouTube's standard consent page, which requires selecting a language and accepting Google's region and privacy terms before the platform can be accessed; the actual target is the aiDotEngineer channel.
The consent page itself teaches nothing about ML/AI; once past it, the channel hosts talks on retrieval-augmented generation and related topics.
Task: Watch videos from the aiDotEngineer channel to learn about retrieval-augmented generation (RAG) and other ML/AI topics, but first, navigate through this consent page to access the content.
Output: Nothing, just navigate through the consent page to access the aiDotEngineer channel's content on RAG
12. Generative AI & diffusion models
Covers GANs, VAEs, and diffusion models.
Why learn this: Generative models are at the forefront of AI research.
Project ideas:
- Train a GAN to generate anime-style faces.
- Experiment with stable diffusion to create AI-generated artwork.
12/1. OpenUI: a tool for UI generation using AI
Usefulness: 4 / 5, Time needed: 20 minutes, read on laptop: Link – manual notes: Check open source v0.dev (OpenUI lets you describe UI using your imagination, then see it rendered live.)
OpenUI is an innovative project by Weights & Biases (wandb) that enables users to describe user interfaces using natural language, which can then be immediately rendered live. It appears to be a cutting-edge tool leveraging AI to transform textual descriptions into visual UI designs.
For ML/AI engineers, this project offers insights into generative AI applications, specifically in UI design. It demonstrates practical applications of language-to-visual generation techniques, which can be valuable for understanding prompt engineering, generative models, and UI synthesis.
Task: Explore the OpenUI repository and documentation to understand how it uses AI to generate UI designs. Check the v0.dev version as suggested by David.
Output: Understand the basics of generative AI applications in UI design
12/2. Top 100 gen AI consumer apps report
Usefulness: 5 / 5, Time needed: 30 minutes, read on laptop: Link – manual notes: Learn apps here to know the scene
A16z's third edition of Top 100 Gen AI Consumer Apps reveals dynamic shifts in AI product usage, with creative tools dominating, new players emerging in content generation across images, video, and music, and ChatGPT maintaining its leadership position while facing increasing competition.
For ML/AI engineers, this report provides a comprehensive snapshot of consumer AI trends, highlighting innovative approaches in content generation, AI assistants, and emerging categories. It offers insights into technological capabilities, user engagement patterns, and potential areas of innovation in generative AI applications.
Task: Read the report to understand current trends in generative AI consumer apps and where the strongest opportunities for innovation lie.
Output: Understand the current trends and innovations in generative AI consumer apps
12/3. Exploring FluxMusic, a text-to-music AI model
Usefulness: 4 / 5, Time needed: 45 minutes, read on laptop: Link – manual notes: Check out text to music model
Michael Lingelbach shared a tweet about a state-of-the-art open-source text-to-music model called FluxMusic, which uses a rectified flow transformer. The tweet provides links to the academic paper on arXiv and the GitHub repository, and notes that the model's examples sound similar to Suno AI's work.
For ML/AI engineers interested in generative AI, this resource offers insights into cutting-edge text-to-music generation techniques. The paper and code provide an opportunity to understand the latest advancements in transformer-based generative models and their application in music synthesis.
Task: Read the paper and explore the GitHub repository to understand the architecture and implementation of FluxMusic, and how it achieves state-of-the-art results in text-to-music generation.
Output: Understand the basics of text-to-music generation using transformer-based models and implement a simple project using the FluxMusic repository
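FluxMusic's "rectified flow" refers to a generative framework that learns straight-line paths from noise to data and samples by integrating a velocity field. A minimal sketch of that sampling loop, with a toy closed-form velocity standing in for the learned transformer (this is not FluxMusic's code):

```python
import numpy as np

def euler_sample(velocity_fn, x0, steps=200):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-step Euler.
    In a rectified flow model, velocity_fn is the learned transformer."""
    x, dt = x0.astype(float).copy(), 1.0 / steps
    for i in range(steps):
        x = x + velocity_fn(x, i * dt) * dt
    return x

# Toy stand-in for the learned model: the exact velocity a perfectly trained
# rectified flow would predict for straight noise-to-data paths,
# v(x_t, t) = (x_1 - x_t) / (1 - t), aimed at a fixed "data" point.
target = np.array([1.0, -2.0, 0.5])

def toy_velocity(x, t):
    return (target - x) / max(1.0 - t, 1e-6)

noise = np.random.default_rng(0).standard_normal(3)
sample = euler_sample(toy_velocity, noise)
print(sample)   # lands on target: straight-line transport from noise to data
```

Swapping the Euler step for a higher-order solver, and the toy field for a trained network conditioned on text, is essentially the whole sampling story.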
12/4. Allegro text-to-video generation model
Usefulness: 4 / 5, Time needed: 45 minutes, read on laptop: Link – manual notes: Try Allegro for local video gen
Allegro is an advanced open-source text-to-video generation model developed by Rhymes.AI. It generates 6-second, 720x1280 resolution videos at 15 FPS, using a 175M parameter VideoVAE and 2.8B parameter VideoDiT model. The model supports multiple precision modes and requires only 9.3 GB GPU memory with CPU offloading.
For ML/AI engineers interested in generative video models, Allegro offers a practical, open-source implementation of text-to-video generation. Its compact design, detailed documentation, and Apache 2.0 license make it an excellent case study for understanding modern diffusion-based video generation techniques.
Task: Try Allegro for local video generation and explore its documentation to understand its capabilities and limitations.
Output: Publish code that generates videos using Allegro
12/5. Learning diffusion models through community recommendations and mathematical explanations
Usefulness: 4 / 5, Time needed: 15 minutes, read on laptop: Link – manual notes: Understand diffusion models (| article)
A Reddit discussion in the Machine Learning subreddit where the original poster is seeking comprehensive resources to understand the mathematical foundations of diffusion models, highlighting the technical complexity of this emerging machine learning technique.
For ML/AI engineers wanting to dive deep into diffusion models, this thread likely contains valuable community recommendations for technical resources, mathematical explanations, and learning pathways to comprehend the underlying mathematical principles of these generative models.
Task: Read the Reddit thread and explore the linked article to understand the basics of diffusion models and find additional learning resources.
Output: Understand the basics of diffusion models and possibly publish notes or a summary of the learning process.
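Much of the mathematics the thread asks about reduces to one closed-form identity for the forward (noising) process. A minimal numerical sketch, assuming the standard DDPM linear noise schedule:

```python
import numpy as np

rng = np.random.default_rng(0)

# Standard DDPM linear schedule: beta_t rises from 1e-4 to 0.02 over T steps;
# alpha_bar_t is the running product of (1 - beta_t).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)

def forward_diffuse(x0, t):
    """Closed-form sample from q(x_t | x_0):
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps, eps

x0 = rng.standard_normal(8)            # stand-in for a data point
xt, eps = forward_diffuse(x0, t=T - 1)

# By t = T-1 almost no signal remains: alpha_bar[-1] is ~4e-5, so x_t is
# essentially pure Gaussian noise.

# The same identity inverted shows why predicting eps is enough to denoise:
x0_hat = (xt - np.sqrt(1.0 - alpha_bar[-1]) * eps) / np.sqrt(alpha_bar[-1])
# x0_hat recovers x0 exactly given the true eps; training fits a network
# eps_theta(x_t, t) to eps with a mean-squared-error loss.
```

The reverse (sampling) process and the variational derivation build on exactly these quantities, which is why most tutorials start here.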
12/6. DimensionX: controllable video diffusion for 3D and 4D scene generation
Usefulness: 4 / 5, Time needed: 45 minutes, read on laptop: Link – manual notes: Try DimensionX (Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion)
DimensionX is a novel AI research project that enables creating 3D and 4D scenes from a single image using controllable video diffusion. The method introduces ST-Director to decouple spatial and temporal parameters in video generation, allowing precise control over camera movements, scene reconstruction, and dynamic scene evolution across multiple views.
For ML/AI engineers, this project offers insights into advanced generative AI techniques, specifically video diffusion models, spatial-temporal parameter decomposition, and multi-view scene reconstruction. Understanding the ST-Director approach could be valuable for developing more controllable and context-aware generative AI systems.
Task: Read and explore the project webpage to understand the novel approach of using ST-Director for decoupling spatial and temporal parameters in video generation, allowing for precise control over camera movements and scene reconstruction. Try to replicate or experiment with the DimensionX method to gain hands-on experience with advanced generative AI techniques.
Output: Understand the basics of controllable video diffusion and its applications in 3D and 4D scene generation. Possibly publish code or a report on experimenting with DimensionX.
12/7. Rapid 3D printing workflow using AI tools
Usefulness: 4 / 5, Time needed: 5 minutes, read on laptop: Link – manual notes: Try text to 3D
Andrew Carr describes a streamlined workflow where he used AI tools like Gemini for brainstorming and Imagen 3.0 for image generation, then converted the image to a 3D printable object using Trellis and STL conversion, ultimately printing the object in less than 10 minutes.
For ML/AI engineers, this tweet highlights the emerging capabilities of AI in generative design and digital fabrication, showing how AI can be used to rapidly prototype and create physical objects from text-based ideation through multiple AI and conversion tools.
Task: Read it to understand how AI can be used in generative design and digital fabrication, and to get inspiration for potential projects that combine AI with 3D printing.
Output: Understand the basics of AI-driven design and manufacturing, and potentially publish a blog post about the intersection of AI and 3D printing.
12/8. Understanding LoRA in AI art generation
Usefulness: 4 / 5, Time needed: 10 minutes, read on laptop: Link – manual notes: Learn what LoRA is
A Reddit post in the r/aiArt community where the user asks for a simple explanation of LoRA, a term they've encountered in AI art generation platforms. The user acknowledges their lack of technical knowledge and requests an easy-to-understand description.
This thread can help ML/AI learners understand a key concept in AI image generation. LoRAs are specialized fine-tuning techniques for machine learning models, and the community discussion will likely provide practical insights into how they are used in generative AI art platforms.
Task: Read this Reddit thread to get a basic understanding of what LoRA is and how it's used in AI art generation platforms.
Output: Understand the basics of LoRA in AI art generation
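The thread's answer in one formula: LoRA freezes the pretrained weight W and learns a low-rank update scaled by alpha/r, cutting trainable parameters from d² to 2dr. A minimal numpy sketch (dimensions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, alpha = 64, 4, 8              # hidden size, LoRA rank, scaling factor
W = rng.standard_normal((d, d))     # frozen pretrained weight

# Trainable low-rank factors; B starts at zero so the adapter initially
# changes nothing (the standard LoRA initialization).
A = rng.standard_normal((r, d)) * 0.01
B = np.zeros((d, r))

def lora_forward(x):
    """y = x W^T + (alpha/r) * x A^T B^T -- base path plus low-rank update."""
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

x = rng.standard_normal((2, d))
assert np.allclose(lora_forward(x), x @ W.T)   # zero init: identical to base

# Fine-tuning updates only A and B: 2*d*r = 512 parameters vs d*d = 4096.
B = rng.standard_normal((d, r)) * 0.01
y = lora_forward(x)
```

In AI art platforms, a "LoRA file" is just a saved (A, B) pair for each adapted layer, which is why they are small and can be mixed and matched on one base model.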
12/9. Exploring Veo 2 for advanced video generation
Usefulness: 4 / 5, Time needed: 20 minutes, read on laptop: Link – manual notes: Try Veo 2 - Google DeepMind
Veo 2 is a state-of-the-art video generation model that creates high-quality videos up to 4K resolution. It offers advanced capabilities in realistic motion, detailed rendering, and extensive camera control, significantly improving upon previous AI video generation technologies.
For ML/AI engineers, Veo 2 represents a critical advancement in generative AI, demonstrating sophisticated understanding of physics, motion, and visual instruction following. Studying its technical approach could provide insights into multimodal AI generation and complex prompt interpretation.
Task: Read the page to understand the capabilities and advancements of Veo 2 in video generation, and explore how it can be applied to improve generative AI models.
Output: Understand the basics of generative AI for video and potential applications in ML/AI projects
12/10. Exploring Oasis, a universe in a transformer
Usefulness: 4 / 5, Time needed: 30 minutes, read on laptop: Link – manual notes: Check Oasis: A Universe in a Transformer (the first playable, realtime, open-world AI model. It's a video game, but entirely generated by AI | some other thing)
Oasis is the first playable, real-time, open-world AI model that generates entire video game experiences dynamically. Developed by Decart and Etched, it uses a transformer-based spatial autoencoder and latent diffusion backbone to create interactive worlds with complex game mechanics, physics, and graphics in real-time.
For ML/AI engineers, Oasis represents a cutting-edge example of generative AI applied to interactive environments. It demonstrates advanced techniques in transformer inference, diffusion models, and real-time generation, offering insights into future AI model architectures and the potential of foundation models beyond traditional applications.
Task: Read and explore the Oasis model page to understand how it generates interactive game worlds, and learn from its architecture and techniques.
Output: Understand the basics of generative AI applied to interactive environments and its potential applications
12/11. Introducing Act-One: a new way to generate expressive character performances
Usefulness: 4 / 5, Time needed: 25 minutes, read on laptop: Link – manual notes: Check Runway ML: Introducing Act-One (A new way to generate expressive character performances using simple video inputs.)
Act-One is a new AI tool by Runway that enables creators to generate highly expressive character animations using simple video inputs. The technology can transform a single actor's performance into complex character animations across different characters, preserving subtle facial expressions, micro-movements, and emotional nuances.
For ML/AI engineers, Act-One offers insights into advanced generative AI techniques in computer vision, motion transfer, and performance synthesis. It demonstrates cutting-edge approaches in machine learning for translating human performance data into realistic character animations, showcasing potential applications in creative AI and generative modeling.
Task: Read it to understand the basics of generative AI in animation and character performance, and explore its potential applications in creative AI and generative modeling.
Output: Understand the basics of generative AI in animation and character performance
13. Reinforcement learning
Covers Q-learning, deep Q-networks, and policy gradients.
Why learn this: Key to AI in gaming, robotics, and autonomous systems.
Project ideas:
- Train an RL agent to play a simple video game.
- Optimize a robotic control task using reinforcement learning.
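The Q-learning update this section covers fits in one line, Q(s,a) ← Q(s,a) + α(r + γ max Q(s',·) − Q(s,a)); a minimal sketch on a toy five-state corridor (environment and hyperparameters are illustrative):

```python
import random

random.seed(0)

# Toy 5-state corridor: actions 0 = left, 1 = right; reward 1 on reaching
# the rightmost state, which ends the episode.
N_STATES = 5
alpha, gamma, epsilon = 0.5, 0.9, 0.1   # learning rate, discount, exploration
Q = [[0.0, 0.0] for _ in range(N_STATES)]

def step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0), s2 == N_STATES - 1

for _ in range(500):                     # training episodes
    s, done = 0, False
    while not done:
        # epsilon-greedy action selection
        if random.random() < epsilon:
            a = random.randrange(2)
        else:
            a = 0 if Q[s][0] > Q[s][1] else 1
        s2, r, done = step(s, a)
        # The Q-learning update:
        # Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

greedy = [0 if Q[s][0] > Q[s][1] else 1 for s in range(N_STATES - 1)]
print(greedy)   # expected: [1, 1, 1, 1] -- always move right
```

Deep Q-networks replace the table with a neural network over states, and policy gradients optimize the policy directly, but both build on this same bootstrapped-target idea.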
13/1. Creating a self-parking car simulation using genetic algorithms
Usefulness: 4 / 5, Time needed: 45 minutes, read on laptop: Link – manual notes: Try creating a self-parking car (Need a large chunk of time for this.)
The article describes a self-parking car simulation where a genetic algorithm is used to evolve car behavior. By representing the car's 'brain' as a set of coefficients encoded in a 180-bit genome, the simulation demonstrates how cars can learn to park themselves through multiple generations of evolution, starting with random movements and progressively improving their parking strategy.
This resource provides an excellent practical example of applying genetic algorithms to a real-world problem. It covers key ML/AI concepts like genome representation, fitness evaluation, and evolutionary optimization. The step-by-step explanation and accompanying interactive simulator make complex machine learning techniques more approachable for engineers.
Task: Read this article to understand how genetic algorithms can be applied to complex problems like self-parking cars. Try to replicate the simulation and experiment with different parameters to see how the algorithm evolves the car's behavior.
Output: Publish code that implements a self-parking car simulation using a genetic algorithm
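The evolutionary loop the article describes — encode behavior as a bit genome, score it, select, crossover, mutate — can be sketched generically. The OneMax fitness below is a toy stand-in for the article's parking score:

```python
import random

random.seed(42)

GENOME_BITS, POP, GENERATIONS = 32, 60, 80
MUTATION_RATE = 1.0 / GENOME_BITS

def fitness(genome):
    """Toy stand-in for the article's parking score: count 1-bits (OneMax).
    The article instead decodes 180 bits into engine/steering coefficients
    and scores the car's final distance to the parking spot."""
    return sum(genome)

def crossover(a, b):
    cut = random.randrange(1, GENOME_BITS)       # single-point crossover
    return a[:cut] + b[cut:]

def mutate(g):
    return [bit ^ (random.random() < MUTATION_RATE) for bit in g]

pop = [[random.randint(0, 1) for _ in range(GENOME_BITS)] for _ in range(POP)]
for _ in range(GENERATIONS):
    pop.sort(key=fitness, reverse=True)
    parents = pop[: POP // 2]                    # truncation selection + elitism
    pop = parents + [
        mutate(crossover(random.choice(parents), random.choice(parents)))
        for _ in range(POP - len(parents))
    ]

best = max(pop, key=fitness)
print(fitness(best), "/", GENOME_BITS)
```

Swapping `fitness` for a simulation-based score (and 32 bits for 180) turns this same skeleton into the article's self-parking experiment.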
14. Model deployment & MLOps
Learn about model serving, monitoring, and CI/CD for ML.
Why learn this: Knowing how to deploy and maintain ML models is a valuable skill.
Project ideas:
- Deploy a trained ML model as an API with FastAPI.
- Build an automated ML pipeline using Docker and Kubernetes.
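The first project idea boils down to a small pattern: load the model once at startup, expose one JSON endpoint, return predictions. A dependency-free sketch of that pattern (the toy linear "model" is a placeholder; FastAPI layers validation and auto-generated docs over the same shape):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Placeholder "model" loaded once at startup; in practice this would be a
# pickled or ONNX artifact produced by the training pipeline.
WEIGHTS, BIAS = [0.4, -1.2, 2.0], 0.1

def predict(features):
    """Toy linear model: dot(features, WEIGHTS) + BIAS."""
    return sum(w * x for w, x in zip(WEIGHTS, features)) + BIAS

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        features = json.loads(self.rfile.read(length))["features"]
        body = json.dumps({"prediction": predict(features)}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

# To serve:  HTTPServer(("", 8000), PredictHandler).serve_forever()
# Then:      curl -X POST localhost:8000 -d '{"features": [1.0, 0.0, 0.0]}'
```

Everything MLOps adds — containerization, monitoring, CI/CD, autoscaling — wraps around this request/predict/respond core.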
14/1. Exploring codesearch.ai for semantic code search
Usefulness: 4 / 5, Time needed: 45 minutes, read on laptop: Link – manual notes: Learn how codesearch.ai works
codesearch.ai is an open-source semantic code search engine repository on GitHub. It aims to provide advanced code search capabilities by leveraging AI technologies to understand and index code semantically, enabling more intelligent and context-aware code discovery.
For ML/AI engineers, this repository offers insights into semantic search techniques, code understanding models, and practical implementation of AI-driven code analysis. Exploring the source code and project structure can provide valuable learning about applying machine learning to code search and comprehension.
Task: Read through the GitHub repository to understand how codesearch.ai works, focusing on its application of AI for semantic code search and analysis. Explore the source code and project structure to gain insights into the implementation of machine learning models for code comprehension.
Output: Understand the basics of applying machine learning to code search and comprehension, and potentially publish a blog post or create a project that utilizes similar techniques.
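The core of any semantic code search engine is embed, index, rank: embed every snippet, embed the query, return nearest neighbors. A minimal sketch — the hashing "embedding" below is a toy stand-in for the learned encoders a system like codesearch.ai uses:

```python
import re
import numpy as np

def embed(text, dim=64):
    """Toy embedding: hash tokens into buckets, then L2-normalize.
    Real systems use learned code/text encoders; the indexing and
    ranking machinery around the vector is the same."""
    v = np.zeros(dim)
    for tok in re.findall(r"[a-z0-9]+", text.lower()):
        v[hash(tok) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

snippets = [
    "def read_file(path): return open(path).read()",
    "def http_get(url): return requests.get(url).text",
    "def sort_desc(xs): return sorted(xs, reverse=True)",
]
index = np.stack([embed(s) for s in snippets])   # one row per snippet

def search(query, k=1):
    scores = index @ embed(query)    # cosine similarity (rows are normalized)
    return [snippets[i] for i in np.argsort(scores)[::-1][:k]]

print(search("read a file from path"))
```

Replacing `embed` with a trained encoder, and the brute-force matrix product with an approximate nearest-neighbor index, is the gap between this sketch and a production system.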
14/2. Finding and using API keys for AI services
Usefulness: 3 / 5, Time needed: 5 minutes, read on laptop: Link – manual notes: Try Writesonic (ChatGPT-like API for text and image generation)
The page provides step-by-step instructions for finding and activating an API key in the Writesonic platform. Users need to log into their account at app.writesonic.com, navigate to the profile menu, access the API Dashboard, and activate the API by clicking a switch.
For ML/AI engineers, this page demonstrates a typical process of API key management in SaaS platforms. It illustrates the importance of secure authentication and the standard workflow of accessing developer tools and API credentials in modern AI service platforms.
Task: Read it to understand how to access and manage API keys for services like Writesonic, which can be useful for deploying and integrating AI models into applications.
Output: Understand the process of accessing API credentials for AI services
14/3. SGLang: a fast serving framework for large language models
Usefulness: 4 / 5, Time needed: 45 minutes, read on laptop: Link – manual notes: Learn about structured generation
SGLang is an open-source fast serving framework designed specifically for large language models and vision language models. It aims to optimize model serving performance by providing efficient inference and deployment capabilities, potentially reducing latency and computational overhead.
For ML/AI engineers, this project offers insights into modern model serving techniques, framework design for AI inference, and potential optimization strategies for large language and vision models. Exploring its implementation could provide practical knowledge about efficient model deployment.
Task: Explore the SGLang repository to understand its architecture and implementation, and learn about efficient model serving techniques for large language and vision models.
Output: Understand the basics of model serving and deployment for large language models
14/4. Deploying ai models with baseten
Usefulness: 4 / 5, Time needed: 30 minutes, read on laptop: Link – manual notes: Learn how to deploy stuff on Baseten (and similar platforms)
Baseten provides a comprehensive solution for deploying AI models, enabling developers to serve optimized open-source and custom models through a reliable model delivery network. The platform supports various use cases like transcription, large language models, image generation, and embeddings across different deployment configurations.
For ML/AI engineers, Baseten offers practical insights into model deployment strategies, infrastructure considerations, and scalable inference techniques. Its model library and documentation can help understand real-world model serving challenges and best practices for production AI systems.
Task: Read the documentation and explore the platform to understand how to deploy AI models on Baseten and similar platforms.
Output: Publish code that deploys an AI model on Baseten
14/5. Supabase launches AI-powered Postgres database service
Usefulness: 4 / 5, Time needed: 5 minutes, read on mobile: Link – manual notes: Try Postgres.new
Supabase has launched an AI-powered Postgres database service that enables developers to build and launch databases with advanced AI capabilities, including chart creation, embeddings, database visualization, and sample data generation.
For ML/AI engineers, this resource provides insights into AI-enhanced database technologies, showcasing how AI can be integrated into data infrastructure and management tools, which can be valuable for building more intelligent and automated data systems.
Task: Read it to understand how AI can be integrated into database technologies and explore the potential applications of AI-enhanced databases in ML/AI projects
Output: Understand the basics of AI-powered database services and their potential applications in ML/AI
14/6. Meticulous AI-powered testing tool
Usefulness: 4 / 5, Time needed: 30 minutes, read on laptop: Link – manual notes: Try Meticulous (Marketing text is: "Meticulous eliminated the need...")
Meticulous is an AI-powered testing tool that auto-generates visual end-to-end tests for web applications. By monitoring application interactions, it creates a continuously evolving test suite that covers every code branch and user flow without developers manually writing or maintaining tests.
For ML/AI engineers, this tool demonstrates advanced AI application in software testing automation. It showcases how machine learning can be applied to generate intelligent test scenarios, understand application behavior, and create adaptive testing strategies without human intervention.
Task: Try Meticulous to understand how AI can be applied in automated software testing and explore its potential in generating intelligent test scenarios.
Output: Understand the application of AI in software testing automation
14/7. Framer for quick website building
Usefulness: 3 / 5, Time needed: 20 minutes, read on laptop: Link – manual notes: Check out Framer (Gyuri said this was the de-facto standard quick website builder now. Pricing says it's $5/month for small sites (max. 1k visitors/month).)
Framer is a web builder that enables designers and developers to create stunning, modern websites at any scale. The platform offers a visual design interface with powerful features, targeting both individual creators and enterprise users.
For ML/AI engineers, Framer could be useful for quickly prototyping project websites, creating portfolio sites, or building interactive demos for machine learning projects. Its visual design tools and code export features could help technical professionals create professional web presentations without extensive web design expertise.
Task: Explore Framer to understand how it can be used for quickly building websites for ML/AI projects, and consider using it for creating a portfolio site or project demo.
Output: Publish a website for an ML/AI project using Framer
14/8. Andrew Ng's aisuite, a standardizing AI API library
Usefulness: 4 / 5, Time needed: 20 minutes, read on laptop: Link – manual notes: Try Andrew Ng's AI standardizing API library
The 'aisuite' project is a simple, unified interface designed to interact with multiple generative AI providers. It appears to be an open-source tool that aims to simplify access and interaction with different AI services through a common framework.
For ML/AI engineers, this project could provide insights into creating abstraction layers for AI services, understanding modular AI integration strategies, and exploring practical approaches to standardizing generative AI interactions across different providers.
Task: Explore the repository to understand how aisuite provides a unified interface for multiple generative AI providers and consider using it to simplify access and interaction with different AI services.
Output: Understand the basics of aisuite and its potential applications in standardizing generative AI interactions
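The abstraction-layer idea is worth internalizing independently of the library: register one adapter per provider behind a common call signature and address models as "provider:model" strings. A minimal sketch (names are illustrative, not aisuite's actual API):

```python
from typing import Callable, Dict

Completer = Callable[[str, str], str]        # (model, prompt) -> completion
_PROVIDERS: Dict[str, Completer] = {}

def register(name: str, fn: Completer) -> None:
    """Install an adapter for one provider under a short name."""
    _PROVIDERS[name] = fn

def complete(model_id: str, prompt: str) -> str:
    """Route a "provider:model" request to the right adapter."""
    provider, model = model_id.split(":", 1)
    return _PROVIDERS[provider](model, prompt)

# Fake adapters standing in for real SDK calls (openai, anthropic, ...):
register("openai", lambda model, p: f"[{model}] echo: {p}")
register("anthropic", lambda model, p: f"[{model}] echo: {p}")

print(complete("openai:gpt-4o", "hi"))       # same call shape for any provider
print(complete("anthropic:claude", "hi"))
```

The value of the pattern is that swapping providers becomes a one-string change in application code, with all SDK differences confined to the adapters.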
14/9. Andrew Ng's AI-assisted coding best practices
Usefulness: 4 / 5, Time needed: 10 minutes, read on laptop: Link – manual notes: Check AI-Assisted Coding Best Practices by Andrew Ng
Andrew Ng discusses his approach to software prototyping using AI-assisted coding, sharing his current preferred tech stack including Python with FastAPI, Uvicorn, cloud deployment platforms like Heroku and AWS Elastic Beanstalk, MongoDB, and AI coding assistants like OpenAI's o1 and Claude 3.5 Sonnet.
For ML/AI engineers, this post provides practical insights into modern software development workflows, demonstrating how AI tools can accelerate prototype development, and offering a pragmatic perspective on selecting technology stacks that enable rapid innovation.
Task: Read it to understand how AI can accelerate software and prototype development, and to learn about Andrew Ng's preferred technology stack for rapid prototyping
Output: Understand the basics of AI-assisted coding and its application in software development
14/10. Exploring the DeepSeek-V3 repository on GitHub
Usefulness: 4 / 5, Time needed: 30 minutes, read on laptop: Link – manual notes: Check out DeepSeek v3
DeepSeek-V3 is the open-source GitHub repository for deepseek-ai's flagship large language model, a release of significant interest to the AI research community.
For ML/AI engineers, exploring the repository's code, documentation, and README would provide insights into the project's technical approach, architecture, and potential applications. GitHub repositories often contain valuable implementation details and research contributions.
Task: Read the README and explore the code and documentation to understand the project's technical approach, architecture, and potential applications.
Output: Understand the technical details of the DeepSeek-V3 project and potentially apply similar approaches in own projects.
14/11. Make academy foundation course
Usefulness: 3 / 5, Time needed: 45 minutes, read on laptop: Link – manual notes: Complete make.com's Foundation course (to learn how to wire up stuff with make.com (60 min))
Make Academy offers a structured learning path with foundation-level courses focused on workflow automation. The courses cover scenario setup, operations introduction, and user interface basics, with highly rated content ranging from setting up first scenarios to understanding Make's UI.
For ML/AI engineers, these courses provide practical insights into workflow automation and integration techniques. While not directly ML-focused, understanding automation platforms like Make can help engineers streamline data processing, model deployment, and create more efficient machine learning pipelines.
Task: Complete the course to learn how to wire up workflows with Make.com and understand how to automate data processing and model deployment
Output: Understand the basics of workflow automation with Make.com and how to apply it to ML/AI pipelines
14/12. Understanding Cloudflare security challenges for ML/AI engineers
Usefulness: 2 / 5, Time needed: 3 minutes, read on laptop: Link – manual notes: Try PicPicAI (Enhance portraits with AI for stunning details)
This is a security verification page from Cloudflare, designed to prevent automated or potentially malicious access to the website 'theresanaiforthat.com'. The page requires human verification before allowing access to the site's actual content.
While this specific page doesn't provide learning content, understanding web security mechanisms like Cloudflare's challenge pages can be relevant for ML/AI engineers dealing with API access, web scraping, or protecting machine learning model endpoints from unauthorized access.
Task: Read about Cloudflare's security features to understand how they can be used to protect ML/AI model endpoints from unauthorized access. Explore the actual website content if possible, to learn about AI portrait enhancement.
Output: Nothing, just read it.
14/13. Understanding Cloudflare security challenges
Usefulness: 2 / 5, Time needed: 2 minutes, read on laptop: Link – manual notes: Try SVG generator
This is a Cloudflare security verification page designed to validate human users before allowing access to the website 'theresanaiforthat.com'. It requires JavaScript and challenges the user to prove they are not an automated bot.
For ML/AI engineers, this page represents a typical web security mechanism used to prevent automated scraping and protect web resources. Understanding such challenges is relevant when building web crawlers or data collection pipelines.
Task: Read it to understand the basics of web security mechanisms and how they can impact ML/AI projects, particularly when building web crawlers or data collection pipelines.
Output: Nothing, just read it.
14/14. Modern authentication strategies and user identity management
Usefulness: 3 / 5, Time needed: 10 minutes, read on laptop: Link – manual notes: Check out projectneo.adobe.com
Adobe's sign-in page for Project Neo provides users multiple authentication methods, including email and social login options like Google, Facebook, and Apple. The interface is clean and modern, offering a straightforward way to create an account or continue with an existing login.
For ML/AI engineers, this page demonstrates modern authentication strategies and user identity management. Understanding multi-provider authentication, secure login flows, and user experience design can be valuable when building machine learning platforms or services that require user authentication.
Task: Explore this page to understand multi-provider authentication, secure login flows, and user experience design, and consider how these concepts can be applied to machine learning platforms or services
Output: Understand the basics of modern authentication strategies and user identity management
14/15. Modern web authentication practices
Usefulness: 2 / 5, Time needed: 5 minutes, read on laptop: Link – manual notes: Try tldraw computer (it looks quite neat. I've watched the first 9:48 of this: https://www.youtube.com/watch?v=NTDBqZdOGAM)
tldraw computer is a web application that provides a sign-in page for users to access their account. The authentication can be completed through Google OAuth or by entering an email address, with options to continue or sign up for a new account.
While this specific page doesn't directly relate to ML/AI learning, it demonstrates modern web authentication practices using Clerk, which could be relevant for building user management systems in ML/AI web applications or platforms.
Task: Explore the authentication process and Clerk integration to understand how user management systems can be built in ML/AI web applications
Output: Understand modern web authentication practices and their potential application in ML/AI web applications
14/16. Understanding 404 errors and web development fundamentals
Usefulness: 2 / 5, Time needed: 3 minutes, read on laptop: Link – manual notes: Check ""A Generative and Universal Physics Engine for Robotics and Beyond""
A GitHub-generated 404 error page that informs users that the requested file or URL cannot be found. It provides basic troubleshooting guidance, such as checking filename case and ensuring proper file permissions.
For ML/AI engineers, this page represents a common web development error handling scenario. Understanding 404 errors and how web servers manage missing resources is a fundamental skill in deploying and maintaining web applications and APIs.
Task: Read it to understand the basics of error handling in web development and how it can be applied to ML/AI applications
Output: Nothing, just read it
14/17. Error handling on Hugging Face
Usefulness: 2 / 5, Time needed: 1 minute, read on laptop: Link – manual notes: Try Apollo for making my videos searchable (See the comments. – Reddit: [Meta releases the Apollo family...] – How to run)
A standard 500 error page from Hugging Face, indicating an internal server error. The page displays a Hugging Face logo, shows the error code '500', and includes a message that they are working to fix the issue, along with a unique request ID.
While this specific page doesn't offer learning content, Hugging Face is a critical platform for ML/AI practitioners, providing models, datasets, and tools. Their error handling demonstrates professional web infrastructure practices relevant to software engineering in AI.
Task: Take a brief look to understand how error handling is done on a professional platform, but do not spend too much time on this item.
Output: Nothing, just read it.
14/18. Cloudflare security challenge page
Usefulness: 1 / 5, Time needed: 1 minute, read on laptop: Link – manual notes: Play with Easy Posters AI (Transform ideas into stunning poster designs instantly)
This is a security verification page from Cloudflare, designed to validate human access to the website theresanaiforthat.com. The page requires user interaction to proceed and verify the connection's security.
While this page itself offers no learning value, it demonstrates web security practices like bot prevention and connection verification, which are tangentially relevant to modern web infrastructure supporting AI platforms.
Task: Recognize this as a security measure and understand its purpose in protecting websites, including those related to AI services.
Output: Nothing, just recognize the security page.
15. AI ethics & bias mitigation
Covers fairness, transparency, and responsible AI.
Why learn this: AI needs to be fair, transparent, and accountable.
Project ideas:
- Analyze bias in a dataset and apply debiasing techniques.
- Write a blog post about ethical AI principles.
15/1. Google's people + ai guidebook
Usefulness: 5 / 5, Time needed: 1 hours 30 minutes, read on laptop: Link – manual notes: Learn from Google's Design for AI guidebook (See also patterns)
The People + AI Guidebook is a comprehensive resource developed by Google's PAIR team, offering six detailed chapters that guide user experience professionals through AI product development. Based on insights from over a hundred experts, the guidebook covers critical aspects like identifying user needs, data collection, mental models, explainability, feedback mechanisms, and handling errors.
For ML/AI engineers, this guidebook provides crucial insights into the human-centered design of AI systems. It bridges the technical implementation with user experience, offering practical worksheets and strategies for creating AI products that are not just technically sound, but also intuitive, trustworthy, and aligned with user expectations.
Task: Read it to understand human-centered design principles for AI product development and how to integrate AI responsibly
Output: Understand the basics of human-centered design for AI and apply these principles to future AI projects
16. Agentic AI & tool-using models
Learn how AI can integrate with external tools and act autonomously.
Why learn this: Future AI systems will need to interact with the world, not just predict outputs.
Project ideas:
- Build a language model that can interact with APIs.
- Create a simple AI agent that automates web searches and summarizes results.
16/1. Toolformer: language models can teach themselves to use tools
Usefulness: 5 / 5, Time needed: 45 minutes, read on laptop: Link – manual notes: Read the toolformer paper (Might be a good starting point. Toolformer: Language Models Can Teach Themselves to Use Tools)
Toolformer is a groundbreaking language model that can autonomously learn to use external tools like calculators, search engines, and Q&A systems through minimal demonstrations. By training itself to decide when and how to call APIs, the model achieves improved zero-shot performance across various tasks while maintaining its core language modeling capabilities.
For ML/AI engineers, this paper introduces a critical concept of making language models more versatile and self-improving. Understanding Toolformer can help in developing more adaptive AI systems that can dynamically integrate external tools and knowledge sources, expanding the potential of large language models beyond their current limitations.
Task: Read the paper to understand how Toolformer enables language models to autonomously learn to use external tools, and how this concept can be applied to develop more adaptive AI systems.
Output: Understand the concept of Toolformer and its potential applications in developing more versatile language models
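The paper's core mechanism is that the model emits inline API-call markers such as `[Calculator(400/1400)]` in its own output, which are then executed and the results spliced back into the text. A minimal sketch of that post-processing step (the registry, regex, and function names here are illustrative, not the paper's code):

```python
import re

# Toy "tools" the model may call inline; Calculator mirrors one of the
# paper's examples, but this registry and parser are hypothetical.
TOOLS = {
    "Calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),
}

CALL_RE = re.compile(r"\[(\w+)\(([^)]*)\)\]")

def execute_tool_calls(text: str) -> str:
    """Replace each inline [Tool(args)] marker with '[Tool(args) -> result]'."""
    def run(match: re.Match) -> str:
        tool, args = match.group(1), match.group(2)
        result = TOOLS[tool](args)  # a real system would sandbox this properly
        return f"[{tool}({args}) -> {result}]"
    return CALL_RE.sub(run, text)

print(execute_tool_calls("Out of 1400 participants, 400 [Calculator(400/1400)] passed."))
```

The interesting part of Toolformer is not this plumbing but how the model learns, via self-generated training data, *when* inserting such a call improves its predictions.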
16/2. Gorilla: connecting large language models with api capabilities
Usefulness: 5 / 5, Time needed: 45 minutes, read on laptop: Link – manual notes: Check Gorilla (A way for LLMs to safely use tools and APIs. Seems like a cool alternative approach to the OpenCodeGraph general concept.)
Gorilla is an open-source Large Language Model developed by UC Berkeley researchers that specializes in connecting LLMs with massive APIs. The project provides advanced capabilities like function calling across multiple programming languages, a benchmarking leaderboard for function capabilities, and innovative techniques for retrieval-augmented generation and LLM action execution.
For ML/AI engineers, Gorilla offers crucial insights into practical LLM integration with real-world systems. Its components like OpenFunctions and GoEX demonstrate cutting-edge techniques for making language models more actionable, showing how to enable models to intelligently call APIs, validate actions, and interact with diverse service ecosystems.
Task: Read about Gorilla to understand how to enable large language models to safely use tools and APIs, and explore its components like OpenFunctions and GoEX to learn about cutting-edge techniques for making language models more actionable.
Output: Understand how Gorilla enables large language models to interact with tools and APIs, and consider how to apply these techniques to your own projects.
16/3. Exploring google ai studio for multimodal ai capabilities
Usefulness: 5 / 5, Time needed: 30 minutes, read on laptop: Link – manual notes: Try Google AI Studio (Also search for "Welcome to Google AI Studio, David" in my emails)
Google AI Studio is a web platform for developers to quickly access and integrate Gemini AI models from Google DeepMind. It offers a simple way to get an API key, provides access to multimodal AI capabilities that understand text, code, images, audio, and video, and features a generous free tier with flexible pricing.
For ML/AI engineers, this platform offers a practical entry point to explore state-of-the-art multimodal AI models. The extensive context window (2M tokens) and features like context caching make it valuable for understanding advanced AI model capabilities and practical implementation strategies.
Task: Try out Google AI Studio to understand its multimodal AI capabilities and explore how to integrate Gemini AI models into projects. Also, search for 'Welcome to Google AI Studio, David' in emails for potentially relevant information.
Output: Publish code that demonstrates the use of Google AI Studio's multimodal AI models
16/4. Improving ai code analysis with senior developer insights
Usefulness: 4 / 5, Time needed: 20 minutes, read on laptop: Link – manual notes: Short but inspiring post on improving context (by applying how a senior dev thinks about code...)
The author describes an experiment in improving AI code analysis by teaching it to think like a senior developer. By grouping files contextually, providing system-level context, and focusing on architectural understanding, they developed an AI system that can identify complex code relationships, potential issues, and system-wide implications beyond simple file-level examination.
For ML/AI engineers, this article offers insights into advanced code analysis techniques, demonstrating how context and system-level understanding can significantly improve AI's ability to comprehend and analyze complex codebases. It provides a practical example of enhancing AI's reasoning capabilities beyond basic pattern matching.
Task: Read it to understand how context and system-level understanding can improve AI's ability to comprehend complex codebases
Output: Understand the basics of teaching AI to think like a senior developer for code analysis
16/5. Democratized chatgpt plugins with dynamic api parsing
Usefulness: 4 / 5, Time needed: 15 minutes, read on laptop: Link – manual notes: Try ChatGPT plugins (Democratized ChatGPT plugins)
Justin Liang created a chatbot that can dynamically access any official ChatGPT Plugin by parsing its OpenAPI specification. He built this using wellknown.ai and Guardrails AI, demonstrating a clever workaround to plugin access limitations.
This resource provides insights into API integration, dynamic plugin parsing, and creative problem-solving in AI development. It showcases how developers can extend AI capabilities by understanding and manipulating API specifications.
Task: Read it to understand how to access ChatGPT plugins dynamically and explore the possibilities of API integration in AI development
Output: Understand the concept of dynamic plugin parsing and its applications in AI development
16/6. Exploring mem0ai/mem0 for context-aware ai interactions
Usefulness: 4 / 5, Time needed: 30 minutes, read on laptop: Link – manual notes: Try embedchain
Mem0 is an open-source project that provides a memory layer for AI agents, enabling more sophisticated and context-aware AI interactions by allowing agents to store, retrieve, and manage contextual information dynamically.
For ML/AI engineers, this project offers insights into memory management techniques for AI agents. By exploring the repository, developers can learn about advanced AI agent architectures, memory retrieval strategies, and practical implementations of context-aware AI systems.
Task: Explore the repository to understand how mem0ai/mem0 enables AI agents to store, retrieve, and manage contextual information. Look into the project's documentation, code, and examples to learn about advanced AI agent architectures and memory retrieval strategies.
Output: Understand the basics of memory management for AI agents and how mem0ai/mem0 implements this functionality
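Before reading the repository, it helps to have the shape of the add/search interface in mind. Mem0's real implementation uses embeddings and an LLM to extract and rank memories; the toy version below (all names hypothetical) substitutes keyword overlap just to show the store/score/retrieve pattern a memory layer provides:

```python
from dataclasses import dataclass, field

@dataclass
class MemoryStore:
    """Toy memory layer: store facts per user, retrieve by keyword overlap.
    Illustrates the interface shape only, not Mem0's actual API."""
    memories: dict = field(default_factory=dict)

    def add(self, user_id: str, text: str) -> None:
        self.memories.setdefault(user_id, []).append(text)

    def search(self, user_id: str, query: str, top_k: int = 3) -> list:
        # Score each stored memory by word overlap with the query.
        q = set(query.lower().split())
        scored = [
            (len(q & set(m.lower().split())), m)
            for m in self.memories.get(user_id, [])
        ]
        scored.sort(key=lambda s: -s[0])
        return [m for score, m in scored[:top_k] if score > 0]

store = MemoryStore()
store.add("david", "prefers python for ml prototyping")
store.add("david", "working through an ai engineering course")
print(store.search("david", "python ml tips"))
```

An agent would call `search` before answering and `add` after each exchange, which is the loop Mem0 wires up for you.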
16/7. The future of software engineering automation
Usefulness: 5 / 5, Time needed: 10 minutes, read on mobile: Link – manual notes: See this from Karpathy
Karpathy explores the future of software engineering automation, describing a progression where AI gradually takes over more development tasks. He draws a comparison with self-driving car technology, showing how human oversight shifts from direct control to high-level guidance and strategic direction.
For ML/AI engineers, this tweet provides insights into the potential future of software development. It offers a conceptual framework for understanding how AI might transform coding practices, emphasizing the importance of understanding incremental autonomy and the evolving role of human developers.
Task: Read it to understand Karpathy's perspective on the potential progression of automating software engineering and how AI might transform coding practices
Output: Understand the potential future of software development and the evolving role of human developers
16/8. Automating daily stand-ups with AI
Usefulness: 3 / 5, Time needed: 10 minutes, read on laptop: Link – manual notes: Check "it's so over" from Beyang
Charlie Holtz developed an AI clone called 'Jarley' that automatically tracks his daily work activities by listening to audio with Whisper and generating responses using a language model, effectively replacing traditional daily stand-up meetings.
This project demonstrates practical application of AI technologies like speech recognition (Whisper), language models, and conversational AI. It showcases how ML/AI can be used to automate routine workplace communication and potentially improve productivity.
Task: Read it to understand a practical application of AI in a workplace setting, such as automating daily stand-up meetings.
Output: Understand how AI can be used to automate routine workplace communication
16/9. Auto code rover project
Usefulness: 4 / 5, Time needed: 30 minutes, read on laptop: Link – manual notes: Understand how auto-code-rover and similar projects work (High-level AI use)
Auto Code Rover is an AI-driven autonomous software engineer designed to improve program code. It has demonstrated remarkable performance, resolving 37.3% of tasks on SWE-bench lite and 46.2% on SWE-bench verified, at a cost of under $0.70 per task.
For ML/AI engineers interested in autonomous code generation and software improvement, this project offers insights into advanced AI techniques for program transformation and task-solving. It represents a cutting-edge approach to using AI for automated software engineering tasks.
Task: Read it to understand how auto-code-rover and similar projects work, focusing on their application of AI for automated software engineering tasks
Output: Understand how Auto Code Rover works and its potential applications in autonomous software engineering
16/10. Language models can solve computer tasks
Usefulness: 4 / 5, Time needed: 30 minutes, read on laptop: Link – manual notes: Read "Language Models can Solve Computer Tasks" (2023-03-30)
The research demonstrates that pre-trained large language models can execute computer tasks through a simple prompting scheme called Recursively Criticizing and Improving (RCI). By using this method, LLMs can solve tasks using only a few demonstrations, outperforming existing supervised and reinforcement learning approaches on the MiniWoB++ benchmark.
For ML/AI engineers, this paper provides insights into advanced prompt engineering techniques that enable language models to self-improve and solve complex tasks. The RCI approach showcases how recursive feedback and criticism can enhance AI reasoning capabilities across different domains.
Task: Read it to understand how language models can be used to solve computer tasks through recursive criticism and improvement
Output: Understand the basics of using language models for computer tasks and how recursive criticism and improvement can enhance AI reasoning capabilities
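The RCI scheme is a plain prompting loop: generate an answer, ask the model to critique it, then ask it to revise given the critique. A sketch under that description (prompt wording and the stub model are invented so the loop runs offline; a real setup would plug in an LLM client):

```python
def rci_loop(task, llm, rounds=2):
    """Recursively Criticize and Improve: answer, critique, revise.
    `llm` is any prompt -> text callable."""
    answer = llm(f"Task: {task}\nAnswer:")
    for _ in range(rounds):
        critique = llm(f"Task: {task}\nAnswer: {answer}\nFind problems with this answer:")
        answer = llm(
            f"Task: {task}\nAnswer: {answer}\nCritique: {critique}\nImproved answer:"
        )
    return answer

# Deterministic stub model: it "notices" the bad flag and fixes it on revision.
def stub_llm(prompt: str) -> str:
    if "Improved answer:" in prompt:
        return "ls -lS   # sort files by size, largest first"
    if "Find problems" in prompt:
        return "The flag for sorting by size is -S, not -s."
    return "ls -ls"

print(rci_loop("Shell command to list files sorted by size", stub_llm))
```

The paper's contribution is showing that this self-critique loop, applied to grounding actions in MiniWoB++ states, beats supervised and RL baselines with only a handful of demonstrations.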
16/11. Humanlayer for human-in-the-loop ai interactions
Usefulness: 4 / 5, Time needed: 30 minutes, read on laptop: Link – manual notes: Investigate for AI toolkit
HumanLayer is an API and SDK that allows AI agents to pause and request human feedback, approval, or input during critical operations. It supports multiple frameworks like OpenAI, LangChain, and CrewAI, enabling developers to add human oversight to AI workflows with minimal code changes.
For ML/AI engineers, HumanLayer offers a practical approach to improving AI agent reliability by introducing human verification. It demonstrates how to build more trustworthy and controllable AI systems by integrating human judgment at key decision points, which is crucial for developing production-ready AI applications.
Task: Investigate the HumanLayer platform to understand how it can be used to add human oversight to AI workflows, and explore its potential applications in developing more trustworthy and controllable AI systems.
Output: Understand how HumanLayer can be used to improve AI agent reliability and develop production-ready AI applications
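The core pattern HumanLayer packages is an approval gate in front of high-stakes function calls. A self-contained sketch of that pattern (decorator and function names are illustrative, not the HumanLayer SDK, which also handles routing the approval request over Slack, email, etc.):

```python
import functools

def require_approval(ask=input):
    """Gate a function behind human approval. `ask` is injectable so the
    gate can be wired to a UI, a chat channel, or a test harness."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            reply = ask(f"Approve {fn.__name__}{args}? [y/N] ")
            if reply.strip().lower() != "y":
                return f"{fn.__name__} denied by human reviewer"
            return fn(*args, **kwargs)
        return wrapper
    return decorator

@require_approval(ask=lambda prompt: "y")   # auto-approve for the demo
def send_email(to: str) -> str:
    return f"email sent to {to}"

print(send_email("ops@example.com"))
```

In an agent loop, only tool calls that mutate state (sending email, deploying, spending money) would carry the decorator; read-only tools run unguarded.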
16/12. Building agentic workflows in invoice processing with LlamaIndex
Usefulness: 4 / 5, Time needed: 15 minutes, read on laptop: Link – manual notes: Check resources for building agentic workflows in invoice processing (@llama_index provided these resources)
LlamaIndex demonstrates how to build an intelligent invoice processing agent that automatically extracts line items and enriches them with spend categories and cost centers, leveraging advanced parsing and retrieval techniques.
This resource provides a concrete example of applying AI agents to document processing, showcasing practical techniques for extracting structured information from unstructured documents using LlamaIndex's parsing and indexing tools.
Task: Read it to understand how to apply agentic AI techniques to document processing, and explore the provided resources for building intelligent invoice processing agents.
Output: Understand how to build an intelligent invoice processing agent using LlamaIndex tools
16/13. Personalizing ai agents through fine-tuning
Usefulness: 4 / 5, Time needed: 3 minutes, read on mobile: Link – manual notes: Agents: Record work processes to enhance learning and data availability (@RichardMCNgo advocated for this)
Richard Ngo shares a brief insight about 'personal use' of AI, specifically referring to fine-tuning an AI agent to perform the same type of work as the user, indicating a potential approach to creating more tailored AI assistants.
For ML/AI engineers, this tweet highlights an emerging trend of personalized AI adaptation. It suggests exploring techniques like transfer learning and fine-tuning to create AI agents that can mimic a user's specific work patterns and skills.
Task: Read it to understand the concept of fine-tuning AI agents for personal use and how it can be applied to create more task-specific AI tools.
Output: Understand the basics of fine-tuning AI agents for personal use
16/14. Understanding ai agents from a software engineering perspective
Usefulness: 4 / 5, Time needed: 3 minutes, read on mobile: Link – manual notes: Agents: Workflow and Automation ([@bindureddy detailed how agents manage workflows, data transformation, and visualization widgets])
Bindu Reddy explains that AI agents are similar to traditional software, with specific characteristics: they have defined workflows, require deployment and monitoring, need to process and understand potentially large datasets, and can utilize various UX widgets for information visualization.
This tweet provides a concise, practical overview of AI agents from an engineering perspective, helping ML/AI learners understand agents not as mystical entities, but as structured software components with clear operational requirements and design considerations.
Task: Read it to grasp the key characteristics of AI agents, including workflows, deployment, data processing, and visualization widgets.
Output: Understand the basics of AI agents and their software engineering aspects
16/15. Building agentic workflows in invoice processing with LlamaIndex
Usefulness: 4 / 5, Time needed: 30 minutes, read on laptop: Link – manual notes: Check resources for building agentic workflows in invoice processing (@llama_index repeated mention)
LlamaIndex presents a notebook showcasing an automated invoice processing agent that uses LlamaParse to extract and enrich invoice line items with spend categories and cost centers, leveraging advanced AI techniques for document intelligence.
This resource provides a concrete example of applied machine learning for document processing, demonstrating practical techniques in AI agents, document parsing, information extraction, and workflow automation using LlamaIndex tools.
Task: Read the tweet and explore the associated resources to understand how to build agentic workflows in invoice processing using LlamaIndex and LlamaCloud technologies.
Output: Understand how to apply machine learning for document processing and automation in invoice processing
16/16. Ai-powered code refactoring with deepseek and sonnet
Usefulness: 4 / 5, Time needed: 10 minutes, read on laptop: Link – manual notes: Investigate this approach to use 500 DeepSeek instances... (then feed only relevant parts to Claude. GitHub repo: https://github.com/VictorTaelin/AI-scripts)
Victor Taelin demonstrates a novel AI-powered code refactoring method where large codebases are split into chunks, analyzed by ~500 DeepSeek instances in parallel to determine necessary edits, and then processed by Sonnet. The entire process takes less than a minute and provides a 'git diff' for manual review.
This resource provides insights into advanced AI-assisted software engineering techniques, showcasing how multiple AI models can be orchestrated to solve complex code transformation tasks. It highlights the potential of AI in automating large-scale code refactoring and semantic reasoning.
Task: Read the tweet and explore the GitHub repo to understand the approach and its implementation. Investigate how to use 500 DeepSeek instances to identify relevant code blocks and then feed them to Claude for further processing.
Output: Understand the basics of AI-powered code refactoring and how to implement it using DeepSeek and Sonnet
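The pipeline in the tweet has three stages: split the codebase into chunks, fan the chunks out to many cheap-model checks in parallel, and forward only the flagged chunks to the expensive editing model. A sketch of the first two stages with a stand-in relevance check (the real system calls DeepSeek here and Sonnet afterwards; everything below is a hypothetical stand-in):

```python
from concurrent.futures import ThreadPoolExecutor

def chunk(source: str, lines_per_chunk: int = 50) -> list:
    """Split a codebase (here: one string) into fixed-size line chunks."""
    lines = source.splitlines()
    return ["\n".join(lines[i:i + lines_per_chunk])
            for i in range(0, len(lines), lines_per_chunk)]

def needs_edit(chunk_text: str) -> bool:
    """Stand-in for the cheap-model relevance check: flag chunks that
    mention the symbol being refactored."""
    return "old_name" in chunk_text

def plan_refactor(source: str, workers: int = 8) -> list:
    """Run the relevance checks in parallel; only flagged chunks would be
    forwarded to the expensive editing model."""
    chunks = chunk(source, lines_per_chunk=2)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        flags = list(pool.map(needs_edit, chunks))
    return [c for c, flagged in zip(chunks, flags) if flagged]

code = "def old_name():\n    pass\n\ndef unrelated():\n    pass\n"
print(plan_refactor(code))
```

Threads suffice here because the per-chunk work in the real pipeline is network-bound API calls; the payoff is that the strong model sees only a small, relevant slice of the codebase.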
16/17. NVIDIA Cosmos: A world model development platform for Physical AI
Usefulness: 4 / 5, Time needed: 45 minutes, read on laptop: Link – manual notes: Try NVIDIA Cosmos
Cosmos is an NVIDIA project aimed at accelerating Physical AI development through world foundation models, specialized tokenizers, and a video processing pipeline. It's designed specifically for robotics and autonomous vehicle labs to enable advanced AI model inference and video generation capabilities.
For ML/AI engineers interested in Physical AI and world modeling, this repository offers insights into cutting-edge AI techniques for robotics and autonomous systems. Exploring its models, inference scripts, and video processing approach could provide valuable practical knowledge about emerging AI technologies.
Task: Explore the repository to understand the foundation models, tokenizers, and video processing pipelines provided by NVIDIA Cosmos. Try to implement or integrate these models into projects related to robotics or autonomous vehicles to gain practical experience.
Output: Publish code that integrates NVIDIA Cosmos models into a robotics or autonomous vehicle project
16/18. Zencoder ai-powered coding assistant
Usefulness: 4 / 5, Time needed: 20 minutes, write on laptop: Link – manual notes: Try Zencoder
Zencoder is an AI coding agent designed to help developers code faster and smarter by automating routine tasks. The platform offers features like AI Agents and Repo Grokking, targeting developers seeking to improve productivity and stay in their creative flow.
For ML/AI engineers, Zencoder represents an emerging category of AI development tools that leverage machine learning to assist in code generation and repository analysis. Understanding such platforms can provide insights into AI's evolving role in software development workflows.
Task: Try Zencoder to understand how AI can assist in code generation and repository analysis, and explore its features to improve productivity in ML/AI development
Output: Publish code that leverages Zencoder's AI-powered coding assistant
16/19. Setting up an MCP server with Python and integrating with Claude
Usefulness: 4 / 5, Time needed: 45 minutes, read on laptop: Link – manual notes: Set up MCP in Claude
A comprehensive guide for developers to build an MCP server that provides weather-related tools, demonstrating how to create server-side functionality that can be dynamically called by AI clients like Claude. The tutorial walks through creating tools to fetch weather alerts and forecasts using the National Weather Service API.
This resource is excellent for ML/AI engineers looking to understand how to extend AI capabilities through server-side tool creation. It provides a practical example of building an MCP server, showing how to define tools, handle API interactions, and integrate with AI clients using Python and the MCP SDK.
Task: Follow this tutorial to set up an MCP server and integrate it with Claude for Desktop, using the provided example of a weather-related server as a guide.
Output: Publish code that sets up an MCP server with Python and integrates it with Claude
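Conceptually, an MCP server is a registry of named, typed tools that the client (Claude) can discover and invoke by name. The tutorial builds this with the `mcp` Python SDK; the dependency-free sketch below (tool name, canned forecast, and request format are all hypothetical) shows just the registration-and-dispatch pattern the SDK wraps:

```python
import inspect
import json

# Registry of tools the "server" exposes to the AI client.
TOOLS = {}

def tool(fn):
    """Register a function as a callable tool, recording its signature."""
    TOOLS[fn.__name__] = {
        "fn": fn,
        "params": list(inspect.signature(fn).parameters),
        "doc": inspect.getdoc(fn),
    }
    return fn

@tool
def get_forecast(city: str) -> str:
    """Return a (canned) weather forecast for a city."""
    # The tutorial's server calls the National Weather Service API here.
    return f"Forecast for {city}: sunny, 21 C"

def handle_call(request_json: str) -> str:
    """Dispatch a tool call of the form {"tool": ..., "args": {...}}."""
    req = json.loads(request_json)
    return TOOLS[req["tool"]]["fn"](**req["args"])

print(handle_call('{"tool": "get_forecast", "args": {"city": "Budapest"}}'))
```

The SDK adds the pieces this sketch omits: the JSON-RPC transport (stdio), capability negotiation, and schema generation from type hints, which is what lets Claude for Desktop discover the tools automatically.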
16/20. Exploring claude ai assistant
Usefulness: 4 / 5, Time needed: 15 minutes, read on laptop: Link – manual notes: Learn Claude well
Claude is an AI assistant by Anthropic offering privacy-first, creative collaboration tools. Available in Free, Pro, Team, and Enterprise tiers, it provides features like document creation, image analysis, and team collaboration with different levels of usage and access across pricing plans.
For ML/AI engineers, this page provides insights into commercial AI assistant development, showcasing how advanced language models can be packaged as user-friendly products. It highlights practical considerations like model versioning, usage tiers, and enterprise feature requirements.
Task: Read it to understand the commercial applications of AI assistants and how they can be used for collaboration and creativity
Output: Understand the capabilities and features of Claude AI assistant and how it can be used in real-world applications
16/21. Exploring moonshine web for real-time speech recognition
Usefulness: 4 / 5, Time needed: 15 minutes, read on laptop: Link – manual notes: Do some project with Moonshine web (Moonshine is optimized for real-time, on-device applications...)
Moonshine Web is an innovative in-browser speech recognition tool that offers real-time transcription with performance claims of being faster and more accurate than existing solutions like Whisper. The technology appears to be a significant advancement in web-based speech-to-text processing.
For ML/AI engineers, this resource provides insights into cutting-edge speech recognition technologies, demonstrating how web-based AI can be implemented for real-time language processing. Understanding such technologies can help in developing more efficient and performant speech recognition solutions.
Task: Read the Reddit post and explore the Moonshine Web technology to understand its capabilities and potential applications in real-time speech recognition. Consider how it compares to other technologies like Whisper and think about potential project ideas that could utilize this technology.
Output: Understand the basics of Moonshine Web and its potential for real-time speech recognition projects
16/22. Building a realtime voice assistant with webrtc and openai
Usefulness: 4 / 5, Time needed: 15 minutes, read on laptop: Link – manual notes: Try OpenAI voice API (see also https://x.com/simonw/status/1869143764775907494)
Justin Uberti shares a compact JavaScript implementation of a realtime voice assistant using WebRTC and OpenAI's Realtime API. The code requires just 12 lines and involves passing a local stream from getUserMedia(), an audio output element, and an API token.
This resource provides a practical, code-level insight into integrating WebRTC with AI voice technologies. It's an excellent example for ML/AI engineers looking to understand how to quickly prototype voice assistants using modern web technologies and AI APIs.
Task: Read it to understand how to quickly prototype voice assistants using modern web technologies and AI APIs, and try to replicate the example using OpenAI's voice API as suggested by David.
Output: Publish code that implements a similar voice assistant using OpenAI's voice API
16/23. Model context protocol for integrating ai with data sources
Usefulness: 4 / 5, Time needed: 15 minutes, read on laptop: Link – manual notes: Try Claude's Model Context Protocol (MCP) (Definitely try this)
Anthropic introduces the Model Context Protocol (MCP), an open-source standard that enables developers to build secure, two-way connections between data sources and AI-powered tools. The protocol simplifies integrations across content repositories, business tools, and development environments, allowing AI systems to access and maintain context across different datasets.
For ML/AI engineers, MCP represents a crucial architectural approach to solving context and integration challenges. By understanding this protocol, engineers can learn how to create more flexible, context-aware AI systems that can seamlessly interact with multiple data sources without building custom connectors for each integration.
Task: Read it to understand the basics of MCP and how it can be used to create more flexible AI systems, then try implementing Claude's Model Context Protocol (MCP) as suggested by David
Output: Understand the basics of MCP and its potential applications in ML/AI
16/24. Genesis physics ai engine
Usefulness: 4 / 5, Time needed: 20 minutes, write on laptop: Link – manual notes: Try this AI physics engine (reddit)
Genesis is a groundbreaking open-source physics AI engine that delivers ultra-fast simulation capabilities, processing 43M frames per second on an RTX 4090. Built in pure Python, it supports cross-platform environments, multiple physics solvers, robot simulations, and offers photorealistic ray-tracing rendering with impressive performance metrics.
For ML/AI engineers interested in robotics, physics simulations, and generative AI, Genesis offers a cutting-edge platform to explore advanced simulation techniques. Its Python implementation, extensive robot support, and high-performance computing make it an excellent tool for researchers and developers working on complex physical world modeling.
Task: Try out the Genesis physics AI engine to explore its capabilities in ultra-fast physics simulation and robotics, and consider using it as a tool for researching and developing complex physical world modeling projects.
Output: Publish code that integrates Genesis with an ML/AI project, such as a robotics simulation or a generative AI model
16/25. Ai-generated debugger for improved programming workflow
Usefulness: 4 / 5, Time needed: 12 minutes, read on laptop: Link – manual notes: Read interesting article on example use case for Claude Artifacts (Similar vibe, but quite different example with Aider)
Geoffrey Litt describes how he used AI (specifically Claude) to quickly generate a custom debugger UI for a Prolog interpreter project. Instead of writing code himself, he leveraged AI to create an interactive visualization tool that made debugging more intuitive, efficient, and enjoyable, demonstrating how AI can accelerate development of specialized tools.
For ML/AI engineers, this article showcases practical AI-assisted development techniques, highlighting how generative AI can be used to rapidly prototype development tools, improve debugging workflows, and create custom interfaces that enhance understanding of complex system behaviors.
Task: Read the article to understand how AI can be used to generate custom debugging interfaces and improve programming workflows, and consider how this technique can be applied to other areas of software development.
Output: Understand how AI-generated tools can enhance programming workflows and consider potential applications in your own projects
16/26. Exploring cline, an autonomous coding agent
Usefulness: 4 / 5, Time needed: 20 minutes, read on laptop: Link – manual notes: Meet Cline, an AI assistant that can use your CLI and Editor (A guy mentioned last week that he uses it and finds it surprisingly good)
Cline is an open-source autonomous coding agent designed to work directly in an IDE, capable of creating and editing files, executing commands, using browsers, and performing actions with user permission at each step.
For ML/AI engineers interested in AI-powered development tools, this project represents an emerging category of AI coding assistants. It demonstrates practical applications of language models in autonomous code generation and task execution, offering insights into potential future development workflows.
Task: Read the GitHub repository page to understand the capabilities and potential applications of cline, and explore the project's code to gain insights into its implementation.
Output: Understand the basics of agentic AI and its applications in coding assistants
16/27. Remotion library for programmatic video creation
Usefulness: 3 / 5, Time needed: 45 minutes, read on laptop: Link – manual notes: Try Remotion
Remotion is an innovative library that enables developers to create videos programmatically using React. It supports rendering real MP4 videos, provides interactive editing capabilities, and offers tools for building video applications with features like server-side rendering, parametrization, and cloud-based scaling.
For ML/AI engineers, Remotion offers an interesting approach to video generation and visualization. It could be particularly useful for creating explainer videos, data visualization, machine learning model demonstrations, or generating synthetic training data with programmatic control.
Task: Explore the Remotion library to understand how it can be used for video generation and visualization in ML/AI applications. Try building a simple video application using Remotion to get hands-on experience.
Output: Publish code that demonstrates the use of Remotion for video generation or visualization in an ML/AI context
16/28. Browser automation for ai agents and applications
Usefulness: 4 / 5, Time needed: 30 minutes, read on laptop: Link – manual notes: Check a web browser for your AI
Browserbase provides a web browser infrastructure for AI agents and developers, offering scalable, fast, and secure browser automation. With features like seamless integration with existing tools, global browser deployment, and advanced stealth capabilities, it simplifies running web browsers for complex AI and automation tasks.
For ML/AI engineers, Browserbase offers insights into web browser automation techniques, demonstrating how to integrate browser interactions into AI workflows. Its documentation, SDKs, and playground can help developers understand practical approaches to building web-interactive AI agents and applications.
Task: Explore the documentation, SDKs, and playground to understand practical approaches to building web-interactive AI agents and applications
Output: Understand how to integrate browser interactions into AI workflows
16/29. Google authentication overview
Usefulness: 2 / 5, Time needed: 3 minutes, read on laptop: Link – manual notes: Read Google's prompt engineering overview (→ public (if not found, look for "Vertex PaLM" in my Dropbox))
A standard Google authentication page requiring users to enter their email or phone number to sign in to Google Docs. The page provides options for creating an account, language selection, and accessing help resources.
While not directly related to ML/AI, understanding authentication flows and security practices is important for ML engineers when building secure machine learning platforms and managing access to data and models.
Task: Skip the sign-in page itself; the intended resource is Google's prompt engineering overview, which sits behind this login. Look for the public Vertex AI documentation or the Dropbox copy noted above instead.
Output: Nothing from the sign-in page itself; the actual prompt engineering material is in the alternative sources.
16/30. Google sign-in page for Gmail
Usefulness: 1 / 5, Time needed: 2 minutes, read on laptop: Link – manual notes: Try ChatGPT and Whisper API (email [here])
A standard Google account login page for Gmail, featuring an email/identifier input field, language selection options, and basic sign-in workflow with Google's typical blue and white design aesthetic.
While not directly related to ML/AI learning, understanding authentication flows and user interfaces can be valuable for ML engineers when developing user-facing machine learning applications and understanding user interaction design.
Task: Skip this, as it's not relevant to ML/AI learning. If needed, use it as a reference for understanding authentication flows in user-facing applications.
Output: Nothing, just skip it.
17. Cutting-edge research & future trends
Stay updated on neurosymbolic AI, multimodal learning, and frontier research.
Why learn this: AI is evolving fast—staying ahead is crucial.
Project ideas:
- Reproduce results from a recent AI research paper.
- Explore AI’s role in emerging applications like drug discovery.
17/1. 2025 AI engineer reading list
Usefulness: 5 / 5, Time needed: 4 hours, read on laptop: Link – manual notes: Check 2025 AI Engineer Reading List (looks like a perfect and up-to-date starting point!)
The 2025 AI Engineering Reading List is a meticulously curated collection of around 50 papers and resources spanning 10 critical domains in AI engineering. Designed for practical learning, the list focuses on providing context and practical insights rather than just academic references, helping AI engineers understand the most important technological developments.
For AI engineers looking to stay current, this reading list offers a structured approach to understanding cutting-edge AI technologies. By covering papers from frontier LLMs to specialized domains like agents and code generation, engineers can gain comprehensive insights into the evolving AI landscape, learn about key benchmarks, and understand practical applications of emerging technologies.
Task: Read through the list to understand the scope of covered topics, and then selectively dive into papers and resources that align with your current learning goals or interests.
Output: Understand the current landscape of AI technology and identify key areas for further learning.
17/2. Staying current with AI advancements
Usefulness: 4 / 5, Time needed: 3 minutes, read on mobile: Link – manual notes: Consider this when I plan my learning (I might not need to know what's in the box)
Scott Stevenson argues that traditional machine learning experience is becoming less valuable unless professionals can demonstrate adaptation to recent AI advancements. He emphasizes that processes that previously took months can now be completed in minutes, and warns against being too attached to old methodological approaches.
For ML/AI learners, this tweet serves as a crucial reminder to stay current with rapidly evolving AI technologies. It highlights the importance of being flexible, learning new prompt engineering techniques, and understanding that modern AI tools can dramatically reduce development time compared to traditional ML approaches.
Task: Read it to understand the importance of adapting to recent AI advancements and the need to stay current with evolving technologies.
Output: Understand the importance of adapting to new AI technologies
17/3. OpenAI's 12 Days of Shipmas announcements
Usefulness: 5 / 5, Time needed: 15 minutes, read on laptop: Link – manual notes: Understand and try OpenAI shipmas announcements
OpenAI's '12 Days of Shipmas' featured daily announcements of groundbreaking AI features, including the o1 reasoning model, ChatGPT Pro, Sora video generation, Canvas collaborative tool, Advanced Voice Mode, and integration with Apple Intelligence, signaling substantial progress in AI technology and user interaction.
For ML/AI engineers, this article provides insights into the latest AI capabilities, product innovations, and development strategies. It offers a comprehensive overview of emerging technologies, potential integration points, and the evolving landscape of generative AI tools and models.
Task: Read the article to understand the latest AI product launches and updates from OpenAI, and try to implement or explore the announced features and tools.
Output: Understand the latest AI capabilities and product innovations from OpenAI, and potentially implement or integrate the announced features and tools into personal projects.
17/4. A decade of progress in neural networks: Reflections and future directions
Usefulness: 5 / 5, Time needed: 45 minutes, watch: Link – manual notes: Watch this talk (first 15 minutes recommended) (Reposting from @Amie Rotherham's link. Key takeaways about future data, code domain, etc.)
The talk provides a 10-year retrospective on deep learning, highlighting the evolution from early neural network models to current large language models. The speaker discusses key milestones like auto-regressive models, the scaling hypothesis, and the transition from traditional machine learning approaches to massive pre-trained models with increasing computational capabilities.
For ML/AI engineers, this talk offers valuable historical context on neural network development, insights into scaling laws, and a forward-looking perspective on AI's potential future. It provides a nuanced understanding of how current AI technologies emerged and hints at potential future directions like reasoning, agents, and synthetic data generation.
Task: Watch this talk to understand the historical context and future directions of neural networks, focusing on key milestones like auto-regressive models and the scaling hypothesis.
Output: Understand the evolution of neural networks and potential future directions in AI research
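The through-line of the talk, from early neural networks to today's LLMs, is the auto-regressive setup: predict the next token from everything generated so far, append it, repeat. As a minimal illustration of that loop (a toy sketch with a hypothetical bigram "model", not anything from the talk itself):

```python
import random

def generate(model, prompt, n_tokens, seed=0):
    """Autoregressive generation: repeatedly sample the next token from a
    distribution conditioned on the sequence so far, then append it.
    `model` maps a context tuple to a {token: probability} dict."""
    rng = random.Random(seed)
    out = list(prompt)
    for _ in range(n_tokens):
        dist = model(tuple(out))
        tokens, probs = zip(*dist.items())
        out.append(rng.choices(tokens, weights=probs, k=1)[0])
    return out

# Toy "model": a bigram table that conditions only on the last token.
# A real LLM replaces this lookup with a neural network over the full context.
bigram = {"a": {"b": 0.9, "a": 0.1}, "b": {"a": 0.9, "b": 0.1}}
seq = generate(lambda ctx: bigram[ctx[-1]], ["a"], n_tokens=5)
```

Scaling, as the talk discusses, is essentially about making the conditional distribution richer (bigger models, longer contexts, more data) while this outer loop stays the same.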
17/5. Novel attention mechanisms in ML architecture
Usefulness: 4 / 5, Time needed: 10 minutes, read on laptop: Link – manual notes: "Attention thingie" from Beyang
Together AI announced a new machine learning architecture called 'Based', which leverages short sliding window attention and softmax-approximating linear attention. The research is a collaboration with Hazy Research, focusing on novel attention-like primitives to potentially improve model performance.
For ML/AI engineers, this tweet provides insight into cutting-edge research on attention mechanisms. Understanding alternative attention techniques can help in designing more efficient neural network architectures and potentially improving model performance across various tasks.
Task: Read it to understand the basics of alternative attention techniques and their potential to improve model performance
Output: Understand the basics of novel attention mechanisms in ML architecture
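Based combines two ingredients; as an illustration of the sliding-window half only, here is a minimal pure-Python sketch (a toy for intuition, not Together AI's implementation): each position attends to a fixed recent window rather than to all previous positions, which caps per-token cost.

```python
import math

def sliding_window_attention(scores, window):
    """Causal sliding-window attention weights: position i softmaxes only
    over positions max(0, i - window + 1) .. i, instead of over all
    previous positions as in full causal attention."""
    n = len(scores)
    weights = []
    for i in range(n):
        lo = max(0, i - window + 1)
        row = scores[i][lo:i + 1]          # scores for the visible window
        m = max(row)                        # stabilize the softmax
        exps = [math.exp(s - m) for s in row]
        z = sum(exps)
        # zero weight outside the window, softmax inside it
        weights.append([0.0] * lo + [e / z for e in exps] + [0.0] * (n - i - 1))
    return weights

# 4 positions, uniform scores, window of 2:
w = sliding_window_attention([[0.0] * 4 for _ in range(4)], window=2)
# position 0 sees only itself; later positions split weight over two tokens
```

The linear-attention half of Based (the softmax-approximating part) handles the long-range context that the window discards; the tweet's point is that combining the two cheap primitives recovers most of full attention's quality.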
17/6. Mixture of experts paper and exponentially faster language modeling
Usefulness: 4 / 5, Time needed: 10 minutes, read on laptop: Link – manual notes: Check "Mixture of Million Experts" paper
A technical tweet discussing a cutting-edge Mixture of Experts (MoE) strategy, highlighting an innovative method for expanding the pool of experts with a novel routing solution. The approach goes beyond mere efficiency, emphasizing lifelong learning and dynamic expert expansion.
This resource provides insights into advanced machine learning model architectures, specifically Mixture of Experts techniques. For ML/AI engineers, it offers a glimpse into emerging research on scalable, adaptive neural network designs that could significantly impact how large language models are constructed and trained.
Task: Read the referenced paper and explore the discussed approach to understand how it can be applied to improve the efficiency and adaptability of large language models.
Output: Understand the basics of Mixture of Experts models and their application in large language models
17/7. DeepSeek's mathematical reasoning model
Usefulness: 4 / 5, Time needed: 10 minutes, read on laptop: Link – manual notes: Check DeepSeek papers recommended by Beyang (They are apparently super accessible and transparent.)
A tweet by Teortaxes praising DeepSeek's new mathematical reasoning model, which was pre-trained on an enriched corpus of 4.5T tokens. The model seems to have achieved state-of-the-art performance with sensible experimental techniques and minimal self-sabotage.
For ML/AI engineers, this tweet provides insights into model pre-training strategies, specifically how targeted corpus enrichment and careful training techniques can significantly improve model performance in specialized domains like mathematical reasoning.
Task: Read it to understand the latest advancements in mathematical reasoning using large language models
Output: Understand the basics of DeepSeek's mathematical reasoning model and its potential applications
17/8. Music transformer project for creative music composition
Usefulness: 4 / 5, Time needed: 10 minutes, read on laptop: Link – manual notes: Understand the music transformer that added violin to Für Elise, etc.
Percy Liang shares a post about a post-doctoral project by John Thickstun that developed an anticipatory music Transformer. The project resulted in a violin accompaniment for Für Elise, which was premiered at the SF Symphony SoundBox event in April, with accompanying blog post and reflections.
This resource provides insight into cutting-edge AI research in creative domains, specifically music generation. It demonstrates how transformer models can be applied beyond traditional language and image tasks, offering ML/AI engineers a perspective on innovative machine learning applications in artistic creation.
Task: Read the tweet and accompanying blog post to understand how transformer models can be applied to music generation and composition. Explore the project's results and reflections to gain insight into the potential of AI in creative domains.
Output: Understand the basics of music generation using transformer models and their potential applications in creative music composition
17/9. Exploring the a16z AI page for insights into cutting-edge AI development
Usefulness: 5 / 5, Time needed: 20 minutes, read on laptop: Link – manual notes: Explore https://a16z.com/ai/ 10 min at a time (This is so cool!)
Andreessen Horowitz's AI page presents a comprehensive vision for AI's transformative potential across various domains like medicine, national defense, and technological innovation. They highlight their commitment through open-source projects, developer tools, and strategic funding for AI infrastructure and research.
For ML/AI engineers, this page offers valuable insights into cutting-edge AI development through their open-source repositories, infrastructure tools, and developer resources. The GitHub projects like ai-getting-started, ai-town, and companion-app provide practical examples of AI application development.
Task: Read and explore the page to understand the vision and commitment of Andreessen Horowitz to AI technologies, and discover the open-source projects and resources available for AI development.
Output: Understand the role of venture capital in AI development and the types of projects and resources being supported
17/10. Gemini 2.0 multimodal live streaming demo
Usefulness: 4 / 5, Time needed: 10 minutes, watch: Link – manual notes: Building with Gemini 2.0: Multimodal live streaming
The video transcript shows a technical demonstration of Gemini 2.0's Multimodal Live API, highlighting its ability to process different data types simultaneously, understand screen content, respond to interruptions, and maintain conversational context across interactions.
For ML/AI engineers, this demo provides insights into advanced AI interaction models, showcasing practical implementations of multimodal AI, real-time processing, and conversational AI techniques that are increasingly important in developing sophisticated interactive systems.
Task: Watch it to get a high-level overview of multimodal AI and real-time processing capabilities, and take notes on the implementation of conversational AI techniques.
Output: Understand the basics of multimodal AI and its applications in real-time interactive systems
17/11. Google authentication page
Usefulness: 1 / 5, Time needed: 2 minutes, read on laptop: Link – manual notes: Check AI reading list by Beyang (Might be a good starting point to understand Transformers.)
A Google authentication page for signing into Google Docs, requiring users to enter their email or phone number to proceed. The page provides options for signing in, creating an account, and selecting language preferences.
While not directly related to ML/AI learning, understanding authentication flows and user interface design can be valuable for ML engineers building user-facing systems or considering user authentication in machine learning applications.
Task: Skip this item as it is not relevant to ML/AI learning. If you need to access the AI reading list by Beyang, try finding an alternative link or contact the author for access.
Output: Nothing, just skip it.
Happy learning!