I’ve been in a mode of trying lots of new AI tools for the past year or two, and feel like it’s useful to take an occasional snapshot of the “state of things I use”, as I expect this to continue to change pretty rapidly.

  • Claude 3.5 Sonnet (via API Console or LLM): I currently find Claude 3.5 Sonnet to be the most delightful / insightful / poignant model to “talk” with. It excels at complex reasoning tasks, especially ones that GPT-4 fails at. For example, I tasked Sonnet with writing an AST parser for Jsonnet, and it was able to do so with minimal additional help. I don’t subscribe to Claude’s Pro tier, so I mostly use it within the API console or via Simon Willison’s excellent llm CLI tool (there’s a minimal API sketch just after this list of tools). The Artifacts feature of Claude’s web app is great as well, and is useful for generating little throwaway React interfaces.

  • GPT-4o: This is my current most-used general-purpose model. The most powerful use case I have for it is writing moderately complex scripts from one-shot prompts and a few nudges. GPT-4o seems better than GPT-4 at receiving feedback and iterating on code. I also use it for general-purpose tasks, such as text extraction, basic knowledge questions, etc. The main reason I use it so heavily is that the usage limits for GPT-4o still seem significantly higher than those for Claude 3.5 Sonnet, and the paid tier of ChatGPT still feels like essentially “unlimited” usage.

  • ChatGPT macOS App: A surprisingly nice quality-of-life improvement over using the web interface. Having the ability to ⌥-Space into a ChatGPT session is super handy. I don’t use any of the screenshotting features of the macOS app yet; they’re not automated enough for me to find them useful. If there were a background context-refreshing feature that captured your screen every time you ⌥-Space into a session, that would be super nice.

  • GitHub Copilot: I use Copilot at work, and it’s become nearly indispensable. I recently did some offline programming work and felt like I was at least 20% less effective without it. Copilot has two components today: code completion and “chat”. I find the chat to be nearly useless. It has “commands” like /fix and /test that are cool in theory, but I’ve never had them work satisfactorily. The chat model GitHub uses is also very slow, so I often switch to ChatGPT rather than wait for it to respond.
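
Since I mentioned mostly hitting Claude through the API: here’s roughly what that looks like with the official Anthropic Python SDK. This is just a minimal sketch; the model ID and prompt are illustrative, and in practice I usually go through the API console or the llm CLI instead.

```python
# Minimal sketch: calling Claude 3.5 Sonnet via the Anthropic Python SDK.
# Assumes `pip install anthropic` and ANTHROPIC_API_KEY set in the environment;
# the model ID and prompt are illustrative.
import anthropic

client = anthropic.Anthropic()  # picks up ANTHROPIC_API_KEY

message = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Sketch an AST node hierarchy for a Jsonnet parser."}
    ],
)
print(message.content[0].text)
```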

Use cases

  • Docs/Reference replacement: I never look at CLI tool docs anymore; LLMs have memorized them all. Whenever I need to do something nontrivial with git or unix utils, I just ask the LLM how to do it. I could very much figure it out myself if needed, but it’s a clear time saver to immediately get a correctly formatted CLI invocation. (There’s a small sketch of this workflow just after this list.)

  • Limited Scope Refactorings: Copy/pasting a small chunk (<100 lines) of code or SQL and asking the model to perform some transformation (e.g. “Make the query return weekly data instead of daily data”, “Change this function to work with Fizz protos instead of Buzz protos”) has a high enough success rate to be a time saver.

  • General Knowledge Conversations: I’ve enjoyed using the original ChatGPT voice chat feature during my commute. It feels like talking with someone who has read every Wikipedia article ever. As of 2024, the “new” voice chat feature powered by GPT-4o hasn’t landed yet, so I don’t have any experience with that.
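
To make the docs-replacement workflow above concrete, here’s roughly what it looks like through llm’s Python API. This is a minimal sketch; the model name and question are illustrative, and in practice I usually do this from the CLI or a ChatGPT window.

```python
# Minimal sketch of the "docs replacement" workflow via Simon Willison's llm
# library (pip install llm). Assumes an OpenAI API key has been configured
# for llm; the model name and question are illustrative.
import llm

model = llm.get_model("gpt-4o")
response = model.prompt(
    "Give me the exact git command to squash my last three commits into one, "
    "keeping the first commit's message."
)
print(response.text())
```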

Things I Haven’t Had Time to Try

  • Gemini Pro/Advanced, or its related tooling like NotebookLM. The coolest part of the recent Gemini models is their extremely large context window (up to 2M input tokens). In my limited testing, Gemini seems “good”; I just haven’t spent enough time tinkering with it to see where it exceeds the capabilities of the OpenAI/Anthropic models.
  • DeepSeek Coder V2: An extremely powerful open-source model for coding. It looks pretty great by the benchmark results DeepSeek has posted. However, I tried playing with the quantized model locally and was disappointed. The full model is rather expensive to host locally, which has been a barrier. DeepSeek also offers the model via an API (at quite low cost too), which I hope to try eventually; see the sketch below.
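
If I do get around to the DeepSeek API, my understanding is that it exposes an OpenAI-compatible endpoint, so a first attempt would look something like this. The base URL and model name are assumptions based on their docs and may be out of date.

```python
# Minimal sketch, assuming DeepSeek's OpenAI-compatible endpoint and the
# "deepseek-coder" model name from their docs; both may have changed.
# Requires `pip install openai` and a DEEPSEEK_API_KEY in the environment.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-coder",
    messages=[
        {"role": "user", "content": "Write a Python function that parses RFC 3339 timestamps."}
    ],
)
print(response.choices[0].message.content)
```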

Resources

  • Zvi Mowshowitz’s weekly AI posts are excellent, and give an extremely verbose AI “state of the world”.
  • Simon Willison’s blog is also an excellent source for AI news.
  • The Cognitive Revolution podcast hosts some pretty good interviews with a high signal-to-noise ratio, and is much less hype-driven than many other AI-centric podcasts I’ve attempted to listen to.