Setting Up Ollama


What is Ollama?

Ollama is a lightweight, local LLM runtime that lets you run models such as LLaMA, Mistral, and Gemma directly on your machine, with no GPU and no API key required. It's great for experimentation, prototyping, and even privacy-conscious production setups.

Think of it as the “Docker for language models”: download a model once and run it anywhere, completely offline.

Why Ollama?

Ollama's appeal is simple: no API keys, no cloud latency, and no data leaving your machine. You pull a model once and then run it entirely offline, which makes it a good fit for quick experiments as well as privacy-sensitive workloads.

Downloading and Installing Ollama

You can grab the latest release from the official website: ollama.com.

  1. Visit https://ollama.com
  2. Click on Download
  3. Select your operating system (macOS, Windows, or Linux)
  4. Once downloaded, run the installer
  5. Click through the setup steps (Next → Install)
  6. After installation, open your terminal/command prompt
  7. Run: ollama --help to verify it’s working
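If the install worked, the CLI should respond right away. A quick sanity check (the exact output will vary with the version you installed):

ollama --version

This should print something like “ollama version is 0.x.x”. If the command isn't found, open a new terminal window so your PATH is refreshed.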

Basic Commands

Pull a model
ollama pull gemma3

This will download the latest version of the specified model.
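Many models are also published in multiple sizes as tags. If you want a specific variant rather than the default, you can name the tag explicitly; the tag below is only an example, so check the model's page on ollama.com for the tags that actually exist:

ollama pull gemma3:4b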

Run a model
ollama run gemma3

Starts an interactive chat session with the model in your terminal. Type /bye (or press Ctrl+D) to exit.
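You can also pass a prompt directly on the command line to get a one-off answer instead of an interactive session (the prompt text here is just an example):

ollama run gemma3 "Explain in one sentence why the sky is blue."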

List installed models
ollama list

Shows every model currently downloaded to your machine.

Remove a model
ollama rm llama2

Deletes the model from disk to free up space.

View model info
ollama show gemma3

Prints the model's details, such as its parameters, template, and license.

Start the server manually
ollama serve

Starts the Ollama server that the other commands talk to. The desktop app usually starts it for you, so you only need this if it isn't already running.

Stop a running model
ollama stop gemma3

Unloads the named model from memory (the files stay on disk).
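While the server is running, Ollama also exposes a local HTTP API (by default on port 11434), which is handy if you want to call a model from a script rather than the terminal. A minimal sketch using curl, assuming the default port and that gemma3 has already been pulled:

curl http://localhost:11434/api/generate -d '{
  "model": "gemma3",
  "prompt": "Why is the sky blue?",
  "stream": false
}'

With "stream": false the reply comes back as a single JSON object containing the generated text.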

Final Thoughts

Ollama is a fantastic tool if you want local inference without the hassle of API keys or cloud latency. It’s fast, private, and incredibly easy to use. With just a few commands, you’ll have state-of-the-art models running right on your machine.

Give it a try, and see how it fits into your AI workflow!

Reference

ollama.com