Getting to know Foundation Models

So, what now?

Every so often the world shifts violently as reality changes. The old ways of interacting with things are thrown out and a new approach moves into the gaping void left behind. In recent years, the iPhone changed the way people interacted with mobile devices. Sure, there were mobiles and PDAs (Personal Digital Assistants, not Public Displays of Affection) out in the world, but it was the iPhone that propelled everything forward radically.

The Apple Watch was another big change. Yes, smart watches existed before, but they were mainly GPS and fitness focused. In the last few years, it has been generative AI that caused the shift.

First came ChatGPT and other LLMs, but they alone are not the shift that has happened. The shift, the reality-changing event, is in the agents which provide the data and allow for interactions with an LLM. This is where the Model Context Protocol (MCP) comes in. It is a communication protocol that connects an LLM to external tools and data, which allows for the building of complex workflows. It is this that makes the whole reality shift happen.

It is also one which provokes heated discussion between those in favour and those against. Do we need to embrace it because of this? Do we need to discard all that has come before? No, no and emphatically no. Like anything else, we must learn to understand it within the context in which it exists. This means we look at its advantages and incorporate them where appropriate. We also acknowledge the limitations of the reality shift.

So just what is it?

Apple’s adventure into this landscape is FoundationModels, which gives us access to the on-device LLM. The concepts in the framework are similar to how LLMs work in the broad sense.

This includes the idea of a LanguageModelSession, Instructions, Prompt and Tool.

What isn’t there is an ability to connect to MCP servers outside of the running app. What we do have though is Tool, which maps directly to the idea of a tool exposed by an MCP server.

This is a very Apple approach in that it limits functionality by providing an abstraction over the core concepts of LLMs and MCP.

Interacting with FoundationModels

It’s all good and true to think about things in the abstract from a liquid glass tower, but the fun is in working through an implementation. So let’s look at the different parts involved and how they can all work together.

Session

LanguageModelSession is the entry point to the interactions with the framework. It is what gets initialised with the instructions that provide guidance to all the prompts which the session receives.

Instructions

When a language model session is constructed, it allows for setting a series of instructions that guide how the session answers prompts. The best way to create the instructions is with the InstructionsBuilder type, which works like any other result builder. This means you can include conditional logic in the instructions, and that logic is evaluated at the time the instructions are built for the session.

Instructions {
  "You are a friendly neighbourhood health professional who loves to provide care for the people in your community."
}
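As a rough sketch of the conditional logic mentioned above, the builder accepts an ordinary if/else. The prefersShortAnswers flag is a made-up value for illustration and is not part of the framework. The snippet also assumes a platform where FoundationModels is available.

```swift
import FoundationModels

// Hypothetical app setting, used only for illustration.
let prefersShortAnswers = true

let instructions = Instructions {
  "You are a friendly neighbourhood health professional who loves to provide care for the people in your community."
  // Only one branch ends up in the instructions,
  // decided when this builder closure runs.
  if prefersShortAnswers {
    "Keep every answer to two sentences or fewer."
  } else {
    "Explain your reasoning in plain language."
  }
}

// The session keeps these instructions for every prompt it receives.
let session = LanguageModelSession(instructions: instructions)
```

Because the branch is chosen when the instructions are built, changing the flag later means creating a new session.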

Prompt

The prompt is the fun side. It is where you ask the LLM to do something. This can be direct user input, though more often than not you will want to supplement that with further guidance around the question asked. Whereas Instructions are for the overall interactions with the session, a Prompt is for a single specific interaction with the session. There is a builder, PromptBuilder, to help with the construction of the prompt.

Prompt {
  "Answer the question from the user in a kind and empathetic way:"
  userQuestion
}
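To show where a prompt like this ends up, here is a minimal sketch that sends it to a session and reads back the text. The answer(_:using:) function is a hypothetical helper named for this example. It assumes a session has already been created.

```swift
import FoundationModels

// Hypothetical helper: wraps the user's question in extra guidance
// and returns the model's reply as plain text.
func answer(_ userQuestion: String, using session: LanguageModelSession) async throws -> String {
  let prompt = Prompt {
    "Answer the question from the user in a kind and empathetic way:"
    userQuestion
  }

  // respond(to:) suspends until the model has produced its full reply.
  let response = try await session.respond(to: prompt)
  return response.content
}
```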

Tool

Tool is the part of session interactions that allows for the LLM to access data outside of that which it knows about by default. This can range from accessing data local to your app to a framework such as HealthKit or WeatherKit or even making an API request.

Implementing a Tool

When a tool gets defined, you need to provide it with some scaffolded data so that the session knows what the functionality of the tool is. This is done by making the type conform to the Tool protocol. The name property gets used by the session to identify the tool and know what can be called in order to get responses, while the description tells the session what the tool is for.

import FoundationModels

struct GetHeartRate: Tool {
  // The session uses the name to identify and call the tool
  let name = "getHealthData"
  let description = "Gets the user's latest health data"

  // The session fills these values in from the prompt
  @Generable
  struct Arguments {
    @Guide(description: "The health data to be retrieved")
    let healthData: String
  }

  func call(arguments: Arguments) async throws -> String {
    // Fetch the health data from HealthKit (stubbed here with a fixed value)

    return "Your heart rate was 75 beats per minute 1 minute ago"
  }
}

Arguments

The Arguments are what the session passes into the tool as a way of instructing the tool to provide specific content. They take the form of a type that is @Generable. The individual arguments tend to be annotated with @Guide, a macro that attaches a description and optional GenerationGuide constraints to a property. This gives the session a way to interpret data in the prompt and provide it as an argument to the Tool.

Along with a text description of the argument, you can also provide restrictions around the values such as whether it matches one of the values in a String array using anyOf(_:) or is within a range of numerical values using range(_:).
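A small sketch of those restrictions follows. The type name HealthDataArguments, the days property and the listed metric names are all assumptions made up for this example.

```swift
import FoundationModels

// Hypothetical argument type for illustration.
@Generable
struct HealthDataArguments {
  // The session must pick one of these strings.
  @Guide(description: "The health data to be retrieved", .anyOf(["heartRate", "stepCount", "sleep"]))
  let healthData: String

  // The session must pick a whole number from 1 to 7.
  @Guide(description: "How many days of history to include", .range(1...7))
  let days: Int
}
```

Constraining the values this way means call(arguments:) has fewer unexpected inputs to guard against.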

You can also use a custom type as the argument by having the type annotated with @Generable. By using a type, typically a struct, you can give the data a clear structure, which gives the LLM more ability to provide a response from it.
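As a sketch of a custom type used inside the arguments, one @Generable struct can be a property of another. DateWindow and StructuredArguments are hypothetical names chosen for this example.

```swift
import FoundationModels

// Hypothetical nested type describing how far back to look.
@Generable
struct DateWindow {
  @Guide(description: "Number of days to look back", .range(1...30))
  let days: Int
}

// Hypothetical argument type that groups a metric with a time window.
@Generable
struct StructuredArguments {
  @Guide(description: "The health data to be retrieved")
  let healthData: String

  @Guide(description: "The period of time the data should cover")
  let window: DateWindow
}
```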

Call

The call(arguments:) is the implementation of the Tool in that it is what sources the relevant data and constructs an appropriate output for the session to make use of.

Telling the session about the Tool

Defining the tool doesn’t mean that it will automatically get used. For that you need to tell the session that the tool is available. This is done as part of the initialiser to the session. In the init you create an instance of the Tool that can be used. In the instructions you tell the session that the function provided by the Tool, identified by the tool’s name, is available for use.

let session = LanguageModelSession(
  model: .default,
  tools: [
    // The Tool type defined earlier
    GetHeartRate(),
  ],
  instructions: Instructions {
    "You are a friendly health care professional."
    "If you need to look at the user's health data you can use the getHealthData function."
  }
)
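To round this off, here is a hedged sketch of asking the configured session a question that should lead it to call the tool. The askAboutHeartRate(using:) helper and the question text are assumptions for this example. The availability check is there because the on-device model is not guaranteed to be ready on every device.

```swift
import FoundationModels

// Hypothetical helper; `session` is assumed to be the one configured with the tool.
func askAboutHeartRate(using session: LanguageModelSession) async {
  // The on-device model can be unavailable, for example on unsupported hardware.
  guard case .available = SystemLanguageModel.default.availability else {
    print("The on-device model is not available right now")
    return
  }

  do {
    // The session decides for itself whether to call the tool while answering.
    let response = try await session.respond(to: "What was my most recent heart rate?")
    print(response.content)
  } catch {
    print("The session could not answer: \(error)")
  }
}
```

Whether the tool is actually invoked is up to the model, which is why the instructions mention the tool by name.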

Further reading

There’s a stack of reading material out there, not only on FoundationModels as an implementation but also on MCP as an interaction protocol.

