What is Llama 2? Meta’s large language model explained

Llama 2 is a family of pre-trained and fine-tuned large language models (LLMs), ranging in scale from 7B to 70B parameters, from the AI group at Meta, the parent company of Facebook. According to Meta AI, Llama 2 Chat LLMs are optimized for dialogue use cases and outperform open-source chat models on most benchmarks they tested. Based on Meta’s human evaluations for helpfulness and safety, the company says Llama 2 may be “a suitable substitute for closed source models.”

Llama 2, like the original Llama model, is based on the Google transformer architecture, with improvements. Llama’s improvements include RMSNorm pre-normalization, inspired by GPT-3; a SwiGLU activation function, inspired by Google’s PaLM; and rotary positional embeddings (RoPE), inspired by GPT Neo. Llama training used the AdamW optimizer. Llama 2’s primary differences from Llama are increased context length (4096 vs. 2048 tokens) and grouped-query attention (GQA) instead of standard multi-head attention (MHA) in the two larger models.
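To make two of those architectural terms concrete, here is a minimal pure-Python sketch, not Meta’s implementation. It shows RMSNorm on a single vector and the way query heads share key/value heads under multi-head, grouped-query, and multi-query attention. The function names and the head counts (8 query heads, 2 key/value heads) are illustrative assumptions, not Llama 2’s actual configuration.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Divide x by its root mean square, then apply a per-element learned weight."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def kv_assignment(n_q_heads, n_kv_heads):
    """For each query head, return the index of the key/value head it reads from."""
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("n_q_heads must be a multiple of n_kv_heads")
    group_size = n_q_heads // n_kv_heads
    return [q_head // group_size for q_head in range(n_q_heads)]


# Illustrative sizes only (assumed for this sketch).
normed = rms_norm([3.0, 4.0], weight=[1.0, 1.0])
mha = kv_assignment(8, 8)  # multi-head: every query head has its own K/V head
gqa = kv_assignment(8, 2)  # grouped-query: groups of query heads share a K/V head
mqa = kv_assignment(8, 1)  # multi-query: all query heads share one K/V head
```

With fewer key/value heads there is less key/value state to cache during generation, which is the usual motivation for grouped-query attention in larger models.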

Llama 2’s training corpus includes a mix of data from publicly available sources, which Meta says does not include data from Meta’s products or services. There were two trillion tokens of training data.

Meta used its Research Super Cluster and some internal production clusters for pre-training, with Nvidia A100 GPUs. Pre-training time ranged from 184K GPU-hours for the 7B-parameter model to 1.7M GPU-hours for the 70B-parameter model.
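For a rough sense of scale, GPU-hours can be turned into wall-clock time by dividing by the number of GPUs running in parallel. The sketch below uses the rounded GPU-hour figures quoted above; the GPU count of 1,000 is purely a hypothetical assumption, since the number of GPUs Meta devoted to each run is not stated here, and perfect parallel scaling is assumed.

```python
def wall_clock_days(gpu_hours, n_gpus):
    """Convert total GPU-hours to elapsed days, assuming perfect parallel scaling."""
    return gpu_hours / n_gpus / 24


GPU_HOURS_7B = 184_000     # rounded figure quoted in the text
GPU_HOURS_70B = 1_700_000  # rounded figure quoted in the text
ASSUMED_GPUS = 1_000       # hypothetical cluster size, for illustration only

days_7b = wall_clock_days(GPU_HOURS_7B, ASSUMED_GPUS)
days_70b = wall_clock_days(GPU_HOURS_70B, ASSUMED_GPUS)
compute_ratio = GPU_HOURS_70B / GPU_HOURS_7B

print(f"7B: about {days_7b:.1f} days; 70B: about {days_70b:.1f} days")
print(f"The 70B run used about {compute_ratio:.1f}x the GPU-hours of the 7B run")
```

Under that assumed cluster size, the 7B model would take roughly a week and the 70B model roughly ten weeks of continuous training.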

Fine-tuning Llama 2 Chat took months and involved both supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF). Meta used Ghost Attention (GAtt) to keep Llama 2 Chat from forgetting its system message (overall instruction) from turn to turn in a dialogue.

Is Llama 2 safe?

Some generative AI is notorious for making up (or hallucinating) answers, saying horrible things, and even recommending suicide. There are many situations where wrong answers can be dangerous, of course, and almost all LLMs come with boilerplate warnings. No matter how “safe” an AI is thought to be, you must always check its answers, since after all it is only a stochastic parrot.

Meta claims that Llama 2 Chat is as safe as or safer than other models, based on evaluation by human raters using ~2,000 adversarial prompts, as discussed in Meta’s Llama 2 paper. Note Meta’s caveat, however: There may be inherent bias in LLM evaluations due to limitations of the prompt set, subjectivity of the review guidelines, and subjectivity of individual raters. These safety evaluations are performed using content standards that are likely to be biased towards the Llama 2 Chat models.

Ethical considerations for Llama 2

In addition to the usual ethical and safety issues for LLMs, Llama 2 and Llama have an issue with one of the collections in their training corpus, the Books3 section of the Pile. In a class action lawsuit by Richard Kadrey, Sarah Silverman, and Christopher Golden, the plaintiffs allege that Meta has violated their copyrights by training on Books3, which includes their copyrighted books, and have asked for damages and restitution of profits. That lawsuit has not yet been litigated or settled.

At least one repository, the Eye, has recently responded to a DMCA takedown request from the Danish anti-piracy group Rights Alliance and removed Books3. Books3 is still available at other sites, although Rights Alliance is attempting to take them down as well. Ironically, Books3 was intended to democratize generative AI training, after OpenAI used private books datasets to train GPT-3.

Is Llama 2 open source?

In the introduction to my review of Llama 2 Chat and Code Llama, I described Llama 2 as “free almost-open-source.” Why “almost” open source? It’s because the Llama 2 license has a couple of restrictions. According to Stefano Maffulli of the OSI (Open Source Initiative):

Among other requirements, for a license to be Open Source, it may not discriminate against persons or groups or fields of endeavor (OSD points 5 and 6). Meta’s license for the LLaMa models and code does not meet this standard; specifically, it puts restrictions on commercial use for some users (paragraph 2) and also restricts the use of the model and software for certain purposes (the Acceptable Use Policy).

Paragraph 2 of the Llama 2 community license agreement says:

2. Additional Commercial Terms. If, on the Llama 2 version release date, the monthly active users of the products or services made available by or for Licensee, or Licensee’s affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Meta, which Meta may grant to you in its sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until Meta otherwise expressly grants you such rights.

That sounds like it has been designed to exclude AWS, Google Cloud, and Microsoft Azure. It follows the spirit of the Business Source License, originally developed by MariaDB. Most software developers don’t really care about this kind of restriction, but open-source advocates care very much indeed.

The Llama Acceptable Use Policy says, at a high level, that you can’t use Llama to violate the law or others’ rights; engage in, promote, incite, facilitate, or assist in the planning or development of activities that present a risk of death or bodily harm to individuals; intentionally deceive or mislead others; or fail to appropriately disclose to end users any known dangers of your AI system. They are saying that if you use Llama to develop weapons, create illegal drugs, create defamatory statements or images, or any of a long list of harms that have come out of other AI models, then you are violating your license.

I can sympathize with that attitude, and almost agree with it, but Maffulli is correct that it violates the Open Source Definition (OSD). I have to question, however, whether there’s a clear a priori line between, say, using generative AI to design drugs that turn out to be legal and ones that turn out to be illegal.

For example, suppose you’re designing a compound for pain relief using Llama, but post-market drug safety monitoring determines that it’s highly addictive, and it is subsequently classified as a Schedule I substance and banned. How do you know during the design process that you will be violating the acceptable use policy? I think that it’s hard, if not impossible, to know in advance, and I doubt that the policy would hold up in court. Meanwhile, the OSI is trying to come up with a new definition of open source AI.

What is Code Llama?

Code Llama was trained by fine-tuning Llama 2 on code-specific datasets, specifically more of the same code that was used to train Llama 2 to begin with. Meta says that Code Llama can generate code, and natural language about code, from both code and natural language prompts, and can also be used for code completion and debugging. Code Llama supports many of the most popular languages being used today, including Python, C++, Java, PHP, TypeScript (JavaScript), C#, and Bash.

According to Meta AI:

  • Code Llama is a state-of-the-art LLM capable of generating code, and natural language about code, from both code and natural language prompts.
  • Code Llama is free for research and commercial use.
  • Code Llama is built on top of Llama 2 and is available in three models:
    • Code Llama, the foundational code model;
    • Code Llama – Python, specialized for Python;
    • and Code Llama – Instruct, which is fine-tuned for understanding natural language instructions.
  • In Meta’s own benchmark testing, Code Llama outperformed state-of-the-art publicly available LLMs on code tasks.

There are three sizes of Code Llama, with 7B, 13B, and 34B parameters. All the Code Llama models are trained on sequences of 16K tokens and show improvements on inputs with up to 100K tokens. The 7B and 13B base and instruct models have also been trained with a fill-in-the-middle capability, which enables code completion.

The 7B model can be served from a single GPU. As you’d expect, the 34B model returns the best results, but the smaller models are faster and have lower latency, so they’d be more appropriate for use inside an editor doing code completion, which is why they are the models trained on fill-in-the-middle tasks.
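Fill-in-the-middle can be pictured as rearranging a document so that the model sees the code before and after the cursor, then generates the missing span. The sketch below illustrates that idea under stated assumptions: the sentinel strings `<PRE>`, `<SUF>`, and `<MID>` are placeholders, and the real special tokens and spacing depend on the model’s tokenizer. The function names are hypothetical, and no model is called here.

```python
def build_fim_prompt(prefix, suffix, pre_tok="<PRE>", suf_tok="<SUF>", mid_tok="<MID>"):
    """Arrange prefix and suffix around sentinel markers; the model would
    be asked to continue the text after mid_tok with the missing middle."""
    return f"{pre_tok}{prefix}{suf_tok}{suffix}{mid_tok}"


def splice_completion(prefix, middle, suffix):
    """Put a generated middle back between the original prefix and suffix."""
    return prefix + middle + suffix


# Code before and after the editor cursor (illustrative example).
prefix = "def add(a, b):\n    "
suffix = "\n\nprint(add(2, 3))\n"

prompt = build_fim_prompt(prefix, suffix)

# Stand-in for what a model might return; not real model output.
assumed_middle = "return a + b"
completed = splice_completion(prefix, assumed_middle, suffix)
```

An editor plugin would send something like `prompt` to the model on each completion request, which is why the lower latency of the smaller models matters for this use.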
