Using Hugging Face Open-Source Models with LangChain


In the realm of language processing tools, the choice of language model plays a pivotal role in determining the success of your applications. While OpenAI stands as a prominent name in the industry, we propose exploring a superior alternative for LangChain development: Hugging Face’s open-source LLMs.

In this blog post, we will dive deep into the reasons why Hugging Face’s LLMs outshine OpenAI, from the extensive model collection to the collaborative ecosystem. Discover how Hugging Face empowers developers and researchers to build with open-source LLMs in LangChain.

Why Hugging Face Models?

Here are some potential reasons to choose Hugging Face models:

  1. Vast Collection of Open Source Models: Hugging Face boasts an extensive collection of open-source language models that effectively rival OpenAI. You have the freedom to choose from a wide range of models, each tailored to specific language processing tasks. Whether you’re working on sentiment analysis, named entity recognition, or machine translation, Hugging Face’s LLMs provide scalable and accurate solutions. By turning to open-source models, you eliminate the unnecessary restrictions and costs associated with proprietary systems like OpenAI.
  2. Flexibility and Customization with Local Deployment: Sometimes the need arises to fine-tune language models or harness the power of high-performance GPUs for specific LangChain tasks. Hugging Face’s local deployment option using pipelines grants you the flexibility and customization required for such scenarios. With locally run models, you can fine-tune LLMs according to your specific project requirements or fully exploit the capabilities of advanced hardware. OpenAI’s restricted cloud-based approach simply cannot match the power and versatility of locally deployed Hugging Face models.
  3. Community Support and Resources: Hugging Face’s commitment to community support sets it apart from OpenAI. The platform offers a wealth of resources, including forums, chat groups, and a dedicated community of experts who are ready to assist you with any challenges you may encounter. The collaborative nature of Hugging Face’s community ensures that you have access to a vast pool of knowledge and expertise, making your journey with LLMs smoother and more rewarding.
  4. Growing Open Source Community: One of Hugging Face’s greatest strengths is its thriving open-source community. Supported by a large and passionate developer community, the platform continuously expands its offerings. Developers contribute new models, refine existing ones, and share valuable insights. This vibrant ecosystem ensures that Hugging Face’s LLMs remain at the forefront of language processing advancements. OpenAI, on the other hand, lacks the collaborative spirit and active involvement from the community, limiting its potential for growth.

How to use Hugging Face Models with LangChain?

To use these models, we will explore two approaches: accessing them via API calls, and downloading and running them locally with Hugging Face pipelines.

Accessing Models via API Calls

Hugging Face’s API allows users to call models hosted on Hugging Face’s servers without installing anything locally. This approach offers a wide selection of models to choose from, including StableLM from Stability AI and Dolly from Databricks. Through API calls, you can also use large models such as Flan-T5 XL without needing the hardware to run them yourself.

We’ll demonstrate how to access models using prompt templates, define a chain with a prompt template and a chosen model, and obtain responses from the models.

Running Models Locally with Hugging Face Pipelines

In certain cases, it may be necessary to run language models locally, such as for fine-tuning or to make use of a high-performance GPU. Hugging Face’s pipelines make this possible by enabling local inference with downloaded models. We’ll guide you through installing the required packages, including Transformers, Accelerate, and bitsandbytes.

Then, we’ll demonstrate how to initialize and use a local language model with Hugging Face pipelines, focusing on text-to-text generation models and text generation models (decoder-only models).
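The two model families mentioned above are loaded through different transformers classes and pipeline tasks. The lookup table below is a small summary of that pairing, with the two models used in this post (Flan-T5 and Falcon-7B-Instruct) given as examples; the ARCHITECTURES dictionary and the loading_recipe helper are illustrative names, not part of any library.

```python
# Which transformers Auto class and pipeline task go with each architecture family.
ARCHITECTURES = {
    # Encoder-decoder models (e.g. Flan-T5) map an input text to a new output text.
    "encoder-decoder": {
        "auto_class": "AutoModelForSeq2SeqLM",
        "pipeline_task": "text2text-generation",
        "example_model": "google/flan-t5-small",
    },
    # Decoder-only models (e.g. Falcon) continue the prompt token by token.
    "decoder-only": {
        "auto_class": "AutoModelForCausalLM",
        "pipeline_task": "text-generation",
        "example_model": "tiiuae/falcon-7b-instruct",
    },
}


def loading_recipe(architecture: str) -> tuple[str, str]:
    """Return (auto_class, pipeline_task) for an architecture family."""
    entry = ARCHITECTURES[architecture]
    return entry["auto_class"], entry["pipeline_task"]


if __name__ == "__main__":
    for arch in ARCHITECTURES:
        auto_class, task = loading_recipe(arch)
        print(f"{arch}: {auto_class} + pipeline('{task}')")
```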

Practical Implementation

Since running these notebooks requires a GPU, we recommend either using a free one on Google Colab or creating a VM instance with a GPU.

Now let us dive into the implementation:

!pip -q install langchain huggingface_hub transformers sentence_transformers accelerate bitsandbytes
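Before going further, it can help to confirm that the packages installed above are importable and that a GPU is visible. The sketch below uses only the standard library’s importlib.util.find_spec, and checks for a CUDA GPU through torch only if torch happens to be installed (it is not in the pip command above, but it comes preinstalled on Google Colab). The check_environment name is hypothetical.

```python
import importlib.util

# Import names for the packages installed above (the pip name can differ,
# e.g. the sentence-transformers distribution is imported as sentence_transformers).
REQUIRED_MODULES = ("langchain", "huggingface_hub", "transformers",
                    "sentence_transformers", "accelerate", "bitsandbytes", "torch")


def check_environment(modules=REQUIRED_MODULES) -> dict:
    """Report which modules are importable and whether a CUDA GPU is visible."""
    # find_spec returns None for a top-level module that is not installed.
    report = {name: importlib.util.find_spec(name) is not None for name in modules}
    gpu = False
    if report.get("torch"):
        import torch  # imported only when it is actually installed
        gpu = torch.cuda.is_available()
    report["cuda_gpu"] = gpu
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```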

Next, we need a Hugging Face (HF) access token to call these models through the API. You can create one in your Hugging Face account settings, under Access Tokens.

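import os
os.environ["HUGGINGFACEHUB_API_TOKEN"] = "YOUR_HF_TOKEN"  # placeholder: paste your own token; LangChain's HuggingFaceHub reads this variable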
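Hard-coding a token in a notebook makes it easy to leak when the notebook is shared. A minimal alternative, assuming you either export HUGGINGFACEHUB_API_TOKEN beforehand or are happy to paste the token interactively, is to read it from the environment and fall back to getpass. The ensure_hf_token helper is a hypothetical name used for illustration.

```python
import os
from getpass import getpass

# Environment variable that LangChain's HuggingFaceHub wrapper reads the token from.
TOKEN_VAR = "HUGGINGFACEHUB_API_TOKEN"


def ensure_hf_token(prompt_if_missing: bool = True) -> str:
    """Return the HF token from the environment, optionally prompting for it."""
    token = os.environ.get(TOKEN_VAR, "").strip()
    if not token and prompt_if_missing:
        # getpass hides the input, so the token does not end up in the cell output.
        token = getpass("Hugging Face access token: ").strip()
    if not token:
        raise RuntimeError(f"Set {TOKEN_VAR} before calling the Hugging Face API.")
    os.environ[TOKEN_VAR] = token  # make it visible to LangChain for this session
    return token
```

Call ensure_hf_token() once near the top of the notebook, before creating any HuggingFaceHub model.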

Let us start by testing the models through the Hugging Face API.

Flan-T5 Large

Using Hugging Face API

First, we need the model’s details (its repository ID) to initialize it; these are listed on the model’s page on Hugging Face.

from langchain import PromptTemplate, HuggingFaceHub, LLMChain

template = """Question: {question}

Answer: Let's think step by step."""

prompt = PromptTemplate(template=template, input_variables=["question"])

We define a prompt template that will be used to answer questions with this model.
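To see exactly what the model receives, it helps to render the template by hand. LangChain’s PromptTemplate uses the “f-string” format by default, so filling {question} behaves like Python’s str.format; the sketch below reproduces that substitution in plain Python (render_prompt is an illustrative name). In LangChain itself, prompt.format(question=...) should return the same string.

```python
# Same template text as in the LangChain PromptTemplate above.
template = """Question: {question}

Answer: Let's think step by step."""


def render_prompt(question: str) -> str:
    """Fill the {question} placeholder the way an f-string-style template does."""
    return template.format(question=question)


if __name__ == "__main__":
    # This full string, not just the question, is what gets sent to the model.
    print(render_prompt("Which planet has largest moon in solar system?"))
```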

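llm_chain = LLMChain(prompt=prompt,
                     llm=HuggingFaceHub(repo_id="google/flan-t5-large", model_kwargs={"temperature": 0.1, "max_length": 64}))  # repo_id assumed from the section heading; model_kwargs are example values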

Next, we initialize a chain with the model and LLM parameters.
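Conceptually, the chain does two things each time it runs: it fills the prompt template with the input variables, then passes the resulting string to the LLM and returns its completion. The toy MiniChain class below sketches that flow with a stub model so it runs offline; it is a simplified illustration of the idea, not LangChain’s LLMChain implementation, and echo_llm is a hypothetical stand-in for HuggingFaceHub or HuggingFacePipeline.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class MiniChain:
    """Toy stand-in for LLMChain: fill the template, then call the model."""
    template: str
    llm: Callable[[str], str]  # any function that maps a prompt string to a completion

    def run(self, **variables: str) -> str:
        prompt_text = self.template.format(**variables)  # step 1: render the prompt
        return self.llm(prompt_text)                      # step 2: ask the model


def echo_llm(prompt_text: str) -> str:
    """Hypothetical offline stub: echoes the first line of the prompt it received."""
    return "Echo: " + prompt_text.splitlines()[0]


if __name__ == "__main__":
    chain = MiniChain(
        template="Question: {question}\n\nAnswer: Let's think step by step.",
        llm=echo_llm,
    )
    print(chain.run(question="Which planet has largest moon in solar system?"))
```

Replacing echo_llm with a real model is, conceptually, what passing llm=HuggingFaceHub(...) to LLMChain does.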

Now let us test it with some sample queries:

question = "Which planet has largest moon in solar system?"
print(llm_chain.run(question))

The moons of Jupiter and Saturn are the largest in the solar system. Jupiter has the largest moon in the solar system. So the answer is Jupiter.

question = "Translate to German:  Integrating Open source LLM's with Langchain"
print(llm_chain.run(question))

Integrieren Sie Open Source LLM's mit Langchain.

Local Inference of Models

Import the necessary libraries and modules for working with Hugging Face’s large language models (LLMs). We import the HuggingFacePipeline class from the langchain.llms module, which wraps a transformers pipeline so it can be used as a LangChain LLM. We also import torch for tensor operations, plus the AutoTokenizer, AutoModelForCausalLM, and AutoModelForSeq2SeqLM classes and the pipeline function from the transformers library. These provide the tools for tokenizing text, loading models, and creating pipelines for text generation.

from langchain.llms import HuggingFacePipeline
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline, AutoModelForSeq2SeqLM

We define the model_id variable, which names the model we want to use; here we use “google/flan-t5-small” as an example. We then load the matching tokenizer with AutoTokenizer.from_pretrained, and load the model itself with AutoModelForSeq2SeqLM.from_pretrained. We pass load_in_8bit=True to reduce memory usage, and device_map='auto' so that Accelerate automatically places the model on the available device(s).

When load_in_8bit is set to True, the model’s weights are loaded in an 8-bit quantized format using bitsandbytes. Quantization stores numbers with fewer bits, which reduces the memory footprint of a deep learning model: an 8-bit weight takes one byte instead of the four bytes of a full-precision (32-bit) float. This is useful when working with limited computational resources or deploying the model on memory-constrained devices. Note that the main gain is memory; depending on the hardware, bitsandbytes 8-bit inference can actually be somewhat slower than half-precision inference.
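Two quick calculations make the memory argument concrete. Weight memory is roughly the number of parameters times the bytes per parameter (4 for 32-bit floats, 2 for 16-bit, 1 for 8-bit), so a 7-billion-parameter model needs about 28 GB in full precision but about 7 GB in 8-bit. The second part shows the basic idea of 8-bit quantization with a simplified per-tensor absmax scheme: scale the values so the largest magnitude maps to 127, round to integers, and multiply back by the scale to recover approximate values. bitsandbytes’ LLM.int8() method is more elaborate (it scales vectors individually and keeps outlier features in 16-bit), so treat this as an illustration only; the function names are hypothetical.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes).

    Ignores activations, caches, and framework overhead.
    """
    return num_params * bytes_per_param / 1e9


def absmax_quantize(values: list[float]) -> tuple[list[int], float]:
    """Simplified per-tensor absmax quantization to signed 8-bit integers."""
    # One scale for the whole tensor: the largest magnitude maps to 127.
    scale = max(abs(v) for v in values) / 127 or 1.0  # avoid a zero scale for all-zero input
    quantized = [round(v / scale) for v in values]   # every entry lies in [-127, 127]
    return quantized, scale


def dequantize(quantized: list[int], scale: float) -> list[float]:
    """Map int8 codes back to approximate floating-point values."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    for label, nbytes in [("fp32", 4), ("fp16", 2), ("int8", 1)]:
        print(f"7B params in {label}: ~{weight_memory_gb(7e9, nbytes):.0f} GB")

    weights = [0.3, -1.2, 0.05, 0.9]
    codes, scale = absmax_quantize(weights)
    print(codes)                     # small integers instead of 32-bit floats
    print(dequantize(codes, scale))  # close to, but not exactly, the original weights
```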

model_id = 'google/flan-t5-small'
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, load_in_8bit=True, device_map='auto')
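8-bit loading through bitsandbytes generally needs a CUDA GPU, so the cell above fails on a CPU-only machine. One way to keep a single notebook working in both settings is to choose the from_pretrained keyword arguments based on whether a GPU is available. The model_load_kwargs helper below is a hypothetical sketch of that idea; note also that recent transformers releases prefer quantization_config=BitsAndBytesConfig(load_in_8bit=True) over passing load_in_8bit directly.

```python
def model_load_kwargs(has_cuda_gpu: bool) -> dict:
    """Choose from_pretrained() keyword arguments for the available hardware."""
    if has_cuda_gpu:
        # Same settings as above: 8-bit weights via bitsandbytes, placement via Accelerate.
        return {"load_in_8bit": True, "device_map": "auto"}
    # CPU fallback: load in default precision; a small model such as
    # google/flan-t5-small still runs on a CPU, just more slowly.
    return {}
```

In the notebook this would be used as `model = AutoModelForSeq2SeqLM.from_pretrained(model_id, **model_load_kwargs(torch.cuda.is_available()))`.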

We create a pipeline for text generation using the pipeline function from the transformers library. The task is specified as “text2text-generation” to indicate that we want to generate text from input text. We pass the previously loaded model and tokenizer objects to the pipeline and set max_length to 128, which caps the generated output at 128 tokens (not characters).

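pipe = pipeline(
    "text2text-generation",
    model=model,
    tokenizer=tokenizer,
    max_length=128,  # upper bound on the generated sequence length, in tokens
)

Now that we have created a pipeline, we can wrap it in LangChain’s HuggingFacePipeline class and use it like any other LangChain LLM.

local_llm = HuggingFacePipeline(pipeline=pipe)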
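Under the hood, the wrapper needs to turn the pipeline’s raw output into a plain string. For both tasks used in this post, transformers pipelines return a list of dictionaries with a generated_text key; for text-generation (decoder-only) pipelines that text includes the prompt by default (return_full_text=True), while text2text-generation returns only the new text. The extract_completion helper below is a simplified, hypothetical sketch of that post-processing, run on hand-written sample outputs rather than a real model.

```python
def extract_completion(task: str, prompt_text: str, outputs: list[dict]) -> str:
    """Return only the model's completion from raw transformers pipeline output."""
    generated = outputs[0]["generated_text"]
    if task == "text-generation" and generated.startswith(prompt_text):
        # Decoder-only pipelines echo the prompt by default, so strip it off.
        return generated[len(prompt_text):]
    # text2text-generation output already contains only the new text.
    return generated


if __name__ == "__main__":
    # Hand-written examples shaped like pipeline output (not actual model runs).
    prompt_text = "Question: capital of England?"
    seq2seq_out = [{"generated_text": "London"}]
    causal_out = [{"generated_text": prompt_text + "\nAnswer: London"}]
    print(extract_completion("text2text-generation", prompt_text, seq2seq_out))
    print(extract_completion("text-generation", prompt_text, causal_out))
```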

Let us now test it using a chain:

llm_chain = LLMChain(prompt=prompt, llm=local_llm)

question = "what is capital of england"
print(llm_chain.run(question))

London is the capital of England. London is the capital of England. London is the capital of England. So, the answer is London.

Now let us try a decoder-only architecture through the Inference API.

Using Falcon-7b-Instruct model

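llm_chain = LLMChain(prompt=prompt,
                     llm=HuggingFaceHub(repo_id="tiiuae/falcon-7b-instruct", model_kwargs={"temperature": 0.1, "max_new_tokens": 64}))  # model_kwargs are example values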

Let us test it with a query:

question = "what is freelancing?"
print(llm_chain.run(question))

1. Freelancing is when you work for a client on a project-by-Project basis


To sum up, Hugging Face’s open-source LLMs present a highly attractive option for LangChain development, outshining OpenAI across several key aspects. Hugging Face stands out due to its extensive collection of language models, offering developers a diverse range of options that cater to specific language processing tasks.

This expansive selection eliminates the constraints and expenses typically associated with proprietary systems like OpenAI, providing users with scalable and accurate solutions for tasks such as sentiment analysis, named entity recognition, and machine translation.

An additional strength of Hugging Face lies in its thriving open-source community. Developers actively contribute new models, refine existing ones, and share valuable insights, ensuring that Hugging Face’s LLMs remain at the forefront of language processing advancements. OpenAI, on the other hand, lacks this collaborative spirit and active community involvement, which limits its potential for growth.
