The rchroma package provides an R interface to ChromaDB, a vector database for storing and querying embeddings. This vignette demonstrates the basic usage of the package.
You can install the development version of rchroma from GitHub:
Before using rchroma, you need to have a running ChromaDB instance. The easiest way to get started is using Docker:
This will start a ChromaDB server on http://localhost:8000.
For other installation methods and configuration options, please refer to the ChromaDB documentation.
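Before connecting, it can be useful to confirm that something is actually listening on the expected port. The sketch below uses only base R and is not part of rchroma; the helper name `chroma_port_open` is hypothetical, and the host and port are assumptions taken from the default Docker setup described above. It only checks that a TCP connection can be opened, not that the service behind it is ChromaDB.

```r
# Hypothetical helper (not an rchroma function): TCP reachability check.
# Assumption: ChromaDB listens on localhost:8000, as in the Docker setup above.
chroma_port_open <- function(host = "localhost", port = 8000, timeout = 2) {
  con <- tryCatch(
    suppressWarnings(
      socketConnection(
        host = host, port = port,
        open = "r+", blocking = TRUE, timeout = timeout
      )
    ),
    error = function(e) NULL  # connection refused or timed out
  )
  if (is.null(con)) {
    return(FALSE)
  }
  close(con)
  TRUE
}

if (chroma_port_open()) {
  message("Something is listening on localhost:8000.")
} else {
  message("Nothing reachable on localhost:8000; is the container running?")
}
```

If the check returns `FALSE`, the container has probably not started yet or the port mapping differs from the default.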
First, we need to establish a connection to ChromaDB:
Collections are the main way to organize your data in ChromaDB:
Documents are the basic unit of data in ChromaDB. Each document consists of text content and its associated embedding:
# Add documents with embeddings
docs <- c(
  "apple fruit",
  "banana fruit",
  "carrot vegetable"
)
embeddings <- list(
  c(1.0, 0.0, 0.0), # apple
  c(0.8, 0.2, 0.0), # banana (similar to apple)
  c(0.0, 0.0, 1.0)  # carrot (different)
)
# Add documents to the collection
add_documents(
  client,
  "my_collection",
  documents = docs,
  ids = c("doc1", "doc2", "doc3"),
  embeddings = embeddings
)
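Mismatched inputs are a common source of errors when adding documents: each document needs exactly one ID and one embedding, IDs are expected to be unique, and all embeddings in a collection must have the same dimension. The following is a minimal sketch of a pre-flight check in base R. The helper `check_document_inputs` is hypothetical and not part of rchroma; the server may enforce further rules that this sketch does not cover.

```r
# Hypothetical helper (not an rchroma function): validate inputs before upload.
check_document_inputs <- function(documents, ids, embeddings) {
  stopifnot(
    is.character(documents),
    is.character(ids),
    is.list(embeddings),
    length(documents) == length(ids),
    length(documents) == length(embeddings),
    !anyDuplicated(ids)
  )
  if (!all(vapply(embeddings, is.numeric, logical(1)))) {
    stop("All embeddings must be numeric vectors.")
  }
  dims <- lengths(embeddings)
  if (length(unique(dims)) > 1) {
    stop("Embeddings have differing lengths: ", paste(dims, collapse = ", "))
  }
  invisible(TRUE)
}

# Same example data as in the vignette
docs <- c("apple fruit", "banana fruit", "carrot vegetable")
ids <- c("doc1", "doc2", "doc3")
embeddings <- list(
  c(1.0, 0.0, 0.0),
  c(0.8, 0.2, 0.0),
  c(0.0, 0.0, 1.0)
)

check_document_inputs(docs, ids, embeddings)
```

The function stops with an error on the first violated condition and returns `TRUE` invisibly otherwise, so it can be placed directly before the upload call.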
# Query similar documents using embeddings
results <- query(
  client,
  "my_collection",
  query_embeddings = list(c(1.0, 0.0, 0.0)), # should match apple best
  n_results = 2
)
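To see why the apple document should rank first, the distances can be reproduced locally without a server. The sketch below assumes the collection uses squared Euclidean (L2) distance, which we understand to be ChromaDB's default; a collection configured with a different metric would give different values, though the ranking for this toy data would likely be the same. The variable names are illustrative only.

```r
# Local re-computation of the ranking, independent of rchroma.
# Assumption: squared L2 distance between query and stored embeddings.
embeddings <- list(
  doc1 = c(1.0, 0.0, 0.0), # apple
  doc2 = c(0.8, 0.2, 0.0), # banana
  doc3 = c(0.0, 0.0, 1.0)  # carrot
)
query_embedding <- c(1.0, 0.0, 0.0)

squared_l2 <- vapply(
  embeddings,
  function(e) sum((e - query_embedding)^2),
  numeric(1)
)

# Smallest distance first; keep the two nearest, mirroring n_results = 2
nearest <- names(sort(squared_l2))[1:2]

print(round(squared_l2, 2)) # doc1 = 0, doc2 = 0.08, doc3 = 2
print(nearest)              # "doc1" "doc2"
```

The apple embedding is identical to the query, so its distance is zero; banana differs only slightly, and carrot points in an orthogonal direction and is left out of the top two.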
You can update or delete documents as needed: