By Johnny Chadda

Indexing docs and websites using GitHub Actions

With Opper Indexes and GitHub Actions, you can automate the indexing of your documentation so that it stays up to date and easy to retrieve. This post walks through setting up GitHub Actions to index your documentation and website content into Opper Indexes automatically.

What are Opper Indexes?

Opper Indexes are databases with semantic retrieval: you store content in them and search it by semantic similarity. This means you can query your documentation not only with exact keywords but also with conceptually related phrasing, which makes it much easier for users to find the information they need.

For example, if a user searches for "how to configure authentication", an Opper Index can retrieve relevant sections about "setting up login credentials" or "user authorization setup", even though those sections never use the exact words of the query.
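
To make the idea of semantic similarity concrete, here is a toy sketch that ranks section titles by cosine similarity between vectors. The three-dimensional vectors are hand-picked, hypothetical values used only for illustration; they are not real embeddings, and this is not a description of how Opper computes similarity internally.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings": similar topics get similar vectors.
sections = {
    "setting up login credentials": [0.9, 0.1, 0.0],
    "user authorization setup": [0.8, 0.3, 0.1],
    "changing the color theme": [0.0, 0.2, 0.9],
}

# Stands in for the query "how to configure authentication".
query_vector = [1.0, 0.2, 0.0]

# Rank sections from most to least similar to the query.
ranked = sorted(
    sections,
    key=lambda title: cosine_similarity(query_vector, sections[title]),
    reverse=True,
)
print(ranked)
```

The two authentication-related sections rank above the unrelated one even though none of the titles share a keyword with the query.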

Why Use GitHub Actions for Indexing?

GitHub Actions allows you to automate workflows directly from your repository. By integrating Opper with GitHub Actions, you can:

  1. Automatically index documentation when changes are pushed to your repository
  2. Ensure your indexes are always up-to-date with the latest content
  3. Integrate indexing into your existing CI/CD pipeline
  4. Eliminate manual work in keeping your documentation searchable

Setting Up the Opper Index GitHub Action

Opper provides two GitHub Actions for indexing content:

  1. opper-index-action: For indexing repository files (documentation, code, etc.)
  2. opper-web-indexer-action: For scraping and indexing website content

Let's look at how to set up both.

Indexing Repository Files

To index your repository files, you'll need to create a GitHub workflow file. Here's a basic example:

name: Index Docs in Opper
on:
  push:
    branches: [main]
jobs:
  index-docs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Index docs in Opper
        uses: opper-ai/opper-index-action@v1
        with:
          apikey: ${{ secrets.OPPER_API_KEY }}

This simple workflow will run whenever changes are pushed to the main branch, indexing all Markdown (.md), MDX (.mdx), and text (.txt) files in your repository by default.

Customizing Your Indexing

You can customize various aspects of the indexing process:

Specifying a Custom Folder

If you only want to index files in a specific directory, such as a docs folder, set folder. This example also sets index so the content is stored in a named index:

name: Index Docs in Opper
on:
  push:
    branches: [main]
jobs:
  index-docs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Index docs in Opper
        uses: opper-ai/opper-index-action@v1
        with:
          folder: 'docs'
          apikey: ${{ secrets.OPPER_API_KEY }}
          index: 'my-custom-index'

Indexing Different File Types

Use file_types to list the file extensions to index, separated by spaces:

name: Index Docs in Opper
on:
  push:
    branches: [main]
jobs:
  index-docs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Index docs in Opper
        uses: opper-ai/opper-index-action@v1
        with:
          apikey: ${{ secrets.OPPER_API_KEY }}
          file_types: '.md .mdx .txt .html .js'
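
Before pushing a workflow change, it can help to preview locally which files a given folder and file_types combination would select. The sketch below approximates the selection described above; preview_indexed_files is a hypothetical helper, not part of the action, and the action's own matching rules (case handling, hidden directories, and so on) may differ.

```python
from pathlib import Path


def preview_indexed_files(root, folder=".", file_types=".md .mdx .txt"):
    """List files under root/folder whose extension appears in file_types.

    file_types is a space-separated string, mirroring the workflow input.
    Paths are returned relative to root, sorted, with forward slashes.
    """
    suffixes = {ext.lower() for ext in file_types.split()}
    base = Path(root) / folder
    return sorted(
        path.relative_to(root).as_posix()
        for path in base.rglob("*")
        if path.is_file() and path.suffix.lower() in suffixes
    )


if __name__ == "__main__":
    # Preview the docs folder of the current repository checkout.
    for name in preview_indexed_files(".", folder="docs"):
        print(name)
```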

Indexing Website Content

If you want to index your website content rather than repository files, you can use the web indexer action:

name: Index Website in Opper
on:
  push:
    branches: [main]
jobs:
  index-website:
    runs-on: ubuntu-latest
    steps:
      - name: Index website in Opper
        uses: opper-ai/opper-web-indexer-action@v1
        with:
          apikey: ${{ secrets.OPPER_API_KEY }}
          url: 'https://example.com'
          index: 'my-website-index'

This action will scrape the specified website and add its contents to an Opper index.

Setting Up Your Opper API Key in GitHub

To use these actions, you'll need to store your Opper API key as a GitHub secret:

  1. Go to your repository on GitHub
  2. Click on "Settings" > "Secrets and variables" > "Actions"
  3. Click "New repository secret"
  4. Name the secret OPPER_API_KEY and paste your Opper API key as the value
  5. Click "Add secret"

This ensures your API key is securely stored and can be accessed by the GitHub Action without being exposed.
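
If you also run your own Python scripts against the index, a common pattern is to pass the same secret to the script as an environment variable and fail early with a clear message when it is missing. The sketch below assumes the variable is named OPPER_API_KEY, matching the secret above; require_api_key is a hypothetical helper, and whether the Opper SDK reads this variable on its own is not something this post confirms.

```python
import os


def require_api_key(env=None, name="OPPER_API_KEY"):
    """Return the API key from the environment, or raise if it is missing.

    env defaults to os.environ; passing a dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set. Add it as a repository secret and expose it "
            "to this step as an environment variable."
        )
    return key


if __name__ == "__main__":
    api_key = require_api_key()
    # Never print the key itself; report only that it was found.
    print("API key found")
```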

Utilizing Your Indexed Documentation

Once your documentation is indexed, you can use the Opper SDK to query it from your applications:

from opperai import Opper

opper = Opper()

# Get the index that was created by the GitHub Action
index = opper.indexes.get("my-custom-index")

# Query the index
results = index.query(
    query="How do I authenticate users?",
    k=3  # Retrieve the 3 most relevant results
)

# Process the results
for result in results:
    print(f"Content: {result.content}")
    print(f"Source: {result.metadata.get('file_name')}")
    print(f"Page: {result.metadata.get('page')}")
    print("---")
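
When several results come from the same file, grouping them by source can make the output easier to read. The following sketch assumes each result exposes content and a metadata dictionary with a file_name key, as in the snippet above. FakeResult is a hypothetical stand-in so the example runs without the SDK or an API key.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class FakeResult:
    """Hypothetical stand-in mirroring the .content/.metadata shape used above."""
    content: str
    metadata: dict = field(default_factory=dict)


def group_by_source(results):
    """Group result contents by their file_name metadata, preserving order."""
    grouped = defaultdict(list)
    for result in results:
        source = result.metadata.get("file_name") or "unknown"
        grouped[source].append(result.content)
    return dict(grouped)


results = [
    FakeResult("Use API keys to authenticate.", {"file_name": "auth.md"}),
    FakeResult("Install the SDK with pip.", {"file_name": "install.md"}),
    FakeResult("Rotate keys regularly.", {"file_name": "auth.md"}),
    FakeResult("Text with no source recorded."),
]

for source, contents in group_by_source(results).items():
    print(source, len(contents))
```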

Advanced Use Cases

Scheduled Indexing

You can also set up scheduled indexing for websites that change frequently:

name: Scheduled Website Indexing
on:
  schedule:
    - cron: '0 0 * * *'  # Run daily at midnight UTC
jobs:
  index-website:
    runs-on: ubuntu-latest
    steps:
      - name: Index website in Opper
        uses: opper-ai/opper-web-indexer-action@v1
        with:
          apikey: ${{ secrets.OPPER_API_KEY }}
          url: 'https://example.com'
          index: 'my-website-index'
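
The cron expression has five space-separated fields, and GitHub Actions evaluates schedules in UTC. As a reading aid, the small sketch below labels each field; describe_cron is a hypothetical helper that only splits the expression and does not validate ranges or compute run times.

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def describe_cron(expression):
    """Map each of the five cron fields to its name."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(
            f"expected {len(CRON_FIELDS)} fields, got {len(parts)}: {expression!r}"
        )
    return dict(zip(CRON_FIELDS, parts))


print(describe_cron("0 0 * * *"))
# {'minute': '0', 'hour': '0', 'day_of_month': '*', 'month': '*', 'day_of_week': '*'}
```

Changing the second field to 6, for example, would move the run to 06:00 UTC.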

Conditional Indexing

To run the indexing job only when documentation files change, add a paths filter to the push trigger:

name: Conditional Doc Indexing
on:
  push:
    branches: [main]
    paths:
      - 'docs/**'
jobs:
  index-docs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Index docs in Opper
        uses: opper-ai/opper-index-action@v1
        with:
          folder: 'docs'
          apikey: ${{ secrets.OPPER_API_KEY }}
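
To get a feel for which pushes the paths filter would let through, here is a rough local check. It uses Python's fnmatch, which only approximates GitHub's filter pattern syntax (for instance, fnmatch lets a single * cross directory separators), so treat it as an illustration for simple patterns like docs/** rather than an exact reimplementation. should_run is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def should_run(changed_files, patterns=("docs/**",)):
    """Return True if any changed file matches any of the path patterns."""
    return any(
        fnmatchcase(path, pattern)
        for path in changed_files
        for pattern in patterns
    )


# A push touching a docs file would trigger the workflow...
print(should_run(["README.md", "docs/guides/auth.md"]))
# ...while a push touching only code would not.
print(should_run(["src/app.py", "README.md"]))
```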

Building a Documentation Search Portal

With your documentation indexed, you can build a search portal to help users find information easily. Here's a simple example using Streamlit:

import streamlit as st
from opperai import Opper

# Initialize the Opper client
opper = Opper()
# Fetch the existing index that the GitHub Action populated
index = opper.indexes.get("my-custom-index")

# Set up the UI
st.title("Documentation Search")
query = st.text_input("Enter your search query:")

if query:
    # Search the index
    results = index.query(query=query, k=5)
    
    # Display results
    for i, result in enumerate(results, 1):
        st.subheader(f"Result {i}")
        st.write(result.content)
        st.write(f"Source: {result.metadata.get('file_name')}")
        st.markdown("---")

Conclusion

Automating indexing with GitHub Actions and Opper keeps your documentation searchable without manual work. Once the workflow is part of your CI/CD pipeline, every push or scheduled run refreshes the index, so search results reflect your latest content.