Adding more APIs

This page outlines the basic process for integrating another API into Sasquatch-backpack.

Set up locally

First, get a function working locally that can call the API.

To do this, clone or fork the Sasquatch-backpack repo. You’ll also probably want to create a virtual environment, then activate it.

git clone https://github.com/lsst-sqre/sasquatch-backpack.git
cd sasquatch-backpack
python3.12 -m venv .venv
source .venv/bin/activate

Then run make init to finish setup.

Next, add your chosen API wrapper to the requirements/main.in file and run make update.

Call the API

Navigate to the src/sasquatchbackpack/scripts folder and create a Python file named appropriately for the intended API. Inside this file, create a function that calls your desired API and returns the desired data. It is recommended to parameterize all of the API call’s arguments and include default values. For example, if your API call takes location parameters, you might add a coordinates parameter and set its default to (-30.22573200864174, -70.73932987127506), the coordinates of Cerro Pachon. Make a similar function for each different call you want to make to this API.
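A minimal sketch of such a script function follows. The names search_api, DEFAULT_RADIUS_KM, radius_km, and the injected fetch callable are hypothetical and used only for illustration; a real script would call your chosen API wrapper where the stub sits.

```python
from collections.abc import Callable
from typing import Any

# Coordinates of Cerro Pachon, used as the default location.
DEFAULT_COORDINATES = (-30.22573200864174, -70.73932987127506)
DEFAULT_RADIUS_KM = 400  # hypothetical default, not taken from any real API


def _stub_fetch(query: dict[str, Any]) -> list[dict[str, Any]]:
    """Stand-in for a real API wrapper call (assumption)."""
    return [{"id": "example-1", **query}]


def search_api(
    coordinates: tuple[float, float] = DEFAULT_COORDINATES,
    radius_km: int = DEFAULT_RADIUS_KM,
    fetch: Callable[[dict[str, Any]], list[dict[str, Any]]] = _stub_fetch,
) -> list[dict[str, Any]]:
    """Call the (hypothetical) API and return its results."""
    # Every argument of the call is a parameter with a default value.
    query = {
        "latitude": coordinates[0],
        "longitude": coordinates[1],
        "radius_km": radius_km,
    }
    return fetch(query)


results = search_api()
```

Passing the fetch callable in as a parameter is a design choice of this sketch: it lets the function be exercised without network access.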

Create the CLI Commands

Sasquatch-backpack uses click for its CLI. To add your API calls to the CLI, create a new Python file in src/sasquatchbackpack/commands and add your CLI functions inside. There should be one function for every distinct API call you want to make.

Next, populate these functions with calls to your API, printing the results to the console via click.echo (or click.secho if you want to get funky with the colors :P). To do so, import the script you made earlier in src/sasquatchbackpack/scripts, feed in the relevant parameters, and echo the results. This output is what will be logged in Argo CD later on, so make the echoes detailed for easier debugging down the line.

Implement commands with Click

Decorate each function with @click.command() and add each parameter as a @click.option(). Next, define the command defaults as constants and refer to these constants in each relevant click option. Do the same with any parameter validation functions you want to add, using click callbacks to trigger them. Parameter validation functions should raise click.BadParameter() on invalid input and return the original value on valid input. Also, remember to write a help string for each parameter.

Once complete, import your commands into src/sasquatchbackpack/cli.py so the CLI can access them. First, add from sasquatchbackpack.commands import yourfilenamehere to the imports at the top of the file. Then add main.add_command(yourfilenamehere.yourfunctionnamehere) at the bottom of the file, once for each function you’ve added.
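The command, option, validation, and registration steps described above can be sketched roughly as follows. This is a single-file illustration under assumptions: my_api_call, check_radius, and DEFAULT_RADIUS are hypothetical names, and the main group here is only a stand-in for the one defined in cli.py.

```python
import click
from click.testing import CliRunner

DEFAULT_RADIUS = 400  # hypothetical default, kept as a module constant


def check_radius(ctx: click.Context, param: click.Parameter, value: int) -> int:
    """Reject non-positive radii; return valid values unchanged."""
    if value <= 0:
        raise click.BadParameter("radius must be greater than 0")
    return value


@click.command()
@click.option(
    "-r",
    "--radius",
    type=int,
    default=DEFAULT_RADIUS,
    show_default=True,
    callback=check_radius,
    help="Search radius in km.",
)
def my_api_call(radius: int) -> None:
    """Call the (hypothetical) API and echo the results."""
    click.echo(f"Querying with radius {radius} km...")


@click.group()
def main() -> None:
    """Stand-in for the main group defined in cli.py (assumption)."""


# In cli.py this would be main.add_command(yourfilenamehere.yourfunctionnamehere).
main.add_command(my_api_call)

# Exercise the command in-process, without installing the package.
runner = CliRunner()
good = runner.invoke(main, [my_api_call.name, "--radius", "10"])
bad = runner.invoke(main, [my_api_call.name, "--radius", "0"])
```

The invalid call exits with a non-zero status because click turns the BadParameter raised by the callback into a usage error.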

Test your API Call

At this point (assuming you’ve still got your venv active) you can run the following in your terminal:

pip install -e .
sasquatchbackpack yourfunctionnamehere

You should see the results echoed to the console as you wrote above. Use this as an opportunity to debug your API calls so they work as intended before you start sending data to Sasquatch.

Add a schema

Sasquatch (the wearer of the proverbial backpack) uses Avro schemas for data serialization. Navigate to src/sasquatchbackpack/schemas and create a folder for your API. Inside, create a .avsc file for each different API call you want to make. The contents of the file depend on the data in question, so look at what you’re getting from your API call and use the documentation to create an accurate representation of the data you’ll be sending.
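As a rough illustration of what such a file might contain, the sketch below builds an Avro record schema as a Python dictionary and serializes it to the JSON text that a .avsc file holds. The namespace and every field name are hypothetical; yours should mirror the data your API call actually returns.

```python
import json

# Hypothetical record describing one result from an API call.
schema = {
    "type": "record",
    "name": "yourfunctionnamehere",
    "namespace": "lsst.example",  # hypothetical namespace (assumption)
    "fields": [
        {"name": "timestamp", "type": "long"},
        {"name": "id", "type": "string"},
        {"name": "latitude", "type": "double"},
        {"name": "longitude", "type": "double"},
        # A union with "null" marks a value the API may omit.
        {"name": "magnitude", "type": ["null", "double"], "default": None},
    ],
}

# The JSON text to save as yourschemanamehere.avsc.
avsc_text = json.dumps(schema, indent=2)
print(avsc_text)
```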

Add configs

Going back to your src/sasquatchbackpack/scripts file, you’ll want to add a dataclass for each different API call you want to make. Make sure to include all of the relevant parameters that you’ll need to make that call, as well as two paths to the schema file and a topic name.

from dataclasses import dataclass


@dataclass
class MyConfig:
    """I'm a docstring!"""

    # Parameters up here
    schema_file: str = "src/sasquatchbackpack/schemas/yourfoldernamehere/yourschemanamehere.avsc"
    cron_schema: str = (
        "/opt/venv/lib/python3.12/site-packages/"
        "sasquatchbackpack/schemas/yourfoldernamehere/yourschemanamehere.avsc"
    )
    # No trailing comma here: a comma would turn the default into a tuple.
    topic_name: str = "yourfunctionnamehere"

The first path should be the local path to the schema and the second should be the path to the schema when running in a cron job. The topic name should be the name of your command.

Add source

Next, you’ll make a source class, inheriting from sasquatchbackpack.sasquatch.DataSource. This will require two methods: load_schema() and get_records().

load_schema() can be copied verbatim from the following (it needs from pathlib import Path at the top of the file):

def load_schema(self) -> str:
    """Load the relevant schema."""
    try:
        with Path(self.config.schema_file).open("r") as file:
            return file.read()
    except FileNotFoundError:
        with Path(self.config.cron_schema).open("r") as file:
            return file.read()

get_records() should make an API call, then return the encoded results in a list. Wrap the call in the following try block, so that connection failures are reported with context:

try:
    # API Call
    # return results
except ConnectionError as ce:
    raise ConnectionError(
        f"A connection error occurred while fetching records: {ce}"
    ) from ce

The class’s __init__ should take the config you made in the previous step and call super().__init__(config.topic_name). Otherwise, feel free to initialize your parameters as you see fit.
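Putting the pieces above together, a source class might look roughly like the sketch below. To keep it self-contained, DataSource is a local stand-in for sasquatchbackpack.sasquatch.DataSource, fetch_results is a stub for the real API call, and the {"value": ...} record shape and the radius_km parameter are assumptions rather than confirmed details of the library.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from pathlib import Path


class DataSource(ABC):
    """Stand-in for sasquatchbackpack.sasquatch.DataSource (assumption)."""

    def __init__(self, topic_name: str) -> None:
        self.topic_name = topic_name

    @abstractmethod
    def load_schema(self) -> str: ...

    @abstractmethod
    def get_records(self) -> list[dict]: ...


@dataclass
class MyConfig:
    """Config for one API call; radius_km is a hypothetical parameter."""

    radius_km: int = 400
    schema_file: str = "src/sasquatchbackpack/schemas/yourfoldernamehere/yourschemanamehere.avsc"
    cron_schema: str = (
        "/opt/venv/lib/python3.12/site-packages/"
        "sasquatchbackpack/schemas/yourfoldernamehere/yourschemanamehere.avsc"
    )
    topic_name: str = "yourfunctionnamehere"


def fetch_results(radius_km: int) -> list[dict]:
    """Stub for the real API call (assumption)."""
    return [{"id": "example-1", "radius_km": radius_km}]


class MySource(DataSource):
    """Source that reads its parameters from MyConfig."""

    def __init__(self, config: MyConfig) -> None:
        super().__init__(config.topic_name)
        self.config = config
        self.radius_km = config.radius_km

    def load_schema(self) -> str:
        """Load the schema, falling back to the cron job path."""
        try:
            with Path(self.config.schema_file).open("r") as file:
                return file.read()
        except FileNotFoundError:
            with Path(self.config.cron_schema).open("r") as file:
                return file.read()

    def get_records(self) -> list[dict]:
        """Call the API and wrap each result as a record."""
        try:
            results = fetch_results(self.radius_km)
            # Assumed record shape: one {"value": ...} entry per result.
            return [{"value": result} for result in results]
        except ConnectionError as ce:
            raise ConnectionError(
                f"A connection error occurred while fetching records: {ce}"
            ) from ce


source = MySource(MyConfig(radius_km=10))
records = source.get_records()
```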

Update CLI

You’ll want to add a dry run option to your CLI command. To do so, add the following decorator to the command:

@click.option(
    "--dry-run",
    is_flag=True,
    default=False,
    help="Perform a trial run with no data being sent to Kafka.",
)

Remember to also add dry_run: bool,  # noqa: FBT001 as a parameter of the command function. Then add the dry-run behavior after the existing body of the function:

if dry_run:
    click.echo("Dry run mode: No data will be sent to Kafka.")
    return

click.echo("Sending data...")

To actually send the data, simply import and instantiate the config and source objects you made in your src/sasquatchbackpack/scripts file. Then, import sasquatchbackpack.sasquatch and add the following:

backpack_dispatcher = sasquatch.BackpackDispatcher(
    source, sasquatch.DispatcherConfig()
)
result = backpack_dispatcher.post()

if "Error" in result:
    click.secho(result, fg="red")
else:
    click.secho("Data successfully sent!", fg="green")

Note that the source object is simply the source you just instantiated.
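Assembled into one command, the dry-run flag and the dispatch step might fit together as in this sketch. FakeDispatcher is a local stand-in for sasquatch.BackpackDispatcher, its "Succeeded" return value is an assumption, and the results list is a stub for the real API call.

```python
import click
from click.testing import CliRunner


class FakeDispatcher:
    """Stand-in for sasquatch.BackpackDispatcher (assumption)."""

    def __init__(self, source: object) -> None:
        self.source = source

    def post(self) -> str:
        # The real post() is assumed to return a status string.
        return "Succeeded"


@click.command()
@click.option(
    "--dry-run",
    is_flag=True,
    default=False,
    help="Perform a trial run with no data being sent to Kafka.",
)
def my_api_call(dry_run: bool) -> None:  # noqa: FBT001
    """Fetch results, echo them, and optionally send them on."""
    results = ["example-1", "example-2"]  # stub for the API call
    click.echo(f"Found {len(results)} results: {results}")

    if dry_run:
        click.echo("Dry run mode: No data will be sent to Kafka.")
        return

    click.echo("Sending data...")
    result = FakeDispatcher(source=results).post()

    if "Error" in result:
        click.secho(result, fg="red")
    else:
        click.secho("Data successfully sent!", fg="green")


# Run the command in-process, once with the flag and once without.
runner = CliRunner()
dry = runner.invoke(my_api_call, ["--dry-run"])
live = runner.invoke(my_api_call, [])
```

Returning early on the dry-run branch keeps the echoed results identical in both modes, so a dry run shows exactly what would have been sent.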

Test it!

Running the CLI command should now post the data to Sasquatch. Specifically, you can search Kafdrop on data-int for the lsst.backpack topic, and your data should show up there.