Batch Extract Skills

POST https://apigateway.boost.rs/skillfinder/batch
This endpoint allows you to submit a batch of documents for automatic skill extraction.
Request
Headers
Authentication
required
string
Authorisation token received from the OAuth API, sent in the form: Bearer :token
Body Parameters
name
required
string
The name of the document batch. The name is used for internal reference only (it's not used for content analysis).
documents
required
array
An array of document objects to be processed. Document format:
{
  "title": String / Required / Max Length: 100
  "content": String / Required / Max Length: 5000
  "language_code": String / Required / in: en, fr
  "external_id": String / Optional / Max Length: 100
  "provider": String / Optional / Max Length: 100
}
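A minimal sketch of checking one document object against the field rules above before it is added to a batch. The function name `validate_document` is hypothetical, and two behaviours are assumptions rather than documented facts: empty required strings are treated as missing, and lengths are counted with Python's `len()`, which may not match how the API counts characters.

```python
ALLOWED_LANGUAGES = {"en", "fr"}


def validate_document(doc):
    """Return a list of problems found in one document object (empty if none)."""
    problems = []
    # Required string fields and their documented maximum lengths.
    for field, max_len in (("title", 100), ("content", 5000)):
        value = doc.get(field)
        if not isinstance(value, str) or not value:
            # Assumption: an empty string counts as a missing required field.
            problems.append(f"{field} is required")
        elif len(value) > max_len:
            problems.append(f"{field} exceeds {max_len} characters")
    if doc.get("language_code") not in ALLOWED_LANGUAGES:
        problems.append("language_code must be one of: en, fr")
    # Optional string fields are checked only when present.
    for field in ("external_id", "provider"):
        value = doc.get(field)
        if value is not None and (not isinstance(value, str) or len(value) > 100):
            problems.append(f"{field} must be a string of at most 100 characters")
    return problems
```

Returning a list of problems rather than raising on the first one makes it possible to report every issue in a document at once.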
Response
200: OK
The batch has been created and its documents have been queued for processing. The current batch status and information are returned.
{
  "success": true,
  "message": "",
  "result": {
    "id": 1,
    "name": "Test batch",
    "total_documents": 4,
    "processed_documents": 1,
    "pending_documents": 3
  }
}

Batch & document limits

The limits are as follows:

  • max 5,000 characters per content field of a document

  • max 1,000,000 characters per batch (the sum of all content fields)

  • max 2,000 documents per batch

A batch will be rejected if at least one of these limits is exceeded.

Examples of batches that reach the 1,000,000-character batch limit:

  1. Batch of 200 documents at 5,000 characters per document.

  2. Batch of 1,000 documents at 1,000 characters per document.

  3. Batch of 2,000 documents at 500 characters per document.
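The three limits can be checked on the client before a batch is submitted. The following is a sketch only: the function name `check_batch_limits` and the returned labels are hypothetical, and character counts use Python's `len()`, which is an assumption about how the service counts characters.

```python
MAX_CONTENT_CHARS = 5_000      # per content field
MAX_BATCH_CHARS = 1_000_000    # sum of all content fields in the batch
MAX_DOCUMENTS = 2_000          # documents per batch


def check_batch_limits(documents):
    """Return labels for the limits a batch exceeds (empty if within all limits)."""
    exceeded = []
    lengths = [len(doc.get("content", "")) for doc in documents]
    if any(n > MAX_CONTENT_CHARS for n in lengths):
        exceeded.append("content")
    if sum(lengths) > MAX_BATCH_CHARS:
        exceeded.append("batch_characters")
    if len(documents) > MAX_DOCUMENTS:
        exceeded.append("document_count")
    return exceeded
```

For instance, 200 documents of 5,000 characters each total exactly 1,000,000 characters and pass, while a 201st document of the same size pushes the batch over the total character limit.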

Request body (sample)

{
  "name": "",
  "documents": [
    {
      "title": "internal name 1",
      "content": "",
      "language_code": "en",
      "provider": "",
      "external_id": ""
    },
    {
      "title": "doc 2",
      "content": "",
      "language_code": "fr",
      "provider": "",
      "external_id": ""
    }
  ]
}
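A request like the sample body could be assembled with Python's standard library as sketched below. The helper name `build_batch_request` is hypothetical. The header name `Authentication` is taken from the Headers section as written, and the `Content-Type: application/json` header is an assumption, since this page does not state it.

```python
import json
import urllib.request

BATCH_URL = "https://apigateway.boost.rs/skillfinder/batch"


def build_batch_request(token, name, documents):
    """Build (but do not send) the POST request for a document batch."""
    body = json.dumps({"name": name, "documents": documents}).encode("utf-8")
    return urllib.request.Request(
        BATCH_URL,
        data=body,
        method="POST",
        headers={
            # Header name as listed in the Headers section of this page.
            "Authentication": f"Bearer {token}",
            # Assumption: the endpoint expects a JSON body.
            "Content-Type": "application/json",
        },
    )
```

The returned object can be passed to `urllib.request.urlopen` to send it. Keeping construction separate from sending makes the request easy to inspect or log first.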

Response fields

Field name            Description

id                    Batch ID.

name                  The name given to the batch.

status                Batch status:
                      'active' - all documents have been processed.
                      'queued' - documents will start processing based on arrival.
                      'processing' - some documents have been processed and some are being processed right now.

total_documents       Total number of documents requested in the batch.

processed_documents   Number of documents for which processing is complete.

pending_documents     Number of documents that are queued.
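The response fields can be turned into a simple progress summary, as in the sketch below. The function name `summarize_batch` and the shape of its return value are hypothetical. Treating `"success": false` as an error with a `message` is an assumption, because only the success case is documented here. The `status` field is read defensively because the sample 200 response above does not include it.

```python
import json


def summarize_batch(response_text):
    """Parse a batch response body and report processing progress."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        # Assumption: failures carry success=false and a message.
        raise RuntimeError(payload.get("message") or "batch request failed")
    result = payload["result"]
    total = result["total_documents"]
    processed = result["processed_documents"]
    return {
        "id": result["id"],
        "status": result.get("status"),  # may be absent, as in the sample response
        "percent_processed": 100.0 * processed / total if total else 0.0,
        "pending": result["pending_documents"],
    }
```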
