Kontia/README.md
Marcelo Dares 65aaf9275e initial push
2026-03-15 15:03:56 +01:00

This is a [Next.js](https://nextjs.org) project bootstrapped with [`create-next-app`](https://nextjs.org/docs/app/api-reference/cli/create-next-app).
## Getting Started
First, run the development server:
```bash
npm run dev
```
Open [http://localhost:3000](http://localhost:3000) with your browser to see the result.

You can start editing the page by modifying `app/page.tsx`. The page auto-updates as you edit the file.

This project uses [`next/font`](https://nextjs.org/docs/app/building-your-application/optimizing/fonts) to automatically optimize and load [Geist](https://vercel.com/font), a new font family for Vercel.
## Learn More
To learn more about Next.js, take a look at the following resources:
- [Next.js Documentation](https://nextjs.org/docs) - learn about Next.js features and API.
- [Learn Next.js](https://nextjs.org/learn) - an interactive Next.js tutorial.

You can check out [the Next.js GitHub repository](https://github.com/vercel/next.js) - your feedback and contributions are welcome!
## Deploy on Vercel
The easiest way to deploy your Next.js app is to use the [Vercel Platform](https://vercel.com/new?utm_medium=default-template&filter=next.js&utm_source=create-next-app&utm_campaign=create-next-app-readme) from the creators of Next.js.

Check out our [Next.js deployment documentation](https://nextjs.org/docs/app/building-your-application/deploying) for more details.
## OCR Setup (Recommended)
PDF analysis uses direct text extraction first. If text is insufficient (common in scanned PDFs), the API falls back to OCR with `ocrmypdf`.

Install host dependencies (Ubuntu/Debian):
```bash
sudo apt-get update
sudo apt-get install -y ocrmypdf poppler-utils tesseract-ocr tesseract-ocr-spa tesseract-ocr-eng
```
Verify:
```bash
ocrmypdf --version
```
If OCR is not available, the API returns a specific error (`OCR_UNAVAILABLE`) with install guidance.
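
The fallback decision and the error payload could look roughly like the sketch below. The `MIN_TEXT_CHARS` threshold, the function names, and every response field other than the `OCR_UNAVAILABLE` code are assumptions made for illustration, not the project's actual implementation.

```typescript
// Hypothetical threshold: how many non-whitespace characters count as "sufficient".
const MIN_TEXT_CHARS = 200;

// Returns true when directly extracted text is too sparse (typical for scanned PDFs).
function needsOcr(directText: string, minChars: number = MIN_TEXT_CHARS): boolean {
  const meaningful = directText.replace(/\s+/g, "");
  return meaningful.length < minChars;
}

// Assumed error payload; only the OCR_UNAVAILABLE code comes from this README.
interface OcrUnavailableError {
  code: "OCR_UNAVAILABLE";
  message: string;
  installHint: string;
}

function ocrUnavailableError(): OcrUnavailableError {
  return {
    code: "OCR_UNAVAILABLE",
    message: "ocrmypdf was not found on this host.",
    installHint:
      "sudo apt-get install -y ocrmypdf poppler-utils tesseract-ocr tesseract-ocr-spa tesseract-ocr-eng",
  };
}
```
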
## AI Extraction for Acta Constitutiva
Onboarding now uses AI as the default extraction engine after PDF text analysis:
1. Extract text directly from the PDF.
2. If the text is insufficient, run OCR.
3. Send the extracted text to OpenAI to map fields and the lookup dictionary.
4. If AI fails, fallback extraction is used so onboarding is not blocked.
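
The four steps above can be sketched as a small orchestration function. Everything here is hypothetical: the step functions are injected so the sketch does not depend on any real PDF, OCR, or OpenAI client, and they are synchronous only to keep the example short (a real implementation would most likely be async).

```typescript
// Injected steps; names and signatures are assumptions for illustration.
interface ActaSteps {
  extractText: (pdfPath: string) => string;
  runOcr: (pdfPath: string) => string;
  isSufficient: (text: string) => boolean;
  aiExtract: (text: string) => Record<string, string>;
  fallbackExtract: (text: string) => Record<string, string>;
}

interface ActaResult {
  engine: "ai" | "fallback";
  usedOcr: boolean;
  fields: Record<string, string>;
}

function analyzeActa(pdfPath: string, steps: ActaSteps): ActaResult {
  // Step 1: direct text extraction.
  let text = steps.extractText(pdfPath);
  let usedOcr = false;

  // Step 2: OCR only when the direct text is insufficient.
  if (!steps.isSufficient(text)) {
    text = steps.runOcr(pdfPath);
    usedOcr = true;
  }

  // Steps 3 and 4: try AI first, then fall back so onboarding is not blocked.
  try {
    return { engine: "ai", usedOcr, fields: steps.aiExtract(text) };
  } catch (_err) {
    return { engine: "fallback", usedOcr, fields: steps.fallbackExtract(text) };
  }
}
```

Reporting which engine produced the result (`engine`) is a design choice of this sketch; it makes it easy to tell AI output from fallback output downstream.
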

Environment variables:
```bash
OPENAI_API_KEY=sk-...
OPENAI_ACTA_MODEL=gpt-4.1-mini
OPENAI_ACTA_TIMEOUT_MS=60000
OPENAI_ACTA_MAX_CHARS=45000
```
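A minimal sketch of how these variables might be read and validated. It assumes the example values above double as defaults, that a missing key is an error, and that `OPENAI_ACTA_MAX_CHARS` caps the text sent to the model; none of this is confirmed by the project, and the function names are hypothetical.

```typescript
interface ActaAiConfig {
  apiKey: string;
  model: string;
  timeoutMs: number;
  maxChars: number;
}

// Parses a positive integer, falling back when the value is missing or invalid.
function positiveInt(raw: string | undefined, fallback: number): number {
  const parsed = parseInt(raw ?? "", 10);
  return isFinite(parsed) && parsed > 0 ? parsed : fallback;
}

// `env` is passed in explicitly so the function is easy to test.
function loadActaAiConfig(env: Record<string, string | undefined>): ActaAiConfig {
  const apiKey = env["OPENAI_API_KEY"];
  if (!apiKey) {
    throw new Error("OPENAI_API_KEY is required for AI extraction");
  }
  return {
    apiKey,
    model: env["OPENAI_ACTA_MODEL"] || "gpt-4.1-mini", // assumed default
    timeoutMs: positiveInt(env["OPENAI_ACTA_TIMEOUT_MS"], 60000), // assumed default
    maxChars: positiveInt(env["OPENAI_ACTA_MAX_CHARS"], 45000), // assumed default
  };
}

// Assumed meaning of OPENAI_ACTA_MAX_CHARS: cap the text sent to the model.
function truncateForModel(text: string, maxChars: number): string {
  return text.length > maxChars ? text.slice(0, maxChars) : text;
}
```

In a Node.js script this would typically be called as `loadActaAiConfig(process.env)`.
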
## Local CLI Script (PDF -> OCR/text -> AI)
Run:
```bash
npm run acta:analyze:ai -- ./path/to/acta.pdf
```
Optional output file:
```bash
npm run acta:analyze:ai -- ./path/to/acta.pdf --out ./result.json
```
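For illustration, the arguments after `--` could be parsed along these lines. This is a sketch under assumptions: only the positional PDF path and the `--out` flag come from the commands above, while the function name and error messages are hypothetical.

```typescript
interface AnalyzeArgs {
  pdfPath: string;
  outPath?: string;
}

// Parses the arguments that follow `--` in `npm run acta:analyze:ai -- ...`.
function parseAnalyzeArgs(argv: string[]): AnalyzeArgs {
  let pdfPath: string | undefined;
  let outPath: string | undefined;

  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === "--out") {
      outPath = argv[i + 1];
      if (!outPath) throw new Error("--out requires a file path");
      i++; // skip the value that was just consumed
    } else if (pdfPath === undefined) {
      pdfPath = arg;
    } else {
      throw new Error(`Unexpected argument: ${arg}`);
    }
  }

  if (!pdfPath) {
    throw new Error("Usage: acta:analyze:ai -- <file.pdf> [--out <result.json>]");
  }
  return outPath ? { pdfPath, outPath } : { pdfPath };
}
```

In a Node.js script the input would usually be `process.argv.slice(2)`.
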
## Licita Ya API Key Test
Add these vars to `.env`:
```bash
LICITAYA_API_KEY=your-licitaya-api-key
LICITAYA_BASE_URL=https://<licitaya-base-url>
LICITAYA_TEST_ENDPOINT=/tender/search?items=10&page=1
LICITAYA_ACCEPT=application/json
LICITAYA_TIMEOUT_MS=20000
```
Run the connection test:
```bash
npm run licitaya:test
```
Override values on demand:
```bash
# Quote the endpoint so the shell does not interpret `?` and `&`
npm run licitaya:test -- --base-url https://www.licitaya.com.mx/api/v1 --endpoint '/tender/search?items=10&page=1' --accept application/json
```
You can also pass a full URL in `--endpoint`:
```bash
npm run licitaya:test -- --endpoint 'https://<licitaya-base-url>/<country-endpoint>'
```
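One way the base URL and endpoint might be combined, including the full-URL override, is sketched below. The function name is hypothetical and the script's real logic may differ.

```typescript
// Joins base URL and endpoint; a full URL in `endpoint` wins over the base.
function resolveLicitaYaUrl(baseUrl: string, endpoint: string): string {
  if (/^https?:\/\//i.test(endpoint)) {
    return endpoint;
  }
  const base = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  const path = endpoint.charAt(0) === "/" ? endpoint : "/" + endpoint;
  return base + path;
}
```

Plain string joining is used on purpose: `new URL("/tender/search", base)` would treat the leading slash as an absolute path and drop the `/api/v1` prefix from the base.
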
Common Licita Ya lookups:
```bash
# Search tenders (keyword + filters)
npm run licitaya:test -- --endpoint '/tender/search?keyword=computadora,monitor&state=NLE,XX&items=10&page=1&order=1'
# Search by date (YYYYMMDD)
npm run licitaya:test -- --endpoint '/tender/search?date=20260313&items=10&page=1'
# Get one tender by ID
npm run licitaya:test -- --endpoint '/tender/SCRZJ'
```
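If the date needs to be generated in code, a helper like the following could build the `date` parameter. This is a sketch: the helper names are hypothetical, and using UTC fields is an assumption, since the README does not say which time zone the API expects.

```typescript
// Formats a Date as YYYYMMDD using UTC fields (time zone choice is an assumption).
function toLicitaYaDate(date: Date): string {
  const pad = (n: number): string => (n < 10 ? "0" + n : String(n));
  return String(date.getUTCFullYear()) + pad(date.getUTCMonth() + 1) + pad(date.getUTCDate());
}

// Builds the search-by-date endpoint shown in the examples above.
function searchByDateEndpoint(date: Date, items: number = 10, page: number = 1): string {
  return `/tender/search?date=${toLicitaYaDate(date)}&items=${items}&page=${page}`;
}
```
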
Country base URL (pick one only):
- Mexico: `https://www.licitaya.com.mx/api/v1`
- Argentina: `https://www.licitaya.com.ar/api/v1`

Notes:
- The script sends your key in the `X-API-KEY` header.
- It prints the status code and a response preview.
- A non-2xx response exits with code `1` (useful for CI checks).
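
The behaviour described in these notes can be sketched with three small helpers. Only the `X-API-KEY` header name and the exit-code rule come from this README; the function names and the 500-character preview limit are assumptions.

```typescript
// Header name X-API-KEY comes from the notes above; the rest is illustrative.
function buildLicitaYaHeaders(
  apiKey: string,
  accept: string = "application/json"
): Record<string, string> {
  return { "X-API-KEY": apiKey, Accept: accept };
}

// Any status in 200..299 maps to exit code 0, everything else to 1.
function exitCodeForStatus(status: number): 0 | 1 {
  return status >= 200 && status < 300 ? 0 : 1;
}

// Shortens a response body for console output (limit is an assumption).
function responsePreview(body: string, maxChars: number = 500): string {
  return body.length > maxChars ? body.slice(0, maxChars) + "..." : body;
}
```

A script built on these would pass the headers to its HTTP client, print the status and `responsePreview(body)`, and exit with `exitCodeForStatus(status)`.
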