initial push

Marcelo Dares
2026-03-15 15:03:56 +01:00
parent d48b9d5352
commit 65aaf9275e
146 changed files with 70245 additions and 100 deletions

111
README.md

@@ -6,12 +6,6 @@ First, run the development server:
```bash
npm run dev
# or
yarn dev
# or
pnpm dev
# or
bun dev
```
Open [http://localhost:3000](http://localhost:3000) with your browser to see the result.
@@ -34,3 +28,108 @@ You can check out [the Next.js GitHub repository](https://github.com/vercel/next
The easiest way to deploy your Next.js app is to use the [Vercel Platform](https://vercel.com/new?utm_medium=default-template&filter=next.js&utm_source=create-next-app&utm_campaign=create-next-app-readme) from the creators of Next.js.
Check out our [Next.js deployment documentation](https://nextjs.org/docs/app/building-your-application/deploying) for more details.
## OCR Setup (Recommended)
PDF analysis uses direct text extraction first. If text is insufficient (common in scanned PDFs), the API falls back to OCR with `ocrmypdf`.
Install host dependencies (Ubuntu/Debian):
```bash
sudo apt-get update
sudo apt-get install -y ocrmypdf poppler-utils tesseract-ocr tesseract-ocr-spa tesseract-ocr-eng
```
Verify:
```bash
ocrmypdf --version
```
If OCR is not available, the API returns a specific error (`OCR_UNAVAILABLE`) with install guidance.
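A preflight check on the host can surface the same condition before the API is started. The sketch below is only an assumption about how one might check the machine; it is not the API's own detection logic, and `check_ocr_deps` is a hypothetical helper name.

```bash
# Hypothetical helper: report which of the given tools are missing from PATH.
# Prints one "missing: <tool>" line per absent tool and returns 1 if any is absent.
check_ocr_deps() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_ocr_deps ocrmypdf pdftotext tesseract` returns a non-zero status when any of the three binaries installed by the packages above is not on `PATH`.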
## AI Extraction for Acta Constitutiva
Onboarding now uses AI as the default extraction engine after PDF text analysis:
1. Extract direct text from PDF.
2. If text is insufficient, run OCR.
3. Send the extracted text to OpenAI, which maps it to the fields and the lookup dictionary.
4. If the AI step fails, a fallback extraction is used so that onboarding is not blocked.
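Steps 1 and 2 can be approximated on the command line with the host tools from the OCR setup. This is a sketch under stated assumptions, not the API's implementation: the `MIN_TEXT_CHARS` threshold, the helper names, and the choice of `--force-ocr` with `spa+eng` are all illustrative.

```bash
# Hypothetical threshold: fewer non-whitespace characters than this
# counts as "insufficient" direct text.
MIN_TEXT_CHARS="${MIN_TEXT_CHARS:-200}"

# Count non-whitespace characters read from stdin.
count_chars() {
  tr -d '[:space:]' | wc -c | tr -d ' '
}

# Print the text of a PDF, running OCR only when direct extraction is too short.
extract_text() {
  local pdf="$1" text tmp
  # Step 1: direct text extraction (pdftotext ships with poppler-utils).
  text="$(pdftotext -layout "$pdf" - 2>/dev/null || true)"
  if [ "$(printf '%s' "$text" | count_chars)" -ge "$MIN_TEXT_CHARS" ]; then
    printf '%s\n' "$text"
    return 0
  fi
  # Step 2: OCR fallback; --sidecar writes the recognized text to a plain-text file.
  tmp="$(mktemp -d)"
  ocrmypdf --force-ocr -l spa+eng --sidecar "$tmp/ocr.txt" "$pdf" "$tmp/ocr.pdf" \
    >/dev/null 2>&1 || { rm -rf "$tmp"; return 1; }
  cat "$tmp/ocr.txt"
  rm -rf "$tmp"
}
```

Running `extract_text ./path/to/acta.pdf` prints whichever text source was used; a non-zero status means the OCR step itself failed.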
Environment variables:
```bash
OPENAI_API_KEY=sk-...
OPENAI_ACTA_MODEL=gpt-4.1-mini
OPENAI_ACTA_TIMEOUT_MS=60000
OPENAI_ACTA_MAX_CHARS=45000
```
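Before running the extraction, it can help to check these variables in the shell. The helpers below are a minimal sketch: their names are hypothetical, and treating `OPENAI_API_KEY` as the only mandatory variable is an assumption, since the README does not say which values have defaults.

```bash
# Hypothetical helper: succeed only if every named variable is exported and non-empty.
require_env() {
  local name missing=0
  for name in "$@"; do
    if [ -z "$(printenv "$name")" ]; then
      echo "missing env var: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Succeed only if the argument is a non-empty string of digits.
is_uint() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
    *) return 0 ;;
  esac
}
```

For example, `require_env OPENAI_API_KEY && is_uint "$OPENAI_ACTA_TIMEOUT_MS"` fails fast when the key is absent or the timeout is not a plain number of milliseconds.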
## Local CLI Script (PDF -> OCR/text -> AI)
Run:
```bash
npm run acta:analyze:ai -- ./path/to/acta.pdf
```
Optional output file:
```bash
npm run acta:analyze:ai -- ./path/to/acta.pdf --out ./result.json
```
## Licita Ya API Key Test
Add these vars to `.env`:
```bash
LICITAYA_API_KEY=your-licitaya-api-key
LICITAYA_BASE_URL=https://<licitaya-base-url>
LICITAYA_TEST_ENDPOINT=/tender/search?items=10&page=1
LICITAYA_ACCEPT=application/json
LICITAYA_TIMEOUT_MS=20000
```
Run the connection test:
```bash
npm run licitaya:test
```
Override values on demand:
```bash
npm run licitaya:test -- --base-url https://www.licitaya.com.mx/api/v1 --endpoint '/tender/search?items=10&page=1' --accept application/json
```
You can also pass a full URL in `--endpoint`:
```bash
npm run licitaya:test -- --endpoint https://<licitaya-base-url>/<country-endpoint>
```
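The two `--endpoint` forms imply a simple resolution rule: a full URL is used as-is, and a path is appended to the base URL. The function below sketches that rule; `resolve_url` is a hypothetical name, and the script's actual logic may differ in its details.

```bash
# Hypothetical helper: combine a base URL and an endpoint.
# A full http(s) URL wins; otherwise the path is joined to the base with one slash.
resolve_url() {
  local base="$1" endpoint="$2"
  case "$endpoint" in
    http://*|https://*) printf '%s\n' "$endpoint" ;;
    *) printf '%s/%s\n' "${base%/}" "${endpoint#/}" ;;
  esac
}
```

With base `https://www.licitaya.com.mx/api/v1` and endpoint `/tender/SCRZJ`, this prints `https://www.licitaya.com.mx/api/v1/tender/SCRZJ`.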
Common Licita Ya lookups:
```bash
# Search tenders (keyword + filters)
npm run licitaya:test -- --endpoint '/tender/search?keyword=computadora,monitor&state=NLE,XX&items=10&page=1&order=1'
# Search by date (YYYYMMDD)
npm run licitaya:test -- --endpoint '/tender/search?date=20260313&items=10&page=1'
# Get one tender by ID
npm run licitaya:test -- --endpoint '/tender/SCRZJ'
```
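For daily checks, the date filter can be generated rather than typed. This is a small sketch with a hypothetical helper name; it only builds the endpoint string, and the `items` and `page` values are copied from the examples above.

```bash
# Hypothetical helper: build a date-filtered search endpoint.
# Uses the given YYYYMMDD value, or today's date when none is passed.
search_by_date_endpoint() {
  local day="${1:-$(date +%Y%m%d)}"
  printf '/tender/search?date=%s&items=10&page=1\n' "$day"
}
```

It can be passed straight to the test script: `npm run licitaya:test -- --endpoint "$(search_by_date_endpoint)"`.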
Country base URLs (pick exactly one):
- Mexico: `https://www.licitaya.com.mx/api/v1`
- Argentina: `https://www.licitaya.com.ar/api/v1`
Notes:
- The script sends your key in the `X-API-KEY` header.
- It prints the status code and a preview of the response.
- A non-2xx response exits with code `1` (useful for CI checks).
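The behaviour in these notes can be approximated with `curl` when Node is not available. This is a rough equivalent under stated assumptions, not the script itself: both function names are hypothetical, the 20-second limit mirrors `LICITAYA_TIMEOUT_MS=20000`, and the response preview is omitted.

```bash
# Map an HTTP status code to the exit code described above: 0 for 2xx, 1 otherwise.
exit_code_for_status() {
  case "$1" in
    2[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical curl-based check: send the key in X-API-KEY and print the status code.
licitaya_check() {
  local url="$1" status
  status="$(curl -sS -o /dev/null -w '%{http_code}' \
    --max-time 20 \
    -H "X-API-KEY: ${LICITAYA_API_KEY}" \
    -H "Accept: ${LICITAYA_ACCEPT:-application/json}" \
    "$url")" || return 1
  echo "status: $status"
  exit_code_for_status "$status"
}
```

In a CI job, `licitaya_check "$LICITAYA_BASE_URL/tender/search?items=10&page=1"` then fails the step on any non-2xx response or connection error.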