initial push

2026-03-15 15:03:56 +01:00
parent d48b9d5352
commit 65aaf9275e
146 changed files with 70245 additions and 100 deletions
--- a/README.md
+++ b/README.md
@@ -6,12 +6,6 @@ First, run the development server:

 ```bash
 npm run dev
-# or
-yarn dev
-# or
-pnpm dev
-# or
-bun dev
 ```

 Open [http://localhost:3000](http://localhost:3000) with your browser to see the result.
@@ -34,3 +28,108 @@ You can check out [the Next.js GitHub repository](https://github.com/vercel/next
 The easiest way to deploy your Next.js app is to use the [Vercel Platform](https://vercel.com/new?utm_medium=default-template&filter=next.js&utm_source=create-next-app&utm_campaign=create-next-app-readme) from the creators of Next.js.

 Check out our [Next.js deployment documentation](https://nextjs.org/docs/app/building-your-application/deploying) for more details.
+
+## OCR Setup (Recommended)
+
+PDF analysis uses direct text extraction first. If text is insufficient (common in scanned PDFs), the API falls back to OCR with `ocrmypdf`.
+
+Install host dependencies (Ubuntu/Debian):
+
+```bash
+sudo apt-get update
+sudo apt-get install -y ocrmypdf poppler-utils tesseract-ocr tesseract-ocr-spa tesseract-ocr-eng
+```
+
+Verify:
+
+```bash
+ocrmypdf --version
+```
+
+If OCR is not available, the API returns a specific error (`OCR_UNAVAILABLE`) with install guidance.
+
+## AI Extraction for Acta Constitutiva
+
+Onboarding now uses AI as the default extraction engine after PDF text analysis:
+
+1. Extract direct text from PDF.
+2. If text is insufficient, run OCR.
+3. Send the extracted text to OpenAI, which maps it to fields and a lookup dictionary.
+4. If the AI step fails, fallback extraction is used so onboarding is not blocked.
+
+Environment variables:
+
+```bash
+OPENAI_API_KEY=sk-...
+OPENAI_ACTA_MODEL=gpt-4.1-mini
+OPENAI_ACTA_TIMEOUT_MS=60000
+OPENAI_ACTA_MAX_CHARS=45000
+```
+
+## Local CLI Script (PDF -> OCR/text -> AI)
+
+Run:
+
+```bash
+npm run acta:analyze:ai -- ./path/to/acta.pdf
+```
+
+Optional output file:
+
+```bash
+npm run acta:analyze:ai -- ./path/to/acta.pdf --out ./result.json
+```
+
+## Licita Ya API Key Test
+
+Add these vars to `.env`:
+
+```bash
+LICITAYA_API_KEY=your-licitaya-api-key
+LICITAYA_BASE_URL=https://<licitaya-base-url>
+LICITAYA_TEST_ENDPOINT=/tender/search?items=10&page=1
+LICITAYA_ACCEPT=application/json
+LICITAYA_TIMEOUT_MS=20000
+```
+
+Run the connection test:
+
+```bash
+npm run licitaya:test
+```
+
+Override values on demand:
+
+```bash
+npm run licitaya:test -- --base-url https://www.licitaya.com.mx/api/v1 --endpoint '/tender/search?items=10&page=1' --accept application/json
+```
+
+You can also pass a full URL in `--endpoint`:
+
+```bash
+npm run licitaya:test -- --endpoint https://<licitaya-base-url>/<country-endpoint>
+```
+
+Common Licita Ya lookups:
+
+```bash
+# Search tenders (keyword + filters)
+npm run licitaya:test -- --endpoint '/tender/search?keyword=computadora,monitor&state=NLE,XX&items=10&page=1&order=1'
+
+# Search by date (YYYYmmdd)
+npm run licitaya:test -- --endpoint '/tender/search?date=20260313&items=10&page=1'
+
+# Get one tender by ID
+npm run licitaya:test -- --endpoint '/tender/SCRZJ'
+```
+
+Country base URL (pick one only):
+
+- Mexico: `https://www.licitaya.com.mx/api/v1`
+- Argentina: `https://www.licitaya.com.ar/api/v1`
+
+Notes:
+
+- The script sends your key in the `X-API-KEY` header.
+- It prints the status code and a preview of the response body.
+- A non-2xx response exits with code `1` (useful for CI checks).
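The "direct text first, OCR second" flow described under "OCR Setup" can be approximated on the command line. This is a minimal sketch, not the API's implementation. The `MIN_TEXT_CHARS` threshold of 200, the use of `pdftotext` from `poppler-utils` for direct extraction, and the helper names `count_text_chars`, `text_is_sufficient` and `extract_pdf_text` are all assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of "direct text first, OCR fallback"; not the API's actual code.

# Assumed threshold: fewer non-whitespace characters than this triggers OCR.
MIN_TEXT_CHARS="${MIN_TEXT_CHARS:-200}"

# Count the non-whitespace characters (per the current locale) read from stdin.
count_text_chars() {
  tr -d '[:space:]' | wc -m | tr -d ' '
}

# Succeed (status 0) when the text in $1 is long enough to skip OCR.
text_is_sufficient() {
  local chars
  chars=$(printf '%s' "$1" | count_text_chars)
  [ "$chars" -ge "$MIN_TEXT_CHARS" ]
}

# Print the text of the PDF in $1, running OCR only when direct extraction is too thin.
extract_pdf_text() {
  local pdf="$1" text tmp
  # pdftotext (poppler-utils) writes to stdout when the output file is "-".
  text=$(pdftotext -layout "$pdf" - 2>/dev/null || true)
  if text_is_sufficient "$text"; then
    printf '%s\n' "$text"
    return 0
  fi
  if ! command -v ocrmypdf >/dev/null 2>&1; then
    echo "OCR_UNAVAILABLE: install ocrmypdf (see OCR Setup)" >&2
    return 2
  fi
  tmp=$(mktemp -d)
  # --force-ocr rasterizes every page; --sidecar saves the recognized text to a file.
  if ocrmypdf --force-ocr -l spa+eng --sidecar "$tmp/ocr.txt" "$pdf" "$tmp/ocr.pdf" >/dev/null 2>&1; then
    cat "$tmp/ocr.txt"
  else
    echo "OCR failed for $pdf" >&2
    rm -rf "$tmp"
    return 1
  fi
  rm -rf "$tmp"
}
```

After sourcing the file, `extract_pdf_text ./path/to/acta.pdf > acta.txt` prints either the direct text or the OCR text. The return code `2` is only this sketch's stand-in for the API's `OCR_UNAVAILABLE` error.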
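Step 3 of "AI Extraction for Acta Constitutiva" and the `OPENAI_ACTA_*` variables can be illustrated with `curl` against OpenAI's Chat Completions endpoint. This is a sketch under stated assumptions: the README does not say which OpenAI endpoint, prompt, or response format the app uses. The sketch also assumes `jq` is installed, uses the README's example values as defaults, and introduces the hypothetical helpers `truncate_for_ai`, `build_request` and `ai_extract`.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of step 3 (send extracted text to OpenAI); the app's real
# prompt, endpoint, and output schema are not documented in this README.

# Trim the text in $1 to OPENAI_ACTA_MAX_CHARS characters (bytes in the C locale).
truncate_for_ai() {
  local max="${OPENAI_ACTA_MAX_CHARS:-45000}"
  printf '%s' "${1:0:$max}"
}

# Build a Chat Completions request body; requires jq.
build_request() {
  jq -n \
    --arg model "${OPENAI_ACTA_MODEL:-gpt-4.1-mini}" \
    --arg text "$(truncate_for_ai "$1")" \
    '{
      model: $model,
      response_format: {type: "json_object"},
      messages: [
        {role: "system", content: "Extract the fields of this acta constitutiva and return them as JSON."},
        {role: "user", content: $text}
      ]
    }'
}

# POST the request; a non-zero status tells the caller to use fallback extraction (step 4).
ai_extract() {
  build_request "$1" | curl -sS --fail \
    --max-time "$(( ${OPENAI_ACTA_TIMEOUT_MS:-60000} / 1000 ))" \
    -H "Authorization: Bearer ${OPENAI_API_KEY:-}" \
    -H "Content-Type: application/json" \
    --data-binary @- \
    https://api.openai.com/v1/chat/completions
}
```

A caller could run `ai_extract "$text" || echo "AI extraction failed; using fallback" >&2`, which follows step 4's rule that an AI failure must not block onboarding. Truncating before the request keeps very long OCR output within a predictable size.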
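The behavior listed in the Licita Ya notes (key in `X-API-KEY`, status code plus body preview, exit code `1` on non-2xx) and the full-URL form of `--endpoint` can be reproduced with plain `curl`. This is a hypothetical shell equivalent of `npm run licitaya:test`, not the script itself. The URL-joining rule, the 500-byte preview length, and the names `resolve_url`, `exit_code_for_status` and `licitaya_test` are assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical curl-based equivalent of `npm run licitaya:test`; the real script may differ.

# Use the endpoint as-is when it is a full URL; otherwise append it to the base URL.
resolve_url() {
  local base="$1" endpoint="$2"
  case "$endpoint" in
    http://*|https://*) printf '%s\n' "$endpoint" ;;
    *) printf '%s/%s\n' "${base%/}" "${endpoint#/}" ;;
  esac
}

# Map an HTTP status code to the documented exit code: 0 for 2xx, 1 otherwise.
exit_code_for_status() {
  case "$1" in
    2[0-9][0-9]) echo 0 ;;
    *) echo 1 ;;
  esac
}

# Send one request (endpoint from $1 or LICITAYA_TEST_ENDPOINT) and report the result.
licitaya_test() {
  local url body status
  if [ -z "${LICITAYA_API_KEY:-}" ]; then
    echo "LICITAYA_API_KEY is not set" >&2
    return 1
  fi
  url=$(resolve_url "${LICITAYA_BASE_URL:-}" "${1:-${LICITAYA_TEST_ENDPOINT:-}}")
  body=$(mktemp)
  # -w prints only the status code; the body goes to a temp file for the preview.
  status=$(curl -sS -o "$body" -w '%{http_code}' \
    --max-time "$(( ${LICITAYA_TIMEOUT_MS:-20000} / 1000 ))" \
    -H "X-API-KEY: $LICITAYA_API_KEY" \
    -H "Accept: ${LICITAYA_ACCEPT:-application/json}" \
    "$url") || status=000
  echo "Status: $status"
  head -c 500 "$body"
  echo
  rm -f "$body"
  return "$(exit_code_for_status "$status")"
}
```

With the `LICITAYA_*` variables exported, `licitaya_test '/tender/SCRZJ'; echo $?` prints the status, a preview, and `0` or `1`. Quoting the endpoint matters because an unquoted `&` in a query string would make the shell background the command.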