v0.3.44 | Pluggable cryptographic provider — FIPS 140-3 compliance for
`pdf_oxide::crypto::CryptoProvider` trait: a new abstraction that decouples the PDF encryption and signature paths from any one cryptography crate. Two providers ship out of the box:
- `RustCryptoProvider` (default): pure-Rust stack as before (`sha2`, `aes`, `rsa`, `p256`, `p384`, `getrandom`, `md-5`, `sha1`). Permits every algorithm the PDF specs reference, including the legacy MD5+RC4 path required by ISO 32000-1 R≤4 documents.
- `AwsLcProvider` (opt-in via `--features fips`): backed by `aws-lc-rs`, FIPS 140-3 validated since 2024. Refuses MD5 / SHA-1-for-signing / RC4 with `Error::AlgorithmNotPermitted` and a clear remediation message.
- Single source of randomness. `src/encryption/algorithms.rs`'s former `SHA-256(uuid_v4 || timestamp_ns || …)` cascade is replaced with `crypto::active().random_bytes()`. Under the default provider this is `getrandom::fill()` (OS entropy pool); under FIPS it is `aws_lc_rs::rand::SystemRandom`. Cryptographically suitable for AES-256 file keys and salts; auditable.
- Closes #236.
Three sub-traits compose into CryptoProvider:
- `Hasher`: incremental hashing (`update` / `finalize`).
- `SymmetricCipher`: AES-128/256-CBC (PKCS#7 + no-padding) and RC4.
- `SignatureVerifier`: RSA-PKCS#1-v1.5, RSA-PSS, ECDSA P-256/P-384.
Plus an opaque Signer handle so HSM / PKCS#11 / Cloud KMS backends can plug in via SigningKeyMaterial (which is #[non_exhaustive] — future variants for HSM slots etc. are not breaking changes).
The is_legacy_allowed() policy bit lets each provider declare whether MD5 / SHA-1-sign / RC4 are permitted. PDF Standard Security R≤4 documents are gated at EncryptionHandler::new: under a FIPS provider they fail with a remediation message ("re-encrypt at R=6 or build pdf_oxide without the 'fips' feature so the default 'rust-crypto' provider stays active") rather than panic deep inside the cipher path.
use std::sync::Arc;
use pdf_oxide::crypto::{set_provider, AwsLcProvider};
set_provider(Arc::new(AwsLcProvider::new()))?;
let doc = pdf_oxide::PdfDocument::open("encrypted-r6.pdf")?;
See docs/CRYPTO_PROVIDERS.md for the algorithm coverage matrix, custom-provider walkthrough (sovereign-jurisdiction algorithms, HSMs), and the legacy-PDF policy table.
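The legacy-algorithm gate described above can be pictured with a small std-only sketch. Every name in it (`Provider`, `PermissiveProvider`, `StrictProvider`, `PolicyError`, `check_revision`) is a hypothetical stand-in rather than pdf_oxide's actual API; it only illustrates rejecting an R≤4 document up front when the active provider disallows legacy algorithms, instead of failing later inside the cipher path.

```rust
/// Hypothetical stand-in for a crypto provider; not pdf_oxide's real trait.
trait Provider {
    /// Whether MD5 / SHA-1-for-signing / RC4 are permitted by this provider.
    fn is_legacy_allowed(&self) -> bool;
}

/// Plays the role of a default provider that permits legacy algorithms.
struct PermissiveProvider;
/// Plays the role of a FIPS-style provider that refuses them.
struct StrictProvider;

impl Provider for PermissiveProvider {
    fn is_legacy_allowed(&self) -> bool {
        true
    }
}

impl Provider for StrictProvider {
    fn is_legacy_allowed(&self) -> bool {
        false
    }
}

#[derive(Debug, PartialEq)]
enum PolicyError {
    AlgorithmNotPermitted {
        revision: u8,
        remediation: &'static str,
    },
}

/// Gate a Standard Security revision before any cipher is constructed.
/// Assumption taken from the notes above: revisions <= 4 need the legacy path.
fn check_revision(provider: &dyn Provider, revision: u8) -> Result<(), PolicyError> {
    if revision <= 4 && !provider.is_legacy_allowed() {
        return Err(PolicyError::AlgorithmNotPermitted {
            revision,
            remediation: "re-encrypt at R=6 or use a provider that permits legacy algorithms",
        });
    }
    Ok(())
}
```

Checking the policy once at handler construction keeps the error close to its cause, so the caller sees the remediation text rather than a failure from deep inside decryption.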
- New `fips` job in `.github/workflows/ci.yml` builds with `--features fips`, runs the 11-test `AwsLcProvider` suite, including a `cross_provider_aes_compat` check that asserts the FIPS and rust-crypto AES paths produce byte-identical output, and enforces `clippy -D warnings` under the FIPS feature.
- New `.github/workflows/release-fips.yml` workflow (manually triggered) builds and publishes parallel FIPS distributions on every package index, all from the same Rust source compiled with `--features fips` so each binary contains only AWS-LC's FIPS-validated module:

  | Ecosystem | Package | Install |
  |---|---|---|
  | PyPI | `pdf_oxide_fips` | `pip install pdf_oxide_fips==0.3.44` |
  | npm | `pdf-oxide-fips` | `npm install pdf-oxide-fips@0.3.44` |
  | NuGet | `PdfOxide.Fips` | `dotnet add package PdfOxide.Fips --version 0.3.44` |
  | Go | `github.com/yfedoseev/pdf_oxide/go-fips` | `go get github.com/yfedoseev/pdf_oxide/go-fips@v0.3.44` |

  Platform matrix in v0.3.44 (every binding × every platform):

  | Platform | Python | npm | NuGet | Go |
  |---|---|---|---|---|
  | Linux x86_64 | ✅ | ✅ | ✅ | ✅ |
  | Linux aarch64 | ✅ | ✅ | ✅ | ✅ |
  | macOS x86_64 | ✅ | ✅ | ✅ | ✅ |
  | macOS arm64 | ✅ | ✅ | ✅ | ✅ |
  | Windows x86_64 | ✅ | ✅ | ✅ | ✅ |

  All distributions move in lockstep with the regular release: FIPS and default variants of the same release tag are byte-equal in their non-crypto code paths. Per-platform smoke tests in the workflow confirm that the FIPS provider is reachable and that `crypto_use_fips()` (or equivalent) flips the active provider as expected, which catches API mismatches before publishing.

  Why `pdf_oxide_fips` (underscore) for Python: PyPI normalizes hyphens / underscores to the same canonical form per PEP 503 (`pip install pdf_oxide_fips` and `pip install pdf-oxide-fips` resolve to the same package). Using underscore in `pyproject.toml` makes the wheel filename and the `import pdf_oxide` path identical to the default distribution; only the package name differs.

  Why parallel distributions instead of `pip install pdf_oxide[fips]`: Python extras (PEP 508) can add Python dependencies but cannot swap the compiled `.so` baked inside a wheel. The industry pattern (cryptography, pyOpenSSL) ships separate FIPS distributions; we follow suit.

  Why a `go-fips` submodule path: Go modules are import-path-bound, so users pick at `go get` time:

  go get github.com/yfedoseev/pdf_oxide/go # default
  go get github.com/yfedoseev/pdf_oxide/go-fips # FIPS

  Both submodules re-export the same Go API; only the linked native static lib differs.
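The PEP 503 normalization mentioned above (lowercase the name and collapse every run of `-`, `_` or `.` into a single `-`) can be sketched in a few lines. The helper name `normalize_pep503` is illustrative and not part of pdf_oxide or of any packaging tool.

```rust
/// Normalize a project name as PEP 503 describes: lowercase it and
/// collapse every run of '-', '_' or '.' into a single '-'.
/// Hypothetical helper written for illustration only.
fn normalize_pep503(name: &str) -> String {
    let mut out = String::with_capacity(name.len());
    let mut in_separator_run = false;
    for ch in name.chars() {
        if matches!(ch, '-' | '_' | '.') {
            // Emit one '-' for the whole run of separators.
            if !in_separator_run {
                out.push('-');
                in_separator_run = true;
            }
        } else {
            in_separator_run = false;
            out.extend(ch.to_lowercase());
        }
    }
    out
}
```

Under this rule `pdf_oxide_fips` and `pdf-oxide-fips` both map to `pdf-oxide-fips`, which is why either spelling resolves to the same package.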
- Restore `manylinux_2_28` glibc floor for Python wheels. 0.3.42 and 0.3.43 published only `manylinux_2_35` Linux glibc wheels because the release workflow ran `maturin build` directly on `ubuntu-latest` (Ubuntu 24.04, glibc 2.39), letting the runner's glibc set the wheel tag. That excluded Amazon Linux 2023 / AWS Lambda Python (glibc 2.34), RHEL 8, Ubuntu 20.04 and Debian 11: pip rejected the wheel and fell back to a source build that OOM-killed `rustup-init` inside the Lambda build container. Reported by @potatochipcoconut on PR #463. Both `release.yml` (default wheels) and `release-fips.yml` (`pdf_oxide_fips` wheels) now build the Linux glibc wheels via `PyO3/maturin-action` inside the `manylinux_2_28` container, and a CI guard step fails the job if a `manylinux_2_28` wheel is not produced for either Linux target, preventing this regression from recurring. The 0.3.21 baseline (originally added in #284) is restored.
Extraction of page ranges from large PDFs is now bound by serialisation work instead of redundant document rebuilds and tree walks. Closes #474, reported by community contributor @potatochipcoconut, whose careful root-cause writeup (chunk-by-chunk timings, comparison against PyMuPDF's doc.select(), and a profiling-grade reproduction case from an AWS Lambda IDP pipeline) made this fix possible.
Measured on the public 1112-page / 38 MB Artificial Intelligence — A Modern Approach corpus (pdfs_slow2/) on an idle laptop:
| Workload | 0.3.43 | 0.3.44 | Speedup |
|---|---|---|---|
| `extract_pages_to_bytes(0..300)` | 7301 ms / 36 MB out | 382 ms / 12 MB out | 19× + 3× smaller |
| `extract_pages_to_bytes(0..50)` | 7983 ms / 36 MB out | 155 ms / 4 MB out | 51× + 9× smaller |
| Sequential 23 × 50-page chunks | ~3 min | 1542 ms total | ~120× |
Extrapolating to the reporter's 12k-page / 50 MB document chunked into five 3000-page slices: an AWS Lambda invocation that previously timed out at 900 s after two chunks now finishes the entire five-chunk batch in roughly 30 s.
All in `src/editor/document_editor.rs` + `src/document.rs`:
- Triple full-document rewrite. `extract_pages_to_bytes` serialised the whole doc, re-parsed the bytes, removed pages one at a time, and serialised again: three full passes when one would do. Replaced with a non-mutating in-place trimmed `page_order`, restored after the save (even on `Err`).
- Garbage collector walked the original page tree. The trimmed `/Pages` dict was rebuilt locally inside `write_full_to_writer`, but `collect_reachable_ids()` started its BFS from the unmodified catalog and pulled in every dropped page's resources, so the output never shrank no matter how few pages were kept. Fixed by staging the trimmed `/Pages` dict in `modified_objects` before the save; the GC walker already prefers staged dicts over source.
- `get_page_ref(i)` in a 0..n loop is O(n²). Each call walks the page tree from the root and stops at the i-th leaf, so collecting all n leaf refs walks 1 + 2 + … + n nodes. New helper `PdfDocument::all_page_refs()` does it in one DFS. The flat-tree common case (root `/Pages` whose `/Count` matches `Kids.len()`) reads the ref array straight out of `/Kids` without touching individual leaves at all.
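The garbage-collection fix above can be illustrated with a toy object graph. This is a minimal sketch under stated assumptions: objects are reduced to "id → ids it references", and `collect_reachable`, `source` and `staged` are hypothetical names that model, but do not reproduce, the library's `collect_reachable_ids()` and `modified_objects`.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

type ObjId = u32;

/// Breadth-first reachability over a simplified object graph.
/// `source` maps each object id to the ids it references; `staged` holds
/// modified objects that shadow their source versions.
/// Hypothetical model, not pdf_oxide's real data structures.
fn collect_reachable(
    root: ObjId,
    source: &HashMap<ObjId, Vec<ObjId>>,
    staged: &HashMap<ObjId, Vec<ObjId>>,
) -> HashSet<ObjId> {
    let mut seen = HashSet::new();
    let mut queue = VecDeque::new();
    seen.insert(root);
    queue.push_back(root);
    while let Some(id) = queue.pop_front() {
        // Prefer the staged version of an object over the source version.
        let refs = staged.get(&id).or_else(|| source.get(&id));
        for &next in refs.into_iter().flatten() {
            if seen.insert(next) {
                queue.push_back(next);
            }
        }
    }
    seen
}
```

If the trimmed `/Pages` object is staged before the walk, dropped pages and their resources never enter the reachable set; if only the source graph is consulted, everything stays reachable and the output cannot shrink.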
The same n² loop pattern was lurking in four other call sites on the reporter's hot path (their pipeline does PDF/A validate + convert before the chunked extract). All five collapsed to a single all_page_refs() call:
- `src/outline.rs`: `find_page_index` (O(n²) per outline entry → O(n³) on documents with bookmarks).
- `src/editor/document_editor.rs` line ~4275: page-ref → index map for partial form-flatten.
- `src/editor/document_editor.rs` line ~4505: same map for `get_form_fields()`.
- `src/compliance/validators.rs`: `validate_fonts` (`doc.validate_pdf_a('2b')`).
- `src/compliance/converter.rs`: per-page `/AA` strip (`doc.convert_to_pdfa('2b')`).
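The difference between the per-index lookup and the single traversal can be sketched on a simplified page tree. The `Node` type and both functions below are illustrative assumptions that only share names with the library's `get_page_ref` and `all_page_refs`; real page trees carry dictionaries, `/Count` entries and indirect references that this toy model omits.

```rust
/// Hypothetical, simplified page-tree node (not pdf_oxide's real types).
enum Node {
    /// Leaf page identified by an object number.
    Page(u32),
    /// Intermediate /Pages node holding its /Kids.
    Pages(Vec<Node>),
}

/// Per-index lookup: every call restarts at the root and walks to leaf `index`,
/// so calling it for 0..n visits 1 + 2 + ... + n leaves in total.
fn get_page_ref(root: &Node, index: usize) -> Option<u32> {
    fn walk(node: &Node, remaining: &mut usize) -> Option<u32> {
        match node {
            Node::Page(id) => {
                if *remaining == 0 {
                    Some(*id)
                } else {
                    *remaining -= 1;
                    None
                }
            }
            Node::Pages(kids) => {
                for kid in kids {
                    if let Some(id) = walk(kid, remaining) {
                        return Some(id);
                    }
                }
                None
            }
        }
    }
    let mut remaining = index;
    walk(root, &mut remaining)
}

/// Single traversal: one depth-first pass collects every leaf in page order.
fn all_page_refs(root: &Node) -> Vec<u32> {
    let mut out = Vec::new();
    let mut stack = vec![root];
    while let Some(node) = stack.pop() {
        match node {
            Node::Page(id) => out.push(*id),
            // Push kids in reverse so the leftmost kid is popped first.
            Node::Pages(kids) => stack.extend(kids.iter().rev()),
        }
    }
    out
}
```

Building a page-ref → index map from the result of one `all_page_refs` call then costs a single pass over the tree, rather than one root-to-leaf walk per page.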
Two additions, both directly requested by @potatochipcoconut in #474; both available in Rust and Python (the other bindings can be added on demand):
# Batch extraction — same single-call efficiency, ergonomic for
# the chunked-for-OCR / chunked-for-S3 pattern.
chunks = doc.extract_page_ranges_to_bytes(
[(0, 3000), (3000, 6000), (6000, 9000), (9000, 12000)]
)
# In-place selection — equivalent to PyMuPDF's doc.select(...).
# After this call, the document holds only the listed pages,
# in the order given. doc.save() / doc.save_to_bytes() then
# emit only those pages with garbage-collected resources.
doc.select_pages([1, 4, 7, 99])
PDFs whose /Pages root publishes shared /Resources used by all leaf pages (typical of high-resolution book scans, atypical of office documents with subset fonts) still produce full-size chunk output: GC correctly preserves resources reachable from kept pages, and a single shared resource pool stays reachable as long as any kept page references it. The principled fix is per-page resource sub-setting — parsing each kept page's content stream to determine which fonts / XObjects are actually used and emitting a minimal /Resources for that page. That is a feature, not a bug fix, and is deferred from this release. The wall-clock speedup (12–54×) holds regardless.
- 5050 lib tests pass under `--features python,fips` (5039 default + 11 FIPS-only).
- 119 encryption tests still pass byte-equal post-rewire to the trait.
- 69 signature tests still pass byte-equal post-rewire.
- Hash vectors validated against NIST FIPS 180-4 for SHA-256/384/512 and RFC 1321 / 3174 for MD5 / SHA-1.
- New regression tests cover the issue #474 workflow: `test_extract_pages_chunked_sequential` (4 sequential chunks on the same `DocumentEditor`, source observably unchanged between calls), `test_extract_pages_non_sequential` (out-of-order indices `[3, 0, 4]`), `test_extract_page_ranges_to_bytes_batch`, `test_select_pages_in_place`, and `test_select_pages_out_of_range`.
- `AwsLcProvider` RSA-PKCS#1 v1.5 verify-from-digest (#475): `AwsLcProvider::verify_rsa_pkcs1v15` is currently a stub; PDF/CMS signatures using RSA-PKCS#1 v1.5 return `SignerVerify::Unknown` instead of verifying under FIPS. Blocked on `aws-lc-rs` exposing a stable `RSA_PKCS1_PRIM_VERIFY` API. `RustCryptoProvider` (default) is not affected.
- `AwsLcProvider` signing wiring: signing calls are currently routed to `RustCryptoProvider`. Full AWS-LC signing integration lands in v0.3.45.
- musllinux Python wheels for the FIPS variant: FIPS musllinux wheels (Alpine / musl libc) require a musl-targeted `aws-lc-fips-sys` build; work in progress.
Rust (crates.io)
cargo add pdf_oxide
Python (PyPI)
pip install pdf_oxide
JavaScript/WASM (npm)
npm install pdf-oxide-wasm
CLI (Homebrew)
brew install yfedoseev/tap/pdf-oxide
CLI (Scoop — Windows)
scoop bucket add pdf-oxide https://github.com/yfedoseev/scoop-pdf-oxide
scoop install pdf-oxide
CLI (Shell installer)
curl -fsSL https://raw.githubusercontent.com/yfedoseev/pdf_oxide/main/install.sh | sh
CLI (cargo-binstall)
cargo binstall pdf_oxide_cli
MCP Server (for AI assistants)
cargo install pdf_oxide_mcp
Pre-built Binaries
Download archives for Linux, macOS, and Windows from the assets below. Each archive includes both pdf-oxide (CLI) and pdf-oxide-mcp (MCP server).
| Platform | Architecture | Archive |
|---|---|---|
| Linux | x86_64 (glibc) | pdf_oxide-linux-x86_64-*.tar.gz |
| Linux | x86_64 (musl) | pdf_oxide-linux-x86_64-musl-*.tar.gz |
| Linux | ARM64 | pdf_oxide-linux-aarch64-*.tar.gz |
| macOS | x86_64 (Intel) | pdf_oxide-macos-x86_64-*.tar.gz |
| macOS | ARM64 (Apple Silicon) | pdf_oxide-macos-aarch64-*.tar.gz |
| Windows | x86_64 | pdf_oxide-windows-x86_64-*.zip |
See CHANGELOG.md for full details.
v4.9.4
What changed?
- Added `let tx: RBatisTxExecutorGuard = tx.auto_commit();` to support automatic commit/rollback. For example:
async fn transaction(tx: RBatisTxExecutor) -> Result<(), Error> {
let tx = tx.auto_commit(); // defer commit or rollback
log::info!("transaction [{}] start", tx.tx_id());
let _ = Activity::insert(
&tx,
&Activity {
id: Some("3".into()),
name: Some("3".into()),
pc_link: Some("3".into()),
h5_link: Some("3".into()),
pc_banner_img: None,
h5_banner_img: None,
sort: None,
status: Some(3),
remark: Some("3".into()),
create_time: Some(DateTime::now()),
version: Some(1),
delete_flag: Some(1),
},
)
.await;
// If neither commit nor rollback has been called, tx.done == false.
// tx.commit().await?;
Ok(())
}
v0.39.3 - Fix panic when parsing malformed DTD
- #950: Fix subtraction with overflow when parsing a malformed DTD in some cases. Note that we currently do not check the validity of the DTD, so the returned `Event::DocType` may contain the malformed DTD.
Full Changelog: https://github.com/tafia/quick-xml/compare/v0.39.2...v0.39.3
0.34.2: Text layout and selection fixes
egui is an easy-to-use immediate mode GUI for Rust that runs on both web and native.
Try it now: https://www.egui.rs/
egui development is sponsored by Rerun, a startup building an SDK for visualizing streams of multimodal data.
- Fix wrong color of last glyph of selected text #8075 by @emilk
- Fix text selection of centered and right-aligned text #8076 by @emilk
- Fix `Context::is_pointer_over_egui` and `Context::egui_wants_pointer_input` #8081 by @emilk
- Fix centered & right aligned `TextEdit` #8082 by @lucasmerlin
- Optimize text selection performance for large documents #7917 by @rustbasic
Tokio v1.52.2
This release reverts the LIFO slot stealing change introduced in 1.51.0 (#7431), due to its performance impact. (#8100)
Tokio v1.51.1
This release reverts the LIFO slot stealing change introduced in 1.51.0 (#7431), due to its performance impact. (#8100)
v0.3.43 | Cross-binding parity, WASI build target, and a basket of issue fixes.
- `render_page_fit()` now ships in all five bindings (Rust core + Python, Node.js / TypeScript, C#, Go). Picks the largest DPI such that both rendered dimensions fit inside a target pixel box, preserving aspect ratio. No more "what DPI hits 1024×768?" math on the caller's side. Fixes #441, closes #448.
- Idiomatic page iteration parity across bindings. Rust gets `page_indices()`, Python gets `.pages`, Node.js gets `[Symbol.asyncIterator]` (the sync `[Symbol.iterator]` was already there). C# `Pages` and Go `Pages()` were already shipped. Closes #447.
- WASI build target: `cargo build --target wasm32-wasip1` now builds the lib cleanly on stable Rust. Unblocks @RALaBarge's external `pdf-oxide-wasi` stdin→stdout wrapper and any other consumer wanting to embed pdf_oxide in a sandboxed WASI runtime. CI now gates that the WASI build stays green. Closes #214.
- Spurious-table fix on dense word grids: Roland's #405 lands via cherry-pick. A new `has_split_modal_column_groups` validator inspects the column co-occurrence graph across modal rows and rejects candidates whose populated columns split into two or more disconnected components, the signature of two adjacent text flows mis-clustered as one table. Composes cleanly with v0.3.42's `Table::is_real_grid` filter. Validated against the 86-PDF cross-build corpus: 888 / 888 byte-equal. There is zero observable change on common documents; the gate's value is in the safety net for adversarial cases.
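The "largest DPI that fits a pixel box" rule behind `render_page_fit()` reduces to a short calculation, assuming page sizes are given in PDF points (1 pt = 1/72 inch), so that pixels = points × DPI / 72. The function `fit_dpi` below is a hypothetical helper written for illustration; the actual `render_page_fit` signature is not shown in these notes.

```rust
/// Largest DPI at which a page of `width_pt` x `height_pt` points
/// (1 pt = 1/72 inch) still fits inside `max_w_px` x `max_h_px` pixels.
/// Hypothetical helper, not the library's API.
fn fit_dpi(width_pt: f64, height_pt: f64, max_w_px: u32, max_h_px: u32) -> Option<f64> {
    if width_pt <= 0.0 || height_pt <= 0.0 || max_w_px == 0 || max_h_px == 0 {
        return None;
    }
    // Each dimension alone allows this much resolution...
    let dpi_for_width = f64::from(max_w_px) * 72.0 / width_pt;
    let dpi_for_height = f64::from(max_h_px) * 72.0 / height_pt;
    // ...and the tighter of the two keeps both dimensions inside the box.
    Some(dpi_for_width.min(dpi_for_height))
}
```

For a US Letter page (612 × 792 pt) and a 1024 × 768 target, the height is the binding constraint, giving roughly 69.8 DPI. A real renderer also has to decide how to round the resulting pixel dimensions, which this sketch leaves out.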
- #456: `PdfDocument::open(path)` now populates `source_bytes`, unblocking `convert_to_pdf_a()`, the C FFI `pdf_document_get_source_bytes`, and any other API that re-reads the in-memory copy. Path-loaded documents previously got an empty `Vec<u8>` and hit `"Invalid PDF header: File is empty (0 bytes read)"` from the PDF/A converter. Reported by @potatochipcoconut on PR #445.
- #451: Standard14 PostScript fonts with no open-source equivalent (`Symbol`, `ZapfDingbats`) are now downgraded from hard `FontNotEmbedded` errors to a new `KnownUnembeddableFont` warning during PDF/A conversion. A document that's otherwise compliant no longer fails solely because of one symbolic font.
- #395: closed; verified that the off-by-one C# `ExceptionMapper` fix in v0.3.38 actually resolves the reported `RenderPage` → `SignatureException [8500]`. Added a Rust regression test that opens @gevorgter's exact reproducer PDF and asserts `render_page` succeeds. The fixture is pinned in `pdf_oxide_tests`.
- #462: dropped the `scripts/modernize_stubs.py` post-processor and the `python_version = "3.8"` setting from `rylai.toml`. Rylai's default already emits PEP-585 / PEP-604 syntax with `from __future__ import annotations` at the top, so post-processing was duplicate work in opposite directions. Runtime support for Python 3.8/3.9 is unaffected: `.pyi` stubs are type-checker artifacts, never imported at runtime. Reported by @monchin with a clean diagnosis of the root cause.
- `PdfDocument::open(path)` now reads the file once into memory rather than streaming via `BufReader<File>`. The doc comment already promised "Reads the entire file into memory"; this makes it true. Memory usage on `open()` is now equivalent to `from_bytes(std::fs::read(path)?)`. Required by #456; the streaming reader was a partial optimisation no caller could rely on (every code path that touched `source_bytes` already required the in-memory copy).
- `PdfReader` enum collapsed to a single in-memory variant; the unused `File` variant was removed.
- `std::io::{Read, Seek, BufRead, …}` imports are no longer cfg-gated, which is what unblocked the wasm32-wasip1 build target.
- Batch-applied 9 dependabot bumps onto `release/v0.3.43`: CI workflows (`golangci-lint-action` v7→v9, `setup-go` 5.5→6.4, `setup-node` 4.4→6.4, `github-script` SHA refresh, `scorecard-action` 2.4.0→2.4.3), Go (`testify` 1.8→1.11; it was declared but unimported, so it was dropped entirely), JS (`rimraf` 5→6; `@types/node` deferred to a follow-up after a TypeScript-strict shake-out), Python (`onnx` ≥1.14→≥1.19.1).
- The RustCrypto 0.8 stack (`pkcs8 0.11`, `spki 0.8`, `der 0.8`, `digest 0.11`, `crypto-common 0.2`, `block-buffer 0.12`) stays pinned: `rsa 0.10` and `p256` / `p384 0.14` are still RC upstream. See the existing pin note at `Cargo.toml:185-187`.
- New `wasm32-wasip1` build smoke check in `.github/workflows/ci.yml` alongside the existing `wasm32-unknown-unknown` job.
- Regenerated SBOMs (`pdf_oxide_cli/sbom.cdx.json`, `pdf_oxide_mcp/sbom.cdx.json`) for 0.3.43.
- New regression tests:
  - `tests/test_issue_456_path_open_source_bytes.rs`
  - `tests/test_issue_447_page_indices.rs`
  - `tests/test_issue_395_render_page.rs`
- New unit tests on `compliance::converter::downgrade_known_unembeddable_fonts`.
86-PDF stratified corpus comparison (academic, mixed, forms, government, newspapers, theses, plus the three #211 fixtures), 888 sampled (pdf, page, method) triples across extract_text, to_plain_text, to_markdown, to_html:
- v0.3.43 vs v0.3.42 — 888 / 888 byte-equal, zero deltas
- v0.3.43 vs PyPI v0.3.41 — 860 equal, 28 reorder/de-dup, 0 real content losses (same profile as v0.3.42's regression report)
This release exists because of the community. Special thanks to:
- @RolandWArnold: landed the spurious-table fix in #405. After iterating away from an earlier density-gate framing, the shipped form is `has_split_modal_column_groups`: a connected-component check on the column co-occurrence graph across modal rows that flags two-flow grids the regular-row-ratio gate accepts. Roland's doc-comment explicitly flags it as a heuristic, making it easy to revisit later. The fix composes with v0.3.42's struct-tree-aware reading-order rewire without any merge conflict.
- @RALaBarge: built an external WASI binary wrapper for pdf_oxide (pdf-oxide-wasi) and reported in #214 that it required nightly Rust because of an internal `ceil_char_boundary` call. That call was already removed; this release fixes the second hidden blocker (cfg-gated `std::io` imports) and adds CI gating so the WASI target stays green.
- @gevorgter: flagged two rendering-area gaps: the C# binding's misleading `SignatureException` on `RenderPage` (#395, fixed in v0.3.38, regression-guarded here) and the lack of a pixel-dimension render API (#441, closed by `render_page_fit` shipping in all five bindings).
- @potatochipcoconut: surfaced the `convert_to_pdf_a` failure on path-loaded documents while testing PR #445; the investigation traced it to the empty `source_bytes` field and produced the one-line fix in this release (#456).
- @monchin: pointed out (#462) that `scripts/modernize_stubs.py` was redundant work because rylai itself controls the typing flavour via its `python_version` setting, and noted that `office` / `barcodes` / `ocr` feature alignment between `rylai.toml` and the released wheel is worth a follow-up. The cleaner stub pipeline ships in this release.
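The connected-component check credited above can be sketched with a small union-find. This is a simplified illustration under stated assumptions: each row is reduced to the list of column indices it populates, two columns are linked whenever some row populates both, and the function name `has_split_column_groups` is hypothetical. The shipped `has_split_modal_column_groups` works on modal rows and may apply further conditions not described in these notes.

```rust
use std::collections::HashSet;

/// Returns true when the populated columns fall into two or more
/// disconnected groups, where two columns are linked if some row
/// populates both. Simplified illustration, not the shipped validator.
fn has_split_column_groups(rows: &[Vec<usize>], column_count: usize) -> bool {
    /// Union-find root lookup with path halving.
    fn find(parent: &mut [usize], mut x: usize) -> usize {
        while parent[x] != x {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        x
    }

    let mut parent: Vec<usize> = (0..column_count).collect();
    let mut populated = vec![false; column_count];

    for row in rows {
        // Ignore out-of-range indices rather than panicking.
        let mut cols = row.iter().copied().filter(|&c| c < column_count);
        if let Some(first) = cols.next() {
            populated[first] = true;
            for col in cols {
                populated[col] = true;
                // Columns sharing a row belong to the same component.
                let a = find(&mut parent, first);
                let b = find(&mut parent, col);
                parent[a] = b;
            }
        }
    }

    let mut roots = HashSet::new();
    for col in 0..column_count {
        if populated[col] {
            roots.insert(find(&mut parent, col));
        }
    }
    roots.len() >= 2
}
```

A genuine table tends to have rows that bridge its columns, so all populated columns end up in one component; two side-by-side text flows never share a row and therefore split into separate components.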
v4.9.3
What changed?
- Deprecated `MssqlTableMapper`, `MysqlTableMapper`, `PGTableMapper`, and `SqliteTableMapper`.
- Simplified the table sync plugin. For example:
let rb = RBatis::new();
rb.init(rbdc_sqlite::SqliteDriver {}, "sqlite://target/sqlite.db");
let conn = rb.acquire().await?;
_ = RBatis::sync(
&conn,
&rb,
&Activity {
id: Some(String::new()),
name: Some(String::new()),
pc_link: Some(String::new()),
h5_link: Some(String::new()),
pc_banner_img: Some(String::new()),
h5_banner_img: Some(String::new()),
sort: Some(String::new()),
status: Some(0),
remark: Some(String::new()),
create_time: Some(DateTime::now()),
version: Some(0),
delete_flag: Some(0),
},
"activity",
)
.await;