OCaml implementation of the Mozilla Public Suffix service

+183 -27
+17 -1
.gitignore
- _build
+ # OCaml build artifacts
+ _build/
+ *.install
+ *.merlin
+
+ # Third-party sources (fetch locally with opam source)
+ third_party/
+
+ # Editor and OS files
+ .DS_Store
+ *.swp
+ *~
+ .vscode/
+ .idea/
+
+ # Opam local switch
+ _opam/
+1
.ocamlformat
+ version=0.28.1
+53
.tangled/workflows/build.yml
+ when:
+   - event: ["push", "pull_request"]
+     branch: ["main"]
+
+ engine: nixery
+
+ dependencies:
+   nixpkgs:
+     - shell
+     - stdenv
+     - findutils
+     - binutils
+     - libunwind
+     - ncurses
+     - opam
+     - git
+     - gawk
+     - gnupatch
+     - gnum4
+     - gnumake
+     - gnutar
+     - gnused
+     - gnugrep
+     - diffutils
+     - gzip
+     - bzip2
+     - gcc
+     - ocaml
+     - pkg-config
+
+ steps:
+   - name: opam
+     command: |
+       opam init --disable-sandboxing -a -y
+   - name: repo
+     command: |
+       opam repo add aoah https://tangled.org/anil.recoil.org/aoah-opam-repo.git
+   - name: switch
+     command: |
+       opam install . --confirm-level=unsafe-yes --deps-only
+   - name: build
+     command: |
+       opam exec -- dune build
+   - name: switch-test
+     command: |
+       opam install . --confirm-level=unsafe-yes --deps-only --with-test
+   - name: test
+     command: |
+       opam exec -- dune runtest --verbose
+   - name: doc
+     command: |
+       opam install -y odoc
+       opam exec -- dune build @doc
+15
LICENSE.md
+ ISC License
+
+ Copyright (c) 2025 Anil Madhavapeddy <anil@recoil.org>
+
+ Permission to use, copy, modify, and distribute this software for any
+ purpose with or without fee is hereby granted, provided that the above
+ copyright notice and this permission notice appear in all copies.
+
+ THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+63 -11
README.md
- # ocaml-publicsuffix
+ # ocaml-publicsuffix - Public Suffix List for OCaml

- Public Suffix List implementation for OCaml.
+ An OCaml library for parsing and querying the Mozilla Public Suffix List (PSL) to determine public suffixes and registrable domains. This library implements the algorithm specified at [publicsuffix.org](https://publicsuffix.org/list/) and provides efficient lookups using a pre-compiled trie data structure.

- Parse and query the Mozilla Public Suffix List (PSL) to determine public suffixes and registrable domains. Supports ICANN and private domain sections, wildcard rules, and exception rules per the PSL specification.
+ ## Key Features

- ## Data File
+ - **Complete PSL Support**: Handles ICANN and private domain sections
+ - **Full Rule Coverage**: Supports normal rules, wildcard rules (e.g., `*.uk`), and exception rules (e.g., `!parliament.uk`)
+ - **Efficient Lookups**: Pre-compiled trie structure for fast domain matching
+ - **Punycode Support**: Automatic handling of internationalized domain names via the `punycode` library
+ - **Type Safety**: Uses the `domain-name` library for validated domain representations

- The `data/public_suffix_list.dat` file contains the public suffix list data. This file can be fetched from:
+ ## Usage

- https://publicsuffix.org/list/public_suffix_list.dat
+ ```ocaml
+ (* Determine the registrable domain (public suffix + one label) *)
+ let domain = Domain_name.of_string_exn "www.example.com" in
+ match Publicsuffix.registrable_domain domain with
+ | Some reg_domain -> Format.printf "Registrable: %a\n" Domain_name.pp reg_domain
+ | None -> Format.printf "No registrable domain\n"
+ (* Output: Registrable: example.com *)

- To update the data file to the latest version:
+ (* Find the public suffix *)
+ match Publicsuffix.public_suffix domain with
+ | Some suffix -> Format.printf "Suffix: %a\n" Domain_name.pp suffix
+ | None -> Format.printf "No public suffix\n"
+ (* Output: Suffix: com *)

- ```bash
- curl -o data/public_suffix_list.dat https://publicsuffix.org/list/public_suffix_list.dat
+ (* Check if a domain is itself a public suffix *)
+ let is_suffix = Publicsuffix.is_public_suffix domain in
+ Format.printf "Is public suffix: %b\n" is_suffix
+ (* Output: Is public suffix: false *)
  ```

- ## Building
+ For domains with wildcards and exceptions:
+
+ ```ocaml
+ (* Example with wildcard rule: *.uk *)
+ let domain = Domain_name.of_string_exn "example.uk" in
+ match Publicsuffix.public_suffix domain with
+ | Some suffix -> Format.printf "Suffix: %a\n" Domain_name.pp suffix
+ | None -> ()
+ (* Output: Suffix: uk *)
+
+ (* Example with exception rule: !parliament.uk *)
+ let domain = Domain_name.of_string_exn "parliament.uk" in
+ match Publicsuffix.registrable_domain domain with
+ | Some reg_domain -> Format.printf "Registrable: %a\n" Domain_name.pp reg_domain
+ | None -> ()
+ (* Output: Registrable: parliament.uk *)
+ ```
+
+ ## Installation
+
+ ```
+ opam install publicsuffix
+ ```
+
+ ## Updating the Public Suffix List Data
+
+ The `data/public_suffix_list.dat` file contains the PSL data, which is compiled into the library at build time. To update to the latest version:

  ```bash
- opam exec -- dune build @check
+ curl -o data/public_suffix_list.dat https://publicsuffix.org/list/public_suffix_list.dat
+ opam exec -- dune build
  ```

  ## Documentation
+
+ API documentation is available via:
+
+ ```
+ opam install publicsuffix
+ odig doc publicsuffix
+ ```
+
+ Or build locally:

  ```bash
  opam exec -- dune build @doc
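
For readers who want to see how normal, wildcard, and exception rules interact, the following is a minimal, dependency-free sketch of the matching algorithm described at publicsuffix.org. It is not the library's implementation: it uses a plain list instead of a trie, plain strings instead of `Domain_name.t`, and a hand-written toy rule set. The `*.ck` and `!www.ck` rules are the examples used in the list's own documentation. All type and function names here (`rule`, `suffix_length`, `public_suffix`, `registrable_domain`) are illustrative assumptions.

```ocaml
(* Labels are stored TLD first, so co.uk is ["uk"; "co"]. *)
type rule =
  | Normal of string list     (* e.g. co.uk *)
  | Wildcard of string list   (* e.g. *.ck is Wildcard ["ck"] *)
  | Exception of string list  (* e.g. !www.ck is Exception ["ck"; "www"] *)

(* Toy rule set, hand-written for illustration only. *)
let rules = [
  Normal ["com"];
  Normal ["uk"];
  Normal ["uk"; "co"];
  Wildcard ["ck"];
  Exception ["ck"; "www"];
]

(* [is_prefix p l] is true when list [p] is a prefix of list [l]. *)
let rec is_prefix p l =
  match p, l with
  | [], _ -> true
  | x :: p', y :: l' -> String.equal x y && is_prefix p' l'
  | _ :: _, [] -> false

(* Number of labels in the public suffix of [labels] (TLD first).
   An exception rule wins outright and drops its leftmost label;
   otherwise the longest matching rule wins; if nothing matches,
   the implicit "*" rule gives a one-label suffix. *)
let suffix_length labels =
  let matching_exception =
    List.find_map
      (function
        | Exception e when is_prefix e labels -> Some (List.length e - 1)
        | _ -> None)
      rules
  in
  match matching_exception with
  | Some len -> len
  | None ->
    List.fold_left
      (fun best r ->
        match r with
        | Normal p when is_prefix p labels -> max best (List.length p)
        | Wildcard p
          when is_prefix p labels && List.length labels > List.length p ->
          (* the wildcard consumes one extra label *)
          max best (List.length p + 1)
        | _ -> best)
      1 rules

let take n l = List.filteri (fun i _ -> i < n) l
let to_domain labels = String.concat "." (List.rev labels)
let to_labels domain =
  List.rev (String.split_on_char '.' (String.lowercase_ascii domain))

let public_suffix domain =
  let labels = to_labels domain in
  to_domain (take (suffix_length labels) labels)

(* The registrable domain is the public suffix plus one more label. *)
let registrable_domain domain =
  let labels = to_labels domain in
  let k = suffix_length labels in
  if List.length labels > k then Some (to_domain (take (k + 1) labels))
  else None
```

With this toy rule set, `public_suffix "foo.ck"` is `"foo.ck"` (the wildcard makes every direct child of `ck` a public suffix), while `registrable_domain "www.ck"` is `Some "www.ck"` because the exception rule overrides the wildcard.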
+4
dune
+ ; Root dune file
+
+ ; Ignore third_party directory (for fetched dependency sources)
+ (data_only_dirs third_party)
+5 -9
dune-project
- (lang dune 3.0)
+ (lang dune 3.18)

  (name publicsuffix)
-
- (version 0.1.0)

  (generate_opam_files true)

  (license ISC)
-
  (authors "Anil Madhavapeddy")
-
- (maintainers "Anil Madhavapeddy")
-
- (source
-  (github avsm/ocaml-publicsuffix))
+ (homepage "https://tangled.org/@anil.recoil.org/ocaml-publicsuffix")
+ (maintainers "Anil Madhavapeddy <anil@recoil.org>")
+ (bug_reports "https://tangled.org/@anil.recoil.org/ocaml-publicsuffix/issues")
+ (maintenance_intent "(latest)")

  (package
   (name publicsuffix)
+5
gen/gen_psl.ml
+ (*---------------------------------------------------------------------------
+    Copyright (c) 2025 Anil Madhavapeddy <anil@recoil.org>. All rights reserved.
+    SPDX-License-Identifier: ISC
+   ---------------------------------------------------------------------------*)
+
  (* gen_psl.ml - Generate OCaml code from public_suffix_list.dat

     This parser reads the Public Suffix List and generates OCaml source code
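
The body of `gen_psl.ml` is not shown in this diff, so the following is only a hedged sketch of what reading rule lines from `public_suffix_list.dat` can look like, based on the published list format: `//` starts a comment, blank lines are skipped, only the text up to the first whitespace counts, `!` marks an exception, a leading `*.` marks a wildcard, and `// ===BEGIN PRIVATE DOMAINS===` switches sections. The names `section`, `kind`, `rule`, `parse_line`, and `parse_lines` are assumptions made for illustration, not the generator's actual definitions.

```ocaml
type section = Icann | Private
type kind = Normal | Wildcard | Exception

(* Labels are stored TLD first, e.g. co.uk becomes ["uk"; "co"]. *)
type rule = { kind : kind; labels : string list; section : section }

(* Drop the first [n] characters of [s]. *)
let drop n s = String.sub s n (String.length s - n)

(* Parse one line of the list; comments and blank lines yield None. *)
let parse_line section line =
  let line =
    String.trim (String.map (fun c -> if c = '\t' then ' ' else c) line)
  in
  if line = "" || String.starts_with ~prefix:"//" line then None
  else
    (* Only the text up to the first whitespace is part of the rule. *)
    let text = List.hd (String.split_on_char ' ' line) in
    let kind, body =
      if String.starts_with ~prefix:"!" text then (Exception, drop 1 text)
      else if String.starts_with ~prefix:"*." text then (Wildcard, drop 2 text)
      else (Normal, text)
    in
    Some { kind; labels = List.rev (String.split_on_char '.' body); section }

(* Parse all lines, tracking which section each rule belongs to. *)
let parse_lines lines =
  let _, rules =
    List.fold_left
      (fun (section, acc) line ->
        let trimmed = String.trim line in
        if String.starts_with ~prefix:"// ===BEGIN PRIVATE DOMAINS===" trimmed
        then (Private, acc)
        else if String.starts_with ~prefix:"// ===BEGIN ICANN DOMAINS===" trimmed
        then (Icann, acc)
        else
          match parse_line section line with
          | Some r -> (section, r :: acc)
          | None -> (section, acc))
      (Icann, []) lines
  in
  List.rev rules
```

A generator could then fold the resulting rules into a trie and print it as OCaml source. This sketch ignores Punycode conversion of internationalized labels, which a real generator would need to handle.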
+5
lib/publicsuffix.ml
+ (*---------------------------------------------------------------------------
+    Copyright (c) 2025 Anil Madhavapeddy <anil@recoil.org>. All rights reserved.
+    SPDX-License-Identifier: ISC
+   ---------------------------------------------------------------------------*)
+
  (* publicsuffix.ml - Public Suffix List implementation for OCaml

     This implements the PSL algorithm as specified at:
+5
lib/publicsuffix.mli
+ (*---------------------------------------------------------------------------
+    Copyright (c) 2025 Anil Madhavapeddy <anil@recoil.org>. All rights reserved.
+    SPDX-License-Identifier: ISC
+   ---------------------------------------------------------------------------*)
+
  (** Public Suffix List implementation for OCaml

      This library provides functions to query the Mozilla Public Suffix List (PSL)
+5 -6
publicsuffix.opam
  # This file is generated by dune, edit dune-project instead
  opam-version: "2.0"
- version: "0.1.0"
  synopsis: "Public Suffix List implementation for OCaml"
  description:
    "Parse and query the Mozilla Public Suffix List (PSL) to determine public suffixes and registrable domains. Supports ICANN and private domain sections, wildcard rules, and exception rules per the PSL specification."
- maintainer: ["Anil Madhavapeddy"]
+ maintainer: ["Anil Madhavapeddy <anil@recoil.org>"]
  authors: ["Anil Madhavapeddy"]
  license: "ISC"
- homepage: "https://github.com/avsm/ocaml-publicsuffix"
- bug-reports: "https://github.com/avsm/ocaml-publicsuffix/issues"
+ homepage: "https://tangled.org/@anil.recoil.org/ocaml-publicsuffix"
+ bug-reports: "https://tangled.org/@anil.recoil.org/ocaml-publicsuffix/issues"
  depends: [
    "ocaml" {>= "4.14.0"}
-   "dune" {>= "3.0" & >= "3.0"}
+   "dune" {>= "3.18" & >= "3.0"}
    "domain-name" {>= "0.4.0"}
    "punycode" {>= "0.1.0"}
    "alcotest" {with-test}
···
      "@doc" {with-doc}
    ]
  ]
- dev-repo: "git+https://github.com/avsm/ocaml-publicsuffix.git"
+ x-maintenance-intent: ["(latest)"]
+5
test/psl_test.ml
+ (*---------------------------------------------------------------------------
+    Copyright (c) 2025 Anil Madhavapeddy <anil@recoil.org>. All rights reserved.
+    SPDX-License-Identifier: ISC
+   ---------------------------------------------------------------------------*)
+
  (* psl_test.ml - Command-line tool for testing the Public Suffix List library

     Usage: