From 4a63a5713ba610ddac5ef610860bb97fdb3f977e Mon Sep 17 00:00:00 2001
From: Guilian
Date: Mon, 20 Jan 2025 22:19:23 +0100
Subject: [PATCH] chore: manpage

---
 .gitignore |  2 ++
 htmlq.1    | 87 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 89 insertions(+)
 create mode 100644 htmlq.1

diff --git a/.gitignore b/.gitignore
index 2a20147..a55d951 100644
--- a/.gitignore
+++ b/.gitignore
@@ -4,3 +4,5 @@ main
 
 # Same but with correct name
 htmlq
+# gzipped manpage
+htmlq.1.gz
diff --git a/htmlq.1 b/htmlq.1
new file mode 100644
index 0000000..64d05c8
--- /dev/null
+++ b/htmlq.1
@@ -0,0 +1,87 @@
+.TH HTMLQ 1 "January 2024" "Version 1.0" "User Commands"
+.SH NAME
+htmlq \- query HTML documents using CSS selectors
+.SH SYNOPSIS
+.B htmlq
+[\fIFLAGS\fR] \fIhtml_path_or_minus\fR \fIcss_selector\fR
+.SH DESCRIPTION
+.B htmlq
+is a command-line tool that allows querying HTML documents using CSS selectors, similar to jq but for HTML. It provides a way to extract specific elements from HTML documents using standard CSS selector syntax.
+.SH OPTIONS
+.TP
+.BR \-1 ", " \-\-first\-only
+Return only the first matching element
+.TP
+.BR \-e ", " \-\-errors
+Print warning messages
+.TP
+.BR \-t ", " \-\-text
+Print only the innerText of matched elements
+.SH ARGUMENTS
+.TP
+.I html_path_or_minus
+Path to HTML file to parse, or '-' to read from standard input
+.TP
+.I css_selector
+CSS selector to query the HTML document
+.SH SUPPORTED SELECTORS
+.TP
+.B Simple Selectors
+.RS
+.TP
+\fBtag name\fR (e.g., h1)
+Select elements by tag name
+.TP
+\fB.class\fR
+Select elements by class name
+.TP
+\fB#id\fR
+Select elements by id
+.RE
+.TP
+.B Combinators
+.RS
+.TP
+\fBspace\fR
+Descendant combinator
+.TP
+\fB>\fR
+Child combinator
+.TP
+\fB+\fR
+Next sibling combinator
+.TP
+\fB~\fR
+Subsequent sibling combinator
+.RE
+.SH EXIT STATUS
+.TP
+.B 0
+Success
+.TP
+.B 1
+No matches found
+.TP
+.B 2
+Invalid arguments
+.TP
+.B 3
+Failed to read input
+.SH EXAMPLES
+.TP
+Extract all paragraphs with class 'content':
+.B htmlq
+input.html "p.content"
+.TP
+Read HTML from stdin and get first div with id 'main':
+echo "..." | \fBhtmlq\fR - "div#main" --first-only
+.TP
+Get only text content from all h1 headers:
+.B htmlq
+-t input.html "h1"
+.SH LIMITATIONS
+Column and namespace combinators are not supported. The universal selector (*) and attribute selectors are not implemented.
+.SH BUGS
+Report bugs at: https://git.cafeduvesper.net/guilian/htmlq/issues
+.SH AUTHOR
+Written by Guilian Celin--Davanture