.TH HTMLQ 1 "January 2024" "Version 1.0" "User Commands" .SH NAME htmlq \- query HTML documents using CSS selectors .SH SYNOPSIS .B htmlq [\fIFLAGS\fR] \fIhtml_path_or_minus\fR \fIcss_selector\fR .SH DESCRIPTION .B htmlq is a command-line tool that allows querying HTML documents using CSS selectors, similar to jq but for HTML. It provides a way to extract specific elements from HTML documents using standard CSS selector syntax. .SH OPTIONS .TP .BR \-1 ", " \-\-first\-only Return only the first matching element .TP .BR \-e ", " \-\-errors Print warning messages .TP .BR \-t ", " \-\-text Print only the innerText of matched elements .TP .BR \-a ", " \-\-select\-attribute Print the value of the attribute on matched elements. Supersedes -t. .SH ARGUMENTS .TP .I html_path_or_minus Path to HTML file to parse, or '-' to read from standard input .TP .I css_selector CSS selector to query the HTML document .SH SUPPORTED SELECTORS .TP .B Simple Selectors .RS .TP \fBtag name\fR (e.g., h1) Select elements by tag name .TP \fB.class\fR Select elements by class name .TP \fB#id\fR Select elements by id .RE .TP .B Combinators .RS .TP \fBspace\fR Descendant combinator .TP \fB>\fR Child combinator .TP \fB+\fR Next sibling combinator .TP \fB~\fR Subsequent sibling combinator .RE .SH EXIT STATUS .TP .B 0 Success .TP .B 1 No matches found .TP .B 2 Invalid arguments .TP .B 3 Failed to read input .SH EXAMPLES .TP Extract all paragraphs with class 'content': .B htmlq input.html "p.content" .TP Read HTML from stdin and get first div with id 'main': echo "..." | \fBhtmlq\fR - "div#main" --first-only .TP Get only text content from all h1 headers: .B htmlq -t input.html "h1" .SH LIMITATIONS Column and namespace combinators are not supported. The universal selector (*) and attribute selectors are not implemented. .SH BUGS Report bugs at: https://git.cafeduvesper.net/guilian/htmlq/issues .SH AUTHOR Written by Guilian Celin--Davanture