Package: contentanalysis 1.1.0.9000

contentanalysis: Scientific Content and Citation Analysis from PDF Documents

Provides comprehensive tools for extracting and analyzing scientific content from PDF documents, including citation extraction, reference matching, text analysis, and bibliometric indicators. Supports multi-column PDF layouts, 'CrossRef' API <https://www.crossref.org/documentation/retrieve-metadata/rest-api/> integration, and advanced citation parsing.

Authors: Massimo Aria [cre, aut, cph], Corrado Cuccurullo [aut]

contentanalysis_1.1.0.9000.tar.gz
contentanalysis_1.1.0.9000.zip (r-4.7), contentanalysis_1.1.0.9000.zip (r-4.6), contentanalysis_1.1.0.9000.zip (r-4.5)
contentanalysis_1.1.0.9000.tgz (r-4.6-any), contentanalysis_1.1.0.9000.tgz (r-4.5-any)
contentanalysis_1.1.0.9000.tar.gz (r-4.7-any), contentanalysis_1.1.0.9000.tar.gz (r-4.6-any)
contentanalysis_1.1.0.9000.tgz (r-4.6-emscripten)
manual.pdf | manual.html
card.svg | card.png
contentanalysis/json (API)
NEWS

# Install 'contentanalysis' in R:
install.packages('contentanalysis', repos = c('https://massimoaria.r-universe.dev', 'https://cloud.r-project.org'))
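
A guarded variant of the install command can be sketched as follows. It uses only base R functions (`requireNamespace`, `install.packages`, `getNamespaceExports`) and the two repository URLs given above; the variable names `repos` and `exports` are illustrative. It installs the package only when it is missing, then lists the exported names so they can be compared with the Exports list on this page.

```r
# Repositories taken from the install command on this page.
repos <- c("https://massimoaria.r-universe.dev", "https://cloud.r-project.org")

# Install only if the package is not already available.
if (!requireNamespace("contentanalysis", quietly = TRUE)) {
  install.packages("contentanalysis", repos = repos)
}

# If the package is now available, list its exported names.
# This page reports 24 exports for this build; other builds may differ.
if (requireNamespace("contentanalysis", quietly = TRUE)) {
  exports <- sort(getNamespaceExports("contentanalysis"))
  print(exports)
}
```

Listing the r-universe repository first means the development build shown here is found before any other build of the package; dependencies not hosted there are resolved from the second repository.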

Bug tracker: https://github.com/massimoaria/contentanalysis/issues

7.59 score, 2 stars, 2 packages, 21 scripts, 22k downloads, 24 exports, 60 dependencies

Last updated from: ce6d66cdee. Checks: 9 OK. Indexed: yes.

Target | Result | Time
linux-devel-x86_64 | OK | 176
source / vignettes | OK | 212
linux-release-x86_64 | OK | 174
macos-release-arm64 | OK | 137
macos-oldrel-arm64 | OK | 148
windows-devel | OK | 138
windows-release | OK | 123
windows-oldrel | OK | 121
wasm-release | OK | 130

Exports: %>%, analyze_scientific_content, calculate_readability_indices, calculate_word_distribution, classify_rhetorical_moves, create_citation_network, describe_citation_clusters, extract_doi_from_pdf, extract_pdf_metadata, gemini_content_ai, get_crossref_references, get_example_paper, map_citations_to_segments, match_citations_to_references, merge_text_chunks_named, normalize_references_section, parse_references_section, pdf2txt_auto, pdf2txt_multicolumn_safe, plot_citation_clusters, plot_word_distribution, process_large_pdf, readability_multiple, split_into_sections

Dependencies: askpass, base64enc, bslib, cachem, cli, cpp11, curl, digest, dplyr, evaluate, fastmap, fontawesome, fs, generics, glue, highr, htmltools, htmlwidgets, httr, httr2, igraph, janeaustenr, jquerylib, jsonlite, knitr, lattice, lifecycle, magrittr, Matrix, memoise, mime, openalexR, openssl, pdftools, pillar, pkgconfig, purrr, qpdf, R6, rappdirs, Rcpp, rlang, rmarkdown, sass, SnowballC, stringi, stringr, sys, tibble, tidyr, tidyselect, tidytext, tinytex, tokenizers, utf8, vctrs, visNetwork, withr, xfun, yaml

contentanalysis

Rendered from introduction.Rmd using knitr::rmarkdown on May 19 2026.

Last update: 2025-10-23
Started: 2025-10-06

Readme and manuals

Help Manual

Help page | Topics
Enhanced scientific content analysis with citation extraction | analyze_scientific_content
Calculate readability indices for text | calculate_readability_indices
Calculate word distribution across text segments or sections | calculate_word_distribution
Classify Rhetorical Moves in Scientific Text | classify_rhetorical_moves
Create Citation Co-occurrence Network | create_citation_network
Describe Citation Clusters by Section Using Reference Title N-grams | describe_citation_clusters
Extract DOI from PDF Metadata (Legacy Function) | extract_doi_from_pdf
Extract DOI and Metadata from PDF | extract_pdf_metadata
Process Content with Google Gemini AI | gemini_content_ai
Retrieve rich metadata from the CrossRef API for a given DOI | get_crossref_references
Get path to example paper | get_example_paper
Match citations to references | match_citations_to_references
Merge Text Chunks into Named Sections | merge_text_chunks_named
Normalize references section formatting | normalize_references_section
Parse references section from text | parse_references_section
Import PDF with Automatic Section Detection | pdf2txt_auto
Extract text from multi-column PDF with structure preservation | pdf2txt_multicolumn_safe
Plot Citation Cluster Descriptions | plot_citation_clusters
Create interactive word distribution plot | plot_word_distribution
Print method for rhetorical move analysis | print.rhetorical_move_analysis
Process Large PDF Documents with Google Gemini AI | process_large_pdf
Calculate readability indices for multiple texts | readability_multiple
Remove All Types of Tables (Markdown and Plain Text) | remove_all_tables
Remove Markdown Code Block Markers | remove_code_blocks
Remove Figure Captions | remove_figure_caps
Split document text into sections | split_into_sections