A transformer that uses the Mozilla Readability library to extract the main content from a web page.
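Example
import { CheerioWebBaseLoader } from "@langchain/community/document_loaders/web/cheerio";
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";
import { MozillaReadabilityTransformer } from "@langchain/community/document_transformers/mozilla_readability";

const loader = new CheerioWebBaseLoader("https://example.com/article");
const docs = await loader.load();
// `chunkSize` sets the maximum chunk length (measured in characters by default).
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 5000,
});
const transformer = new MozillaReadabilityTransformer();
// The sequence passes the loaded documents through the splitter and then the transformer.
const sequence = splitter.pipe(transformer);
// Invoke the sequence to transform the documents into a more readable format.
const newDocuments = await sequence.invoke(docs);
console.log(newDocuments);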
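The example relies on `pipe` to chain the splitter and the transformer. The TypeScript below is a rough, self-contained sketch of that composition, not LangChain's actual `Runnable` implementation. It models each step as a function from documents to documents. `Doc`, `Step`, `Sequence`, `makeSplitter`, and `cleaner` are hypothetical names. `invoke` is synchronous here for simplicity, whereas LangChain's is async. `cleaner` only normalizes whitespace and does not run Readability.

```typescript
// Hypothetical minimal stand-in for a LangChain-style Document (not the real class).
interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// A step maps an array of documents to a new array, as a document transformer does.
// Synchronous here for simplicity; LangChain's invoke() returns a Promise.
type Step = (docs: Doc[]) => Doc[];

class Sequence {
  private readonly steps: Step[];

  constructor(steps: Step[]) {
    this.steps = steps;
  }

  // pipe() returns a new sequence with `next` appended after the existing steps.
  pipe(next: Step): Sequence {
    return new Sequence(this.steps.concat([next]));
  }

  // invoke() feeds each step's output into the next step, left to right.
  invoke(docs: Doc[]): Doc[] {
    return this.steps.reduce((current, step) => step(current), docs);
  }
}

// Hypothetical splitter: cuts each document into consecutive chunks of at most
// `chunkSize` characters, copying the metadata onto every chunk.
function makeSplitter(chunkSize: number): Step {
  return (docs) => {
    const chunks: Doc[] = [];
    for (const doc of docs) {
      for (let i = 0; i < doc.pageContent.length; i += chunkSize) {
        chunks.push({
          pageContent: doc.pageContent.slice(i, i + chunkSize),
          metadata: { ...doc.metadata },
        });
      }
    }
    return chunks;
  };
}

// Hypothetical "cleaner": a placeholder for content extraction that only collapses
// runs of whitespace and trims the ends. It does NOT implement Readability.
const cleaner: Step = (docs) =>
  docs.map((doc) => ({
    pageContent: doc.pageContent.replace(/\s+/g, " ").trim(),
    metadata: { ...doc.metadata },
  }));

const input: Doc[] = [{ pageContent: "a  b  c  d", metadata: { source: "demo" } }];

// Split first, then clean (the same order as splitter.pipe(transformer)).
const splitThenClean = new Sequence([makeSplitter(4)]).pipe(cleaner);
console.log(splitThenClean.invoke(input).map((d) => d.pageContent)); // [ 'a b', 'c', 'd' ]

// Clean first, then split: same steps, different result.
const cleanThenSplit = new Sequence([cleaner]).pipe(makeSplitter(4));
console.log(cleanThenSplit.invoke(input).map((d) => d.pageContent)); // [ 'a b ', 'c d' ]
```

Each step consumes the previous step's output, so the order given to `pipe` changes the result. In the example above, the Readability transformer therefore receives the already-split chunks rather than the whole page.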