Jsdiff
A JavaScript text differencing implementation. Try it out in the online demo.
Based on the algorithm proposed in "An O(ND) Difference Algorithm and its Variations" (Myers, 1986).
Installation
npm install diff --save
Getting started
Imports
In an environment where you can use imports, everything you need can be imported directly from diff. e.g.
ESM:
import {diffChars, createPatch} from 'diff';
CommonJS:
const {diffChars, createPatch} = require('diff');
If you want to serve jsdiff to a web page without using a module system, you can use dist/diff.js or dist/diff.min.js. These create a global called Diff that contains the entire JsDiff API as its properties.
Usage
jsdiff's diff functions all take an old text and a new text and perform three steps:
- Split both texts into arrays of "tokens". What constitutes a token varies; in diffChars, each character is a token, while in diffLines, each line is a token.
- Find the smallest set of single-token insertions and deletions needed to transform the first array of tokens into the second.
  This step depends upon having some notion of a token from the old array being "equal" to one from the new array, and this notion of equality affects the results. Usually two tokens are equal if === considers them equal, but some of the diff functions use an alternative notion of equality or have options to configure it. For instance, by default diffChars("Foo", "FOOD") will require two deletions (o, o) and three insertions (O, O, D), but diffChars("Foo", "FOOD", {ignoreCase: true}) will require just one insertion (of a D), since ignoreCase causes o and O to be considered equal.
- Return an array representing the transformation computed in the previous step as a series of change objects. The array is ordered from the start of the input to the end, and each change object represents inserting one or more tokens, deleting one or more tokens, or keeping one or more tokens.
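The three steps above can be sketched in plain JavaScript. This is a minimal illustration, not jsdiff's implementation: it uses a simple longest-common-subsequence table rather than the Myers O(ND) algorithm, and the helper names (tokenizeChars, diffTokens) as well as the field names on the returned objects (value, added, removed, count) are assumptions made for this sketch.

```javascript
// Step 1: tokenize. Here each Unicode code point is one token.
function tokenizeChars(text) {
  return Array.from(text);
}

// Steps 2 and 3: find a smallest set of insertions/deletions, then
// report it as a list of merged change objects.
function diffTokens(oldTokens, newTokens, equals = (a, b) => a === b) {
  const n = oldTokens.length;
  const m = newTokens.length;

  // lcs[i][j] = length of the longest common subsequence of
  // oldTokens[i..] and newTokens[j..] (plain O(n*m) table, not Myers).
  const lcs = Array.from({ length: n + 1 }, () => new Array(m + 1).fill(0));
  for (let i = n - 1; i >= 0; i--) {
    for (let j = m - 1; j >= 0; j--) {
      lcs[i][j] = equals(oldTokens[i], newTokens[j])
        ? lcs[i + 1][j + 1] + 1
        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }

  // Append a token to the last change if it is of the same kind,
  // otherwise start a new change object.
  const changes = [];
  const push = (token, added, removed) => {
    const last = changes[changes.length - 1];
    if (last && last.added === added && last.removed === removed) {
      last.value += token;
      last.count += 1;
    } else {
      changes.push({ value: token, added, removed, count: 1 });
    }
  };

  // Walk both arrays from start to end, following the table.
  let i = 0;
  let j = 0;
  while (i < n || j < m) {
    if (i < n && j < m && equals(oldTokens[i], newTokens[j])) {
      push(newTokens[j], false, false); // kept
      i++;
      j++;
    } else if (i < n && (j === m || lcs[i + 1][j] >= lcs[i][j + 1])) {
      push(oldTokens[i], false, true); // deleted
      i++;
    } else {
      push(newTokens[j], true, false); // inserted
      j++;
    }
  }
  return changes;
}

const caseSensitive = diffTokens(tokenizeChars('Foo'), tokenizeChars('FOOD'));
const caseInsensitive = diffTokens(
  tokenizeChars('Foo'),
  tokenizeChars('FOOD'),
  (a, b) => a.toLowerCase() === b.toLowerCase()
);

console.log(caseSensitive.map((c) => c.count));   // kept 1, deleted 2, inserted 3
console.log(caseInsensitive.map((c) => c.count)); // kept 3, inserted 1
```

Swapping the equality function changes the result in the same way the ignoreCase example above describes: the case-sensitive run needs two deletions and three insertions, the case-insensitive run only one insertion.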
API
- diffChars(oldStr, newStr[, options]) - diffs two blocks of text, treating each character as a token.
  ("Characters" here means Unicode code points - the elements you get when you loop over a string with a for ... of ... loop.)
  Returns a list of change objects.
  Options
  - ignoreCase: If true, the uppercase and lowercase forms of a character are considered equal. Defaults to false.
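To make the "code points" remark concrete, here is a small sketch in plain JavaScript (no jsdiff involved; codePointTokens is a hypothetical helper) showing how a for ... of loop sees a string that contains a character outside the Basic Multilingual Plane.

```javascript
// Collect the elements a for...of loop yields for a string:
// one entry per Unicode code point, not per UTF-16 code unit.
function codePointTokens(text) {
  const tokens = [];
  for (const ch of text) {
    tokens.push(ch);
  }
  return tokens;
}

const text = 'a\u{1F600}b'; // "a", an emoji (U+1F600), "b"
console.log(text.length);                  // 4 (UTF-16 code units)
console.log(codePointTokens(text).length); // 3 (code points)
```

The emoji occupies two UTF-16 code units but is a single code point, so it counts as one character for tokenizing purposes.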
- diffWords(oldStr, newStr[, options]) - diffs two blocks of text, treating each word and each punctuation mark as a token. Whitespace is ignored when computing the diff (but preserved as far as possible in the final change objects).
  Returns a list of change objects.
  Options
  - ignoreCase: Same as in diffChars. Defaults to false.
  - intlSegmenter: An optional Intl.Segmenter object (which must have a granularity of 'word') for diffWords to use to split the text into words.
    By default, diffWords does not use an Intl.Segmenter, just some regexes for splitting text into words. This will tend to give worse results than Intl.Segmenter would, but ensures the results are consistent across environments; Intl.Segmenter behaviour is only loosely specified and the implementations in browsers could in principle change dramatically in future. If you want to use diffWords with an Intl.Segmenter but ensure it behaves the same whatever environment you run it in, use an Intl.Segmenter polyfill instead of the JavaScript engine's native Intl.Segmenter implementation.
    Using an Intl.Segmenter should allow better word-level diffing of non-English text than the default behaviour. For instance, Intl.Segmenters can generally identify via built-in dictionaries which sequences of adjacent Chinese characters form words, allowing word-level diffing of Chinese. By specifying a language when instantiating the segmenter (e.g. new Intl.Segmenter('sv', {granularity: 'word'})) you can also support language-specific rules, like treating Swedish's colon-separated contractions (like k:a for kyrka) as single words; by default this would be seen as two words separated by a colon.
- diffWordsWithSpace(oldStr, newStr[, options]) - diffs two blocks of text, treating each word, punctuation mark, newline, or run of (non-newline) whitespace as a token.
- diffLines(oldStr, newStr[, options]) - diffs two blocks of text, treating each line as a token.
  Options
  - ignoreWhitespace: true to ignore leading and trailing whitespace characters when checking if two lines are equal. Defaults to false.
  - ignoreNewlineAtEof: true to ignore a missing newline character at the end of the last line when comparing it to other lines. (By default, the line 'b\n' in text 'a\nb\nc' is not considered equal to the line 'b' in text 'a\nb'; this option makes them be considered equal.) Ignored if ignoreWhitespace or newlineIsToken are also true.
  - stripTrailingCr: true to remove all trailing CR (\r) characters before performing the diff. Defaults to false. This helps to get a useful diff when diffing UNIX text files against Windows text files.
  - newlineIsToken: true to treat the newline character at the end of each line as its own token. This allows for changes to the newline structure to occur independently of the line content and to be treated as such. In general this is the more human friendly form of diffLines; the default behavior with this option turned off is better suited for patches and other computer friendly output. Defaults to false.
  Note that while using ignoreWhitespace in combination with newlineIsToken is not an error, results may not be as expected. With ignoreWhitespace: true and newlineIsToken: false, changing a completely empty line to contain some spaces is treated as a non-change, but with ignoreWhitespace: true and newlineIsToken: true, it is treated as an insertion. This is because the content of a completely blank line is not a token at all in newlineIsToken mode.
  Returns a list of change objects.
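The effect of line tokenization on these options is easier to see with the tokens written out. The sketch below is a hypothetical tokenizer (tokenizeLines is not part of jsdiff, and the library's own tokenizer may differ in details); it only illustrates why 'b\n' and 'b' are different tokens by default, and why a blank line has no content token when newlines are tokens of their own.

```javascript
// Hypothetical line tokenizer for illustration only.
function tokenizeLines(text, { newlineIsToken = false } = {}) {
  // Split on newlines; the capture group keeps each newline in the result.
  const parts = text.split(/(\r?\n)/).filter((part) => part !== '');
  if (newlineIsToken) {
    return parts; // line content and newlines are separate tokens
  }
  const lines = [];
  for (const part of parts) {
    const isNewline = /^\r?\n$/.test(part);
    const last = lines[lines.length - 1];
    if (isNewline && last !== undefined && !last.endsWith('\n')) {
      lines[lines.length - 1] = last + part; // attach newline to its line
    } else {
      lines.push(part); // line content, or the newline of a blank line
    }
  }
  return lines;
}

console.log(tokenizeLines('a\nb\nc')); // [ 'a\n', 'b\n', 'c' ]
console.log(tokenizeLines('a\nb'));    // [ 'a\n', 'b' ]
console.log(tokenizeLines('a\n\nb', { newlineIsToken: true }));
// [ 'a', '\n', '\n', 'b' ]
```

In the first two outputs the second token differs ('b\n' versus 'b'), which is the mismatch ignoreNewlineAtEof is described as smoothing over. In the last output the blank line contributes only a newline token, so there is no content token for ignoreWhitespace to compare.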
- diffSentences(oldStr, newStr[, options]) - diffs two blocks of text, treating each sentence, and the whitespace between each pair of sentences, as a token. The characters ., !, and ?, when followed by whitespace, are treated as marking the end of a sentence; nothing else besides the end of the string is considered to mark a sentence end.
  (For more sophisticated detection of sentence breaks, including support for non-English punctuation, consider instead tokenizing with an Intl.Segmenter with granularity: 'sentence' and passing the result to diffArrays.)
  Returns a list of change objects.
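The Intl.Segmenter route mentioned above could look roughly like this. It is a sketch that assumes the JavaScript engine provides Intl.Segmenter; sentenceTokens is a hypothetical helper, and the exact segment boundaries depend on the engine's implementation, so none are shown here.

```javascript
// Hypothetical helper: split text into sentence tokens with the
// engine's Intl.Segmenter. Each token is a plain string.
function sentenceTokens(text, locale = 'en') {
  const segmenter = new Intl.Segmenter(locale, { granularity: 'sentence' });
  return Array.from(segmenter.segment(text), (s) => s.segment);
}

const oldTokens = sentenceTokens('It works. Ship it!');
const newTokens = sentenceTokens('It works. Do not ship it yet!');

// The segments cover the whole input, so joining them restores it.
console.log(oldTokens.join('') === 'It works. Ship it!'); // true

// oldTokens and newTokens are ordinary arrays of strings, so they can
// be passed to diffArrays(oldTokens, newTokens) as described above.
```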
- diffCss(oldStr, newStr[, options]) - diffs two blocks of text, comparing CSS tokens.
  Returns a list of change objects.
- diffJson(oldObj, newObj[, options]) - diffs two JSON-serializable objects by first serializing them to prettily-formatted JSON and then treating each line of the JSON as a token. Object properties are ordered alphabetically in the serialized JSON, so the order of properties in the objects being compared doesn't affect the result.
  Returns a list of change objects.
  Options
  - stringifyReplacer: A custom replacer function. Operates similarly to the replacer parameter to JSON.stringify(), but must be a function.
  - undefinedReplacement: A value to replace undefined with. Ignored if a stringifyReplacer is provided.
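The idea of serializing with sorted keys before diffing line by line can be sketched as follows. This is not jsdiff's serializer: sortKeys and toCanonicalJson are hypothetical helpers, the two-space indentation is an assumption, and the sort used is JavaScript's default string ordering.

```javascript
// Hypothetical helper: copy a value with object keys in sorted order,
// recursing into nested objects and arrays.
function sortKeys(value) {
  if (Array.isArray(value)) {
    return value.map(sortKeys);
  }
  if (value !== null && typeof value === 'object') {
    const sorted = {};
    for (const key of Object.keys(value).sort()) {
      sorted[key] = sortKeys(value[key]);
    }
    return sorted;
  }
  return value;
}

// Serialize to pretty-printed JSON so that each property sits on its
// own line and can be treated as one token by a line-based diff.
function toCanonicalJson(value) {
  return JSON.stringify(sortKeys(value), null, 2);
}

const first = toCanonicalJson({ b: 1, a: { d: 2, c: 3 } });
const second = toCanonicalJson({ a: { c: 3, d: 2 }, b: 1 });
console.log(first === second); // true
```

Because both objects serialize to the same text, a line-based diff of the two strings reports no changes even though the properties were written in a different order.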
- diffArrays(oldArr, newArr[, options]) - diffs two arrays of tokens, comparing each item for strict equality (===).
  Options
  - comparator: function(left, right) for custom equality checks
  Returns a list of change objects.
- createTwoFilesPatch(oldFileName, newFileName, oldStr, newStr[, oldHeader[, newHeader[, options]]]) - creates a unified diff patch by first computing a diff with diffLines and then serializing it to unified diff format.
  Parameters:
  - oldFileName: String to be output in the filename section of the patch for the removals
  - newFileName: String to be output in the filename section of the patch for the additions
  - oldStr: Original string value
  - newStr: New string value
  - oldHeader: Optional additional information to include in the old file header. Default: undefined.
  - newHeader: Optional additional information to include in the new file header. Default: undefined.
  - options: An object with options.
    - context: describes how many lines of context should be included. You can set this to Number.MAX_SAFE_INTEGER or Infinity to include the entire file content in one hunk.
    - ignoreWhitespace: Same as in diffLines. Defaults to false.
    - stripTrailingCr: Same as in diffLines. Defaults to false.
    - headerOptions: Configures the format of patch headers in the returned patch. (Note these are distinct from hunk headers, which are a mandatory part of the unified diff format and not configurable.) Has three subfields (all default to true):
      - includeIndex: whether to include a line like Index: filename.txt at the start of the patch header. (Even if this is true, this line will be omitted if oldFileName and newFileName are not identical.)
      - includeUnderline: whether to include `============================================
