I have to split a table into columns.
The text in each of two neighboring columns matches the expression (\S+\s+)+\s*. Sometimes, however, the content of the first column is slightly wider than it should be, leaving only a single space at its end, so the regex for the first column also matches the second one. In the example below, the digits 1 and 2 only mark which column each character belongs to; the file does not actually contain those digits:
111 111 11111 111111 222 2 2222 222 2
11 1 11 11111 2 22 2222 222 2
111 111 11 11 111 1 111 222 2 222 22 2
11 1 11 11111 2 22 2222 222 2
111 111 11111 11111 222 2 2222 222 2
The nominal width of the first column is 20 characters, but if a "word" starts within those 20 characters and extends past column 20, it still belongs to the first column, up to and including the space that delimits it. See the third line of the example.
Is there a way to perform two tests in sequence in one regex: first select the first 20 characters (as in the example), but if the 20th character is not whitespace, select (.{19}\S+\s) as the capturing group for the first column?
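To make the rule concrete, here is a minimal sketch in Python (the question does not name a language or regex flavor, so the `re` module is an assumption). It relies on one pattern doing both tests: take up to 19 characters, then let `\S*` absorb whatever is left of a word that reaches or crosses column 20, then take a single optional whitespace character. When the 20th character is already whitespace, `\S*` matches nothing and the group is exactly 20 characters long. The names `ROW` and `split_row` are made up for illustration, and the sketch assumes the original spacing of the file is intact and that each line is passed without its trailing newline.

```python
import re

# Hypothetical pattern: the first column is nominally 20 characters wide.
#   .{0,19}  up to the first 19 characters (fewer only if the line is shorter)
#   \S*      the rest of a word that reaches or crosses column 20, if any
#   \s?      the single whitespace character that delimits the first column
#   (.*)     everything else belongs to the second column
ROW = re.compile(r"^(.{0,19}\S*\s?)(.*)$")


def split_row(line):
    """Split one table line into (first_column, second_column)."""
    match = ROW.match(line.rstrip("\n"))
    first, second = match.groups()
    # Padding is dropped here; keep the raw groups if the widths matter.
    return first.rstrip(), second.rstrip()


if __name__ == "__main__":
    normal = "abc def".ljust(20) + "xyz 12"
    overflow = "aaaa bbbb cccc dd eeeeee xyz"
    print(split_row(normal))
    print(split_row(overflow))
```

In the second sample line the word "eeeeee" starts at character 19 and runs past character 20, so it stays in the first column. An alternation such as (.{19}\s|.{19}\S+\s) expresses the same two cases more literally, at the cost of repeating the prefix.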