I am transferring an Android SMS database to my iPhone manually, without doing a restore. Because the numbers are stored in the format +1562..., the iPhone does not recognize them and creates a new text thread.
I am trying to change +15629876543 to 5629876543, +17114747474 to 7114747474, and so on.
There are thousands more numbers of varying lengths. Any other number, with more or fewer than 10 digits, should be left untouched.
This seems to be a step in the right direction:
grep -P '(?<!\d)\d{4}(?!\d)' file
(retrieved from "How to grep for groups of n digits, but no more than n?")
Here is a sample of the XML file (EDIT: I added a root element named <root> to make the XML well-formed).
<root>
<sms>
<address>+15629876543</address>
<date>1554966601000</date>
<type>1</type>
<body> Yea, should be true. </body>
<mmsReaded>1</mmsReaded>
<attachments />
</sms>
<sms>
<isMms>1</isMms>
<date>1554968044000</date>
<type>2</type>
<mmsMsgBox>2</mmsMsgBox>
<mmsReaded>1</mmsReaded>
<attachments>
<attachment>
<type>image/jpeg</type>
<body></body>
<name>Screenshot_20190411-002704_Flud.jpg</name>
</attachment>
</attachments>
</sms>
<sms>
<isMms>0</isMms>
<address>+15621234567</address>
<date>1554968778000</date>
<type>1</type>
<isMms>0</isMms>
<address>+17141234534</address>
<date>1558919932000</date>
<type>1</type>
<body>:)</body>
<mmsReaded>1</mmsReaded>
<attachments />
</sms>
<sms>
<isMms>0</isMms>
<address>+17141234567</address>
<date>1558927846000</date>
<type>1</type>
<body>It's so</body>
<mmsReaded>1</mmsReaded>
<attachments />
<isMms>0</isMms>
<address>+17145757575</address>
<date>1543704644000</date>
<type>1</type>
<body>Hey</body>
<mmsReaded>1</mmsReaded>
<attachments />
</sms>
<sms>
<isMms>0</isMms>
<date>1543704676000</date>
<type>2</type>
<body>More text</body>
<mmsReaded>1</mmsReaded>
<attachments />
</sms>
<sms>
<isMms>0</isMms>
<address>+17142323232</address>
<date>1543704736000</date>
<type>1</type>
<body>Lol not even</body>
<mmsReaded>1</mmsReaded>
<attachments />
</sms>
<sms>
<isMms>0</isMms>
<address>+17141010101</address>
<date>1543704748000</date>
<type>1</type>
<body>You do</body>
<mmsReaded>1</mmsReaded>
<attachments />
</sms>
</root>
Be very careful when using sed to edit XML files; it's risky. Instead, you can easily use an XSLT 1.0 processor like xsltproc or Saxon to remove the leading +1 string from the <address> elements. Run a suitable XSLT file against your XML; with the XML from your question, the result should be as desired.
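A minimal sketch of what such a stylesheet could look like, assuming xsltproc is installed. The file name strip-plus-one.xsl and the helper strip_plus_one are made up for illustration, and the rule "exactly +1 followed by 10 digits" is my reading of the question, so adjust it if your data differs.

```shell
#!/bin/sh
# Write a hypothetical XSLT 1.0 stylesheet (the file name is an assumption).
cat > strip-plus-one.xsl <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Identity template: copy every node and attribute unchanged. -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- Assumed rule: only rewrite address values that are exactly
       +1 followed by 10 digits; anything else falls through to
       the identity template and stays untouched. -->
  <xsl:template match="address[starts-with(., '+1')
                               and string-length(.) = 12
                               and translate(substring(., 3), '0123456789', '') = '']">
    <xsl:copy>
      <xsl:value-of select="substring(., 3)"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
EOF

# Hypothetical helper: transform the XML file given as $1 and print the result.
strip_plus_one() {
    xsltproc strip-plus-one.xsl "$1"
}
```

Calling strip_plus_one sms.xml > sms-fixed.xml would then write a transformed copy and leave the original file alone (sms.xml is a placeholder name).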
Yes, you should generally avoid using regular expressions to parse structured data. But this is a pretty simple case if you are 100% sure that all occurrences of + followed by 11 digits are valid targets. You can tell sed to only remove + if it is followed by 11 digits (I assume you meant 11 not 10, since you have 11 in your data).
The -E option enables extended regular expressions, which give a simplified syntax and the ability to use {N} to mean "match N times". So the pattern matches a + (this needs to be escaped as \+ since otherwise it means "match 1 or more") that is followed by exactly 11 digits, then 0 or more non-digits up to a word boundary (\b). The entire match except the + is captured in parentheses, so \1, the replacement, is everything except the +.
A slightly safer approach, since all of your target numbers seem to be in address tags, would be to include the <address> tag in the pattern. Or even, if your problem can be restated as "remove all + from lines where the first non-space string is <address>", you could restrict the substitution to lines that match that.