Tuesday, November 4, 2014

Sort and replace identifiers in a sentence

I found this post on LinuxQuestions that interested me, so I decided to try my hand at it. What I wrote is slightly more general in that it sorts any identifiers matching a pattern rather than just identifiers with numbers. The program makes three passes over the sentence. The first pass extracts the identifiers and sorts them. The second substitutes each matching identifier with "{}", a placeholder that Python's string formatting function can fill in. The third pass is the actual substitution using the string formatter.
import re

word_split_regex = re.compile(r"[\W\s]*")
id_regex = re.compile(r"id\d+")
natsort_regex = re.compile('([0-9]+)')

# from http://stackoverflow.com/questions/4836710/
#  does-python-have-a-built-in-function-for-string-natural-sort#18415320
def natural_sort_key(s):
    return [int(text) if text.isdigit() else text.lower()
            for text in re.split(natsort_regex, s)]
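
def main(s):
    # Pass 1: collect the identifiers and sort them in natural order.
    b = sorted(id_regex.findall(s), key=natural_sort_key)
    # Pass 2: replace each identifier with a "{}" placeholder.
    x = id_regex.sub("{}", s)
    # Pass 3: fill the placeholders with the sorted identifiers, in order.
    print(x.format(*b))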

if __name__ == "__main__":
    import sys
    if len(sys.argv) > 1:
        main(sys.argv[1])

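To see what the three passes do, here is a small self-contained sketch that runs them on a made-up sentence. The `sort_ids` name and the sample sentence are my own additions for illustration; the regexes and the sort key are the same as in the program above. It also prints a plain string sort of the identifiers, which shows why the natural sort key is needed.

```python
import re

id_regex = re.compile(r"id\d+")
natsort_regex = re.compile(r"([0-9]+)")


def natural_sort_key(s):
    # Split into digit and non-digit runs, so "id10" becomes ["id", 10, ""].
    return [int(text) if text.isdigit() else text.lower()
            for text in natsort_regex.split(s)]


def sort_ids(sentence):
    # Hypothetical helper: same logic as main(), but returns the result.
    # Pass 1: extract the identifiers and sort them naturally.
    ids = sorted(id_regex.findall(sentence), key=natural_sort_key)
    # Pass 2: swap each identifier for a "{}" placeholder.
    template = id_regex.sub("{}", sentence)
    # Pass 3: fill the placeholders with the sorted identifiers, in order.
    return template.format(*ids)


sample = "id10 comes before id2, and id1 is last"  # made-up input

# A plain string sort puts "id10" ahead of "id2".
print(sorted(id_regex.findall(sample)))  # ['id1', 'id10', 'id2']

# The natural sort key compares the numeric parts as integers.
print(sort_ids(sample))  # id1 comes before id2, and id10 is last
```

One caveat with this approach: because the third pass goes through the string formatter, a sentence that already contains literal "{" or "}" characters will likely confuse it, so those would need to be escaped first.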