28 Regular expressions library [re]

28.12 Regular expression iterators [re.iter]

28.12.1 Class template regex_iterator [re.regiter]

The class template regex_iterator is an iterator adaptor. It represents a new view of an existing iterator sequence, by enumerating all the occurrences of a regular expression within that sequence. A regex_iterator uses regex_search to find successive regular expression matches within the sequence from which it was constructed. After the iterator is constructed, and every time operator++ is used, the iterator finds and stores a value of match_results<BidirectionalIterator>. If the end of the sequence is reached (regex_search returns false), the iterator becomes equal to the end-of-sequence iterator value. The default constructor constructs an end-of-sequence iterator object, which is the only legitimate iterator to be used for the end condition. The result of operator* on an end-of-sequence iterator is not defined. For any other iterator value a const match_results<BidirectionalIterator>& is returned. The result of operator-> on an end-of-sequence iterator is not defined. For any other iterator value a const match_results<BidirectionalIterator>* is returned. It is impossible to store things into regex_iterators. Two end-of-sequence iterators are always equal. An end-of-sequence iterator is not equal to a non-end-of-sequence iterator. Two non-end-of-sequence iterators are equal when they are constructed from the same arguments.

namespace std {
  template <class BidirectionalIterator, 
            class charT = typename iterator_traits<
              BidirectionalIterator>::value_type,
              class traits = regex_traits<charT> >
  class regex_iterator {
  public:
     typedef basic_regex<charT, traits>           regex_type;
     typedef match_results<BidirectionalIterator> value_type;
     typedef std::ptrdiff_t                       difference_type;
     typedef const value_type*                    pointer;
     typedef const value_type&                    reference;
     typedef std::forward_iterator_tag            iterator_category;
   
     regex_iterator();
     regex_iterator(BidirectionalIterator a, BidirectionalIterator b, 
                    const regex_type& re, 
                    regex_constants::match_flag_type m =
                      regex_constants::match_default);
     regex_iterator(BidirectionalIterator a, BidirectionalIterator b,
                    const regex_type&& re,
                    regex_constants::match_flag_type m =
                      regex_constants::match_default) = delete;
     regex_iterator(const regex_iterator&);
     regex_iterator& operator=(const regex_iterator&);
     bool operator==(const regex_iterator&) const;
     bool operator!=(const regex_iterator&) const;
     const value_type& operator*() const;
     const value_type* operator->() const;
     regex_iterator& operator++();
     regex_iterator operator++(int);
  private:
     BidirectionalIterator                begin;  // exposition only
     BidirectionalIterator                end;    // exposition only
     const regex_type*                    pregex; // exposition only
     regex_constants::match_flag_type     flags;  // exposition only
     match_results<BidirectionalIterator> match;  // exposition only
  }; 
}

An object of type regex_iterator that is not an end-of-sequence iterator holds a zero-length match if match[0].matched == true and match[0].first == match[0].second. [ Note: For example, this can occur when the part of the regular expression that matched consists only of an assertion (such as '^', '$', '\b', '\B').  — end note ]

28.12.1.1 regex_iterator constructors [re.regiter.cnstr]

regex_iterator();

Effects: Constructs an end-of-sequence iterator.

regex_iterator(BidirectionalIterator a, BidirectionalIterator b, const regex_type& re, regex_constants::match_flag_type m = regex_constants::match_default);

Effects: Initializes begin and end to a and b, respectively, sets pregex to &re, sets flags to m, then calls regex_search(begin, end, match, *pregex, flags). If this call returns false the constructor sets *this to the end-of-sequence iterator.

28.12.1.2 regex_iterator comparisons [re.regiter.comp]

bool operator==(const regex_iterator& right) const;

Returns: true if *this and right are both end-of-sequence iterators or if the following conditions all hold:

  • begin == right.begin,

  • end == right.end,

  • pregex == right.pregex,

  • flags == right.flags, and

  • match[0] == right.match[0];

otherwise false.

bool operator!=(const regex_iterator& right) const;

Returns: !(*this == right).

28.12.1.3 regex_iterator indirection [re.regiter.deref]

const value_type& operator*() const;

Returns: match.

const value_type* operator->() const;

Returns: &match.

28.12.1.4 regex_iterator increment [re.regiter.incr]

regex_iterator& operator++();

Effects: Constructs a local variable start of type BidirectionalIterator and initializes it with the value of match[0].second.

If the iterator holds a zero-length match and start == end the operator sets *this to the end-of-sequence iterator and returns *this.

Otherwise, if the iterator holds a zero-length match the operator calls regex_search(start, end, match, *pregex, flags | regex_constants::match_not_null | regex_constants::match_
continuous). If the call returns true the operator returns *this. Otherwise the operator increments start and continues as if the most recent match was not a zero-length match.

If the most recent match was not a zero-length match, the operator sets flags to flags | regex_constants :: match_prev_avail and calls regex_search(start, end, match, *pregex, flags). If the call returns false the iterator sets *this to the end-of-sequence iterator. The iterator then returns *this.

In all cases in which the call to regex_search returns true, match.prefix().first shall be equal to the previous value of match[0].second, and for each index i in the half-open range [0, match.size()) for which match[i].matched is true, match[i].position() shall return distance(begin, match[i].first).

Note: This means that match[i].position() gives the offset from the beginning of the target sequence, which is often not the same as the offset from the sequence passed in the call to regex_search.  — end note ]

It is unspecified how the implementation makes these adjustments.

Note: This means that a compiler may call an implementation-specific search function, in which case a user-defined specialization of regex_search will not be called.  — end note ]

regex_iterator operator++(int);

Effects:

regex_iterator tmp = *this;
++(*this);
return tmp;

28.12.2 Class template regex_token_iterator [re.tokiter]

The class template regex_token_iterator is an iterator adaptor; that is to say it represents a new view of an existing iterator sequence, by enumerating all the occurrences of a regular expression within that sequence, and presenting one or more sub-expressions for each match found. Each position enumerated by the iterator is a sub_match class template instance that represents what matched a particular sub-expression within the regular expression.

When class regex_token_iterator is used to enumerate a single sub-expression with index -1 the iterator performs field splitting: that is to say it enumerates one sub-expression for each section of the character container sequence that does not match the regular expression specified.

After it is constructed, the iterator finds and stores a value regex_iterator<BidirectionalIterator> position and sets the internal count N to zero. It also maintains a sequence subs which contains a list of the sub-expressions which will be enumerated. Every time operator++ is used the count N is incremented; if N exceeds or equals subs.size(), then the iterator increments member position and sets count N to zero.

If the end of sequence is reached (position is equal to the end of sequence iterator), the iterator becomes equal to the end-of-sequence iterator value, unless the sub-expression being enumerated has index -1, in which case the iterator enumerates one last sub-expression that contains all the characters from the end of the last regular expression match to the end of the input sequence being enumerated, provided that this would not be an empty sub-expression.

The default constructor constructs an end-of-sequence iterator object, which is the only legitimate iterator to be used for the end condition. The result of operator* on an end-of-sequence iterator is not defined. For any other iterator value a const sub_match<BidirectionalIterator>& is returned. The result of operator-> on an end-of-sequence iterator is not defined. For any other iterator value a const sub_match<BidirectionalIterator>* is returned.

It is impossible to store things into regex_token_iterators. Two end-of-sequence iterators are always equal. An end-of-sequence iterator is not equal to a non-end-of-sequence iterator. Two non-end-of-sequence iterators are equal when they are constructed from the same arguments.

namespace std {
  template <class BidirectionalIterator, 
            class charT = typename iterator_traits<
              BidirectionalIterator>::value_type,
              class traits = regex_traits<charT> >
  class regex_token_iterator  {
  public:
    typedef basic_regex<charT, traits>       regex_type;
    typedef sub_match<BidirectionalIterator> value_type;
    typedef std::ptrdiff_t                   difference_type;
    typedef const value_type*                pointer;
    typedef const value_type&                reference;
    typedef std::forward_iterator_tag        iterator_category;

    regex_token_iterator();
    regex_token_iterator(BidirectionalIterator a, BidirectionalIterator b, 
                        const regex_type& re, 
                        int submatch = 0, 
                        regex_constants::match_flag_type m =
                          regex_constants::match_default);
    regex_token_iterator(BidirectionalIterator a, BidirectionalIterator b, 
                        const regex_type& re, 
                        const std::vector<int>& submatches, 
                        regex_constants::match_flag_type m =
                          regex_constants::match_default);
    regex_token_iterator(BidirectionalIterator a, BidirectionalIterator b,
                        const regex_type& re,
                        initializer_list<int> submatches,
                        regex_constants::match_flag_type m =
                          regex_constants::match_default);
    template <std::size_t N>
      regex_token_iterator(BidirectionalIterator a, BidirectionalIterator b, 
                        const regex_type& re, 
                        const int (&submatches)[N], 
                        regex_constants::match_flag_type m =
                          regex_constants::match_default);
    regex_token_iterator(BidirectionalIterator a, BidirectionalIterator b,
                         const regex_type&& re,
                         int submatch = 0,
                         regex_constants::match_flag_type m =
                           regex_constants::match_default) = delete;
    regex_token_iterator(BidirectionalIterator a, BidirectionalIterator b,
                         const regex_type&& re,
                         const std::vector<int>& submatches,
                         regex_constants::match_flag_type m =
                           regex_constants::match_default) = delete;
    regex_token_iterator(BidirectionalIterator a, BidirectionalIterator b,
                         const regex_type&& re,
                         initializer_list<int> submatches,
                         regex_constants::match_flag_type m =
                           regex_constants::match_default) = delete;
    template <std::size_t N>
    regex_token_iterator(BidirectionalIterator a, BidirectionalIterator b,
                         const regex_type&& re,
                         const int (&submatches)[N],
                         regex_constants::match_flag_type m =
                           regex_constants::match_default) = delete;                          
    regex_token_iterator(const regex_token_iterator&);
    regex_token_iterator& operator=(const regex_token_iterator&);
    bool operator==(const regex_token_iterator&) const;
    bool operator!=(const regex_token_iterator&) const;
    const value_type& operator*() const;
    const value_type* operator->() const;
    regex_token_iterator& operator++();
    regex_token_iterator operator++(int);
  private:
    typedef
      regex_iterator<BidirectionalIterator, charT, traits> position_iterator; // exposition only
    position_iterator position;                                               // exposition only
    const value_type* result;                                                 // exposition only
    value_type suffix;                                                        // exposition only
    std::size_t N;                                                            // exposition only
    std::vector<int> subs;                                                    // exposition only
  };
}

A suffix iterator is a regex_token_iterator object that points to a final sequence of characters at the end of the target sequence. In a suffix iterator the member result holds a pointer to the data member suffix, the value of the member suffix.match is true, suffix.first points to the beginning of the final sequence, and suffix.second points to the end of the final sequence.

Note: For a suffix iterator, data member suffix.first is the same as the end of the last match found, and suffix.second is the same as the end of the target sequence  — end note ]

The current match is (*position).prefix() if subs[N] == -1, or (*position)[subs[N]] for any other value of subs[N].

28.12.2.1 regex_token_iterator constructors [re.tokiter.cnstr]

regex_token_iterator();

Effects: Constructs the end-of-sequence iterator.

regex_token_iterator(BidirectionalIterator a, BidirectionalIterator b, const regex_type& re, int submatch = 0, regex_constants::match_flag_type m = regex_constants::match_default); regex_token_iterator(BidirectionalIterator a, BidirectionalIterator b, const regex_type& re, const std::vector<int>& submatches, regex_constants::match_flag_type m = regex_constants::match_default); regex_token_iterator(BidirectionalIterator a, BidirectionalIterator b, const regex_type& re, initializer_list<int> submatches, regex_constants::match_flag_type m = regex_constants::match_default); template <std::size_t N> regex_token_iterator(BidirectionalIterator a, BidirectionalIterator b, const regex_type& re, const int (&submatches)[N], regex_constants::match_flag_type m = regex_constants::match_default);

Requires: Each of the initialization values of submatches shall be >= -1.

Effects: The first constructor initializes the member subs to hold the single value submatch. The second constructor initializes the member subs to hold a copy of the argument submatches. The third and fourth constructors initialize the member subs to hold a copy of the sequence of integer values pointed to by the iterator range [submatches.begin(),submatches.end()) and [&submatches,&submatches + N), respectively.

Each constructor then sets N to 0, and position to position_iterator(a, b, re, m). If position is not an end-of-sequence iterator the constructor sets result to the address of the current match. Otherwise if any of the values stored in subs is equal to -1 the constructor sets *this to a suffix iterator that points to the range [a,b), otherwise the constructor sets *this to an end-of-sequence iterator.

28.12.2.2 regex_token_iterator comparisons [re.tokiter.comp]

bool operator==(const regex_token_iterator& right) const;

Returns: true if *this and right are both end-of-sequence iterators, or if *this and right are both suffix iterators and suffix == right.suffix; otherwise returns false if *this or right is an end-of-sequence iterator or a suffix iterator. Otherwise returns true if position == right.position, N == right.N, and subs == right.subs. Otherwise returns false.

bool operator!=(const regex_token_iterator& right) const;

Returns: !(*this == right).

28.12.2.3 regex_token_iterator indirection [re.tokiter.deref]

const value_type& operator*() const;

Returns: *result.

const value_type* operator->() const;

Returns: result.

28.12.2.4 regex_token_iterator increment [re.tokiter.incr]

regex_token_iterator& operator++();

Effects: Constructs a local variable prev of type position_iterator, initialized with the value of position.

If *this is a suffix iterator, sets *this to an end-of-sequence iterator.

Otherwise, if N + 1 < subs.size(), increments N and sets result to the address of the current match.

Otherwise, sets N to 0 and increments position. If position is not an end-of-sequence iterator the operator sets result to the address of the current match.

Otherwise, if any of the values stored in subs is equal to -1 and prev->suffix().length() is not 0 the operator sets *this to a suffix iterator that points to the range [prev->suffix().first,prev->suffix().second).

Otherwise, sets *this to an end-of-sequence iterator.

Returns: *this

regex_token_iterator& operator++(int);

Effects: Constructs a copy tmp of *this, then calls ++(*this).

Returns: tmp.