C++ Lambdas In-Depth


Overview

Lambdas are C++'s construct for defining anonymous function objects inline, and they are the main tool for functional-style programming in the language. This post first discusses the different flavors of basic C++ lambdas with examples. It then dives deeper into more advanced topics. The intention of this post is to be a single point of reference for all things C++ lambda related. With that motivation, I will try to keep it updated over time.

Outline

  • Getting Started: Hello Lambdas
  • Basics: Flavors of Lambdas
  • Exploring the Internals
  • Overloading Lambdas
  • Immediately Invoked Function/Lambda Expressions(IIFE/IILE)
  • References and Suggested Reading

    Getting Started: Hello Lambdas

    We learn by example and by direct experience because there are real limits to the adequacy of verbal instruction.(Malcolm Gladwell)

    In the spirit of Gladwell's quote, let's start with simple examples of lambdas in action (Try Online):

    // Flavor 1: No Input, Only Output 
    auto hello_lambda = [](){ return "Hello World of Lambdas"; };
    std::cout << hello_lambda() << std::endl; // > Hello World of Lambdas
    
    // Flavor 2: Input + Output
    auto welcoming_lambda = [](std::string name){ return "Welcoming " + name + " to the world of lambdas"; };
    std::cout << welcoming_lambda("Kratos") << std::endl;  // > Welcoming Kratos to the world of lambdas
    
    // Flavor 3: Input + Output + State
    auto warning_msg = "On our journey we will be attacked by all manner of creature."; // Kratos to Atreus(GoW 4)
    auto warning_lambda = [warning=warning_msg](std::string name){ return name + ": " + warning; };
    std::cout << warning_lambda("Kratos") << std::endl; // > Kratos: On our journey we will be attacked by....
    

    First look at and try to run the examples. Here’s a bit of explanation to smooth things:

    • A lambda is similar to a function: (Input)/[State] -> {Logic} -> {Output}.
    • The Input can be provided using the ( ... ) brackets: eg. (std::string name).
    • The Logic goes inside the { ... }: eg. { "Welcoming " + name + .... }.
    • The Output also goes inside the { ... } generally as a return statement. eg. {return ...}.
    • Finally comes the State, which is expressed inside []. This means that certain properties (variables in the C++ world) are associated with the lambda. For example, I can call warning_lambda("Kratos") or warning_lambda("Atreus"). In both cases the input changes, but the associated variable warning used in the logic stays the same.
      In C++, the process by which a lambda's state is set up is known as capture, e.g. [warning=warning_msg].
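To make the Input versus State split concrete, here is a minimal sketch. The helper name make_warner is my own invention for illustration and is not part of the examples above; it simply returns a lambda whose state is fixed when the lambda is created.

```cpp
#include <string>
#include <utility>

// make_warner is a hypothetical helper, not from the post.
// The returned lambda captures `warning` once (its State) and receives
// `name` anew on every call (its Input).
auto make_warner(std::string warning) {
    return [warning = std::move(warning)](const std::string& name) {
        return name + ": " + warning;   // Logic + Output
    };
}
```

Calling the result with "Kratos" and then with "Atreus" changes only the input; the captured warning is the same in both calls.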

    Basics: Flavors of Lambdas

    Here is a figure that summarizes everything in this section. Give it an initial glance; we'll then look at some examples to understand what's going on.

    Let’s see some examples to understand the different flavors of capture (the others should hopefully be clear!):

    • No Capture Mode:
      Can you guess the outputs (1) and (2) of the example shown below?
      auto hello_world = [](){ return "Hello World"; };
      std::cout << hello_world() << std::endl; // (1)
        
      std::string kratos_name = "Kratos"; // a std::string, so that "Hello " + kratos_name is a valid expression
      auto hello_kratos = [](){ return "Hello " + kratos_name; }; // (2)
      

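      (1): Hello World and (2): Error!
      The reason for (2) is that with an empty capture list [] the lambda body cannot use local variables of the enclosing scope. It sees only its inputs (plus globals and static variables, which never need capturing). kratos_name is a local variable that was not captured, so compilation fails.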
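A detail worth spelling out: the restriction applies to local variables. Here is a small sketch, with names (kGreeting, greet_without_capture) that I made up for illustration, showing that a capture-less lambda can still use a namespace-scope variable, and that it converts to a plain function pointer precisely because it carries no state.

```cpp
#include <string>

// Namespace-scope object: it has static storage, so no capture is needed.
const std::string kGreeting = "Hello ";

// Hypothetical demo function, not from the post.
std::string greet_without_capture(const std::string& name) {
    auto greet = [](const std::string& n) { return kGreeting + n; };
    // Only a lambda with an empty capture list converts to a function pointer.
    std::string (*fp)(const std::string&) = greet;
    return fp(name);
}
```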

    • Init Capture Mode:
      Let’s try to fix the error from above:
      auto hello_kratos = [kratos_name=std::string{"Kratos"}]()
                          { return "Hello " + kratos_name; };
      std::cout << hello_kratos() << std::endl; // > Hello Kratos
      

      By default, captures are by value. To init capture by reference, write &state_var=var_to_capture. Several captures can be combined in a comma-separated list. Let’s see an example(Try Online):

      // for the ""s -> std::string inference by auto
      using namespace std::string_literals;
      auto good_greeting = "Hello"s;
      auto person_name = "Kratos"s;
      // capture 'good_greeting' by value into 'greeting'
      // also capture 'person_name' by reference into 'name'
      auto greet_person = [greeting=good_greeting, &name=person_name] 
                          // can ignore the () if no input arguments
                          {return greeting + " " + name;};
      std::cout << greet_person() << std::endl; // > Hello Kratos
      person_name = "Atreus"s;
      good_greeting = "Die!"s; // Nope, not happening!
      std::cout << greet_person() << std::endl; // > Hello Atreus
      

      One thing to note here is that the () brackets can be optionally omitted if there are no input arguments.
      Wait! But what about captures of rvalue references or capture by move. Init captures can do those too! Here’s how(Try Online):

      // for the ""s -> std::string inference by auto
      using namespace std::string_literals;
      auto person_name = "Kratos"s;
      //capture 'person_name' by moving it into 'name'
      auto hello_person = [name=std::move(person_name)] 
                          {return "Hello " + name;};
      std::cout << hello_person() << std::endl; // > Hello Kratos
      std::cout << (person_name == "") << std::endl; // > 1 in practice (a moved-from std::string is valid but unspecified; usually empty)
      

      In fact init captures (available since C++14) are the only way to perform a capture by move.
      Also note that during init capture state_var=var_to_capture, the type of state_var follows auto type deduction rules.
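Capture by move matters most for move-only types. The sketch below uses std::unique_ptr for that, and a second function illustrates the auto deduction point. Both function names are hypothetical and only serve this illustration.

```cpp
#include <memory>
#include <utility>

// Hypothetical demo function, not from the post.
int read_through_moved_pointer() {
    auto ptr = std::make_unique<int>(42);  // std::unique_ptr cannot be copied
    // A plain [ptr] capture would not compile (it needs a copy);
    // the init capture moves ownership into the lambda instead.
    auto reader = [p = std::move(ptr)] { return *p; };
    // Unlike std::string, a moved-from unique_ptr is guaranteed to be null.
    return (ptr == nullptr) ? reader() : -1;
}

// Hypothetical demo function, not from the post.
int read_a_copy() {
    int value = 10;
    const int& cref = value;
    // auto deduction drops the reference: `copy` is an independent int.
    auto read = [copy = cref] { return copy; };
    value = 20;      // does not affect the lambda's copy
    return read();   // still the value seen at capture time
}
```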

    • Basic Capture Modes: The basic capture modes were part of the original lambda feature in C++11. They allow us to specify a default method of capturing variables from the enclosing scope that are used within a lambda’s {}. There are two options: [=] for default by value capture and [&] for default by reference capture. Let’s redo the init capture examples(Try Online):
      // for the ""s -> std::string inference by auto
      using namespace std::string_literals;
      auto good_greeting = "Hello"s;
      auto person_name = "Kratos"s;
      auto greet_person_by_value = [=] // <- default by value
                          {return good_greeting + " " + person_name;};
      auto greet_person_by_ref = [&] // <- default by reference
                          {return good_greeting + " " + person_name;};
      std::cout << greet_person_by_value() << std::endl; // > Hello Kratos
      std::cout << greet_person_by_ref() << std::endl; // > Hello Kratos
      person_name = "Atreus"s;
      good_greeting = "Bye"s;
      std::cout << greet_person_by_value() << std::endl; // > Hello Kratos
      std::cout << greet_person_by_ref() << std::endl; // > Bye Atreus
      

      Avoid default capture modes. (Item 31, Effective Modern C++, Scott Meyers)

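      Default by-reference capture can lead to dangling references. Default by-value capture can give the false impression that the lambda is self-contained: it may still hold a copied pointer (notably this) that can dangle, and it silently depends on static or global variables, which are not captured at all. I’d recommend reading that item for more on this topic.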
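As a rough sketch of the dangling problem (make_greeter is a hypothetical factory I made up, and std::function is just one convenient way to return the lambda), consider a lambda that outlives the scope it was created in:

```cpp
#include <functional>
#include <string>

// Hypothetical factory, not from the post.
std::function<std::string()> make_greeter(const std::string& name) {
    std::string greeting = "Hello " + name;
    // return [&] { return greeting; };
    //   ^ would dangle: `greeting` is destroyed when make_greeter returns,
    //     so calling the lambda later is undefined behavior.
    return [greeting] { return greeting; };  // explicit by-value capture: safe
}
```

Naming the captured variable explicitly also makes it obvious at a glance what the lambda depends on.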
      The basic capture modes also allow capturing individual variables directly(similar to init capture but without the assignment) by value or reference as [var_to_capture_by_value,&var_to_capture_by_ref]. Here’s an example(Try Online):

      using namespace std::string_literals;
      auto good_greeting = "Hello"s;
      auto person_name = "Kratos"s;
      // capture 'good_greeting' by value
      // also capture 'person_name' by reference
      auto greet_person = [good_greeting, &person_name] 
                          // can ignore the () if no input arguments
                          {return good_greeting + " " + person_name;};
      std::cout << greet_person() << std::endl; // > Hello Kratos
      person_name = "Atreus"s;
      good_greeting = "Die!"s; // Nope, not happening!
      std::cout << greet_person() << std::endl; // > Hello Atreus
      

      Finally, we can also mix a default capture with individual variable captures. E.g. capturing everything by value but var_a and var_b by reference is written [=,&var_a,&var_b]. The flipped version (everything by reference, var_a and var_b by value) is written [&,var_a,var_b]. A default capture can also be combined with init captures, as long as the default comes first, e.g. [&, name=std::move(person_name)].
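Here is a short sketch of the two mixed forms with individual variable captures. The function and variable names are hypothetical, chosen only for this illustration.

```cpp
// Hypothetical demo function, not from the post.
int mixed_capture_total() {
    int total = 0;
    int factor = 3;
    // [&, factor]: everything by reference, except factor, which is copied.
    auto add_scaled = [&, factor](int x) { total += factor * x; };
    factor = 100;    // too late: the lambda holds its own copy (3)
    add_scaled(2);   // total becomes 6
    add_scaled(4);   // total becomes 18
    return total;
}

// Hypothetical demo function, not from the post.
int mixed_capture_calls() {
    int calls = 0;
    int step = 5;
    // [=, &calls]: everything by value, except calls, which is a reference.
    auto next = [=, &calls](int x) { ++calls; return x + step; };
    next(1);
    next(2);
    return calls;    // the outer variable was updated through the reference
}
```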

    • Auto Parameter Type Inference(aka The Generic Lambda): An example should make things more than clear.(Try Online)
      using namespace std::string_literals;
      auto add_stuff = [](auto stuff1, const auto& stuff2){ return stuff1 + stuff2; };
      std::cout << add_stuff(1, 3) << std::endl; // 4
      std::cout << add_stuff(2, 3.0) << std::endl; // 5 (a double, which std::cout prints without a trailing .0)
      std::cout << add_stuff("hello "s, "world"s) << std::endl; // hello world
      

      Yes, the input argument types can be declared as auto, and they are deduced following the auto type deduction rules. Such a lambda is termed a Generic Lambda (available since C++14).
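One place where generic lambdas pay off is inside templates and standard algorithms, where a single lambda has to work for several element types. A minimal sketch, assuming a made-up helper sum_all built on std::accumulate:

```cpp
#include <numeric>
#include <string>
#include <utility>
#include <vector>

// sum_all is a hypothetical helper, not from the post.
template <class T>
T sum_all(const std::vector<T>& items, T init) {
    // One generic lambda serves every T that supports operator+.
    auto add = [](auto acc, const auto& item) { return acc + item; };
    return std::accumulate(items.begin(), items.end(), std::move(init), add);
}
```

The same lambda body is instantiated once for int and once for std::string, depending on what the caller passes in.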

    • Mutable Lambdas: By default, a lambda's call operator is const qualified, i.e. it doesn’t allow us to modify the copy of a variable captured by value. The mutable keyword allows us to override this default behavior.
      int val = 3;
      // can be any by value capture type
      // auto plus_one_fail = [v = val]() { v++; return v;}; // error: increment of read-only variable
      auto plus_one = [v = val]() mutable { v++; return v;};
      std::cout << plus_one() << std::endl; // 4
      std::cout << plus_one() << std::endl; // 5
      std::cout << val << std::endl; // 3
      

      Notice how this differs from capture by reference: val itself does not change.
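Since the mutated state lives inside the lambda object, copying the lambda copies that state too. A small sketch with a hypothetical make_counter helper:

```cpp
// make_counter is a hypothetical helper, not from the post.
// Each returned lambda owns a private counter `n`.
auto make_counter(int start) {
    return [n = start]() mutable { return ++n; };
}
```

If you copy a counter after two calls, the copy continues from the state it had at the moment of copying, and the two objects then evolve independently.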

    Exploring the Internals

    This section briefly discusses lambdas from the point of view of the compiler. We’ll use the Compiler Explorer to look at the generated assembly. Again we do this with examples.

    TL;DR: In most cases, when compiling with a good optimization level (-O2 or above), compilers will directly inline the complete logic of the lambda where it is called. This is the best possible outcome as far as performance goes.

    We’ll be working with the following simple example:

    void f() {
      auto lam = [b=4](int a){return (a + b);};
      std::cout << lam(5) << std::endl; // 9
    }
    

    Go ahead - enter this into the compiler explorer link above and inspect the generated assembly.(Try Online)

    -O1 optimization level

    • There is clearly a mapping from lam -> $_0, where $_0 is the compiler-generated name of the lambda's type.
    • lam/$_0 has translated to a class which offers the method operator()(int) const. i.e. the lam object instance can invoke lam_object(int). That’s exactly what we expect with a as the input arg here.
    • While not shown in the image the call to this method happens as call f()::$_0::operator()(int) const.
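To picture what the compiler generates, here is a hand-written class that behaves roughly like the lambda [b=4](int a){return (a + b);}. The names LamEquivalent and f_equivalent are my own; the real closure type is unnamed and its exact layout is up to the compiler, so treat this as a sketch.

```cpp
// Hypothetical stand-in for the compiler-generated closure type.
class LamEquivalent {
    int b;  // the captured state
public:
    explicit LamEquivalent(int b_init) : b(b_init) {}
    // const by default, just like a non-mutable lambda's call operator
    int operator()(int a) const { return a + b; }
};

// Mirrors the body of f() from the example, minus the printing.
int f_equivalent() {
    LamEquivalent lam{4};
    return lam(5);
}
```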

    -O2 optimization level

    The logic of the lambda has directly been inlined here into the function f(). In fact the compiler has predetermined the result as 9, which can be seen at mov esi, 9. Perfect! In practice generally the body of the lambda gets directly inlined for optimal performance at higher optimization levels.


    Overloading Lambdas

    This is a C++17 and above feature only. There are two ways of doing this as shown below(Try Online):

    If Constexpr

    using namespace std::string_literals;
    // Overloaded functionality based on inputs of 'a' and 'b'
    auto lam = [](auto a, auto b) {
        // strip out the types
        using T1 = std::decay_t<decltype(a)>;
        using T2 = std::decay_t<decltype(b)>;
        // overload 1(int, int)
        if constexpr(std::is_same_v<T1,int> && std::is_same_v<T2,int>) return a + b;
        // overload 2(string, string)
        else if constexpr(std::is_same_v<T1,std::string> && std::is_same_v<T2,std::string>) return a + " " + b;
    };
    std::cout << lam(3, 5) << std::endl; // 8
    std::cout << lam("hello"s, "world"s) << std::endl; // hello world
    

    From CppReference: std::decay Applies lvalue-to-rvalue, array-to-pointer, and function-to-pointer implicit conversions to the type T, removes cv-qualifiers, and defines the resulting type as the member typedef type.
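One caveat with the if constexpr approach: if no branch matches, the lambda simply returns void and the mistake only surfaces later, if at all. A possible refinement (a sketch; type_label and always_false_v are hypothetical names, and this is not from the original example) is to add a final branch that fails at compile time:

```cpp
#include <string>
#include <type_traits>

// Helper that is false for every T but depends on T, so the static_assert
// below only fires when that branch is actually instantiated.
template <class> inline constexpr bool always_false_v = false;

// Hypothetical example lambda, not from the post.
inline constexpr auto type_label = [](const auto& value) {
    using T = std::decay_t<decltype(value)>;
    if constexpr (std::is_integral_v<T>) return std::string("integer");
    else if constexpr (std::is_same_v<T, std::string>) return std::string("string");
    else static_assert(always_false_v<T>, "type_label: unsupported type");
};
```

With this in place, calling type_label(3.5) is rejected by the compiler with the given message instead of quietly yielding void.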

    The overloaded trick

    // first some black magic!
    template<class... Ts> struct overloaded : Ts... { using Ts::operator()...; }; // (1)
    template<class... Ts> overloaded(Ts...) -> overloaded<Ts...>;  // (2)
    // now let's use it
    auto lam = overloaded {
      [](int a, int b) {return a + b;},
      [](std::string a, std::string b) {return a + " " + b; },
    };
    std::cout << lam(3, 5) << std::endl; // 8
    std::cout << lam("hello", "world") << std::endl; // hello world
    

    Unfortunately, the overloaded is necessary and not standard. It’s a “piece of magic” that builds an overload set from a set of arguments(usually lambdas).(Bjarne Stroustrup, Tour of C++)
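A common place to meet the overloaded trick in practice is together with std::variant and std::visit. Here is a minimal sketch, assuming C++17; describe_variant is a name I made up for illustration.

```cpp
#include <string>
#include <variant>

template <class... Ts> struct overloaded : Ts... { using Ts::operator()...; };
template <class... Ts> overloaded(Ts...) -> overloaded<Ts...>;

// Hypothetical demo function, not from the post.
std::string describe_variant(const std::variant<int, std::string>& v) {
    // std::visit picks the lambda matching the alternative currently held.
    return std::visit(overloaded{
        [](int i) { return "int: " + std::to_string(i); },
        [](const std::string& s) { return "string: " + s; }
    }, v);
}
```

Both lambdas return std::string, which std::visit requires: every alternative must produce the same result type.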


    Immediately Invoked Function/Lambda Expressions(IIFE/IILE)

    Why should you care about this loaded term?! Here’s an example to motivate us:
    Let’s say I want to initialize a std::string variable as the first value that occurs 3 times in a std::vector of terms.(Try Online)

    auto terms = std::vector<std::string>{"a", "ab", "ab", "abc", "ab", "a"};
    std::string var_thrice;
    std::map<std::string, int> count_map;
    for(auto term: terms) {
      count_map[term]++;
      if(count_map[term] == 3) {
        var_thrice = term;
        break;
      }
    }
    // >>point<<
    std::cout << var_thrice << std::endl; // "ab"
    

    There are 3 problems here:

    1. Just to get a single variable var_thrice I’ve polluted my scope with multiple new variables(eg. count_map).
    2. I intend to assign var_thrice just once and then treat it as a read only/const. But there is no simple way to const initialize it here.
    3. See the comment >>point<<: suppose that at this point I need to validate var_thrice and, if it passes, add it to a std::vector. Suppose also that the string passing validation is 10,000 characters long. It would first be assigned to var_thrice and then copied into the std::vector, i.e. 2 expensive copies.

    IIFE, a concept borrowed from JavaScript, solves all three problems. Back to the example(Try Online):
      using namespace std::string_literals;
      auto terms = std::vector<std::string>{"a", "ab", "ab", "abc", "ab", "a"};
      // (1) Can const initialize
      const std::string var_thrice = [&]{ // default by ref to capture outer scope.
        // (2) doesnt pollute outer scope
        std::map<std::string, int> count_map;
        for(auto term: terms) {
         count_map[term]++;
         if(count_map[term] == 3) return term;
        }
        // Default case
        return ""s;
      }();
      std::cout << var_thrice << std::endl; // "ab"
      

      Cases (1) and (2) have been addressed, as pointed out by the comments. Case (3) is resolved by writing vector.push_back(IIFE());. The lambda's result is a temporary, so thanks to return value optimization and move semantics it is moved into the vector rather than copied a second time.
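Here is what case (3) could look like in code. This is a sketch under my own assumptions: collect_first_thrice is a hypothetical function name, and the validation step is left out to keep it short.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical demo function, not from the post.
std::vector<std::string> collect_first_thrice(const std::vector<std::string>& terms) {
    std::vector<std::string> results;
    // The IIFE yields a temporary std::string, so push_back takes its
    // rvalue overload and moves the string into the vector.
    results.push_back([&] {
        std::map<std::string, int> count_map;
        for (const auto& term : terms) {
            if (++count_map[term] == 3) return term;  // copied once into the result
        }
        return std::string{};  // default case
    }());
    return results;
}
```

Iterating with const auto& also avoids copying every term on each loop iteration; only the term that is actually returned gets copied.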


    References and Suggested Reading

    There are a few more subtopics I would have liked to have covered(and maybe I will later):

    Finally here are the references for this post(highly recommended reading):
