Llama 3.1 isn’t the AI revolution you need – it’s a sign to stop chasing every AI trend and to become intentional about your strategy

📢 While tech headlines scream about Meta’s Llama 3.1, I’m here to tell you why hitting pause on your AI adoption might be your smartest move this year.

As a Fractional CDO, I’ve seen too many companies fall victim to AI FOMO: resources wasted, poor ROI, strategies derailed, all thanks to half-baked implementations and overpromising consultants.

Llama 3.1 isn’t your cue to jump on the AI bandwagon – it’s your chance to craft a Data and AI strategy that actually serves your business.

I always say AI boils down to three things: Data, Compute, and Algorithms.

Let’s cut through the hype and focus on what Meta’s paper (download it here) really tells us about how they approached these three key ingredients, distilled into 4 lessons; then I’ll share my take on what it means for you.


🔍 Lesson 1: Better Data, Better Models

  • Enhanced Data Quality: Meta significantly improved both the quantity and quality of the data used for pre-training and post-training Llama 3.1. They meticulously curated and filtered it, ensuring the model learns from the best possible sources, using techniques like data deduplication, noise reduction, and careful selection of high-quality sources (a minimal sketch of one such step follows this list).
  • Iterative Training Process: Each round of training incorporated better quality synthetic data generated from the previous round’s improved model, leading to continuous enhancement in the model’s performance.
  • Impact: These efforts emphasize once again the critical importance of high-quality data in AI training, and show that (high-quality) synthetic data is effective; it also means we probably won’t be running out of data to train better models anytime soon.
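
To make this concrete, here’s a minimal sketch of one such cleaning step: exact deduplication by hashing normalized text. This is a toy Python illustration, not Meta’s actual pipeline, which also relies on fuzzier techniques such as MinHash-based deduplication and model-based quality filtering:

```python
import hashlib

def dedupe_documents(docs):
    """Drop exact duplicates by hashing normalized text.

    A toy illustration of one data-cleaning step; real pipelines
    also use fuzzy dedup (e.g. MinHash) and quality classifiers.
    """
    seen = set()
    unique = []
    for doc in docs:
        # Normalize whitespace and case so trivial variants collide.
        key = hashlib.sha256(" ".join(doc.lower().split()).encode()).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique

corpus = ["The cat sat on the mat.", "the cat  sat on the mat.", "A different sentence."]
print(dedupe_documents(corpus))  # -> only 2 documents survive
```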

🔍 Lesson 2: Scaling Up with More Compute

  • Increased Compute Power: Llama 3.1 was trained using 3.8×10^25 FLOPs (total floating-point operations), nearly 50 times more than its predecessor. This immense computational budget enabled the training of a model with 405B parameters (see the back-of-the-envelope check after this list).
  • Result: The substantial compute investment has produced a model that reaches state-of-the-art performance in multilinguality, coding, reasoning, and more, showcasing (once again!) how scaling compute resources correlates with superior model capabilities. For example, the increased compute budget allowed Llama 3.1 to handle larger context windows, improving its ability to understand and generate coherent text over longer passages.
  • Assorted Musings: Interestingly, the compute used to train Llama 3.1 exceeds the threshold (10^25 FLOPs) at which the EU’s AI Act presumes a model poses systemic risk. This highlights how quickly the field is advancing and how hard it is for regulators to keep up with real-world developments.
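
As a sanity check on that headline number, the reported figure lines up with the standard back-of-the-envelope approximation of training compute, FLOPs ≈ 6 × parameters × training tokens, assuming the figures from the paper (405B parameters, roughly 15.6T tokens):

```python
# Back-of-the-envelope check using the common approximation:
# total training FLOPs ≈ 6 × parameters × training tokens.
# Figures below are the ones reported in Meta's paper.
params = 405e9    # 405B parameters
tokens = 15.6e12  # ~15.6T training tokens
flops = 6 * params * tokens
print(f"{flops:.2e}")  # ≈ 3.79e+25, matching the reported 3.8×10^25 FLOPs
```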

🔍 Lesson 3: Keep it Simple

  • Algorithmic Simplicity: Meta opted for a standard dense Transformer architecture with minor adaptations, prioritizing training stability and efficiency over complexity (for example, they didn’t use a Mixture of Experts [MoE] architecture, as some of their competitors do).
  • Post-Training Simplicity: Using supervised fine-tuning, rejection sampling, and direct preference optimization (DPO) rather than more complex reinforcement learning algorithms has proven effective in aligning the model with human preferences.
  • Outcome: This approach underscores that, with excellent data and lots of compute, (relatively) simple algorithms can still deliver strong performance and stability. Emerging techniques are areas ripe for future development: Low-Rank Adaptation (LoRA), which reduces the number of parameters needed for fine-tuning (a sketch follows this list); quantization, which lowers the numerical precision of model weights for faster inference; and the Joint-Embedding Predictive Architecture (JEPA), which enhances self-supervised learning for images (I-JEPA) and videos (V-JEPA).
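
To illustrate how far “simple” can go, here’s a minimal sketch of the LoRA idea in PyTorch: freeze the pretrained weight and learn only a low-rank update. It’s a didactic toy, not a production implementation (in practice you’d reach for a library like Hugging Face’s PEFT):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal sketch of Low-Rank Adaptation (LoRA).

    The pretrained weight is frozen; only the low-rank factors A and B
    (rank r << layer width) are trained, so fine-tuning touches a tiny
    fraction of the parameters.
    """
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the original weights
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        # Original projection plus the learned low-rank update.
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(4096, 4096), r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable: {trainable:,} of {total:,}")  # ~65K of ~16.8M parameters
```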

🔍 Lesson 4: Addressing Strategic Challenges: Infrastructure and Energy Demands

  • Power Consumption and Grid Stability: Meta notes that training such a massive model required immense power, and that thousands of GPUs shifting their power draw simultaneously led to notable power grid instabilities.
  • Reliability and Maintenance: Meta faced significant challenges in keeping 16,000 GPUs running reliably. Unexpected interruptions due to hardware failures, such as faulty GPUs and network issues, were frequent. To mitigate these, Meta developed tools for fast diagnosis and problem resolution, using features like PyTorch’s NCCL flight recorder.
  • Assorted Musings: It was fascinating to read that diurnal temperature variations caused a 1–2% fluctuation in training throughput, which shows how thoroughly Meta documented the process of building Llama 3.1.
  • Operational Insight: The experience underscores the importance of robust infrastructure and the ability to quickly diagnose and resolve issues to preserve effective training time (a sketch of the classic checkpoint-and-resume mitigation follows this list).
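
The classic mitigation for frequent hardware failures is aggressive checkpointing, so a crash costs minutes of progress rather than days. Here’s a minimal PyTorch sketch of that general pattern; the names and single-file format are illustrative, not Meta’s internal tooling:

```python
import torch

def save_checkpoint(model, optimizer, step, path="ckpt.pt"):
    # Persist everything needed to resume exactly where training stopped.
    torch.save({
        "step": step,
        "model": model.state_dict(),
        "optimizer": optimizer.state_dict(),
    }, path)

def load_checkpoint(model, optimizer, path="ckpt.pt"):
    # After a hardware failure, restore state and resume training.
    state = torch.load(path)
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
    return state["step"]  # the step to resume from
```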

⚠️ To me, this shows that the ability of frontier labs to build and manage compute superclusters is quickly becoming a moat that newcomers will find hard to overcome. And given recent news (xAI just switched on its gigantic “Memphis” supercluster of 100,000 liquid-cooled H100 GPUs [Meta’s clusters have “only” 24,000 each, by comparison], and Microsoft recently announced the $100B Project Stargate), this is an important strategic consideration to understand.


🌟 What This Means for You

  1. Prioritize Data Quality: Like Meta, you should focus on improving the quality of your data. Better data leads to better models. Invest time in cleaning and curating your data to get the most out of your AI projects. This will not happen without a strong Data Strategy and Data Governance in place.
  2. Avoid the FOMO Hype: The landscape is changing so quickly that the most rational behaviour for now is to make only reversible decisions (think Jeff Bezos’s “two-way doors”). More advancements will come, so avoid vendor lock-in and consultants hyping tools and use cases. Instead, build strong foundations, experiment with (reversible!) tech to give your technical teams the necessary AI skillsets, and educate the rest of your organization on how to think about and use AI products.
  3. Embrace Open Source: Open-source models like Llama 3.1 are now on par with the best closed models. This happened faster than expected, giving you access to cutting-edge AI on your own terms. With powerful open-source options available, it might be time to reconsider relying on closed models: open source offers more control (over both your data and your models), customization, and potential cost savings (see the sketch after this list). As Mitko Vasilev always says: “Make sure you own your AI. AI in the cloud is not aligned with you; it’s aligned with the company that owns it.”
  4. Use Meta’s New Licensing: Llama 3.1’s updated license lets you use its outputs to train your own smaller models. This is a great opportunity to build AI solutions tailored to your needs without relying on proprietary software.
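
To give a flavour of what “AI on your own terms” looks like in practice, here’s a minimal sketch of running an open-weight Llama 3.1 variant on your own hardware with Hugging Face Transformers. Assumptions: the model id below is the 8B instruct variant on the Hub (you’ll need to accept Meta’s license there first), and the prompt and generation length are illustrative:

```python
from transformers import pipeline

# Minimal sketch: load an open-weight model locally and generate text.
generator = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",  # requires license acceptance on the Hub
    device_map="auto",   # place weights on available GPU(s)
    torch_dtype="auto",  # pick a sensible precision automatically
)

out = generator("Summarize our data strategy in one sentence:", max_new_tokens=60)
print(out[0]["generated_text"])
```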

🔮 I would venture a guess that learning to fine-tune open-source models, or to train your own smaller models from big open-source ones (see the distillation sketch below), could become a key differentiator for companies aiming to generate value from GenAI (though, as mentioned in the previous point, trying to guess is a losing proposition…).
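
For those curious what “training your own smaller model using a big open-source one” looks like under the hood, the core of knowledge distillation is a single loss term: push the student’s output distribution towards the teacher’s. A minimal PyTorch sketch, with the temperature value as an illustrative choice:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Minimal sketch of knowledge distillation: train a small student to
    match a large teacher's (e.g. Llama 3.1 405B's) output distribution."""
    t = temperature
    soft_teacher = F.softmax(teacher_logits / t, dim=-1)   # softened targets
    log_student = F.log_softmax(student_logits / t, dim=-1)
    # KL divergence between teacher and student distributions,
    # scaled by t^2 as in the standard distillation recipe.
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * t * t
```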
